
# 🧾 Barcode Recognition and Product Interpretation  
**Course Project – DA623 (Winter 2025)**  
*Author: Diya Arun (210102029)*

---

This notebook provides a blog-style walkthrough of my end-to-end project on barcode recognition and product interpretation, using the [Open Food Facts](https://world.openfoodfacts.org/data) dataset.

We'll cover the following:

1. Dataset Overview  
2. Barcode Generation  
3. Barcode Detection & Lookup  
4. Interactive Upload and Decode  
5. Semantic Search using Sentence Transformers  
6. Interactive Search  
7. Web App via FastAPI  

*Note: The code in this blog consists of short snippets from the actual implementation and is not executable on its own. To view the results, please refer to the folder 'barcode_search_app'.*

The system is built with `pyzbar`, `sentence-transformers`, and `FastAPI`.


## 📊 1. Dataset Overview

We use the `en.openfoodfacts.org.products.tsv` dataset from Open Food Facts, which contains millions of food products from around the world.

We retain only the relevant columns, such as product name, nutritional values, and ingredients, and keep the first 10,000 rows that have a numeric product code. The dataset is shortened to keep the implementation computationally feasible.


In [None]:
import pandas as pd

# Loading dataset; reading 'code' as a string preserves any leading zeros in barcodes
df = pd.read_csv("/kaggle/input/world-food-facts/en.openfoodfacts.org.products.tsv",
                 sep='\t', low_memory=False, dtype={'code': str})

# Keeping essential columns and dropping rows without a code or product name
df = df[['code', 'product_name', 'brands', 'ingredients_text',
         'carbohydrates_100g', 'fat_100g', 'fiber_100g',
         'proteins_100g', 'salt_100g', 'sugars_100g']].dropna(subset=['code', 'product_name'])

# Keeping only all-digit codes, then shortening to the first 10,000 rows
df = df[df['code'].apply(lambda x: str(x).isdigit())].head(10_000)
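
One thing worth checking at this stage is how the `code` column is parsed, since barcodes can start with zeros. Below is a minimal sketch, using a made-up two-row TSV rather than the real dataset, of how pandas may drop leading zeros when it infers a numeric type and how passing `dtype={'code': str}` keeps them. The sample rows and variable names are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical two-row sample, not taken from the Open Food Facts file
sample_tsv = "code\tproduct_name\n0001234567890\tSample A\n4006381333931\tSample B\n"

# Default type inference parses the column as integers and drops leading zeros
inferred = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Forcing a string dtype keeps every digit of the code
as_text = pd.read_csv(io.StringIO(sample_tsv), sep="\t", dtype={"code": str})

inferred_codes = [str(c) for c in inferred["code"]]
text_codes = list(as_text["code"])
print(inferred_codes)
print(text_codes)
```

Whether this matters for the full file depends on its contents: a column that also holds non-numeric values is kept as text anyway.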


## 🧾 2. Barcode Generation

We use the `python-barcode` library to generate EAN-13 barcode images from valid product codes. The function `generate_barcodes_from_codes` takes a DataFrame of product codes and creates a barcode image for each code that is numeric and exactly 13 digits long, skipping all others. Each image is saved to the specified output directory, and once all barcodes are generated, the directory is compressed into a single ZIP file for easy download.


In [None]:
import os
import zipfile

import barcode
from barcode.writer import ImageWriter

def generate_barcodes_from_codes(df, output_dir="barcodes", zip_filename="barcodes.zip"):
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    for idx, row in df.iterrows():
        product_code = str(row['code']).strip()
        
        # Only proceeding with numeric codes
        if not product_code.isdigit():
            #print(f"Skipping invalid code (non-numeric): {product_code}")
            continue
        
        # Ensuring the code is exactly 13 digits (the full EAN-13 length, including the check digit)
        if len(product_code) != 13:
            continue
        
        try:
            # Creating barcode
            barcode_class = barcode.get_barcode_class('ean13')
            barcode_img = barcode_class(product_code, writer=ImageWriter())
            
            # Saving barcode image
            output_path = os.path.join(output_dir, f"{product_code}")
            barcode_img.save(output_path)
            #print(f"Barcode created for: {product_code}")
        except Exception as e:
            print(f"Error for code {product_code}: {e}")

    # Zipping the generated barcode images into a file for download
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for foldername, subfolders, filenames in os.walk(output_dir):
            for filename in filenames:
                file_path = os.path.join(foldername, filename)
                zipf.write(file_path, os.path.relpath(file_path, output_dir))
    
    print(f"✅ All barcodes have been zipped into {zip_filename}.")

#  Run barcode generation and zipping
generate_barcodes_from_codes(df)
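
The length check in the function above does not verify the final check digit of a code. Here is a minimal sketch of the standard EAN-13 checksum, which could be used to filter codes further. The helper names `ean13_check_digit` and `is_valid_ean13` are hypothetical and are not part of the project or of `python-barcode`.

```python
def ean13_check_digit(first12: str) -> int:
    """Return the EAN-13 check digit for the first 12 digits of a code."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Counting from the left, digits alternate between weight 1 and weight 3
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if code has 13 digits and its last digit matches the checksum."""
    if len(code) != 13 or not code.isdigit():
        return False
    return ean13_check_digit(code[:12]) == int(code[-1])


print(is_valid_ean13("4006381333931"))
print(is_valid_ean13("4006381333932"))
```

A filter such as `df[df['code'].apply(is_valid_ean13)]` would then keep only codes whose check digit is consistent.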


## 🔍 3. Barcode Detection & Lookup

We use `OpenCV` and `pyzbar` to detect and decode barcodes from images, then use the decoded value to look up product information in our dataset. The function `detect_and_lookup_barcode` converts the uploaded image bytes into an OpenCV image and then to grayscale before passing it to `pyzbar` for decoding. If a barcode is found, the function extracts the number from the first detection, draws a box and label around it on the image, and displays the result. Finally, it calls `lookup_barcode` to display the corresponding product information from the dataset.


In [None]:
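import cv2
import numpy as np
import matplotlib.pyplot as plt
from pyzbar.pyzbar import decode

# lookup_barcode is defined elsewhere in the project
def detect_and_lookup_barcode(image_data):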
    image_array = np.frombuffer(image_data, np.uint8)
    image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)

    if image is None:
        print("❌ Failed to decode image.")
        return

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    barcodes = decode(gray)

    if not barcodes:
        print("🚫 No barcode detected.")
        return

    barcode_data = barcodes[0].data.decode("utf-8")
    print(f"📷 Detected Barcode: {barcode_data}")
    
    (x, y, w, h) = barcodes[0].rect
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, f"{barcode_data}", (x, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.imshow(image_rgb)
    plt.axis('off')
    plt.show()

    lookup_barcode(barcode_data)
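
The `lookup_barcode` helper is not shown in this post. As a rough sketch of what such a lookup could look like, the function below matches the decoded string against the `code` column and returns the first matching row. The name `lookup_barcode_sketch`, its return format, and the one-row demo table are assumptions for illustration, not the project's actual implementation.

```python
import pandas as pd

def lookup_barcode_sketch(barcode_data: str, df: pd.DataFrame):
    """Return the first matching product row as a dict, or None if absent."""
    # Compare as strings so that numeric and text code columns both match
    matches = df[df["code"].astype(str) == str(barcode_data)]
    if matches.empty:
        return None
    return matches.iloc[0].to_dict()

# Hypothetical one-row table standing in for the real dataset
demo_df = pd.DataFrame({
    "code": ["4006381333931"],
    "product_name": ["Sample product"],
    "brands": ["Sample brand"],
})
found = lookup_barcode_sketch("4006381333931", demo_df)
missing = lookup_barcode_sketch("0000000000000", demo_df)
print(found)
print(missing)
```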


## 🖼️ 4. Interactive Barcode Upload (Jupyter Widgets)

Using `ipywidgets`, we can upload a barcode image and decode it live within the notebook. The interface consists of a file upload field that accepts PNG images, a **"Decode & Lookup"** button, and an output area for results and error messages. Clicking the button reads the uploaded image, passes it to `detect_and_lookup_barcode`, and displays the detected barcode along with the product information.


In [None]:
from ipywidgets import FileUpload, Button, Output, VBox
from IPython.display import display

# Widgets
upload_widget = FileUpload(accept='.png', multiple=False)
process_button = Button(description="Decode & Lookup")
output = Output()

def on_button_click(b):
    with output:
        output.clear_output()
        if not upload_widget.value:
            print("❌ Please upload a PNG image.")
            return
            
        # With ipywidgets 8.x, `value` is a tuple of dicts, one per uploaded file
        uploaded_file = upload_widget.value[0]
        image_data = uploaded_file['content']
        
        print(f"📤 Processing: {uploaded_file['name']}")
        detect_and_lookup_barcode(image_data)

# Attaching event
process_button.on_click(on_button_click)

# Displaying the UI
display(VBox([upload_widget, process_button, output]))
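
The indexing `upload_widget.value[0]` relies on the layout used by ipywidgets 8.x. Below is a small sketch of a helper that accepts both that layout and the older 7.x one, assuming the two layouts are as described in the comments. The helper name `first_upload` and the hand-built sample values are hypothetical.

```python
def first_upload(value):
    """Return (name, content_bytes) for the first uploaded file, or None if empty."""
    if not value:
        return None
    if isinstance(value, dict):
        # ipywidgets 7.x layout: {filename: {"metadata": {...}, "content": b"..."}}
        name, entry = next(iter(value.items()))
        return name, bytes(entry["content"])
    # ipywidgets 8.x layout: tuple of dicts with "name" and "content" keys
    entry = value[0]
    return entry["name"], bytes(entry["content"])

# Hand-built stand-ins for the two layouts (not real widget objects)
v7_value = {"code.png": {"metadata": {"name": "code.png"}, "content": b"\x89PNG"}}
v8_value = ({"name": "code.png", "content": memoryview(b"\x89PNG")},)
print(first_upload(v7_value))
print(first_upload(v8_value))
```

Converting the content to `bytes` gives `detect_and_lookup_barcode` the same input type in both cases.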


## 🧠 5. Semantic Search with Sentence Transformers

We use `sentence-transformers` to encode product descriptions and search them semantically using cosine similarity. A pre-trained model (`all-MiniLM-L6-v2`) encodes the combined product and ingredient text (`full_text`) into vector representations (`product_embeddings`). The `search_products` function then encodes a user query with the same model, compares it against the product embeddings using cosine similarity, and returns a formatted list of the top matching products along with their ingredients.


In [None]:
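from sentence_transformers import SentenceTransformer, util

# Assumption: 'full_text' joins product name and ingredients; the original construction is not shown here
df['full_text'] = df['product_name'].fillna('') + ' ' + df['ingredients_text'].fillna('')

model = SentenceTransformer('all-MiniLM-L6-v2')
product_embeddings = model.encode(df['full_text'].tolist(), convert_to_tensor=True)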

def search_products(query, top_k=5):
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, product_embeddings, top_k=top_k)[0]

    results = []
    for hit in hits:
        idx = hit['corpus_id']
        product = df.iloc[idx]
        code = product['code']
        name = product['product_name']
        ingredients = product['ingredients_text']
        results.append(f"- **{code}** – {name}\n  - *Ingredients*: {ingredients}")
    return results
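
To make the ranking step more concrete, here is a minimal pure-Python sketch of a top-k search by cosine similarity. It illustrates the idea only and is not how `util.semantic_search` is implemented. The function names and the three-dimensional toy vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec, corpus_vecs, top_k=2):
    """Return (corpus index, score) pairs for the top_k most similar vectors."""
    scored = [(idx, cosine_similarity(query_vec, vec))
              for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up 3-dimensional vectors standing in for real sentence embeddings
toy_corpus = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [1.0, 0.0, 0.0]
ranking = top_k_by_cosine(toy_query, toy_corpus, top_k=2)
print(ranking)
```

The returned indices play the same role as `hit['corpus_id']` in `search_products`: they point back to rows of the DataFrame.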


## 🔍 6. Interactive Search (Jupyter Widgets)

We implemented dynamic search using `ipywidgets`, so users can query the system at runtime. The interface includes a text box (`query_box`) for a natural language query, a "Search" button, and an output area. Clicking the button triggers `on_search_clicked`, which validates the input, runs the `search_products` function defined earlier, and displays the top matching products as Markdown. This makes it easy to explore the dataset through free-text queries.

In [None]:
import ipywidgets as widgets
from IPython.display import display, Markdown

query_box = widgets.Text(
    description='Query:',
    placeholder='e.g. low sugar cereal',
    layout=widgets.Layout(width='80%')
)

search_button = widgets.Button(description="Search")
# Separate output area so the upload widget's `output` from section 4 is not overwritten
search_output = widgets.Output()

def on_search_clicked(b):
    with search_output:
        search_output.clear_output()
        query = query_box.value.strip()
        if not query:
            print("❌ Please enter a query.")
            return

        results = search_products(query)
        display(Markdown(f"### 🔎 Top matches for query: *{query}*"))
        for r in results:
            display(Markdown(r))

search_button.on_click(on_search_clicked)

display(widgets.VBox([query_box, search_button, search_output]))


## 🌐 7. FastAPI Web App

To make the app deployable, we created a FastAPI server with:

- `/upload` for barcode image decoding
- `/search` for semantic product lookup

The frontend is written in HTML and JavaScript, allowing users to interact through a single-page web UI.

👉 [See GitHub repo for FastAPI files and frontend code.](https://github.com/DiyaArun/DA623_Project)


In [None]:
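from fastapi import FastAPI, Request, UploadFile, Form
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

app = FastAPI()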
app.mount("/static", StaticFiles(directory="app/static"), name="static")
templates = Jinja2Templates(directory="templates")

@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/upload", response_class=HTMLResponse)
async def upload(request: Request, file: UploadFile):
    contents = await file.read()
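    # detect_barcode and lookup_barcode are helpers defined elsewhere in the app;
    # here they are expected to return the decoded string and the product record
    code = detect_barcode(contents)
    product = lookup_barcode(code) if code else None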
    return templates.TemplateResponse("result.html", {"request": request, "product": product, "code": code})

@app.post("/search", response_class=HTMLResponse)
async def search(request: Request, query: str = Form(...)):
    # search_products takes the query text (and an optional top_k), as defined in section 5
    results = search_products(query)
    return templates.TemplateResponse("result.html", {"request": request, "results": results, "query": query})


## ✅ Conclusion

This project demonstrates how barcodes can serve as keys for querying product information through computer vision and semantic search. Using OpenCV, `pyzbar`, and Sentence Transformers, we built a multi-modal system that can decode, interpret, and search food products.

## 😊 Thank you for reading!