# Products Enrichment using Python
## NoSQL Final Project
Walid El Otmani - Graph Database for a Brazilian E-Commerce Store

Since the dataset does not include product names, this notebook creates fictitious products for a subset of the product categories. Only a subset is used to keep the program simple and easy to understand, but the same logic can be applied to the whole dataset.
<p>The products are then mapped to the product_ids from the dataset in a pseudo-random but reproducible way (a hash of each ID), and used in the recommendation system.</p>
<p>The selected products will be used as the products in the store.</p>

In [None]:

import hashlib  # used to derive a stable template index from each product ID

# Define product templates for different categories

PRODUCT_TEMPLATES = {
    "beleza_saude": [
        {"name": "Aloe Vera Skincare Set", "price": 29.99},
        {"name": "Vitamin C Serum", "price": 19.99},
        {"name": "Herbal Hair Growth Oil", "price": 15.49},
        {"name": "Natural Lip Balm", "price": 4.99},
        {"name": "Organic Face Mask", "price": 12.99},
        {"name": "Essential Oils Collection", "price": 25.00},
    ],

    "esporte_lazer": [
        {"name": "Dumbbell Set", "price": 49.99},
        {"name": "Treadmill", "price": 599.99},
        {"name": "Yoga Mat", "price": 19.99},
        {"name": "Basketball", "price": 29.99},
        {"name": "Boxing Gloves", "price": 39.99},
        {"name": "Resistance Band", "price": 9.99},
    ],

    "informatica_acessorios": [
        {"name": "Wireless Mouse", "price": 24.99},
        {"name": "Mechanical Keyboard", "price": 89.99},
        {"name": "USB-C Hub", "price": 34.99},
        {"name": "Laptop Stand", "price": 29.99},
        {"name": "External Hard Drive", "price": 79.99},
        {"name": "Webcam", "price": 49.99},
    ],

    "pet_shop": [
        {"name": "Dog Chew Toy", "price": 9.99},
        {"name": "Cat Scratching Post", "price": 29.99},
        {"name": "Pet Grooming Kit", "price": 19.99},
        {"name": "Aquarium Decor Set", "price": 24.99},
        {"name": "Bird Cage Accessories", "price": 14.99},
        {"name": "Pet Bed", "price": 39.99},
    ],

}

In [31]:
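# Function to assign a product template based on product ID and category
def assign_product_template(product_id, category):
    """Return a {"name", "price"} template chosen deterministically from the category."""
    # Categories without templates get a placeholder name and a zero price
    if category not in PRODUCT_TEMPLATES:
        return {"name": f"Product {product_id[:6]}", "price": 0.0}

    templates = PRODUCT_TEMPLATES[category]

    # Stable index: SHA-256 of the ID, reduced modulo the number of templates,
    # so the same product_id always maps to the same template
    h = int(hashlib.sha256(product_id.encode()).hexdigest(), 16)
    idx = h % len(templates)

    return templates[idx]  # returns dict with name + price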

In [32]:
# Testing the function
tpl = assign_product_template("abc123", "informatica_acessorios")
print(tpl, type(tpl))

{'name': 'USB-C Hub', 'price': 34.99} <class 'dict'>
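
Because the index comes from a SHA-256 hash rather than from a random number generator, the mapping should be reproducible across runs. The sketch below illustrates this under a few assumptions: the shortened template table `SAMPLE_TEMPLATES`, the helper name `pick_template`, and the made-up IDs are hypothetical and not part of the notebook. It checks that repeated calls agree, shows the fallback for an unknown category, and counts how a batch of IDs spreads over the templates.

```python
import hashlib
from collections import Counter

# Hypothetical, shortened template table used only for this illustration
SAMPLE_TEMPLATES = {
    "pet_shop": [
        {"name": "Dog Chew Toy", "price": 9.99},
        {"name": "Cat Scratching Post", "price": 29.99},
        {"name": "Pet Bed", "price": 39.99},
    ],
}

def pick_template(product_id, category, templates=SAMPLE_TEMPLATES):
    """Same hashing logic as assign_product_template, applied to the sample table."""
    if category not in templates:
        return {"name": f"Product {product_id[:6]}", "price": 0.0}
    options = templates[category]
    digest = hashlib.sha256(product_id.encode()).hexdigest()
    return options[int(digest, 16) % len(options)]

# Determinism: the same ID and category always give the same template
first = pick_template("abc123", "pet_shop")
second = pick_template("abc123", "pet_shop")
print(first == second)

# Unknown categories fall back to a placeholder built from the ID prefix
fallback = pick_template("abc123def", "unknown_category")
print(fallback)

# Rough spread of 600 made-up IDs over the three sample templates
counts = Counter(pick_template(f"id-{i}", "pet_shop")["name"] for i in range(600))
print(counts)
```

A hash-based choice like this keeps product names stable if the notebook is re-run, which a call to `random.choice` without a fixed seed would not.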


In [33]:
# Save the values in Neo4j
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# Helper function to run a query and return results
def run_query(query, params=None):
    with driver.session() as session:
        return [record.data() for record in session.run(query, params or {})]

# 1. Load all demo products
products = run_query("""
    MATCH (p:Product)
    WHERE p.in_demo = true
    RETURN p.product_id AS id, p.product_category_name AS category
""")

print("Number of demo products:", len(products))

# 2. Update each product with a display name + price
with driver.session() as session:
    for record in products:
        pid = record["id"]
        cat = record["category"]

        tpl = assign_product_template(pid, cat)  # returns {"name": ..., "price": ...}

        session.run("""
            MATCH (p:Product {product_id: $id})
            SET p.display_name = $name,
                p.demo_price = $price
        """, {"id": pid, "name": tpl["name"], "price": tpl["price"]})



Number of demo products: 7669
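
Running one query per product means one round trip for each of the 7669 demo products. One possible alternative, sketched below, sends the updates in batches with Cypher's `UNWIND`. The helper names (`chunked`, `update_in_batches`), the batch size of 1000, and the `FakeSession` stand-in are assumptions made for this illustration; in the notebook, the `session` argument would be a real `driver.session()` and each row would be built from `assign_product_template`.

```python
# Cypher that applies a whole list of rows in one statement
BATCH_QUERY = """
    UNWIND $rows AS row
    MATCH (p:Product {product_id: row.id})
    SET p.display_name = row.name,
        p.demo_price = row.price
"""

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def update_in_batches(session, rows, batch_size=1000):
    """Send `rows` to the database in batches; return the number of queries run."""
    calls = 0
    for batch in chunked(rows, batch_size):
        session.run(BATCH_QUERY, {"rows": batch})
        calls += 1
    return calls

class FakeSession:
    """Stand-in for a Neo4j session so the sketch runs without a database."""
    def __init__(self):
        self.received = []

    def run(self, query, params):
        # Record only the batch size; a real session would execute the query
        self.received.append(len(params["rows"]))

# Made-up rows with the same shape as the notebook's id/name/price parameters
rows = [{"id": f"id-{i}", "name": "Yoga Mat", "price": 19.99} for i in range(2500)]
fake = FakeSession()
n_calls = update_in_batches(fake, rows)
print(n_calls, fake.received)
```

With 2500 sample rows and a batch size of 1000, the helper issues three queries instead of 2500.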


In [37]:
# Verifying the updates
with driver.session() as session:
    result = session.run("""
        MATCH (p:Product)
        WHERE p.in_demo = true
        RETURN p.product_id AS id, p.display_name AS name, p.demo_price AS price
    """)
    for record in result:
        print(record)

<Record id='96bd76ec8810374ed1b65e291975717f' name='Basketball' price=29.99>
<Record id='3bb7f144022e6732727d8d838a7b13b3' name='Yoga Mat' price=19.99>
<Record id='a1b71017a84f92fd8da4aeefba108a24' name='Webcam' price=49.99>
<Record id='e3e020af31d4d89d2602272b315c3f6e' name='Herbal Hair Growth Oil' price=15.49>
<Record id='c78b767da00efb70c1bcccab87c28cd5' name='USB-C Hub' price=34.99>
<Record id='a0253d43394dd4da9a5d7b1f546f1a32' name='External Hard Drive' price=79.99>
<Record id='051b9ff13dd55c0a6655a15ff296f80d' name='Basketball' price=29.99>
<Record id='ce5b91848b91118daffb3af53b747475' name='Basketball' price=29.99>
<Record id='c5d8079278e912d7e3b6beb48ecb56e8' name='Natural Lip Balm' price=4.99>
<Record id='5eaa343860dc445b3fd43d1b682809fd' name='Basketball' price=29.99>
<Record id='f8f6fd145cc00519283cf3100477b2b3' name='Resistance Band' price=9.99>
<Record id='36555a2f528d7b2a255c504191445d39' name='Aloe Vera Skincare Set' price=29.99>
<Record id='bfc1d1c62b1f7f401d8d9dda962bb
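
The printed records can also be checked programmatically rather than by eye. As a minimal sketch, the hypothetical helper `find_mismatches` below flags any record whose name and price pair does not match a known template. The sample records and the shortened template table are made up for the example and are not taken from the database.

```python
# Hypothetical, shortened template table for the illustration
SAMPLE_TEMPLATES = {
    "esporte_lazer": [
        {"name": "Yoga Mat", "price": 19.99},
        {"name": "Basketball", "price": 29.99},
    ],
    "informatica_acessorios": [
        {"name": "Webcam", "price": 49.99},
    ],
}

def find_mismatches(records, templates):
    """Return the records whose (name, price) pair is not in any template."""
    valid_pairs = {
        (tpl["name"], tpl["price"])
        for options in templates.values()
        for tpl in options
    }
    return [r for r in records if (r["name"], r["price"]) not in valid_pairs]

# Made-up records shaped like the rows returned by the verification query
records = [
    {"id": "p1", "name": "Basketball", "price": 29.99},
    {"id": "p2", "name": "Webcam", "price": 49.99},
    {"id": "p3", "name": "Webcam", "price": 19.99},  # price does not match
    {"id": "p4", "name": None, "price": None},       # never updated
]
bad = find_mismatches(records, SAMPLE_TEMPLATES)
print([r["id"] for r in bad])
```

Records that received the fallback name for an unknown category would also be flagged by this check, which may or may not be the desired behavior.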

The demo products have been created and loaded into the database, and are ready to be used by the recommendation system and the website.