# Advanced ML: Recommendation Project

### Description of the general topic

Recommendation systems are a class of machine learning tools designed to suggest
relevant items to users based on their preferences and behavior, and on the
activity of other users. They are widely used in e-commerce, streaming platforms,
social media, and online advertising, where they improve the user experience by
delivering personalized content or product suggestions.

### Workflow of the code project (all in Python)
 
**Data Preprocessing:** 
- Preprocess textual data using tokenization, stemming, and vectorization 
techniques like TF-IDF or word embeddings. 
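The preprocessing step above can be sketched without external dependencies. This is a minimal illustration, not the project's pipeline: `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer (such as NLTK's Porter stemmer), and the weighting is plain tf * log(N / df). In practice scikit-learn's `TfidfVectorizer` does the same job and by default uses a smoothed IDF, so its weights will differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Hypothetical toy stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if a reasonably long stem remains
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [[crude_stem(t) for t in tokenize(d)] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append({
            term: (c / total) * math.log(n_docs / doc_freq[term])
            for term, c in counts.items()
        })
    return vectors

docs = [
    "Magnesium tablets work",
    "Vitamin tablets",
    "Magnesium helps sleeping",
]
vectors = tfidf(docs)
```

A term that occurs in every document gets weight 0 with this unsmoothed formula, which is one reason library implementations smooth the IDF.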
 
**Building Collaborative Filtering Models:** 
- Implement SVD for matrix factorization. 
- Build user-based and item-based collaborative filtering models using
libraries like scikit-learn.
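A minimal sketch of SVD-based matrix factorization on a toy rating matrix (the data is hypothetical). It uses the simplest variant, assuming missing ratings are first imputed with each user's mean and the centered matrix is then approximated by a rank-`k` truncated SVD from NumPy. Factorization models such as the `SVD` algorithm in the Surprise library instead fit the factors by stochastic gradient descent on the observed ratings only.

```python
import numpy as np

# Toy user x item rating matrix (hypothetical data); 0 marks "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def svd_predict(R, k=2):
    """Rank-k SVD reconstruction after filling missing entries with user means."""
    mask = R > 0
    # Mean of each user's observed ratings (assumes every user rated something)
    user_means = (R * mask).sum(axis=1) / mask.sum(axis=1)
    # Fill unrated cells with the user mean, then center each row
    filled = np.where(mask, R, user_means[:, None])
    centered = filled - user_means[:, None]
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the k largest singular values and their vectors
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return approx + user_means[:, None]

predictions = svd_predict(R, k=2)
```

Entries of `predictions` where `R` is 0 are the estimated ratings for unrated items; `k` is a hyperparameter to choose on held-out ratings.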
 
**Building Content-Based Filtering Models:**
- Create item profiles using metadata. 
- Use cosine similarity or neural embeddings to identify similar items. 
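In content-based filtering each item is represented by a feature vector built from its metadata, and items are compared with cosine similarity. A minimal NumPy sketch, assuming the profiles have already been built (the item names and feature values below are hypothetical):

```python
import numpy as np

# Hypothetical item profiles: rows = items, columns = metadata features
# (e.g. category flags or TF-IDF weights from titles/descriptions)
item_ids = ["vitamin_d", "vitamin_c", "toothbrush"]
profiles = np.array([
    [1.0, 1.0, 0.0, 0.2],
    [1.0, 0.8, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.9],
])

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty profiles
    unit = X / norms
    return unit @ unit.T

def most_similar(item_idx, X, top_n=2):
    """Indices of the top_n items most similar to item_idx (excluding itself)."""
    sims = cosine_similarity_matrix(X)[item_idx]
    order = np.argsort(-sims)
    return [int(j) for j in order if j != item_idx][:top_n]

sims = cosine_similarity_matrix(profiles)
neighbours = most_similar(0, profiles, top_n=1)
```

Here `neighbours` points to `vitamin_c` as the closest item to `vitamin_d`, since their profiles share most features.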
 
**Combine with Hybrid Techniques:** 
- Experiment with hybrid models (weighted hybrid, feature-augmented 
collaborative filtering, etc.) to combine collaborative and content-based 
methods. 
- Train deep hybrid models if using neural networks, concatenating 
collaborative and content-based embeddings as input. 
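A weighted hybrid can be as simple as a convex combination of the two scores for each candidate item. A sketch under two assumptions: both score lists refer to the same candidates in the same order, and min-max scaling is an acceptable way to put predicted ratings and similarities on a common scale. `alpha` is a hypothetical weight to tune on validation data.

```python
import numpy as np

def minmax(x):
    """Scale a score vector to [0, 1]; constant vectors map to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def weighted_hybrid(cf_scores, cb_scores, alpha=0.7):
    """Blend collaborative (cf) and content-based (cb) scores with weight alpha."""
    return alpha * minmax(cf_scores) + (1 - alpha) * minmax(cb_scores)

cf = [4.5, 3.0, 2.0]  # e.g. predicted ratings from a CF model
cb = [0.1, 0.9, 0.5]  # e.g. cosine similarity to items the user liked
blended = weighted_hybrid(cf, cb, alpha=0.7)
ranking = np.argsort(-blended)  # candidate indices, best first
```

With `alpha=0.7` the collaborative signal dominates and item 0 ranks first; lowering `alpha` shifts the ranking toward the content-based scores.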

**Visualization:**  
- Summarize the results and visualize them with charts wherever possible.

In [2]:
# Data import:

import json

file = "/Users/aminerazig/Desktop/ENSAE 3A/ADVANCED ML/Advanced ML-project/DATA/Health_and_Personal_Care.jsonl"




In [6]:
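# Import the user reviews (one JSON object per line):

import json
file = "/Users/aminerazig/Desktop/ENSAE 3A/ADVANCED ML/Advanced ML-project/DATA/Health_and_Personal_Care.jsonl"

# Use a separate name for the file handle so the path variable is not overwritten
with open(file, 'r') as f:
    data = [json.loads(line) for line in f]

# Keep the first 1000 reviews (each line is one review, not one product)
products_1000_usersdata = data[:1000]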

{'rating': 4.0,
 'title': '12 mg is 12 on the periodic table people! Mg for magnesium',
 'text': 'This review is more to clarify someone else’s review bc they didn’t understand understand the labeling!  It shows 1000mg as advertised & another little label says 12mg bc 12 is on the periodic table for magnesium!  I realize not everyone takes chemistry, but 4 ppl liked his review & so misinformation is spreading.  This works. If however you are on opiate level medications that are causing constipation you should talk to your pain dr or your gastrointestinal dr & ask for a medication called Linzess which works must better & must faster, but is unnecessary for most people.  If magnesium is working for you just make sure to take it with food & drink 6-8 glasses of water per day.  Staying hydrated will really help.  Before switching to Linzess I used to take one 1,000 mg pill am & pm every day with meals & always with an 8 ounce glass of water or other liquid.',
 'images': [],
 'asin': 'B07TD

In [18]:
# Import the product metadata (first 1000 products):

products_1000_metadata = []
file_metadata = "/Users/aminerazig/Desktop/ENSAE 3A/ADVANCED ML/Advanced ML-project/DATA/meta_Health_and_Personal_Care.jsonl"

with open(file_metadata, 'r') as file:
    for i, line in enumerate(file):
        if i >= 1000:  
            break
        products_1000_metadata.append(json.loads(line))

In [22]:
### DISPLAY A FEW PRODUCT IMAGES:
import json
import random
import requests
from PIL import Image
from io import BytesIO



def get_random_products_with_images(products, num_products=90):
    products_with_images = [p for p in products if p.get('images') and len(p['images']) > 0]
    return random.sample(products_with_images, min(num_products, len(products_with_images)))


def fetch_and_resize_image(url, size=(30, 30)):
    try:
        # A timeout keeps one unresponsive URL from blocking the whole mosaic
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        img = Image.open(BytesIO(response.content))
        return img.resize(size)
    except Exception as e:
        print(f"Error while downloading the image: {e}")
        return None

# Mosaic: paste the thumbnails into a grid of 10 columns x 9 rows
def create_mosaic(images, grid_size=(10, 9), image_size=(30, 30)):
    mosaic = Image.new('RGB', (grid_size[0] * image_size[0], grid_size[1] * image_size[1]))
    for idx, img in enumerate(images):
        if img:
            x = (idx % grid_size[0]) * image_size[0]
            y = (idx // grid_size[0]) * image_size[1]
            mosaic.paste(img, (x, y))
    mosaic.show()
    return mosaic


selected_products = get_random_products_with_images(products_1000_metadata)
image_urls = [p['images'][0]['large'] for p in selected_products]

images = [fetch_and_resize_image(url) for url in image_urls]
mosaic = create_mosaic(images)

# Data Fields

## For User Reviews

| Field              | Type   | Explanation                                                                                                                                                     |
|--------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `rating`           | float  | Rating of the product (from 1.0 to 5.0).                                                                                                                        |
| `title`            | str    | Title of the user review.                                                                                                                                       |
| `text`             | str    | Text body of the user review.                                                                                                                                   |
| `images`           | list   | Images that users post after they have received the product. Each image has different sizes (small, medium, large), represented by `small_image_url`, `medium_image_url`, and `large_image_url`. |
| `asin`             | str    | ID of the product.                                                                                                                                              |
| `parent_asin`      | str    | Parent ID of the product. Note: Products with different colors, styles, sizes usually belong to the same parent ID. The “asin” in previous Amazon datasets is actually the parent ID. Please use parent ID to find product meta. |
| `user_id`          | str    | ID of the reviewer.                                                                                                                                             |
| `timestamp`        | int    | Time of the review (unix time).                                                                                                                                 |
| `verified_purchase`| bool   | User purchase verification.                                                                                                                                     |
| `helpful_vote`     | int    | Helpful votes of the review.                                                                                                                                    |
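Following the `parent_asin` note above, reviews should be joined to product metadata on the parent ID rather than on `asin`. A minimal pandas sketch, assuming the metadata records also carry `parent_asin` and `title` fields; the miniature records below are hypothetical, not taken from the dataset:

```python
import pandas as pd

# Hypothetical miniature review records shaped like the fields in the table
reviews = pd.DataFrame([
    {"user_id": "u1", "asin": "A1-red", "parent_asin": "P1", "rating": 5.0},
    {"user_id": "u2", "asin": "A1-blue", "parent_asin": "P1", "rating": 3.0},
    {"user_id": "u1", "asin": "B7", "parent_asin": "P2", "rating": 4.0},
])
# Assumption: each metadata record exposes parent_asin and title
meta = pd.DataFrame([
    {"parent_asin": "P1", "title": "Magnesium 1000 mg"},
    {"parent_asin": "P2", "title": "Electric toothbrush"},
])

# Variants (colors, sizes) share a parent ID, so join on parent_asin;
# validate raises if a parent_asin appears more than once in the metadata
merged = reviews.merge(meta, on="parent_asin", how="left", validate="many_to_one")

# Average rating per parent product, variants pooled together
per_product = merged.groupby(["parent_asin", "title"], as_index=False)["rating"].mean()
```

Pooling variants under one parent ID also densifies the rating matrix, which tends to help collaborative filtering on small samples.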


In [17]:
import EDA_functions
image_url = "https://images-na.ssl-images-amazon.com/images/I/71DFEoJ+Z9L._SL256_.jpg"
EDA_functions.show_image(image_url)

# I - Collaborative filtering 

### a) User-Based Collaborative Filtering:

In [1]:
import pandas as pd

# Build the (user, product, rating) interaction table from the reviews loaded above.
# Rows are collected first and the DataFrame is created once: calling pd.concat
# inside the loop is quadratic and would re-append the accumulated rows each time.
rows = [
    {"user_id": review["user_id"], "product_id": review["asin"], "rating": review["rating"]}
    for review in products_1000_usersdata
]
df_recommendation = pd.DataFrame(rows, columns=["user_id", "product_id", "rating"])

print(df_recommendation)
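A minimal user-based prediction on an interaction table with the same columns as `df_recommendation` (`user_id`, `product_id`, `rating`). This sketch uses plain cosine similarity with unrated items treated as 0, a common simplification; mean-centered (Pearson-style) similarity is the usual refinement. The toy interactions are hypothetical.

```python
import numpy as np
import pandas as pd

def user_based_predict(interactions, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    # User x item matrix; unrated cells are filled with 0
    matrix = interactions.pivot_table(index="user_id", columns="product_id",
                                      values="rating", aggfunc="mean").fillna(0.0)
    R = matrix.to_numpy()
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0  # guard against users with no ratings
    u = matrix.index.get_loc(user)
    i = matrix.columns.get_loc(item)
    # Cosine similarity between the target user and every user
    sims = (R @ R[u]) / (norms * norms[u])
    # Candidate neighbours: other users who actually rated the item
    candidates = [v for v in range(len(R)) if v != u and R[v, i] > 0]
    neighbours = sorted(candidates, key=lambda v: sims[v], reverse=True)[:k]
    if not neighbours:
        return float("nan")
    weights = sims[neighbours]
    if weights.sum() <= 0:
        return float("nan")
    # Similarity-weighted average of the neighbours' ratings for the item
    return float(weights @ R[neighbours, i] / weights.sum())

# Hypothetical toy interactions
toy = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    "product_id": ["p1", "p2", "p1", "p2", "p3", "p1", "p3"],
    "rating":     [5.0, 4.0, 5.0, 4.0, 2.0, 1.0, 5.0],
})
pred = user_based_predict(toy, "u1", "p3", k=2)
```

Here `u2` rated the same products as `u1` almost identically, so the prediction for `p3` lands close to `u2`'s rating of 2. With only 1000 reviews the real matrix is likely to be very sparse, so many (user, product) pairs may return NaN.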

### b) Item-Based Collaborative Filtering:
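A minimal item-based sketch on an interaction table with columns `user_id`, `product_id`, `rating`: products are compared column-wise with cosine similarity (missing ratings treated as 0), and the prediction averages the target user's own ratings on the `k` most similar products they rated. The toy interactions are hypothetical.

```python
import numpy as np
import pandas as pd

def item_based_predict(interactions, user, item, k=2):
    """Predict `user`'s rating of `item` from the user's ratings of similar items."""
    # User x item matrix; unrated cells are filled with 0
    matrix = interactions.pivot_table(index="user_id", columns="product_id",
                                      values="rating", aggfunc="mean").fillna(0.0)
    R = matrix.to_numpy()
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # guard against items with no ratings
    u = matrix.index.get_loc(user)
    i = matrix.columns.get_loc(item)
    # Cosine similarity between the target item's column and every item column
    sims = (R.T @ R[:, i]) / (norms * norms[i])
    # Candidate neighbours: other items this user has rated
    rated = [j for j in range(R.shape[1]) if j != i and R[u, j] > 0]
    neighbours = sorted(rated, key=lambda j: sims[j], reverse=True)[:k]
    if not neighbours:
        return float("nan")
    weights = sims[neighbours]
    if weights.sum() <= 0:
        return float("nan")
    # Similarity-weighted average of the user's own ratings
    return float(weights @ R[u, neighbours] / weights.sum())

# Hypothetical toy interactions
toy = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    "product_id": ["p1", "p2", "p1", "p2", "p3", "p1", "p3"],
    "rating":     [5.0, 4.0, 5.0, 4.0, 2.0, 1.0, 5.0],
})
pred_item = item_based_predict(toy, "u1", "p3", k=2)
```

Item-item similarities depend only on the rating matrix, so in practice they can be precomputed once and reused across users.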

# II - Content-based filtering

# III - Hybrid Recommender Systems 