<a href="https://colab.research.google.com/github/Sbilalahmad/AI_ML_mentorship/blob/main/notebooks/NLP/NP_1_Amazone_recomendation_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### NP-1
# Amazon Recommendation System

### Project Overview: Amazon Recommendation System (NLP-based)

This project aims to develop a recommendation system for Amazon using Natural Language Processing (NLP) techniques. The goal is to leverage customer reviews, product descriptions, and other textual data to provide personalized and relevant product recommendations.

#### Key Objectives:

*   **Data Acquisition & Preprocessing:** Collect and clean Amazon product and review data.
*   **Feature Engineering:** Extract meaningful features from text data using NLP methods (e.g., TF-IDF, Word Embeddings).
*   **Model Development:** Implement and evaluate various recommendation algorithms (e.g., collaborative filtering, content-based filtering, hybrid models).
*   **Performance Evaluation:** Assess the effectiveness of the recommendation system using appropriate metrics.
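The objectives leave "appropriate metrics" open. One common choice for top-k recommendation is precision@k, the share of the first k recommended items that are relevant. The sketch below assumes that relevance labels exist, which the notebook does not define. Any labeling, such as treating products in the query item's `Category` as relevant, is a hypothetical proxy. The function name `precision_at_k` and the example ids are also illustrative.

```python
from collections.abc import Hashable, Sequence, Set


def precision_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the first k recommended ids that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k (not len(top_k)) so that lists shorter than k are penalised
    return hits / k


# Hypothetical product ids returned by a recommender, and a hypothetical relevance set
print(precision_at_k([5, 8, 1, 6], {1, 8}, k=3))  # 2 of the top 3 are relevant
```

Dividing by k rather than by the number of returned items is a design choice. It penalises a recommender that returns fewer than k results.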

#### Technologies & Libraries:

*   **Programming Language:** Python
*   **Data Manipulation:** Pandas, NumPy
*   **NLP:** NLTK, spaCy, scikit-learn
*   **Machine Learning:** scikit-learn, TensorFlow/PyTorch (for deep learning models, if applicable)
*   **Visualization:** Matplotlib, Seaborn

Let's start by importing the necessary libraries!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

1. **Data Acquisition & Preprocessing:** Collect and clean Amazon product and review data.

In [4]:
df=pd.read_csv('/content/amazon_product.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668 entries, 0 to 667
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           668 non-null    int64 
 1   Title        668 non-null    object
 2   Description  668 non-null    object
 3   Category     668 non-null    object
dtypes: int64(1), object(3)
memory usage: 21.0+ KB


In [6]:
df.head()

|   | id | Title | Description | Category |
|---|----|-------|-------------|----------|
| 0 | 1 | Swissmar Capstore Select Storage Rack for 18-... | Swissmar's capstore select 18 storage unit kee... | Home & Kitchen Kitchen & Dining Kitchen Utens... |
| 1 | 2 | Gemini200 Delta CV-880 Gold Crown Livery Airc... | Welcome to the exciting world of GeminiJets! O... | Toys & Games Hobbies Models & Model Kits Pre-... |
| 2 | 5 | Superior Threads 10501-2172 Magnifico Cream P... | For quilting and embroidery, this product is m... | Arts, Crafts & Sewing Sewing Thread & Floss S... |
| 3 | 6 | Fashion Angels Color Rox Hair Chox Kit | Experiment with the haute trend of hair chalki... | Beauty & Personal Care Hair Care Hair Colorin... |
| 4 | 8 | Union Creative Giant Killing Figure 05: Daisu... | From Union Creative. Turn your display shelf i... | Toys & Games › Action Figures & Statues › Sta... |


In [14]:
print("Number of duplicated records \n", df.duplicated().sum())
print("Null values per column \n", df.isnull().sum())

Number of duplicated records 
 0
Null values per column 
 id             0
Title          0
Description    0
Category       0
dtype: int64


In [10]:
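# drop() returns a new DataFrame and leaves df unchanged, so the later df.head() still shows 'id'.
# Use df = df.drop(columns='id') to remove the column permanently.
df.drop('id', axis=1)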

|     | Title | Description | Category |
|-----|-------|-------------|----------|
| 0 | Swissmar Capstore Select Storage Rack for 18-... | Swissmar's capstore select 18 storage unit kee... | Home & Kitchen Kitchen & Dining Kitchen Utens... |
| 1 | Gemini200 Delta CV-880 Gold Crown Livery Airc... | Welcome to the exciting world of GeminiJets! O... | Toys & Games Hobbies Models & Model Kits Pre-... |
| 2 | Superior Threads 10501-2172 Magnifico Cream P... | For quilting and embroidery, this product is m... | Arts, Crafts & Sewing Sewing Thread & Floss S... |
| 3 | Fashion Angels Color Rox Hair Chox Kit | Experiment with the haute trend of hair chalki... | Beauty & Personal Care Hair Care Hair Colorin... |
| 4 | Union Creative Giant Killing Figure 05: Daisu... | From Union Creative. Turn your display shelf i... | Toys & Games › Action Figures & Statues › Sta... |
| ... | ... | ... | ... |
| 663 | Rosemery (Rosemary) - Box of Six 20 Stick Hex... | Six tubes, each containing 20 sticks of incens... | Home & Kitchen Home Décor Home Fragrance Ince... |
| 664 | InterDesign Linus Stacking Organizer Bin, Ext... | The InterDesign Linus Organizer Bins are stack... | Home & Kitchen Kitchen & Dining Storage & Org... |
| 665 | Gourmet Rubber Stamps Diagonal Stripes Stenci... | Gourmet Rubber Stamps-Stencil. This delicious ... | Toys & Games Arts & Crafts Printing & Stamping |
| 666 | Spenco RX Arch Cushion Full Length Comfort Su... | Soft, durable arch support. consumers with gen... | Health & Household › Health Care › Foot Healt... |


In [11]:
df.duplicated().sum()

np.int64(0)

2.   **Feature Engineering:** Extract meaningful features from text data using NLP methods (e.g., TF-IDF, Word Embeddings).


In [42]:
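import nltk
from nltk.stem.snowball import SnowballStemmer

# The second positional argument of nltk.download() is download_dir, so
# nltk.download('punkt', 'punkt_tab') saves 'punkt' into a folder named punkt_tab.
# Download each tokenizer resource separately instead
# (recent NLTK releases use 'punkt_tab' for word_tokenize).
nltk.download('punkt')
nltk.download('punkt_tab')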

In [45]:
stemmer = SnowballStemmer("english")

def tokenizer_and_stemmer(text):
    """Tokenize text with NLTK and return the (lowercased) Snowball stems joined by spaces."""
    tokens = nltk.word_tokenize(text)
    stemmed_tokens = [stemmer.stem(token) for token in tokens]
    return ' '.join(stemmed_tokens)

In [50]:
df['Prcessed_data']=df.apply(lambda row: tokenizer_and_stemmer(row['Title']+" "+row['Description']), axis=1)

In [51]:
df.head()

|   | id | Title | Description | Category | Prcessed_data |
|---|----|-------|-------------|----------|---------------|
| 0 | 1 | Swissmar Capstore Select Storage Rack for 18-... | Swissmar's capstore select 18 storage unit kee... | Home & Kitchen Kitchen & Dining Kitchen Utens... | swissmar capstor select storag rack for 18-pac... |
| 1 | 2 | Gemini200 Delta CV-880 Gold Crown Livery Airc... | Welcome to the exciting world of GeminiJets! O... | Toys & Games Hobbies Models & Model Kits Pre-... | gemini200 delta cv-880 gold crown liveri aircr... |
| 2 | 5 | Superior Threads 10501-2172 Magnifico Cream P... | For quilting and embroidery, this product is m... | Arts, Crafts & Sewing Sewing Thread & Floss S... | superior thread 10501-2172 magnifico cream puf... |
| 3 | 6 | Fashion Angels Color Rox Hair Chox Kit | Experiment with the haute trend of hair chalki... | Beauty & Personal Care Hair Care Hair Colorin... | fashion angel color rox hair chox kit experi w... |
| 4 | 8 | Union Creative Giant Killing Figure 05: Daisu... | From Union Creative. Turn your display shelf i... | Toys & Games › Action Figures & Statues › Sta... | union creativ giant kill figur 05 : daisuk tsu... |


In [52]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [54]:
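# The vectorizer is re-fitted on every (first, second) pair, so the IDF weights
# come from those two texts only and are recomputed on each call.
tfidf = TfidfVectorizer(stop_words='english')

def cosine_similer(first, second):
    """Cosine similarity between the TF-IDF vectors of two texts."""
    matrix = tfidf.fit_transform([first, second])
    return cosine_similarity(matrix)[0][1]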
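Because `cosine_similer` fits `TfidfVectorizer` on just two texts, the smoothed IDF can take only two values. A term found in both texts gets ln(3/3) + 1 = 1, and a term found in one gets ln(3/2) + 1 ≈ 1.405. Words the product shares with the query are therefore weighted *less* than words it does not share. The sketch below checks this on two hypothetical, already-stemmed strings. It also checks that, because TF-IDF rows are L2-normalised, `cosine_similarity` on such a matrix equals a plain dot product.

```python
import math

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stemmed texts standing in for one product row and a query
first = "stainless steel storag rack for coffe capsul"
second = "coffe capsul storag"

# Same pattern as cosine_similer: a vectorizer fitted on just this pair
pair_tfidf = TfidfVectorizer(stop_words="english")
matrix = pair_tfidf.fit_transform([first, second])

# Only two distinct idf values are possible with a two-document corpus
idf_values = sorted(set(np.round(pair_tfidf.idf_, 6)))
print(idf_values)

# Rows are L2-normalised, so cosine similarity reduces to a dot product
dense = matrix.toarray()
score_sklearn = cosine_similarity(matrix)[0][1]
score_dot = float(dense[0] @ dense[1])
print(math.isclose(score_sklearn, score_dot))  # True
```

A second consequence is cost. A 668-row catalogue means 668 vectorizer fits per query.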

In [67]:
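def recommendations(query):
    """Return the 10 products whose processed text is most similar to the query."""
    # Apply the same tokenization and stemming used for the product texts
    query = tokenizer_and_stemmer(query)
    # Score every product against the query (one TF-IDF fit per row);
    # this adds or overwrites a 'similarity' column on the global df
    df['similarity'] = df.apply(lambda row: cosine_similer(row['Prcessed_data'], query), axis=1)
    res = df.sort_values('similarity', ascending=False).head(10)
    return res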
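A common alternative is to fit one `TfidfVectorizer` over all product texts, then only `transform` each query. This gives IDF weights that reflect the whole catalogue, and scoring becomes a single sparse matrix product. This is a swapped-in technique, not the notebook's method. The three-row `catalogue` and its pre-stemmed `text` column are hypothetical stand-ins for `df` and `df['Prcessed_data']`. In the notebook, the query would first go through `tokenizer_and_stemmer`. The function name `recommend` is also illustrative.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical mini-catalogue with already-stemmed text (stand-in for df['Prcessed_data'])
catalogue = pd.DataFrame({
    "Title": [
        "Storage rack for coffee capsules",
        "Gold livery model aircraft",
        "Cream thread for quilting",
    ],
    "text": [
        "storag rack for coffe capsul",
        "gold liveri model aircraft",
        "cream thread for quilt",
    ],
})

# Fit the vocabulary and IDF weights once, over the whole catalogue
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(catalogue["text"])


def recommend(query_text, k=10):
    """Return the k catalogue rows most similar to an already-preprocessed query."""
    # transform (not fit_transform): reuse the catalogue's vocabulary and IDF
    query_vec = vectorizer.transform([query_text])
    # TF-IDF rows are L2-normalised, so the linear kernel equals cosine similarity
    scores = linear_kernel(query_vec, doc_matrix).ravel()
    # assign() returns a copy, so the catalogue itself is not modified
    result = catalogue.assign(similarity=scores)
    return result.sort_values("similarity", ascending=False).head(k)


top = recommend("coffe capsul", k=2)
print(top[["Title", "similarity"]])
```

One trade-off is that query words absent from the fitted vocabulary are ignored by `transform`. The per-pair fit in `cosine_similer` always includes them.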