# Project Name: E-Commerce Product Categorization

## Project Overview
The objective of this project is to develop a machine learning model that categorizes product descriptions into predefined categories. This can help streamline inventory management, improve search functionality, and enhance the overall shopping experience on e-commerce platforms.

## Notebook Contents
### Data Loading and Exploration

* Load the dataset.
* Explore the dataset to understand its structure and contents.


### Data Preprocessing

* Clean and preprocess the text data for modeling.
* Map product categories to numerical labels.

### Train-Test Split

* Split the dataset into training and testing sets to evaluate the model's performance.
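
As a minimal sketch of such a split, the example below calls scikit-learn's `train_test_split` on toy data. The toy texts, the variable names, and the `random_state` value are assumptions made for illustration; passing `stratify` asks for the same class proportions in the training and testing subsets.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data: 4 classes with 10 samples each (illustrative, not the real dataset).
labels = [c for c in range(4) for _ in range(10)]
texts = [f"product {i} of class {c}" for i, c in enumerate(labels)]

X_train, X_test, y_train, y_test = train_test_split(
    texts,
    labels,
    test_size=0.2,     # hold out 20% of the samples for testing
    stratify=labels,   # preserve the class balance in both subsets
    random_state=42,   # assumed seed, only to make the sketch reproducible
)

print(len(X_train), len(X_test))
print(sorted(Counter(y_test).items()))
```

With balanced classes, each class contributes the same number of samples to the test set, so per-class metrics are computed over equally sized groups.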

### Feature Extraction

* Use TF-IDF (Term Frequency-Inverse Document Frequency) to convert text data into numerical features.
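
To make the TF-IDF idea concrete, here is a small pure-Python sketch that weights each term by its count in a document times a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, and then L2-normalizes each document vector. This is the smoothed variant that scikit-learn documents for `TfidfVectorizer` defaults, but the whitespace tokenizer, the function name `tfidf`, and the toy documents are simplifying assumptions, so the numbers need not match the notebook's vectorizer exactly.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors): one L2-normalized TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        weights = [term_freq[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        vectors.append([w / norm for w in weights])
    return vocab, vectors


docs = ["wooden chair", "office chair", "silk saree"]  # toy documents
vocab, vectors = tfidf(docs)
print(vocab)
print([round(w, 3) for w in vectors[0]])
```

In the first toy document, "chair" appears in two of the three documents and therefore receives a lower weight than "wooden", which appears in only one.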
### Model Training and Evaluation

* Train a Random Forest classifier using the training data.
* Evaluate the model's performance on the test data.
* Print classification metrics such as precision, recall, and F1-score.

## 1. Data Loading and Exploration




In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('Ecommerce_data.csv')

In [16]:
df.head()

| Unnamed: 0 | Text | label | label_num | preprocessed_text |
|---|---|---|---|---|
| 0 | Urban Ladder Eisner Low Back Study-Office Comp... | Household | 0 | Urban Ladder Eisner low Study Office Computer ... |
| 1 | Contrast living Wooden Decorative Box,Painted ... | Household | 0 | contrast live Wooden Decorative Box Painted Bo... |
| 2 | IO Crest SY-PCI40010 PCI RAID Host Controller ... | Electronics | 1 | IO Crest SY PCI40010 PCI raid Host Controller ... |
| 3 | ISAKAA Baby Socks from Just Born to 8 Years- P... | Clothing & Accessories | 2 | ISAKAA Baby Socks bear 8 Years- Pack 4 6 8 12 ... |
| 4 | Indira Designer Women's Art Mysore Silk Saree ... | Clothing & Accessories | 2 | Indira Designer Women Art Mysore Silk Saree Bl... |
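
Beyond `df.head()`, a few quick checks can confirm the shape of the data and reveal missing or duplicated rows before modeling. The sketch below runs them on a tiny stand-in DataFrame; its rows and the name `sample` are made up for illustration and are not taken from `Ecommerce_data.csv`. The same calls could be applied to `df`.

```python
import pandas as pd

# Tiny stand-in for the real dataset; these rows are illustrative only.
sample = pd.DataFrame({
    "Text": ["wooden study chair", "pci raid controller", None, "wooden study chair"],
    "label": ["Household", "Electronics", "Books", "Household"],
})

n_rows, n_cols = sample.shape                  # number of rows and columns
missing_per_column = sample.isna().sum()       # missing values in each column
n_duplicates = int(sample.duplicated().sum())  # rows that repeat an earlier row

print(n_rows, n_cols)
print(missing_per_column)
print(n_duplicates)
```

Missing text values are worth catching early, since a text-processing function applied to a non-string value would likely fail.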


## 2. Data Preprocessing

In [4]:
df.label.value_counts()

label
Household                 6000
Electronics               6000
Clothing & Accessories    6000
Books                     6000
Name: count, dtype: int64

In [5]:
# Encode each category name as an integer label
df['label_num'] = df.label.map({
    'Household': 0,
    'Electronics': 1,
    'Clothing & Accessories': 2,
    'Books': 3
})
df.head()

| Unnamed: 0 | Text | label | label_num |
|---|---|---|---|
| 0 | Urban Ladder Eisner Low Back Study-Office Comp... | Household | 0 |
| 1 | Contrast living Wooden Decorative Box,Painted ... | Household | 0 |
| 2 | IO Crest SY-PCI40010 PCI RAID Host Controller ... | Electronics | 1 |
| 3 | ISAKAA Baby Socks from Just Born to 8 Years- P... | Clothing & Accessories | 2 |
| 4 | Indira Designer Women's Art Mysore Silk Saree ... | Clothing & Accessories | 2 |
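
Because the model is trained on the numeric labels, its predictions are numbers too. A small inverse lookup, sketched below, turns them back into category names. The dictionary repeats the mapping from the cell above; the names `label_to_num`, `num_to_label`, and `decode` are hypothetical helpers introduced here for illustration.

```python
# Same category-to-number mapping as in the notebook cell above.
label_to_num = {
    'Household': 0,
    'Electronics': 1,
    'Clothing & Accessories': 2,
    'Books': 3,
}

# Invert the mapping so numeric predictions can be reported by name.
num_to_label = {num: label for label, num in label_to_num.items()}


def decode(predictions):
    """Translate an iterable of numeric labels into category names."""
    return [num_to_label[int(p)] for p in predictions]


print(decode([0, 3, 2]))
```

Keeping the mapping in a single dictionary means the encoding and decoding steps cannot drift apart.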


In [6]:
import spacy
nlp = spacy.load('en_core_web_sm')

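def preprocess(text):
    """Remove stop words and punctuation, then lemmatize the remaining tokens."""
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        # Skip stop words and punctuation
        if token.is_stop or token.is_punct:
            continue
        # Keep the lemma (base form) of the token
        filtered_tokens.append(token.lemma_)

    return " ".join(filtered_tokens)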

In [7]:
df['preprocessed_text'] = df.Text.apply(preprocess)

In [8]:
df.head()

| Unnamed: 0 | Text | label | label_num | preprocessed_text |
|---|---|---|---|---|
| 0 | Urban Ladder Eisner Low Back Study-Office Comp... | Household | 0 | Urban Ladder Eisner low Study Office Computer ... |
| 1 | Contrast living Wooden Decorative Box,Painted ... | Household | 0 | contrast live Wooden Decorative Box Painted Bo... |
| 2 | IO Crest SY-PCI40010 PCI RAID Host Controller ... | Electronics | 1 | IO Crest SY PCI40010 PCI raid Host Controller ... |
| 3 | ISAKAA Baby Socks from Just Born to 8 Years- P... | Clothing & Accessories | 2 | ISAKAA Baby Socks bear 8 Years- Pack 4 6 8 12 ... |
| 4 | Indira Designer Women's Art Mysore Silk Saree ... | Clothing & Accessories | 2 | Indira Designer Women Art Mysore Silk Saree Bl... |


In [9]:
df.Text[0]

'Urban Ladder Eisner Low Back Study-Office Computer Chair(Black) A study in simple. The Eisner study chair has a firm foam cushion, which makes long hours at your desk comfortable. The flexible meshed back is designed for air-circulation and support when you lean back. The curved arms provide ergonomic forearm support. Adjust the height using the gas lift to find that comfortable position and the nylon castors make it easy to move around your space. Chrome legs refer to the images for dimension details any assembly required will be done by the UL team at the time of delivery indoor use only.'

In [10]:
df.preprocessed_text[0]

'Urban Ladder Eisner low Study Office Computer Chair(Black study simple Eisner study chair firm foam cushion make long hour desk comfortable flexible mesh design air circulation support lean curved arm provide ergonomic forearm support adjust height gas lift find comfortable position nylon castor easy space chrome leg refer image dimension detail assembly require UL team time delivery indoor use'

## 3. Train-Test Split

In [11]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    df.preprocessed_text,
    df.label_num,
    test_size=0.2,          # hold out 20% of the rows for testing
    stratify=df.label_num   # keep the four classes balanced in both subsets
)

## 4. Model Training and Evaluation

In [12]:
from sklearn.feature_extraction.text import TfidfVectorizer

TF = TfidfVectorizer()

In [15]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

# Chain TF-IDF vectorization and the classifier so both are fit on the training data only
clf = Pipeline([
    ('vectorizer_tfidf', TF),
    ('Random Forest', RandomForestClassifier())
])

clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      0.97      0.96      1200
           1       0.98      0.97      0.98      1200
           2       0.98      0.99      0.98      1200
           3       0.98      0.97      0.97      1200

    accuracy                           0.97      4800
   macro avg       0.97      0.97      0.97      4800
weighted avg       0.97      0.97      0.97      4800
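
In the report above, rows 0 to 3 correspond to the `label_num` values (Household, Electronics, Clothing & Accessories, Books), and each support of 1200 is the 20% test share of the 6000 samples per class. To show how the per-class columns are derived, here is a minimal pure-Python sketch that computes precision, recall, and F1 for one class from true and predicted labels. The function name `per_class_metrics` and the toy labels are assumptions for illustration and are unrelated to the reported results.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correct hits
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # misses

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (illustrative only).
y_true = [0, 0, 0, 1, 1, 2, 2, 3]
y_pred = [0, 0, 1, 1, 1, 2, 0, 3]

for cls in sorted(set(y_true)):
    precision, recall, f1 = per_class_metrics(y_true, y_pred, positive=cls)
    print(cls, round(precision, 2), round(recall, 2), round(f1, 2))
```

Precision answers "of the items predicted as this class, how many were right?", while recall answers "of the items truly in this class, how many were found?"; F1 is their harmonic mean.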

