# 🧠 <b>Product Category Classifier using Machine Learning</b>
### 📋 <b>Project Overview</b>

This project trains a <b>machine learning model</b> that automatically predicts the <b>category of a product</b> from its <b>title</b>.

We load a dataset of products, clean it, turn the titles into numbers with <b>TF-IDF</b>, train a <b>Logistic Regression</b> model, and finally let the user enter their own product names to get instant predictions.


### 🧩 Step 1: <b>Importing the Required Libraries</b>

In [None]:
import pickle as pkl

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

#### 💡 <b>Explanation</b>

In this step, we import all the tools we’ll need:

1. <b>pandas</b> → for working with tables and CSV files

2. <b>TfidfVectorizer</b> → converts text into numeric form

3. <b>LogisticRegression</b> → our machine learning model

4. <b>pickle</b> → used to save and load our trained model later

📝 <i>Think of this step as setting up your toolbox before starting the project.</i>

### 🧹 Step 2: <b>Loading and Cleaning the Data</b>

In [16]:
# Loading data
df = pd.read_csv('products.csv')

# Dropping missing values
df.dropna(axis=0, how='any', inplace=True)

# Standardizing data
df["Category_Label"] = df['Category_Label'].str.lower()

# Checking the unique amount of categories
print("Number of unique categories:", df["Category_Label"].nunique())

# Sampling the dataframe
print("\n",df.sample(5))

Number of unique categories: 13

        product_ID                                      Product_Title  \
16661       26429  bosch smi50c15gb semi integrated dishwasher si...   
7043        13120   television samsung ue65nu7172uxxh ue65nu7172uxxh   
5003        10989  grade a1 samsung qe55q8fn 55 4k ultra hd hdr q...   
21856       32220  belling fw1016 10kg 1600rpm freestanding washi...   
2972         2982              wiko view go dual sim 16gb anthracite   

       Merchant_ID    Category_Label Product_Code  Number_of_Views  \
16661          130       dishwashers   NH-1274-IJ            293.0   
7043            44               tvs   GY-6099-PV           2113.0   
5003             6               tvs   FV-0282-OA           3792.0   
21856            6  washing machines   DW-2760-AR            215.0   
2972             2     mobile phones   NR-0616-CK           4154.0   

       Merchant_Rating  Listing_Date    
16661              1.6       1/15/2024  
7043               1.6        

#### 💡 <b>Explanation</b>

Here we:

1. <b>Load</b> our dataset from a CSV file.

2. <b>Remove every row that has a missing value</b> in any column (this helps keep the data clean).

3. <b>Make all category names lowercase</b> so similar labels (like “Shoes” vs “shoes”) are treated the same.

4. <b>Check how many categories</b> we have and print a few sample rows to understand what the data looks like.

📝 <i>Clean data means better learning, just like a clean desk helps you work better.</i>
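To see what the cleaning steps above do, here is a minimal sketch on a tiny made-up table. The rows are hypothetical stand-ins, not taken from `products.csv`; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for products.csv (not the real dataset)
toy = pd.DataFrame({
    "Product_Title": [
        "apple iphone 8 64gb",
        "samsung 55 4k tv",
        None,                      # a row with a missing title
        "lg 43 smart tv",
    ],
    "Category_Label": ["Mobile Phones", "TVs", "Dishwashers", "tvs"],
})

# Drop every row that has at least one missing value
cleaned = toy.dropna(axis=0, how="any").copy()

# Count labels before and after lowercasing to see the effect
before = cleaned["Category_Label"].nunique()
cleaned["Category_Label"] = cleaned["Category_Label"].str.lower()
after = cleaned["Category_Label"].nunique()

print(f"rows: {len(toy)} -> {len(cleaned)}")
print(f"unique categories: {before} -> {after}")
```

The row with the missing title is dropped, and “TVs” and “tvs” collapse into a single category once the labels are lowercased.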

### ⚙️ Step 3: <b>Turning Text into Numbers and Training the Model</b>

In [20]:
# Separating the data into products and categories
X = df["Product_Title"]
Y = df["Category_Label"]

# Vectorizing data
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(X)

# Initiating the model
model = LogisticRegression(max_iter=1000)


# Training the model
model.fit(X_tfidf, Y)

| Parameter | Value |
|---|---|
| penalty | 'l2' |
| dual | False |
| tol | 0.0001 |
| C | 1.0 |
| fit_intercept | True |
| intercept_scaling | 1 |
| class_weight | None |
| random_state | None |
| solver | 'lbfgs' |
| max_iter | 1000 |


#### 💡 <b>Explanation</b>

1. X = product titles (our input text)<br>Y = product categories (our labels)

2. <b>TF-IDF Vectorizer</b> turns each product title into a vector of numbers, one per word in the vocabulary. A word gets a higher weight when it appears in that title but is rare across all titles.

3. <b>Logistic Regression</b> is our classifier: it learns the connection between words and product categories.

4. We <b>train</b> the model using the cleaned data.

📝 <i>At this point, the model has “learned” how to tell categories apart based on the words in product titles.</i>

### 💾 Step 4: <b>Saving and Loading the Model</b>

In [None]:
# Dumping the model into a pickle file to open in another script

with open("MLModel.pkl", "wb") as file:
    pkl.dump(model, file)

# Loading the model

with open("MLModel.pkl", "rb") as file:
    model = pkl.load(file)


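#### 💡 <b>Explanation</b>

We <b>save</b> the trained model so we can reuse it later without training again.
Then we <b>reload</b> it from the saved file to check that saving worked correctly.
Note that only the classifier is pickled here. A separate script would also need the fitted `vectorizer`, because the model only accepts the TF-IDF vectors it produces.

📝 <i>Saving the model is like saving a video game: you can continue where you left off without replaying everything.</i>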
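One possible way to keep the vectorizer and the classifier together is to wrap them in a scikit-learn `Pipeline` and pickle that single object. This is a sketch of an alternative, not what the notebook above does. The toy data and the step names `tfidf` and `clf` are hypothetical, and the example pickles to bytes in memory instead of writing `MLModel.pkl`.

```python
import pickle as pkl

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy titles and labels, not taken from products.csv
titles = [
    "samsung galaxy dual sim 64gb",
    "apple iphone 8 64gb",
    "bosch integrated dishwasher",
    "beko freestanding dishwasher",
]
labels = ["mobile phones", "mobile phones", "dishwashers", "dishwashers"]

# The pipeline vectorizes raw text and then classifies it in one call
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(titles, labels)

# Serialize to bytes; with a file, pkl.dump and pkl.load work the same way
blob = pkl.dumps(pipeline)
restored = pkl.loads(blob)

# The restored object accepts raw strings, with no separate vectorizer needed
queries = ["apple iphone 64gb", "bosch dishwasher"]
print(restored.predict(queries))
```

As with any pickle file, only load files you created yourself, since unpickling can run arbitrary code.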

### 💬 Step 5: <b>Testing the Model with User Input</b>

In [22]:
# Loop for user input and printing the model prediction

while True:
    user_input = input("Enter a product: ")

    # Check for the exit command first so "exit" itself is never classified
    if user_input.strip().lower() == "exit":
        print("Exiting category classifier.")
        break

    # Vectorize with the fitted vectorizer, then predict the category
    user_tfidf = vectorizer.transform([user_input])
    prediction = model.predict(user_tfidf)[0]

    print(f"\nPredicted Category: {prediction}")


Predicted Category: mobile phones

Predicted Category: mobile phones

Predicted Category: mobile phones

Predicted Category: mobile phones

Predicted Category: mobile phones

Predicted Category: mobile phones
Exiting category classifier.


#### 💡 <b>Explanation</b>

Here’s where the fun begins! 🎉
We let the user <b>type a product name</b>, such as “wiko view go dual sim 16gb”, and the model instantly predicts one of the categories it was trained on, such as <b>mobile phones</b> or <b>tvs</b>.

Type `"exit"` to stop the program.

📝 <i>This is how you can turn your machine learning model into a real interactive tool!</i>
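If you want to try the same logic without typing at a prompt, the loop can be split into small functions that take a list of inputs. The sketch below uses hypothetical helper names (`classify`, `run_session`) and a toy model trained on made-up titles, so its predictions are only illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy titles and labels, not taken from products.csv
titles = [
    "samsung galaxy dual sim 64gb",
    "apple iphone 8 64gb",
    "bosch integrated dishwasher",
    "beko freestanding dishwasher",
]
labels = ["mobile phones", "mobile phones", "dishwashers", "dishwashers"]

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(titles), labels)


def classify(title):
    """Return the predicted category for a single product title."""
    return model.predict(vectorizer.transform([title]))[0]


def run_session(inputs):
    """Classify inputs in order, stopping at 'exit' without classifying it."""
    results = []
    for text in inputs:
        if text.strip().lower() == "exit":
            break
        results.append(classify(text))
    return results


# Anything after "exit" is ignored, just like in the interactive loop
session = run_session(["apple iphone 64gb", "exit", "bosch dishwasher"])
print(session)
```

Keeping the prediction in its own function also makes it easy to reuse in a script or a web form later.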

### 🧠 <b>Summary</b>

✅ Loaded and cleaned product data<br>
✅ Turned text into numbers using <b>TF-IDF</b><br>
✅ Trained a <b>Logistic Regression</b> model<br>
✅ Saved the model for future use<br>
✅ Built a simple interactive prediction tool