#### Text Classification on an E-commerce Dataset Using fastText

In [16]:
#Reading and storing Data Into DataFrame
import pandas as pd
df = pd.read_csv("ecommerce_dataset.csv" , names=["category" , "description"] , header=None)
df.head(3)

,category,description
0,Household,Paper Plane Design Framed Wall Hanging Motivat...
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...


In [17]:
df.category.value_counts() #Imbalance Check

category
Household                 19313
Books                     11820
Electronics               10621
Clothing & Accessories     8671
Name: count, dtype: int64

In [18]:
#Drop NA Values:
df.dropna(inplace=True)
df.shape

(50424, 2)

In [19]:
#Replacing " & " with an underscore so the category becomes a single whitespace-free token:
df['category'] = df['category'].replace("Clothing & Accessories", "Clothing_Accessories")
df.head()

,category,description
0,Household,Paper Plane Design Framed Wall Hanging Motivat...
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...


In [20]:
df.category.value_counts()

category
Household               19313
Books                   11820
Electronics             10621
Clothing_Accessories     8670
Name: count, dtype: int64
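
The counts above can be turned into class shares to make the imbalance easier to judge. The sketch below hard-codes the four counts printed by `value_counts()` instead of reading the DataFrame, and the variable names are illustrative.

```python
# Counts copied from the value_counts() output above (after dropna).
counts = {
    "Household": 19313,
    "Books": 11820,
    "Electronics": 10621,
    "Clothing_Accessories": 8670,
}

total = sum(counts.values())
# Share of the dataset that each category represents.
shares = {label: n / total for label, n in counts.items()}
# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in shares.items():
    print(f"{label:<22}{share:.1%}")
print(f"largest/smallest: {imbalance_ratio:.2f}")
```

The largest class is a little over twice the size of the smallest, so the imbalance is moderate.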

In [23]:
#FastText Format => '__label__<category> <description>'
#Run this cell once only; re-running it would prepend the prefix a second time.
df['category'] = "__label__" + df['category'].astype(str)

In [31]:
#Merge category and description into the single-line format fastText expects:
df['category_description'] = df['category'] + " " + df['description']
df.head(3)

,category,description,category_description
0,__label__Household,Paper Plane Design Framed Wall Hanging Motivat...,__label__Household Paper Plane Design Framed W...
1,__label__Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...",__label__Household SAF 'Floral' Framed Paintin...
2,__label__Household,SAF 'UV Textured Modern Art Print Framed' Pain...,__label__Household SAF 'UV Textured Modern Art...


In [32]:
df['category_description'][0]

'__label__Household Paper Plane Design Framed Wall Hanging Motivational Office Decor Art Prints (8.7 X 8.7 inch) - Set of 4 Painting made up in synthetic frame with uv textured print which gives multi effects and attracts towards it. This is an special series of paintings which makes your wall very beautiful and gives a royal touch. This painting is ready to hang, you would be proud to possess this unique painting that is a niche apart. We use only the most modern and efficient printing technology on our prints, with only the and inks and precision epson, roland and hp printers. This innovative hd printing technique results in durable and spectacular looking prints of the highest that last a lifetime. We print solely with top-notch 100% inks, to achieve brilliant and true colours. Due to their high level of uv resistance, our prints retain their beautiful colours for many years. Add colour and style to your living space with this digitally printed painting. Some are for pleasure and so

In [28]:
import re
#Text cleaning in one function:
def preprocess(text):
    #Replace punctuation and special characters with a space (apostrophes and underscores are kept):
    text = re.sub(r"[^\w\s']", " ", text)
    #Collapse runs of spaces and newlines into a single space:
    text = re.sub(r"[ \n]+", " ", text)
    #Removing Leading and Trailing Spaces and Also Convert All Words to Lower Case:
    return text.strip().lower()
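
A quick check on a short string shows what the cleaning does. The sketch below repeats the function so that it runs on its own, and the sample string is made up for illustration. Because `\w` matches underscores, the `__label__` prefix survives the cleaning, which matters because fastText relies on that prefix to recognise labels.

```python
import re

def preprocess(text):
    # Replace everything except word characters, whitespace and apostrophes with a space.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of spaces and newlines into one space.
    text = re.sub(r"[ \n]+", " ", text)
    # Trim the ends and lower-case the result (the label is lower-cased too).
    return text.strip().lower()

sample = "__label__Household Art Prints (8.7 X 8.7 inch) - Set of 4"
cleaned = preprocess(sample)
print(cleaned)
```

Note that the decimal point in `8.7` is also replaced, so the number is split into two tokens.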

In [33]:
#Performing Text Cleaning On All Rows Of df['category_description']:
df['category_description'] = df['category_description'].map(preprocess)

In [34]:
df['category_description'][0]

'__label__household paper plane design framed wall hanging motivational office decor art prints 8 7 x 8 7 inch set of 4 painting made up in synthetic frame with uv textured print which gives multi effects and attracts towards it this is an special series of paintings which makes your wall very beautiful and gives a royal touch this painting is ready to hang you would be proud to possess this unique painting that is a niche apart we use only the most modern and efficient printing technology on our prints with only the and inks and precision epson roland and hp printers this innovative hd printing technique results in durable and spectacular looking prints of the highest that last a lifetime we print solely with top notch 100 inks to achieve brilliant and true colours due to their high level of uv resistance our prints retain their beautiful colours for many years add colour and style to your living space with this digitally printed painting some are for pleasure and some for eternal bli

In [35]:
#Split Dataset:
from sklearn.model_selection import train_test_split

train , test = train_test_split(
    df,
    test_size=0.2
)

In [36]:
train.shape , test.shape

((40339, 3), (10085, 3))
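
The split above is purely random, so the class proportions in `train` and `test` can drift slightly from those of the full dataset, and the selected rows change on every run. `train_test_split` accepts `stratify=` and `random_state=` arguments for exactly this. To show what stratification does, the sketch below implements it in plain Python on a toy label list; `stratified_split` is a hypothetical helper, not part of scikit-learn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with about test_size of each label in the test set."""
    # Group row positions by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy data: two classes with a 2:1 imbalance.
labels = ["household"] * 10 + ["books"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```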

In [37]:
train.head(3)

,category,description,category_description
26361,__label__Books,A Textbook of Agricultural Statistics About th...,__label__books a textbook of agricultural stat...
37102,__label__Clothing_Accessories,Clovia Women's Plain Control Panty Taupe shape...,__label__clothing_accessories clovia women's p...
46768,__label__Electronics,CAM 360 HD Mini DVR Button Pinhole Spy Hidden ...,__label__electronics cam 360 hd mini dvr butto...


In [38]:
#Write the category_description column to plain-text files, one example per line:
train.to_csv("ecommerce.train" , columns=['category_description'] , header=False , index=False)
test.to_csv("ecommerce.test" , columns=['category_description'] , header=False , index=False)
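
`to_csv` works here because the cleaned text no longer contains commas or double quotes, so pandas normally has nothing to quote. As an alternative that does not depend on CSV quoting rules, the lines can be written directly. This is a sketch using only the standard library; `write_fasttext_file` and `all_lines_labelled` are hypothetical helper names, and the two sample lines are made up.

```python
import os
import tempfile

def write_fasttext_file(lines, path):
    """Write one training example per line, exactly as given."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")

def all_lines_labelled(path, prefix="__label__"):
    """Return True if every non-empty line starts with the label prefix."""
    with open(path, encoding="utf-8") as handle:
        return all(line.startswith(prefix) for line in handle if line.strip())

examples = [
    "__label__books think and grow rich deluxe edition",
    "__label__household framed wall hanging set of 4",
]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.train")
    write_fasttext_file(examples, sample_path)
    labelled = all_lines_labelled(sample_path)
    with open(sample_path, encoding="utf-8") as handle:
        n_lines = sum(1 for _ in handle)

print(labelled, n_lines)
```

With the notebook's data, the same helper could be called as `write_fasttext_file(train['category_description'], "ecommerce.train")`.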

In [39]:
#Train fastText model:
import fasttext

#train_supervised fits a text classifier on the labelled training file:
fsTxt_model = fasttext.train_supervised(input="ecommerce.train")

In [41]:
#Test the Model:
fsTxt_model.test("ecommerce.test") #Returns (number of test samples, precision@1, recall@1)

(10085, 0.9689638076351016, 0.9689638076351016)
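
Precision and recall are identical here because every test row has exactly one true label and `test` scores a single prediction per row by default, so both reduce to the fraction of rows classified correctly. A minimal sketch of that arithmetic on made-up labels (the function name is illustrative, not a fastText API):

```python
def precision_recall_at_1(gold_labels, predicted_labels):
    """Precision and recall when each row has one gold label and one prediction."""
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    n_predictions = len(predicted_labels)  # one prediction per row at k=1
    n_gold = len(gold_labels)              # one gold label per row
    return correct / n_predictions, correct / n_gold

# Made-up example: three of four predictions are right.
gold = ["books", "household", "electronics", "books"]
pred = ["books", "household", "books", "books"]

precision, recall = precision_recall_at_1(gold, pred)
print(precision, recall)
```

Since the two denominators are the same number, the two values always coincide in this single-label setting.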

In [42]:
#Predictions:
fsTxt_model.predict("vimal men's cotton crush short d12 anthra p punlicize the freeliving lifestyle wearing shorts by vimal made from cotton these shorts will keep you comfortable throughout featuring an attractive colour these shorts will surely lend you a smart look crafted from cotton")

(('__label__clothing_accessories',), array([1.00000715]))

In [43]:
fsTxt_model.predict("think and grow rich deluxe edition")

(('__label__books',), array([1.00000966]))
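
The raw return value is a tuple of labels and an array of probabilities, and the reported probability can land slightly above 1 (as in both outputs above), most likely a floating-point artifact. The hypothetical helper below, which is not part of fastText, strips the `__label__` prefix and caps the probability at 1 for display.

```python
def readable_prediction(result, prefix="__label__"):
    """Turn a (labels, probabilities) pair into (plain_label, probability <= 1)."""
    labels, probabilities = result
    label = labels[0]
    # Drop the fastText label prefix if it is present.
    if label.startswith(prefix):
        label = label[len(prefix):]
    # Cap values such as 1.00000966 at 1.0.
    return label, min(1.0, float(probabilities[0]))

# Shaped like the output above, with a plain list standing in for the array.
raw = (("__label__books",), [1.00000966])
print(readable_prediction(raw))
```

Text passed to `predict` should generally be cleaned with the same `preprocess` function that was applied to the training data, so that the model sees tokens in the form it was trained on.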