## PHASE 2: FEATURE CLASSIFICATION
  
This phase sorts candidate engineered features into three core buckets:
1. Numeric
2. Categorical
3. Text/NLP

The table below lists the columns and derived features in each bucket and why they are useful:

| Type | Column / Derived Feature | Why Useful |
|----------|----------|----------|
| Numeric  | discounted_price, actual_price, discount_percentage, Discount Ratio = discounted_price / actual_price  | Numeric fields are core for ranking, regression models, trend tracking, and pricing optimization.  |
| Categorical  | Category grouping, Price Tier ("Low", "Mid", "High" based on quantiles), Discount Tier ("Low", "Mid", "High" based on quantiles)  | Enables aggregation, segmentation, and strategic targeting. Useful for dashboards and category-level forecasting.  |
| Text/NLP  |  about_product, review_title, review_content, Title Length, Description Word Count, Sentiment Score (from review_content) & Keyword Flags   | Extracts qualitative insights from customer reviews & descriptions  |
 
  

In [1]:
import pandas as pd
import numpy as np
import re

In [36]:
df_2 = pd.read_csv("/workspaces/Amazon-Sales-data-analysis/notebooks/cleaned_data.csv")
df_2.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link,discount_percentage_num
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,399.0,1099.0,64%,4.2,24269.0,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,64
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,199.0,349.0,43%,4.0,43994.0,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,43
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,199.0,1899.0,90%,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,90
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,329.0,699.0,53%,4.2,94363.0,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,53
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,154.0,399.0,61%,4.2,16905.0,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,61


In [37]:
# Convert the category column to the categorical dtype
df_2["category"] = df_2["category"].astype("category")
print(df_2["category"].dtype)

category


In [38]:
# Convert the free-text and identifier columns to the pandas string dtype
string_cols = ["product_name", "about_product", "review_title", "review_content", "user_id", "user_name", "review_id", "product_link", "img_link"]
df_2[string_cols] = df_2[string_cols].astype("string")
print(df_2.dtypes)

product_id                         object
product_name               string[python]
category                         category
discounted_price                  float64
actual_price                      float64
discount_percentage                object
rating                            float64
rating_count                      float64
about_product              string[python]
user_id                    string[python]
user_name                  string[python]
review_id                  string[python]
review_title               string[python]
review_content             string[python]
img_link                   string[python]
product_link               string[python]
discount_percentage_num             int64
dtype: object


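## Numeric Features

1. Discount Ratio

Formula: discounted_price / actual_price

Why: It is the share of the list price the customer actually pays, i.e. the complement of the discount (ratio = 1 - discount_percentage / 100). Because it is computed from the two price columns, it keeps the decimal precision that the rounded discount_percentage loses: the first row below has a ratio of 0.363057 against a listed discount of 64%.

2. Ultra Discount Flag

Logic: Mark if discount_percentage_num > 90.

Why: Helps isolate products being nearly given away.

3. Price Difference

Formula: actual_price - discounted_price

Why: Unlike the ratio or percentage, it gives the absolute amount taken off the price. A $10,000 product and a $10 product can both carry a "10%" discount, yet the price difference is $1,000 in one case and $1 in the other.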

In [39]:
df_2["discount_ratio"] = df_2["discounted_price"] / df_2["actual_price"]
df_2["price_difference"] = df_2["actual_price"] - df_2["discounted_price"]
df_2["ultra_discount"] = df_2["discount_percentage_num"] > 90   

In [40]:
df_2.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link,discount_percentage_num,discount_ratio,price_difference,ultra_discount
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,399.0,1099.0,64%,4.2,24269.0,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,64,0.363057,700.0,False
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,199.0,349.0,43%,4.0,43994.0,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,43,0.570201,150.0,False
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,199.0,1899.0,90%,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,90,0.104792,1700.0,False
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,329.0,699.0,53%,4.2,94363.0,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,53,0.470672,370.0,False
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,154.0,399.0,61%,4.2,16905.0,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,61,0.385965,245.0,False
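
As a quick sanity check on the three numeric features, the sketch below rebuilds them on a small toy frame whose prices are copied from the first three rows of the preview above. The frame name `toy` and the helper column `pct_from_ratio` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Toy frame: values copied from the first three rows of the preview above.
toy = pd.DataFrame({
    "discounted_price": [399.0, 199.0, 199.0],
    "actual_price": [1099.0, 349.0, 1899.0],
    "discount_percentage_num": [64, 43, 90],
})

toy["discount_ratio"] = toy["discounted_price"] / toy["actual_price"]
toy["price_difference"] = toy["actual_price"] - toy["discounted_price"]
toy["ultra_discount"] = toy["discount_percentage_num"] > 90

# The ratio is the complement of the discount fraction, so converting it
# back should agree with the stored percentage up to rounding.
toy["pct_from_ratio"] = ((1 - toy["discount_ratio"]) * 100).round()

print(toy[["discount_ratio", "price_difference", "ultra_discount", "pct_from_ratio"]])
```

Because the comparison is strict, a product at exactly 90% off is not flagged as an ultra discount.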


## Categorical Features

Purpose: To group products for aggregation & segmentation.

| Feature Name | Method |
|----------|----------|
| brands | Take the first word of product_name as the brand |
| category | Reduce each raw category path to its top-level group (9 groups in this dataset) |
| price_tier | Bin actual_price into "Low", "Mid", "High" using quantiles |
| discount_tier | Bin discount_percentage_num into "Low", "Mid", "High" using quantiles |

In [41]:
# Derive each product's brand by taking the first word of its name
df_2["brands"] = df_2["product_name"].str.split(" ").str[0]
df_2["brands"].unique() 

array(['Wayona', 'Ambrane', 'Sounce', 'boAt', 'Portronics', 'pTron', 'MI',
       'TP-Link', 'AmazonBasics', 'LG', 'Duracell', 'tizum', 'Samsung',
       'Flix', 'Acer', 'Tizum', 'OnePlus', 'Zoul', 'Amazonbasics', 'Mi',
       'FLiX', 'Wecool', 'D-Link', 'Amazon', '7SEVEN®', 'VW', 'Tata',
       'TP-LINK', 'Airtel', 'Lapster', 'Redmi', 'Model-P4', 'oraimo',
       'CEDO', 'Pinnaclz', 'TCL', 'SWAPKART', 'Firestick', 'SKYWALL',
       'Gizga', 'ZEBRONICS', 'LOHAYA', 'Gilary', 'Dealfreez', 'Isoelite',
       'CROSSVOLT', 'VU', 'PTron', 'Croma', 'Cotbolt', 'Electvision',
       'King', 'Belkin', 'Remote', 'Hisense', 'iFFALCON', 'Saifsmart',
       '10k', 'LRIPL', 'Kodak', 'BlueRigger', 'GENERIC', 'EGate',
       'realme', 'Syncwire', 'Skadioo', 'Sony', 'Storite', 'Karbonn',
       'Zebronics', 'Time', 'Caldipree', 'Universal', 'Amkette', 'POPIO',
       'MYVN', 'WZATCO', 'Crypo™', 'Posh', 'Astigo', 'Caprigo', 'TATA',
       'SoniVision', 'Rts™', 'Agaro', 'Sansui', 'Hi-Mobiler',
       'Sma
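
The brand list above contains several spellings that differ only in case, such as 'Tizum' and 'tizum' or 'TP-Link' and 'TP-LINK'. One possible way to merge them, sketched here on made-up product names, is to group on a case-folded key and keep the most frequent spelling in each group. The names `key`, `canonical`, and `brands_clean` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical product names that reproduce the casing problem.
names = pd.Series([
    "Tizum HDMI Cable",
    "tizum High Speed HDMI Cable",
    "Tizum Ethernet Cable",
    "TP-Link USB WiFi Adapter",
    "TP-LINK Nano Adapter",
    "TP-Link Router",
])

brands = names.str.split(" ").str[0]
key = brands.str.casefold()

# For each case-folded key, keep the most frequent original spelling.
canonical = brands.groupby(key).agg(lambda s: s.value_counts().idxmax())
brands_clean = key.map(canonical)

print(brands_clean.unique())
```

This only addresses casing. First-word tokens such as 'Remote', 'Universal', or 'GENERIC' in the output above are not brand names and would still need manual review.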

In [42]:
df_2["category"].unique()

['Computers&Accessories|Accessories&Peripherals..., 'Computers&Accessories|NetworkingDevices|Netwo..., 'Electronics|HomeTheater,TV&Video|Accessories|..., 'Electronics|HomeTheater,TV&Video|Televisions|..., 'Electronics|HomeTheater,TV&Video|Accessories|..., ..., 'Home&Kitchen|Kitchen&HomeAppliances|SmallKitc..., 'Home&Kitchen|Heating,Cooling&AirQuality|Parts..., 'Home&Kitchen|Kitchen&HomeAppliances|SmallKitc..., 'Home&Kitchen|Heating,Cooling&AirQuality|Fans|..., 'Home&Kitchen|Kitchen&HomeAppliances|Vacuum,Cl...]
Length: 211
Categories (211, object): ['Car&Motorbike|CarAccessories|InteriorAccessor..., 'Computers&Accessories|Accessories&Peripherals..., 'Computers&Accessories|Accessories&Peripherals..., 'Computers&Accessories|Accessories&Peripherals..., ..., 'OfficeProducts|OfficePaperProducts|Paper|Stat..., 'OfficeProducts|OfficePaperProducts|Paper|Stat..., 'OfficeProducts|OfficePaperProducts|Paper|Stat..., 'Toys&Games|Arts&Crafts|Drawing&PaintingSuppli...]

In [43]:
# Keep only the top-level segment of each pipe-delimited category path
df_2["category"] = df_2["category"].str.split("|").str[0]
df_2["category"].unique() 

array(['Computers&Accessories', 'Electronics', 'MusicalInstruments',
       'OfficeProducts', 'Home&Kitchen', 'HomeImprovement', 'Toys&Games',
       'Car&Motorbike', 'Health&PersonalCare'], dtype=object)
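
Note that the result printed above has `dtype=object`: string methods applied to a categorical column do not return a categorical. If the compact dtype is still wanted after the split, it can be restored with another cast. The sketch below shows this on made-up category paths in the same pipe-delimited format; the names `raw` and `top_level` are illustrative.

```python
import pandas as pd

# Made-up paths in the same pipe-delimited format as the category column.
raw = pd.Series(
    [
        "Computers&Accessories|LevelTwo|LevelThree",
        "Electronics|LevelTwo|LevelThree",
        "Computers&Accessories|LevelTwo",
    ],
    dtype="category",
)

# The string accessor works on categoricals but returns a plain Series.
top_level = raw.str.split("|").str[0]
print(top_level.dtype)

# Cast back to category to regain the compact representation.
top_level = top_level.astype("category")
print(list(top_level.cat.categories))
```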

In [44]:
# Bin actual_price and discount_percentage_num into terciles labelled Low, Mid and High
df_2["price_tier"] = pd.qcut(df_2["actual_price"], q=3, labels=["Low", "Mid", "High"])
df_2["discount_tier"] = pd.qcut(df_2["discount_percentage_num"], q=3, labels=["Low", "Mid", "High"])
df_2.head() 

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,...,review_content,img_link,product_link,discount_percentage_num,discount_ratio,price_difference,ultra_discount,brands,price_tier,discount_tier
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories,399.0,1099.0,64%,4.2,24269.0,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",...,Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,64,0.363057,700.0,False,Wayona,Mid,High
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories,199.0,349.0,43%,4.0,43994.0,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",...,I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,43,0.570201,150.0,False,Ambrane,Low,Mid
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories,199.0,1899.0,90%,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",...,"Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,90,0.104792,1700.0,False,Sounce,Mid,High
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories,329.0,699.0,53%,4.2,94363.0,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",...,"Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,53,0.470672,370.0,False,boAt,Low,Mid
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories,154.0,399.0,61%,4.2,16905.0,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",...,"Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,61,0.385965,245.0,False,Portronics,Low,High
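
To make the quantile binning concrete, here is a minimal sketch on nine toy prices (not taken from the dataset). `pd.qcut` with `q=3` places roughly a third of the rows in each tier, and `retbins=True` additionally returns the bin edges it derived, which is useful for reporting what "Low", "Mid", and "High" mean in currency terms.

```python
import pandas as pd

# Nine toy prices, chosen so that each tercile holds exactly three values.
prices = pd.Series([99, 199, 349, 399, 699, 999, 1099, 1899, 4999])

# retbins=True also returns the bin edges that qcut computed from the data.
tiers, edges = pd.qcut(prices, q=3, labels=["Low", "Mid", "High"], retbins=True)

print(edges)                           # four edges bound the three tiers
print(tiers.value_counts(sort=False))  # rows per tier, in Low/Mid/High order
```

The tiers are relative to the data rather than fixed thresholds, so the edges shift whenever the dataset changes. With heavily tied values `pd.qcut` can fail because the computed edges are not unique; its `duplicates` parameter controls that behaviour.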


In [45]:
# Full list of columns after feature engineering
df_2.columns

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link', 'discount_percentage_num',
       'discount_ratio', 'price_difference', 'ultra_discount', 'brands',
       'price_tier', 'discount_tier'],
      dtype='object')

## Text/NLP Features

Purpose: To quantify and extract patterns from text columns such as review_title and review_content.

We use the review text (review_content in the cells below) to gauge sentiment and customer satisfaction by picking out the words and phrases that best describe the product's quality and authenticity. Each product is then assigned to a satisfied, dissatisfied, or mixed group accordingly.
 

In [46]:
# A curated list of basic words and phrases that signal positive or negative sentiment

pos_words = ["great", "excellent", "amazing", "good product", "satisfied", "happy", "love", "perfect", "worth it", "nice", "really nice", "value for money", "durable", "sturdy", "reliable", "authentic", "genuine", "as described", "recommend", "good quality"]

neg_words = ["bad", "poor", "disappointed", "terrible", "awful", "regret", "waste", "fake", "duplicate", "broken", "cracked", "flimsy", "weak", "overpriced", "cheap quality", "not working", "stopped working", "not as described", "useless", "returned"]

In [47]:
# Normalise the review text: fill missing values and convert to lowercase
df_2["review_content"] = df_2["review_content"].fillna("").str.lower()

# Build one alternation pattern per word list, then flag reviews that contain any of its terms
pos_pattern = "|".join(re.escape(word) for word in pos_words)
df_2["pos_word"] = df_2["review_content"].str.contains(pos_pattern, regex=True)

neg_pattern = "|".join(re.escape(word) for word in neg_words)
df_2["neg_word"] = df_2["review_content"].str.contains(neg_pattern, regex=True)
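
One limitation of the plain alternation pattern is that it matches substrings, so a short term like "bad" also fires inside "badge". A possible refinement, sketched below on hypothetical review snippets, wraps the alternation in `\b` word boundaries. The names `reviews`, `neg_terms`, `plain`, and `bounded` are illustrative; this variant is not used in the notebook.

```python
import re
import pandas as pd

# Hypothetical review snippets; the first contains "bad" only inside "badge".
reviews = pd.Series([
    "the badge fell off but it works",
    "bad cable, stopped working",
    None,
])

neg_terms = ["bad", "stopped working"]
alternation = "|".join(re.escape(term) for term in neg_terms)

# \b anchors each term at word boundaries; the non-capturing group (?:...)
# keeps str.contains from warning about match groups.
bounded_pattern = r"\b(?:" + alternation + r")\b"

text = reviews.fillna("").str.lower()
plain = text.str.contains(alternation, regex=True)
bounded = text.str.contains(bounded_pattern, regex=True)

print(plain.tolist())
print(bounded.tolist())
```

Keyword matching of either kind still ignores negation ("not bad", "not durable"), so the resulting flags are best read as rough signals rather than a true sentiment score.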
 

In [48]:
# Derive the customer_sentiment label from the two keyword flags
def customer(row):
    if row["pos_word"] and not row["neg_word"]:
        return "satisfied"
    elif not row["pos_word"] and row["neg_word"]:  
        return "dissatisfied" 
    elif row["pos_word"] and row["neg_word"]:
        return "Mixed"  
    else:
        return np.nan  

df_2["customer_sentiment"] = df_2.apply(customer,axis=1)  

df_2.head(10)

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,...,discount_percentage_num,discount_ratio,price_difference,ultra_discount,brands,price_tier,discount_tier,pos_word,neg_word,customer_sentiment
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories,399.0,1099.0,64%,4.2,24269.0,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",...,64,0.363057,700.0,False,Wayona,Mid,High,True,False,satisfied
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories,199.0,349.0,43%,4.0,43994.0,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",...,43,0.570201,150.0,False,Ambrane,Low,Mid,True,True,Mixed
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories,199.0,1899.0,90%,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",...,90,0.104792,1700.0,False,Sounce,Mid,High,True,False,satisfied
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories,329.0,699.0,53%,4.2,94363.0,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",...,53,0.470672,370.0,False,boAt,Low,Mid,True,True,Mixed
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories,154.0,399.0,61%,4.2,16905.0,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",...,61,0.385965,245.0,False,Portronics,Low,High,True,True,Mixed
5,B08Y1TFSP6,pTron Solero TB301 3A Type-C Data and Fast Cha...,Computers&Accessories,149.0,1000.0,85%,3.9,24871.0,Fast Charging & Data Sync: Solero TB301 Type-C...,"AEQ2YMXSZWEOHK2EHTNLOS56YTZQ,AGRVINWECNY7323CW...",...,85,0.149,851.0,False,pTron,Mid,High,True,False,satisfied
6,B08WRWPM22,"boAt Micro USB 55 Tangle-free, Sturdy Micro US...",Computers&Accessories,176.63,499.0,65%,4.1,15188.0,It Ensures High Speed Transmission And Chargin...,"AG7C6DAADCTRQJG2BRS3RIKDT52Q,AFU7BOMPVJ7Q3TTA4...",...,65,0.353968,322.37,False,boAt,Low,High,True,False,satisfied
7,B08DDRGWTJ,MI Usb Type-C Cable Smartphone (Black),Computers&Accessories,229.0,299.0,23%,4.3,30411.0,1m long Type-C USB Cable|Sturdy and Durable. W...,"AHW6E5LQ2BDYOIVLAJGDH45J5V5Q,AF74RSGCHPZITVFSZ...",...,23,0.765886,70.0,False,MI,Low,Low,True,False,satisfied
8,B008IFXQFU,"TP-Link USB WiFi Adapter for PC(TL-WN725N), N1...",Computers&Accessories,499.0,999.0,50%,4.2,179691.0,USB WiFi Adapter —— Speedy wireless transmissi...,"AGV3IEFANZCKECFGUM42MRH5FNOA,AEBO7NWCNXKT4AESA...",...,50,0.499499,500.0,False,TP-Link,Low,Mid,True,True,Mixed
9,B082LZGK39,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories,199.0,299.0,33%,4.0,43994.0,Universal Compatibility – It is compatible wit...,"AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",...,33,0.665552,100.0,False,Ambrane,Low,Low,True,True,Mixed
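
The row-wise `apply` above is easy to read but calls a Python function once per row. An equivalent labelling can be sketched with boolean masks, shown here on a toy frame that covers all four flag combinations. It keeps the same label strings as the notebook; the names `toy` and `sentiment` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy flags covering all four combinations of pos_word and neg_word.
toy = pd.DataFrame({
    "pos_word": [True, False, True, False],
    "neg_word": [False, True, True, False],
})

pos, neg = toy["pos_word"], toy["neg_word"]

# Start from missing values, then fill each case using a boolean mask.
sentiment = pd.Series(np.nan, index=toy.index, dtype="object")
sentiment.loc[pos & ~neg] = "satisfied"
sentiment.loc[~pos & neg] = "dissatisfied"
sentiment.loc[pos & neg] = "Mixed"

print(sentiment.tolist())
```

Rows where neither flag is set stay missing, matching the `np.nan` branch of the original function.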


In [49]:
# Drop columns that are not needed downstream

df_2 = df_2.drop(columns=["user_id", "user_name", "img_link", "product_link"])
df_2.columns 

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'review_id', 'review_title', 'review_content',
       'discount_percentage_num', 'discount_ratio', 'price_difference',
       'ultra_discount', 'brands', 'price_tier', 'discount_tier', 'pos_word',
       'neg_word', 'customer_sentiment'],
      dtype='object')

In [51]:
# Saving cleaned data to a new csv file
df_2.to_csv("cleaned_data_2.csv", index=False)