# **Big Basket 🧺**
**Forget the days of grocery shopping being a chore! Imagine this: you're lounging on the couch, phone in hand, and with a few taps you've got a truckload (well, maybe a basketful) of fresh produce, pantry staples, and even household essentials on their way to your doorstep. That's the magic of bigbasket, India's one-stop grocery shopping destination.**

**They've got over 20,000 products from all your favorite brands, so you can stock up on everything you need without ever leaving home. Fruits and veggies? Got it. Dairy and meat for that epic dinner party? No problem. Bigbasket even has beauty supplies and cleaning products, so you can basically tackle your entire shopping list in one place. Plus, they have crazy convenient delivery options, so you can ditch the supermarket lines and spend that time doing way cooler things (like prepping for that dinner party!). Bigbasket basically makes grocery shopping a breeze, so you can get back to the fun stuff.**

<hr>

# **About the dataset 📊**

**This dataset is basically a big ol' bunch of info about products, all broken down into 10 easy-peasy pieces:**

  * **Index: This is just a fancy way of saying it's a unique ID for each item, like a fingerprint in the data world.**
  * **Product: The name of the product, just like you'd see it on the website.**
  * **Category: The broad group the product falls into, like groceries or home stuff.**
  * **Sub-Category: This is like zooming in on the category. So, maybe "groceries" becomes "fruits" or "home stuff" becomes "cleaning supplies."**
  * **Brand: Who makes the product? You know, like Nike or that yummy jam brand you love.**
  * **Sale Price: How much you gotta pay for it right now.**
  * **Market Price: This is kind of like a reference point, showing the usual price for the product.**
  * **Type: Another way to classify the product, just for extra organization.**
  * **Rating: What other customers think! This is a number showing how much people liked it.**
  * **Description: The product description as shown on the website, covering what the item is and its key features.**

<hr>

**Link to the dataset: https://drive.google.com/file/d/1aEuXxadTlHS4d_BBqrhVVurOFI154ATS/view?usp=sharing**

<hr>

# **Step 1 - Importing the libraries 📚**

**Configuration Libraries**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import warnings
warnings.filterwarnings("ignore")

**Mandatory Libraries**

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

**External Libraries**

In [None]:
!pip install colorama
import colorama
from colorama import Fore, Back, Style

Collecting colorama
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama
Successfully installed colorama-0.4.6


<hr>

# **Step 2 - Data Loading and Inspection 🌐**

**Data Loading**

In [None]:
df = pd.read_csv("/content/drive/MyDrive/Datasets/BBData.csv")

**Data Inspection**

In [None]:
df.head().style.set_properties(
    **{
        "background-color":"white",
        "border-style":"solid",
        "border-color":"white",
    }
)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description,discounts
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220,220,Hair Oil & Serum,4.1,"This Product contains Garlic Oil that is known to help proper digestion, maintain proper cholesterol levels, support cardiovascular and also build immunity. For Beauty tips, tricks & more visit https://bigbasket.blog/",0.0
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180,180,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), refrigerator safe, dishwasher safe and can also be used for re-heating food and not for cooking. All containers come with airtight lids and a wide variety of attractive colours. Stack these stylish and colourful containers in your kitchen with ease and for a look-good factor.",0.0
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119,250,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your mother, sister, in-laws, boss or your friends, this beautiful designer piece wherever placed, is sure to beautify the surroundings Traditional design This type diya has been used for Diwali and All other Festivals for centuries. Sturdy and easy to carry The feet keep it balanced to ensure safety. Wonderful Oil Lamp made in Brass also called as Jyoti. This is a handcrafted piece of Indian brass Deepak.",52.4
3,4,Cereal Flip Lid Container/Storage Jar - Assorted Colour,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149,176,"Laundry, Storage Baskets",3.7,"Multipurpose container with an attractive design and made from food-grade plastic for your hygiene and safety ideal for storing pulses. Grains, spices, and more with easy opening and closing flip-open lid. Strong, durable and transparent body for longevity and easy identification of contents. Multipurpose storage solution for your daily needs stores your everyday food essentials in style with the Nakoda container set. With transparent bodies, you can easily identify your stored items without having to open the lids. These containers are ideal for storing a large variety of items such as food grains, snacks and pulses to sugar, spices, condiments and more. Featuring unique flip-open lids, you can easily open and close this container without any hassles. The Nakoda container is made from high-quality food-grade and BPA-free plastic that is 100% safe for storing food items. You can safely store your food items in this container without worrying about contamination and harmful toxins. As they are constructed using highly durable virgin plastic, this container will last for a long time even with regular use. This container can enhance the overall look of your kitchen decor. Being dishwasher safe, cleaning and maintaining this container is an easy task. You can also use a simple soap solution to manually wash and retain their looks for a long time.",15.340909
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162,162,Bathing Bars & Soaps,4.4,"Nivea Creme Soft Soap gives your skin the best care that it must get. The soft bar consists of Vitamins F and Almonds which are really skin gracious and help you get great skin. It provides the skin with moisture and leaves behind flawless and smooth skin. It makes sure that your body is totally free of germs & dirt and at the same time well nourished.For Beauty tips, tricks & more visit https://bigbasket.blog/",0.0


<hr>

**Data Information**

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27555 entries, 0 to 27554
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   index         27555 non-null  int64  
 1   product       27554 non-null  object 
 2   category      27555 non-null  object 
 3   sub_category  27555 non-null  object 
 4   brand         27554 non-null  object 
 5   sale_price    27555 non-null  float64
 6   market_price  27555 non-null  float64
 7   type          27555 non-null  object 
 8   rating        18929 non-null  float64
 9   description   27440 non-null  object 
dtypes: float64(3), int64(1), object(6)
memory usage: 2.1+ MB


<hr>

# **Step 3 - Data Cleaning and Processing**

**Shape Inspection**

In [None]:
a = df.shape
print(f"Rows: {a[0]} & Columns: {a[1]}")

Rows: 27555 & Columns: 10


**Null Check✅**

In [None]:
df.isnull().sum()

index              0
product            1
category           0
sub_category       0
brand              1
sale_price         0
market_price       0
type               0
rating          8626
description      115
dtype: int64

In [None]:
# Let's work with product columns
df["product"] = df["product"].fillna("NoProductName")
# If we are working with brand column
df["brand"] = df["brand"].fillna("UnknownBrand")

# Working with descriptions
df["description"] = df["description"].fillna("NoDescriptionAvailable")
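
**The three `fillna` calls above can also be written as a single call that maps each column to its placeholder. This is a minimal sketch on a small hypothetical frame: the rows are invented for illustration, and only the column names and placeholder strings come from the notebook.**

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names mirror the BigBasket dataset
toy = pd.DataFrame({
    "product": ["Water Bottle - Orange", None],
    "brand": [None, "Nivea"],
    "description": ["Microwave safe", np.nan],
    "rating": [2.3, np.nan],
})

# One placeholder per text column; rating is deliberately left untouched here
placeholders = {
    "product": "NoProductName",
    "brand": "UnknownBrand",
    "description": "NoDescriptionAvailable",
}
toy = toy.fillna(value=placeholders)

# Remaining nulls per column (only rating should still have one)
print(toy.isnull().sum())
```

**Keeping the placeholders in one dictionary makes it easy to see which columns are filled and which, like `rating`, are handled later.**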

<hr>

**Rounding off the values**

In [None]:
# for sales price
df["sale_price"] = df["sale_price"].round().astype(int)

# For the market price
df["market_price"] = df["market_price"].round().astype(int)

<hr>

# **Step 4 - Exploratory Data Analysis 📊**

<hr>

**Objective 1: Find out the discounts based on the market price**

In [None]:
df["discounts"] = (((df["market_price"] - df["sale_price"]) / df["market_price"])*100)

In [None]:
df.head(2)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description,discounts
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220,220,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...,0.0
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180,180,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ...",0.0
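
**The discount column is the percentage gap between the two prices: (market_price − sale_price) / market_price × 100. The sketch below recomputes it for three price pairs copied from the `df.head()` output shown earlier. The guard against a market price of 0 is an added assumption and is not part of the original cell.**

```python
import numpy as np
import pandas as pd

# Prices copied from three rows of the df.head() output shown earlier
prices = pd.DataFrame({
    "sale_price": [220, 119, 149],
    "market_price": [220, 250, 176],
})

# Guard (an added assumption): treat a market price of 0 as missing
# so the division cannot produce inf
market = prices["market_price"].replace(0, np.nan)
prices["discounts"] = (market - prices["sale_price"]) / market * 100

print(prices)
```

**The second and third rows come out at roughly 52.4 and 15.34, matching the `discounts` values in the notebook output.**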


<hr>

**Objective 2: Get a higher level overview of the data**

In [None]:
# Printing the same with the help of colorama
print(Back.BLACK + Style.BRIGHT + "Summary of the Inventory" + Style.RESET_ALL)

# Printing the total unique products available
print(Fore.RED + "Total Number of Unique Products Available: "+ Style.RESET_ALL + Fore.YELLOW + str(df["product"].nunique()) + Style.RESET_ALL)

# Printing the number of product categories available
print(Fore.BLUE + "Total Number of Unique Products Categories Available: "+ Style.RESET_ALL + Fore.YELLOW + str(df["category"].nunique()) + Style.RESET_ALL)

# Printing the total number of unique subcategories
print(Fore.GREEN + "Total Number of Unique Sub_Categories Available: "+ Style.RESET_ALL + Fore.YELLOW + str(df["sub_category"].nunique()) + Style.RESET_ALL)

# Printing the total types of products
print(Fore.CYAN + "Total Number of Unique Type Available: "+ Style.RESET_ALL + Fore.YELLOW + str(df["type"].nunique()) + Style.RESET_ALL)

# Printing all the brands that are available
print(Fore.RED + "Total Number of Unique Brand Available: "+ Style.RESET_ALL + Fore.YELLOW + str(df["brand"].nunique()) + Style.RESET_ALL)

[40m[1mSummary of the Inventory[0m
[31mTotal Number of Unique Products Available: [0m[33m23541[0m
[34mTotal Number of Unique Products Categories Available: [0m[33m11[0m
[32mTotal Number of Unique Sub_Categories Available: [0m[33m90[0m
[36mTotal Number of Unique Type Available: [0m[33m426[0m
[31mTotal Number of Unique Brand Available: [0m[33m2314[0m


**Insights**
  * **The inventory is broad: 23,541 unique products across 11 categories, 90 sub-categories, 426 types, and 2,314 brands**

<hr>

**Objective 3: Analyse the number of products in each category to see where demand is concentrated**

In [None]:
# Grabbing the data from category and product
category_data = df[["category", "product"]]

# Since we are counting products, drop duplicate (category, product) pairs first
category_data = category_data.drop_duplicates()

# We are going to implement groupby
category_data = category_data.groupby("category").agg(product_count = ("product", "count")).reset_index().sort_values("product_count", ascending = False)

**Results**

In [None]:
category_data

Unnamed: 0,category,product_count
2,Beauty & Hygiene,6839
8,Gourmet & World Food,4109
9,"Kitchen, Garden & Pets",3186
10,Snacks & Branded Foods,2454
4,Cleaning & Household,2411
6,"Foodgrains, Oil & Masala",1997
3,Beverages,756
1,"Bakery, Cakes & Dairy",752
0,Baby Care,549
7,Fruits & Vegetables,353


**Visualize**

In [None]:
fig = px.bar(category_data, x = "category", y = "product_count", color = "category", title = "Analysis on the product and category")
fig.show()

**Insights**
  * **Of all the categories, `Beauty & Hygiene` has the most products (6,839)**
  * **`Gourmet & World Food` and `Kitchen, Garden & Pets` follow, completing the top 3**
  * **Together these three categories list more products than the remaining categories shown combined, which suggests that demand is concentrated there**
  * **The stock for Beauty & Hygiene in particular is very large**
  * **This may be due to an upcoming festival season or a similar common reason**
  * **Going by recent real-life experience, we should maximize the supply of the Beauty & Hygiene category**
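
**To put a number on "the top three combined", the sketch below converts the product counts from the results table into percentage shares. The counts are copied from that table. The printed table shows only 10 of the 11 categories (row index 5 is absent), so the shares are relative to the rows shown, not to the full dataset.**

```python
import pandas as pd

# Product counts copied from the results table above (10 of 11 categories shown)
category_data = pd.DataFrame({
    "category": [
        "Beauty & Hygiene", "Gourmet & World Food", "Kitchen, Garden & Pets",
        "Snacks & Branded Foods", "Cleaning & Household", "Foodgrains, Oil & Masala",
        "Beverages", "Bakery, Cakes & Dairy", "Baby Care", "Fruits & Vegetables",
    ],
    "product_count": [6839, 4109, 3186, 2454, 2411, 1997, 756, 752, 549, 353],
})

total = category_data["product_count"].sum()

# Share of each category among the rows shown, in percent
category_data["share_pct"] = (category_data["product_count"] / total * 100).round(1)

# Combined share of the three largest categories
top3_share = category_data.nlargest(3, "product_count")["product_count"].sum() / total * 100

print(category_data)
print(f"Top 3 share: {top3_share:.1f}%")
```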
  

<hr>

**Objective 4 - Analyse brand and type to find which brands are popular among consumers and which kinds of products those brands offer**

In [None]:
# Grabbing the data
brand_data = df[["brand", "type"]]

# Dropping the duplicates
brand_data = brand_data.drop_duplicates()

# We need to group the data
brand_data = brand_data.groupby("brand").agg(type_count = ("type", "count")).reset_index().sort_values("type_count", ascending = False)

In [None]:
brand_data

Unnamed: 0,brand,type_count
2296,bb Combo,53
741,Fresho,41
171,BB Home,37
507,Dabur,34
2298,bb Royal,32
...,...,...
1361,Mugi Fresh,1
1360,Mud,1
1359,Mrs Bector'S Cremica,1
1358,Mr.Copper King,1


**Visualize the top 10 brands**

In [None]:
brand_data_10 = brand_data.head(10)

In [None]:
fig = px.bar(brand_data_10, x = "brand", y = "type_count", color = "brand", title = "Analysis on the brand and kinds of category type they are dealing with")
fig.show()

**Insights**
  * **BB's own subsidiaries cover the widest range of product types (bb Combo, BB Home and bb Royal are all in the top five)**
  * **This raises the question of whether BB's own subsidiaries can be a potential supplier**
  * **If we make BB our supplier, the cost should be lower and QA would already have been done by BB**

<hr>

**Objective 5 - Analyse the categories and sub-categories to get an idea of the demand for each category and its associated sub-categories**

In [None]:
subcatgory_data = df[["category", "sub_category"]]

subcatgory_data = subcatgory_data.drop_duplicates()

In [None]:
subcatgory_data

Unnamed: 0,category,sub_category
0,Beauty & Hygiene,Hair Care
1,"Kitchen, Garden & Pets",Storage & Accessories
2,Cleaning & Household,Pooja Needs
3,Cleaning & Household,Bins & Bathroom Ware
4,Beauty & Hygiene,Bath & Hand Wash
...,...,...
8815,Gourmet & World Food,Rice & Rice Products
12995,Snacks & Branded Foods,Cuts & Sprouts
13542,Gourmet & World Food,Mutton & Lamb
18845,Baby Care,"Atta, Flours & Sooji"


In [None]:
subcatgory_data = subcatgory_data.groupby("category").agg(subcatgory_count = ("sub_category", "count")).reset_index().sort_values("subcatgory_count", ascending = False)

In [None]:
subcatgory_data.head(10)

Unnamed: 0,category,subcatgory_count
8,Gourmet & World Food,14
10,Snacks & Branded Foods,12
2,Beauty & Hygiene,10
4,Cleaning & Household,10
9,"Kitchen, Garden & Pets",10
6,"Foodgrains, Oil & Masala",9
1,"Bakery, Cakes & Dairy",8
0,Baby Care,7
7,Fruits & Vegetables,7
3,Beverages,6


**Visualize**

In [None]:
fig = px.pie(subcatgory_data, values = subcatgory_data["subcatgory_count"], names = subcatgory_data["category"], color = "category")
fig.show()

**Insights**
  * **`Beauty & Hygiene` has fewer sub-categories (10) than `Gourmet & World Food` (14), yet it has the most products, so it is still in demand**
  * **We can add sub-categories in the area with the most products, Beauty & Hygiene; this should attract more consumers and eventually increase profits**
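
**The comparison above puts the sub-category count next to the product count for each category. Both can be computed in one `groupby` with named aggregation. The following is a sketch on hypothetical rows: the products and the `Skin Care` and `Chocolates` sub-categories are invented for illustration, and `products_per_subcategory` is an illustrative column name.**

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "category": ["Beauty & Hygiene"] * 4 + ["Gourmet & World Food"] * 3,
    "sub_category": ["Hair Care", "Hair Care", "Hair Care", "Skin Care",
                     "Rice & Rice Products", "Mutton & Lamb", "Chocolates"],
    "product": ["Hair Oil", "Shampoo", "Serum", "Face Wash",
                "Basmati Rice", "Lamb Chops", "Dark Chocolate"],
})

# nunique counts distinct values, so no drop_duplicates step is needed
overview = (
    toy.groupby("category")
    .agg(subcategory_count=("sub_category", "nunique"),
         product_count=("product", "nunique"))
    .reset_index()
)

# Average number of distinct products behind each sub-category
overview["products_per_subcategory"] = (
    overview["product_count"] / overview["subcategory_count"]
)

print(overview)
```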


<hr>

**Objective 6 - Analysis on the sales**

In [None]:
print(Back.GREEN + Style.BRIGHT + Fore.YELLOW + "Analysis on Sales" + Style.RESET_ALL)

print('Minimum Sale Price : '+ Fore.RED+ Style.BRIGHT+ str(df['sale_price'].min()) + Style.RESET_ALL)

print('Maximum Sale Price : '+ Fore.RED+ Style.BRIGHT+ str(df['sale_price'].max()) + Style.RESET_ALL)

print('Mean Sale Price    : '+ Fore.RED+ Style.BRIGHT+ str(round(df['sale_price'].mean())) + Style.RESET_ALL)

print('Median Sale Price  : '+ Fore.RED+ Style.BRIGHT+ str(round(df['sale_price'].median())) + Style.RESET_ALL)

[42m[1m[33mAnalysis on Sales[0m
Minimum Sale Price : [31m[1m2[0m
Maximum Sale Price : [31m[1m12500[0m
Mean Sale Price    : [31m[1m323[0m
Median Sale Price  : [31m[1m190[0m


**Insights**
  * **The mean sale price (about 323) is well above the median (190), so the price distribution is clearly right-skewed: a small number of expensive products pull the mean up**
  * **This means that most of the products in the inventory are probably priced below 500 (assumption)**
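
**Both points can be checked numerically: `Series.skew()` returns a positive value for a right-skewed distribution, and a boolean mask gives the share of products below a price threshold. The sketch below uses a handful of hypothetical prices, not the real column.**

```python
import pandas as pd

# Hypothetical sale prices: mostly cheap items plus one very expensive one
sale_price = pd.Series([2, 50, 100, 150, 190, 200, 300, 500, 12500])

mean_price = sale_price.mean()
median_price = sale_price.median()

# A value above 0 indicates a right-skewed distribution
skewness = sale_price.skew()

# Fraction of products priced below 500
share_below_500 = (sale_price < 500).mean()

print(f"mean={mean_price:.0f}, median={median_price:.0f}, skew={skewness:.2f}")
print(f"share below 500: {share_below_500:.0%}")
```

**Running the same two lines on `df["sale_price"]` would turn the assumption about the 500 threshold into a measured share.**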

In [None]:
fig = px.histogram(df, x = "sale_price")
fig.show()

<hr>

**Objective 7 - Using the above analysis, find the price ranges that contain most of the products**

In [None]:
# Defining the price ranges for the products: [range_name, min_value, max_value]
range_val = [
    ['1-10', 1, 10],
    ['11-25', 11, 25],
    ['26-50', 26, 50],
    ['51-100', 51, 100],
    ['101-150', 101, 150],
    ['151-200', 151, 200],
    ['201-300', 201, 300],
    ['301-400', 301, 400],
    ['401-500', 401, 500],
    ['501-1000', 501, 1000],
    ['1001-1500', 1001, 1500],
    ['1501-2000', 1501, 2000],
    ['2001-3000', 2001, 3000],
    ['3001-5000', 3001, 5000],
    ['5001-10000', 5001, 10000],
    ['10001-15000', 10001, 15000],
]

In [None]:
# Frame a dataframe
range_data = pd.DataFrame(range_val, columns = ["range_name", "min_value", "max_value"])

# Creating product
range_data["product_count"] = ""

In [None]:
range_data

Unnamed: 0,range_name,min_value,max_value,product_count
0,1-10,1,10,
1,11-25,11,25,
2,26-50,26,50,
3,51-100,51,100,
4,101-150,101,150,
5,151-200,151,200,
6,201-300,201,300,
7,301-400,301,400,
8,401-500,401,500,
9,501-1000,501,1000,


In [None]:
# we are trying to find the number of products available in the given range
for index, rows in range_data.iterrows():
  range_data.at[index, "product_count"] = len(df['product'][(df['sale_price']>= rows['min_value']) & (df['sale_price']<= rows['max_value'])])
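
**The loop above filters the frame once per range. A vectorized alternative, sketched here under the assumption that sale prices are whole numbers (as they are after the rounding step), is to bucket the prices with `pd.cut` and count each bucket. The prices below are hypothetical. The bin edges reproduce the ranges defined earlier.**

```python
import pandas as pd

# Hypothetical integer sale prices standing in for df["sale_price"]
prices = pd.Series([2, 10, 11, 25, 26, 75, 190, 190, 450, 999, 12500],
                   name="sale_price")

# Upper edge of every range; intervals are right-closed, e.g. (10, 25]
edges = [0, 10, 25, 50, 100, 150, 200, 300, 400, 500,
         1000, 1500, 2000, 3000, 5000, 10000, 15000]
labels = [f"{lo + 1}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

# Assign each price to its range label
buckets = pd.cut(prices, bins=edges, labels=labels)

# observed=False keeps empty ranges in the result with a count of 0
range_counts = (
    prices.groupby(buckets, observed=False)
    .size()
    .rename_axis("range_name")
    .reset_index(name="product_count")
)

print(range_counts)
```

**Because the intervals are right-closed, a price of exactly 10 lands in `1-10` and a price of 11 in `11-25`, which matches the inclusive bounds used in the loop.**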

**Results**

In [None]:
# Dividing on the basis of pricing
range_data

Unnamed: 0,range_name,min_value,max_value,product_count
0,1-10,1,10,178
1,11-25,11,25,689
2,26-50,26,50,2232
3,51-100,51,100,4654
4,101-150,101,150,3661
5,151-200,151,200,3196
6,201-300,201,300,4568
7,301-400,301,400,2555
8,401-500,401,500,1693
9,501-1000,501,1000,2773


**Visualize**

In [None]:
fig = px.bar(range_data, x = "range_name", y = "product_count", color = "range_name")
fig.show()

**Insights**
  * **Most products fall between 26 and 1000, with the largest buckets being 51-100 (4,654 products) and 201-300 (4,568), so the inventory is concentrated in the mid price range**
  * **Our major focus can therefore be on the 26-1000 range**



<hr>

**Objective 8 - Analysis on the basis of percentages of discounts**

In [None]:
df[["product", "category", "discounts"]][df["discounts"]==df["discounts"].max()]

Unnamed: 0,product,category,discounts
26976,Curry Leaves,Fruits & Vegetables,86.666667


<hr>

**Get the top 20 discounted products**

In [None]:
df[["product", "category", "discounts"]].sort_values("discounts", ascending = False).head(20).style.background_gradient("Reds")

Unnamed: 0,product,category,discounts
26976,Curry Leaves,Fruits & Vegetables,86.666667
17713,Fruit & Vegetables Hand Juicer,"Kitchen, Garden & Pets",82.506266
13318,Small Silicone Spatula With Plastic Handle - Assorted Colours,"Kitchen, Garden & Pets",81.203008
13740,Decorative Party Light Big Star String LED Light 2 M - Multicolour,"Kitchen, Garden & Pets",80.982712
10438,NHS 860 Temperature Control Professional Hair Straightener,Beauty & Hygiene,80.501044
4562,Concealer Brush 930,Beauty & Hygiene,80.0
11473,Decorative Party Light Golden Bell String LED Light 7 M - Warm White,"Kitchen, Garden & Pets",79.23962
13265,Decorative Party Light Golden Bell String LED Light 7 M - Multicolour,"Kitchen, Garden & Pets",79.23962
10092,USB String Fairy Lights 3M 30 LED For Decoration - Multicolour,"Kitchen, Garden & Pets",78.696742
24292,Steel Belly Shape Storage Dabba/ Container Set With PP Lid - Silver & Purple,"Kitchen, Garden & Pets",77.98995


<hr>

**Find out the category based discounts**

In [None]:
# Keep only the rows with a non-zero discount, then compute the average discount
# within each category
category_discounts = df[df["discounts"]!=0].groupby("category").agg(Average_discounts = ("discounts", "mean")).reset_index()

# Sorting the discounts
category_discounts = category_discounts.sort_values("Average_discounts", ascending = False)

# Round off
category_discounts = category_discounts.round({"Average_discounts":2})

In [None]:
category_discounts

Unnamed: 0,category,Average_discounts
9,"Kitchen, Garden & Pets",28.62
7,Fruits & Vegetables,21.81
3,Beverages,21.54
2,Beauty & Hygiene,20.88
6,"Foodgrains, Oil & Masala",19.8
8,Gourmet & World Food,19.48
4,Cleaning & Household,19.47
10,Snacks & Branded Foods,17.63
0,Baby Care,16.16
5,"Eggs, Meat & Fish",16.04


**Visualize**

In [None]:
fig = px.bar(category_discounts, x = "category", y = "Average_discounts", color = "category")
fig.show()

**Insights**
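  * **The average discount mostly lies between 15% and 25%, and goes above 25% only for more niche products such as `Kitchen, Garden & Pets` (28.62%)**
  * **Products that sell less appear to carry the largest discounts (an assumption, since the dataset has no sales-volume column to confirm it)**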
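
**The averages above are computed only over products that have a discount, so they describe the typical size of a discount, not how common discounts are. The sketch below, on hypothetical categories `A` and `B`, shows how the average changes when the zero-discount rows are included.**

```python
import pandas as pd

# Hypothetical discounts; category A has two products with no discount
toy = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B"],
    "discounts": [0.0, 0.0, 20.0, 40.0, 10.0, 30.0],
})

# Average over every product in the category
avg_all = toy.groupby("category")["discounts"].mean()

# Average over discounted products only (the approach used in the notebook)
avg_nonzero = toy[toy["discounts"] != 0].groupby("category")["discounts"].mean()

summary = pd.DataFrame({
    "avg_discount_all": avg_all,
    "avg_discount_nonzero": avg_nonzero,
})

print(summary)
```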



<hr>

**Objective 9 - Find the most expensive and the cheapest items along with their discounts**

**For Expensive Products**

In [None]:
# Expensive products
expensive_products = df.nlargest(10, "sale_price")[["product",  "category", 'market_price', "sale_price", "discounts"]]

print(Back.YELLOW+ Style.BRIGHT+ 'Expensive Products: ' + Style.RESET_ALL)
print(expensive_products.to_string())

[43m[1mExpensive Products: [0m
                                                          product                category  market_price  sale_price  discounts
25301                                             Bravura Clipper  Kitchen, Garden & Pets         12500       12500   0.000000
21761               Pet Food - N&D Team Breeder Puppy Top Farmina  Kitchen, Garden & Pets         10090       10090   0.000000
12669                            Epilator SE9-9961 Legs-Body-Face        Beauty & Hygiene         10769        8184  24.004086
23082  Gas Stove-4 Burner Royale Plus Schott Glass, Black (40278)  Kitchen, Garden & Pets         12245        7999  34.675378
2781                                       Extra Virgin Olive Oil    Gourmet & World Food          7299        7299   0.000000
25797  4 Burner Gas Stove - Marvel Plus Glass Tables, GTM04 40355  Kitchen, Garden & Pets          9695        7270  25.012893
1056   Gas Stove-3 Burner Royale Plus Schott Glass, Black (40177)  Kitchen, G

**For cheapest products**

In [None]:
# Cheapest products
cheapest_products = df.nsmallest(10, "sale_price")[["product",  "category", 'market_price', "sale_price", "discounts"]]

print(Back.YELLOW+ Style.BRIGHT+ 'Cheapest Products: ' + Style.RESET_ALL)
print(cheapest_products.to_string())

[43m[1mCheapest Products: [0m
                                           product                category  market_price  sale_price  discounts
26976                                 Curry Leaves     Fruits & Vegetables            15           2  86.666667
21312                                        Serum        Beauty & Hygiene             3           3   0.000000
2761   Orbit Sugar-Free Chewing Gum - Lemon & Lime  Snacks & Branded Foods             5           5   0.000000
2978          Sugar Free Chewing Gum - Mixed Fruit  Snacks & Branded Foods             5           5   0.000000
3445                 Marie Light Biscuits - Active  Snacks & Branded Foods             5           5   0.000000
6014                       Good Day Butter Cookies  Snacks & Branded Foods             5           5   0.000000
9971             Tomato - Local, Organically Grown     Fruits & Vegetables             6           5  16.666667
11306               Happy Happy Choco-Chip Cookies  Snacks & Branded Fo

<hr>

**Insights**
  * **Snacks and branded foods are consumed by everyone, so they stay in demand even when there is no discount**


<hr>

**Objective 10 - Measure the standard deviation to see how the discount varies within a category**


In [None]:
# Standard deviation of the discounts within each category, then averaged across all categories
variation_data = df.groupby("category")["discounts"].std().mean()

In [None]:
variation_data

12.561645837790525
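
**The single number above is the average of the per-category standard deviations. To see how the discount varies in a particular category, the `.mean()` step can be left out. The sketch below shows both on hypothetical categories `A` and `B`.**

```python
import pandas as pd

# Hypothetical discounts: A varies a lot, B does not vary at all
toy = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "discounts": [0.0, 10.0, 20.0, 5.0, 5.0, 5.0],
})

# One standard deviation per category (pandas uses the sample formula, ddof=1)
per_category_std = toy.groupby("category")["discounts"].std()

# Collapsing to a single figure hides the difference between A and B
average_std = per_category_std.mean()

print(per_category_std)
print(average_std)
```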

<hr>

**Objective 11 - Find the best brand based on rating, so that we can get best suppliers**

In [None]:
df["rating"].median()

4.1

In [None]:
df["rating"].mean()

3.943409583179249

In [None]:
fig = px.histogram(df, "rating")
fig.show()

In [None]:
df["rating"] = df["rating"].fillna(df["rating"].mean())

In [None]:
best_rated_brand = df.groupby("brand")["rating"].mean().sort_values(ascending = False)

In [None]:
best_rated_brand.head()

brand
DIVING DUCK    5.0
Muscleblaze    5.0
Depend         5.0
LuxaDerme      5.0
Pez            5.0
Name: rating, dtype: float64
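
**A brand with a single 5.0 rating tops a ranking by mean, so it can help to require a minimum number of actual ratings before calling a brand "best". This is a sketch on hypothetical brands. `MIN_RATINGS` is an illustrative threshold, and the missing ratings are dropped here instead of being filled with the mean.**

```python
import numpy as np
import pandas as pd

# Hypothetical brands and ratings; NaN marks a product without a rating
toy = pd.DataFrame({
    "brand": ["Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma"],
    "rating": [5.0, 4.5, 4.0, 4.4, np.nan, 3.0],
})

MIN_RATINGS = 3  # hypothetical threshold, tune to the dataset

# Mean rating and number of real ratings per brand
brand_ratings = (
    toy.dropna(subset=["rating"])
    .groupby("brand")["rating"]
    .agg(mean_rating="mean", rating_count="count")
)

# Keep only brands with enough ratings, best first
reliable = (
    brand_ratings[brand_ratings["rating_count"] >= MIN_RATINGS]
    .sort_values("mean_rating", ascending=False)
)

print(reliable)
```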

<hr>

**Building a regression model to predict the market price from the sale price**

In [None]:
# library
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Divide the data: the sale price is the only feature, the market price is the target
X = df[["sale_price"]]

Y = df["market_price"]

In [None]:
X.shape

(27555, 1)

In [None]:
# Split the data
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_state = 32)

In [None]:
# Create the linear regression model
model = LinearRegression()

# Fit the data
model.fit(X, Y)

In [None]:
# Predict
model.predict([[8990]])

array([10390.33493725])

In [None]:
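from sklearn.metrics import r2_score
r2_score(Y, model.predict(X))  # R² on the same data the model was fitted on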

**Model with split Data**

In [None]:
# making model 2
model2 = LinearRegression()


model2.fit(x_train, y_train)


pred = model2.predict(x_test)

In [None]:
error_data = pd.DataFrame(columns = ["Actual_Data", "Predicted_Data", "Errors"])

In [None]:
error_data["Actual_Data"] = y_test

error_data["Predicted_Data"] = pred

error_data["Errors"] = error_data["Actual_Data"] - error_data["Predicted_Data"]

In [None]:
error_data.head()

Unnamed: 0,Actual_Data,Predicted_Data,Errors
7440,52,69.38468,-17.38468
25164,45,61.307484,-16.307484
25902,275,295.54619,-20.54619
1676,455,280.545681,174.454319
9512,279,239.005812,39.994188


In [None]:
a = model2.predict([[279]])

In [None]:
a

array([331.31663246])

In [None]:
def discount(market, sales):
  return (((market - sales) / market) * 100)

In [None]:
discount(331, 279)

15.709969788519636

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

print(r2_score(y_test, pred))

0.9269535603496292


In [None]:
np.sqrt(mean_squared_error(y_test, pred))

157.5896574573107
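
**The regression steps above can be collected into one self-contained run. The sketch below uses synthetic prices: the 1.2 multiplier and the noise level are arbitrary assumptions, not values from the dataset. It fits on the training split only, scores on the held-out split, and predicts for a new sale price passed as a DataFrame so the feature name matches.**

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: market price is roughly 1.2 x sale price plus noise (assumption)
rng = np.random.default_rng(0)
sale = rng.uniform(5, 1000, size=500)
toy = pd.DataFrame({
    "sale_price": sale,
    "market_price": 1.2 * sale + rng.normal(0, 20, size=500),
})

X = toy[["sale_price"]]   # feature matrix with a single column
y = toy["market_price"]   # target

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=32
)

# Fit on the training split only
model = LinearRegression().fit(x_train, y_train)
pred = model.predict(x_test)

# R² is unitless; RMSE is in the same units as the prices
r2 = r2_score(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))

# Predict the market price for a new sale price
new_item = pd.DataFrame({"sale_price": [279]})
predicted_market = model.predict(new_item)[0]

print(f"R2={r2:.3f}, RMSE={rmse:.1f}, predicted market price={predicted_market:.1f}")
```

**Since RMSE is in price units, the notebook's value of about 157.6 is best read against the median sale price of 190 and the maximum of 12500.**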