# Big basket EDA by rmd

**E-commerce** (electronic commerce) is the activity of buying or selling products electronically through online services or over the Internet. E-commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems. E-commerce is in turn driven by the technological advances of the semiconductor industry and is the largest sector of the electronics industry.

![](https://img.freepik.com/free-vector/online-shopping-banner-mobile-app-templates-concept-flat-design_1150-34865.jpg?w=740&t=st=1659930109~exp=1659930709~hmac=1df6da9d4f62478113772068daa95cf21428910736485939e6b09f7ad501fbcd)

# About the dataset:

BigBasket is the largest online grocery supermarket in India. It was launched around 2011 and has been expanding its business ever since. Although newer competitors such as Blinkit have gained a foothold in the country, BigBasket has not lost ground, thanks to its ever-expanding customer base and the shift to online buying.

# Features of the dataset:

The dataset contains the following columns:
* Index
* Product name
* Category
* Sub-category
* Brand of the product
* Sale price
* Market price
* Type
* Rating (out of 5)
* Short description of the product

Let's start by importing the necessary libraries and the dataset!

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

plt.style.use("Solarize_Light2")
#sns.set_palette("dark")
#sns.set_style("ticks")

%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
bigbasket = pd.read_csv("../input/bigbasket-entire-product-list-28k-datapoints/BigBasket Products.csv")
bigbasket.head()

In [None]:
bigbasket.describe()

In [None]:
bigbasket.info()

# Top & least sold products

In [None]:
# value_counts() counts the rows per product name, sorted from most to least frequent
product_counts = bigbasket["product"].value_counts()
top = product_counts.head(15)
least = product_counts.tail(15)

In [None]:
import textwrap

def wrap_labels(ax, width, break_long_words=False):
    """Wrap long x-tick labels so that they do not overlap."""
    labels = []
    for label in ax.get_xticklabels():
        text = label.get_text()
        labels.append(textwrap.fill(text, width=width,
                      break_long_words=break_long_words))
    ax.set_xticklabels(labels, rotation=0)

fig = plt.figure(figsize=(14,4))
ax = fig.add_axes([0,0,1,1])

# x and y are passed directly, so no data= argument is needed
sns.barplot(x=top.index, y=top.values, ax=ax,
            linewidth=0,
            alpha=1.0,
            color="b")

# format axis
ax.set_xlabel("Most sold products", fontsize=15, weight='semibold')
ax.set_ylabel("Number", fontsize=15, weight='semibold')

wrap_labels(ax, 10)

plt.show()

In [None]:
fig = plt.figure(figsize=(14,4))
ax = fig.add_axes([0,0,1,1])

# x and y are passed directly, so no data= argument is needed
sns.barplot(x=least.index, y=least.values, ax=ax, linewidth=0, alpha=1.0, color="darkgoldenrod")

# format axis
ax.set_xlabel("Least sold products", fontsize=15, weight='semibold')
ax.set_ylabel("Number", fontsize=15, weight='semibold')
plt.xticks(fontsize=10, weight="semibold")

# wrap_labels is defined in the previous cell
wrap_labels(ax, 10)
plt.show()

From the plots of the most and least sold products, we learn the following. Note that the counts measure how often a product name appears in the dataset, which serves here as a proxy for sales.
* Indians tend to purchase many different types of oils and ghee.
* Turmeric and coriander are the most used spices in Indian cuisine, whereas nutmeg and pepper powder are used sparingly.
* Hand sanitizer is one of the highest-selling products, most probably because of the COVID-19 pandemic.
* Indians care a lot about haircare: hair color, shampoo and oils all appear among the highest-selling products.
* Raw spices like turmeric are preferred over masala mixtures.
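
To make clear what the bars above count, here is a minimal sketch on made-up rows. The product and brand values are invented for illustration, and the column name `brand` is an assumption about how the brand feature is stored. `value_counts()` counts rows per product name, and `nunique()` shows how many distinct brands share a name.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "product": ["Turmeric Powder", "Ghee", "Turmeric Powder",
                "Nutmeg", "Ghee", "Turmeric Powder"],
    "brand":   ["BrandA", "BrandB", "BrandC",
                "BrandA", "BrandB", "BrandA"],
})

# Rows per product name, sorted from most to least frequent.
counts = toy["product"].value_counts()
top = counts.head(2)     # most frequent names
least = counts.tail(1)   # least frequent names

# Number of distinct brands that list each product name.
brands_per_product = toy.groupby("product")["brand"].nunique()

print(top)
print(least)
print(brands_per_product)
```

A name that appears often may therefore reflect many brands or pack sizes listing the same item, which is worth keeping in mind when reading the counts as popularity.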


In [None]:
bigbasket.head()

Let's check which items can be bought at a discounted price from BigBasket. For this, we will add another feature, "diff_in_prices", which measures the discount on each item.

In [None]:
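# Absolute discount: market price minus BigBasket's sale price
bigbasket["diff_in_prices"] = bigbasket["market_price"] - bigbasket["sale_price"]
# Keep only the items whose sale price differs from the market price
discount = bigbasket[bigbasket["diff_in_prices"] != 0]
discount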
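
The absolute difference favours expensive items, so a relative discount can be a useful companion. The following is a minimal sketch on made-up rows, not part of the original analysis; `discount_pct` is a hypothetical column name and the prices are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "product": ["Ghee", "Turmeric Powder", "Hand Sanitizer"],
    "market_price": [500.0, 40.0, 100.0],
    "sale_price": [450.0, 40.0, 75.0],
})

# Absolute saving, same definition as diff_in_prices.
toy["diff_in_prices"] = toy["market_price"] - toy["sale_price"]

# Relative saving in percent (discount_pct is a hypothetical column name).
toy["discount_pct"] = (toy["diff_in_prices"] / toy["market_price"] * 100).round(1)

# Keep only the discounted rows, largest relative discount first.
discounted = toy[toy["diff_in_prices"] > 0].sort_values("discount_pct", ascending=False)
print(discounted[["product", "discount_pct"]])
```

In this toy data the sanitizer saves less money than the ghee but carries the larger percentage discount, which is the distinction the extra column is meant to expose.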

In [None]:
#sns.boxplot(bigbasket['market_price'], showmeans=True)
#sns.boxplot(discount['market_price'], showmeans=True, color='r')

In [None]:
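fig = plt.figure(figsize=(20,10))

plt.style.use("Solarize_Light2")
# sns.distplot is deprecated; histplot with kde=True and stat="density" replaces it
sns.histplot(discount["rating"], color='b', kde=True, stat="density", label="Discounted products")
sns.histplot(bigbasket["rating"], color='gold', kde=True, stat="density", label="All products")
plt.xlabel("Ratings", fontsize=15, weight='semibold')
plt.ylabel("Density", fontsize=15, weight='semibold')
plt.title("Rating distribution of all products vs. discounted products", fontsize=15, weight='semibold')
# the label= arguments above give the legend its entries
plt.legend()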

In the graph above, yellow shows the ratings of all items, whereas blue shows the ratings of the items on which a discount is offered. As we see,
> discounts are associated with a **small increase in purchases of items rated 3.0 to 4.2**. Elsewhere, discounts brought no increase in purchases. Another interesting observation is that **the highest-rated products (4.5 to 5) with no discount exceeded discounted products in rate of purchase**. This suggests that customers who are offered high-quality products that satisfy them will buy those products whether or not a discount is offered.
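
The visual comparison can be backed with numbers by binning the ratings and comparing the share of products in each bin. The sketch below uses made-up rows, and the bin edges are an assumption chosen to match the rating ranges mentioned above; it is one possible way to quantify the plot, not the method used in this notebook.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "rating":       [3.2, 3.9, 4.1, 4.6, 4.8, 3.5, 4.0, 4.9],
    "market_price": [100, 200, 150, 300, 120, 80, 60, 500],
    "sale_price":   [90, 180, 150, 300, 120, 70, 50, 500],
})
discounted = toy[toy["market_price"] != toy["sale_price"]]

# Assumed bin edges; pd.cut builds right-inclusive intervals such as (3.0, 4.2].
bins = [0, 3.0, 4.2, 4.5, 5.0]
labels = ["<=3.0", "3.0-4.2", "4.2-4.5", "4.5-5.0"]

def share_by_bin(ratings):
    """Return the fraction of ratings that falls into each bin."""
    binned = pd.cut(ratings.dropna(), bins=bins, labels=labels)
    return binned.value_counts(normalize=True).reindex(labels).fillna(0.0)

comparison = pd.DataFrame({
    "all": share_by_bin(toy["rating"]),
    "discounted": share_by_bin(discounted["rating"]),
})
print(comparison)
```

On the real data, a larger share in the "discounted" column than in the "all" column for a given bin would indicate that discounted items are over-represented in that rating range.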

In [None]:
# Products rated above 4
s = bigbasket.query('rating > 4')
s

# Work in progress!
Do upvote and comment if you like my efforts. Any constructive additions or suggestions are highly welcome.