## Super Market Retail Analysis

### Dataset

The dataset is fictional, created to help data analysts practice exploratory data analysis and data visualization. It records orders placed by customers on a grocery delivery application and includes details such as the order ID, customer name, category, discount, and profit. It is designed on the assumption that all orders are placed by customers living in the state of Tamil Nadu, India.


The analysis and visualization will be done in Excel and Tableau, but some rudimentary cleaning of the dataset is done here in Python.

### Importing necessary Packages

In [1]:
# Import pandas for the data-cleaning steps
import pandas as pd

### Data Gathering

In [3]:
# Load the dataset already saved on the local machine
retail_sales = pd.read_csv(r"C:\Users\FORTUNE\Documents\Datasets\Supermart Grocery Sales - Retail Analytics Dataset.csv")

In [10]:
# Examine the basic structure and contents of the data
print(retail_sales.head(10))
retail_sales.info()  # info() prints its own report and returns None
print(retail_sales.describe())

  Order ID Customer Name          Category      Sub Category            City  \
0      OD1        Harish      Oil & Masala           Masalas         Vellore   
1      OD2         Sudha         Beverages     Health Drinks     Krishnagiri   
2      OD3       Hussain       Food Grains      Atta & Flour      Perambalur   
3      OD4       Jackson  Fruits & Veggies  Fresh Vegetables      Dharmapuri   
4      OD5       Ridhesh       Food Grains   Organic Staples            Ooty   
5      OD6        Adavan       Food Grains   Organic Staples      Dharmapuri   
6      OD7         Jonas  Fruits & Veggies  Fresh Vegetables          Trichy   
7      OD8         Hafiz  Fruits & Veggies      Fresh Fruits  Ramanadhapuram   
8      OD9         Hafiz            Bakery          Biscuits     Tirunelveli   
9     OD10      Krithika            Bakery             Cakes         Chennai   

   Order Date Region  Sales  Discount  Profit       State  
0  11-08-2017  North   1254      0.12  401.28  Tamil Nadu  

### Data Cleaning

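#### Tidiness issues
- The "State" column is unnecessary: every order is assumed to come from Tamil Nadu.
- "Order Date" is stored as text in mixed formats (for example `11-08-2017` and `12/24/2015`), so it cannot be used to analyze trends as it stands.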
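
A quick way to confirm the mixed formats is to count which separator each date string uses. The sketch below is a minimal illustration on a handful of values copied from the dataset. The helper name `separator_of` is hypothetical, and on the real data the sample list would be replaced by the values of `retail_sales["Order Date"]`.

```python
from collections import Counter

# Sample values copied from the dataset; swap in retail_sales["Order Date"] for the real check
sample_dates = ["11-08-2017", "06-12-2017", "12/24/2015", "07-12-2015", "10/16/2018"]

def separator_of(date_text):
    """Return the separator character used in a date string (hypothetical helper)."""
    for sep in ("-", "/"):
        if sep in date_text:
            return sep
    return "unknown"

# Tally how many dates use each separator
separator_counts = Counter(separator_of(d) for d in sample_dates)
print(separator_counts)
```

More than one key in the result suggests the column needs careful parsing rather than a single fixed format.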

###### Define
- Remove the "State" column.
- Convert the data in "Order Date" to datetime.
- Split "Order Date" into separate year, month, week, and day columns.

###### Code

In [11]:
#copy dataset before cleaning
df = retail_sales.copy()

In [12]:
df

| | Order ID | Customer Name | Category | Sub Category | City | Order Date | Region | Sales | Discount | Profit | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | OD1 | Harish | Oil & Masala | Masalas | Vellore | 11-08-2017 | North | 1254 | 0.12 | 401.28 | Tamil Nadu |
| 1 | OD2 | Sudha | Beverages | Health Drinks | Krishnagiri | 11-08-2017 | South | 749 | 0.18 | 149.80 | Tamil Nadu |
| 2 | OD3 | Hussain | Food Grains | Atta & Flour | Perambalur | 06-12-2017 | West | 2360 | 0.21 | 165.20 | Tamil Nadu |
| 3 | OD4 | Jackson | Fruits & Veggies | Fresh Vegetables | Dharmapuri | 10-11-2016 | South | 896 | 0.25 | 89.60 | Tamil Nadu |
| 4 | OD5 | Ridhesh | Food Grains | Organic Staples | Ooty | 10-11-2016 | South | 2355 | 0.26 | 918.45 | Tamil Nadu |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9989 | OD9990 | Sudeep | Eggs, Meat & Fish | Eggs | Madurai | 12/24/2015 | West | 945 | 0.16 | 359.10 | Tamil Nadu |
| 9990 | OD9991 | Alan | Bakery | Biscuits | Kanyakumari | 07-12-2015 | West | 1195 | 0.26 | 71.70 | Tamil Nadu |
| 9991 | OD9992 | Ravi | Food Grains | Rice | Bodi | 06-06-2017 | West | 1567 | 0.16 | 501.44 | Tamil Nadu |
| 9992 | OD9993 | Peer | Oil & Masala | Spices | Pudukottai | 10/16/2018 | West | 1659 | 0.15 | 597.24 | Tamil Nadu |


In [14]:
# Drop the "State" column
df = df.drop(columns="State")
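
Before dropping a column, it can be worth confirming that it really carries no information. A minimal sketch, assuming a small stand-in frame (`demo` is a hypothetical name, not the notebook's `df`): a column holding a single distinct value can be removed without losing anything.

```python
import pandas as pd

# Stand-in frame for illustration; values copied from the preview of the dataset
demo = pd.DataFrame({
    "Order ID": ["OD1", "OD2", "OD3"],
    "City": ["Vellore", "Krishnagiri", "Perambalur"],
    "State": ["Tamil Nadu", "Tamil Nadu", "Tamil Nadu"],
})

# Columns with exactly one distinct value add nothing to the analysis
constant_columns = [col for col in demo.columns if demo[col].nunique() == 1]
print(constant_columns)

# Drop them in one step
demo = demo.drop(columns=constant_columns)
```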

In [16]:
# Convert "Order Date" from text to datetime (pandas reads ambiguous dates month-first by default)
df["Order Date"] = pd.to_datetime(df["Order Date"])
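
Because the column mixes `-` and `/` separators, how `pd.to_datetime` handles it without an explicit format may depend on the pandas version. One more explicit option is sketched below: parse each format separately and combine the results. It assumes both formats are month-first. `12/24/2015` can only be month-first, but treating the dashed dates the same way is an assumption. The names `raw`, `dashed`, and `slashed` are illustrative.

```python
import pandas as pd

# Sample values copied from the dataset
raw = pd.Series(["11-08-2017", "12/24/2015", "06-12-2017", "10/16/2018"])

# Parse each format on its own; values that do not match become NaT
dashed = pd.to_datetime(raw, format="%m-%d-%Y", errors="coerce")
slashed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Fill the gaps left by one parse with the results of the other
parsed = dashed.fillna(slashed)
print(parsed)

# Any remaining NaT would point to a third, unexpected format
print(parsed.isna().sum())
```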


In [18]:
# Split "Order Date" into separate year, month, week, and day columns
df["Year"] = df["Order Date"].dt.year
df["Months"] = df["Order Date"].dt.month
# .week is deprecated; isocalendar().week returns the ISO week number instead
df["Week"] = df["Order Date"].dt.isocalendar().week
df["Day"] = df["Order Date"].dt.day
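
The week number pandas derives from a date follows the ISO 8601 calendar, where weeks start on Monday and week 1 is the week containing the year's first Thursday. As a result, dates close to New Year can belong to a week of the neighboring year. A small check with the standard library, using sample dates chosen for illustration:

```python
from datetime import date

# 2015-03-03 falls in ISO week 10
week_of_march_3 = date(2015, 3, 3).isocalendar()[1]

# 2017-01-01 is a Sunday, so it belongs to the last ISO week of 2016
iso_year, iso_week, iso_weekday = date(2017, 1, 1).isocalendar()

print(week_of_march_3)
print(iso_year, iso_week, iso_weekday)
```

When grouping by both "Year" and "Week", a row like the second one would pair calendar year 2017 with week 52, so the first days of January may deserve a quick look.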


###### Test

In [19]:
df.sample(5)

| | Order ID | Customer Name | Category | Sub Category | City | Order Date | Region | Sales | Discount | Profit | Year | Months | Week | Day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7383 | OD7384 | Verma | Eggs, Meat & Fish | Eggs | Madurai | 2015-03-03 | East | 2173 | 0.31 | 847.47 | 2015 | 3 | 10 | 3 |
| 8803 | OD8804 | Vidya | Eggs, Meat & Fish | Eggs | Tenkasi | 2017-12-10 | West | 1191 | 0.33 | 178.65 | 2017 | 12 | 49 | 10 |
| 7397 | OD7398 | Willams | Food Grains | Atta & Flour | Kanyakumari | 2016-05-21 | East | 1442 | 0.15 | 158.62 | 2016 | 5 | 20 | 21 |
| 1476 | OD1477 | Vidya | Oil & Masala | Edible Oil & Ghee | Dharmapuri | 2017-05-31 | Central | 2279 | 0.29 | 660.91 | 2017 | 5 | 22 | 31 |
| 3181 | OD3182 | Rumaiza | Snacks | Cookies | Nagercoil | 2017-03-04 | Central | 1144 | 0.3 | 125.84 | 2017 | 3 | 9 | 4 |


### Export

In [20]:
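# to_csv writes plain CSV text, so use a .csv extension; index=False omits the row index
df.to_csv("Sales_data_clean.csv", index=False)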
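
`to_csv` always writes plain comma-separated text, whatever extension the file name carries, and it includes the row index as an unnamed first column unless `index=False` is passed. The sketch below illustrates a round trip with an in-memory buffer and a stand-in frame (`demo`, `buffer`, and `reloaded` are hypothetical names), so nothing is written to disk.

```python
import io
import pandas as pd

# Stand-in frame for illustration, not the notebook's df
demo = pd.DataFrame({
    "Order ID": ["OD1", "OD2"],
    "Order Date": pd.to_datetime(["2017-11-08", "2015-12-24"]),
    "Sales": [1254, 749],
})

# Write without the row index so no "Unnamed: 0" column appears on reload
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# Read back, asking pandas to parse the date column again
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=["Order Date"])
print(reloaded.dtypes)
```

If a real Excel workbook is needed instead, `DataFrame.to_excel` exists, but it relies on an Excel writer package such as openpyxl being installed.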