## Supermarket Retail Analysis

### Dataset

This fictional dataset was created to help data analysts practice exploratory data analysis and data visualization. It contains orders placed by customers on a grocery delivery application, with details such as the order ID, customer name, category, discount and profit. The dataset assumes that all orders are placed by customers living in the state of Tamil Nadu, India.


The analysis and visualization will be done in Excel and Tableau but some rudimentary operations to clean the dataset will be done here in with python

### Importing Necessary Packages

```python
# Import pandas to perform the cleaning
import pandas as pd
```

### Data Gathering

```python
# Import the dataset already saved on the local machine
retail_sales = pd.read_csv(r"C:\Users\FORTUNE\Documents\Datasets\Supermart Grocery Sales - Retail Analytics Dataset.csv")
```

```python
# Examine the basic structure and components of the data
print(retail_sales.head(10))
retail_sales.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
print(retail_sales.describe())
```

```
  Order ID Customer Name          Category      Sub Category            City  \
0      OD1        Harish      Oil & Masala           Masalas         Vellore   
1      OD2         Sudha         Beverages     Health Drinks     Krishnagiri   
2      OD3       Hussain       Food Grains      Atta & Flour      Perambalur   
3      OD4       Jackson  Fruits & Veggies  Fresh Vegetables      Dharmapuri   
4      OD5       Ridhesh       Food Grains   Organic Staples            Ooty   
5      OD6        Adavan       Food Grains   Organic Staples      Dharmapuri   
6      OD7         Jonas  Fruits & Veggies  Fresh Vegetables          Trichy   
7      OD8         Hafiz  Fruits & Veggies      Fresh Fruits  Ramanadhapuram   
8      OD9         Hafiz            Bakery          Biscuits     Tirunelveli   
9     OD10      Krithika            Bakery             Cakes         Chennai   

   Order Date Region  Sales  Discount  Profit       State  
0  11-08-2017  North   1254      0.12  401.28  Tamil Nadu  
```

### Data Cleaning

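#### Tidiness Issues
- The "State" column is unnecessary: every order is assumed to come from Tamil Nadu, so it holds a single value.
- "Order Date" is stored as text, so it cannot be used to analyze trends over time.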
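Both issues can be confirmed in code before cleaning. The minimal sketch below uses only the header and the first row shown by `head()` as an in-memory stand-in for the CSV (an assumption made so the example runs without the local file); on the real data the same two checks would be run against `retail_sales`.

```python
import io

import pandas as pd

# Stand-in for the real CSV: the header and first row shown by head() above
sample_csv = io.StringIO(
    "Order ID,Customer Name,Category,Sub Category,City,Order Date,Region,Sales,Discount,Profit,State\n"
    "OD1,Harish,Oil & Masala,Masalas,Vellore,11-08-2017,North,1254,0.12,401.28,Tamil Nadu\n"
)
raw = pd.read_csv(sample_csv)

# A column with only one distinct value carries no information for analysis
print(raw["State"].nunique())  # 1

# Without parse_dates, read_csv keeps the dates as plain strings, not datetimes
print(pd.api.types.is_datetime64_any_dtype(raw["Order Date"]))  # False
```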

###### Define
- Remove the "State" column.
- Convert "Order Date" to datetime.
- Extract the year, month and weekday from "Order Date" into separate columns.

###### Code

```python
# Copy the dataset before cleaning so the raw data stays untouched
df = retail_sales.copy()
```

```python
# Drop the "State" column
df.drop(columns="State", inplace=True)
```

```python
# Convert "Order Date" from text to datetime (by default pandas reads the first number as the month)
df["Order Date"] = pd.to_datetime(df["Order Date"])
```
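A value such as "11-08-2017" is ambiguous: it could be 8 November or 11 August. The sketch below shows how `pd.to_datetime` resolves it with its default settings, with `dayfirst=True`, and with an explicit `format`. The `"%m-%d-%Y"` format string is an assumption (that the file stores dates as month-day-year), not something the dataset documents.

```python
import pandas as pd

# One raw value taken from the first row of the dataset
dates_text = pd.Series(["11-08-2017"])

# Default: the first number is read as the month -> 8 November 2017
month_first = pd.to_datetime(dates_text)

# Explicit format, assuming the file stores dates as MM-DD-YYYY
explicit = pd.to_datetime(dates_text, format="%m-%d-%Y")

# dayfirst=True reads the same text as 11 August 2017 instead
day_first = pd.to_datetime(dates_text, dayfirst=True)

print(month_first.iloc[0].date())  # 2017-11-08
print(explicit.iloc[0].date())     # 2017-11-08
print(day_first.iloc[0].date())    # 2017-08-11
```

Passing an explicit `format` makes the intended reading visible in the code and raises an error on any value that does not match, instead of silently guessing. If the file turns out to mix several date layouts, pandas 2.x also accepts `format="mixed"`.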

```python
# Separate year, month and weekday to help in analysis
df["Year"] = df["Order Date"].dt.year
df["Months"] = df["Order Date"].dt.month
df["Weekday"] = df["Order Date"].dt.weekday  # Monday=0 ... Sunday=6
```

```python
# Replace numbers with names to aid visualization
month_names = {1: "January", 2: "February", 3: "March", 4: "April", 5: "May", 6: "June",
               7: "July", 8: "August", 9: "September", 10: "October", 11: "November", 12: "December"}
weekday_names = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday",
                 4: "Friday", 5: "Saturday", 6: "Sunday"}
df.replace({"Months": month_names, "Weekday": weekday_names}, inplace=True)
```
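pandas can also produce these labels without a hand-written mapping: `Series.dt.month_name()` and `Series.dt.day_name()` return English names under the default locale. This is an alternative sketch, not the approach used above; it runs on three order dates from the sample in the Test step, and its output matches the Months and Weekday values shown there.

```python
import pandas as pd

# Three order dates from the cleaned sample
dates = pd.Series(pd.to_datetime(["2016-01-19", "2018-09-29", "2015-03-05"]))

# Built-in accessors return the names directly (English under the default locale)
months = dates.dt.month_name()
weekdays = dates.dt.day_name()

print(months.tolist())    # ['January', 'September', 'March']
print(weekdays.tolist())  # ['Tuesday', 'Saturday', 'Thursday']
```

One trade-off applies to both versions: month and weekday names sort alphabetically, so keeping the numeric month next to the name (or using an ordered categorical) helps when charts need calendar order.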

###### Test

```python
df.sample(5)
```

| | Order ID | Customer Name | Category | Sub Category | City | Order Date | Region | Sales | Discount | Profit | Year | Months | Weekday |
|---:|---|---|---|---|---|---|---|---:|---:|---:|---:|---|---|
| 1458 | OD1459 | Amrish | Snacks | Noodles | Virudhunagar | 2016-01-19 | Central | 2000 | 0.34 | 460.0 | 2016 | January | Tuesday |
| 985 | OD986 | Alan | Eggs, Meat & Fish | Fish | Perambalur | 2018-09-29 | Central | 2444 | 0.33 | 415.48 | 2018 | September | Saturday |
| 2840 | OD2841 | Sharon | Food Grains | Organic Staples | Pudukottai | 2015-03-05 | East | 2433 | 0.24 | 608.25 | 2015 | March | Thursday |
| 1071 | OD1072 | Roshan | Eggs, Meat & Fish | Mutton | Dindigul | 2018-11-30 | East | 1656 | 0.16 | 629.28 | 2018 | November | Friday |
| 7539 | OD7540 | Vince | Food Grains | Rice | Perambalur | 2015-04-29 | South | 1705 | 0.23 | 409.2 | 2015 | April | Wednesday |
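Eyeballing a random sample catches obvious problems; a few assertions make the check repeatable. The minimal sketch below rebuilds two of the sampled rows (only some columns) as a hypothetical stand-in called `cleaned`; on the real data the same assertions would run against `df`. The list of checks is illustrative, not exhaustive.

```python
import pandas as pd

# Two rows from the sample above, standing in for the cleaned df
cleaned = pd.DataFrame({
    "Order ID": ["OD1459", "OD986"],
    "Order Date": pd.to_datetime(["2016-01-19", "2018-09-29"]),
    "Sales": [2000, 2444],
    "Year": [2016, 2018],
    "Months": ["January", "September"],
    "Weekday": ["Tuesday", "Saturday"],
})

# "State" was dropped during cleaning
assert "State" not in cleaned.columns
# "Order Date" now has a datetime dtype
assert pd.api.types.is_datetime64_any_dtype(cleaned["Order Date"])
# The derived columns agree with the date they were extracted from
assert (cleaned["Year"] == cleaned["Order Date"].dt.year).all()
assert (cleaned["Months"] == cleaned["Order Date"].dt.month_name()).all()
assert (cleaned["Weekday"] == cleaned["Order Date"].dt.day_name()).all()
print("all checks passed")
```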


### Export

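```python
# Export the cleaned data; index=False stops pandas writing the row index as an extra column
df.to_csv("Sales_data_clean.csv", index=False)
```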
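By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` then loads back as `Unnamed: 0`. The round-trip sketch below uses two order IDs and sales figures from the sample and keeps the CSV in memory, so no file is written.

```python
import io

import pandas as pd

# Two order IDs and sales figures from the sample above
frame = pd.DataFrame({"Order ID": ["OD1459", "OD986"], "Sales": [2000, 2444]})

# With no path, to_csv returns the CSV text as a string
text_with_index = frame.to_csv()                 # default index=True
text_without_index = frame.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(text_with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(text_without_index)).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'Order ID', 'Sales']
print(cols_without)  # ['Order ID', 'Sales']
```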