
# Project: Sales Data Wrangling

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#Gathering">Data Gathering</a></li>
<li><a href="#Assessing">Data Assessing</a></li>
<li><a href="#Cleaning">Data Cleaning</a></li>    
<li><a href="#Assessing2">Data Assessing After Cleaning</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#Storing"> Data Storing</a></li>    
</ul>

<a id='intro'></a>
## Introduction

### Dataset Description 

This dataset contains detailed supermarket sales records from multiple branches, with 1006 rows and 16 columns. Each row describes one invoice: invoice number, branch where the sale occurred, city, customer type, gender, product line, unit price, quantity sold, 5% tax, total, date, time, payment method, and customer rating. The goal of the analysis is to study sales, understand customer behavior, evaluate the financial performance of the different branches, and explore the factors that affect customer satisfaction.


In [1]:
# Import the libraries used in this notebook
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib inline
from datetime import time
# show all columns
pd.set_option('display.max_columns', None)

<a id='Gathering'></a>

## 1- Data Gathering

In [2]:
df = pd.read_csv("Capstone Data - Supermarket Sales.csv")
df.head(5)

Unnamed: 0,Invoice ID,Branch,Yangon,Naypyitaw,Mandalay,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating
0,750-67-8428,A,1,0,0,Normal,Male,Health and beauty,74.69,7,26.1415,,1/5/2019,13:08,Ewallet,9.1
1,226-31-3081,C,0,1,0,Normal,Male,Electronic accessories,15.28,5,3.82,80.22,3/8/2019,10:29,Cash,9.6
2,631-41-3108,A,1,0,0,Normal,Male,Home and lifestyle,46.33,7,16.2155,340.5255,3/3/2019,13:23,Credit card,7.4
3,123-19-1176,A,1,0,0,Normal,Male,Health and beauty,58.22,8,,489.048,1/27/2019,8 - 30 PM,Ewallet,8.4
4,373-73-7910,A,1,0,0,Normal,Male,Sports and travel,86.31,7,30.2085,634.3785,2/8/2019,10:37,Ewallet,5.3


<a id='Assessing'></a>
## 2- Data Assessing

### 2-1- Tidiness issues

In [3]:
df.sample(5)

Unnamed: 0,Invoice ID,Branch,Yangon,Naypyitaw,Mandalay,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating
391,173-82-9529,B,0,0,1,Normal,Female,Fashion accessories,37.95,10,18.975,398.475,1/26/2019,14:51,Cash,9.7
620,420-97-3340,A,1,0,0,Normal,Female,Food and beverages,71.68,3,10.752,225.792,3/28/2019,15:30,Credit card,9.2
888,137-74-8729,C,0,1,0,Normal,Female,Fashion accessories,12.19,8,4.876,102.396,3/13/2019,12:47,Ewallet,6.8
273,633-91-1052,A,1,0,0,Normal,Female,Home and lifestyle,12.03,2,1.203,25.263,1/27/2019,15:51,Cash,5.1
263,447-15-7839,A,1,0,0,Member,Female,Sports and travel,22.24,10,11.12,233.52,2/9/2019,11:00,Cash,4.2


In [4]:
df.shape

(1006, 16)

#### Each variable forms a column and contains values
- 'Yangon', 'Naypyitaw', 'Mandalay': these three indicator columns encode a single variable (the city) and should be converted into one 'City' column.


#### Each observation forms a row
- No issues found.

#### Each type of observational unit forms a table
- No issues found.

### 2-2- Quality issues

In [5]:
df.sample(10)

Unnamed: 0,Invoice ID,Branch,Yangon,Naypyitaw,Mandalay,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating
853,866-70-2814,B,0,0,1,Normal,Female,Electronic accessories,52.79,10,26.395,554.295,2/25/2019,11:58,Ewallet,10.0
893,715-20-1673,B,0,0,1,Normal,Male,Electronic accessories,28.38,5,7.095,148.995,3/6/2019,20:57,Cash,9.4
102,551-21-3069,C,0,1,0,Normal,Female,Electronic accessories,23.07,9,10.3815,218.0115,2/1/2019,11:27,Cash,4.9
51,162-48-8011,A,1,0,0,-,Female,Food and beverages,44.59,5,11.1475,234.0975,2/10/2019,15:10,Cash,8.5
864,124-31-1458,A,1,0,0,Member,Female,Electronic accessories,79.59,3,11.9385,250.7085,1/8/2019,14:30,Cash,6.6
424,489-64-4354,C,0,1,0,Normal,Male,Fashion accessories,16.28,1,0.814,17.094,3/9/2019,15:36,Cash,5.0
766,801-88-0346,C,0,1,0,Normal,Female,Fashion accessories,76.06,3,11.409,239.589,1/5/2019,20:30,Credit card,9.8
43,228-96-1411,C,0,1,0,Normal,Male,Food and beverages,98.7,8,39.48,829.08,3/4/2019,20:39,Cash,7.6
221,468-01-2051,B,0,0,1,Normal,Male,Food and beverages,62.08,7,21.728,456.288,3/6/2019,13:46,Ewallet,5.4
225,746-68-6593,C,0,1,0,Member,Female,Sports and travel,87.16,2,8.716,183.036,1/11/2019,14:29,Credit card,9.7


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1006 entries, 0 to 1005
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Invoice ID     1006 non-null   object 
 1   Branch         1006 non-null   object 
 2   Yangon         1006 non-null   int64  
 3   Naypyitaw      1006 non-null   int64  
 4   Mandalay       1006 non-null   int64  
 5   Customer type  1006 non-null   object 
 6   Gender         1006 non-null   object 
 7   Product line   1006 non-null   object 
 8   Unit price     1006 non-null   object 
 9   Quantity       1006 non-null   int64  
 10  Tax 5%         997 non-null    float64
 11  Total          1003 non-null   float64
 12  Date           1006 non-null   object 
 13  Time           1006 non-null   object 
 14  Payment        1006 non-null   object 
 15  Rating         1006 non-null   float64
dtypes: float64(3), int64(4), object(9)
memory usage: 125.9+ KB


In [7]:
df.describe()

Unnamed: 0,Yangon,Naypyitaw,Mandalay,Quantity,Tax 5%,Total,Rating
count,1006.0,1006.0,1006.0,1006.0,997.0,1003.0,1006.0
mean,0.338966,0.329026,0.332008,5.469185,15.479682,322.734689,7.056163
std,0.473594,0.470093,0.471168,3.014153,11.72832,245.865964,3.318751
min,0.0,0.0,0.0,-8.0,0.5085,10.6785,4.0
25%,0.0,0.0,0.0,3.0,5.9865,123.78975,5.5
50%,0.0,0.0,0.0,5.0,12.2275,254.016,7.0
75%,1.0,1.0,1.0,8.0,22.7205,471.009,8.5
max,1.0,1.0,1.0,10.0,49.65,1042.65,97.0


In [8]:
df.duplicated().sum()

6
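To see which rows that count refers to, the duplicates can be listed next to their originals. A minimal sketch on a toy frame (the values in `toy` are made up for illustration; `duplicated` and its `keep` parameter are standard pandas):

```python
import pandas as pd

# Toy frame (hypothetical values) with one fully repeated row.
toy = pd.DataFrame({
    'Invoice ID': ['111-11-1111', '222-22-2222', '111-11-1111'],
    'Quantity': [7, 5, 7],
})

# The default keep='first' flags only the later copies; this is what .sum() counts.
n_extra_copies = toy.duplicated().sum()

# keep=False flags every member of a duplicate group, which is handy for inspection.
duplicate_groups = toy[toy.duplicated(keep=False)].sort_values('Invoice ID')
print(n_extra_copies)
print(duplicate_groups)
```

Applied to `df`, the same two calls would show the repeated invoices side by side before they are dropped.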

In [9]:
df.isnull().sum()

Invoice ID       0
Branch           0
Yangon           0
Naypyitaw        0
Mandalay         0
Customer type    0
Gender           0
Product line     0
Unit price       0
Quantity         0
Tax 5%           9
Total            3
Date             0
Time             0
Payment          0
Rating           0
dtype: int64

In [10]:
df['Customer type'].value_counts()

Normal     515
Member     463
-           27
Memberr      1
Name: Customer type, dtype: int64

In [11]:
df['Quantity'].value_counts()

 10    120
 1     111
 4     109
 7     102
 5     102
 6      98
 9      94
 2      92
 3      91
 8      83
-8       2
-1       1
-7       1
Name: Quantity, dtype: int64

In [12]:
negative_values = df[df['Quantity'] < 0]
negative_values

Unnamed: 0,Invoice ID,Branch,Yangon,Naypyitaw,Mandalay,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating
629,308-39-1707,A,1,0,0,Normal,Female,Fashion accessories,12.09 USD,-1,,12.6945,1/26/2019,18:19,Credit card,8.2
830,237-44-6163,A,1,0,0,Normal,Male,Electronic accessories,10.56 USD,-8,,88.704,1/24/2019,17:43,Cash,7.6
881,115-38-7388,C,0,1,0,Member,Female,Fashion accessories,10.18 USD,-8,,85.512,3/30/2019,12:51,Credit card,9.5
903,865-41-9075,A,1,0,0,Normal,Male,Food and beverages,11.53 USD,-7,,84.7455,1/28/2019,17:35,Cash,8.1


In [13]:
df['Rating'].value_counts()

6.0     26
6.6     24
4.2     23
9.5     22
5.1     21
        ..
4.0     11
5.3     11
4.6      9
10.0     5
97.0     1
Name: Rating, Length: 62, dtype: int64

#### 1- Completeness :
- Remove "USD" from the "Unit price" column.
- Remove "PM" and " - " from the "Time" column.
- Fill missing values in the "Total" column with "Quantity" × "Unit price" + "Tax 5%".
- Fill missing values in the "Tax 5%" column with 5% of "Quantity" × "Unit price".
- Replace "-" in the "Customer type" column with the most common value, and correct the misspelling "Memberr".

#### 2- Validity :
- Rename columns => 'Invoice_ID', 'Customer_type', 'Product_line', 'Unit_price', 'Tax_5%'
- Convert the "Unit price" column from object to a numeric type (float)
- Convert the "Date" column from object to datetime



#### 3- Accuracy :
- Replace the rating 97.0 with the valid value 9.7.
- Replace negative values in the "Quantity" column with their absolute values.

#### 4- Consistency :
- Remove duplicate rows (6 found).
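The issues listed above can also be counted programmatically, which makes the assessment repeatable. The sketch below is illustrative only: `sample` is a hypothetical three-row frame that mimics the problems seen in the outputs, and the rating check assumes the scale tops out at 10.

```python
import pandas as pd

# Hypothetical sample reproducing the kinds of problems listed above.
sample = pd.DataFrame({
    'Unit price': ['74.69', '12.09 USD', '58.22'],
    'Time': ['13:08', '18:19', '8 - 30 PM'],
    'Customer type': ['Normal', '-', 'Memberr'],
    'Quantity': [7, -1, 8],
    'Rating': [9.1, 97.0, 8.4],
})

issues = {
    # Unit prices that cannot be parsed as numbers (e.g. a trailing "USD").
    'non_numeric_price': pd.to_numeric(sample['Unit price'], errors='coerce').isna().sum(),
    # Times that are not in plain H:MM or HH:MM form.
    'malformed_time': (~sample['Time'].str.fullmatch(r'\d{1,2}:\d{2}')).sum(),
    # Customer types outside the two expected labels.
    'unknown_customer_type': (~sample['Customer type'].isin(['Normal', 'Member'])).sum(),
    'negative_quantity': (sample['Quantity'] < 0).sum(),
    # Assumption: ratings are on a scale whose maximum is 10.
    'rating_above_10': (sample['Rating'] > 10).sum(),
}
print(issues)
```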

<a id='Cleaning'></a>
## 3- Data Cleaning

In [14]:
# Work on a copy so the original DataFrame stays unchanged
df_clean = df.copy()
df_clean.head()

Unnamed: 0,Invoice ID,Branch,Yangon,Naypyitaw,Mandalay,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating
0,750-67-8428,A,1,0,0,Normal,Male,Health and beauty,74.69,7,26.1415,,1/5/2019,13:08,Ewallet,9.1
1,226-31-3081,C,0,1,0,Normal,Male,Electronic accessories,15.28,5,3.82,80.22,3/8/2019,10:29,Cash,9.6
2,631-41-3108,A,1,0,0,Normal,Male,Home and lifestyle,46.33,7,16.2155,340.5255,3/3/2019,13:23,Credit card,7.4
3,123-19-1176,A,1,0,0,Normal,Male,Health and beauty,58.22,8,,489.048,1/27/2019,8 - 30 PM,Ewallet,8.4
4,373-73-7910,A,1,0,0,Normal,Male,Sports and travel,86.31,7,30.2085,634.3785,2/8/2019,10:37,Ewallet,5.3


### 3-1 Fixing Tidiness issues

#### A- Define :
- Convert the 'Yangon', 'Naypyitaw', and 'Mandalay' indicator columns into a single 'City' column.

#### B- Code :

In [15]:
# Reshape from wide to long: the three city indicator columns become City/Sales pairs
df_clean = pd.melt(df_clean,
                   id_vars=['Invoice ID', 'Branch', 'Customer type', 'Gender', 'Product line', 'Unit price', 'Quantity', 'Tax 5%', 'Total', 'Date', 'Time', 'Payment', 'Rating'],
                   value_vars=['Yangon', 'Naypyitaw', 'Mandalay'],
                   var_name='City',
                   value_name='Sales')
# Keep only the rows where "Sales" is 1, i.e. the city each invoice belongs to
df_clean = df_clean[df_clean['Sales'] == 1]
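Melting stacks three copies of the data before filtering, which is why the index later runs from 0 to 3008. As an alternative sketch (not the approach used in this notebook), `idxmax(axis=1)` can read the city name directly from the indicator columns and keep the original index. The `toy` frame below is hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical values) with the same one-hot city layout.
toy = pd.DataFrame({
    'Invoice ID': ['a', 'b', 'c'],
    'Yangon': [1, 0, 0],
    'Naypyitaw': [0, 1, 0],
    'Mandalay': [0, 0, 1],
})
city_cols = ['Yangon', 'Naypyitaw', 'Mandalay']

# Guard: every row must flag exactly one city for idxmax to be meaningful.
assert (toy[city_cols].sum(axis=1) == 1).all()

# idxmax(axis=1) returns, per row, the name of the column holding the 1.
tidy = toy.drop(columns=city_cols).assign(City=toy[city_cols].idxmax(axis=1))
print(tidy)
```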

#### C-Test :

In [16]:
df_clean.sample(5)

Unnamed: 0,Invoice ID,Branch,Customer type,Gender,Product line,Unit price,Quantity,Tax 5%,Total,Date,Time,Payment,Rating,City,Sales
2875,533-66-5566,B,Normal,Female,Home and lifestyle,51.07,7,17.8745,375.3645,1/12/2019,11:42,Cash,7.0,Mandalay,1
2522,347-72-6115,B,Member,Female,Sports and travel,90.74,7,31.759,666.939,1/16/2019,18:03,Credit card,6.2,Mandalay,1
2787,120-54-2248,B,Normal,Female,Food and beverages,28.86,5,7.215,151.515,1/22/2019,18:08,Credit card,8.0,Mandalay,1
479,575-67-1508,A,Normal,Male,Electronic accessories,38.6,1,1.93,40.53,1/29/2019,11:26,Ewallet,6.7,Yangon,1
1910,545-07-8534,C,Normal,Female,Health and beauty,58.32,2,5.832,122.472,2/14/2019,12:42,Ewallet,6.0,Naypyitaw,1


In [17]:
df_clean['City'].value_counts()

Yangon       341
Mandalay     334
Naypyitaw    331
Name: City, dtype: int64

In [18]:
# Drop the helper column "Sales"
df_clean = df_clean.drop(columns=['Sales'])

### 3-2 Fixing Quality Issues

#### A- Define :
- Remove duplicates

#### B- Code :

In [19]:
df_clean = df_clean.drop_duplicates()

#### C-Test  :

In [20]:
df_clean.duplicated().sum()

0

#### A- Define :
- Change column names

#### B- Code :

In [21]:
df_clean = df_clean.rename(columns={'Invoice ID': 'Invoice_ID', 'Customer type': 'Customer_type', 'Product line': 'Product_line', 'Unit price': 'Unit_price', 'Tax 5%': 'Tax_5%'})

#### C-Test :

In [22]:
df_clean.head(5)

Unnamed: 0,Invoice_ID,Branch,Customer_type,Gender,Product_line,Unit_price,Quantity,Tax_5%,Total,Date,Time,Payment,Rating,City
0,750-67-8428,A,Normal,Male,Health and beauty,74.69,7,26.1415,,1/5/2019,13:08,Ewallet,9.1,Yangon
2,631-41-3108,A,Normal,Male,Home and lifestyle,46.33,7,16.2155,340.5255,3/3/2019,13:23,Credit card,7.4,Yangon
3,123-19-1176,A,Normal,Male,Health and beauty,58.22,8,,489.048,1/27/2019,8 - 30 PM,Ewallet,8.4,Yangon
4,373-73-7910,A,Normal,Male,Sports and travel,86.31,7,30.2085,634.3785,2/8/2019,10:37,Ewallet,5.3,Yangon
6,355-53-5943,A,Normal,Male,Electronic accessories,68.84,6,20.652,433.692,2/25/2019,14:36,Ewallet,5.8,Yangon


#### A- Define :
- Remove "USD" from the 'Unit_price' column

#### B- Code :

In [23]:
# Count the values that contain "USD"
contains_usd = df_clean['Unit_price'].str.contains('USD', regex=False).sum()
contains_usd

5

In [24]:
# Remove USD
df_clean['Unit_price'] = df_clean['Unit_price'].astype(str).str.replace('USD', '', regex=False).str.strip()

#### C-Test :

In [25]:
# Check if there are any values containing "USD"  
contains_usd = df_clean['Unit_price'].str.contains('USD').any()  
if contains_usd:  
    print("There is still USD")  
else:  
    print("There is no USD")

There is no USD


#### A- Define :
- Convert the 'Unit_price' column from object to a numeric type (float)

#### B- Code :

In [26]:
# Convert the column to a numeric dtype
df_clean['Unit_price'] = pd.to_numeric(df_clean['Unit_price'])

#### C-Test :

In [27]:
df_clean['Unit_price'].info()

<class 'pandas.core.series.Series'>
Int64Index: 1000 entries, 0 to 3008
Series name: Unit_price
Non-Null Count  Dtype  
--------------  -----  
1000 non-null   float64
dtypes: float64(1)
memory usage: 15.6 KB


#### A- Define :
- Convert the 'Date' column from object to datetime

#### B- Code :

In [28]:
# Convert the column to datetime
df_clean['Date'] = pd.to_datetime(df_clean['Date'])
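The raw dates are written month first (1/5/2019 becomes 2019-01-05 in the cleaned output). Passing an explicit `format` makes that reading unambiguous instead of relying on inference. A small sketch on values taken from the sample rows:

```python
import pandas as pd

dates = pd.Series(['1/5/2019', '3/8/2019', '1/27/2019'])

# Explicit month/day/year format: '1/5/2019' is 5 January, not 1 May.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

With an explicit format, a value that does not match raises an error rather than being silently parsed another way.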

#### C-Test :

In [29]:
df_clean['Date'].info()

<class 'pandas.core.series.Series'>
Int64Index: 1000 entries, 0 to 3008
Series name: Date
Non-Null Count  Dtype         
--------------  -----         
1000 non-null   datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 15.6 KB


#### A- Define :
- Remove "PM" and " - " from the 'Time' column

#### B- Code :

In [30]:
# Replace " - " with ":"
df_clean['Time'] = df_clean['Time'].replace({' - ': ':'}, regex=True) 

In [31]:
# Remove the AM/PM markers
df_clean['Time'] = df_clean['Time'].str.replace('PM', '', regex=False).str.replace('AM', '', regex=False).str.strip()
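Stripping the marker turns "8 - 30 PM" into "8:30", which reads as a morning time next to the 24-hour values in the rest of the column. If the "PM" is meant literally, the value should become "20:30". The sketch below shows one way to do that conversion; `normalize_time` is a hypothetical helper, and treating the marker as a 12-hour clock indicator is an assumption.

```python
import pandas as pd

def normalize_time(raw: str) -> str:
    """Return a 24-hour 'HH:MM' string from 'H:MM' or 'H - MM AM/PM' input."""
    text = raw.strip().upper().replace(' - ', ':')
    is_pm = text.endswith('PM')
    is_am = text.endswith('AM')
    if is_pm or is_am:
        text = text[:-2].strip()
    hour_str, minute_str = text.split(':')
    hour, minute = int(hour_str), int(minute_str)
    # Assumption: AM/PM marks a 12-hour clock value.
    if is_pm and hour < 12:
        hour += 12
    elif is_am and hour == 12:
        hour = 0
    return f"{hour:02d}:{minute:02d}"

times = pd.Series(['13:08', '8 - 30 PM', '10:29'])
normalized = times.map(normalize_time)
print(normalized.tolist())
```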

#### C-Test :

In [32]:
# Count the values that still contain "-"
contains_dash = df_clean['Time'].str.contains('-', regex=False)
contains_dash.sum()

0

In [33]:
# Count the values that still contain "PM"
contains_PM = df_clean['Time'].str.contains('PM', regex=False)
contains_PM.sum()

0

#### A- Define :
- Replace null values in the "Total" column 


#### B- Code :

In [34]:
# Replace null 
df_clean['Total'] = df_clean['Total'].fillna(df_clean['Quantity'] * df_clean['Unit_price'] + df_clean['Tax_5%'])

#### C-Test :

In [35]:
df_clean['Total'].isnull().sum()

0

#### A- Define :
- Replace null values in the 'Tax_5%' column with 5% of 'Unit_price' × 'Quantity'

#### B- Code :

In [36]:
# Replace null 
df_clean['Tax_5%'] = df_clean['Tax_5%'].fillna(df_clean['Unit_price'] * df_clean['Quantity'] * 0.05)
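Note that the negative quantities are only corrected in a later step, and the rows with a negative quantity are the same ones with a missing tax. Filling at this point therefore produces negative taxes: the `describe()` output in section 4 shows a minimum `Tax_5%` of -4.224, which equals 10.56 × (-8) × 0.05. A hedged sketch of the effect and one possible fix (using the absolute quantity in the fill) on a toy frame:

```python
import numpy as np
import pandas as pd

# First row modelled on invoice 237-44-6163 from the assessment step:
# negative quantity and a missing tax value.
toy = pd.DataFrame({
    'Unit_price': [10.56, 15.28],
    'Quantity': [-8, 5],
    'Tax_5%': [np.nan, 3.82],
})

# Filling before the sign is fixed yields a negative tax.
naive_tax = toy['Tax_5%'].fillna(toy['Unit_price'] * toy['Quantity'] * 0.05)

# Using the absolute quantity (or fixing the sign first) keeps the tax positive.
fixed_tax = toy['Tax_5%'].fillna(toy['Unit_price'] * toy['Quantity'].abs() * 0.05)
print(naive_tax.tolist(), fixed_tax.tolist())
```

Moving the `abs()` step ahead of both fills would have the same effect.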

#### C-Test :

In [37]:
df_clean['Tax_5%'].isnull().sum()

0

#### A- Define :
- Replace "-" in the 'Customer_type' column with the most common value, and correct the misspelling "Memberr".

#### B- Code :

In [38]:
# Replace (-)
df_clean['Customer_type'] = df_clean['Customer_type'].replace('-', np.nan)

In [39]:
# Most repeated
most = df_clean['Customer_type'].mode()[0]
most

'Normal'

In [40]:
df_clean['Customer_type'] = df_clean['Customer_type'].fillna(most)

In [41]:
# Replace Memberr
df_clean['Customer_type'] = df_clean['Customer_type'].replace('Memberr', 'Member')

#### C-Test :

In [42]:
df_clean['Customer_type'].value_counts()

Normal    540
Member    460
Name: Customer_type, dtype: int64

#### A- Define :
- Replace negative values in the 'Quantity' column with their absolute values

#### B- Code :

In [43]:
df_clean['Quantity'] = df_clean['Quantity'].abs()

#### C-Test :

In [44]:
# count negative
negative_values2 = df_clean['Quantity'] < 0
negative_values2.sum()

0

#### A- Define :
- Replace the rating 97.0 with the valid value 9.7.

#### B- Code :

In [48]:
df_clean.loc[df_clean['Rating'] == 97.0, 'Rating'] = 9.7
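The line above targets the single known bad value. A slightly more general sketch, assuming that any rating above 10 is a misplaced decimal point, uses a mask instead of an exact match. The `ratings` series is illustrative.

```python
import pandas as pd

ratings = pd.Series([9.1, 97.0, 8.4])

# Assumption: a rating above 10 is a misplaced decimal point, so divide it by 10.
out_of_range = ratings > 10
fixed = ratings.where(~out_of_range, ratings / 10)
print(fixed.tolist())
```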

#### C-Test :

In [49]:
df_clean['Rating'].value_counts()

6.0     26
6.6     24
4.2     22
9.5     22
5.0     21
        ..
8.3     11
5.3     11
4.0     11
4.6      8
10.0     5
Name: Rating, Length: 61, dtype: int64

In [50]:
# Profit margin (25%)
profit_margin = 0.25 

# Cost of Goods Sold
df_clean['COGS'] = df_clean['Total'] / (1 + profit_margin)

In [51]:
# Calculate profit as Total minus COGS (COGS rounded to one decimal)
df_clean['Profit'] = df_clean['Total'] - df_clean['COGS'].round(1)
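As a worked example of the two formulas, here is the arithmetic for invoice 750-67-8428, whose filled `Total` is 548.9715. Because `COGS` is rounded only inside the subtraction, the stored `Profit` differs slightly from `Total - COGS`; the unrounded figure is shown for comparison.

```python
profit_margin = 0.25
total = 548.9715  # Total of invoice 750-67-8428 after the fill

# COGS = Total / (1 + margin)
cogs = total / (1 + profit_margin)

# Profit as computed in the notebook: COGS is rounded to one decimal first.
profit = total - round(cogs, 1)

# Without rounding, profit is exactly margin / (1 + margin) = 20% of Total.
exact_profit = total - cogs
print(cogs, profit, exact_profit)
```

This gives a COGS of about 439.1772 and a Profit of about 109.7715, matching the first row of the output that follows, while the unrounded profit is about 109.7943.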

In [52]:
df_clean.head()

Unnamed: 0,Invoice_ID,Branch,Customer_type,Gender,Product_line,Unit_price,Quantity,Tax_5%,Total,Date,Time,Payment,Rating,City,COGS,Profit
0,750-67-8428,A,Normal,Male,Health and beauty,74.69,7,26.1415,548.9715,2019-01-05,13:08,Ewallet,9.1,Yangon,439.1772,109.7715
2,631-41-3108,A,Normal,Male,Home and lifestyle,46.33,7,16.2155,340.5255,2019-03-03,13:23,Credit card,7.4,Yangon,272.4204,68.1255
3,123-19-1176,A,Normal,Male,Health and beauty,58.22,8,23.288,489.048,2019-01-27,8:30,Ewallet,8.4,Yangon,391.2384,97.848
4,373-73-7910,A,Normal,Male,Sports and travel,86.31,7,30.2085,634.3785,2019-02-08,10:37,Ewallet,5.3,Yangon,507.5028,126.8785
6,355-53-5943,A,Normal,Male,Electronic accessories,68.84,6,20.652,433.692,2019-02-25,14:36,Ewallet,5.8,Yangon,346.9536,86.692


<a id='Assessing2'></a>
## 4- Data Assessing After Cleaning

In [53]:
df_clean.sample(10)

Unnamed: 0,Invoice_ID,Branch,Customer_type,Gender,Product_line,Unit_price,Quantity,Tax_5%,Total,Date,Time,Payment,Rating,City,COGS,Profit
2057,132-32-9879,B,Normal,Male,Electronic accessories,93.96,4,18.792,394.632,2019-03-09,18:00,Cash,9.5,Mandalay,315.7056,78.932
1980,744-82-9138,C,Normal,Male,Fashion accessories,86.13,2,8.613,180.873,2019-02-07,17:59,Cash,8.2,Naypyitaw,144.6984,36.173
1149,488-25-4221,C,Member,Female,Food and beverages,30.41,1,1.5205,31.9305,2019-02-22,10:36,Credit card,8.4,Naypyitaw,25.5444,6.4305
1508,620-02-2046,C,Normal,Male,Home and lifestyle,69.4,2,6.94,145.74,2019-01-27,19:48,Ewallet,9.0,Naypyitaw,116.592,29.14
546,647-50-1224,A,Normal,Female,Fashion accessories,29.42,10,14.71,308.91,2019-01-12,16:23,Ewallet,8.9,Yangon,247.128,61.81
2532,734-91-1155,B,Normal,Female,Electronic accessories,45.71,3,6.8565,143.9865,2019-03-26,10:34,Credit card,7.7,Mandalay,115.1892,28.7865
1321,174-36-3675,C,Member,Male,Food and beverages,99.37,2,9.937,208.677,2019-02-14,17:29,Cash,5.2,Naypyitaw,166.9416,41.777
1121,225-98-1496,C,Normal,Female,Fashion accessories,27.02,3,4.053,85.113,2019-03-02,13:01,Credit card,7.1,Naypyitaw,68.0904,17.013
1344,633-09-3463,C,Normal,Female,Electronic accessories,47.65,3,7.1475,150.0975,2019-03-28,12:58,Credit card,9.5,Naypyitaw,120.078,29.9975
1302,727-75-6477,C,Normal,Male,Electronic accessories,28.84,4,5.768,121.128,2019-03-29,14:44,Cash,6.4,Naypyitaw,96.9024,24.228


In [54]:
df_clean.shape

(1000, 16)

In [55]:
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 3008
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Invoice_ID     1000 non-null   object        
 1   Branch         1000 non-null   object        
 2   Customer_type  1000 non-null   object        
 3   Gender         1000 non-null   object        
 4   Product_line   1000 non-null   object        
 5   Unit_price     1000 non-null   float64       
 6   Quantity       1000 non-null   int64         
 7   Tax_5%         1000 non-null   float64       
 8   Total          1000 non-null   float64       
 9   Date           1000 non-null   datetime64[ns]
 10  Time           1000 non-null   object        
 11  Payment        1000 non-null   object        
 12  Rating         1000 non-null   float64       
 13  City           1000 non-null   object        
 14  COGS           1000 non-null   float64       
 15  Profit         1000 non-null   float64

In [56]:
df_clean.describe()

Unnamed: 0,Unit_price,Quantity,Tax_5%,Total,Rating,COGS,Profit
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,55.67213,5.51,15.353497,322.966749,6.9727,258.373399,64.593849
std,26.494628,2.923431,11.742764,245.885335,1.71858,196.708268,49.175056
min,10.08,1.0,-4.224,10.6785,4.0,8.5428,2.1785
25%,32.875,3.0,5.924875,124.422375,5.5,99.5379,24.922375
50%,55.23,5.0,12.088,253.848,7.0,203.0784,50.798
75%,77.935,8.0,22.44525,471.35025,8.5,377.0802,94.30025
max,99.96,10.0,49.65,1042.65,10.0,834.12,208.55


In [57]:
df_clean.isna().sum()

Invoice_ID       0
Branch           0
Customer_type    0
Gender           0
Product_line     0
Unit_price       0
Quantity         0
Tax_5%           0
Total            0
Date             0
Time             0
Payment          0
Rating           0
City             0
COGS             0
Profit           0
dtype: int64

In [58]:
df_clean.duplicated().sum()

0
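The individual checks in this section can be bundled into one reusable function. The sketch below is one possible shape, not part of the original notebook: `find_problems` is a hypothetical helper, the column names are those of the cleaned frame, and the 0 to 10 rating range is an assumption.

```python
import pandas as pd

def find_problems(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in a cleaned sales frame."""
    problems = []
    if df.isna().any().any():
        problems.append('missing values')
    if df.duplicated().any():
        problems.append('duplicate rows')
    if (df['Quantity'] <= 0).any():
        problems.append('non-positive quantity')
    if (df['Tax_5%'] < 0).any():
        problems.append('negative tax')
    # Assumption: valid ratings lie between 0 and 10.
    if not df['Rating'].between(0, 10).all():
        problems.append('rating outside 0-10')
    return problems

# Toy frame (illustrative values) with one negative tax.
toy = pd.DataFrame({
    'Quantity': [7, 8],
    'Tax_5%': [26.1415, -4.224],
    'Rating': [9.1, 7.6],
})
print(find_problems(toy))
```

On `df_clean`, a check like this would flag the negative minimum of `Tax_5%` visible in the `describe()` output above.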

<a id='Storing'></a>
## 5- Data Storing

In [59]:
df_clean.to_csv('clean_Supermarket_Sales_data.csv')
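By default `to_csv` also writes the index, which here still carries the non-contiguous labels left by the melt step (0 to 3008), and CSV does not preserve dtypes, so `Date` comes back as text unless it is parsed again. A small round-trip sketch using an in-memory buffer (the `toy` frame is illustrative):

```python
import io
import pandas as pd

toy = pd.DataFrame(
    {'Invoice_ID': ['750-67-8428'], 'Date': pd.to_datetime(['2019-01-05'])},
    index=[3008],  # non-contiguous label, as left behind by melt + filter
)

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # index=False keeps the row labels out of the file
buffer.seek(0)

# parse_dates restores the datetime dtype when the file is read back.
restored = pd.read_csv(buffer, parse_dates=['Date'])
print(restored.dtypes)
```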