<h2><strong>Load Data in MySQL</strong></h2>
<p><span style="color: #808080;">We import the libraries needed to connect to MySQL, load the data, and perform the analysis</span></p>

<h3><span style="color: #808080;">Import Necessary Library</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">mysql.connector: Connects to a MySQL server so that the database can be queried from Python.</span></li>
<li ><span style="color: #808080;">numpy (np): Provides efficient numerical computation tools.</span></li>
<li ><span style="color: #808080;">pandas (pd): Offers data manipulation and analysis structures (DataFrames, Series).</span></li>
<li ><span style="color: #808080;">warnings (with warnings.filterwarnings('ignore')): Suppresses all warning messages for the rest of the notebook.</span></li>
</ul>

In [17]:
import mysql.connector
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

<h3><span style="color: #808080;">Establish a connection to the euro_mart database using MySQL Connector</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">Initializes a connection to the MySQL database named euro_mart on the local machine (localhost).</span></li>
<li ><span style="color: #808080;">Authenticates with a username and password.</span></li>
</ul>

In [18]:
Connection = mysql.connector.connect(
  host="localhost",               #hostname
  user="root",                    #the user who has privilege to the db
  passwd="root",                  #password for user
  database="euro_mart",           #database name
)
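<p><span style="color: #808080;">The cell above hard-codes the credentials. A minimal sketch of one alternative is to read them from environment variables. The variable names (EUROMART_DB_HOST, EUROMART_DB_USER, EUROMART_DB_PASSWORD, EUROMART_DB_NAME) and the helper build_db_config are assumptions made for illustration, not part of the original notebook.</span></p>

```python
import os


def build_db_config(env=os.environ):
    """Collect MySQL connection settings from environment variables.

    The variable names are hypothetical; the defaults mirror the values
    used in the cell above (except the password, which defaults to empty).
    """
    return {
        "host": env.get("EUROMART_DB_HOST", "localhost"),
        "user": env.get("EUROMART_DB_USER", "root"),
        "password": env.get("EUROMART_DB_PASSWORD", ""),
        "database": env.get("EUROMART_DB_NAME", "euro_mart"),
    }


config = build_db_config()

# With a running MySQL server, the settings could be passed on like this:
# Connection = mysql.connector.connect(**config)

# Print the settings without exposing the password.
print({key: value for key, value in config.items() if key != "password"})
```

<p><span style="color: #808080;">Keeping the password out of the notebook means the file can be shared without exposing it.</span></p>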

<h2><strong>Load Data (Jupyter Notebook or VS Code)</strong></h2>
<p><span style="color: #808080;"> The data is loaded from MySQL into the notebook as a pandas DataFrame. Note that pd.read_sql does not remove rows with null values: a SQL NULL arrives as a missing value (None/NaN), so missing values are checked explicitly in the data cleaning step</span></p>

<h3><span style="color: #808080;">Retrieve data from a MySQL database table and load it into a pandas DataFrame for further analysis in Jupyter Notebook</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">query1 = "Use euro_mart;": defines a statement that would select the euro_mart database. It is not executed in this cell; the database is already selected by the database="euro_mart" argument of the connection</span></li>
<li ><span style="color: #808080;">query = "select * from euromart_table;": selects all columns (*) from the table named euromart_table</span></li>
<li ><span style="color: #808080;">df = pd.read_sql(query, Connection): executes the query on the established connection (Connection) and stores the result in a pandas DataFrame named df</span></li>
</ul>

In [19]:
query1 = "Use euro_mart;"  # not executed here; the database is already selected in connect()
query = "select * from euromart_table;"  # select all columns from the table
df = pd.read_sql(query, Connection)  # execute the query and store the result in DataFrame df
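<p><span style="color: #808080;">A minimal sketch of what pd.read_sql returns. It uses an in-memory SQLite database as a stand-in for the MySQL server, purely so the example runs without a server, and the three rows are made up. It illustrates that a SQL NULL is kept and shows up as a missing value in the DataFrame.</span></p>

```python
import sqlite3

import pandas as pd

# Stand-in database (assumption): SQLite in memory instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE euromart_table ("Order ID" TEXT, "Sales" INTEGER)')
conn.executemany(
    "INSERT INTO euromart_table VALUES (?, ?)",
    [("A-1", 45), ("A-2", None), ("A-3", 140)],  # made-up rows, one NULL
)
conn.commit()

# Same call shape as in the notebook: a query string plus a connection.
demo_df = pd.read_sql("SELECT * FROM euromart_table;", conn)
conn.close()

print(len(demo_df))                    # all three rows are returned
print(demo_df["Sales"].isna().sum())   # the NULL is kept as a missing value
```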

# <a id='toc3_'></a>[Familiarize with Data & Identifying the Target Variable ](#toc0_)

<h2><strong>Explore the provided data (column names, data types)</strong></h2>
<p><span style="color: #808080;"> We need to understand the data before cleaning it, and to cross-check that Euromart has provided all the required data</span></p>

<h3><span style="color: #808080;">Overview of data</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df.head(): displays the first 5 rows of the data</span></li>
<li ><span style="color: #808080;">df.tail(): displays the last 5 rows</span></li>
<li ><span style="color: #808080;">df.shape: gives the dimensions of the data as (number of rows, number of columns)</span></li>
<li ><span style="color: #808080;">df.size: gives the total number of elements in the DataFrame (rows multiplied by columns)</span></li>
<li ><span style="color: #808080;">df.info(): displays concise information about the DataFrame (column names, non-null counts, data types and memory usage)</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">Structured Data: Data provided is in table format</span></li>
<li ><span style="color: #808080;">Dimensions: 8047 rows x 17 columns, 136,799 elements in total</span></li>
<li ><span style="color: #808080;">Column data types: a mix of object, float64 and int64 columns</span></li>
<li ><span style="color: #808080;">All categorical (qualitative) variables are nominal, i.e. their values have no inherent order</span></li>
<li ><span style="color: #808080;">Non-null counts: no null values are observed in any column</span></li>
<li ><span style="color: #808080;">Memory Usage: An estimate of the memory usage is 1.0+ MB</span></li>
</ul>
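<p><span style="color: #808080;">A minimal sketch of how shape and size relate, using a made-up three-row DataFrame rather than the Euromart data. The last line repeats the arithmetic for the dimensions reported above.</span></p>

```python
import pandas as pd

# Made-up sample with three rows and three columns.
demo_overview = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-3"],
    "Sales": [45, 854, 140],
    "Discount": [0.5, 0.0, 0.0],
})

n_rows, n_cols = demo_overview.shape   # (rows, columns)
print(n_rows, n_cols)
print(demo_overview.size)              # rows * columns
print(demo_overview.dtypes)            # data type of each column

# Same relationship for the reported dimensions of the Euromart data.
print(8047 * 17)
```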

In [20]:
df.head()

Unnamed: 0,Order ID,Order Date,Customer Name,Country,State,City,Region,Segment,Ship Mode,Category,Sub-Category,Product Name,Discount,Sales,Profit,Quantity,Feedback?
0,BN-2011-7407039,1/1/2011,Ruby Patel,Sweden,Stockholm,Stockholm,North,Home Office,Economy Plus,Office Supplies,Paper,"Enermax Note Cards, Premium",0.5,45,-26,3,False
1,AZ-2011-9050313,1/3/2011,Summer Hayward,United Kingdom,England,Southport,North,Consumer,Economy,Furniture,Bookcases,"Dania Corner Shelving, Traditional",0.0,854,290,7,True
2,AZ-2011-6674300,1/4/2011,Devin Huddleston,France,Auvergne-Rhône-Alpes,Valence,Central,Consumer,Economy,Office Supplies,Art,"Binney & Smith Sketch Pad, Easy-Erase",0.0,140,21,3,True
3,BN-2011-2819714,1/4/2011,Mary Parker,United Kingdom,England,Birmingham,North,Corporate,Economy,Office Supplies,Art,"Boston Markers, Easy-Erase",0.5,27,-22,2,True
4,BN-2011-2819714,1/4/2011,Mary Parker,United Kingdom,England,Birmingham,North,Corporate,Economy,Office Supplies,Storage,"Eldon Folders, Single Width",0.5,17,-1,2,True


In [21]:
df.shape

(8047, 17)

In [22]:
df.size

136799

In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8047 entries, 0 to 8046
Data columns (total 17 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Order ID       8047 non-null   object 
 1   Order Date     8047 non-null   object 
 2   Customer Name  8047 non-null   object 
 3   Country        8047 non-null   object 
 4   State          8047 non-null   object 
 5   City           8047 non-null   object 
 6   Region         8047 non-null   object 
 7   Segment        8047 non-null   object 
 8   Ship Mode      8047 non-null   object 
 9   Category       8047 non-null   object 
 10  Sub-Category   8047 non-null   object 
 11  Product Name   8047 non-null   object 
 12  Discount       8047 non-null   float64
 13  Sales          8047 non-null   int64  
 14  Profit         8047 non-null   int64  
 15  Quantity       8047 non-null   int64  
 16  Feedback?      8047 non-null   object 
dtypes: float64(1), int64(3), object(13)
memory usage: 1.0+ MB

<h2><strong>Identify target variable based on objective</strong></h2>
<p><span style="color: #808080;">Once we understand the data, we fix the target variable and the feature variables for the EDA that follows</span></p>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Fix Target value:</span></strong></p>
<ul>
<li ><span style="color: #808080;">From the data and the problem statement, we understand that the<strong> target variables are Profit and Sales</strong></span></li>
<li ><span style="color: #808080;">The feature variables are the remaining columns, such as Country, State, City and Region</span></li>
<li ><span style="color: #808080;">We can now proceed to data cleaning and EDA</span></li>
</ul>
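<p><span style="color: #808080;">The split described above can be sketched as two column lists. The sample rows are made up, and the names target_cols and feature_cols are chosen here for illustration only.</span></p>

```python
import pandas as pd

# Made-up sample with a few of the Euromart columns.
demo_split = pd.DataFrame({
    "Country": ["Sweden", "France"],
    "Segment": ["Home Office", "Consumer"],
    "Sales": [45, 140],
    "Profit": [-26, 21],
})

target_cols = ["Profit", "Sales"]
# Every column that is not a target is treated as a feature.
feature_cols = [col for col in demo_split.columns if col not in target_cols]

print(feature_cols)
```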

# <a id='toc4_'></a>[Data Preparation & Transformation ](#toc0_)

<h2><strong>Data Cleaning</strong></h2>
<p><span style="color: #808080;"> We perform the steps below to clean the data</span></p>
<ul>
<li ><strong><span style="color: #808080;">Handle missing values (imputation, deletion)</span></strong>
    <ul>
        <li ><span style="color: #808080;">We accept missing values if the dataset is small</span></li>
        <li ><span style="color: #808080;">We delete missing values when:</span>
            <ul>
            <li ><span style="color: #808080;">More than 80% of the data is missing (null)</span></li>
            <li ><span style="color: #808080;">The percentage of missing values is very small, so deleting them has minimal effect on the analysis</span></li>
            </ul>
        </li>
        <li ><span style="color: #808080;">We replace missing values by imputation</span>
            <ul>
            <li ><span style="color: #808080;">Imputation: we replace the missing values with the mean, median or mode of the variable, or fill them with a chosen value using the fillna method</span></li>
            </ul>
        </li>
    </ul>
</li>
<li ><strong><span style="color: #808080;">Data Reduction: Remove unwanted data (if present) that is not required for the analysis</span></strong>
    <ul>
        <li ><span style="color: #808080;">Delete unwanted columns</span></li>
        <li ><span style="color: #808080;">Delete duplicate rows</span></li>
    </ul>
</li>
<li ><strong><span style="color: #808080;">Format data types (numerical & categorical variables)</span></strong></li>
<li ><strong><span style="color: #808080;">Outlier detection and handling (we skip this step because outliers are valid in our case)</span></strong>
    <ul>
        <li ><span style="color: #808080;">When the data has extreme values that could affect the analysis, we either replace them with the mean, median or mode, or accept the outliers</span></li>
        <li ><span style="color: #808080;">We identify the outliers by plotting a box plot</span></li>
    </ul>
</li>
</ul>
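<p><span style="color: #808080;">A box plot conventionally marks points beyond 1.5 times the interquartile range (IQR) from the quartiles as outliers. A minimal sketch of that rule without plotting is shown below. The helper iqr_outlier_mask and the sample values are made up for illustration; the notebook itself skips outlier handling.</span></p>

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up profit values with one extreme entry.
demo_profit = pd.Series([-26, 290, 21, -22, -1, 5000], name="Profit")
outlier_mask = iqr_outlier_mask(demo_profit)

print(demo_profit[outlier_mask])   # values flagged by the IQR rule
```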

<h3><span style="color: #808080;">Handle missing values (imputation or deletion)</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df.isnull().sum(): gives the number of null values in each column</span></li>
<li ><span style="color: #808080;">df.notnull().sum(): gives the number of non-null values in each column (all values are counted, not only unique ones)</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">The data has no null values, so no missing-value handling is needed</span></li>
</ul>
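<p><span style="color: #808080;">The Euromart data has no nulls, so the strategies listed earlier are not exercised. A minimal sketch on made-up data is shown below, assuming the 80% rule is applied per column and that the median and the mode are acceptable fill values. The Notes column is hypothetical.</span></p>

```python
import numpy as np
import pandas as pd

# Made-up sample with missing values; "Notes" is a hypothetical column.
demo_missing = pd.DataFrame({
    "Sales": [45.0, np.nan, 140.0, 27.0, 17.0, 854.0],
    "Segment": ["Consumer", "Consumer", None, "Corporate", "Consumer", "Home Office"],
    "Notes": [np.nan, np.nan, np.nan, np.nan, np.nan, "gift"],
})

# Share of missing values per column (0.0 to 1.0).
missing_share = demo_missing.isnull().mean()

# Deletion: drop columns where more than 80% of the values are missing.
demo_clean = demo_missing.loc[:, missing_share <= 0.80].copy()

# Imputation: median for the numeric column, mode for the categorical one.
demo_clean["Sales"] = demo_clean["Sales"].fillna(demo_clean["Sales"].median())
demo_clean["Segment"] = demo_clean["Segment"].fillna(demo_clean["Segment"].mode().iloc[0])

print(demo_clean.isnull().sum())
```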

In [24]:
df.isnull().sum()

Order ID         0
Order Date       0
Customer Name    0
Country          0
State            0
City             0
Region           0
Segment          0
Ship Mode        0
Category         0
Sub-Category     0
Product Name     0
Discount         0
Sales            0
Profit           0
Quantity         0
Feedback?        0
dtype: int64

In [25]:
df.notnull().sum()

Order ID         8047
Order Date       8047
Customer Name    8047
Country          8047
State            8047
City             8047
Region           8047
Segment          8047
Ship Mode        8047
Category         8047
Sub-Category     8047
Product Name     8047
Discount         8047
Sales            8047
Profit           8047
Quantity         8047
Feedback?        8047
dtype: int64

<h3><span style="color: #808080;">Data Reduction: Remove unwanted columns or rows</span></h3>
<p><span style="color: #808080;">There are no unwanted columns to delete, so we only check for duplicated rows and remove them</span></p>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df.duplicated().sum(): counts the rows that are exact duplicates of an earlier row</span></li>
<li ><span style="color: #808080;">df = df.drop_duplicates(): removes the duplicated rows and keeps the first occurrence of each</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">Two rows are duplicates of earlier rows. We removed them, which leaves 8045 rows</span></li>
</ul>

In [26]:
df.duplicated().sum()

2

In [27]:
df = df.drop_duplicates()

In [28]:
df.duplicated().sum()

0
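<p><span style="color: #808080;">Before dropping duplicates it can help to look at them. A minimal sketch on made-up rows: keep=False marks every copy of a duplicated row, and reset_index(drop=True) is an optional extra step (not used in the notebook) that renumbers the rows after the drop.</span></p>

```python
import pandas as pd

# Made-up sample where the first and last rows are identical.
demo_dupes = pd.DataFrame({
    "Order ID": ["BN-1", "BN-1", "AZ-2", "BN-1"],
    "Product Name": ["Markers", "Folders", "Sketch Pad", "Markers"],
    "Sales": [27, 17, 140, 27],
})

# Show every copy of a duplicated row, not only the later ones.
all_copies = demo_dupes[demo_dupes.duplicated(keep=False)]
print(all_copies)

# Drop the later copies and renumber the index from 0.
demo_deduped = demo_dupes.drop_duplicates().reset_index(drop=True)
print(len(demo_deduped))
```

<p><span style="color: #808080;">Rows that share an Order ID but differ in any other column are not duplicates, because duplicated() compares all columns by default.</span></p>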

<h3><span style="color: #808080;">Rename of columns</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df.rename(columns={'Feedback?': 'Feedback'}, inplace=True): renames the column; inplace=True modifies df directly instead of returning a renamed copy</span></li>
<li ><span style="color: #808080;">df.columns: displays the column names to verify that the renaming took effect</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">The Feedback? column is renamed to Feedback by removing the special character from its name</span></li>
</ul>

In [29]:
df.rename(columns={'Feedback?': 'Feedback'}, inplace=True)
df.columns

Index(['Order ID', 'Order Date', 'Customer Name', 'Country', 'State', 'City',
       'Region', 'Segment', 'Ship Mode', 'Category', 'Sub-Category',
       'Product Name', 'Discount', 'Sales', 'Profit', 'Quantity', 'Feedback'],
      dtype='object')
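<p><span style="color: #808080;">When several column names need the same fix, the string methods on df.columns can handle them in one step. A minimal sketch on an empty made-up DataFrame, assuming the only unwanted character is the question mark:</span></p>

```python
import pandas as pd

# Made-up frame that only carries column names.
demo_cols = pd.DataFrame(columns=["Order ID", "Sub-Category", "Feedback?"])

# Remove the literal "?" from every column name, then trim whitespace.
demo_cols.columns = demo_cols.columns.str.replace("?", "", regex=False).str.strip()

print(list(demo_cols.columns))
```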

<h2><strong>Feature Engineering (Create new features/variables)</strong></h2>
<p><span style="color: #808080;"> We derive new variables (features) by combining existing columns or by performing calculations on them </span></p>
    
<p><span style="color: #808080;"> Here we create new columns to make the analysis easier</span></p>
<ul>
<li ><span style="color: #808080;">Extract the year, month and day from the order date, and derive the Quarter and Week columns from them</span></li>
<li ><span style="color: #808080;">Calculate Total Sales, Total Profit, Profit Margin and Order Size</span></li>
</ul>

<h3><span style="color: #808080;">Create new features</span></h3>
<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df['Order Date'] = pd.to_datetime(df['Order Date']): converts the Order Date column from text to datetime so that the .dt accessor can be used</span></li>

<li ><span style="color: #808080;">df['Year'] = df['Order Date'].dt.year: extracts the year from the Order Date column into a new column named Year</span></li>

<li ><span style="color: #808080;">df['Month'] = df['Order Date'].dt.month: extracts the month from the Order Date column into a new column named Month</span></li>

<li ><span style="color: #808080;">df['Day'] = df['Order Date'].dt.day: extracts the day of the month from the Order Date column into a new column named Day</span></li>

<li ><span style="color: #808080;">quarter_dict = {1 : 'Q1', 2 : 'Q1', 3 : 'Q1',4 : 'Q2'......}: defines a dictionary that maps each month to its quarter</span></li>

<li ><span style="color: #808080;">df['Quarter'] = df['Month'].map(quarter_dict): applies quarter_dict to the Month column with the map method and stores the result in the Quarter column</span></li>

<li ><span style="color: #808080;">week_dict = {1: 'W1', 2: 'W1', 3: 'W1'......}: defines a dictionary that maps each day of the month to a week of the month (W1 to W5)</span></li>

<li ><span style="color: #808080;">df['Week'] = None: initializes the Week column with None values (the next line overwrites it)</span></li>

<li ><span style="color: #808080;">df['Week'] = df['Day'].map(week_dict): applies week_dict to the Day column with the map method and stores the result in the Week column</span></li>

<li ><span style="color: #808080;">df['Total Sales'], df['Total Profit'] and df['Profit Margin']: Sales and Profit are each multiplied by Quantity, and Profit Margin is Total Profit divided by Total Sales</span></li>

<li ><span style="color: #808080;">df['Order Size'] = df.groupby('Order ID')['Product Name'].transform('size'): creates a new column with the number of rows that share the same Order ID</span></li>

<li ><span style="color: #808080;">df.info(): lists all columns with their details to verify the feature engineering</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">We are set with feature engineering by creating multiple columns for easier analysis</span></li>

</ul>
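<p><span style="color: #808080;">The quarter and week mappings described above can also be computed without dictionaries. A minimal sketch on made-up dates, assuming the dates are in month/day/year order (the source does not state the order): .dt.quarter gives the quarter number, and integer division of the day by 7 gives the same W1 to W5 buckets as week_dict.</span></p>

```python
import pandas as pd

# Made-up dates; the month/day/year order is an assumption.
demo_dates = pd.DataFrame({
    "Order Date": ["1/1/2011", "4/8/2011", "9/29/2011", "12/31/2011"],
})
demo_dates["Order Date"] = pd.to_datetime(demo_dates["Order Date"], format="%m/%d/%Y")

# Quarter of the year as a label: Q1 to Q4.
demo_dates["Quarter"] = "Q" + demo_dates["Order Date"].dt.quarter.astype(str)

# Week of the month: days 1-7 -> W1, 8-14 -> W2, ..., 29-31 -> W5.
week_number = (demo_dates["Order Date"].dt.day - 1) // 7 + 1
demo_dates["Week"] = "W" + week_number.astype(str)

print(demo_dates)
```

<p><span style="color: #808080;">Passing an explicit format to pd.to_datetime avoids ambiguity between day-first and month-first dates such as 1/3/2011.</span></p>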

In [30]:
# Convert the "Order Date" column to datetime data type
df['Order Date'] = pd.to_datetime(df['Order Date'])

# Extract the year, month and day of the order date
df['Year'] = df['Order Date'].dt.year
df['Month'] = df['Order Date'].dt.month
df['Day'] = df['Order Date'].dt.day


# Create a new column with the quarter of the year
quarter_dict = {
    1 : 'Q1', 2 : 'Q1', 3 : 'Q1',
    4 : 'Q2', 5 : 'Q2', 6 : 'Q2',
    7 : 'Q3', 8 : 'Q3', 9 : 'Q3',
    10 : 'Q4', 11 : 'Q4', 12 : 'Q4',
}
df['Quarter'] = df['Month'].map(quarter_dict)

# Create a new column with the week of the month (days 1-7 -> W1, ..., days 29-31 -> W5)
week_dict = {
    1: 'W1', 2: 'W1', 3: 'W1', 4: 'W1', 5: 'W1', 6: 'W1', 7: 'W1',
    8: 'W2', 9: 'W2', 10: 'W2', 11: 'W2', 12: 'W2', 13: 'W2', 14: 'W2',
    15: 'W3', 16: 'W3', 17: 'W3', 18: 'W3', 19: 'W3', 20: 'W3', 21: 'W3',
    22: 'W4', 23: 'W4', 24: 'W4', 25: 'W4', 26: 'W4', 27: 'W4', 28: 'W4',
    29: 'W5', 30: 'W5', 31: 'W5',
}
df['Week'] = None  # Initialize 'Week' column with None values (overwritten by the next line)
df['Week'] = df['Day'].map(week_dict)


# Total revenue: Sales multiplied by Quantity
df['Total Sales'] = df['Sales'] * df['Quantity']

# Total profit: Profit multiplied by Quantity
df['Total Profit'] = df['Profit'] * df['Quantity']

# Profit margin as a ratio (Total Profit / Total Sales)
df['Profit Margin'] = (df['Total Profit'] / df['Total Sales'])

# Number of rows that share the same Order ID
df['Order Size'] = df.groupby('Order ID')['Product Name'].transform('size')

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8045 entries, 0 to 8046
Data columns (total 26 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Order ID       8045 non-null   object        
 1   Order Date     8045 non-null   datetime64[ns]
 2   Customer Name  8045 non-null   object        
 3   Country        8045 non-null   object        
 4   State          8045 non-null   object        
 5   City           8045 non-null   object        
 6   Region         8045 non-null   object        
 7   Segment        8045 non-null   object        
 8   Ship Mode      8045 non-null   object        
 9   Category       8045 non-null   object        
 10  Sub-Category   8045 non-null   object        
 11  Product Name   8045 non-null   object        
 12  Discount       8045 non-null   float64       
 13  Sales          8045 non-null   int64         
 14  Profit         8045 non-null   int64         
 15  Quantity       8045 non-nu
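<p><span style="color: #808080;">A minimal sketch of the calculated columns on made-up rows. The row with zero sales is hypothetical and is included only to show one way of avoiding a division by zero: zeros in the denominator are replaced with NaN, so the margin is reported as missing for that row.</span></p>

```python
import numpy as np
import pandas as pd

# Made-up sample; the zero-sales row is hypothetical.
demo_calc = pd.DataFrame({
    "Order ID": ["BN-1", "BN-1", "AZ-2"],
    "Product Name": ["Markers", "Folders", "Sketch Pad"],
    "Sales": [27, 17, 0],
    "Profit": [-22, -1, 0],
    "Quantity": [2, 2, 1],
})

demo_calc["Total Sales"] = demo_calc["Sales"] * demo_calc["Quantity"]
demo_calc["Total Profit"] = demo_calc["Profit"] * demo_calc["Quantity"]

# Ratio of profit to sales; a zero denominator becomes NaN.
safe_sales = demo_calc["Total Sales"].replace(0, np.nan)
demo_calc["Profit Margin"] = demo_calc["Total Profit"] / safe_sales

# transform('size') repeats the group size on every row of the group.
demo_calc["Order Size"] = (
    demo_calc.groupby("Order ID")["Product Name"].transform("size")
)

print(demo_calc[["Order ID", "Total Sales", "Profit Margin", "Order Size"]])
```

<p><span style="color: #808080;">Unlike an aggregation, transform returns one value per original row, which is why the result can be assigned directly as a new column.</span></p>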

<h2><strong>Overview of data before analysis</strong></h2>
<p><span style="color: #808080;">After data wrangling, we check the columns once more before proceeding to the analysis</span></p>

In [31]:
df.columns

Index(['Order ID', 'Order Date', 'Customer Name', 'Country', 'State', 'City',
       'Region', 'Segment', 'Ship Mode', 'Category', 'Sub-Category',
       'Product Name', 'Discount', 'Sales', 'Profit', 'Quantity', 'Feedback',
       'Year', 'Month', 'Day', 'Quarter', 'Week', 'Total Sales',
       'Total Profit', 'Profit Margin', 'Order Size'],
      dtype='object')

<p data-sourcepos="3:1-3:41"><span style="color: #ffcc00;">Code Explanation:</span></p>
<ul>
<li ><span style="color: #808080;">df.columns; Display all columns in the data frame</span></li>
</ul>

<p data-sourcepos="3:1-3:41"><strong><span style="color: #99cc00;">Interpretation:</span></strong></p>
<ul>
<li ><span style="color: #808080;">Description of variables</span></li>
</ul>

<table style="height: 374px; width: 1000px;">
<tbody>
<tr style="height: 18px;">
<td style="width: 116.75px; height: 18px; padding-left: 30px;"><strong>Variables/Columns</strong></td>
<td style="width: 160.85px; height: 18px; padding-left: 30px;"><strong>Description</strong></td>
</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Order ID</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Identifier of the order; it repeats across rows when one order contains several products</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Order Date</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Date of the purchase</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Customer Name</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em> Name of the customer</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Country, State, City, Region</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Location of sales</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Segment</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Types of customers</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Ship Mode</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em> Shipping method chosen by the customer</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Category & Sub-Category</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Purchased product category and more specific sub category</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Product Name</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Name of the specific product purchased</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Discount</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Discount applied to the purchase (if any)</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Sales</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Total sales amount for the transaction</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Profit</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Profit earned on the transaction</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Quantity</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Quantity of each product purchased in the transaction</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Feedback</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Customer provided feedback on the purchase experience (binary)</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Year, Month, Day, Week, Quarter</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Extracted from Order Date</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Total Sales</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Sales for all items in a transaction (Product of sales and Quantity)</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Total Profit</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Profit for all items in a transaction (Product of Profit and Quantity)</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Profit Margin</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Profit margin per transaction (Total Profit divided by Total Sales, expressed as a ratio)</em></span></td>

</tr>
<tr style="height: 1.03334px;">
<td style="width: 145.75px; height: 1.03334px;"><span style="color: #808080;">Order Size</span></td>
<td style="width: 189.85px; height: 1.03334px;"><span style="color: #999999;"><em>Number of rows (products) recorded under the same Order ID</em></span></td>

</tr>
</tbody>
</table>
<p>&nbsp;</p>

<h2><strong>Export the data to CSV format</strong></h2>
<p><span style="color: #808080;">After data wrangling, we can export the data for further visualisation in Power BI</span></p>

In [32]:
# df.to_csv(r"Cleaned Euromart Data from Python.csv")
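<p><span style="color: #808080;">A minimal sketch of the export step on made-up rows. It writes to an in-memory buffer instead of a file so that it runs anywhere. Passing index=False is a suggestion, not what the notebook does: it keeps the row index out of the file, so reading it back does not produce an extra unnamed column.</span></p>

```python
import io

import pandas as pd

# Made-up sample with a gap in the index, as left by drop_duplicates().
demo_export = pd.DataFrame(
    {"Order ID": ["A-1", "A-3"], "Sales": [45, 140]},
    index=[0, 2],
)

# Write CSV text to a buffer; a file path would work the same way.
buffer = io.StringIO()
demo_export.to_csv(buffer, index=False)

# Read it back to confirm that only the data columns were written.
buffer.seek(0)
demo_restored = pd.read_csv(buffer)
print(list(demo_restored.columns))
```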