# US Super-Store Analysis

# Problem Statement

Our hypothetical company has announced that it would like to enter the e-commerce industry. It will be creating an e-commerce website, but it has no knowledge of the industry. My goal is to collect, clean, and analyze `Groceries shopping` data from a variety of sources so that I can provide recommendations that will help the company succeed in e-commerce.

## Import the libraries

In [1]:
# Data handling
import pandas as pd
import numpy as np

# Static plotting
import seaborn as sns
import matplotlib.pyplot as plt

# Date parsing
from dateutil.parser import parse

# Interactive plotting with Plotly
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.io as pio
from plotly.offline import iplot, init_notebook_mode
from plotly.subplots import make_subplots

init_notebook_mode(connected=True)
pio.renderers.default = "plotly_mimetype+notebook_connected"


## Loading the data 

In [2]:
df = pd.read_csv("C:/Users/HP/Documents/workspace/usa_super_store.csv")

In [3]:
df

Unnamed: 0,Row ID,Order Priority,Discount,Unit Price,Shipping Cost,Customer ID,Customer Name,Ship Mode,Customer Segment,Product Category,...,State or Province,City,Postal Code,Order Date,Ship Date,Profit,Quantity ordered new,Sales,Order ID,Status
0,20847,High,0.01,2.84,0.93,3,Bonnie Potter,Express Air,Corporate,Office Supplies,...,Washington,Anacortes,98221,1/7/2015,1/8/2015,4.5600,4,13.01,88522,0
1,20228,Not Specified,0.02,500.98,26.00,5,Ronnie Proctor,Delivery Truck,Home Office,Furniture,...,California,San Gabriel,91776,6/13/2015,6/15/2015,4390.3665,12,6362.85,90193,0
2,21776,Critical,0.06,9.48,7.29,11,Marcus Dunlap,Regular Air,Home Office,Furniture,...,New Jersey,Roselle,7203,2/15/2015,2/17/2015,-53.8096,22,211.15,90192,0
3,24844,Medium,0.09,78.69,19.99,14,Gwendolyn F Tyson,Regular Air,Small Business,Furniture,...,Minnesota,Prior Lake,55372,5/12/2015,5/14/2015,803.4705,16,1164.45,86838,0
4,24846,Medium,0.08,3.28,2.31,14,Gwendolyn F Tyson,Regular Air,Small Business,Office Supplies,...,Minnesota,Prior Lake,55372,5/12/2015,5/13/2015,-24.0300,7,22.23,86838,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1947,19842,High,0.01,10.90,7.46,3397,Andrea Shaw,Regular Air,Small Business,Office Supplies,...,Illinois,Danville,61832,3/11/2015,3/12/2015,-116.7600,18,207.31,87536,0
1948,19843,High,0.10,7.99,5.03,3397,Andrea Shaw,Regular Air,Small Business,Technology,...,Illinois,Danville,61832,3/11/2015,3/12/2015,-160.9520,22,143.12,87536,0
1949,26208,Not Specified,0.08,11.97,5.81,3399,Marvin Reid,Regular Air,Small Business,Office Supplies,...,Illinois,Des Plaines,60016,3/29/2015,3/31/2015,-41.8700,5,59.98,87534,0
1950,24911,Medium,0.10,9.38,4.93,3400,Florence Gold,Express Air,Small Business,Furniture,...,West Virginia,Fairmont,26554,4/4/2015,4/4/2015,-24.7104,15,135.78,87537,0


In [4]:
df.head().T

Unnamed: 0,0,1,2,3,4
Row ID,20847,20228,21776,24844,24846
Order Priority,High,Not Specified,Critical,Medium,Medium
Discount,0.01,0.02,0.06,0.09,0.08
Unit Price,2.84,500.98,9.48,78.69,3.28
Shipping Cost,0.93,26.0,7.29,19.99,2.31
Customer ID,3,5,11,14,14
Customer Name,Bonnie Potter,Ronnie Proctor,Marcus Dunlap,Gwendolyn F Tyson,Gwendolyn F Tyson
Ship Mode,Express Air,Delivery Truck,Regular Air,Regular Air,Regular Air
Customer Segment,Corporate,Home Office,Home Office,Small Business,Small Business
Product Category,Office Supplies,Furniture,Furniture,Furniture,Office Supplies


In [5]:
df.columns

Index(['Row ID', 'Order Priority', 'Discount', 'Unit Price', 'Shipping Cost',
       'Customer ID', 'Customer Name', 'Ship Mode', 'Customer Segment',
       'Product Category', 'Product Sub-Category', 'Product Container',
       'Product Name', 'Product Base Margin', 'Country', 'Manager', 'Region',
       'State or Province', 'City', 'Postal Code', 'Order Date', 'Ship Date',
       'Profit', 'Quantity ordered new', 'Sales', 'Order ID', 'Status'],
      dtype='object')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1952 entries, 0 to 1951
Data columns (total 27 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Row ID                1952 non-null   int64  
 1   Order Priority        1952 non-null   object 
 2   Discount              1952 non-null   float64
 3   Unit Price            1952 non-null   float64
 4   Shipping Cost         1952 non-null   float64
 5   Customer ID           1952 non-null   int64  
 6   Customer Name         1952 non-null   object 
 7   Ship Mode             1952 non-null   object 
 8   Customer Segment      1952 non-null   object 
 9   Product Category      1952 non-null   object 
 10  Product Sub-Category  1952 non-null   object 
 11  Product Container     1952 non-null   object 
 12  Product Name          1952 non-null   object 
 13  Product Base Margin   1936 non-null   float64
 14  Country               1952 non-null   object 
 15  Manager              

**Looking at the structure of the data, the columns are mostly floats, integers, and objects (strings). The date columns are stored as strings, so during data wrangling they are converted to datetime format for proper analysis.**

In [7]:
df.describe()

Unnamed: 0,Row ID,Discount,Unit Price,Shipping Cost,Customer ID,Product Base Margin,Postal Code,Profit,Quantity ordered new,Sales,Order ID,Status
count,1952.0,1952.0,1952.0,1952.0,1952.0,1936.0,1952.0,1952.0,1952.0,1952.0,1952.0,1952.0
mean,19916.479508,0.048975,109.079221,12.968151,1735.376537,0.515186,51534.769467,114.793859,12.944672,985.828832,82365.92418,0.007684
std,5957.595627,0.031378,393.481301,17.414631,991.078006,0.137055,29362.82842,1141.112387,13.871565,2559.900167,19042.295798,0.087346
min,64.0,0.0,1.14,0.49,3.0,0.35,1001.0,-16476.838,1.0,2.25,359.0,0.0
25%,19121.0,0.02,6.48,3.23,875.0,0.38,28560.0,-84.4854,5.0,58.8075,86767.75,0.0
50%,21164.5,0.05,20.99,6.15,1738.0,0.525,48765.5,1.47645,10.0,202.395,88376.0,0.0
75%,23483.25,0.08,100.9725,14.3625,2578.25,0.59,78550.0,116.201575,16.0,802.945,89957.0,0.0
max,26389.0,0.21,6783.02,164.73,3403.0,0.85,99362.0,9228.2256,167.0,45737.33,91586.0,1.0


Summary statistics for the numeric columns in the data.

## Data Preprocessing 

In [8]:
df.columns = df.columns.str.lower().str.replace(' ', '_')

string_columns = list(df.dtypes[df.dtypes == 'object'].index)

for col in string_columns:
    df[col] = df[col].str.lower().str.replace(' ', '_')

Cleaning the features in the data for easy readability.

In [9]:
for col in df.columns:
    print(col)
    print(df[col].unique()[:5])  # first five distinct values
    print(df[col].nunique())     # number of distinct values

row_id
[20847 20228 21776 24844 24846]
1951
order_priority
['high' 'not_specified' 'critical' 'medium' 'low']
6
discount
[0.01 0.02 0.06 0.09 0.08]
13
unit_price
[  2.84 500.98   9.48  78.69   3.28]
597
shipping_cost
[ 0.93 26.    7.29 19.99  2.31]
497
customer_id
[ 3  5 11 14 15]
1130
customer_name
['bonnie_potter' 'ronnie_proctor' 'marcus_dunlap' 'gwendolyn_f_tyson'
 'timothy_reese']
1130
ship_mode
['express_air' 'delivery_truck' 'regular_air']
3
customer_segment
['corporate' 'home_office' 'small_business' 'consumer']
4
product_category
['office_supplies' 'furniture' 'technology']
3
product_sub-category
['pens_&_art_supplies' 'chairs_&_chairmats' 'office_furnishings'
 'rubber_bands' 'envelopes']
17
product_container
['wrap_bag' 'jumbo_drum' 'small_pack' 'small_box' 'medium_box']
7
product_name
['sanford_liquid_accent™_tank-style_highlighters'
 'global_troy™_executive_leather_low-back_tilter'
 'dax_two-tone_rosewood/black_document_frame,_desktop,_5_x_7'
 'howard_miller_12-3/4_diameter

Viewing the first five unique values, and the number of unique values, for each feature in the data.

In [10]:
df.isnull().sum()

row_id                   0
order_priority           0
discount                 0
unit_price               0
shipping_cost            0
customer_id              0
customer_name            0
ship_mode                0
customer_segment         0
product_category         0
product_sub-category     0
product_container        0
product_name             0
product_base_margin     16
country                  0
manager                  0
region                   0
state_or_province        0
city                     0
postal_code              0
order_date               0
ship_date                0
profit                   0
quantity_ordered_new     0
sales                    0
order_id                 0
status                   0
dtype: int64

**After checking the structure of the data, I found that `product_base_margin` has 16 missing values, which I handle by dropping those rows. I chose to drop them rather than recompute them because, when I checked the product base margin using the formula:**

                    gross profit / revenue * 100

**with gross profit defined as:**

                    total revenue - cost of goods sold (COGS)

**I found some miscalculations in the data. This does not stop the analysis, so I move on to drop the missing values.**
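As a quick sanity check of the margin formula, the sketch below computes a gross margin percentage, taking gross profit as revenue minus cost of goods sold. The function name `gross_margin_pct` and the figures are hypothetical and are not taken from the dataset.

```python
def gross_margin_pct(revenue: float, cogs: float) -> float:
    """Return gross margin as a percentage of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    gross_profit = revenue - cogs
    return gross_profit / revenue * 100


# Hypothetical figures for illustration only
margin = gross_margin_pct(revenue=200.0, cogs=150.0)
print(f"{margin:.1f}%")  # 25.0%
```

A margin computed this way can be compared against a stored margin column to spot rows where the two disagree.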

In [11]:
df.dropna(how= 'all', subset=['product_base_margin'], inplace=True)

Dropping the rows with missing data.

In [12]:
df.isnull().sum()

row_id                  0
order_priority          0
discount                0
unit_price              0
shipping_cost           0
customer_id             0
customer_name           0
ship_mode               0
customer_segment        0
product_category        0
product_sub-category    0
product_container       0
product_name            0
product_base_margin     0
country                 0
manager                 0
region                  0
state_or_province       0
city                    0
postal_code             0
order_date              0
ship_date               0
profit                  0
quantity_ordered_new    0
sales                   0
order_id                0
status                  0
dtype: int64

Looking at the above, there are no more missing values in the data.

In [13]:
df

Unnamed: 0,row_id,order_priority,discount,unit_price,shipping_cost,customer_id,customer_name,ship_mode,customer_segment,product_category,...,state_or_province,city,postal_code,order_date,ship_date,profit,quantity_ordered_new,sales,order_id,status
0,20847,high,0.01,2.84,0.93,3,bonnie_potter,express_air,corporate,office_supplies,...,washington,anacortes,98221,1/7/2015,1/8/2015,4.5600,4,13.01,88522,0
1,20228,not_specified,0.02,500.98,26.00,5,ronnie_proctor,delivery_truck,home_office,furniture,...,california,san_gabriel,91776,6/13/2015,6/15/2015,4390.3665,12,6362.85,90193,0
2,21776,critical,0.06,9.48,7.29,11,marcus_dunlap,regular_air,home_office,furniture,...,new_jersey,roselle,7203,2/15/2015,2/17/2015,-53.8096,22,211.15,90192,0
3,24844,medium,0.09,78.69,19.99,14,gwendolyn_f_tyson,regular_air,small_business,furniture,...,minnesota,prior_lake,55372,5/12/2015,5/14/2015,803.4705,16,1164.45,86838,0
4,24846,medium,0.08,3.28,2.31,14,gwendolyn_f_tyson,regular_air,small_business,office_supplies,...,minnesota,prior_lake,55372,5/12/2015,5/13/2015,-24.0300,7,22.23,86838,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1947,19842,high,0.01,10.90,7.46,3397,andrea_shaw,regular_air,small_business,office_supplies,...,illinois,danville,61832,3/11/2015,3/12/2015,-116.7600,18,207.31,87536,0
1948,19843,high,0.10,7.99,5.03,3397,andrea_shaw,regular_air,small_business,technology,...,illinois,danville,61832,3/11/2015,3/12/2015,-160.9520,22,143.12,87536,0
1949,26208,not_specified,0.08,11.97,5.81,3399,marvin_reid,regular_air,small_business,office_supplies,...,illinois,des_plaines,60016,3/29/2015,3/31/2015,-41.8700,5,59.98,87534,0
1950,24911,medium,0.10,9.38,4.93,3400,florence_gold,express_air,small_business,furniture,...,west_virginia,fairmont,26554,4/4/2015,4/4/2015,-24.7104,15,135.78,87537,0


In [14]:
# order_date holds plain 'm/d/YYYY' strings with no time of day
order_dt = pd.to_datetime(df['order_date'], format='%m/%d/%Y')

df['month'] = order_dt.dt.strftime('%b')  # abbreviated month name, e.g. 'Jan'
df['day'] = order_dt.dt.strftime('%a')    # abbreviated weekday name, e.g. 'Wed'
# Without a time component the time-of-day period cannot be derived
df['period'] = 'Unknown'

print(len(df['month']), len(df['day']), len(df['period']))
df.head()


1936 1936 1936


Unnamed: 0,row_id,order_priority,discount,unit_price,shipping_cost,customer_id,customer_name,ship_mode,customer_segment,product_category,...,order_date,ship_date,profit,quantity_ordered_new,sales,order_id,status,month,day,period
0,20847,high,0.01,2.84,0.93,3,bonnie_potter,express_air,corporate,office_supplies,...,1/7/2015,1/8/2015,4.56,4,13.01,88522,0,Jan,Wed,Unknown
1,20228,not_specified,0.02,500.98,26.0,5,ronnie_proctor,delivery_truck,home_office,furniture,...,6/13/2015,6/15/2015,4390.3665,12,6362.85,90193,0,Jun,Sat,Unknown
2,21776,critical,0.06,9.48,7.29,11,marcus_dunlap,regular_air,home_office,furniture,...,2/15/2015,2/17/2015,-53.8096,22,211.15,90192,0,Feb,Sun,Unknown
3,24844,medium,0.09,78.69,19.99,14,gwendolyn_f_tyson,regular_air,small_business,furniture,...,5/12/2015,5/14/2015,803.4705,16,1164.45,86838,0,May,Tue,Unknown
4,24846,medium,0.08,3.28,2.31,14,gwendolyn_f_tyson,regular_air,small_business,office_supplies,...,5/12/2015,5/13/2015,-24.03,7,22.23,86838,0,May,Tue,Unknown


Creating new columns `month`, `day`, and `period` in the data. Because `order_date` carries no time of day, `period` is `Unknown` for every row.

# Exploratory Data Analysis 

In [15]:
# Restrict to numeric columns; string columns cannot be correlated
corr = df.corr(numeric_only=True)

data = go.Heatmap(z=corr.values,
                  x=corr.columns.tolist(),
                  y=corr.index.tolist(),
                  hoverinfo='all')
layout = go.Layout(title=dict(text='Correlation Heatmap', font=dict(size=20)))

iplot(dict(data=[data], layout=layout))

**The plot above is a heatmap of the pairwise correlations between the numeric variables.**

**Coefficients between 0.6 and 1 indicate a strong positive correlation. Only one pair falls in this range: `order_id` and `row_id`.**

**Coefficients between 0 and 0.4 indicate a weak positive correlation. A few pairs fall in this range, such as `sales` and `unit_price`, and `sales` and `profit`.**

**Lastly, coefficients below 0 indicate a negative correlation. Many pairs of variables fall in this category.**
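The bands described above can be applied programmatically rather than read off the heatmap. The sketch below is a minimal illustration on a small hypothetical frame, not the real dataset. The helper `label_strength` is an assumed name, and the "moderate" label for the 0.4 to 0.6 gap is an assumption, since the text does not name that range.

```python
import pandas as pd


def label_strength(r: float) -> str:
    """Map a correlation coefficient to the bands used in the text."""
    if r < 0:
        return "negative"
    if r >= 0.6:
        return "strong"
    if r >= 0.4:
        return "moderate"  # assumed label for the 0.4-0.6 gap
    return "weak"


# Hypothetical toy data, not the real dataset
toy = pd.DataFrame({
    "sales": [10.0, 20.0, 30.0, 40.0],
    "unit_price": [1.0, 2.0, 3.0, 4.0],
    "discount": [0.04, 0.03, 0.02, 0.01],
})
corr = toy.corr()

# Visit each unordered pair of columns once
cols = list(corr.columns)
labels = {
    (a, b): label_strength(corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
}
for pair, label in labels.items():
    print(pair, label)
```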

In [16]:
# Sum only the columns that are plotted
temp = df.groupby('region')[['sales', 'profit']].sum().sort_values('sales', ascending=False)

data = []

data.append(go.Bar(x=temp.index, y=temp['sales'], name='sales'))
data.append(go.Bar(x=temp.index, y=temp['profit'], name='profit'))

layout = go.Layout(xaxis=dict(title=dict(text='region', font=dict(size=20))),
                   yaxis=dict(title=dict(text='Sales/Profits', font=dict(size=20))))

iplot(dict(data=data, layout=layout))

**The plot above shows total sales and profit for each of the 4 regions.**

- **The east region had higher sales and profit than any other region, while the south made a loss despite its sales.**

- **Further analysis is needed to determine why this happened.**
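One way to start on the question of why a region loses money is to break profit down by region and product category. The sketch below uses a small hypothetical frame with the same column names as the cleaned data. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; column names mirror the cleaned dataset
toy = pd.DataFrame({
    "region": ["south", "south", "east", "east"],
    "product_category": ["furniture", "technology", "furniture", "technology"],
    "profit": [-120.0, 40.0, 300.0, 150.0],
})

# Total profit per (region, category), lowest first
by_group = (toy.groupby(["region", "product_category"])["profit"]
               .sum()
               .sort_values())

# Keep only the combinations that lose money
loss_makers = by_group[by_group < 0]
print(loss_makers)
```

On the real data, the same grouping could be extended with `discount` or `ship_mode` to narrow down the cause.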

In [17]:
# Aggregate once, then take the ten states with the highest sales
state_totals = df.groupby('state_or_province')[['sales', 'profit']].sum()
top_10_states = state_totals.sort_values(by='sales', ascending=False).head(10)
top_10_state_sales = top_10_states['sales']
top_10_state_profits = top_10_states['profit']

data=[go.Bar(x=top_10_state_sales.index,
             y=top_10_state_sales,
             name='Top 10 States Sales'),
      go.Bar(x=top_10_state_profits.index,
             y=top_10_state_profits,
             name='Top 10 States Profits')]

layout=go.Layout(dict(title="Grouped Bar Plot For Sales and Profits<br>(For The Top Ten States by Sales)",
                      barmode='group'))
    
iplot(dict(data=data,layout=layout))

In [18]:
df['state_or_province'].nunique()

49

**The data covers 49 distinct states or provinces in which US Super Store operates.**

**In the visualization above, California and New York had significantly higher sales and profits than the other states, while Ohio and Illinois had profits similar to New York's but much lower sales.**

**These 4 states should be investigated further to determine how and why California and New York had more sales, and how Ohio and Illinois were able to make a significant profit without making many sales.**

**Whatever drives these results could then be replicated elsewhere to raise sales and profit, depending on what the investigation finds.**
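A simple way to compare states that earn a lot of profit on few sales with states that earn it on many sales is the ratio of profit to sales. The sketch below computes that ratio per state on hypothetical rows. The column name `profit_margin` is an assumption introduced here and is not part of the dataset.

```python
import pandas as pd

# Hypothetical rows; column names mirror the cleaned dataset
toy = pd.DataFrame({
    "state_or_province": ["california", "california", "ohio"],
    "sales": [600.0, 400.0, 200.0],
    "profit": [60.0, 40.0, 80.0],
})

state_totals = toy.groupby("state_or_province")[["sales", "profit"]].sum()
# Profit earned per unit of sales (assumed helper column)
state_totals["profit_margin"] = state_totals["profit"] / state_totals["sales"]
state_totals = state_totals.sort_values("profit_margin", ascending=False)
print(state_totals)
```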

In [19]:
# Aggregate once instead of repeating the groupby for every argument
state_totals = (df.groupby('state_or_province')[['sales', 'profit', 'quantity_ordered_new']]
                  .sum()
                  .reset_index())

fig = px.scatter(state_totals,
                 x='sales',
                 y='profit',
                 hover_name='state_or_province',
                 size='quantity_ordered_new')

fig.update_layout(title=dict(text='Sales & Profits by States', font=dict(size=25)),
                  xaxis=dict(title=dict(text='Sales', font=dict(size=18))),
                  yaxis=dict(title=dict(text='Profits', font=dict(size=18))))
iplot(fig)

**All states except California and New York had sales between 0 and 100k and profit between -20k and 30k.**

- **Only New York and California exceeded that range, with sales of over 223k and 284k respectively.**

- **This suggests that something specific is driving growth in California and New York. Further analysis could determine what contributed to their much higher sales compared with the other states.**

In [20]:
# value_counts() counts order lines per sub-category, not sales amounts
sub_category_counts = df['product_sub-category'].value_counts()

data=go.Pie(labels=sub_category_counts.index,
            values=sub_category_counts,
            textinfo='label+percent',
            hoverinfo='label+percent',
            marker=dict(line=dict(width=1.5)))

layout=go.Layout(title=dict(text='Pie Plot of Order Counts for each Product Sub-Category',
                            font=dict(size=25)),
                 height=700)

iplot(dict(data=[data],layout=layout))

**The plot above can help a lot with inventory stocking.**

**It shows each sub-category's share of the order lines in the data, that is, how often each kind of item was ordered.**

**Paper had the highest share at 14.7%, followed by binders and binder accessories at 10.3%, and so on.**

**We could invest in stocking more paper. However, a high order count does not translate into high sales, as the bar plot below shows.**
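To make the gap between order counts and sales concrete, the two shares can be put side by side. The sketch below does this on hypothetical rows in which paper is ordered often but contributes little revenue. The column names `order_share` and `sales_share` are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical rows; column names mirror the cleaned dataset
toy = pd.DataFrame({
    "product_sub-category": ["paper", "paper", "paper", "office_machines"],
    "sales": [10.0, 10.0, 10.0, 970.0],
})

# Share of order lines per sub-category
order_share = toy["product_sub-category"].value_counts(normalize=True)
# Share of total sales per sub-category
sales_share = toy.groupby("product_sub-category")["sales"].sum() / toy["sales"].sum()

shares = pd.DataFrame({"order_share": order_share, "sales_share": sales_share})
print(shares)
```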

In [21]:
temp = df.groupby(by=['product_sub-category'])[['sales']].sum().sort_values(by='sales',ascending=False).reset_index()

data=go.Bar(x=temp['product_sub-category'],
            y=temp['sales'])

layout=go.Layout(title=dict(text='Bar Plot For Product Sub-Category Sales', font=dict(size=25)),
                 height=700, 
                 xaxis=dict(title=dict(text='Sub-Categories', font=dict(size=19))),
                 yaxis=dict(title=dict(text='Sales', font=dict(size=19))),)


iplot(dict(data=[data],layout=layout))

**The plot above shows total sales for each sub-category.**

- **The highest sales came from office machines, chairs & chairmats, telephone and communication, etc.**

- **Further analysis could determine which regions or states drove these sales and why office machines sold the most, so that the same approach can be replicated.**

In [22]:
new= df[df['product_sub-category']=='paper']
new.head().T

Unnamed: 0,16,28,31,32,48
row_id,20631,18552,22890,25354,19877
order_priority,high,not_specified,high,high,medium
discount,0.06,0.02,0.02,0.04,0.05
unit_price,55.48,5.98,5.98,29.14,5.18
shipping_cost,14.3,5.79,5.15,4.86,2.04
customer_id,24,53,62,62,91
customer_name,edna_thomas,sidney_russell_austin,pam_gilbert,pam_gilbert,wallace_werner
ship_mode,regular_air,regular_air,regular_air,regular_air,regular_air
customer_segment,corporate,corporate,corporate,corporate,home_office
product_category,office_supplies,office_supplies,office_supplies,office_supplies,office_supplies


In [23]:
# The x-axis is the DataFrame row index, not a date
data=[go.Scatter(y=df['sales'],x=df.index,name='Sales Trend Line'),
      go.Scatter(y=df['profit'],x=df.index,name='Profit Trend Line')]

layout = go.Layout(title=dict(text='Sales/Profits Line plot', font=dict(size=25)),
                   xaxis=dict(title=dict(text='Row index', font=dict(size=18))),
                   yaxis=dict(title=dict(text='Sales/Profit', font=dict(size=18))))

iplot(dict(data=data,layout=layout))

- **This line plot shows that all sales fall between 0 and about 46k, most of them well under 1k (the 75th percentile is about 803), while profit mostly stays near its average.**

- **However, there were losses, most notably around the middle of the plot, where the highest sale of about 45.7k coincided with the lowest profit of about -16.5k.**

- **This could result from many things, such as heavy discounting, large returns, or late deliveries. Further analysis could determine what exactly caused such a large loss despite the high sales.**
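A natural first step in that follow-up is to pull out the row with the lowest profit and inspect its discount, sales, and other fields. The sketch below shows the lookup on hypothetical rows. The values are made up and only the column names mirror the cleaned data.

```python
import pandas as pd

# Hypothetical rows; column names mirror the cleaned dataset
toy = pd.DataFrame({
    "order_id": [101, 102, 103],
    "sales": [250.0, 45000.0, 80.0],
    "discount": [0.02, 0.10, 0.05],
    "profit": [40.0, -16000.0, -5.0],
})

# Row with the single lowest profit
worst = toy.loc[toy["profit"].idxmin()]
print(worst)

# All loss-making rows, largest loss first
losses = toy[toy["profit"] < 0].sort_values("profit")
print(losses)
```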

In [24]:
discount_counts = df['discount'].value_counts()

# Build the labels from the data so they always match the order of the counts
labels = [f'{d:.0%} Discount' for d in discount_counts.index]

trace_pie = go.Pie(labels=labels,
                   values=discount_counts,
                   textinfo='label+percent',hoverinfo='label+percent',
                   marker=dict(line=dict(width=1)),)

layout = go.Layout(title=dict(text='Discount Pie Plot', font=dict(size=20)))

iplot(dict(data=[trace_pie],layout=layout))

**The plot above shows how often each discount level was applied by the US Super Store.**

- **The most frequently applied discount was 5%.**

- **The least frequently applied discounts were 17% and 21%.**

# CONCLUSION

The following conclusions were derived from the analysis:

- **The dataset contains sales transactions for 2015. According to the analysis, the store operates in 4 regions and 49 states.**


- **`California` and `New York` had more sales and profit than other states, while `Florida` and `Washington` made losses.**


- **The `Eastern region` had more sales and profit than any other region. The `Southern region` made a loss.**


- **The highest sales came from office machines, chairs & chairmats, telephone and communication, etc., while very low sales came from rubber bands, labels, scissors, rulers & trimmers, etc.**


- **The most frequently ordered products were paper, binders and binder accessories, telephone and communication, office furnishings, etc.**


- **All sales fell between 0 and about 46k, and profit mostly stayed near its average.**


- **The highest sale was about 45.7k, yet the profit on it was the lowest in the data at about -16.5k.**


- **The most frequently applied discount was 5%, and the least frequently applied discounts were 17% and 21%.**



## The problems these insights could solve for an e-commerce company

- This analysis would help companies solve inventory problems, since it shows exactly which products are most in demand and contribute most to sales and profit. This would help management decide whether to discontinue or improve the stocking of products that contribute very little to sales and profit.

- Customer behavior is a very important factor for an e-commerce company, since understanding it helps put recommendation systems in place, which would lead to more sales.

- Storage location is another critical dilemma for most e-commerce companies, since they want to be close to their customers for faster delivery and lower delivery costs.



## My advice to any e-commerce platform using the analyzed data would be to:

- Locate its warehouse in the eastern region of the country.

- Consider implementing recommender systems, since over time we would know the shopping behavior and patterns of our customers.

- Lastly, invest more in marketing goods with higher sales and profit, and think of ways to improve low-selling products. If a low-selling product still generates low sales after 2-3 months of such efforts, it should be dropped.

**Other advice could be explored by analysing the data further, since the insights here address a specific problem faced by an e-commerce company.**

## From my understanding of business processes, the assumptions or insights I would draw from an external overview of the platform, as it pertains to customer patronage and engagement, are:

- Social media awareness/interaction: On the social media platforms of a `demo E-commerce company`, there seems to be little interaction. The platform does not show whether individuals or groups have used it before, and it gives no insight into customer satisfaction. A value proposition is not very evident.


- User-friendly website: The website is not very user friendly, since the categories are not organized clearly enough for fast shopping. This might lead to lower retention on the website.


- Recommender systems: There is no recommender system. With one in place, customers who visit the website would have a good shopping experience, and those who have no idea what to shop for would be recommended products that best suit their online shopping behaviour. This would lead to more patronage and engagement on the platform.

### If churn increases, an analysis would be carried out to find out why, and recommendations would be made to correct it, such as understanding customers' shopping behaviour and putting a recommender system in place

## The top 3 metrics to define the success of the product are:

- `Sales metrics`

- `Financial metrics`

- `Marketing metrics`


- **Sales metrics**: The goal of any organization is to make sales and profit. These metrics are a good indicator of a successful business and help determine whether the company is making sales. Net sales are calculated as:

                  income from customers - returned goods

These metrics include:

- sales cycle length

- average selling price

- average profit margin

- average purchase value
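Several of the sales metrics listed above can be computed directly from order lines. The sketch below is a minimal illustration on hypothetical records. The field names, the `returned_goods` value, and the exact definitions (revenue per unit, profit per unit of revenue, revenue per order) are common conventions assumed here and are not taken from the source.

```python
# Hypothetical order lines for illustration only
order_lines = [
    {"order_id": 1, "sales": 100.0, "profit": 20.0, "quantity": 4},
    {"order_id": 1, "sales": 50.0, "profit": 5.0, "quantity": 1},
    {"order_id": 2, "sales": 150.0, "profit": 35.0, "quantity": 5},
]
returned_goods = 30.0  # hypothetical value of returned goods

total_sales = sum(line["sales"] for line in order_lines)
total_profit = sum(line["profit"] for line in order_lines)
total_units = sum(line["quantity"] for line in order_lines)
n_orders = len({line["order_id"] for line in order_lines})

net_sales = total_sales - returned_goods            # income from customers - returned goods
average_selling_price = total_sales / total_units   # revenue per unit sold
average_profit_margin = total_profit / total_sales  # profit per unit of revenue
average_purchase_value = total_sales / n_orders     # revenue per order

print(net_sales, average_selling_price, average_profit_margin, average_purchase_value)
```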


- `Financial metrics`: These measure financial performance by tracking sales turnover, profits, expenditures, assets, liabilities, and capital. They are used to track business processes, improve operational efficiency, and assist in planning and strategy formulation.


- `Marketing metrics`: Marketing is the part of the business that creates awareness for the company. Since one of the company's objectives is to reach more customers and make more sales, these metrics are very important to consider. They include:

- conversion rate

- seo metrics

- website traffic
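Of the marketing metrics above, conversion rate has the simplest formula: the number of orders divided by the number of website visits. The sketch below shows the arithmetic. The function name `conversion_rate` and the figures are hypothetical.

```python
def conversion_rate(orders: int, visits: int) -> float:
    """Return the fraction of website visits that ended in an order."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return orders / visits


# Hypothetical traffic figures for illustration only
rate = conversion_rate(orders=50, visits=2000)
print(f"{rate:.1%}")  # 2.5%
```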

 

## Using Data to ensure the product is profitable

- Data fuels good decision making and enables the company to determine the best metrics to use. This can drive the profitability and growth of the product.


- Data ensures product profitability by enabling the company to monitor its stock closely, so that it does not over-order and always has stock on hand to meet demand. With data, inventory ordering is more efficient, and the company can take advantage of price changes and new stock. This way, wastage is reduced, customer demand is met, and the product is profitable.


- Data enables salespeople to meet or exceed customers' expectations. Access to consolidated customer data showing which products and sub-product groups customers buy lets sales reps help customers use their budgets efficiently and reduces the number of at-risk customers.


- Data also helps find more up-sell opportunities in the sales data, so that more sales are made to support product profitability.

### N.B. Other areas of the data can be analyzed to generate further insight, as I focused on using this data to answer some major problems that an e-commerce platform could face.