# Global consumer electronics brand sales analysis

## Importing libraries and data

During this step, we import the necessary libraries and load the data required for further analysis. 


In [3]:
import datetime
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import seaborn as sns
from matplotlib.ticker import FuncFormatter
from plotly.offline import init_notebook_mode, iplot
from sqlalchemy import create_engine, MetaData, Table, ForeignKey
from sqlalchemy import Column, Integer, String, inspect, Float, Date, join, select

# Enable inline Plotly rendering in the notebook.
init_notebook_mode(connected=True)

In [4]:
engine = create_engine("sqlite:///database/my_database.db")
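# MetaData(bind=...) was removed in SQLAlchemy 2.0; pass the engine explicitly
# where it is needed, e.g. metadata.reflect(bind=engine).
metadata = MetaData()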

## Descriptive Statistics


This section summarizes each table in the dataset using key descriptive statistics.


### Sales Dataset

In [6]:
sales_df = pd.read_sql('''SELECT * FROM sales;''', engine)

In [19]:
sales_df['Order Date'].min()

'2016-01-01 00:00:00.000000'

In [21]:
sales_df['Order Date'].max()

'2021-02-20 00:00:00.000000'

The dataset spans just over five years, from January 1, 2016, to February 20, 2021.
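As a quick check on the span, the two boundary values printed above can be parsed and subtracted. This is a minimal sketch that hard-codes the strings from the cell outputs rather than querying the database; the variable names are illustrative.

```python
import pandas as pd

# Boundary values copied from the min/max outputs above (not re-queried).
first_order = pd.Timestamp("2016-01-01 00:00:00.000000")
last_order = pd.Timestamp("2021-02-20 00:00:00.000000")

# Subtracting two Timestamps yields a Timedelta.
span = last_order - first_order
span_days = span.days
span_years = span_days / 365.25  # approximate: averages over leap years

print(f"{span_days} days, about {span_years:.2f} years")
```

On the full table, the same idea would presumably be `pd.to_datetime(sales_df['Order Date'])` followed by taking `.max()` minus `.min()`, assuming every value in the column parses as a date.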

In [34]:
sales_df.value_counts('Currency Code', normalize=True).reset_index(name='Percentage')

Unnamed: 0,Currency Code,Percentage
0,USD,0.536973
1,EUR,0.200703
2,GBP,0.129445
3,CAD,0.086111
4,AUD,0.046769


The company supports five currencies: USD, EUR, GBP, CAD, and AUD. The US dollar is the most common, accounting for roughly 54% of sales records, followed by the euro at about 20%.
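The `Percentage` column above actually holds fractions between 0 and 1. The sketch below shows one way to turn them into rounded percentages; it rebuilds the shares from the printed output instead of reading the database, and the variable names are illustrative.

```python
import pandas as pd

# Shares copied from the value_counts(normalize=True) output above.
shares = pd.Series(
    {
        "USD": 0.536973,
        "EUR": 0.200703,
        "GBP": 0.129445,
        "CAD": 0.086111,
        "AUD": 0.046769,
    },
    name="Percentage",
)

# Scale the fractions to percentages and round to one decimal place.
percent = (shares * 100).round(1)
print(percent)
```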

In [35]:
sales_df['Quantity'].describe()

count    62884.000000
mean         3.144790
std          2.256371
min          1.000000
25%          1.000000
50%          2.000000
75%          4.000000
max         10.000000
Name: Quantity, dtype: float64

The quantity per sales record ranges from 1 to 10, with a mean of about 3.1 and a median of 2.

### Stores Dataset

In [38]:
stores_df = pd.read_sql('''SELECT * FROM stores;''', engine)

In [49]:
stores_df.value_counts('Country').reset_index(name='Count')

Unnamed: 0,Country,Count
0,United States,24
1,Germany,9
2,France,7
3,United Kingdom,7
4,Australia,6
5,Canada,5
6,Netherlands,5
7,Italy,3


In [53]:
stores_df.value_counts('Country').sum()


66

In [55]:
stores_df.value_counts('Country').median()


6.5

The company operates 66 stores across 8 countries, with the United States having the highest count at 24 stores and Italy the lowest at 3 stores. The median number of stores per country is 6.5.
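The store figures can be recomputed from the per-country counts shown above. This is a small sketch that re-enters the printed counts by hand instead of reading `stores_df`; the variable names are illustrative.

```python
import pandas as pd

# Store counts per country, copied from the value_counts output above.
store_counts = pd.Series(
    {
        "United States": 24,
        "Germany": 9,
        "France": 7,
        "United Kingdom": 7,
        "Australia": 6,
        "Canada": 5,
        "Netherlands": 5,
        "Italy": 3,
    },
    name="Count",
)

n_countries = store_counts.size          # number of distinct countries
total_stores = int(store_counts.sum())   # total number of stores
median_stores = store_counts.median()    # median stores per country

print(n_countries, total_stores, median_stores)
```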

### Customers Dataset

In [159]:
customers_df = pd.read_sql('''SELECT * FROM customers;''', engine)

In [161]:
customers_df.value_counts('Gender').reset_index(name='Count')

Unnamed: 0,Gender,Count
0,Male,7742
1,Female,7514


Among the 15,256 customers in the dataset, the gender split is nearly even: 7,742 male and 7,514 female.

In [165]:
((pd.Timestamp.today() - customers_df['Birthday'].astype('datetime64[ns]'))/pd.Timedelta(days=365.25)).describe()


count    15256.000000
mean        56.508485
std         19.319906
min         23.100866
25%         39.788752
50%         56.665548
75%         73.117294
max         90.142619
Name: Birthday, dtype: float64

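Customers are about 56 years old on average (mean 56.5, median 56.7), with ages ranging from 23 to 90. Because the calculation uses `pd.Timestamp.today()`, these figures reflect the date on which the notebook was run.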
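Since the age figures depend on the run date, a fixed reference date makes them reproducible. The sketch below is one possible approach, not the method used in this notebook: `age_in_years` is a hypothetical helper, and the sample birthdays are invented for illustration.

```python
import pandas as pd


def age_in_years(birthdays, as_of):
    """Return ages in fractional years relative to a fixed reference date."""
    birthdays = pd.to_datetime(birthdays)
    # Same 365.25-day year convention as the cell above.
    return (pd.Timestamp(as_of) - birthdays) / pd.Timedelta(days=365.25)


# Hypothetical birthdays, for illustration only (not taken from the dataset).
sample = pd.Series(["2000-01-01", "1990-06-15"], name="Birthday")
ages = age_in_years(sample, as_of="2020-01-01")
print(ages.round(1))
```

Applied to the real data, this would presumably be called as `age_in_years(customers_df['Birthday'], as_of=...)` with whichever reference date the analysis should be pinned to.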

### Products Dataset

In [166]:
products_df = pd.read_sql('''SELECT * FROM products;''', engine)

In [170]:
products_df['Brand'].nunique()

11

In [169]:
products_df.value_counts('Brand').reset_index(name='Count')

Unnamed: 0,Brand,Count
0,Contoso,710
1,Fabrikam,267
2,Litware,264
3,Proseware,244
4,Adventure Works,192
5,Southridge Video,192
6,Wide World Importers,173
7,The Phone Company,152
8,Tailspin Toys,144
9,A. Datum,132


The catalog covers 11 brands, ten of which appear in the output above. Contoso leads by a wide margin with 710 products, more than twice as many as the next brand, Fabrikam (267).

In [172]:
products_df['Unit Price USD'].describe()

count    2517.000000
mean      356.830131
std       494.054962
min         0.950000
25%        69.000000
50%       199.990000
75%       410.000000
max      3199.990000
Name: Unit Price USD, dtype: float64

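The average product price is about `$357`, with prices ranging from `$0.95` to `$3,199.99`. The median is `$199.99`, well below the mean, so a small number of expensive products pull the average up.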


In [175]:
products_df.value_counts('Subcategory').reset_index(name='Count')

Unnamed: 0,Subcategory,Count
0,Computers Accessories,201
1,Lamps,158
2,Download Games,120
3,Camcorders,103
4,Projectors & Screens,103
5,Microwaves,102
6,"Printers, Scanners & Fax",101
7,Smart phones & PDAs,101
8,Digital SLR Cameras,100
9,Home Theater System,100


The company offers products across 32 distinct subcategories, with Computers Accessories having the largest selection (201 products).
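The number of distinct subcategories is not printed by any cell above. The sketch below shows how it could be counted, in the same way as the `Brand` column was handled earlier. It runs on a hypothetical miniature table, since the real `products_df` requires the database; the rows and the `Product Name` values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature product table; rows are illustrative, not real data.
toy_products = pd.DataFrame(
    {
        "Product Name": ["A", "B", "C", "D"],
        "Subcategory": ["Lamps", "Lamps", "Microwaves", "Camcorders"],
    }
)

# Count the distinct subcategories and find the one with the most products.
n_subcategories = toy_products["Subcategory"].nunique()
top_subcategory = toy_products["Subcategory"].value_counts().idxmax()

print(n_subcategories, top_subcategory)
```

On the real data, the equivalent call would be `products_df['Subcategory'].nunique()`.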