<h1>E-commerce</h1>
<img src="./img/tof-ecommerce.jpg" width="500" height="300">

#### This dataset consists of orders made in different countries from December 2010 to December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers.

In [20]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Loading the dataset

In [21]:
commerce=pd.read_csv("./data/datacamp_workspace_export_2022-10-27 10_53_34.csv")

## Data Dictionary
| Variable    | Explanation                                                                                                                       |
|-------------|-----------------------------------------------------------------------------------------------------------------------------------|
| InvoiceNo   | A 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'c' it indicates a cancellation. |
| StockCode   | A 5-digit integral number uniquely assigned to each distinct product.                                                             |
| Description | Product (item) name                                                                                                               |
| Quantity    | The quantities of each product (item) per transaction                                                                             |
| InvoiceDate | The day and time when each transaction was generated                                                                              |
| UnitPrice   | Product price per unit in sterling (pound)                                                                                        |
| CustomerID  | A 5-digit integral number uniquely assigned to each customer                                                                      |
| Country     | The name of the country where each customer resides                                                                               |
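
The dictionary says that an `InvoiceNo` starting with the letter 'c' marks a cancellation. A minimal sketch of how such rows could be flagged is shown below. The sample rows and the `is_cancelled` column name are hypothetical, and the check is case-insensitive on the assumption that an export may use an uppercase 'C'.

```python
import pandas as pd

# Hypothetical sample rows; these invoice numbers are made up for illustration.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381", "c536383"],
    "Quantity": [6, -1, 2, -3],
})

# Flag cancellations: upper-case first so both 'c' and 'C' prefixes match.
sample["is_cancelled"] = (
    sample["InvoiceNo"].astype(str).str.upper().str.startswith("C")
)

print(sample)
```

Casting to `str` first keeps the check safe if the column was read as a mix of numbers and strings.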

### Check the dimensions of the data to get a sense of its volume

In [22]:
commerce.shape

(2500, 8)

### Get a quick overview and summary of the dataset

In [23]:
commerce.info()    

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   InvoiceNo    2500 non-null   object 
 1   StockCode    2500 non-null   object 
 2   Description  2490 non-null   object 
 3   Quantity     2500 non-null   int64  
 4   InvoiceDate  2500 non-null   object 
 5   UnitPrice    2500 non-null   float64
 6   CustomerID   1919 non-null   float64
 7   Country      2500 non-null   object 
dtypes: float64(2), int64(1), object(5)
memory usage: 156.4+ KB
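
The `info()` output shows fewer non-null values for `Description` and `CustomerID` than for the other columns. One way to see the gaps directly is to count missing values per column, sketched here on a small hypothetical frame (the rows are made up, only the column names come from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in Description and CustomerID.
sample = pd.DataFrame({
    "Description": ["WHITE METAL LANTERN", None, "BLUE POT PLANT CANDLE"],
    "CustomerID": [17850.0, np.nan, np.nan],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

# isna() marks missing cells; sum() counts them, mean() gives the share per column.
missing = sample.isna().sum()
missing_share = sample.isna().mean()

print(pd.DataFrame({"missing": missing, "share": missing_share}))
```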


In [24]:
commerce.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/10 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/10 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/10 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/10 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/10 8:26,3.39,17850.0,United Kingdom


In [25]:
commerce.tail(10)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2490,536592,20727,LUNCH BAG BLACK SKULL.,2,12/1/10 17:06,4.21,,United Kingdom
2491,536592,20733,GOLD MINI TAPE MEASURE,6,12/1/10 17:06,0.85,,United Kingdom
2492,536592,20735,BLACK MINI TAPE MEASURE,4,12/1/10 17:06,0.85,,United Kingdom
2493,536592,20752,BLUE POLKADOT WASHING UP GLOVES,1,12/1/10 17:06,4.21,,United Kingdom
2494,536592,20754,RETROSPOT RED WASHING UP GLOVES,1,12/1/10 17:06,4.21,,United Kingdom
2495,536592,20761,BLUE PAISLEY SKETCHBOOK,1,12/1/10 17:06,7.62,,United Kingdom
2496,536592,20780,BLACK EAR MUFF HEADPHONES,1,12/1/10 17:06,11.02,,United Kingdom
2497,536592,20846,ZINC HEART LATTICE T-LIGHT HOLDER,1,12/1/10 17:06,2.51,,United Kingdom
2498,536592,20914,SET/5 RED RETROSPOT LID GLASS BOWLS,1,12/1/10 17:06,5.91,,United Kingdom
2499,536592,20931,BLUE POT PLANT CANDLE,2,12/1/10 17:06,7.62,,United Kingdom


### Generate descriptive statistics for more detail on each column

In [26]:
commerce.describe(include='all')

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
count,2500.0,2500.0,2490,2500.0,2500,2500.0,1919.0,2500
unique,138.0,1175.0,1167,,117,,,7
top,536544.0,22632.0,HAND WARMER SCOTTY DOG DESIGN,,12/1/10 14:32,,,United Kingdom
freq,527.0,19.0,18,,528,,,2341
mean,,,,10.0652,,3.701044,15637.112559,
std,,,,29.139317,,12.328907,1848.418705,
min,,,,-24.0,,0.0,12431.0,
25%,,,,1.0,,1.25,14307.0,
50%,,,,3.0,,2.51,15605.0,
75%,,,,10.0,,4.21,17841.0,


In [27]:
print(commerce["StockCode"])

0       85123A
1        71053
2       84406B
3       84029G
4       84029E
         ...  
2495     20761
2496     20780
2497     20846
2498     20914
2499     20931
Name: StockCode, Length: 2500, dtype: object


In [28]:
print(commerce["CustomerID"])

0       17850.0
1       17850.0
2       17850.0
3       17850.0
4       17850.0
         ...   
2495        NaN
2496        NaN
2497        NaN
2498        NaN
2499        NaN
Name: CustomerID, Length: 2500, dtype: float64


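### Replace NaN with 0

In [29]:
# StockCode has no missing values (2500 non-null above), so this line changes nothing
commerce["StockCode"] = commerce["StockCode"].fillna(0.0)
# after this, a CustomerID of 0.0 means "customer unknown"
commerce["CustomerID"] = commerce["CustomerID"].fillna(0.0)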
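
Filling `CustomerID` with 0 makes an unknown customer look like a real ID. As an alternative (not what this notebook does), the sketch below keeps the gaps and records them in a separate flag. The sample values and the `has_customer` column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; NaN stands for an order without a known customer.
sample = pd.DataFrame({"CustomerID": [17850.0, np.nan, 13047.0, np.nan]})

# Record which rows have a customer before changing the column.
sample["has_customer"] = sample["CustomerID"].notna()

# Nullable integer dtype: IDs lose the trailing ".0" and gaps stay as <NA>.
sample["CustomerID"] = sample["CustomerID"].astype("Int64")

print(sample)
```

With this approach, aggregations such as `count()` and `nunique()` keep skipping the missing IDs instead of treating 0 as a customer.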


In [30]:
print(commerce["StockCode"])

0       85123A
1        71053
2       84406B
3       84029G
4       84029E
         ...  
2495     20761
2496     20780
2497     20846
2498     20914
2499     20931
Name: StockCode, Length: 2500, dtype: object


### Which countries place orders online?

In [31]:
print(commerce["Country"].unique()) 

['United Kingdom' 'France' 'Australia' 'Netherlands' 'Germany' 'Norway'
 'EIRE']


In [32]:
# highest price
Max_prix=commerce["UnitPrice"].max()
print(Max_prix)

569.77


In [33]:
commerce[commerce["UnitPrice"]==Max_prix]["CustomerID"]

1814    0.0
Name: CustomerID, dtype: float64
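
The two cells above first compute the maximum price and then filter on it. The same lookup can be sketched in one step with `idxmax()`, which returns the index label of the highest value. The sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows for illustration.
sample = pd.DataFrame({
    "Description": ["ITEM A", "ITEM B", "ITEM C"],
    "UnitPrice": [2.55, 9.95, 3.39],
    "CustomerID": [17850.0, 13047.0, 12583.0],
})

# idxmax() gives the index label of the highest UnitPrice.
top_label = sample["UnitPrice"].idxmax()

# .loc[] then returns the whole row, not only the CustomerID.
top_row = sample.loc[top_label]
print(top_row)
```

Note that `idxmax()` returns only the first row when several share the maximum, whereas the boolean filter returns all of them.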

### Quantity purchased per country

In [34]:
Quantity = commerce.groupby('Country')['Quantity'].sum()
Quantity

Country
Australia           107
EIRE                243
France              449
Germany             117
Netherlands          97
Norway             1852
United Kingdom    22298
Name: Quantity, dtype: int64

### Number of order lines per country (rows, not distinct customers)

In [35]:
# count() tallies rows, so this is order lines per country, not unique customers
lines_per_country = commerce.groupby('Country')['CustomerID'].count()
lines_per_country

Country
Australia           14
EIRE                21
France              20
Germany             29
Netherlands          2
Norway              73
United Kingdom    2341
Name: CustomerID, dtype: int64
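
`count()` in the cell above tallies rows, so a customer with many order lines is counted many times. If the goal is the number of distinct customers per country, `nunique()` is one option. A minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two UK customers (one repeated), one row without an ID,
# and one French customer with two order lines.
sample = pd.DataFrame({
    "Country": ["United Kingdom"] * 4 + ["France"] * 2,
    "CustomerID": [17850.0, 17850.0, 13047.0, np.nan, 12583.0, 12583.0],
})

# count(): non-missing rows per country.
rows_per_country = sample.groupby("Country")["CustomerID"].count()

# nunique(): distinct customer IDs per country (missing IDs are ignored).
customers_per_country = sample.groupby("Country")["CustomerID"].nunique()

print(rows_per_country)
print(customers_per_country)
```

If missing IDs have been replaced with 0 beforehand, `nunique()` counts that 0 as one extra customer, so it is worth excluding it first.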

In [36]:
Date_Quantity = commerce.groupby('InvoiceDate')['CustomerID'].count()
Date_Quantity

InvoiceDate
12/1/10 10:03    14
12/1/10 10:19    24
12/1/10 10:24     7
12/1/10 10:29    10
12/1/10 10:37     1
                 ..
12/1/10 9:53     13
12/1/10 9:56      7
12/1/10 9:57      3
12/1/10 9:58      5
12/1/10 9:59     14
Name: CustomerID, Length: 117, dtype: int64
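
`InvoiceDate` is stored as text, so the groups above are sorted alphabetically (`10:03` comes before `9:53`). A sketch of parsing the column into real timestamps follows. The format string is an assumption read off the values shown in this notebook (month/day/two-digit year, then hour:minute), and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the same text format as the InvoiceDate column.
sample = pd.DataFrame({
    "InvoiceDate": ["12/1/10 10:03", "12/1/10 9:53", "12/1/10 9:59", "12/1/10 10:19"],
    "Quantity": [6, 2, 4, 1],
})

# Assumed format: month/day/2-digit year hour:minute.
sample["InvoiceDate"] = pd.to_datetime(sample["InvoiceDate"], format="%m/%d/%y %H:%M")

# With real timestamps, sorting is chronological.
earliest = sample.sort_values("InvoiceDate")["InvoiceDate"].iloc[0]

# Group by hour of day instead of by exact minute.
rows_per_hour = sample.groupby(sample["InvoiceDate"].dt.hour).size()

print(earliest)
print(rows_per_hour)
```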

In [39]:
# amount per order line; UnitPrice is truncated to whole pounds (int),
# so the totals below are approximate. Drop .astype(int) for exact amounts.
commerce['amount'] = commerce['Quantity'] * commerce['UnitPrice'].astype(int)
Country_amount = commerce.groupby('Country')['amount'].sum()
Country_amount

Country
Australia           287
EIRE                402
France              522
Germany              67
Netherlands         111
Norway             1142
United Kingdom    35826
Name: amount, dtype: int64
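
The cell above converts each unit price to an integer before multiplying, which drops the pence. The sketch below compares that with multiplying the columns directly, on a few hypothetical rows. The `amount_truncated` column name is made up for the comparison.

```python
import pandas as pd

# Hypothetical order lines for illustration.
sample = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "France"],
    "Quantity": [6, 8, 2],
    "UnitPrice": [2.55, 2.75, 4.21],
})

# Exact amount: multiply the columns directly, keeping the pence.
sample["amount"] = sample["Quantity"] * sample["UnitPrice"]

# Truncated amount: price cut to whole pounds before multiplying.
sample["amount_truncated"] = sample["Quantity"] * sample["UnitPrice"].astype(int)

country_amount = sample.groupby("Country")[["amount", "amount_truncated"]].sum().round(2)
print(country_amount)
```

Column arithmetic also avoids the explicit Python loop, which matters more as the number of rows grows.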