# Eniac

## Importing Data

Load the four Eniac CSV files (orders, orderlines, products, brands) from GitHub into pandas DataFrames.

In [1]:
import pandas as pd

# URLs for raw content of the CSV files on GitHub
orders_url = "https://raw.githubusercontent.com/MerleSt/Eniac/main/Data-Eniac/orders.csv"
orderlines_url = "https://raw.githubusercontent.com/MerleSt/Eniac/main/Data-Eniac/orderlines.csv"
products_url = "https://raw.githubusercontent.com/MerleSt/Eniac/main/Data-Eniac/products.csv"
brands_url = "https://raw.githubusercontent.com/MerleSt/Eniac/main/Data-Eniac/brands.csv"


# Loading dataframes directly from GitHub
orders = pd.read_csv(orders_url)
orderlines = pd.read_csv(orderlines_url)
products = pd.read_csv(products_url)
brands = pd.read_csv(brands_url)
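
The four tables could also be collected in a dictionary of DataFrames keyed by table name, which makes it easy to loop over them. The following is a minimal sketch under that assumption; `load_tables` is a hypothetical helper, and small in-memory CSV strings stand in for the GitHub URLs so the example runs offline.

```python
import io
import pandas as pd

def load_tables(sources):
    """Read each CSV source (URL, path, or buffer) into a DataFrame, keyed by name."""
    return {name: pd.read_csv(src) for name, src in sources.items()}

# Toy in-memory CSVs (illustrative data only, not the real Eniac files).
toy_sources = {
    "orders": io.StringIO("order_id,total_paid\n1,10.5\n2,\n"),
    "brands": io.StringIO("short,long\nAAA,Alpha\n"),
}

tables = load_tables(toy_sources)

# Loop over the dictionary to inspect every table in one pass.
for name, df in tables.items():
    print(name, df.shape)
```

Passing a mapping such as `{"orders": orders_url, "brands": brands_url}` should work the same way, since `pd.read_csv` accepts URLs as well as buffers.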

- DataFrame **.describe()** gives basic numerical aggregations. It can be applied to a single column as well.
- DataFrame **.isna().any()** highlights which columns contain missing data
- DataFrame **.shape** gives the number of rows and columns
- DataFrame **.columns** gives the column names. Note that a list with new names can be passed to this attribute to rename the columns.
- DataFrame **.columnName.isna().sum()** is a quick way to check the number of missing values in a column
- DataFrame **.columnName.value_counts()** is a great way to summarise a categorical column. You can use it to discover how many orders are completed, cancelled, pending…
- DataFrame **.columnName.hist()** is an easy way to plot a histogram of a numerical column. Adjust the bins argument to change the granularity of the graph.
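
The inspection methods listed above can be tried on a small example first. This sketch uses a made-up DataFrame (the column names mirror `orders`, but the values are invented) and bracket access, which is equivalent to the `.columnName` attribute form. The histogram call is left out because it needs matplotlib.

```python
import pandas as pd

# Toy data for illustration only; not taken from the Eniac files.
toy = pd.DataFrame({
    "total_paid": [10.0, 25.5, None, 40.0],
    "state": ["Completed", "Cancelled", "Completed", "Pending"],
})

n_rows, n_cols = toy.shape                       # number of rows and columns
has_missing = toy.isna().any()                   # True for columns with missing data
missing_paid = toy["total_paid"].isna().sum()    # missing values in one column
state_counts = toy["state"].value_counts()       # frequency of each category
summary = toy["total_paid"].describe()           # count, mean, std, quartiles

# Rename all columns by assigning a list of the same length.
renamed = toy.copy()
renamed.columns = ["paid", "status"]

print(state_counts)
print(summary)
```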

In [2]:
orders.describe()

| | order_id | total_paid |
|---|---|---|
| count | 226909.0 | 226904.0 |
| mean | 413296.48248 | 569.225818 |
| std | 65919.250331 | 1761.778002 |
| min | 241319.0 | 0.0 |
| 25% | 356263.0 | 34.19 |
| 50% | 413040.0 | 112.99 |
| 75% | 470553.0 | 525.98 |
| max | 527401.0 | 214747.53 |


In [3]:
products.describe()

| | in_stock |
|---|---|
| count | 19326.0 |
| mean | 0.109593 |
| std | 0.31239 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 0.0 |
| 75% | 0.0 |
| max | 1.0 |


In [4]:
orderlines.describe()

| | id | id_order | product_id | product_quantity |
|---|---|---|---|---|
| count | 293983.0 | 293983.0 | 293983.0 | 293983.0 |
| mean | 1397918.0 | 419999.116544 | 0.0 | 1.121126 |
| std | 153009.6 | 66344.486479 | 0.0 | 3.396569 |
| min | 1119109.0 | 241319.0 | 0.0 | 1.0 |
| 25% | 1262542.0 | 362258.5 | 0.0 | 1.0 |
| 50% | 1406940.0 | 425956.0 | 0.0 | 1.0 |
| 75% | 1531322.0 | 478657.0 | 0.0 | 1.0 |
| max | 1650203.0 | 527401.0 | 0.0 | 999.0 |


In [5]:
brands.describe()

| | short | long |
|---|---|---|
| count | 187 | 187 |
| unique | 187 | 181 |
| top | 8MO | Mophie |
| freq | 1 | 2 |


In [6]:
orders.isna().sum()

order_id        0
created_date    0
total_paid      5
state           0
dtype: int64

In [7]:
orderlines.isna().sum()

id                  0
id_order            0
product_id          0
product_quantity    0
sku                 0
unit_price          0
date                0
dtype: int64

In [8]:
brands.isna().sum()

short    0
long     0
dtype: int64

In [9]:
products.isna().sum()

sku             0
name            0
desc            7
price          46
promo_price     0
in_stock        0
type           50
dtype: int64

```products['price'] = products['price'].str.replace('.', '', regex=False).astype(float)```

In [10]:
orders.dtypes

order_id          int64
created_date     object
total_paid      float64
state            object
dtype: object

In [11]:
orders['order_id'] = orders['order_id'].astype(str)

In [12]:
orders['created_date']  = pd.to_datetime(orders['created_date'])

The cells above parse `created_date` as datetime and convert `order_id` to string (object dtype).
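
The effect of these two conversions can be checked on a small frame. This is a sketch with invented values; `toy_orders` is a hypothetical stand-in for `orders`.

```python
import pandas as pd

# Illustrative rows only; the IDs and timestamps are made up.
toy_orders = pd.DataFrame({
    "order_id": [241319, 241423],
    "created_date": ["2017-01-02 13:35:40", "2017-11-06 13:10:02"],
})

# IDs are labels, not quantities, so store them as strings.
toy_orders["order_id"] = toy_orders["order_id"].astype(str)

# Parse the timestamp strings so date arithmetic and the .dt accessor work.
toy_orders["created_date"] = pd.to_datetime(toy_orders["created_date"])

print(toy_orders.dtypes)
print(toy_orders["created_date"].dt.month.tolist())
```

Storing identifiers as strings keeps `describe()` from reporting a meaningless mean or standard deviation for them.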

In [13]:
orderlines.dtypes

id                   int64
id_order             int64
product_id           int64
product_quantity     int64
sku                 object
unit_price          object
date                object
dtype: object

In [14]:
orderlines['date']  = pd.to_datetime(orderlines['date'])

In [15]:
orderlines['unit_price'] = orderlines['unit_price'].str.replace('.', '', regex=False).astype(float)

In [16]:
orderlines['id']= orderlines['id'].astype(str)

In [17]:
orderlines['id_order']= orderlines['id_order'].astype(str)

In [19]:
orderlines.drop('product_id', axis=1, inplace=True)

In [20]:
orderlines

| | id | id_order | product_quantity | sku | unit_price | date |
|---|---|---|---|---|---|---|
| 0 | 1119109 | 299539 | 1 | OTT0133 | 1899.0 | 2017-01-01 00:07:19 |
| 1 | 1119110 | 299540 | 1 | LGE0043 | 39900.0 | 2017-01-01 00:19:45 |
| 2 | 1119111 | 299541 | 1 | PAR0071 | 47405.0 | 2017-01-01 00:20:57 |
| 3 | 1119112 | 299542 | 1 | WDT0315 | 6839.0 | 2017-01-01 00:51:40 |
| 4 | 1119113 | 299543 | 1 | JBL0104 | 2374.0 | 2017-01-01 01:06:38 |
| ... | ... | ... | ... | ... | ... | ... |
| 293978 | 1650199 | 527398 | 1 | JBL0122 | 4299.0 | 2018-03-14 13:57:25 |
| 293979 | 1650200 | 527399 | 1 | PAC0653 | 14158.0 | 2018-03-14 13:57:34 |
| 293980 | 1650201 | 527400 | 2 | APP0698 | 999.0 | 2018-03-14 13:57:41 |
| 293981 | 1650202 | 527388 | 1 | BEZ0204 | 1999.0 | 2018-03-14 13:58:01 |


The cells above convert `id` and `id_order` to strings (object dtype), drop `product_id` since it is no longer in use, convert `unit_price` to float, and parse `date` as datetime.
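
One caveat with the price conversion: removing every dot with `str.replace('.', '', regex=False)` also removes the decimal point, so a string such as `"18.99"` becomes `1899.0`, while a string without a dot keeps its value. The sketch below compares that behavior with one possible alternative. It assumes that the last dot is the decimal separator and any earlier dots are thousands separators, which has not been verified against the Eniac files; `parse_price` is a hypothetical helper and the input strings are invented.

```python
import pandas as pd

def parse_price(text):
    """Treat the last dot as the decimal point and drop any earlier dots (assumption)."""
    if pd.isna(text):
        return float("nan")
    head, sep, tail = str(text).rpartition(".")
    if not sep:  # no dot at all, e.g. "59"
        return float(tail)
    return float(head.replace(".", "") + "." + tail)

# Illustrative strings only; not taken from the dataset.
raw = pd.Series(["18.99", "1.137.99", "59", None])

# Approach used in the notebook: every dot is removed.
all_dots_removed = raw.str.replace(".", "", regex=False).astype(float)

# Alternative sketch: keep the last dot as the decimal separator.
last_dot_kept = raw.map(parse_price)

print(pd.DataFrame({"raw": raw,
                    "all_dots_removed": all_dots_removed,
                    "last_dot_kept": last_dot_kept}))
```

Whichever rule is chosen, comparing a few converted values against the raw strings is a quick way to confirm that the scale is what was intended.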

In [22]:
products.dtypes

sku            object
name           object
desc           object
price          object
promo_price    object
in_stock        int64
type           object
dtype: object

Next steps: convert `price` and `promo_price` to float, and `in_stock` to boolean.

In [23]:
products['price'] = products['price'].str.replace('.', '', regex=False).astype(float)

In [24]:
products['promo_price'] = products['promo_price'].str.replace('.', '', regex=False).astype(float)

In [25]:
products['in_stock'] = products['in_stock'].astype(bool)

In [26]:
brands.dtypes

short    object
long     object
dtype: object

In [27]:
brands.head()

| | short | long |
|---|---|---|
| 0 | 8MO | 8Mobility |
| 1 | ACM | Acme |
| 2 | ADN | Adonit |
| 3 | AII | Aiino |
| 4 | AKI | Akitio |


In [28]:
products.head()

| | sku | name | desc | price | promo_price | in_stock | type |
|---|---|---|---|---|---|---|---|
| 0 | RAI0007 | Silver Rain Design mStand Support | Aluminum support compatible with all MacBook | 5999.0 | 499899.0 | True | 8696 |
| 1 | APP0023 | Apple Mac Keyboard Keypad Spanish | USB ultrathin keyboard Apple Mac Spanish. | 59.0 | 589996.0 | False | 13855401 |
| 2 | APP0025 | Mighty Mouse Apple Mouse for Mac | mouse Apple USB cable. | 59.0 | 569898.0 | False | 1387 |
| 3 | APP0072 | Apple Dock to USB Cable iPhone and iPod white | IPhone dock and USB Cable Apple iPod. | 25.0 | 229997.0 | False | 1230 |
| 4 | KIN0007 | Mac Memory Kingston 2GB 667MHz DDR2 SO-DIMM | 2GB RAM Mac mini and iMac (2006/07) MacBook Pr... | 3499.0 | 3199.0 | True | 1364 |


In [29]:
products.tail()

| | sku | name | desc | price | promo_price | in_stock | type |
|---|---|---|---|---|---|---|---|
| 19321 | BEL0376 | Belkin Travel Support Apple Watch Black | compact and portable stand vertically or horiz... | 2999.0 | 269903.0 | True | 12282 |
| 19322 | THU0060 | Enroute Thule 14L Backpack MacBook 13 "Black | Backpack with capacity of 14 liter compartment... | 6995.0 | 649903.0 | True | 1392 |
| 19323 | THU0061 | Enroute Thule 14L Backpack MacBook 13 "Blue | Backpack with capacity of 14 liter compartment... | 6995.0 | 649903.0 | True | 1392 |
| 19324 | THU0062 | Enroute Thule 14L Backpack MacBook 13 "Red | Backpack with capacity of 14 liter compartment... | 6995.0 | 649903.0 | False | 1392 |
| 19325 | THU0063 | Enroute Thule 14L Backpack MacBook 13 "Green | Backpack with capacity of 14 liter compartment... | 6995.0 | 649903.0 | True | 1392 |


In [31]:
orderlines

| | id | id_order | product_quantity | sku | unit_price | date |
|---|---|---|---|---|---|---|
| 0 | 1119109 | 299539 | 1 | OTT0133 | 1899.0 | 2017-01-01 00:07:19 |
| 1 | 1119110 | 299540 | 1 | LGE0043 | 39900.0 | 2017-01-01 00:19:45 |
| 2 | 1119111 | 299541 | 1 | PAR0071 | 47405.0 | 2017-01-01 00:20:57 |
| 3 | 1119112 | 299542 | 1 | WDT0315 | 6839.0 | 2017-01-01 00:51:40 |
| 4 | 1119113 | 299543 | 1 | JBL0104 | 2374.0 | 2017-01-01 01:06:38 |
| ... | ... | ... | ... | ... | ... | ... |
| 293978 | 1650199 | 527398 | 1 | JBL0122 | 4299.0 | 2018-03-14 13:57:25 |
| 293979 | 1650200 | 527399 | 1 | PAC0653 | 14158.0 | 2018-03-14 13:57:34 |
| 293980 | 1650201 | 527400 | 2 | APP0698 | 999.0 | 2018-03-14 13:57:41 |
| 293981 | 1650202 | 527388 | 1 | BEZ0204 | 1999.0 | 2018-03-14 13:58:01 |
