# Eniac, the e-commerce company

The company has high hopes for what data analysis can deliver, and is especially hopeful that your work can finally settle an ongoing debate: whether or not it is beneficial to discount products.

* The Marketing Team Lead is convinced that offering discounts is beneficial in the long run. She believes discounts improve customer acquisition, satisfaction and retention, and allow the company to grow.
* The main investors on the Board are worried about offering aggressive discounts. They have pointed out that the company's recent quarterly results showed an increase in orders placed but a decrease in total revenue. They would prefer the company to position itself in the quality segment rather than compete to offer the lowest prices in the market.

In [2]:
%pip install pandas numpy

Note: you may need to restart the kernel to use updated packages.


## Load necessary libraries and datasets

In [3]:
import pandas as pd
import numpy as np 
import math

brands_df = pd.read_csv("./Datasets/brands.csv")
orderlines_df = pd.read_csv("./Datasets/orderlines.csv")
orders_df = pd.read_csv("./Datasets/orders.csv")
products_df = pd.read_csv("./Datasets/products.csv")

## First glance and task statements

### Brands

In [4]:
brands_df.info()
brands_df.duplicated().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   short   187 non-null    object
 1   long    187 non-null    object
dtypes: object(2)
memory usage: 3.1+ KB


np.int64(0)

The brands table is clean: 0 duplicates, 0 nulls, shape (187, 2).
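Since the same three checks (shape, duplicates, nulls) are repeated for every table, they could be wrapped in a small helper. This is only a sketch: `first_glance` and `toy_brands` are hypothetical names, and the two rows are invented stand-ins rather than real rows from `brands.csv`.

```python
import pandas as pd


def first_glance(df: pd.DataFrame) -> dict:
    """Return the shape, duplicate-row count and null-cell count of a frame."""
    return {
        "shape": df.shape,
        "duplicate_rows": int(df.duplicated().sum()),
        "null_cells": int(df.isna().sum().sum()),
    }


# Invented toy data with the same two columns as brands_df.
toy_brands = pd.DataFrame({"short": ["AAA", "BBB"], "long": ["Brand A", "Brand B"]})
summary = first_glance(toy_brands)
print(summary)
```

In the notebook, the same function could be called on `brands_df`, `orderlines_df`, `orders_df` and `products_df` in turn.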

### Orderlines

In [5]:
orderlines_df.info() 
orderlines_df.duplicated().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 293983 entries, 0 to 293982
Data columns (total 7 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   id                293983 non-null  int64 
 1   id_order          293983 non-null  int64 
 2   product_id        293983 non-null  int64 
 3   product_quantity  293983 non-null  int64 
 4   sku               293983 non-null  object
 5   unit_price        293983 non-null  object
 6   date              293983 non-null  object
dtypes: int64(4), object(3)
memory usage: 15.7+ MB


np.int64(0)

In [6]:
orderlines_df.head()

Unnamed: 0,id,id_order,product_id,product_quantity,sku,unit_price,date
0,1119109,299539,0,1,OTT0133,18.99,2017-01-01 00:07:19
1,1119110,299540,0,1,LGE0043,399.0,2017-01-01 00:19:45
2,1119111,299541,0,1,PAR0071,474.05,2017-01-01 00:20:57
3,1119112,299542,0,1,WDT0315,68.39,2017-01-01 00:51:40
4,1119113,299543,0,1,JBL0104,23.74,2017-01-01 01:06:38


In [7]:
orderlines_df.describe()

Unnamed: 0,id,id_order,product_id,product_quantity
count,293983.0,293983.0,293983.0,293983.0
mean,1397918.0,419999.116544,0.0,1.121126
std,153009.6,66344.486479,0.0,3.396569
min,1119109.0,241319.0,0.0,1.0
25%,1262542.0,362258.5,0.0,1.0
50%,1406940.0,425956.0,0.0,1.0
75%,1531322.0,478657.0,0.0,1.0
max,1650203.0,527401.0,0.0,999.0


### Orderlines_df tasks
We need to:
* drop product_id: every value is 0, so the column carries no information
* convert unit_price from object to float
* convert date from object to datetime
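The tasks above could be sketched as follows on a toy frame. The notebook does not show why `unit_price` was read as `object`, so the malformed value `"1.137.99"` is an assumption made for illustration, and `toy_orderlines` is a hypothetical stand-in for `orderlines_df`. Using `errors="coerce"` turns unparseable prices into `NaN` so that they can be counted and inspected instead of raising an error.

```python
import pandas as pd

# Toy stand-in for orderlines_df; the third price is an invented malformed value.
toy_orderlines = pd.DataFrame({
    "id": [1, 2, 3],
    "id_order": [10, 11, 12],
    "product_id": [0, 0, 0],
    "product_quantity": [1, 2, 1],
    "sku": ["OTT0133", "LGE0043", "PAR0071"],
    "unit_price": ["18.99", "399.0", "1.137.99"],
    "date": ["2017-01-01 00:07:19", "2017-01-01 00:19:45", "2017-01-01 00:20:57"],
})

# Drop the constant column.
cleaned = toy_orderlines.drop(columns=["product_id"])

# Convert prices; values that do not parse become NaN.
cleaned["unit_price"] = pd.to_numeric(cleaned["unit_price"], errors="coerce")

# Parse the timestamp strings into datetimes.
cleaned["date"] = pd.to_datetime(cleaned["date"], format="%Y-%m-%d %H:%M:%S")

# Count how many prices could not be converted.
n_unparsed = int(cleaned["unit_price"].isna().sum())
print(cleaned.dtypes)
print("unparsed prices:", n_unparsed)
```

Whether rows with unparsed prices should be dropped or repaired is a separate decision that depends on how many there are in the real data.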


### Orders

In [8]:
orders_df.info() 
orders_df.duplicated().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 226909 entries, 0 to 226908
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   order_id      226909 non-null  int64  
 1   created_date  226909 non-null  object 
 2   total_paid    226904 non-null  float64
 3   state         226909 non-null  object 
dtypes: float64(1), int64(1), object(2)
memory usage: 6.9+ MB


np.int64(0)

In [9]:
orders_df['total_paid'].isna().value_counts()

total_paid
False    226904
True          5
Name: count, dtype: int64

In [8]:
orders_df.head()

Unnamed: 0,order_id,created_date,total_paid,state
0,241319,2017-01-02 13:35:40,44.99,Cancelled
1,241423,2017-11-06 13:10:02,136.15,Completed
2,242832,2017-12-31 17:40:03,15.76,Completed
3,243330,2017-02-16 10:59:38,84.98,Completed
4,243784,2017-11-24 13:35:19,157.86,Cancelled


### Orders_df tasks
We need to:

* convert created_date from object to datetime
* drop the 5 rows where total_paid is null
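A minimal sketch of these two steps, assuming that dropping the rows is acceptable (as the task list states). `toy_orders` is a hypothetical stand-in for `orders_df`, and the missing `total_paid` value is placed in an arbitrary row.

```python
import pandas as pd

# Toy stand-in for orders_df with one missing total_paid.
toy_orders = pd.DataFrame({
    "order_id": [241319, 241423, 242832],
    "created_date": [
        "2017-01-02 13:35:40",
        "2017-11-06 13:10:02",
        "2017-12-31 17:40:03",
    ],
    "total_paid": [44.99, None, 15.76],
    "state": ["Cancelled", "Completed", "Completed"],
})

# Keep only rows that have a total_paid value; copy to avoid chained-assignment warnings.
orders_clean = toy_orders.dropna(subset=["total_paid"]).copy()

# Parse the creation timestamp.
orders_clean["created_date"] = pd.to_datetime(orders_clean["created_date"])

print(orders_clean.dtypes)
print(len(orders_clean), "rows kept")
```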

### Products

In [9]:
products_df.info() 
products_df.duplicated().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19326 entries, 0 to 19325
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   sku          19326 non-null  object
 1   name         19326 non-null  object
 2   desc         19319 non-null  object
 3   price        19280 non-null  object
 4   promo_price  19326 non-null  object
 5   in_stock     19326 non-null  int64 
 6   type         19276 non-null  object
dtypes: int64(1), object(6)
memory usage: 1.0+ MB


np.int64(8746)

In [40]:
products_df.describe()


Unnamed: 0,in_stock
count,19326.0
mean,0.109593
std,0.31239
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,1.0


In [39]:
products_df.sample(10)

Unnamed: 0,sku,name,desc,price,promo_price,in_stock,type
12730,PAC1457,WD My Cloud EX4100 Pack | WD 16TB Network,WD My Cloud EX4100 + 16TB (4x4TB) Network WD H...,1275.0,9.629.906,0,11935397
13848,APP1851,"Apple MacBook Pro 13 ""Core i5 with Touch Bar 3...",New MacBook Pro 13-inch Core i5 Touch Bar to 3...,2799.0,2.665.584,0,2158
15120,SAT0010-A,Open - Satechi Sonic Dual Conical Mac v2.0 Spe...,Speakers matte finish sleek design and volume ...,39.99,263.913,0,1298
8325,PAC0930,"Apple iMac 27 ""Core i5 3.2GHz Retina 5K | 16GB...",IMac desktop computer 27 inch Retina 5K RAM 16...,2849.0,25.759.896,0,1282
6477,PAC1070,"Apple iMac 27 ""Core i5 3.3GHz Retina 5K | 32GB...",IMac desktop computer 27 inch 5K Retina i5 3.3...,4369.0,37.659.895,0,"5,74E+15"
17086,WAC0236-A,Open - Education - A4 Wacom Bamboo Slate Gray,Smart Bloc notes A4 size reconditioned app inc...,149.99,966.599,0,1298
10131,PAC1591,"Apple iMac 27 ""Core i5 3.2GHz Retina 5K | RAM ...",Desktop computer iMac 27-inch 3.2GHz Core i5 5...,3409.0,27.249.902,0,"5,74E+15"
16260,PAC2119,"Apple iMac 27 ""Core i7 Retina 5K 42GHz | 32GB ...",IMac desktop computer 27 inch Retina 5K RAM 32...,3799.0,33.750.045,0,"5,74E+15"
4141,APP1389,"Apple iMac 27 ""Core i5 3.3GHz Retina 5K | 8GB ...",IMac desktop computer 27 inch 8GB RAM 512GB Re...,3169.0,30.175.839,0,"5,74E+15"
15857,DRO0030,5N2 Drobo NAS server Mac and PC,5-bay NAS server with two Gigabit Ethernet por...,689.0,482.99,0,1280


In [10]:
products_df.head()

Unnamed: 0,sku,name,desc,price,promo_price,in_stock,type
0,RAI0007,Silver Rain Design mStand Support,Aluminum support compatible with all MacBook,59.99,499.899,1,8696
1,APP0023,Apple Mac Keyboard Keypad Spanish,USB ultrathin keyboard Apple Mac Spanish.,59.0,589.996,0,13855401
2,APP0025,Mighty Mouse Apple Mouse for Mac,mouse Apple USB cable.,59.0,569.898,0,1387
3,APP0072,Apple Dock to USB Cable iPhone and iPod white,IPhone dock and USB Cable Apple iPod.,25.0,229.997,0,1230
4,KIN0007,Mac Memory Kingston 2GB 667MHz DDR2 SO-DIMM,2GB RAM Mac mini and iMac (2006/07) MacBook Pr...,34.99,31.99,1,1364


### Products_df tasks
We need to:

* drop the 8746 duplicate rows
* fix the extra dots in the price and promo_price columns
* convert price and promo_price from object to float
* decide what to do with the in_stock column (it only takes the values 0 and 1)
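One possible first step for the products tasks is to deduplicate and then flag price strings that look corrupted, without yet trying to repair them. The sketch below assumes that a value is suspicious if it contains more than one dot or has three or more digits after the last dot. That rule is a guess based on the samples shown above, not a confirmed description of how the data was corrupted. `toy_products` and `flag_suspicious` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for products_df, including one exact duplicate row.
toy_products = pd.DataFrame({
    "sku": ["RAI0007", "RAI0007", "KIN0007", "PAC1457"],
    "price": ["59.99", "59.99", "34.99", "1275.0"],
    "promo_price": ["499.899", "499.899", "31.99", "9.629.906"],
})

# Remove exact duplicate rows.
deduped = toy_products.drop_duplicates().reset_index(drop=True)


def flag_suspicious(col: pd.Series) -> pd.Series:
    """True where a price string has more than one dot or 3+ trailing decimals."""
    text = col.astype("string")
    many_dots = text.str.count(r"\.") > 1
    long_decimals = text.str.contains(r"\.\d{3,}$", regex=True)
    return (many_dots | long_decimals).fillna(False).astype(bool)


flag = flag_suspicious(deduped["promo_price"])

# Convert only the values that look well formed; flagged ones stay NaN for now.
deduped["promo_price_num"] = pd.to_numeric(
    deduped["promo_price"].where(~flag), errors="coerce"
)

print(deduped)
print("flagged rows:", int(flag.sum()))
```

Counting the flagged rows on the real `products_df` would show how large the problem is before choosing between repairing and dropping those values.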
