# Data overview

## orders.csv 
Every row in this file represents an order.

* **order_id** – a unique identifier for each order
* **created_date** – a timestamp for when the order was created
* **total_paid** – the total amount paid by the customer for this order, in euros
* **state** – the status of the order:
    * “Shopping basket” – products have been placed in the shopping basket
    * “Place Order” – the order has been placed, but is awaiting shipment details
    * “Pending” – the order is awaiting payment confirmation
    * “Completed” – the order has been placed and paid, and the transaction is completed
    * “Cancelled” – the order has been cancelled and the payment returned to the customer

## orderlines.csv
Every row represents one product within an order.

* **id** – a unique identifier for each row in this file
* **id_order** – corresponds to orders.order_id
* **product_id** – a legacy product identifier that is no longer in use
* **product_quantity** – how many units of that product were purchased in that order
* **sku** – stock keeping unit: a unique identifier for each product
* **unit_price** – the price (in euros) of one unit of the product at the moment the order was placed
* **date** – timestamp for the processing of that product

## products.csv

* **sku** – stock keeping unit: a unique identifier for each product
* **name** – product name
* **desc** – product description
* **price** – base price of the product, in euros
* **promo_price** – promotional price, in euros
* **in_stock** – whether or not the product was in stock at the moment of the data extraction
* **type** – a numerical code for product type

## brands.csv

* **short** – the 3-character brand code, which matches the first 3 characters of products.sku
* **long** – brand name
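Because the brand code is embedded in the sku, a product can be linked to its brand by slicing the first three characters of the sku and merging on `short`. A minimal sketch on a few toy rows: the `_demo` frames are hypothetical stand-ins for products.csv and brands.csv, and the sku/brand pairs are taken from the outputs further down.

```python
import pandas as pd

# Toy stand-ins for products.csv and brands.csv (hypothetical, for illustration only)
products_demo = pd.DataFrame({'sku': ['APP0927', 'OWC0100', 'LAC0212']})
brands_demo = pd.DataFrame({'short': ['APP', 'OWC', 'LAC'],
                            'long': ['Apple', 'OWC', 'LaCie']})

# The brand code is the first three characters of the sku
products_demo['short'] = products_demo['sku'].str[:3]

# A left merge keeps every product, even one whose code is missing from brands
products_with_brand = products_demo.merge(brands_demo, on='short', how='left')
print(products_with_brand[['sku', 'long']])
```

The notebook itself merges the brands with the default inner join, which drops any product whose code does not appear in brands.csv; the left join used here would keep such products with a missing brand instead.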

# Data cleaning
## Import the data
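The cells below use `orders`, `orderlines`, `products`, `brands` and `path` without defining them. A minimal sketch of how they might be loaded, assuming the four files described above sit in one directory and are comma-separated with a header row; `load_data` is a hypothetical helper, not part of the original notebook.

```python
import os

import pandas as pd


def load_data(path):
    '''Read orders, orderlines, products and brands from the directory `path`.

    Hypothetical helper: assumes comma-separated files with a header row.
    '''
    file_names = ['orders.csv', 'orderlines.csv', 'products.csv', 'brands.csv']
    return tuple(pd.read_csv(os.path.join(path, name)) for name in file_names)


# Usage, with `path` pointing at the directory that holds the CSV files:
# orders, orderlines, products, brands = load_data(path)
```

`os.path.join` works whether or not `path` ends with a separator; the `to_csv` calls below build file names with `path + '...'`, so there `path` must end with one.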

### Clean orders

```python
def start_pipeline(df):
    '''Return a copy of the DataFrame so the pipeline never modifies the original data'''
    return df.copy()

def remove_missing_data(df, col):
    '''Drop the rows where `col` is missing'''
    return df.dropna(subset=[col])


orders_clean = (orders
                .pipe(start_pipeline)
                .pipe(remove_missing_data, col='total_paid')
                )

print(f"{orders.shape[0]-orders_clean.shape[0]} missing values were removed from orders.")
print(f"This represents {(orders.shape[0]-orders_clean.shape[0])/orders.shape[0] * 100:.2f}% of the data.")

# Save the data
orders_clean.to_csv(path + 'orders_clean.csv', index=False)
```

```
5 missing values were removed from orders.
This represents 0.00% of the data.
```


### Clean orderlines

```python
def drop_deprecated_columns(df, col_list):
    '''Drop the columns listed in col_list'''
    return df.drop(columns=col_list)

def rename_columns(df, col_dict):
    '''Rename columns according to col_dict ({old_name: new_name})'''
    return df.rename(columns=col_dict)

def transform_unit_price_to_floats(df):
    '''
    Convert the unit_price strings to floats. Some prices use a dot both as a
    thousands separator and as the decimal separator, so every dot except the
    last one is removed before the conversion.
    '''
    def _to_float_string(parts):
        # parts is the price split on '.'; the last part holds the decimals
        return ''.join(parts[:-1]) + '.' + parts[-1] if len(parts) > 1 else parts[0]

    return df.assign(unit_price=df.unit_price.str.split('.')
                     .apply(_to_float_string)
                     .astype(float))

def create_short_col(df):
    '''Add the brand code: the first 3 characters of the sku'''
    return df.assign(short=lambda d: d['sku'].str[:3])

orderlines_clean = (orderlines
                    .pipe(start_pipeline)
                    .pipe(drop_deprecated_columns, col_list=['product_id'])
                    .pipe(rename_columns, col_dict={'id_order': 'order_id'})
                    .pipe(transform_unit_price_to_floats)
                    .pipe(create_short_col)
                    )

print(f"{orderlines.shape[0]-orderlines_clean.shape[0]} missing values were removed from orderlines.")
print(f"This represents {(orderlines.shape[0]-orderlines_clean.shape[0])/orderlines.shape[0] * 100:.2f}% of the data.")

# Save the data
orderlines_clean.to_csv(path + 'orderlines_clean.csv', index=False)
```

```
0 missing values were removed from orderlines.
This represents 0.00% of the data.
```
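A vectorised alternative to the split-and-join in `transform_unit_price_to_floats`, under the same assumption that the last dot is always the decimal separator and any earlier dots are thousands separators: a regular expression with a lookahead deletes every dot that has another dot somewhere after it. `parse_dotted_price` is a hypothetical name, and `'1.137.99'` is an invented example of the two-dot format.

```python
import pandas as pd


def parse_dotted_price(prices: pd.Series) -> pd.Series:
    '''Remove every dot that is followed by another dot, then convert to float.'''
    return prices.str.replace(r'\.(?=.*\.)', '', regex=True).astype(float)


raw = pd.Series(['1.137.99', '231.79', '13.99'])  # '1.137.99' is an invented example
print(parse_dotted_price(raw).tolist())  # [1137.99, 231.79, 13.99]
```

Strings without any dot pass through unchanged, which the original `len(x)==3` / `x[1]` logic did not handle.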


```python
orderlines_clean[orderlines_clean.product_quantity > 1]
```

| | id | order_id | product_quantity | sku | unit_price | date | short |
|---|---|---|---|---|---|---|---|
| 5 | 1119114 | 295310 | 10 | WDT0249 | 231.79 | 2017-01-01 01:14:27 | WDT |
| 10 | 1119125 | 299548 | 5 | SPE0132 | 35.14 | 2017-01-01 02:02:20 | SPE |
| 20 | 1119136 | 299558 | 2 | WDT0141 | 112.99 | 2017-01-01 02:24:33 | WDT |
| 52 | 1119206 | 299589 | 4 | TRK0009 | 28.49 | 2017-01-01 09:21:47 | TRK |
| 58 | 1119213 | 299595 | 2 | OWC0153-4 | 286.99 | 2017-01-01 09:45:56 | OWC |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 293925 | 1650113 | 527358 | 2 | APP0927 | 13.99 | 2018-03-14 13:31:45 | APP |
| 293931 | 1650127 | 527360 | 6 | APP0927 | 13.99 | 2018-03-14 13:33:17 | APP |
| 293947 | 1650152 | 527370 | 2 | APP0698 | 9.99 | 2018-03-14 13:40:36 | APP |
| 293953 | 1650161 | 527375 | 2 | OWC0255 | 46.99 | 2018-03-14 13:42:37 | OWC |


### Clean products

```python
# Names of the products that have no description
names_of_products_without_descriptions = products[products.desc.isna()].name.tolist()

# Hand-written descriptions for those products, in the same order as the list above
missing_descriptions = [
    '2TB Mac hard drive and Nas',
    'Apple keyboard for iPad 9.7',
    'NAS server with 10GB RAM',
    'Ethernet adapter for Macbook 12',
    'Luxury power bank combined with powder, 2 mirrors - normal and 3x magnification, Illuminated under mirror with LED, Low weight and compact dimensions',
    'Battery capacity: 20,000 mAh; ultra-stable: outer shell made of durable synthetic rubber (military standard, withstands drops from up to 2 metres) ; protection: dust and splash proof: military standard iP54; battery level indicator and super fast charging; USB port can be connected to charger and other devices',
    'Smart thermostat designed to provide automatic time and temperature control of heating systems in homes and apartments. '
]

def add_missing_product_descriptions(df):
    '''Fill in the hand-written descriptions, matching the products by name'''
    assert len(names_of_products_without_descriptions) == len(missing_descriptions), \
        'Expected exactly one new description per product without a description'
    for name, description in zip(names_of_products_without_descriptions, missing_descriptions):
        df.loc[df.name == name, 'desc'] = description
    return df

def drop_duplicate_rows_by_column(df, col):
    '''Keep only the first row for each value of col'''
    return df.drop_duplicates(subset=col)

products_clean = (products
                  .pipe(start_pipeline)
                  # type and in_stock are not used in this analysis
                  .pipe(drop_deprecated_columns, col_list=['type', 'in_stock'])
                  .pipe(add_missing_product_descriptions)
                  .pipe(remove_missing_data, col='price')
                  .pipe(drop_duplicate_rows_by_column, col='sku')
                  )

print(f"{products.shape[0]-products_clean.shape[0]} missing values were removed from products")
print(f"This represents {(products.shape[0]-products_clean.shape[0])/products.shape[0] * 100:.2f}% of the data.")

# Save the data
products_clean.to_csv(path + 'products_clean.csv', index=False)
```

```
8792 missing values were removed from products
This represents 45.49% of the data.
```


## Data merging and secondary cleaning process
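Merge the completed orders with orderlines, products and brands, and assign each product a category. The merged table is then used to check the order totals, fix the corrupted price data and compare discounts.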

```python
import re

import numpy as np

# Final column order of the merged table
col_order = [
    'order_id',
    'orderline_id',
    'date',
    'name',
    'desc',
    'brand',
    'sku',
    'category',
    'total_paid',
    'product_quantity',
    'regular_price',
    'promo_price',
    'sale_price'
]

def reorder_columns(df, col_list):
    '''Select the columns in col_list, in that order'''
    return df[col_list]


def assign_product_categories(df):
    '''
    Label each product with a category by searching its name and description.

    Patterns are case-insensitive substring matches and the first match wins:
    Apple product lines are checked first (only for the Apple brand), then the
    other categories in dictionary order. Short patterns such as 'pen' also
    match inside longer words (e.g. 'Open'), so the order of the entries matters.
    '''
    apple_regexp_dict = {
        'iPod': '^.{0,7}apple ipod',
        'iPhone': 'apple iphone',
        'iPad': 'apple ipad',
        'Mac': 'apple macbook|apple iMac|apple Mac mini|desktop computer',
    }

    other_regexp_dict = {
        'Smartwatch': 'withings|watch|fitbit|apple watch|smartwatch|smart watch',
        'Accessories': 'kit|strap|armband|belt|bracelet|stylus|pen|Bamboo Wacom Intuos|pencil|rubber pointers|screwdriver|case|funda|housing|casing|folder|bag|backpack|cable|connector|Lightning to USB|Wall socket|power strip|adapter|battery|headset|headphones|mouse|trackpad|stand|support|protect|cover|sleeve|Screensaver|shellhub|dock|microphone|keyboard|keypad',
        'Hardware': 'Philips Hue|temperature sensor|display|monitor|camera|charger|speaker|router|repeater|Synology|nas|server|Parrot FPV Glasses|Command Pack 2 Skycontroller|Apple TV',
        'Software': 'adobe|Office 365|Office Home and Student|software|parallels',
        'Memory': 'hard disk|hard drive|flash drive|USB 2.0 key|USB 2.0 pen|SSD|pendrive|raid|SDHC|sata|memory card|Portable Hard Thunderbolt',
        'Repairs & warranties': 'repair|parts and labor|warranty|applecare|license|protection|installation',
    }

    def _matches(frame, regexp):
        '''True where the description or the name matches regexp (missing text counts as no match)'''
        return (frame['desc'].str.contains(regexp, na=False)
                | frame['name'].str.contains(regexp, na=False))

    df = df.assign(category='unknown')

    # Find the main Apple product lines
    for label, val in apple_regexp_dict.items():
        regexp = re.compile(val, flags=re.IGNORECASE)
        df = df.assign(category=lambda x: np.where(
            _matches(x, regexp) & (x['category'] == 'unknown') & (x['brand'] == 'Apple'),
            label, x['category']))

    # Find all other items that are still unlabelled
    for label, val in other_regexp_dict.items():
        regexp = re.compile(val, flags=re.IGNORECASE)
        df = df.assign(category=lambda x: np.where(
            _matches(x, regexp) & (x['category'] == 'unknown'),
            label, x['category']))

    return df


def merge_dataframes(df, merge_df, col):
    '''Inner-join merge_df on col: rows without a match on either side are dropped'''
    return df.merge(merge_df, on=col)


def drop_uncompleted_orders(df):
    '''Keep only the orders whose state is Completed'''
    return df[df.state == 'Completed']


completed_sales = (orders_clean
                   .pipe(start_pipeline)
                   .pipe(drop_uncompleted_orders)
                   .pipe(merge_dataframes, orderlines_clean, 'order_id')
                   .pipe(merge_dataframes, products_clean, 'sku')
                   .pipe(merge_dataframes, brands, 'short')
                   .pipe(rename_columns, col_dict={'long': 'brand', 'unit_price': 'sale_price',
                                                   'price': 'regular_price', 'id': 'orderline_id'})
                   .pipe(drop_deprecated_columns, col_list=['short', 'created_date', 'state'])
                   .pipe(assign_product_categories)
                   .pipe(reorder_columns, col_order)
                   )

completed_sales['category'].value_counts()
```

```
category
Accessories             30123
Memory                   7183
unknown                  7121
Hardware                 6202
iPhone                   3823
Smartwatch               2913
Mac                      2652
iPad                     1432
Repairs & warranties      109
Software                   60
iPod                       54
Name: count, dtype: int64
```
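Because the category patterns are plain substrings, a short term such as `'pen'` also fires inside longer words. The product `(Open) Crucial 240GB SSD 7mm BX200` appears under Accessories in the `sales_info` output further down, and a substring match on `'pen'` in "Open" may explain that. A minimal sketch of the difference between substring and whole-word (`\b`) matching; the product names other than the Crucial one are invented toy values.

```python
import pandas as pd

names = pd.Series(['(Open) Crucial 240GB SSD 7mm BX200',
                   'Apple Pencil',
                   'Wacom Bamboo pen'])

# Plain substring search: 'pen' also matches inside 'Open' and 'Pencil'
substring_hits = names.str.contains('pen', case=False, regex=True)

# \b anchors the alternatives at word boundaries, so only whole words match
whole_word_hits = names.str.contains(r'\b(?:pen|pencil)\b', case=False, regex=True)

print(substring_hits.tolist())   # [True, True, True]
print(whole_word_hits.tolist())  # [False, True, True]
```

Switching the category patterns to word boundaries would change the counts above, so it is a design choice to weigh rather than a drop-in fix.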

```python
completed_sales.head(5)
```

| | order_id | orderline_id | date | name | desc | brand | sku | category | total_paid | product_quantity | regular_price | promo_price | sale_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 241423 | 1398738 | 2017-11-06 12:47:20 | LaCie Porsche Design Desktop Drive 4TB USB 3.0... | External Hard Drive 4TB 35-inch USB 3.0 for Ma... | LaCie | LAC0212 | Memory | 136.15 | 1 | 139.99 | 1.149.948 | 129.16 |
| 1 | 242832 | 1529178 | 2017-12-31 17:26:40 | Parrot 550mAh battery for MiniDrones | 550mAh rechargeable battery for Parrot minidrones | Parrot | PAR0074 | Accessories | 15.76 | 1 | 17.99 | 109.904 | 10.77 |
| 2 | 243330 | 1181923 | 2017-02-15 17:07:44 | Mac OWC Memory 8GB 1066MHZ DDR3 SO-DIMM | 8GB RAM Mac mini iMac MacBook and MacBook Pro ... | OWC | OWC0074 | unknown | 84.98 | 1 | 99.99 | 999.896 | 77.99 |
| 3 | 245275 | 1276706 | 2017-06-28 11:12:30 | Tado Smart Climate Control Intelligent AC | intelligent control air conditioning works wit... | Tado | TAD0007 | Accessories | 149.0 | 1 | 179.0 | 1.489.994 | 149.0 |
| 4 | 245595 | 1154394 | 2017-01-21 12:49:00 | Macally External Hard Drive 1TB 35 "USB 3.0 SA... | Aluminum External Hard Drive 1TB capacity form... | Pack | PAC1561 | Memory | 112.97 | 2 | 103.95 | 59.584 | 52.99 |


### Data integrity check
#### Ensure that the total_paid value is the same for all rows with identical order_id values

```python
# Group by order_id and check the number of unique total_paid values for each group
inconsistent_orders = completed_sales.groupby('order_id')['total_paid'].nunique()

# Filter for orders where there is more than 1 unique total_paid value
inconsistent_orders = inconsistent_orders[inconsistent_orders > 1]

# Check if there are any inconsistencies
if not inconsistent_orders.empty:
    print("Inconsistent 'total_paid' values found for the following order_ids:")
    print(inconsistent_orders)
else:
    print("All 'total_paid' values are consistent for each 'order_id'.")
```


```
All 'total_paid' values are consistent for each 'order_id'.
```


#### Ensure that, for each order_id, the line totals (sale_price × product_quantity) add up to total_paid

```python
num_orders = completed_sales.order_id.nunique()

# Line total for each row: sale_price * product_quantity
completed_sales['calculated_total'] = completed_sales['sale_price'] * completed_sales['product_quantity']

# Sum the line totals for each order
calculated_totals = completed_sales.groupby('order_id')['calculated_total'].sum()

# total_paid is identical on every row of an order (checked above), so take the first value
paid_totals = completed_sales.groupby('order_id')['total_paid'].first()

# Flag the orders whose line totals do not add up to total_paid.
# Note: this is an exact float comparison, so tiny rounding differences are flagged too.
inconsistent_orders_filter = calculated_totals != paid_totals
inconsistent_order_ids = inconsistent_orders_filter[inconsistent_orders_filter].index

inconsistent_order_ids
```

```
Index([241423, 242832, 243330, 245595, 246018, 246405, 247524, 250275, 251688,
       251969,
       ...
       527033, 527034, 527035, 527036, 527038, 527042, 527070, 527074, 527096,
       527112],
      dtype='int64', name='order_id', length=33167)
```

```python
# Number of orders flagged as inconsistent
inconsistent_orders_filter.sum()
```

```
33167
```

```python
num_orders
```

```
46361
```
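Taken together, the two counts above mean that roughly seven in ten completed orders fail the exact comparison (part of this may be float rounding noise, since the check uses `!=`):

$$
\frac{33\,167}{46\,361} \approx 0.7154 \quad\Rightarrow\quad \approx 71.5\,\%\ \text{of completed orders}
$$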

```python
completed_sales[completed_sales.product_quantity > 1]
```

| | order_id | orderline_id | date | name | desc | brand | sku | category | total_paid | product_quantity | regular_price | promo_price | sale_price | calculated_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 245595 | 1154394 | 2017-01-21 12:49:00 | Macally External Hard Drive 1TB 35 "USB 3.0 SA... | Aluminum External Hard Drive 1TB capacity form... | Pack | PAC1561 | Memory | 112.97 | 2 | 103.95 | 59.584 | 52.99 | 105.98 |
| 24 | 253220 | 1369930 | 2017-10-04 19:15:16 | LG 27UD88-W Monitor 27 "UHD 4K USB 3.0 USB-C | 99% Professional Monitor sRGB color calibrator... | LG | LGE0044 | Accessories | 1610.00 | 2 | 599 | 5.599.892 | 559.99 | 1119.98 |
| 45 | 256799 | 1616190 | 2018-02-16 09:42:27 | Mophie Powerstation 4000mAh Battery Plus Mini ... | external battery capacity 4000mAh output volta... | Mophie | MOP0107 | Accessories | 20.97 | 2 | 69.95 | 269.903 | 3.99 | 7.98 |
| 53 | 259192 | 1401175 | 2017-11-09 10:45:29 | Seagate Barracuda 2TB 35 "SATA hard drive Mac ... | Internal Hard Drive 2TB Mac and PC (ST2000DM006) | Seagate | SEA0039 | Memory | 195.74 | 2 | 89.99 | 639.945 | 70.58 | 141.16 |
| 67 | 263738 | 1319083 | 2017-08-16 17:01:01 | Belkin Valet Charging Dock Cable Apple Watch W... | Magnetic Charging Dock cable to Apple Watch. | Belkin | BEL0189 | Smartwatch | 80.96 | 2 | 69.99 | 379.904 | 29.99 | 59.98 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61655 | 527022 | 1649392 | 2018-03-14 11:36:03 | EarPods Apple Headphones with Remote and Mic (... | EarPods headphones Apple iPhone iPad and iPod ... | Apple | APP0927 | iPhone | 31.97 | 2 | 35 | 13.99 | 13.99 | 27.98 |
| 61666 | 527038 | 1649436 | 2018-03-14 11:41:33 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | Apple | APP0698 | Accessories | 24.97 | 2 | 25 | 99.898 | 9.99 | 19.98 |
| 61668 | 527070 | 1649512 | 2018-03-14 11:49:01 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | Apple | APP0698 | Accessories | 24.97 | 2 | 25 | 99.898 | 9.99 | 19.98 |
| 61669 | 527074 | 1649522 | 2018-03-14 11:49:36 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | Apple | APP0698 | Accessories | 24.97 | 2 | 25 | 99.898 | 9.99 | 19.98 |


```python
completed_sales[completed_sales.order_id == 263738]
```

| | order_id | orderline_id | date | name | desc | brand | sku | category | total_paid | product_quantity | regular_price | promo_price | sale_price | calculated_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 67 | 263738 | 1319083 | 2017-08-16 17:01:01 | Belkin Valet Charging Dock Cable Apple Watch W... | Magnetic Charging Dock cable to Apple Watch. | Belkin | BEL0189 | Smartwatch | 80.96 | 2 | 69.99 | 379.904 | 29.99 | 59.98 |
| 68 | 263738 | 1319085 | 2017-08-16 17:01:54 | Elago Airpod Charging stand Charging Stand Bla... | Silicone holder for positioning and loading th... | Elago | ELA0027 | Accessories | 80.96 | 1 | 19.95 | 13.99 | 13.99 | 13.99 |
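Order 263738 shows the kind of mismatch being counted: its two lines add up to 2 × 29.99 + 13.99 = 73.97, while total_paid is 80.96, a gap of 6.99. Order 241423 (one line of 129.16 against a total_paid of 136.15) has the same gap. A minimal sketch that records the size of each gap and compares with a half-cent tolerance instead of `!=`, on toy rows copied from the outputs in this section; whether recurring gaps such as 6.99 are shipping charges is only a guess, since the data overview does not mention shipping.

```python
import numpy as np
import pandas as pd

# Toy rows copied from the outputs above: order 263738 (two lines) and order 245275 (one line)
lines = pd.DataFrame({
    'order_id': [263738, 263738, 245275],
    'sale_price': [29.99, 13.99, 149.0],
    'product_quantity': [2, 1, 1],
    'total_paid': [80.96, 80.96, 149.0],
})

per_order = (lines
             .assign(line_total=lines['sale_price'] * lines['product_quantity'])
             .groupby('order_id')
             .agg(calculated_total=('line_total', 'sum'),
                  total_paid=('total_paid', 'first')))

# Size of the mismatch, rounded to cents
per_order['gap'] = (per_order['total_paid'] - per_order['calculated_total']).round(2)

# Differences below half a cent are treated as float noise, not as a real mismatch
per_order['matches'] = np.isclose(per_order['total_paid'],
                                  per_order['calculated_total'], atol=0.005)

print(per_order)
# On the full data, the frequency of each gap shows whether a few amounts dominate
print(per_order['gap'].value_counts())
```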


```python
# Line total for each row: sale_price * product_quantity
completed_sales['calculated_total'] = completed_sales['sale_price'] * completed_sales['product_quantity']

# Sum the line totals for each order
calculated_totals = completed_sales.groupby('order_id')['calculated_total'].sum()

# total_paid is the same on every row of an order, so take the first value
paid_totals = completed_sales.groupby('order_id')['total_paid'].first()

# True where the line totals do not match total_paid (exact float comparison)
inconsistent_orders = calculated_totals != paid_totals

inconsistent_order_ids = inconsistent_orders[inconsistent_orders].index

inconsistent_orders
```

```
order_id
241423     True
242832     True
243330     True
245275    False
245595     True
          ...  
527042     True
527070     True
527074     True
527096     True
527112     True
Length: 46361, dtype: bool
```

```python
# All rows belonging to orders whose line totals do not match total_paid
inconsistent_rows = completed_sales[completed_sales['order_id'].isin(inconsistent_order_ids)]

# Inspect one of these orders
completed_sales[completed_sales['order_id'] == 241423]
```

| | order_id | orderline_id | date | name | desc | brand | sku | category | total_paid | product_quantity | regular_price | promo_price | sale_price | calculated_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 241423 | 1398738 | 2017-11-06 12:47:20 | LaCie Porsche Design Desktop Drive 4TB USB 3.0... | External Hard Drive 4TB 35-inch USB 3.0 for Ma... | LaCie | LAC0212 | Memory | 136.15 | 1 | 139.99 | 1.149.948 | 129.16 | 129.16 |


### Repair prices and calculate discounts

```python
# `sales_info` comes from a cell not shown here. It is assumed to be the merged
# orders/orderlines/products/brands table with a `state` column, the corrupted
# `price` (regular price) and `promo_price` strings, and `sale_price` as a float.

def split_str_on_dots_and_append_decimal(df, col):
    '''Remove all dots from the price strings in `col` and append .00'''
    return df.assign(**{col: df[col].astype(str).str.replace('.', '', regex=False) + '.00'})


def _insert_decimal_at_string_position(s, pos):
    '''Remove the existing dots from a price string and insert one at position `pos`'''
    digits = s.replace('.', '')
    return digits[:pos] + '.' + digits[pos:]


def _insert_decimal_in_regular_price(row):
    '''
    Keep moving the decimal point of the price string towards the end until the
    price is no lower than the sale_price. Stop early if both round to the same
    whole euro, because some sale_prices are slightly above the price.
    Return the price as a float rounded to two decimal places.
    '''
    decimal_position = 1
    price = _insert_decimal_at_string_position(row.price, decimal_position)
    max_position = len(price) - 1  # the number of digits: never move past the last one

    while float(price) < row.sale_price and decimal_position < max_position:
        if round(float(price), 0) == round(row.sale_price, 0):
            break
        decimal_position += 1
        price = _insert_decimal_at_string_position(price, decimal_position)

    return round(float(price), 2)


def transform_regular_price_to_float(df):
    return df.assign(price=df.apply(_insert_decimal_in_regular_price, axis=1))


def _insert_decimal_in_promo_price(row, decimal_position=-2):
    '''
    If the promo_price and the (already repaired) price round to the same whole
    euro, return the price: some promo_prices are slightly larger than their
    prices, e.g. 12.99 against 12.95.
    Otherwise keep moving the decimal point towards the start of the string until
    the promo_price is no larger than the price, and return it as a float rounded
    to two decimal places.
    '''
    promo_price = row.promo_price
    n_digits = len(promo_price.replace('.', ''))

    while float(promo_price) > row.price and -decimal_position <= n_digits:
        if round(float(promo_price), 0) == round(row.price, 0):
            return row.price
        promo_price = _insert_decimal_at_string_position(promo_price, decimal_position)
        decimal_position -= 1

    return round(float(promo_price), 2)


def transform_promo_price_to_floats(df):
    return df.assign(promo_price=df.apply(_insert_decimal_in_promo_price, axis=1))


def calculate_products_discounts(df):
    '''Discount of the promo_price relative to the regular price'''
    return df.assign(
        discount=(df.price - df.promo_price).round(2),
        discount_pc=((df.price - df.promo_price) / df.price * 100).round(2),
    )


def calculate_sales_discounts(df):
    '''Discount of the price actually paid (sale_price) relative to the regular price'''
    return df.assign(
        sales_discount=(df.price - df.sale_price).round(2),
        sales_discount_pc=((df.price - df.sale_price) / df.price * 100).round(2),
    )


# Only the completed orders are processed, because the row-by-row repair is slow
sales_info = (sales_info[sales_info.state == 'Completed']
              .pipe(start_pipeline)
              .pipe(split_str_on_dots_and_append_decimal, 'price')
              .pipe(transform_regular_price_to_float)
              .pipe(split_str_on_dots_and_append_decimal, 'promo_price')
              .pipe(transform_promo_price_to_floats)
              .pipe(calculate_products_discounts)
              .pipe(calculate_sales_discounts)
              )

sales_info.to_csv(path + 'sales_info_clean.csv', index=False)
```
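To make the decimal-shifting logic easier to follow, here is a hypothetical scalar sketch of the same idea. It works on one raw string and its comparison price instead of a DataFrame row, and skips the `.00` padding (which only adds trailing zeros, so in ordinary cases it does not change where the loop stops). `repair_regular_price` and `repair_promo_price` are illustrative names; the sample strings come from the `completed_sales.head(5)` and `product_quantity > 1` outputs above, and `'12.99'` against `12.95` is the pair mentioned in the promo-price docstring.

```python
def repair_regular_price(raw: str, sale_price: float) -> float:
    '''Move the decimal point rightwards until the value is at least sale_price,
    stopping early when both round to the same whole euro.'''
    digits = raw.replace('.', '')
    pos = 1
    value = float(digits[:pos] + '.' + digits[pos:])
    while value < sale_price and pos < len(digits):
        if round(value) == round(sale_price):
            break
        pos += 1
        value = float(digits[:pos] + '.' + digits[pos:])
    return round(value, 2)


def repair_promo_price(raw: str, regular_price: float) -> float:
    '''Move the decimal point leftwards from the end until the value is no larger
    than regular_price; if both round to the same whole euro, use regular_price.'''
    digits = raw.replace('.', '')
    shift = 0
    value = float(digits)
    while value > regular_price and shift < len(digits):
        if round(value) == round(regular_price):
            return regular_price
        shift += 1
        cut = len(digits) - shift
        value = float(digits[:cut] + '.' + digits[cut:])
    return round(value, 2)


print(repair_regular_price('139.99', 129.16))   # 139.99
print(repair_promo_price('1.149.948', 139.99))  # 114.99
print(repair_promo_price('99.898', 25.0))       # 9.99
print(repair_promo_price('12.99', 12.95))       # 12.95
```

The `'99.898'` case gives 9.99, the same promo_price shown for sku APP0698 in the `sales_info` output below.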

```python
sales_info
```

| | id | order_id | product_quantity | sku | sale_price | date | name | desc | price | promo_price | brand | total_paid | state | category | discount | discount_pc | sales_discount | sales_discount_pc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 1119116 | 299545 | 1 | OWC0100 | 47.49 | 2017-01-01 01:46:16 | OWC In-line Digital Temperature Sensor Kit HDD... | Kit temperature sensor for HDD iMac 21 inch an... | 60.99 | 49.99 | OWC | 51.48 | Completed | Accessories | 11.00 | 18.04 | 13.50 | 22.13 |
| 7 | 1119119 | 299546 | 1 | IOT0014 | 18.99 | 2017-01-01 01:50:34 | iOttie Easy View 2 Car Black Support | IPhone car holder 7 plus / 7/6 Plus / 6 / 5s /... | 22.95 | 16.99 | iOttie | 18.99 | Completed | Accessories | 5.96 | 25.97 | 3.96 | 17.25 |
| 8 | 1119120 | 295347 | 1 | APP0700 | 72.19 | 2017-01-01 01:54:11 | Apple 85W MagSafe 2 charger MacBook Pro screen... | Apple MagSafe 2 Charger for MacBook Pro 15-inc... | 89.00 | 64.99 | Apple | 72.19 | Completed | Hardware | 24.01 | 26.98 | 16.81 | 18.89 |
| 10 | 1119126 | 299549 | 1 | PAC0929 | 2565.99 | 2017-01-01 02:07:42 | Apple iMac 27 "Core i5 3.2GHz Retina 5K \| 32GB... | IMac desktop computer 27 inch Retina 5K RAM 32... | 3209.00 | 2667.99 | Pack | 2565.99 | Completed | unknown | 541.01 | 16.86 | 643.01 | 20.04 |
| 17 | 1119134 | 299556 | 1 | CRU0039-A | 60.90 | 2017-01-01 02:20:14 | (Open) Crucial 240GB SSD 7mm BX200 | SSD hard drive and high-speed performance with... | 76.99 | 70.70 | Crucial | 65.89 | Completed | Accessories | 6.29 | 8.17 | 16.09 | 20.90 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 291384 | 1649474 | 525664 | 1 | TUC0207 | 16.52 | 2018-03-14 11:45:05 | Tucano Elements Second Skin Macbook Sleeve 12 ... | velvety inner protective case for MacBook 12 i... | 24.99 | 19.99 | Tucano | 85.73 | Completed | Accessories | 5.00 | 20.01 | 8.47 | 33.89 |
| 291401 | 1649512 | 527070 | 2 | APP0698 | 9.99 | 2018-03-14 11:49:01 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | 25.00 | 9.99 | Apple | 24.97 | Completed | Accessories | 15.01 | 60.04 | 15.01 | 60.04 |
| 291406 | 1649522 | 527074 | 2 | APP0698 | 9.99 | 2018-03-14 11:49:36 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | 25.00 | 9.99 | Apple | 24.97 | Completed | Accessories | 15.01 | 60.04 | 15.01 | 60.04 |
| 291429 | 1649565 | 527096 | 3 | APP0698 | 9.99 | 2018-03-14 11:54:35 | Apple Lightning Cable Connector to USB 1m Whit... | Apple Lightning USB Cable 1 meter to charge an... | 25.00 | 9.99 | Apple | 34.96 | Completed | Accessories | 15.01 | 60.04 | 15.01 | 60.04 |
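The discount columns follow from `calculate_products_discounts` and `calculate_sales_discounts`; the first row above (sku OWC0100) can be checked by hand:

$$
\begin{aligned}
\text{discount} &= \text{price} - \text{promo\_price} = 60.99 - 49.99 = 11.00\\
\text{discount\_pc} &= \frac{\text{price} - \text{promo\_price}}{\text{price}} \times 100 = \frac{11.00}{60.99} \times 100 \approx 18.04\\
\text{sales\_discount} &= \text{price} - \text{sale\_price} = 60.99 - 47.49 = 13.50\\
\text{sales\_discount\_pc} &= \frac{\text{price} - \text{sale\_price}}{\text{price}} \times 100 = \frac{13.50}{60.99} \times 100 \approx 22.13
\end{aligned}
$$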


```python
orders['total_paid'].sum()
```

```
129159615.07
```

```python
orders.head(5)
```

| | order_id | created_date | total_paid | state |
|---|---|---|---|---|
| 0 | 241319 | 2017-01-02 13:35:40 | 44.99 | Cancelled |
| 1 | 241423 | 2017-11-06 13:10:02 | 136.15 | Completed |
| 2 | 242832 | 2017-12-31 17:40:03 | 15.76 | Completed |
| 3 | 243330 | 2017-02-16 10:59:38 | 84.98 | Completed |
| 4 | 243784 | 2017-11-24 13:35:19 | 157.86 | Cancelled |


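```python
# Revenue according to the order lines: unit_price * product_quantity, summed.
# The cleaned table is used because the raw unit_price column holds strings.
(orderlines_clean['product_quantity'] * orderlines_clean['unit_price']).sum()
```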

```
128658082.36
```
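The order-level and line-level revenue totals above differ by about half a million euros, under 0.4% of the order total. Both sums cover orders in every state, not only completed ones; the per-order gaps found in the integrity check are one plausible contributor, but that is not verified here.

$$
129\,159\,615.07 - 128\,658\,082.36 = 501\,532.71, \qquad \frac{501\,532.71}{129\,159\,615.07} \approx 0.39\,\%
$$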