# Changes 

In version 2, the recency and frequency scores varied very little, which left the cost of the product as the main driver of the RFM score and therefore of a customer's loyalty. To overcome that problem, in this iteration we have combined the earlier transactions data with a more recent extract.

- Transactions data combined from the year 2020 and from 2021-03 to 2021-07.

#### Additions to dataframes

- Consider the returns data


#### Addition to metrics
- While looking at the product distributions, the categories each customer bought from could be considered; a loyal customer is more likely to buy items from different categories
- Satisfaction: bought more than one product on different dates and didn't return them
- Customers buying flagship products: keep this as optional (for clients)
- Average monetary value instead of total sum
- Optimize RFM: qcut could use a lower range
- When a customer jumps from one loyalty score to another (of a certain range), what products did they buy during that time?
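
The category idea from the list above could be prototyped roughly as follows. This is only a sketch on made-up rows: the two toy frames stand in for `combined_transactions` and `merchandise_df`, and the `n_categories` column name is an assumption, not something the notebook defines.

```python
import pandas as pd

# Toy stand-ins for combined_transactions and merchandise_df (values are made up).
transactions = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2"],
    "product_id": [10, 11, 12, 10, 10],
})
merchandise = pd.DataFrame({
    "product_id": [10, 11, 12],
    "category_name": ["Aluminum", "Accessories", "Phone Case"],
})

# Attach a category to every transaction line, then count distinct categories per customer.
lines_with_category = transactions.merge(merchandise, on="product_id", how="left")
category_diversity = (
    lines_with_category.groupby("customer_id")["category_name"]
    .nunique()
    .rename("n_categories")  # hypothetical column name
    .reset_index()
)
print(category_diversity)
```

A count like this could be added to the loyalty score or used as an extra filter, depending on how much weight category diversity should carry.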

#### Additional parameters (notes)
- More repeat purchases
- Higher retention
- Higher average order value (AOV)
- Higher engagement
- Forgiveness for poor service
- More vocal customer advocates

In [2]:
#data processing and math
import pandas as pd

#visualization
import seaborn as sns
import plotly.offline as py
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import plotly.express as px

from datetime import datetime,timedelta

#pandas setting to see all columns and rows
pd.set_option("display.max_columns", 50)

In [2]:
#set path accordingly if running in your own local system and make sure you have the required data from gcp
path = '/home/td/Desktop/cerebra_clients_data/data_ridgewallet/'

In [21]:
#transactions data
combined_transformed_transaction_lines_df = pd.read_parquet(path+'combined_data_points/ridgewallet_parquet_files_combined_transformed_transaction_lines_part-00000-f186deaa-9bcb-4e68-ae38-810a9582b681-c000.snappy.parquet')
combined_transformed_transaction_lines_df_2 = pd.read_parquet(path+'combined_data_points/latest_combined_transaction_lines_20210301_20210708.parquet')
dfs = [combined_transformed_transaction_lines_df, combined_transformed_transaction_lines_df_2]
combined_transactions = pd.concat(dfs)

#customer data to get customer's affinity to marketing and their order count
customer_df = pd.read_json(path+'/customers/ridgewallet_raw_data_customers_customers.json')

#merchandise data for product information
merchandise_df = pd.read_parquet(path+'/combined_data_points/ridgewallet_parquet_files_combined_merchandise_part-00000-c2c7c29c-d500-45b8-b045-08a71464a5da-c000.snappy.parquet')

### Cleaning the input data

- Getting rid of customer ids containing '#' and 'null' customer ids in the transactions data
- Dropping columns which aren't being used from customer_df and converting the customer_ids to a consistent format

In [26]:
#dropping # and none customer_ids
combined_transactions = combined_transactions[combined_transactions['customer_id'].str.contains("#")==False]
combined_transactions = combined_transactions[combined_transactions['customer_id'].str.contains("None")==False]
combined_transactions = combined_transactions[combined_transactions['customer_id'].str.contains("nan")==False]

#actual metric file has these checks
#combined_transactions = combined_transactions[combined_transactions.customer_id != 'None']
#combined_transactions = combined_transactions[combined_transactions.customer_id != 'nan']

#dropping unnecessary columns which aren't being used (currently disabled by the surrounding triple quotes)
"""
customer_df = customer_df.drop(['email','created_at','updated_at', 'state', 'last_order_id', 'note', 'multipass_identifier',                                
                              'tax_exempt','tags','last_order_name','currency','addresses','accepts_marketing_updated_at',
                             'marketing_opt_in_level','tax_exemptions','admin_graphql_api_id','default_address'], axis=1)
"""
#renaming id to customer_id in customer_df and casting customer_id in the transactions to str
customer_df = customer_df.rename(columns={"id":"customer_id"}) 
combined_transactions['customer_id']=combined_transactions.customer_id.astype(str)
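
The three `str.contains` filters above could also be written as a single mask. The sketch below is one possible version, not the notebook's method: the helper name `drop_invalid_customer_ids` is hypothetical, and it matches the placeholders `'None'` and `'nan'` exactly (as in the commented-out checks) rather than as substrings.

```python
import pandas as pd

def drop_invalid_customer_ids(df, column="customer_id"):
    """Keep rows whose ID is present, has no '#', and is not a 'None'/'nan' placeholder."""
    ids = df[column]
    as_text = ids.astype(str)
    invalid = (
        ids.isna()                                  # real missing values
        | as_text.str.contains("#", regex=False)    # ids such as '#17029090181194'
        | as_text.isin(["None", "nan"])             # placeholders stored as text
    )
    cleaned = df.loc[~invalid].copy()
    cleaned[column] = cleaned[column].astype(str)
    return cleaned

# Made-up rows covering each case.
toy = pd.DataFrame({
    "customer_id": ["123", "#456", "None", "nan", None, "789"],
    "revenue": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
cleaned = drop_invalid_customer_ids(toy)
print(cleaned)
```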


# RFM calculation & Custom Heuristics

- Recency : 

    1. Calculate the last datetime and use it instead of present datetime
    2. Calculate the number of days from overall last orderdate of the entire transaction dataframe
    3. Recency score is inversely proportional to number of days   
 
 
- Frequency :
    1. For each customer check how many times they placed an order (each order id) 
    2. Could be over different dates
    3. Quantity of items bought not taken into account
  
 
- Monetary :
    1. Calculate the mean of the revenue brought in by each customer over the entire transaction period = average sales attributed to a customer
 
 
- Parameters considered :
    1. accepts marketing = true
    2. orders_count > 1
    3. verified email = true
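
The recency, frequency and monetary definitions above can be checked on a tiny example. This is a sketch on made-up rows; the output column names are illustrative, and frequency is computed here with `nunique` so that it follows the "each order id" wording.

```python
import pandas as pd

# Made-up transactions: customer "a" has two orders, customer "b" has one.
toy = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "order_id": [1, 2, 3],
    "order_date": pd.to_datetime(["2021-01-01", "2021-03-01", "2020-06-01"]),
    "revenue": [100.0, 50.0, 80.0],
})

# Reference date: one day after the last order in the data, not today's date.
reference_date = toy["order_date"].max() + pd.Timedelta(days=1)

rfm_toy = toy.groupby("customer_id").agg(
    days_since_last_order=("order_date", lambda d: (reference_date - d.max()).days),
    frequency=("order_id", "nunique"),
    avg_monetary_value=("revenue", "mean"),
).reset_index()
print(rfm_toy)
```

Customer "a" ordered one day before the reference date, so their recency score should end up high once the day counts are binned and reversed.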
    
    
## Merging 
- Merge customer_loyalty_df with customer_df to get a final_output_df of the customers who fulfill the parameters considered.
- Merge final_output_df with merchandise_df to match the product ids of the various orders to products

In [29]:
#assuming that the transactions are always updated

#we use the last date in transactions for the recency calculation instead of using the present date, to relatively balance the recency values
last_date_time_in_transactions = combined_transactions['order_date'].max() + timedelta(days=1)

#this calculation needs optimizing: on a large dataframe it takes a long time
#for starters, the data could be split into small batches for this calculation
#note: 'count' on order_id counts transaction lines, so an order with several products adds more than 1 to frequency
rfm = combined_transactions.groupby(['customer_id']).agg({'order_date': lambda x: (last_date_time_in_transactions - x.max()).days,
                                                                             'order_id': 'count',
                                                                             'revenue': 'mean'})
rfm.rename(columns = {'order_date': '#days_since_last_order',
                     'order_id': 'frequency',
                     'revenue': 'avg_monetary_value'}, inplace=True)

#changing the index of customer-ids into a column
rfm = rfm.reset_index(level=['customer_id'])

In [30]:
rfm

Unnamed: 0,customer_id,#days_since_last_order,frequency,avg_monetary_value
0,1000000028746,849,1,115.0
1,1000000290890,849,1,72.0
2,1000000880714,849,1,230.0
3,1000001568842,849,1,7.0
4,1000003371082,849,1,115.0
...,...,...,...,...
774509,999976304714,503,2,98.5
774510,999980957770,91,4,55.5
774511,999991050314,849,1,105.0
774512,999997669450,849,1,105.0


In [11]:
rfm

Unnamed: 0,customer_id,frequency,order_id,avg_monetary_value
0,#17029090181194,206,1,87.0
1,#17032455553098,206,1,210.0
2,#17042727403594,206,1,75.0
3,#17065714057290,206,1,500.0
4,#17071264792650,206,1,105.0
...,...,...,...,...
774638,999976304714,503,2,98.5
774639,999980957770,91,4,55.5
774640,999991050314,849,1,105.0
774641,999997669450,849,1,105.0


In [31]:
r_score_range = range(1,6)
#f_score_range = range(1,3)
m_score_range = range(1,4)

r_score = pd.qcut(rfm['#days_since_last_order'], q=5, labels=r_score_range)

#q=10 for frequency because otherwise it doesn't split the distribution into two groups
#f_score = pd.qcut(rfm['frequency'], q=10, labels=f_score_range, duplicates='drop')

#taking frequency directly instead of qcutting it

m_score = pd.qcut(rfm['avg_monetary_value'], q=3, labels=m_score_range, duplicates='drop')

#'R_rev' because higher number of days reflects lower recency score 
rfm_with_labels = rfm.assign(R_rev = r_score.values, M = m_score.values)

#converting all the columns to int
cols = ['R_rev','frequency','M']
rfm_with_labels[cols]=rfm_with_labels[cols].apply(pd.to_numeric, errors="raise", axis=1)

#adjusting the R score
rfm_with_labels['R'] = 6 - rfm_with_labels['R_rev']

#adding RFM scores
rfm_with_labels['loyalty_score'] = rfm_with_labels[['R', 'frequency', 'M']].sum(axis=1)

#cleaning up the final dataframe
rfm_with_score = rfm_with_labels.reset_index()
customer_loyalty_df = rfm_with_score.drop(['index','R_rev'], axis = 1)
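
When many customers share the same value (for example 849 days since the last order), `pd.qcut` can end up with duplicate bin edges, and `duplicates='drop'` then leaves fewer bins than labels. One common workaround, sketched here as an alternative rather than as what this notebook does, is to rank the values first so every bin edge is unique. The helper `quantile_score` is hypothetical, and ties are broken by row order, which is arbitrary.

```python
import pandas as pd

def quantile_score(values, n_bins, higher_is_better=True):
    """Score 1..n_bins by quantile; ranking first guarantees unique bin edges."""
    ranks = values.rank(method="first", ascending=True)
    bins = pd.qcut(ranks, q=n_bins, labels=False) + 1  # 1 = lowest values
    if not higher_is_better:
        bins = (n_bins + 1) - bins  # reverse so that low values score high
    return bins.astype(int)

# Made-up rows with heavy ties in the recency column.
toy = pd.DataFrame({
    "days_since_last_order": [849, 849, 849, 849, 503, 91, 849, 849, 77, 163],
    "avg_monetary_value": [115.0, 72.0, 230.0, 7.0, 98.5, 55.5, 105.0, 105.0, 115.0, 101.5],
})
toy["R"] = quantile_score(toy["days_since_last_order"], 5, higher_is_better=False)
toy["M"] = quantile_score(toy["avg_monetary_value"], 3)
print(toy)
```

The trade-off is that customers with identical values can land in different bins, so this suits a first pass more than a final scoring rule.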


In [159]:
#setting up custom heuristics
customer_loyalty_df['customer_id'] = customer_loyalty_df.customer_id.astype(str)

#input for custom heuristic
#when customer_df and customer_loyalty_df are merged, all customer_ids (current customers) are expected to be present in the customer_df
custom_heuristic_input_df = pd.merge(customer_loyalty_df, customer_df, on='customer_id', how='left')

#the three heuristic rules
custom_heuristic_output_df = custom_heuristic_input_df.loc[custom_heuristic_input_df['orders_count']>1]
custom_heuristic_output_df = custom_heuristic_output_df[custom_heuristic_output_df['accepts_marketing']==True]
custom_heuristic_output_df = custom_heuristic_output_df[custom_heuristic_output_df['verified_email']==True]

#output
custom_heuristic_output_df = custom_heuristic_output_df.reset_index()
final_output_df = custom_heuristic_output_df.drop(['index'],axis=1)
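
The three rules can also be combined into one boolean mask, which makes it easier to see how many customers had no match in `customer_df` after the left merge. A minimal sketch on made-up rows; the variable names `passes_heuristics` and `unmatched` are illustrative.

```python
import pandas as pd

# Made-up merge result: customer "4" had no match in customer_df, so its fields are missing.
toy = pd.DataFrame({
    "customer_id": ["1", "2", "3", "4"],
    "orders_count": [2.0, 1.0, 3.0, None],
    "accepts_marketing": [True, True, False, None],
    "verified_email": [True, True, True, None],
})

# All three rules at once; missing values compare as False and are filtered out.
passes_heuristics = (
    (toy["orders_count"] > 1)
    & toy["accepts_marketing"].eq(True)
    & toy["verified_email"].eq(True)
)
filtered = toy.loc[passes_heuristics].reset_index(drop=True)

# Number of customers that the left merge could not match.
unmatched = int(toy["orders_count"].isna().sum())
print(filtered)
print(unmatched)
```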

In [160]:
final_output_df

Unnamed: 0,customer_id,#days_since_last_order,frequency,avg_monetary_value,M,R,loyalty_score,accepts_marketing,orders_count,total_spent,verified_email
0,3893916467274,163,4,101.500000,2,4,10,True,2.0,223.00,True
1,3893928722506,129,4,85.750000,2,4,10,True,2.0,240.08,True
2,3893943894090,184,7,20.285714,1,4,12,True,2.0,32.80,True
3,3893965029450,142,2,115.000000,3,4,9,True,2.0,243.80,True
4,3893975646282,112,4,93.250000,2,5,11,True,2.0,237.23,True
...,...,...,...,...,...,...,...,...,...,...,...
2932,5083002994762,79,4,87.500000,2,5,11,True,2.0,194.20,True
2933,5083456569418,79,6,47.000000,1,5,12,True,2.0,149.48,True
2934,5084289564746,75,16,91.500000,2,5,23,True,2.0,423.28,True
2935,5084759785546,77,4,115.000000,3,5,12,True,2.0,124.03,True


In [188]:
#matching customers with products they bought
merge_output_with_transaction_data = pd.merge(final_output_df, combined_transactions, on='customer_id', how='left')

merchandise_df['product_id'] = merchandise_df.product_id.astype(int)
merge_output_with_transaction_data['product_id'] = merge_output_with_transaction_data.product_id.astype(int)
merge_output_with_transaction_data['customer_id'] = merge_output_with_transaction_data.customer_id.astype(int)


#merchandise_df has no customer_id column, so join on product_id only
#(two left keys with one right key raises "ValueError: len(right_on) must equal len(left_on)")
product_loyalty_df = pd.merge(merge_output_with_transaction_data, merchandise_df, how='left', on='product_id')

In [184]:
merchandise_df

Unnamed: 0,variant_id,product_id,category_id,variant_name,product_name,category_name,original_price,image_link
0,32552653881418,4810467541066,Aluminum,Both,18 KARAT GOLD PLATED,Aluminum,225.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
1,32663668752458,4900303536202,Accessories,Burnt,Accessory Kit,Accessories,54.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
2,32663675174986,4900303536202,Accessories,Carbon,Accessory Kit,Accessories,54.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
3,29021750689866,3737038782538,Aluminum,Cash Strap,Aluminum - Matte White,Aluminum,85.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
4,29021750722634,3737038782538,Aluminum,Money Clip,Aluminum - Matte White,Aluminum,85.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
...,...,...,...,...,...,...,...,...
1895,702,3712449806410,Gift Card,Replacement Money Clip,Wallet Parts,Gift Card,0.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
1896,703,3712449806410,Gift Card,T5 Torx Driver,Wallet Parts,Gift Card,0.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
1897,711,3712449806410,Gift Card,Both Bundle Items,Wallet Parts,Gift Card,12.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
1898,507,4672560955466,Aluminum,Cash Strap,Woodland Camo,Aluminum,85.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
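
The output above shows several rows per `product_id` (one per variant), so a merge on `product_id` alone would repeat each transaction line once per variant. One way around this, sketched under the assumption that only product-level columns are needed, is to reduce the merchandise data to one row per product first. The toy frames and the `products` name are made up for illustration.

```python
import pandas as pd

# Made-up merchandise rows: product 10 has two variants plus a repeated row.
merchandise = pd.DataFrame({
    "variant_id": [1, 2, 3, 1],
    "product_id": [10, 10, 11, 10],
    "variant_name": ["Burnt", "Carbon", "Brown", "Burnt"],
    "product_name": ["Accessory Kit", "Accessory Kit", "Card Case", "Accessory Kit"],
    "category_name": ["Accessories", "Accessories", "Phone Case", "Accessories"],
})
lines = pd.DataFrame({
    "customer_id": [100, 100, 200],
    "product_id": [10, 11, 10],
    "revenue": [54.0, 50.0, 54.0],
})

# Naive merge: every line is repeated once per matching merchandise row.
naive = lines.merge(merchandise, on="product_id", how="left")

# One row per product, keeping only product-level columns.
products = merchandise[["product_id", "product_name", "category_name"]].drop_duplicates(
    subset="product_id"
)

# validate raises if the right side still has repeated product_ids.
product_loyalty = lines.merge(products, on="product_id", how="left", validate="many_to_one")
print(len(naive), len(product_loyalty))
```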


In [179]:
product_loyalty_df.to_excel('prd.xlsx')


Ignoring URL 'https://cdn.shopify.com/s/files/1/0613/6213/products/burnt-collection_f5bca21b-2e1b-4600-95e3-4da27de973fc.jpg?v=1617139677' since it exceeds Excel's limit of 65,530 URLS per worksheet.



In [178]:
merge_output_with_transaction_data

Unnamed: 0,customer_id,#days_since_last_order,frequency,avg_monetary_value_x,M,R,loyalty_score,accepts_marketing,orders_count,total_spent,verified_email,order_id,product_id,variant_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip,avg_monetary_value_y
0,3893916467274,163,4,101.50,2,4,10,True,2.0,223.00,True,2994887065674,4497685643338,R913,web,1997985808458,2021-01-01,28.0,1,USD,59,28.0,04870,1
1,3893916467274,163,4,101.50,2,4,10,True,2.0,223.00,True,2994887065674,3795636617290,615,web,1997985808458,2021-01-01,175.0,1,USD,59,175.0,04870,1
2,3893916467274,163,4,101.50,2,4,10,True,2.0,223.00,True,3029780627530,4497685643338,R913,shopify draft order,,2021-01-27,28.0,1,USD,59,28.0,04870,1
3,3893916467274,163,4,101.50,2,4,10,True,2.0,223.00,True,3029780627530,3795636617290,615,shopify draft order,,2021-01-27,175.0,1,USD,59,175.0,04870,1
4,3893928722506,129,4,85.75,2,4,10,True,2.0,240.08,True,3646916558922,4725951463498,509,web,2696761802826,2021-03-02,105.0,1,USD,59,105.0,85296,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15241,5085681320010,77,8,85.00,2,5,15,True,3.0,88.83,True,3722026614858,347698423,232,web,2696761802826,2021-04-23,85.0,1,USD,59,85.0,96701,1
15242,5085681320010,77,8,85.00,2,5,15,True,3.0,88.83,True,3721982148682,347698423,232,web,2696761802826,2021-04-23,85.0,1,USD,59,85.0,96701,1
15243,5085681320010,77,8,85.00,2,5,15,True,3.0,88.83,True,3721974317130,347698423,232,web,2696761802826,2021-04-23,85.0,1,USD,59,85.0,96701,1
15244,5085681320010,77,8,85.00,2,5,15,True,3.0,88.83,True,3722026614858,347698423,232,web,2696761802826,2021-04-23,85.0,1,USD,59,85.0,96701,1


In [172]:
#merge_output_with_transaction_data.to_excel('mrg.xlsx')

# Plots

1. Use customer_loyalty_df for loyalty level distribution

In [10]:
rfm.loc[rfm['customer_id']=="3778760212554"].head(50)

Unnamed: 0,customer_id,frequency,avg_monetary_value
442785,3778760212554,41,344.02439


## Analysing the data

- Checking a particular customer to work out how to calculate average order value
    - There seem to be multiple rows for the same order_date (one per product in the order) by the same customer

In [12]:
combined_transactions.loc[combined_transactions['customer_id']=="3778760212554"].head(100)

Unnamed: 0,order_id,product_id,variant_id,customer_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip
2,2975452364874,347692039,442,3778760212554,web,1997985808458.0,2020-12-16,125.0,6,USD,59,750.0,73069
3,2975452364874,90267582478,282,3778760212554,web,1997985808458.0,2020-12-16,75.0,6,USD,59,450.0,73069
82953,2920320041034,347698423,232,3778760212554,web,1997985808458.0,2020-11-20,75.0,15,USD,59,1125.0,73069
82954,2920320041034,90267582478,282,3778760212554,web,1997985808458.0,2020-11-20,75.0,4,USD,59,300.0,73069
82955,2920320041034,11471829006,S262,3778760212554,web,1997985808458.0,2020-11-20,75.0,5,USD,59,375.0,73069
84090,2918329155658,11471829006,S262,3778760212554,web,1997985808458.0,2020-11-19,75.0,20,USD,59,1500.0,73069
84091,2918329155658,347698423,232,3778760212554,web,1997985808458.0,2020-11-19,75.0,10,USD,59,750.0,73069
84092,2918329155658,90267582478,282,3778760212554,web,1997985808458.0,2020-11-19,75.0,5,USD,59,375.0,73069
84093,2918329155658,347148895,S222,3778760212554,web,1997985808458.0,2020-11-19,75.0,2,USD,59,150.0,73069
211115,2907921711178,90267582478,282,3778760212554,shopify draft order,,2020-11-12,75.0,4,USD,59,300.0,73069
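
Using the first five rows of the output above, the sketch below compares line-level and order-level aggregation. It is an illustration of one possible definition of average order value (sum revenue per `order_id`, then average per customer), not the notebook's current calculation; the result column names are assumptions.

```python
import pandas as pd

# First five transaction lines of customer 3778760212554, copied from the output above.
lines = pd.DataFrame({
    "customer_id": ["3778760212554"] * 5,
    "order_id": [2975452364874, 2975452364874, 2920320041034, 2920320041034, 2920320041034],
    "revenue": [750.0, 450.0, 1125.0, 300.0, 375.0],
})

# Line level: what 'count' and 'mean' on the raw lines produce.
line_level = lines.groupby("customer_id").agg(
    n_lines=("order_id", "count"),
    avg_line_revenue=("revenue", "mean"),
).reset_index()

# Order level: total each order first, then average the order totals.
order_totals = lines.groupby(["customer_id", "order_id"], as_index=False)["revenue"].sum()
order_level = order_totals.groupby("customer_id").agg(
    n_orders=("order_id", "nunique"),
    avg_order_value=("revenue", "mean"),
).reset_index()

print(line_level)
print(order_level)
```

On these rows the line-level view reports 5 purchases averaging 600.0, while the order-level view reports 2 orders averaging 1500.0.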


In [36]:
#rfm score check - 'recency'
#no of days = 849, while calculating recency
#low recency score
combined_transactions.loc[combined_transactions['customer_id']=="999991050314"].head(50)

Unnamed: 0,order_id,product_id,variant_id,customer_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip
790577,859213758538,11361067278,332,999991050314,web,796006023242,2019-03-13,105.0,1,USD,59,105.0,60657


In [37]:
#rfm score check - 'recency'
#no of days = 91 , while calculating recency
#high recency score
combined_transactions.loc[combined_transactions['customer_id']=="999980957770"].head(50)

Unnamed: 0,order_id,product_id,variant_id,customer_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip
282892,2904745705546,4810375757898,C66,999980957770,web,1997985808458.0,2020-11-10,50.0,1,USD,59,50.0,46835
420535,3701402173514,4810375757898,C66,999980957770,4072793,,2021-04-09,50.0,1,USD,59,50.0,46835
790578,859203993674,347148895,222,999980957770,web,796006023242.0,2019-03-13,72.0,1,USD,59,72.0,46835
1083,3701402173514,4810375757898,C66,999980957770,4072793,,2021-04-09,50.0,1,USD,59,50.0,46835
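
In the output above, order 3701402173514 appears twice with identical values, which suggests the two parquet files may overlap for part of 2021. If so, the duplicate lines inflate frequency. A minimal sketch of removing them after the concat, assuming a fully identical row is never a legitimate repeat purchase (the two toy frames below only mimic the two input files):

```python
import pandas as pd

# Toy stand-ins for the two transaction files; the second repeats one line of the first.
older = pd.DataFrame({
    "order_id": [2904745705546, 3701402173514],
    "product_id": [4810375757898, 4810375757898],
    "customer_id": ["999980957770", "999980957770"],
    "order_date": ["2020-11-10", "2021-04-09"],
    "revenue": [50.0, 50.0],
})
newer = pd.DataFrame({
    "order_id": [3701402173514],
    "product_id": [4810375757898],
    "customer_id": ["999980957770"],
    "order_date": ["2021-04-09"],
    "revenue": [50.0],
})

combined = pd.concat([older, newer], ignore_index=True)

# Drop rows that are identical in every column.
deduplicated = combined.drop_duplicates().reset_index(drop=True)
print(len(combined), len(deduplicated))
```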


In [38]:
#what item did customer_id - 999980957770 repeatedly buy? = phone cases
merchandise_df.loc[merchandise_df['product_id']=='4810375757898']

Unnamed: 0,variant_id,product_id,category_id,variant_name,product_name,category_name,original_price,image_link
64,32551106248778,4810375757898,Phone Case,Brown / 12 Mini,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
65,32551106281546,4810375757898,Phone Case,Black / 12 Mini,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
66,32551106314314,4810375757898,Phone Case,Brown / 12 / 12 Pro,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
67,32551106347082,4810375757898,Phone Case,Black / 12 / 12 Pro,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
68,32551106379850,4810375757898,Phone Case,Brown / 12 Pro Max,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
69,32551106412618,4810375757898,Phone Case,Black / 12 Pro Max,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
251,32551106248778,4810375757898,Phone Case,Brown / 12 Mini,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
252,32551106281546,4810375757898,Phone Case,Black / 12 Mini,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
253,32551106314314,4810375757898,Phone Case,Brown / 12 / 12 Pro,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...
254,32551106347082,4810375757898,Phone Case,Black / 12 / 12 Pro,The Card Case - iPhone 12,Phone Case,50.0,https://cdn.shopify.com/s/files/1/0613/6213/pr...


In [46]:
# frequency check
# customer id 999976304714: #days_since_last_order = 503 = number of days since last purchase
# frequency (counted from order_id) = 2 = how frequently they bought items
combined_transactions.loc[combined_transactions['customer_id']=="999976304714"].head(50)

Unnamed: 0,order_id,product_id,variant_id,customer_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip
643792,859200651338,90266435598,271,999976304714,web,796006023242,2019-03-13,72.0,1,USD,59,72.0,89011-2407
849516,2175442092106,347692039,442,999976304714,web,796006023242,2020-02-22,125.0,1,USD,59,125.0,89011-2407


In [5]:
combined_transactions

Unnamed: 0,order_id,product_id,variant_id,customer_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip
0,2975528452170,3798739451978,443,3874371567690,web,1997985808458,2020-12-16,125.0,1,USD,59,125.0,34291
1,2975482183754,347692039,K254,3874330476618,web,1997985808458,2020-12-16,137.0,1,USD,59,137.0,96003
2,2975452364874,347692039,442,3778760212554,web,1997985808458,2020-12-16,125.0,6,USD,59,750.0,73069
3,2975452364874,90267582478,282,3778760212554,web,1997985808458,2020-12-16,75.0,6,USD,59,450.0,73069
4,2975444926538,3798739451978,444,3874301509706,web,1997985808458,2020-12-16,125.0,1,USD,59,125.0,78665
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1839,3758043955274,347698423,231,5114385236042,web,2696761802826,2021-05-20,85.0,1,USD,59,85.0,77372
1840,3722800758858,3795636617290,615,5086285430858,web,2696761802826,2021-04-24,175.0,1,USD,59,175.0,08820
1841,3690327081034,3795636617290,615,3366304219210,web,2696761802826,2021-04-01,175.0,1,USD,59,175.0,46835
1842,3690327081034,4826176749642,R139,3366304219210,web,2696761802826,2021-04-01,15.0,1,USD,59,15.0,46835


In [181]:
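#returns no rows, probably because customer_id was cast to int in the merge cell; comparing against the integer 3893916467274 should match
merge_output_with_transaction_data.loc[merge_output_with_transaction_data['customer_id']=="3893916467274"].head(10)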

Unnamed: 0,customer_id,#days_since_last_order,frequency,avg_monetary_value_x,M,R,loyalty_score,accepts_marketing,orders_count,total_spent,verified_email,order_id,product_id,variant_id,source_id,store_id,order_date,selling_price,quantity,currency,client_id,revenue,shipping_zip,avg_monetary_value_y




In [1]:
import pandas as pd  #pd must be imported before this cell runs

l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
df = pd.DataFrame(l, columns=["a", "b", "c"])