## Product Cost of Customer Acquisition
---
#### Another Practical Data Science Primer

The marketing team for an eCommerce platform has asked you to help determine how much the company should spend to acquire one new customer.

The eCommerce site charges a fee of 10% of its customers' sales.

You are given three tables:
1. Invoice Table: information on every transaction.
2. Product Table: details about the individual products sold.
3. Customer Table: details about each customer and their location.

##### Questions:
---
1. What is the eCommerce company's customer acquisition cost (CAC)?

    1.1 CAC = (Sales and Marketing Expense) / (Number of New Customers)

2. What is the average lifetime value (LTV) of a customer?

    2.1 What is the LTV to CAC ratio?
    
    2.2 Can the company afford to spend more to acquire a new customer?

3. What is the return rate, and which product is returned the most?

    3.1 Return rate = (total items returned) / (total items sold)

4. If the company decides to expand into another country, which country is the most feasible choice, and why?

5. Which was the most successful quarter in acquiring new customers?

    5.1 Note that this depends on multiple factors.

6. Devise a recommendation system based on the purchase data:

    6.1 If a customer buys product A and B, what is the probability that the customer will buy product C?
    
    6.2 What are the most purchased items by people who purchased product D? Hint: consider collaborative filtering methods.
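
The formulas given in 1.1 and 3.1, and the ratio asked for in 2.1, can be sketched as small helper functions. This is only a minimal sketch: the function names are hypothetical, and every number in the usage lines is a made-up placeholder rather than a value from the dataset.

```python
def customer_acquisition_cost(sales_marketing_expense, new_customers):
    """CAC = (sales and marketing expense) / (number of new customers)."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return sales_marketing_expense / new_customers


def return_rate(items_returned, items_sold):
    """Return rate = (total items returned) / (total items sold)."""
    if items_sold <= 0:
        raise ValueError("items_sold must be positive")
    return items_returned / items_sold


def ltv_to_cac_ratio(ltv, cac):
    """Ratio of average customer lifetime value to acquisition cost."""
    if cac <= 0:
        raise ValueError("cac must be positive")
    return ltv / cac


# Placeholder inputs (assumptions for illustration only).
cac = customer_acquisition_cost(sales_marketing_expense=50_000, new_customers=1_000)
rate = return_rate(items_returned=30, items_sold=1_500)
ratio = ltv_to_cac_ratio(ltv=150.0, cac=cac)

print(f"CAC: {cac:.2f}")
print(f"Return rate: {rate:.2%}")
print(f"LTV to CAC ratio: {ratio:.2f}")
```

A ratio above 1 means a customer is worth more over their lifetime than they cost to acquire, which is the comparison question 2.2 relies on.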

In [None]:
# import needed libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# import local files for analysis

df_customer = pd.read_csv('./Customer_info_table.csv')
df_prod = pd.read_csv('./Product_info_table.csv')
df_inv = pd.read_csv('./Invoice_info_table.csv')

In [None]:
# View Customer Info table & stats:

print('Customer Info:')
df_customer.info()  # info() prints its own report and returns None
print(f'\nSize of Cust Info Table: {df_customer.shape}')
print(f'# of unique customers: {df_customer.CustomerID.nunique()}')
print(f'Any null CustomerID values: {df_customer.CustomerID.isnull().any()}\n')
df_customer.head()

In [None]:
# Take a look at the Product Table:

print('Product Info:')
df_prod.info()  # info() prints its own report and returns None
print()
print(f'Size of Product Table: {df_prod.shape}')
print(f'# of unique Products: \n{df_prod.nunique()}\n')
df_prod.head()

In [None]:
# Take a look at the Customer Invoice Table:
print('Invoice Data:')
df_inv.info()  # info() prints its own report and returns None
print()
print(f'Size of Invoice Table: {df_inv.shape}')
print(f'# of unique customers: {df_inv.CustomerID.nunique()}\n')
df_inv.head()

In [None]:
# Note: both tables contain the same number of unique CustomerID values.

print(f'# of unique customers by invoice: {df_inv.CustomerID.nunique()}')
print(f'# of unique customers by customer database: {df_customer.CustomerID.nunique()}')
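
Equal unique counts do not by themselves guarantee that both tables contain the same customers. A stricter check, sketched below, compares the ID sets directly. The two small frames are hypothetical stand-ins for `df_customer` and `df_inv`; only the `CustomerID` column name comes from the tables above.

```python
import pandas as pd

# Hypothetical stand-ins: both frames have 3 unique IDs, but not the same 3.
customers = pd.DataFrame({"CustomerID": [1, 2, 3]})
invoices = pd.DataFrame({"CustomerID": [1, 1, 2, 4]})

customer_ids = set(customers["CustomerID"].dropna())
invoice_ids = set(invoices["CustomerID"].dropna())

# IDs present in one table but missing from the other.
only_in_customers = customer_ids - invoice_ids
only_in_invoices = invoice_ids - customer_ids
same_ids = customer_ids == invoice_ids

print(f"Same number of unique IDs: {len(customer_ids) == len(invoice_ids)}")
print(f"Identical ID sets: {same_ids}")
print(f"Only in customer table: {sorted(only_in_customers)}")
print(f"Only in invoice table: {sorted(only_in_invoices)}")
```

If both difference sets come back empty on the real tables, the matching counts really do describe the same customers.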

#### Question 1: Customer Acquisition Cost
---

In [None]:
# The Customer Info table and the Invoice table share the same number
# of unique CustomerID values, so merge them on the CustomerID column.
# reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

df_ci = pd.merge(df_customer, df_inv, how='left', on='CustomerID')
print(f'Size of df_ci: {df_ci.shape}')

# reference for duplicates: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
print(f'Check for duplicated values: {df_ci.duplicated().value_counts()}\n')
df_ci.head()

# If the count of False (non-duplicated) rows equals the number of rows in df_ci,
# the merge produced no fully duplicated rows.
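
A duplicate check alone does not show whether every customer actually matched an invoice. As a hedged alternative, `pd.merge` accepts `validate` and `indicator` arguments that make the merge check itself. The sketch below uses hypothetical toy frames; the `Country` and `InvoiceNo` columns are assumptions for illustration, not necessarily columns in the real tables.

```python
import pandas as pd

# Hypothetical stand-ins for df_customer and df_inv.
customers = pd.DataFrame({"CustomerID": [1, 2, 3], "Country": ["A", "B", "C"]})
invoices = pd.DataFrame({"CustomerID": [1, 1, 2], "InvoiceNo": [100, 101, 102]})

merged = pd.merge(
    customers,
    invoices,
    how="left",
    on="CustomerID",
    validate="one_to_many",  # raises MergeError if CustomerID repeats in customers
    indicator=True,          # adds a _merge column: 'both', 'left_only', 'right_only'
)

# Customers that appear in the customer table but have no invoice rows.
no_invoice = merged.loc[merged["_merge"] == "left_only", "CustomerID"].tolist()
matched_rows = int((merged["_merge"] == "both").sum())

print(f"Rows after merge: {len(merged)}")
print(f"Rows matched in both tables: {matched_rows}")
print(f"Customers without invoices: {no_invoice}")
```

With a left merge, any `left_only` row is a customer with no transactions, which matters when counting customers for CAC.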

In [None]:
df_ci.InvoiceDate.value_counts()