# Project Summary

In [20]:
import pandas as pd

## Overall Approach

This project first preprocesses the raw data: checking for null values, computing the relevant features, and aggregating them.

### Preprocessing
The preprocessing steps are:
1. Aggregate the given consumer and merchant data.
2. Extract each merchant's type and take rate from its tags.
3. Calculate the total income, total number of transactions, and mean transaction amount for each merchant from the transaction dataset.
4. Calculate the average monthly BNPL profit as `total_income * take_rate / 100 / number_of_months`, where `take_rate` is a percentage.
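
Steps 3 and 4 can be sketched with a pandas `groupby`. This is a minimal sketch rather than the project's actual code: the transaction column names (`merchant_abn`, `dollar_value`), the helper `summarise_merchants` and the toy values are hypothetical, and `take_rate` is assumed to be a percentage, as in the `merchant_detail` table.

```python
import pandas as pd

# Hypothetical toy transactions; the real column names may differ.
transactions = pd.DataFrame({
    "merchant_abn": [1, 1, 1, 2, 2],
    "dollar_value": [100.0, 300.0, 200.0, 50.0, 150.0],
})
# Take rates are percentages, as in merchant_detail.take_rate.
merchants = pd.DataFrame({"merchant_abn": [1, 2], "take_rate": [5.0, 2.0]})


def summarise_merchants(transactions, merchants, number_of_months):
    """Step 3: per-merchant totals and mean; step 4: average monthly BNPL profit."""
    summary = (
        transactions.groupby("merchant_abn")["dollar_value"]
        .agg(total_income="sum", total_transactions="count",
             mean_transaction_amount="mean")
        .reset_index()
        .merge(merchants, on="merchant_abn")
    )
    # take_rate is in percent, so divide by 100 before applying it.
    summary["BNPL_monthly_profit"] = (
        summary["total_income"] * summary["take_rate"] / 100 / number_of_months
    )
    return summary


# Assume the toy data covers a 3-month window.
result = summarise_merchants(transactions, merchants, number_of_months=3)
print(result)
```

As a check on the percentage assumption, the first row of the `merchant_detail` output gives 8657277.10 × 6.82 / 100 / 18 ≈ 32801, matching its `BNPL_monthly_profit` for an 18-month window.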

In [21]:
# The preprocessed data
merchant_detail = pd.read_parquet('../data/curated/merchant_detail.parquet')
merchant_detail

|  | merchant_abn | merchant_name | type | take_rate | total_income | total_transactions | mean_transaction_amount | BNPL_monthly_profit |
|---|---|---|---|---|---|---|---|---|
| 0 | 79827781481 | Amet Risus Inc. | furniture, home furnishings and equipment shop... | 6.82 | 8657277.10 | 4251 | 2037.0 | 32801.0 |
| 1 | 48534649627 | Dignissim Maecenas Foundation | opticians, optical goods, and eyeglasses | 6.64 | 8316735.67 | 58685 | 142.0 | 30680.0 |
| 2 | 32361057556 | Orci In Consequat Corporation | gift, card, novelty, and souvenir shops | 6.61 | 8339994.52 | 75853 | 110.0 | 30626.0 |
| 3 | 86578477987 | Leo In Consulting | watch, clock, and jewelry repair shops | 6.43 | 8443178.70 | 241336 | 35.0 | 30161.0 |
| 4 | 38700038932 | Etiam Bibendum Industries | tent and awning shops | 6.31 | 8482176.66 | 6341 | 1338.0 | 29735.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4021 | 46391946761 | Libero Nec Ligula LLP | stationery, office supplies and printing and w... | 0.37 | 13442.35 | 15 | 896.0 | 3.0 |
| 4022 | 64196096120 | Turpis Incorporated | gift, card, novelty, and souvenir shops | 0.23 | 16159.06 | 156 | 104.0 | 2.0 |
| 4023 | 25019506172 | Proin Vel Inc. | stationery, office supplies and printing and w... | 0.16 | 21967.78 | 31 | 709.0 | 2.0 |
| 4024 | 57079678065 | Hendrerit Donec Limited | digital goods: books, movies, music | 0.18 | 20034.94 | 590 | 34.0 | 2.0 |


Then, based on the types of products the merchants provide, we grouped them into five main categories: Technology and Equipment, Luxury, Entertainment, Services, and Stationeries.

In [22]:
# Assign each merchant type to one of five broad categories.
# The strings should match the values in merchant_detail['type'] exactly (spelling and spacing included).
Technology_and_Equipment = ['opticians, optical goods, and eyeglasses',
                            'furniture, home furnishings and equipment shops, and manufacturers, except appliances',
                            'telecom',
                            'computers, computer peripheral equipment, and software',
                            'equipment, tool, furniture, and appliance rent al and leasing']

Entertainment = ['digital goods: books, movies, music',
                 'tent and awning shops',
                 'hobby, toy and game shops',
                 'florists supplies, nursery stock, and flowers',
                 'music shops - musical instruments, pianos, and sheet music']

Luxury = ['antique shops - sales, repairs, and restoration services',
          'art dealers and galleries',
          'artist supply and craft shops',
          'jewelry, watch, clock, and silverware shops',
          'motor vehicle supplies and new parts',
          'bicycle shops - sales and service']

Stationaries = ['books, periodicals, and newspapers',
                'gift, card, novelty, and souvenir shops',
                'stationery, office supplies and printing and writing paper',
                'shoe shops']

Service = ['cable, satellite, and other pay television and radio services',
           'watch, clock, and jewelry repair shops',
           'computer programming , data processing, and integrated systems design services',
           'lawn and garden supply outlets, including nurseries',
           'health and beauty spas']
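
To attach a category to each merchant, the five lists can be inverted into a type-to-category lookup. A minimal sketch, assuming the lists are passed in as a dictionary keyed by category name; the helper `build_category_lookup` and the toy lists are hypothetical. It raises an error if a type appears under two categories, so each type maps to exactly one category.

```python
import pandas as pd


def build_category_lookup(categories):
    """Invert {category: [types]} into {type: category}, rejecting duplicate types."""
    lookup = {}
    for category, types in categories.items():
        for merchant_type in types:
            if merchant_type in lookup:
                raise ValueError(
                    f"{merchant_type!r} is listed under both "
                    f"{lookup[merchant_type]!r} and {category!r}"
                )
            lookup[merchant_type] = category
    return lookup


# Toy example with two categories; the notebook would pass all five lists.
lookup = build_category_lookup({
    "Luxury": ["art dealers and galleries", "bicycle shops - sales and service"],
    "Stationeries": ["shoe shops", "books, periodicals, and newspapers"],
})
merchants = pd.DataFrame({"type": ["shoe shops", "art dealers and galleries", "telecom"]})
# Types missing from the lookup (here 'telecom') become NaN, which exposes gaps in the lists.
merchants["category"] = merchants["type"].map(lookup)
print(merchants)
```

In the notebook, passing `{'Technology and Equipment': Technology_and_Equipment, 'Luxury': Luxury, ...}` and then calling `merchant_detail['type'].map(lookup)` would add the category column.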

Below is the weekly income of Australians by postcode. We noticed that it may be related to consumers' purchasing power.

In [23]:
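import webbrowser
from pathlib import Path

# webbrowser.open expects a URL, so convert the relative path into an absolute file:// URI
webbrowser.open(Path('../plots/income_weekly.html').resolve().as_uri(), new=2)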


Therefore, for each merchant, the postcode with the largest number of transactions is taken as a feature for further analysis.
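
Picking each merchant's most frequent postcode is a group-wise mode. A minimal sketch, assuming a transaction table with hypothetical columns `merchant_abn` and `postcode` (the postcode would come from joining transactions to consumer records); the helper name is also hypothetical. Ties are broken explicitly by the smaller postcode so the result is deterministic.

```python
import pandas as pd

# Hypothetical toy data: one row per transaction, with the consumer's postcode attached.
transactions = pd.DataFrame({
    "merchant_abn": [1, 1, 1, 2, 2, 2, 2],
    "postcode":     [3000, 3000, 3053, 2000, 2010, 2010, 2000],
})


def top_postcode_per_merchant(transactions):
    """Return the postcode with the most transactions for each merchant."""
    counts = (
        transactions.groupby(["merchant_abn", "postcode"])
        .size()
        .reset_index(name="n_transactions")
    )
    # Most transactions first; ties broken by the smaller postcode.
    counts = counts.sort_values(
        ["merchant_abn", "n_transactions", "postcode"], ascending=[True, False, True]
    )
    # Keep the first (top) row per merchant.
    return counts.drop_duplicates("merchant_abn").set_index("merchant_abn")["postcode"]


print(top_postcode_per_merchant(transactions))
```

Merchant 2 has a tie between postcodes 2000 and 2010, and the tie-break rule selects 2000.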

## The Ranking System

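Our final ranking system uses three features, monthly profit, monthly profit growth and fraud probability, which are combined as follows:

$$ \text{EstimatedProfit} = \left(\text{MonthlyProfit} + 12 \times \text{ProfitGrowth}\right) \times \left(1 - \frac{\text{FraudProbability}}{100}\right) $$

where FraudProbability is a percentage, as reported in the tables below, and MonthlyProfit + 12 × ProfitGrowth is the monthly profit projected twelve months ahead.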

The formula rests on three main assumptions:

1. The growth rate and fraud probability will not change over the next year.
2. The mean monthly profit over the 18 months of data represents the current monthly BNPL profit generated by each merchant.
3. The fraud probability determines the proportion of the projected profit that is actually realised, so a merchant with a fraud probability of p% contributes only (1 - p/100) of it.
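
The formula and the final ordering can be sketched as follows. This is a sketch rather than the project's code: `estimated_profit` is a hypothetical helper, fraud probability is taken as a percentage, and merchants are ranked by sorting on the estimated profit in descending order. The input figures are copied from rows 0 and 3 of the `ranking_result` table.

```python
import pandas as pd


def estimated_profit(monthly_profit, profit_growth, fraud_probability_pct):
    """Monthly profit projected 12 months ahead, scaled by (1 - fraud probability)."""
    return (monthly_profit + 12 * profit_growth) * (1 - fraud_probability_pct / 100)


# Figures from rows 0 and 3 of ranking_result.
merchants = pd.DataFrame({
    "merchant_name": ["Amet Risus Inc.", "Leo In Consulting"],
    "monthly_profit": [32801.0, 30161.0],
    "monthly_profit_growth": [259.730041, 386.294528],
    "fraud_probability": [29.735159, 26.106311],
})
merchants["estimated profit"] = estimated_profit(
    merchants["monthly_profit"],
    merchants["monthly_profit_growth"],
    merchants["fraud_probability"],
)
# Rank: highest estimated profit first.
ranking = merchants.sort_values("estimated profit", ascending=False).reset_index(drop=True)
print(ranking[["merchant_name", "estimated profit"]])
```

With these two rows, Leo In Consulting (about 25712) ranks above Amet Risus Inc. (about 25238) despite its lower monthly profit, because its faster growth and lower fraud probability outweigh the difference.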


### Monthly Profit

This was calculated during preprocessing (step 4 above).

In [24]:
merchant_detail[['merchant_name', 'BNPL_monthly_profit']]

|  | merchant_name | BNPL_monthly_profit |
|---|---|---|
| 0 | Amet Risus Inc. | 32801.0 |
| 1 | Dignissim Maecenas Foundation | 30680.0 |
| 2 | Orci In Consequat Corporation | 30626.0 |
| 3 | Leo In Consulting | 30161.0 |
| 4 | Etiam Bibendum Industries | 29735.0 |
| ... | ... | ... |
| 4021 | Libero Nec Ligula LLP | 3.0 |
| 4022 | Turpis Incorporated | 2.0 |
| 4023 | Proin Vel Inc. | 2.0 |
| 4024 | Hendrerit Donec Limited | 2.0 |


### Monthly Profit Growth

To obtain the monthly profit growth, we treat the month as the independent variable and the monthly profit as the response, and take the gradient of the least-squares line of best fit as the growth (in dollars per month).
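
A minimal sketch of the per-merchant slope, using `np.polyfit` with degree 1 as one standard way to get the least-squares gradient. The column names, the helper `monthly_profit_growth` and the toy data are hypothetical; the month must be numeric, for example the number of months since the start of the data window.

```python
import numpy as np
import pandas as pd


def monthly_profit_growth(monthly):
    """Slope of the least-squares line of monthly profit against month, per merchant."""
    return (
        monthly.groupby("merchant_name")[["month", "monthly_profit"]]
        # polyfit returns [slope, intercept] for deg=1; keep the slope.
        .apply(lambda g: np.polyfit(g["month"], g["monthly_profit"], deg=1)[0])
        .rename("monthly_profit_growth")
    )


# Hypothetical toy data: month 0 is the first month of the window.
monthly = pd.DataFrame({
    "merchant_name": ["A"] * 4 + ["B"] * 4,
    "month": [0, 1, 2, 3] * 2,
    "monthly_profit": [10.0, 12.0, 14.0, 16.0, 5.0, 5.0, 5.0, 5.0],
})
print(monthly_profit_growth(monthly))
```

Merchant A gains 2 dollars of profit per month and merchant B is flat, so their slopes are 2 and 0.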

In [25]:
profit_growth = pd.read_csv('../data/curated/merchant_detail_with_monthly_growth.csv')[['merchant_name','monthly_profit_growth']]
profit_growth

|  | merchant_name | monthly_profit_growth |
|---|---|---|
| 0 | Amet Risus Inc. | 259.730041 |
| 1 | Dignissim Maecenas Foundation | 423.618912 |
| 2 | Orci In Consequat Corporation | 373.577904 |
| 3 | Leo In Consulting | 386.294528 |
| 4 | Etiam Bibendum Industries | 407.926070 |
| ... | ... | ... |
| 4021 | Libero Nec Ligula LLP | -0.209182 |
| 4022 | Turpis Incorporated | 0.040707 |
| 4023 | Proin Vel Inc. | 0.166143 |
| 4024 | Hendrerit Donec Limited | 0.032130 |


### Fraud Probability

To impute the fraud probability, we use three attributes: the personal income of each merchant's main consumer group, the number of transactions, and the mean transaction value. We fit a linear regression on the merchants with a known fraud probability and use it to predict the fraud probability for the other merchants. During modelling, we found that consumer income is not significant: its p-value is large, and the reduced model without it has a smaller BIC. We therefore removed this attribute.
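
The BIC comparison can be sketched with NumPy alone. This is a sketch under stated assumptions, not the project's model: the helpers `fit_ols` and `gaussian_bic` and all the synthetic data are hypothetical, and the BIC uses the Gaussian form n·ln(RSS/n) + k·ln(n), which differs from the full likelihood-based BIC only by a constant shared by models fitted on the same data. P-values are left out to keep the sketch NumPy-only (statsmodels' OLS results expose both `pvalues` and `bic`).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residual sum of squares."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef, float(residuals @ residuals)


def gaussian_bic(rss, n, k):
    """BIC for a Gaussian linear model up to a shared constant; k counts coefficients incl. intercept."""
    return n * np.log(rss / n) + k * np.log(n)


# Hypothetical synthetic data: fraud depends on transaction count and mean value, not on income.
rng = np.random.default_rng(0)
n = 200
n_transactions = rng.uniform(10, 1000, n)
mean_value = rng.uniform(20, 2000, n)
income = rng.uniform(300, 2000, n)
fraud = 80 - 0.03 * n_transactions - 0.01 * mean_value + rng.normal(0, 2, n)

_, rss_full = fit_ols(np.column_stack([income, n_transactions, mean_value]), fraud)
coef_reduced, rss_reduced = fit_ols(np.column_stack([n_transactions, mean_value]), fraud)
print("BIC full:   ", gaussian_bic(rss_full, n, k=4))
print("BIC reduced:", gaussian_bic(rss_reduced, n, k=3))

# The reduced model then predicts fraud probability for unlabelled merchants.
new_X = np.column_stack([np.ones(2), [[150.0, 400.0], [800.0, 60.0]]])
print(new_X @ coef_reduced)
```

On data like this, where income has no real effect, the extra coefficient barely lowers the RSS, so the ln(n) penalty typically makes the reduced model's BIC smaller.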

In [26]:
fraud_probability = pd.read_csv('../data/curated/merchant_detail_with_fraud_prob.csv')[['merchant_name','fraud_probability']]
fraud_probability

|  | merchant_name | fraud_probability |
|---|---|---|
| 0 | Amet Risus Inc. | 29.735159 |
| 1 | Dignissim Maecenas Foundation | 28.567933 |
| 2 | Orci In Consequat Corporation | 28.107352 |
| 3 | Leo In Consulting | 26.106311 |
| 4 | Etiam Bibendum Industries | 32.140727 |
| ... | ... | ... |
| 3950 | Donec Vitae Company | 77.093856 |
| 3951 | Arcu Iaculis Enim Institute | 76.710452 |
| 3952 | Ac Eleifend PC | 84.947045 |
| 3953 | Ante Ipsum Ltd | 74.575535 |


## Ranking System Results

The results of the ranking system are given below:


In [27]:
ranking_result = pd.read_csv('../data/curated/ranking_result.csv')
ranking_result

|  | merchant_abn | take_rate | merchant_name | type | fraud_probability | monthly_profit_growth | monthly_profit | estimated profit |
|---|---|---|---|---|---|---|---|---|
| 0 | 79827781481 | 6.82 | Amet Risus Inc. | furniture, home furnishings and equipment shop... | 29.735159 | 259.730041 | 32801.0 | 25237.557189 |
| 1 | 48534649627 | 6.64 | Dignissim Maecenas Foundation | opticians, optical goods, and eyeglasses | 28.567933 | 423.618912 | 30680.0 | 25546.555107 |
| 2 | 32361057556 | 6.61 | Orci In Consequat Corporation | gift, card, novelty, and souvenir shops | 28.107352 | 373.577904 | 30626.0 | 25240.743106 |
| 3 | 86578477987 | 6.43 | Leo In Consulting | watch, clock, and jewelry repair shops | 26.106311 | 386.294528 | 30161.0 | 25712.443039 |
| 4 | 38700038932 | 6.31 | Etiam Bibendum Industries | tent and awning shops | 32.140727 | 407.926070 | 29735.0 | 23499.742704 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3950 | 45504841435 | 0.72 | Donec Vitae Company | books, periodicals, and newspapers | 77.093856 | 0.350559 | 6.0 | 2.337963 |
| 3951 | 80597774208 | 0.76 | Arcu Iaculis Enim Institute | bicycle shops - sales and service | 76.710452 | 0.174572 | 6.0 | 1.885258 |
| 3952 | 29166700531 | 1.33 | Ac Eleifend PC | telecom | 84.947045 | -0.616296 | 6.0 | -0.210072 |
| 3953 | 61968317984 | 0.40 | Ante Ipsum Ltd | motor vehicle supplies and new parts | 74.575535 | 0.005671 | 5.0 | 1.288524 |


As expected, the top-ranked merchants generally have a low fraud probability together with high monthly profit and profit growth, while merchants at the tail of the ranking show the opposite pattern.

In [28]:
stationery_ranking_result = pd.read_csv('../data/curated/Stationeries_ranking_result.csv')
stationery_ranking_result

|  | merchant_abn | take_rate | merchant_name | type | fraud_probability | monthly_profit_growth | monthly_profit | estimated profit |
|---|---|---|---|---|---|---|---|---|
| 0 | 32361057556 | 6.61 | Orci In Consequat Corporation | gift, card, novelty, and souvenir shops | 28.107352 | 373.577904 | 30626.0 | 25240.743106 |
| 1 | 45629217853 | 6.98 | Lacus Consulting | gift, card, novelty, and souvenir shops | 27.141809 | 378.858105 | 28839.0 | 24323.923751 |
| 2 | 94493496784 | 5.65 | Dictum Phasellus In Institute | gift, card, novelty, and souvenir shops | 30.579032 | 323.889667 | 25281.0 | 20248.482980 |
| 3 | 79417999332 | 4.95 | Phasellus At Company | gift, card, novelty, and souvenir shops | 28.048800 | 296.003626 | 22227.0 | 18548.331067 |
| 4 | 60956456424 | 4.69 | Ultricies Dignissim LLP | gift, card, novelty, and souvenir shops | 28.944705 | 245.817366 | 18505.0 | 15244.777396 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 683 | 41081202656 | 1.44 | Nam Ligula Industries | stationery, office supplies and printing and w... | 78.890998 | -0.835324 | 10.0 | -0.005041 |
| 684 | 60196678179 | 0.90 | Ac Fermentum Associates | stationery, office supplies and printing and w... | 77.121389 | -0.002306 | 8.0 | 1.823957 |
| 685 | 75522897148 | 1.05 | Vestibulum Lorem Inc. | stationery, office supplies and printing and w... | 78.928707 | 0.315215 | 8.0 | 2.482741 |
| 686 | 28504709344 | 0.47 | Consequat Corp. | shoe shops | 74.568771 | -0.030909 | 6.0 | 1.431546 |


We further examine the most profitable merchants within each category. For example, the stationery ranking shows that gift, card, novelty, and souvenir shops are the most profitable type: all of the first five stationery merchants are of this type. This is expected, since these merchants have a lower fraud probability and a higher take rate, which results in a higher monthly profit.
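
Per-category rankings such as `Stationeries_ranking_result.csv` can be sketched in one pass over the overall ranking. A minimal sketch, assuming a `category` column has already been added by mapping `type` onto the five lists; the helper `top_per_category` is hypothetical, and the toy rows reuse names and estimated profits from the tables above.

```python
import pandas as pd


def top_per_category(ranking, n=3):
    """Return the n merchants with the highest estimated profit in each category."""
    return (
        ranking.sort_values("estimated profit", ascending=False)
        .groupby("category", sort=False)
        .head(n)  # first n rows of each category, in descending profit order
        .sort_values(["category", "estimated profit"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Toy ranking built from rows of the tables above; 'category' is assumed to exist.
ranking = pd.DataFrame({
    "merchant_name": ["Orci In Consequat Corporation", "Lacus Consulting", "Consequat Corp.",
                      "Amet Risus Inc.", "Ac Eleifend PC"],
    "category": ["Stationeries", "Stationeries", "Stationeries",
                 "Technology and Equipment", "Technology and Equipment"],
    "estimated profit": [25240.743106, 24323.923751, 1.431546, 25237.557189, -0.210072],
})
print(top_per_category(ranking, n=2))
```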