# Upselling and Cross-Selling

Upselling is the sales technique of encouraging customers to purchase a higher-end version or add premium features to their initial product selection, while cross-selling aims to sell complementary or related products to customers alongside their primary purchase. Association rule mining provides a powerful data-driven approach to identify these opportunities by uncovering hidden patterns and relationships between products in transaction data. By analyzing which items are frequently purchased together, businesses can generate personalized product recommendations that naturally align with customer preferences and purchase history. The insights derived from association rule mining enable more targeted marketing strategies, improved product bundling decisions, and strategic placement of complementary products, ultimately driving increased average order value, enhanced customer satisfaction, and higher revenue per customer interaction.

We'll explore 2 methods of association rule mining:
1. Apriori
2. Eclat

## Importing Libraries and Data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori # pip install apyori

In our scenario, we're working with a grocery store owner who wants to increase her sales by offering BOGO (buy one, get one) deals. She doesn't know which items to pair in those deals and asked us to help. She tracked every transaction at her point of sale for 7 days and gave us a CSV in which each row is one transaction, listing the items that were purchased together. We'll import this data and start exploring:

In [2]:
# header=None tells read_csv that the input file has no column labels. The data has no named features, just row upon row of transaction baskets.
df = pd.read_csv('transaction_data.csv', header=None)
df.head()

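| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | | | | | | | | | | | | | | | | | |
| 2 | chutney | | | | | | | | | | | | | | | | | | | |
| 3 | turkey | avocado | | | | | | | | | | | | | | | | | | |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | | | | | | | | | | | | | | | |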


In [3]:
df.shape

(7501, 20)

## Apriori

In its simplest form, Apriori is "if this, then that." The algorithm looks for sets of items that appear together in transactions and builds association rules from how frequently and how consistently those itemsets occur. It relies on the following principle: if a set of items (an itemset) is frequent, then all of its subsets must also be frequent; conversely, if an itemset is infrequent, none of its supersets can be frequent. This lets the algorithm skip any larger itemset that contains an infrequent one. A common use case is analyzing point-of-sale transactions to make deal recommendations, as in our scenario.

We'll be using the `apriori` function from the `apyori` package. It expects a list of lists, where each sub-list is one transaction and each item is a `str`, so we do some reformatting to pull the data out of the DataFrame and into an all-`str` list of lists.
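The frequent-subset principle described above is what keeps the search tractable. The following is a minimal sketch of that pruning idea on toy data; the baskets, the `support` helper, and the 0.5 threshold are all assumptions made for illustration, and this is not how `apyori` is implemented internally.

```python
from itertools import combinations

# Toy baskets (hypothetical data, not taken from the grocery CSV)
baskets = [
    {"pasta", "shrimp", "olive oil"},
    {"pasta", "shrimp"},
    {"pasta", "olive oil"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

min_support = 0.5  # illustrative threshold

# Level 1: keep only the single items that are frequent on their own
items = sorted(set().union(*baskets))
frequent_1 = [frozenset([item]) for item in items
              if support(frozenset([item]), baskets) >= min_support]

# Level 2: build candidate pairs only from frequent single items.
# 'milk' is infrequent, so no pair containing it is ever counted.
candidates_2 = [a | b for a, b in combinations(frequent_1, 2)]
frequent_2 = [c for c in candidates_2 if support(c, baskets) >= min_support]

print([sorted(s) for s in frequent_1])
print([sorted(s) for s in frequent_2])
```

Because `milk` fails the threshold at level 1, the sketch never evaluates pairs such as `{milk, pasta}`, which is the saving the principle provides.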

### Preprocessing

In [6]:
transactions = []
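# For each row in the dataset, append its items to the transactions list as strings.
# Rows with fewer than 20 items are padded with NaN by read_csv; skip those cells so that 'nan' is not mined as an item.
for i in range(len(df)):
  transactions.append([str(item) for item in df.values[i] if pd.notna(item)])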

print(transactions[0])

['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
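The first transaction happens to fill all 20 columns, but most rows are shorter and are padded with NaN (the empty cells in `df.head()`). The sketch below, using a hypothetical row shaped like row 3 of that output, shows why the padding is worth dropping before mining: `str()` turns each NaN into the literal string `'nan'`, which would otherwise look like a product.

```python
import math

# Hypothetical row shaped like row 3 of df.head(): two items, then NaN padding
row = ["turkey", "avocado", float("nan"), float("nan")]

# Converting every cell keeps the padding as a fake item called 'nan'
naive = [str(cell) for cell in row]

# Skipping float NaN cells leaves only real products
cleaned = [str(cell) for cell in row
           if not (isinstance(cell, float) and math.isnan(cell))]

print(naive)    # ['turkey', 'avocado', 'nan', 'nan']
print(cleaned)  # ['turkey', 'avocado']
```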


### Training/Mining

The `apriori` function takes 4 main arguments:
1. transactions: a list of transactions
2. min_support: support is the fraction of all transactions that contain a given itemset. `min_support` sets the minimum fraction an itemset must reach to be considered, which limits the output rules to associations that occur often enough to matter.
3. min_confidence: confidence measures the strength of a given association rule: of the transactions that contain the antecedent (the "if" itemset), the fraction that also contain the consequent. For example, if min_confidence is 0.8, then the rule must hold in at least 80% of the transactions where the antecedent is present. This is a hyperparameter we can tune: raising it yields fewer rules, and lowering it yields more.
4. min_lift: lift measures the importance of the rule: how much more likely the second item is to be purchased when the first item is purchased, compared with how often the second item is purchased overall. A lift value of 1 indicates that the items are independent (neither positive nor negative correlation). A lift value of more than 1 indicates that the second item is `lift_value` times as likely to be purchased when the first item is purchased. A lift value of less than 1 indicates that the second item is less likely to be purchased when the first item is purchased (only `lift_value` times as likely).
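To make the three metrics concrete, here is a small worked sketch on toy data. The baskets and helper functions are assumptions for illustration only; `apyori` computes these values itself.

```python
# Toy baskets (hypothetical, chosen so the arithmetic is easy to follow)
baskets = [
    {"pasta", "shrimp"},
    {"pasta", "shrimp", "milk"},
    {"pasta"},
    {"milk"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets with the antecedent, the fraction that also have the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)

a, c = {"pasta"}, {"shrimp"}
print(support(a | c))              # 0.4   (2 of 5 baskets)
print(round(confidence(a, c), 3))  # 0.667 (2 of the 3 pasta baskets)
print(round(lift(a, c), 3))        # 1.667 (0.667 / 0.4)
```

In this toy data, shrimp appears in 40% of all baskets but in about 67% of pasta baskets, so the rule pasta -> shrimp has a lift above 1.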

In [7]:
# Building a heuristic for min_support: let's consider itemsets that were bought at least 3 times a day. Our dataset covers 7 days, so valid itemsets should occur at least 21 times. min_support is expressed as a fraction of transactions, so we divide 21 by the total transaction count and round.
min_support = round(3*7/len(transactions), 4)
min_confidence = 0.2
# min_length and max_length are the minimum and maximum number of items in the itemset. These values are heavily dependent on the business problem that underlies the apriori analysis. In our case, we want to recommend rules that map onto BOGO deals, so we only want rules that are 2 items long: buy item A, get item B.
# Note: apyori's docstring lists only max_length among its keyword arguments, so min_length may have no effect here; single-item records have a lift of 1 and are removed by min_lift=3 anyway.
rules = apriori(transactions=transactions, min_support=min_support, min_confidence=min_confidence, min_lift=3, min_length=2, max_length=2)
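The `min_support` arithmetic can be checked by hand. The sketch below hard-codes the 7501 rows reported by `df.shape` so that it runs on its own; `effective_min_count` is a name introduced here for illustration. It suggests that rounding up to 4 decimal places nudges the cutoff slightly above 21 occurrences, assuming itemsets are kept only when their support is at least `min_support`.

```python
import math

n_transactions = 7501  # row count reported by df.shape
min_count = 3 * 7      # 3 purchases a day over 7 days

min_support = round(min_count / n_transactions, 4)
print(min_support)     # 0.0028

# 21 / 7501 is about 0.0027996, which is just below the rounded 0.0028,
# so the smallest occurrence count that still clears the threshold is 22.
effective_min_count = math.ceil(min_support * n_transactions)
print(effective_min_count)  # 22
```

If itemsets with exactly 21 occurrences should be included, one option is to pass the unrounded fraction instead.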

### Visualization

In [8]:
# Displaying the first results coming directly from the output of the apriori function
results = list(rules)
for rule in results[:10]:
    print(rule)

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)])
RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)])
RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0.245098

In [9]:
print("Number of rules: ",len(results))

Number of rules:  9
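The printed records are named tuples, so their fields can be read by name as well as by position. The sketch below uses stand-in `namedtuple` types that mirror the field names visible in the printed output; the stand-ins are an assumption so that the snippet runs without `apyori` installed, and the values are copied from the first printed record.

```python
from collections import namedtuple

# Stand-in types mirroring the field names in the printed records (assumed, for illustration)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

record = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395)],
)

stat = record.ordered_statistics[0]
lhs = next(iter(stat.items_base))  # antecedent: "buy this..."
rhs = next(iter(stat.items_add))   # consequent: "...get that"
summary = (f"{lhs} -> {rhs}: support={record.support:.4f}, "
           f"confidence={stat.confidence:.3f}, lift={stat.lift:.2f}")
print(summary)
# light cream -> chicken: support=0.0045, confidence=0.291, lift=4.84

# Positional indexing reaches the same values as the named fields
same_lift = record[2][0][3] == stat.lift
```

Reading `record[2][0][3]` as "third field, first statistic, fourth field" makes positional code over these records easier to follow.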


In [10]:
# Organize the results into a pandas DataFrame
# This is a handy bit of code from Stack Overflow that helps visualize the rules. It extracts the left- and right-hand-side products, as well as the support, confidence, and lift for each rule, then zips those together into a list of tuples. The final line (outside the inspect function) converts that list of tuples to a DataFrame.
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
resultsinDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
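For choosing BOGO pairs, it may help to rank the rules by lift and translate support back into a transaction count. The sketch below does this in plain Python on four rows copied from the table above; the `rules_table` and `ranked` names are introduced here for illustration, and 7501 is the row count from `df.shape`.

```python
n_transactions = 7501  # row count reported by df.shape

# (left hand side, right hand side, support, confidence, lift), copied from the table
rules_table = [
    ("light cream", "chicken", 0.004533, 0.290598, 4.843951),
    ("fromage blanc", "honey", 0.003333, 0.245098, 5.164271),
    ("herb & pepper", "ground beef", 0.015998, 0.32345, 3.291994),
    ("pasta", "escalope", 0.005866, 0.372881, 4.700812),
]

# Highest lift first: the strongest associations relative to chance
ranked = sorted(rules_table, key=lambda rule: rule[4], reverse=True)

for lhs, rhs, support, confidence, lift in ranked:
    # Support is a fraction of all transactions; multiply back to get a count
    count = round(support * n_transactions)
    print(f"{lhs} -> {rhs}: lift={lift:.2f}, seen in about {count} transactions")
```

With the full DataFrame, `resultsinDataFrame.sort_values('Lift', ascending=False)` gives the same ordering. Note that the highest-lift pair here (fromage blanc and honey) is also among the rarest, so lift and support are worth weighing together.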


## Eclat

### Preprocessing

### Training/Mining

### Visualization

## Conclusion