# Apriori Algorithm

## Main Task
> Identify the product pairings that customers are most likely to buy together, so that deals of the form "buy product X and get product Y for free" reach customers who are likely to accept them and thereby improve sales and profit.

### Data Understanding  

**1.0. What is the domain area of the dataset?**  
The dataset *Market_Basket_Optimisation.csv* comes from the retail domain: each row records the products bought in one shopping basket at a shopping center.

**2.0. Which data format?**  
The dataset is in *csv* format.

**2.1. Do the files have headers or another file describing the data?**  
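The file does not appear to have a header row. The column names that `dataset.head()` shows (*shrimp*, *almonds*, *avocado*, ...) are themselves product names, which suggests that `pd.read_csv` took the first transaction as the header. Passing `header=None` to `pd.read_csv` would keep that row as data.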

**2.2. Are the data values separated by commas, semicolon, or tabs?**  
The data values are separated by commas, for example *burgers,meatballs,eggs*.

**3.0 How many features and how many observations does the dataset have?**  
The dataset has:  
* 20 features or columns
* 7500 observations or rows

**4.0 Does it contain numerical features? How many?**  
No, every value is a product name stored as a string.

**5.0. Does it contain categorical features?  How many?**  
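Yes, in the sense that every column holds product names, which are categorical values. The columns are item positions within a basket rather than distinct features, so they are converted to transaction lists instead of being encoded one by one.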

In [None]:
# Uncomment the next line if the apyori module is not installed on your machine
# !pip install apyori

In [18]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

### Data Preprocessing

In [9]:
dataset = pd.read_csv("../Datasets/Market_Basket_Optimisation.csv")

In [10]:
dataset.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


In [13]:
print(f"Number of features in the dataset is {dataset.shape[1]} and the number of observations/rows in the dataset is {dataset.shape[0]}")

Number of features in the dataset is 20 and the number of observations/rows in the dataset is 7500


In [15]:
# apyori expects a list of transactions, each one a list of item names,
# e.g. transactions = [['apple', 'almonds'], ['apple'], ['banana', 'apple']]

transactions = []
for row in dataset.values:  # one row per transaction
    # Keep only real items; pandas reads the empty cells as NaN
    items = [str(item) for item in row if pd.notna(item)]
    transactions.append(items)

In [16]:
transactions[0]

['burgers', 'meatballs', 'eggs']
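
The same list-of-lists structure can also be built without going through a DataFrame, which sidesteps both the NaN padding and the header question. The sketch below uses only the standard library and a hypothetical three-line sample in the layout of the rows shown above; for the real file, `open(path, newline="")` would replace the `io.StringIO` object.

```python
import csv
import io

# Hypothetical in-memory sample (not the real file), using rows seen above.
sample = io.StringIO("burgers,meatballs,eggs\nchutney\nturkey,avocado\n")

# Each CSV row becomes one transaction; empty cells and empty rows are dropped.
transactions_alt = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(sample)
    if row
]

print(transactions_alt)
```

Because `csv.reader` treats every line as data, the first basket is kept as a transaction instead of being consumed as column names.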

### Training the Apriori Model on the dataset

**min_support:** This is the minimum support value for an itemset to be considered frequent. If the support of an itemset is lower than this value, it’s ignored. The support of an itemset is the proportion of transactions in the dataset that contain the itemset. 

**min_confidence:** This is the minimum confidence value for a rule to be considered significant. If the confidence of a rule is lower than this value, it’s ignored. The confidence of a rule is the proportion of transactions that contain the antecedent of the rule, which also contain the consequent.

**min_lift:** This is the minimum lift value for a rule to be considered significant. If the lift of a rule is lower than this value, it’s ignored. The lift of a rule is the ratio of the observed support to that expected if the antecedent and the consequent were independent. A lift above 1 means the items appear together more often than independence would predict.

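**min_length** and **max_length:** These parameters specify the minimum and maximum number of items in the itemsets/rules to be considered. Note that apyori appears to read only `max_length` and to ignore `min_length` silently, so check the behaviour of your installed version before relying on it.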

> The appropriate values for these hyperparameters depend on your domain and on how strict you want to be with your rules.
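
The three measures defined above can be checked by hand on a small example. The sketch below is a plain-Python illustration, not apyori's code; the `toy` transactions and the helper names `support`, `confidence` and `lift` are invented for this example.

```python
# Toy transactions, invented for illustration (not taken from the dataset).
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "milk"],
    ["pasta", "milk"],
    ["milk"],
    ["escalope"],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

s = support({"pasta", "escalope"}, toy)
c = confidence({"pasta"}, {"escalope"}, toy)
l = lift({"pasta"}, {"escalope"}, toy)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# prints: support=0.40 confidence=0.67 lift=1.11
```

Here *pasta* and *escalope* appear together in 2 of 5 baskets, and 2 of the 3 baskets with *pasta* also contain *escalope*. The lift of about 1.11 is only slightly above 1, so a threshold such as `min_lift = 3` would reject this rule.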

In [19]:
rules = apriori(transactions=transactions,
                min_support = 0.003, min_confidence = 0.2,
                min_lift = 3, min_length = 2, max_length = 2)
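
To make the effect of these thresholds concrete, here is a minimal sketch of a two-item rule search that applies the same three filters. It is a brute-force illustration under simplifying assumptions, not how apyori is implemented; `pair_rules` and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (base, add, support, confidence, lift) for qualifying 2-item rules."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    # Count how many baskets contain each item and each unordered pair.
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        sup = count / n
        if sup < min_support:
            continue  # pair is not frequent enough
        # Check both directions: a -> b and b -> a.
        for base, add in ((a, b), (b, a)):
            conf = count / item_count[base]
            lft = conf / (item_count[add] / n)
            if conf >= min_confidence and lft >= min_lift:
                rules.append((base, add, sup, conf, lft))
    return rules

# Toy baskets, invented for illustration.
toy = [
    ["light cream", "chicken"],
    ["light cream", "chicken"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
    ["milk"],
    ["bread"],
    ["bread"],
]
found = pair_rules(toy, min_support=0.2, min_confidence=0.2, min_lift=3)
for rule in found:
    print(rule)
```

On this toy data only the *light cream* / *chicken* pair survives, once in each direction, with support 0.25, confidence 1.0 and lift 4.0. The *milk* / *eggs* pair is dropped at the support check because it occurs in only 1 of 8 baskets.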

### Visualising the results

In [20]:
results = list(rules)
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004533333333333334, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.2905982905982906, lift=4.843304843304844)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005733333333333333, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.30069930069930073, lift=3.7903273197390845)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005866666666666667, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.37288135593220345, lift=4.700185158809287)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.0033333333333333335, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confiden
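
The raw records are hard to read, so one option is to flatten them into rows sorted by lift. The sketch below assumes the field names visible in the printed output (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). To stay runnable without apyori it defines stand-in namedtuples and copies the first two records from the output; `flatten` is a hypothetical helper, and in the notebook it would be called on `results` instead.

```python
from collections import namedtuple

# Stand-ins mirroring the field names in the printed output above.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# First two records, copied from the output above.
sample_results = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004533333333333334,
        ordered_statistics=[OrderedStatistic(
            frozenset({"light cream"}), frozenset({"chicken"}),
            0.2905982905982906, 4.843304843304844)],
    ),
    RelationRecord(
        items=frozenset({"mushroom cream sauce", "escalope"}),
        support=0.005733333333333333,
        ordered_statistics=[OrderedStatistic(
            frozenset({"mushroom cream sauce"}), frozenset({"escalope"}),
            0.30069930069930073, 3.7903273197390845)],
    ),
]

def flatten(records):
    """Turn each rule into a flat dict and sort by lift, highest first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "base": ", ".join(sorted(stat.items_base)),
                "add": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

table = flatten(sample_results)
for row in table:
    print(f'{row["base"]} -> {row["add"]}: support={row["support"]:.4f}, '
          f'confidence={row["confidence"]:.3f}, lift={row["lift"]:.2f}')
```

A list of dicts like `table` can be passed straight to `pd.DataFrame` if a tabular view is preferred.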
