# <center> MARKET BASKET ANALYSIS </center>


### DEFINITION:
- Market basket analysis is a data mining approach that retailers use to boost sales by better understanding customer buying patterns.
- Large data sets, such as purchase histories, are analyzed to identify product groups and items that are most likely to be bought together.

### Question:
Apriori is a statistical algorithm for association rule mining that relies primarily on three measures: Lift, Support and Confidence. Using this algorithm, try to find the rules that describe the relations between the products that were bought by the customers, as recorded in the dataset linked below.
Dataset Link: Store Data
https://drive.google.com/file/d/1y5DYn0dGoSbC22xowBq2d4po6h1JxcTQ/view?usp=sharin


### Apriori
- Definition:
Association rule analysis is a technique for uncovering how items are associated with each other. There are three common ways to measure association.

- Support. This says how popular an itemset is, measured by the proportion of transactions in which the itemset appears. In the list below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

- Support({apple}) = 4/8 = 50%
- Support({apple, beer, rice}) = 2/8 = 25%

- List 1: Apple, Beer, Rice, Ham
- List 2: Apple, Beer, Rice
- List 3: Apple, Beer
- List 4: Apple, Pear
- List 5: Milk, Beer, Rice, Ham
- List 6: Milk, Beer, Rice
- List 7: Milk, Beer
- List 8: Milk, Pear
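
The support figures quoted for this list can be checked with a few lines of plain Python. This is only a minimal sketch: the `support` helper is a hypothetical name introduced for illustration and is not part of any library used later in the notebook.

```python
# Toy transactions copied from the eight lists above.
transactions = [
    {"apple", "beer", "rice", "ham"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "ham"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


print(support({"apple"}, transactions))                  # 0.5
print(support({"apple", "beer", "rice"}, transactions))  # 0.25
```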

- Confidence. This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears. In the above list, the confidence of {apple -> beer} is 3 out of 4, or 75%.

- Confidence(X -> Y) = Support(X, Y) / Support(X)

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, a third measure called lift is used.

- Lift. This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. In the list above, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is more likely to be bought if item X is bought, while a value less than 1 means that item Y is less likely to be bought if item X is bought.

- Lift(X -> Y) = Support(X, Y) / (Support(X) x Support(Y))
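
As a quick check of the confidence and lift values quoted for {apple -> beer}, the two formulas can be applied directly to the eight example transactions. The helper names `support`, `confidence` and `lift` below are assumptions made for this sketch, not functions from apyori or pandas.

```python
# Toy transactions copied from the eight example lists.
transactions = [
    {"apple", "beer", "rice", "ham"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "ham"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def confidence(x, y):
    """Confidence(X -> Y) = Support(X, Y) / Support(X)."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """Lift(X -> Y) = Support(X, Y) / (Support(X) x Support(Y))."""
    return support(set(x) | set(y)) / (support(x) * support(y))


print(confidence({"apple"}, {"beer"}))  # 0.75
print(lift({"apple"}, {"beer"}))        # 1.0
```

Beer appears in 6 of the 8 baskets, so a 75% confidence is exactly what would be expected by chance, which is why the lift comes out as 1.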


### PROCEDURE:
- Importing Libraries
- Data Preprocessing
- Training Apriori Model on the Dataset
- Visualising the rules
- Displaying the rules in Dataframe



### Importing Libraries

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Data Preprocessing

In [4]:
df=pd.read_csv("/kaggle/input/suggestions/Market_Basket_Optimisation.csv",header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [5]:
df.shape

(7501, 20)

In [14]:
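transactions = []
for row in df.values:
    # Rows with fewer than 20 items are padded with NaN. Skip those cells so
    # that str(NaN) does not add a bogus 'nan' item to the basket.
    transactions.append([str(item) for item in row if pd.notna(item)])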
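
Because shorter baskets are padded with NaN when the file is read without a header, it matters whether those cells are dropped before mining. The sketch below uses a few hypothetical rows (not the real dataset) and only the standard library to contrast converting every cell with `str` against filtering the missing cells first; `is_missing` is a helper name assumed for illustration.

```python
import math

# Hypothetical rows mimicking a padded table: short baskets are filled
# with float NaN up to the width of the widest basket.
nan = float("nan")
rows = [
    ["burgers", "meatballs", "eggs", nan],
    ["chutney", nan, nan, nan],
    ["turkey", "avocado", nan, nan],
]


def is_missing(value):
    """True for the float NaN used as padding."""
    return isinstance(value, float) and math.isnan(value)


# Converting every cell keeps the padding as a literal 'nan' item.
naive = [[str(value) for value in row] for row in rows]

# Filtering first keeps only the items that were actually bought.
cleaned = [[str(value) for value in row if not is_missing(value)] for row in rows]

print(naive[1])    # ['chutney', 'nan', 'nan', 'nan']
print(cleaned[1])  # ['chutney']
```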

### Training the Apriori model on the dataset

In [15]:
!pip install apyori



In [16]:
from apyori import apriori
rules=apriori(transactions=transactions,min_support=0.003,min_confidence=0.2,min_lift=3,min_length=2,max_length=2)

transactions: This parameter should be a list of lists, where each inner list represents a transaction and contains the items bought in that transaction. 

min_support: This is the minimum support threshold, which is used to filter out itemsets with low support. Support is the proportion of transactions that contain the itemset. A lower value for min_support will result in more itemsets being considered.

min_confidence: This is the minimum confidence threshold, which is used to filter out weak association rules. Confidence measures the likelihood of one itemset occurring given the occurrence of another itemset. A lower value for min_confidence will result in more association rules being generated.

min_lift: This is the minimum lift threshold, which is used to filter out rules with low lift. Lift measures how much more likely one itemset is to occur when another itemset occurs, relative to how often it occurs overall. A higher value for min_lift will result in fewer rules, keeping only the stronger associations.

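min_length: This is the minimum length of the itemsets to be considered. It specifies the minimum number of items in an itemset. A higher value for min_length will restrict the output to longer itemsets and potentially more specific rules.

max_length: This is the maximum number of items in an itemset. Setting it to 2, as in the call above, restricts the search to pairs of items.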
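
To make the effect of the thresholds concrete, the sketch below filters pair rules on the small eight-transaction example from the Apriori section. It is a brute-force enumeration written for illustration, not the level-wise candidate pruning that Apriori itself performs, and `pair_rules` is a hypothetical helper rather than part of apyori.

```python
from itertools import permutations

# Toy transactions from the eight-list example in the Apriori section.
transactions = [
    {"apple", "beer", "rice", "ham"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "ham"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def pair_rules(min_support, min_confidence, min_lift):
    """Return (lhs, rhs, support, confidence, lift) for every rule X -> Y
    between two single items that passes all three thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        pair_support = support({x, y})
        if pair_support < min_support:
            continue  # the pair is too rare to be considered
        conf = pair_support / support({x})
        lift = conf / support({y})
        if conf >= min_confidence and lift >= min_lift:
            rules.append((x, y, pair_support, conf, lift))
    return rules


# Raising min_lift keeps fewer, stronger rules.
medium = pair_rules(min_support=0.25, min_confidence=0.6, min_lift=1.2)
strict = pair_rules(min_support=0.25, min_confidence=0.6, min_lift=1.5)

print([(lhs, rhs) for lhs, rhs, *_ in medium])
print([(lhs, rhs) for lhs, rhs, *_ in strict])
```

With these toy thresholds, the stricter lift cut-off leaves only {ham -> rice}, while the looser one also admits three rules involving beer.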

Running the apriori function returns a generator of RelationRecord objects, which yields the rules lazily. To extract the rules, we convert the generator into a list and then process each record to get the relevant information about the association rules.
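
One practical reason for converting to a list is that apyori's apriori is written as a generator function, so its result can only be iterated once. The sketch below shows that behaviour with a hypothetical stand-in generator, `fake_rules`, so it runs without apyori installed.

```python
def fake_rules():
    """Hypothetical stand-in for a lazily evaluated rule generator."""
    yield ("light cream", "chicken")
    yield ("pasta", "escalope")


rules = fake_rules()
first_pass = list(rules)   # consumes the generator
second_pass = list(rules)  # nothing is left to yield

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

Storing the list in a variable, as the next cell does with `results`, lets the rules be inspected as many times as needed.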

### Visualising the results

In [17]:
results=list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Displaying in Pandas Dataframe

The inspect function takes the results generated by applying the Apriori algorithm to the dataset. For each association rule it extracts the left-hand side (LHS) item, the right-hand side (RHS) item, and the support, confidence, and lift metrics.

Inside the function, pd.to_numeric converts the lift values to a numeric type, so that the "Lift" column is treated as numerical data rather than as an object, which allows for proper sorting and numerical comparisons.

The function returns a list of tuples containing the LHS item, RHS item, support, confidence, and lift for each association rule.

The resultsindataframe DataFrame is created from the inspect function's output and holds each association rule in tabular form.

Finally, the nlargest method is applied to resultsindataframe to get up to 10 association rules with the highest lift values.


In [26]:
def inspect(results):
    # Each result is a RelationRecord: (items, support, ordered_statistics).
    # Only the first ordered statistic of each record, and the first item on
    # each side of the rule, is read.
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]

    # Convert the lift values to a numeric type
    lifts = pd.to_numeric(lifts)

    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsindataframe = pd.DataFrame(inspect(results), columns=['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
resultsindataframe.nlargest(n=10, columns='Lift')

|   | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
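
The positional indexing in inspect can be hard to read. The printed records expose named fields (items, support, ordered_statistics, items_base, items_add, confidence, lift), so one possible alternative is to read them by name. The sketch below is an illustration under stated assumptions: it rebuilds two of the printed records with stand-in namedtuples so that it runs without apyori, and `inspect_by_name` is a hypothetical helper that, unlike inspect, emits one row per ordered statistic.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in the printed results;
# defined here only so the sketch runs without apyori installed.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

# Two records copied from the printed output above.
sample = [
    RelationRecord(
        items=frozenset({"escalope", "pasta"}),
        support=0.005865884548726837,
        ordered_statistics=[
            OrderedStatistic(frozenset({"pasta"}), frozenset({"escalope"}),
                             0.3728813559322034, 4.700811850163794)
        ],
    ),
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(frozenset({"light cream"}), frozenset({"chicken"}),
                             0.29059829059829057, 4.84395061728395)
        ],
    ),
]


def inspect_by_name(records):
    """One (lhs, rhs, support, confidence, lift) row per ordered statistic."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows


# Sort by lift (last field), highest first.
rows = sorted(inspect_by_name(sample), key=lambda row: row[4], reverse=True)
for row in rows:
    print(row)
```

Joining the item names also keeps rules readable if a side ever contains more than one item, which would happen with a larger max_length.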


In summary, the resultsindataframe DataFrame contains the association rules mined by the Apriori algorithm, and nlargest returns them in descending order of lift, up to a maximum of 10. Here only nine rules met the thresholds, so all of them are shown. Lift measures how much more likely one item is to be bought when another item is bought, relative to its overall popularity, and high lift values indicate strong associations between items.