## Association Rules

### Objective
The objective of this assignment is to introduce association rule mining techniques, with a particular focus on market basket analysis, and to provide hands-on experience with them.

### Task 1: Data Preprocessing
Preprocess the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data into an appropriate format.

```python
# Import the required libraries.
import pandas as pd
```

```python
# Load and read the given dataset.
path = 'Online retail.xlsx'
data = pd.read_excel(path)
```

```python
data.head()
```

|   | shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |

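In the output above, the first transaction (shrimp, almonds, avocado, ...) appears to have been consumed as the column name, because pandas treats the first row as a header by default; that basket is then no longer part of the data. Passing `header=None` keeps every row as a transaction. The sketch below demonstrates the behaviour with `read_csv` on a small hypothetical in-memory sample (`read_csv` and `read_excel` share the `header` and `names` parameters); for the workbook itself, the equivalent call would be `pd.read_excel(path, header=None, names=['Items'])`.

```python
import io

import pandas as pd

# Hypothetical three-transaction sample standing in for the Excel sheet:
# one basket per line and no header line. sep="\t" keeps each comma-joined
# basket in a single column, as in the workbook.
raw = "shrimp,almonds,avocado\nburgers,meatballs,eggs\nchutney\n"

# Default (header=0): the first basket becomes the column name.
with_header = pd.read_csv(io.StringIO(raw), sep="\t")

# header=None: every line is kept as data; names= labels the single column.
no_header = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["Items"])

print(list(with_header.columns))  # ['shrimp,almonds,avocado']
print(len(with_header), len(no_header))  # 2 3
```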

```python
data
```

|   | shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |
| ... | ... |
| 7495 | butter,light mayo,fresh bread |
| 7496 | burgers,frozen vegetables,eggs,french fries,ma... |
| 7497 | chicken |
| 7498 | escalope,green tea |


```python
# Handle missing values by dropping rows with no items.
data.dropna(inplace=True)
```

```python
data
```

|   | shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |
| ... | ... |
| 7495 | butter,light mayo,fresh bread |
| 7496 | burgers,frozen vegetables,eggs,french fries,ma... |
| 7497 | chicken |
| 7498 | escalope,green tea |


```python
# Remove duplicate rows.
data.drop_duplicates(inplace=True)
```

```python
data
```

|   | shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil |
|---|---|
| 0 | burgers,meatballs,eggs |
| 1 | chutney |
| 2 | turkey,avocado |
| 3 | mineral water,milk,energy bar,whole wheat rice... |
| 4 | low fat yogurt |
| ... | ... |
| 7492 | burgers,salmon,pancakes,french fries,frozen sm... |
| 7493 | turkey,burgers,dessert wine,shrimp,pasta,tomat... |
| 7495 | butter,light mayo,fresh bread |
| 7496 | burgers,frozen vegetables,eggs,french fries,ma... |


#### Converting the Data into an Appropriate Format

```python
# Rename the single column to 'Items'.
data.columns = ['Items']
```

```python
# Split each comma-separated basket into a list of items.
data['Items'] = data['Items'].str.split(',')
```

```python
# Convert the lists of items into transaction-encoded (one-hot) format.
from mlxtend.preprocessing import TransactionEncoder

Te = TransactionEncoder()
data_Te = Te.fit(data['Items']).transform(data['Items'])
binary_data = pd.DataFrame(data_Te, columns=Te.columns_)
```

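`TransactionEncoder` turns the list of baskets into a boolean matrix with one column per distinct item (in sorted order) and one row per transaction. A minimal pure-Python sketch of that encoding, for illustration only; `one_hot_encode` is a hypothetical helper, not part of mlxtend, and the baskets are taken from the first rows shown earlier.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted distinct items and one boolean row per basket."""
    # Column order: every distinct item across all baskets, sorted.
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        # True where the basket contains the column's item.
        rows.append([item in present for item in columns])
    return columns, rows


baskets = [["burgers", "meatballs", "eggs"], ["chutney"], ["turkey", "avocado"]]
columns, rows = one_hot_encode(baskets)
print(columns)
# ['avocado', 'burgers', 'chutney', 'eggs', 'meatballs', 'turkey']
print(rows[2])
# [True, False, False, False, False, True]
```

Wrapping the result in `pd.DataFrame(rows, columns=columns)` gives the same one-column-per-item layout as `binary_data`.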
```python
data.head(20)
```

|   | Items |
|---|---|
| 0 | [burgers, meatballs, eggs] |
| 1 | [chutney] |
| 2 | [turkey, avocado] |
| 3 | [mineral water, milk, energy bar, whole wheat ... |
| 4 | [low fat yogurt] |
| 5 | [whole wheat pasta, french fries] |
| 6 | [soup, light cream, shallot] |
| 7 | [frozen vegetables, spaghetti, green tea] |
| 8 | [french fries] |
| 9 | [eggs, pet food] |


```python
binary_data.head()
```

|   | asparagus | almonds | antioxydant juice | asparagus.1 | avocado | babies food | bacon | barbecue sauce | black tea | blueberries | ... | turkey | vegetables mix | water spray | white wine | whole weat flour | whole wheat pasta | whole wheat rice | yams | yogurt cake | zucchini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | True | False | False | False | False | False | ... | True | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |

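The encoded columns above list `asparagus` twice (the second copy shown as `asparagus.1`), and the first copy sorts ahead of `almonds`, which suggests it carries a leading space, since a space sorts before any letter. `str.split(',')` keeps such whitespace, so `' asparagus'` and `'asparagus'` end up as separate items. A minimal sketch of stripping each item after splitting, on hypothetical sample strings:

```python
# Hypothetical raw baskets; the second one has a space after the comma.
raw_rows = ["asparagus,almonds", "eggs, asparagus"]

# A plain split keeps the whitespace, so ' asparagus' becomes a separate item.
naive = [row.split(",") for row in raw_rows]

# Strip every item and drop any empty strings left by trailing commas.
cleaned = [[item.strip() for item in row.split(",") if item.strip()] for row in raw_rows]

print(sorted({item for basket in naive for item in basket}))
# [' asparagus', 'almonds', 'asparagus', 'eggs']
print(sorted({item for basket in cleaned for item in basket}))
# ['almonds', 'asparagus', 'eggs']
```

In the notebook, the same cleanup could be applied before encoding with `data['Items'].str.split(',').apply(lambda items: [i.strip() for i in items if i.strip()])`.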

### Task 2: Association Rule Mining
1. Implement the Apriori algorithm using a tool such as Python, with libraries such as pandas and mlxtend.
2. Apply association rule mining techniques to the preprocessed dataset to discover interesting relationships between products purchased together.
3. Set appropriate thresholds for support, confidence and lift to extract meaningful rules.

#### 1. Implement the Apriori algorithm using a tool such as Python, with libraries such as pandas and mlxtend

```python
# Import the Apriori and rule-generation functions.
from mlxtend.frequent_patterns import apriori, association_rules
```

```python
# Apply the Apriori algorithm to find frequent itemsets
# (itemsets that appear in at least 1% of transactions).
freq_itemsets = apriori(binary_data, min_support=0.01, use_colnames=True)
```

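mlxtend's `apriori` performs the search internally. As an illustration of the idea (not mlxtend's implementation), here is a minimal pure-Python sketch of level-wise Apriori: keep the single items whose support reaches `min_support`, join frequent k-itemsets into (k+1)-item candidates, prune any candidate that has an infrequent subset (every subset of a frequent itemset must itself be frequent), and repeat until no candidates survive. The function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
frequent = apriori_sketch(baskets, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
# ['bread'] 0.75
# ['butter'] 0.5
# ['milk'] 0.75
# ['bread', 'butter'] 0.5
# ['bread', 'milk'] 0.5
```

The pair `{milk, butter}` appears in only one of four baskets (support 0.25), so it is dropped, and the three-item candidate is then pruned without being counted because one of its subsets is infrequent.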
#### 2. Apply association rule mining techniques to the preprocessed dataset to discover interesting relationships between products purchased together
#### 3. Set appropriate thresholds for support, confidence and lift to extract meaningful rules

```python
# Generate association rules, keeping those with lift >= 1.
rules = association_rules(freq_itemsets, metric="lift", min_threshold=1)
```

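`association_rules` derives rules from the frequent itemsets. The core computation can be sketched in pure Python: for every frequent itemset with at least two items, each non-empty proper subset A becomes an antecedent and the remaining items B the consequent, with confidence = support(A ∪ B) / support(A) and lift = confidence / support(B). This sketch assumes `supports` maps each frequent itemset (as a `frozenset`) to its support and includes all of its subsets, which Apriori's output guarantees. It is not mlxtend's code, and `generate_rules` is a hypothetical name.

```python
from itertools import combinations


def generate_rules(supports, min_lift=1.0):
    """Return (antecedent, consequent, support, confidence, lift) tuples with lift >= min_lift."""
    rules = []
    for itemset, support_ab in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = support_ab / supports[antecedent]
                lift = confidence / supports[consequent]
                if lift >= min_lift:
                    rules.append((antecedent, consequent, support_ab, confidence, lift))
    return rules


# Hypothetical supports from a toy four-basket example (min_support=0.5).
supports = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"butter"}): 0.5,
    frozenset({"bread", "milk"}): 0.5,
    frozenset({"bread", "butter"}): 0.5,
}

for a, b, sup, conf, lift in generate_rules(supports, min_lift=1.0):
    print(sorted(a), "->", sorted(b), round(conf, 3), round(lift, 3))
```

On these supports, the rules bread → butter and butter → bread are kept (lift ≈ 1.333), while bread → milk and milk → bread are dropped (lift ≈ 0.889), mirroring `metric="lift", min_threshold=1`. The print order may vary between runs because set iteration order is not fixed.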
```python
# Keep only the rules that meet all three thresholds.
rules = rules[(rules['support'] >= 0.01) & (rules['confidence'] >= 0.3) & (rules['lift'] >= 3)]
```

```python
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
```

|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|


```python
rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|


```python
# Sort the filtered rules by lift, strongest first.
rules_sort = rules.sort_values(by='lift', ascending=False)
```

```python
rules_sort[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head()
```

|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|




### Task 3: Analysis and Interpretation
1. Analyse the generated rules to identify interesting patterns and relationships between products.
2. Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.


```python
print(rules_sort.head(10))
```

```text
Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, leverage, conviction, zhangs_metric]
Index: []
```

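The printed result is an empty DataFrame: no rule meets support ≥ 0.01, confidence ≥ 0.3 and lift ≥ 3 at the same time, so there are no rules to interpret at these thresholds. One way to see which threshold is too strict is to count how many rules pass each condition on its own before combining them. The sketch below does this in plain Python on a few hypothetical rule records (the values are illustrative, not results from this dataset); on the unfiltered output of `association_rules`, the corresponding pandas counts would be expressions such as `(rules['lift'] >= 3).sum()`.

```python
def threshold_report(rules, min_support, min_confidence, min_lift):
    """Count rules passing each threshold separately, and all three together."""
    passes = {
        "support": [r["support"] >= min_support for r in rules],
        "confidence": [r["confidence"] >= min_confidence for r in rules],
        "lift": [r["lift"] >= min_lift for r in rules],
    }
    report = {name: sum(flags) for name, flags in passes.items()}
    # A rule survives the combined filter only if every condition holds.
    report["all"] = sum(all(flags) for flags in zip(*passes.values()))
    return report


# Hypothetical rule metrics, for illustration only.
sample_rules = [
    {"support": 0.02, "confidence": 0.35, "lift": 1.6},
    {"support": 0.01, "confidence": 0.25, "lift": 3.2},
    {"support": 0.05, "confidence": 0.40, "lift": 1.2},
]

print(threshold_report(sample_rules, min_support=0.01, min_confidence=0.3, min_lift=3))
# {'support': 3, 'confidence': 2, 'lift': 1, 'all': 0}
print(threshold_report(sample_rules, min_support=0.01, min_confidence=0.3, min_lift=1.5))
# {'support': 3, 'confidence': 2, 'lift': 2, 'all': 1}
```

In this toy case each threshold is satisfied by some rules, but no single rule satisfies all three until the lift cut-off is lowered.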

### Interview Questions
1. What is lift, and why is it important in association rules?
2. What are support and confidence, and how do you calculate them?
3. What are some limitations or challenges of association rule mining?

#### 1. What is lift, and why is it important in association rules?
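Lift measures how much more often two itemsets appear together than would be expected if they were independent, which tells you whether a relationship goes beyond random chance:

$$\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)} = \frac{\text{Support}(A \rightarrow B)}{\text{Support}(A) \times \text{Support}(B)}$$

A lift greater than 1 indicates a positive association (A and B appear together more often than chance would predict), a lift less than 1 indicates a negative association (they appear together less often than chance would predict), and a lift of exactly 1 means A and B are independent. Lift matters because confidence alone can be misleading: if B is bought in most transactions anyway, a rule A → B can have high confidence even though A says nothing about B. Lift corrects for the baseline popularity of the consequent.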

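A worked example of lift, computed as Support(A → B) / (Support(A) × Support(B)) with hypothetical counts: in 100 transactions, 20 contain A, 25 contain B and 10 contain both. Then Support(A) = 0.2, Support(B) = 0.25 and Support(A → B) = 0.1, so lift = 0.1 / (0.2 × 0.25) = 2: A and B are bought together twice as often as independence would predict. The same arithmetic in Python, where `lift` is a hypothetical helper:

```python
def lift(n_total, n_a, n_b, n_ab):
    """Lift of A -> B from raw transaction counts."""
    support_a = n_a / n_total
    support_b = n_b / n_total
    support_ab = n_ab / n_total  # transactions containing both A and B
    return support_ab / (support_a * support_b)


print(lift(100, 20, 25, 10))  # 2.0 -> positive association
print(lift(100, 20, 50, 10))  # 1.0 -> independent
print(lift(100, 20, 50, 5))   # 0.5 -> negative association
```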
#### 2. What are support and confidence, and how do you calculate them?
Support is the fraction of all transactions that contain an item or a combination of items. It measures how common an itemset is, which helps identify itemsets frequent enough to be worth analysing.

Formula:

$$\text{Support}(A \rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}}$$

Confidence measures how often a rule holds: it is the proportion of transactions containing A that also contain B, i.e. an estimate of the conditional probability of finding B in a transaction given that A is already present.

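Formula:

$$\text{Confidence}(A \rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \frac{\text{Support}(A \rightarrow B)}{\text{Support}(A)}$$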

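Both formulas can be checked on a small hypothetical set of baskets. The sketch counts transactions with Python sets; `support` and `confidence` here are illustrative helpers, not mlxtend functions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    # Raises ZeroDivisionError if the antecedent never occurs.
    return support(transactions, both) / support(transactions, antecedent)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 0.5 (2 of 4 baskets)
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.6666666666666666 (2 of the 3 bread baskets)
```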
#### 3. What are some limitations or challenges of association rule mining?
Scalability: With large datasets, the number of possible item combinations can be enormous, leading to high computational costs.

Threshold Selection: Setting support and confidence thresholds is tricky. Too low, and you get an overwhelming number of rules; too high, and you might miss potentially useful associations.

Relevancy: Not all discovered rules are useful or interesting; filtering out trivial or irrelevant rules can be challenging.
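To make the scalability point concrete: with $d$ distinct items there are $2^d - 1$ possible non-empty itemsets and $3^d - 2^{d+1} + 1$ possible rules (each item can go in the antecedent, the consequent, or neither, minus the assignments that leave either side empty). A short sketch of that arithmetic, where `search_space` is a hypothetical helper:

```python
from math import comb


def search_space(d):
    """Counts for d distinct items: non-empty itemsets, item pairs, possible rules."""
    itemsets = 2**d - 1
    pairs = comb(d, 2)
    rules = 3**d - 2**(d + 1) + 1
    return itemsets, pairs, rules


for d in (3, 10, 20):
    print(d, search_space(d))
# 3 (7, 3, 12)
# 10 (1023, 45, 57002)
# 20 (1048575, 190, 3484687250)
```

The rule count grows roughly as $3^d$, which is why support-based pruning such as Apriori's is needed before any rules are generated.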