# Lab Report: Association Analysis using APRIORI Algorithm

### 1. Association Analysis
Association analysis is a rule-based machine learning technique used to discover interesting relationships, patterns, or associations between items in large datasets. It is widely used in market basket analysis to identify frequent item combinations.

---

### 2. Support
Support measures the frequency of an item or item set appearing in the dataset. It is calculated as:

$Support(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}}$

Where \(X\) is the item or item set of interest.
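As an illustration, the support formula can be computed directly in Python. This is a minimal sketch that assumes each transaction is a Python `set` of item names. The helper `support` and the four toy transactions are hypothetical and are not the lab's `market_basket.csv` data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # `itemset <= t` is a subset test: True if all items appear in transaction t
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

print(support({"Bread"}, transactions))          # 0.75 (3 of 4 transactions)
print(support({"Bread", "Milk"}, transactions))  # 0.5  (2 of 4 transactions)
```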

---

### 3. Confidence
Confidence indicates the likelihood of item \(Y\) being purchased when item \(X\) is purchased. It is calculated as:

$Confidence(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$

Where \(X \rightarrow Y\) represents an association rule.
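Confidence follows directly from two support values. The sketch below reuses the same hypothetical toy transactions and helper names (`support`, `confidence`). It shows that confidence is not symmetric: Butter → Bread and Bread → Butter share the same numerator but have different denominators.

```python
def support(itemset, transactions):
    """Fraction of transactions containing all items in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """Confidence(X -> Y) = Support(X ∪ Y) / Support(X).
    Assumes Support(X) > 0, otherwise this divides by zero."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

print(confidence({"Butter"}, {"Bread"}, transactions))  # 1.0: every Butter basket has Bread
print(confidence({"Bread"}, {"Butter"}, transactions))  # about 0.667: 0.5 / 0.75
```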

---

### 4. The Apriori Principle
The Apriori principle states that if an item set is infrequent, then all its supersets must also be infrequent. This principle is used to prune the search space and eliminate unnecessary candidate item sets during frequent item set generation.
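The principle translates into a cheap test that runs before any database scan: a candidate k-itemset can be discarded if any of its (k-1)-item subsets is not frequent. A minimal sketch, where the function name `has_infrequent_subset` and the sample frequent 2-itemsets are hypothetical:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Apriori pruning check: True if some (k-1)-subset of the k-item
    `candidate` is missing from `frequent_prev` (the frequent (k-1)-itemsets),
    meaning the candidate cannot be frequent and can be dropped unscanned."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets: {Milk, Butter} is NOT among them
L2 = {frozenset({"Bread", "Milk"}), frozenset({"Bread", "Butter"})}

# {Bread, Milk, Butter} contains the infrequent subset {Milk, Butter}, so it is pruned
print(has_infrequent_subset(frozenset({"Bread", "Milk", "Butter"}), L2))  # True
```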

---

### 5. Candidate Set
A candidate set is a collection of potential item sets generated during the iterative process of the Apriori algorithm. These candidates are tested against the dataset to determine if they meet the minimum support threshold.

---

### 6. Min Support Threshold
The minimum support threshold is a user-defined parameter that specifies the minimum frequency an item set must have to be considered frequent. Item sets with support values below this threshold are discarded.

---

### 7. Frequent Item Set
A frequent item set is an item set that satisfies the minimum support threshold. These item sets form the basis for generating strong association rules that meet confidence criteria.
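Sections 5 to 7 fit together in one step: candidates are counted against the data, and only those meeting the minimum support threshold survive as frequent itemsets. The sketch below assumes hypothetical names (`filter_by_min_support`, `C1`, `L1`) and the same toy transactions used above, with a threshold of 0.6 chosen only for illustration.

```python
def filter_by_min_support(candidates, transactions, min_support):
    """Count every candidate itemset in one pass over the transactions and
    return {itemset: support} for those meeting the minimum support."""
    n = len(transactions)
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is contained in this transaction
                counts[c] += 1
    return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

# Candidate set: every single item as a 1-itemset
C1 = {frozenset({item}) for t in transactions for item in t}
# Frequent 1-itemsets: Butter (support 0.5) is discarded at threshold 0.6
L1 = filter_by_min_support(C1, transactions, min_support=0.6)
print(sorted(next(iter(s)) for s in L1))  # ['Bread', 'Milk']
```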

---


### Apriori Algorithm:
**Notation:**
- $C_k$ = candidate itemsets of size $k$
- $L_k$ = frequent itemsets of size $k$
- $C_{k+1}$ = candidates generated from $L_k$
- $L_{k+1}$ = candidates in $C_{k+1}$ satisfying minsup

1. Read the transaction database, compute the support of each single item, and compare it with the minimum support to obtain the frequent itemsets at level 1 ($L_1$).
2. Join $L_k$ with itself to generate the candidate itemsets of length $k+1$ ($C_{k+1}$) for the next level.
3. Generate the frequent itemsets of length $k+1$ ($L_{k+1}$) using the minimum support. In this step:
    - 3.1 Scan the original database to count the support of each candidate in $C_{k+1}$.
    - 3.2 Prune the candidates whose support is below minsup.
4. Repeat steps 2 and 3 until no new frequent itemsets can be generated.
5. Generate rules from the frequent itemsets of level 2 onwards, using the minimum confidence.
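The steps above can be sketched from scratch in plain Python. This is an illustrative implementation under stated assumptions, not the internals of the `apyori` package used below: the function names `apriori_frequent_itemsets` and `generate_rules` are hypothetical, transactions are assumed to be iterables of item names, and the join step also applies the Apriori pruning check before counting.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori. Returns {frozenset(itemset): support} for all
    frequent itemsets of every size."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        # Step 3.1: one scan of the database to count each candidate
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Step 3.2: keep only candidates that meet minsup
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Step 1: frequent 1-itemsets (L1)
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 1
    while current:  # Step 4: stop when no frequent itemsets remain
        # Step 2: join L_k with itself to build C_{k+1}
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Keep unions of size k+1 whose k-subsets are all frequent
            if len(union) == k + 1 and all(
                    frozenset(s) in current for s in combinations(union, k)):
                candidates.add(union)
        current = count(candidates)  # L_{k+1}
        frequent.update(current)
        k += 1
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 5: rules X -> Y from frequent itemsets of size >= 2."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for base in combinations(itemset, r):
                base = frozenset(base)
                # Every subset of a frequent itemset is frequent, so it is in `frequent`
                conf = sup / frequent[base]
                if conf >= min_confidence:
                    rules.append((set(base), set(itemset - base), conf))
    return rules

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]
freq = apriori_frequent_itemsets(transactions, min_support=0.5)
rules = generate_rules(freq, min_confidence=0.7)
print(rules)  # [({'Butter'}, {'Bread'}, 1.0)]
```

Counting every candidate against every transaction keeps the sketch short; real implementations typically use hash trees or similar structures to make the scan faster.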


### Implement apriori algorithm for market basket analysis

In [9]:
```python
!pip install apyori
```

Successfully installed apyori-1.1.2


In [None]:
```python
import pandas as pd
from apyori import apriori
```

In [11]:
```python
data_frame = pd.read_csv('C:/Users/yakuma/Desktop/Newfolder/college assignments/7th sem assignments/data warehousing/DbDW lab/market_basket.csv', header=None)
data_frame.head()
```

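|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Wine | Chips | Bread | Butter | Milk | Apple |
| 1 | Wine | NaN | Bread | Butter | Milk | NaN |
| 2 | NaN | NaN | Bread | Butter | Milk | NaN |
| 3 | NaN | Chips | NaN | Butter | NaN | Apple |
| 4 | Wine | Chips | Bread | Butter | Milk | Apple |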


In [12]:
```python
data_frame.shape
```

(22, 6)

### Convert Pandas dataframe into nested lists  

In [13]:
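```python
# Build one list of item strings per transaction (row).
# Note: str() turns empty cells (NaN) into the string 'nan'.
n_rows, n_cols = data_frame.shape  # (22, 6) for this dataset
lsts = []
for i in range(n_rows):
    lsts.append([str(data_frame.values[i, j]) for j in range(n_cols)])

print(lsts)
```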

[['Wine', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['Wine', 'nan', 'Bread', 'Butter', 'Milk', 'nan'], ['nan', 'nan', 'Bread', 'Butter', 'Milk', 'nan'], ['nan', 'Chips', 'nan', 'Butter', 'nan', 'Apple'], ['Wine', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['Wine', 'Chips', 'nan', 'nan', 'Milk', 'nan'], ['Wine', 'Chips', 'Bread', 'Butter', 'nan', 'Apple'], ['Wine', 'Chips', 'nan', 'nan', 'Milk', 'nan'], ['Wine', 'nan', 'Bread', 'nan', 'nan', 'Apple'], ['nan', 'nan', 'Bread', 'Butter', 'Milk', 'nan'], ['Wine', 'Chips', 'Bread', 'Butter', 'nan', 'Apple'], ['Wine', 'nan', 'nan', 'Butter', 'Milk', 'Apple'], ['Wine', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['Wine', 'nan', 'Bread', 'nan', 'Milk', 'nan'], ['Wine', 'nan', 'Bread', 'Butter', 'Milk', 'Apple'], ['Wine', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['nan', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['nan', 'Chips', 'nan', 'Butter', 'Milk', 'Apple'], ['Wine', 'Chips', 'Bread', 'Butter', 'Milk', 'Apple'], ['Wine', 'n
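The conversion above keeps empty cells as the string `'nan'`, which apyori then treats as an ordinary item. A minimal alternative, sketched below, drops missing values per row before building the lists. For a self-contained example it rebuilds only the first three rows shown by `data_frame.head()` in memory; with the real data you would skip that step and use the loaded `data_frame`.

```python
import pandas as pd

# First three rows of data_frame.head() above, rebuilt in memory for illustration
data_frame = pd.DataFrame([
    ["Wine", "Chips", "Bread", "Butter", "Milk", "Apple"],
    ["Wine", None, "Bread", "Butter", "Milk", None],
    [None, None, "Bread", "Butter", "Milk", None],
])

# dropna() removes the missing cells of each row, so 'nan' never becomes an item
lsts = [[str(item) for item in row.dropna()] for _, row in data_frame.iterrows()]
print(lsts[1])  # ['Wine', 'Bread', 'Butter', 'Milk']
```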

### Goal of Association Rule Mining and its Application to any business
When you apply association rule mining to a given set of transactions T, your goal is to find all rules with:

1. Support greater than or equal to min_support
2. Confidence greater than or equal to min_confidence

Apriori is one of the algorithms for association rule mining, and it is the one implemented here.

### Support, Confidence, Strong Rules, and Lift in Association Analysis

#### Support
Support defines the popularity of an item within the dataset. It is calculated as the proportion of transactions that contain the item or item set.

#### Confidence
Confidence indicates how often items X and Y occur together relative to the number of times X occurs; that is, it estimates the probability of Y given X. It helps assess the strength of the association rule.

#### Strong rules
A rule A ⇒ B is considered a strong rule if it satisfies the minimum support (min_sup) and minimum confidence (min_confidence) thresholds. Strong rules indicate a reliable relationship between item sets.
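The strong-rule test combines both thresholds. A minimal sketch, with the hypothetical helper `is_strong_rule`, the same toy transactions as earlier, and thresholds min_sup = 0.5 and min_confidence = 0.7 picked only for illustration:

```python
def support(itemset, transactions):
    """Fraction of transactions containing all items in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def is_strong_rule(a, b, transactions, min_sup, min_conf):
    """A => B is strong if Support(A ∪ B) >= min_sup and
    Confidence(A => B) = Support(A ∪ B) / Support(A) >= min_conf."""
    sup_ab = support(set(a) | set(b), transactions)
    sup_a = support(a, transactions)
    conf = sup_ab / sup_a if sup_a else 0.0  # guard against Support(A) = 0
    return sup_ab >= min_sup and conf >= min_conf

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

print(is_strong_rule({"Butter"}, {"Bread"}, transactions, 0.5, 0.7))  # True: conf 1.0
print(is_strong_rule({"Bread"}, {"Milk"}, transactions, 0.5, 0.7))    # False: conf about 0.667
```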

#### Lift 
Lift measures the correlation between A and B in the rule A ⇒ B. It shows how one item set A affects the item set B. It is calculated as:

$\text{Lift}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)}$

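If the lift is greater than 1, A and B occur together more often than they would if they were independent, so they are positively correlated, and the lift value indicates the degree of dependence. A lift of exactly 1 means A and B are independent, and a lift below 1 means they are negatively correlated.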
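The lift formula can be checked numerically on the same hypothetical toy transactions used earlier. In this data one rule has lift above 1 and another below 1. The helper names `support` and `lift` are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions containing all items in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(a, b, transactions):
    """Lift(A => B) = Support(A ∪ B) / (Support(A) * Support(B))."""
    return support(set(a) | set(b), transactions) / (
        support(a, transactions) * support(b, transactions))

# Hypothetical toy transactions (not the lab dataset)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Butter"},
    {"Milk"},
]

print(round(lift({"Butter"}, {"Bread"}, transactions), 3))  # 1.333: positive association
print(round(lift({"Bread"}, {"Milk"}, transactions), 3))    # 0.889: negative association
```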


#### Interpretation of Lift

- **Lift** indicates how much more likely item **Y** is to be bought when item **X** is bought, compared with how often **Y** is bought overall. Unlike confidence, it corrects for the general popularity of **Y**.

The formula for lift can also be expressed as:

$\text{Lift}(X \Rightarrow Y) = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)}$
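The two lift formulas agree. Writing the first one with $X$ and $Y$ and using $\text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$ from Section 3:

$$
\text{Lift}(X \Rightarrow Y)
= \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)}
= \frac{1}{\text{Support}(Y)} \cdot \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
= \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)}
$$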

#### Example

For the rule **X ⇒ Y** with:

- **Support = 60%**: This means that **60%** of all transactions contain both **X** and **Y**.
- **Confidence = 90%**: This indicates that **90%** of the customers who bought **X** also bought **Y**.
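These two numbers already fix the support of **X**. The lift, however, also needs Support(Y), which the example does not give. For illustration only, assume a hypothetical Support(Y) = 75% (it must be at least 60%, because Y appears in every transaction that contains both X and Y):

$$
\text{Support}(X) = \frac{\text{Support}(X \cup Y)}{\text{Confidence}(X \Rightarrow Y)} = \frac{0.60}{0.90} \approx 0.667,
\qquad
\text{Lift}(X \Rightarrow Y) = \frac{0.90}{0.75} = 1.2
$$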



### Make APRIORI MODEL  for RULE GENERATION

In [14]:
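```python
# Keep rules with at least 50% support, 70% confidence and a lift of 1.2.
# Note: apyori 1.1.2 reads min_support, min_confidence, min_lift and max_length;
# min_length does not appear to be used by the library.
asscsn_rules = apriori(lsts, min_support=0.50, min_confidence=0.7, min_lift=1.2, min_length=2)
asscsn_results = list(asscsn_rules)
```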

In [16]:
```python
import json
print(json.dumps(asscsn_results, default=str, indent=4))
```

```json
[
    [
        "frozenset({'Bread', 'Apple', 'Wine'})",
        0.5,
        [
            [
                "frozenset({'Apple'})",
                "frozenset({'Bread', 'Wine'})",
                0.7333333333333334,
                1.241025641025641
            ],
            [
                "frozenset({'Bread', 'Apple'})",
                "frozenset({'Wine'})",
                0.9166666666666667,
                1.2604166666666667
            ],
            [
                "frozenset({'Apple', 'Wine'})",
                "frozenset({'Bread'})",
                0.9166666666666667,
                1.2604166666666667
            ],
            [
                "frozenset({'Bread', 'Wine'})",
                "frozenset({'Apple'})",
                0.8461538461538461,
                1.241025641025641
            ]
        ]
    ]
]
```
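As a consistency check on the printed output, the sketch below recomputes each rule's confidence and lift from support counts. The counts are not read from `market_basket.csv`. They are back-calculated from the printed values, assuming 22 transactions (from `data_frame.shape`) and a support of 0.5 for {Apple, Bread, Wine}, i.e. 11 of 22, so they are an inference rather than measured data. The names `count`, `sup`, and `conf_lift` are hypothetical.

```python
N = 22  # number of transactions, from data_frame.shape

# Support counts BACK-CALCULATED from the printed confidence/lift values
# (an inference used to check consistency, not recounted from the CSV).
count = {
    frozenset({"Apple", "Bread", "Wine"}): 11,  # support 0.5
    frozenset({"Apple"}): 15,
    frozenset({"Bread"}): 16,
    frozenset({"Wine"}): 16,
    frozenset({"Bread", "Wine"}): 13,
    frozenset({"Apple", "Bread"}): 12,
    frozenset({"Apple", "Wine"}): 12,
}

def sup(items):
    """Support = count of transactions containing `items` / total transactions."""
    return count[frozenset(items)] / N

def conf_lift(base, add):
    """Return (confidence, lift) for the rule base -> add."""
    conf = sup(set(base) | set(add)) / sup(base)
    return conf, conf / sup(add)

for base, add in [({"Apple"}, {"Bread", "Wine"}),
                  ({"Apple", "Bread"}, {"Wine"}),
                  ({"Apple", "Wine"}, {"Bread"}),
                  ({"Bread", "Wine"}, {"Apple"})]:
    conf, lift = conf_lift(base, add)
    print(sorted(base), "->", sorted(add), round(conf, 4), round(lift, 4))
```

Each printed line matches the confidence and lift reported by apyori for the corresponding rule.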


### Result Interpretation -  Market basket analysis

#### Consumer behavior insights
The consumer behavior insights from the given dataset, and their applications, are interpreted below:

---

#### Frequent Item Set

The frequent item set identified from the market basket dataset analysis is:

- **{‘Wine’, ‘Apple’, ‘Bread’}, support = 0.5**
  - This means these items appear together in **50%** of all transactions.

---

#### Strong Association Rules

##### Rule 1: {‘Apple’} → {‘Bread’, ‘Wine’}
- **Confidence**: 0.7333 (or 73.33%)
  - This indicates that **73.33%** of the consumers who bought **Apple** also bought **Bread & Wine**.
- **Lift**: 1.241
  - This means that customers who buy **Apple** are **1.24 times as likely** to buy **Bread & Wine** as customers overall.
  - A lift greater than 1 indicates a positive association between the items.

---

##### Rule 2: {‘Apple’, ‘Bread’} → {‘Wine’}
- **Confidence**: 0.9167 (or 91.67%)
  - This implies that **91.67%** of the customers who bought **Apple & Bread** also bought **Wine**.
- **Lift**: 1.260
  - This suggests a positive association: customers who buy both **Apple & Bread** are highly likely (91.67%) to also purchase **Wine**, about 1.26 times the rate across all customers.

---

##### Rule 3: {‘Apple’, ‘Wine’} → {‘Bread’}
- **Confidence**: 0.9167 (or 91.67%)
  - This means that **91.67%** of the customers who bought **Apple & Wine** also bought **Bread**.
- **Lift**: 1.260
  - As with Rule 2, this indicates a positive association: customers who buy **Apple & Wine** are very likely to buy **Bread**, about 1.26 times the rate across all customers.

---

##### Rule 4: {‘Bread’, ‘Wine’} → {‘Apple’}
- **Confidence**: 0.8462 (or 84.62%)
  - This implies that **84.62%** of the customers who bought **Bread & Wine** also bought **Apple**.
- **Lift**: 1.241
  - This indicates that customers who purchase **Bread & Wine** are also likely to buy **Apple**, about 1.24 times the rate across all customers (a positive association).

