
**9. FP-Growth Algorithm**

Objective: Generate frequent itemsets from transaction data using the FP-Growth algorithm.

In [5]:
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth

In [7]:
# Step 1: Sample transaction dataset
# Each transaction contains a list of purchased items
data = pd.DataFrame({
    'Transaction': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Items': [['A', 'B', 'D'],
              ['B', 'C', 'E'],
              ['A', 'B', 'D', 'E'],
              ['A', 'E'],
              ['A', 'B', 'C', 'E']]
})

In [8]:
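# Step 2: Convert item lists into one-hot encoded format
# fpgrowth expects one column per item and one row per transaction;
# boolean values avoid mlxtend's warning about non-bool DataFrames.
# Note: joining on '|' assumes no item name contains that character.
one_hot = data['Items'].str.join('|').str.get_dummies().astype(bool)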
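
To make the one-hot layout concrete, the following is a minimal sketch that builds the same table with plain Python lists, with no pandas involved. The names `transactions`, `items`, and `one_hot_rows` are illustrative and do not appear in the notebook.

```python
# Same five transactions as in Step 1 (T1..T5, in order).
transactions = [
    ['A', 'B', 'D'],
    ['B', 'C', 'E'],
    ['A', 'B', 'D', 'E'],
    ['A', 'E'],
    ['A', 'B', 'C', 'E'],
]

# One column per distinct item, sorted alphabetically.
items = sorted({item for basket in transactions for item in basket})

# One row per transaction, one True/False flag per item column.
one_hot_rows = [[item in basket for item in items] for basket in transactions]

print(items)
for row in one_hot_rows:
    print([int(flag) for flag in row])
```

Each row answers "does this transaction contain item X?" for every item, which is the shape the frequent-itemset miner consumes.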

In [9]:
# Step 3: Generate frequent itemsets using FP-Growth
# Minimum support is 0.4, so an itemset must appear in at least 2 of the 5 transactions
frequent_itemsets = fpgrowth(one_hot, min_support=0.4, use_colnames=True)



In [10]:
# Step 4: Display the resulting frequent itemsets
print(frequent_itemsets)


    support   itemsets
0       0.8        (A)
1       0.8        (B)
2       0.4        (D)
3       0.8        (E)
4       0.4        (C)
5       0.6     (E, A)
6       0.6     (B, A)
7       0.6     (B, E)
8       0.4  (B, E, A)
9       0.4     (B, D)
10      0.4     (D, A)
11      0.4  (B, D, A)
12      0.4     (C, B)
13      0.4     (C, E)
14      0.4  (C, B, E)
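
As a sanity check on the table above, support can be computed by brute force: count the transactions that contain every item of a candidate itemset and divide by the total. The sketch below is not FP-Growth (it enumerates every combination, which only works for tiny datasets), and the helper `support` is an illustrative name, but it should arrive at the same 15 itemsets.

```python
from itertools import combinations

# Same five transactions as in Step 1, stored as sets for subset tests.
transactions = [
    {'A', 'B', 'D'},
    {'B', 'C', 'E'},
    {'A', 'B', 'D', 'E'},
    {'A', 'E'},
    {'A', 'B', 'C', 'E'},
]
min_support = 0.4
n = len(transactions)
items = sorted(set().union(*transactions))


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= basket for basket in transactions) / n


# Exhaustive search over all non-empty item combinations.
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        s = support(candidate)
        if s >= min_support:
            frequent[candidate] = s

for itemset, s in frequent.items():
    print(itemset, s)
```

FP-Growth reaches the same result without enumerating candidates, by compressing the transactions into a prefix tree and mining it recursively.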


**10. Recommender System using Collaborative Filtering**

Objective: Find users with similar movie preferences using distance metrics (Euclidean and Cosine).

Code for Similarity Using Euclidean Distance

In [2]:
from scipy import spatial

# User ratings for two movies
a = [1, 2]
b = [2, 4]
c = [2.5, 4]
d = [4.5, 5]

# Calculate Euclidean distances between user C and others
print(spatial.distance.euclidean(c, a))  # Distance between C and A
print(spatial.distance.euclidean(c, b))  # Distance between C and B
print(spatial.distance.euclidean(c, d))  # Distance between C and D


2.5
0.5
2.23606797749979
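
For reference, the Euclidean distance is the square root of the summed squared differences between two rating vectors. The sketch below spells that out with the standard library only; the function name `euclidean` is chosen here for illustration and is not the SciPy implementation.

```python
import math


def euclidean(u, v):
    """Square root of the sum of squared per-movie rating differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Same user ratings as above.
a = [1, 2]
b = [2, 4]
c = [2.5, 4]
d = [4.5, 5]

print(euclidean(c, a))  # sqrt(1.5**2 + 2**2) = sqrt(6.25) = 2.5
print(euclidean(c, b))  # sqrt(0.5**2 + 0**2) = 0.5
print(euclidean(c, d))  # sqrt(2**2 + 1**2) = sqrt(5)
```

By this metric B is the user closest to C, and A is the farthest.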


Code for Similarity Using Cosine Distance

In [3]:
from scipy import spatial

# Same user ratings
a = [1, 2]
b = [2, 4]
c = [2.5, 4]
d = [4.5, 5]

# Calculate Cosine distances between C and others
print(spatial.distance.cosine(c, a))  # C vs A
print(spatial.distance.cosine(c, b))  # C vs B
print(spatial.distance.cosine(c, d))  # C vs D
print(spatial.distance.cosine(a, b))  # A vs B


0.004504527406047898
0.004504527406047898
0.015137225946083022
0.0


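**Interpretation:**
Cosine distance puts C equally close to A and B (about 0.0045 each) and more than three times as far from D (about 0.0151).
A and B have zero cosine distance because B's ratings are exactly twice A's: the two vectors point in the same direction, so the preference pattern is identical even though the magnitudes differ.
Euclidean distance ranks the same users differently: B is closest to C (0.5) and A is farthest (2.5).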
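
The scale invariance described here follows directly from the formula: cosine distance is one minus the dot product divided by the product of the vector lengths, so multiplying a vector by a positive constant cancels out. A minimal standard-library sketch, assuming nothing beyond that formula (`cosine_distance` is an illustrative name, not SciPy's function):

```python
import math


def cosine_distance(u, v):
    """1 - (u . v) / (|u| * |v|); depends only on the angle between u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


a = [1, 2]
b = [2, 4]      # exactly 2 * a, so it points in the same direction as a
c = [2.5, 4]
d = [4.5, 5]

print(cosine_distance(c, a))
print(cosine_distance(c, b))  # matches C vs A up to floating-point rounding
print(cosine_distance(c, d))
print(cosine_distance(a, b))  # zero up to floating-point rounding
```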



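**Euclidean Distance** compares absolute rating values, so two users who rate on different scales can look far apart even when their preferences agree.

**Cosine Distance** depends only on the angle between the rating vectors (the preference pattern), not on their magnitude.

For recommender systems, cosine similarity (1 - cosine distance) is often preferred when rating scales vary among users.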
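
To tie this back to the objective of finding similar users, the sketch below ranks the other users by distance from C under each metric. It uses only the standard library; the `ratings` dictionary and the helper `rank_neighbours` are illustrative names introduced here, not part of the notebook.

```python
import math

# Same user ratings as above, keyed by user name.
ratings = {'A': [1, 2], 'B': [2, 4], 'C': [2.5, 4], 'D': [4.5, 5]}


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def rank_neighbours(target, metric):
    """Other users sorted from most to least similar to `target`."""
    others = [name for name in ratings if name != target]
    return sorted(others, key=lambda name: metric(ratings[target], ratings[name]))


print(rank_neighbours('C', euclidean))        # B first, A last
print(rank_neighbours('C', cosine_distance))  # A and B (near-)tied, D last
```

The two metrics disagree about A: Euclidean distance places A farthest from C because A's ratings are low in absolute terms, while cosine distance places A among the closest because A's ratings have nearly the same proportions as C's.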