# 1.Business Problem

In [1]:
#1.1. Business Objective

#The objective is to help a film distribution company target its audience based on movie preferences (likes/dislikes).
#By analyzing the data, we will identify frequent movie-watching patterns
#and generate rules that show which movies are often liked together.
#This will help recommend relevant movies to specific customer segments.

In [3]:
#1.2. Constraints

#Availability and completeness of data (missing data or limited customer preferences).

# 2.Data Dictionary

In [4]:
#All of the columns are categorical because they represent a yes/no (binary) choice indicating user preferences for different movies.

In [6]:
import pandas as pd  # Used for data manipulation and analysis
from mlxtend.frequent_patterns import apriori, association_rules 

In [None]:
# Step 1: Load the dataset
file_path = 'my_movies.csv'  
df = pd.read_csv(file_path)

# 3.Data Modeling

In [14]:
# Data Cleaning & Feature Engineering
#The dataset is clean with no missing values. 
#The data is already binary, which suits the requirements for association rule mining using the Apriori algorithm. 

df.isnull().sum()


Sixth Sense      0
Gladiator        0
LOTR1            0
Harry Potter1    0
Patriot          0
LOTR2            0
Harry Potter2    0
LOTR             0
Braveheart       0
Green Mile       0
dtype: int64

In [15]:
# Step 2: Convert integer columns (0 and 1) to boolean (False and True)
# Since the dataset is already in the form of 0's and 1's, we can convert it to boolean.
df = df.astype(bool)
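
One caveat about this conversion: `astype(bool)` maps every non-zero value to `True`, so a stray `2` (and, in pandas, a missing value too) would silently be counted as a "like". A minimal sketch of a guard, assuming the columns are meant to hold only 0 and 1. The helper name `to_boolean_basket` and the toy frame are hypothetical and are not part of the original notebook:

```python
import pandas as pd


def to_boolean_basket(frame: pd.DataFrame) -> pd.DataFrame:
    """Convert a 0/1 frame to bool, refusing anything that is not strictly 0 or 1."""
    is_binary = frame.isin([0, 1])  # NaN and values such as 2 come out as False
    bad_columns = [col for col in frame.columns if not is_binary[col].all()]
    if bad_columns:
        raise ValueError(f"Non-binary values in columns: {bad_columns}")
    return frame.astype(bool)


# Hypothetical toy frame, NOT the contents of my_movies.csv
toy = pd.DataFrame({"Gladiator": [1, 0, 1], "Patriot": [1, 1, 0]})
toy_bool = to_boolean_basket(toy)
print(toy_bool.dtypes)
```

In the notebook itself this would amount to calling `to_boolean_basket(df)` in place of `df.astype(bool)`.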

In [16]:
# Step 3: Apply the Apriori algorithm to find frequent itemsets
# We can set min_support to a reasonable threshold, such as 0.1 (i.e., 10%)
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)
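
To make the `min_support` threshold concrete, here is a minimal brute-force sketch of what "support" means: the fraction of users whose liked movies include every movie in the itemset. It enumerates all combinations instead of using Apriori's pruning, so it only suits tiny examples. The function name and the five toy users are hypothetical and are not rows from `my_movies.csv`:

```python
from itertools import combinations


def frequent_itemsets_bruteforce(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    n_users = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Count the users who like every movie in the candidate itemset
            count = sum(1 for liked in transactions if set(candidate) <= liked)
            support = count / n_users
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


# Hypothetical toy data: each set holds the movies one user likes
toy_users = [
    {"Gladiator", "Patriot"},
    {"Gladiator", "Sixth Sense", "Patriot"},
    {"Sixth Sense"},
    {"Gladiator", "Sixth Sense"},
    {"Green Mile", "Sixth Sense"},
]

toy_itemsets = frequent_itemsets_bruteforce(toy_users, min_support=0.4)
for itemset, support in sorted(toy_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Raising `min_support` shrinks the result, and lowering it admits rarer combinations such as Green Mile with Sixth Sense.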

In [18]:
# Step 4: Generate association rules from the frequent itemsets
# We use 'confidence' as the metric, with a minimum confidence threshold (e.g., 70%)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
rules

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Gladiator),(Sixth Sense),0.7,0.6,0.5,0.714286,1.190476,0.08,1.4,0.533333
1,(Sixth Sense),(Gladiator),0.6,0.7,0.5,0.833333,1.190476,0.08,1.8,0.400000
2,(LOTR),(Sixth Sense),0.1,0.6,0.1,1.000000,1.666667,0.04,inf,0.444444
3,(Green Mile),(Sixth Sense),0.2,0.6,0.2,1.000000,1.666667,0.08,inf,0.500000
4,(Gladiator),(Patriot),0.7,0.6,0.6,0.857143,1.428571,0.18,2.8,1.000000
...,...,...,...,...,...,...,...,...,...,...
124,"(Sixth Sense, LOTR2)","(Harry Potter1, Green Mile, LOTR1)",0.1,0.1,0.1,1.000000,10.000000,0.09,inf,1.000000
125,"(Harry Potter1, Sixth Sense)","(Green Mile, LOTR2, LOTR1)",0.1,0.1,0.1,1.000000,10.000000,0.09,inf,1.000000
126,"(LOTR2, Green Mile)","(Harry Potter1, Sixth Sense, LOTR1)",0.1,0.1,0.1,1.000000,10.000000,0.09,inf,1.000000
127,"(Harry Potter1, LOTR2)","(Green Mile, Sixth Sense, LOTR1)",0.1,0.1,0.1,1.000000,10.000000,0.09,inf,1.000000
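
Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent and keeps the splits whose confidence reaches the threshold. The following is a minimal sketch of that idea, not mlxtend's actual implementation. The function name `enumerate_rules` is hypothetical, and the supports are copied from the outputs in this notebook:

```python
from itertools import combinations


def enumerate_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) for every qualifying split."""
    kept = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is available in the same table.
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    kept.append((antecedent, itemset - antecedent, confidence))
    return kept


# Supports taken from the notebook's outputs
supports = {
    frozenset({"Sixth Sense"}): 0.6,
    frozenset({"Gladiator"}): 0.7,
    frozenset({"Patriot"}): 0.6,
    frozenset({"Gladiator", "Sixth Sense"}): 0.5,
    frozenset({"Patriot", "Sixth Sense"}): 0.4,
    frozenset({"Gladiator", "Patriot"}): 0.6,
}

toy_rules = enumerate_rules(supports, min_confidence=0.7)
for antecedent, consequent, confidence in toy_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 6))
```

Both directions of the Patriot and Sixth Sense pair fall short of the 0.7 threshold (0.4 / 0.6 ≈ 0.667), so neither is kept.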


Lift
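Lift compares how often the antecedent and consequent occur together with how often they would occur together if they were independent. A lift greater than 1 indicates a positive association (the presence of the antecedent increases the likelihood of the consequent), a lift of 1 indicates independence, and a lift below 1 indicates a negative association. How far the value lies above 1, read together with the rule's support, indicates how strong and how reliable the association is.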

Confidence
Confidence estimates the probability that a customer likes the consequent given that they like the antecedent. A higher confidence means the rule holds for a larger share of the customers who match the antecedent, but it ignores how popular the consequent already is, so it should be read alongside lift.
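
These metrics can be checked by hand against the support columns. The sketch below recomputes confidence, lift, leverage, and conviction for the rule (Gladiator) → (Sixth Sense) from the three supports in the rules table, using the standard definitions. The helper `rule_metrics` is a hypothetical name, not an mlxtend function:

```python
import math


def rule_metrics(support_a, support_c, support_ac):
    """Compute rule metrics from antecedent, consequent, and joint support."""
    confidence = support_ac / support_a            # estimate of P(C | A)
    lift = confidence / support_c                  # P(C | A) / P(C)
    leverage = support_ac - support_a * support_c  # observed minus expected co-occurrence
    # Conviction is infinite when the rule never fails (confidence of 1)
    conviction = math.inf if confidence >= 1.0 else (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports for (Gladiator) -> (Sixth Sense), copied from row 0 of the rules table
metrics = rule_metrics(support_a=0.7, support_c=0.6, support_ac=0.5)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The results should agree with row 0 of the table (confidence 0.714286, lift 1.190476, leverage 0.08, conviction 1.4).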

In [19]:
# Step 5: Output the results
print("Frequent Itemsets:")
print(frequent_itemsets)



Frequent Itemsets:
    support                                           itemsets
0       0.6                                      (Sixth Sense)
1       0.7                                        (Gladiator)
2       0.2                                            (LOTR1)
3       0.2                                    (Harry Potter1)
4       0.6                                          (Patriot)
5       0.2                                            (LOTR2)
6       0.1                                    (Harry Potter2)
7       0.1                                             (LOTR)
8       0.1                                       (Braveheart)
9       0.2                                       (Green Mile)
10      0.5                           (Gladiator, Sixth Sense)
11      0.1                               (Sixth Sense, LOTR1)
12      0.1                       (Harry Potter1, Sixth Sense)
13      0.4                             (Patriot, Sixth Sense)
14      0.1                         

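The frequent itemsets show which titles dominate viewer preferences. Gladiator (support 0.7), Sixth Sense (0.6), and Patriot (0.6) are the most frequently liked movies, while every other individual title has a support of 0.2 or less. Gladiator and Sixth Sense are liked together by half of the users (support 0.5), and Patriot and Sixth Sense by 40%. These high-support itemsets are the most dependable basis for the film distribution company's recommendations and targeted marketing, whereas itemsets that only just meet the 0.1 threshold should be treated with caution.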

In [20]:
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])



Association Rules:
                      antecedents                          consequents  \
0                     (Gladiator)                        (Sixth Sense)   
1                   (Sixth Sense)                          (Gladiator)   
2                          (LOTR)                        (Sixth Sense)   
3                    (Green Mile)                        (Sixth Sense)   
4                     (Gladiator)                            (Patriot)   
..                            ...                                  ...   
124          (Sixth Sense, LOTR2)   (Harry Potter1, Green Mile, LOTR1)   
125  (Harry Potter1, Sixth Sense)           (Green Mile, LOTR2, LOTR1)   
126           (LOTR2, Green Mile)  (Harry Potter1, Sixth Sense, LOTR1)   
127        (Harry Potter1, LOTR2)     (Green Mile, Sixth Sense, LOTR1)   
128   (Harry Potter1, Green Mile)          (Sixth Sense, LOTR2, LOTR1)   

     support  confidence       lift  
0        0.5    0.714286   1.190476  
1        0.5   

The association rules shown all have a lift above 1, indicating positive associations between the films. For example, customers who like **'Gladiator'** also like **'Sixth Sense'** with a confidence of 71.43%, although the lift of 1.19 shows this is only a modest improvement over the 60% of all customers who like 'Sixth Sense'. The rules with a lift of 10 connect titles such as **'Harry Potter1'** and **'Green Mile'**, for example (Harry Potter1, LOTR2) → (Green Mile, Sixth Sense, LOTR1). Each of these rules has a support of only 0.1, so it rests on a small share of the data and should be validated on more customers before it drives targeted marketing and recommendations.
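
Because a rule can have a very high lift while resting on little data, it can help to shortlist rules that pass both a support floor and a lift floor before ranking them. A minimal sketch in plain Python, using six rules copied from the rules table in this notebook. The thresholds (support of at least 0.2, lift above 1) and the helper name `shortlist` are illustrative assumptions, not values from the original analysis:

```python
# Six rules copied from the notebook's rules table (rows 0-4 and 124)
shown_rules = [
    {"antecedent": ("Gladiator",), "consequent": ("Sixth Sense",),
     "support": 0.5, "confidence": 0.714286, "lift": 1.190476},
    {"antecedent": ("Sixth Sense",), "consequent": ("Gladiator",),
     "support": 0.5, "confidence": 0.833333, "lift": 1.190476},
    {"antecedent": ("LOTR",), "consequent": ("Sixth Sense",),
     "support": 0.1, "confidence": 1.0, "lift": 1.666667},
    {"antecedent": ("Green Mile",), "consequent": ("Sixth Sense",),
     "support": 0.2, "confidence": 1.0, "lift": 1.666667},
    {"antecedent": ("Gladiator",), "consequent": ("Patriot",),
     "support": 0.6, "confidence": 0.857143, "lift": 1.428571},
    {"antecedent": ("Sixth Sense", "LOTR2"),
     "consequent": ("Harry Potter1", "Green Mile", "LOTR1"),
     "support": 0.1, "confidence": 1.0, "lift": 10.0},
]


def shortlist(rules, min_support=0.2, min_lift=1.0):
    """Keep rules above both floors, ranked by lift, then support, then confidence."""
    kept = [r for r in rules if r["support"] >= min_support and r["lift"] > min_lift]
    return sorted(
        kept,
        key=lambda r: (r["lift"], r["support"], r["confidence"]),
        reverse=True,
    )


ranked = shortlist(shown_rules)
for rule in ranked:
    print(rule["antecedent"], "->", rule["consequent"],
          f"support={rule['support']} lift={rule['lift']}")
```

With these assumed thresholds the two rules at support 0.1 drop out, including the lift-10 rule. On the notebook's `rules` DataFrame the same filter would be a boolean mask on the `support` and `lift` columns.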

In [22]:
#Business Impact


#1.Increased Customer Satisfaction: By offering personalized movie recommendations, customers are more likely to find films they enjoy, leading to repeat purchases.

#2.Higher Sales: Targeted marketing efforts based on association rules can boost sales through effective promotions.

#3.Enhanced Customer Insights: Understanding audience preferences enables the company to make informed decisions about future movie acquisitions and marketing strategies.