Problem Statement: 

A film distribution company wants to target audiences based on their likes and dislikes. As the Chief Data Scientist, analyze the data and derive association rules over the movie list so that this business objective is achieved.


Business Problem:

This dataset can help solve various business problems, especially for streaming platforms or movie theaters:

Recommendation Systems: Build a recommendation engine that suggests movies to users based on their viewing history or ratings. This helps improve user engagement and retention.

Understanding Popularity: Analyzing which genres, actors, or directors are most popular among viewers helps in deciding which content to invest in or acquire.

Content Strategy: For streaming platforms, the data can be used to decide which movies to produce, license, or retire based on viewership trends.

Sentiment Analysis: By incorporating user reviews or ratings, businesses can gauge audience satisfaction and identify which movies are liked or disliked.

Constraints:

Sparse Data: Some users may have rated or watched very few movies, making it hard to personalize recommendations.

Cold Start Problem: New users or new movies added to the platform will not have enough data for accurate recommendations.

Inconsistent Metadata: Missing or incomplete data about movies (e.g., release dates, genres, or directors) can lead to difficulties in categorizing and recommending films.

Bias in Ratings: Popular movies may receive more ratings than niche ones, leading to a skew in recommendations. Also, user ratings might be biased (e.g., rating a movie higher because of a famous actor).
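
The constraints above can be checked on a watch matrix before any rules are mined. The following is a minimal sketch in plain Python, assuming a hypothetical 0/1 matrix with one row per user and one column per title; the titles, the data, and the helper names are illustrative and do not come from my_movies.csv.

```python
# Hypothetical 0/1 watch matrix (illustrative data, not from my_movies.csv):
# one row per user, one column per title.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
watch_matrix = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],  # a new user with no viewing history yet
]

def sparsity(matrix):
    """Fraction of user/title cells that are empty (0)."""
    cells = sum(len(row) for row in matrix)
    filled = sum(sum(row) for row in matrix)
    return 1 - filled / cells

def views_per_title(matrix, titles):
    """Number of users who watched each title; a large spread hints at popularity skew."""
    return {title: sum(row[i] for row in matrix) for i, title in enumerate(titles)}

def cold_start_users(matrix, min_views=1):
    """Row indices of users with fewer than `min_views` watched titles."""
    return [i for i, row in enumerate(matrix) if sum(row) < min_views]

print(sparsity(watch_matrix))
print(views_per_title(watch_matrix, titles))
print(cold_start_users(watch_matrix))
```

A title with zero views (here "Movie D") is the item-side version of the cold start problem: no rule can ever recommend it until someone watches it.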

In [3]:
# Step 1: import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder  # unused below: the data is already one-hot encoded

In [5]:
# Load the dataset; malformed rows are skipped instead of raising an error
df = pd.read_csv('my_movies.csv', on_bad_lines='skip')
df.head()

Unnamed: 0,Sixth Sense,Gladiator,LOTR1,Harry Potter1,Patriot,LOTR2,Harry Potter2,LOTR,Braveheart,Green Mile
0,1,0,1,1,0,1,0,0,0,1
1,0,1,0,0,1,0,0,0,1,0
2,0,0,1,0,0,1,0,0,0,0
3,1,1,0,0,1,0,0,0,0,0
4,1,1,0,0,1,0,0,0,0,0
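
The table above is already in the wide 0/1 layout, one column per title. If the same data arrived as one list of titles per user, an encoding step would be needed first; the notebook imports mlxtend's TransactionEncoder for that purpose but never needs it. The sketch below shows the idea in plain Python. It is not mlxtend's implementation, and the function name and sample lists are assumptions made for illustration.

```python
def one_hot(transactions):
    """Turn per-user title lists into (column names, 0/1 rows)."""
    # Every distinct title becomes one column, in sorted order.
    columns = sorted(set().union(*transactions))
    # Each user becomes one row with 1 where the title was watched.
    rows = [[int(title in watched) for title in columns] for watched in transactions]
    return columns, rows

# Hypothetical per-user lists (illustrative only).
watch_lists = [
    ["Gladiator", "Patriot", "Braveheart"],
    ["LOTR1", "LOTR2"],
    ["Sixth Sense", "Gladiator", "Patriot"],
]

columns, rows = one_hot(watch_lists)
print(columns)
for row in rows:
    print(row)
```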


In [7]:
# The dataset is already one-hot encoded (one 0/1 column per movie), which is the format apriori expects
# Step 2: apply the Apriori algorithm to find itemsets watched by at least 20% of users
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
frequent_itemsets



Unnamed: 0,support,itemsets
0,0.6,(Sixth Sense)
1,0.7,(Gladiator)
2,0.2,(LOTR1)
3,0.2,(Harry Potter1)
4,0.6,(Patriot)
5,0.2,(LOTR2)
6,0.2,(Green Mile)
7,0.5,"(Sixth Sense, Gladiator)"
8,0.4,"(Sixth Sense, Patriot)"
9,0.2,"(Sixth Sense, Green Mile)"
10,0.6,"(Patriot, Gladiator)"
11,0.2,"(LOTR1, LOTR2)"
12,0.4,"(Sixth Sense, Gladiator, Patriot)"
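
To make the support threshold concrete, here is a minimal pure-Python sketch of the level-wise search that Apriori performs. It is not mlxtend's implementation. It uses only the five rows shown by df.head(), so its supports differ from the full-data output above, and the min_support of 0.4 and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

# The five rows shown by df.head(), written as sets of watched titles.
transactions = [
    {"Sixth Sense", "LOTR1", "Harry Potter1", "LOTR2", "Green Mile"},
    {"Gladiator", "Patriot", "Braveheart"},
    {"LOTR1", "LOTR2"},
    {"Sixth Sense", "Gladiator", "Patriot"},
    {"Sixth Sense", "Gladiator", "Patriot"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build (k+1)-item candidates only from frequent k-item sets."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    while current:
        for itemset in current:
            result[itemset] = support(itemset, transactions)
        size = len(current[0]) + 1
        # Join step: unions of two frequent k-sets that have exactly k+1 items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: every k-subset must be frequent, then check support.
        current = [c for c in candidates
                   if all(frozenset(sub) in result for sub in combinations(c, size - 1))
                   and support(c, transactions) >= min_support]
    return result

found = frequent_itemsets(transactions, min_support=0.4)
for itemset, value in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most combinations are never counted.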


In [9]:
# Step 3: generate association rules from the frequent itemsets, keeping rules with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
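
The metrics attached to each rule follow directly from the itemset supports. For a rule A -> C, confidence is support(A and C) / support(A), and lift is confidence / support(C); a lift of 1 means A and C co-occur as often as independence would predict, which is why min_threshold=1 keeps only non-negative associations. The sketch below recomputes these values for the Patriot/Gladiator pair from the supports printed in this notebook. The function name is an assumption, and this is not how mlxtend computes them internally.

```python
# Supports copied from the frequent-itemset output in this notebook.
supports = {
    frozenset({"Patriot"}): 0.6,
    frozenset({"Gladiator"}): 0.7,
    frozenset({"Patriot", "Gladiator"}): 0.6,
}

def rule_metrics(antecedent, consequent, supports):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    joint = supports[a | c]               # how often both sides occur together
    confidence = joint / supports[a]      # estimate of P(consequent | antecedent)
    lift = confidence / supports[c]       # confidence relative to the consequent's base rate
    return {"support": joint, "confidence": confidence, "lift": lift}

print(rule_metrics({"Patriot"}, {"Gladiator"}, supports))
print(rule_metrics({"Gladiator"}, {"Patriot"}, supports))
```

Confidence depends on the direction of the rule, while lift is symmetric: both directions of this pair share the same lift.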

In [11]:
# Step 4: output the results
print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
    support                           itemsets
0       0.6                      (Sixth Sense)
1       0.7                        (Gladiator)
2       0.2                            (LOTR1)
3       0.2                    (Harry Potter1)
4       0.6                          (Patriot)
5       0.2                            (LOTR2)
6       0.2                       (Green Mile)
7       0.5           (Sixth Sense, Gladiator)
8       0.4             (Sixth Sense, Patriot)
9       0.2          (Sixth Sense, Green Mile)
10      0.6               (Patriot, Gladiator)
11      0.2                     (LOTR1, LOTR2)
12      0.4  (Sixth Sense, Gladiator, Patriot)


In [13]:
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])



Association Rules:
                 antecedents               consequents  support  confidence  \
0              (Sixth Sense)               (Gladiator)      0.5    0.833333   
1                (Gladiator)             (Sixth Sense)      0.5    0.714286   
2              (Sixth Sense)                 (Patriot)      0.4    0.666667   
3                  (Patriot)             (Sixth Sense)      0.4    0.666667   
4              (Sixth Sense)              (Green Mile)      0.2    0.333333   
5               (Green Mile)             (Sixth Sense)      0.2    1.000000   
6                  (Patriot)               (Gladiator)      0.6    1.000000   
7                (Gladiator)                 (Patriot)      0.6    0.857143   
8                    (LOTR1)                   (LOTR2)      0.2    1.000000   
9                    (LOTR2)                   (LOTR1)      0.2    1.000000   
10  (Sixth Sense, Gladiator)                 (Patriot)      0.4    0.800000   
11    (Sixth Sense, Patriot)    

Association rule learning on movie viewing data helps streaming platforms and cinemas understand which titles their customers watch together. In the rules above, for example, every viewer of Patriot also watched Gladiator, and LOTR1 and LOTR2 were always watched together (confidence 1.0). Rules like these can drive personalized recommendations and targeted promotion, which in turn support viewer engagement, customer loyalty, and subscription retention. If the data is extended with ratings, genres, directors, or actors, the same approach can also inform content acquisition and marketing decisions. Re-mining the rules as new viewing data arrives keeps the recommendations current and helps the platform stay competitive.
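
As one possible way to turn the mined rules into recommendations, the sketch below matches a user's watch history against rule antecedents and suggests unseen consequents. The rule triples are copied from the complete rows of the printed output (the truncated row 11 is left out); the recommend function, its min_confidence default of 0.8, and the ranking by confidence are assumptions for illustration rather than part of the original analysis.

```python
# (antecedent, consequent, confidence) copied from rows 0-10 of the printed rules.
mined_rules = [
    ({"Sixth Sense"}, {"Gladiator"}, 0.833333),
    ({"Gladiator"}, {"Sixth Sense"}, 0.714286),
    ({"Sixth Sense"}, {"Patriot"}, 0.666667),
    ({"Patriot"}, {"Sixth Sense"}, 0.666667),
    ({"Sixth Sense"}, {"Green Mile"}, 0.333333),
    ({"Green Mile"}, {"Sixth Sense"}, 1.0),
    ({"Patriot"}, {"Gladiator"}, 1.0),
    ({"Gladiator"}, {"Patriot"}, 0.857143),
    ({"LOTR1"}, {"LOTR2"}, 1.0),
    ({"LOTR2"}, {"LOTR1"}, 1.0),
    ({"Sixth Sense", "Gladiator"}, {"Patriot"}, 0.8),
]

def recommend(watched, rules, min_confidence=0.8):
    """Titles suggested by rules whose antecedent the user has fully watched."""
    watched = set(watched)
    best = {}  # title -> highest confidence of any rule that suggests it
    for antecedent, consequent, confidence in rules:
        if confidence < min_confidence or not antecedent <= watched:
            continue
        for title in consequent - watched:  # never suggest something already seen
            best[title] = max(best.get(title, 0.0), confidence)
    # Strongest suggestions first; ties broken alphabetically.
    return sorted(best, key=lambda title: (-best[title], title))

print(recommend({"Gladiator"}, mined_rules))
print(recommend({"LOTR1"}, mined_rules))
print(recommend({"Green Mile"}, mined_rules))
```

Lowering min_confidence widens the suggestions at the cost of weaker evidence; with the default, a Gladiator viewer is pointed to Patriot but not to Sixth Sense.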