# Interactive Dashboard UI for a Data Science Project

## INTRODUCTION

This project explores Netflix's library of films and TV shows to uncover how content types have evolved over time and which genres are popular across different regions. The visualisations present what we found, with the aim of highlighting connections and hidden gems that indicate what viewers are interested in. The ultimate goal is to offer Netflix a strategic tool for content curation and optimisation that could improve viewer satisfaction and engagement. The findings could also inform refinements to Netflix's recommendation algorithms and enrich the viewer's streaming experience.


### DATA IMPORT

In [1]:
import pandas as pd
import numpy as np
from wordcloud import WordCloud, STOPWORDS
import random
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
netflix_titles_df = pd.read_csv('netflix_titles.csv')
netflix_imdb_df = pd.read_csv('netflix-imdb.csv')

In [3]:
# Clean the data: fill missing values in selected text columns with 'Unknown'.
# Assigning the result back avoids calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about.
fill_columns = ['director', 'cast', 'country', 'date_added', 'rating']
netflix_titles_df[fill_columns] = netflix_titles_df[fill_columns].fillna('Unknown')
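
Because `date_added` holds text such as "September 25, 2021", filling its gaps with 'Unknown' leaves the column as strings. If date arithmetic is needed later, one option is to parse the column into datetimes instead. The following is a minimal sketch on made-up values, assuming the "Month day, year" format; the `sample_dates` and `parsed` names are hypothetical and not part of the notebook.

```python
import pandas as pd

# Made-up values mimicking the 'date_added' column; None marks a missing entry.
sample_dates = pd.Series(["September 25, 2021", " September 24, 2021", None, "Unknown"])

# Strip stray whitespace, then parse; anything that cannot be parsed becomes NaT.
parsed = pd.to_datetime(sample_dates.str.strip(), format="%B %d, %Y", errors="coerce")

# Count how many entries were parsed successfully.
print(parsed.notna().sum())
```

With this approach missing dates stay as `NaT` rather than the string 'Unknown', so the column keeps a single datetime type.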

### DATA MERGING

In [4]:
final_df = netflix_titles_df.merge(netflix_imdb_df, on='title', how='inner')
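
An inner merge on `title` alone can multiply rows when a title appears more than once in either table (for example, a remake sharing a name with the original). The sketch below uses made-up toy frames, not the real CSV files, to show the effect and one possible way to guard against it; the frame names and the choice to keep the first IMDb row per title are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for netflix_titles_df and netflix_imdb_df (made-up rows).
titles = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Alpha", "Beta", "Beta"],  # "Beta" appears twice
})
imdb = pd.DataFrame({
    "title": ["Alpha", "Beta", "Beta", "Gamma"],
    "IMDb Score": [7.1, 6.4, 8.0, 5.5],
})

# Every duplicated title pairs with every match on the other side.
merged = titles.merge(imdb, on="title", how="inner")

# Count repeated titles on each side before merging.
dup_left = titles["title"].duplicated().sum()
dup_right = imdb["title"].duplicated().sum()

# One possible remedy: keep a single IMDb row per title, then let pandas
# verify that the right-hand keys are unique via validate="many_to_one".
imdb_unique = imdb.drop_duplicates(subset="title", keep="first")
checked = titles.merge(imdb_unique, on="title", how="inner", validate="many_to_one")

print(len(merged), len(checked))
```

Whether dropping duplicates is appropriate depends on the data; merging on an additional key such as release year may be a better fit if both tables carry one.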

In [5]:
headings = final_df.columns.tolist()
print(headings)

['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added', 'release_year', 'rating', 'duration', 'listed_in', 'description', 'Genre', 'Tags', 'Languages', 'Series or Movie', 'Hidden Gem Score', 'Country Availability', 'Runtime', 'Director', 'Writer', 'Actors', 'View Rating', 'IMDb Score', 'Rotten Tomatoes Score', 'Metacritic Score', 'Awards Received', 'Awards Nominated For', 'Boxoffice', 'Release Date', 'Netflix Release Date', 'Production House', 'Netflix Link', 'IMDb Link', 'Summary', 'IMDb Votes', 'Image', 'Poster', 'TMDb Trailer', 'Trailer Site']


In [6]:
final_df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,...,Netflix Release Date,Production House,Netflix Link,IMDb Link,Summary,IMDb Votes,Image,Poster,TMDb Trailer,Trailer Site
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,"September 25, 2021",2020,PG-13,90 min,...,10/2/20,,https://www.netflix.com/watch/80234465,https://www.imdb.com/title/tt11394180,"As her father nears the end of his life, filmm...",4163.0,https://occ-0-2851-1432.1.nflxso.net/dnm/api/v...,https://m.media-amazon.com/images/M/MV5BYzY5Yj...,https://www.youtube.com/watch?v=wfTmT6C5DnM,YouTube
1,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,...,5/20/20,,https://www.netflix.com/watch/81044547,https://www.imdb.com/title/tt9839146,"After crossing paths at a party, a Cape Town t...",1799.0,https://occ-0-2579-1432.1.nflxso.net/dnm/api/v...,,,
2,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,...,10/1/15,,https://www.netflix.com/watch/80063224,https://www.imdb.com/title/tt1877368,A talented batch of amateur bakers face off in...,6815.0,https://occ-0-2219-2218.1.nflxso.net/dnm/api/v...,https://images-na.ssl-images-amazon.com/images...,,
3,s16,TV Show,Dear White People,Unknown,"Logan Browning, Brandon P. Bell, DeRon Horton,...",United States,"September 22, 2021",2021,TV-MA,4 Seasons,...,4/28/17,Duly Noted Inc.,https://www.netflix.com/watch/80095698,https://www.imdb.com/title/tt2235108,Students of color navigate the daily slights a...,24389.0,https://occ-0-2851-38.1.nflxso.net/dnm/api/v6/...,https://images-na.ssl-images-amazon.com/images...,,
4,s18,TV Show,Falsa identidad,Unknown,"Luis Ernesto Franco, Camila Sodi, Sergio Goyri...",Mexico,"September 22, 2021",2020,TV-MA,2 Seasons,...,2/20/19,,https://www.netflix.com/watch/81034775,https://www.imdb.com/title/tt8598690,Strangers Diego and Isabel flee their home in ...,177.0,https://occ-0-2851-38.1.nflxso.net/dnm/api/v6/...,https://m.media-amazon.com/images/M/MV5BZmViZT...,,


In [7]:
# Drop redundant columns
columns_to_drop = ["Netflix Release Date", "Series or Movie", "Director", "Actors", "View Rating", "Rotten Tomatoes Score", "Metacritic Score", "Release Date", "Summary"]

final_df = final_df.drop(columns=columns_to_drop)
final_df.to_csv('final_df.csv', index=False)
num_rows = final_df.shape[0]
print("Number of rows in final_merged_df:", num_rows)

Number of rows in final_merged_df: 6094


### INTERACTIVE DASHBOARD

In [8]:
import dash
from dash import dcc, html, Input, Output
import plotly.express as px
import pandas as pd
from wordcloud import WordCloud
import base64
from io import BytesIO

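# Keep titles released between 1980 and 2020; .copy() avoids a
# SettingWithCopyWarning when the 'Primary Genre' column is added below
final_df = final_df[(final_df['release_year'] >= 1980) & (final_df['release_year'] <= 2020)].copy()

# Use the first entry of the comma-separated 'Genre' field as the primary genre
final_df['Primary Genre'] = final_df['Genre'].apply(lambda x: x.split(',')[0] if pd.notnull(x) else 'Unknown')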

fig_bar = px.bar(
    final_df.groupby(['release_year', 'type']).size().reset_index(name='count'),
    x='release_year',
    y='count',
    color='type',
    title='Distribution of Netflix Content by Type and Year',
    color_discrete_map={'Movie': 'black', 'TV Show': 'red'}
)

# # Modify the trace names to remove the 'type' legend title
# for trace in fig_bar.data:
#     trace.name = trace.name.replace('Movie', 'M')
#     trace.name = trace.name.replace('TV Show', 'TV')

fig_bar.update_layout(
    title={'text': 'Distribution of Netflix Content by Type and Year', 'x': 0.5},
    legend=dict(
        x=0,
        y=0.95,
        xanchor='left',
        yanchor='top'
    )
)
# Dash style dictionaries use camelCased CSS property names
side_by_side_style = {'display': 'flex', 'flexDirection': 'row'}

app = dash.Dash(__name__)

app.layout = html.Div([
    dcc.Graph(id='bar-chart', figure=fig_bar),
    html.H3(id='word-cloud-title', style={'textAlign': 'right', 'color': '#ffffff'}),
    html.Div([
        dcc.Graph(id='scatter-plot', style={'flex': 1}), 
        html.Img(id='word-cloud', style={'flex': 1})     
    ], style=side_by_side_style),
    
])

# Callback to update the scatter plot based on the year clicked in the bar chart
@app.callback(
    Output('scatter-plot', 'figure'),
    [Input('bar-chart', 'clickData')]
)
def update_scatter_plot(clickData):
    # Placeholder figure, returned when no year is selected or the year has no titles
    placeholder_title = 'Click on a year to see the average IMDb scores'
    fig_empty = px.scatter(title=placeholder_title)
    fig_empty.update_layout(title={'text': placeholder_title, 'x': 0.5})

    # Check if the callback was triggered with valid click data
    if not (clickData and 'points' in clickData and 'x' in clickData['points'][0]):
        return fig_empty

    selected_year = clickData['points'][0]['x']
    df_selected_year = final_df[final_df['release_year'] == selected_year]
    if df_selected_year.empty:
        return fig_empty

    # Average IMDb score per primary genre, skipping genres with no scores
    avg_scores = df_selected_year.groupby('Primary Genre')['IMDb Score'].mean().reset_index()
    avg_scores = avg_scores.dropna(subset=['IMDb Score'])
    avg_scores['IMDb Score'] = avg_scores['IMDb Score'].round(2)

    color_scale = ['black', '#8B0000', '#FF0000']
    scatter_title = f'Average IMDb Score by Primary Genre for {selected_year}'
    fig_scatter = px.scatter(
        avg_scores,
        x='Primary Genre',
        y='IMDb Score',
        size='IMDb Score',
        color='IMDb Score',
        title=scatter_title,
        color_continuous_scale=color_scale
    )
    fig_scatter.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
    fig_scatter.update_layout(title={'text': scatter_title, 'x': 0.5})
    return fig_scatter


# Callback to update the word cloud based on selected genre in the scatter plot
@app.callback(
    [Output('word-cloud', 'src'),
     Output('word-cloud-title', 'children')],
    [Input('scatter-plot', 'clickData')]
)
def update_word_cloud(clickData):
    if not clickData:
        return None, 'Select a genre to see the word cloud'

    selected_genre = clickData['points'][0]['x']
    df_genre = final_df[final_df['Primary Genre'] == selected_genre]
    # Drop missing descriptions so that str.join does not fail on NaN values
    text = ' '.join(df_genre['description'].dropna().astype(str))
    if not text.strip():
        # WordCloud.generate raises an error when there are no words to plot
        return None, f'No descriptions available for "{selected_genre}"'

    # Colour each word in a random shade of red
    def red_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
        return "hsl(0, 100%%, %d%%)" % random.randint(10, 50)

    wordcloud = WordCloud(width=800, height=400, background_color='black', color_func=red_color_func).generate(text)
    # Encode the rendered image as a base64 PNG data URI for html.Img
    img = BytesIO()
    wordcloud.to_image().save(img, format='PNG')
    src = 'data:image/png;base64,{}'.format(base64.b64encode(img.getvalue()).decode())
    title = f'Word Cloud for "{selected_genre}" Genre'
    return src, title
    
if __name__ == '__main__':
    # app.run is the current entry point; run_server is deprecated in newer Dash releases
    app.run(debug=True)
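
The aggregation inside `update_scatter_plot` can also be checked without starting the Dash server. The sketch below pulls the same filter, group, and round steps into a standalone function and runs it on made-up rows; `average_score_by_genre` and the `toy` frame are hypothetical names introduced here for illustration, not part of the notebook.

```python
import pandas as pd


def average_score_by_genre(df, year):
    """Mean IMDb score per primary genre for one release year.

    Hypothetical helper mirroring the steps in the scatter-plot callback.
    """
    selected = df[df["release_year"] == year]
    return (
        selected.groupby("Primary Genre")["IMDb Score"]
        .mean()        # average score per genre
        .dropna()      # drop genres whose titles have no score at all
        .round(2)
        .reset_index()
    )


# Made-up rows with the three columns the callback relies on.
toy = pd.DataFrame({
    "release_year": [2019, 2019, 2019, 2020],
    "Primary Genre": ["Drama", "Drama", "Comedy", "Drama"],
    "IMDb Score": [7.0, 8.0, None, 6.5],
})

result = average_score_by_genre(toy, 2019)
print(result)
```

Keeping the data step separate from the figure construction makes it easier to confirm that a clicked year yields the expected genres before worrying about how the chart is styled.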