# IMDb Top 20 Movies Dashboard Analysis

## Introduction
This dashboard analyzes IMDb's top 20 movies by rating, release year, and vote count. The data was ethically scraped from IMDb's public website for educational purposes and is hard-coded below as a sample.

## Data Source
- Website: IMDb (www.imdb.com)
- Data Type: Public movie ratings and information
- Scope: Top 20 movies from IMDb's Top 250 list

In [29]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'iframe'

# Create sample data for top 20 movies
data = {
    'Title': [
        'The Shawshank Redemption', 'The Godfather', 'The Dark Knight', 
        'The Godfather Part II', 'Pulp Fiction', "Schindler's List",
        '12 Angry Men', 'The Lord of the Rings: The Return of the King',
        'Inception', 'Fight Club', 'The Matrix', 'Goodfellas',
        "One Flew Over the Cuckoo's Nest", 'Se7en', 'The Silence of the Lambs',
        'City of God', 'Saving Private Ryan', 'Interstellar',
        'The Green Mile', 'Spirited Away'
    ],
    'Year': [
        1994, 1972, 2008, 1974, 1994, 1993, 1957, 2003, 2010, 1999,
        1999, 1990, 1975, 1995, 1991, 2002, 1998, 2014, 1999, 2001
    ],
    'Rating': [
        9.3, 9.2, 9.0, 9.0, 8.9, 8.9, 8.9, 8.9, 8.8, 8.8,
        8.7, 8.7, 8.7, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6
    ],
    'Votes': [
        2500000, 1800000, 2400000, 1200000, 1900000, 1300000, 720000, 1700000,
        2200000, 1900000, 1800000, 1100000, 950000, 1500000, 1300000,
        720000, 1300000, 1600000, 1200000, 700000
    ]
}

df = pd.DataFrame(data)

## Visualization 1: Rating vs Votes Analysis
The following scatter plot shows the relationship between IMDb ratings and the number of votes each movie received. It helps us see whether a movie's popularity (votes) correlates with its perceived quality (rating).

In [31]:
fig1 = px.scatter(df, x='Rating', y='Votes', 
                 text='Title',
                 title='IMDb Rating vs Number of Votes for Top 20 Movies',
                 labels={'Rating': 'IMDb Rating', 'Votes': 'Number of Votes'})
fig1.update_traces(textposition='top center')
fig1.show()
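
The scatter plot gives a visual impression only. As a minimal sketch, the strength of the relationship can also be expressed as a correlation coefficient. The `Rating` and `Votes` values are copied from the sample data above so the block runs on its own; the variable names `pearson_r` and `spearman_r` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rating and Votes columns copied from the sample data above
ratings = [9.3, 9.2, 9.0, 9.0, 8.9, 8.9, 8.9, 8.9, 8.8, 8.8,
           8.7, 8.7, 8.7, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6]
votes = [2500000, 1800000, 2400000, 1200000, 1900000, 1300000, 720000,
         1700000, 2200000, 1900000, 1800000, 1100000, 950000, 1500000,
         1300000, 720000, 1300000, 1600000, 1200000, 700000]
df = pd.DataFrame({'Rating': ratings, 'Votes': votes})

# Pearson correlation measures the linear relationship
pearson_r = df['Rating'].corr(df['Votes'])

# Rank-based (Spearman-style) correlation: Pearson on the ranks,
# which is less sensitive to a few very large vote counts
spearman_r = df['Rating'].rank().corr(df['Votes'].rank())

print(f"Pearson r: {pearson_r:.2f}")
print(f"Rank correlation: {spearman_r:.2f}")
```

With only 20 movies and heavily tied ratings, either coefficient should be read as descriptive of this sample rather than as evidence of a general relationship.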

## Visualization 2: Decade Distribution
This bar chart shows how the top 20 movies are distributed across decades, helping us identify which periods produced the most highly rated films.

In [37]:
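# Map each release year to the start of its decade (e.g. 1994 -> 1990)
df['Decade'] = (df['Year'] // 10) * 10
# Count movies per decade, ordered chronologically
decade_counts = df['Decade'].value_counts().sort_index().reset_index()
decade_counts.columns = ['Decade', 'Count']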

fig2 = px.bar(decade_counts, 
              x='Decade', 
              y='Count',
              title='Distribution of Top 20 Movies by Decade',
              labels={'Count': 'Number of Movies'})
fig2.show()
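
Because `value_counts()` only returns decades that actually occur, decades with no movies are silently absent from the table. One possible way to make those gaps explicit, sketched here with the `Year` values copied from the sample data above, is to reindex the counts over every decade in the span. The name `all_decades` is illustrative.

```python
import pandas as pd

# Year column copied from the sample data above
years = [1994, 1972, 2008, 1974, 1994, 1993, 1957, 2003, 2010, 1999,
         1999, 1990, 1975, 1995, 1991, 2002, 1998, 2014, 1999, 2001]
decades = (pd.Series(years, name='Year') // 10) * 10

# Every decade between the earliest and latest one in the sample
all_decades = list(range(int(decades.min()), int(decades.max()) + 10, 10))

# Count per decade, inserting 0 for decades with no movies
decade_counts = (
    decades.value_counts()
    .reindex(all_decades, fill_value=0)
    .rename_axis('Decade')
    .reset_index(name='Count')
)
print(decade_counts)
```

The resulting frame has the same `Decade` and `Count` columns as the one passed to `px.bar` above, so it could be plotted the same way.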

## Visualization 3: Rating Trends Over Time
The following line plot tracks the average rating per decade to reveal any trend over time. The sample contains no movies from the 1960s or 1980s, so the line connects the neighboring decades directly across those gaps.

In [39]:
decade_ratings = df.groupby('Decade')['Rating'].mean().reset_index()
decade_ratings.columns = ['Decade', 'Average_Rating']

fig3 = px.line(decade_ratings, 
               x='Decade', 
               y='Average_Rating',
               title='Average Rating by Decade',
               labels={'Average_Rating': 'Average Rating'})
fig3.show()
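
An average over one or two movies carries much less weight than an average over ten. As a small sketch, the per-decade mean can be reported together with the number of movies behind it. The `Year` and `Rating` values are copied from the sample data above; the `decade_summary` name and its `Count` column are illustrative additions.

```python
import pandas as pd

# Year and Rating columns copied from the sample data above
years = [1994, 1972, 2008, 1974, 1994, 1993, 1957, 2003, 2010, 1999,
         1999, 1990, 1975, 1995, 1991, 2002, 1998, 2014, 1999, 2001]
ratings = [9.3, 9.2, 9.0, 9.0, 8.9, 8.9, 8.9, 8.9, 8.8, 8.8,
           8.7, 8.7, 8.7, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6]
df = pd.DataFrame({'Year': years, 'Rating': ratings})
df['Decade'] = (df['Year'] // 10) * 10

# Mean rating and sample size for each decade
decade_summary = (
    df.groupby('Decade')['Rating']
    .agg(Average_Rating='mean', Count='count')
    .reset_index()
)
print(decade_summary.round(2))
```

Showing `Count` next to `Average_Rating` makes it easier to judge how much each point on the line plot can be trusted.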

## Analysis Summary

### Key Findings
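1. Rating Distribution:
   - Ratings span a narrow range, from 8.6 to 9.3
   - Ratings cluster at the low end: seven of the 20 movies sit at 8.6 and four at 8.9

2. Temporal Patterns:
   - The 1990s contribute 10 of the 20 movies, followed by the 2000s with four; the 1960s and 1980s have none
   - Average ratings by decade range from 8.7 to about 8.97, and most decades hold four movies or fewer, so the sample does not support a trend in quality over time

3. Popularity vs. Rating:
   - The highest-rated movie, The Shawshank Redemption (9.3), also has the most votes (2.5 million)
   - Notable contrasts: 12 Angry Men pairs an 8.9 rating with only 720,000 votes, while Inception has 2.2 million votes at 8.8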
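
The rating clustering and vote extremes behind the findings above can be checked directly from the data. The following is a minimal sketch using the `Title`, `Rating`, and `Votes` values copied from the sample data; the names `rating_counts`, `most_voted`, and `least_voted` are illustrative.

```python
import pandas as pd

# Title, Rating, and Votes columns copied from the sample data above
titles = ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight',
          'The Godfather Part II', 'Pulp Fiction', "Schindler's List",
          '12 Angry Men', 'The Lord of the Rings: The Return of the King',
          'Inception', 'Fight Club', 'The Matrix', 'Goodfellas',
          "One Flew Over the Cuckoo's Nest", 'Se7en',
          'The Silence of the Lambs', 'City of God', 'Saving Private Ryan',
          'Interstellar', 'The Green Mile', 'Spirited Away']
ratings = [9.3, 9.2, 9.0, 9.0, 8.9, 8.9, 8.9, 8.9, 8.8, 8.8,
           8.7, 8.7, 8.7, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6]
votes = [2500000, 1800000, 2400000, 1200000, 1900000, 1300000, 720000,
         1700000, 2200000, 1900000, 1800000, 1100000, 950000, 1500000,
         1300000, 720000, 1300000, 1600000, 1200000, 700000]
df = pd.DataFrame({'Title': titles, 'Rating': ratings, 'Votes': votes})

# Number of movies sharing each rating value, highest rating first
rating_counts = df['Rating'].value_counts().sort_index(ascending=False)
print(rating_counts)

# Movies at the two extremes of the vote counts
most_voted = df.loc[df['Votes'].idxmax(), 'Title']
least_voted = df.loc[df['Votes'].idxmin(), 'Title']
print(f"Most votes: {most_voted}")
print(f"Fewest votes: {least_voted}")
```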

### Limitations
- Data limited to top 20 movies only
- Ratings subject to IMDb user base bias
- Historical vote counts may be influenced by movie age
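
One rough way to account for movie age is to divide each vote count by the number of years since release. The sketch below does this for the sample data. `REFERENCE_YEAR` is an assumption introduced here, not a value from the original notebook, and should be set to the year the vote counts were actually collected. The `Votes_Per_Year` column is likewise illustrative.

```python
import pandas as pd

# Assumption: year the vote counts were collected (not stated in the original)
REFERENCE_YEAR = 2024

# Title, Year, and Votes columns copied from the sample data above
titles = ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight',
          'The Godfather Part II', 'Pulp Fiction', "Schindler's List",
          '12 Angry Men', 'The Lord of the Rings: The Return of the King',
          'Inception', 'Fight Club', 'The Matrix', 'Goodfellas',
          "One Flew Over the Cuckoo's Nest", 'Se7en',
          'The Silence of the Lambs', 'City of God', 'Saving Private Ryan',
          'Interstellar', 'The Green Mile', 'Spirited Away']
years = [1994, 1972, 2008, 1974, 1994, 1993, 1957, 2003, 2010, 1999,
         1999, 1990, 1975, 1995, 1991, 2002, 1998, 2014, 1999, 2001]
votes = [2500000, 1800000, 2400000, 1200000, 1900000, 1300000, 720000,
         1700000, 2200000, 1900000, 1800000, 1100000, 950000, 1500000,
         1300000, 720000, 1300000, 1600000, 1200000, 700000]
df = pd.DataFrame({'Title': titles, 'Year': years, 'Votes': votes})

# Average votes accumulated per year since release
df['Votes_Per_Year'] = df['Votes'] / (REFERENCE_YEAR - df['Year'])

ranked = df.sort_values('Votes_Per_Year', ascending=False)
print(ranked[['Title', 'Year', 'Votes_Per_Year']].round(0).head())
```

This is a crude normalization: it assumes votes accumulate evenly over time, which is unlikely to hold for films released long before IMDb existed.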

### Future Improvements
- Include additional movie metadata (genre, director, etc.) and add genre-based analysis
- Expand analysis to top 100 movies
- Include box office performance data