# TMDb Movie Dataset Analysis Report

This report explores the TMDb dataset, focusing on genres, production companies, and frequent actors.  
The findings are presented with visualizations and insights.

---

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Full path to the dataset (raw string, so backslashes are written once)
csv_file = r"C:\Users\user\PycharmProjects\TMDB_Project\tmdbfile\movies.csv"

# Load dataset
df = pd.read_csv(csv_file)

# Quick preview
df.head()
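
Before counting anything, it can help to see how many values are missing in the columns this report relies on. The following is a minimal sketch on a made-up toy frame; the column names mirror those used in this report, but the values and the `toy` and `missing` names are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real file, using the report's column names
toy = pd.DataFrame({
    "genres": ["Drama|Comedy", None, "Action"],
    "cast": ["A|B", "C", None],
    "production_companies": [None, "Studio X", "Studio X|Studio Y"],
    "revenue_adj": [100.0, None, 50.0],
})

# Count missing entries per column of interest
cols = ["genres", "cast", "production_companies", "revenue_adj"]
missing = toy[cols].isna().sum()
print(missing)
```

Running the same `isna().sum()` call on `df` would show whether missing values are rare enough to ignore or need explicit handling.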

## Research Question 1: What are the most popular genres?

In [None]:
# Split and count genres
genre_counts = df['genres'].str.split('|').explode().value_counts().head(10)

# Plot
plt.figure(figsize=(10,6))
genre_counts.plot(kind='bar')
plt.title('Top 10 Movie Genres')
plt.ylabel('Number of Movies')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

📊 **Findings**:  
Drama and Comedy are the most common genres by movie count. Action and Adventure also rank highly, although overall counts alone do not show whether that strength is concentrated in recent releases.
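
The counting step relies on the `genres` field holding several pipe-delimited labels per movie. As a minimal sketch of how the split, explode, and count chain behaves, here is the same pattern on a small made-up series (the values and the `toy_genres` name are illustrative assumptions):

```python
import pandas as pd

# Hypothetical genre strings in the same pipe-delimited style
toy_genres = pd.Series(["Drama|Comedy", "Drama", "Action|Drama", None])

# Split each string into a list of labels; missing entries stay missing
parts = toy_genres.str.split("|")

# One row per (movie, genre) pair
exploded = parts.explode()

# Count how many movies carry each label; missing values are dropped
counts = exploded.value_counts()
print(counts)
```

A movie with several genres contributes one count to each of them, so the totals add up to more than the number of movies.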

## Research Question 2: Which production companies generate the most revenue?

In [None]:
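# Drop movies with no listed company; treat missing revenue as 0
companies = df.dropna(subset=['production_companies']).copy()
companies['revenue_adj'] = companies['revenue_adj'].fillna(0)

# The field is pipe-delimited, so split it into one row per (movie, company)
companies['company'] = companies['production_companies'].str.split('|')
companies = companies.explode('company')

# Sum revenue by company (a co-produced movie counts in full for each company)
prod_rev = companies.groupby('company')['revenue_adj'].sum().sort_values(ascending=False).head(10)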

# Plot
plt.figure(figsize=(10,6))
prod_rev.plot(kind='bar')
plt.title('Top Production Companies by Adjusted Revenue')
plt.ylabel('Total Adjusted Revenue')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

📊 **Findings**:  
Universal Pictures, Warner Bros., and Paramount dominate adjusted revenue, showing the influence of big studios.
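
Per-studio totals depend on how co-productions are attributed. The sketch below uses made-up data (the studio names, revenue figures, and the `toy`, `by_string`, and `per_company` names are illustrative assumptions). It contrasts grouping on the raw pipe-delimited string with grouping after splitting it into one row per company.

```python
import pandas as pd

# Hypothetical movies, some co-produced by two studios
toy = pd.DataFrame({
    "production_companies": ["Studio A|Studio B", "Studio A", "Studio B|Studio C", None],
    "revenue_adj": [100.0, 50.0, 30.0, 20.0],
})

# Grouping on the raw string treats each distinct combination as its own "company"
by_string = toy.groupby("production_companies")["revenue_adj"].sum()

# Splitting first yields one row per (movie, company) pair
per_company = (
    toy.dropna(subset=["production_companies"])
       .assign(company=lambda d: d["production_companies"].str.split("|"))
       .explode("company")
       .groupby("company")["revenue_adj"]
       .sum()
       .sort_values(ascending=False)
)
print(by_string)
print(per_company)
```

In the split version each co-producer is credited with a movie's full revenue, so the per-company totals sum to more than the dataset's total revenue. Dividing revenue equally among co-producers would be an alternative attribution.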

## Research Question 3: Which actors appear most frequently in movies?

In [None]:
# Split and count actors
actors = df['cast'].str.split('|').explode().value_counts().head(10)

# Plot
plt.figure(figsize=(10,6))
actors.plot(kind='bar')
plt.title('Top 10 Frequent Actors')
plt.ylabel('Number of Appearances')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

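📊 **Findings**:  
A small group of actors accounts for the most appearances in the dataset. Recurring franchise roles are a plausible driver, although appearance counts do not measure this directly.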

# 🎬 Conclusions

1. Drama and Comedy are the most common genres in the dataset; confirming that this holds across decades requires a breakdown by release year.  
2. Major studios such as Universal, Warner Bros., and Paramount lead in total adjusted revenue.  
3. A few actors appear far more often than others, plausibly because of long-running franchises.  
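
Whether genre rankings hold across decades can be checked by grouping on the release decade. The sketch below assumes a `release_year` column, which is not used anywhere else in this report and may be named differently in the actual file. The data, the `toy` frame, and the `by_decade` table are made up for illustration.

```python
import pandas as pd

# Hypothetical movies; `release_year` is an assumed column name
toy = pd.DataFrame({
    "release_year": [1985, 1987, 1994, 1999, 2005, 2011],
    "genres": ["Drama", "Comedy|Drama", "Drama|Action",
               "Comedy", "Action|Adventure", "Drama"],
})

# Bucket each year into its decade (1985 -> 1980)
toy["decade"] = (toy["release_year"] // 10) * 10

# One row per (movie, genre) pair, keeping the decade alongside
exploded = toy.assign(genre=toy["genres"].str.split("|")).explode("genre")

# Rows are decades, columns are genres, cells are movie counts
by_decade = pd.crosstab(exploded["decade"], exploded["genre"])
print(by_decade)
```

Plotting this table with `by_decade.plot(kind='bar')` would give one group of bars per decade.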

---