# Film Industry and Macroeconomic Conditions Analysis

## DSA210 - Term Project

This project aims to relate the financial performance of the film industry (ROI, box office gross) to global economic conditions (GDP growth, unemployment, inflation).

### Hypotheses

- **H1:** Low-budget films achieve higher ROI in years of economic recession
- **H2:** Escapist genres (comedy, fantasy, animation) perform better at the box office during periods of high unemployment
- **H3:** The ROI distribution of the film industry during the 2008 crisis differs significantly from that of normal years
- **H4:** A country's GDP is positively correlated with the average budget of films produced in that country

## 1. Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

import warnings
warnings.simplefilter('ignore')

%matplotlib inline

## 2. Loading the Data

### 2.1 Movie Data (IMDB 5000)

In [None]:
# Load movie data
df_movies = pd.read_csv('data/movie_metadata.csv')

print(f"Rows: {df_movies.shape[0]}, Columns: {df_movies.shape[1]}")
df_movies.head()

In [None]:
df_movies.info()

In [None]:
df_movies.describe()

### 2.2 World Bank Economic Data

In [None]:
# Load World Bank data
df_wb_raw = pd.read_csv('data/world_bank_data.csv')

print(f"Rows: {df_wb_raw.shape[0]}, Columns: {df_wb_raw.shape[1]}")
df_wb_raw.head()

## 3. Data Cleaning and Preparation

### 3.1 Cleaning Movie Data

In [None]:
# Check missing values
missing = df_movies.isnull().sum()
missing_pct = (missing / len(df_movies) * 100).round(2)
missing_df = pd.DataFrame({'Missing Count': missing, 'Percentage (%)': missing_pct})
missing_df[missing_df['Missing Count'] > 0].sort_values('Percentage (%)', ascending=False)

In [None]:
# Select relevant columns
movie_cols = ['movie_title', 'title_year', 'country', 'budget', 'gross', 
              'genres', 'director_name', 'actor_1_name', 'imdb_score']

df_movies_clean = df_movies[movie_cols].copy()

# Drop rows with missing budget, gross, year, or country
df_movies_clean = df_movies_clean.dropna(subset=['budget', 'gross', 'title_year', 'country'])

# Remove rows where budget or gross is 0
df_movies_clean = df_movies_clean[(df_movies_clean['budget'] > 0) & (df_movies_clean['gross'] > 0)]

# Convert year to integer
df_movies_clean['title_year'] = df_movies_clean['title_year'].astype(int)

print(f"Cleaned movie data: {df_movies_clean.shape[0]} rows")

In [None]:
# Calculate ROI and profit
df_movies_clean['roi'] = (df_movies_clean['gross'] - df_movies_clean['budget']) / df_movies_clean['budget']
df_movies_clean['profit'] = df_movies_clean['gross'] - df_movies_clean['budget']
df_movies_clean['is_profitable'] = df_movies_clean['profit'] > 0

df_movies_clean.head()
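
As a quick sanity check of the ROI definition used above, the following sketch applies the same formula to three made-up films. The figures are hypothetical and are not taken from the dataset; the `toy` frame exists only for illustration.

```python
import pandas as pd

# Hypothetical budgets and grosses (not from the IMDB data)
toy = pd.DataFrame({
    'budget': [10_000_000, 50_000_000, 2_000_000],
    'gross':  [30_000_000, 25_000_000, 2_000_000],
})

# ROI = (gross - budget) / budget, so 0 is break-even and -1 is a total loss
toy['roi'] = (toy['gross'] - toy['budget']) / toy['budget']
toy['is_profitable'] = toy['roi'] > 0

print(toy)
```

A film that triples its budget has an ROI of 2.0, one that recovers half of it has an ROI of -0.5, and a film that exactly recovers its budget sits at 0 and is not counted as profitable.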

In [None]:
# Create budget categories
def budget_category(budget):
    if budget < 10_000_000:
        return 'Low'
    elif budget < 50_000_000:
        return 'Medium'
    else:
        return 'High'

df_movies_clean['budget_category'] = df_movies_clean['budget'].apply(budget_category)

print("Budget Category Distribution:")
print(df_movies_clean['budget_category'].value_counts())
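
The same thresholds can also be expressed without a Python-level function. The sketch below is one possible vectorized alternative using `pd.cut`; the sample budgets are hypothetical and chosen to sit on and around the boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical budgets placed on and around the 10M and 50M thresholds
budgets = pd.Series([5_000_000, 10_000_000, 49_999_999, 50_000_000])

# right=False makes each bin closed on the left, matching the `<` comparisons
# in budget_category: [0, 10M) -> Low, [10M, 50M) -> Medium, [50M, inf) -> High
categories = pd.cut(
    budgets,
    bins=[0, 10_000_000, 50_000_000, np.inf],
    labels=['Low', 'Medium', 'High'],
    right=False,
)

print(categories.tolist())
```

`pd.cut` returns an ordered categorical, which also keeps Low, Medium, High in a sensible order in later group-bys and plots.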

In [None]:
# Extract main genre
df_movies_clean['main_genre'] = df_movies_clean['genres'].apply(lambda x: x.split('|')[0] if pd.notna(x) else 'Unknown')

# Define escapist genres
escapist_genres = ['Comedy', 'Animation', 'Fantasy', 'Family', 'Adventure', 'Musical']
df_movies_clean['is_escapist'] = df_movies_clean['main_genre'].isin(escapist_genres)

print("Genre Distribution (Top 10):")
print(df_movies_clean['main_genre'].value_counts().head(10))

In [None]:
# Country distribution
print("Country Distribution (Top 10):")
print(df_movies_clean['country'].value_counts().head(10))

### 3.2 Cleaning World Bank Data

In [None]:
# Transform World Bank data from wide to long format
year_cols = [col for col in df_wb_raw.columns if 'YR' in col]
id_cols = ['Country Name', 'Country Code', 'Series Name', 'Series Code']

df_wb_long = df_wb_raw.melt(
    id_vars=id_cols,
    value_vars=year_cols,
    var_name='Year_Raw',
    value_name='Value'
)

# Extract year from "1990 [YR1990]" format (expand=False returns a Series rather than a one-column DataFrame)
df_wb_long['Year'] = df_wb_long['Year_Raw'].str.extract(r'(\d{4})', expand=False).astype(int)

# Convert '..' to NaN
df_wb_long['Value'] = pd.to_numeric(df_wb_long['Value'], errors='coerce')

df_wb_long.head()
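
To make the wide-to-long step easier to follow, here is a minimal sketch on a two-row frame that imitates the World Bank export layout. The column names follow the ones used above; the numeric values are invented for illustration only.

```python
import pandas as pd

# Tiny stand-in for the World Bank export; values are made up
wide = pd.DataFrame({
    'Country Name': ['United States', 'United States'],
    'Series Name': ['GDP growth (annual %)',
                    'Inflation, consumer prices (annual %)'],
    '2008 [YR2008]': ['0.1', '3.8'],
    '2009 [YR2009]': ['-2.6', '..'],   # '..' marks a missing value
})

year_cols = [c for c in wide.columns if 'YR' in c]

# One row per (country, series, year)
long = wide.melt(
    id_vars=['Country Name', 'Series Name'],
    value_vars=year_cols,
    var_name='Year_Raw',
    value_name='Value',
)

# "2008 [YR2008]" -> 2008, and '..' -> NaN
long['Year'] = long['Year_Raw'].str.extract(r'(\d{4})', expand=False).astype(int)
long['Value'] = pd.to_numeric(long['Value'], errors='coerce')

print(long)
```

Two series across two year columns become four rows, and the `'..'` placeholder ends up as a single `NaN`.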

In [None]:
# Map series names to shorter names
series_mapping = {
    'GDP growth (annual %)': 'gdp_growth',
    'GDP per capita (current US$)': 'gdp_per_capita',
    'Unemployment, total (% of total labor force) (modeled ILO estimate)': 'unemployment',
    'Inflation, consumer prices (annual %)': 'inflation'
}

df_wb_long['Indicator'] = df_wb_long['Series Name'].map(series_mapping)

# Pivot to get each indicator as a column
df_economic = df_wb_long.pivot_table(
    index=['Country Name', 'Country Code', 'Year'],
    columns='Indicator',
    values='Value'
).reset_index()

df_economic.columns.name = None

print(f"Economic data: {df_economic.shape[0]} rows")
df_economic.head()

In [None]:
# Define economic periods
def economic_period(year):
    if year in [2008, 2009]:
        return 'Crisis_2008'
    elif year == 2020:
        return 'Crisis_COVID'
    else:
        return 'Normal'

df_economic['economic_period'] = df_economic['Year'].apply(economic_period)

# Define GDP status
def gdp_status(gdp_growth):
    if pd.isna(gdp_growth):
        return 'Unknown'
    elif gdp_growth < 0:
        return 'Recession'
    elif gdp_growth < 2:
        return 'Slow_Growth'
    else:
        return 'Growth'

df_economic['gdp_status'] = df_economic['gdp_growth'].apply(gdp_status)

print("Economic Period Distribution:")
print(df_economic['economic_period'].value_counts())

### 3.3 Merging Datasets

In [None]:
# Country name mapping
country_mapping = {
    'USA': 'United States',
    'UK': 'United Kingdom',
    'South Korea': 'Korea, Rep.',
    'Hong Kong': 'Hong Kong SAR, China',
    'Russia': 'Russian Federation',
    'Iran': 'Iran, Islamic Rep.',
    'Czech Republic': 'Czechia',
    'West Germany': 'Germany'
}

df_movies_clean['country_mapped'] = df_movies_clean['country'].replace(country_mapping)

# Merge datasets
df_merged = pd.merge(
    df_movies_clean,
    df_economic,
    left_on=['country_mapped', 'title_year'],
    right_on=['Country Name', 'Year'],
    how='left'
)

print(f"Merged data: {df_merged.shape[0]} rows")
print(f"Movies with economic data: {df_merged['gdp_growth'].notna().sum()}")
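
Because this is a left merge, films whose country name has no counterpart in the World Bank data silently receive `NaN` indicators. One way to find them, sketched here on hypothetical toy frames, is the `indicator` option of `merge`; the country "Atlantis" is a deliberately unmatched placeholder.

```python
import pandas as pd

# Toy stand-ins for df_movies_clean and df_economic (hypothetical rows)
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C'],
    'country_mapped': ['United States', 'United Kingdom', 'Atlantis'],
    'title_year': [2008, 2009, 2008],
})
economic = pd.DataFrame({
    'Country Name': ['United States', 'United Kingdom'],
    'Year': [2008, 2009],
    'gdp_growth': [0.1, -4.5],
})

check = movies.merge(
    economic,
    left_on=['country_mapped', 'title_year'],
    right_on=['Country Name', 'Year'],
    how='left',
    indicator=True,            # adds a _merge column: 'both' or 'left_only'
    validate='many_to_one',    # raises if a country-year appears twice on the right
)

# Countries that found no economic record, most frequent first
unmatched = check.loc[check['_merge'] == 'left_only', 'country_mapped'].value_counts()
print(unmatched)
```

Running the same check on the real frames would list the names that still need an entry in `country_mapping`, and `validate` guards against duplicated country-year rows inflating the film count.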

In [None]:
df_merged.head()

## 4. Exploratory Data Analysis (EDA)

### 4.1 Movie Data Overview

In [None]:
print("=" * 50)
print("MOVIE DATA SUMMARY STATISTICS")
print("=" * 50)
print(f"Total movies: {len(df_movies_clean)}")
print(f"Year range: {df_movies_clean['title_year'].min()} - {df_movies_clean['title_year'].max()}")
print(f"Average budget: ${df_movies_clean['budget'].mean():,.0f}")
print(f"Average gross: ${df_movies_clean['gross'].mean():,.0f}")
print(f"Average ROI: {df_movies_clean['roi'].mean():.2f}")
print(f"Profitable movie ratio: {df_movies_clean['is_profitable'].mean()*100:.1f}%")

In [None]:
# ROI Distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROI Histogram
roi_filtered = df_movies_clean[df_movies_clean['roi'].between(-1, 10)]['roi']
axes[0].hist(roi_filtered, bins=50, edgecolor='black', alpha=0.7)
axes[0].axvline(x=0, color='red', linestyle='--', label='Break-even')
axes[0].set_xlabel('ROI')
axes[0].set_ylabel('Number of Movies')
axes[0].set_title('ROI Distribution')
axes[0].legend()

# Budget vs Gross
axes[1].scatter(df_movies_clean['budget']/1e6, df_movies_clean['gross']/1e6, alpha=0.5)
axes[1].plot([0, 300], [0, 300], 'r--', label='Break-even')
axes[1].set_xlabel('Budget (Million $)')
axes[1].set_ylabel('Gross (Million $)')
axes[1].set_title('Budget vs Gross Revenue')
axes[1].legend()

plt.tight_layout()
plt.show()

In [None]:
# Yearly trends
yearly_stats = df_movies_clean.groupby('title_year').agg({
    'movie_title': 'count',
    'roi': 'mean',
    'budget': 'mean',
    'gross': 'mean'
}).rename(columns={'movie_title': 'film_count'})

fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Number of movies per year
axes[0].bar(yearly_stats.index, yearly_stats['film_count'], color='steelblue', alpha=0.7)
axes[0].axvspan(2008, 2009, alpha=0.3, color='red', label='2008 Crisis')
axes[0].set_ylabel('Number of Movies')
axes[0].set_title('Number of Movies per Year')
axes[0].legend()

# Average ROI per year
axes[1].plot(yearly_stats.index, yearly_stats['roi'], marker='o', color='green')
axes[1].axhline(y=0, color='red', linestyle='--')
axes[1].axvspan(2008, 2009, alpha=0.3, color='red', label='2008 Crisis')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Average ROI')
axes[1].set_title('Average ROI per Year')
axes[1].legend()

plt.tight_layout()
plt.show()

In [None]:
# ROI by genre
genre_roi = df_movies_clean.groupby('main_genre').agg({
    'roi': ['mean', 'median', 'count']
}).round(2)
genre_roi.columns = ['mean_roi', 'median_roi', 'count']
genre_roi = genre_roi[genre_roi['count'] >= 20].sort_values('median_roi', ascending=False)

plt.figure(figsize=(12, 6))
colors = ['green' if g in escapist_genres else 'steelblue' for g in genre_roi.index]
plt.barh(genre_roi.index, genre_roi['median_roi'], color=colors, alpha=0.7)
plt.axvline(x=0, color='red', linestyle='--')
plt.xlabel('Median ROI')
plt.title('Median ROI by Genre (Green = Escapist Genres)')
plt.tight_layout()
plt.show()

print(genre_roi)

### 4.2 Economic Data Analysis

In [None]:
# USA economic data and movie performance
df_usa = df_merged[df_merged['country'] == 'USA'].copy()

usa_yearly = df_usa.groupby('title_year').agg({
    'roi': 'mean',
    'budget': 'mean',
    'gross': 'mean',
    'gdp_growth': 'first',
    'unemployment': 'first',
    'inflation': 'first'
}).dropna()

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# GDP Growth vs ROI
axes[0, 0].scatter(usa_yearly['gdp_growth'], usa_yearly['roi'])
axes[0, 0].set_xlabel('GDP Growth (%)')
axes[0, 0].set_ylabel('Average ROI')
axes[0, 0].set_title('GDP Growth vs Movie ROI (USA)')

# Unemployment vs ROI
axes[0, 1].scatter(usa_yearly['unemployment'], usa_yearly['roi'])
axes[0, 1].set_xlabel('Unemployment Rate (%)')
axes[0, 1].set_ylabel('Average ROI')
axes[0, 1].set_title('Unemployment vs Movie ROI (USA)')

# GDP Growth over time
axes[1, 0].plot(usa_yearly.index, usa_yearly['gdp_growth'], 'b-')
axes[1, 0].axhline(y=0, color='red', linestyle='--')
axes[1, 0].axvspan(2008, 2009, alpha=0.3, color='red')
axes[1, 0].set_xlabel('Year')
axes[1, 0].set_ylabel('GDP Growth (%)')
axes[1, 0].set_title('USA GDP Growth')

# Unemployment over time
axes[1, 1].plot(usa_yearly.index, usa_yearly['unemployment'], 'orange')
axes[1, 1].axvspan(2008, 2009, alpha=0.3, color='red')
axes[1, 1].set_xlabel('Year')
axes[1, 1].set_ylabel('Unemployment Rate (%)')
axes[1, 1].set_title('USA Unemployment Rate')

plt.tight_layout()
plt.show()

In [None]:
# Correlation matrix
correlation_cols = ['budget', 'gross', 'roi', 'imdb_score', 'gdp_growth', 'unemployment', 'inflation', 'gdp_per_capita']
corr_data = df_merged[correlation_cols].dropna()

corr_matrix = corr_data.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='RdYlBu_r', center=0, fmt='.2f')
plt.title('Correlation Matrix')
plt.tight_layout()
plt.show()

## 5. Hypothesis Testing

### 5.1 H1: Low-budget films have higher ROI during economic recession

**Null Hypothesis (H0):** There is no difference in ROI for low-budget films between recession and normal periods.

**Alternative Hypothesis (H1):** Low-budget films have higher ROI during recession periods.

In [None]:
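# Define recession years: negative GDP growth or the 2008-2009 crisis.
# Films with no GDP data outside 2008-2009 are left as NaN so that they are
# excluded from both groups instead of being silently counted as "normal".
crisis_year = df_merged['title_year'].isin([2008, 2009])
has_gdp = df_merged['gdp_growth'].notna()
df_merged['is_recession'] = ((df_merged['gdp_growth'] < 0) | crisis_year).where(has_gdp | crisis_year)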

# Get ROI for low-budget films
low_budget = df_merged[df_merged['budget_category'] == 'Low']

roi_recession = low_budget[low_budget['is_recession'] == True]['roi'].dropna()
roi_normal = low_budget[low_budget['is_recession'] == False]['roi'].dropna()

print("H1: Low-budget films have higher ROI during economic recession")
print("=" * 60)
print(f"Recession period - Low budget film count: {len(roi_recession)}")
print(f"Normal period - Low budget film count: {len(roi_normal)}")
print(f"\nRecession period mean ROI: {roi_recession.mean():.2f}")
print(f"Normal period mean ROI: {roi_normal.mean():.2f}")

In [None]:
# Mann-Whitney U test (non-parametric)
stat, p_value = stats.mannwhitneyu(roi_recession, roi_normal, alternative='greater')

print(f"Mann-Whitney U Test:")
print(f"Test statistic: {stat:.2f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("\nWe reject the null hypothesis. Low-budget films have significantly higher ROI during recession.")
else:
    print("\nWe fail to reject the null hypothesis. No significant difference found.")
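
The p-value alone does not say how large the difference is. A minimal sketch of an accompanying effect size is the probability of superiority, i.e. the chance that a randomly chosen value from the first group exceeds one from the second, with ties counted as half. The helper name `prob_superiority` and the toy samples are hypothetical.

```python
import numpy as np

def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) over all pairs (x_i, y_j)."""
    x = np.asarray(x, dtype=float)[:, None]   # column vector
    y = np.asarray(y, dtype=float)[None, :]   # row vector
    # Broadcasting builds the full len(x) by len(y) comparison grid
    return float((x > y).mean() + 0.5 * (x == y).mean())

# Hypothetical samples, not ROI values from the dataset
toy_x = [3, 4, 5]
toy_y = [1, 2, 3]

ps = prob_superiority(toy_x, toy_y)
rank_biserial = 2 * ps - 1   # rescaled to the range [-1, 1]

print(f"Probability of superiority: {ps:.3f}")
print(f"Rank-biserial correlation: {rank_biserial:.3f}")
```

A value of 0.5 means neither group tends to be larger. In the notebook, `roi_recession` and `roi_normal` could be passed in place of the toy lists; the comparison grid holds one entry per pair, which is manageable for samples of a few thousand films.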

In [None]:
# Visualization for H1
fig, ax = plt.subplots(figsize=(10, 6))

budget_cats = ['Low', 'Medium', 'High']
recession_means = []
normal_means = []

for cat in budget_cats:
    cat_data = df_merged[df_merged['budget_category'] == cat]
    recession_means.append(cat_data[cat_data['is_recession'] == True]['roi'].mean())
    normal_means.append(cat_data[cat_data['is_recession'] == False]['roi'].mean())

x = np.arange(len(budget_cats))
width = 0.35

bars1 = ax.bar(x - width/2, recession_means, width, label='Recession Period', color='salmon')
bars2 = ax.bar(x + width/2, normal_means, width, label='Normal Period', color='steelblue')

ax.set_ylabel('Average ROI')
ax.set_title('ROI by Budget Category: Recession vs Normal Period')
ax.set_xticks(x)
ax.set_xticklabels(budget_cats)
ax.legend()
ax.axhline(y=0, color='black', linestyle='--', alpha=0.3)

plt.tight_layout()
plt.show()

### 5.2 H2: Escapist genres are more successful during high unemployment periods

**Null Hypothesis (H0):** There is no difference in ROI between escapist and non-escapist genres during high unemployment.

**Alternative Hypothesis (H1):** Escapist genres have higher ROI during high unemployment periods.

In [None]:
# Define high unemployment (above median)
unemployment_median = df_merged['unemployment'].median()
df_merged['high_unemployment'] = df_merged['unemployment'] > unemployment_median

# Get ROI for escapist vs non-escapist during high unemployment
high_unemp = df_merged[df_merged['high_unemployment'] == True]

escapist_roi = high_unemp[high_unemp['is_escapist'] == True]['roi'].dropna()
non_escapist_roi = high_unemp[high_unemp['is_escapist'] == False]['roi'].dropna()

print("H2: Escapist genres are more successful during high unemployment")
print("=" * 60)
print(f"During high unemployment:")
print(f"  Escapist film count: {len(escapist_roi)}")
print(f"  Non-escapist film count: {len(non_escapist_roi)}")
print(f"\n  Escapist mean ROI: {escapist_roi.mean():.2f}")
print(f"  Non-escapist mean ROI: {non_escapist_roi.mean():.2f}")

In [None]:
# Mann-Whitney U test
stat, p_value = stats.mannwhitneyu(escapist_roi, non_escapist_roi, alternative='greater')

print(f"Mann-Whitney U Test:")
print(f"Test statistic: {stat:.2f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("\nWe reject the null hypothesis. Escapist genres are significantly more successful.")
else:
    print("\nWe fail to reject the null hypothesis. No significant difference found.")

In [None]:
# Visualization for H2
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Boxplot
plot_data = df_merged[df_merged['high_unemployment'] == True][['is_escapist', 'roi']].dropna()
plot_data = plot_data[plot_data['roi'].between(-1, 10)]

sns.boxplot(data=plot_data, x='is_escapist', y='roi', ax=axes[0])
axes[0].set_xticklabels(['Non-Escapist', 'Escapist'])
axes[0].set_ylabel('ROI')
axes[0].set_title('ROI During High Unemployment: Escapist vs Non-Escapist')

# Escapist ratio by unemployment (USA only, so each year maps to a single unemployment rate)
yearly_escapist = df_merged[df_merged['country'] == 'USA'].groupby('title_year').agg({
    'is_escapist': 'mean',
    'unemployment': 'first'
}).dropna()

axes[1].scatter(yearly_escapist['unemployment'], yearly_escapist['is_escapist'])
axes[1].set_xlabel('Unemployment Rate (%)')
axes[1].set_ylabel('Escapist Film Ratio')
axes[1].set_title('Unemployment vs Escapist Film Ratio (USA)')

plt.tight_layout()
plt.show()

### 5.3 H3: ROI distribution is different during the 2008 crisis

**Null Hypothesis (H0):** ROI distribution during 2008-2009 is the same as other years.

**Alternative Hypothesis (H1):** ROI distribution during 2008-2009 is significantly different.

In [None]:
# Compare 2008-2009 vs other years
crisis_years = [2008, 2009]
roi_crisis = df_movies_clean[df_movies_clean['title_year'].isin(crisis_years)]['roi'].dropna()
roi_other = df_movies_clean[~df_movies_clean['title_year'].isin(crisis_years)]['roi'].dropna()

print("H3: ROI distribution is different during the 2008 crisis")
print("=" * 60)
print(f"2008-2009 film count: {len(roi_crisis)}")
print(f"Other years film count: {len(roi_other)}")
print(f"\n2008-2009 mean ROI: {roi_crisis.mean():.2f}")
print(f"2008-2009 median ROI: {roi_crisis.median():.2f}")
print(f"\nOther years mean ROI: {roi_other.mean():.2f}")
print(f"Other years median ROI: {roi_other.median():.2f}")

In [None]:
# Welch's t-test (does not assume equal variances across the two groups)
t_stat, p_value_t = stats.ttest_ind(roi_crisis, roi_other, equal_var=False)

print("Welch's t-test:")
print(f"t-statistic: {t_stat:.2f}")
print(f"P-value: {p_value_t:.4f}")

# Mann-Whitney U test
u_stat, p_value_u = stats.mannwhitneyu(roi_crisis, roi_other)

print(f"\nMann-Whitney U Test:")
print(f"U-statistic: {u_stat:.2f}")
print(f"P-value: {p_value_u:.4f}")

if p_value_u < 0.05:
    print("\nWe reject the null hypothesis. ROI distributions are significantly different.")
else:
    print("\nWe fail to reject the null hypothesis. No significant difference found.")
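
H3 is phrased in terms of the whole ROI distribution, while the t-test and Mann-Whitney U test are mainly sensitive to shifts in location. As a possible complement, not part of the original analysis, a two-sample Kolmogorov-Smirnov test compares the empirical distribution functions directly. The sketch below runs it on synthetic heavy-tailed samples that merely stand in for the two ROI groups.

```python
import numpy as np
from scipy import stats

# Synthetic, heavy-tailed stand-ins for the two ROI samples (not real data).
# Subtracting 1 gives a lower bound of -1, like ROI.
rng = np.random.default_rng(0)
toy_crisis = rng.lognormal(mean=0.0, sigma=1.0, size=300) - 1.0
toy_other = rng.lognormal(mean=0.0, sigma=1.0, size=2000) - 1.0

# The KS statistic is the largest vertical gap between the two empirical CDFs
ks_stat, ks_p = stats.ks_2samp(toy_crisis, toy_other)

print(f"KS statistic: {ks_stat:.3f}")
print(f"P-value: {ks_p:.4f}")
```

In the notebook, `roi_crisis` and `roi_other` could be passed in place of the toy arrays. The KS test reacts to differences in spread and shape as well as location, which matches the wording of H3 more closely.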

In [None]:
# Visualization for H3
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram comparison
axes[0].hist(roi_crisis[roi_crisis.between(-1, 10)], bins=30, alpha=0.7, label='2008-2009', density=True)
axes[0].hist(roi_other[roi_other.between(-1, 10)], bins=30, alpha=0.7, label='Other Years', density=True)
axes[0].set_xlabel('ROI')
axes[0].set_ylabel('Density')
axes[0].set_title('ROI Distribution: 2008-2009 vs Other Years')
axes[0].legend()

# Boxplot
df_movies_clean['is_crisis'] = df_movies_clean['title_year'].isin(crisis_years)
plot_data = df_movies_clean[df_movies_clean['roi'].between(-1, 10)]

sns.boxplot(data=plot_data, x='is_crisis', y='roi', ax=axes[1])
axes[1].set_xticklabels(['Other Years', '2008-2009'])
axes[1].set_ylabel('ROI')
axes[1].set_title('ROI Comparison')

plt.tight_layout()
plt.show()

### 5.4 H4: Positive correlation between country GDP per capita and movie budget

**Null Hypothesis (H0):** There is no correlation between country GDP per capita and average movie budget.

**Alternative Hypothesis (H1):** There is a positive correlation between country GDP per capita and average movie budget.

In [None]:
# Calculate country-level statistics
country_stats = df_merged.groupby('country').agg({
    'budget': 'mean',
    'gdp_per_capita': 'mean',
    'movie_title': 'count'
}).rename(columns={'movie_title': 'film_count'})

# Keep countries with at least 5 films
country_stats = country_stats[country_stats['film_count'] >= 5].dropna()

print("H4: Positive correlation between country GDP and movie budget")
print("=" * 60)
print(f"Number of countries: {len(country_stats)}")

In [None]:
# Pearson correlation
corr, p_value = stats.pearsonr(country_stats['gdp_per_capita'], country_stats['budget'])

print(f"Pearson Correlation Coefficient: {corr:.3f}")
print(f"P-value: {p_value:.4f}")

# Spearman correlation
corr_s, p_value_s = stats.spearmanr(country_stats['gdp_per_capita'], country_stats['budget'])

print(f"\nSpearman Correlation Coefficient: {corr_s:.3f}")
print(f"P-value: {p_value_s:.4f}")

if p_value < 0.05 and corr > 0:
    print("\nWe reject the null hypothesis. There is a significant positive correlation.")
else:
    print("\nWe fail to reject the null hypothesis.")
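
`stats.pearsonr` reports a two-sided p-value by default, whereas H4 is directional. A common convention, sketched below under that assumption, is to halve the two-sided p-value when the sign of the coefficient matches the hypothesized direction. The helper `one_sided_pearson` and the toy numbers are hypothetical.

```python
from scipy import stats

def one_sided_pearson(x, y):
    """Pearson r with a p-value for the alternative 'correlation > 0'."""
    r, p_two_sided = stats.pearsonr(x, y)
    # The two-sided p-value splits evenly between the two tails
    if r > 0:
        p_one_sided = p_two_sided / 2
    else:
        p_one_sided = 1 - p_two_sided / 2
    return float(r), float(p_one_sided)

# Hypothetical values, not country statistics from the dataset
toy_gdp = [1, 2, 3, 4, 5]
toy_budget = [2, 4, 5, 4, 6]

r, p_one = one_sided_pearson(toy_gdp, toy_budget)
print(f"r = {r:.3f}, one-sided p = {p_one:.4f}")
```

With only a handful of countries passing the five-film filter, the choice between a one-sided and a two-sided p-value can change the conclusion, so it is worth stating which one is reported.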

In [None]:
# Visualization for H4
plt.figure(figsize=(12, 8))

plt.scatter(country_stats['gdp_per_capita'], country_stats['budget']/1e6, 
            s=country_stats['film_count']*3, alpha=0.6)

# Add country labels for major countries
for idx, row in country_stats.iterrows():
    if row['film_count'] > 20:
        plt.annotate(idx, (row['gdp_per_capita'], row['budget']/1e6), fontsize=9)

# Trend line
z = np.polyfit(country_stats['gdp_per_capita'], country_stats['budget']/1e6, 1)
p = np.poly1d(z)
x_line = np.linspace(country_stats['gdp_per_capita'].min(), country_stats['gdp_per_capita'].max(), 100)
plt.plot(x_line, p(x_line), 'r--', alpha=0.8, label=f'Trend (r={corr:.2f})')

plt.xlabel('GDP per Capita ($)')
plt.ylabel('Average Movie Budget (Million $)')
plt.title('Country GDP vs Average Movie Budget\n(Circle size = number of films)')
plt.legend()
plt.tight_layout()
plt.show()

## 6. Conclusion

### Summary of Hypothesis Tests

| Hypothesis | Test Used | Result |
|------------|-----------|--------|
| H1: Low-budget films have higher ROI during recession | Mann-Whitney U | [Result] |
| H2: Escapist genres more successful during high unemployment | Mann-Whitney U | [Result] |
| H3: ROI distribution different during 2008 crisis | t-test, Mann-Whitney U | [Result] |
| H4: GDP correlates with movie budget | Pearson, Spearman | [Result] |

### Key Findings

1. [Finding 1]
2. [Finding 2]
3. [Finding 3]

### Limitations

- The movie data is heavily weighted towards USA films
- Economic data may be missing for some countries
- ROI calculation only includes box office revenue, not other sources (DVD, streaming)

## References

1. **IMDB 5000 Movie Dataset** - Kaggle: https://www.kaggle.com/datasets/carolzhangdc/imdb-5000-movie-dataset
2. **World Development Indicators** - World Bank: https://databank.worldbank.org/source/world-development-indicators