![alt text](../movie-3057394_1280.jpg)

## Business Understanding

The objective of this project is to analyze historical movie data to generate actionable insights for a new movie studio venture. Specifically, the analysis aims to identify the key factors that contribute to the commercial success of movies, focusing on profitability, audience reception, and production efficiency. This will help the studio make data-driven decisions regarding budget allocation, genre selection, and release strategies to maximize return on investment (ROI) and minimize financial risks.

## Key Business Questions

- Which genres consistently generate the highest revenue?
- How do budgets correlate with worldwide gross?
- What are the most profitable release windows for movies?
- Which studios have the highest profit margins?
- Does the original language of a movie influence its global performance?
- Can I use movie attributes to predict revenue?
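
Several of these questions depend on a profitability measure. As a minimal sketch, assuming ROI is defined as profit divided by production budget (one common convention, not a definition fixed by this notebook), it could be computed as follows. The `add_roi` helper and the `profit` and `roi` column names are hypothetical; the two sample rows reuse figures from The Numbers preview in the data preparation section.

```python
import pandas as pd

def add_roi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hypothetical `profit` and `roi` columns."""
    out = df.copy()
    out["profit"] = out["worldwide_gross"] - out["production_budget"]
    # ROI here means profit per dollar of budget (assumed definition)
    out["roi"] = out["profit"] / out["production_budget"]
    return out

sample = pd.DataFrame({
    "movie": ["Avatar", "Dark Phoenix"],
    "production_budget": [425_000_000.0, 350_000_000.0],
    "worldwide_gross": [2_776_345_279.0, 149_762_350.0],
})
roi_sample = add_roi(sample)
print(roi_sample[["movie", "profit", "roi"]])
```

A negative `roi` marks a title whose worldwide gross did not cover its production budget.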

## Dataframes for Analysis

Based on the analysis goals, the following datasets will be most useful for deriving insights:

1. **Box Office Mojo Data**
   - **Key Variables**: `title`, `studio`, `domestic_gross`, `foreign_gross`, `year`
   - **Usage**: Analyze box office performance, studio performance, and trends in domestic vs. international earnings.

2. **The Numbers Data**
   - **Key Variables**: `movie`, `production_budget`, `domestic_gross`, `worldwide_gross`, `release_date`
   - **Usage**: Analyze the correlation between production budgets and box office revenues to assess profitability.

3. **Rotten Tomatoes Movie Info Data**
   - **Key Variables**: `rating`, `genre`, `director`, `runtime`, `box_office`
   - **Usage**: Analyze how different factors like genre, director, and runtime impact box office performance.

4. **TheMovieDB Data**
   - **Key Variables**: `title`, `popularity`, `vote_average`, `vote_count`, `release_date`
   - **Usage**: Investigate how popularity, audience ratings, and vote counts correlate with box office success.

<img src="../movie_data_erd.jpeg" alt="Movie Data ERD" width="400"/>

5. **im.db.zip**
   - Zipped SQLite database (unzip it first, then query it using SQLite)
   - The `movie_basics` and `movie_ratings` tables are the most relevant
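
The unzip-then-query step for the SQLite archive can be sketched as below. This is an illustration under assumptions, not the procedure used in this notebook: `extract_and_list_tables` is a hypothetical helper, and it assumes the archive contains a single database file.

```python
import sqlite3
import zipfile
from pathlib import Path

def extract_and_list_tables(zip_path, out_dir):
    """Unzip a SQLite archive and return (db_path, table_names)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the archive holds exactly one database file
        member = zf.namelist()[0]
        zf.extract(member, out_dir)
    db_path = out_dir / member
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return db_path, [name for (name,) in rows]
```

Calling it as `extract_and_list_tables("im.db.zip", "unzipped")` (both paths are placeholders) would return the extracted file path and the table names, which is a quick way to confirm that `movie_basics` and `movie_ratings` are present.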

## Data Preparation

In [1]:

# Import libraries
# Data manipulation and analysis
import pandas as pd  # pandas is used for handling and processing data in DataFrame structures
import numpy as np  # numpy is useful for numerical computations and handling arrays
import gzip  # gzip is for handling compressed files

# Data visualization
import matplotlib.pyplot as plt  # matplotlib is used for creating static, interactive, and animated visualizations
import seaborn as sns  # seaborn provides a high-level interface for drawing attractive statistical graphics

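# Database interaction
import sqlite3  # sqlite3 is used to connect to SQLite databases

# Utilities
import nbconvert  # nbconvert is used to convert Jupyter Notebooks into various formats
import os  # os is used for file-system paths
import re  # re provides regular expressions for string cleaning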

# Set visualization style
sns.set_theme(style="whitegrid")
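
The loading cells in this section use absolute `C:/Users/...` paths, which only resolve on one machine. A possible alternative, sketched under the assumption that the notebook runs from the project root, is a small loader built on a relative data directory. `DATA_DIR` and `load_gz` are hypothetical names, and the directory layout is inferred from the paths used in this notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical project-relative location of the compressed files
DATA_DIR = Path("data") / "raw" / "zippedData"

def load_gz(filename, data_dir=DATA_DIR, **read_kwargs):
    """Read a gzipped CSV/TSV from data_dir; extra kwargs go to pd.read_csv."""
    path = Path(data_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    return pd.read_csv(path, compression="gzip", **read_kwargs)
```

For a tab-separated file the call would look like `load_gz("rt.reviews.tsv.gz", sep="\t", encoding="latin-1")`.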



### 1. **Box Office Mojo Data**

In [2]:
# Define the path to your raw zipped data
file_path = 'C:/Users/USER/Desktop/Movie-Project/data/raw/zippedData/bom.movie_gross.csv.gz'

# Load the gzipped CSV directly
bom_gross = pd.read_csv(file_path, compression='gzip')

# Display the first few rows of the data
display(bom_gross.head())
bom_gross.dtypes

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000,2010
3,Inception,WB,292600000.0,535700000,2010
4,Shrek Forever After,P/DW,238700000.0,513900000,2010


title              object
studio             object
domestic_gross    float64
foreign_gross      object
year                int64
dtype: object
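
`foreign_gross` loads as `object` while `domestic_gross` is `float64`, which suggests that some entries are not plain numbers. One way to see which entries block the conversion is sketched below on a small hypothetical Series; the comma-formatted value is an assumed example, not a value confirmed from the file.

```python
import pandas as pd

# Hypothetical stand-in for bom_gross['foreign_gross']
foreign = pd.Series(["652000000", "691300000", "1,131.6", None])

# Entries that are present but fail a direct numeric conversion
as_number = pd.to_numeric(foreign, errors="coerce")
blocked = foreign[foreign.notna() & as_number.isna()]
print(blocked.tolist())
```

Inspecting `blocked` on the real column before cleaning shows which characters have to be stripped and whether any values use a different unit.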

### 2. **The Numbers Data**

In [3]:
# Load The Numbers (movie budgets) dataset
tn_budgets = pd.read_csv('C:/Users/USER/Desktop/Movie-Project/data/raw/zippedData/tn.movie_budgets.csv.gz', compression='gzip') 
print("The Numbers Data:")
display(tn_budgets.head())  # Display the first few rows
print(tn_budgets.info())  # Get an overview of the dataset

The Numbers Data:


Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 5782 non-null   int64 
 1   release_date       5782 non-null   object
 2   movie              5782 non-null   object
 3   production_budget  5782 non-null   object
 4   domestic_gross     5782 non-null   object
 5   worldwide_gross    5782 non-null   object
dtypes: int64(1), object(5)
memory usage: 271.2+ KB
None


### 3. **Rotten Tomatoes Reviews Data**

In [4]:
# Load Rotten Tomatoes Reviews dataset
rt_reviews = pd.read_csv('C:/Users/USER/Desktop/Movie-Project/data/raw/zippedData/rt.reviews.tsv.gz', compression='gzip', sep='\t', encoding='latin-1') 
print("Rotten Tomatoes Reviews Data:")
display(rt_reviews.head(), "\n")  # Display the first few rows
print(rt_reviews.info())  # Get an overview of the dataset

Rotten Tomatoes Reviews Data:


Unnamed: 0,id,review,rating,fresh,critic,top_critic,publisher,date
0,3,A distinctly gallows take on contemporary fina...,3/5,fresh,PJ Nabarro,0,Patrick Nabarro,"November 10, 2018"
1,3,It's an allegory in search of a meaning that n...,,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
2,3,... life lived in a bubble in financial dealin...,,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
3,3,Continuing along a line introduced in last yea...,,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
4,3,... a perverse twist on neorealism...,,fresh,,0,Cinema Scope,"October 12, 2017"


'\n'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          54432 non-null  int64 
 1   review      48869 non-null  object
 2   rating      40915 non-null  object
 3   fresh       54432 non-null  object
 4   critic      51710 non-null  object
 5   top_critic  54432 non-null  int64 
 6   publisher   54123 non-null  object
 7   date        54432 non-null  object
dtypes: int64(2), object(6)
memory usage: 3.3+ MB
None


### 4. **Rotten Tomatoes Movie Info Data**

In [5]:
# Load Rotten Tomatoes Movie Info dataset
rt_info = pd.read_csv('C:/Users/USER/Desktop/Movie-Project/data/raw/zippedData/rt.movie_info.tsv.gz', compression='gzip', sep='\t') 
print("Rotten Tomatoes Movie Info Data:")
display(rt_info.head())  # Display the first few rows
print(rt_info.info())  # Get an overview of the dataset

Rotten Tomatoes Movie Info Data:


Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,1,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104 minutes,
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000.0,108 minutes,Entertainment One
2,5,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116 minutes,
3,6,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128 minutes,
4,7,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200 minutes,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1560 entries, 0 to 1559
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            1560 non-null   int64 
 1   synopsis      1498 non-null   object
 2   rating        1557 non-null   object
 3   genre         1552 non-null   object
 4   director      1361 non-null   object
 5   writer        1111 non-null   object
 6   theater_date  1201 non-null   object
 7   dvd_date      1201 non-null   object
 8   currency      340 non-null    object
 9   box_office    340 non-null    object
 10  runtime       1530 non-null   object
 11  studio        494 non-null    object
dtypes: int64(1), object(11)
memory usage: 146.4+ KB
None


### 5. **TheMovieDB Data**

In [6]:
# Load TMDB dataset
tmdb_movies = pd.read_csv('C:/Users/USER/Desktop/Movie-Project/data/raw/zippedData/tmdb.movies.csv.gz', compression='gzip') 
print("TheMovieDB Data:")
print(tmdb_movies.info())  # Get an overview of the dataset
display(tmdb_movies.head(), "\n")  # Display the first few rows

TheMovieDB Data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26517 entries, 0 to 26516
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         26517 non-null  int64  
 1   genre_ids          26517 non-null  object 
 2   id                 26517 non-null  int64  
 3   original_language  26517 non-null  object 
 4   original_title     26517 non-null  object 
 5   popularity         26517 non-null  float64
 6   release_date       26517 non-null  object 
 7   title              26517 non-null  object 
 8   vote_average       26517 non-null  float64
 9   vote_count         26517 non-null  int64  
dtypes: float64(2), int64(3), object(5)
memory usage: 2.0+ MB
None


Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


'\n'
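
`genre_ids` is read as `object`: each cell is a string such as `"[12, 14, 10751]"` rather than a list. If genre-level analysis of the TMDB data is wanted, the strings could be parsed before the column is dropped. The sketch below is one option using `ast.literal_eval`; `parse_genre_ids` is a hypothetical helper, and the sample strings come from the preview above.

```python
import ast

import pandas as pd

def parse_genre_ids(cell):
    """Turn a string like "[12, 14, 10751]" into a list of ints."""
    if not isinstance(cell, str):
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return [int(v) for v in value] if isinstance(value, list) else []

genre_strings = pd.Series(["[12, 14, 10751]", "[28, 878, 12]", "[]"])
genre_lists = genre_strings.apply(parse_genre_ids)
# One row per (movie, genre id) pair; empty lists become NaN rows
exploded = genre_lists.explode()
```

Mapping the numeric IDs to genre names would need a separate lookup table, which is not part of this dataset.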

### 6. **im.db.zip**
  * Zipped SQLite database 

In [7]:
# Path to the SQL database file
db_path = 'C:/Users/USER/Desktop/dsc-phase-2-project-v3-main/unzipped/im.db'

# Connecting to the database
conn = sqlite3.connect(db_path)

# Load tables from the database
movie_basics = pd.read_sql_query("SELECT * FROM movie_basics", conn)
movie_ratings = pd.read_sql_query("SELECT * FROM movie_ratings", conn)

In [8]:
display(movie_basics.head())  # Display the first few rows
print(movie_basics.info())  # Get an overview of the dataset

Unnamed: 0,movie_id,primary_title,original_title,start_year,runtime_minutes,genres
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama"
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama"
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama"
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy"


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146144 entries, 0 to 146143
Data columns (total 6 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   movie_id         146144 non-null  object 
 1   primary_title    146144 non-null  object 
 2   original_title   146123 non-null  object 
 3   start_year       146144 non-null  int64  
 4   runtime_minutes  114405 non-null  float64
 5   genres           140736 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 6.7+ MB
None


In [9]:
display(movie_ratings.head())  # Display the first few rows
print(movie_ratings.info())  # Get an overview of the dataset

Unnamed: 0,movie_id,averagerating,numvotes
0,tt10356526,8.3,31
1,tt10384606,8.9,559
2,tt1042974,6.4,20
3,tt1043726,4.2,50352
4,tt1060240,6.5,21


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73856 entries, 0 to 73855
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   movie_id       73856 non-null  object 
 1   averagerating  73856 non-null  float64
 2   numvotes       73856 non-null  int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 1.7+ MB
None
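
Both tables share a `movie_id` key, so they can be combined in SQL before loading into pandas. The following is a minimal sketch on an in-memory database that mirrors a subset of the column names; the rows are hypothetical placeholders, not records from `im.db`.

```python
import sqlite3

import pandas as pd

demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT,
                               start_year INTEGER, genres TEXT);
    CREATE TABLE movie_ratings (movie_id TEXT, averagerating REAL,
                                numvotes INTEGER);
    -- Hypothetical rows for illustration only
    INSERT INTO movie_basics VALUES ('tt0000001', 'Example A', 2015, 'Drama');
    INSERT INTO movie_basics VALUES ('tt0000002', 'Example B', 2018, 'Comedy');
    INSERT INTO movie_ratings VALUES ('tt0000001', 7.1, 1200);
""")

query = """
    SELECT b.movie_id, b.primary_title, b.genres,
           r.averagerating, r.numvotes
    FROM movie_basics AS b
    LEFT JOIN movie_ratings AS r ON b.movie_id = r.movie_id
    ORDER BY b.movie_id
"""
joined = pd.read_sql_query(query, demo)
demo.close()
```

A `LEFT JOIN` keeps titles that have no rating, which matters here because `movie_basics` has 146,144 rows and `movie_ratings` only 73,856. An `INNER JOIN` would keep rated titles only.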


<img src="https://media.giphy.com/media/l4RKhOL0xiBdbgglFi/giphy.gif" alt="Excited GIF" width="400"/>

I will now proceed to data cleaning.


## Data Cleaning

### 1.1. **Box Office Mojo Data**

In [10]:

# Convert 'foreign_gross' to numeric: strip thousands separators and any other
# non-numeric characters, but keep the decimal point
bom_gross['foreign_gross'] = bom_gross['foreign_gross'].replace(r'[^0-9.]', '', regex=True).astype(float)


# Convert 'year' column to integer type, handling non-convertible values
bom_gross['year'] = pd.to_numeric(bom_gross['year'], errors='coerce').astype('Int64')

# Drop duplicate rows
bom_gross = bom_gross.drop_duplicates()

# Remove all rows with any NaN values
bom_gross.dropna(inplace=True)

# Display the first few rows of the data
display(bom_gross.head())

# Check for missing values
print(bom_gross.isnull().sum())

# Display dataset info
bom_gross.info()

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000.0,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000.0,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000.0,2010
3,Inception,WB,292600000.0,535700000.0,2010
4,Shrek Forever After,P/DW,238700000.0,513900000.0,2010


title             0
studio            0
domestic_gross    0
foreign_gross     0
year              0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
Index: 2007 entries, 0 to 3353
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           2007 non-null   object 
 1   studio          2007 non-null   object 
 2   domestic_gross  2007 non-null   float64
 3   foreign_gross   2007 non-null   float64
 4   year            2007 non-null   Int64  
dtypes: Int64(1), float64(2), object(2)
memory usage: 96.0+ KB


### 1.2. **The Numbers Data**

In [11]:
# Remove '$' and ',' from financial columns and convert them to numeric
for col in ['production_budget', 'domestic_gross', 'worldwide_gross']:
    tn_budgets[col] = tn_budgets[col].replace(r'[\$,]', '', regex=True).astype(float)

# Convert 'release_date' to datetime
tn_budgets['release_date'] = pd.to_datetime(tn_budgets['release_date'], errors='coerce')

# Display the first few rows of the data
display(tn_budgets.head())

# Check for missing values
print(tn_budgets.isnull().sum())

# Display dataset info
tn_budgets.info()



Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,2009-12-18,Avatar,425000000.0,760507625.0,2776345000.0
1,2,2011-05-20,Pirates of the Caribbean: On Stranger Tides,410600000.0,241063875.0,1045664000.0
2,3,2019-06-07,Dark Phoenix,350000000.0,42762350.0,149762400.0
3,4,2015-05-01,Avengers: Age of Ultron,330600000.0,459005868.0,1403014000.0
4,5,2017-12-15,Star Wars Ep. VIII: The Last Jedi,317000000.0,620181382.0,1316722000.0


id                   0
release_date         0
movie                0
production_budget    0
domestic_gross       0
worldwide_gross      0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   id                 5782 non-null   int64         
 1   release_date       5782 non-null   datetime64[ns]
 2   movie              5782 non-null   object        
 3   production_budget  5782 non-null   float64       
 4   domestic_gross     5782 non-null   float64       
 5   worldwide_gross    5782 non-null   float64       
dtypes: datetime64[ns](1), float64(3), int64(1), object(1)
memory usage: 271.2+ KB
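
With `release_date` parsed as datetime, release windows can be studied by month. The sketch below uses the five preview rows above; the `release_month` column name is hypothetical, and the median is only one possible summary statistic.

```python
import pandas as pd

preview = pd.DataFrame({
    "movie": ["Avatar", "Pirates of the Caribbean: On Stranger Tides",
              "Dark Phoenix", "Avengers: Age of Ultron",
              "Star Wars Ep. VIII: The Last Jedi"],
    "release_date": pd.to_datetime(["2009-12-18", "2011-05-20", "2019-06-07",
                                    "2015-05-01", "2017-12-15"]),
    "worldwide_gross": [2776345279.0, 1045663875.0, 149762350.0,
                        1403013963.0, 1316721747.0],
})

# Calendar month of release (1 = January, 12 = December)
preview["release_month"] = preview["release_date"].dt.month

# Median worldwide gross per release month
monthly = preview.groupby("release_month")["worldwide_gross"].median()
print(monthly)
```

On the full `tn_budgets` frame the same two lines would give one value per calendar month. Five rows are far too few to draw any conclusion from.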


### 1.3. **TMDB Dataset**

In [12]:
# Drop the unnecessary 'Unnamed: 0' and 'genre_ids' columns
tmdb_movies.drop(columns=['Unnamed: 0','genre_ids'], inplace=True)


# Convert 'release_date' to datetime 
tmdb_movies['release_date'] = pd.to_datetime(tmdb_movies['release_date'], errors='coerce')

# Display the first few rows 
display(tmdb_movies.head())

# Check for missing values 
print(tmdb_movies.isnull().sum())

# Display dataset info 
tmdb_movies.info()


Unnamed: 0,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


id                   0
original_language    0
original_title       0
popularity           0
release_date         0
title                0
vote_average         0
vote_count           0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26517 entries, 0 to 26516
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   id                 26517 non-null  int64         
 1   original_language  26517 non-null  object        
 2   original_title     26517 non-null  object        
 3   popularity         26517 non-null  float64       
 4   release_date       26517 non-null  datetime64[ns]
 5   title              26517 non-null  object        
 6   vote_average       26517 non-null  float64       
 7   vote_count         26517 non-null  int64         
dtypes: datetime64[ns](1), float64(2), int64(2), object(3)
memory usage: 1.6+ MB


### 1.4. **Rotten Tomatoes Reviews Data**

In [13]:
# Drop rows where `review`, `rating`, or `critic` is missing
rt_reviews.dropna(subset=['review', 'rating', 'critic'], inplace=True)

# Fill missing `publisher` with "Unknown" (the `critic` fill is a no-op here,
# because rows with a missing critic were dropped above)
rt_reviews['critic'] = rt_reviews['critic'].fillna('Unknown')
rt_reviews['publisher'] = rt_reviews['publisher'].fillna('Unknown')


# Convert `date` column to datetime
rt_reviews['date'] = pd.to_datetime(rt_reviews['date'], errors='coerce')

# Convert Data Types
# Parse `rating` to extract the numerator of fractional scores (e.g., '3/5' -> 3.0);
# ratings without a '/' become None
def parse_rating(rating):
    try:
        return float(rating.split('/')[0]) if '/' in rating else None
    except (AttributeError, TypeError, ValueError):
        return None

rt_reviews['rating'] = rt_reviews['rating'].apply(parse_rating)

# Drop rows whose `rating` could not be parsed
rt_reviews.dropna(subset=['rating'], inplace=True)

# Remove Duplicates
rt_reviews.drop_duplicates(inplace=True)

# Rename columns to more descriptive names
rt_reviews.rename(columns={
    'review': 'review_text',
    'rating': 'rating_score',
    'fresh': 'is_fresh',
    'critic': 'critic_name',
    'top_critic': 'is_top_critic',
    'publisher': 'publisher_name',
    'date': 'review_date'
}, inplace=True)

print("Rotten Tomatoes Reviews Data:")
print(rt_reviews.info())  # Get an overview of the dataset
display(rt_reviews.head(), "\n")  # Display the first few rows


Rotten Tomatoes Reviews Data:
<class 'pandas.core.frame.DataFrame'>
Index: 27745 entries, 0 to 54424
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   id              27745 non-null  int64         
 1   review_text     27745 non-null  object        
 2   rating_score    27745 non-null  float64       
 3   is_fresh        27745 non-null  object        
 4   critic_name     27745 non-null  object        
 5   is_top_critic   27745 non-null  int64         
 6   publisher_name  27745 non-null  object        
 7   review_date     27745 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(4)
memory usage: 1.9+ MB
None


Unnamed: 0,id,review_text,rating_score,is_fresh,critic_name,is_top_critic,publisher_name,review_date
0,3,A distinctly gallows take on contemporary fina...,3.0,fresh,PJ Nabarro,0,Patrick Nabarro,2018-11-10
7,3,Cronenberg is not a director to be daunted by ...,2.0,rotten,Matt Kelemen,0,Las Vegas CityLife,2013-04-21
12,3,Robert Pattinson works mighty hard to make Cos...,2.0,rotten,Christian Toto,0,Big Hollywood,2013-01-15
14,3,For those who like their Cronenberg thick and ...,3.0,fresh,Marty Mapes,0,Movie Habit,2012-10-20
15,3,For better or worse - often both - Cosmopolis ...,3.0,fresh,Adam Ross,0,The Aristocrat,2012-09-27


'\n'
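
Note that `parse_rating` keeps only the numerator, so `3/5` and `3/10` would both become `3.0`. If scores given on different scales need to be comparable, one option is to divide by the denominator. The sketch below is an alternative, not the method used above: `normalize_rating` is a hypothetical helper that assumes ratings of the form `a/b` and returns `None` for anything else.

```python
def normalize_rating(rating):
    """Map a string like '3/5' to a fraction in [0, 1]; None if unparseable."""
    if not isinstance(rating, str) or rating.count("/") != 1:
        return None
    num, den = rating.split("/")
    try:
        num, den = float(num), float(den)
    except ValueError:
        return None
    # Reject zero or negative scales and scores outside the scale
    if den <= 0 or num < 0 or num > den:
        return None
    return num / den
```

It would be applied to the raw `rating` column, before that column is overwritten, in the same way as `parse_rating`: `rt_reviews['rating'].apply(normalize_rating)`.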

### 1.5. **Rotten Tomatoes Movie Info Data**

In [None]:
# Convert to datetime and coerce invalid dates to NaT
rt_info['theater_date'] = pd.to_datetime(rt_info['theater_date'], errors='coerce')
rt_info['dvd_date'] = pd.to_datetime(rt_info['dvd_date'], errors='coerce')

# Clean 'runtime' to extract numerical values (e.g., '104 minutes' -> 104.0)
rt_info['runtime'] = rt_info['runtime'].str.extract(r'(\d+)').astype(float)


# Clean 'box_office' to extract numerical values
rt_info['box_office'] = rt_info['box_office'].replace(r'[\$,]', '', regex=True).astype(float)

# Drop all rows with any NaN values
rt_info.dropna(inplace=True)

# Display the cleaned dataset
print("After dropping all rows with NaN values:")
print(rt_info.info())
display(rt_info.head())



After dropping all rows with NaN values:
<class 'pandas.core.frame.DataFrame'>
Index: 235 entries, 1 to 1545
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            235 non-null    int64 
 1   synopsis      235 non-null    object
 2   rating        235 non-null    object
 3   genre         235 non-null    object
 4   director      235 non-null    object
 5   writer        235 non-null    object
 6   theater_date  235 non-null    object
 7   dvd_date      235 non-null    object
 8   currency      235 non-null    object
 9   box_office    235 non-null    object
 10  runtime       235 non-null    object
 11  studio        235 non-null    object
dtypes: int64(1), object(11)
memory usage: 23.9+ KB
None


Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000,108 minutes,Entertainment One
6,10,Some cast and crew from NBC's highly acclaimed...,PG-13,Comedy,Jake Kasdan,Mike White,"Jan 11, 2002","Jun 18, 2002",$,41032915,82 minutes,Paramount Pictures
7,13,"Stewart Kane, an Irishman living in the Austra...",R,Drama,Ray Lawrence,Raymond Carver|Beatrix Christian,"Apr 27, 2006","Oct 2, 2007",$,224114,123 minutes,Sony Pictures Classics
15,22,Two-time Academy Award Winner Kevin Spacey giv...,R,Comedy|Drama|Mystery and Suspense,George Hickenlooper,Norman Snider,"Dec 17, 2010","Apr 5, 2011",$,1039869,108 minutes,ATO Pictures
18,25,"From ancient Japan's most enduring tale, the e...",PG-13,Action and Adventure|Drama|Science Fiction and...,Carl Erik Rinsch,Chris Morgan|Hossein Amini,"Dec 25, 2013","Apr 1, 2014",$,20518224,127 minutes,Universal Pictures
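
Dropping every row with any missing value leaves 235 of the original 1,560 rows, largely because `currency`, `box_office`, and `studio` are sparsely populated. A gentler option, sketched below on a hypothetical toy frame, is to check missingness per column and drop only on the columns a given analysis needs. The choice of `required` columns is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few of the rt_info column names
toy = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Drama"],
    "runtime": [104.0, 82.0, 116.0, np.nan],
    "box_office": [np.nan, 41032915.0, np.nan, 224114.0],
    "studio": [None, "Paramount Pictures", None, None],
})

# Share of missing values per column, most incomplete first
missing_share = toy.isna().mean().sort_values(ascending=False)

# Keep rows that have the columns this analysis needs (assumed choice)
required = ["genre", "box_office"]
kept = toy.dropna(subset=required)

print(missing_share)
print(len(toy.dropna()), "rows with dropna() vs", len(kept), "with subset")
```

On the real data, a genre-versus-box-office analysis would then not lose rows merely because `studio` or `writer` is missing.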
