# Predicting the Formula for the Next Billion-Dollar Movie
<img src="images/Bishamon.jpg" alt="Drawing" style="width: 200px;"/>

##### This is my thesis work: "Making an ML model to make predictions in cinema". I have already scraped data from Wikipedia pages (the top 10 highest-grossing films of each year from 1950 to 2025). The main tasks are to clean the data, analyze it, build an ML classifier, wrap it in an API, and deploy it to an HTML page. The task is complete when the model can tell what a film needs to reach a $1 billion gross.
----------

## What are we exploring today?
- **Top directors and actors**: Find the most frequent directors and actors.
- **Trends over time**: Analyze trends in budget, box office revenue, and movie runtime by year.
- **Country-wise production**: Identify top countries producing the most films.
- **Budget vs. Box Office**: Identify patterns between film budgets and earnings.
- **Genre-based performance**: Measure how genre affects revenue.
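Finding the top directors and actors requires splitting the multi-value credit fields first. In the scraped preview further down, some cells are stringified Python lists (`"['Compton Bennett', 'Andrew Marton']"`) while others hold a single name. Below is a minimal sketch, assuming only those two layouts, that counts the most frequent names with `ast.literal_eval` and `collections.Counter`; `split_names` and `top_names` are hypothetical helper names. Space-joined credits such as `Arthur Freed Roger Edens` cannot be split safely this way and are counted as one entry.

```python
import ast
from collections import Counter


def split_names(cell):
    """Turn one scraped credit cell into a list of names.

    Assumes two layouts: a stringified Python list such as
    "['Compton Bennett', 'Andrew Marton']", or a single plain name.
    Missing values (None or NaN) give an empty list.
    """
    if cell is None:
        return []
    text = str(cell).strip()
    if not text or text.lower() == "nan":  # str(float('nan')) == 'nan'
        return []
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return [text]  # looks like a list but is not valid Python: keep as-is
        return [str(name).strip() for name in parsed if str(name).strip()]
    return [text]


def top_names(cells, n=10):
    """Return the n most frequent names as (name, count) pairs."""
    counts = Counter()
    for cell in cells:
        counts.update(split_names(cell))
    return counts.most_common(n)
```

With the DataFrame loaded, a call such as `top_names(df['Directed by'])` or `top_names(df['Starring'])` would return `(name, count)` pairs, most frequent first.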
----------

### Objective
##### Develop a machine learning model that predicts whether a new movie has the potential to earn over $1 billion at the global box office. 
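The objective turns into a binary target: 1 for a worldwide gross above $1 billion, 0 otherwise. This is a minimal sketch, assuming the gross has already been converted to a number in US dollars (the `Box office` column is still raw text at this point); `label_billion` and `positive_share` are hypothetical helpers. Checking the share of positive labels early shows how much the "Balancing classes" step will matter, since billion-dollar films are likely a small minority of the dataset.

```python
import math

BILLION_USD = 1_000_000_000  # threshold taken from the objective


def label_billion(gross_usd, threshold=BILLION_USD):
    """Return 1 if gross_usd is above threshold, 0 if not, None if the gross is unknown."""
    if gross_usd is None or math.isnan(gross_usd):
        return None
    return int(gross_usd > threshold)


def positive_share(labels):
    """Fraction of known labels equal to 1 (None values are ignored)."""
    known = [y for y in labels if y is not None]
    if not known:
        return 0.0
    return sum(known) / len(known)


labels = [label_billion(g) for g in [1_500_000_000, 25_600_000, None, 182_000_000]]
print(labels)  # [1, 0, None, 0]
print(positive_share(labels))
```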
------

##### Tasks
- Loading and taking a first look at the data
    - Importing the relevant libraries
    - Loading the dataset
    - Checking the overall info
    - Looking for missing, wrong, and bad values
- Data Cleaning
    - Handle missing values
    - Convert date fields
    - Normalize financial data
    - Split date
    - Split multi-value fields
- Analysis and visualization
    - Top directors and actors
    - Trends over time
    - Country-wise production
    - Budget vs. Box Office
    - Genre-based performance
- Processing and preparing the data for modeling
    - Encoding categorical values
    - Choosing variables for the model and engineering new ones
    - Normalizing values
    - Balancing the classes if they are imbalanced
- Building a classifier to predict billion-dollar box office
    - Train/Test split
    - Trying different algorithms
    - Model scoring
- Optimization and hyperparameter tuning
    - Optimizing
    - Scoring the tuned model
- Conclusion
    - Building a confusion matrix
    - Examining feature importance
    - Conclusion: which features matter?
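For the "Convert date fields" and "Split date" steps: the scraped `Release date` cells look like `['May 17, 1950 ( 1950-05-17 )']`, so the ISO date in parentheses is the most reliable part to extract. This is a minimal standard-library sketch, assuming the first ISO-formatted date in a cell is the one to keep (films with several regional premieres might need a different rule). `parse_release_date` and `split_date` are hypothetical names, and using year, month, and weekday as the split features is an assumption.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_release_date(cell):
    """Return the first ISO date (YYYY-MM-DD) found in a scraped cell, or None."""
    if cell is None:
        return None
    match = ISO_DATE.search(str(cell))
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # impossible month/day combinations
        return None


def split_date(d):
    """Split a date into (year, month, weekday) with Monday == 0; None propagates."""
    if d is None:
        return (None, None, None)
    return (d.year, d.month, d.weekday())
```

Applied to the DataFrame, `df['Release date'].map(parse_release_date)` would produce a column of `date` objects (or `None`) that can then be split into separate feature columns.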
-------

## Importing the libraries

```python
# to work with data
import pandas as pd
import numpy as np

# for visualization
import seaborn as sns
import matplotlib.pyplot as plt
```

## Loading and taking a first look at the data

```python
df = pd.read_csv('data/All_Movies_Info')
df.head()
```

| Unnamed: 0 | title | Directed by | Screenplay by | Based on | Produced by | Starring | Cinematography | Edited by | Music by | Color process | ... | Box office | year | Release date | Narrated by | Story by | Written by | Production companies | Countries | Languages | Layouts by |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Samson and Delilah | Cecil B. DeMille | ['Jesse L. Lasky Jr.', 'Fredric M. Frank', 'Ha... | ['Samson the Nazirite , a novel by Vladimir Ja... | Cecil B. DeMille | ['Hedy Lamarr', 'Victor Mature', 'George Sande... | George Barnes | Anne Bauchens | Victor Young | Technicolor | ... | $25.6 million [ 4 ] | 1950 |  |  |  |  |  |  |  |  |
| 1 | King Solomon's Mines | ['Compton Bennett', 'Andrew Marton'] | Helen Deutsch | King Solomon's Mines 1885 novel by H. Rider Ha... | Sam Zimbalist | ['Deborah Kerr', 'Stewart Granger', 'Richard C... | Robert Surtees | ['Ralph E. Winters', 'Conrad A. Nervig'] | Mischa Spoliansky |  | ... | $15.1 million [ 3 ] | 1950 |  |  |  |  |  |  |  |  |
| 2 | Annie Get Your Gun | George Sidney Busby Berkeley (uncredited) Char... | Sidney Sheldon | Annie Get Your Gun 1946 book by Dorothy Fields... | Arthur Freed Roger Edens | Betty Hutton Howard Keel Louis Calhern Keenan ... | Charles Rosher | James E. Newcom | Songs: (lyrics and music by) Irving Berlin Mus... |  | ... | $7,756,000 [ 1 ] | 1950 | ['May 17, 1950 ( 1950-05-17 )'] |  |  |  |  |  |  |  |
| 3 | Cheaper by the Dozen | Walter Lang | Lamar Trotti | Cheaper by the Dozen 1948 novel by Ernestine G... | Lamar Trotti | Clifton Webb Jeanne Crain Myrna Loy Betty Lynn... | Leon Shamroy | James Watson Webb Jr. | Cyril J. Mockridge | Technicolor | ... | $4.3-4.425 million (U.S. and Canada rentals) [... | 1950 | ['March 31, 1950 ( 1950-03-31 )'] | Jeanne Crain |  |  |  |  |  |  |
| 4 | Cinderella | ['Wilfred Jackson', 'Hamilton Luske', 'Clyde G... |  | " Cinderella " by Charles Perrault | ['Walt Disney', 'Ben Sharpsteen'] | ['Ilene Woods', 'Eleanor Audley', 'Verna Felto... |  | Donald Halliday | ['Oliver Wallace', 'Paul Smith'] |  | ... | $182 million [ 2 ] | 1950 |  |  | ['William Peet', 'Ted Sears', 'Homer Brightman... |  |  |  |  |  |
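The `Box office` column in this preview mixes several formats: `$25.6 million [ 4 ]`, `$7,756,000 [ 1 ]`, and ranges such as `$4.3-4.425 million (U.S. and Canada rentals)`. Below is a minimal regex-based sketch for the "Normalize financial data" step, assuming every useful value starts with `$` and that the lower bound of a range is an acceptable (conservative) estimate; `parse_box_office` is a hypothetical helper. Amounts in other currencies return `None`, and values such as U.S./Canada rentals are not worldwide grosses, so they may deserve a separate flag.

```python
import re

# "$" + number (commas/decimals allowed), an optional "-upper" range part,
# and an optional scale word. The range part is matched only so that the
# scale word after it still attaches to the lower bound.
MONEY = re.compile(
    r"\$\s*(?P<low>\d[\d,]*(?:\.\d+)?)"
    r"(?:\s*[-\u2013]\s*(?P<high>\d[\d,]*(?:\.\d+)?))?"
    r"\s*(?P<scale>million|billion)?",
    re.IGNORECASE,
)
SCALE = {None: 1, "million": 1_000_000, "billion": 1_000_000_000}


def parse_box_office(cell):
    """Convert the first dollar amount in a scraped cell to a float in USD, or None."""
    if cell is None:
        return None
    match = MONEY.search(str(cell))  # footnotes like "[ 4 ]" come after the amount and are ignored
    if match is None:
        return None
    low = float(match.group("low").replace(",", ""))
    scale = match.group("scale")
    return low * SCALE[scale.lower() if scale else None]
```

`df['Box office'].map(parse_box_office)` would then give a numeric column usable for the billion-dollar target and for the budget vs. box office comparison.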
