<table><tr>
<td> <img src="https://upload.wikimedia.org/wikipedia/fr/thumb/e/e5/Logo_%C3%A9cole_des_ponts_paristech.svg/676px-Logo_%C3%A9cole_des_ponts_paristech.svg.png" width="200"  height="200" hspace="200"/> </td>
<td> <img src="https://pbs.twimg.com/profile_images/1156541928193896448/5ihYIbCQ_200x200.png" width="200" height="200" /> </td>
</tr></table>

<br/><br/><br/><br/>

<h1><center>Session 3 - Data Sourcing</center></h1>

<font size="3">
The goal of this session is to get and consolidate the data we need for our "first week" box office prediction project. We will:

- first, read sales data provided by one of our company's departments
- then look for additional data to complete our understanding of the movies involved
- finally consolidate all the data into a single database

</font>

# 1 - Movies and their first week box office

The market research department of your company has shared with you a list of movies released in France over the last 5 years, with their box office results. The file is available through a Google Drive [link](https://drive.google.com/file/d/1Pxs-C6l47wjWfBvsuQxJePlY1AjFBC5R/view?usp=sharing).

### 1-1. Download the file

What file format is it?
Is the data it contains easy to understand, a priori?
Would you need further documentation to grasp its meaning?
What are the advantages and drawbacks of using this kind of file and format?
What additional data do you need for your project?

> It is a JSON file. A priori, the data is quite easy to understand and the orders of magnitude make sense (e.g. the number of theater admissions). There should be no need for further documentation.

> The JSON format handles less structured, or at least hierarchical, data well, whereas a CSV file would struggle to represent nested dictionaries. This file is also static, so it would be costly to refresh if we decided to automate the box office prediction.

> There is not much information apart from an id, the title, the release date and year, the rank, the number of theaters and the sales figures. Our algorithm would benefit from additional information (cast, genre, budget, duration, language, production country, ...).
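
To make the JSON-versus-CSV point above concrete, here is a small sketch with a hypothetical record shaped like the TMDb features we will fetch later (the values are borrowed from later outputs, the record itself is made up). `pd.json_normalize` (pandas 1.0+) can flatten nested dictionaries into dotted column names, but a list of dictionaries such as `genres` remains a single cell holding Python objects, which a CSV file could only store as an opaque string.

```python
import pandas as pd

# A hypothetical record, shaped like the TMDb movie features
record = {
    "id": "19073",
    "belongs_to_collection": {"id": 531331, "name": "Maléfique - Saga"},
    "genres": [{"id": 14, "name": "Fantastique"}, {"id": 12, "name": "Aventure"}],
}

# Nested dicts are flattened into dotted column names...
flat = pd.json_normalize(record)
print(flat.columns.tolist())

# ...but the list of genres stays a single cell containing a Python list,
# which has no natural representation in a flat CSV row.
print(type(flat.loc[0, "genres"]))
```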


### 1-2. Put the file in a proper location

Ideally, where would you like to place it?
Should you use `git`, would you push this file to the remote? If not, how do you properly avoid doing so?

> We should put this file in a `data` folder of our repository. It is not very big, but versioning data files with git is usually not recommended, as git's diff and conflict-resolution algorithms may have a hard time finding and resolving conflicts in them. To avoid versioning it, we can list it in a `.gitignore` file.
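
If the project setup should be reproducible (e.g. from a bootstrap script), the ignore rule can also be added programmatically. A minimal sketch, assuming the data lives in a `data/` folder at the repository root; `ensure_gitignored` is a hypothetical helper, not part of git. Note that `.gitignore` only affects untracked files: a file that was already committed stays tracked until it is removed from the index with `git rm --cached`.

```python
from pathlib import Path


def ensure_gitignored(pattern: str, gitignore: Path = Path(".gitignore")) -> bool:
    """Append `pattern` to the .gitignore file unless an identical line is already there.

    Returns True if the file was modified, False otherwise.
    """
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file does not end with a newline
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + prefix + pattern + "\n", encoding="utf-8")
    return True


# Usage (hypothetical layout): keep the whole data folder out of version control
# ensure_gitignored("data/")
```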

### 1-3. Open the file

How do you open `.json` files?
Write a simple function that opens any JSON file located at the `path` given as argument and returns its content. Test it on our file. How can you verify it worked?


In [3]:
import json 
from typing import Union

def read_from_json(path: str) -> Union[dict, list]:
    '''
    Read and cast a json into a python object
    
    Parameters
    ----------
    path: str
        Path to json file

    Returns
    -------
    data: Union[dict, list]
        Json casted as python object
    '''
    with open(path, 'r', encoding='utf-8') as infile:  # explicit encoding for French titles
        data = json.load(infile)
    return data

In [4]:
# Test it here
from pprint import pprint
boxoffice = read_from_json("../../data/french-box-office-29nov2020.json")
pprint(boxoffice[:10])

[{'first_day_sales': 630478,
  'first_screening_sales': None,
  'first_week_sales': 3252896,
  'first_weekend_sales': 2559370,
  'id': '17528',
  'max_theaters_used': 820,
  'rank': '1',
  'release_date': '2019-07-17',
  'title': 'Le Roi Lion (2019)',
  'total_sales': 10017995,
  'year': 2019},
 {'first_day_sales': 66164,
  'first_screening_sales': None,
  'first_week_sales': 786485,
  'first_weekend_sales': 548537,
  'id': '19073',
  'max_theaters_used': 575,
  'rank': '13',
  'release_date': '2019-10-16',
  'title': 'Maléfique : Le Pouvoir du Mal',
  'total_sales': 2718604,
  'year': 2019},
 {'first_day_sales': 370691,
  'first_screening_sales': None,
  'first_week_sales': 1261701,
  'first_weekend_sales': 1056143,
  'id': '18875',
  'max_theaters_used': 757,
  'rank': '12',
  'release_date': '2019-05-01',
  'title': 'Nous finirons ensemble',
  'total_sales': 2800004,
  'year': 2019},
 {'first_day_sales': 453503,
  'first_screening_sales': None,
  'first_week_sales': 1370178,
  ...
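
One way to answer "How can you verify it worked?" beyond eyeballing the printout is to run a few structural checks. A minimal sketch: the expected keys are read off the printed records above, and `check_boxoffice_records` is a hypothetical helper name. We do not know whether every record in the file carries all these keys, which is exactly what the check reports.

```python
from typing import Iterable, List

# Keys seen in the printed box office records
EXPECTED_KEYS = {
    "id", "title", "year", "release_date", "rank", "total_sales",
    "first_week_sales", "first_weekend_sales", "first_day_sales",
    "first_screening_sales", "max_theaters_used",
}


def check_boxoffice_records(records, expected_keys: Iterable[str] = EXPECTED_KEYS) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    if not isinstance(records, list):
        return [f"expected a list, got {type(records).__name__}"]
    problems = []
    if not records:
        problems.append("the file contains no records")
    expected = set(expected_keys)
    for position, record in enumerate(records):
        missing = expected - set(record)
        if missing:
            problems.append(f"record {position} is missing {sorted(missing)}")
    # The id is our future merge key, so it should be unique
    ids = [record.get("id") for record in records]
    if len(ids) != len(set(ids)):
        problems.append("some ids are duplicated")
    return problems


# Usage: an empty list means every check passed
# check_boxoffice_records(boxoffice)
```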

# 2 - Movie features

We are still missing additional features that would help our algorithms better predict movie theater sales during the first week of a movie's release.

### 2-1. Source identification

Where could we find such information? Try to find ideas on the web.

> We could use [The Movie Database (TMDb) API](https://developers.themoviedb.org/3/getting-started/introduction).

### 2-2. Terms of contracts and usage

A priori, could we use the service provided by the website we have just found?
Take a moment to look at the API documentation. List all the information we could retrieve that might be relevant to our project.
How would you:

1. extract the information you need? Think of the different functions you would need to perform this.
2. link these features back to our box office data? Which feature from this dataset should we save along with the movie features?

> The service can be used freely as long as we attribute TMDb as the source of our data. If we eventually make many large calls to the API and use it commercially at some point, we might need to get in touch with the owners to discuss the terms of use further.

> There are a lot of [movie](https://developers.themoviedb.org/3/movies/get-movie-details) features we could extract: cast, crew, plot keywords, budget, etc. The only difficulty is that we only have our own id (not the TMDb one) and a movie title. As the `GET /movie/{movie_id}` route needs a TMDb id to fetch a specific film, we first need to search for the movie by title using the `GET /search/movie` route, then extract its id and use it with the `GET /movie/{movie_id}` route.

> To do this properly, we should loop through all the movies in our box office dataset, use each title in our search, and keep the box office `id` to ease the merge of both datasets at the end.
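
The two-step lookup described above can be sketched directly against the REST API, before reaching for a client library. This is a minimal sketch assuming TMDb v3's `api_key` query parameter and a search response containing a `results` list; `tmdb_get` and `fetch_movie_by_title` are hypothetical helper names, and `get_json` is injectable so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

API_ROOT = "https://api.themoviedb.org/3"


def tmdb_get(path: str, params: dict) -> dict:
    """GET a TMDb v3 route and decode its JSON body."""
    url = f"{API_ROOT}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_movie_by_title(
    title: str,
    api_key: str,
    get_json: Callable[[str, dict], dict] = tmdb_get,
) -> Optional[dict]:
    """Search TMDb by title, then fetch the details of the first hit.

    Returns None when the search finds nothing.
    """
    params = {"api_key": api_key, "language": "fr-FR"}
    # Step 1: GET /search/movie returns candidate movies for the title
    search = get_json("/search/movie", {**params, "query": title})
    results = search.get("results", [])
    if not results:
        return None
    # Step 2: GET /movie/{movie_id} needs the TMDb id found in step 1
    # (we assume, like the client below, that the first result is the most relevant)
    tmdb_id = results[0]["id"]
    return get_json(f"/movie/{tmdb_id}", params)
```

Usage, with a real key: `fetch_movie_by_title("Titanic", api_key="YOUR_TMDB_KEY")`. The box office `id` is deliberately not handled here; it stays on our side and is attached to the result in the extraction loop.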

### 2-3. Connectors

Create an account to proceed.
How would you connect to the API?

> We can use the [tmdbv3api](https://github.com/AnthonyBloomer/tmdbv3api) Python package. It saves us from re-implementing all the API calls: we just use its functions and provide the API key that TMDb issues once our account is created.

### 2-4. Data extraction

Write several functions that extract the information you need into a structured object. What kind of Python object would you use? Test these functions on a movie of your choice.

In [5]:
from tmdbv3api import TMDb, Movie, Search
from typing import Union

class TMDbClient:

    def __init__(self, api_key: str):
        '''
        A custom client for The Movie Database API for French movies
        '''
        self.client = TMDb()
        self.client.api_key = api_key
        self.client.language = 'fr-FR'
        self.movie_db = Movie()
        self.search = Search()

    def find_movie_id(self, movie: str) -> Union[int, None]:
        '''
        Looks for a particular movie in TMDb

        Parameters
        ----------
        movie: str
            a movie title

        Returns
        -------
        id: Union[int, None]
            the most relevant movie id if we found one, or None
        '''
        results = self.search.movies({'query': movie})
        if len(results) > 0:
            return results[0].id

    def get_movie_details(self, movie_id: int):
        '''
        Get a movie main features

        Parameters
        ----------
        movie_id: int
            a movie id (the tmdb one)

        Returns
        -------
        details: MovieDetails
            A movie details
        '''
        details = self.movie_db.details(movie_id)
        details = unmarshal_details(details)
        return details

    def get_movie_cast(self, movie_id: int):
        '''
        Get a movie cast

        Parameters
        ----------
        movie_id: int
            a movie id (the tmdb one)

        Returns
        -------
        cast: MovieCast
            A movie cast
        '''
        credits = self.movie_db.credits(movie_id)
        cast = unmarshal_credits(credits)
        return cast

    def find_movie_features(self, movie: str):
        '''
        Find all relevant features (details and cast)
        given a movie title

        Parameters
        ----------
        movie: str
            a movie title

        Returns
        -------
        card: Union[None, MovieCard]
            A movie card. None if no movie was found
        '''
        movie_id = self.find_movie_id(movie)
        if movie_id:
            card = self.get_movie_details(movie_id)
            card['cast'] = self.get_movie_cast(movie_id)
            return card
        else:
            return None




def unmarshal_details(details) -> dict:
    '''
    Decompose a TMDb movie details response into a properly
    structured movie card

    Parameters
    ----------
    details: tmdbv3api response object
        the movie details response (fields are read as attributes)

    Returns
    -------
    card: MovieDetails
        A movie card
    '''
    return {
        'tmdb_id': details.id,
        'adult': details.adult,
        'budget': details.budget,
        'imdb_id': details.imdb_id,
        'original_language': details.original_language,
        'original_title': details.original_title,
        'overview': details.overview,
        'tmdb_popularity': details.popularity,
        'release_date': details.release_date,
        'revenue': details.revenue,
        'runtime': details.runtime,
        'status': details.status,
        'tagline': details.tagline,
        'title': details.title,
        'tmdb_vote_count': details.vote_count,
        'tmdb_vote_average': details.vote_average
    }


def unmarshal_credits(credits) -> list:
    '''
    Decompose a TMDb movie credits response into a properly
    structured movie cast

    Parameters
    ----------
    credits: tmdbv3api response object
        the movie credits response (its `cast` attribute lists the actors)

    Returns
    -------
    cast: MovieCast
        A list of actors, empty if TMDb lists no cast
    '''
    if not credits.cast:
        return []
    return [
        {
            'adult': actor['adult'],
            'gender': actor['gender'],
            'tmdb_id': actor['id'],
            'name': actor['name'],
            'tmdb_popularity': actor['popularity'],
            'order': actor['order']
        }
        for actor in credits.cast
    ]


In [8]:
# Test them here
client = TMDbClient('api_key')  # placeholder: replace with your own TMDb API key
movie_details = client.find_movie_features('Titanic')

Exception: Invalid API key: You must be granted a valid key.
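
The docstrings above return `MovieDetails`, `MovieCast` and `MovieCard`, which are not defined anywhere: in practice the functions return plain dicts and lists. If we want those names to carry meaning, a minimal sketch with `typing.TypedDict` (Python 3.8+) could look like the following. The field types are our assumptions about TMDb's data, and `TypedDict` is only enforced by static type checkers such as mypy, not at runtime.

```python
from typing import List, Optional, TypedDict


class CastMember(TypedDict):
    adult: bool
    gender: int              # TMDb encodes gender as an integer code
    tmdb_id: int
    name: str
    tmdb_popularity: float
    order: int               # billing order in the credits


class MovieDetails(TypedDict):
    tmdb_id: int
    adult: bool
    budget: int              # can be 0, presumably when the figure is unknown
    imdb_id: Optional[str]
    original_language: str
    original_title: str
    overview: str
    tmdb_popularity: float
    release_date: str        # 'YYYY-MM-DD'
    revenue: int
    runtime: Optional[int]
    status: str
    tagline: Optional[str]
    title: str
    tmdb_vote_count: int
    tmdb_vote_average: float


# A MovieCast is simply a list of cast members
MovieCast = List[CastMember]


class MovieCard(MovieDetails):
    cast: MovieCast
```

With these in place, the methods could be annotated, e.g. `def get_movie_details(self, movie_id: int) -> MovieDetails`, without changing their behaviour.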

### 2-5. Run

Write the loop that extracts all the required data. Test it on a subset of 20 movies. There is no need to run it on the whole dataset: to avoid overloading the API servers, we will provide you with the final data directly.

In which format would you store the data? Where would you save it? Write a function to perform this operation and test it.

In [9]:
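# Write the loop here
from tqdm import tqdm
sample = []
for movie in tqdm(boxoffice[:10]):
    features = client.find_movie_features(movie['title'])
    if features is not None:
        # keep the box office id (our merge key) and the title we searched for
        features['id'] = movie['id']
        features['query'] = movie['title']
    sample.append(features)
pprint(sample)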

  0%|          | 0/10 [00:00<?, ?it/s]


Exception: Invalid API key: You must be granted a valid key.
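
One way to honour the request not to overload the API servers is to space out calls on the client side. The sketch below is a hypothetical helper (`throttled_map`), not part of tqdm or tmdbv3api; the 0.25 s default interval is an arbitrary, conservative assumption, not a documented TMDb limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(func: Callable[[T], R], items: Iterable[T], min_interval: float = 0.25) -> List[R]:
    """Apply `func` to each item, making consecutive calls start at least
    `min_interval` seconds apart (a crude client-side rate limit)."""
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(func(item))
    return results


# Usage with the client above (needs a valid API key):
# sample = throttled_map(lambda movie: client.find_movie_features(movie['title']), boxoffice[:10])
```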

In [None]:
def write_json_to_file(payload: Union[dict, list], path: str) -> None:
    '''
    Write a dict or list of dicts into a json file
    
    Parameters
    ----------
    path: str
        Path to file

    Returns
    -------
    None
    '''
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(payload, f, ensure_ascii=False, indent=4)

In [None]:
write_json_to_file(sample, "../../data/movie_features_sample.json")

# 3 - Data consolidation

### 3-1. Next steps?

You can download the whole movie features database [here](https://drive.google.com/file/d/1hv_FCcJi--n0S9GXu75zS7hRGcCrPB6d/view?usp=sharing). What should we do next before moving on to the data preparation and modelling parts?

> We should consolidate all the data into a single object, by merging the box office data and the movie features on the box office `id`.

### 3-2. Data reading

`pandas` (https://pandas.pydata.org) is a very well-known Python library for data analysis and manipulation. It mainly relies on two powerful objects: `Series` and `DataFrame` (https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#intro-to-data-structures). For more information, you can watch this 10-minute introduction [video](https://www.youtube.com/watch?v=_T8LGqJtuGc&feature=emb_title).

Read your movie features dataset again and cast it into a `DataFrame` (`pd.DataFrame()` [accepts](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas-dataframe) a list of dicts).
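
To see what `pd.DataFrame()` does with a list of dicts, here is a small sketch with made-up records. Each key becomes a column, and a key missing from some records becomes `NaN`, which turns an otherwise-integer column into `float64`. This is probably also why counts such as `first_day_sales` show up as floats (e.g. `66164.0`) in the merged output later on.

```python
import pandas as pd

# Two hypothetical records; the second one has no "budget" key
records = [
    {"id": "19073", "title": "Maléfique : Le Pouvoir du Mal", "budget": 185000000},
    {"id": "18875", "title": "Nous finirons ensemble"},
]
df = pd.DataFrame(records)

# One column per key; the missing budget is NaN, so the column is float64
print(df)
print(df.dtypes)
```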

In [10]:
import pandas as pd
movies_features = read_from_json("../../data/movie-features-29nov2020.json")
movies_features_df = pd.DataFrame(movies_features)
movies_features_df.head()

,tmdb_id,adult,belongs_to_collection,budget,genres,imdb_id,original_language,original_title,overview,tmdb_popularity,...,runtime,languages,status,tagline,title,tmdb_vote_count,tmdb_vote_average,cast,id,query
0,420809,False,"{'id': 531331, 'name': 'Maléfique - Saga'}",185000000,"[{'id': 14, 'name': 'Fantastique'}, {'id': 107...",tt4777008,en,Maleficent: Mistress of Evil,Cinq années après la conjuration de la malédic...,184.959,...,110.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Aller au-delà du conte de fée.,Maléfique 2 Le pouvoir du mal,3904,7.4,"[{'adult': False, 'gender': 1, 'tmdb_id': 1170...",19073,Maléfique : Le Pouvoir du Mal
1,542830,False,"{'id': 629597, 'name': 'Les Petits Mouchoirs -...",0,"[{'id': 35, 'name': 'Comédie'}, {'id': 18, 'na...",tt8201404,fr,Nous finirons ensemble,"Préoccupé, Max est parti dans sa maison au bor...",10.84,...,135.0,"[{'iso_code': 'fr', 'name': 'Français'}]",Released,,Nous finirons ensemble,490,6.9,"[{'adult': False, 'gender': 2, 'tmdb_id': 3316...",18875,Nous finirons ensemble
2,429617,False,"{'id': 531241, 'name': 'Spider-Man (Avengers) ...",160000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",tt6320628,en,Spider-Man: Far from Home,Peter et ses amis passent leurs vacances d’été...,204.536,...,129.0,"[{'iso_code': 'cs', 'name': 'Český'}, {'iso_co...",Released,Il est temps de passer à l'action.,Spider-Man : Far from Home,8750,7.5,"[{'adult': False, 'gender': 2, 'tmdb_id': 1136...",18243,Spider-Man: Far from Home
3,512200,False,"{'id': 495527, 'name': 'Jumanji - Saga'}",125000000,"[{'id': 12, 'name': 'Aventure'}, {'id': 35, 'n...",tt7975244,en,Jumanji: The Next Level,"L’équipe est de retour, mais le jeu a changé. ...",237.001,...,123.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Bienvenue à Jumanji !,Jumanji: Next Level,5056,7.0,"[{'adult': False, 'gender': 2, 'tmdb_id': 1891...",18258,Jumanji: next level
4,166428,False,"{'id': 89137, 'name': 'Dragons - Saga'}",129000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",tt2386490,en,How to Train Your Dragon: The Hidden World,Ce qui avait commencé comme une amitié improba...,113.912,...,104.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Une amitié légendaire,Dragons 3 : Le monde caché,4107,7.8,"[{'adult': False, 'gender': 2, 'tmdb_id': 449,...",18167,Dragons 3 : Le monde caché


### 3-3. Data manipulation

`pandas` and its `DataFrame` come with many handy features. Here is a non-exhaustive list; take a few minutes to test them on your new `DataFrame`.

- `DataFrame.columns` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html)
- `DataFrame.dtypes` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html)
- `DataFrame.head` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html)
- `DataFrame.tail` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html)
- `DataFrame.size` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.size.html)
- `DataFrame.shape` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html)
- `DataFrame.loc` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)
- `DataFrame.iloc` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html)
- `DataFrame.query` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html)
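
`loc`, `iloc` and `query` are easy to mix up. The sketch below uses a tiny frame whose index labels (10, 20, 30) deliberately differ from the row positions, with runtimes taken from the `tail()` output. Label slices with `loc` include the end label, positional slices with `iloc` exclude the end position, and `query` can refer to Python variables with `@`.

```python
import pandas as pd

movies = pd.DataFrame(
    {
        "title": ["Cars 3", "Le Sens de la fête", "Patients"],
        "original_language": ["en", "fr", "fr"],
        "runtime": [109.0, 117.0, 110.0],
    },
    index=[10, 20, 30],  # labels deliberately differ from positions 0, 1, 2
)

# loc is label-based: the slice 10:20 includes both end labels (2 rows)
by_label = movies.loc[10:20, "title"]

# iloc is position-based: the slice 0:2 excludes position 2 (2 rows)
by_position = movies.iloc[0:2, 0]

# query filters rows with a string expression; @ refers to a Python variable
min_runtime = 115
french_long = movies.query('original_language == "fr" and runtime >= @min_runtime')
```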


In [11]:
# dtypes
movies_features_df.dtypes

tmdb_id                    int64
adult                       bool
belongs_to_collection     object
budget                     int64
genres                    object
imdb_id                   object
original_language         object
original_title            object
overview                  object
tmdb_popularity          float64
production_companies      object
production_countries      object
release_date              object
revenue                    int64
runtime                  float64
languages                 object
status                    object
tagline                   object
title                     object
tmdb_vote_count            int64
tmdb_vote_average        float64
cast                      object
id                        object
query                     object
dtype: object

In [12]:
# columns
movies_features_df.columns

Index(['tmdb_id', 'adult', 'belongs_to_collection', 'budget', 'genres',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'tmdb_popularity', 'production_companies', 'production_countries',
       'release_date', 'revenue', 'runtime', 'languages', 'status', 'tagline',
       'title', 'tmdb_vote_count', 'tmdb_vote_average', 'cast', 'id', 'query'],
      dtype='object')

In [13]:
# head
movies_features_df.head()

,tmdb_id,adult,belongs_to_collection,budget,genres,imdb_id,original_language,original_title,overview,tmdb_popularity,...,runtime,languages,status,tagline,title,tmdb_vote_count,tmdb_vote_average,cast,id,query
0,420809,False,"{'id': 531331, 'name': 'Maléfique - Saga'}",185000000,"[{'id': 14, 'name': 'Fantastique'}, {'id': 107...",tt4777008,en,Maleficent: Mistress of Evil,Cinq années après la conjuration de la malédic...,184.959,...,110.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Aller au-delà du conte de fée.,Maléfique 2 Le pouvoir du mal,3904,7.4,"[{'adult': False, 'gender': 1, 'tmdb_id': 1170...",19073,Maléfique : Le Pouvoir du Mal
1,542830,False,"{'id': 629597, 'name': 'Les Petits Mouchoirs -...",0,"[{'id': 35, 'name': 'Comédie'}, {'id': 18, 'na...",tt8201404,fr,Nous finirons ensemble,"Préoccupé, Max est parti dans sa maison au bor...",10.84,...,135.0,"[{'iso_code': 'fr', 'name': 'Français'}]",Released,,Nous finirons ensemble,490,6.9,"[{'adult': False, 'gender': 2, 'tmdb_id': 3316...",18875,Nous finirons ensemble
2,429617,False,"{'id': 531241, 'name': 'Spider-Man (Avengers) ...",160000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",tt6320628,en,Spider-Man: Far from Home,Peter et ses amis passent leurs vacances d’été...,204.536,...,129.0,"[{'iso_code': 'cs', 'name': 'Český'}, {'iso_co...",Released,Il est temps de passer à l'action.,Spider-Man : Far from Home,8750,7.5,"[{'adult': False, 'gender': 2, 'tmdb_id': 1136...",18243,Spider-Man: Far from Home
3,512200,False,"{'id': 495527, 'name': 'Jumanji - Saga'}",125000000,"[{'id': 12, 'name': 'Aventure'}, {'id': 35, 'n...",tt7975244,en,Jumanji: The Next Level,"L’équipe est de retour, mais le jeu a changé. ...",237.001,...,123.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Bienvenue à Jumanji !,Jumanji: Next Level,5056,7.0,"[{'adult': False, 'gender': 2, 'tmdb_id': 1891...",18258,Jumanji: next level
4,166428,False,"{'id': 89137, 'name': 'Dragons - Saga'}",129000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",tt2386490,en,How to Train Your Dragon: The Hidden World,Ce qui avait commencé comme une amitié improba...,113.912,...,104.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Une amitié légendaire,Dragons 3 : Le monde caché,4107,7.8,"[{'adult': False, 'gender': 2, 'tmdb_id': 449,...",18167,Dragons 3 : Le monde caché


In [14]:
# tail
movies_features_df.tail()

,tmdb_id,adult,belongs_to_collection,budget,genres,imdb_id,original_language,original_title,overview,tmdb_popularity,...,runtime,languages,status,tagline,title,tmdb_vote_count,tmdb_vote_average,cast,id,query
7048,281338,False,"{'id': 173710, 'name': 'La Planète des Singes ...",150000000,"[{'id': 18, 'name': 'Drame'}, {'id': 878, 'nam...",tt3450958,en,War for the Planet of the Apes,César et les Singes sont contraints de mener u...,50.468,...,142.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Pour l'humanité. Pour l'espoir. Pour la planète.,La Planète des singes : Suprématie,6391,7.1,"[{'adult': False, 'gender': 2, 'tmdb_id': 1333...",15748,La Planète des Singes - Suprématie
7049,260514,False,"{'id': 87118, 'name': 'Cars - Saga'}",175000000,"[{'id': 12, 'name': 'Aventure'}, {'id': 16, 'n...",tt3606752,en,Cars 3,Dépassé par une nouvelle génération de bolides...,88.435,...,109.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,La suite des aventures de Flash McQueen,Cars 3,3663,6.8,"[{'adult': False, 'gender': 2, 'tmdb_id': 887,...",16452,Cars 3
7050,432068,False,{},17200000,"[{'id': 35, 'name': 'Comédie'}]",tt5699154,fr,Le Sens de la fête,Max est traiteur depuis trente ans. Des fêtes ...,7.793,...,117.0,"[{'iso_code': 'fr', 'name': 'Français'}]",Released,,Le Sens de la fête,969,6.9,"[{'adult': False, 'gender': 2, 'tmdb_id': 2828...",17386,Le Sens de la fête
7051,341174,False,"{'id': 344830, 'name': 'Cinquante nuances - Sa...",55000000,"[{'id': 18, 'name': 'Drame'}, {'id': 10749, 'n...",tt4465564,en,Fifty Shades Darker,Dépassée par les sombres secrets de Christian ...,32.518,...,118.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,,Cinquante nuances plus sombres,5891,6.4,"[{'adult': False, 'gender': 1, 'tmdb_id': 1185...",15186,Cinquante nuances plus sombres
7052,434616,False,{},0,"[{'id': 18, 'name': 'Drame'}, {'id': 35, 'name...",tt5598100,fr,Patients,"Se laver, s'habiller, marcher, jouer au basket...",7.122,...,110.0,"[{'iso_code': 'fr', 'name': 'Français'}]",Released,,Patients,469,7.5,"[{'adult': False, 'gender': 2, 'tmdb_id': 1314...",16903,Patients


In [15]:
# size
movies_features_df.size

169272

In [16]:
# shape
movies_features_df.shape

(7053, 24)
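
`DataFrame.size` counts cells, so it equals the number of rows times the number of columns reported by `shape`:

$$
\text{size} = 7053 \times 24 = 169\,272
$$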

In [17]:
# loc
movies_features_df.loc[0, 'original_title']

'Maleficent: Mistress of Evil'

In [18]:
# iloc
movies_features_df.iloc[0, 0:10]

tmdb_id                                                             420809
adult                                                                False
belongs_to_collection           {'id': 531331, 'name': 'Maléfique - Saga'}
budget                                                           185000000
genres                   [{'id': 14, 'name': 'Fantastique'}, {'id': 107...
imdb_id                                                          tt4777008
original_language                                                       en
original_title                                Maleficent: Mistress of Evil
overview                 Cinq années après la conjuration de la malédic...
tmdb_popularity                                                    184.959
Name: 0, dtype: object

In [21]:
# query
movies_features_df.query('original_title == "Maleficent: Mistress of Evil"')

,tmdb_id,adult,belongs_to_collection,budget,genres,imdb_id,original_language,original_title,overview,tmdb_popularity,...,runtime,languages,status,tagline,title,tmdb_vote_count,tmdb_vote_average,cast,id,query
0,420809,False,"{'id': 531331, 'name': 'Maléfique - Saga'}",185000000,"[{'id': 14, 'name': 'Fantastique'}, {'id': 107...",tt4777008,en,Maleficent: Mistress of Evil,Cinq années après la conjuration de la malédic...,184.959,...,110.0,"[{'iso_code': 'en', 'name': 'English'}]",Released,Aller au-delà du conte de fée.,Maléfique 2 Le pouvoir du mal,3904,7.4,"[{'adult': False, 'gender': 1, 'tmdb_id': 1170...",19073,Maléfique : Le Pouvoir du Mal


### 3-4. Data merging

Reload your box office dataset. Using the `DataFrame.merge` function, merge both datasets into a single one. This will be the data we'll analyse and transform in the next session. How would you prefer to save it? In which format?
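
`DataFrame.merge` performs an inner join by default, so any box office movie that TMDb did not match is silently dropped. Because both datasets contain `title` and `release_date`, pandas also disambiguates them with the default `_x`/`_y` suffixes (hence `title_y` and `release_date_y` in the output below). The sketch uses small, made-up frames (ids and titles borrowed from the outputs in this notebook) to show the `how`, `suffixes`, `validate` and `indicator` options of `merge`, which make those effects explicit.

```python
import pandas as pd

# Tiny illustrative frames; ids and titles come from the outputs in this notebook
boxoffice_small = pd.DataFrame({
    "id": ["17528", "19073", "18875"],
    "title": ["Le Roi Lion (2019)", "Maléfique : Le Pouvoir du Mal", "Nous finirons ensemble"],
    "first_week_sales": [3252896, 786485, 1261701],
})
features_small = pd.DataFrame({
    "id": ["19073", "18875"],
    "title": ["Maléfique 2 Le pouvoir du mal", "Nous finirons ensemble"],
    "budget": [185000000, 0],
})

merged = boxoffice_small.merge(
    features_small,
    on="id",
    how="left",                        # keep every box office movie, matched or not
    suffixes=("_boxoffice", "_tmdb"),  # explicit names instead of _x / _y
    validate="one_to_one",             # raise MergeError if an id is duplicated on either side
    indicator=True,                    # add a _merge column: 'both', 'left_only' or 'right_only'
)

# Box office movies that received no TMDb features
unmatched_ids = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
```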

In [23]:
boxoffice_df = pd.DataFrame(boxoffice)
train_set = movies_features_df.merge(boxoffice_df, on='id')
train_set.head()

,tmdb_id,adult,belongs_to_collection,budget,genres,imdb_id,original_language,original_title,overview,tmdb_popularity,...,year,rank,title_y,total_sales,release_date_y,max_theaters_used,first_screening_sales,first_day_sales,first_weekend_sales,first_week_sales
0,420809,False,"{'id': 531331, 'name': 'Maléfique - Saga'}",185000000,"[{'id': 14, 'name': 'Fantastique'}, {'id': 107...",tt4777008,en,Maleficent: Mistress of Evil,Cinq années après la conjuration de la malédic...,184.959,...,2019,13,Maléfique : Le Pouvoir du Mal,2718604,2019-10-16,575,,66164.0,548537.0,786485
1,542830,False,"{'id': 629597, 'name': 'Les Petits Mouchoirs -...",0,"[{'id': 35, 'name': 'Comédie'}, {'id': 18, 'na...",tt8201404,fr,Nous finirons ensemble,"Préoccupé, Max est parti dans sa maison au bor...",10.84,...,2019,12,Nous finirons ensemble,2800004,2019-05-01,757,,370691.0,1056143.0,1261701
2,429617,False,"{'id': 531241, 'name': 'Spider-Man (Avengers) ...",160000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",tt6320628,en,Spider-Man: Far from Home,Peter et ses amis passent leurs vacances d’été...,204.536,...,2019,11,Spider-Man: Far from Home,3226105,2019-07-03,833,,453503.0,1109050.0,1370178
3,512200,False,"{'id': 495527, 'name': 'Jumanji - Saga'}",125000000,"[{'id': 12, 'name': 'Aventure'}, {'id': 35, 'n...",tt7975244,en,Jumanji: The Next Level,"L’équipe est de retour, mais le jeu a changé. ...",237.001,...,2019,10,Jumanji: next level,3255668,2019-12-04,676,,103369.0,713189.0,785636
4,166428,False,"{'id': 89137, 'name': 'Dragons - Saga'}",129000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",tt2386490,en,How to Train Your Dragon: The Hidden World,Ce qui avait commencé comme une amitié improba...,113.912,...,2019,9,Dragons 3 : Le monde caché,3367445,2019-02-06,744,,300542.0,1020715.0,1224811


In [None]:
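# Save your data here
# DataFrame.to_json writes missing values as null, whereas json.dump on to_dict('records')
# would emit the non-standard token NaN (e.g. for first_screening_sales)
train_set.to_json("../../data/train_set.json", orient='records', force_ascii=False, indent=4)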