# Deep Learning Tutorial

This tutorial is based on https://spandan-madan.github.io/DeepLearningProject/.

Credits: [Spandan Madan](http://people.csail.mit.edu/smadan/web/).

## Imports

* In this section we import the required packages.
* Missing packages can be installed from the notebook itself.

In [2]:
# Inline figures
%matplotlib inline

In [3]:
# Installing Python packages from the notebook
import sys
#!conda install -c conda-forge --yes --prefix {sys.prefix} urllib2 # failed: urllib2 exists only in Python 2
# In Python 3, use urllib instead
#!{sys.executable} -m pip install urllib2 # failed: urllib2 exists only in Python 2
#!conda install -c conda-forge --yes --prefix {sys.prefix} wget # did not work
#!{sys.executable} -m pip install wget # worked
#!{sys.executable} -m pip install imdb # did not work
#!{sys.executable} -m pip install IMDbPY # worked

Based on [Jake VanderPlas](http://jakevdp.github.io)'s blog post on [Installing Python Packages from a Jupyter Notebook](http://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/).
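
The same install can also be written in plain Python rather than with a shell magic. The following is a minimal sketch, not part of the original notebook: `build_pip_command` and `pip_install` are hypothetical helper names. It only illustrates why `sys.executable` matters: it points at the interpreter behind the running kernel, so pip installs into the environment the notebook actually uses.

```python
import subprocess
import sys


def build_pip_command(package):
    """Return the pip command bound to the interpreter running this kernel."""
    # sys.executable is the Python binary behind the current kernel
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(build_pip_command(package))


# Inspect the command without running it
print(build_pip_command("wget"))
```

Calling `pip_install("wget")` would then run the install, which requires network access.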

In [4]:
# Web access packages
import urllib
import requests
import wget

# Utilities packages
import itertools
import time
import os
import json
import logging

# Movies database API packages
import imdb 
import tmdbsimple as tmdb

# Numerical and random-number packages
import numpy as np
import random as rd

# Figure and style packages
import matplotlib.pyplot as plt
import seaborn as sns

# Object serialization package
import pickle

# Custom utility functions 
from dltutorial.utils import get_movie_id_tmdb
from dltutorial.utils import get_movie_info_tmdb
from dltutorial.utils import get_movie_genres_tmdb
from dltutorial.utils import get_api_key_tmdb


## Logging

* Here we configure logging so that our messages are recorded in a log file.

In [5]:
logging.basicConfig(filename="./dltutorial/logs/dev_logs.txt",
                    level=logging.INFO,
                    format='%(asctime)s - %(funcName)s - '
                    '%(levelname)s - %(message)s')
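
One consequence of `level=logging.INFO` is that messages sent with `logging.debug` fall below the threshold and are not written. The sketch below illustrates this filtering. It uses an in-memory buffer and a separate logger named `dltutorial_demo` (a hypothetical name) so that it does not touch the configuration above.

```python
import io
import logging

# Write to an in-memory buffer instead of a file so the example is self-contained
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))

# A dedicated logger (hypothetical name) so the root configuration is untouched
demo_logger = logging.getLogger('dltutorial_demo')
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

demo_logger.debug('dropped: below the INFO threshold')
demo_logger.info('kept: at the INFO threshold')

print(buffer.getvalue())
```

Only the `INFO` line reaches the buffer. To record debug messages as well, the level would need to be `logging.DEBUG`.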

## Utility functions

* You may want to create a private Python module containing only an ``__init__.py`` file that defines a string variable ``API_KEY`` holding your private API key from [TMDB](https://www.themoviedb.org/). Alternatively, you can set ``api_key`` directly in your notebook, but do not share it!

* Custom utility functions have been implemented in ``dltutorial.utils``.

* We create a folder where we store the scraped movie posters.

In [6]:
logging.info('Setting posters storage folder...')

# Create a data folder
data_folder = 'data/'
# Poster sub-folder inside data folder
poster_folder = 'posters/'
# Python package main folder
main_folder = './dltutorial/'
complete_path = main_folder  + data_folder + poster_folder


if data_folder.split('/')[0] in os.listdir(main_folder):
    logging.debug('%s in root directory...'
                  % data_folder)
    if poster_folder.split('/')[0] in os.listdir(main_folder + data_folder):
        logging.debug('%s in %s folder...' %(
            poster_folder, data_folder))
        print('%s folder already exists.' % complete_path)
    else:
        logging.debug('%s not in %s folder. '
                      'Creating relevant %s folder...' %
                      (poster_folder, data_folder, poster_folder))
        os.mkdir(complete_path)
else:
    logging.debug('%s not in root directory. Creating relevant folders...'
                 % data_folder)
    os.mkdir(main_folder + data_folder)
    os.mkdir(complete_path)

./dltutorial/data/posters/ folder already exists.
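
The nested existence checks above can also be expressed with `os.makedirs`, which creates any missing parent folders in one call. This is a minimal alternative sketch, not the notebook's own code: `ensure_folder` is a hypothetical helper, and the demonstration runs in a temporary directory so the project tree is left untouched.

```python
import os
import tempfile


def ensure_folder(path):
    """Create path and any missing parents; return True if it was created."""
    already_there = os.path.isdir(path)
    # exist_ok=True makes the call a no-op when the folder is already present
    os.makedirs(path, exist_ok=True)
    return not already_there


# Demonstrate inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_root:
    demo_path = os.path.join(tmp_root, 'data', 'posters')
    first_call = ensure_folder(demo_path)   # folder is created
    second_call = ensure_folder(demo_path)  # folder already exists
    created = os.path.isdir(demo_path)

print(first_call, second_call, created)
```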


In [7]:
logging.info('Setting TMDB API key...')
if 'private' in os.listdir(main_folder):
    logging.debug('Private folder exists...')
    from dltutorial import private
    api_key = private.API_KEY
else:
    logging.debug('No private folder found...')
    print('There is no private folder. '
          'API key will remain blank if you do not set it.')
    api_key = '' # put your own API key but do not share it
# Set the TMDB API key
tmdb.API_KEY = api_key 
# Log only whether a key is set, never the key itself
logging.debug('TMDB API key is set: %s' % bool(api_key))

# Instantiate a search object from TMDB
search_tmdb = tmdb.Search()
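
Another way to keep the key out of the notebook is to fall back to an environment variable when no private module is available. The sketch below is one possible approach under stated assumptions: `load_api_key` is a hypothetical helper and `TMDB_API_KEY` is an assumed variable name, neither of which comes from the original tutorial.

```python
import os
from types import SimpleNamespace


def load_api_key(private_module=None, env_var='TMDB_API_KEY'):
    """Return the key from the private module, else the environment, else ''."""
    # Prefer the API_KEY attribute of the private module when one is given
    key = getattr(private_module, 'API_KEY', '') if private_module else ''
    # Otherwise fall back to the (assumed) environment variable
    return key or os.environ.get(env_var, '')


# Stand-in for the private module, carrying a fake key for demonstration
fake_private = SimpleNamespace(API_KEY='key-from-module')
print(load_api_key(fake_private))
```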

### Some examples

In [8]:
# Some examples
movie_name = "The Matrix"

movie_id = get_movie_id_tmdb(movie_name=movie_name, search_tmdb=search_tmdb)
print("%s has id %s\n" % (movie_name, movie_id))
movie_info = get_movie_info_tmdb(movie_name=movie_name, search_tmdb=search_tmdb)
print("%s has these info categories:\n %s\n" % (movie_name, movie_info.keys()))
movie_genres = get_movie_genres_tmdb(movie_name=movie_name, search_tmdb=search_tmdb)
print("%s belongs to these genres:\n %s\n" % (movie_name, movie_genres))

The Matrix has id 603

The Matrix has these info categories:
 dict_keys(['id', 'poster_path', 'runtime', 'title', 'adult', 'production_companies', 'budget', 'spoken_languages', 'production_countries', 'vote_average', 'release_date', 'original_language', 'belongs_to_collection', 'video', 'vote_count', 'genres', 'revenue', 'status', 'overview', 'backdrop_path', 'imdb_id', 'popularity', 'original_title', 'homepage', 'tagline'])

The Matrix belongs to these genres:
 [{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
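
TMDB returns each genre as a dictionary with an `id` and a `name`. When only the names are needed, for instance to compare them with a plain list of strings, they can be extracted with a list comprehension. This is a small sketch: `genre_names` is a hypothetical helper, and the sample data is copied from the output above.

```python
def genre_names(tmdb_genres):
    """Return the 'name' field of each TMDB genre entry, in order."""
    return [genre['name'] for genre in tmdb_genres]


# Genre entries copied from the output above
sample_genres = [{'id': 28, 'name': 'Action'},
                 {'id': 878, 'name': 'Science Fiction'}]
print(genre_names(sample_genres))  # ['Action', 'Science Fiction']
```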



## IMDB

In [10]:
# Create the IMDb object used to access the IMDb database.
imdb_object = imdb.IMDb() # accesses the web by default

# Search for a movie (returns a list of Movie objects).
results = imdb_object.search_movie('The Matrix')

# The search returns every movie matching the query "The Matrix", so we pick the first result
movie = results[0]

# Fetch the full movie information
imdb_object.update(movie)

movie['genres']

['Action', 'Sci-Fi']

In [11]:
def get_movie_genres_imdb(movie_name, search_imdb):
    """
    Get movie genres from IMDB using movie name
    
    Parameters
    -----------
    - movie_name : string
        Name of the movie
        
    - search_imdb : imdb.IMDb object
        imdb instantiated object
        
    Returns
    ----------
    - movie_genres : list of strings
        strings list containing movie genres
    """
    logging.debug("Search movie %s in IMDB..." % movie_name)
    # Search and retrieve first result
    movie_info = search_imdb.search_movie(movie_name)[0]
    # Enrich movie infos
    search_imdb.update(movie_info)
    # Extract movie genres
    movie_genres = movie_info['genres']
    logging.debug("Retrieving movie genres %s..." % movie_genres)
    return movie_genres

def get_movie_info_imdb(movie_name, search_imdb):
    """
    Get movie information from IMDB using the movie name

    Parameters
    -----------
    - movie_name : string
        Name of the movie

    - search_imdb : imdb.IMDb object
        imdb instantiated object

    Returns
    ----------
    - movie_info : imdb.Movie.Movie object
        dictionary-like object containing the movie information
    """
    logging.debug("Search movie %s in IMDB..." % movie_name)
    # Search and retrieve first result
    movie_info = search_imdb.search_movie(movie_name)[0]
    # Enrich movie infos
    search_imdb.update(movie_info)
    logging.debug("Movie info retrieved? %s..." % (movie_info is not None))
    return movie_info
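
Both functions index the search results with `[0]`, which raises an `IndexError` when a search returns nothing. A guarded lookup could look like the sketch below. `first_result_or_none` is a hypothetical helper, and `FakeIMDb` is an offline stand-in that exposes only `search_movie`, so the example runs without network access. It is not the real `imdb.IMDb` class.

```python
def first_result_or_none(movie_name, search_imdb):
    """Return the first search hit, or None when the search finds nothing."""
    results = search_imdb.search_movie(movie_name)
    if not results:
        return None
    return results[0]


class FakeIMDb:
    """Offline stand-in exposing only search_movie, for demonstration."""

    def __init__(self, catalogue):
        self.catalogue = catalogue

    def search_movie(self, title):
        # Case-insensitive substring match over the fake catalogue
        return [name for name in self.catalogue
                if title.lower() in name.lower()]


fake_search = FakeIMDb(['The Matrix', 'The Matrix Reloaded'])
print(first_result_or_none('matrix', fake_search))
print(first_result_or_none('no such title', fake_search))
```

A caller can then test for `None` before requesting further details for the movie.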

### Test cases

In [15]:
movie_name = "The Matrix"
search_imdb = imdb.IMDb()

movie_genres = get_movie_genres_imdb(movie_name=movie_name, search_imdb=search_imdb)
assert isinstance(movie_genres, list)
print("%s movie belongs to genres:\n %s \n" % (movie_name, movie_genres))
movie_info = get_movie_info_imdb(movie_name=movie_name, search_imdb=search_imdb)
assert isinstance(movie_info, imdb.Movie.Movie)
print("%s movie first 10 info categories:\n %s \n" % (movie_name, movie_info.keys()[:10]))

The Matrix movie belongs to genres:
 ['Action', 'Sci-Fi'] 

The Matrix movie first 10 info categories:
 ['title', 'animation department', 'production managers ', 'producer', 'special effects companies', 'casting department', 'kind', 'cast', 'casting director', 'camera and electrical department'] 

