# Project 3: OMDB and TasteDive Mashup
## by Emmanuel Paredes Rocha

This is the final project for the course <a href="https://www.coursera.org/learn/data-collection-processing-python">Data Collection and Processing with Python</a>, offered by the University of Michigan through coursera.org.

The task in this project is to build a data mashup from two different APIs to make movie recommendations. The TasteDive API lets you provide a movie (or a band, TV show, etc.) as a query input and returns a set of related items. The OMDB API lets you provide a movie title as a query input and returns data about the movie, including scores from various review sites (IMDB, Rotten Tomatoes, Metacritic, etc.).

TasteDive is used to get related movies for a whole list of titles. The resulting lists of related movies are combined and sorted by the score assigned by a rating source, which requires making calls to the OMDB API. The rating source can be IMDB, Rotten Tomatoes, or Metacritic.

To understand the request URLs, take a look at the API documentation:
*  TasteDive: https://tastedive.com/read/api
*  OMDB: https://www.omdbapi.com/

You can make a few requests to experiment with the TasteDive API without a key, but if you want to use the API repeatedly, you must <a href='https://tastedive.com/account/signin'>request an access key</a>, which allows 300 requests per hour. To make requests to the OMDB API, on the other hand, an <a href='https://www.omdbapi.com/'>access key</a> is always required.
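
One way to keep access keys out of the notebook is to read them from environment variables. The following is only a sketch: `build_params` and the variable name `TASTEDIVE_API_KEY` are hypothetical names chosen for this example, while `k` and `apikey` are the key parameters used in the request code of this notebook.

```python
import os

def build_params(base_params, key_param, env_var):
    """Return a copy of base_params, adding the API key when env_var is set."""
    params = dict(base_params)
    api_key = os.environ.get(env_var)  # None when the variable is not defined
    if api_key:
        params[key_param] = api_key
    return params

# TASTEDIVE_API_KEY is a hypothetical variable name used only in this sketch
tastedive_params = build_params(
    {'q': 'Shrek', 'type': 'movies', 'limit': 5}, 'k', 'TASTEDIVE_API_KEY'
)
```

The resulting dictionary can be passed as `params` to `requests.get`. When the variable is not set, the request is simply sent without a key.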


**To complete the project's task, I follow these steps:**
* First, I define a function that gets the movie titles related to a given movie title from the TasteDive API. The number of related titles can be set through a parameter. The requests to the API are made with the **requests** module, and the JSON responses are converted to Python dictionaries with the response's **json()** method. **get_movies_from_tastedive** function.
* Second, I define a function that extracts only the movie titles, as a list, from the response obtained in the first step. **extract_movie_titles** function.
* Third, I define a function that takes a list of movie titles and returns a list of related movie titles using the two previous functions. Two or more movie titles in the given list may share some related titles, so the list returned by this function omits the repeated titles. **get_related_titles** function.
* Fourth, to get information about all the related titles, I define a function that gets the info for each movie title from the OMDB API. To avoid repeated web requests, I use a dictionary (**cache_dict**) as an in-memory cache that stores the information of movies that were requested before. **get_movie_data** function.
* Fifth, I define a function that extracts the rating score assigned by one of the three rating sources. This score is taken from the movie data obtained in the fourth step. If the rating source has no score for the movie, the rating score is set to zero. **get_movie_rating** function.
* Finally, I define a function that builds a sorted list of recommended movies related to the given list of movie titles. The recommendations are ranked by the rating of each movie. In this function, you can choose your preferred rating source: IMDB, Rotten Tomatoes or Metacritic. **get_sorted_recommendations** function.
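
The steps above can be sketched offline, without any web request. In this simplified illustration the dictionaries `FAKE_RELATED` and `FAKE_RATINGS` stand in for the two APIs, and their titles and scores are made up for the example. The helper names are hypothetical as well and are not the functions defined later in the notebook.

```python
# Stand-in data, made up for illustration (not real API responses)
FAKE_RELATED = {
    'Movie A': ['Movie C', 'Movie D'],
    'Movie B': ['Movie D', 'Movie E'],
}
FAKE_RATINGS = {'Movie C': 6.5, 'Movie D': 8.1, 'Movie E': 7.2}

def related_titles(titles):
    """Steps 1-3: collect the related titles of every input, keeping each once."""
    seen = []
    for title in titles:
        for related in FAKE_RELATED.get(title, []):
            if related not in seen:
                seen.append(related)
    return seen

def sorted_recommendations(titles):
    """Steps 4-6: look up a rating per title and sort from highest to lowest."""
    candidates = related_titles(titles)
    return sorted(candidates, key=lambda t: FAKE_RATINGS.get(t, 0), reverse=True)

print(sorted_recommendations(['Movie A', 'Movie B']))
# ['Movie D', 'Movie E', 'Movie C']
```

'Movie D' is related to both inputs but appears only once, and a title without a rating would fall back to 0, which mirrors the behaviour described in the fifth step.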

To compare the rating sources, in the final code cell I generate recommendation lists for the same list of movies using each of the rating sources.


With this project, I obtained my <a href="https://www.coursera.org/account/accomplishments/verify/SX2JDSUCNYCN">course certification</a>, demonstrating my ability to retrieve and manage information through REST APIs with Python.

In [None]:
import requests
import json

cache_dict = {} #in-memory cache of the OMDB API responses, keyed by movie title

In [None]:
def get_movies_from_tastedive(q_string, q_type = 'movies', q_limit = 5):
    '''This function returns up to q_limit results of type q_type from the
    TasteDive API that are similar to the title q_string. The result is a dictionary.'''
    baseurl = 'https://tastedive.com/api/similar'
    d_params = {} #see the API documentation for the keys https://tastedive.com/read/api
    d_params['q'] = q_string
    d_params['type'] = q_type
    d_params['limit'] = q_limit
    #d_params['k'] = 'your_api_key' #you need a key if you want to perform 300 requests per hour
    resp = requests.get(baseurl, params = d_params) #GET request; requests builds the full URL from the params
    #print(resp.url)
    #print(resp.text[:100])
    return resp.json() #return the JSON response as a Python dictionary

def extract_movie_titles(dict_movies):
    '''This function extracts just the list of movie titles from the
    dictionary returned by the function get_movies_from_tastedive,
    taking only the movies under the key "Results".'''
    lst_movies = [r['Name'] for r in dict_movies['Similar']['Results']] #see the keys of dict_movies
    return lst_movies

print("--- tests functions ---")
print(get_movies_from_tastedive('Shrek', q_limit=3))
extract_movie_titles(get_movies_from_tastedive('Shrek'))

--- tests functions ---
{'Similar': {'Info': [{'Name': 'Shrek', 'Type': 'movie'}], 'Results': [{'Name': 'Shrek 2', 'Type': 'movie'}, {'Name': 'Shrek The Third', 'Type': 'movie'}, {'Name': 'Shrek Forever After', 'Type': 'movie'}]}}


['Shrek 2',
 'Shrek The Third',
 'Shrek Forever After',
 'Ratatouille',
 'Madagascar']
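
`extract_movie_titles` raises a `KeyError` if the dictionary lacks the `'Similar'` or `'Results'` keys. As a hedged sketch, assuming a response may arrive without those keys (the exact shape of such a response is not shown in this notebook), a tolerant variant could return an empty list instead. The name `extract_titles_safe` is hypothetical.

```python
def extract_titles_safe(dict_movies):
    """Like extract_movie_titles, but return [] when the expected keys are missing."""
    results = dict_movies.get('Similar', {}).get('Results', [])
    return [r['Name'] for r in results if 'Name' in r]

# Sample shaped like the response printed above
sample = {'Similar': {'Info': [{'Name': 'Shrek', 'Type': 'movie'}],
                      'Results': [{'Name': 'Shrek 2', 'Type': 'movie'},
                                  {'Name': 'Shrek The Third', 'Type': 'movie'}]}}
print(extract_titles_safe(sample))  # ['Shrek 2', 'Shrek The Third']
print(extract_titles_safe({}))      # []
```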

In [None]:
def get_related_titles(movies_lst, limit = 5):
    '''This function takes a list of movies as a parameter and gets
    "limit" related movies for each one from the TasteDive API, extracts all
    titles and returns them in a single list, without listing the same movie
    twice.'''
    related_titles = [] #collect the related titles of every movie in the list
    for movie in movies_lst:
        related_titles_movie = extract_movie_titles(get_movies_from_tastedive(movie, q_limit=limit))
        related_titles += related_titles_movie #join lists
    related_titles_not_twice = [] #related titles with the repeated ones removed
    for movie in related_titles:
        if movie in related_titles_not_twice: #skip a title that was already added
            continue
        related_titles_not_twice.append(movie)
    return related_titles_not_twice

print("--- tests function ---")
get_related_titles(['Shrek','Antz'])

--- tests function ---


['Shrek 2',
 'Shrek The Third',
 'Shrek Forever After',
 'Ratatouille',
 'Madagascar',
 'Madagascar: Escape 2 Africa',
 'Over The Hedge',
 "A Bug's Life",
 'Chicken Run']
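
The duplicate removal in `get_related_titles` keeps the first occurrence of each title and preserves the order. The same result can be obtained with `dict.fromkeys`, since dictionaries keep insertion order in Python 3.7 and later. This is only an alternative sketch; `unique_in_order` is a hypothetical helper and the input list is made up for the example.

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each in order."""
    return list(dict.fromkeys(items))

print(unique_in_order(['Shrek 2', 'Madagascar', 'Shrek 2', 'Chicken Run']))
# ['Shrek 2', 'Madagascar', 'Chicken Run']
```

For long lists this avoids the repeated `in` checks against a growing list, although for a handful of titles the difference is negligible.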

In [None]:
def get_movie_data(movie_string):
    '''This function takes the movie title movie_string and returns
    a dictionary with information about that movie from the OMDB API.
    It first checks whether the movie's info is already in the cache, to avoid
    a web request; otherwise it makes the GET request.'''
    if movie_string in cache_dict: #checks if the movie title is in the cache dictionary
        #print('found in cache')
        return cache_dict[movie_string] #return the cached info as a dictionary
    #not cached: make the GET request
    #see the API documentation for the request keys, https://www.omdbapi.com/
    baseurl = 'http://www.omdbapi.com/'
    d_params = {}
    d_params['t'] = movie_string
    d_params['apikey'] = '8552435f' #apikey, see the description at the beginning to obtain one
    d_params['r'] = 'json' #ask for the response in JSON format
    resp = requests.get(baseurl, params = d_params)
    #print('add to cache')
    movie_data = resp.json() #parse the JSON response once into a Python dictionary
    cache_dict[movie_string] = movie_data #add the movie info to the cache
    return movie_data

print("--- tests function ---")
get_movie_data('Shrek')

--- tests function ---


{'Actors': 'Mike Myers, Eddie Murphy, Cameron Diaz',
 'Awards': 'Won 1 Oscar. 40 wins & 60 nominations total',
 'BoxOffice': '$267,665,011',
 'Country': 'United States',
 'DVD': '19 Aug 2003',
 'Director': 'Andrew Adamson, Vicky Jenson',
 'Genre': 'Animation, Adventure, Comedy',
 'Language': 'English',
 'Metascore': '84',
 'Plot': 'A mean lord exiles fairytale creatures to the swamp of a grumpy ogre, who must go on a quest and rescue a princess for the lord in order to get his land back.',
 'Poster': 'https://m.media-amazon.com/images/M/MV5BOGZhM2FhNTItODAzNi00YjA0LWEyN2UtNjJlYWQzYzU1MDg5L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg',
 'Production': 'N/A',
 'Rated': 'PG',
 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.9/10'},
  {'Source': 'Rotten Tomatoes', 'Value': '88%'},
  {'Source': 'Metacritic', 'Value': '84/100'}],
 'Released': '18 May 2001',
 'Response': 'True',
 'Runtime': '90 min',
 'Title': 'Shrek',
 'Type': 'movie',
 'Website': 'N/A',
 'Writer': 
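
The caching idea in `get_movie_data` can be checked without touching the network. The sketch below wraps any fetch function with a dictionary cache and uses a fake fetcher that records its calls. `make_cached_lookup` and `fake_fetch` are hypothetical names introduced only for this illustration.

```python
def make_cached_lookup(fetch):
    """Wrap fetch(title) so that each distinct title is fetched at most once."""
    cache = {}
    def lookup(title):
        if title not in cache:
            cache[title] = fetch(title)  # only reached on a cache miss
        return cache[title]
    return lookup

calls = []  # records every title that triggered a "request"

def fake_fetch(title):
    """Stand-in for the web request; returns a minimal dictionary."""
    calls.append(title)
    return {'Title': title}

lookup = make_cached_lookup(fake_fetch)
lookup('Shrek')
lookup('Shrek')  # served from the cache, no new call
lookup('Antz')
print(calls)  # ['Shrek', 'Antz']
```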

In [None]:
def get_rating_format(rating_value, rating_source):
    '''Function to obtain the rating value according to the format of the rating source.
    rating_value is provided by the data dictionary from the OMDB API under the key "Ratings".'''
    if rating_source == 'Internet Movie Database':
        return float(rating_value[:-3]) #format of the IMDb rating: 'x/10', x in [0,10] float
    if rating_source == 'Rotten Tomatoes':
        return int(rating_value[:-1]) #format of the Rotten Tomatoes rating: 'x%', x in [0,100] int
    if rating_source == 'Metacritic':
        return int(rating_value[:-4]) #format of the Metacritic rating: 'x/100', x in [0,100] int

def get_movie_rating(OMBD_dict, rating_source = 'Internet Movie Database'):
    '''This function takes an OMDB dictionary and extracts the rating value
    assigned by rating_source as a number. The OMDB dictionary can be obtained from
    the function get_movie_data. There are three sources for the movie's rating:
    1.- 'Internet Movie Database' (IMDb), 2.- 'Rotten Tomatoes', 3.- 'Metacritic'.
    Assign your source in rating_source.
    '''
    rat = 0 #default: the movie has no rating value from rating_source (or no ratings at all)
    for rating in OMBD_dict.get('Ratings', []): #see the form of the OMDB dictionary with get_movie_data function
        if rating['Source'] == rating_source:
            rat = get_rating_format(rating['Value'], rating_source) #rating['Value'] is the value assigned by rating_source
            break
    return rat

print("--- tests function ---")
get_movie_rating(get_movie_data('Shrek'))

--- tests function ---


7.9
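
The three rating formats seen in the OMDB output ('7.9/10', '88%' and '84/100') can also be parsed by a single routine instead of one slice per source. This is a sketch under the assumption that a value which does not start with a number should count as 0. `parse_rating` is a hypothetical helper, and unlike `get_rating_format` it always returns a float.

```python
def parse_rating(value):
    """Return the leading number of a rating string as a float, or 0 if there is none."""
    number = value.split('/')[0].rstrip('%')  # '7.9/10' -> '7.9', '88%' -> '88'
    try:
        return float(number)
    except ValueError:
        return 0

for text in ['7.9/10', '88%', '84/100']:
    print(text, parse_rating(text))
```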

In [None]:
def get_sorted_recommendations(movies_lst, rating_source = 'Internet Movie Database', limit_per_movie = 5):
    '''This function takes a list of movie titles as input.
    It returns a sorted list of related movie titles as output,
    with up to 'limit_per_movie' related movies for each input movie title. The
    movies are sorted in descending order by the rating value
    assigned by 'rating_source', as returned by the get_movie_rating function.'''
    related_movies = get_related_titles(movies_lst,limit=limit_per_movie) #obtain the list of related titles by the list movies (without repetitions)
    #obtains the movie title and the rating value assigned by rating_source
    related_movies_rating = [ ( movie, get_movie_rating(get_movie_data(movie), rating_source=rating_source) ) for movie in related_movies]
    #sort the movie titles according to their rating value, from the higher to the lower
    related_movies_rating_sort = sorted(related_movies_rating, key = lambda data: (data[1], data[0]), reverse = True)
    return [movie for movie, rating in related_movies_rating_sort]  #return only the list of the movies titles, w/o ratings  

print("--- tests function ---")
get_sorted_recommendations(['Shrek','Whiplash'])

--- tests function ---


["One Flew Over The Cuckoo's Nest",
 'The Grand Budapest Hotel',
 'Mad Max: Fury Road',
 'The Imitation Game',
 'Ratatouille',
 'La La Land',
 'Shrek 2',
 'Madagascar',
 'Shrek Forever After',
 'Shrek The Third']
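
The sort key `(data[1], data[0])` combined with `reverse = True` orders by rating first and uses the title to break ties. Because the whole tuple is reversed, tied titles come out in reverse alphabetical order. The sketch below shows this with made-up ratings, together with a variant that negates the rating so that ties are alphabetical; the variant is an alternative, not the method used in this notebook.

```python
# Made-up (title, rating) pairs for illustration
rated = [('Movie B', 7.0), ('Movie A', 7.0), ('Movie C', 8.5)]

# Same key as get_sorted_recommendations: ties end up in reverse alphabetical order
ranked = sorted(rated, key=lambda pair: (pair[1], pair[0]), reverse=True)
print([title for title, _ in ranked])
# ['Movie C', 'Movie B', 'Movie A']

# Alternative: negate the rating so that tied titles are listed alphabetically
ranked_alpha = sorted(rated, key=lambda pair: (-pair[1], pair[0]))
print([title for title, _ in ranked_alpha])
# ['Movie C', 'Movie A', 'Movie B']
```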

***In this part***, I generate lists of recommendations related to the movie list ['Shrek','Whiplash'] using the three different rating sources. Using the assig_index function to show the order of the recommendations clearly, we can see that the first two recommended movies differ for each rating source. For this reason, it is important to choose a rating source you trust when you need movie recommendations ✌

In [None]:
def assig_index(lst):
    '''Function to index the elements of the list lst'''
    return list(enumerate(lst)) #pairs (index, elem), with the index starting at 0

print(assig_index(get_sorted_recommendations(['Shrek','Whiplash'],rating_source='Internet Movie Database')))
print(assig_index(get_sorted_recommendations(['Shrek','Whiplash'],rating_source='Rotten Tomatoes')))
print(assig_index(get_sorted_recommendations(['Shrek','Whiplash'],rating_source='Metacritic')))

[(0, "One Flew Over The Cuckoo's Nest"), (1, 'The Grand Budapest Hotel'), (2, 'Mad Max: Fury Road'), (3, 'The Imitation Game'), (4, 'Ratatouille'), (5, 'La La Land'), (6, 'Shrek 2'), (7, 'Madagascar'), (8, 'Shrek Forever After'), (9, 'Shrek The Third')]
[(0, 'Mad Max: Fury Road'), (1, 'Ratatouille'), (2, "One Flew Over The Cuckoo's Nest"), (3, 'The Grand Budapest Hotel'), (4, 'La La Land'), (5, 'The Imitation Game'), (6, 'Shrek 2'), (7, 'Shrek Forever After'), (8, 'Madagascar'), (9, 'Shrek The Third')]
[(0, 'Ratatouille'), (1, 'La La Land'), (2, 'Mad Max: Fury Road'), (3, 'The Grand Budapest Hotel'), (4, "One Flew Over The Cuckoo's Nest"), (5, 'Shrek 2'), (6, 'The Imitation Game'), (7, 'Shrek The Third'), (8, 'Shrek Forever After'), (9, 'Madagascar')]
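
The three lists printed above can be compared programmatically. The sketch below copies the titles from that output into a dictionary and extracts the first two picks per source, plus the position of one title under each source. The variable names are hypothetical and no new request is made.

```python
# Titles copied from the three outputs above, in ranked order
rankings = {
    'Internet Movie Database': [
        "One Flew Over The Cuckoo's Nest", 'The Grand Budapest Hotel', 'Mad Max: Fury Road',
        'The Imitation Game', 'Ratatouille', 'La La Land', 'Shrek 2', 'Madagascar',
        'Shrek Forever After', 'Shrek The Third'],
    'Rotten Tomatoes': [
        'Mad Max: Fury Road', 'Ratatouille', "One Flew Over The Cuckoo's Nest",
        'The Grand Budapest Hotel', 'La La Land', 'The Imitation Game', 'Shrek 2',
        'Shrek Forever After', 'Madagascar', 'Shrek The Third'],
    'Metacritic': [
        'Ratatouille', 'La La Land', 'Mad Max: Fury Road', 'The Grand Budapest Hotel',
        "One Flew Over The Cuckoo's Nest", 'Shrek 2', 'The Imitation Game',
        'Shrek The Third', 'Shrek Forever After', 'Madagascar'],
}

# The first two picks of each source
top_two = {source: titles[:2] for source, titles in rankings.items()}
# Check that every source ranks the same set of titles
same_titles = len({frozenset(titles) for titles in rankings.values()}) == 1
# Position of one title under each source (0-based, as in assig_index)
positions = {source: titles.index('Ratatouille') for source, titles in rankings.items()}

for source in rankings:
    print(source, top_two[source], positions[source])
```

All three sources rank the same ten titles, yet 'Ratatouille' sits at index 4, 1 and 0 respectively, which makes the effect of the chosen rating source concrete.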
