COMP 215 - LAB 2 (IMDb)
----------------
#### Name: William Qin
#### Date: 2023/01/16
This lab exercise is mostly a review of strings, tuples, lists, dictionaries, and functions.
We will also see how "list comprehension" provides a compact form for "list accumulator" algorithms.
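Here is a minimal, self-contained illustration of that equivalence. It uses made-up toy data rather than the IMDb results: the accumulator loop and the comprehension build the same list.

```python
# Hypothetical toy data standing in for the real `results` list used later
titles = ["Lexx", "Farscape", "lexx", "Lexx"]

# "List accumulator": start with an empty list, loop, append items that pass the test
accumulated = []
for t in titles:
    if t == "Lexx":
        accumulated.append(t.upper())

# Equivalent list comprehension: [expression for item in iterable if condition]
comprehended = [t.upper() for t in titles if t == "Lexx"]

print(accumulated == comprehended)  # True
print(comprehended)                 # ['LEXX', 'LEXX']
```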

As usual, the first code cell simply imports all the modules we'll be using...

In [1]:
import json, requests
import matplotlib.pyplot as plt
from pprint import pprint

We'll answer some questions about movies and TV shows in the IMDb database (https://www.imdb.com/) using the IMDb API: https://imdb-api.com/api

You can register for your own API key, or simply use the one provided below.

Here's an example query:
 *   search for TV Series with title == "Lexx"
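Each query is an HTTP GET on a URL that has the API key and the search term filled into fixed path slots. The sketch below builds such URLs. It assumes that every endpoint follows the `https://imdb-api.com/en/API/{endpoint}/{key}/{arg}` pattern used by the `SearchTitle` and `FullCast` calls in this lab; the lab does not confirm this for other endpoints. `imdb_api_url` is a hypothetical helper name. It percent-encodes the argument with `urllib.parse.quote`, so a title containing spaces or slashes stays a single path segment.

```python
from urllib.parse import quote

# Assumed URL pattern, taken from the SearchTitle and FullCast calls in this lab
BASE_URL = "https://imdb-api.com/en/API"

def imdb_api_url(endpoint, key, arg):
    """Hypothetical helper: build a request URL, percent-encoding the last path segment."""
    # safe="" also encodes "/", so a title like "AC/DC" stays one path segment
    return "{base}/{endpoint}/{key}/{arg}".format(
        base=BASE_URL, endpoint=endpoint, key=key, arg=quote(arg, safe=""))

print(imdb_api_url("SearchTitle", "k_ynffhhna", "star trek"))
# https://imdb-api.com/en/API/SearchTitle/k_ynffhhna/star%20trek
```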

In [2]:
API_KEY = 'k_ynffhhna'

title = 'lexx'
url = "https://imdb-api.com/en/API/SearchTitle/{key}/{title}".format(key=API_KEY, title=title)

response = requests.request("GET", url, headers={}, data={})

data = json.loads(response.text)  # recall json.loads from lab 1

results = data['results']
pprint(results)

[{'description': '1996–2002 TV Series Brian Downey, Michael McManus',
  'id': 'tt0115243',
  'image': 'https://m.media-amazon.com/images/M/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.6757_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexx'},
 {'description': '2008 Video Daniel Beaudoin, Annick Blanchard',
  'id': 'tt1833738',
  'image': 'https://m.media-amazon.com/images/M/MV5BMjAyMTYzNjk4NV5BMl5BanBnXkFtZTcwNzE4MTU0NA@@._V1_Ratio0.7027_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexx'},
 {'description': '2018 Lexy Roxx',
  'id': 'tt10800568',
  'image': 'https://m.media-amazon.com/images/M/MV5BZWY5ODYwNzYtMmIyMS00YzhhLTg0OTAtODM1M2I5YzkxMzY1XkEyXkFqcGdeQXVyMTEwNDU1MzEy._V1_Ratio0.7568_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexxy Roxx: Lexy 360 - Der Film'},
 {'description': '2010 Joseph Gordon-Levitt, Carla Gugino',
  'id': 'tt1340773',
  'image': 'https://m.media-amazon.com/images/M/MV5BMTM0MDU1MjkyMF5BMl5BanBnXkFtZTcwNDQ5MD

Next we extract the item we want from the data set by applying a "filter":

In [3]:
items = [item for item in results if item['title']=='Lexx' and "TV" in item['description']]
assert len(items) == 1
lexx = items[0]
pprint(lexx)

{'description': '1996–2002 TV Series Brian Downey, Michael McManus',
 'id': 'tt0115243',
 'image': 'https://m.media-amazon.com/images/M/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.6757_AL_.jpg',
 'resultType': 'Title',
 'title': 'Lexx'}


## Exercise 1

In the code cell below, rewrite the "list comprehension" above as a loop so you understand how it works.
Notice how the "conditional list comprehension" is a compact way to "filter" items of interest from a large data set.


In [4]:
items = []
for item in results:
    if item['title']=='Lexx' and "TV" in item['description']:
        items.append(item)
assert len(items) == 1
lexx = items[0]
pprint(lexx)

{'description': '1996–2002 TV Series Brian Downey, Michael McManus',
 'id': 'tt0115243',
 'image': 'https://m.media-amazon.com/images/M/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.6757_AL_.jpg',
 'resultType': 'Title',
 'title': 'Lexx'}


Notice that the `lexx` dictionary contains an `id` field that uniquely identifies this record in the database.

We can use the `id` to fetch other information about the TV series, for example,
*   get names of all actors in the TV Series Lexx


In [5]:
url = "https://imdb-api.com/en/API/FullCast/{key}/{id}".format(key=API_KEY, id=lexx['id'])
response = requests.request("GET", url, headers={}, data={})
data = json.loads(response.text)

actors = data['actors']
pprint(actors[:10])   # recall the slice operator (it's a long list!)

[{'asCharacter': 'Stanley H. Tweedle / ... 61 episodes, 1996-2002',
  'id': 'nm0235978',
  'image': 'https://m.media-amazon.com/images/M/MV5BMTYxODI3OTM5Ml5BMl5BanBnXkFtZTgwMjM4ODc3MjE@._V1_Ratio1.3182_AL_.jpg',
  'name': 'Brian Downey'},
 {'asCharacter': 'Kai / ... 61 episodes, 1996-2002',
  'id': 'nm0573158',
  'image': 'https://m.media-amazon.com/images/M/MV5BMTY3MjQ4NzE0NV5BMl5BanBnXkFtZTgwNDE4ODc3MjE@._V1_Ratio1.3182_AL_.jpg',
  'name': 'Michael McManus'},
 {'asCharacter': '790 / ... 57 episodes, 1996-2002',
  'id': 'nm0386601',
  'image': 'https://m.media-amazon.com/images/M/MV5BMjMyMDM1NzgzNF5BMl5BanBnXkFtZTgwOTM4ODc3MjE@._V1_Ratio1.3182_AL_.jpg',
  'name': 'Jeffrey Hirschfield'},
 {'asCharacter': 'Xev Bellringer / ... 55 episodes, 1998-2002',
  'id': 'nm0781462',
  'image': 'https://m.media-amazon.com/images/M/MV5BMTk2MDQ4NzExOF5BMl5BanBnXkFtZTcwOTMyNzcyMQ@@._V1_Ratio0.7273_AL_.jpg',
  'name': 'Xenia Seeberg'},
 {'asCharacter': 'The Lexx 46 episodes, 1996-2002',
  'id': 'nm0302

Notice that the `asCharacter` field contains a number of different pieces of data as a single string, including the character name.
This kind of "free-form" text data is notoriously challenging to parse...

## Exercise 2

In the code cell below, write a python function that takes a string input (the text from `asCharacter` field)
and returns the number of episodes, if available, or None.

Hints:
* notice this is a numeric value followed by the word "episodes"
* recall `str.split()`, `str.isdigit()`, and other string built-ins.

Add unit tests to cover as many cases from the `actors` data set above as you can.


In [6]:
# your code here
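The following is one possible solution sketch, not an official answer. It scans adjacent word pairs and returns the first all-digit word that is immediately followed by "episode" or "episodes". It assumes that the count always sits directly before that word, which holds for the rows shown above. Handling the singular "1 episode" form is a guess about IMDb's format, so that test string is made up.

```python
def episode_count(as_character):
    """Return the episode count in an `asCharacter` string, or None if absent.

    Assumes the count is a whole number immediately followed by the word
    "episode" or "episodes", possibly with trailing punctuation (e.g. a comma).
    """
    words = as_character.split()
    # Pair each word with the word that follows it
    for prev, word in zip(words, words[1:]):
        if prev.isdigit() and word.rstrip(",.").lower() in ("episode", "episodes"):
            return int(prev)
    return None

# Unit tests drawn from the `actors` output above
assert episode_count('Stanley H. Tweedle / ... 61 episodes, 1996-2002') == 61
assert episode_count('Xev Bellringer / ... 55 episodes, 1998-2002') == 55
assert episode_count('The Lexx 46 episodes, 1996-2002') == 46
# Hypothetical cases (not in the data shown): singular form, and no count at all
assert episode_count('Guard 1 episode, 1997') == 1
assert episode_count('Kai') is None
```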

## Exercise 3

In the code cell below, write a python function that takes a string input (the text from `asCharacter` field)
and returns just the character name.  This one may be even a little harder!

Hints:
* notice the character name is usually followed by a forward slash, `/`
* don't worry if your algorithm does not parse every character's name perfectly; it may not be possible to handle all cases correctly, because the field format does not follow consistent rules

Add unit tests to cover as many cases from the `actors` data set above as you can.


In [7]:
# Your code here
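Below is a best-effort solution sketch, not the official answer. If the string contains a `/`, the sketch returns the text before the first one. Otherwise, it returns the words that come before an "N episode(s)" count, as in the `The Lexx` row. If neither pattern applies, it falls back to the whole stripped string. As the hints note, names that themselves contain a `/` will be cut short.

```python
def character_name(as_character):
    """Best-effort extraction of the character name from an `asCharacter` string."""
    if "/" in as_character:
        # "Kai / ... 61 episodes, 1996-2002" -> "Kai"
        return as_character.split("/", 1)[0].strip()
    words = as_character.split()
    for i in range(len(words) - 1):
        # "The Lexx 46 episodes, 1996-2002" -> words before the count
        if words[i].isdigit() and words[i + 1].rstrip(",.").lower() in ("episode", "episodes"):
            return " ".join(words[:i])
    # No recognizable structure: return the text as-is
    return as_character.strip()

# Unit tests drawn from the `actors` output above
assert character_name('Stanley H. Tweedle / ... 61 episodes, 1996-2002') == 'Stanley H. Tweedle'
assert character_name('790 / ... 57 episodes, 1996-2002') == '790'
assert character_name('The Lexx 46 episodes, 1996-2002') == 'The Lexx'
```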


## Exercise 4

Using the functions you developed above, define 2 list comprehensions that:
* create a list of 2-tuples of the form (actor name, character description) for the actors in Lexx, taking the character from the `asCharacter` field
* create a list of dictionaries with the keys 'actor' and 'character' for the same data

Hint: this is a very simple problem - the goal is to learn how to build these lists using a comprehension.

Pretty print (pprint) your lists to visually verify the results.

In [8]:
# your code here
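The sketch below builds both comprehensions from a hand-copied, three-row subset of the `actors` output above, keeping only the `name` and `asCharacter` fields. Here `character_name` is a minimal stand-in that splits on the first `/`; it substitutes for whatever function Exercise 3 produces. The dictionary list is built by unpacking the tuples, which keeps the two lists consistent by construction.

```python
from pprint import pprint

def character_name(as_character):
    # Minimal stand-in for the Exercise 3 function: text before the first "/"
    return as_character.split("/", 1)[0].strip()

# Hand-copied subset of the `actors` records shown above (id/image fields omitted)
actors = [
    {'name': 'Brian Downey', 'asCharacter': 'Stanley H. Tweedle / ... 61 episodes, 1996-2002'},
    {'name': 'Michael McManus', 'asCharacter': 'Kai / ... 61 episodes, 1996-2002'},
    {'name': 'Jeffrey Hirschfield', 'asCharacter': '790 / ... 57 episodes, 1996-2002'},
]

# List of 2-tuples: (actor name, character name)
actor_tuples = [(a['name'], character_name(a['asCharacter'])) for a in actors]

# List of dictionaries, built by unpacking each tuple
actor_dicts = [{'actor': name, 'character': character} for name, character in actor_tuples]

pprint(actor_tuples)
pprint(actor_dicts)
```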