# Lesson 7: Advanced web scraping and data gathering
## Activity 2: Build your own movie database by reading from an API
### This notebook does the following
* Retrieves and prints basic data about a movie (title entered by the user) from the web (OMDB database)
* If a poster of the movie is found, downloads the image and saves it in a local `Posters` folder

In [10]:
import urllib.request, urllib.parse, urllib.error
import json
import os

### Load the secret API key from a JSON file stored in the same folder as this notebook into a variable (you have to get your own key from the OMDB website; it has a 1000 daily limit)
Hint: Use **`json.load()`** on the open file

#### Note: The following cell will not be executed in the solution notebook because the author cannot give out his private API key.
#### Students, users, and instructors will need to obtain a key and store it in a JSON file.
#### For the code's sake, we call this file `APIkeys.json`, but you need to store your own key in it.
#### An example file called `APIkey_Bogus_example.json` is provided with the notebook. Replace the bogus key in that file with your own and rename the file to `APIkeys.json`. The file name itself does not matter, as long as the code opens the same name.

In [25]:
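with open("APIkeys.json") as file:
    keys = json.load(file)
    # Look up the key by its field name, not by the key value itself.
    # "OMDBapi" is an assumed field name; use the one in your own JSON file.
    DSC540 = keys["OMDBapi"]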

FileNotFoundError: [Errno 2] No such file or directory: 'APIkeys.json'
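
The `FileNotFoundError` above is what the cell raises when `APIkeys.json` is missing. A minimal sketch of a more forgiving loader follows. The helper name `load_api_key` and the field name `OMDBapi` are assumptions made for illustration and are not part of the original notebook, so substitute whatever field name your own key file uses.

```python
import json
import os
import tempfile

def load_api_key(path, field="OMDBapi"):
    """Return the key stored under `field` in the JSON file at `path`, or None.

    `field` is a hypothetical name; match it to your own key file.
    """
    try:
        with open(path) as f:
            keys = json.load(f)
    except FileNotFoundError:
        print(f"Key file not found: {path}")
        return None
    return keys.get(field)

# Demonstration with a throwaway file and a bogus key
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "APIkeys.json")
    with open(demo_path, "w") as f:
        json.dump({"OMDBapi": "bogus-key"}, f)
    found = load_api_key(demo_path)
    missing = load_api_key(os.path.join(tmp, "no_such_file.json"))
```

Returning `None` instead of raising lets the calling cell decide whether to stop or to ask the user for a key.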

### The final URL to be passed should look like: http://www.omdbapi.com/?t=movie_name&apikey=secretapikey 
Do the following:
* Assign the OMDB portal (http://www.omdbapi.com/?) as a string to a variable `serviceurl` (don't miss the `?`)
* Create a variable `apikey` holding the last portion of the URL (`&apikey=secretapikey`), where `secretapikey` is your own API key (an actual code)
* The movie name portion, i.e. `t=movie_name`, will be addressed later

In [47]:
serviceurl = "http://www.omdbapi.com/?"
apikey = "&apikey=99227ca9"
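
To see how the pieces combine into the final URL, here is a small sketch. It is an alternative to concatenating the `apikey` string: both query parameters go through `urllib.parse.urlencode()` in one call. The function name `build_url` is hypothetical, and `secretapikey` is the placeholder from the text above, not a working key.

```python
import urllib.parse

SERVICE_URL = "http://www.omdbapi.com/?"

def build_url(title, key):
    """Assemble the full request URL from a movie title and an API key."""
    # urlencode escapes characters such as spaces and joins pairs with "&"
    query = urllib.parse.urlencode({"t": title, "apikey": key})
    return SERVICE_URL + query

example_url = build_url("The Matrix", "secretapikey")
print(example_url)
# http://www.omdbapi.com/?t=The+Matrix&apikey=secretapikey
```

Note that the space in the title becomes `+`, which is why the title should be encoded rather than pasted into the URL by hand.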

### Write a utility function `print_json` to neatly print the movie data from the JSON response (which we will get from the portal)
Here are the keys of the JSON data:

'Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language','Country', 'Awards', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID'

In [41]:
def print_json(data):
    keys = ['Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language','Country', 'Awards', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID']
    print("-"*50)
    for k in keys:
        if k in data:
            print(f"{k}: {data[k]}")
    print("-"*50)
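
The `Ratings` value is itself a list of dictionaries, so `print_json` shows it in raw form. As an optional refinement, the sketch below flattens that list into one readable string. The helper name `format_ratings` is an assumption, and the sample list is copied from the *Titanic* output shown later in this notebook.

```python
def format_ratings(ratings):
    """Turn the list of {'Source': ..., 'Value': ...} dicts into one string."""
    return "; ".join(f"{r['Source']}: {r['Value']}" for r in ratings)

sample = [
    {"Source": "Internet Movie Database", "Value": "7.8/10"},
    {"Source": "Rotten Tomatoes", "Value": "89%"},
    {"Source": "Metacritic", "Value": "75/100"},
]
formatted = format_ratings(sample)
print(formatted)
# Internet Movie Database: 7.8/10; Rotten Tomatoes: 89%; Metacritic: 75/100
```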

### Write a utility function to download a poster of the movie based on the information from the JSON dataset and save it in your local folder

* Use `os` module
* The poster data is stored in the JSON key 'Poster'
* You may want to split the name of the Poster file and extract the file extension only. Let's say the extension is ***'jpg'***.
* Then later join this extension to the movie name and create a filename like ***movie.jpg***
* Use the Python command `open` to open a file and write the poster data. Close the file when done.
* This function need not return anything. It just saves the poster data as an image file.

In [42]:
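def dwnload_poster(data):
    title = data["Title"]
    poster = data["Poster"]
    # The file extension is whatever follows the last dot in the poster URL
    file_split = poster.split(".")[-1]
    file_data = urllib.request.urlopen(poster).read()
    # Save into a "Posters" folder under the current working directory
    save = os.path.join(os.getcwd(), "Posters")
    if not os.path.isdir(save):
        os.mkdir(save)
    name = os.path.join(save, str(title) + "." + file_split)
    # Write the downloaded bytes (not the JSON dict) to the image file
    with open(name, "wb") as f:
        f.write(file_data)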
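
Splitting the poster URL on `.` works when the URL ends in the extension, but it picks up extra characters if the URL carries a query string. The sketch below shows one possible way to derive the file name more defensively. The helper `poster_filename`, its `default_ext` parameter, and the example URL are all hypothetical and are not taken from OMDB.

```python
import os
import urllib.parse

def poster_filename(title, poster_url, default_ext=".jpg"):
    """Build a file name such as 'Titanic.jpg' from a title and a poster URL."""
    # Keep only the path part of the URL, dropping any "?query" suffix
    path = urllib.parse.urlparse(poster_url).path
    # splitext returns ("/posters/titanic", ".jpg"); fall back if no extension
    ext = os.path.splitext(path)[1] or default_ext
    return f"{title}{ext}"

name = poster_filename("Titanic", "https://example.com/posters/titanic.jpg?size=300")
print(name)
# Titanic.jpg
```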

### Write a utility function `search_movie` to search for a movie by its name, print the downloaded JSON data (use the `print_json` function for this), and save the movie poster in the local folder (use the `dwnload_poster` function for this)

* Use a `try-except` block for this, i.e. try to connect to the web portal; if successful, proceed, but if not (i.e. an exception is raised), just print an error message
* Use the previously created variables `serviceurl` and `apikey` here
* Pass a dictionary with the key `t` and the movie name as its value to `urllib.parse.urlencode()`, then put `serviceurl` before and `apikey` after the function's output to construct the full URL
* This URL will be used for accessing the data
* The JSON data has a key called `Response`. If its value is the string `"True"`, the read was successful. Check this before processing the data. If it was not successful, print the JSON key `Error`, which contains the error message returned by the movie database.

In [43]:
def search_movie(title):
    try:
        url = serviceurl + urllib.parse.urlencode({"t":str(title)}) + apikey
        print(url)
        opening = urllib.request.urlopen(url)
        data = opening.read()
        json_data = json.loads(data)
        
        if json_data["Response"] == "True":
            print_json(json_data)
            if json_data["Poster"] != "N/A":
                dwnload_poster(json_data)
        else:
            print("Error", json_data["Error"])

    except urllib.error.URLError as e:
        print(f"Error: {e.reason}")
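
The `Response` check can be tried without a network connection. The sketch below runs the same test on two hand-written JSON strings that imitate the shape of the replies; the helper name `summarize_response` and both sample strings are assumptions made for illustration.

```python
import json

def summarize_response(raw):
    """Return (ok, message) for a raw JSON reply shaped like the ones above."""
    data = json.loads(raw)
    # "Response" holds the string "True" or "False", not a JSON boolean
    if data.get("Response") == "True":
        return True, data.get("Title", "")
    return False, data.get("Error", "Unknown error")

found = summarize_response('{"Response": "True", "Title": "Titanic"}')
not_found = summarize_response('{"Response": "False", "Error": "Movie not found!"}')
print(found)      # (True, 'Titanic')
print(not_found)  # (False, 'Movie not found!')
```

Comparing against the string `"True"` matters here: a plain truthiness test would treat the string `"False"` as true.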

### Test `search_movie` function by entering *Titanic*

In [48]:
search_movie("Titanic")

http://www.omdbapi.com/?t=Titanic&apikey=99227ca9
--------------------------------------------------
Title: Titanic
Year: 1997
Rated: PG-13
Released: 19 Dec 1997
Runtime: 194 min
Genre: Drama, Romance
Director: James Cameron
Writer: James Cameron
Actors: Leonardo DiCaprio, Kate Winslet, Billy Zane, Kathy Bates
Plot: A seventeen-year-old aristocrat falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic.
Language: English, Swedish, Italian, French
Country: USA, Mexico, Australia, Canada
Awards: Won 11 Oscars. Another 113 wins & 83 nominations.
Ratings: [{'Source': 'Internet Movie Database', 'Value': '7.8/10'}, {'Source': 'Rotten Tomatoes', 'Value': '89%'}, {'Source': 'Metacritic', 'Value': '75/100'}]
Metascore: 75
imdbRating: 7.8
imdbVotes: 1,018,292
imdbID: tt0120338
--------------------------------------------------



### Test `search_movie` function by entering "*Random_error*" (obviously this will not be found and you should be able to check whether your error catching code is working properly)

In [49]:
search_movie("Random_error")

http://www.omdbapi.com/?t=Random_error&apikey=99227ca9
Error Movie not found!


### Look for a folder called 'Posters' in the directory you are working in. It should contain a file called 'Titanic.jpg'. Open it and check that the poster was saved correctly!
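
Besides opening the image by hand, a quick programmatic check is possible. The sketch below tests that a file exists and begins with the two-byte JPEG signature (`FF D8`). The helper name `looks_like_jpeg` is hypothetical, and the demonstration uses a throwaway file rather than a real poster.

```python
import os
import tempfile

def looks_like_jpeg(path):
    """Return True if `path` exists and starts with the JPEG signature bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(2) == b"\xff\xd8"

# On a real run: looks_like_jpeg(os.path.join("Posters", "Titanic.jpg"))
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "fake.jpg")
    with open(fake, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0 not a real image")
    check_ok = looks_like_jpeg(fake)
    check_missing = looks_like_jpeg(os.path.join(tmp, "absent.jpg"))
```

This only inspects the first two bytes, so it catches an empty or mis-written file but does not prove the image is complete.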