This script is based on instructions given in [this lesson](https://github.com/HeardLibrary/digital-scholarship/blob/master/code/scrape/pylesson/lesson2-api.ipynb). 

## Import libraries and load API key from file

The API key should be the only item in a text file called `flickr_api_key.txt` in the user's home directory. The file should have no trailing newline and should not include the "secret".

In [None]:
from pathlib import Path
import requests
import json
import csv
from time import sleep
import webbrowser

# define some canned functions we need to use

# write a list of dictionaries to a CSV file
def write_dicts_to_csv(table, filename, fieldnames):
    with open(filename, 'w', newline='', encoding='utf-8') as csv_file_object:
        writer = csv.DictWriter(csv_file_object, fieldnames=fieldnames)
        writer.writeheader()
        for row in table:
            writer.writerow(row)

home = Path.home()  # path to the user's home directory; works on Windows, macOS, and Linux
key_filename = 'flickr_api_key.txt'
api_key_path = home / key_filename  # pathlib joins paths with the correct separator for the OS

try:
    with open(api_key_path, 'rt', encoding='utf-8') as file_object:
        api_key = file_object.read().strip()  # strip() removes any stray trailing newline or spaces
        # print(api_key) # delete this line once the script is working; don't want the key as part of the notebook
except FileNotFoundError:
    print(key_filename + ' file not found - is it in your home directory?')

## Make a test API call to the account

We need the account's user ID. Go to flickr.com and search for vutheatre. The result is https://www.flickr.com/photos/123262983@N05, which tells us that the ID is 123262983@N05. The API supports many kinds of queries; the full list of methods is [here](https://www.flickr.com/services/api/). Let's try `flickr.people.getPhotos` (described [here](https://www.flickr.com/services/api/flickr.people.getPhotos.html)). Despite its name, this method doesn't download the photos; it returns metadata about the photos in an account.

The main purpose of this query is to find out how many photos are available, so that we know how many pages of results to request later. The count is in `['photos']['total']`, so we can extract it from the response data.
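Flickr's JSON responses also include a `stat` field that is `"ok"` on success or `"fail"` on error, in which case a `message` explains the problem (for example, an invalid API key). A minimal sketch, assuming that response shape, checks `stat` before reading the total. `get_total_photos` is a hypothetical helper, and the sample dictionaries are hand-made stand-ins, not real data from the account.

```python
def get_total_photos(data):
    """Return the photo count from a flickr.people.getPhotos JSON response.

    Raises RuntimeError if Flickr reports an error instead of results.
    """
    if data.get('stat') != 'ok':
        # Failed calls return 'stat': 'fail' along with a 'code' and a 'message'
        raise RuntimeError('Flickr API error: ' + str(data.get('message', 'unknown error')))
    # 'total' may arrive as a string, so convert it to an integer
    return int(data['photos']['total'])

# Hand-made, trimmed example responses (numbers are made up for illustration)
sample_ok = {'photos': {'page': 1, 'pages': 1234, 'perpage': 1, 'total': '1234', 'photo': []},
             'stat': 'ok'}
sample_fail = {'stat': 'fail', 'code': 100, 'message': 'Invalid API Key'}

print(get_total_photos(sample_ok))  # 1234
try:
    get_total_photos(sample_fail)
except RuntimeError as error:
    print(error)  # Flickr API error: Invalid API Key
```

Failing loudly here is a design choice: without the check, a bad key shows up later as a confusing `KeyError: 'photos'`.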

In [None]:
user_id = '123262983@N05' # vutheatre's ID
endpoint_url = 'https://www.flickr.com/services/rest'
method = 'flickr.people.getPhotos'
filename = 'theatre-metadata.csv'

param_dict = {
    'method' : method,
#    'tags' : 'kangaroo',
#    'extras' : 'url_o',
    'per_page' : '1',  # default is 100, maximum is 500. Use paging to retrieve more than 500.
    'page' : '1',
    'user_id' : user_id,
    'oauth_consumer_key' : api_key,
    'nojsoncallback' : '1', # return plain JSON instead of JSON wrapped in a callback function (JSONP)
    'format' : 'json' # overrides the default XML serialization for the search results
    }

metadata_response = requests.get(endpoint_url, params = param_dict)

# print(metadata_response.url) # uncomment this if testing is needed, again don't reveal key in notebook
data = metadata_response.json()
print(json.dumps(data, indent=4))
print()

number_photos = int(data['photos']['total']) # need to convert string to number
print('Number of photos: ', number_photos)
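`requests.get()` turns `param_dict` into a URL query string. As a rough illustration of what that request URL looks like, the standard library's `urllib.parse.urlencode` builds a similar query string. The names `example_params` and `example_url` are hypothetical, and `YOUR_API_KEY` is a placeholder so that no real key ends up in the notebook.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only; never print a real API key in the notebook
example_endpoint = 'https://www.flickr.com/services/rest'
example_params = {
    'method': 'flickr.people.getPhotos',
    'user_id': '123262983@N05',
    'per_page': '1',
    'page': '1',
    'oauth_consumer_key': 'YOUR_API_KEY',
    'nojsoncallback': '1',
    'format': 'json',
}

# urlencode percent-encodes the values, so the '@' in the user ID becomes '%40'
example_url = example_endpoint + '?' + urlencode(example_params)
print(example_url)
```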

## Test to see what kinds of useful metadata we can get

The documentation for the [method](https://www.flickr.com/services/api/flickr.people.getPhotos.html) lists the "extras" you can request. Let's ask for everything that we care about and don't already know:

`description,license,original_format,date_taken,geo,tags,machine_tags,media,url_t,url_o`

`url_t` is the URL of a thumbnail of the image, and `url_o` is the URL for retrieving the original photo. Requesting these URLs also returns the dimensions of each image (for example, `height_o` and `width_o` for the original), so we don't need `o_dims`. The title is always returned, so there is no extra for it.
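Because the extras are sent as one comma-separated string, it is easy to list the same extra twice by accident. A minimal sketch builds the string from a Python list and drops duplicates while keeping the original order (`dict.fromkeys` preserves insertion order in Python 3.7 and later):

```python
# A list of extras containing an accidental duplicate ('original_format')
extras_list = ['description', 'license', 'original_format', 'date_taken', 'original_format',
               'geo', 'tags', 'machine_tags', 'media', 'url_t', 'url_o']

# dict.fromkeys keeps only the first occurrence of each name, in order
extras = ','.join(dict.fromkeys(extras_list))
print(extras)
# description,license,original_format,date_taken,geo,tags,machine_tags,media,url_t,url_o
```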

In [None]:
param_dict = {
    'method' : method,
    'extras' : 'description,license,original_format,date_taken,geo,tags,machine_tags,media,url_t,url_o',
    'per_page' : '1',  # default is 100, maximum is 500. Use paging to retrieve more than 500.
    'page' : '1',
    'user_id' : user_id,
    'oauth_consumer_key' : api_key,
    'nojsoncallback' : '1', # return plain JSON instead of JSON wrapped in a callback function (JSONP)
    'format' : 'json' # overrides the default XML serialization for the search results
    }

metadata_response = requests.get(endpoint_url, params = param_dict)
# print(metadata_response.url) # uncomment this if testing is needed, again don't reveal key in notebook

data = metadata_response.json()
print(json.dumps(data, indent=4))
print()

## Create and test the function to extract the data we want

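The function takes the position of a photo in the response's `data['photos']['photo']` list, copies the fields we want into a dictionary, and returns it. Each dictionary becomes one row of the CSV file, and its keys become the column headers.
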
In [None]:
def extract_data(photo_number, data):
    photo = data['photos']['photo'][photo_number]  # metadata for a single photo
    dictionary = {}  # create an empty dictionary

    # copy the fields we want from the response into the dictionary
    dictionary['id'] = photo['id']
    dictionary['title'] = photo['title']
    dictionary['license'] = photo['license']
    dictionary['description'] = photo['description']['_content']

    # convert Flickr's 'YYYY-MM-DD hh:mm:ss' format to an ISO 8601 dateTime; time zone unknown - maybe add later?
    dictionary['date_taken'] = photo['datetaken'].replace(' ', 'T')

    dictionary['tags'] = photo['tags']
    dictionary['machine_tags'] = photo['machine_tags']
    dictionary['latitude'] = photo['latitude']
    dictionary['longitude'] = photo['longitude']
    dictionary['thumbnail_url'] = photo['url_t']

    # original-size fields may be missing (e.g. when access to originals is restricted),
    # so use .get() to write an empty cell instead of raising a KeyError
    dictionary['original_format'] = photo.get('originalformat', '')
    dictionary['original_url'] = photo.get('url_o', '')
    dictionary['original_height'] = photo.get('height_o', '')
    dictionary['original_width'] = photo.get('width_o', '')

    return dictionary

# test the function with a single row
table = []

photo_number = 0
photo_dictionary = extract_data(photo_number, data)
table.append(photo_dictionary)

# write the data to a file
fieldnames = photo_dictionary.keys() # use the keys from the last dictionary for column headers; assume all are the same
write_dicts_to_csv(table, filename, fieldnames)

print('Done')
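`extract_data` converts `datetaken` by replacing the space with a `T`, which assumes the value is always in the form `YYYY-MM-DD hh:mm:ss`. A slightly stricter sketch parses the string with `datetime.strptime`, so an unexpected format raises a `ValueError` instead of passing through silently. The helper name `to_iso_datetime` is hypothetical.

```python
from datetime import datetime

def to_iso_datetime(flickr_date):
    """Convert a 'YYYY-MM-DD hh:mm:ss' string to an ISO 8601 dateTime (no time zone)."""
    parsed = datetime.strptime(flickr_date, '%Y-%m-%d %H:%M:%S')  # raises ValueError on other formats
    return parsed.isoformat()

print(to_iso_datetime('2014-04-05 19:30:00'))  # 2014-04-05T19:30:00
```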

## Create the loops to do the paging

Flickr returns at most 500 photos per request (the maximum `per_page` value). Since the account has more photos than that, we need to request the data one page of up to 500 photos at a time.
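The number of requests is the total number of photos divided by the page size, rounded up, because the last page is usually only partly full. A small sketch of that arithmetic, with made-up totals and hypothetical helper names:

```python
def page_count(total, per_page):
    """Number of pages needed to cover `total` items, including a final partial page."""
    return (total + per_page - 1) // per_page  # integer ceiling division

def last_page_size(total, per_page):
    """How many items the final page holds."""
    remainder = total % per_page
    return per_page if remainder == 0 and total > 0 else remainder

# Made-up totals for illustration
print(page_count(1234, 500), last_page_size(1234, 500))  # 3 234
print(page_count(1000, 500), last_page_size(1000, 500))  # 2 500
```

Flickr numbers pages starting at 1, so loop index `i` in `range(page_count(...))` corresponds to API page `i + 1`.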

In [None]:
per_page = 5   # use 500 for full download, use smaller number like 5 for testing
pages = (number_photos + per_page - 1) // per_page   # ceiling division, so a final partial page is counted
table = []

#for page_number in range(pages):  # use this to retrieve every page
for page_number in range(0, 1):  # use this to do only one page for testing
    print('retrieving page ', page_number + 1)
    page_string = str(page_number + 1)
    param_dict = {
        'method' : method,
        'extras' : 'description,license,original_format,date_taken,geo,tags,machine_tags,media,url_t,url_o',
        'per_page' : str(per_page),  # default is 100, maximum is 500.
        'page' : page_string,
        'user_id' : user_id,
        'oauth_consumer_key' : api_key,
        'nojsoncallback' : '1', # return plain JSON instead of JSON wrapped in a callback function (JSONP)
        'format' : 'json' # overrides the default XML serialization for the search results
        }
    metadata_response = requests.get(endpoint_url, params = param_dict)
    data = metadata_response.json()
#    print(json.dumps(data, indent=4))  # uncomment this line for testing
    
    # data['photos']['photo'] is the list of photos returned for this page; its length is the number of photos
    for image_number in range(0, len(data['photos']['photo'])):
        photo_dictionary = extract_data(image_number, data)
        table.append(photo_dictionary)

    # write the data to a file
    # We could just do this for all the data at the end.
    # But if the search fails in the middle, we will at least get partial results
    fieldnames = photo_dictionary.keys() # use the keys from the last dictionary for column headers; assume all are the same
    write_dicts_to_csv(table, filename, fieldnames)

    sleep(1) # wait a second to avoid getting blocked for hitting the API too rapidly

print('Done')
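The loop above computes the page count up front from `number_photos`. An alternative sketch instead reads the `pages` value that `flickr.people.getPhotos` returns alongside each page of results, and stops once that page has been fetched. To keep it testable without a network connection, the single request is passed in as `fetch_page`, a hypothetical callable that takes a 1-based page number and returns the parsed JSON; in the notebook it would wrap the `requests.get()` call. `iterate_photos` and `fake_fetch` are likewise hypothetical names.

```python
from time import sleep

def iterate_photos(fetch_page, delay=1.0):
    """Yield photo records page by page until Flickr's reported 'pages' count is reached.

    fetch_page: hypothetical callable taking a 1-based page number and returning
    the parsed JSON response for that page.
    """
    page_number = 1
    while True:
        data = fetch_page(page_number)
        if data.get('stat') != 'ok':
            raise RuntimeError('Flickr API error: ' + str(data.get('message', 'unknown error')))
        for photo in data['photos']['photo']:
            yield photo
        if page_number >= int(data['photos']['pages']):
            break  # last page reached (also covers an empty account, where pages is 0)
        page_number += 1
        sleep(delay)  # pause between requests to avoid hitting the API too rapidly

# A fake fetch function standing in for the real API: 3 made-up photos, 2 per page
fake_photos = [{'id': '1'}, {'id': '2'}, {'id': '3'}]

def fake_fetch(page_number, per_page=2):
    start = (page_number - 1) * per_page
    return {
        'stat': 'ok',
        'photos': {'page': page_number,
                   'pages': (len(fake_photos) + per_page - 1) // per_page,
                   'perpage': per_page,
                   'total': str(len(fake_photos)),
                   'photo': fake_photos[start:start + per_page]},
    }

ids = [photo['id'] for photo in iterate_photos(fake_fetch, delay=0)]
print(ids)  # ['1', '2', '3']
```

Each yielded record is one entry of `data['photos']['photo']`, so the fields read by `extract_data` are available on it directly.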