## Google Books API
The goal of this notebook is to turn the raw information returned by the Google Books API into the following format:
```
{
  googleId: STRING
  title: STRING
  authors: STRING
  publisher: STRING
  publishedDate: STRING
  description: STRING
  isbn10: STRING
  isbn13: STRING
  pageCount: INTEGER
  categories: STRING
  thumbnail: STRING
  smallThumbnail: STRING
  language: STRING
  webReaderLink: STRING
  textSnippet: STRING
  isEbook: BOOLEAN
}
```

In [1]:
import requests
import json
import pickle
import pandas as pd
import numpy as np
import time

## Obtain sample information from request:

In [2]:
from secrets import GOOGLE_KEY

In [3]:
response = requests.get('https://www.googleapis.com/books/v1/volumes?q=flowers+inauthor:keyes&key='+GOOGLE_KEY)
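
The request above builds the URL by string concatenation. As a minimal sketch, the same URL can be assembled with the standard library so that spaces and special characters in the search terms are escaped automatically. `build_volumes_url` and the `'DUMMY_KEY'` value are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.googleapis.com/books/v1/volumes'

def build_volumes_url(query, api_key):
    """Return the volumes search URL for a query string and an API key."""
    # urlencode escapes each value; a space in the query becomes '+'
    return BASE_URL + '?' + urlencode({'q': query, 'key': api_key})

# 'DUMMY_KEY' is a placeholder, not a real credential
url = build_volumes_url('flowers inauthor:keyes', 'DUMMY_KEY')
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the hand-built URL. `requests.get` also accepts a `params` dictionary, which performs the same encoding.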

In [12]:
sample_api_result = json.loads(response.text)

In [14]:
sample_api_result

{'kind': 'books#volumes',
 'totalItems': 113,
 'items': [{'kind': 'books#volume',
   'id': 'gK98gXR8onwC',
   'etag': 'ym8oWidtoE0',
   'selfLink': 'https://www.googleapis.com/books/v1/volumes/gK98gXR8onwC',
   'volumeInfo': {'title': 'Flowers for Algernon',
    'subtitle': 'A One-act Play',
    'authors': ['David Rogers', 'Daniel Keyes'],
    'publisher': 'Dramatic Publishing',
    'publishedDate': '1969',
    'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
     {'type': 'ISBN_13', 'identifier': '9780871293879'}],
    'readingModes': {'text': False, 'image': True},
    'pageCount': 32,
    'printType': 'BOOK',
    'averageRating': 5,
    'ratingsCount': 1,
    'maturityRating': 'NOT_MATURE',
    'allowAnonLogging': False,
    'contentVersion': '0.0.2.0.preview.1',
    'panelizationSummary': {'containsEpubBubbles': False,
     'containsImageBubbles': False},
    'imageLinks': {'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=fro

### The following is template code to save and load a reference object.

In [103]:
# 'wb' overwrites the file, so it always holds exactly one object
with open('example_google_books_return.obj','wb') as f:
    pickle.dump(sample_api_result,f)

In [104]:
with open('example_google_books_return.obj','rb') as f:
    test = pickle.load(f)
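
A minimal sketch of the same save/load round trip, written against a temporary directory so it can be run without touching the working folder. The `reference` dictionary is a trimmed, hypothetical stand-in for the parsed API response. Opening the file with `'wb'` rather than an append mode means a single `pickle.load` always returns the most recently saved object.

```python
import os
import pickle
import tempfile

# Hypothetical, trimmed stand-in for the parsed API response
reference = {'kind': 'books#volumes', 'totalItems': 113, 'items': []}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_google_books_return.obj')

    # Write mode replaces any earlier contents of the file
    with open(path, 'wb') as f:
        pickle.dump(reference, f)

    # Read the object back while the temporary file still exists
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == reference)
```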

In [105]:
test

{'kind': 'books#volumes',
 'totalItems': 113,
 'items': [{'kind': 'books#volume',
   'id': 'gK98gXR8onwC',
   'etag': 'ym8oWidtoE0',
   'selfLink': 'https://www.googleapis.com/books/v1/volumes/gK98gXR8onwC',
   'volumeInfo': {'title': 'Flowers for Algernon',
    'subtitle': 'A One-act Play',
    'authors': ['David Rogers', 'Daniel Keyes'],
    'publisher': 'Dramatic Publishing',
    'publishedDate': '1969',
    'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
     {'type': 'ISBN_13', 'identifier': '9780871293879'}],
    'readingModes': {'text': False, 'image': True},
    'pageCount': 32,
    'printType': 'BOOK',
    'averageRating': 5,
    'ratingsCount': 1,
    'maturityRating': 'NOT_MATURE',
    'allowAnonLogging': False,
    'contentVersion': '0.0.2.0.preview.1',
    'panelizationSummary': {'containsEpubBubbles': False,
     'containsImageBubbles': False},
    'imageLinks': {'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=fro

## Extracting necessary information to send to Web

In [23]:
# Results are in ['items'], and by default there are 10 results
print('Top level hierarchy:', sample_api_result.keys(),'\n')
print("type(sample_api_result['items']) = ", type(sample_api_result['items']))
print("len(sample_api_result['items']) = ", len(sample_api_result['items']))

Top level hierarchy: dict_keys(['kind', 'totalItems', 'items']) 

type(sample_api_result['items']) =  <class 'list'>
len(sample_api_result['items']) =  10


In [24]:
# Information stored in one record
sample_api_result['items'][0]

{'kind': 'books#volume',
 'id': 'gK98gXR8onwC',
 'etag': 'ym8oWidtoE0',
 'selfLink': 'https://www.googleapis.com/books/v1/volumes/gK98gXR8onwC',
 'volumeInfo': {'title': 'Flowers for Algernon',
  'subtitle': 'A One-act Play',
  'authors': ['David Rogers', 'Daniel Keyes'],
  'publisher': 'Dramatic Publishing',
  'publishedDate': '1969',
  'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
   {'type': 'ISBN_13', 'identifier': '9780871293879'}],
  'readingModes': {'text': False, 'image': True},
  'pageCount': 32,
  'printType': 'BOOK',
  'averageRating': 5,
  'ratingsCount': 1,
  'maturityRating': 'NOT_MATURE',
  'allowAnonLogging': False,
  'contentVersion': '0.0.2.0.preview.1',
  'panelizationSummary': {'containsEpubBubbles': False,
   'containsImageBubbles': False},
  'imageLinks': {'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
   'thumbnail': 'http://books.google.com/books/cont

In [34]:
def retreive_details(json_dict,keys_to_extract):
    """
    Search through a dictionary and retrieve the keys that match
    keys from a list. Matching keys are found at every level of
    the hierarchy of nested dictionaries.
    
    Inputs:
    json_dict       -  A dictionary expected to be from json.loads() 
                       but any dictionary with JSON-like formatting
                       will do
    keys_to_extract -  A list of keys that you want to extract from the 
                       json_dict object
                       
    Output:
    new_dict        -  A new "flattened" dictionary with all the matching keys. 
                       All the keys are in the top level 
    """
    new_dict={}
    for item in json_dict.keys():
        # Recurse into nested dictionaries and merge their matches
        if isinstance(json_dict[item], dict):
            temp_dict = retreive_details(json_dict[item],keys_to_extract)
            new_dict.update(temp_dict)
        # Keep the key itself if it is one of the requested keys
        if item in keys_to_extract:
            new_dict[item] = json_dict[item] 
    return new_dict
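
Two properties of this recursive search are easy to miss, so here is a small sketch on toy data. `flatten_matching` is a hypothetical stand-in that mirrors the logic of the function above, and the `toy` dictionary is invented for illustration. It shows that when a key name occurs at more than one level, the value visited last wins, and that dictionaries stored inside lists are not searched.

```python
def flatten_matching(json_dict, keys_to_extract):
    """Hypothetical stand-in mirroring the recursive search above."""
    new_dict = {}
    for key, value in json_dict.items():
        if isinstance(value, dict):
            # Descend into nested dictionaries and merge what they yield
            new_dict.update(flatten_matching(value, keys_to_extract))
        if key in keys_to_extract:
            new_dict[key] = value
    return new_dict

# Invented data: 'id' appears at the top level and inside 'volumeInfo'
toy = {
    'id': 'top',
    'volumeInfo': {
        'title': 'T',
        'id': 'nested',
        'imageLinks': {'thumbnail': 'u'},
    },
}

flat = flatten_matching(toy, ['id', 'title', 'thumbnail'])
print(flat)

# Dictionaries inside a list are not visited, so 'b' is not found
in_list = flatten_matching({'a': [{'b': 1}]}, ['b'])
print(in_list)
```

In the Google Books sample shown earlier, `id` only occurs at the top level of each item, so the collision does not arise there.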

In [35]:
# The details (keys) to extract from the json object
relevant_details=['id','title','authors','publisher',
                  'publishedDate','description','industryIdentifiers',
                  'pageCount','categories','thumbnail','smallThumbnail',
                  'language','webReaderLink','textSnippet','isEbook']

In [36]:
my_new_dictionary = retreive_details(sample_api_result['items'][0],relevant_details)
my_new_dictionary

{'id': 'gK98gXR8onwC',
 'title': 'Flowers for Algernon',
 'authors': ['David Rogers', 'Daniel Keyes'],
 'publisher': 'Dramatic Publishing',
 'publishedDate': '1969',
 'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
  {'type': 'ISBN_13', 'identifier': '9780871293879'}],
 'pageCount': 32,
 'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
 'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
 'language': 'en',
 'isEbook': False,
 'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
 'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...'}

In [37]:
# Loop over all the entries in the items list to 
# process all the books into the format that the API endpoint requires

my_dict_list = []
for item in sample_api_result['items']:
    temp_dict = retreive_details(item,relevant_details)
    my_dict_list.append(temp_dict)


In [38]:
my_dict_list

[{'id': 'gK98gXR8onwC',
  'title': 'Flowers for Algernon',
  'authors': ['David Rogers', 'Daniel Keyes'],
  'publisher': 'Dramatic Publishing',
  'publishedDate': '1969',
  'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
   {'type': 'ISBN_13', 'identifier': '9780871293879'}],
  'pageCount': 32,
  'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
  'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
  'language': 'en',
  'isEbook': False,
  'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
  'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINN

## Final Cleanup of information:
The function defined earlier is generic, and so reusable. However, the data still does not perfectly match the data model that the Web team is expecting. The ISBN information needs to be unpacked, and the `id` key needs to be renamed to `googleId`.

In [58]:
# Starting from cleaning one entry:
a = retreive_details(sample_api_result['items'][0],relevant_details)
a

{'id': 'gK98gXR8onwC',
 'title': 'Flowers for Algernon',
 'authors': ['David Rogers', 'Daniel Keyes'],
 'publisher': 'Dramatic Publishing',
 'publishedDate': '1969',
 'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'},
  {'type': 'ISBN_13', 'identifier': '9780871293879'}],
 'pageCount': 32,
 'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
 'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
 'language': 'en',
 'isEbook': False,
 'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
 'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...'}

In [50]:
my_dict = {}
for i in a['industryIdentifiers']:
    my_dict[("".join(i['type'].split('_')).lower())] = i['identifier']

In [51]:
my_dict

{'isbn10': '0871293870', 'isbn13': '9780871293879'}
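
The same unpacking can be sketched as a dictionary comprehension. The filter on `'ISBN'` is an assumption: it guards against identifier entries of other types, which would otherwise become extra keys outside the data model. `unpack_isbns` is a hypothetical helper name, and the `'OTHER'` entry below is invented to exercise the filter.

```python
def unpack_isbns(identifiers):
    """Map entries such as ISBN_10 / ISBN_13 to isbn10 / isbn13 keys."""
    return {
        entry['type'].replace('_', '').lower(): entry['identifier']
        for entry in identifiers
        # Assumption: only ISBN entries belong in the data model
        if entry['type'].startswith('ISBN')
    }

identifiers = [
    {'type': 'ISBN_10', 'identifier': '0871293870'},
    {'type': 'ISBN_13', 'identifier': '9780871293879'},
    {'type': 'OTHER', 'identifier': 'EXAMPLE:123'},  # invented entry
]

isbns = unpack_isbns(identifiers)
print(isbns)
```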

In [52]:
# Add isbns to dictionary and remove the industryIdentifiers key
a.update(my_dict)
del a['industryIdentifiers']

In [55]:
# Change id to googleId
a['googleId']=a['id']
del a['id']

In [56]:
# final result
a

{'title': 'Flowers for Algernon',
 'authors': ['David Rogers', 'Daniel Keyes'],
 'publisher': 'Dramatic Publishing',
 'publishedDate': '1969',
 'pageCount': 32,
 'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
 'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
 'language': 'en',
 'isEbook': False,
 'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
 'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...',
 'isbn10': '0871293870',
 'isbn13': '9780871293879',
 'googleId': 'gK98gXR8onwC'}

An initial concern was the ordering of the dictionary, but values in any dictionary-like object are always looked up by key, so the order doesn't matter. Another concern was null values. The Web team has already dealt with this in their exception handling, so explicit null values are not required.
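
Should explicit nulls ever become necessary, a minimal sketch of how they could be added is shown here. `MODEL_KEYS` is copied from the format listed at the top of this notebook; `with_explicit_nulls` is a hypothetical helper and not part of the notebook's code.

```python
# Keys of the target format, in the order listed at the top of the notebook
MODEL_KEYS = ['googleId', 'title', 'authors', 'publisher', 'publishedDate',
              'description', 'isbn10', 'isbn13', 'pageCount', 'categories',
              'thumbnail', 'smallThumbnail', 'language', 'webReaderLink',
              'textSnippet', 'isEbook']

def with_explicit_nulls(record, keys=MODEL_KEYS):
    """Return a new dict holding every model key, using None where absent."""
    # dict.get returns None for missing keys, which serializes to JSON null
    return {key: record.get(key) for key in keys}

partial = {'googleId': 'gK98gXR8onwC', 'title': 'Flowers for Algernon'}
filled = with_explicit_nulls(partial)
print(filled['title'], filled['description'])
```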

In [105]:
def clean(input_dict):
    """
    This function reformats the isbns in a dictionary returned
    from retreive_details. It also changes the id key to
    googleId to match the format that the API endpoint specifies.
    
    Input:
    input_dict  - a dictionary with only relevant keys.
                  Note that it is modified in place.
    """
    
    my_dict = {}
    
    # Change the format of the isbn data
    try:
        for i in input_dict['industryIdentifiers']:
            my_dict[("".join(i['type'].split('_')).lower())] = i['identifier']

        # Add isbns to dictionary and remove the industryIdentifiers key
        input_dict.update(my_dict)
        del input_dict['industryIdentifiers']
    except KeyError:
        pass
    
    # Change id to googleId
    
    # Try/except should not be necessary here.
    # Google Books always returns an id
    input_dict['googleId']=input_dict['id']
    del input_dict['id']
    
    return input_dict
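
Because the function above edits the dictionary it receives, the caller's original is changed as well. If that ever becomes a problem, a copy-based variant could look like the following sketch. `clean_copy` is a hypothetical name, and using `pop` with a default is a swapped-in alternative to the `try`/`except` block, not the notebook's method.

```python
def clean_copy(input_dict):
    """Hypothetical variant of clean() that leaves its argument untouched."""
    output = dict(input_dict)  # shallow copy of the top level

    # pop() with a default avoids a KeyError when the key is absent
    for entry in output.pop('industryIdentifiers', []):
        output[entry['type'].replace('_', '').lower()] = entry['identifier']

    # Rename id to googleId
    output['googleId'] = output.pop('id')
    return output

raw = {
    'id': 'gK98gXR8onwC',
    'title': 'Flowers for Algernon',
    'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0871293870'}],
}
cleaned = clean_copy(raw)
print(cleaned)
print('id' in raw)
```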

In [106]:
# Testing out cleaning functionality
a = retreive_details(sample_api_result['items'][0],relevant_details)
b = clean(a)
b

{'title': 'Flowers for Algernon',
 'authors': ['David Rogers', 'Daniel Keyes'],
 'publisher': 'Dramatic Publishing',
 'publishedDate': '1969',
 'pageCount': 32,
 'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
 'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
 'language': 'en',
 'isEbook': False,
 'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
 'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...',
 'isbn10': '0871293870',
 'isbn13': '9780871293879',
 'googleId': 'gK98gXR8onwC'}

## Putting everything into one function.


In [107]:
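def process_list(list_of_entries,keys_to_extract):
    """
    Process a list of dictionaries to endpoint output format by
    running retreive_details() and then clean() on every entry.
    """
    out_list = []
    for i in list_of_entries:
        parsed_dict = retreive_details(i,keys_to_extract)
        clean_dict = clean(parsed_dict)
        out_list.append(clean_dict)
    return out_list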

In [108]:
response = requests.get('https://www.googleapis.com/books/v1/volumes?q=flowers+inauthor:keyes&key='+GOOGLE_KEY)

In [109]:
sample_api_result = json.loads(response.text)

In [110]:
process_list(sample_api_result['items'],relevant_details)

[{'title': 'Flowers for Algernon',
  'authors': ['David Rogers', 'Daniel Keyes'],
  'publisher': 'Dramatic Publishing',
  'publishedDate': '1969',
  'pageCount': 32,
  'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
  'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
  'language': 'en',
  'isEbook': False,
  'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
  'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...',
  'isbn10': '0871293870',
  'isbn13': '9780871293879',
  'googleId': 'gK98gXR8onwC'},
 {'title': 'Flowers for Algernon',
  'authors': ['Dani

### Testing error handling when `industryIdentifiers` doesn't exist for an entry

In [97]:
del sample_api_result['items'][0]['volumeInfo']['industryIdentifiers']

In [98]:
# notice the lack of isbn10 and isbn13
clean(retreive_details(sample_api_result['items'][0],relevant_details))

{'title': 'Flowers for Algernon',
 'authors': ['David Rogers', 'Daniel Keyes'],
 'publisher': 'Dramatic Publishing',
 'publishedDate': '1969',
 'pageCount': 32,
 'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
 'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
 'language': 'en',
 'isEbook': False,
 'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
 'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...',
 'googleId': 'gK98gXR8onwC'}

## Testing the module:

This is the minimal code needed to process the request. In the final app the output needs to be
jsonified, but this is good enough as a proof of concept.
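
As a rough sketch of what "jsonified" means here, the processed list can be serialized with the standard `json` module. The `processed` value is a trimmed copy of one record from the outputs above. Which web framework the final app uses is not stated in this notebook, so the sketch stops at producing the JSON string.

```python
import json

# Trimmed copy of one processed record from the outputs above
processed = [{
    'title': 'Flowers for Algernon',
    'authors': ['David Rogers', 'Daniel Keyes'],
    'pageCount': 32,
    'isEbook': False,
    'googleId': 'gK98gXR8onwC',
}]

# Python False becomes JSON false, and lists become JSON arrays
payload = json.dumps(processed)
print(payload)
```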

In [112]:
import google_books_hf as hf

In [116]:
help(hf)

Help on module google_books_hf:

NAME
    google_books_hf

DESCRIPTION
    This module is a set of helper functions to process data from the google
    books API. The functions are as follows:
    
    retreive_details()  - Retreive specific keys from a dictionary object
    
    clean()             - Alter data to match data model
    
    process_list()      - Process a list of dictionaries to endpoint output format

FUNCTIONS
    clean(input_dict)
        Alter data to match data model 
        
        This function reformats isbns, and modifies the id name
        from a dictionary returned from retreive_details. It also 
        changes the id key to googleId to match the format that the
        API endpoint specifies.
        
        Input:
        input_dict  - a dictionary output from retreive_details using
                      the list of keys required for the BetterReadsDS 
                      API response.
        
        Output:
        output_dict - a dictionary read

In [118]:
from secrets import GOOGLE_KEY
response = requests.get('https://www.googleapis.com/books/v1/volumes?q=flowers+inauthor:keyes&key='+GOOGLE_KEY)
sample_api_result = json.loads(response.text)

relevant_details=['id','title','authors','publisher',
                  'publishedDate','description','industryIdentifiers',
                  'pageCount','categories','thumbnail','smallThumbnail',
                  'language','webReaderLink','textSnippet','isEbook']

In [120]:
hf.process_list(sample_api_result['items'],relevant_details)

[{'title': 'Flowers for Algernon',
  'authors': ['David Rogers', 'Daniel Keyes'],
  'publisher': 'Dramatic Publishing',
  'publishedDate': '1969',
  'pageCount': 32,
  'smallThumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api',
  'thumbnail': 'http://books.google.com/books/content?id=gK98gXR8onwC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api',
  'language': 'en',
  'isEbook': False,
  'webReaderLink': 'http://play.google.com/books/reader?id=gK98gXR8onwC&hl=&printsec=frontcover&source=gbs_api',
  'textSnippet': '<b>FLOWERS</b> FOR ALGERNON A One-act Play For Four Men And One Woman” <br>\nCHARACTERS DR. STRAUSS . . . . . . . . . . . . . . . . . . . a young neurosurgeon <br>\nPROFESSOR NEMUR . . . . . . . . . . . . . . . . his older colleague ALICE KINNIAN&nbsp;...',
  'isbn10': '0871293870',
  'isbn13': '9780871293879',
  'googleId': 'gK98gXR8onwC'},
 {'title': 'Flowers for Algernon',
  'authors': ['Dani

## Importing ENV variables instead of secrets.py
This is simply a syntax reference, because the Jupyter notebook environment doesn't have a .env file like a pipenv environment does. In the final product this is probably what we are going to do, because we can load environment variables into Elastic Beanstalk.
```
import os
GOOGLE_KEY = os.environ['GOOGLE_KEY']
```
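
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing. A minimal sketch of a lookup with a clearer failure message follows. `get_google_key` is a hypothetical helper, and the dictionary passed in the demo stands in for the real environment.

```python
import os

def get_google_key(env=os.environ):
    """Return the API key from a mapping, defaulting to os.environ."""
    key = env.get('GOOGLE_KEY')
    if not key:
        # Fail early with a message that names the missing variable
        raise RuntimeError('GOOGLE_KEY is not set in the environment')
    return key

# A plain dict stands in for the environment; 'DUMMY_KEY' is a placeholder
demo_key = get_google_key({'GOOGLE_KEY': 'DUMMY_KEY'})
print(demo_key)
```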