# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books).

The goal of this livecode is to load the data from a CSV, then loop over its rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [1]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1509k  100 1509k    0     0  6260k      0 --:--:-- --:--:-- --:--:-- 6288k


In [2]:
!ls -lh

total 1.5M
-rwxrwxrwx 1 thomas thomas 1.5M Apr 22 17:05 books.csv
-rwxrwxrwx 1 thomas thomas 3.1K Apr 22 17:03 Recap-Data-Sourcing-Pandas-batch-1992.ipynb


Then import the usual suspects!

In [3]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

In [4]:
# YOUR CODE HERE
# Read ISBNs as text so leading zeros and a trailing 'X' are preserved
df = pd.read_csv('books.csv', dtype={'isbn': str})
df.head(5)

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,0439785960,9780439785969,eng,652,1944099,26249
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,0439358078,9780439358071,eng,870,1996446,27613
2,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,0439554934,9780439554930,eng,320,5629932,70390
3,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,0439554896,9780439554893,eng,352,6267,272
4,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,043965548X,9780439655484,eng,435,2149872,33964
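
ISBNs can start with a zero, so they are safer to handle as text than as numbers. Here is a minimal sketch of the difference, using a tiny in-memory CSV with made-up rows (`sample_csv`, `inferred` and `as_text` are illustrative names, not part of the livecode):

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for books.csv (made-up rows).
sample_csv = io.StringIO(
    "bookID,title,isbn\n"
    "1,Book A,0439785960\n"
    "2,Book B,0439358078\n"
)

# Let pandas infer the dtype: all-digit ISBNs are parsed as integers.
inferred = pd.read_csv(sample_csv)

# Rewind the buffer, then force the column to be read as text.
sample_csv.seek(0)
as_text = pd.read_csv(sample_csv, dtype={"isbn": str})

print(inferred["isbn"].tolist())  # integers, leading zero lost
print(as_text["isbn"].tolist())   # strings, leading zero kept
```

The API lookups later on build their query from the `isbn` value, so a lost leading zero would mean querying a different identifier.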


Let's add a new, empty `thumbnail` column that will hold the cover URL:

In [5]:
# YOUR CODE HERE
df.loc[:, 'thumbnail'] = ''
df.head(3)

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count,thumbnail
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,0439785960,9780439785969,eng,652,1944099,26249,
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,0439358078,9780439358071,eng,870,1996446,27613,
2,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,0439554934,9780439554930,eng,320,5629932,70390,


## API - Open Library

In [23]:
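# YOUR CODE HERE
def get_info(isbn):
    """Return the thumbnail URL Open Library has for this ISBN."""
    response = requests.get("https://openlibrary.org/api/books", params={
        'bibkeys': f'ISBN:{isbn}',
        'format': 'json'
    }, timeout=10)  # don't hang forever on a slow response

    print(f'Pulling URL: {response.url}')
    data = response.json()

    # Empty response: nothing was returned for this ISBN
    if not data:
        return 'NOT FOUND'

    # The book is known but may have no cover: fall back to an empty string
    return data.get(f'ISBN:{isbn}', {}).get('thumbnail_url', '')

get_info('0451526538')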

Pulling URL: https://openlibrary.org/api/books?bibkeys=ISBN%3A0451526538&format=json


'https://covers.openlibrary.org/b/id/11403183-S.jpg'
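
The goal also mentions subjects and a cover URL, which the default response used above does not carry. The sketch below assumes the `jscmd=data` option described in the Books API documentation, and assumes its response holds a `subjects` list (objects with a `name`) and a `cover` object (with `small`, `medium` and `large` URLs). `parse_details` and `get_details` are hypothetical helpers, and the payload is hand-written to imitate that shape; it is not real API output.

```python
import requests


def parse_details(payload: dict, isbn: str) -> dict:
    """Extract subjects and a cover URL from a `jscmd=data` style payload."""
    book = payload.get(f"ISBN:{isbn}", {})
    return {
        "subjects": [subject["name"] for subject in book.get("subjects", [])],
        "cover_url": book.get("cover", {}).get("medium", ""),
    }


def get_details(isbn: str) -> dict:
    """Query Open Library with jscmd=data (network call, not executed here)."""
    response = requests.get(
        "https://openlibrary.org/api/books",
        params={"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"},
        timeout=10,
    )
    response.raise_for_status()
    return parse_details(response.json(), isbn)


# Hand-written payload imitating the assumed response shape.
fake_payload = {
    "ISBN:0000000000": {
        "subjects": [{"name": "Science"}, {"name": "Humor"}],
        "cover": {"medium": "https://example.org/cover-M.jpg"},
    }
}

details = parse_details(fake_payload, "0000000000")
print(details)
```

Keeping the parsing separate from the HTTP call makes it possible to check the extraction logic without hitting the API.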

In [29]:
for index, row in df.head(15).iterrows():

    # Skip rows already enriched, so re-running the cell doesn't repeat API calls
    if row['thumbnail'] != '':
        print(f"{index}. Skipping. Exists")
        continue

    df.loc[index, 'thumbnail'] = get_info(row['isbn'])
df.head()

0. Skipping. Exists
1. Skipping. Exists
2. Skipping. Exists
3. Skipping. Exists
4. Skipping. Exists
5. Skipping. Exists
6. Skipping. Exists
7. Skipping. Exists
8. Skipping. Exists
9. Skipping. Exists
10. Skipping. Exists
11. Skipping. Exists
12. Skipping. Exists
13. Skipping. Exists
14. Skipping. Exists


Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count,thumbnail
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,0439785960,9780439785969,eng,652,1944099,26249,https://covers.openlibrary.org/b/id/14860369-S...
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,0439358078,9780439358071,eng,870,1996446,27613,https://covers.openlibrary.org/b/id/14656833-S...
2,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,0439554934,9780439554930,eng,320,5629932,70390,https://covers.openlibrary.org/b/id/7572543-S.jpg
3,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,0439554896,9780439554893,eng,352,6267,272,https://covers.openlibrary.org/b/id/10301720-S...
4,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,043965548X,9780439655484,eng,435,2149872,33964,https://covers.openlibrary.org/b/id/8778528-S.jpg
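
Every row prints "Skipping. Exists" here because the cell had already been run once: the loop only calls the API for rows whose `thumbnail` is still empty. A minimal sketch of that rerun-safe pattern, with a stand-in fetch function instead of the real API (`fill_missing_thumbnails`, `fake_fetch` and the two rows are illustrative assumptions):

```python
import pandas as pd


def fill_missing_thumbnails(df: pd.DataFrame, fetch) -> int:
    """Call `fetch(isbn)` only for rows whose thumbnail is still empty.

    Returns the number of calls made, to show how much work a rerun costs.
    """
    calls = 0
    for index, row in df.iterrows():
        if row["thumbnail"] != "":
            continue  # already enriched on a previous run
        df.loc[index, "thumbnail"] = fetch(row["isbn"])
        calls += 1
    return calls


def fake_fetch(isbn: str) -> str:
    """Stand-in for get_info, so nothing here touches the network."""
    return f"https://example.org/{isbn}-S.jpg"


# Made-up rows with an empty thumbnail column.
books = pd.DataFrame({"isbn": ["111", "222"], "thumbnail": ["", ""]})

first_run = fill_missing_thumbnails(books, fake_fetch)
second_run = fill_missing_thumbnails(books, fake_fetch)
print(first_run, second_run)
```

The second run makes no calls at all, which is what makes it cheap to resume an interrupted enrichment.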


## Calling the API with multiple ISBNs at a time

In [35]:
",".join( f'ISBN:{isbn}' for isbn in ['A','B','C'])

'ISBN:A,ISBN:B,ISBN:C'

In [52]:
def get_info_multiple(isbn_list):
    """Return a {isbn: thumbnail_url} dict for several ISBNs with a single API call."""
    bibkeys = ",".join(f'ISBN:{isbn}' for isbn in isbn_list)
    response = requests.get("https://openlibrary.org/api/books", params={
        'bibkeys': bibkeys,
        'format': 'json'
    })

    print(f'Pulling URL: {response.url}')
    data = response.json()

    # Return an empty dict (not a string) so callers can always loop over .items()
    if not data:
        return {}

    out = {}
    for key, book in data.items():
        # Keys look like 'ISBN:0451526538': strip the prefix to get the bare ISBN
        out[key.replace("ISBN:", "")] = book.get('thumbnail_url', 'not found')
    return out

get_info_multiple(['0451526538', '0439785960'])

Pulling URL: https://openlibrary.org/api/books?bibkeys=ISBN%3A0451526538%2CISBN%3A0439785960&format=json


{'0451526538': 'https://covers.openlibrary.org/b/id/11403183-S.jpg',
 '0439785960': 'https://covers.openlibrary.org/b/id/14860369-S.jpg'}
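
A batch response is keyed by `ISBN:<isbn>`, so an ISBN that is absent from the response never shows up in the result. The sketch below iterates over the requested ISBNs instead, so every one of them gets a value. It assumes unknown ISBNs are simply left out of the response; `thumbnails_for` and the payload are hypothetical.

```python
def thumbnails_for(isbn_list, data, default="not found"):
    """Map every requested ISBN to its thumbnail URL, or `default` if absent."""
    return {
        isbn: data.get(f"ISBN:{isbn}", {}).get("thumbnail_url", default)
        for isbn in isbn_list
    }


# Hand-written payload: one book with a cover, one without, one missing entirely.
fake_data = {
    "ISBN:111": {"thumbnail_url": "https://example.org/111-S.jpg"},
    "ISBN:222": {},
}

result = thumbnails_for(["111", "222", "333"], fake_data)
print(result)
```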

In [42]:
df = df.head(100).copy()  # work on an independent copy of the first 100 rows

In [45]:
df.set_index('isbn', inplace=True)
df.head()

isbn,bookID,title,authors,average_rating,isbn13,language_code,# num_pages,ratings_count,text_reviews_count,thumbnail
0439785960,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,1944099,26249,https://covers.openlibrary.org/b/id/14860369-S...
0439358078,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,1996446,27613,https://covers.openlibrary.org/b/id/14656833-S...
0439554934,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,5629932,70390,https://covers.openlibrary.org/b/id/7572543-S.jpg
0439554896,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,9780439554893,eng,352,6267,272,https://covers.openlibrary.org/b/id/10301720-S...
043965548X,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,9780439655484,eng,435,2149872,33964,https://covers.openlibrary.org/b/id/8778528-S.jpg


In [54]:
# Split the ISBNs (the index) into 5 batches of 20 and call the API once per batch
for isbn_batch in np.array_split(df.index.to_numpy(), 5):
    infos = get_info_multiple(isbn_batch)
    for isbn, thumbnail in infos.items():
        df.loc[isbn, 'thumbnail'] = thumbnail

Pulling URL: https://openlibrary.org/api/books?bibkeys=ISBN%3A0439785960%2CISBN%3A0439358078%2CISBN%3A0439554934%2CISBN%3A0439554896%2CISBN%3A043965548X%2CISBN%3A0439682584%2CISBN%3A0976540606%2CISBN%3A0439827604%2CISBN%3A0517226952%2CISBN%3A0345453743%2CISBN%3A1400052920%2CISBN%3A0739322206%2CISBN%3A0517149257%2CISBN%3A076790818X%2CISBN%3A0767915062%2CISBN%3A0767910435%2CISBN%3A0767903862%2CISBN%3A076790382X%2CISBN%3A0060920084%2CISBN%3A0380713802&format=json
Pulling URL: https://openlibrary.org/api/books?bibkeys=ISBN%3A0380727501%2CISBN%3A0380715430%2CISBN%3A0345538374%2CISBN%3A0618517650%2CISBN%3A0618346244%2CISBN%3A0618346252%2CISBN%3A0618260587%2CISBN%3A0618391002%2CISBN%3A0618510826%2CISBN%3A0618153977%2CISBN%3A193337201X%2CISBN%3A097669400X%2CISBN%3A0689840926%2CISBN%3A1557344493%2CISBN%3A0385326505%2CISBN%3A1575606240%2CISBN%3A1595580271%2CISBN%3A1595962808%2CISBN%3A0670059676%2CISBN%3A0141312629&format=json
Pulling URL: https://openlibrary.org/api/books?bibkeys=ISBN%3A05953218
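
`np.array_split` fixes the number of batches, so each batch (and each URL) grows with the DataFrame. An alternative is to fix the batch size instead, which keeps the URL length bounded however many rows there are. A minimal sketch, where `chunked` is a hypothetical helper and the letters stand in for ISBNs:

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Five placeholder values split into batches of two: the last batch is shorter.
batches = list(chunked(["a", "b", "c", "d", "e"], 2))
print(batches)
```

With the livecode's DataFrame, this could be used as `for isbn_batch in chunked(df.index, 20): ...`, calling `get_info_multiple` once per batch.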

In [55]:
df

isbn,bookID,title,authors,average_rating,isbn13,language_code,# num_pages,ratings_count,text_reviews_count,thumbnail
0439785960,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,9780439785969,eng,652,1944099,26249,https://covers.openlibrary.org/b/id/14860369-S...
0439358078,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,9780439358071,eng,870,1996446,27613,https://covers.openlibrary.org/b/id/14656833-S...
0439554934,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,9780439554930,eng,320,5629932,70390,https://covers.openlibrary.org/b/id/7572543-S.jpg
0439554896,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,9780439554893,eng,352,6267,272,https://covers.openlibrary.org/b/id/10301720-S...
043965548X,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,4.55,9780439655484,eng,435,2149872,33964,https://covers.openlibrary.org/b/id/8778528-S.jpg
...,...,...,...,...,...,...,...,...,...,...
0451528611,152,Anna Karenina,Leo Tolstoy-David Magarshack-Priscilla Meyer,4.04,9780451528612,eng,960,108970,5694,https://covers.openlibrary.org/b/id/295745-S.jpg
0140449175,153,Anna Karenina,Leo Tolstoy-Richard Pevear-Larissa Volokhonsky...,4.04,9780140449174,eng,837,2835,300,not found
0822001837,154,CliffsNotes on Tolstoy's Anna Karenina,Marianne Sturman-Leo Tolstoy,3.89,9780822001836,eng,80,15,3,not found
1593080271,155,Anna Karenina,Leo Tolstoy-Amy Mandelker-Constance Garnett,4.04,9781593080273,eng,803,9362,710,https://covers.openlibrary.org/b/id/869620-S.jpg
