# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books)

The goal of this livecode is to load the data from the CSV, then loop over its rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [1]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1509k  100 1509k    0     0  4691k      0 --:--:-- --:--:-- --:--:-- 4687k


In [2]:
!ls -lh

total 1.5M
-rw-r--r-- 1 victor victor 1.5M Mar 10 21:53 books.csv
-rw-r--r-- 1 victor victor  15K Mar 10 21:52 Recap.ipynb
-rw-r--r-- 1 victor victor 7.2K Mar 10 21:41 Recap_Solution.ipynb


Then import the usual suspects!

In [3]:
# your turn!
import requests
import pandas as pd
import numpy as np

In [4]:
file = 'books.csv'
books_df = pd.read_csv(file, delimiter=",")
books_df = books_df.drop(columns=['bookID', 'isbn', 'average_rating', 'language_code', 'ratings_count', 'text_reviews_count'])

books_df

,title,authors,isbn13,# num_pages
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,9780439554893,352
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,9780439655484,435
...,...,...,...,...
13714,M Is for Magic,Neil Gaiman-Teddy Kristiansen,9780061186424,260
13715,Black Orchid,Neil Gaiman-Dave McKean,9780930289553,160
13716,InterWorld (InterWorld #1),Neil Gaiman-Michael Reaves,9780061238963,239
13717,The Faeries' Oracle,Brian Froud-Jessica Macbeth,9780743201117,224


In [5]:
books_df.dtypes

title          object
authors        object
isbn13          int64
# num_pages     int64
dtype: object

In [6]:
books_df['cover_url'] = None
books_df.head()

,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,9780439554893,352,
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,9780439655484,435,


In [7]:
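def fetch_book(isbn):
    url = 'https://openlibrary.org/api/books'
    key = f'ISBN:{isbn}'

    # Ask for the detailed record of a single ISBN; the JSON maps each bibkey to its book
    response = requests.get(url, params={'bibkeys': key, 'format': 'json', 'jscmd': 'data'}).json()

    # The key is absent when Open Library has no record for this ISBN, so this returns None
    return response.get(key)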

In [8]:
%%time

for index, row in books_df.head(10).iterrows():
    if row['cover_url'] is None:
        isbn = row['isbn13']
        print(f"Fetching cover for {row['title']}")
        
        book = fetch_book(isbn)
        
        if book:
            cover_url = book.get('cover', {}).get('large', '')
            books_df.loc[index, 'cover_url'] = cover_url
        else:
            books_df.loc[index, 'cover_url'] = ''

Fetching cover for Harry Potter and the Half-Blood Prince (Harry Potter  #6)
Fetching cover for Harry Potter and the Order of the Phoenix (Harry Potter  #5)
Fetching cover for Harry Potter and the Sorcerer's Stone (Harry Potter  #1)
Fetching cover for Harry Potter and the Chamber of Secrets (Harry Potter  #2)
Fetching cover for Harry Potter and the Prisoner of Azkaban (Harry Potter  #3)
Fetching cover for Harry Potter Boxed Set  Books 1-5 (Harry Potter  #1-5)
Fetching cover for Unauthorized Harry Potter Book Seven News: "Half-Blood Prince" Analysis and Speculation
Fetching cover for Harry Potter Collection (Harry Potter  #1-6)
Fetching cover for The Ultimate Hitchhiker's Guide: Five Complete Novels and One Story (Hitchhiker's Guide to the Galaxy  #1-5)
Fetching cover for The Ultimate Hitchhiker's Guide to the Galaxy
CPU times: user 399 ms, sys: 22.2 ms, total: 421 ms
Wall time: 6.22 s


In [10]:
books_df.head()

,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,https://covers.openlibrary.org/b/id/12025650-L...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,9780439554893,352,https://covers.openlibrary.org/b/id/10301720-L...
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,9780439655484,435,https://covers.openlibrary.org/b/id/8778528-L.jpg


In [11]:
isbns = [9780439785969, 9780439358071, 9780439554930]
[f"ISBN:{isbn}" for isbn in isbns]

['ISBN:9780439785969', 'ISBN:9780439358071', 'ISBN:9780439554930']

In [15]:
def fetch_books(isbns):
    url = "https://openlibrary.org/api/books"
    bibkeys = ",".join([f"ISBN:{isbn}" for isbn in isbns])

    response = requests.get(url, params={'bibkeys': bibkeys,'format': 'json','jscmd': 'data'}).json()
    
    return response
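
`fetch_books` sends several ISBNs in a single request by joining them into one `bibkeys` value. As a minimal sketch of that idea without any network call, the snippet below builds the `bibkeys` string and splits a list of ISBNs into fixed-size batches in plain Python. The helper names `make_bibkeys` and `batches` are hypothetical and not part of the original notebook.

```python
def make_bibkeys(isbns):
    """Join ISBNs into the comma-separated `bibkeys` value used by fetch_books."""
    return ",".join(f"ISBN:{isbn}" for isbn in isbns)


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


isbns = [9780439785969, 9780439358071, 9780439554930]
for batch in batches(isbns, 2):
    # One line per request that would be sent
    print(make_bibkeys(batch))
```

With a batch size of 2, the three ISBNs end up in two groups, so two requests would be needed instead of three.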

In [16]:
books_df.set_index("isbn13", inplace=True)

In [17]:
books_df.head()

isbn13,title,authors,# num_pages,cover_url
9780439785969,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
9780439358071,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,870,https://covers.openlibrary.org/b/id/12025650-L...
9780439554930,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
9780439554893,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,352,https://covers.openlibrary.org/b/id/10301720-L...
9780439655484,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,435,https://covers.openlibrary.org/b/id/8778528-L.jpg


In [19]:
%%time

from tqdm import tqdm

for group in tqdm(np.array_split(books_df.head(100), 5)):
    books = fetch_books(list(group.index))
    
    for isbn_code, book in books.items():
        # Drop the "ISBN:" prefix; str.strip removes a set of characters, not a prefix
        isbn = int(isbn_code.split(":", 1)[1])
        books_df.loc[isbn, "cover_url"] = book.get("cover", {}).get("large", "")

100%|█████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.04s/it]

CPU times: user 239 ms, sys: 8.32 ms, total: 248 ms
Wall time: 10.2 s
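
The goal at the top also lists subjects, which the loops above do not extract yet. Here is a minimal sketch of how that could look, assuming each record returned with `jscmd=data` carries a `subjects` list of dictionaries with a `name` key (worth checking against a real response). The helper `extract_details` is hypothetical, and `sample_book` is a hand-written record for illustration, not real API output.

```python
def extract_details(book):
    """Pull the large cover URL and the subject names out of one book record."""
    cover_url = book.get("cover", {}).get("large", "")
    # Assumption: subjects look like [{"name": "...", "url": "..."}, ...]
    subjects = [s["name"] for s in book.get("subjects", []) if "name" in s]
    return {"cover_url": cover_url, "subjects": subjects}


# Hand-written record for illustration only; the subject names are made up
sample_book = {
    "cover": {"large": "https://covers.openlibrary.org/b/id/9326654-L.jpg"},
    "subjects": [{"name": "Fantasy"}, {"name": "Magic"}],
}

details = extract_details(sample_book)
print(details["subjects"])
```

Inside the batch loop, something like `", ".join(details["subjects"])` could then be written to a new column next to `cover_url`, since a flat string is easier to store in a DataFrame cell than a list.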



