# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books)

The goal of this livecode is to load the data from a CSV, then loop over the rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [2]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

100 1509k  100 1509k    0     0  1194k      0  0:00:01  0:00:01 --:--:-- 1194k


In [3]:
!ls -lh

total 1.5M
-rw-r--r-- 1 branchedelac branchedelac 1.5M Jan 23 16:43 books.csv
-rw-r--r-- 1 branchedelac branchedelac  579 Nov 29  2022 README.md
-rw-r--r-- 1 branchedelac branchedelac 2.8K Nov 29  2022 Recap.ipynb


Then import the usual suspects!

In [4]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

In [5]:
books = pd.read_csv("books.csv")
books.info()  # info() prints its summary and returns None, so wrapping it in display() would only add a stray "None"
display(books.isnull().sum())
books.head(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13719 entries, 0 to 13718
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   bookID              13719 non-null  int64  
 1   title               13719 non-null  object 
 2   authors             13719 non-null  object 
 3   average_rating      13719 non-null  float64
 4   isbn                13719 non-null  object 
 5   isbn13              13719 non-null  int64  
 6   language_code       13719 non-null  object 
 7   # num_pages         13719 non-null  int64  
 8   ratings_count       13719 non-null  int64  
 9   text_reviews_count  13719 non-null  int64  
dtypes: float64(1), int64(5), object(4)
memory usage: 1.0+ MB


bookID                0
title                 0
authors               0
average_rating        0
isbn                  0
isbn13                0
language_code         0
# num_pages           0
ratings_count         0
text_reviews_count    0
dtype: int64

|   | bookID | title | authors | average_rating | isbn | isbn13 | language_code | # num_pages | ratings_count | text_reviews_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 4.56 | 439785960 | 9780439785969 | eng | 652 | 1944099 | 26249 |
| 1 | 2 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 4.49 | 439358078 | 9780439358071 | eng | 870 | 1996446 | 27613 |


Before adding new columns, let's check that every ISBN is unique and take a quick look at the data:

In [6]:
books["isbn"].nunique()

13719

In [7]:
books["language_code"].value_counts().head(20)

eng      10598
en-US     1700
spa        419
en-GB      341
ger        238
fre        209
jpn         64
por         27
mul         21
ita         19
zho         16
grc         12
en-CA        9
rus          7
nl           7
swe          6
glg          4
enm          3
lat          3
tur          3
Name: language_code, dtype: int64

In [12]:
books[books["# num_pages"] == 6576]

|   | bookID | title | authors | average_rating | isbn | isbn13 | language_code | # num_pages | ratings_count | text_reviews_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 7824 | 24520 | The Complete Aubrey/Maturin Novels (5 Volumes) | Patrick O'Brian | 4.7 | 039306011X | 9780393060119 | eng | 6576 | 1287 | 82 |


In [21]:
books[["authors", "# num_pages"]].groupby("authors").mean().sort_values("# num_pages", ascending=False).head(20)

| authors | # num_pages |
|---|---|
| Winston S. Churchill-John Keegan | 4736.0 |
| Marcel Proust-C.K. Scott Moncrieff-Andreas Mayor-Terence Kilmartin-D.J. Enright-Richard Howard | 4211.0 |
| M.H. Abrams-Stephen Greenblatt-James Noggle-James Simpson-Jon Stallworthy-Jack Stillinger-Carol T. Christ-Lawrence Lipking-Jahan Ramazani-George M. Logan-Alfred David-Katharine Eisaman Maus-Barbara Kiefer Lewalski-Deidre Shauna Lynch-Catherine Robson | 3956.0 |
| Marcel Proust-C.K. Scott Moncrieff-Frederick A. Blossom-Joseph Wood Crutch | 3400.0 |
| M.H. Abrams-Stephen Greenblatt-James Simpson-Jon Stallworthy-Katharine Eisaman Maus-Jack Stillinger-Barbara Kiefer Lewalski-Lawrence Lipking-Jahan Ramazani-Alfred David-Carol T. Christ-Deidre Shauna Lynch-Catherine Robson-James Noggle-George M. Logan | 3072.0 |
| M.H. Abrams-Stephen Greenblatt | 3072.0 |
| Judith Tanka-Max Franklin-Arnold Krupat-Philip F. Gura-Jerome Klinkowitz-Ronald Gottesman | 2930.0 |
| Dennis L. Kasper-Dan L. Longo-Stephen L. Hauser-Anthony S. Fauci-Eugene Braunwald | 2751.0 |
| Sarah N. Lawall-Heather James-William G. Thalmann-Patricia Meyer Spacks-Lee Patterson | 2704.0 |
| Vincent B. Leitch-William E. Cain-Laurie A. Finke-John P. McGowan-Barbara Johnson-Jeffrey L. Williams | 2662.0 |


## API - Open Library

In [9]:
# YOUR CODE HERE
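
One possible way to fill in this cell, as a minimal sketch rather than the official solution. It assumes the Books API is queried with `jscmd=data`, which returns a dictionary keyed by `ISBN:<isbn>` whose entries may contain a `subjects` list and a `cover` dictionary. The helper names `parse_book` and `fetch_book`, the choice of the `medium` cover size, and the 10-second timeout are all assumptions made for illustration.

```python
import requests

API_URL = "https://openlibrary.org/api/books"


def parse_book(payload, isbn):
    """Extract subjects and a cover URL for one ISBN from a jscmd=data response.

    Missing books or missing fields yield an empty list / None instead of an error.
    """
    book = payload.get(f"ISBN:{isbn}", {})
    return {
        "subjects": [subject["name"] for subject in book.get("subjects", [])],
        "cover_url": book.get("cover", {}).get("medium"),
    }


def fetch_book(isbn, timeout=10):
    """Call the Open Library Books API for a single ISBN (hypothetical helper)."""
    params = {"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"}
    response = requests.get(API_URL, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_book(response.json(), isbn)
```

Calling `fetch_book(books.loc[0, "isbn"])` would then return a dictionary with `subjects` and `cover_url` for the first row. The ISBN should be passed as a string so that leading zeros and a trailing `X` survive.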

## Calling the API with multiple ISBNs at a time

In [10]:
# YOUR CODE HERE
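
A hedged sketch for this cell: the Books API accepts several comma-separated keys in `bibkeys`, so one request can cover a whole batch of rows instead of one row at a time. The helper names (`chunked`, `build_bibkeys`, `fetch_many`) and the batch size of 50 are assumptions chosen to keep URLs short, not documented limits.

```python
import requests

API_URL = "https://openlibrary.org/api/books"


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_bibkeys(isbns):
    """Format ISBNs as the comma-separated `bibkeys` value the API expects."""
    return ",".join(f"ISBN:{isbn}" for isbn in isbns)


def fetch_many(isbns, batch_size=50, timeout=10):
    """Return {isbn: raw book dict} for every ISBN found by the API.

    ISBNs unknown to Open Library are simply absent from the result.
    """
    found = {}
    for batch in chunked(list(isbns), batch_size):
        params = {"bibkeys": build_bibkeys(batch), "format": "json", "jscmd": "data"}
        response = requests.get(API_URL, params=params, timeout=timeout)
        response.raise_for_status()
        for key, book in response.json().items():
            # Keys come back as "ISBN:<isbn>"; strip the prefix to match the CSV column.
            found[key.split(":", 1)[1]] = book
    return found
```

With the result stored as `details = fetch_many(books["isbn"].head(100))`, a new column could be built with something like `books["isbn"].map(lambda isbn: details.get(isbn, {}).get("cover", {}).get("medium"))`. Starting with a small slice of the 13,719 rows keeps the number of requests reasonable while testing.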