# Data on bestselling books via the Times API 
The New York Times makes its data available via API calls. The instructions we will be following in this notebook can be found in their developer portal, which they call [The New York Times Developer Network](https://developer.nytimes.com/). Its slogan is,

> "All the APIs Fit to POST"

Before proceeding further in this notebook, follow the instructions in the [getting started](https://developer.nytimes.com/get-started) section of the developer network linked above.

## Getting started
After you sign in, you will need to create an "App". You are not actually creating an application here; rather, you are creating an API key that you would use in an application. Since we will use the API key in a Jupyter notebook, the "App" in this case is this notebook.

For example, I named mine `notebook` and described it as a `JupyterNotebook`.

![My Apps](https://storage.googleapis.com/jacobson-dsc-bucket1/TimesAPINotebook/Screenshot%202020-03-09%20at%2011.56.20%20AM%20-%20Display%201.png)

### Which APIs?
Next, activate at least one API. I suggest you activate all of them, since we will want to experiment with them all. Use the toggles to activate an API.

![Activate APIs](https://storage.googleapis.com/jacobson-dsc-bucket1/TimesAPINotebook/Screenshot%202020-03-09%20at%2012.06.24%20PM%20-%20Display%201.png)

### API key and `getpass`
Below, we use a Python module called `getpass`, which is part of the standard library. On the first line of the code cell below, `import getpass` makes the functions in that module available in our notebook.

Then, we use the function `getpass()` by referring to where it is located with `getpass.getpass()` and storing the output of that function in a variable we are calling `APIKEY`.

##### Why are we doing this?
If you paste your key into this notebook as a variable directly, for instance with a line of code like

`APIKEY = "A3sw3H67rDhN"`

and then save your notebook in your GitHub repo, or anywhere publicly available, web-scraping bots will find the key and can use it for malicious purposes. Of course, with this particular API, the worst that can happen is that your quota of free API calls to the NYT is used up. However, if you had a paid account, that could be a more serious issue.

The solution is to copy the API key, then execute the cell below and paste the key into the input box that appears. In this way, you can save your notebook without the key being saved with it.

In [None]:
import getpass
APIKEY = getpass.getpass()

Now, if you type in a code cell `APIKEY` and execute the cell, you should see as output the *Key* that you copied from the NYT developer network. Try it in the cell below. Then delete the output as you don't want the key to be saved in the notebook.

In [None]:
APIKEY
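If you would rather confirm that the key was captured without ever displaying it in full, one option is to show only its last few characters. The following is a minimal sketch: `mask_key` is a hypothetical helper name (it is not part of `getpass`), and the key used here is made up.

```python
def mask_key(key, visible=4):
    """Return the key with all but the last `visible` characters replaced by '*'."""
    if len(key) <= visible:
        # Very short strings are hidden entirely so nothing leaks.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Made-up key for illustration only; in the notebook you would pass APIKEY instead.
example_key = "A3sw3H67rDhN"
masked = mask_key(example_key)
print(masked)
```

Calling `mask_key(APIKEY)` in a cell lets you check the ending of the key against the developer network page, and the output is safe to leave in a saved notebook.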

## Making API calls using the command line
Recall that in the Qwiklab [Introduction to APIs in Google](https://www.qwiklabs.com/focuses/3473?parent=catalog), we used the command line and `curl` to make API requests.

### Books review service API
Let's start by making a GET request.

Note, any command that you used in the GCP cloud shell, you can use in a code cell here. Indeed, in a Jupyter Notebook, you execute shell commands by placing them in a code cell with `!` in front of the command. 

The first example demonstrates a simple curl command that makes a GET request for the current bestsellers in the hardcover fiction category.

In [None]:
!curl --request GET "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=$APIKEY"
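For comparison, the same GET request can be sketched in Python using only the standard library. This is an illustration rather than the method this notebook relies on: the function names `build_url` and `fetch_bestsellers` are hypothetical, and `"DEMO"` is a placeholder, not a working key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Same endpoint as the curl command above.
BASE_URL = "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json"


def build_url(api_key):
    """Attach the API key to the endpoint as a query-string parameter."""
    return BASE_URL + "?" + urlencode({"api-key": api_key})


def fetch_bestsellers(api_key):
    """Send the GET request and parse the JSON body into a Python dict."""
    with urlopen(build_url(api_key)) as response:
        return json.load(response)


# Building the URL needs no network access; "DEMO" is a placeholder key.
print(build_url("DEMO"))
```

In the notebook, `data = fetch_bestsellers(APIKEY)` would perform the actual request and return the parsed response.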

### Viewing the JSON
There are Python based tools for viewing the JSON, but first let's explore our request using a [browser-based viewer](https://jsonformatter.curiousconcept.com/).

Copy and paste the output above into the viewer.

Recall that a JSON consists of key-value pairs. What are some of the keys and their values? Under what key do you find the actual information on the bestselling books?
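Once you have explored the output in the viewer, you can check your answers in Python. The sketch below parses a made-up miniature response with the standard `json` module and lists its keys. The key names `num_results`, `results`, `books`, and `weeks_on_list` are the ones used in this notebook; the `title` field and all values are assumptions for illustration, and the real response contains many more keys.

```python
import json

# Made-up miniature response: the values (and the "title" field) are invented.
sample_text = """
{
  "num_results": 2,
  "results": {
    "books": [
      {"title": "EXAMPLE BOOK ONE", "weeks_on_list": 3},
      {"title": "EXAMPLE BOOK TWO", "weeks_on_list": 5}
    ]
  }
}
"""

# json.loads turns JSON text into nested Python dicts and lists.
data = json.loads(sample_text)

# Top-level keys, then the keys nested one level down under "results".
top_level_keys = list(data.keys())
nested_keys = list(data["results"].keys())
print(top_level_keys)
print(nested_keys)
```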

### Saving and filtering the results of a request
The output is in JSON format, but it is really hard to read. Let's save it to a file. Then we will use another tool to filter the JSON file down to the information we are interested in.
#### Saving output
To save the output, we add another argument to the curl command, `-o`, and then after it, we write a name for the file. Below you can see that I chose the name `current-hardcover-fiction.json` so that the filename will remind me of what I put there.

After executing the cell below, the output of our request will be stored in that file.


In [None]:
!curl --request GET -o current-hardcover-fiction.json "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=$APIKEY"

Now let's check and see with the `ls` command that a new json file exists. Note, `ls` is short for *list* as it lists all files and folders in your current folder of the notebook's machine.

In [None]:
!ls

## Viewing and filtering the JSON with `jq`
What is [`jq`](https://stedolan.github.io/jq/)? It is a command-line tool for slicing, filtering, and transforming JSON data.

First we install it with the cell below.

In [None]:
!sudo apt-get install -y jq

Next, we *redirect* our file to `jq` with the `<` symbol.  

In [None]:
!jq < current-hardcover-fiction.json

### Filtering using `jq`
To filter based on key, use `jq '.key'`, where `.key` is one of the keys from the JSON file, and `jq` will return the corresponding values in the JSON. For example, we filter on the key `num_results` in the cell below.

In [None]:
!jq '.num_results' < current-hardcover-fiction.json

Next, let's look at the values for the key, `results`.

In [None]:
!jq '.results' < current-hardcover-fiction.json

Let's dig deeper and look at the values for the key `books`. Since `books` is a key that is within the values of the key `results`, we will *pipe*, using the `|` symbol, the output of the `.results` filter to the `.books` filter. 

In [None]:
!jq '.results | .books' < current-hardcover-fiction.json
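It can help to see how this chain of filters maps onto ordinary dictionary indexing in Python. The sketch below uses a made-up miniature response: the key names come from this notebook, while the `title` field and all values are assumptions for illustration.

```python
# Made-up miniature response using the key names from this notebook.
data = {
    "num_results": 2,
    "results": {
        "books": [
            {"title": "EXAMPLE BOOK ONE", "weeks_on_list": 3},
            {"title": "EXAMPLE BOOK TWO", "weeks_on_list": 5},
        ]
    },
}

# jq: .results          -> Python: data["results"]
# jq: .results | .books -> Python: data["results"]["books"]
books = data["results"]["books"]
print(len(books))
print(books[0])
```

Each `|` in the `jq` filter corresponds to one more level of indexing in Python. `jq` also accepts the shorter path form `.results.books`, which is equivalent to piping the two filters.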

Let's get a really precise answer to the following question using this API.

## How long do books stay on the bestsellers list?
As we saw above, we can extract the values for the key `weeks_on_list` from the values in books.
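As a rough illustration of that extraction, the sketch below pulls `weeks_on_list` out of a list of book records and summarizes it. The records are made up (including the `title` field); they only stand in for the list found under `.results | .books`.

```python
from statistics import mean

# Made-up records standing in for the list found under .results | .books.
books = [
    {"title": "EXAMPLE BOOK ONE", "weeks_on_list": 3},
    {"title": "EXAMPLE BOOK TWO", "weeks_on_list": 5},
    {"title": "EXAMPLE BOOK THREE", "weeks_on_list": 1},
]

# Pull out one value per book, then summarize.
weeks = [book["weeks_on_list"] for book in books]
print(weeks)
print(mean(weeks), max(weeks))
```

On the command line, the corresponding `jq` filter would be `.results | .books | .[] | .weeks_on_list`, where `.[]` iterates over the entries of the `books` array.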

Next, we will arrange these values into a csv file and answer some questions using spreadsheets.

In [None]:
!jq '.results | .books' < current-hardcover-fiction.json > bestsellers.json

What was going on in the cell above? There are two redirections. The first we have seen before: `<` feeds the file to `jq`. The second, `>`, redirects the JSON output from `jq '.results | .books' < current-hardcover-fiction.json` to a new file that I have chosen to call `bestsellers.json`. Let's check that it was created by listing the files and directories.

In [None]:
!ls

# From JSON to csv
For working with structured data in notebooks, the most popular and full-featured package is `pandas`. Some say that the name is short for *panel data*, which you might have heard of in an econometrics class, and others say it's the favorite animal of [Wes McKinney](https://en.wikipedia.org/wiki/Wes_McKinney), its creator. All you need to know (at least for the moment) is that it can transform our JSON into a csv file.

First we import the pandas package. It is a common convention to import it under the *alias* `pd` so that you do not need to type pandas over and over again when referring back to the package name. 

In [None]:
import pandas as pd

Then, we use the `read_json()` function in pandas to transform our filtered JSON into a *dataframe*.

In [None]:
pd.read_json('bestsellers.json')

Great, now what?

First, let's save the dataframe as a Python variable, which we will call `df`, so that we can use it for more purposes than just viewing.

In [None]:
df = pd.read_json('bestsellers.json')



A dataframe consists of a *header*, where you find the names of the *columns*, *rows* where you find the values in those columns, and an *index* where you can find the row number. So this includes the information you find in a csv. Indeed, we can write this to a .csv file with the following pandas function.

In [None]:
df.to_csv('bestsellers.csv')
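By default, `to_csv` also writes the index as an unnamed first column. If you do not want the row numbers in your spreadsheet, you can pass `index=False`. The sketch below shows this on a small made-up dataframe (the records, including the `title` column, are assumptions for illustration) and also computes the average of `weeks_on_list`.

```python
import pandas as pd

# Made-up records with the same shape as the entries in bestsellers.json.
records = [
    {"title": "EXAMPLE BOOK ONE", "weeks_on_list": 3},
    {"title": "EXAMPLE BOOK TWO", "weeks_on_list": 5},
]
df_example = pd.DataFrame(records)

# With no path given, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the row-number column.
csv_text = df_example.to_csv(index=False)
print(csv_text)

# Average number of weeks on the list across the example rows.
average_weeks = df_example["weeks_on_list"].mean()
print(average_weeks)
```

Applied to the notebook's own dataframe, the equivalent call would be `df.to_csv('bestsellers.csv', index=False)`.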

Let's check that the file was created. 

In [None]:
!ls