In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("assignment4.ipynb")

# Assignment 4: Accessing JSON Data with APIs

Many data providers publish application programming interfaces (APIs) that let users access their data. In this assignment, we will use the [Addresses in the City of Los Angeles](https://data.lacity.org/City-Infrastructure-Service-Requests/Addresses-in-the-City-of-Los-Angeles/4ca8-mxuh/about_data) API.

## API Key

Many API providers want to know who is accessing their data, so applications that access a service programmatically identify themselves with API authentication credentials. The name for these credentials varies between providers: e.g., app tokens, API keys, etc.

The [LA addresses API documentation](https://dev.socrata.com/foundry/data.lacity.org/4ca8-mxuh) links to the page where credentials are created. Click on "Sign up for an app token!" to be forwarded to a page where you can sign up and [Create New App Token](https://data.lacity.org/profile/edit/developer_settings). _Make sure to create an App Token, not an API Key (an ID/Secret pair)._

Save your App Token. _Warning: app tokens and API keys are like passwords. In practice, they should be kept private and never published, e.g., on a public GitHub repository. However, in this assignment, we will check whether your token works._
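One common way to keep a token out of a notebook you might share is to read it from an environment variable. The sketch below assumes a hypothetical variable named `LA_APP_TOKEN`, set in the shell before Jupyter starts (e.g. `export LA_APP_TOKEN=...`); it illustrates the practice and is not something this assignment requires.

```python
import os


def load_app_token(var_name: str = "LA_APP_TOKEN") -> str:
    """Return the app token stored in the environment variable `var_name`.

    LA_APP_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail loudly instead of silently sending an empty token to the API
        raise RuntimeError(f"Set the {var_name} environment variable to your app token.")
    return token
```

Because the token lives outside the notebook file, committing the notebook to a repository does not expose it.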

To check whether your token works, construct a query. Consider the following query, which uses a placeholder app token, `##DEMO_KEY##`:
```
https://data.lacity.org/resource/4ca8-mxuh.json?$$app_token=##DEMO_KEY##&hse_id=527545
```
Replace `##DEMO_KEY##` with _your app token_ and paste the URL into a browser window. You should get output similar to the following:
```
[{"hse_id":"527545","pin":"117B193    67","pind":"117B193-67","hse_nbr":"1556","hse_dir_cd":"W","str_nm":"37TH","str_sfx_cd":"ST","zip_cd":"90018","lat":"34.020859999999999","lon":"-118.30483","x_coord_nbr":"6469295.9020999996","y_coord_nbr":"1830113.4445","asgn_stts_ind":"A","eng_dist":"C","cncl_dist":"8"}]
```
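The same query can be assembled in Python with `urllib.parse.urlencode`. This is a minimal sketch, not the notebook's required approach: `build_query_url` is a hypothetical helper, and passing `safe="$"` keeps the `$$app_token` parameter name unencoded so the URL has the same form as the one shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.lacity.org/resource/4ca8-mxuh.json"


def build_query_url(app_token: str, **filters: str) -> str:
    """Build a query URL carrying the app token plus column filters (e.g. hse_id).

    safe="$" keeps "$$app_token" readable instead of encoding it as %24%24app_token.
    """
    params = {"$$app_token": app_token, **filters}
    return BASE_URL + "?" + urlencode(params, safe="$")


print(build_query_url("YOUR_APP_TOKEN", hse_id="527545"))
# https://data.lacity.org/resource/4ca8-mxuh.json?$$app_token=YOUR_APP_TOKEN&hse_id=527545
```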

Below, we will do this programmatically. Assign your app token to `app_token` and modify `url_str` to use `app_token`. The subsequent line makes a query with `url_str` and assigns the returned result to the `check_key` variable.

In [None]:
app_token = ...

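# The backslashes stop the shell from expanding `$$` (its process ID) when `url_str` is passed to `curl` below;
# the raw string (r'...') avoids Python's invalid-escape-sequence warning without changing the value.
url_str = r'https://data.lacity.org/resource/4ca8-mxuh.json?\$\$app_token={}'.format(...)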

# `curl` is used to retrieve from URL in `url_str` with an additional query parameter.
check_key = !curl -s "{url_str}&hse_id=527545" 
check_key

In [None]:
grader.check("my_api_key")

<!-- BEGIN QUESTION -->

## Endpoints

API endpoints are the points of interaction between a client and a server in an API. They are the routes, or URLs, where requests can be made and data can be accessed or manipulated.

Each endpoint corresponds to a specific function in the API. For example, in a RESTful API, you might have the following endpoints:

- `GET /public/collection/v1/search`: returns a listing of all Object IDs for objects that contain the search query within the object’s data.
  ```
  https://collectionapi.metmuseum.org/public/collection/v1/search?q=bouquet+of+sunflowers
  https://collectionapi.metmuseum.org/public/collection/v1/search?q=%22bouquet+of+sunflowers%22
  ```
- `GET /public/collection/v1/objects/[objectID]`: returns a record for an object, containing all open access data about that object, including its image (if the image is available under Open Access).
  ```
  https://collectionapi.metmuseum.org/public/collection/v1/objects/437112
  ```

Each of these endpoints represents a different function of the API, and they each correspond to a different URL where a client can make a request. The type of request (GET, POST, PUT, DELETE, etc.) and the data included in the request will determine what action is taken by the API.

In fact, the two [endpoints](https://metmuseum.github.io#endpoints) above are from [The Metropolitan Museum of Art Collection](https://www.metmuseum.org/art/collection/search) APIs, and the page lists many more. The Met's APIs do not require an app token.
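As a minimal sketch of calling the object endpoint from Python instead of a browser, the hypothetical helpers below build the `objects/[objectID]` URL and decode the JSON response with the standard library's `urllib.request.urlopen` and `json.load`. The fetch needs network access, so the example call is left commented out.

```python
import json
from urllib.request import urlopen

MET_API = "https://collectionapi.metmuseum.org/public/collection/v1"


def object_url(object_id: int) -> str:
    """Return the URL of the `objects/[objectID]` endpoint for one object."""
    return f"{MET_API}/objects/{object_id}"


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request to `url` and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Requires network access:
# record = fetch_json(object_url(437112))
# print(record.get("title"))
```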

## Download and install `jq`

`jq` is a lightweight and flexible command-line JSON processor. It is like `sed` for JSON data: you can use it to slice, filter, map, and transform structured data with the same ease that `sed`, `awk`, `grep`, and friends let you play with text.

Applications such as `jq` are often distributed as a single executable file. `jq` can be downloaded from the project's [GitHub page](https://github.com/jqlang/jq/releases/tag/jq-1.7). Specifically, you will download the binary `jq-linux-amd64`. The `chmod` command can then mark the downloaded file as executable (`x`) for its owner, the user (`u`).
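For reference, `chmod u+x` can be mirrored in Python by adding the owner-execute bit to the file's current mode. The helper name is hypothetical; the shell command in the cell below is what the assignment uses.

```python
import os
import stat


def make_user_executable(path: str) -> None:
    """Python equivalent of `chmod u+x path`: add the owner's execute bit, keep other bits."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR)
```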

Command-line applications are "installed" by placing the executable file in a directory that the shell searches, as listed in the `PATH` environment variable.

The `PATH` environment variable is a system variable that operating systems use to locate executables from the command line or Terminal window. `PATH` is essentially a list of directory paths, and when you type a command to run, the system looks for it in the directories specified by `PATH`.
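The lookup described above can be sketched in a few lines of Python: walk the `PATH` directories in order and return the first matching file that is executable. This is a simplified model for Unix-like systems (the standard library's `shutil.which` handles more edge cases); `find_on_path` is a hypothetical name.

```python
import os


def find_on_path(command: str):
    """Return the first executable file named `command` found in a PATH directory, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, command)
        # Directories earlier in PATH win, which is why the order of entries matters
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints the path to jq, or None if no directory in PATH contains an executable jq
print(find_on_path("jq"))
```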

Fill in the code below to download `jq` and place it in a searchable directory, `/opt/conda/bin`. In other words, replace the following placeholders and remove the leading comment character `#`:
1. `[SEARCHABLE_DIRECTORY]`
1. `[JQ_DOWNLOAD_LINK]`

In [None]:
%%bash
# wget -q [JQ_DOWNLOAD_LINK] -O [SEARCHABLE_DIRECTORY]/jq
# chmod u+x [SEARCHABLE_DIRECTORY]/jq
echo "Location of installed jq: $(which jq)"
echo "Installed jq version: $(jq --version)"

<!-- END QUESTION -->

## Quickstart with `jq`

We will use the following data to illustrate key features of `jq`. Running the following cell will create a file `made_up_file.json`.

In [None]:
%%writefile made_up_file.json
{
  "company": "Big Data Inc.",
  "employees": [
    {
      "firstName": "John",
      "lastName": "Doe",
      "skills": [
        "Python",
        "Java",
        "C++"
      ]
    },
    {
      "firstName": "Anna",
      "lastName": "Smith",
      "skills": [
        "JavaScript",
        "HTML",
        "CSS"
      ]
    },
    {
      "firstName": "Peter",
      "lastName": "Jones",
      "skills": [
        "Python",
        "R",
        "SQL"
      ]
    }
  ]
}

Run the following commands and make sure you understand what each one does:

In [None]:
!jq '.company' made_up_file.json # extract company attribute

In [None]:
!jq '.employees[1]' made_up_file.json # extract second employee

In [None]:
!jq '.employees[].skills[0]' made_up_file.json # extract first skill of each employee

In [None]:
!jq '.employees[] | .firstName' made_up_file.json # using pipe makes some things easier

In [None]:
!jq '.employees[] | { first_name: .firstName, last_name: .lastName }' made_up_file.json # create new object from existing attributes

In [None]:
!jq '.employees[] | {name: (.firstName + " " + .lastName)}' made_up_file.json # create new object from combining existing attributes
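To connect these filters to familiar Python, here is a rough translation of each one using the `json` module on the same data. One difference worth noting: `jq` emits a *stream* of separate values for filters such as `.employees[] | .firstName`, while this sketch collects them into Python lists.

```python
import json

# The same data as made_up_file.json, embedded so this sketch runs on its own
data = json.loads("""
{
  "company": "Big Data Inc.",
  "employees": [
    {"firstName": "John", "lastName": "Doe", "skills": ["Python", "Java", "C++"]},
    {"firstName": "Anna", "lastName": "Smith", "skills": ["JavaScript", "HTML", "CSS"]},
    {"firstName": "Peter", "lastName": "Jones", "skills": ["Python", "R", "SQL"]}
  ]
}
""")
employees = data["employees"]

# jq '.company'
company = data["company"]
# jq '.employees[1]'  (zero-based, so the second employee)
second_employee = employees[1]
# jq '.employees[].skills[0]'
first_skills = [e["skills"][0] for e in employees]
# jq '.employees[] | .firstName'
first_names = [e["firstName"] for e in employees]
# jq '.employees[] | { first_name: .firstName, last_name: .lastName }'
renamed = [{"first_name": e["firstName"], "last_name": e["lastName"]} for e in employees]
# jq '.employees[] | {name: (.firstName + " " + .lastName)}'
full_names = [{"name": e["firstName"] + " " + e["lastName"]} for e in employees]

print(first_names)
```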

## Download using search and object APIs
 
Study the [Met's search API](https://metmuseum.github.io#search). For example, you can search for "Vincent Van Gogh" by pasting this URL into a browser: `https://collectionapi.metmuseum.org/public/collection/v1/search?q="vincent van gogh"`. _Your browser may then display the URL as `https://collectionapi.metmuseum.org/public/collection/v1/search?q=%22vincent%20van%20gogh%22`, in which the quotes and spaces have been replaced by [URL-encoded (percent-encoded) characters](https://en.wikipedia.org/wiki/Percent-encoding#Character_data)._
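Python's `urllib.parse` performs the same encoding. `quote` turns spaces into `%20`, while `urlencode` (which uses `quote_plus` by default) turns them into `+`, the form used in the sunflowers example earlier; servers generally decode both forms the same way in query strings.

```python
from urllib.parse import quote, urlencode

query = '"vincent van gogh"'
print(quote(query))             # %22vincent%20van%20gogh%22
print(urlencode({"q": query}))  # q=%22vincent+van+gogh%22
```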

Study the [Met's object API](https://metmuseum.github.io#object). To retrieve the details of the object with ID `436533`, paste this URL: `https://collectionapi.metmuseum.org/public/collection/v1/objects/436533`

### The `search` endpoint

[Construct a search query](https://metmuseum.github.io/#search) for items that match the query string "van gogh" (with quotes), have images, and are located in Europe.

- Save the URL for this query in a variable named `met_van_gogh`.
- Save the returned result (a list of output lines) in a Python variable named `met_van_gogh_result`.
- Parse the string with the `json` module's `loads()` function, then
- Extract the `objectIDs` element and save it as the variable `met_van_gogh_items`.
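The parsing steps can be sketched on a made-up response. The JSON string below is invented, but it has the shape of a Met search result: a `total` count and an `objectIDs` list. Joining the output lines before calling `json.loads` works whether `curl` printed the body on one line or several; the `or []` guard is an assumption for the case where no list comes back.

```python
import json

# Invented sample standing in for the list of lines returned by `!curl -s ...`
sample_output = ['{"total": 3, "objectIDs": [42, 7, 19]}']

parsed = json.loads("\n".join(sample_output))
# Guard against a missing or null list (assumed behavior when nothing matches)
object_ids = sorted(parsed.get("objectIDs") or [])
print(object_ids)  # [7, 19, 42]
```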


In [None]:
met_van_gogh = 'https://collectionapi.metmuseum.org/public/collection/v1/search?...'

met_van_gogh_result = !curl -s ...

import json
met_van_gogh_items = json.loads(met_van_gogh_result[...])[...]
met_van_gogh_items.sort()

# Map each object ID to the URL of its `objects/[objectID]` endpoint
item_urls = {}
for item in met_van_gogh_items:
    item_url = 'https://collectionapi.metmuseum.org/public/collection/v1/objects/{}'.format(item)
    item_urls[item] = item_url

In [None]:
grader.check("search_api")

<!-- BEGIN QUESTION -->

Download the objects from the 10 URLs created in the previous question, saving each one to a file as shown in the code.
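If you prefer pure Python to `curl`, a minimal sketch of the same download step is shown below. `download_to_file` is a hypothetical helper; it writes the raw response bytes to the given path, much like `curl -s URL > file`.

```python
from pathlib import Path
from urllib.request import urlopen


def download_to_file(url: str, path: str, timeout: float = 30.0) -> int:
    """Save the response body of `url` to `path`; return the number of bytes written."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read()
    Path(path).write_bytes(body)
    return len(body)


# Hypothetical usage mirroring the curl loop (requires network access):
# for item_id, item_url in item_urls.items():
#     download_to_file(item_url, f"item_{item_id}.json")
```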

In [None]:
for item_id, item_url in item_urls.items():
  print('processing:', item_url)
  !curl -s '...' > item_{item_id}.json # the query URL goes here

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

Run and inspect the following shell code and explain what `xargs` does. What does `wc` tell you about each file?

_Type your answer here, replacing this text._

In [None]:
!ls -1 item_*.json | xargs wc

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

Complete the following code to collect the `primaryImage` value of every downloaded object into a single file, `wikidata_urls.txt`.

In [None]:
!ls -1 item_*.json | xargs cat | jq -r '...' > wikidata_urls.txt # the jq command goes here
!cat wikidata_urls.txt

<!-- END QUESTION -->

Also complete DataCamp's [Intermediate Importing Data in Python](https://app.datacamp.com/learn/courses/intermediate-importing-data-in-python).

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Download the zip file and submit to Gradescope.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)