Please read the following instructions thoroughly. Neglecting to do so may result in missed points.

### Preamble
**Reminder**: Homeworks are due by 7:00PM ET on Sundays.

Before you turn this problem set in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

### Naming conventions
Be sure the filename of your notebook is in the following form:

    <uni>_<assignment>_<details [optional]>.<extension>
    
For example:

    lr3086_hw01.ipynb
    lr3086_hw01_complete.ipynb
    LR3086_HW01.ipynb
    
To rename a notebook, in the menubar, select File$\rightarrow$Rename. The extension for notebook files, `.ipynb`, will already be appended to the filename, but will be hidden from view within the notebook.
    
This naming format allows for autograding of all assignments. If your files are not named with this format, you should expect a grade of zero for the assignment.

Courseworks may rename your file to something like `lr3086_hw0-1.ipynb` if you resubmit your assignment. This is perfectly fine.
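
If you want to sanity-check a filename before submitting, the format above can be approximated with a regular expression. This is only a sketch: the pattern and the helper name `is_valid_filename` are hypothetical, and the real autograder may apply different rules.

```python
import re

# Hypothetical pattern for <uni>_<assignment>[_<details>].<extension>.
# The actual autograder's rules are not published here; this is an approximation.
FILENAME_PATTERN = re.compile(
    r"^(?P<uni>[A-Za-z]+[0-9]+)"        # letters then digits, e.g. lr3086
    r"_(?P<assignment>[A-Za-z0-9-]+)"   # e.g. hw01 or hw0-1
    r"(?:_(?P<details>[^.]+))?"         # optional details, e.g. complete
    r"\.(?P<extension>[A-Za-z0-9]+)$"   # e.g. ipynb
)


def is_valid_filename(filename):
    """Return True if the filename fits the assumed naming pattern."""
    return FILENAME_PATTERN.match(filename) is not None


for name in ["lr3086_hw01.ipynb", "lr3086_hw01_complete.ipynb", "homework8.ipynb"]:
    print(name, is_valid_filename(name))
```

The pattern accepts upper- and lowercase letters, so `LR3086_HW01.ipynb` and the Courseworks-renamed `lr3086_hw0-1.ipynb` both pass.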

### What Format To Submit In

Most homeworks are in Jupyter notebooks. Once you've finished your homework, unless specified otherwise, please download your work as an `.ipynb` file to your local machine, then upload it to Courseworks when complete (in the menubar, select File$\rightarrow$Download as$\rightarrow$Notebook).

**Failure to submit a Jupyter notebook will result in a grade of zero for the assignment.**

### Grading

The points available on a late assignment are reduced by 50% for each day it is late. For example, if you earn 80% of the available credit on a homework but hand it in one day late, you receive 40% of the total. Assignments two days late receive zero points.

Once solutions are posted and graded assignments are handed back, students have 1 week to bring their grading discrepancies to a CA for consideration of possible grading errors.

Because grading is automated, please delete (or comment out) the `raise NotImplementedError` line before attempting a problem.

Empty un-editable cells in an assignment are there for a reason. They will be filled with tests by the automatic grader. Please do not attempt to remove them.

### Getting Help

Asking for help is a great way to increase your chance of success. However, there are some rules. When asking for help (especially from a fellow student), *you can show your helper your code but you cannot view theirs*. Your work needs to be your own. You cannot post screenshots of your current work to Ed Discussions or other tools used for getting help.

If you need to reach out to a CA for help, please do so via Ed Discussions and not via email. Answers given via Ed Discussions will help you as well as other students. Thus, emails will always have a lower priority for response than Ed Discussions questions. If you do email the CA, please make a note of what section you are in. This helps us identify you in Courseworks faster.

Finally, if you do not get a response from a CA within 48 hours, you may email the professor.

---

# Homework 8: HTTP, REST APIs, Data Formats

You may use the following packages/imports for any of the questions below:

* `bs4` ([docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) - should already be installed with Anaconda) 
* `requests` ([docs](https://docs.python-requests.org/en/latest/) - should already be installed with Anaconda)
* `json` (standard library)
* `re` (standard library)

Total questions: 6<br/>
Total points: 10

## Question 1

Write a function called `get_api_header_data` that sends a `GET` request to `http://numbersapi.com/21/trivia` and returns a `dict` with the following keys, whose values are taken from the response's headers:

* `"Server"`
* `"Content-Type"`
* `"Content-Length"`

[1 point]
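
The selection step can be tried without a network connection. The sketch below uses a plain `dict` with made-up header values in place of a real response; `pick_headers` and `sample_headers` are hypothetical names used only for illustration. With `requests`, `response.headers` is a dict-like object that supports the same `.get` lookup.

```python
def pick_headers(headers, names):
    """Return a dict mapping each requested name to its header value (or None)."""
    return {name: headers.get(name) for name in names}


# Made-up header values for illustration; a real response will differ.
sample_headers = {
    "Server": "example-server",
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Length": "42",
    "Date": "Thu, 01 Jan 1970 00:00:00 GMT",
}

wanted = ["Server", "Content-Type", "Content-Length"]
picked = pick_headers(sample_headers, wanted)
print(picked)
```

Note that header values are strings, so `"Content-Length"` comes back as `"42"` rather than the integer `42`, and a header that is absent maps to `None`.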

In [1]:
import requests

def get_api_header_data():
 
    url = "http://numbersapi.com/21/trivia"
    
    response = requests.get(url)
    
    headers = response.headers
    
    return {
        "Server": headers.get("Server"),
        "Content-Type": headers.get("Content-Type"),
        "Content-Length": headers.get("Content-Length")
    }
 


In [2]:
### BEGIN TESTS

# Ensure the returned result is a dictionary
result = get_api_header_data()
assert isinstance(result, dict)

### END TESTS

In [3]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

## Question 2

Write a function called `post_data` that takes one argument, `data` (a dictionary), and returns a `requests.Response` object.

This function should submit a `POST` request to the url `https://httpbin.org/post`. The request should post the `data` argument as JSON. It should also set the headers `"Content-Type"` and `"Accept"` to the appropriate [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) for sending JSON data. Finally, it should return the response received from the post request.

[2 points]
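
To see what such a request looks like before it is sent, the sketch below builds one with the standard library's `urllib.request.Request` instead of `requests`, so nothing goes over the network. The helper name `build_json_post` is hypothetical, and this is an illustration of the request's shape rather than a solution to the question.

```python
import json
from urllib.request import Request


def build_json_post(url, data):
    """Build (but do not send) a POST request whose body is `data` encoded as JSON."""
    body = json.dumps(data).encode("utf-8")
    headers = {
        "Content-Type": "application/json",  # format of the body we send
        "Accept": "application/json",        # format we would like back
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_json_post("https://httpbin.org/post", {"foo": "bar", "baz": "blah"})
print(req.get_method())
print(req.data)
```

`Content-Type` describes the body being sent, while `Accept` tells the server which format the client prefers in the response; for JSON both use `application/json`.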

In [4]:
import json
import requests
 

def post_data(data):
    url = "https://httpbin.org/post"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json"
    }
    response = requests.post(url, json=data, headers=headers)
    return response
 


In [5]:
### BEGIN TESTS

result = post_data({"foo": "bar", "baz": "blah"})

# Ensure the correct type is returned
assert isinstance(result, requests.Response)

# Ensure the correct request method was used
assert "POST" == result.request.method
### END TESTS

In [6]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

In [7]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

## Question 3

Define a function called `get_html` that returns an HTML string. The HTML returned should contain the following HTML elements:

* Three paragraphs
* One photo
* Three hypertext reference links

The paragraphs and link elements must contain content. The HTML returned must be valid HTML (you can use [this HTML validator](https://validator.w3.org/nu/#textarea) to check if your HTML is valid), and must contain all necessary container tags.

[2 points]
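
A quick way to confirm that an HTML string contains the required elements is to count its opening tags. The sketch below does this with the standard library's `html.parser`; `TagCounter`, `count_tags`, and the sample markup are hypothetical names and data for illustration. It only counts tags and does not check validity, so the validator linked above is still needed.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        self.counts[tag] += 1


def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = """<!DOCTYPE html>
<html lang="en">
<head><title>Sample</title></head>
<body>
<p>One</p><p>Two</p><p>Three</p>
<img src="photo.png" alt="A photo">
<a href="https://example.com/1">First</a>
<a href="https://example.com/2">Second</a>
<a href="https://example.com/3">Third</a>
</body>
</html>"""

counts = count_tags(sample)
print(counts["p"], counts["img"], counts["a"])
```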

In [8]:
def get_html():
 
    html_content = """
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Sample HTML</title>
    </head>
    <body>
        <p>This is the first paragraph.</p>
        <p>This is the second paragraph.</p>
        <p>This is the third paragraph.</p>
        
        <img src="https://via.placeholder.com/150" alt="Sample Image">
        
        <a href="https://example.com/1">First link</a><br>
        <a href="https://example.com/2">Second link</a><br>
        <a href="https://example.com/3">Third link</a>
    </body>
    </html>
    """
    return html_content
 


In [9]:
### BEGIN TESTS

# Ensure the HTML has the correct header
html = get_html()
# first normalize possible correct implementations
cleaned_html = ' '.join(html.split()).strip().lower()
# then assert that it starts with the expected header
assert cleaned_html.startswith("<!doctype html>")

### END TESTS

In [10]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

In [11]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

In [12]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

## Question 4

Write a function called `ones` which takes two arguments, `m` and `n`, and produces an `m` row by `n` column JSON array-of-arrays, with all elements being 1.

For example:

```py
>>> ones(2, 2)
'[[1, 1], [1, 1]]'
```

Another example:

```py
>>> ones(1, 4)
'[[1, 1, 1, 1]]'
```

You may assume that `m` and `n` will only ever be positive integers.

[1 point]

In [13]:
import json

def ones(m, n):
    # Build m rows of n ones; the comprehension creates a fresh list for each row
    result = [[1] * n for _ in range(m)]
    # Convert the 2D list to a JSON string and return it
    return json.dumps(result)

 


In [14]:
### BEGIN TESTS
assert json.loads(ones(2, 2)) == [[1, 1], [1, 1]]
assert json.loads(ones(1, 4)) == [[1, 1, 1, 1]]
### END TESTS

In [15]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

## Question 5

As practice to help prepare for your final project, write a function called `parse_nyc_firehouse_data`. It should call the `get_firehouse_data` function already defined for you and return a `list` of `dict`s, one for each firehouse returned. Each `dict` should have one key, the facility name (`str`) of the firehouse, mapped to one value, a `tuple` of (latitude, longitude) giving that firehouse's location.

Use `json` to parse the response returned by `get_firehouse_data`. Do not include any other pieces of data for each firehouse. 

You may want to read more about the firehouse data [here](https://data.cityofnewyork.us/Public-Safety/FDNY-Firehouse-Listing/hc8x-tcnd).

[2 points]
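
The target shape can be illustrated on a small hand-written sample, without calling the API. In the sketch below, the records, the `"borough"` field, and the helper name `parse_firehouses` are all made up for illustration; the field names `facilityname`, `latitude`, and `longitude` are assumed to match the real data. Skipping records that lack coordinates is also an assumption, not something the question asks for.

```python
import json


def parse_firehouses(json_text):
    """Turn a JSON array of firehouse records into [{name: (lat, lon)}, ...]."""
    records = json.loads(json_text)
    result = []
    for record in records:
        name = record.get("facilityname")
        lat = record.get("latitude")
        lon = record.get("longitude")
        if name is None or lat is None or lon is None:
            # Assumption: skip incomplete records instead of raising KeyError
            continue
        result.append({name: (float(lat), float(lon))})
    return result


# Made-up sample records; the real dataset has more fields and different values.
sample_json = json.dumps([
    {"facilityname": "Engine 1", "latitude": "40.70", "longitude": "-74.00", "borough": "Manhattan"},
    {"facilityname": "Engine 2", "borough": "Brooklyn"},
])

parsed = parse_firehouses(sample_json)
print(parsed)
```

The coordinates are converted with `float` on the assumption that they arrive as strings, and extra fields such as `"borough"` are dropped.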

In [16]:
import json
import requests

FIREHOUSE_URL = "https://data.cityofnewyork.us/resource/hc8x-tcnd.json"

def get_firehouse_data():
    response = requests.get(FIREHOUSE_URL)
    data = response.text
    return data

def parse_nyc_firehouse_data():
    data = get_firehouse_data()
    parsed_data = json.loads(data)
    
    firehouses = []
    
    for firehouse in parsed_data:
        facility_name = firehouse['facilityname']
        latitude = float(firehouse['latitude'])
        longitude = float(firehouse['longitude'])
        firehouses.append({facility_name: (latitude, longitude)})
    
    return firehouses


In [17]:
### BEGIN TESTS

result = parse_nyc_firehouse_data()

# Ensure the returned results are the expected type
assert isinstance(result, list)
assert len(result) > 0
assert isinstance(result[0], dict)

### END TESTS

In [18]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

In [19]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

## Question 6

Write a function called `find_taxi_parquet_links`. It should call the `get_taxi_html` function already defined for you (which returns the entire HTML of the NYC taxi data website used for your final project) and return a `list` of `str`s containing all links for _both_ Yellow and Green taxi trip records.

Use BeautifulSoup to parse the HTML returned by `get_taxi_html`. Do not include any other types of links, including for-hire vehicle trip records.

[2 points]
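
The filtering rule can be tested on its own, separately from the HTML parsing. The sketch below assumes that the files follow a `yellow_tripdata_...` / `green_tripdata_...` naming pattern, which the site may not guarantee; the helper name `is_yellow_or_green_parquet` and the `example.com` links are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed naming pattern for the files we want; other record types
# (for example for-hire vehicles) are expected to use a different prefix.
TAXI_FILE = re.compile(r"^(yellow|green)_tripdata_.*\.parquet$")


def is_yellow_or_green_parquet(href):
    """Return True if the link's file name looks like a Yellow or Green parquet file."""
    # Strip stray whitespace, then keep only the last path segment
    filename = urlparse(href.strip()).path.rsplit("/", 1)[-1]
    return TAXI_FILE.match(filename.lower()) is not None


sample_hrefs = [
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet",
    "https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet ",
    "https://example.com/trip-data/fhv_tripdata_2023-01.parquet",  # hypothetical link
    "https://example.com/docs/data_dictionary_yellow.pdf",         # hypothetical link
]

kept = [href.strip() for href in sample_hrefs if is_yellow_or_green_parquet(href)]
print(kept)
```

Matching on the file name rather than the whole URL avoids false positives when `yellow` or `green` appears elsewhere in the address.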

In [20]:
import bs4
import requests
from bs4 import BeautifulSoup

TAXI_URL = "https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page"


def get_taxi_html():
    response = requests.get(TAXI_URL)
    html = response.content
    return html

 

def find_taxi_parquet_links():
    # Get the raw HTML using the provided function
    html_content = get_taxi_html()

    # Parse the HTML using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Keep only hrefs that mention 'yellow' or 'green' and end with '.parquet'.
    # Strip whitespace first: an href padded with spaces would otherwise fail
    # the endswith() check and be silently dropped.
    taxi_links = []
    for link in soup.find_all('a', href=True):
        href = link['href'].strip()
        is_yellow_or_green = 'yellow' in href.lower() or 'green' in href.lower()
        if is_yellow_or_green and href.endswith('.parquet'):
            taxi_links.append(href)

    return taxi_links

# Example usage:
taxi_data_links = find_taxi_parquet_links()
print(taxi_data_links)


['https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-02.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-02.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-03.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-04.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-04.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-06.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-06.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2022-01.parquet', 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-02.parquet', 'https:/

In [21]:
### BEGIN TESTS

result = find_taxi_parquet_links()

# Ensure the returned results are the expected type,
assert isinstance(result, list)
assert len(result) > 0
assert isinstance(result[0], str)

# Ensure the number of returned results;
# Since they regularly update the site with new monthly data,
# this is only a rough approximation.
assert len(result) >= 290
assert len(result) < 330

### END TESTS

In [22]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE

In [23]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE