## Reading remote data into `pandas`

### example: reading parquet

### challenge

Adapt your previous Excel-reading code to read the same Excel file directly from the Internet. GitHub provides direct links to every file in the repo; you should be able to find the Excel file here: `https://github.com/catalyst-cooperative/open-energy-data-for-all/raw/refs/heads/main/data/eia923_2022.xlsx`

Start with the code below.

### solution

```python
import pandas as pd

# modify this to read from the URL!
pd.read_excel("data/eia923_2022.xlsx", skiprows=5)
```
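A minimal sketch of one solution: `pd.read_excel` accepts a URL wherever it accepts a file path, so only the first argument changes. The call is wrapped in a hypothetical `read_eia923_excel` function so nothing downloads until you call it; reading `.xlsx` files also assumes the `openpyxl` package is installed.

```python
import pandas as pd

# Raw-file link to the same workbook, taken from the challenge text
eia923_url = (
    "https://github.com/catalyst-cooperative/open-energy-data-for-all/"
    "raw/refs/heads/main/data/eia923_2022.xlsx"
)


def read_eia923_excel(source: str = eia923_url) -> pd.DataFrame:
    """Read the EIA 923 workbook from a URL (or a local path) into a DataFrame."""
    # Same arguments as the local version; only the location changed
    return pd.read_excel(source, skiprows=5)
```

Calling `read_eia923_excel()` downloads and parses the remote file, while `read_eia923_excel("data/eia923_2022.xlsx")` still reads the local copy.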

### key point

Most, but not all, of the `read_*` functions support URLs; check the docs to make sure this will work!

### discussion
What are some advantages and disadvantages you can imagine for using remote data vs. saving the data to your hard drive (aka **local data**)?


## Using `requests` to download files

### example: EIA 923 JSON

### challenge

Adapt the JSON-reading code from the last episode to use `requests.get`.

### solution

```python
import pandas as pd
import json

with open('data/eia923_2022.json') as file:
    eia923_json = json.load(file)

eia923_json_df = pd.DataFrame(eia923_json["response"]["data"])
```
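A sketch of the `requests` version. The URL is an assumption: it supposes the JSON file sits next to the Excel file in the same GitHub repo, so swap in the real location if it differs. `json_to_df` and `fetch_eia923_json_df` are hypothetical helper names; keeping them separate lets you check the reshaping step without a network connection.

```python
import pandas as pd
import requests

# Hypothetical location: assumes the JSON file lives beside the Excel file in the repo
eia923_json_url = (
    "https://github.com/catalyst-cooperative/open-energy-data-for-all/"
    "raw/refs/heads/main/data/eia923_2022.json"
)


def json_to_df(eia923_json: dict) -> pd.DataFrame:
    """Same reshaping as before: the rows live under response -> data."""
    return pd.DataFrame(eia923_json["response"]["data"])


def fetch_eia923_json_df(url: str = eia923_json_url) -> pd.DataFrame:
    """Download the JSON with requests.get instead of open(), then reshape it."""
    response = requests.get(url)
    # Raise an HTTPError on 4xx/5xx instead of trying to parse an error page
    response.raise_for_status()
    return json_to_df(response.json())
```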

### key points

* `requests` is useful when you need to reformat the data before shoving it into `pandas`
* `response.status_code` tells you whether the request succeeded (e.g. `200`) or how it failed (e.g. `404` for "not found")
* `response.text` gives you the raw response body as a string, if you need to check that the data is formatted the way you expect
* `response.json()` parses the response body as JSON, which is handy
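These attributes combine into a small check-then-parse pattern. The sketch below uses a hypothetical helper name, `check_and_parse`; `requests` also provides `response.ok` and `response.raise_for_status()` for similar checks, but spelling it out shows where each attribute fits.

```python
def check_and_parse(response):
    """Return the parsed JSON body, or raise with the status code and the start of the raw text."""
    # Any 2xx status code means the request succeeded
    if not 200 <= response.status_code < 300:
        # response.text is the raw body; APIs often explain the error there
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text[:200]}"
        )
    return response.json()
```

With a real response you would call `check_and_parse(requests.get(url))`.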

## Web APIs: Fancy URLs

```python
import requests

response = requests.get("https://api.eia.gov/v2/electricity/electric-power-operational-data/data?data[]=consumption-for-eg&facets[fueltypeid][]=NG&facets[sectorid][]=99&facets[location][]=CO&frequency=annual&start=2020&end=2023&api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8")

response.json()
```

```python
# https://api.eia.gov/v2/electricity/electric-power-operational-data/data?
#   data[]=consumption-for-eg&
#   facets[fueltypeid][]=NG&
#   facets[sectorid][]=99&
#   facets[location][]=CO&
#   frequency=annual&
#   start=2020&
#   end=2023&
#   api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8
```
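Python's standard library can do this breakdown for you. A minimal sketch using `urllib.parse`: `urlsplit` separates the route from the query string, and `parse_qsl` splits the query into `(name, value)` pairs, the same name/value pairs you could pass to `requests.get` as a `params` dictionary.

```python
from urllib.parse import parse_qsl, urlsplit

url = (
    "https://api.eia.gov/v2/electricity/electric-power-operational-data/data"
    "?data[]=consumption-for-eg&facets[fueltypeid][]=NG&facets[sectorid][]=99"
    "&facets[location][]=CO&frequency=annual&start=2020&end=2023"
    "&api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"
)

parts = urlsplit(url)
# The route, ending in /data
print(parts.path)

# Each query parameter as a (name, value) pair, in the order it appears
params = parse_qsl(parts.query)
for name, value in params:
    print(f"{name} = {value}")
```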

### challenge

Make a request to `https://api.eia.gov/v2/electricity/electric-power-operational-data/data?data[]=consumption-for-eg&facets[fueltypeid][]=NG&facets[sectorid][]=99&facets[location][]=CO&frequency=annual&start=2020&end=2023&api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8` with `requests.get`.

Try removing the `end=2023` parameter from the URL. What happens?

### solution
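A sketch of one way to run the comparison: request the URL as given and again with `&end=2023` removed, then look at which years come back. It assumes each row in `response["response"]["data"]` has a `period` field, so check your own output first; `get_periods` is a hypothetical helper.

```python
import requests

challenge_url = (
    "https://api.eia.gov/v2/electricity/electric-power-operational-data/data"
    "?data[]=consumption-for-eg&facets[fueltypeid][]=NG&facets[sectorid][]=99"
    "&facets[location][]=CO&frequency=annual&start=2020&end=2023"
    "&api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"
)
# Same request, minus the end-year limit
url_without_end = challenge_url.replace("&end=2023", "")


def get_periods(url: str) -> list:
    """Request `url` and return the sorted, distinct `period` values in the data."""
    response = requests.get(url)
    response.raise_for_status()
    rows = response.json()["response"]["data"]
    # Assumption: every row carries a "period" field (the year, for annual data)
    return sorted({row["period"] for row in rows})
```

Compare `get_periods(challenge_url)` with `get_periods(url_without_end)` to see what the `end` parameter was doing.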

### key points

* web APIs can be thought of as bundles of fancy URLs
* each web API is different, but if you can read the documentation and make requests to URLs, you can figure them out


## Case study: EIA API

```python
api_key = "3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"
```

### challenge

If we're looking for yearly data about fuel consumption at the plant level, what route should we request next?

### solution

```python
base_url = "https://api.eia.gov/v2/electricity"

# requests.get("some route for the fuel consumption")
```
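Judging by the requests later in this episode, the plant-level fuel route is `facility-fuel`. A minimal sketch with hypothetical helper names: requesting a route without `/data` on the end returns its metadata (such as the available columns under the `data` key), which is what you read to decide on the next request.

```python
import requests

base_url = "https://api.eia.gov/v2/electricity"
api_key = "3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"


def route_url(route: str) -> str:
    """Join a child route onto the electricity base URL."""
    return f"{base_url}/{route}"


def get_route_metadata(route: str) -> dict:
    """Request a route (no /data on the end) and return its metadata as parsed JSON."""
    response = requests.get(route_url(route), params={"api_key": api_key})
    response.raise_for_status()
    return response.json()
```

Usage: `get_route_metadata("facility-fuel")`.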

### example: drilling down

![Screenshot of documentation, with relevant text reproduced below](../episodes/fig/ep-3/data-endpoint.png)

> In earlier examples, when we asked about the metadata, the API responded with these available data points [under the 'data' key]:
>
> [...]
>
> Remember, in addition to specifying the column in the data[] parameter, we must also specify /data as the last node in the route:
>
> `https://api.eia.gov/v2/electricity/retail-sales/data/?api_key=XXXXXX&data[]=price`

### challenge

Given the above example and the output for the `facility-fuel` metadata, how do we get the net generation data?

Build off of the earlier request, reproduced below:

### solution

```python
# what should the url be changed to?
facility_fuel = requests.get(f"{base_url}/facility-fuel?api_key={api_key}")
```
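Following the documentation's pattern, one answer is to add `/data` as the last node of the route and name the column in `data[]`. This sketch assumes the net generation column is called `generation`, which is the name the next example uses; the request sits inside a hypothetical `get_net_generation` function so the URL can be inspected without sending it.

```python
import requests

base_url = "https://api.eia.gov/v2/electricity"
api_key = "3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"

# Route + /data as the last node, plus the column we want in data[]
net_generation_url = f"{base_url}/facility-fuel/data?data[]=generation&api_key={api_key}"


def get_net_generation() -> requests.Response:
    """Send the request for the facility-fuel generation data."""
    return requests.get(net_generation_url)
```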

### example: frequency

```python
yearly = requests.get(f"{base_url}/facility-fuel/data?data[]=generation&frequency=yearly&api_key={api_key}")
yearly
```

```python
annual = requests.get(f"{base_url}/facility-fuel/data?data[]=generation&frequency=annual&api_key={api_key}")

annual.json()
```

```python
annual = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "api_key": api_key
    },
)

annual.json()
```
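The `params` version should reach the same endpoint as the hand-written URL. A minimal sketch to check what `requests` builds from `params` without sending anything: `requests.Request(...).prepare()` assembles the final URL, which you can print. The square brackets come out percent-encoded as `%5B%5D`, which the server decodes back to `data[]`. After a real request, `annual.url` shows the URL that was actually sent.

```python
import requests

base_url = "https://api.eia.gov/v2/electricity"
api_key = "3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"

# Build the request but don't send it, so we can inspect the URL requests would use
prepared = requests.Request(
    "GET",
    f"{base_url}/facility-fuel/data",
    params={"data[]": "generation", "frequency": "annual", "api_key": api_key},
).prepare()

print(prepared.url)
```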

### example: "faceting" / filtering

```python
facility_fuel.json()
```

```python
fueltypes = requests.get(f"{base_url}/facility-fuel/facet/fuel2002?api_key={api_key}").json()

fueltypes
```
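The raw facet list is easier to scan as a lookup table. A sketch with a hypothetical helper, `facet_options`; it assumes the facet response looks like `{"response": {"facets": [{"id": ..., "name": ...}, ...]}}`, so compare against the `fueltypes` output above before relying on it.

```python
def facet_options(facet_payload: dict) -> dict:
    """Map each facet value's id (the code used in facets[...][]) to its readable name.

    Assumes the shape {"response": {"facets": [{"id": ..., "name": ...}, ...]}}.
    """
    return {facet["id"]: facet["name"] for facet in facet_payload["response"]["facets"]}
```

If the shape matches, `facet_options(fueltypes)` gives a dictionary you can search for the code to put in `facets[fuel2002][]`, such as `"NG"`.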

```python
annual_ng = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "facets[fuel2002][]": "NG",
        "api_key": api_key
    },
)

annual_ng.json()
```

### challenge

Now we want to limit this to just the state of Colorado. Let's update the code to do that.

As before, let's build off the old request.

### solution

```python
annual_ng = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "facets[fuel2002][]": "NG",
        "api_key": api_key
    },
)

annual_ng.json()
```

### example: time limits

We saw the start/end parameters a bit earlier, but let's actually poke at the documentation to see how they're used:

![Screenshot with several examples, reproduced below](../episodes/fig/ep-3/start-end.png)

> Start date
> `https://api.eia.gov/v2/electricity/retail-sales/data?api_key=xxxxxx&data[]=price&facets[sectorid][]=RES&facets[stateid][]=CO&frequency=monthly&start=2008-01-31`
>
> End date
> `https://api.eia.gov/v2/electricity/retail-sales/data?api_key=xxxxxx&data[]=price&facets[sectorid][]=RES&facets[stateid][]=CO&frequency=monthly&end=2008-03-01`
>
> Start and end date together
> `https://api.eia.gov/v2/electricity/retail-sales/data?api_key=xxxxxx&data[]=price&facets[sectorid][]=RES&facets[stateid][]=CO&frequency=monthly&start=2008-01-31&end=2008-03-01`

Let's try out this pattern!

### challenge

Limit the results to 2020-2023. Start from your last query, reproduced below:

### solution

```python
annual_ng_co = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "facets[fuel2002][]": "NG",
        "facets[state][]": "CO",
        "api_key": api_key
    },
)

annual_ng_co.json()
```
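One possible solution: add `start` and `end` to the `params` dictionary. The earlier annual URL used plain years (`start=2020&end=2023`), so this sketch does the same; the monthly examples in the docs use full dates instead. The request is wrapped in a hypothetical `get_annual_ng_co` function so nothing is sent until you call it.

```python
import requests

base_url = "https://api.eia.gov/v2/electricity"
api_key = "3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8"

annual_ng_co_params = {
    "data[]": "generation",
    "frequency": "annual",
    "facets[fuel2002][]": "NG",
    "facets[state][]": "CO",
    # New: limit the time range (plain years, matching the annual frequency)
    "start": "2020",
    "end": "2023",
    "api_key": api_key,
}


def get_annual_ng_co() -> requests.Response:
    """Send the Colorado natural-gas generation request, limited to 2020-2023."""
    return requests.get(f"{base_url}/facility-fuel/data", params=annual_ng_co_params)
```

Usage: `annual_ng_co = get_annual_ng_co()` followed by `annual_ng_co.json()`.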

### discussion

Think back to the metadata you saw. What are some questions you can answer with the `facility-fuel` endpoint?

### key points

* Many functions in the `pandas.read_*` family can read tabular data from remote servers & cloud storage as if it were on your local computer
* `requests` can get data that's not in the right shape for `pandas.read_*`; you'll have to do the translation from their response format into `pandas.DataFrame` yourself
* web APIs are just collections of fancy URLs, which you can interact with via `requests`
* to learn an API, you need to be able to read the documentation and experiment with the API to see how it responds.