import pandas as pd

## Reading remote data into `pandas`

### challenge

Adapt your previous Excel-reading code to read the same Excel file directly from the Internet. GitHub provides a direct download link for every file in a repository. The Excel file is available here: `https://github.com/catalyst-cooperative/open-energy-data-for-all/raw/refs/heads/main/data/eia923_2022.xlsx`

```python
import pandas as pd

pd.read_excel("data/eia923_2022.xlsx", skiprows=5)
```


### solution

### key point

Most, but not all, of the `read_*` functions accept a URL in place of a file path. Check the documentation for the function you are using to make sure this will work!
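
As a minimal sketch of how the challenge above might be approached, the local path can be swapped for the URL. This assumes that `skiprows=5` still applies to the remote copy and that an Excel engine such as `openpyxl` is installed. The names `EIA923_URL` and `read_remote_eia923` are illustrative and not part of the lesson.

```python
import pandas as pd

# URL of the raw Excel file, copied from the challenge text above.
EIA923_URL = (
    "https://github.com/catalyst-cooperative/open-energy-data-for-all"
    "/raw/refs/heads/main/data/eia923_2022.xlsx"
)


def read_remote_eia923(url=EIA923_URL):
    """Read the spreadsheet straight from a URL (needs network access)."""
    # Same call as the local version; only the first argument changed.
    return pd.read_excel(url, skiprows=5)
```

Calling `read_remote_eia923()` downloads the file again on every call, which is one of the trade-offs of working with remote data.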

## Using `requests` to download files

### discussion

What are some advantages and disadvantages you can imagine for using remote data vs. saving the data to your hard drive (aka **local data**)?

### challenge

Adapt the JSON-reading code from the last episode to use `requests.get`.

```python
import pandas as pd
import json

with open('data/eia923_2022.json') as file:
    eia923_json = json.load(file)

eia923_json_df = pd.DataFrame(eia923_json["response"]["data"])
```

### solution

### key points

* `requests` is useful when you need to reformat the data before loading it into `pandas`.
* `response.status_code` tells you whether the request succeeded and, if not, what kind of failure occurred.
* `response.text` gives you the raw response body, which helps when you need to check that the data is formatted how you expect.
* `response.json()` parses the response body as JSON and returns ordinary Python dictionaries and lists.
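
The points above can be sketched roughly as follows. This is only one possible shape, assuming the remote JSON has the same `["response"]["data"]` layout as the local file from the challenge. The function names are hypothetical, and the URL is left as a parameter because the lesson text does not give one for the JSON file.

```python
import pandas as pd
import requests


def response_to_dataframe(response):
    """Turn a successful JSON response into a DataFrame of its records."""
    # 200 means the request succeeded; anything else is treated as a failure here.
    if response.status_code != 200:
        # response.text holds the raw body, which often explains what went wrong.
        raise RuntimeError(
            f"Request failed with status {response.status_code}: "
            f"{response.text[:200]}"
        )
    # Parse the body as JSON, then pick out the records as in the local version.
    payload = response.json()
    return pd.DataFrame(payload["response"]["data"])


def fetch_json_as_dataframe(url):
    """Download JSON from `url` (needs network access) and load it into pandas."""
    response = requests.get(url, timeout=30)
    return response_to_dataframe(response)
```

Splitting the download from the parsing keeps the parsing step easy to check without a network connection.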

## Web APIs: Fancy URLs

### challenge

Make a request to `https://api.eia.gov/v2/electricity/electric-power-operational-data/data?data[]=consumption-for-eg&facets[fueltypeid][]=NG&facets[sectorid][]=99&facets[location][]=CO&frequency=annual&start=2020&end=2023&api_key=3zjKYxV86AqtJWSRoAECir1wQFscVu6lxXnRVKG8` with `requests.get`.

Try removing the `end=2023` parameter from the URL. What happens?

### solution

### key points

* web APIs can be thought of as bundles of fancy URLs
* each web API is different, but if you can read the documentation and make requests to URLs, you can figure them out
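
To make the "fancy URL" idea concrete, here is a small sketch that takes the challenge URL apart using only the standard library. The API key is replaced by the placeholder `YOUR_API_KEY`; everything else is copied from the URL above.

```python
from urllib.parse import parse_qs, urlsplit

# The challenge URL, with the real key replaced by a placeholder.
url = (
    "https://api.eia.gov/v2/electricity/electric-power-operational-data/data"
    "?data[]=consumption-for-eg"
    "&facets[fueltypeid][]=NG"
    "&facets[sectorid][]=99"
    "&facets[location][]=CO"
    "&frequency=annual&start=2020&end=2023"
    "&api_key=YOUR_API_KEY"
)

parts = urlsplit(url)
print(parts.netloc)  # the host that serves the API
print(parts.path)    # the route: which dataset we are asking for

# Everything after the "?" is a set of named parameters that narrow the request.
for name, values in parse_qs(parts.query).items():
    print(name, values)
```

Seen this way, removing `end=2023` just drops one named parameter from the request while leaving the host and route unchanged.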


## Case study: EIA API

### challenge

If we're looking for yearly data about fuel consumption at the plant level, what route should we request next?

### solution

### challenge

Given the above example, and the output for the `facility-fuel` metadata, how do we get the net generation data?

Build off of the earlier request:

```python
facility_fuel = requests.get(f"{base_url}/facility-fuel?api_key={api_key}")

facility_fuel.json()
```

### solution

### challenge

Now we want to limit this to just the state of Colorado - let's update the code to do that.

As before, let's build off the old request.

```python
annual_ng = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "facets[fuel2002][]": "NG",
        "api_key": api_key
    },
)

annual_ng.json()
```


### solution

### challenge

Limit the results to 2020-2023. Start from your last query:

```python
annual_ng_co = requests.get(
    f"{base_url}/facility-fuel/data",
    params={
        "data[]": "generation",
        "frequency": "annual",
        "facets[fuel2002][]": "NG",
        "facets[state][]": "CO",
        "api_key": api_key
    },
)

annual_ng_co.json()
```


### solution
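
One way this might look, as a sketch rather than the definitive answer: add `start` and `end` parameters, mirroring the `start=2020&end=2023` seen in the earlier operational-data URL. The value of `base_url` is an assumption inferred from that URL, `api_key` is a placeholder, and `fetch_annual_ng_co` is a hypothetical helper name.

```python
import requests

# Assumed from the earlier example URL; adjust if your base_url differs.
base_url = "https://api.eia.gov/v2/electricity"
api_key = "YOUR_API_KEY"  # placeholder, replace with a real key

params = {
    "data[]": "generation",
    "frequency": "annual",
    "facets[fuel2002][]": "NG",
    "facets[state][]": "CO",
    "start": "2020",  # first year to include
    "end": "2023",    # last year to include
    "api_key": api_key,
}

# Build the request without sending it, to preview the URL requests would use.
prepared = requests.Request(
    "GET", f"{base_url}/facility-fuel/data", params=params
).prepare()
print(prepared.url)


def fetch_annual_ng_co():
    """Send the request (needs network access and a valid API key)."""
    response = requests.get(
        f"{base_url}/facility-fuel/data", params=params, timeout=30
    )
    response.raise_for_status()  # raise an error on a 4xx or 5xx status
    return response.json()
```

Previewing the prepared URL is a convenient way to confirm that the `params` dictionary produces the same kind of fancy URL you would otherwise write by hand.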