# COMP4160_Yinjie_Liu_20211091_Assignment1_1

In this assignment I collect data on different types of energy imports and exports from the [US Energy Information Administration](https://www.eia.gov/opendata).

This notebook covers Task 1 - Data Collection. The API provides data for specific types of energy imports and exports as well as for total energy imports and exports. I first extract JSON data for the total energy series as a demonstration, and then extract and save the data for specific energy imports and exports for detailed analysis.

In [None]:
import json
import urllib.parse
import requests
from pathlib import Path
from datetime import datetime
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# Data Collection

Settings for the API and data collection. You can use the API key provided below or replace it with your own:

In [None]:
# API Key (replace with your own API key)
api_key = "gKViIy44kknSJmNLGi90jtekQ2zEbPVnuzJuPmVZ"
# Prefix for API URLs
api_prefix = "https://api.eia.gov/series/"
# The two total energy exports and two total energy imports as examples

# TOTAL.PMEXBUS.A ---- Total Petroleum, Excluding Biofuels, Exports, Annual
# TOTAL.TEEXBUS.A ---- Total Primary Energy Exports, Annual
# TOTAL.PMIMBUS.A ---- Total Petroleum, Excluding Biofuels, Imports, Annual
# TOTAL.TEIMBUS.A ---- Total Primary Energy Imports, Annual
example_names = ["TOTAL.PMEXBUS.A", "TOTAL.TEEXBUS.A", "TOTAL.PMIMBUS.A", "TOTAL.TEIMBUS.A"]
# The series name for these examples, for disambiguation purposes
example_dictionary = {"TOTAL.PMEXBUS.A":"Total Petroleum, Excluding Biofuels, Exports, Annual",
                    "TOTAL.TEEXBUS.A":"Total Primary Energy Exports, Annual",
                    "TOTAL.PMIMBUS.A":"Total Petroleum, Excluding Biofuels, Imports, Annual",
                    "TOTAL.TEIMBUS.A":"Total Primary Energy Imports, Annual"}

Create directory for raw data storage, if it does not already exist:

In [None]:
dir_raw = Path("raw")
dir_raw.mkdir(parents=True, exist_ok=True)

Define a fetch function for retrieving data from the **US Energy Information Administration API**:

In [None]:
# The example of API call to use
# http://api.eia.gov/series/?api_key=gKViIy44kknSJmNLGi90jtekQ2zEbPVnuzJuPmVZ&series_id=TOTAL.TEIMBUS.A
def fetch(params):
    # construct the URL from the prefix and the encoded query parameters
    url = api_prefix + "?" + urllib.parse.urlencode(params)
    print("Fetching %s" % url)
    # fetch the page and fail early on HTTP errors
    response = requests.get(url)
    response.raise_for_status()
    # return the retrieved data parsed from JSON
    return response.json()
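
As a minimal sketch of how the request URL is assembled, the snippet below builds the query string offline, without contacting the API. The key `YOUR_API_KEY` is a placeholder, and parsing the URL back is only a sanity check added for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same prefix as in the notebook; the key below is a placeholder, not a real key.
api_prefix = "https://api.eia.gov/series/"
params = {"api_key": "YOUR_API_KEY", "series_id": "TOTAL.TEIMBUS.A"}

# urlencode turns the dict into "key=value" pairs joined by "&".
url = api_prefix + "?" + urlencode(params)
print(url)

# Parsing the URL back recovers the original parameters (each value as a list).
query = parse_qs(urlsplit(url).query)
print(query["series_id"])
```

Because `urlencode` escapes special characters, any parameter value containing spaces or symbols would still produce a valid URL.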

The API is queried by series ID. Each distinct ID identifies one series, so we can retrieve the total energy imports and exports data by ID.

In [None]:
example_metadata = {}
example_series_name = {}
params={}
params["api_key"] = api_key
for example_name in example_names:
    params["series_id"] = example_name
    example_data = fetch(params)
    # is this the result we are looking for?
    if example_data["series"][0]["series_id"] == example_name and example_data["series"][0]["name"] == example_dictionary[example_name]:
        print("Found match for %s: Meaning=%s" % 
              (example_name, example_dictionary[example_name]))
        example_metadata[example_name] = example_data 
        example_series_name[example_name] = example_data["request"]["series_id"]
    
print("Found keys for %d series" % len(example_series_name))
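
The matching check above relies on a particular response layout. The sketch below uses a hand-written mock response that contains only the keys this notebook reads (`request`, `series`, `series_id`, `name`, `units`, `f`, `start`, `end`, `data`). All values in it are invented placeholders, and `is_match` is a hypothetical helper, not part of the API.

```python
# Hypothetical, hand-written response: the layout mirrors the keys the
# notebook reads, but every value here is an invented placeholder.
mock_response = {
    "request": {"command": "series", "series_id": "TOTAL.TEIMBUS.A"},
    "series": [{
        "series_id": "TOTAL.TEIMBUS.A",
        "name": "Total Primary Energy Imports, Annual",
        "units": "placeholder units",
        "f": "A",
        "start": "2019",
        "end": "2020",
        "data": [["2020", 2.0], ["2019", 1.0]],
    }],
}

def is_match(response, series_id, expected_name):
    """Return True if the first returned series has the expected ID and name."""
    series = response.get("series", [])
    if not series:
        # An error response may carry no series at all.
        return False
    first = series[0]
    return first.get("series_id") == series_id and first.get("name") == expected_name

print(is_match(mock_response, "TOTAL.TEIMBUS.A", "Total Primary Energy Imports, Annual"))
```

Using `.get` with defaults means a response without a `series` key is reported as a non-match instead of raising a `KeyError`.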

Show the collected data as a DataFrame to illustrate the structure of the data that will be analysed later.

In [None]:
metadata_rows = []
for example_name in example_names:
    # the first (and only) series returned for this ID
    series = example_metadata[example_name]["series"][0]
    row = {"title": example_name, "Frequency": series["f"]}
    row["units"] = series["units"]
    row["start"] = series["start"]
    row["end"] = series["end"]
    # only pick the last six entries of the data list, for demonstration purposes
    for period, value in reversed(series["data"][-6:]):
        row[period] = value
    metadata_rows.append(row)
pd.DataFrame(metadata_rows).set_index("title")
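
To make the column-building step concrete, here is a small sketch that turns the last six `[period, value]` pairs of a data list into row entries, walking backwards from the end of the list as the cell above does. The sample data, its ordering, the series ID `EXAMPLE.SERIES.A`, and the helper `last_n_as_columns` are all assumptions made for illustration.

```python
# Invented [period, value] pairs standing in for series["data"].
data = [["2020", 8.0], ["2019", 7.0], ["2018", 6.0], ["2017", 5.0],
        ["2016", 4.0], ["2015", 3.0], ["2014", 2.0], ["2013", 1.0]]

def last_n_as_columns(pairs, n=6):
    """Map the last n [period, value] pairs to {period: value}, last pair first."""
    return {period: value for period, value in reversed(pairs[-n:])}

row = {"title": "EXAMPLE.SERIES.A"}
row.update(last_n_as_columns(data))
print(row)
```

Slicing with `pairs[-n:]` also handles series with fewer than `n` entries, where it simply returns every pair once.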

Now we focus on specific energy types and save their data as JSON files for further analysis.

In [None]:
def fetch_specific_energy(example_name, series_name):
    # fetch the data for one specific type of energy
    # (reuses the global params dict, which already holds the API key)
    params["series_id"] = example_name
    specific_data = fetch(params)
    # write it out to our raw dataset directory
    fname = "%s-%s.json" % (example_name, series_name)
    out_path = dir_raw / fname
    print("Writing data to %s" % out_path)
    # the with-statement closes the file even if json.dump raises
    with open(out_path, "w") as fout:
        json.dump(specific_data, fout, indent=4, sort_keys=True)
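
The saved files are meant to be read back during later analysis. The sketch below shows a write and read round trip using the same file-naming pattern. It writes to a temporary directory instead of `raw`, and the sample data, the series ID `EXAMPLE.SERIES.A`, and the helpers `save_json` and `load_json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_json(data, directory, series_id, series_name):
    """Write data to '<series_id>-<series_name>.json' in directory; return the path."""
    out_path = Path(directory) / ("%s-%s.json" % (series_id, series_name))
    with open(out_path, "w") as fout:
        json.dump(data, fout, indent=4, sort_keys=True)
    return out_path

def load_json(path):
    """Read a JSON file back into Python objects."""
    with open(path) as fin:
        return json.load(fin)

# Invented sample standing in for an API response.
sample = {"series": [{"series_id": "EXAMPLE.SERIES.A", "data": [["2020", 1.0]]}]}

with tempfile.TemporaryDirectory() as tmp:
    path = save_json(sample, tmp, "EXAMPLE.SERIES.A", "Example Series, Annual")
    restored = load_json(path)

# The round trip should preserve the structure.
print(restored == sample)
```

Returning the path from `save_json` makes it easy to keep a list of written files for the analysis step.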

There are five export types and five import types. We extract the data for each of them separately from the **US Energy Information Administration API**:

In [None]:
# five specific energy imports data will be retrieved and saved 
imports_names = ["TOTAL.BFIMBUS.A", "TOTAL.CCIMBUS.A", "TOTAL.CLIMBUS.A", "TOTAL.ELIMBUS.A", "TOTAL.COIMBUS.A"]
imports_dictionary = {"TOTAL.BFIMBUS.A":"Biomass Imports, Annual",
                      "TOTAL.CCIMBUS.A":"Coal Coke Imports, Annual",
                      "TOTAL.CLIMBUS.A":"Coal Imports, Annual",
                      "TOTAL.ELIMBUS.A":"Electricity Imports, Annual",
                      "TOTAL.COIMBUS.A":"Crude Oil Imports, Annual"}

# five specific energy exports data will be retrieved and saved 
exports_names = ["TOTAL.BMEXBUS.A", "TOTAL.CCEXBUS.A", "TOTAL.CLEXBUS.A", "TOTAL.ELEXBUS.A", "TOTAL.COEXBUS.A"]
exports_dictionary = {"TOTAL.BMEXBUS.A":"Biomass Exports, Annual",
                      "TOTAL.CCEXBUS.A":"Coal Coke Exports, Annual",
                      "TOTAL.CLEXBUS.A":"Coal Exports, Annual",
                      "TOTAL.ELEXBUS.A":"Electricity Exports, Annual",
                      "TOTAL.COEXBUS.A":"Crude Oil Exports, Annual"}

for imports_name in imports_names:
    fetch_specific_energy(imports_name, imports_dictionary[imports_name])

for exports_name in exports_names:
    fetch_specific_energy(exports_name, exports_dictionary[exports_name])