# HW 8: Writing a Simple API Client in Python
Submitted by: Gideon Tay\
My UNI: gt2528\
Contact me at: gideon.tay@columbia.edu

## 1. Choose an API

I chose [Climatiq's API](https://www.climatiq.io/). Climatiq is a seed-stage startup that provides an API for companies to make compliant carbon measurements. Its platform offers up-to-date emission factor data as well as automated calculation functionality.

Emission factors measure the mass of CO2 equivalent (CO2e), in kilograms, emitted per unit of a reference activity. CO2e is a standard unit that allows emissions of different greenhouse gases to be compared with one another. For example, an emission factor for natural gas combustion might be 2538.48 kg CO2e per metric ton (t) of natural gas combusted.
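
To make the definition concrete, here is a minimal sketch of how an emission factor is applied: emissions equal the activity amount multiplied by the factor. The factor is the example value from the text; the function name and the 3 t activity amount are assumptions chosen for illustration.

```python
def estimate_emissions(activity_amount, emission_factor):
    """Return kg CO2e as activity amount x emission factor (kg CO2e per unit)."""
    return activity_amount * emission_factor


# Example factor from the text above; the activity amount is an assumed value
natural_gas_factor = 2538.48  # kg CO2e per metric ton of natural gas combusted
tonnes_combusted = 3          # hypothetical activity amount in metric tons

total = estimate_emissions(tonnes_combusted, natural_gas_factor)
print(f"{total:.2f} kg CO2e")
```

The unit of the activity amount has to match the unit in the factor's denominator (metric tons here), otherwise the result is meaningless.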

## 2. Authentication

Climatiq's API authenticates users with user-specific API keys. To apply for an API key:
- Go to [Climatiq's pricing page](https://www.climatiq.io/pricing)
- Scroll down to the free community plan and click "Free Signup". This free API key allows up to 250 API calls per month and basic carbon estimate calculations

## 3. Send a Simple GET request

First, we use the `requests` package to send a GET request and fetch data from Climatiq's API. We use the API's search endpoint to query for emission factors related to cloud computing in the United States. Note that the search endpoint does not return the emission factor values themselves, only the metadata and descriptions of the emission factors in Climatiq's database that match the search query.

We store our API key in a `.env` file in the same directory, alongside a `.gitignore` file that keeps the API key from being visible on GitHub.

In [1]:
import requests
import os
import json
from dotenv import load_dotenv

# Get API key from .env
load_dotenv()  # loads in the .env file
MY_API_KEY = os.getenv('ENV_API_KEY')

# Query parameters
url = "https://api.climatiq.io/data/v1/search" # The search endpoint
query_params = {
    "query": "cloud computing",   # The free text query
    "data_version": "^18",        # The latest data version
    "region": "US"                # United States as the region
}

# Specify AUTH token in the "Authorization" header
auth_headers = {"Authorization": f"Bearer {MY_API_KEY}"}

# This performs the GET request
r = requests.get(url, params=query_params, headers=auth_headers)
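
One thing worth guarding against in the cell above: `os.getenv` returns `None` when the variable is missing, in which case the header silently becomes `Bearer None` and the request would presumably be rejected by the API. The sketch below fails fast instead. The helper name `build_auth_headers` is hypothetical, and it assumes `load_dotenv()` has already been called.

```python
import os


def build_auth_headers(env_var="ENV_API_KEY"):
    """Return the Authorization header, raising early if the key is not set."""
    api_key = os.getenv(env_var)
    if not api_key:
        # Without this check the header would read "Bearer None"
        raise RuntimeError(f"{env_var} is not set; check your .env file")
    return {"Authorization": f"Bearer {api_key}"}
```

With this helper, `auth_headers = build_auth_headers()` could replace the manual header construction.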

To share a snippet of the JSON output rather than the whole output, we have to explore how the JSON output is structured:

In [2]:
climatiq_json = r.json()
print(type(climatiq_json))  # JSON is a dictionary
print(climatiq_json.keys()) # Keys for the dictionary

<class 'dict'>
dict_keys(['current_page', 'last_page', 'total_results', 'results', 'possible_filters'])


It looks like the main results of interest (emission factor data) are in `climatiq_json['results']`. Let's check the data type of this:

In [3]:
print(type(climatiq_json['results']))  # this is a list

<class 'list'>


**(3a) Now, we display a snippet of the JSON output.** 

In particular, we print the first item in the `climatiq_json['results']` list. It is an emission factor for the emission intensity of cloud computing memory use, in kg CO2e per gigabyte-hour, for AWS data centers in the "us_gov_east_1" region. Note that `us-gov-east-1` is the AWS GovCloud (US-East) region, not the commercial US East (N. Virginia) region, whose code is `us-east-1`. See the [AWS docs website](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html) for background on AWS Regions.

In [4]:
# View output, json.dumps() makes the r.json() more readable
climatiq_json = r.json()
print(json.dumps(climatiq_json['results'][0], indent=2))

{
  "activity_id": "memory-provider_aws-region_us_gov_east_1",
  "id": "c461b335-085e-464b-bb5c-a481388aa61b",
  "name": "AWS (us-gov-east-1) memory",
  "category": "Cloud Computing - Memory",
  "sector": "Information and Communication",
  "source": "CCF",
  "source_link": "https://www.cloudcarbonfootprint.org/docs/methodology/#compute",
  "source_dataset": "Derived from CCF models",
  "uncertainty": null,
  "year": 2021,
  "year_released": 2021,
  "region": "US",
  "region_name": "United States of America (the)",
  "description": "Emission intensity for use of cloud computing memory in kg CO2e per gigabyte-hour for the Amazon Web Services data centers in the given location using the assumptions and grid emissions factors available in the source as of date accessed. The source does not clarify if the kgCO2e value is calculated using either IPCC Fourth Assessment Report (AR4) or IPCC Fifth Assessment Report (AR5) methodologies.",
  "unit_type": "DataOverTime",
  "unit": "kg/GB-hour",
  

**(3b) We check and display the status of our request:**

In [5]:
# Check if request worked. 200 means everything is ok.
r.status_code

200

**(3c) We identify and display the type of the response:**

In [6]:
# Check content type received. We got json
print(f"The content type is: {r.headers['content-type']}")

The content type is: application/json
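
The content-type check can also be done programmatically before calling `r.json()`. The sketch below compares only the media type, since servers may append parameters such as `charset` to the header value. The helper name is hypothetical.

```python
def is_json_content_type(content_type):
    """Return True if a Content-Type header value names the JSON media type."""
    if not content_type:
        return False
    # Drop optional parameters such as "; charset=utf-8" before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json"


print(is_json_content_type("application/json"))
print(is_json_content_type("text/html; charset=utf-8"))
```

In the notebook this could be called as `is_json_content_type(r.headers.get("content-type"))`.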


## 4. Parse the response and create a dataset

**(4a) Let's convert the API response into a pandas data frame**

Note that we only include data from the 'results' list of the JSON in our dataframe. The other keys are left out because they are less important and do not contain emission factor data. Instead, they hold information on:
- total_results: total results from our initial query
- current_page: the current page we are on
- last_page: the number of pages these results are stored in
- possible_filters: possible ways to filter the results

Now, to store the key emission factor data into a dataframe:

In [7]:
# Import pandas package
import pandas as pd

# Convert list climatiq_json['results'] to a pandas dataframe
climatiq_json_df = pd.DataFrame(climatiq_json['results'])

# Display the first 5 entries of the dataframe
climatiq_json_df.head(5)

Unnamed: 0,activity_id,id,name,category,sector,source,source_link,source_dataset,uncertainty,year,...,source_lca_activity,data_quality_flags,access_type,supported_calculation_methods,factor,factor_calculation_method,factor_calculation_origin,constituent_gases,data_version,data_version_information
0,memory-provider_aws-region_us_gov_east_1,c461b335-085e-464b-bb5c-a481388aa61b,AWS (us-gov-east-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
1,memory-provider_aws-region_us_gov_east_1,c88d3c94-ab3a-4e72-9463-f912f0be2914,AWS (us-gov-east-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
2,memory-provider_aws-region_us_gov_west_1,f2087b27-1b84-47b1-89e3-fc13cdb5cd09,AWS (us-gov-west-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
3,memory-provider_aws-region_us_gov_west_1,cfaf549e-859f-4fd4-8dc4-ba4fc3389cc8,AWS (us-gov-west-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
4,cpu-provider_aws-region_us_gov_east_1,b874c47e-674c-4a5d-a490-c597deb397bc,AWS (us-gov-east-1) CPU,Cloud Computing - CPU,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}


Check how many emission factor records (rows) we have from our current API query:

In [8]:
print(f"Dataframe shape: {climatiq_json_df.shape}")
print(f"No. of results: {climatiq_json['total_results']}")

Dataframe shape: (20, 26)
No. of results: 20


**(4b) Use the API to create a dataset with multiple records (sample size > 100). Include some interesting features.**

Our previous query only returns 20 records. In fact, this is a limitation of our free API key: it can only return 20 records at a time, even if there are more than 20 records relevant to our search query.

To build a dataset with more than 100 records, we have to perform more GET requests. We run the same 'cloud computing' search query for additional geographic regions.

In [None]:
# Base query parameters
base_query_params = {
    "query": "cloud computing",
    "data_version": "^18"
}

# List of regions 
regions = [
    "US", # United States
    "EU", # Europe
    "GB", # United Kingdom
    "BR", # Brazil
    "JP", # Japan
    "ZA", # South Africa
    "CA", # Canada
    "SG", # Singapore
    "AU", # Australia
    "CN", # China
    "HK", # Hong Kong
    "IN", # India
    "AE", # UAE
    "BH", # Bahrain
    "IT", # Italy
    "FR", # France
    "SE", # Sweden
    "DE"  # Germany
    ]

# List to store DataFrames
dfs = []

# Loop through regions, update 'region' in query parameters, and make request
for region in regions:
    query_params = {**base_query_params, "region": region} 
    r = requests.get(url, params=query_params, headers=auth_headers)
    
    if r.status_code == 200:  # check if request was successful
        # Convert JSON response to a DataFrame and append it to the list
        climatiq_json = r.json()
        df = pd.DataFrame(climatiq_json['results']) 
        dfs.append(df)
    else:
        print(f"Failed to fetch data for region {region}, "
              f"status code: {r.status_code}")

# Concatenate all DataFrames into a single DataFrame
merged_df = pd.concat(dfs, ignore_index=True)

# Check dataframe shape
merged_df.shape

(115, 26)

We have 115 records from our various GET requests. Let's view some of them:

In [17]:
# Display the merged DataFrame
merged_df.head(5)

Unnamed: 0,activity_id,id,name,category,sector,source,source_link,source_dataset,uncertainty,year,...,source_lca_activity,data_quality_flags,access_type,supported_calculation_methods,factor,factor_calculation_method,factor_calculation_origin,constituent_gases,data_version,data_version_information
0,memory-provider_aws-region_us_gov_east_1,c461b335-085e-464b-bb5c-a481388aa61b,AWS (us-gov-east-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
1,memory-provider_aws-region_us_gov_east_1,c88d3c94-ab3a-4e72-9463-f912f0be2914,AWS (us-gov-east-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
2,memory-provider_aws-region_us_gov_west_1,f2087b27-1b84-47b1-89e3-fc13cdb5cd09,AWS (us-gov-west-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
3,memory-provider_aws-region_us_gov_west_1,cfaf549e-859f-4fd4-8dc4-ba4fc3389cc8,AWS (us-gov-west-1) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
4,cpu-provider_aws-region_us_gov_east_1,b874c47e-674c-4a5d-a490-c597deb397bc,AWS (us-gov-east-1) CPU,Cloud Computing - CPU,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2021,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
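
Looping over regions is one way to collect more records. Another possible approach, sketched below, is to walk through the pages of a single query using the `last_page` key seen in the response. This is only a sketch: `fetch_all_pages` and `fetch_page` are hypothetical names, and passing `page` as a query parameter is an assumption that should be checked against Climatiq's documentation. Each page is a separate API call and counts toward the monthly quota.

```python
def fetch_all_pages(fetch_page, max_pages=50):
    """Collect 'results' from every page of a paginated search response.

    fetch_page(page) must return the parsed JSON dict for that page number.
    max_pages is a safety cap so a bad response cannot loop forever.
    """
    results = []
    page = 1
    while page <= max_pages:
        payload = fetch_page(page)
        results.extend(payload.get("results", []))
        # Stop once the API reports that this was the last page
        if page >= payload.get("last_page", page):
            break
        page += 1
    return results


# Hypothetical wiring to the search endpoint ("page" parameter is an assumption):
# def fetch_page(page):
#     r = requests.get(url, params={**query_params, "page": page},
#                      headers=auth_headers)
#     r.raise_for_status()
#     return r.json()

# Offline demonstration with made-up pages
fake_pages = {
    1: {"results": ["a", "b"], "last_page": 2},
    2: {"results": ["c"], "last_page": 2},
}
print(fetch_all_pages(lambda page: fake_pages[page]))
```

Passing the page fetcher in as an argument keeps the paging logic testable without network access.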


**(4c) Provide summary statistics of your dataset. Include the data frame in a .csv file named data.csv with your submission.**

Since our data is mostly text rather than numeric, the standard summary statistics have limited value. We still compute them below:

In [20]:
# Generate summary statistics for the DataFrame
merged_df.describe(include='all')

Unnamed: 0,activity_id,id,name,category,sector,source,source_link,source_dataset,uncertainty,year,...,source_lca_activity,data_quality_flags,access_type,supported_calculation_methods,factor,factor_calculation_method,factor_calculation_origin,constituent_gases,data_version,data_version_information
count,115,115,115,115,115,115,115,115,0.0,115.0,...,115,115,115,115,0.0,0.0,0.0,115,115,115
unique,98,115,98,4,1,1,1,1,0.0,,...,1,1,1,1,0.0,0.0,0.0,1,1,1
top,memory-provider_aws-region_us_gov_east_1,c461b335-085e-464b-bb5c-a481388aa61b,AWS (us-gov-east-1) memory,Cloud Computing - Storage,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
freq,2,1,2,44,115,115,115,115,,,...,115,115,115,115,,,,115,115,115
mean,,,,,,,,,,2021.843478,...,,,,,,,,,,
std,,,,,,,,,,0.364939,...,,,,,,,,,,
min,,,,,,,,,,2021.0,...,,,,,,,,,,
25%,,,,,,,,,,2022.0,...,,,,,,,,,,
50%,,,,,,,,,,2022.0,...,,,,,,,,,,
75%,,,,,,,,,,2022.0,...,,,,,,,,,,


Each of the 115 rows has a unique `id`, but there are only 98 unique `activity_id` values, because some activities appear for more than one year (for example, both 2021 and 2022). There are 4 broad categories of cloud computing. All records in our dataset come from the same source: [Cloud Carbon Footprint (CCF)](https://www.cloudcarbonfootprint.org/docs/methodology/#compute).

Let's find the unique categories:

In [21]:
# Display the unique categories
unique_categories = merged_df['category'].unique()
unique_categories

array(['Cloud Computing - Memory', 'Cloud Computing - CPU',
       'Cloud Computing - Networking', 'Cloud Computing - Storage'],
      dtype=object)

The 4 unique categories show that emission factors are tracked separately for the memory, CPU, networking, and storage use of cloud services.
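
The counts in the summary table can also be reproduced on the raw records. The sketch below counts unique `id` values, unique `activity_id` values, and records per category using only the standard library. The function name and the toy records are made up for illustration and are not real API output.

```python
from collections import Counter


def summarize_records(records):
    """Count unique ids, unique activity_ids, and records per category."""
    return {
        "unique_ids": len({rec["id"] for rec in records}),
        "unique_activity_ids": len({rec["activity_id"] for rec in records}),
        "per_category": Counter(rec["category"] for rec in records),
    }


# Toy records: two share an activity_id but differ in id and year
sample = [
    {"id": "a1", "activity_id": "memory-x", "year": 2021,
     "category": "Cloud Computing - Memory"},
    {"id": "a2", "activity_id": "memory-x", "year": 2022,
     "category": "Cloud Computing - Memory"},
    {"id": "b1", "activity_id": "cpu-x", "year": 2022,
     "category": "Cloud Computing - CPU"},
]

summary = summarize_records(sample)
print(summary["unique_ids"], summary["unique_activity_ids"])
print(summary["per_category"].most_common())
```

On the real dataset the same function could be applied to `merged_df.to_dict("records")`.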

Next, let's save the data frame to a CSV file:

In [18]:
# Save the merged DataFrame to a CSV file
merged_df.to_csv('data.csv', index=False)

## 5. Write an API Client Function

Wrap the code into a simple API client function:

In [24]:
def fetch_api_data(
        query="cloud computing", data_version="^18", regions=("JP", "SG"),
        auth_token=None, url="https://api.climatiq.io/data/v1/search"
        ):
    """
    Fetches data from an API for specified regions and query parameters.

    Parameters:
    - query (str): The search query to be used in the API request. 
                   Default is "cloud computing".
    - data_version (str): The version of data to be used. Default is "^18".
    - regions (list or tuple): Regions to fetch data for. 
                      Default is ("JP", "SG") if not provided.
    - auth_token (str): Authorization token for the API. Required for authentication.
    - url (str): The base URL for the API endpoint. 

    Returns:
    - pandas.DataFrame: A DataFrame containing the merged data from all regions.

    Raises:
    - ValueError: If auth_token is not provided or if any region request fails.
    """

    # Check if auth_token is provided
    if not auth_token:
        raise ValueError("Authentication token is required")

    # Set up authentication headers
    auth_headers = {
        "Authorization": f"Bearer {auth_token}"
    }

    # Base query parameters (without region)
    base_query_params = {
        "query": query,
        "data_version": data_version
    }

    # List to store DataFrames from each region
    dfs = []

    # Loop through regions, update 'region' in query parameters, and make requests
    for region in regions:
        query_params = {**base_query_params, "region": region} 
        r = requests.get(url, params=query_params, headers=auth_headers)
        
        # Check response status
        if r.status_code == 200:
            # Convert JSON response to a DataFrame and append it to the list
            json_data = r.json()
            df = pd.DataFrame(json_data.get('results', [])) 
            dfs.append(df)
        else:
            raise ValueError(f"Request failed for region {region} with status code {r.status_code}")

    # Concatenate all DataFrames into a single DataFrame
    merged_df = pd.concat(dfs, ignore_index=True)

    # Return the merged DataFrame
    return merged_df

# Run the function with default values
df = fetch_api_data(auth_token=MY_API_KEY)
# Display first 5 rows of the output
df.head(5)

Unnamed: 0,activity_id,id,name,category,sector,source,source_link,source_dataset,uncertainty,year,...,source_lca_activity,data_quality_flags,access_type,supported_calculation_methods,factor,factor_calculation_method,factor_calculation_origin,constituent_gases,data_version,data_version_information
0,memory-provider_azure-region_japan,10b53d2a-6b76-4ca6-a8fb-5b271bbba749,AZURE (japan) memory,Cloud Computing - Memory,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
1,cpu-provider_azure-region_japan,08a0b39b-6cf4-4f36-b86c-f192eac1e843,AZURE (japan) CPU,Cloud Computing - CPU,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
2,networking-provider_azure-region_japan,407e156c-3ea2-44ee-ba0d-b311fa4608b9,AZURE (japan) networking,Cloud Computing - Networking,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
3,storage-provider_azure-region_japan-type_hdd,4010252a-bf52-44ee-acb0-e34ce40d6f29,AZURE (japan) HDD storage,Cloud Computing - Storage,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}
4,storage-provider_azure-region_japan-type_ssd,59f9367b-e625-4b80-a296-3eaa687c27fe,AZURE (japan) SSD storage,Cloud Computing - Storage,Information and Communication,CCF,https://www.cloudcarbonfootprint.org/docs/meth...,Derived from CCF models,,2022,...,use_phase,[notable_methodological_variance],public,[ar4],,,,"{'co2e_total': None, 'co2e_other': None, 'co2'...",{'status': 'up_to_date'},{'status': 'up_to_date'}


Note that the function is generalized: you can pass other regions, queries, and data versions to tailor the GET requests it sends.
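
To see how the arguments translate into requests, the sketch below builds the per-region query parameters the same way the loop inside `fetch_api_data` does, without sending anything. The helper name `build_region_queries` and the "electricity" query are hypothetical and used only for illustration.

```python
def build_region_queries(query, data_version, regions):
    """Return one query-parameter dict per region, mirroring the client's loop."""
    base = {"query": query, "data_version": data_version}
    # Dict unpacking copies the base parameters and adds the region
    return [{**base, "region": region} for region in regions]


for params in build_region_queries("electricity", "^18", ["GB", "DE"]):
    print(params)
```

The equivalent client call would be `fetch_api_data(query="electricity", regions=["GB", "DE"], auth_token=MY_API_KEY)`. Whether it returns any records depends on what Climatiq's database holds for that query.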