# Working with APIs

**Spring 2024 - Instructor:  Chris Volinsky**

**Teaching Assistants: Aditya Deshpande, Stuti Mishra, Krutika Savani**

##**What is an API?**

- An API, or Application Programming Interface, is a set of rules, protocols, and tools that allows different software applications to communicate with each other.

- Web sites that offer data for download will publish an API that allows users to access the data.  The API enforces the rules of the data owner: how much data you can get, which data elements you can get, etc.

- The examples below are a few examples of these - if you want to download data from a web site, search for their public API.  Use the examples below as guidelines of how to write code to get the data.  **Please do not use web scrapers to get data unless granted permission, this may be a violation of terms of service**.
















Here are some of the examples of free APIs

##**1) Weather API**

https://openweathermap.org/api

In [1]:
#Importing the required libraries
import datetime as dt
import requests

In [2]:
BASE_URL = "http://api.openweathermap.org/data/2.5/weather?"

In [3]:
# Create an account on https://home.openweathermap.org/users/sign_in and enter your API key here
API_KEY="cf98a366e668c838c2564149ec2a5ebd"

In [4]:
#Type the city of your choice
CITY="New York"

In [5]:
#Get the weather details of the city of your choice
def kelvin_to_celsius_fahrenheit(kelvin):
  celsius = kelvin - 273.15
  fahrenheit = celsius * (9/5) + 32
  return celsius, fahrenheit


url = BASE_URL + "appid=" + API_KEY + "&q=" + CITY
response = requests.get(url).json()

temp_kelvin = response['main']['temp']
temp_celsius, temp_fahrenheit = kelvin_to_celsius_fahrenheit(temp_kelvin)
feels_like_kelvin = response['main']['feels_like']
feels_like_celsius, feels_like_fahrenheit = kelvin_to_celsius_fahrenheit(feels_like_kelvin)
wind_speed = response['wind']['speed']
humidity = response['main']['humidity']
description = response['weather'][0]['description']
sunrise_time= dt.datetime.utcfromtimestamp(response['sys']['sunrise']+response['timezone'])
sunset_time= dt.datetime.utcfromtimestamp(response['sys']['sunset']+response['timezone'])


print(f"Temperature in {CITY}: {temp_celsius:.2f}°C or {temp_fahrenheit:.2f}°F")
print(f"Temperature in {CITY} feels Like: {feels_like_celsius:.2f}°C or {feels_like_fahrenheit:.2f}°F")
print(f"Wind Speedin {CITY}:{wind_speed} m/s")
print(f"Humidity in {CITY}:{humidity}%")
print(f"General Weather in {CITY}:: {description}")
print(f"Sun rises in {CITY} at {sunrise_time} local time.")
print(f"Sun sets in {CITY} at {sunset_time} local time.")




Temperature in New York: 1.28°C or 34.30°F
Temperature in New York feels Like: -5.43°C or 22.23°F
Wind Speedin New York:10.29 m/s
Humidity in New York:38%
General Weather in New York:: clear sky
Sun rises in New York at 2024-02-14 06:52:06 local time.
Sun sets in New York at 2024-02-14 17:28:34 local time.


##**2) Kaggle API**

Kaggle is a web site created to run data science competitions, and sometimes offers significant cash prizes for the team that builds the best predictive model.  They have published many data sets on a variety of topics.  See https://www.kaggle.com/datasets  

In order to use the API, you must first get a username and key on the Kaggle site.  See https://www.kaggle.com/docs/api

1. Install the kaggle library

In [6]:
!pip install kaggle --quiet

2. Authenticate

In [7]:
#replace with your username and key
%env KAGGLE_USERNAME=scitech06
%env KAGGLE_KEY=4a563980abfd46103a500cf7bbb6cf35

env: KAGGLE_USERNAME=scitech06
env: KAGGLE_KEY=4a563980abfd46103a500cf7bbb6cf35


3. Fetch the data
For this example, we will be using COVID-19 data.

You can find it here: https://www.kaggle.com/datasets/johnjdavisiv/us-counties-covid19-weather-sociohealth-data

In [8]:
!kaggle datasets download -d johnjdavisiv/us-counties-covid19-weather-sociohealth-data

Downloading us-counties-covid19-weather-sociohealth-data.zip to /content
 99% 588M/591M [00:28<00:00, 24.1MB/s]
100% 591M/591M [00:28<00:00, 21.7MB/s]


In [9]:
!unzip /content/us-counties-covid19-weather-sociohealth-data.zip

Archive:  /content/us-counties-covid19-weather-sociohealth-data.zip
  inflating: US_counties_COVID19_health_weather_data.csv  
  inflating: us_county_geometry.csv  
  inflating: us_county_sociohealth_data.csv  


In [10]:
# Import the unzipped .csv files to dataframes using pandas
import pandas as pd
alldata = pd.read_csv("/content/US_counties_COVID19_health_weather_data.csv", header=0)

In [11]:
alldata

Unnamed: 0,date,county,state,fips,cases,deaths,stay_at_home_announced,stay_at_home_effective,lat,lon,...,min_temp_3d_avg,min_temp_5d_avg,min_temp_10d_avg,min_temp_15d_avg,dewpoint_3d_avg,dewpoint_5d_avg,dewpoint_10d_avg,dewpoint_15d_avg,date_stay_at_home_announced,date_stay_at_home_effective
0,2020-01-21,Snohomish,Washington,53061,1,0.0,no,no,48.047489,-121.697307,...,38.266667,38.92,38.44,36.146667,40.333333,41.64,40.74,37.973333,2020-03-23,2020-03-23
1,2020-01-22,Snohomish,Washington,53061,1,0.0,no,no,48.047489,-121.697307,...,39.233333,41.12,39.76,37.613333,42.633333,42.98,41.68,39.440000,2020-03-23,2020-03-23
2,2020-01-23,Snohomish,Washington,53061,1,0.0,no,no,48.047489,-121.697307,...,42.900000,41.74,41.15,38.226667,44.733333,43.72,42.47,40.120000,2020-03-23,2020-03-23
3,2020-01-24,Cook,Illinois,17031,1,0.0,no,no,41.840039,-87.816716,...,32.366667,30.02,27.43,24.886667,31.433333,28.50,25.00,22.693333,2020-03-20,2020-03-21
4,2020-01-24,Snohomish,Washington,53061,1,0.0,no,no,48.047489,-121.697307,...,44.600000,42.54,41.75,38.226667,46.000000,44.30,42.88,41.293333,2020-03-23,2020-03-23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
790326,2020-12-04,Sweetwater,Wyoming,56037,2077,10.0,no,no,41.659538,-108.879567,...,,,,,,,,,,
790327,2020-12-04,Teton,Wyoming,56039,1724,2.0,no,no,43.934776,-110.589759,...,,,,,,,,,,
790328,2020-12-04,Uinta,Wyoming,56041,1175,5.0,no,no,41.287648,-110.547639,...,,,,,,,,,,
790329,2020-12-04,Washakie,Wyoming,56043,517,8.0,no,no,43.904970,-107.682819,...,,,,,,,,,,


##**3) NYC Open Data**

NYC Open data publishes a wide array of free public data sets collected by NYC government agencies.  https://opendata.cityofnewyork.us/data/

Fetching data from an API and storing it in a dataframe

1. Begin by importing requests, pandas and numpy. If you are graphing, use magic command %matplotlib inline

In [12]:
%matplotlib inline
import requests
import pandas as pd
import numpy as np

2. Find the URL and assign it to a variable. Read in the url and use the .json methods from the requests library to convert it to a native python data structure.

In [13]:
url = 'https://data.cityofnewyork.us/resource/h9gi-nx95.json' #endpoint
results = requests.get(url)
print(f'The result is a response object: {results}')
results = results.json()
print(f'When we use that response object we can call .json for the actual data: {results}')

The result is a response object: <Response [200]>
When we use that response object we can call .json for the actual data: [{'crash_date': '2021-09-11T00:00:00.000', 'crash_time': '2:39', 'on_street_name': 'WHITESTONE EXPRESSWAY', 'off_street_name': '20 AVENUE', 'number_of_persons_injured': '2', 'number_of_persons_killed': '0', 'number_of_pedestrians_injured': '0', 'number_of_pedestrians_killed': '0', 'number_of_cyclist_injured': '0', 'number_of_cyclist_killed': '0', 'number_of_motorist_injured': '2', 'number_of_motorist_killed': '0', 'contributing_factor_vehicle_1': 'Aggressive Driving/Road Rage', 'contributing_factor_vehicle_2': 'Unspecified', 'collision_id': '4455765', 'vehicle_type_code1': 'Sedan', 'vehicle_type_code2': 'Sedan'}, {'crash_date': '2022-03-26T00:00:00.000', 'crash_time': '11:45', 'on_street_name': 'QUEENSBORO BRIDGE UPPER', 'number_of_persons_injured': '1', 'number_of_persons_killed': '0', 'number_of_pedestrians_injured': '0', 'number_of_pedestrians_killed': '0', 'numb

In [14]:
#We can write the code above as:
url = 'https://data.cityofnewyork.us/resource/h9gi-nx95.json'
results = requests.get(url).json() #method chaining techique

In [15]:
type(results)

list

In [16]:
len(results)

1000

In [17]:
results[0] # Since we know it's a list with length 1000 we can look at the first item.

{'crash_date': '2021-09-11T00:00:00.000',
 'crash_time': '2:39',
 'on_street_name': 'WHITESTONE EXPRESSWAY',
 'off_street_name': '20 AVENUE',
 'number_of_persons_injured': '2',
 'number_of_persons_killed': '0',
 'number_of_pedestrians_injured': '0',
 'number_of_pedestrians_killed': '0',
 'number_of_cyclist_injured': '0',
 'number_of_cyclist_killed': '0',
 'number_of_motorist_injured': '2',
 'number_of_motorist_killed': '0',
 'contributing_factor_vehicle_1': 'Aggressive Driving/Road Rage',
 'contributing_factor_vehicle_2': 'Unspecified',
 'collision_id': '4455765',
 'vehicle_type_code1': 'Sedan',
 'vehicle_type_code2': 'Sedan'}

In [18]:
print(results[0]['on_street_name'])
print(results[0]['vehicle_type_code1'])

WHITESTONE EXPRESSWAY
Sedan


In [19]:
#Convert the list into a dataframe
df = pd.DataFrame(results)

In [20]:
df

Unnamed: 0,crash_date,crash_time,on_street_name,off_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,...,latitude,longitude,location,cross_street_name,contributing_factor_vehicle_3,vehicle_type_code_3,contributing_factor_vehicle_4,vehicle_type_code_4,contributing_factor_vehicle_5,vehicle_type_code_5
0,2021-09-11T00:00:00.000,2:39,WHITESTONE EXPRESSWAY,20 AVENUE,2,0,0,0,0,0,...,,,,,,,,,,
1,2022-03-26T00:00:00.000,11:45,QUEENSBORO BRIDGE UPPER,,1,0,0,0,0,0,...,,,,,,,,,,
2,2022-06-29T00:00:00.000,6:55,THROGS NECK BRIDGE,,0,0,0,0,0,0,...,,,,,,,,,,
3,2021-09-11T00:00:00.000,9:35,,,0,0,0,0,0,0,...,40.667202,-73.8665,"{'latitude': '40.667202', 'longitude': '-73.86...",1211 LORING AVENUE,,,,,,
4,2021-12-14T00:00:00.000,8:13,SARATOGA AVENUE,DECATUR STREET,0,0,0,0,0,0,...,40.683304,-73.917274,"{'latitude': '40.683304', 'longitude': '-73.91...",,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,2021-04-14T00:00:00.000,12:47,HENDRIX STREET,ATLANTIC AVENUE,2,0,0,0,0,0,...,40.676594,-73.89038,"{'latitude': '40.676594', 'longitude': '-73.89...",,,,,,,
996,2021-04-16T00:00:00.000,14:30,EAST 64 STREET,,0,0,0,0,0,0,...,40.76468,-73.9643,"{'latitude': '40.76468', 'longitude': '-73.964...",,,,,,,
997,2021-04-15T00:00:00.000,0:00,WEST 155 STREET,BROADWAY,3,0,0,0,0,0,...,40.832764,-73.94583,"{'latitude': '40.832764', 'longitude': '-73.94...",,,,,,,
998,2021-04-14T00:00:00.000,6:55,BROOKLYN QUEENS EXPRESSWAY,,0,0,0,0,0,0,...,40.698544,-73.96236,"{'latitude': '40.698544', 'longitude': '-73.96...",,,,,,,


##**4) Stocks API**

You can find the documentation here: https://www.stockdata.org/documentation

In [21]:
import requests

# Replace 'YOUR_API_KEY' and 'YOUR_ENDPOINT' with your API key and endpoint URL
api_key = 'DMbJL64k7XCY4yaywXx9IC85Rvs5b1NbRHzHanlS'
endpoint = 'https://api.stockdata.org/v1/data/quote?symbols=AAPL%2CTSLA%2CMSFT&api_token=DMbJL64k7XCY4yaywXx9IC85Rvs5b1NbRHzHanlS'

# Creating the request headers
headers = {
    'Authorization': f'Bearer {api_key}',  # Replace 'Bearer' with the appropriate type if needed
    'Content-Type': 'application/json'  # Adjust content type as per API specifications
}

# Making the GET request to the API
response = requests.get(endpoint, headers=headers)

# Checking if the request was successful (status code 200)
if response.status_code == 200:
    data = response.json()
    # Process the data as needed
    print(data)
else:
    print(f"Failed to fetch data. Status code: {response.status_code}")

{'meta': {'requested': 3, 'returned': 3}, 'data': [{'ticker': 'AAPL', 'name': 'Apple Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 187.2, 'day_high': 188.65, 'day_low': 186.81, 'day_open': 188.43, '52_week_high': 186.51, '52_week_low': 124.17, 'market_cap': None, 'previous_close_price': 188.84, 'previous_close_price_time': '2024-02-09T15:59:55.000000', 'day_change': -0.88, 'volume': 715026, 'is_extended_hours_price': False, 'last_trade_time': '2024-02-12T16:00:00.000000'}, {'ticker': 'TSLA', 'name': 'Tesla Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 188.12, 'day_high': 194.71, 'day_low': 187.3, 'day_open': 192.15, '52_week_high': 314.67, '52_week_low': 101.81, 'market_cap': None, 'previous_close_price': 193.53, 'previous_close_price_time': '2024-02-09T15:59:59.000000', 'day_change': -2.88, 'volume': 459586, 'is_extended_hours_price': False, 'last_trade_time': '2024-02-12T16:0

In [22]:
#We can write the code above as:
url1 = 'https://api.stockdata.org/v1/data/quote?symbols=AAPL%2CTSLA%2CMSFT&api_token=DMbJL64k7XCY4yaywXx9IC85Rvs5b1NbRHzHanlS'
results1 = requests.get(url1).json() #method chaining techique

In [23]:
print(results1)

{'meta': {'requested': 3, 'returned': 3}, 'data': [{'ticker': 'AAPL', 'name': 'Apple Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 187.2, 'day_high': 188.65, 'day_low': 186.81, 'day_open': 188.43, '52_week_high': 186.51, '52_week_low': 124.17, 'market_cap': None, 'previous_close_price': 188.84, 'previous_close_price_time': '2024-02-09T15:59:55.000000', 'day_change': -0.88, 'volume': 715026, 'is_extended_hours_price': False, 'last_trade_time': '2024-02-12T16:00:00.000000'}, {'ticker': 'TSLA', 'name': 'Tesla Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 188.12, 'day_high': 194.71, 'day_low': 187.3, 'day_open': 192.15, '52_week_high': 314.67, '52_week_low': 101.81, 'market_cap': None, 'previous_close_price': 193.53, 'previous_close_price_time': '2024-02-09T15:59:59.000000', 'day_change': -2.88, 'volume': 459586, 'is_extended_hours_price': False, 'last_trade_time': '2024-02-12T16:0

In [24]:
type(results1)

dict

In [25]:
len(results1)

2

In [26]:
results1['data'][0]

{'ticker': 'AAPL',
 'name': 'Apple Inc',
 'exchange_short': None,
 'exchange_long': None,
 'mic_code': 'IEXG',
 'currency': 'USD',
 'price': 187.2,
 'day_high': 188.65,
 'day_low': 186.81,
 'day_open': 188.43,
 '52_week_high': 186.51,
 '52_week_low': 124.17,
 'market_cap': None,
 'previous_close_price': 188.84,
 'previous_close_price_time': '2024-02-09T15:59:55.000000',
 'day_change': -0.88,
 'volume': 715026,
 'is_extended_hours_price': False,
 'last_trade_time': '2024-02-12T16:00:00.000000'}

In [27]:
results1['data'][1]

{'ticker': 'TSLA',
 'name': 'Tesla Inc',
 'exchange_short': None,
 'exchange_long': None,
 'mic_code': 'IEXG',
 'currency': 'USD',
 'price': 188.12,
 'day_high': 194.71,
 'day_low': 187.3,
 'day_open': 192.15,
 '52_week_high': 314.67,
 '52_week_low': 101.81,
 'market_cap': None,
 'previous_close_price': 193.53,
 'previous_close_price_time': '2024-02-09T15:59:59.000000',
 'day_change': -2.88,
 'volume': 459586,
 'is_extended_hours_price': False,
 'last_trade_time': '2024-02-12T16:00:00.000000'}

In [28]:
#Convert the dictionary into a dataframe
df1 = pd.DataFrame(results1['data'])

In [29]:
df1

Unnamed: 0,ticker,name,exchange_short,exchange_long,mic_code,currency,price,day_high,day_low,day_open,52_week_high,52_week_low,market_cap,previous_close_price,previous_close_price_time,day_change,volume,is_extended_hours_price,last_trade_time
0,AAPL,Apple Inc,,,IEXG,USD,187.2,188.65,186.81,188.43,186.51,124.17,,188.84,2024-02-09T15:59:55.000000,-0.88,715026,False,2024-02-12T16:00:00.000000
1,TSLA,Tesla Inc,,,IEXG,USD,188.12,194.71,187.3,192.15,314.67,101.81,,193.53,2024-02-09T15:59:59.000000,-2.88,459586,False,2024-02-12T16:00:00.000000
2,MSFT,Microsoft Corporation,,,IEXG,USD,415.26,420.68,414.77,420.5,349.84,213.43,,420.42,2024-02-09T16:00:00.000000,-1.24,296981,False,2024-02-12T16:00:00.000000


##**5) Yahoo Finance API**

1a. Install yfinance

In [30]:
#install package
!pip install yfinance --quiet
#https://pypi.org/project/yfinance/

In [31]:
%load_ext google.colab.data_table
#To disable the display
#%unload_ext google.colab.data_table

1b. Begin by importing requests, pandas, matplotlib, numpy, and yfinance. If you are graphing, use magic command %matplotlib inline when if you call .plot from pandas or matplotlib.

In [32]:
%matplotlib inline
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

  _empty_series = pd.Series()


2a. Make a request.Explore the documentation to learn more about this package and it's helper tools to access API. [Y Finance API](https://pypi.org/project/yfinance/)

In [33]:
#Figure out how to use the methods to call.
msft = yf.Ticker("MSFT")


# get historical market data
hist = msft.history(period="max")
print(hist)

                                 Open        High         Low       Close  \
Date                                                                        
1986-03-13 00:00:00-05:00    0.054893    0.062965    0.054893    0.060274   
1986-03-14 00:00:00-05:00    0.060274    0.063504    0.060274    0.062427   
1986-03-17 00:00:00-05:00    0.062427    0.064042    0.062427    0.063503   
1986-03-18 00:00:00-05:00    0.063503    0.064042    0.061350    0.061888   
1986-03-19 00:00:00-05:00    0.061888    0.062427    0.060274    0.060812   
...                               ...         ...         ...         ...   
2024-02-08 00:00:00-05:00  414.049988  415.559998  412.529999  414.109985   
2024-02-09 00:00:00-05:00  415.250000  420.820007  415.089996  420.549988   
2024-02-12 00:00:00-05:00  420.559998  420.739990  414.750000  415.260010   
2024-02-13 00:00:00-05:00  404.940002  410.070007  403.390015  406.320007   
2024-02-14 00:00:00-05:00  408.070007  409.839996  404.570007  409.489990   

2b. Make a request for a time frame using yfinance history() method.
Available paramaters for the history() method are:

*  
period: data period to download (Either Use period parameter or use start and end) Valid periods are: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max

* interval: data interval (intraday data cannot extend last 60 days) Valid intervals are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo


*   start: If not using period - Download start date string (YYYY-MM-DD) or datetime.
*   end: If not using period - Download end date string (YYYY-MM-DD) or datetime.


*   prepost: Include Pre and Post market data in results? (Default is False)
*   auto_adjust: Adjust all OHLC automatically? (Default is True)

*   actions: Download stock dividends and stock splits events? (Default is True)











In [34]:
hist = msft.history(start='2021-11-28', end='2023-11-29')

In [35]:
hist

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Dividends,Stock Splits
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2021-11-29 00:00:00-05:00,328.914254,332.930669,328.717840,330.573853,28563500,0.0,0.0
2021-11-30 00:00:00-05:00,329.287402,331.703137,323.071266,324.642487,42885600,0.0,0.0
2021-12-01 00:00:00-05:00,329.100859,333.176193,323.464134,324.141693,33337600,0.0,0.0
2021-12-02 00:00:00-05:00,324.357712,327.490324,321.902688,323.562286,30766000,0.0,0.0
2021-12-03 00:00:00-05:00,326.017309,326.714557,312.308465,317.198883,41779300,0.0,0.0
...,...,...,...,...,...,...,...
2023-11-21 00:00:00-05:00,375.670013,376.220001,371.119995,373.070007,28423100,0.0,0.0
2023-11-22 00:00:00-05:00,378.000000,379.790009,374.970001,377.850006,23345300,0.0,0.0
2023-11-24 00:00:00-05:00,377.329987,377.970001,375.140015,377.429993,10176600,0.0,0.0
2023-11-27 00:00:00-05:00,376.779999,380.640015,376.200012,378.609985,22179200,0.0,0.0


In [36]:
hist.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 503 entries, 2021-11-29 00:00:00-05:00 to 2023-11-28 00:00:00-05:00
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          503 non-null    float64
 1   High          503 non-null    float64
 2   Low           503 non-null    float64
 3   Close         503 non-null    float64
 4   Volume        503 non-null    int64  
 5   Dividends     503 non-null    float64
 6   Stock Splits  503 non-null    float64
dtypes: float64(6), int64(1)
memory usage: 31.4 KB


4. Obtain multiple time series of stock data

In [37]:
msft = yf.Ticker("MSFT")
aapl= yf.Ticker("AAPL")
ibm = yf.Ticker("IBM")

In [38]:
microsoft = msft.history(start='2001-12-06', end='2023-11-29')
apple = aapl.history(start='2001-12-06', end='2023-11-29')
ibm = ibm.history(start='2001-12-06', end='2023-11-29')

In [39]:
apple

Output hidden; open in https://colab.research.google.com to view.