# Working with APIs

**Data Science for Business - Instructor: Chris Volinsky**

Notebook created by Aditya Deshpande

##**What is an API?**

- An API, or Application Programming Interface, is a set of rules, protocols, and tools that allows different software applications to communicate with each other.

- Web sites that offer data for download will publish an API that allows users to access the data.  The API enforces the rules of the data owner: how much data you can get, which data elements you can get, etc.

- The sections below show a few such APIs. If you want to download data from a web site, search for its public API and use these examples as guidelines for writing code to get the data.  **Please do not use web scrapers to get data unless you have been granted permission; scraping may violate the site's terms of service**.
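
To make the idea of a request that follows the API's rules concrete, here is a minimal sketch that takes apart a typical API URL using only the standard library. The URL mirrors the weather request used later in this notebook; `YOUR_API_KEY` is a placeholder, and nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qs

# A typical API request URL: host + endpoint path + query parameters.
# YOUR_API_KEY is a placeholder, not a real key.
url = "https://api.openweathermap.org/data/2.5/weather?q=New+York&appid=YOUR_API_KEY"

parts = urlsplit(url)
query = parse_qs(parts.query)  # decodes the query string into a dict of lists

print("Host:      ", parts.netloc)
print("Endpoint:  ", parts.path)
print("Parameters:", query)
```

The host identifies the service, the path selects the endpoint, and the query parameters carry what you are asking for plus the key that identifies you to the data owner.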
















Here is a list of Python API wrappers for popular websites: https://github.com/realpython/list-of-python-api-wrappers

Below we will work through a few examples of APIs that might be interesting or useful.

##**1) Weather API**

https://openweathermap.org/api

In [None]:
#Importing the required libraries
import datetime as dt
import requests

In [None]:
# Use https so the API key is not sent in plain text
BASE_URL = "https://api.openweathermap.org/data/2.5/weather?"

In [None]:
# Create an account on https://home.openweathermap.org/users/sign_in and enter your API key here
API_KEY="YOUR_API_KEY"

In [None]:
#Type the city of your choice
CITY="New York"

In [None]:
#Get the weather details of the city of your choice
def kelvin_to_celsius_fahrenheit(kelvin):
  celsius = kelvin - 273.15
  fahrenheit = celsius * (9/5) + 32
  return celsius, fahrenheit


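# Let requests build and URL-encode the query string (the city name contains a space)
params = {"appid": API_KEY, "q": CITY}
response = requests.get(BASE_URL, params=params, timeout=10)
response.raise_for_status()  # stop here on an HTTP error such as an invalid key or unknown city
response = response.json()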

temp_kelvin = response['main']['temp']
temp_celsius, temp_fahrenheit = kelvin_to_celsius_fahrenheit(temp_kelvin)
feels_like_kelvin = response['main']['feels_like']
feels_like_celsius, feels_like_fahrenheit = kelvin_to_celsius_fahrenheit(feels_like_kelvin)
wind_speed = response['wind']['speed']
humidity = response['main']['humidity']
description = response['weather'][0]['description']
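# Shift the UTC timestamps by the city's UTC offset (in seconds) to get local wall-clock times.
# fromtimestamp(..., dt.timezone.utc) replaces utcfromtimestamp, which is deprecated since Python 3.12.
sunrise_time = dt.datetime.fromtimestamp(response['sys']['sunrise'] + response['timezone'], dt.timezone.utc).replace(tzinfo=None)
sunset_time = dt.datetime.fromtimestamp(response['sys']['sunset'] + response['timezone'], dt.timezone.utc).replace(tzinfo=None)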


print(f"Temperature in {CITY}: {temp_celsius:.2f}°C or {temp_fahrenheit:.2f}°F")
print(f"Temperature in {CITY} feels like: {feels_like_celsius:.2f}°C or {feels_like_fahrenheit:.2f}°F")
print(f"Wind speed in {CITY}: {wind_speed} m/s")
print(f"Humidity in {CITY}: {humidity}%")
print(f"General weather in {CITY}: {description}")
print(f"Sun rises in {CITY} at {sunrise_time} local time.")
print(f"Sun sets in {CITY} at {sunset_time} local time.")

Temperature in New York: 28.87°C or 83.97°F
Temperature in New York feels like: 28.23°C or 82.81°F
Wind speed in New York: 3.13 m/s
Humidity in New York: 37%
General weather in New York: clear sky
Sun rises in New York at 2024-07-26 05:47:24 local time.
Sun sets in New York at 2024-07-26 20:17:25 local time.
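
The sunrise and sunset values arrive as Unix timestamps in UTC, together with the city's offset from UTC in seconds. As an alternative to adding the offset by hand, the sketch below builds a timezone-aware datetime. The helper name `local_time` is hypothetical, and the two numbers are illustrative values chosen to match the sunrise printed above, not copied from a real API response.

```python
import datetime as dt

def local_time(unix_seconds, utc_offset_seconds):
    """Convert a UTC Unix timestamp to a timezone-aware local datetime."""
    tz = dt.timezone(dt.timedelta(seconds=utc_offset_seconds))
    return dt.datetime.fromtimestamp(unix_seconds, tz)

# Illustrative values: a sunrise timestamp and New York's summer offset (UTC-4)
sunrise = local_time(1721987244, -14400)
print(sunrise)  # 2024-07-26 05:47:24-04:00
```

An aware datetime keeps its UTC offset attached, so it cannot be mistaken for a UTC time later in the analysis.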


##**2) Kaggle API**

Kaggle is a web site created to run data science competitions, and sometimes offers significant cash prizes for the team that builds the best predictive model.  They have published many data sets on a variety of topics.  See https://www.kaggle.com/datasets  

In order to use the API, you must first get a username and key on the Kaggle site.  See https://www.kaggle.com/docs/api

1. Install the kaggle library

In [None]:
!pip install kaggle --quiet

2. Authenticate

In [None]:
#replace with your username and key
%env KAGGLE_USERNAME=YOUR_USERNAME
%env KAGGLE_KEY=YOUR_KEY

3. Fetch the data
For this example, we will be using COVID-19 data.

You can find it here: https://www.kaggle.com/datasets/johnjdavisiv/us-counties-covid19-weather-sociohealth-data

In [None]:
!kaggle datasets download -d johnjdavisiv/us-counties-covid19-weather-sociohealth-data

Dataset URL: https://www.kaggle.com/datasets/johnjdavisiv/us-counties-covid19-weather-sociohealth-data
License(s): CC0-1.0
Downloading us-counties-covid19-weather-sociohealth-data.zip to /content
 14% 85.0M/591M [00:07<00:45, 11.7MB/s]
User cancelled operation


In [None]:
!unzip /content/us-counties-covid19-weather-sociohealth-data.zip

Archive:  /content/us-counties-covid19-weather-sociohealth-data.zip
  inflating: US_counties_COVID19_health_weather_data.csv  
  inflating: us_county_geometry.csv  
  inflating: us_county_sociohealth_data.csv  
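
If you would rather not unpack the whole archive, Python's standard `zipfile` module can list the files in a zip and read one of them directly. This is a minimal sketch: it builds a tiny in-memory archive with made-up contents so it runs anywhere; in the notebook you would pass the path of the downloaded .zip file to `zipfile.ZipFile` instead.

```python
import csv
import io
import zipfile

# Build a tiny in-memory archive with made-up contents so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "county,cases\nKings,10\nQueens,7\n")

# Reopen the archive for reading, list its members, and parse the first CSV in place.
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    with archive.open(member_names[0]) as raw:
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
        rows = list(reader)

print(member_names)
print(rows[0])
```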


In [None]:
# Import the unzipped .csv files to dataframes using pandas
import pandas as pd
alldata = pd.read_csv("/content/US_counties_COVID19_health_weather_data.csv", header=0)

In [None]:
alldata

##**3) NYC Open Data**

NYC Open Data publishes a wide array of free public data sets collected by NYC government agencies.  https://opendata.cityofnewyork.us/data/

Fetching data from an API and storing it in a dataframe

1. Begin by importing requests, pandas, and numpy. If you are graphing, use the magic command %matplotlib inline.

In [None]:
%matplotlib inline
import requests
import pandas as pd
import numpy as np

2. Find the endpoint URL and assign it to a variable. Send a GET request to the URL, then call the response's .json() method to convert the returned JSON into a native Python data structure.

In [None]:
url = 'https://data.cityofnewyork.us/resource/h9gi-nx95.json' #endpoint
results = requests.get(url)
print(f'The result is a response object: {results}')
results = results.json()
print(f'When we use that response object we can call .json for the actual data: {results}')

The result is a response object: <Response [200]>
When we use that response object we can call .json for the actual data: [{'crash_date': '2021-09-11T00:00:00.000', 'crash_time': '2:39', 'on_street_name': 'WHITESTONE EXPRESSWAY', 'off_street_name': '20 AVENUE', 'number_of_persons_injured': '2', 'number_of_persons_killed': '0', 'number_of_pedestrians_injured': '0', 'number_of_pedestrians_killed': '0', 'number_of_cyclist_injured': '0', 'number_of_cyclist_killed': '0', 'number_of_motorist_injured': '2', 'number_of_motorist_killed': '0', 'contributing_factor_vehicle_1': 'Aggressive Driving/Road Rage', 'contributing_factor_vehicle_2': 'Unspecified', 'collision_id': '4455765', 'vehicle_type_code1': 'Sedan', 'vehicle_type_code2': 'Sedan'}, {'crash_date': '2022-03-26T00:00:00.000', 'crash_time': '11:45', 'on_street_name': 'QUEENSBORO BRIDGE UPPER', 'number_of_persons_injured': '1', 'number_of_persons_killed': '0', 'number_of_pedestrians_injured': '0', 'number_of_pedestrians_killed': '0', 'numb

In [None]:
#We can write the code above as:
url = 'https://data.cityofnewyork.us/resource/h9gi-nx95.json'
results = requests.get(url).json() #method chaining technique

In [None]:
type(results)

list

In [None]:
len(results)

1000
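
The length of exactly 1000 is the API's default page size rather than the size of the full data set. NYC Open Data is served through the Socrata (SODA) API, which accepts `$limit` and `$offset` query parameters for paging. The sketch below only generates the parameters for each page; `page_params` is a hypothetical helper name, and the total of 2500 rows is an arbitrary illustration.

```python
def page_params(total_rows, page_size=1000):
    """Yield one {'$limit', '$offset'} dict per page needed to cover total_rows."""
    for offset in range(0, total_rows, page_size):
        yield {"$limit": min(page_size, total_rows - offset), "$offset": offset}

pages = list(page_params(2500))
for params in pages:
    # In the notebook, each page could be fetched with:
    #   requests.get(url, params=params).json()
    print(params)
```

When paging through a large data set it is usually advisable to also pass an `$order` parameter so that rows do not shift between pages.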

In [None]:
results[0] # Since we know it's a list with length 1000 we can look at the first item.

{'crash_date': '2021-09-11T00:00:00.000',
 'crash_time': '2:39',
 'on_street_name': 'WHITESTONE EXPRESSWAY',
 'off_street_name': '20 AVENUE',
 'number_of_persons_injured': '2',
 'number_of_persons_killed': '0',
 'number_of_pedestrians_injured': '0',
 'number_of_pedestrians_killed': '0',
 'number_of_cyclist_injured': '0',
 'number_of_cyclist_killed': '0',
 'number_of_motorist_injured': '2',
 'number_of_motorist_killed': '0',
 'contributing_factor_vehicle_1': 'Aggressive Driving/Road Rage',
 'contributing_factor_vehicle_2': 'Unspecified',
 'collision_id': '4455765',
 'vehicle_type_code1': 'Sedan',
 'vehicle_type_code2': 'Sedan'}

In [None]:
print(results[0]['on_street_name'])
print(results[0]['vehicle_type_code1'])

WHITESTONE EXPRESSWAY
Sedan
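
Notice that every value in the record is a string, including the counts ('2', '0'). Before doing arithmetic you would need to convert them. Here is a minimal sketch of one way to do that for a single record; `clean_record` is a hypothetical helper, and the sample record is copied from the output above and trimmed to a few fields.

```python
from datetime import datetime

# One record copied from the output above, trimmed to a few fields
record = {
    "crash_date": "2021-09-11T00:00:00.000",
    "crash_time": "2:39",
    "on_street_name": "WHITESTONE EXPRESSWAY",
    "number_of_persons_injured": "2",
    "number_of_persons_killed": "0",
}

def clean_record(raw):
    """Return a copy with count fields as int and crash date/time merged into a datetime."""
    cleaned = dict(raw)
    for key, value in raw.items():
        if key.startswith("number_of_"):
            cleaned[key] = int(value)
    date_part = raw["crash_date"].split("T")[0]  # keep only YYYY-MM-DD
    cleaned["crash_datetime"] = datetime.strptime(
        f"{date_part} {raw['crash_time']}", "%Y-%m-%d %H:%M"
    )
    return cleaned

cleaned = clean_record(record)
print(cleaned["number_of_persons_injured"] + 1)  # integer arithmetic now works
print(cleaned["crash_datetime"])
```

Once the data is in a DataFrame, pandas offers column-wise equivalents such as `pd.to_numeric` and `pd.to_datetime`.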


In [None]:
#Convert the list into a dataframe
df = pd.DataFrame(results)

In [None]:
df

Unnamed: 0,crash_date,crash_time,on_street_name,off_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,contributing_factor_vehicle_2,collision_id,vehicle_type_code1,vehicle_type_code2,borough,zip_code,latitude,longitude,location,cross_street_name,contributing_factor_vehicle_3,vehicle_type_code_3,contributing_factor_vehicle_4,vehicle_type_code_4,contributing_factor_vehicle_5,vehicle_type_code_5
0,2021-09-11T00:00:00.000,2:39,WHITESTONE EXPRESSWAY,20 AVENUE,2,0,0,0,0,0,2,0,Aggressive Driving/Road Rage,Unspecified,4455765,Sedan,Sedan,,,,,,,,,,,,
1,2022-03-26T00:00:00.000,11:45,QUEENSBORO BRIDGE UPPER,,1,0,0,0,0,0,1,0,Pavement Slippery,,4513547,Sedan,,,,,,,,,,,,,
2,2022-06-29T00:00:00.000,6:55,THROGS NECK BRIDGE,,0,0,0,0,0,0,0,0,Following Too Closely,Unspecified,4541903,Sedan,Pick-up Truck,,,,,,,,,,,,
3,2021-09-11T00:00:00.000,9:35,,,0,0,0,0,0,0,0,0,Unspecified,,4456314,Sedan,,BROOKLYN,11208,40.667202,-73.8665,"{'latitude': '40.667202', 'longitude': '-73.86...",1211 LORING AVENUE,,,,,,
4,2021-12-14T00:00:00.000,8:13,SARATOGA AVENUE,DECATUR STREET,0,0,0,0,0,0,0,0,,,4486609,,,BROOKLYN,11233,40.683304,-73.917274,"{'latitude': '40.683304', 'longitude': '-73.91...",,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,2021-04-14T00:00:00.000,12:47,HENDRIX STREET,ATLANTIC AVENUE,2,0,0,0,0,0,2,0,Turning Improperly,,4407740,Station Wagon/Sport Utility Vehicle,,BROOKLYN,11207,40.676594,-73.89038,"{'latitude': '40.676594', 'longitude': '-73.89...",,,,,,,
996,2021-04-16T00:00:00.000,14:30,EAST 64 STREET,,0,0,0,0,0,0,0,0,Backing Unsafely,,4408392,Sedan,,,,40.76468,-73.9643,"{'latitude': '40.76468', 'longitude': '-73.964...",,,,,,,
997,2021-04-15T00:00:00.000,0:00,WEST 155 STREET,BROADWAY,3,0,0,0,0,0,3,0,Driver Inattention/Distraction,Unspecified,4407822,Sedan,,MANHATTAN,10032,40.832764,-73.94583,"{'latitude': '40.832764', 'longitude': '-73.94...",,,,,,,
998,2021-04-14T00:00:00.000,6:55,BROOKLYN QUEENS EXPRESSWAY,,0,0,0,0,0,0,0,0,Following Too Closely,,4407655,Station Wagon/Sport Utility Vehicle,,,,40.698544,-73.96236,"{'latitude': '40.698544', 'longitude': '-73.96...",,,,,,,


##**4) Stocks API**

You can find the documentation here: https://www.stockdata.org/documentation

In [None]:
import requests

# Replace 'YOUR_API_KEY' with your own API token; never publish a real token in a shared notebook
api_key = 'YOUR_API_KEY'
# Example endpoint (the token is passed in the api_token query parameter):
endpoint = f'https://api.stockdata.org/v1/data/quote?symbols=AAPL%2CTSLA%2CMSFT&api_token={api_key}'

# Creating the request headers
headers = {
    'Authorization': f'Bearer {api_key}',  # Replace 'Bearer' with the appropriate type if needed
    'Content-Type': 'application/json'  # Adjust content type as per API specifications
}

# Making the GET request to the API
response = requests.get(endpoint, headers=headers)

# Checking if the request was successful (status code 200)
if response.status_code == 200:
    data = response.json()
    # Process the data as needed
    print(data)
else:
    print(f"Failed to fetch data. Status code: {response.status_code}")

{'meta': {'requested': 3, 'returned': 3}, 'data': [{'ticker': 'TSLA', 'name': 'Tesla Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 215.72, 'day_high': 225.91, 'day_low': 214.72, 'day_open': 225.24, '52_week_high': 314.67, '52_week_low': 101.81, 'market_cap': None, 'previous_close_price': 246.46, 'previous_close_price_time': '2024-07-23T16:00:00.000000', 'day_change': -14.25, 'volume': 1421685, 'is_extended_hours_price': False, 'last_trade_time': '2024-07-24T15:59:59.000000'}, {'ticker': 'AAPL', 'name': 'Apple Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 218.63, 'day_high': 224.77, 'day_low': 217.13, 'day_open': 224, '52_week_high': 186.51, '52_week_low': 124.17, 'market_cap': None, 'previous_close_price': 225, 'previous_close_price_time': '2024-07-23T15:59:58.000000', 'day_change': -2.91, 'volume': 1005202, 'is_extended_hours_price': False, 'last_trade_time': '2024-07-24T15:59

In [None]:
#We can write the code above as:
url1 = f'https://api.stockdata.org/v1/data/quote?symbols=AAPL%2CTSLA%2CMSFT&api_token={api_key}'
results1 = requests.get(url1).json() #method chaining technique

In [None]:
print(results1)

{'meta': {'requested': 3, 'returned': 3}, 'data': [{'ticker': 'TSLA', 'name': 'Tesla Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 215.72, 'day_high': 225.91, 'day_low': 214.72, 'day_open': 225.24, '52_week_high': 314.67, '52_week_low': 101.81, 'market_cap': None, 'previous_close_price': 246.46, 'previous_close_price_time': '2024-07-23T16:00:00.000000', 'day_change': -14.25, 'volume': 1421685, 'is_extended_hours_price': False, 'last_trade_time': '2024-07-24T15:59:59.000000'}, {'ticker': 'AAPL', 'name': 'Apple Inc', 'exchange_short': None, 'exchange_long': None, 'mic_code': 'IEXG', 'currency': 'USD', 'price': 218.63, 'day_high': 224.77, 'day_low': 217.13, 'day_open': 224, '52_week_high': 186.51, '52_week_low': 124.17, 'market_cap': None, 'previous_close_price': 225, 'previous_close_price_time': '2024-07-23T15:59:58.000000', 'day_change': -2.91, 'volume': 1005202, 'is_extended_hours_price': False, 'last_trade_time': '2024-07-24T15:59

In [None]:
type(results1)

dict

In [None]:
len(results1)

2

In [None]:
results1['data'][0]

{'ticker': 'TSLA',
 'name': 'Tesla Inc',
 'exchange_short': None,
 'exchange_long': None,
 'mic_code': 'IEXG',
 'currency': 'USD',
 'price': 215.72,
 'day_high': 225.91,
 'day_low': 214.72,
 'day_open': 225.24,
 '52_week_high': 314.67,
 '52_week_low': 101.81,
 'market_cap': None,
 'previous_close_price': 246.46,
 'previous_close_price_time': '2024-07-23T16:00:00.000000',
 'day_change': -14.25,
 'volume': 1421685,
 'is_extended_hours_price': False,
 'last_trade_time': '2024-07-24T15:59:59.000000'}

In [None]:
results1['data'][1]

{'ticker': 'AAPL',
 'name': 'Apple Inc',
 'exchange_short': None,
 'exchange_long': None,
 'mic_code': 'IEXG',
 'currency': 'USD',
 'price': 218.63,
 'day_high': 224.77,
 'day_low': 217.13,
 'day_open': 224,
 '52_week_high': 186.51,
 '52_week_low': 124.17,
 'market_cap': None,
 'previous_close_price': 225,
 'previous_close_price_time': '2024-07-23T15:59:58.000000',
 'day_change': -2.91,
 'volume': 1005202,
 'is_extended_hours_price': False,
 'last_trade_time': '2024-07-24T15:59:59.000000'}
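
Because the quotes sit in a list under the 'data' key, looking one up by position is fragile: the order is not guaranteed to match the order of the symbols you requested. The sketch below indexes the quotes by ticker instead. `sample_response` is a hand-trimmed copy of the response shown above with only three fields per quote, and the intraday percentage move is just an illustrative calculation.

```python
# Hand-trimmed copy of the response shown above (three fields per quote)
sample_response = {
    "meta": {"requested": 3, "returned": 3},
    "data": [
        {"ticker": "TSLA", "price": 215.72, "day_open": 225.24},
        {"ticker": "AAPL", "price": 218.63, "day_open": 224},
        {"ticker": "MSFT", "price": 428.99, "day_open": 440.15},
    ],
}

# Check that the API returned everything we asked for
assert sample_response["meta"]["returned"] == len(sample_response["data"])

# Index the quotes by ticker so lookups do not depend on list order
quotes = {quote["ticker"]: quote for quote in sample_response["data"]}

# Percentage move from the day's open to the latest price
intraday_move = {
    ticker: (quote["price"] - quote["day_open"]) / quote["day_open"] * 100
    for ticker, quote in quotes.items()
}

for ticker, move in intraday_move.items():
    print(f"{ticker}: {move:+.2f}% since open")
```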

In [None]:
#Convert the list of quotes stored under the 'data' key into a dataframe
df1 = pd.DataFrame(results1['data'])

In [None]:
df1

Unnamed: 0,ticker,name,exchange_short,exchange_long,mic_code,currency,price,day_high,day_low,day_open,52_week_high,52_week_low,market_cap,previous_close_price,previous_close_price_time,day_change,volume,is_extended_hours_price,last_trade_time
0,TSLA,Tesla Inc,,,IEXG,USD,215.72,225.91,214.72,225.24,314.67,101.81,,246.46,2024-07-23T16:00:00.000000,-14.25,1421685,False,2024-07-24T15:59:59.000000
1,AAPL,Apple Inc,,,IEXG,USD,218.63,224.77,217.13,224.0,186.51,124.17,,225.0,2024-07-23T15:59:58.000000,-2.91,1005202,False,2024-07-24T15:59:59.000000
2,MSFT,Microsoft Corporation,,,IEXG,USD,428.99,441.46,427.7,440.15,349.84,213.43,,444.62,2024-07-23T15:59:54.000000,-3.64,405372,False,2024-07-24T15:59:59.000000


##**5) Yahoo Finance API**

1a. Install yfinance

In [None]:
#install package
!pip install yfinance --quiet
#https://pypi.org/project/yfinance/

In [None]:
%load_ext google.colab.data_table
#To disable the display
#%unload_ext google.colab.data_table

1b. Begin by importing yfinance, pandas, and matplotlib. If you are graphing, use the magic command %matplotlib inline before you call .plot from pandas or matplotlib.

In [None]:
%matplotlib inline
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

2a. Make a request. Explore the documentation to learn more about this package and its helper tools for accessing the API. [Y Finance API](https://pypi.org/project/yfinance/)

In [None]:
# Create a Ticker object for Microsoft; its methods fetch the data
msft = yf.Ticker("MSFT")


# get historical market data
hist = msft.history(period="max")
print(hist)

                                 Open        High         Low       Close  \
Date                                                                        
1986-03-13 00:00:00-05:00    0.054693    0.062736    0.054693    0.060055   
1986-03-14 00:00:00-05:00    0.060055    0.063272    0.060055    0.062199   
1986-03-17 00:00:00-05:00    0.062199    0.063808    0.062199    0.063272   
1986-03-18 00:00:00-05:00    0.063272    0.063808    0.061127    0.061663   
1986-03-19 00:00:00-05:00    0.061663    0.062199    0.060055    0.060591   
...                               ...         ...         ...         ...   
2024-07-22 00:00:00-04:00  441.790009  444.600006  438.910004  442.940002   
2024-07-23 00:00:00-04:00  443.899994  448.390015  443.100006  444.850006   
2024-07-24 00:00:00-04:00  440.450012  441.480011  427.589996  428.899994   
2024-07-25 00:00:00-04:00  428.799988  429.799988  417.510010  418.399994   
2024-07-26 00:00:00-04:00  418.190002  428.910004  417.309998  424.429993   

2b. Request a specific time frame using the yfinance history() method.
Available parameters for the history() method are:

*   period: data period to download (use either period, or start and end). Valid periods are: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
*   interval: data interval (intraday data cannot extend beyond the last 60 days). Valid intervals are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
*   start: if not using period, the download start date as a string (YYYY-MM-DD) or datetime.
*   end: if not using period, the download end date as a string (YYYY-MM-DD) or datetime.
*   prepost: include pre- and post-market data in results? (Default is False)
*   auto_adjust: adjust all OHLC (open, high, low, close) prices automatically? (Default is True)
*   actions: download stock dividend and stock split events? (Default is True)
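
The main rule in the list above is that you pick a time frame either with period or with start and end, not both. The sketch below encodes that rule in a small helper that builds keyword arguments for history(). `history_kwargs` is a hypothetical name, not part of yfinance, and the valid values are simply copied from the list above.

```python
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h",
                   "1d", "5d", "1wk", "1mo", "3mo"}

def history_kwargs(period=None, start=None, end=None, interval="1d"):
    """Build keyword arguments for history(), allowing period OR start/end, not both."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("Use either period, or start and end, not both")
    if period is not None and period not in VALID_PERIODS:
        raise ValueError(f"Unknown period: {period}")
    if interval not in VALID_INTERVALS:
        raise ValueError(f"Unknown interval: {interval}")
    kwargs = {"interval": interval}
    if period is not None:
        kwargs["period"] = period
    if start is not None:
        kwargs["start"] = start
    if end is not None:
        kwargs["end"] = end
    return kwargs

print(history_kwargs(period="1mo", interval="1wk"))
print(history_kwargs(start="2021-11-28", end="2023-11-29"))
# In the notebook these could be passed on as, for example:
#   msft.history(**history_kwargs(period="1mo", interval="1wk"))
```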











In [None]:
hist = msft.history(start='2021-11-28', end='2023-11-29')

In [None]:
hist

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Dividends,Stock Splits
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2021-11-29 00:00:00-05:00,327.716015,331.717799,327.520317,329.369568,28563500,0.0,0.0
2021-11-30 00:00:00-05:00,328.087802,330.494736,321.894311,323.459808,42885600,0.0,0.0
2021-12-01 00:00:00-05:00,327.901946,331.962433,322.285755,322.960846,33337600,0.0,0.0
2021-12-02 00:00:00-05:00,323.176072,326.297273,320.729992,322.383545,30766000,0.0,0.0
2021-12-03 00:00:00-05:00,324.829604,325.524312,311.170703,316.043304,41779300,0.0,0.0
...,...,...,...,...,...,...,...
2023-11-21 00:00:00-05:00,374.301441,374.849425,369.767999,371.710907,28423100,0.0,0.0
2023-11-22 00:00:00-05:00,376.622958,378.406446,373.603998,376.473511,23345300,0.0,0.0
2023-11-24 00:00:00-05:00,375.955351,376.593034,373.773357,376.054993,10176600,0.0,0.0
2023-11-27 00:00:00-05:00,375.407393,379.253347,374.829519,377.230713,22179200,0.0,0.0


In [None]:
hist.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 503 entries, 2021-11-29 00:00:00-05:00 to 2023-11-28 00:00:00-05:00
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          503 non-null    float64
 1   High          503 non-null    float64
 2   Low           503 non-null    float64
 3   Close         503 non-null    float64
 4   Volume        503 non-null    int64  
 5   Dividends     503 non-null    float64
 6   Stock Splits  503 non-null    float64
dtypes: float64(6), int64(1)
memory usage: 31.4 KB


3. Obtain multiple time series of stock data

In [None]:
msft = yf.Ticker("MSFT")
aapl = yf.Ticker("AAPL")
ibm_ticker = yf.Ticker("IBM")  # separate name so the DataFrame below does not overwrite the Ticker

In [None]:
microsoft = msft.history(start='2001-12-06', end='2023-11-29')
apple = aapl.history(start='2001-12-06', end='2023-11-29')
ibm = ibm_ticker.history(start='2001-12-06', end='2023-11-29')

In [None]:
apple

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Dividends,Stock Splits
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2001-12-06 00:00:00-05:00,0.354486,0.354788,0.334256,0.343918,338934400,0.0,0.0
2001-12-07 00:00:00-05:00,0.339087,0.342862,0.332142,0.340295,203515200,0.0,0.0
2001-12-10 00:00:00-05:00,0.336521,0.347089,0.335614,0.340295,170010400,0.0,0.0
2001-12-11 00:00:00-05:00,0.342257,0.344975,0.326858,0.328821,205475200,0.0,0.0
2001-12-12 00:00:00-05:00,0.330180,0.330935,0.320819,0.324442,192460800,0.0,0.0
...,...,...,...,...,...,...,...
2023-11-21 00:00:00-05:00,190.907126,191.016838,189.241515,190.139145,38134500,0.0,0.0
2023-11-22 00:00:00-05:00,190.986939,192.423143,190.328669,190.807404,39617700,0.0,0.0
2023-11-24 00:00:00-05:00,190.368531,190.398451,188.752792,189.470901,24048300,0.0,0.0
2023-11-27 00:00:00-05:00,189.421030,190.169059,188.403705,189.291367,40552600,0.0,0.0
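
With several history DataFrames in hand, a common next step is to line up their Close columns in a single table. The sketch below shows one way to do that with `pd.concat`. The three small DataFrames are synthetic stand-ins for `microsoft`, `apple`, and `ibm`, and the prices are made up so the example runs without a network connection.

```python
import pandas as pd

# Synthetic stand-ins for the DataFrames returned by history(); prices are made up.
dates = pd.date_range("2023-11-20", periods=3, freq="D")
microsoft = pd.DataFrame({"Close": [100.0, 101.0, 102.0]}, index=dates)
apple = pd.DataFrame({"Close": [50.0, 51.0, 49.0]}, index=dates)
ibm = pd.DataFrame({"Close": [10.0, 10.5, 11.0]}, index=dates)

# Align the three Close columns on their shared date index, one column per ticker
closes = pd.concat(
    {"MSFT": microsoft["Close"], "AAPL": apple["Close"], "IBM": ibm["Close"]},
    axis=1,
)

# Daily percentage change makes differently priced stocks comparable
daily_returns = closes.pct_change().dropna()

print(closes)
print(daily_returns)
```

With real data, `closes.plot()` would then draw all three series on one chart.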


##**6) US Census API**

https://colab.research.google.com/drive/1Ipe6v8CtHXzGIkKCivwKx9UPGYmkVsQu?usp=sharing

Please follow the link above for accessing the US Census API.