# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to practice data enrichment with data from external APIs. We will take data on car crashes in Monroe County, Indiana, from 2003 to 2015 and determine the weather at the time of each crash and the number of bars nearby. We will work with two APIs during this challenge:

- Foursquare API
- World Weather Online API

We will look for correlations between the severity of a crash and the weather or the number of bars in the area. To indicate the severity of a crash, we will use the column `Injury Type`.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to get your data ready. This part is provided for you.


In [1]:
from pprint import pprint
from dotenv import load_dotenv
load_dotenv()
import json
import numpy as np
import pandas as pd
import os # use this to access your environment variables
import requests # this will be used to call the APIs

In [2]:
data = pd.read_csv("./_data/monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")

# replace zeroes with NaN so rows without usable coordinates can be dropped below
data['Latitude'] = data['Latitude'].replace(0, np.nan)
data['Longitude'] = data['Longitude'].replace(0, np.nan)

# preparing data by removing nan values
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
data.reset_index(drop=True, inplace=True)
print(data.shape)
data.head()

(49005, 12)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625


In [3]:
data.tail()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude
49000,521147,2003,4,7,Weekend,2000.0,1-Car,No injury/unknown,UNSAFE SPEED,POPCORN RD & ROCKPORT,39.004275,-86.581375
49001,521157,2003,5,7,Weekend,1500.0,1-Car,Non-incapacitating,UNSAFE SPEED,GUTHERIE RD & PRINCE,39.002752,-86.463856
49002,900087672,2003,11,3,Weekday,2300.0,1-Car,No injury/unknown,ROADWAY SURFACE CONDITION,INGRAM & SR37,38.992326,-86.537252
49003,919959,2003,12,7,Weekend,1700.0,1-Car,Non-incapacitating,UNSAFE SPEED,BARTLETTSVILLE & CHAPLE HILL RD,38.99152,-86.448784
49004,900046599,2003,6,7,Weekend,800.0,1-Car,No injury/unknown,,OLD & SR446,38.990848,-86.368864


In this section you'll use the Requests library to access the Foursquare Places API and pull points of interest.
Requests is a popular, user-friendly Python HTTP library that simplifies making HTTP requests.
It provides a simple API for sending various types of requests and for handling headers, cookies, and authentication,
which makes it a good choice for web scraping, API integration, and general HTTP communication in Python applications.
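
To see how Requests turns a `params` dictionary into a query string, a request can be built and prepared without being sent. This is a minimal sketch: the URL is the Place Search endpoint used later in this notebook, and the parameter values are illustrative only.

```python
import requests
from urllib.parse import urlsplit, parse_qs

# Build (but do not send) a GET request so the final URL can be inspected.
request = requests.Request(
    "GET",
    "https://api.foursquare.com/v3/places/search",
    params={"ll": "39.159207,-86.525874", "radius": 5000},
    headers={"accept": "application/json"},
)
prepared = request.prepare()

# parse_qs decodes the query string back into a dict of lists of strings.
query = parse_qs(urlsplit(prepared.url).query)
print(prepared.url)
print(query)
```

Nothing touches the network here, so this is a cheap way to check parameters before spending API quota.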

# Foursquare API

Foursquare API documentation is [here](https://location.foursquare.com/developer/reference/place-search/)

1. Start a Foursquare application and get your keys.
2. Create the function **get_venues** that, for each crash, pulls the bars within a 5 km radius of the crash location.
3. Find a relationship (if there is any) between number of bars in the area and severity of the crash.

HINT: 
- the **categories** IDs for bars and nightlife can be found in the [Foursquare API documentation](https://location.foursquare.com/places/docs/categories/)
- you'll have to pass latitude and longitude to the API together as a single string, separated by a comma

In [4]:
# pay no attention to the OAuth credentials, you don't need them
# if you didn't save your API key when you first created it, you'll have to make a new one
# import your foursquare API key from an environment variable
# this is the safest way to store your API key
# be sure to pass the environment variable as you named it - it may not be the same as below
FOURSQUARE_KEY = os.environ['FOURSQUARE_KEY']

In [5]:
def get_venues_fs(latitude:float, longitude:float, radius:int, api_key:str, categories:str) -> requests.Response:
    """
    Get venues from Foursquare with a specified place type and coordinates.
    Args:
        latitude (float): latitude for query (must be combined with longitude)
        longitude (float): longitude for query (must be combined with latitude)
        radius (int): search radius in meters around the latitude/longitude coordinates
        api_key (str): Foursquare API key to use for the query
        categories (str): Foursquare-recognized category ids for the place type. Separate ids with commas
    
    Returns:
        response: response object from the requests library.
    """
    
    url = 'https://api.foursquare.com/v3/places/search'
    headers = {
        'accept' : 'application/json',
        'Authorization' : api_key}
    params = {
        'll' : f'{latitude},{longitude}',
        'radius' : radius,
        'categories' : categories}
    
    response = requests.get(
        url,
        headers=headers,
        params=params)
    
    return response

In [6]:
# testing
res = get_venues_fs(latitude=51.51, longitude=-0.1337, radius=5000, api_key=FOURSQUARE_KEY, categories='13003')
res.status_code

200

## Using FSQ API to Search Bars in Vicinity of Crash Data in Monroe County

### Count number of venues retrieved

In [7]:
# specific example of retrieval
fsq_bar_data = get_venues_fs(
  latitude=39.159207,
  longitude=-86.525874,
  radius=5000,
  api_key=FOURSQUARE_KEY,
  categories='13003'
)

fsq_results = pd.json_normalize(
  fsq_bar_data.json(),
  record_path=['results'],
  errors='ignore')

print(f'{fsq_results.shape[0]} results recorded.')

10 results recorded.
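
The count of 10 above may reflect a default page size rather than the true number of bars within 5 km. Below is a minimal sketch, assuming the Place Search endpoint accepts a `limit` query parameter. Both the parameter name and the value 50 are assumptions to confirm in the Foursquare documentation, and `build_search_params` is a hypothetical helper that is not part of this notebook.

```python
def build_search_params(latitude: float, longitude: float, radius: int,
                        categories: str, limit: int = 50) -> dict:
    """Assemble query parameters for a place search (hypothetical helper)."""
    return {
        "ll": f"{latitude},{longitude}",  # latitude and longitude as one string
        "radius": radius,                 # search radius in meters
        "categories": categories,         # comma-separated category ids
        "limit": limit,                   # assumed parameter; verify in the docs
    }

params = build_search_params(39.159207, -86.525874, 5000, "13003")
print(params)
```

The resulting dictionary could be passed as `params=` to `requests.get` in place of the one built inside `get_venues_fs`.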


In [18]:
# generalizing the process
# stopped short due to fsq restrictions on data pull
# for i in range(data.shape[0]):
#   latitude = data.loc[i, 'Latitude']
#   longitude = data.loc[i, 'Longitude']
  
#   results = get_venues_fs(
#     latitude=latitude,
#     longitude=longitude,
#     categories='13003',
#     radius=5000,
#     api_key=FOURSQUARE_KEY,
#   )
  
#   # normalize this iteration's response (not the earlier fsq_bar_data example)
#   results_json = pd.json_normalize(
#     results.json(),
#     record_path=['results'],
#     errors='ignore')
  
#   data.loc[i, 'Bar Count'] = results_json.shape[0]
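
One way to stay within the API limits is to avoid repeating requests for crashes that are very close together. The sketch below caches counts by rounded coordinates. `count_bars_cached` and `fake_fetch` are hypothetical names, the rounding to two decimals (roughly 1 km in latitude) is an assumption that trades accuracy for fewer calls, and the stand-in fetcher returns a fixed number instead of calling Foursquare.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def count_bars_cached(coords: Iterable[Tuple[float, float]],
                      fetch_count: Callable[[float, float], int],
                      decimals: int = 2) -> List[int]:
    """Return one bar count per coordinate, fetching once per rounded location."""
    cache: Dict[Tuple[float, float], int] = {}
    counts: List[int] = []
    for lat, lon in coords:
        key = (round(lat, decimals), round(lon, decimals))
        if key not in cache:
            # Only hit the (expensive) fetcher for locations not seen before.
            cache[key] = fetch_count(*key)
        counts.append(cache[key])
    return counts

# Stand-in for a real API call, used here so the sketch runs offline.
calls = []
def fake_fetch(lat: float, lon: float) -> int:
    calls.append((lat, lon))
    return 7

coords = [(39.159207, -86.525874), (39.16144, -86.534848), (39.14978, -86.56889)]
counts = count_bars_cached(coords, fake_fetch)
print(counts, len(calls))
```

In real use, `fetch_count` would wrap `get_venues_fs` and return the length of the `results` list in the response.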

In [10]:
data.to_json('./_data/monroe_with_bars.json')

In [17]:
df = pd.read_json('./_data/monroe_with_bars.json')
print(df.shape)
df.head()

(49005, 13)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,Bar Count
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,10.0
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,10.0
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,10.0
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,10.0
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,10.0


In [20]:
df['Bar Count'].unique()

array([10., nan])
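
Because `Bar Count` above holds a single distinct value, it cannot show any relationship with severity yet. Once the counts vary, a first look could group them by `Injury Type`. The sketch below uses a small made-up frame: the bar counts are invented for illustration and are not results from the crash data.

```python
import pandas as pd

# Toy data: the injury labels appear in the crash data, the counts are invented.
toy = pd.DataFrame({
    "Injury Type": ["No injury/unknown", "Non-incapacitating",
                    "No injury/unknown", "Non-incapacitating"],
    "Bar Count": [2, 8, 4, 10],
})

# Number of crashes and mean bar count for each severity level.
summary = toy.groupby("Injury Type")["Bar Count"].agg(["count", "mean"])
print(summary)
```

A difference in group means is only a starting point. It does not by itself establish a correlation.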

# World Weather Online API

World Weather Online API is [here](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx)

1. Sign up for a free API key if you haven't done so already (it's free for **30 days**).
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and severity of the crash.

Hints:

* pull weather only for a smaller sample of crashes (250 or so) due to API limits
* for sending HTTP requests check out "requests" library [here](http://docs.python-requests.org/en/master/)
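
The sampling hint can be sketched as follows, using a toy frame in place of `data`. The `hour_of_day` column is a hypothetical addition, and the conversion assumes `Hour` stores times as HHMM (for example `1500.0` for 3 pm), which should be checked against the dataset description. For the real data, `n` would be about 250.

```python
import pandas as pd

# Toy stand-in for the crash data, using values shown in data.head().
toy = pd.DataFrame({
    "Hour": [0.0, 1500.0, 2300.0, 900.0, 1100.0],
    "Latitude": [39.159207, 39.16144, 39.14978, 39.165655, 39.164848],
    "Longitude": [-86.525874, -86.534848, -86.56889, -86.575956, -86.579625],
})

# Drop rows without an hour, then draw a reproducible random sample.
sample = toy.dropna(subset=["Hour"]).sample(n=3, random_state=42)

# Assumed HHMM encoding: integer-divide by 100 to get the hour of day (0-23).
sample = sample.assign(hour_of_day=(sample["Hour"] // 100).astype(int))
print(sample)
```

Fixing `random_state` keeps the same rows across runs, so weather responses already fetched can be reused.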


In [None]:
import time
api_key = os.getenv("<>")