# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to try data enrichment with data from external APIs. We are going to take data about car crashes in Monroe County, Indiana from 2003 to 2015 and try to figure out the weather during the accident and how many bars there are in the area. We will work with two different APIs during this challenge:

- Foursquare API
- Visual Crossing API

We will try to find correlations between the severity of a crash and the weather or the number of bars in the area. To indicate the severity of a crash, we will use the `Injury Type` column.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to get your data ready; this part is already done for you.


In [12]:
import pandas as pd
import os
from IPython.display import JSON

In [13]:
data = pd.read_csv("data/monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")
# ========================
# preparing data
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# creation of variable with lon and lat together
data['ll'] = data['Latitude'].astype(str) + ',' + data['Longitude'].astype(str)
data = data[data['ll'] != '0.0,0.0']
print(data.shape)
data.head()

(49005, 13)


Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
0,902363382,2015,1,5,Weekday,0.0,2-Car,No injury/unknown,OTHER (DRIVER) - EXPLAIN IN NARRATIVE,1ST & FESS,39.159207,-86.525874,"39.15920668,-86.52587356"
1,902364268,2015,1,6,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,2ND & COLLEGE,39.16144,-86.534848,"39.16144,-86.534848"
2,902364412,2015,1,6,Weekend,2300.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,BASSWOOD & BLOOMFIELD,39.14978,-86.56889,"39.14978027,-86.56889006"
3,902364551,2015,1,7,Weekend,900.0,2-Car,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,GATES & JACOBS,39.165655,-86.575956,"39.165655,-86.57595635"
4,902364615,2015,1,7,Weekend,1100.0,2-Car,No injury/unknown,FAILURE TO YIELD RIGHT OF WAY,W 3RD,39.164848,-86.579625,"39.164848,-86.57962482"


In [14]:
data.shape

(49005, 13)

In [15]:
random_sample = data.sample(n=250)

In [16]:
random_sample.head()

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
50489,211026,2003,10,6,Weekday,1200.0,2-Car,No injury/unknown,ALCOHOLIC BEVERAGES,10TH & LINCOLN,39.171849,-86.530992,"39.17184928,-86.530992"
21719,901492887,2010,9,4,Weekday,1400.0,3+ Cars,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,11TH & FEE LN,39.173324,-86.518916,"39.17332407,-86.51891649"
30655,900961824,2008,9,1,Weekend,1700.0,2-Car,No injury/unknown,UNSAFE BACKING,JONES AVE & JORDAN,39.1654,-86.51641,"39.1654,-86.51641"
50428,177263,2003,12,4,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,COLLEGE AVE & ELEVENTH,39.173232,-86.534752,"39.173232,-86.534752"
29555,900883319,2008,4,2,Weekday,800.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,FORREST & SEVENTEENTH ST,39.17902,-86.5225,"39.17902,-86.5225"


In [17]:
random_sample.shape

(250, 13)

In [18]:
df = random_sample["Injury Type"].copy()

# Foursquare API

The Foursquare API documentation is [here](https://developer.foursquare.com/).

1. Create a Foursquare application and get your keys.
2. Create the function **get_venues** that, for a given crash location, pulls the bars within a 5 km radius.

#### example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is any) between number of bars in the area and severity of the crash.

Hints:
- check out the Python package "foursquare" (no need to send HTTP requests directly with the `requests` library)
- the **categoryId** for bars and nightlife needs to be found in the [Foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)

In [19]:
import foursquare

In [21]:
# set the keys
foursquare_id = os.environ["FOURSQUARE_CLIENT_ID"]
foursquare_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]

client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, version='20210630')

___

In [22]:
def get_venues(ll, categoryId="4d4b7105d754a06376d81259", radius="5000", limit="50"):
    # search for venues of the given category within `radius` metres of ll ("lat,lon")
    params = {"ll": ll, "categoryId": categoryId, "radius": radius, "limit": limit}
    venue_info = client.venues.search(params=params)
    # flatten the list stored under "venues" into one row per venue
    return pd.json_normalize(venue_info, record_path="venues")

In [23]:
# Get latitude and longitude from the random sample of crashes
lls = list(random_sample["ll"].values)

In [24]:
# query foursquare for bar information within 5km of each crash
responses = []
n_bars = []
for ll in lls:
    res = get_venues(ll)
    responses.append(res)
    n_bars.append(res.shape[0])
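
If several crashes share the same coordinates, the loop above sends the same `ll` to Foursquare more than once. A minimal sketch of one way to avoid that is shown below. The names `count_bars_cached`, `fetch` and `pause` are hypothetical; `fetch` stands in for any function that, like `get_venues`, takes an `ll` string and returns a collection with a length. A stub replaces the real API call so the sketch runs offline.

```python
import time


def count_bars_cached(lls, fetch, pause=0.0):
    """Return one venue count per entry in lls, calling fetch once per distinct ll.

    fetch is any callable taking a "lat,lon" string and returning a sized
    collection of venues; pause is an optional delay in seconds after each call.
    """
    cache = {}
    counts = []
    for ll in lls:
        if ll not in cache:
            cache[ll] = len(fetch(ll))
            time.sleep(pause)  # crude throttle; 0.0 disables it
        counts.append(cache[ll])
    return counts


# Stub standing in for the real API call (hypothetical, for illustration only).
calls = []


def fake_fetch(ll):
    calls.append(ll)
    return ["bar"] * 3


counts = count_bars_cached(
    ["39.1654,-86.51641", "39.17902,-86.5225", "39.1654,-86.51641"], fake_fetch
)
```

Because `len()` of a DataFrame is its number of rows, passing `get_venues` as `fetch` should yield the same counts as `res.shape[0]` in the loop above.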

In [29]:
df = df.to_frame()

In [36]:
# add this info to the dataframe
n_bars_col = pd.Series(n_bars)
df["n_bars_nearby"] = n_bars_col.values

In [37]:
df

Unnamed: 0,Injury Type,n_bars_nearby
50489,No injury/unknown,48
21719,Non-incapacitating,47
30655,No injury/unknown,48
50428,No injury/unknown,48
29555,No injury/unknown,47
...,...,...
5434,No injury/unknown,48
275,Incapacitating,33
26658,No injury/unknown,47
31798,Non-incapacitating,48


In [64]:
df.groupby("Injury Type").mean()

Injury Type,n_bars_nearby
Fatal,41.5
Incapacitating,31.0
No injury/unknown,42.074627
Non-incapacitating,37.790698


In [65]:
df.groupby("Injury Type").count()

Injury Type,n_bars_nearby
Fatal,2
Incapacitating,4
No injury/unknown,201
Non-incapacitating,43


# Visual Crossing API

The Visual Crossing API documentation is [here](https://www.visualcrossing.com/resources/documentation/).

1. Sign up for a free API key if you haven't done that before.
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and severity of the crash.

Hints:

* randomly sample only 250 or so crashes (due to API limits), or pull weather only for a smaller sample of crashes
* for sending HTTP requests check out "requests" library [here](http://docs.python-requests.org/en/master/)
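
For step 2, each crash row has to be turned into a location and a date. The sketch below builds both from the `Latitude`, `Longitude`, `Year`, `Month` and `Day` columns. `crash_to_query` is a hypothetical helper, and the `q` and `date` keys follow the request made further down in this notebook. It assumes `Day` is the day of the month. In the rows shown earlier `Day` never exceeds 7, so it may instead encode the day of the week, in which case the exact date cannot be recovered from these columns.

```python
from datetime import date


def crash_to_query(row):
    """Build location and date strings for one crash row (a dict-like record)."""
    # Assumption: Day is the day of the month; date() raises ValueError if the
    # combination is not a valid calendar date.
    day = date(int(row["Year"]), int(row["Month"]), int(row["Day"]))
    return {
        "q": f'{row["Latitude"]},{row["Longitude"]}',
        "date": day.isoformat(),
    }


# Second row of random_sample.head() above.
example = crash_to_query(
    {"Year": 2010, "Month": 9, "Day": 4, "Latitude": 39.173324, "Longitude": -86.518916}
)
```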


In [26]:
import requests
import time
api_key = os.environ["WORLD_WEATHER_API_KEY"]

In [40]:
random_sample.head(20)

Unnamed: 0,Master Record Number,Year,Month,Day,Weekend?,Hour,Collision Type,Injury Type,Primary Factor,Reported_Location,Latitude,Longitude,ll
50489,211026,2003,10,6,Weekday,1200.0,2-Car,No injury/unknown,ALCOHOLIC BEVERAGES,10TH & LINCOLN,39.171849,-86.530992,"39.17184928,-86.530992"
21719,901492887,2010,9,4,Weekday,1400.0,3+ Cars,Non-incapacitating,FAILURE TO YIELD RIGHT OF WAY,11TH & FEE LN,39.173324,-86.518916,"39.17332407,-86.51891649"
30655,900961824,2008,9,1,Weekend,1700.0,2-Car,No injury/unknown,UNSAFE BACKING,JONES AVE & JORDAN,39.1654,-86.51641,"39.1654,-86.51641"
50428,177263,2003,12,4,Weekday,1500.0,2-Car,No injury/unknown,FOLLOWING TOO CLOSELY,COLLEGE AVE & ELEVENTH,39.173232,-86.534752,"39.173232,-86.534752"
29555,900883319,2008,4,2,Weekday,800.0,2-Car,No injury/unknown,DRIVER DISTRACTED - EXPLAIN IN NARRATIVE,FORREST & SEVENTEENTH ST,39.17902,-86.5225,"39.17902,-86.5225"
46561,1907837,2004,7,7,Weekend,200.0,1-Car,No injury/unknown,UNSAFE SPEED,AMY & PETE ELLIS DR,39.168594,-86.495539,"39.16859355,-86.49553949"
51415,211663,2003,11,3,Weekday,1100.0,2-Car,Non-incapacitating,DISREGARD SIGNAL/REG SIGN,3RD ST & LINCOLN,39.16464,-86.531024,"39.16464,-86.531024"
19189,901653695,2011,6,2,Weekday,1300.0,2-Car,No injury/unknown,DISREGARD SIGNAL/REG SIGN,E HUNTER & S WOODLAWN AVE,39.16224,-86.522576,"39.16224,-86.522576"
27504,901204435,2009,11,1,Weekend,200.0,2-Car,No injury/unknown,ANIMAL/OBJECT IN ROADWAY,FRIENDSHIP & SR46E,39.15253,-86.406262,"39.15252993,-86.40626176"
14428,901769343,2012,1,6,Weekday,1600.0,2-Car,No injury/unknown,IMPROPER TURNING,3RD ST & CURRY,39.16466,-86.58292,"39.16465983,-86.58292002"


In [39]:
endpoint = "http://api.worldweatheronline.com/premium/v1/past-weather.ashx"

In [None]:
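def get_weather(ll, date):
    # query past weather for one location ("lat,lon") and date ("YYYY-MM-DD")
    params = {"q": ll, "date": date, "format": "json", "key": api_key}
    return requests.get(endpoint, params=params)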

In [48]:
weather = requests.get(endpoint, params={"q":"48.834,-2.394",
                                         "date":"2010-09-04",
                                         "format":"json",
                                        "key":api_key})

In [55]:
pd.json_normalize(weather.json())

Unnamed: 0,data.request,data.weather
0,"[{'type': 'LatLon', 'query': 'Lat 48.83 and Lo...","[{'date': '2010-09-04', 'astronomy': [{'sunris..."


In [61]:
pd.json_normalize(weather.json(), record_path=["data", "weather", "hourly"])

Unnamed: 0,time,tempC,tempF,windspeedMiles,windspeedKmph,winddirDegree,winddir16Point,weatherCode,weatherIconUrl,weatherDesc,...,HeatIndexF,DewPointC,DewPointF,WindChillC,WindChillF,WindGustMiles,WindGustKmph,FeelsLikeC,FeelsLikeF,uvIndex
0,0,15,58,8,12,81,E,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Clear'}],...,58,13,55,14,57,16,26,14,57,1
1,300,14,57,6,10,126,SE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Clear'}],...,57,13,55,13,56,14,22,13,56,1
2,600,14,57,5,8,142,SE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Sunny'}],...,57,12,54,14,56,10,16,14,56,4
3,900,19,67,4,7,135,SE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Sunny'}],...,67,13,56,19,67,6,10,19,67,5
4,1200,24,75,3,5,140,SE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Sunny'}],...,77,14,58,24,75,4,7,25,77,6
5,1500,25,77,2,3,21,NNE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Sunny'}],...,79,14,58,25,77,3,4,26,79,7
6,1800,21,70,7,11,57,ENE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Clear'}],...,70,14,57,21,70,15,23,21,70,1
7,2100,19,67,8,13,115,ESE,113,[{'value': 'http://cdn.worldweatheronline.com/...,[{'value': 'Clear'}],...,67,14,57,19,67,16,26,19,67,1
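
The hourly table has one row per three-hour slot, with `time` running from `0` to `2100`, while the crash data stores `Hour` in the same HHMM style (for example `1400.0`). One way to attach weather to a crash is to pick the latest slot that starts at or before the crash hour. The sketch below does that under the assumption that this slot is the relevant one; `pick_hourly_slot` is a hypothetical helper, and the sample records carry only the `time` and `tempC` fields from the table above.

```python
def pick_hourly_slot(hourly, crash_hour):
    """Return the hourly record whose start time is the latest one <= crash_hour.

    hourly is a list of dicts with a "time" key ("0", "300", ..., "2100");
    crash_hour is the HHMM value from the Hour column, e.g. 1400.0.
    """
    eligible = [rec for rec in hourly if int(rec["time"]) <= crash_hour]
    if not eligible:
        return None  # e.g. the hour is missing (NaN) or precedes every slot
    return max(eligible, key=lambda rec: int(rec["time"]))


# A few slots taken from the table above (time and tempC only).
hourly = [
    {"time": "900", "tempC": "19"},
    {"time": "1200", "tempC": "24"},
    {"time": "1500", "tempC": "25"},
]
slot = pick_hourly_slot(hourly, 1400.0)
```

Here a crash at `1400.0` is matched to the slot starting at `1200`; a missing hour returns `None` so such rows can be dropped or handled separately.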
