# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to practice data enrichment with data from external APIs. We will take data about car crashes in Monroe County, Indiana, from 2003 to 2015 and look up the weather at the time of each accident and the number of bars nearby. We will work with two different APIs during this challenge:

- Foursquare API
- Visual Crossing API

We will look for correlations between the severity of a crash and the weather or the number of bars in the area. To measure the severity of a crash, we will use the `Injury Type` column.

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Just run the cells below to get your data ready. This part is done for you.


In [41]:
import pandas as pd
import os
import pprint

In [4]:
data = pd.read_csv("monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")
# ========================
# preparing data
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# creation of variable with lon and lat together
data['ll'] = data['Latitude'].astype(str) + ',' + data['Longitude'].astype(str)
data = data[data['ll'] != '0.0,0.0']
print(data.shape)
data.head(15)

(49005, 13)


| | Master Record Number | Year | Month | Day | Weekend? | Hour | Collision Type | Injury Type | Primary Factor | Reported_Location | Latitude | Longitude | ll |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 902363382 | 2015 | 1 | 5 | Weekday | 0.0 | 2-Car | No injury/unknown | OTHER (DRIVER) - EXPLAIN IN NARRATIVE | 1ST & FESS | 39.159207 | -86.525874 | 39.15920668,-86.52587356 |
| 1 | 902364268 | 2015 | 1 | 6 | Weekday | 1500.0 | 2-Car | No injury/unknown | FOLLOWING TOO CLOSELY | 2ND & COLLEGE | 39.16144 | -86.534848 | 39.16144,-86.534848 |
| 2 | 902364412 | 2015 | 1 | 6 | Weekend | 2300.0 | 2-Car | Non-incapacitating | DISREGARD SIGNAL/REG SIGN | BASSWOOD & BLOOMFIELD | 39.14978 | -86.56889 | 39.14978027,-86.56889006 |
| 3 | 902364551 | 2015 | 1 | 7 | Weekend | 900.0 | 2-Car | Non-incapacitating | FAILURE TO YIELD RIGHT OF WAY | GATES & JACOBS | 39.165655 | -86.575956 | 39.165655,-86.57595635 |
| 4 | 902364615 | 2015 | 1 | 7 | Weekend | 1100.0 | 2-Car | No injury/unknown | FAILURE TO YIELD RIGHT OF WAY | W 3RD | 39.164848 | -86.579625 | 39.164848,-86.57962482 |
| 5 | 902364664 | 2015 | 1 | 6 | Weekday | 1800.0 | 2-Car | No injury/unknown | FAILURE TO YIELD RIGHT OF WAY | BURKS & WALNUT | 39.12667 | -86.53137 | 39.12666969,-86.53136998 |
| 6 | 902364682 | 2015 | 1 | 6 | Weekday | 1200.0 | 2-Car | No injury/unknown | DRIVER DISTRACTED - EXPLAIN IN NARRATIVE | SOUTH CURRY PIKE LOT 71 | 39.150825 | -86.584899 | 39.150825,-86.584899 |
| 7 | 902364683 | 2015 | 1 | 6 | Weekday | 1400.0 | 1-Car | Incapacitating | ENGINE FAILURE OR DEFECTIVE | NORTH LOUDEN RD | 39.199272 | -86.637024 | 39.19927216,-86.63702393 |
| 8 | 902364714 | 2015 | 1 | 7 | Weekend | 1400.0 | 2-Car | No injury/unknown | FOLLOWING TOO CLOSELY | LIBERTY & W 3RD | 39.16461 | -86.57913 | 39.16461021,-86.57913007 |
| 9 | 902364756 | 2015 | 1 | 7 | Weekend | 1600.0 | 1-Car | No injury/unknown | RAN OFF ROAD RIGHT | PATTERSON & W 3RD | 39.16344 | -86.55128 | 39.16344009,-86.55128002 |


In [115]:
data.columns

Index(['Master Record Number', 'Year', 'Month', 'Day', 'Weekend?', 'Hour',
       'Collision Type', 'Injury Type', 'Primary Factor', 'Reported_Location',
       'Latitude', 'Longitude', 'll', 'll_string'],
      dtype='object')

# Foursquare API

The Foursquare API documentation is [here](https://developer.foursquare.com/).

1. Create a Foursquare application and get your API keys.
2. Create the function **get_venues** that, for each crash, pulls the bars within a 5 km radius of the crash location.

#### example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is one) between the number of bars in the area and the severity of the crash.

Hints:
- Check out the Python package `foursquare` (no need to send HTTP requests directly with the `requests` library).
- **categoryId** for bars and nightlife needs to be found in the [foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)
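A minimal sketch of the request parameters **get_venues** could send, assuming the legacy v2 `venues/search` endpoint that `bar_search` calls later in this notebook. `BAR_CATEGORY_ID` is a hypothetical placeholder: look up the real bars/nightlife **categoryId** in the Foursquare documentation first. Filtering by `categoryId` is meant to restrict results to the category instead of relying on the text match of `query='bar'`; the `intent` and `limit` values follow our reading of the v2 docs and should be checked there.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 search endpoint, as used by bar_search in this notebook
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"
# HYPOTHETICAL placeholder: replace with the bars/nightlife categoryId from the docs
BAR_CATEGORY_ID = "YOUR_BAR_CATEGORY_ID"


def build_venue_params(ll, client_id, client_secret, radius_m=5000, limit=50):
    """Query parameters for a category-filtered venue search around ll ("lat,lon")."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": "20200731",              # API version date, same as bar_search
        "ll": ll.replace(" ", ""),    # accept '48.146394, 17.107969' as in the example
        "intent": "browse",           # assumed from the v2 docs: radius applies to browse searches
        "radius": radius_m,           # metres; 5000 = 5 km
        "categoryId": BAR_CATEGORY_ID,
        "limit": limit,               # assumed v2 maximum of 50 results per request
    }


def build_venue_url(ll, client_id, client_secret):
    """Full GET URL, handy for debugging; with requests, pass the dict via params= instead."""
    return SEARCH_URL + "?" + urlencode(build_venue_params(ll, client_id, client_secret))


params = build_venue_params("48.146394, 17.107969", "ID", "SECRET")
print(params["ll"], params["radius"])
print(build_venue_url("39.16144,-86.534848", "ID", "SECRET"))
```

In the notebook, `requests.get(SEARCH_URL, params=build_venue_params(ll, foursquare_client_id, foursquare_secret))` would send the same query, and `len(...['response']['venues'])` gives the count.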

In [103]:
#set the keys
foursquare_client_id = os.environ["FOURSQUARE_CLIENT_ID"]
foursquare_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]

In [92]:
# !pip install foursquare

In [26]:
# # Construct the client object
# # client = foursquare.Foursquare(client_id='YOUR_CLIENT_ID', client_secret='YOUR_CLIENT_SECRET', redirect_uri='http://fondu.com/oauth/authorize')
# client = foursquare.Foursquare(client_id=foursquare_id, client_secret=foursquare_secret, redirect_uri='http://fondu.com/oauth/authorize')

# # Build the authorization url for your app
# auth_uri = client.oauth.auth_url()

In [13]:
# import requests as re
# import os

# client_id = os.environ["FOURSQUARE_CLIENT_ID"]
# client_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]
# location = "Toronto,Canada"
# v = "20200731"

# # note '39.122352,-86.5712' middle

# url = "https://api.foursquare.com/v2/venues/search?near="+location+"&client_id="+client_id+"&client_secret="+client_secret+"&v="+v

In [15]:
# client = foursquare.Foursquare(client_id=client_id, client_secret=client_secret, redirect_uri='http://fondu.com/oauth/authorize')

# # Build the authorization url for your app
# auth_uri = client.oauth.auth_url()

In [None]:
# # url_venues = "https://api.foursquare.com/v2/venues/search?near=39.16144,-86.534848&client_id="+foursquare_id+"&client_secret="+foursquare_secret+"&v=20200731"
# # res = re.get(url)
# # print(res.json())

# url_venues = "https://api.foursquare.com/v2/venues/search?near="+string_test_loc+"&client_id="+foursquare_id+"&client_secret="+foursquare_secret+"&v=20200731"
# res = re.get(url)
# response = res.json()

In [None]:
# pprint.pprint(response)

# response['response']['venues']

In [None]:
# res = requests.get(url)  # disabled: `url` is only defined in the commented-out cells above

## Convert `ll` to string

In [31]:
data['ll']


0        39.15920668,-86.52587356
1             39.16144,-86.534848
2        39.14978027,-86.56889006
3          39.165655,-86.57595635
4          39.164848,-86.57962482
                   ...           
53344    39.00427482,-86.58137523
53345        39.002752,-86.463856
53346    38.99232624,-86.53725171
53347         38.99152,-86.448784
53348        38.990848,-86.368864
Name: ll, Length: 49005, dtype: object

In [38]:
crash_locations = data['ll']
# Values are already strings; take the first location as a test input.
# .iloc is positional, so it works even though dropna() left gaps in the index.
string_test_loc = str(crash_locations.iloc[0])
string_test_loc

'39.15920668,-86.52587356'

In [95]:
data['ll_string'] = data['ll'].astype('string')

In [97]:
# data['ll_string']


## Work with a sample of 50 crashes

In [116]:
data_sample = data.sample(50)

In [117]:
data_sample.count()

Master Record Number    50
Year                    50
Month                   50
Day                     50
Weekend?                50
Hour                    50
Collision Type          50
Injury Type             50
Primary Factor          49
Reported_Location       50
Latitude                50
Longitude               50
ll                      50
ll_string               50
dtype: int64

In [119]:
data_sample['number_bars_5k'] = 0

In [120]:
data_sample['number_bars_5k']

42588    0
51943    0
43986    0
26922    0
48196    0
16800    0
22457    0
9753     0
12550    0
52551    0
48542    0
43162    0
2997     0
12081    0
948      0
45932    0
26369    0
13618    0
23097    0
35678    0
35023    0
29676    0
34394    0
42759    0
45474    0
31383    0
5020     0
4029     0
33476    0
42636    0
14573    0
25958    0
23630    0
4936     0
17876    0
26972    0
27110    0
11852    0
52199    0
50693    0
15444    0
2694     0
27818    0
44080    0
15403    0
51963    0
32737    0
26068    0
8553     0
48695    0
Name: number_bars_5k, dtype: int64

### GET requests

In [168]:
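import requests

def bar_search(location):
    """Number of Foursquare venues matching 'bar' within 5 km of location ("lat,lon"), or None on failure."""
    base_url = "https://api.foursquare.com/v2/venues/search"
    params = dict(
        client_id=foursquare_client_id,   # keys read from the environment above
        client_secret=foursquare_secret,
        v='20200731',
        ll=location,
        query='bar',
        limit=5,       # NOTE: caps the count at 5; raise it (up to 50 per the v2 docs) for a meaningful count
        radius=5000,   # metres
    )
    bar_response = requests.get(url=base_url, params=params)
    if bar_response.status_code != 200:
        print(f"Request failed for {location}: HTTP {bar_response.status_code}")
        return None
    bars_json = bar_response.json()
    return len(bars_json['response']['venues'])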

In [169]:
bars = bar_search('39.163344,-86.5272')

In [170]:
bars

5

In [165]:
# len(bars['response']['venues'])

5

In [175]:
# bars['response']['venues']

In [148]:
# data_sample_ll = data_sample['ll_string']

In [149]:
# data_sample_ll.iloc[1]

'39.163344,-86.5272'

In [159]:
data_sample.iloc[8,[13]]

ll_string    39.33283031,-86.67892355
Name: 12550, dtype: object

## Count the bars in each JSON response

In [199]:
number_bars = {
    'number_bars_5k': []
}

# Query Foursquare once per sampled crash, passing the "lat,lon" string
# (iloc[i, [13]] returned a one-element Series, not a string)
for location in data_sample['ll_string']:
    bar_num = bar_search(str(location))
    number_bars['number_bars_5k'].append(bar_num)


In [176]:
# data_sample["number_bars_5k"] = 

In [196]:
number_bars

In [195]:
number_bars_df = pd.DataFrame(number_bars, columns=["number_bars_5k"])

In [187]:
# data_sample.drop("number_bars_5k", axis=1)

In [183]:
data_sample["number_bars_5k"] = number_bars["number_bars_5k"]

In [189]:
len(number_bars["number_bars_5k"])

50

In [202]:
# number_bars

In [203]:
df_with_bars = data_sample.assign(number_bars_5k=number_bars['number_bars_5k'])

In [206]:
df_with_bars.head(2)

| | Master Record Number | Year | Month | Day | Weekend? | Hour | Collision Type | Injury Type | Primary Factor | Reported_Location | Latitude | Longitude | ll | ll_string | number_bars_5k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42588 | 1748077 | 2005 | 7 | 5 | Weekday | 2200.0 | 2-Car | Non-incapacitating | DISREGARD SIGNAL/REG SIGN | LONGVIEW & PETE ELLIS | 39.166944 | -86.495056 | 39.166944,-86.495056 | 39.166944,-86.495056 | 5 |
| 51943 | 900026158 | 2003 | 4 | 4 | Weekday | 2100.0 | 2-Car | Non-incapacitating | FAILURE TO YIELD RIGHT OF WAY | ATWATER & HENDERSON | 39.163344 | -86.5272 | 39.163344,-86.5272 | 39.163344,-86.5272 | 5 |
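For step 3, a first look at the relationship is to compare the average bar count across injury types, ordered by severity. This is a standard-library sketch on invented rows; the severity ordering (including a `Fatal` label) is an assumption about the `Injury Type` values and should be checked against `data['Injury Type'].unique()`. With pandas, `df_with_bars.groupby('Injury Type')['number_bars_5k'].mean()` computes the same group means.

```python
from collections import defaultdict
from statistics import mean

# ASSUMED ordering of the Injury Type labels, least to most severe
SEVERITY_ORDER = ["No injury/unknown", "Non-incapacitating", "Incapacitating", "Fatal"]


def mean_bars_by_severity(rows):
    """rows: iterable of (injury_type, number_bars_5k) pairs.

    Returns {injury_type: mean bar count} in severity order. Pairs whose bar
    count is None (failed API call) are skipped, and labels outside
    SEVERITY_ORDER are ignored.
    """
    groups = defaultdict(list)
    for injury, bars in rows:
        if bars is not None:
            groups[injury].append(bars)
    return {label: mean(groups[label]) for label in SEVERITY_ORDER if groups[label]}


# Invented example rows: (Injury Type, number_bars_5k)
rows = [
    ("Non-incapacitating", 5),
    ("Non-incapacitating", 3),
    ("No injury/unknown", 2),
    ("Incapacitating", None),
]
print(mean_bars_by_severity(rows))
```

Note that with `limit=5` in `bar_search`, every count in the sample above is capped at 5, so group means cannot differ until the limit is raised.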


In [106]:
# bars_json = test

# for i in range(len(bars_json['response']['venues'])):
#     bars.append(bars_json['response']['venues'][i]['name'])
# return(bars)

In [None]:
# def get_venues(location):
#     bars = []
#     bars_json = search(location)
    
#     for i in range(len(bars_json['response']['venues'])):
#         bars.append(bars_json['response']['venues'][i]['name'])
#     return(bars)

# get_venues(string_test_loc)
# # bars_json['response']['venues'][3]['name']
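The commented-out `get_venues` above calls a `search` helper that is never defined. Its parsing step can be sketched on its own, assuming the v2 response layout that `bar_search` already relies on (`bars_json['response']['venues'][i]['name']`). `venue_names` is a hypothetical helper name and the sample response is invented.

```python
def venue_names(bars_json):
    """Venue names from a v2 venues/search response; [] if the layout is missing (e.g. an error reply)."""
    venues = bars_json.get("response", {}).get("venues", [])
    return [venue["name"] for venue in venues]


# Invented response, trimmed to the fields this function reads
sample_response = {
    "response": {
        "venues": [
            {"name": "Example Bar A"},
            {"name": "Example Bar B"},
        ]
    }
}
print(venue_names(sample_response))
print(venue_names({"meta": {"code": 400}}))
```

Returning an empty list for a malformed reply keeps a loop over many crashes running; `len(venue_names(...))` then gives the bar count.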

## Try again

In [61]:
# import requests 
# import os

# client_id = os.environ["FOURSQUARE_CLIENT_ID"]
# client_secret = os.environ["FOURSQUARE_CLIENT_SECRET"]
# # location = "Toronto,Canada"
# # v = "20200731"

# # # note '39.122352,-86.5712' middle

# # url = "https://api.foursquare.com/v2/venues/search?near="+location+"&client_id="+client_id+"&client_secret="+client_secret+"&v="+v

In [114]:
# crash_locations = data['ll'].head(1)

# string_test_loc = str(crash_locations[0])
# string_test_loc

# Visual Crossing API

The Visual Crossing API documentation is [here](https://www.visualcrossing.com/resources/documentation/).

1. Sign up for a free API key if you haven't done that before.
2. For each crash, get the weather for its location and date.
3. Find a relationship between the weather and the severity of the crash.

Hints:

* Randomly sample only 250 or so crashes (due to API limits), or pull weather only for a smaller sample of crashes.
* For sending HTTP requests, check out the `requests` library [here](http://docs.python-requests.org/en/master/).
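A minimal sketch that assembles a Timeline API URL from the `[location]/[date1]/[date2]?key=...` pattern noted in the next cell. `timeline_url` is a hypothetical helper name; extra keyword arguments are passed through as query parameters (the example uses `elements`, which appears in the documented URL below).

```python
from urllib.parse import quote, urlencode

# Base URL from the Timeline API patterns noted in the next cell
TIMELINE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"


def timeline_url(location, date1, api_key, date2=None, **extra):
    """Build /timeline/[location]/[date1]/[date2]?key=...; date2 is optional."""
    # Keep the comma in "lat,lon" readable; other special characters are percent-encoded
    parts = [TIMELINE_URL, quote(location, safe=","), date1]
    if date2 is not None:
        parts.append(date2)
    query = urlencode({"key": api_key, **extra})
    return "/".join(parts) + "?" + query


url = timeline_url("39.16144,-86.534848", "2015-01-01", "YOUR_API_KEY",
                   "2015-01-31", elements="tempmax,tempmin,temp")
print(url)
```

With `requests`, the key and extra parameters could equally go in `params=`; building the string by hand just makes the URL easy to compare with the documented examples.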


In [193]:
# Timeline API request patterns (from the Visual Crossing documentation):

# https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/[location]/[date1]/[date2]?key=YOUR_API_KEY 
# date1 and date2 use yyyy-MM-dd, or yyyy-MM-ddTHH:mm:ss for a specific time (e.g. 2020-10-19T13:00:00)
# date2 is optional

# https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/London,UK?key=YOUR_API_KEY 

# https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/38.9697,-77.385?key=YOUR_API_KEY 

# https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/London,UK/2020-10-01/2020-12-31?key=YOUR_API_KEY 

# Last 30 days, with selected include sections and elements (tempmax, tempmin, temp):
# https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/London,UK/last30days?key=YOUR_API_KEY&include=obs%2Cfcst%2Cstats%2Calerts%2Ccurrent%2Chistfcst&elements=tempmax,tempmin,temp
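Once a request succeeds, the JSON still has to be reduced to a few weather features per crash. A sketch, assuming the Timeline response has a top-level `days` list whose entries carry `datetime`, `conditions`, `precip` and `temp`; check these field names against a real response before relying on them. `daily_weather` is a hypothetical helper and the sample data is invented.

```python
def daily_weather(timeline_json):
    """Reduce a Timeline API response to one small dict per day.

    ASSUMES a top-level "days" list with "datetime", "conditions",
    "precip" and "temp" fields; missing fields come back as None.
    """
    rows = []
    for day in timeline_json.get("days", []):
        rows.append({
            "date": day.get("datetime"),
            "conditions": day.get("conditions"),
            "precip": day.get("precip"),
            "temp": day.get("temp"),
        })
    return rows


# Invented response, trimmed to the fields read above
sample = {"days": [
    {"datetime": "2015-01-06", "conditions": "Snow, Overcast", "precip": 0.12, "temp": 21.4},
    {"datetime": "2015-01-07", "conditions": "Clear", "precip": 0.0, "temp": 10.2},
]}
for row in daily_weather(sample):
    print(row["date"], row["conditions"], row["precip"])
```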


In [20]:
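# date1/date2 (yyyy-MM-dd): Day appears to be the day of the week (7 is always Weekend), not of the month, so query the crash's whole month
month_start = pd.to_datetime(data_sample[['Year', 'Month']].assign(Day=1))
data_sample['date1'] = month_start.dt.strftime('%Y-%m-%d')
data_sample['date2'] = (month_start + pd.offsets.MonthEnd(0)).dt.strftime('%Y-%m-%d')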

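In [207]:
# Set the key; os.environ.get returns None instead of raising KeyError when the variable is missing
vs_api_key = os.environ.get("VISUAL_CROSSING_API_KEY")
if vs_api_key is None:
    print("VISUAL_CROSSING_API_KEY is not set; export it before running the weather requests")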
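For step 3, one simple way to look for a relationship is to cross-tabulate a weather label against `Injury Type` and compare the share of crashes with a reported injury in each weather group. This is a standard-library sketch on invented pairs; `crosstab` and `share_injured` are hypothetical helper names, and treating everything except `No injury/unknown` as "injured" is an assumption. In pandas, `pd.crosstab(weather_labels, data_sample['Injury Type'])` produces the same counts.

```python
from collections import Counter, defaultdict


def crosstab(pairs):
    """Count (weather_label, injury_type) pairs into table[weather][injury] -> n."""
    table = defaultdict(Counter)
    for weather, injury in pairs:
        table[weather][injury] += 1
    return table


def share_injured(table, no_injury_label="No injury/unknown"):
    """Fraction of crashes in each weather group whose Injury Type is not no_injury_label."""
    return {
        weather: 1 - counts[no_injury_label] / sum(counts.values())
        for weather, counts in table.items()
    }


# Invented (weather label, Injury Type) pairs, one per crash
pairs = [
    ("Rain", "Non-incapacitating"),
    ("Rain", "No injury/unknown"),
    ("Clear", "No injury/unknown"),
    ("Clear", "No injury/unknown"),
    ("Clear", "Incapacitating"),
]
table = crosstab(pairs)
print(share_injured(table))
```

With only 50 to 250 sampled crashes, some weather groups will be small, so differences between their shares may well be noise.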