# Data Wrangling Challenge
### Pull and manipulate the API data

The goal of this exercise is to practice data enrichment using external APIs. We will take data about car crashes in Monroe County, Indiana, from 2003 to 2015 and find out what the weather was at the time of each crash and how many bars there are in the surrounding area. We will work with two different APIs during this challenge:

- Foursquare API
- World Weather Online API

We will try to find correlations between the severity of a crash and the weather or the number of bars in the area. To indicate the severity of a crash, we will use the `Injury Type` column.
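Because `Injury Type` is a text column, it usually helps to turn it into an ordered scale before looking for correlations. The following is a minimal sketch of that idea. The four labels in `SEVERITY_ORDER` and the helper name `add_severity_rank` are assumptions made for illustration; check `crash_data["Injury Type"].unique()` and adjust the list to the labels actually present.

```python
import pandas as pd

# Hypothetical labels, ordered from least to most severe. Replace them with
# the values returned by crash_data["Injury Type"].unique().
SEVERITY_ORDER = ["No injury/unknown", "Non-incapacitating", "Incapacitating", "Fatal"]


def add_severity_rank(df, column="Injury Type", order=SEVERITY_ORDER):
    """Return a copy of df with a numeric severity_rank column (0 = least severe)."""
    out = df.copy()
    ordered = pd.Categorical(out[column], categories=order, ordered=True)
    codes = pd.Series(ordered.codes, index=out.index)
    # Categorical codes are -1 for labels missing from `order`; turn those into NaN
    out["severity_rank"] = codes.where(codes >= 0)
    return out


toy = pd.DataFrame({"Injury Type": ["Fatal", "No injury/unknown", "Incapacitating"]})
print(add_severity_rank(toy))
```

A numeric rank like this can be passed to a rank correlation or used as a sort key, while the original labels stay available for grouping.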

In [162]:
import pandas as pd
import requests as re

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to load the data. This part is done for you.


In [163]:
# the file is not UTF-8, so the encoding has to be specified explicitly
crash_data = pd.read_csv('data/monroe_county_crash_data.csv', encoding='ISO-8859-1')
crash_data[["Latitude", "Longitude"]].head(5)

| | Latitude | Longitude |
|---|---|---|
| 0 | 39.159207 | -86.525874 |
| 1 | 39.16144 | -86.534848 |
| 2 | 39.14978 | -86.56889 |
| 3 | 39.165655 | -86.575956 |
| 4 | 39.164848 | -86.579625 |


# Foursquare API

Foursquare API documentation is [here](https://developer.foursquare.com/)

1. Create a Foursquare application and get your API keys.
2. Create the function **get_venues** that, for a given crash, pulls the bars within a 5 km radius of the crash location.

#### example
`get_venues('48.146394, 17.107969')`

3. Find a relationship (if there is any) between the number of bars in the area and the severity of the crash.

Hints:
- check out the Python package `foursquare` (no need to send HTTP requests directly with the `requests` library)
- **categoryId** for bars and nightlife needs to be found in the [foursquare API documentation](https://developer.foursquare.com/docs/api-reference/venues/search/)
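The second hint can be sketched as follows: rather than a free-text query such as `"bar"`, the search can be restricted by category. This is only a sketch under assumptions. It assumes the v3 Places search endpoint accepts a comma-separated `categories` parameter and a `limit` parameter (the hint itself refers to the older `categoryId` name), and that `13003` is the `Bar` category id, as it appears in the example response in this notebook. The helper name `build_search_params` is hypothetical.

```python
def build_search_params(latitude, longitude, radius_m, category_ids=("13003",), limit=50):
    """Build query parameters for a Places search filtered by category."""
    return {
        "ll": f"{latitude},{longitude}",       # "lat,long" order
        "radius": str(radius_m),               # metres
        "categories": ",".join(category_ids),  # assumed parameter name
        "limit": str(limit),                   # assumed parameter name
    }


params = build_search_params(48.146394, 17.107969, 5000)
print(params)
```

Filtering by category should keep out venues that merely have "bar" in their name, but the parameter names and id values need to be confirmed in the Foursquare documentation before use.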

### Function `get_venues`

In [164]:
# return the raw response for a venue query (here: bars) around a reference point
# the reference point is passed as latitude, then longitude; radius is in metres
def get_venues(venue, latitude, longitude, radius):

    url = "https://api.foursquare.com/v3/places/search"

    # round the coordinates to 2 decimal places (roughly 1 km) before querying
    latitude = f"{latitude:.2f}"
    longitude = f"{longitude:.2f}"

    params = {
        "query" : venue,
        "ll" : latitude + "," + longitude,
        "sort" : "DISTANCE",
        "radius" : str(radius)
    }

    headers = {
        "Accept" : "application/json",
        "Authorization" : "fsq3P3rZR2Kb2ccaHpbuTMdwWCtYF3fIp1frLeqT2saiS0M="
    }

    response = re.get(url, params=params, headers=headers)

    return response

### Example extraction for location `48.146394, 17.107969`

In [165]:
# example output in json format
bar_data_example = get_venues("bar", 48.146394, 17.107969, 5000).json()
bar_data_example['results']

[{'fsq_id': '5baf3efbc66666002c063861',
  'categories': [{'id': 13012,
    'name': 'Hookah Bar',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/nightlife/hookahbar_',
     'suffix': '.png'}}],
  'chains': [],
  'distance': 78,
  'geocodes': {'main': {'latitude': 48.149291, 'longitude': 17.110026}},
  'link': '/v3/places/5baf3efbc66666002c063861',
  'location': {'address': 'Námestie 1. mája 4',
   'country': 'SK',
   'cross_street': '',
   'formatted_address': 'Námestie 1. mája 4, 811 06 Bratislava',
   'locality': 'Bratislava',
   'postcode': '811 06',
   'region': 'Bratislava Region'},
  'name': 'Vice City Shisha Bar & Lounge',
  'related_places': {},
  'timezone': 'Europe/Bratislava'},
 {'fsq_id': '53c2d3e2498eda35d7854d64',
  'categories': [{'id': 13003,
    'name': 'Bar',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/nightlife/pub_',
     'suffix': '.png'}}],
  'chains': [],
  'distance': 85,
  'geocodes': {'main': {'latitude': 48.149235, 'longitude

In [166]:
# example conversion to DataFrame using json_normalize
bar_data_example_df = pd.json_normalize(
    data=bar_data_example["results"],
    record_path="categories",
    meta=["distance", "name", ["geocodes", "main", "latitude"], ["geocodes", "main", "longitude"]],
    record_prefix="cat_"
    ).drop(columns = ["cat_id", "cat_icon.prefix", "cat_icon.suffix"])

# rename columns for readability
bar_data_example_df.rename(columns={"geocodes.main.latitude" : "latitude", "geocodes.main.longitude" : "longitude"}, inplace=True)

In [167]:
df_bars = pd.DataFrame()
df_bars = pd.concat([df_bars, bar_data_example_df], ignore_index=True)
df_bars

| | cat_name | distance | name | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Hookah Bar | 78 | Vice City Shisha Bar & Lounge | 48.149291 | 17.110026 |
| 1 | Bar | 85 | Smile Bar & Caffe | 48.149235 | 17.109996 |
| 2 | Beer Bar | 125 | Mešuge Craft Beer Bar | 48.148932 | 17.110533 |
| 3 | Gastropub | 133 | Skupinová Terapia | 48.149856 | 17.111783 |
| 4 | Lounge | 166 | EVENT bar & restaurant | 48.151471 | 17.110419 |
| 5 | Bakery | 177 | Minute - Fresh Food Bar | 48.149603 | 17.112317 |
| 6 | Cocktail Bar | 238 | MYST BAR | 48.149131 | 17.112938 |
| 7 | Slovak Restaurant | 247 | 1. Slovak pub | 48.148398 | 17.11231 |
| 8 | Café | 247 | Bar BaRon | 48.147839 | 17.110789 |
| 9 | Beer Bar | 256 | Kollarko | 48.149161 | 17.113223 |


In [168]:
venue = "bar"
radius = 5000
bar_frames = []

for i in range(0, crash_data.shape[0]):
# for i in range(2,10): <-- to test output

    # extract latitude/longitude values from crash data
    latitude, longitude = crash_data[["Latitude", "Longitude"]].iloc[i]

    # use latitude/longitude values to search FSQ API for bars in the vicinity of crash
    # then parse the JSON body of the response
    bar_data_json = get_venues(venue, latitude, longitude, radius).json()

    # convert raw data into DataFrame, detailing bar name and location
    # then drop irrelevant columns
    bar_data_df = pd.json_normalize(
        data=bar_data_json["results"],
        record_path="categories",
        meta=["distance", "name", ["geocodes", "main", "latitude"], ["geocodes", "main", "longitude"]],
        record_prefix="cat_"
        ).drop(
            columns = ["cat_id", "cat_icon.prefix", "cat_icon.suffix"])

    # rename columns for readability
    bar_data_df.rename(columns={"geocodes.main.latitude" : "latitude", "geocodes.main.longitude" : "longitude"}, inplace=True)

    # add crash_data index to identify which bars are associated with which crash
    bar_data_df['crash_index'] = i

    # collect the bar locations found for this crash
    bar_frames.append(bar_data_df)

# concatenate the per-crash DataFrames once, after the loop
df_bars = pd.concat(bar_frames, ignore_index=True)
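The loop above sends one request per crash, even though `get_venues` rounds the coordinates to two decimal places, so nearby crashes end up issuing identical queries. One way to stay within API limits is to query each distinct rounded location only once. The sketch below illustrates the idea offline; `fetch_once_per_location` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real API call.

```python
import pandas as pd


def fetch_once_per_location(crashes, fetch, decimals=2):
    """Call fetch(lat, lon) once per distinct rounded coordinate pair.

    Returns a dict mapping each crash index to the result for its location.
    """
    rounded = crashes[["Latitude", "Longitude"]].round(decimals)
    cache = {}
    results = {}
    for idx, (lat, lon) in zip(rounded.index, rounded.itertuples(index=False, name=None)):
        if (lat, lon) not in cache:
            cache[(lat, lon)] = fetch(lat, lon)
        results[idx] = cache[(lat, lon)]
    return results


# toy stand-in for the API call, so the sketch runs without network access
calls = []

def fake_fetch(lat, lon):
    calls.append((lat, lon))
    return f"bars near {lat},{lon}"


toy = pd.DataFrame({"Latitude": [39.159207, 39.161440, 39.149780],
                    "Longitude": [-86.525874, -86.534848, -86.568890]})
out = fetch_once_per_location(toy, fake_fetch)
print(len(calls), "calls for", len(toy), "crashes")
```

In the real notebook, `fetch` could be a small wrapper around `get_venues("bar", lat, lon, 5000)`; rows with missing coordinates would need to be dropped first.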

In [173]:
df_bars.groupby("crash_index").count()

| crash_index | cat_name | distance | name | latitude | longitude |
|---|---|---|---|---|---|
| 2 | 18 | 18 | 18 | 18 | 18 |
| 3 | 19 | 19 | 19 | 19 | 19 |
| 4 | 18 | 18 | 18 | 18 | 18 |
| 5 | 16 | 16 | 16 | 16 | 16 |
| 6 | 18 | 18 | 18 | 18 | 18 |
| 7 | 18 | 18 | 18 | 18 | 18 |
| 8 | 18 | 18 | 18 | 18 | 18 |
| 9 | 12 | 12 | 12 | 12 | 12 |
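The counts above are one row per category, because `record_path="categories"` expands a venue with several categories into several rows. For step 3, a possible next step is to count distinct venue names per crash and compare them across `Injury Type`. The sketch below uses toy data; the helper name `bars_by_severity` and the severity labels are assumptions, and filling missing crashes with 0 is only valid if every crash in `crashes` was actually queried.

```python
import pandas as pd


def bars_by_severity(crashes, bars, severity_col="Injury Type"):
    """Summarise the number of distinct bars per crash for each severity label."""
    # distinct venue names per crash; crashes with no bars found get 0
    per_crash = bars.groupby("crash_index")["name"].nunique()
    counts = per_crash.reindex(crashes.index, fill_value=0).rename("n_bars")
    joined = crashes[[severity_col]].join(counts)
    return joined.groupby(severity_col)["n_bars"].agg(["count", "mean", "median"])


# toy data with hypothetical severity labels
toy_crashes = pd.DataFrame({"Injury Type": ["Fatal", "No injury/unknown", "No injury/unknown"]})
toy_bars = pd.DataFrame({
    "crash_index": [0, 0, 0, 1],
    "name": ["A", "A", "B", "C"],  # "A" appears twice: one venue, two categories
})
print(bars_by_severity(toy_crashes, toy_bars))
```

With the real data this would be called as `bars_by_severity(crash_data, df_bars)`. A difference in means between groups is only a first indication; group sizes and any cap on the number of results returned per query should be checked before drawing conclusions.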


# World Weather Online API

World Weather Online API is [here](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx)

1. Sign up for a free API key if you haven't already (it's free for **30 days**).
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and the severity of the crash.

Hints:

* pull weather only for a smaller sample of crashes (250 or so) due to API limits
* for sending HTTP requests, check out the `requests` library [here](http://docs.python-requests.org/en/master/)
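The two hints can be sketched as follows: draw a reproducible sample of about 250 crashes, then build one request per sampled crash. This is a sketch under assumptions, not a verified client. The endpoint URL and the parameter names `key`, `q`, `date`, `tp` and `format` are assumptions to confirm in the World Weather Online documentation, the helper names are hypothetical, and the crash date has to be assembled from whatever date columns the dataset provides.

```python
import pandas as pd

# Assumed endpoint of the historical weather API; confirm it in the docs.
WWO_URL = "https://api.worldweatheronline.com/premium/v1/past-weather.ashx"


def build_weather_params(api_key, latitude, longitude, date):
    """Build query parameters for one location and one day."""
    return {
        "key": api_key,
        "q": f"{latitude},{longitude}",
        "date": pd.Timestamp(date).strftime("%Y-%m-%d"),
        "tp": "24",        # assumed: one summary block per day
        "format": "json",
    }


def sample_crashes(crashes, n=250, seed=0):
    """Draw a reproducible random sample of at most n crashes that have coordinates."""
    located = crashes.dropna(subset=["Latitude", "Longitude"])
    return located.sample(n=min(n, len(located)), random_state=seed)


toy = pd.DataFrame({"Latitude": [39.159207, None, 39.14978],
                    "Longitude": [-86.525874, -86.5, -86.56889]})
sample = sample_crashes(toy, n=250)
params = build_weather_params("YOUR_KEY", 39.159207, -86.525874, "2015-01-05")
print(len(sample), params["q"], params["date"])
# The request itself would then be: requests.get(WWO_URL, params=params)
```

Fixing the random seed keeps the sample stable between runs, so the same 250 crashes are not fetched again under a different selection.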
