# Data Wrangling Challenge
### Pull and manipulate the API data

The point of this exercise is to try data enrichment with data from external APIs. We are going to take data about car crashes in Monroe County, Indiana from 2003 to 2015 and try to figure out the weather during the accident and how many bars there are in the area. We will work with two different APIs during this challenge:

- Foursquare API
- World Weather Online API

We will try to find correlations between the severity of a crash and the weather or the number of bars in the area. To indicate the severity of a crash, we will use the `Injury Type` column.
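
One way to make `Injury Type` usable in a correlation is to map it onto an ordinal scale. The sketch below is only an illustration: the names `SEVERITY_SCALE` and `severity_score` are hypothetical, and only the labels `No injury/unknown` and `Non-incapacitating` appear in the data preview in this notebook. The other two labels and the 0 to 3 scores are assumptions that should be checked against `data['Injury Type'].unique()`.

```python
# Hypothetical ordinal scale for the `Injury Type` column.
# Only the first two labels are confirmed by the data preview; the
# last two are assumed and should be checked against the real data.
SEVERITY_SCALE = {
    "No injury/unknown": 0,
    "Non-incapacitating": 1,
    "Incapacitating": 2,  # assumed label
    "Fatal": 3,           # assumed label
}


def severity_score(injury_type):
    """Return the ordinal severity for a label, or None if it is unknown."""
    return SEVERITY_SCALE.get(injury_type.strip())


print(severity_score("Non-incapacitating"))  # 1
```

With the DataFrame loaded, the same mapping could be applied to the whole column with `data['Injury Type'].map(SEVERITY_SCALE)`.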

## Data

The data for this exercise can be found [here](https://drive.google.com/file/d/1_KF9oIJV8cB8i3ngA4JPOLWIE_ETE6CJ/view?usp=sharing).

Run the cells below to get your data ready; this part is a little help from us.


In [1]:
import pandas as pd
import os # use this to access your environment variables
import requests # this will be used to call the APIs

In [13]:
data = pd.read_csv("monroe-county-crash-data2003-to-2015.csv", encoding="unicode_escape")
# ========================
# preparing data
data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
print(data.shape)
data.head()

(53913, 12)


| | Master Record Number | Year | Month | Day | Weekend? | Hour | Collision Type | Injury Type | Primary Factor | Reported_Location | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 902363382 | 2015 | 1 | 5 | Weekday | 0.0 | 2-Car | No injury/unknown | OTHER (DRIVER) - EXPLAIN IN NARRATIVE | 1ST & FESS | 39.159207 | -86.525874 |
| 1 | 902364268 | 2015 | 1 | 6 | Weekday | 1500.0 | 2-Car | No injury/unknown | FOLLOWING TOO CLOSELY | 2ND & COLLEGE | 39.16144 | -86.534848 |
| 2 | 902364412 | 2015 | 1 | 6 | Weekend | 2300.0 | 2-Car | Non-incapacitating | DISREGARD SIGNAL/REG SIGN | BASSWOOD & BLOOMFIELD | 39.14978 | -86.56889 |
| 3 | 902364551 | 2015 | 1 | 7 | Weekend | 900.0 | 2-Car | Non-incapacitating | FAILURE TO YIELD RIGHT OF WAY | GATES & JACOBS | 39.165655 | -86.575956 |
| 4 | 902364615 | 2015 | 1 | 7 | Weekend | 1100.0 | 2-Car | No injury/unknown | FAILURE TO YIELD RIGHT OF WAY | W 3RD | 39.164848 | -86.579625 |


In this section you'll use the `requests` library to access the Foursquare Places API and pull points of interest.
Requests is a popular, user-friendly Python HTTP library that simplifies making HTTP requests.
It provides an intuitive API for sending various types of requests and for handling headers, cookies, and authentication,
which makes it a good choice for web scraping, API integration, and general HTTP communication in Python applications.

# Foursquare API

The Foursquare API documentation is [here](https://location.foursquare.com/developer/reference/place-search/).

1. Start a Foursquare application and get your keys.
2. Create the function **get_venues** that, for each crash, pulls the bars within a 5 km radius of the crash.
3. Find a relationship (if there is any) between the number of bars in the area and the severity of the crash.
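
For step 3, a simple first look is to compare the average bar count across injury types. The sketch below uses only the standard library and made-up illustrative pairs, not real results; the function name `mean_bars_by_injury` and the idea of a per-crash `num_bars` value are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_bars_by_injury(records):
    """Average the bar count per injury type from (injury_type, num_bars) pairs."""
    groups = defaultdict(list)
    for injury_type, num_bars in records:
        groups[injury_type].append(num_bars)
    return {injury_type: mean(counts) for injury_type, counts in groups.items()}


# Made-up illustrative pairs, not real results from the crash data
sample = [
    ("No injury/unknown", 10),
    ("No injury/unknown", 20),
    ("Non-incapacitating", 30),
]
print(mean_bars_by_injury(sample))
```

Once the bar counts sit in a DataFrame column (say a hypothetical `num_bars`), `data.groupby('Injury Type')['num_bars'].mean()` gives the same summary.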

HINT:
- the **categories** IDs for bars and nightlife can be found in the [Foursquare API documentation](https://location.foursquare.com/places/docs/categories/)
- you'll have to pass latitude and longitude together as a single string, separated by a comma, for the API
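
The two hints above can be sketched as small formatting helpers. The function names `format_ll` and `format_categories` are hypothetical, and the sketch deliberately leaves out real category IDs, which still need to be looked up in the documentation.

```python
def format_ll(latitude, longitude):
    """Join latitude and longitude into the 'lat,lon' string form."""
    return f"{latitude},{longitude}"


def format_categories(category_ids):
    """Join category ids into one comma-separated string; None means no filter."""
    if not category_ids:
        return None
    return ",".join(str(cid) for cid in category_ids)


print(format_ll(39.159207, -86.525874))  # 39.159207,-86.525874
```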

In [17]:
# pay no attention to the OAuth credentials, you don't need them
# if you didn't save your API key when you first created it, you'll have to make a new one
# import your foursquare API key from an environment variable
# this is the safest way to store your API key
# be sure to pass the environment variable as you named it - it may not be the same as below
FOURSQUARE_KEY = os.getenv('FOURSQUARE_KEY')

In [4]:
def get_venues_fs(latitude, longitude, radius, api_key, categories):
    """
    Get venues from Foursquare with a specified place type and coordinates.
    Args:
        latitude (float): latitude for query (must be combined with longitude)
        longitude (float): longitude for query (must be combined with latitude)
        radius (int): search radius around the coordinates, in meters
        api_key (str): Foursquare API key to use for query
        categories (str): Foursquare-recognized place type. If not passed, no place type will be specified. Separate ids with commas
    
    Returns:
        response: response object from the requests library.
    """
    pass

In [18]:
# testing
res = get_venues_fs(latitude=51.51, longitude=-0.1337, radius=5000, api_key=FOURSQUARE_KEY, categories=None)
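
One possible way to fill in the stub is sketched below. It is not a definitive implementation and rests on several assumptions that should be checked against the linked Place Search documentation: the v3 endpoint URL, the raw API key sent in the `Authorization` header, a `limit` of 50 results per call, and venues being returned under a `results` key. The optional `session` argument and the `count_venues` helper are hypothetical additions that make the function easy to try without a network call.

```python
PLACE_SEARCH_URL = "https://api.foursquare.com/v3/places/search"  # assumed v3 endpoint


def get_venues_fs(latitude, longitude, radius, api_key, categories=None, session=None):
    """Query Place Search around a point and return the response object."""
    if session is None:
        import requests  # imported lazily so a stand-in session can be passed instead
        session = requests
    # Assumption: the API key is sent as-is in the Authorization header
    headers = {"Accept": "application/json", "Authorization": api_key}
    # Latitude and longitude travel together as one 'lat,lon' string
    params = {"ll": f"{latitude},{longitude}", "radius": radius, "limit": 50}
    if categories:
        params["categories"] = categories  # comma-separated category ids
    return session.get(PLACE_SEARCH_URL, headers=headers, params=params, timeout=10)


def count_venues(response):
    """Count the venues in a response, assuming they sit under a 'results' key."""
    if response is None or response.status_code != 200:
        return None
    return len(response.json().get("results", []))
```

Because the number of returned venues is capped by `limit`, a count equal to the cap may understate the real number of bars in a dense area.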

# World Weather Online API

The World Weather Online API documentation is [here](https://www.worldweatheronline.com/developer/api/historical-weather-api.aspx).

1. Sign up for a free API key if you haven't done that before (it's free for **30 days**).
2. For each crash, get the weather for the location and date.
3. Find a relationship between the weather and severity of the crash.

Hints:

* pull weather only for a smaller sample of crashes (250 or so) due to API limits
* for sending HTTP requests, check out the `requests` library [here](http://docs.python-requests.org/en/master/)
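
Since the API limits are tight, it can help to avoid repeated calls for the same date and location and to pause between requests. The sketch below shows one way to do that; the name `fetch_all` and the one-second default delay are assumptions, and `fetch` stands for any function that takes a date and a location, such as the `get_weather_data` function defined later in this notebook.

```python
import time


def fetch_all(queries, fetch, delay_seconds=1.0):
    """Call fetch(date, location) once per unique query, pausing between calls."""
    cache = {}
    for date, location in queries:
        key = (date, location)
        if key in cache:
            continue  # already fetched, do not spend another API call
        if cache:
            time.sleep(delay_seconds)  # wait between consecutive requests
        cache[key] = fetch(date, location)
    return cache
```

The returned dictionary is keyed by `(date, location)`, so each crash in the sample can look up its weather without triggering a new request.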


In [19]:
import time
# replace <> with the name of the environment variable that holds your World Weather Online key
api_key = os.getenv("<>")

In [20]:
# Reuse the crash data loaded above and keep a small sample to stay within the API limits
df_crashes = data.sample(n=250, random_state=42).copy()
print(df_crashes.head())

In [15]:
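# Define the function to get weather data
def get_weather_data(date, location):
    """Return the past-weather JSON for a date ('yyyy-MM-dd') and location, or None on failure."""
    # `api_key` is the key read from the environment variable above,
    # not a hard-coded placeholder (a placeholder key makes the call fail)
    url = 'https://api.worldweatheronline.com/premium/v1/past-weather.ashx'
    params = {
        'key': api_key,
        'q': location,
        'date': date,
        'format': 'json'
    }
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 200:
        return response.json()
    else:
        return None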

# Example usage
sample_date = '2015-01-01'
sample_location = 'Bloomington,IN'
weather_data = get_weather_data(sample_date, sample_location)
print(weather_data)

None
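
When a call does succeed, the weather closest to the crash time still has to be picked out of the response. The sketch below assumes the JSON is laid out as `data` → `weather` → first entry → `hourly`, with each hourly entry carrying a `time` string such as `"0"`, `"300"` or `"1500"`. That layout should be confirmed against the API documentation, and the function name `hourly_for_crash` is hypothetical. The crash `Hour` column uses the same HHMM style (for example `1500.0`), so the two can be compared as numbers.

```python
def hourly_for_crash(weather_json, crash_hour):
    """Return the hourly entry closest to crash_hour (HHMM as a number), or None."""
    if crash_hour is None or crash_hour != crash_hour:  # None or NaN
        return None
    try:
        # Assumed layout: data -> weather -> [first day] -> hourly
        hourly = weather_json["data"]["weather"][0]["hourly"]
    except (KeyError, IndexError, TypeError):
        return None  # failed request (None) or an unexpected payload
    if not hourly:
        return None
    return min(hourly, key=lambda entry: abs(int(entry["time"]) - int(crash_hour)))
```

Returning `None` for a missing response keeps the function safe to use on rows where the request failed, as in the output above.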


In [16]:
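# The raw crash data has no 'date' or 'location' columns, so they must be built first.
# 'location' is built here as a "lat,lon" string; 'date' ('yyyy-MM-dd') still has to be added.
df_crashes['location'] = df_crashes['Latitude'].astype(str) + ',' + df_crashes['Longitude'].astype(str)
df_crashes['weather'] = df_crashes.apply(lambda row: get_weather_data(row['date'], row['location']), axis=1)
# get_number_of_bars is not defined in this notebook; define it (for example on top of get_venues_fs) first
df_crashes['num_bars'] = df_crashes.apply(lambda row: get_number_of_bars(row['location']), axis=1)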