# ACM + Lyft HackNight - Python Starter Notebook

<strong>Prompt:</strong> In a city served only by taxis from the Bauer Taxi Service, your team is in charge of earning Lyft more business and winning riders over. The ACM Executive team has provided this starter notebook, which shows how to use a RESTful API with 5 public endpoints. The endpoints are listed below:

<ul>
    <li><strong>/time:</strong> GET</li>
        <ul>
            <li>Returns the simulation period you are developing your pricing model for.</li>
        </ul>
    <li><strong>/trips:</strong> GET</li>
        <ul>
            <li>Returns the taxi trips that occurred and match the specified criteria.</li>
        </ul>
    <li><strong>/count:</strong> GET</li>
        <ul>
            <li>Returns the total number of trips that occurred during a given time period.</li>
        </ul>
    <li><strong>/pricing:</strong> POST</li>
        <ul>
            <li>Submits your team's general pricing strategy.</li>
        </ul>
    <li><strong>/zones:</strong> POST</li>
        <ul>
            <li>Submits the second component of your pricing strategy: the areas you believe are power zones.</li>
        </ul>
</ul>

<strong>Game Overview:</strong> There are 5 simple steps you need to follow to succeed at this HackNight!
<ol>
    <li>Wait for a "simulation" (access to taxi data of a specific week and prior)</li>
    <li>Use the endpoints with associated GET requests to do data analysis to develop your pricing model.</li>
    <li>Submit your pricing model</li>
    <li>View your revenue generated (the results) on the diagram shown at the end of each simulation.</li>
    <li>Repeat this process for every simulation</li>
</ol>

## Learning Objective: Using a RESTful API

An API (Application Programming Interface) is a set of methods that controls how a programmer accesses a resource (in this case, the taxi dataset). Here, the Lyft HackNight API bounds access to the taxi data by the simulation. What does this mean? You can only request taxi data for a date or date range that falls within the current simulation period or earlier.

Now, let's get started with the setup process.

### Initial Setup and Imports

In [4]:
# Required imports
import requests # Python's go-to HTTP request library
from datetime import datetime

In [5]:
# Environment and other variables for the API to know who you are
URL = "http://localhost:8080"
TEAM = "team4"

### Our first GET request: /time

Format of a GET request with Python's requests library:

requests.get(SERVER_URL + "/ENDPOINT_NAME/", params={QUERY HERE}).json()

In [13]:
requests.get(URL + "/time/").json()

{'message': 'The simulation is not over, the time is 10/1/2017.',
 'sucess': True,
 'time': 1506834000000}

Note that the /time endpoint doesn't take any parameters, so there's no need to specify "params" in the method call. However, in the /trips endpoint below, we'll do just that.
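The `time` field in the response above looks like a Unix timestamp. Here is a minimal sketch of turning it into a readable date, assuming the value is milliseconds since the epoch (an inference from its size and from the date in the message, not something the API documents here):

```python
from datetime import datetime, timezone

# Value copied from the sample /time response above
time_ms = 1506834000000

# Assumption: the value is milliseconds since the Unix epoch, so divide by 1000
simulation_time = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc)
print(simulation_time.isoformat())  # 2017-10-01T05:00:00+00:00
```

05:00 UTC is midnight at UTC-5, which lines up with the 10/1/2017 date in the message.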

### /trips endpoint

The /trips endpoint allows us to find all trips that occurred in a specific time range. Note, however, that you can only search for trips that happened during the current simulation period or earlier. We don't want you to predict the future!

<strong>Endpoint: /trips</strong>
<ul>
    <li>team (str): the string name of your team as indicated in the "TEAM" variable above</li>
    <li>start (str): the date and time you want to start searching from, in format %m/%d/%Y %I:%M %p (12-hour clock followed by AM or PM)</li>
        <ul>
            <li>Example: "10/07/2017 5:00PM"</li>
        </ul>
    <li>end (str): the date and time you want to end the search on, in format %m/%d/%Y %I:%M %p (12-hour clock followed by AM or PM)</li>
    <li>limit (int): maximum number of results to return</li>
    <li>offset (int): indicates you only want to receive data points past this index</li>
</ul>
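If you build your time ranges from `datetime` objects, a small helper can produce the `start` and `end` strings for you. This is a sketch, not part of the API: `format_api_time` is a hypothetical name, and it assumes the server accepts zero-padded values such as "09/10/2017 02:00 PM" (the /count example in this notebook uses a zero-padded date).

```python
from datetime import datetime

# Assumed format: month/day/year, 12-hour clock, then AM or PM
API_TIME_FORMAT = "%m/%d/%Y %I:%M %p"

def format_api_time(moment):
    """Format a datetime as a start/end string for the /trips query."""
    return moment.strftime(API_TIME_FORMAT)

start = format_api_time(datetime(2017, 9, 10, 14, 0))
end = format_api_time(datetime(2017, 9, 10, 15, 0))
print(start, "->", end)
```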

In [32]:
# Build the query
trips_query = {
    "team": TEAM,
    "start": "9/10/2017 2:00 PM",
    "end": "9/10/2017 3:00 PM",
    "limit": 1
}

# Send the request
trips_response = requests.get(URL + "/trips/", params=trips_query).json()
trips_response

{'length': 1,
 'response': [{'company': 'Bauer Taxi Service',
   'dropoff_centroid_latitude': '41.877406123',
   'dropoff_centroid_longitude': '-87.621971652',
   'dropoff_community_area': '32',
   'entry_idx': 0,
   'extras': '0',
   'fare': '4.75',
   'pickup_centroid_latitude': '41.877406123',
   'pickup_centroid_longitude': '-87.621971652',
   'pickup_community_area': '32',
   'taxi_id': '52789c130fe640cb73a4081eca9872f696ab63daa15effcd8b74739e40609b524f1dad55ea7b820852285b3ebffe00b626a93fae3751e9086f6e1c74ccc76e91',
   'tips': '0',
   'tolls': '0',
   'trip_end_timestamp': '2017-09-10T09:15:00',
   'trip_id': '0007c73adceba3169364c89a8ad2b2073530acf3',
   'trip_miles': '0.5',
   'trip_seconds': '240',
   'trip_start_timestamp': '2017-09-10T09:15:00',
   'trip_total': '4.75'}],
 'success': True}

The JSON (JavaScript Object Notation) object we got back from our call to /trips contains 1 ride that occurred between 2:00PM and 3:00PM. In Python, this JSON object is interpreted as a dictionary. <strong>We care about the "response" key.</strong> To retrieve the response, we simply use Python's dictionary bracket notation, which gives us the list of data points we want.

In [27]:
# Gives us a list of the taxi rides that fit the criteria we wanted. Only one in here since we set limit = 1.
trips_response["response"]

[{'company': 'Bauer Taxi Service',
  'dropoff_centroid_latitude': '41.877406123',
  'dropoff_centroid_longitude': '-87.621971652',
  'dropoff_community_area': '32',
  'entry_idx': 0,
  'extras': '0',
  'fare': '4.75',
  'pickup_centroid_latitude': '41.877406123',
  'pickup_centroid_longitude': '-87.621971652',
  'pickup_community_area': '32',
  'taxi_id': '52789c130fe640cb73a4081eca9872f696ab63daa15effcd8b74739e40609b524f1dad55ea7b820852285b3ebffe00b626a93fae3751e9086f6e1c74ccc76e91',
  'tips': '0',
  'tolls': '0',
  'trip_end_timestamp': '2017-09-10T09:15:00',
  'trip_id': '0007c73adceba3169364c89a8ad2b2073530acf3',
  'trip_miles': '0.5',
  'trip_seconds': '240',
  'trip_start_timestamp': '2017-09-10T09:15:00',
  'trip_total': '4.75'}]
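Notice that every value in a trip record except `entry_idx` comes back as a string. Before doing any arithmetic you will probably want numbers and dates instead. The sketch below uses a few fields copied from the record above; `parse_trip` and `NUMERIC_FIELDS` are hypothetical names, and treating empty values as `None` is an assumption about how missing data might appear.

```python
from datetime import datetime

# A few fields copied from the sample record above
raw_trip = {
    "fare": "4.75",
    "tips": "0",
    "trip_miles": "0.5",
    "trip_seconds": "240",
    "trip_start_timestamp": "2017-09-10T09:15:00",
}

NUMERIC_FIELDS = ["fare", "tips", "trip_miles", "trip_seconds"]

def parse_trip(raw):
    """Return a copy of a trip with numeric strings and the start time converted."""
    parsed = dict(raw)
    for field in NUMERIC_FIELDS:
        value = raw.get(field)
        # Assumption: a missing value shows up as None or an empty string
        parsed[field] = float(value) if value not in (None, "") else None
    parsed["trip_start_timestamp"] = datetime.strptime(
        raw["trip_start_timestamp"], "%Y-%m-%dT%H:%M:%S"
    )
    return parsed

trip = parse_trip(raw_trip)
print(trip["fare"], trip["trip_start_timestamp"])
```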

#### get_trips(query) method

Since your goal is to look at many different taxi rides, we want to simplify things by creating a method that just takes in the query we want to process. To do this, we simply put the repeated information inside a method that *wraps* the actual GET request.

In [28]:
def get_trips(query):
    # Copy the query so the caller's dictionary is not modified, then add the team
    params = dict(query, team=TEAM)
    response = requests.get(URL + "/trips/", params=params)
    return response.json()
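The `limit` and `offset` parameters listed earlier suggest a way to page through a large result set. The sketch below is one possible approach: `fetch_all_trips` is a hypothetical helper, and it assumes `offset` skips that many results, which may differ from how the server actually interprets it. It takes the fetching function as an argument, so you could pass in a wrapper like `get_trips`; a fake fetcher stands in here so the example runs without the server.

```python
def fetch_all_trips(fetch_page, start, end, page_size=100, max_pages=50):
    """Collect trips page by page using the limit and offset parameters."""
    trips = []
    for page in range(max_pages):
        result = fetch_page({
            "start": start,
            "end": end,
            "limit": page_size,
            "offset": page * page_size,  # assumed: skip this many results
        })
        batch = result.get("response", [])
        trips.extend(batch)
        if len(batch) < page_size:  # a short page means nothing is left
            break
    return trips

# Stand-in for the real API so the sketch runs offline: serves 5 fake trips
def fake_fetch(query):
    data = [{"entry_idx": i} for i in range(5)]
    page = data[query["offset"]:query["offset"] + query["limit"]]
    return {"length": len(page), "response": page, "success": True}

all_trips = fetch_all_trips(fake_fetch, "9/10/2017 2:00 PM", "9/10/2017 3:00 PM", page_size=2)
print(len(all_trips))
```

The `max_pages` cap is there so a misunderstanding about `offset` cannot turn into an endless loop of requests.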

In [57]:
# Example usage of get_trips - no need to specify the team or GET request anymore!
get_trips({
        "start": "9/10/2017 3:45 AM",
        "end": "9/10/2017 3:46 AM",
        "limit": 5
    })

{'length': 4,
 'response': [{'company': 'Bauer Taxi Service',
   'dropoff_centroid_latitude': '41.980264315',
   'dropoff_centroid_longitude': '-87.913624596',
   'dropoff_community_area': '76',
   'entry_idx': 0,
   'extras': '0',
   'fare': '43.5',
   'pickup_centroid_latitude': '41.980264315',
   'pickup_centroid_longitude': '-87.913624596',
   'pickup_community_area': '76',
   'taxi_id': 'a3bc50ed4e7b5de8e74d56919ae6e14b91e33c63c6c7d9632165c99dd1781507f170419466c56588531ed4788c84cabc34a62efd4834750e7ae2753c3667fc1d',
   'tips': '0',
   'tolls': '0',
   'trip_end_timestamp': '2017-09-09T23:00:00',
   'trip_id': '007bcf68e2b126700ca88588bc4531e86ddef63f',
   'trip_miles': '17.6',
   'trip_seconds': '1380',
   'trip_start_timestamp': '2017-09-09T22:45:00',
   'trip_total': '43.5'},
  {'company': 'Bauer Taxi Service',
   'dropoff_centroid_latitude': '41.892658108',
   'dropoff_centroid_longitude': '-87.652534484',
   'dropoff_community_area': '24',
   'entry_idx': 1,
   'extras': '1',


#### Sending SQL queries

The GET request for /trips can accept SQL queries when they are passed as strings.

<strong>Parameters:</strong>
<ul>
    <li>team</li>
    <li>where : Indicates the SQL WHERE clause which defines the query.</li>
    <li>limit</li>
</ul>

In [59]:
get_trips({
        "where": "(trip_start_timestamp BETWEEN {9/10/2017 2:00 PM} AND {9/10/2017 3:00 PM}) AND (trip_total BETWEEN 10 AND 20)",
        "limit": 1
})

{'length': 1,
 'response': [{'company': 'Bauer Taxi Service',
   'dropoff_centroid_latitude': '41.899602111',
   'dropoff_centroid_longitude': '-87.633308037',
   'dropoff_community_area': '8',
   'entry_idx': 0,
   'extras': '0',
   'fare': '13.25',
   'pickup_centroid_latitude': '41.878865584',
   'pickup_centroid_longitude': '-87.625192142',
   'pickup_community_area': '32',
   'taxi_id': '1095c5432cb9cfd88296fb166e38d560936d4690801dddfae2bbae88a2d295542548a58e8e2394de8c67c82610261281b4303ad7f0b85236f9e7c4f46badf414',
   'tips': '0',
   'tolls': '0',
   'trip_end_timestamp': '2017-09-10T09:15:00',
   'trip_id': '00197d0ae9d887315d40fb4a5a7405a062137c7d',
   'trip_miles': '3.9',
   'trip_seconds': '960',
   'trip_start_timestamp': '2017-09-10T09:00:00',
   'trip_total': '13.25'}],
 'success': True}

NOTE: Times are rounded to the nearest quarter hour. If you ask for the rides between 3:00PM and 3:15PM, you also get back rides that actually started between 3:15PM and about 3:23PM, because those start times round down to 3:15PM.
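To see why a ride at 3:22PM lands in the 3:15PM bucket, here is a small sketch of rounding a time to the nearest quarter hour. It assumes nearest-quarter rounding, which is inferred from the note above; `round_to_quarter_hour` is a hypothetical helper, not something the API provides.

```python
from datetime import datetime, timedelta

def round_to_quarter_hour(moment):
    """Round a datetime to the nearest 15-minute mark."""
    quarter = 15 * 60  # seconds in a quarter hour
    seconds_into_hour = moment.minute * 60 + moment.second
    # Add half a quarter, then floor to a multiple of 15 minutes
    rounded = (seconds_into_hour + quarter // 2) // quarter * quarter
    top_of_hour = moment.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(seconds=rounded)

print(round_to_quarter_hour(datetime(2017, 9, 10, 15, 22)))  # 2017-09-10 15:15:00
print(round_to_quarter_hour(datetime(2017, 9, 10, 15, 23)))  # 2017-09-10 15:30:00
```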

You might be asking... well, how do we parse this data? We'll take a look at that in the data analysis section.

### /count endpoint

In [43]:
def get_count(query):
    # Copy the query so the caller's dictionary is not modified, then add the team
    params = dict(query, team=TEAM)
    response = requests.get(URL + "/count/", params=params)
    return response.json()

#### /count example : What if our simulation lands on New Year's?

In [55]:
# New Year's
new_years_count = get_count({
        "start": "12/31/2016 12:00 AM",
        "end": "01/01/2017 05:00 AM"
    })
print("# of rides in 29-hour span of New Years Eve -> Day: {}".format(new_years_count["count"]))

# of rides in 29-hour span of New Years Eve -> Day: 74162
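A single count over 29 hours hides when the demand actually peaks. One way to dig deeper is to split the span into one-hour windows and ask /count for each. The sketch below only builds the windows; `hourly_windows` is a hypothetical helper, and the time format is assumed to be the same one used in the queries above.

```python
from datetime import datetime, timedelta

# Assumed format: month/day/year, 12-hour clock, then AM or PM
API_TIME_FORMAT = "%m/%d/%Y %I:%M %p"

def hourly_windows(start, hours):
    """Return (start, end) string pairs for consecutive one-hour windows."""
    windows = []
    for i in range(hours):
        window_start = start + timedelta(hours=i)
        window_end = window_start + timedelta(hours=1)
        windows.append((
            window_start.strftime(API_TIME_FORMAT),
            window_end.strftime(API_TIME_FORMAT),
        ))
    return windows

# Three windows straddling midnight on New Year's Eve
windows = hourly_windows(datetime(2016, 12, 31, 22, 0), 3)
for window_start, window_end in windows:
    print(window_start, "->", window_end)
```

Each pair could then be passed to a count wrapper, for example `get_count({"start": window_start, "end": window_end})`, to build up an hour-by-hour picture.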


## Learning Objective: Data Preprocessing and Analysis

Before applying any analysis to your data, it's always important to preprocess it. This means looking for missing values, identifying categorical variables, and accounting for outlier (Y) or leverage (X) points. We'll go through the process of getting the data for a specific simulation, processing it with the stack we're familiar with (scipy), and beginning some exploratory data analysis on it.
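As a first taste of those checks, here is a minimal sketch using only the standard library. The records are made up for illustration (they are not real API output), and the 1.5 x IQR rule is just one common convention for flagging outliers, swapped in here as an example rather than prescribed by this notebook.

```python
import statistics

# Hypothetical records shaped like /trips output (values arrive as strings)
raw_trips = [
    {"trip_id": "a", "fare": "4.75"},
    {"trip_id": "b", "fare": "5.0"},
    {"trip_id": "c", "fare": "6.25"},
    {"trip_id": "d", "fare": "7.0"},
    {"trip_id": "e", "fare": "8.5"},
    {"trip_id": "f", "fare": "13.25"},
    {"trip_id": "g", "fare": "250.0"},
    {"trip_id": "h", "fare": ""},  # assumed shape of a missing value
]

# Step 1: find records with a missing fare
missing = [t["trip_id"] for t in raw_trips if t["fare"] in ("", None)]

# Step 2: convert the remaining fares to numbers
fares = [float(t["fare"]) for t in raw_trips if t["fare"] not in ("", None)]

# Step 3: flag fares outside 1.5 * IQR of the quartiles
q1, _, q3 = statistics.quantiles(fares, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [fare for fare in fares if fare < low or fare > high]

print("missing:", missing)
print("outliers:", outliers)
```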