# Rose Hack 2019 Acceptance Processor

This notebook processes the registration data and selects applicants to accept in three passes:

1. Accept UCR folks
2. Accept folks coming from Riverside
3. Accept folks coming from locations based on their distance

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('./registration-12-23-18-20-48.csv')

In [3]:
columns_to_keep = [
    "What's your first name?",
    "Hi {{answer_97549884}}, what's your <strong>last name</strong>?",
    "What's your <strong>student email (.edu)</strong>?",
    "What's your phone number?",
    "What are your preferred pronouns?",
    "Birthdate",
    "Current Education",
    "School Name",
    "School Name (Please use the full spelling)",
    "High School Name",
    "What's the name of your University?",
    "Vegan",
    "Vegetarian",
    "Gluten-Free",
    "Other.4",
    "Shirt Size (Unisex sizing)",
    "Other.5",
    "What would you like to build/learn at Rose Hack?",
    "Where will you be traveling from? (City, State)",
]

In [4]:
clean_data = data[columns_to_keep]

In [5]:
clean_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 487 entries, 0 to 486
Data columns (total 19 columns):
What's your first name?                                            487 non-null object
Hi {{answer_97549884}}, what's your <strong>last name</strong>?    487 non-null object
What's your <strong>student email (.edu)</strong>?                 487 non-null object
What's your phone number?                                          429 non-null object
What are your preferred pronouns?                                  481 non-null object
Birthdate                                                          487 non-null object
Current Education                                                  487 non-null object
School Name                                                        392 non-null object
School Name (Please use the full spelling)                         56 non-null object
High School Name                                                   29 non-null object
What's the name of your Uni

# Accepting People

1. UCR folks
2. Folks traveling from Riverside
3. Folks sorted by distance

In [6]:
accepted_folks = pd.DataFrame()

We accept folks who go to a university whose name contains 'iverside'. This assumption is meant to catch folks who go to UCR or RCC.

In [7]:
ucr_university = clean_data[clean_data['School Name'].str.contains('iverside', na=False)]

In [8]:
ucr_university_2 = clean_data[clean_data['School Name (Please use the full spelling)'].str.contains('iverside', na=False)]

In [9]:
accepted_folks = pd.concat([ucr_university, ucr_university_2])
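As a minimal sketch of the matching above, on made-up school names (the `toy` frame is an assumption for illustration, not registration data): leaving off the first letter lets `'iverside'` match both "Riverside" and "riverside", and `na=False` treats a missing name as a non-match. Combining the two masks with `|` is one way to select each row only once, even if someone filled in both school columns.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'School Name': ['University of California, Riverside', None, 'Cal Poly Pomona', None],
    'School Name (Please use the full spelling)': [None, 'UC riverside', None, None],
})

in_first = toy['School Name'].str.contains('iverside', na=False)
in_second = toy['School Name (Please use the full spelling)'].str.contains('iverside', na=False)

# A row is kept if either column mentions Riverside; each row appears at most once.
riverside_rows = toy[in_first | in_second]
print(riverside_rows.index.tolist())
```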

Let's validate our assumption by listing the distinct school names we matched.

In [10]:
accepted_folks['School Name'].unique()

array(['University of California, Riverside', 'Riverside City College',
       nan], dtype=object)

In [11]:
accepted_folks['School Name (Please use the full spelling)'].unique()

array([nan, 'University of California, Riverside',
       'University of California Riverside', 'UC- Riverside',
       'UC Riverside', 'UC Riverside ',
       'University Of California, Riverside',
       'University Of California Riverside'], dtype=object)

We also want to accept folks who are coming from Riverside.

Let's get a list of folks, excluding our already accepted folks. We get this list by doing an outer join with an indicator column and keeping only the rows that appear in the full list but not in the accepted list.

In [13]:
diff = accepted_folks.merge(clean_data, indicator=True, how='outer')

In [19]:
diff = diff[diff['_merge'] == 'right_only'].drop('_merge', axis=1)
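The two cells above amount to an anti-join. A small sketch on made-up data (the names and cities are assumptions for illustration) shows what the `_merge` indicator does:

```python
import pandas as pd

# Made-up frames for illustration only.
everyone = pd.DataFrame({
    'name': ['Ada', 'Grace', 'Linus'],
    'city': ['Riverside', 'Irvine', 'Pomona'],
})
accepted = everyone[everyone['city'] == 'Riverside']

# With no `on=` argument, merge joins on every column the two frames share.
merged = accepted.merge(everyone, indicator=True, how='outer')

# 'right_only' marks rows found in `everyone` but not in `accepted`.
not_accepted = merged[merged['_merge'] == 'right_only'].drop('_merge', axis=1)
print(sorted(not_accepted['name']))
```

Because the join compares every shared column, two registrations that are identical in every column cannot be told apart by this approach.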

In [20]:
riveride_folks = diff[diff['Where will you be traveling from? (City, State)'].str.contains('iverside', na=False)]

In [21]:
print(riveride_folks.shape)
print(accepted_folks.shape)

(7, 19)
(146, 19)


In [23]:
accepted_folks = pd.concat([accepted_folks, riveride_folks], sort=True)

## Calculate their travel distance using a geocoder

In [25]:
import requests

In [27]:
API_KEY = ''
GEOCODER_API = 'http://www.mapquestapi.com/geocoding/v1/address'

In [28]:
def geocoder(location):
    """Look up a location string and return its (latitude, longitude)."""
    # Passing the query as params lets requests URL-encode the location for us
    params = {'key': API_KEY, 'location': location}
    response = requests.get(GEOCODER_API, params=params).json()
    lat_lng = response['results'][0]['locations'][0]['latLng']

    return lat_lng['lat'], lat_lng['lng']
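`geocoder` assumes every lookup returns at least one location. If a response came back without one, the indexing would raise an error partway through the run. One possible guard is sketched below; `extract_lat_lng` is a hypothetical helper, and the payloads are hand-written to mirror only the fields `geocoder` reads, not real API output.

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a decoded geocoding response, or None if absent."""
    try:
        lat_lng = payload['results'][0]['locations'][0]['latLng']
        return lat_lng['lat'], lat_lng['lng']
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written payloads for illustration only.
found = {'results': [{'locations': [{'latLng': {'lat': 33.97, 'lng': -117.33}}]}]}
empty = {'results': [{'locations': []}]}

print(extract_lat_lng(found))  # (33.97, -117.33)
print(extract_lat_lng(empty))  # None
```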

We now want a list of folks who have not been accepted yet

In [29]:
everyone_else = accepted_folks.merge(clean_data, indicator=True, how='outer')

In [31]:
everyone_else = everyone_else[everyone_else['_merge'] == 'right_only'].drop('_merge', axis=1)

In [33]:
# check this dict to see if we have the lat long before doing another lookup
# We want to check if we have the info to avoid making an unneeded request
location_cache = {}

In [34]:
lat_lngs = []

In [35]:
num_lookups = 0

for index, row in everyone_else.iterrows():
    location = row['Where will you be traveling from? (City, State)'].replace(' ', '')
    
    if location not in location_cache:
        lat, lng = geocoder(location)
        num_lookups += 1
        
        location_cache[location] = (lat, lng)
    
    lat_lngs.append(location_cache[location])

In [36]:
everyone_else['lat-lng'] = lat_lngs
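The cache-then-lookup loop above can also be packaged as a small wrapper. This is only a sketch: `make_cached` and `fake_geocoder` are hypothetical names, and the fake stands in for the network call so the example runs offline.

```python
def make_cached(lookup):
    """Wrap `lookup` so each distinct key is looked up only once."""
    cache = {}

    def cached(key):
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return cached


calls = []


def fake_geocoder(location):
    # Stand-in for a network call: records the call and returns fixed coordinates.
    calls.append(location)
    return (0.0, 0.0)


cached_geocoder = make_cached(fake_geocoder)
for place in ['Riverside,CA', 'Irvine,CA', 'Riverside,CA']:
    cached_geocoder(place)

print(len(calls))  # 2
```

The standard library's `functools.lru_cache` gives similar behavior as a decorator; the explicit dictionary is kept here because it mirrors `location_cache`.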

Next, we calculate the distance from Riverside to each person's location.

In [37]:
UCR_LAT_LONG = (33.973625, -117.328773)

In [38]:
distances = []

In [44]:
from math import radians, cos, sin, asin, sqrt

def haversine(person, location):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    
    Source: https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
    """
    lat1 = person[0]
    lon1 = person[1]
    
    lat2 = location[0]
    lon2 = location[1]
    
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 3956  # Radius of the earth in miles. Use 6371 for kilometers
    return c * r

In [51]:
for index, person in everyone_else.iterrows():
    cur_dist = haversine(person['lat-lng'], UCR_LAT_LONG)
    distances.append(cur_dist)
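For larger tables, the row-by-row loop could be swapped for a vectorized calculation. The sketch below is an alternative, not the notebook's method: it assumes latitude and longitude live in separate `lat` and `lng` columns (the notebook stores them as tuples in `lat-lng`), and `haversine_miles` is a hypothetical name.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3956  # same radius the notebook's haversine uses


def haversine_miles(lat, lng, origin):
    """Great-circle distance in miles from each (lat, lng) to `origin`."""
    lat1, lng1 = np.radians(lat), np.radians(lng)
    lat2, lng2 = np.radians(origin[0]), np.radians(origin[1])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


UCR = (33.973625, -117.328773)

# Made-up coordinates for illustration: UCR itself and a point one degree north.
toy = pd.DataFrame({
    'lat': [33.973625, 34.973625],
    'lng': [-117.328773, -117.328773],
})
toy['distance'] = haversine_miles(toy['lat'], toy['lng'], UCR)
print(toy['distance'].round(1).tolist())
```

One degree of latitude is roughly 69 miles, which makes the second row a quick sanity check on the formula and the units.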

In [53]:
everyone_else['distance'] = distances
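With a `distance` column in place, sorting puts the nearest applicants first. The sketch below uses made-up distances, and `MAX_MILES` is a hypothetical cutoff chosen for illustration, not a rule taken from this notebook.

```python
import pandas as pd

# Made-up distances in miles, for illustration only.
toy = pd.DataFrame({
    'name': ['Ada', 'Grace', 'Linus'],
    'distance': [412.0, 38.5, 95.2],
})
MAX_MILES = 100  # hypothetical cutoff

nearest_first = toy.sort_values('distance')
within_cutoff = nearest_first[nearest_first['distance'] <= MAX_MILES]
print(within_cutoff['name'].tolist())
```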

In [55]:
accepted_folks.to_csv('local-folk.csv')

In [56]:
everyone_else.to_csv('non-local-folks.csv')