# LA Metro Lines Characterization

## Introduction

Driving in Los Angeles (LA) traffic can be a daunting task for tourists and for people staying in LA only a short while. Not only does LA have the worst traffic in the US (according to CBS News), it also has confusing road layouts and road surfaces in dire disrepair. With LA being one of the host cities of the 2026 World Cup and the host city of the 2028 Olympics, the LA Metro will clearly play a major role in transporting tourists.

Therefore, a characterization of the metro lines should be of great interest to tourists and local businesses alike. In this study, we characterize the metro lines by the venues surrounding their stations. For tourists, this study gives an idea of what venues to expect along each line. For businesses, it presents an opportunity to find a niche along the lines and to avoid areas where competition is already saturated.

For this study, we focus on the metro lines that pass through the city of LA, namely the Red, Gold and Expo lines.

## Data

To characterize the metro lines described above, we use Foursquare to collect a list of venues and their categories for all metro stations along the Red, Gold and Expo lines.

To do so, we first define a function that loops through all metro stations and collects venue data from Foursquare.

In [1]:
# import libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe


In [4]:
# Set up Foursquare credentials
CLIENT_ID = '0MYX4OMSFLDOA4MZDOUSIN0ERPQ0FYBIDPU033WRALUJIV03' # your Foursquare ID
CLIENT_SECRET = 'IP2HEOA1OBXYODOSDAEYHB0UW1DLKY4AS3SYFQVXK51HSNIC' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0MYX4OMSFLDOA4MZDOUSIN0ERPQ0FYBIDPU033WRALUJIV03
CLIENT_SECRET:IP2HEOA1OBXYODOSDAEYHB0UW1DLKY4AS3SYFQVXK51HSNIC
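
Hard-coding and printing API credentials in a notebook makes them easy to leak once the notebook is shared. A minimal sketch of one alternative is shown below, assuming the credentials are stored in environment variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`. These variable names and both helper functions are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With helpers like these, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` could stand in for the string literals, and `print(mask(CLIENT_SECRET))` would confirm that a value was loaded without revealing it.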


In [5]:
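# define function to get venues

def getNearbyVenues(names, latitudes, longitudes, radius=400):
    """Query Foursquare for venues within `radius` meters of each station and return them as one data frame."""

    venues_list=[]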
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,            
            lat, 
            lng,
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station', 
                  'Station Latitude', 
                  'Station Longitude', 
                  'Venue ID',           
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
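
`getNearbyVenues` assumes that every venue has at least one category; `v['venue']['categories'][0]` raises an `IndexError` otherwise. The sketch below shows one way to split out the parsing step and make it tolerant of missing categories. The helper name `parse_explore_items` and the `'Uncategorized'` placeholder are assumptions made for illustration, not part of the original notebook. The response layout is the one already used in the function above, and the sample payload is hand-written rather than real Foursquare data.

```python
def parse_explore_items(name, lat, lng, payload, default_category="Uncategorized"):
    """Flatten one 'explore' response into rows matching getNearbyVenues' columns."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of raising IndexError.
        category = categories[0]["name"] if categories else default_category
        rows.append((
            name, lat, lng,
            venue["id"], venue["name"],
            venue["location"]["lat"], venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload for demonstration only (not real Foursquare data).
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "a1", "name": "Cafe", "location": {"lat": 34.0, "lng": -118.2},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"id": "b2", "name": "Kiosk", "location": {"lat": 34.1, "lng": -118.3},
               "categories": []}},
]}]}}
rows = parse_explore_items("Demo Station", 34.0, -118.2, sample)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the row layout without network access.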

With the `getNearbyVenues` function defined, we import the coordinates of all metro stations so that `getNearbyVenues` can loop through the stations and collect venue data from Foursquare.

In [6]:
#import Metro Station data
la_metro_stations = pd.read_csv('red_gold_expo_stns.csv')
la_metro_stations.head()

| | StationName | Line | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Atlantic Station | Gold | 34.0334 | -118.154 |
| 1 | East LA Civic Center Station | Gold | 34.0332 | -118.1614 |
| 2 | Maravilla Station | Gold | 34.0331 | -118.1684 |
| 3 | Indiana Station | Gold | 34.0343 | -118.1922 |
| 4 | Soto Station | Gold | 34.044 | -118.2106 |
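
A quick programmatic check can complement a visual inspection of the table above. The sketch below assumes the column names shown in that table and uses a rough bounding box around the greater Los Angeles area. The box limits and the `check_stations` name are assumptions of this sketch, not part of the original notebook.

```python
import pandas as pd


def check_stations(stations):
    """Return a list of problems found in a station table (empty list if none)."""
    required = ["StationName", "Line", "Latitude", "Longitude"]
    missing = [c for c in required if c not in stations.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if stations[required].isna().any().any():
        problems.append("null values present")
    # Rough bounding box for the LA area; an assumption, not an official extent.
    in_box = (stations["Latitude"].between(33.5, 34.5)
              & stations["Longitude"].between(-119.0, -117.5))
    if not in_box.all():
        problems.append("coordinates outside expected area")
    return problems


# First two rows from the output above, used as a small demonstration.
demo = pd.DataFrame({
    "StationName": ["Atlantic Station", "East LA Civic Center Station"],
    "Line": ["Gold", "Gold"],
    "Latitude": [34.0334, 34.0332],
    "Longitude": [-118.154, -118.1614],
})
problems = check_stations(demo)
```

An empty list means none of the checks fired; calling `check_stations(la_metro_stations)` would apply the same checks to the full table.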


Using the `read_csv` function, we imported the metro station data into a data frame. The data frame contains each station's name, the line it belongs to, its latitude and its longitude. The data appear to have been imported correctly. Next, we use `getNearbyVenues` to collect all venues in the proximity of the metro stations (within a 400-meter radius).

In [7]:
#get venues

metro_stn_venues = getNearbyVenues(names=la_metro_stations['StationName'],
                                   latitudes=la_metro_stations['Latitude'],
                                   longitudes=la_metro_stations['Longitude']
                                  )

print('Complete!')

Atlantic Station
East LA Civic Center Station
Maravilla Station
Indiana Station
Soto Station
Mariachi Plaza Station
Pico/Aliso Station
Little Tokyo/Arts District Station
Union Station Station
Chinatown Station
Lincoln/Cypress Station
Heritage Square Station
Southwest Museum Station
Highland Park Station
South Pasadena Station
Fillmore Station
Del Mar Station
Memorial Park Station
Lake Station
Allen Station
Sierra Madre Villa Station
Arcadia Station
Monrovia Station
Duarte/City of Hope Station
Irwindale Station
Azusa Downtown Station
APU/Citrus College Station
Civic Center/Grand Park Station
Pershing Square Station
7th Street/Metro Center Station
Westlake/MacArthur Park Station
Wilshire/Vermont Station
Vermont/Beverly Station
Vermont/Santa Monica Station
Vermont/Sunset Station
Hollywood/Western Station
Hollywood/Vine Station
Hollywood/Highland Station
Universal City/Studio City Station
North Hollywood Station
Pico Station
LATTC/Ortho Institute Station
Jefferson/USC Station
Expo Park/USC

With the queries completed, let's inspect the data frame to see the data inside.

In [8]:
metro_stn_venues.head()

| | Station | Station Latitude | Station Longitude | Venue ID | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Atlantic Station | 34.0334 | -118.154 | 5397b842498ea56da1541a95 | Tacos Ensenada | 34.03356 | -118.153599 | Mexican Restaurant |
| 1 | Atlantic Station | 34.0334 | -118.154 | 4c37f8a83849c92844cebeb1 | Bob's Freeze | 34.032557 | -118.154414 | Ice Cream Shop |
| 2 | Atlantic Station | 34.0334 | -118.154 | 4ab55445f964a520fc7320e3 | Los Molcajetes | 34.03297 | -118.155352 | Latin American Restaurant |
| 3 | Atlantic Station | 34.0334 | -118.154 | 4b64c062f964a52048cd2ae3 | Fish Taco Express | 34.032529 | -118.15453 | Taco Place |
| 4 | Atlantic Station | 34.0334 | -118.154 | 4b83365ff964a520a1fd30e3 | SUBWAY | 34.03253 | -118.153702 | Sandwich Place |


As shown above, the data set includes the ID, name, coordinates and category of each venue found, along with the name and coordinates of the station it is close to. With this geographic data and these labels, the data set shows what surrounds each metro station and therefore lets us characterize each station and metro line.
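
To move from stations to lines, each venue row needs the line of its station. One possible sketch is given below, assuming the `StationName` and `Line` columns of the station table and the `Station` and `Venue Category` columns of the venue table. The two small data frames are hand-made stand-ins with hypothetical station names so that the snippet runs on its own; they are not real results.

```python
import pandas as pd

# Hypothetical stand-ins for la_metro_stations and metro_stn_venues.
stations = pd.DataFrame({
    "StationName": ["Station A", "Station B"],
    "Line": ["Gold", "Red"],
})
venues = pd.DataFrame({
    "Station": ["Station A", "Station A", "Station B"],
    "Venue Category": ["Taco Place", "Taco Place", "Hotel"],
})

# Attach the line label to every venue row via its station name.
venues_with_line = venues.merge(
    stations[["StationName", "Line"]],
    left_on="Station", right_on="StationName", how="left",
).drop(columns="StationName")

# Count venue categories per line.
category_counts = (
    venues_with_line.groupby(["Line", "Venue Category"])
    .size()
    .rename("Count")
    .reset_index()
)
```

The left join keeps every venue row even if a station name fails to match, so unmatched rows would show up with a missing `Line` instead of silently disappearing.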

Let's check to see how many venues are included in the data. 

In [10]:
metro_stn_venues.shape

(1379, 8)

With 1,379 venue records found in the vicinity of the metro stations of interest, we have the data to form the backbone of our analysis.
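
Because `getNearbyVenues` queries each station separately, a venue within 400 meters of two stations appears in two rows, so the row count is an upper bound on the number of distinct venues. The sketch below shows how distinct venues and per-station counts could be checked. It uses a hand-made stand-in for `metro_stn_venues` with hypothetical values, not real results.

```python
import pandas as pd

# Hypothetical stand-in for metro_stn_venues.
demo_venues = pd.DataFrame({
    "Station": ["Station A", "Station A", "Station B"],
    "Venue ID": ["v1", "v2", "v2"],  # v2 lies near both stations
})

n_rows = len(demo_venues)                             # station-venue pairs
n_unique = demo_venues["Venue ID"].nunique()          # distinct venues
per_station = demo_venues.groupby("Station").size()   # venues found per station
```

Applied to the real data frame, the same three lines would show how much overlap exists between neighboring stations and whether any station returned few or no venues.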