# LA Metro Bus EDA and Visualization
We have a cleaned dataset (gathered with the LA Metro API), so let's analyze it.

## Introduction

In [2]:
import glob
import io
import os

import folium
import matplotlib
import pandas as pd
from folium.plugins import HeatMap
from matplotlib import pyplot as plt
from natsort import natsorted, ns
from PIL import Image, ImageDraw, ImageFont
%matplotlib inline

In [3]:
bus_df = pd.read_csv("../Bus Data/[Cleaned] LA Metro Bus Data.csv")
bus_df

FileNotFoundError: [Errno 2] No such file or directory: '../Bus Data/[Cleaned] LA Metro Bus Data.csv'

Here is our dataset. Below, I will check which day has the most data.

In [None]:
print(bus_df["day"].value_counts())
bus_df["day"].value_counts(normalize = True).plot.bar()

Above, I can see that the day with the most data available was Friday. So, let's isolate that day's data and analyze it.

In [None]:
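# Note: bus_friday_df is an alias of bus_df (not a copy), so the in-place
# drop below also removes the non-Friday rows from bus_df.
bus_friday_df = bus_df
# Index labels of every row that was not recorded on a Friday
indexFriday = bus_friday_df[bus_friday_df["day"] != "Friday"].index
bus_friday_df.drop(indexFriday, inplace = True)
bus_friday_df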

Above is the data that was collected on Friday. The data contains 127,737 rows, so there is still a significant amount of data that we can look at.
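
The same kind of subset can also be built with a boolean mask instead of an in-place drop. The sketch below uses a small made-up DataFrame (the rows and the names `toy_df` and `friday_df` are illustrative, not taken from the dataset) and calls `.copy()` so that the original frame is left untouched.

```python
import pandas as pd

# Hypothetical stand-in for bus_df; only the "day" column matters here.
toy_df = pd.DataFrame({
    "day": ["Friday", "Monday", "Friday", "Sunday"],
    "route": [51, 16, 720, 51],
})

# Keep only the Friday rows; .copy() makes the subset independent of toy_df.
friday_df = toy_df[toy_df["day"] == "Friday"].copy()

print(len(toy_df), len(friday_df))
```

With this approach the full dataset stays available for later comparisons across days.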

## EDA

In [None]:
bus_friday_df["direction"].unique()

Above are the unique directions that a bus can go.

In [None]:
print(bus_friday_df["direction"].value_counts())
bus_friday_df["direction"].value_counts(normalize = True).plot.bar()

Above is a graph that shows the share of data for each direction. As we can see, the four main directions (South, North, East, West) have roughly equal amounts of data. Clockwis, Xcounter, and none have significantly lower counts.

In [None]:
bus_friday_df["last report"].describe()

The last report column in the dataset represents how long ago a specific bus last updated its information (the location being the most important) to the API. From the above, we can see that the average time since the last report was 63 seconds, and 75% of the records were at most 93 seconds old.
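
As a quick check on how to read `describe()`, here is a minimal sketch on made-up report ages (the values and the name `ages` are hypothetical). The `75%` entry is a value that at least three quarters of the entries do not exceed.

```python
import pandas as pd

# Hypothetical "last report" ages in seconds (not taken from the dataset).
ages = pd.Series([10, 20, 30, 40, 100])

summary = ages.describe()
p75 = summary["75%"]

# Share of entries that are no older than the 75th percentile.
share_at_or_below = (ages <= p75).mean()

print(summary["mean"], p75, share_at_or_below)
```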

Below, I will count the number of times each bus route occurred and plot the counts on a bar graph. This will allow us to see all routes from most to least frequent. I am choosing to show the data this way so that each individual route can be seen with its corresponding frequency.

In [None]:
print(bus_friday_df["route"].value_counts())
print(bus_friday_df["route"].value_counts(normalize = True))
plt.figure(figsize=(75, 50))
plt.xticks(size = 20)
plt.yticks(size = 50)
bus_friday_df["route"].value_counts(normalize = True).plot.bar()

Here is our data visualized above. I had expected that the most frequent lines would be 901 (Orange - Chatsworth to NoHo) and 910 (Silver - El Monte to Downtown to San Pedro). This is because these are BRT (bus rapid transit) lines with at least some dedicated right-of-way (ROW), so I believed that LA Metro would justify this with extremely high frequencies. Instead, our most frequent lines are 51 (Compton to Wilshire/Vermont) and 16 (Downtown LA to Century City), with 720 (Wilshire/Vermont to Santa Monica) more in line with other bus frequencies.


It's also important to note that there is a Purple Line extension that will connect Wilshire/Vermont Station all the way to the VA. Thus, the 720 (and the 16, which runs parallel to it) will likely become obsolete, since they follow this route, which is likely what LA Metro wants. Although the popularity of the 16 and 720 makes sense, I have no idea why the 51 is so popular, considering that Compton and Wilshire/Vermont are already connected by rail, but I suppose it's the people living along the route who are using it.

In [None]:
print(bus_friday_df["hour"].value_counts())
plt.figure(figsize=(75, 50))
plt.xticks(size = 20)
plt.yticks(size = 20)
bus_friday_df["hour"].value_counts(normalize = True).plot.bar()

9 AM is the hour with the most records, which makes it the busiest time of the day.

## Visualization
### Part 1 - LA Metro System Animation

Here I've created a function that builds a folium map of Los Angeles.

In [None]:
def create_LA_map ():
    la_map = folium.Map(location=[34.0522,-118.2437],
                                       zoom_start=10,
                                       tiles='CartoDB dark_matter')
    return la_map

create_LA_map()

In [None]:
def pick_color(direction):
    if direction == 'North':
        color = 'blue'
    elif direction == 'East':
        color = 'yellow'
    elif direction == 'South':
        color = 'red'
    elif direction == 'West':
        color = 'green'
    elif direction == 'Clockwis':
        color = 'white'
    elif direction == 'Xcounter':
        color = 'purple'
    else:
        color = 'pink'
    
    return color
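
The same direction-to-color mapping can also be written as a lookup table. This is only an alternative sketch: `DIRECTION_COLORS` and `pick_color_from_table` are hypothetical names, and the colors are copied from `pick_color` above.

```python
# Same direction-to-color mapping as pick_color, written as a lookup table.
DIRECTION_COLORS = {
    "North": "blue",
    "East": "yellow",
    "South": "red",
    "West": "green",
    "Clockwis": "white",
    "Xcounter": "purple",
}

def pick_color_from_table(direction):
    # Any direction that is not listed falls back to pink, like the final else branch.
    return DIRECTION_COLORS.get(direction, "pink")

print(pick_color_from_table("South"), pick_color_from_table("none"))
```

A table like this keeps all the colors in one place, which makes it easier to change the palette later.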

In [None]:
def plot_bus(lat, lon, color, m):
    folium.CircleMarker(location = (lat, lon),
                        radius = 2,
                        fill = False,
                        weight = 1,
                        color = color
                       ).add_to(m)

In [None]:
def map_to_png(bus_temp_df, direction, day):
    # Output folder for this day/direction, e.g. '../Images/Bus Friday South'
    dirPath = '../Images/Bus ' + day + ' ' + direction
    print(dirPath)
    if not os.path.isdir(dirPath):
        print('The directory is not present. Creating a new one..')
        # makedirs also creates '../Images' if it does not exist yet
        os.makedirs(dirPath)
    else:
        print('The directory is present.')
    
    # True when every bus should be colored by its own direction
    all_directions = (direction == "All")
    bus_map = create_LA_map()
    curr_time = bus_temp_df['grouped time'].iloc[0]
    image_no = 1
    total_count = 0
    subset_count = 0

    for index, bus in bus_temp_df.iterrows():
        total_count += 1
        if all_directions == True:
            color = pick_color(bus["direction"])
        else:
            if bus["direction"] == direction:
                subset_count += 1
                color = pick_color(bus["direction"])
            else:
                color = 'gray'
        if curr_time == bus["grouped time"]:
            plot_bus(bus["latitude"], bus["longitude"], color, bus_map)

        else:
            img_data = bus_map._to_png()
            img = Image.open(io.BytesIO(img_data))
            draw = ImageDraw.ImageDraw(img)
            font = ImageFont.truetype("../Roboto/Roboto-Light.ttf", 30)
            draw.text((20,img.height - 50), 
                      str(curr_time),
                      fill=(255, 255, 255),
                      font = font)
            draw.text((img.width - 400,20),
                       "Total: " + str(total_count) + '\n' + "Subset: " + str(subset_count),
                       fill = (255, 255, 255),
                      font = font)
            name = dirPath + '/' + str(image_no) + '.png' #change day here
            img.save(name)
            image_no += 1
            #break
            #reset the map
            bus_map = create_LA_map()
            #change the time
            curr_time = bus["grouped time"]
            #plot
            total_count = 1
            subset_count = 0
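            # Color the first bus of the new frame with the same rules as above
            if all_directions:
                color = pick_color(bus["direction"])
            elif bus["direction"] == direction:
                subset_count += 1
                color = pick_color(bus["direction"])
            else:
                color = 'gray'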
            plot_bus(bus["latitude"], bus["longitude"], color, bus_map)
    #print(count)
    return bus_map
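
The loop in `map_to_png` starts a new frame whenever the `grouped time` value changes, and it only saves a frame at that moment, so the final time group is returned as a map but never written to a PNG. As a rough sketch of the same grouping idea, `itertools.groupby` yields every run of equal timestamps, including the last one. The records below are made up, and `frames` is a hypothetical name used only for illustration.

```python
from itertools import groupby

# Hypothetical records, already sorted by time like the rows of the dataset.
records = [
    {"grouped time": "09:00", "direction": "South"},
    {"grouped time": "09:00", "direction": "North"},
    {"grouped time": "09:03", "direction": "South"},
    {"grouped time": "09:06", "direction": "South"},
    {"grouped time": "09:06", "direction": "East"},
]

frames = []
for time_label, group in groupby(records, key=lambda r: r["grouped time"]):
    buses = list(group)
    subset = sum(1 for b in buses if b["direction"] == "South")
    # One entry per frame: (timestamp, total buses, buses in the chosen direction)
    frames.append((time_label, len(buses), subset))

print(frames)
```

Like the loop above, `groupby` only merges consecutive rows, so it relies on the data being ordered by time.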

In [None]:
direction = "South" #Change this value
day = "Friday"
bus_map = map_to_png(bus_friday_df, direction, day)
bus_map

We will now combine all of the images into a GIF.

In [None]:
# filepaths
fp_in = "../Images/Bus " + day + ' ' + direction + "/*.png"
fp_out = "../Animations/Bus " + day + ' ' + direction + " Animation.gif"

# https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif
img, *imgs = [Image.open(f) for f in natsorted(glob.glob(fp_in))]
img.save(fp=fp_out, format='GIF', append_images=imgs,
         save_all=True, duration=500, loop=0)
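
The frames are named 1.png, 2.png, and so on, which is why the file list is passed through `natsorted`: a plain string sort would put 10.png before 2.png and scramble the animation. The standard-library sketch below illustrates the idea only. It is not how natsort is implemented, and `natural_key` is a hypothetical helper.

```python
import re

def natural_key(name):
    # Split into digit and non-digit runs so that "10" compares as the number 10.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

names = ["10.png", "2.png", "1.png", "11.png"]

lexicographic = sorted(names)               # plain string order
natural = sorted(names, key=natural_key)    # numeric-aware order

print(lexicographic)
print(natural)
```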

In [None]:
# count = 1;
# for index, bus in bus_temp_df.iterrows():
#     color = pick_color_num(count)
#     bus_map = map_to_png(bus_temp_df, direction, day)
#     bus_map

In [None]:
# pop_routes = ["51", "16", "720", "18", "28"]
# bus_friday_df['route'] = bus_friday_df['route'].astype(str)

# top_5_index = bus_friday_df[bus_friday_df['route'] != pop_routes[0] | 
#                             bus_friday_df['route'] != pop_routes[1] |
#                             bus_friday_df['route'] != pop_routes[2] |
#                             bus_friday_df['route'] != pop_routes[3] |
#                             bus_friday_df['route'] != pop_routes[4]
#                            ].index
# len(top_5_index)

### Part 2 - Heatmaps

Although the animation is neat, the data was collected at long intervals (around 3 minutes), so the GIF was not very smooth. Let's try heat maps instead.

In [None]:
la_map = create_LA_map()
heat_data = [[row['latitude'],row['longitude']] for index, row in bus_friday_df.iterrows()]
HeatMap(heat_data).add_to(la_map)
la_map

There is so much information that it is mostly red. So, let's tone down the radius and look at specific hours.

In [None]:
def create_heat_map(complete_df, hour):
    df = complete_df.copy()
    la_map = folium.Map(location=[34.0522,-118.2437],
                                       zoom_start=10,
                                       tiles='CartoDB dark_matter')
    df.drop(df[df["hour"] != hour].index, inplace = True)
    heat_data = [[row['latitude'],row['longitude']] for index, row in df.iterrows()]
    HeatMap(heat_data, radius=15).add_to(la_map)
    return la_map
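
`create_heat_map` filters the rows by hour and then builds the `[latitude, longitude]` pairs one row at a time. A rough sketch of the same two steps without `iterrows()` is shown below, on a made-up frame (the values and the name `toy_df` are hypothetical).

```python
import pandas as pd

# Hypothetical rows with the three columns create_heat_map relies on.
toy_df = pd.DataFrame({
    "hour": [9, 9, 11],
    "latitude": [34.05, 34.10, 33.95],
    "longitude": [-118.24, -118.30, -118.20],
})

# Boolean mask for the hour, then convert the two coordinate columns to a list of pairs.
heat_data = toy_df.loc[toy_df["hour"] == 9, ["latitude", "longitude"]].values.tolist()

print(heat_data)
```

The resulting list of pairs has the same shape as the `heat_data` passed to `HeatMap` in the function above.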

In [None]:
create_heat_map(bus_friday_df, 9)

In [None]:
create_heat_map(bus_friday_df, 11)

In [None]:
create_heat_map(bus_friday_df, 15)

In [None]:
create_heat_map(bus_friday_df, 17)

### Part 3 - Time Lapse Heatmap

Here I will animate the heatmap for Friday (an ordinary day in LA).

In [None]:
def timed_heatmap(day):
    # Note: `day` is not used yet. The function reads the global bus_friday_df
    # and writes to the hard-coded 'Friday Heat Map' folder.
    # A plain list is used because DataFrame.append was removed in pandas 2.0.
    heat_data = []  # [latitude, longitude] pairs for the current time group
    image_no = 1
    total_count = 0

    curr_time = bus_friday_df['grouped time'].iloc[0]
    for index, bus in bus_friday_df.iterrows():
        if curr_time == bus['grouped time']:
            total_count += 1
            heat_data.append([bus['latitude'], bus['longitude']])
        else:
            # The time group changed, so render the finished group as one frame
            bus_map = create_LA_map()
            HeatMap(heat_data, radius=15).add_to(bus_map)
            img_data = bus_map._to_png()
            img = Image.open(io.BytesIO(img_data))
            draw = ImageDraw.ImageDraw(img)
            font = ImageFont.truetype("../Roboto/Roboto-Light.ttf", 30)
            # Timestamp in the bottom-left corner
            draw.text((20, img.height - 50),
                      str(curr_time),
                      fill=(255, 255, 255),
                      font=font)
            # Bus count in the top-right corner
            draw.text((img.width - 400, 20),
                      "Total: " + str(total_count),
                      fill=(255, 255, 255),
                      font=font)
            name = '../Images/Friday Heat Map/' + str(image_no) + '.png' #change day here
            img.save(name)
            image_no += 1
            # Start the next group with the current bus so that it is not dropped
            total_count = 1
            curr_time = bus["grouped time"]
            heat_data = [[bus['latitude'], bus['longitude']]]
    # Note: the final time group is not rendered, because frames are only
    # saved when the time changes.

In [None]:
timed_heatmap("Friday")

In [None]:
# filepaths
fp_in = "../Images/Friday Heat Map/*.png"
fp_out = "../Animations/Friday Heat Map.gif"

# https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif
img, *imgs = [Image.open(f) for f in natsorted(glob.glob(fp_in))]
img.save(fp=fp_out, format='GIF', append_images=imgs,
         save_all=True, duration=500, loop=0)

If you open the GIF in the Animations folder, you will see that Downtown is the most red. To be honest, I was surprised at how impacted DTLA was. This exemplifies DTLA's importance to the city of LA and LA County.