# Mapping LA's Public Toilets
## Kevin Liu, Paola Tirado Escareno, Carolyn Chu
## UP 229: Urban Data Science, Spring 2022

**Introduction**

In the early months of the COVID-19 shutdown, restaurants, libraries, and gas stations closed their bathroom facilities because it was not yet known how the virus spread. For much of the general public, finding a restroom while away from home was a novel problem born of the public health response. For unhoused people, however, this challenge predates the pandemic, and the responses by many, if not most, city governments have been inadequate. The City of Los Angeles initially set up [363 hand washing stations and 182 portable toilets](https://www.latimes.com/opinion/story/2021-07-25/editorial-los-angeles-homeless-bathrooms) across the city. Since then, the city has struggled with maintenance and has faced backlash when trying to remove them (see [here](https://laist.com/news/housing-homelessness/la-city-council-to-consider-reinstating-porta-potties-for-unhoused-angelenos) or [here](https://www.lataco.com/porta-potty-homeless/)). Still, access to clean, accessible restrooms remains an ongoing and dire issue for unhoused people in our communities. In this research project, we examine access to and the distribution of public restrooms by comparing their locations to the 2020 Homeless Count data from L.A. County.

Our research questions include, "What's the distribution of publicly available toilets in Los Angeles and how does this compare to the distribution of unhoused people in the county? Through spatial clustering, can we find patterns in this distribution? What kinds of correlation do we find between publicly available toilets and race/ethnicity of certain census tracts?" 

**Data Preparation**

Prior to this notebook, we compiled a dataset of publicly available toilets from a variety of data sources including LACAN, Los Angeles Recreation and Parks, and the L.A. and Los Angeles County Library systems. We also joined the Los Angeles County Homeless Count Data (2020) with 5-Year American Community Survey (ACS) population counts and census tract geometries (which we acquired from the census API and LA County Open Data Portal, respectively). We bring these datasets in at the beginning of this notebook.

In [None]:
import pandas as pd
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

### Import Data

In [None]:
#upload toilet data
toilets = gpd.read_file('alltoilets.geojson')
toilets.sample(10)

The toilets dataset combines the LAHSA-recorded handwashing stations, LACAN-recorded handwashing stations, park bathrooms, and libraries. We code the data points by type, which lets us later map the various types of hygiene stations across the city and county.

In [None]:
#upload homeless data count
counts = gpd.read_file('homelesscount_tracts.geojson').to_crs('EPSG:4326')
counts.head()

The homelesscount_tracts dataset is a joined GeoJSON containing both the [2020 Economic Roundtable Homeless Count by Census Tract data](https://economicrt.org/publication/los-angeles-county-homeless-count-data-library/) and the 2019 ACS 5-Year Estimate of total population in LA County by census tract. The 2020 Homeless Count data uses the older census tracts from the 2019 ACS 5-year as opposed to the new 2020 tracts, so we joined the two datasets on the census tract.
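As a rough illustration of that join, here is a minimal sketch using small made-up tables. The column names `tract`, `GEOID`, and `total_pop`, and the idea that `short_geoid` holds the 6-digit tract code, are assumptions for the example and not necessarily how the original files were structured.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables (values are made up).
homeless = pd.DataFrame({
    "tract": [101110, 101122, 206031],   # tract codes read in as integers
    "homeless_count": [12, 0, 87],
})
acs = pd.DataFrame({
    "GEOID": ["06037101110", "06037101122", "06037206031", "06037980001"],
    "total_pop": [4500, 3900, 2100, 0],
})

# An 11-digit tract GEOID is state (2) + county (3) + tract (6),
# so the last 6 characters give a key comparable across both tables.
acs["short_geoid"] = acs["GEOID"].str[-6:]
homeless["short_geoid"] = homeless["tract"].astype(str).str.zfill(6)

# Left join keeps every ACS tract; validate= raises if a key is duplicated.
merged = acs.merge(
    homeless[["short_geoid", "homeless_count"]],
    on="short_geoid",
    how="left",
    validate="one_to_one",
)

# Tracts with no matching count show up as NaN and are worth inspecting.
n_unmatched = int(merged["homeless_count"].isna().sum())
print(merged)
print("tracts without a homeless count:", n_unmatched)
```

Keeping the key as a zero-padded string avoids silent mismatches when one source stores tract codes as integers and the other as text.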

In [None]:
#upload race data
raceGdf = gpd.read_file('PTE Folder/race.geojson')
raceGdf.head()

The raceGdf dataset is 2019 ACS data containing multiple variables regarding race. Each ethnicity's percentage of the total population was calculated and put into new columns, and the result was then joined with the 2010 Census tracts for geometry data.
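The percentage calculation can be sketched as follows. The input column names and the definition of `Percent Non white` as 100 minus the non-Hispanic white share are assumptions made for illustration; the values are made up.

```python
import pandas as pd

# Hypothetical tract-level counts (column names are assumptions).
race = pd.DataFrame({
    "total_pop": [4000, 2500, 0],
    "white_alone_non_hispanic": [1000, 2000, 0],
})

# Mask tracts with no residents so the division yields NaN instead of inf.
total = race["total_pop"].where(race["total_pop"] > 0)

race["Percent white"] = race["white_alone_non_hispanic"] / total * 100
race["Percent Non white"] = 100 - race["Percent white"]
print(race)
```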

In [None]:
#upload income data

incomeGdf = gpd.read_file('PTE Folder/income.geojson')
incomeGdf.head()

The incomeGdf dataset contains the 2019 ACS estimates of median household income. Null values and 0's were dropped, and the remaining rows were then joined with the 2010 Census tracts for geometry data.
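A minimal sketch of that cleaning step, on made-up values, might look like the following. The column names are assumptions. The example also filters out negative values, since the Census API typically reports unavailable estimates with a large negative placeholder such as -666666666; whether the original file contained such rows is not something we can confirm here.

```python
import numpy as np
import pandas as pd

# Hypothetical income table (values are made up).
income = pd.DataFrame({
    "short_geoid": ["101110", "101122", "206031", "980001"],
    "median_income": [65000, -666666666, 0, np.nan],
})

# Keep only strictly positive incomes: this drops zeros, NaN
# (NaN > 0 evaluates to False), and negative placeholder values.
cleaned = income[income["median_income"] > 0].copy()
print(cleaned)
```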

### Demographics of LA County and Hygiene Station Availability

Let's take a look at our toilets dataset overlaid on some demographic data for Los Angeles County. First we'll take a look at race, specifically % BIPOC population.

In [None]:
#LA county wide map

f, ax = plt.subplots(figsize=(15,15))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

raceGdf.plot('Percent Non white', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical'}, 
            alpha=0.3, 
            label='Count')
ax.set_xlim([-118.721783, -118.08]) 
ax.set_ylim([33.68, 34.4])
ax.set_title('Percent Black, Indigenous, People of Color (BIPOC) with Sanitation Stations in LA County', fontsize=20)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()

It's no surprise that most of the City of LA is majority BIPOC, and the hygiene station data doesn't seem to have any meaningful relationship to % BIPOC population. While most hygiene station data points are in majority BIPOC tracts, some areas like the southern part of the Valley, Westwood, and Venice are majority white areas that still have a decent number of hygiene stations. It is interesting to see how much whiter the westside of LA is, and also the outskirts of the Valley.

Second, let's take a look at median household income data.

In [None]:
f, ax = plt.subplots(figsize=(15,15))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

incomeGdf.plot('median_income', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical'}, 
            alpha=0.3, 
            label='Count')
ax.set_xlim([-118.721783, -118.08]) 
ax.set_ylim([33.68, 34.4])
ax.set_title('Median Income with Sanitation Stations in LA County', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()

The immediate relationship that stands out is that some of the extremely wealthy communities have public hygiene stations available, like Rancho Palos Verdes, the tract immediately southwest of Griffith Park, all of the Santa Monica Mountains (Topanga, Bel Air, etc.), Redondo/Hermosa Beach, and the Rose Bowl area of Pasadena. There's also a noticeable gap in hygiene station availability for lower income tracts between West Athens and Long Beach.

### Spatial Join of Toilets and Point in Time Count

To get a sense of how many toilets are available to each houseless person within a given census tract, we spatially join toilets and homelesscount_tracts, count the number of toilets within each census tract, and divide the houseless population in the tract by the number of toilets.

In [None]:
joinedGdf = gpd.sjoin(counts, toilets, how='left', predicate = 'intersects')
joinedGdf.head()

In [None]:
#check whether lengths are ok - looks reasonable
print('counts length: {}'.format(len(counts)))
print('toilets length: {}'.format(len(toilets)))
print('joinedGdf length: {}'.format(len(joinedGdf)))


In [None]:
#right join to the toilets, so drops all census tracts w/o toilets
joinedGdf2 = gpd.sjoin(counts, toilets, how='right', predicate='intersects')

In [None]:
#check that it looks correct
#joinedGdf2.head()

In [None]:
#bring back the original joinedGdf
#set index in preparation of joining n_TOILETS
joinedGdf.set_index('short_geoid', inplace=True)

In [None]:
#count the number of toilets per census tract
#creates a series
n_toilets = joinedGdf2.groupby('short_geoid').size()

#convert the series into a df
n_TOILETS=pd.DataFrame(n_toilets)

In [None]:
#rename the column to something that makes more sense
n_TOILETS.columns = ['number_toilets']

In [None]:
#join
joinedGdf3=joinedGdf.join(n_TOILETS)

In [None]:
#fill all NaN's with zero
joinedGdf3['number_toilets'] = joinedGdf3['number_toilets'].fillna(0)

In [None]:
#joinedGdf3.head()

In [None]:
#calculate a ratio of # of people per public toilet/handwashing station
joinedGdf3['toilet_ratio'] = joinedGdf3['homeless_count']/joinedGdf3['number_toilets']
joinedGdf3.head()

In [None]:
# Replacing all inf values in the dataframe with zero

import numpy as np
joinedGdf3 = joinedGdf3.replace([np.inf, -np.inf], 0)
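The cell above replaces infinite ratios with 0, which makes a tract with unhoused residents and no toilets look the same as a tract with no unhoused residents at all. As a hedged alternative, the sketch below keeps undefined ratios as NaN and flags the no-toilet tracts separately. It uses a small made-up table with one row per tract; `toilet_tracts` is a hypothetical stand-in for the tract ID attached to each toilet by the spatial join.

```python
import numpy as np
import pandas as pd

# Hypothetical tract table, one row per tract (values are made up).
tracts = pd.DataFrame({
    "short_geoid": ["A", "B", "C", "D"],
    "homeless_count": [170, 40, 0, 25],
}).set_index("short_geoid")

# Hypothetical result of the point-in-polygon join: the tract of each toilet.
toilet_tracts = pd.Series(["A"] * 10 + ["B"], name="short_geoid")

# Count toilets per tract; tracts with no toilets get an explicit 0.
tracts["number_toilets"] = (
    toilet_tracts.value_counts().reindex(tracts.index, fill_value=0)
)

# Divide only where there is at least one toilet; otherwise leave NaN.
tracts["toilet_ratio"] = (
    tracts["homeless_count"] / tracts["number_toilets"].replace(0, np.nan)
)

# Tracts with unhoused residents but no recorded toilets.
no_toilets = tracts[
    (tracts["number_toilets"] == 0) & (tracts["homeless_count"] > 0)
]
print(tracts)
print(no_toilets)
```

When mapped, NaN tracts can be drawn in a distinct color (for example through the `missing_kwds` argument of GeoPandas' `plot`) so that they are not confused with well-served tracts. Working from one row per tract also avoids the repeated tract rows that a left spatial join produces when a tract contains several toilets.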

### Exploring the toilet ratio

In this section, we examine our data by sorting, wrangling, and mapping.

In [None]:
#sort by census tracts with highest number of toilets
toilets_descending = joinedGdf3.sort_values(by='number_toilets',ascending = False)
toilets_descending.head(5)

In the lines of code and output above, we see that areas around Skid Row have highest concentrations of toilets, showers, and refresh spots. In this area, there are approximately 17 people to 1 restroom. (See toilet_ratio column.)

In [None]:
# which tracts had the highest ratio of houseless people per toilet
ratio_descending = joinedGdf3.sort_values(by='toilet_ratio',ascending = False)
ratio_descending.head(3) #chose 3 to shorten the length of this notebook


In [None]:
ratio_ascending = joinedGdf3.sort_values(by='toilet_ratio',ascending = True)
ratio_ascending.head(3) #chose 3 to shorten the length of this notebook


Most census tracts have zero or one toilet in the entire tract. As a result, we're seeing ratios of 0 toilets per 7,000+ unhoused people, or 2,331 unhoused people to 1 toilet (in Eagle Rock, for example). However, we can assume that unhoused people in these tracts are likely using publicly available restrooms elsewhere (in addition to using public streets, alleys, or other public spaces as restrooms). This illustrates one of our key limitations: our dataset of publicly available restrooms is not comprehensive.

In [None]:
# Mapping
import matplotlib.pyplot as plt
import contextily as ctx

#LA county wide map

f, ax = plt.subplots(figsize=(15,15))

joinedGdf3.plot('toilet_ratio', 
            ax=ax, 
            cmap='Reds', 
            legend=True, 
            legend_kwds={'orientation': 'vertical'}, 
            alpha=0.5, 
            label='Count')
ax.set_xlim([-118.721783, -118.08]) 
ax.set_ylim([33.68, 34.4])
ax.set_title('LA County Houseless People per Hygiene Station', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()

We see that areas we would assume to be relatively well covered in terms of houseless services (Skid Row/Downtown, parts of the Valley) do seem to have fewer people per hygiene station overall. Also in line with expectations, some of the more affluent areas such as Marina Del Rey, northern parts of the Valley, and Eagle Rock/Glendale seem to have more people per hygiene station. But areas of South LA and NELA/East LA also look to be struggling with more people per hygiene station.

### Exploring the datasets spatially

We also wanted to overlay the toilets data with the number of houseless individuals per census tract, with the aim of identifying tracts that have high houseless populations and a lack of available hygiene stations.

In [None]:
#LA county wide map
import contextily as ctx
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(20,20))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

counts.plot('homeless_count', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation':'vertical', 'label':'Houseless Count'}, 
            alpha=0.3, 
            label='Count')
ax.set_xlim([-118.721783, -118.08]) 
ax.set_ylim([33.68, 34.4])
ax.set_title('2020 Houseless Count Data with Hygiene Stations in LA County', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite, zoom=12)

plt.show()

First, we take a wider look at all of the parts of LA County where our data is represented. We leave out the Antelope Valley and Catalina because we don't have any hygiene station data from those areas. <br>
<br>
Just from this zoomed-out view of the datasets, we see that the tracts with the largest houseless populations are sprinkled throughout the County without any readily identifiable trends at first glance. We see some tracts with high houseless populations in the San Fernando Valley, particularly on the southern side up against the Studio City area. It seems that the highest-concentration tracts occur throughout Central Los Angeles, Downtown LA, East LA, and the Mid City area, but also in some parts of Culver-West and Westchester.<br>
<br>
To get an idea of how the toilets dataset looks countywide, we also conduct some spatial clustering below.

In [None]:
# Splitting out our geometry column into separate lat long columns
gdf = toilets
gdf['lon'] = gdf.geometry.apply(lambda p: p.x)
gdf['lat'] = gdf.geometry.apply(lambda p: p.y)

# Running kmeans to create the cluster_id column based on the lat long
# Selecting 5 clusters to begin with, knowing that we have around 5-6 different concentrations of data

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, random_state=1).fit(gdf[['lat','lon']])
gdf['cluster_id'] = kmeans.labels_

# Plotting

fig, ax = plt.subplots(figsize=(10,10))
gdf.to_crs('EPSG:3857').plot('cluster_id', categorical=True, legend=True, 
                                   ax=ax, alpha=0.4)

ctx.add_basemap(ax, source=ctx.providers.Stamen.TonerLite)
ax.set_title('Spatial Clustering All Hygiene Stations', fontsize=20)                           

ax.set_xticks([])
ax.set_yticks([])
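The choice of 5 clusters above was made by eye. One common way to sanity-check such a choice is an elbow plot of k-means inertia against the number of clusters. The sketch below runs on synthetic points rather than the toilets data, so the centers, spread, and range of `k` are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic (lat, lon) points around three made-up centers.
rng = np.random.default_rng(1)
centers = np.array([[34.05, -118.25], [34.20, -118.45], [33.85, -118.30]])
points = np.vstack(
    [c + rng.normal(scale=0.02, size=(50, 2)) for c in centers]
)

# Fit k-means for a range of k and record the inertia
# (sum of squared distances from each point to its cluster center).
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

The "elbow" is the value of `k` after which inertia stops dropping sharply. With the real data, `points` would be `gdf[['lat','lon']]`. Note also that clustering on raw degrees treats a degree of longitude and a degree of latitude as equal distances, which is only approximately true at LA's latitude; clustering on projected coordinates (such as the EPSG:3857 coordinates already used for plotting) is one way to reduce that distortion.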

It looks like the clusters break down as:

0 - Central & East LA <br>
1 - West San Fernando Valley <br>
2 - West LA <br>
3 - East San Fernando Valley <br>
4 - South LA and South Bay
<br>


Seeing as we have some clear clusters around Central LA, East LA, and the San Fernando Valley, let's take a closer look at each of these neighborhoods.

In [None]:
# Central LA specific map

f, ax = plt.subplots(figsize=(20,20))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

counts.plot('homeless_count', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical', 'label':'Houseless Count'}, 
            alpha=0.25, 
            label='Count')
ax.set_xlim([-118.40, -118.26]) 
ax.set_ylim([34.003307, 34.121767])
ax.set_title('2020 Houseless Count Data with Hygiene Stations in Central LA', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()


Acknowledging that we do not have information about every bathroom available in LA, there is a noticeable gap in hygiene station availability in the Mid City area. Notably, the LAHSA Handwashing Stations (light green dots) almost seem to avoid that area entirely, but are present in directly adjacent areas like Koreatown or Culver City. In general, we see a decline in hygiene station density as we move west from Downtown, with zero community handwashing stations (blue dots) west of Westlake and MacArthur Park.


In [None]:
# East LA specific map

f, ax = plt.subplots(figsize=(20,20))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

counts.plot('homeless_count', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical', 'label':'Houseless Count'}, 
            alpha=0.25)
ax.set_xlim([-118.293942, -118.12]) 
ax.set_ylim([33.986563, 34.105046])
ax.set_title('2020 Houseless Count Data with Hygiene Stations in Downtown & East LA', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()

In Boyle Heights, we see a large houseless population with no LAHSA handwashing stations, no community stations, and no Shower or Refresh spots. The only data points we have for publicly available restrooms are from the LA City Library and a park. Downtown has the largest concentration of hygiene stations out of all of the neighborhoods in our dataset, and the density of hygiene stations dissipates as we move out of Downtown.
<br>
<br>
The Westlake and MacArthur Park area has an especially large concentration of LAHSA Handwashing Stations, but there is a noticeable deadzone in the Historic Filipinotown area where there is a larger concentration of unhoused folks.
<br>
<br>
Interestingly, the data Downtown seems to suggest that hygiene stations are placed just outside of where most houseless people were counted. This can be chalked up to the point data from the houseless dataset being right on the border between census tracts.

In [None]:
# Valley

f, ax = plt.subplots(figsize=(20,20))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

counts.plot('homeless_count', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical', 'label':'Houseless Count'}, 
            alpha=0.25, 
            label='Count')
ax.set_xlim([-118.68, -118.35]) 
ax.set_ylim([34.13, 34.335567])
ax.set_title('2020 Houseless Count Data with Hygiene Stations in the Valley', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite)

plt.show()

In the Valley, LAHSA Handwashing stations seem to be better placed in response to higher houseless populations. However, there is a noticeable disparity between houseless count and hygiene station availability in the Granada Hills South area. 
<br>
<br>
We also see LA City Library branches positioned in tracts with large houseless populations, but because of the way that libraries are often policed, we remain doubtful that houseless folks are permitted to freely utilize libraries as hygiene stations.

In [None]:
# South LA

f, ax = plt.subplots(figsize=(20,20))

toilets.plot('type', ax=ax, marker='o', markersize=20, legend=True)

counts.plot('homeless_count', 
            ax=ax, 
            cmap='RdPu', 
            legend=True, 
            legend_kwds={'orientation': 'vertical', 'label':'Houseless Count'}, 
            alpha=0.25, 
            label='Count')
ax.set_xlim([-118.38, -118.12910]) 
ax.set_ylim([33.69, 34.06])
ax.set_title('2020 Houseless Count Data with Hygiene Stations in South LA and South Bay', fontsize=25)

ctx.add_basemap(ax,crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite, zoom=13)

plt.show()

South LA and South Park are pretty well covered by LAHSA, but that stops as we move south of the I-105 Freeway, where there are quite high concentrations of houseless populations who are relying on just a couple of LA County Library branches.
<br>
<br>
South Alameda St also seems to be a stark divider for houseless counts. Tracts west of South Alameda St show higher concentrations of houseless folks, and those east of Alameda are relatively less dense. The number of hygiene stations available reflects this breakdown, as LAHSA Handwashing Stations go from nearly 0 east of Alameda to quite numerous west of Alameda.
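The four regional maps above share the same plotting code and differ only in their bounds and titles. A small helper can reduce that repetition. The sketch below is one possible refactor, not the code used to produce the maps: `REGIONS`, `plot_region`, and the optional `add_basemap` callback are names introduced here for illustration, while the bounds are copied from the cells above.

```python
import matplotlib.pyplot as plt

# Bounding boxes copied from the regional map cells: ((xmin, xmax), (ymin, ymax)).
REGIONS = {
    "Central LA": ((-118.40, -118.26), (34.003307, 34.121767)),
    "Downtown & East LA": ((-118.293942, -118.12), (33.986563, 34.105046)),
    "the Valley": ((-118.68, -118.35), (34.13, 34.335567)),
    "South LA and South Bay": ((-118.38, -118.12910), (33.69, 34.06)),
}


def plot_region(counts, toilets, name, add_basemap=None):
    """Draw the houseless count choropleth with hygiene stations for one region.

    counts and toilets are expected to be GeoDataFrames in EPSG:4326 with the
    'homeless_count' and 'type' columns used elsewhere in this notebook.
    add_basemap is an optional function that takes the axes and draws tiles.
    """
    xlim, ylim = REGIONS[name]
    fig, ax = plt.subplots(figsize=(20, 20))

    # Tracts first, then points, so the stations are drawn on top.
    counts.plot(
        "homeless_count", ax=ax, cmap="RdPu", alpha=0.25, legend=True,
        legend_kwds={"orientation": "vertical", "label": "Houseless Count"},
    )
    toilets.plot("type", ax=ax, marker="o", markersize=20, legend=True)

    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
    ax.set_title(
        f"2020 Houseless Count Data with Hygiene Stations in {name}",
        fontsize=25,
    )
    if add_basemap is not None:
        add_basemap(ax)
    return ax
```

In the notebook, a call might look like `plot_region(counts, toilets, "the Valley", add_basemap=lambda ax: ctx.add_basemap(ax, crs='EPSG:4326', source=ctx.providers.Stamen.TonerLite))`, followed by `plt.show()`. Passing the basemap step in as a callback keeps the helper usable even when tiles are unavailable.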

## Takeaways & Concluding Thoughts

Because our list of hygiene stations is by no means a complete list of all publicly available stations in LA County, it is difficult to draw any sweeping conclusions about the availability of hygiene stations for our houseless neighbors. The houseless count data also has its own flaws, but for the most part seems to reflect reality.
<br>
<br>
Utilizing the data we do have, it does seem that there are a few specific areas of the city and the county where more LAHSA handwashing stations can be made available. Some of the most egregious people per hygiene station numbers occur in Mid City, East LA, and South LA. Additionally, our dataset shows that LA Parks can be a particularly valuable source of hygiene stations in areas that are largely without options. Unfortunately, access to Parks, as well as City and County Libraries, is restricted by policing, and as a result, these are an unreliable source of hygiene for houseless folks.
<br>
<br>
Both hygiene stations and houseless folks are inherently difficult to produce data about, but hopefully our work here shows some of the ways that we can continue to work towards understanding more about houseless communities' access to basic hygiene. Readily available online maps currently do not display this variety of hygiene station types. We are building upon the work of LACAN, who built upon the LAHSA stations list, and we hope that this type of mapping will continue. Access to basic hygiene is a human right, and hopefully through mapping we can keep authorities like LAHSA and the City of LA accountable to where hygiene services are needed the most.