# Density and public transportation

Sufficient population and job density are important for a public transportation system. Plotting densities and public transportation systems on the same map can therefore reveal opportunities to build more public transportation lines (where density is high, but there are no existing lines) and to build more housing or jobs (where public transportation lines exist, but densities are low).

First, let's create a map of population density. In the United States, geo-tagged population data is available from the United States Census Bureau. `dpd.modeling` has a `Zones` class to store this data and a method to pull it automatically from the Census Bureau.

We start by defining the counties that we are interested in mapping and the latest year of US Census data to use.

The state seems to be the best level at which to request data. Requesting data county by county takes many requests and is too slow, while data for the whole country is unnecessary. The government assigns a numeric (FIPS) code to each state; California's is 06.

Now we can get the census data for California. B01003_001E is the population in each census tract.
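For reference, a request for this variable can be sketched as a plain URL. This is a minimal illustration of the Census Bureau's ACS 5-year endpoint, not necessarily how `Zones.from_uscensus` builds its request. The helper name `acs5_tract_url` is hypothetical, and the exact geography clauses the API accepts for a given year are an assumption here.

```python
from urllib.parse import urlencode

CENSUS_API = "https://api.census.gov/data"


def acs5_tract_url(year: str, state_fips: str, variable: str = "B01003_001E") -> str:
    """Build an ACS 5-year request for one variable in every tract of a state."""
    params = {
        "get": variable,
        "for": "tract:*",
        "in": f"state:{state_fips}",
    }
    # Keep ":" and "*" unescaped so the URL stays readable.
    return f"{CENSUS_API}/{year}/acs/acs5?{urlencode(params, safe=':*')}"


# California (FIPS 06), total population per census tract.
url = acs5_tract_url("2017", "06")
print(url)
```

Requesting one state at a time keeps this to a single call, which matches the reasoning above about avoiding one request per county.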

Next we compute density as population per area.
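The density calculation itself is a single division. The sketch below assumes the land area comes from the `ALAND` column of the Census tract shapefiles, which is reported in square meters, and converts the result to people per square kilometer. The function name and the sample numbers are made up for illustration.

```python
SQUARE_METERS_PER_SQUARE_KM = 1_000_000


def density_per_km2(count: float, land_area_m2: float) -> float:
    """Return people (or jobs) per square kilometer of land."""
    if land_area_m2 <= 0:
        # Water-only tracts have no land area; report zero instead of dividing by zero.
        return 0.0
    return count / (land_area_m2 / SQUARE_METERS_PER_SQUARE_KM)


# Hypothetical tract: 4,200 residents on 1.5 square kilometers of land.
print(density_per_km2(4200, 1_500_000))
```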

And we filter for the counties we listed above. This way we can change our filter below and not have to redownload all the data.
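One way to express such a filter relies on the structure of tract GEOIDs: the first two digits are the state FIPS code and the next three are the county code, so a five-character prefix identifies the county. The following is a sketch with illustrative GEOIDs, not the notebook's actual filtering code.

```python
def filter_tracts_by_county(tract_geoids, county_fips):
    """Keep tract GEOIDs whose 5-digit state+county prefix is in county_fips."""
    wanted = set(county_fips)
    return [geoid for geoid in tract_geoids if geoid[:5] in wanted]


# Illustrative GEOIDs: two with the Los Angeles County prefix (06037),
# one with the San Francisco County prefix (06075).
tracts = ["06037101110", "06037101122", "06075010100"]
selected = filter_tracts_by_county(tracts, ["06037"])
print(selected)
```

Because the filter runs on data that is already in memory, the county list can be changed without downloading anything again.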

Now we download our public transportation systems so we can plot the lines.

And finally we plot everything. (Image omitted to reduce file size.)

Now, it would be helpful to do the same exercise with job densities. However, the US Census Bureau does not provide this information ("Worker Population": "B08604_001E") at the tract level as it does for population. This leaves us with two options, both of which require large downloads.

1. We can download zip code worker populations.
2. Or we can download LODES data which includes origin-destination information.

We'll take a look at option 2 below. The LODES data is divided into three files: residential data, work data, and origin-destination data. There is also a cross-walk file that includes a translation from LODES GEOIDs to census tracts.
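To illustrate how origin-destination records turn into per-tract totals, here is a minimal sketch. It assumes each record carries a work block GEOID, a home block GEOID, and a job count (the `w_geocode`, `h_geocode`, and `S000` columns of the LODES origin-destination file). Instead of the cross-walk file, it uses a shortcut: a tract GEOID is the first 11 digits of a 15-digit block GEOID. The sample records are made up.

```python
from collections import Counter

TRACT_GEOID_LENGTH = 11  # state (2) + county (3) + tract (6)


def jobs_by_tract(od_rows):
    """Sum job counts by home tract and by work tract.

    od_rows: iterable of (w_geocode, h_geocode, jobs) block-level records.
    """
    residents_working = Counter()  # workers counted where they live
    jobs_located = Counter()  # jobs counted where they are located
    for w_geocode, h_geocode, jobs in od_rows:
        jobs_located[w_geocode[:TRACT_GEOID_LENGTH]] += jobs
        residents_working[h_geocode[:TRACT_GEOID_LENGTH]] += jobs
    return residents_working, jobs_located


# Made-up origin-destination records with 15-digit block GEOIDs.
rows = [
    ("060750101001000", "060750102001001", 3),
    ("060750101001002", "060750102001001", 2),
    ("060750102001001", "060750101001000", 1),
]
home, work = jobs_by_tract(rows)
print(work["06075010100"], home["06075010200"])
```

The work-side totals are the ones that feed a job density; the home-side totals count employed residents.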

We can then combine the LODES data with our original output DataFrame (which includes the geometry) to add a job_density column.

And we can plot the job density like we plotted the population density above.

Last, we can evaluate the sum of population density and job density. This sum serves as a proxy for the number of potential public transportation users in each census tract.
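As a small worked example, the sketch below adds the two densities per tract and rescales the result to the range 0 to 1, which is the form a continuous colormap expects. The tract labels and density values are made up, and both function names are hypothetical.

```python
def combined_density(population_density, job_density):
    """Add population and job density for each tract present in either mapping."""
    tracts = set(population_density) | set(job_density)
    return {
        t: population_density.get(t, 0.0) + job_density.get(t, 0.0) for t in tracts
    }


def min_max_normalize(values):
    """Rescale a mapping's values to the range [0, 1] for use with a colormap."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    if span == 0:
        # All tracts have the same density; map them all to zero.
        return {key: 0.0 for key in values}
    return {key: (value - low) / span for key, value in values.items()}


# Made-up densities per square kilometer.
population = {"A": 2000.0, "B": 500.0, "C": 1000.0}
jobs = {"A": 3000.0, "B": 500.0}
total = combined_density(population, jobs)
normalized = min_max_normalize(total)
print(normalized)
```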

In [None]:
import us
import ipywidgets as widgets
from IPython.display import display

In [None]:
YEAR = "2017"

state = widgets.Select(
    options=list(map(lambda x: x.name, us.STATES)),
    description="State",
    value="California",
)
display(state)

In [None]:
from dpd.modeling import Zones

zones = Zones.from_uscensus(str(us.states.lookup(state.value).fips), YEAR)

In [None]:
zones["geometry"] = zones["geometry"].apply(lambda x: x.simplify(0.001))

In [None]:
zones.head(1)

In [None]:
# zones.explore(column="ProductionAttractionSum Density")

In [None]:
from tobler.util import h3fy
from tobler.area_weighted import area_interpolate
from pyproj import CRS

aea = CRS.from_string("North America Albers Equal Area Conic")
# to_crs returns a new object unless inplace=True, so reproject in place.
zones.to_crs(aea, inplace=True)
h3_zones = h3fy(zones, buffer=True)

dc_hex_interpolated = area_interpolate(
    source_df=zones,
    target_df=h3_zones,
    intensive_variables=["Production", "Attraction", "ALAND"],
)
dc_hex_interpolated.head()

In [None]:
# Zones(dc_hex_interpolated).explore(column="ProductionAttractionSum Density")

In [None]:
h3_zones = h3fy(zones, resolution=8, buffer=True)

gdf = area_interpolate(
    source_df=zones,
    target_df=h3_zones,
    intensive_variables=["Production", "Attraction", "ALAND"],
)
gdf = Zones(gdf)
gdf.head()

In [None]:
from lonboard import Map, HeatmapLayer, SolidPolygonLayer
from lonboard.colormap import apply_continuous_cmap
from palettable.matplotlib import Viridis_20

layer = SolidPolygonLayer.from_geopandas(gdf, opacity=0.2)
df = gdf["ProductionAttractionSum Density"]
normalized_df = (df - df.min()) / (df.max() - df.min())

layer.get_fill_color = apply_continuous_cmap(normalized_df, Viridis_20)

m = Map(layers=[layer])
m

In [None]:
from dpd.driving.network import Network

query = """
[out:json][timeout:25];
(
  relation["network"="Metro Rail"];

);
out body;
>;
out skel qt;
"""

network = Network.from_osm_query(query)

In [None]:
import folium

folium_map = folium.Map()
zones.explore(m=folium_map, column="ProductionAttractionSum Density")
for route in network.routes:
    network.routes[route].explore(m=folium_map)

# folium_map

In [None]:
import gtfs_kit

feed = gtfs_kit.read_feed(
    # "http://www.bart.gov/dev/schedules/google_transit.zip", dist_units="mi"
    "https://gtfs.sfmta.com/transitdata/google_transit.zip",
    dist_units="mi",
)

In [None]:
feed.routes

In [None]:
from dpd.driving import Network

network = Network.from_gtfs(feed)

In [None]:
from dpd.modeling import TripDataFrame

In [None]:
od = TripDataFrame.from_lodes(us.states.lookup(state.value).abbr.lower(), YEAR)

In [None]:
od.head()

In [None]:
zones = zones.production_attraction_from_lodes(od)

In [None]:
h3_zones = h3fy(zones, resolution=8, buffer=True)

gdf = area_interpolate(
    source_df=zones,
    target_df=h3_zones,
    intensive_variables=["Production", "Attraction", "ALAND"],
)
gdf = Zones(gdf)
gdf.head()

In [None]:
network.routes["1"]

In [None]:
from lonboard import Map, PathLayer, ScatterplotLayer, SolidPolygonLayer
from lonboard.colormap import apply_continuous_cmap
from palettable.matplotlib import Viridis_20

layer = SolidPolygonLayer.from_geopandas(gdf, opacity=0.2)
df = gdf["ProductionAttractionSum Density"]
normalized_df = (df - df.min()) / (df.max() - df.min())

layer.get_fill_color = apply_continuous_cmap(normalized_df, Viridis_20)
layers = [layer]
for route in network.routes:
    layers.append(
        ScatterplotLayer.from_geopandas(network.routes[route], radius_min_pixels=1)
    )
m = Map(layers=layers)
m

In [None]:
from astropy import units
from dpd.driving import Route

route = Route.from_osm_relation(relation=2351006)

In [None]:
from dpd.modeling import DistanceDataFrame

zones.to_crs("North_America_Albers_Equal_Area_Conic", inplace=True)
points = zones.polygons_to_points()
stops = route.stops.to_crs("North_America_Albers_Equal_Area_Conic")
distance_dataframe = DistanceDataFrame.from_origins_destinations(
    points.geometry, stops.geometry, method="distance"
)

In [None]:
points

In [None]:
distance_dataframe.columns = stops.name
distance_dataframe

In [None]:
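WALKING_SPEED = 1.35  # meters per second
times = [5, 10, 15]  # walking time thresholds in minutes
# Walking time in seconds from each zone point to each stop.
walk_seconds = distance_dataframe / WALKING_SPEED
data = []
for column in distance_dataframe.columns:
    row = []
    for time in times:
        # Sum production and attraction over zones within `time` minutes of this stop.
        row.append(
            points[walk_seconds[column] < time * 60]["ProductionAttractionSum"].sum()
        )
    data.append(row)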

In [None]:
from pandas import DataFrame

DataFrame(data=data, index=distance_dataframe.columns, columns=times).plot(kind="bar")

In [None]:
from matplotlib import pyplot as plt

fig, ax = plt.subplots()
(distance_dataframe / 1.35).hist(
    weights=points["ProductionAttractionSum"],
    range=(0, 900),
    bins=30,
    cumulative=True,
    sharey=True,
    ax=ax,
)
ax.set_ylabel("Population (cumulative)")
ax.set_xlabel("Time (seconds)")