# TransitLand API


MEETING PERIODICALLY:

Team Proposal

Team Members:


*   Erick Buitrago
*   Felix Gonzalez
*   Kyle Rodriguez






Team Meetings

*   Mondays at 6:00 PM on Zoom



Team Charter:

Beginning: Our goal is to define our topic, gather initial data, and test API access using Python.

Process: Collect data through the API, analyze it, and visualize it (optionally, apply simple machine learning).

Goal: Use real-world MTA data to identify trends and build visual insights.



**API**
https://www.transit.land/operators/o-dr5r-nyct

**API KEY:** "CWPkh2zfZlR5r91gaS5sgIKLicN7WiVW"
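
One option for keeping the key out of a shared notebook is to read it from an environment variable. The sketch below is only an illustration: the variable name `TRANSITLAND_API_KEY` and the helper `load_api_key` are hypothetical and not part of the project.

```python
import os


def load_api_key(var_name="TRANSITLAND_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice made for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With this in place, the notebook could assign `api_key = load_api_key()` instead of a string literal.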

**Research Questions**
We will use Python (NumPy, pandas, Matplotlib, seaborn) and other tools covered in class to analyze and visualize the data and answer the following questions.

## 1. Who runs the most transit routes in Manhattan?
* Dataset: routes_df['agency.agency_name']
* Graph: Bar Chart

## 2. What is the ratio of simple curbside stops to complex stations?
* Dataset: stops_df['location_type']
* Graph: Bar Chart or Pie Chart

## 3. What is the overall percentage of wheelchair-accessible stops?
* Dataset: stops_df['wheelchair_boarding']
* Graph: Pie Chart

## 4. Where are the wheelchair-accessible stops located?
* Dataset: stops_df (geometry.coordinates and wheelchair_boarding)
* Graph: Map with colored points
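
A minimal sketch of how the map for question 4 might be drawn, assuming `geometry.coordinates` holds `[lon, lat]` pairs. It groups stops by the raw `wheelchair_boarding` value without interpreting it, so the legend shows whatever values the API returned. The helper names are hypothetical.

```python
import math
from collections import defaultdict


def group_coords_by_value(coords, values):
    """Group [lon, lat] pairs by their accompanying value.

    Missing values (None or NaN) are collected under the key "unknown".
    """
    groups = defaultdict(lambda: ([], []))
    for point, value in zip(coords, values):
        if not isinstance(point, (list, tuple)) or len(point) < 2:
            continue  # skip rows without usable coordinates
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        key = "unknown" if missing else value
        lons, lats = groups[key]
        lons.append(point[0])
        lats.append(point[1])
    return dict(groups)


def plot_accessibility_map(coords, values):
    """Scatter the stops on lon/lat axes, one color per wheelchair_boarding value."""
    # Imported here so the grouping helper has no plotting dependency
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 10))
    for key, (lons, lats) in group_coords_by_value(coords, values).items():
        ax.scatter(lons, lats, s=8, label=f"wheelchair_boarding = {key}")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Stops by wheelchair_boarding value")
    ax.legend()
    return ax
```

A call such as `plot_accessibility_map(stops_df['geometry.coordinates'], stops_df['wheelchair_boarding'])` would then produce the colored points. A basemap could be layered underneath later.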

## 5. What are the primary transit "arteries" of Manhattan?
* Dataset: stops_df['stop_name']
* Graph: Bar Chart

## 6. Where are the areas with the fewest transit stops?
* Dataset: stops_df['geometry.coordinates']
* Graph: Heatmap

In [10]:
from datetime import datetime
print(f'Run time: {datetime.now().strftime("%m/%d/%y %H:%M:%S")}')

Run time: 11/04/25 19:21:14


### Import libraries and set the API key

In [15]:
import requests
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

api_key = "CWPkh2zfZlR5r91gaS5sgIKLicN7WiVW"

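def fetch_transit_data(endpoint, params):
    """Query a Transitland REST endpoint and return the results as a DataFrame (None on failure)."""
    base_url = "https://transit.land/api/v2/rest"
    headers = {"apikey": api_key}

    print(f"Fetching data from '{endpoint}'...")
    try:
        # A timeout keeps the notebook from hanging if the server does not respond
        response = requests.get(base_url + endpoint, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        # The records are nested under a key named after the endpoint, e.g. "stops"
        df = pd.json_normalize(response.json()[endpoint.strip('/')])
        print(f"Success! Found {len(df)} results.\n")
        return df
    except (requests.RequestException, KeyError, ValueError) as e:
        print(f"An error occurred: {e}\n")
        return None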

# Bounding box around Manhattan, in min_lon,min_lat,max_lon,max_lat order
manhattan_params = {
    "bbox": "-74.02,40.70,-73.93,40.88",
    "limit": 1000
}

stops_df = fetch_transit_data("/stops", manhattan_params)
# ... and so on for your other analyses

Fetching data from '/stops'...
Success! Found 1000 results.
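
The request returned exactly 1000 rows, which equals the `limit` that was sent, so the result is probably truncated. A minimal sketch of following cursor-based pagination is shown below. It assumes, without confirmation from this notebook, that each response carries the cursor for the next page at `meta.after` and that the same value can be sent back as an `after` query parameter. The helper `fetch_all_records` is hypothetical.

```python
def fetch_all_records(get_page, record_key, max_pages=20):
    """Follow cursor-based pagination and return every record as one list.

    get_page(after) must return the decoded JSON of one page. This sketch
    ASSUMES the cursor for the next page lives at payload["meta"]["after"]
    and is absent on the last page; check the API docs before relying on it.
    """
    records, after = [], None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        payload = get_page(after)
        records.extend(payload.get(record_key, []))
        after = (payload.get("meta") or {}).get("after")
        if not after:
            break
    return records


# Hypothetical usage; base_url and headers stand for the values used inside
# fetch_transit_data:
#
# def get_stops_page(after):
#     params = dict(manhattan_params)
#     if after:
#         params["after"] = after
#     r = requests.get(base_url + "/stops", headers=headers, params=params, timeout=30)
#     r.raise_for_status()
#     return r.json()
#
# stops_df = pd.json_normalize(fetch_all_records(get_stops_page, "stops"))
```

Passing the page fetcher in as a function keeps the loop independent of `requests`, so it can be checked with canned responses before spending API quota.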



## Datasets

In [16]:
# Fetch routes data using the same function and parameters
routes_df = fetch_transit_data("/routes", manhattan_params)
if routes_df is not None:
    print(f"Routes DataFrame shape: {routes_df.shape}")
    display(routes_df.head())

print("\n" + "="*50 + "\n") # Separator for clarity

# Fetch operators data
operators_df = fetch_transit_data("/operators", manhattan_params)
if operators_df is not None:
    print(f"Operators DataFrame shape: {operators_df.shape}")
    display(operators_df.head())

Fetching data from '/routes'...
Success! Found 671 results.

Routes DataFrame shape: (671, 22)


| | continuous_drop_off | continuous_pickup | id | onestop_id | route_color | route_desc | route_id | route_long_name | route_short_name | route_sort_order | ... | route_url | agency.agency_id | agency.agency_name | agency.id | agency.onestop_id | feed_version.feed.id | feed_version.feed.onestop_id | feed_version.fetched_at | feed_version.id | feed_version.sha1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 123854 | r-dr-325 | | | NYP325 | BUFFALO -> NEW YORK | 325 | 0.0 | ... | | | Adirondack Trailways | 2689 | o-dr-adirondacktrailways | 3376 | `f-trailways~nyp~ny` | 2021-09-21T22:00:58.545745Z | 233696 | 5f49e5d50167623b39cb68f58db7bcbe9f62edbb |
| 1 | | | 123855 | r-dr6-324 | | | NYP324 | NEW YORK -> SYRACUSE | 324 | 0.0 | ... | | | Adirondack Trailways | 2689 | o-dr-adirondacktrailways | 3376 | `f-trailways~nyp~ny` | 2021-09-21T22:00:58.545745Z | 233696 | 5f49e5d50167623b39cb68f58db7bcbe9f62edbb |
| 2 | | | 123856 | r-dr-231 | | | NYP231 | ROCHESTER -> NEW YORK | 231 | 0.0 | ... | | | Adirondack Trailways | 2689 | o-dr-adirondacktrailways | 3376 | `f-trailways~nyp~ny` | 2021-09-21T22:00:58.545745Z | 233696 | 5f49e5d50167623b39cb68f58db7bcbe9f62edbb |
| 3 | | | 123857 | r-dr-246 | | | NYP246 | NEW YORK -> ROCHESTER | 246 | 0.0 | ... | | | Adirondack Trailways | 2689 | o-dr-adirondacktrailways | 3376 | `f-trailways~nyp~ny` | 2021-09-21T22:00:58.545745Z | 233696 | 5f49e5d50167623b39cb68f58db7bcbe9f62edbb |
| 4 | | | 123858 | r-dr6-349 | | | NYP349 | SYRACUSE -> NEW YORK | 349 | 0.0 | ... | | | Adirondack Trailways | 2689 | o-dr-adirondacktrailways | 3376 | `f-trailways~nyp~ny` | 2021-09-21T22:00:58.545745Z | 233696 | 5f49e5d50167623b39cb68f58db7bcbe9f62edbb |




Fetching data from '/operators'...
Success! Found 40 results.

Operators DataFrame shape: (40, 15)


| | agencies | feeds | id | name | onestop_id | short_name | website | tags.twitter_general | tags.twitter_service_alerts | tags.wikidata_id | tags.developer_site | tags.us_ntd_id | tags | tags.us_ntd_id2 | tags.omd_provider_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `[{'agency_id': '51', 'agency_name': 'Amtrak', ...` | `[{'id': 359, 'name': None, 'onestop_id': 'f-9-...` | 14356151 | Amtrak | o-9-amtrak | | http://www.amtrak.com | amtrak | amtrakalerts | Q23239 | | | | | |
| 1 | `[{'agency_id': '1', 'agency_name': 'NYC Ferry'...` | `[{'id': 917, 'name': None, 'onestop_id': 'f-dr...` | 14356372 | NYC Ferry | o-dr5r-nycferry | | https://www.ferry.nyc/ | nycferry | | Q26987418 | https://www.ferry.nyc/developer-tools/ | 22930.0 | | | |
| 2 | `[{'agency_id': 'da', 'agency_name': 'Downtown ...` | `[{'id': 810, 'name': None, 'onestop_id': 'f-dr...` | 14356782 | Alliance for Downtown New York | o-dr5re-downtownalliance | Downtown Connection | `http://www.downtownny.com/getting-around/downt...` | | | | | | | | |
| 3 | `[{'agency_id': 'LI', 'agency_name': 'Long Isla...` | `[{'id': 38, 'name': None, 'onestop_id': 'f-dr5...` | 14357229 | MTA Long Island Rail Road | o-dr5-longislandrailroad | LIRR | | LIRR | | Q125943 | | 20100.0 | | | |
| 4 | `[{'agency_id': 'MTA NYCT', 'agency_name': 'MTA...` | `[{'id': 5, 'name': None, 'onestop_id': 'f-dr5r...` | 14357237 | MTA New York City Transit | o-dr5r-nyct | MTA | | mta | NYCTSubway | Q1146109 | | 20008.0 | | 20188.0 | |


## Analysis

In [None]:
# Question 1: Routes per Operator


print("Question 1: Who runs the most transit routes?")

# figure size
plt.figure(figsize=(10, 8))

# Create the plot and store the plot's axes in a variable `ax`
ax = sns.countplot(
    y=routes_df['agency.agency_name'],
    order=routes_df['agency.agency_name'].value_counts().index
)

# Add the data labels to the right of each bar
for p in ax.patches:
    width = p.get_width()
    ax.text(width + 1,
            p.get_y() + p.get_height() / 2.,
            '{:1.0f}'.format(width),
            ha='left',
            va='center')

# Titles and labels
plt.title('Number of Transit Routes Operated per Agency in Manhattan')
plt.xlabel('Number of Routes')
plt.ylabel('Agency Name')

# Adjust the x-axis to make room for the labels
plt.xlim(right=routes_df['agency.agency_name'].value_counts().max() * 1.1)

plt.tight_layout()
plt.show()

In [None]:
# Question 2: Stops vs. Stations

print("Question 2: What is the ratio of simple stops to complex stations?")

# Counts of each location type
location_counts = stops_df['location_type'].value_counts()
print(location_counts)

# Bar chart and store the plot's axes in a variable `ax`
plt.figure(figsize=(8, 6))
ax = sns.barplot(x=location_counts.index, y=location_counts.values)

# Data labels on top of each bar
for p in ax.patches:
    ax.annotate(format(p.get_height(), '.0f'),      # The text label (as a whole number)
               (p.get_x() + p.get_width() / 2., p.get_height()), # The (x,y) coordinate to annotate
               ha = 'center', va = 'center',      # Center the text
               xytext = (0, 5),                   # Offset the text 5 points vertically
               textcoords = 'offset points')

plt.title('Count of Transit Infrastructure Types')
plt.xlabel('Type (0=Stop, 1=Station)')
plt.ylabel('Total Count')

# Extra space at the top of the chart for the labels
y_max = location_counts.max()
plt.ylim(0, y_max * 1.1)

plt.show()
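
The chart above shows counts, while the research question asks for a ratio. A small sketch of computing it directly follows, assuming the codes used in the axis label (0 = stop, 1 = station) and treating missing values as unknown rather than as stops. The helper name is hypothetical.

```python
from collections import Counter


def stop_to_station_ratio(location_types):
    """Return (stops, stations, ratio) from an iterable of location_type codes.

    None and NaN entries are skipped; ratio is None when there are no stations.
    """
    # v == v is False only for NaN, so this filter drops missing values
    counts = Counter(int(v) for v in location_types if v is not None and v == v)
    stops, stations = counts[0], counts[1]
    ratio = stops / stations if stations else None
    return stops, stations, ratio
```

For example, `stop_to_station_ratio(stops_df['location_type'])` would return the two counts along with the number of simple stops per station.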

In [None]:
# Question 3: Wheelchair Accessibility Percentage


print("Question 3: What is the overall percentage of wheelchair-accessible stops?")

# Drop any stops where accessibility is unknown (NaN) to get a clean count
accessible_counts = stops_df['wheelchair_boarding'].dropna().value_counts()

print("\nRaw Counts of Accessible vs. Not Accessible Stops:")
print(accessible_counts)

# Map each value to a readable label; values missing from the map fall back to their raw form
label_map = {False: 'Not Accessible', True: 'Accessible'}

# Labels like "Not Accessible\n(Count: 850)"
labels_with_counts = [f"{label_map.get(label, str(label))}\n(Count: {count})" for label, count in accessible_counts.items()]


# Create the pie chart
plt.figure(figsize=(8, 8))
plt.pie(
    accessible_counts,
    labels=labels_with_counts,   # Use the new custom labels
    autopct='%1.1f%%',           # Format the text to show percentage
    startangle=90,               # Rotates the start of the pie chart
    colors=['#ff9999','#66b3ff'] # Custom colors
)

plt.title('Overall Percentage of Wheelchair-Accessible Stops in Manhattan')
plt.axis('equal')  # Ensures the pie chart is a circle
plt.show()
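
If the column turns out to hold the integer codes from the GTFS `stops.txt` reference (0 = no information, 1 = accessible, 2 = not accessible) rather than booleans, the percentages could be computed as in the sketch below. Whether the API returns these codes is an assumption to verify against the actual data, and the helper name is hypothetical.

```python
from collections import Counter

# Meanings of wheelchair_boarding in the GTFS stops.txt reference
GTFS_WHEELCHAIR = {0: "No information", 1: "Accessible", 2: "Not accessible"}


def accessibility_shares(values):
    """Return {label: percentage of stops}, rounded to one decimal place."""
    labels = []
    for v in values:
        missing = v is None or v != v  # NaN is the only value not equal to itself
        if missing:
            labels.append("No information")
        else:
            labels.append(GTFS_WHEELCHAIR.get(int(v), f"Other ({v})"))
    counts = Counter(labels)
    total = sum(counts.values())
    if not total:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Keeping "No information" as its own slice avoids counting stops with unknown status as inaccessible.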

## Erick B.
### 5. What are the primary transit "arteries" of Manhattan?
* Dataset: stops_df['stop_name']
* Graph: Bar Chart


In [None]:
print("Question 5: What are the primary transit arteries in Manhattan?")

# Count the most common stop names (these often represent major transit arteries or hubs)
top_stops = stops_df['stop_name'].value_counts().head(20)
print(top_stops)

# Bar chart of the top 20 most frequent stop names
plt.figure(figsize=(12, 8))
ax = sns.barplot(y=top_stops.index, x=top_stops.values, palette='viridis')
plt.title('Top 20 Primary Transit Arteries (Stops) in Manhattan')
plt.xlabel('Number of Stops')
plt.ylabel('Stop Name')
plt.tight_layout()
plt.show()

## Erick B.
### 6. Where are the areas with the fewest transit stops?
* Dataset: stops_df['geometry.coordinates']
* Graph: Heatmap

In [None]:
print("Question 6: Where are the areas with the fewest transit stops?")

# Extract coordinates for all stops
coords = stops_df['geometry.coordinates'].dropna().tolist()
if coords and isinstance(coords[0], list):
    # Each entry is expected to be a GeoJSON-style [lon, lat] pair
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]

    # Create a heatmap using seaborn's kdeplot
    plt.figure(figsize=(10, 10))
    sns.kdeplot(x=lons, y=lats, cmap="Reds", fill=True, bw_adjust=0.5, thresh=0.05)
    plt.title('Heatmap of Transit Stop Density in Manhattan')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.show()
else:
    print("No valid coordinates found for stops.")
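
A density plot highlights where stops are concentrated, so the sparse areas have to be read from what is left blank. As a complementary sketch, the bounding box can be split into a grid and the cells with the fewest stops listed explicitly. The grid size and the helper name are arbitrary choices for illustration.

```python
from collections import Counter


def sparsest_cells(points, bbox, n_cols=5, n_rows=10, k=5):
    """Split bbox into an n_cols x n_rows grid and return the k emptiest cells.

    points: iterable of [lon, lat]; bbox: (min_lon, min_lat, max_lon, max_lat).
    Returns a list of ((col, row), count), emptiest first; empty cells are included.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    # Start every cell at zero so cells without any stop still appear in the result
    counts = Counter({(c, r): 0 for c in range(n_cols) for r in range(n_rows)})
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # ignore points outside the box
        col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
        row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
        counts[(col, row)] += 1
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:k]
```

Called as `sparsest_cells(coords, (-74.02, 40.70, -73.93, 40.88))`, it returns grid cells ordered from emptiest upward. A low count may simply mean water or parkland, and the counts are only as complete as the 1000 stops that were fetched.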