## 1. Loading the Dataset

We'll use the NYC Yellow Taxi trip data for January 2020 for our dashboard. If the file isn't already in your working directory, the code below downloads the gzipped CSV for you.

In [1]:
import os
import urllib.request
import pandas as pd
# January 2020 yellow taxi trips, hosted as a DataTalksClub GitHub release
nyc_url = 'https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2020-01.csv.gz'
local_path = 'yellow_tripdata_2020-01.csv.gz'
# Download only if the file is not already present locally
if not os.path.exists(local_path):
    print('Downloading NYC taxi data...')
    urllib.request.urlretrieve(nyc_url, local_path)
    print('Download complete.')
# low_memory=False parses the file in one pass, avoiding mixed-dtype warnings
df = pd.read_csv(local_path, compression='gzip', low_memory=False)
df.head()

Downloading NYC taxi data...
Download complete.


Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
0,1.0,2020-01-01 00:28:15,2020-01-01 00:33:03,1.0,1.2,1.0,N,238,239,1.0,6.0,3.0,0.5,1.47,0.0,0.3,11.27,2.5
1,1.0,2020-01-01 00:35:39,2020-01-01 00:43:04,1.0,1.2,1.0,N,239,238,1.0,7.0,3.0,0.5,1.5,0.0,0.3,12.3,2.5
2,1.0,2020-01-01 00:47:41,2020-01-01 00:53:52,1.0,0.6,1.0,N,238,238,1.0,6.0,3.0,0.5,1.0,0.0,0.3,10.8,2.5
3,1.0,2020-01-01 00:55:23,2020-01-01 01:00:14,1.0,0.8,1.0,N,238,151,1.0,5.5,0.5,0.5,1.36,0.0,0.3,8.16,0.0
4,2.0,2020-01-01 00:01:58,2020-01-01 00:04:16,1.0,0.0,1.0,N,193,193,2.0,3.5,0.5,0.5,0.0,0.0,0.3,4.8,0.0
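
The download step above writes straight to `local_path`, so an interrupted download can leave a truncated file that the `os.path.exists` check treats as complete on the next run. Here is a minimal sketch of one way to guard against that, assuming a hypothetical helper named `download_if_missing` (not part of the original notebook) that writes to a temporary `.part` file and renames it only once the download finishes:

```python
import os
import urllib.request


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Hypothetical helper: data is written to a temporary ".part" file and
    renamed only after the download finishes, so a partial file is never
    mistaken for a complete one.
    Returns True if a download happened, False if dest was already there.
    """
    if os.path.exists(dest):
        return False
    tmp_path = dest + '.part'
    try:
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, dest)  # rename within the same directory
    finally:
        # Remove the temporary file if the download or rename failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

With this helper, the `if not os.path.exists(local_path):` block could be replaced by a single call such as `download_if_missing(nyc_url, local_path)`.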


## 2. Setting Up the Dash Dashboard

We'll use Plotly Dash to build the interactive dashboard. If Dash isn't installed, the next cell installs it with pip. The dashboard shows a scatter plot of fare amount against trip distance, filtered by a passenger-count dropdown.

In [2]:
# Install Dash if needed
import sys
import subprocess
try:
    import dash
except ImportError:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'dash'])
    import dash
from dash import Dash, dcc, html, Input, Output
import plotly.express as px

In [3]:
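# Prepare a smaller sample for faster dashboard rendering
# Keep only the three columns the dashboard uses and drop rows with missing values
sample_df = df[['passenger_count', 'trip_distance', 'fare_amount']].dropna()
# Discard rows with zero or negative values, which cannot be valid trips
sample_df = sample_df[
    (sample_df['passenger_count'] > 0)
    & (sample_df['trip_distance'] > 0)
    & (sample_df['fare_amount'] > 0)
]
# Fixed random_state makes the 5,000-row sample reproducible
sample_df = sample_df.sample(5000, random_state=42)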
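
One caveat with the sampling step: `DataFrame.sample(5000)` raises a `ValueError` if fewer than 5,000 rows survive the filters. That is unlikely for a full month of trips, but easy to hit with a smaller file. Here is a minimal sketch of a guarded version, assuming a hypothetical helper `sample_valid_trips` and a tiny made-up frame in place of the real data:

```python
import pandas as pd


def sample_valid_trips(df, n=5000, seed=42):
    """Keep rows with positive values in all three columns, then sample up to n."""
    cols = ['passenger_count', 'trip_distance', 'fare_amount']
    valid = df[cols].dropna()
    valid = valid[(valid > 0).all(axis=1)]
    # Cap n at the number of valid rows so sample() cannot raise ValueError
    return valid.sample(min(n, len(valid)), random_state=seed)


# Tiny made-up frame to show the behaviour; not real trip data
toy = pd.DataFrame({
    'passenger_count': [1, 2, 0, None],
    'trip_distance': [1.2, 0.0, 3.0, 2.0],
    'fare_amount': [6.0, 5.0, 4.0, 7.0],
})
result = sample_valid_trips(toy)
print(result)
```

Only the first row of `toy` passes every filter, so the helper returns that single row instead of failing.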

In [4]:
# Create Dash app
app = Dash(__name__)

app.layout = html.Div([
    html.H2('NYC Taxi Trip Dashboard'),
    html.Label('Select Passenger Count:'),
    dcc.Dropdown(
        id='passenger-dropdown',
        # passenger_count is read as float; cast to int so labels show "1" rather than "1.0"
        options=[{'label': str(int(i)), 'value': int(i)} for i in sorted(sample_df['passenger_count'].unique())],
        value=1,
        clearable=False
    ),
    dcc.Graph(id='fare-vs-distance'),
])

@app.callback(
    Output('fare-vs-distance', 'figure'),
    Input('passenger-dropdown', 'value')
)
def update_graph(selected_passenger):
    filtered = sample_df[sample_df['passenger_count'] == selected_passenger]
    fig = px.scatter(filtered, x='trip_distance', y='fare_amount',
                     title=f'Fare vs Trip Distance (Passenger Count: {selected_passenger})',
                     labels={'trip_distance': 'Trip Distance (miles)', 'fare_amount': 'Fare Amount ($)'})
    return fig
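
Because the callback body is ordinary pandas code, the filtering step can be pulled into a plain function and checked without starting the Dash server. Here is a minimal sketch, assuming a hypothetical helper `filter_by_passengers` and a made-up three-row frame:

```python
import pandas as pd


def filter_by_passengers(df, selected_passenger):
    """Return the rows whose passenger_count equals the dropdown value."""
    return df[df['passenger_count'] == selected_passenger]


# Made-up rows for illustration; not real trip data
toy = pd.DataFrame({
    'passenger_count': [1.0, 2.0, 1.0],
    'trip_distance': [1.2, 0.6, 0.8],
    'fare_amount': [6.0, 6.0, 5.5],
})
one_passenger = filter_by_passengers(toy, 1)
print(len(one_passenger))  # two rows have passenger_count == 1
```

Inside `update_graph`, the inline filter could then become `filtered = filter_by_passengers(sample_df, selected_passenger)`. The integer `1` from the dropdown matches the float `1.0` stored in the column, so no conversion is needed.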

## 3. Running and Exploring the Dashboard

Run the cell below to start the dashboard. You can then explore how fare amount relates to trip distance for each passenger count: pick a different value in the dropdown and the scatter plot updates to show only those trips.

In [5]:
# To run the dashboard
app.run(debug=True)
