What
Download 3 months of NYC Taxi Trip data, build a Kafka replayer, and run the LSTM and all baselines on the real data.
Why
Demonstrates that the LSTM generalizes beyond its synthetic training data. Real temporal patterns (rush hours, event nights) are structurally different from synthetic bursts, and reviewers are likely to reject a synthetic-only evaluation.
Steps
- Download 3 months of Yellow Taxi trip records from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Write a replayer: read tpep_pickup_datetime and send events to Kafka at rates proportional to the original pickup rates
- Run all approaches on replayed data
- Measure per-event latency CDFs
- Measure LSTM prediction MAE on real rate patterns
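A minimal sketch of the replayer step, assuming the pickup timestamps have already been extracted as strings in the "YYYY-MM-DD HH:MM:SS" form. Here "proportional rates" is read as: keep the original inter-arrival gaps and divide them by a speedup factor, so the replayed rate at any moment equals the original rate times that factor. The helper names (replay_schedule, replay) and the speedup parameter are hypothetical, and the Kafka send is passed in as a callback so the pacing logic can be tested without a broker.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed string format of tpep_pickup_datetime values (hypothetical choice).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def replay_schedule(pickup_times: Iterable[str], speedup: float = 60.0) -> List[Tuple[float, str]]:
    """Map pickup timestamps to (send_offset_seconds, timestamp) pairs.

    Offsets are measured from the earliest pickup and divided by `speedup`,
    so inter-arrival gaps shrink uniformly and relative rates are preserved.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    parsed = sorted((datetime.strptime(ts, TS_FORMAT), ts) for ts in pickup_times)
    if not parsed:
        return []
    t0 = parsed[0][0]
    return [((dt - t0).total_seconds() / speedup, ts) for dt, ts in parsed]


def replay(schedule: Sequence[Tuple[float, str]],
           send: Callable[[str], None],
           clock: Callable[[], float] = time.monotonic,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Send each event once its scheduled offset is reached; return the count sent.

    `send` is a hypothetical hook, e.g. a wrapper around a Kafka producer.
    """
    start = clock()
    sent = 0
    for offset, ts in schedule:
        # Delay is measured against total elapsed time, not the previous send,
        # so per-message overhead does not accumulate into drift.
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        send(ts)
        sent += 1
    return sent
```

With kafka-python, for example, send could be lambda ts: producer.send("taxi-pickups", ts.encode()) for a KafkaProducer instance; the topic name is made up. If the downloaded monthly files are Parquet, pandas.read_parquet(path, columns=["tpep_pickup_datetime"]) is one way to pull out the column; the values then come back as timestamps, so convert them with .dt.strftime("%Y-%m-%d %H:%M:%S") before passing them in.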
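For the latency CDFs, a sketch of how per-event latencies could be turned into CDF points and summary percentiles, assuming the latencies (for example produce-to-output time in milliseconds) are already collected per approach. empirical_cdf, percentile (nearest-rank definition) and summarize are hypothetical helpers; the CDF points can be handed to any plotting library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def empirical_cdf(latencies_ms: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (latency, fraction of events with latency <= that value) points."""
    xs = sorted(latencies_ms)
    n = len(xs)
    points: List[Tuple[float, float]] = []
    for i, x in enumerate(xs):
        frac = (i + 1) / n
        if points and points[-1][0] == x:
            points[-1] = (x, frac)  # collapse tied latencies into one CDF step
        else:
            points.append((x, frac))
    return points


def percentile(latencies_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in [0, 100]."""
    xs = sorted(latencies_ms)
    if not xs:
        raise ValueError("no latencies")
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]


def summarize(per_approach: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """p50/p95/p99 per approach, e.g. for a table alongside the CDF plot."""
    return {name: {f"p{q}": percentile(lat, q) for q in (50, 95, 99)}
            for name, lat in per_approach.items()}
```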
Acceptance criteria
- LSTM prediction MAE on Taxi data < 20% of the baseline event rate
- Latency CDFs for all approaches
- Adaptive beats fixed during rush-hour bursts
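The MAE criterion can be checked along these lines. This sketch assumes that "baseline rate" means the mean observed event rate over the evaluation period, that events are bucketed into fixed windows (60 s here, an arbitrary choice), and that the LSTM predictions are per-window counts aligned with those windows. rate_series, mae and meets_mae_criterion are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def rate_series(event_offsets_s: Sequence[float], window_s: float = 60.0) -> List[int]:
    """Count events per window; window i covers [i*window_s, (i+1)*window_s).

    Assumes offsets are seconds since the first event (all >= 0).
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not event_offsets_s:
        return []
    counts = Counter(int(t // window_s) for t in event_offsets_s)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and observed per-window rates."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def meets_mae_criterion(predicted: Sequence[float], actual: Sequence[float],
                        threshold: float = 0.20) -> bool:
    """True if MAE < threshold * mean observed rate (assumed 'baseline rate')."""
    err = mae(predicted, actual)  # validates inputs before dividing below
    baseline_rate = sum(actual) / len(actual)
    return err < threshold * baseline_rate
```

With threshold=0.20, a series averaging 10 events per window passes only if the MAE is below 2 events per window.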