NYC Taxi dataset evaluation #3

@AayushBarhate

What

Download 3 months of NYC Yellow Taxi trip data, build a Kafka replayer, and run the LSTM and all baselines on the real data.

Why

Shows whether the LSTM generalizes beyond synthetic training data. Real temporal patterns (rush hours, event nights) are structurally different from synthetic bursts, and reviewers will reject a synthetic-only evaluation.

Steps

  1. Download 3 months of Yellow Taxi trip records from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
  2. Write the replayer: read tpep_pickup_datetime and send events to Kafka at rates proportional to the original pickup rates
  3. Run all approaches on replayed data
  4. Measure per-event latency CDFs
  5. Measure LSTM prediction MAE on real rate patterns
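
A minimal sketch of the pacing logic in step 2, under stated assumptions: the `speedup` factor, the function names, and the injected `send` callable are all hypothetical and not part of this issue. It turns pickup timestamps into send offsets that preserve the relative gaps between events, then paces against a monotonic clock so per-event sleep error does not accumulate.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, Tuple


def replay_schedule(
    pickups: Iterable[datetime], speedup: float = 60.0
) -> Iterator[Tuple[float, datetime]]:
    """Yield (offset_seconds, pickup) pairs in pickup order.

    offset_seconds is the time after the first event at which the record
    should be sent. Gaps are divided by `speedup` (hypothetical parameter),
    so the replayed rate stays proportional to the original rate.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(pickups)
    if not ordered:
        return
    start = ordered[0]
    for ts in ordered:
        yield (ts - start).total_seconds() / speedup, ts


def replay(
    pickups: Iterable[datetime],
    send: Callable[[bytes], None],
    speedup: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each pickup timestamp through `send`; return the number sent.

    `send` is injected so the pacing can be tested without a broker.
    """
    t0 = clock()
    sent = 0
    for offset, ts in replay_schedule(pickups, speedup):
        # Sleep only for the time still remaining until this event's offset.
        delay = offset - (clock() - t0)
        if delay > 0:
            sleep(delay)
        send(ts.isoformat().encode("utf-8"))
        sent += 1
    return sent
```

With kafka-python, `send` could be something like `lambda v: producer.send("taxi-events", v)`, where `producer` is a `KafkaProducer` and the topic name is made up for illustration. The pickup column could be loaded with `pandas.read_parquet(path, columns=["tpep_pickup_datetime"])`, since the TLC files are published as Parquet. Sorting three months of timestamps in memory is a simplification; a per-file or streaming replay may be needed at full scale.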

Acceptance criteria

  • LSTM MAE on Taxi data < 20% of baseline rate
  • Latency CDFs for all approaches
  • Adaptive beats fixed during rush-hour bursts
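
The criteria above could be checked with helpers along these lines. This is a sketch under assumptions, not the project's evaluation code: "baseline rate" is read here as the mean observed event rate, the quantile uses the nearest-rank method, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def relative_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE divided by the mean observed rate.

    Assumption: 'baseline rate' in the criterion means the mean of `actual`.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
    mean_rate = sum(actual) / len(actual)
    if mean_rate == 0:
        raise ValueError("mean observed rate is zero")
    return mae / mean_rate


def empirical_cdf(latencies_ms: Sequence[float]) -> List[Tuple[float, float]]:
    """Return step points (latency, (i + 1) / n) over the sorted sample."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def quantile(latencies_ms: Sequence[float], q: float) -> float:
    """Nearest-rank quantile, e.g. q=0.99 for p99 latency."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no latency samples")
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]
```

The first criterion would then read `relative_mae(pred, actual) < 0.20`. For the third, one option is to restrict each approach's latencies to the rush-hour windows and compare a tail quantile such as `quantile(adaptive, 0.99) < quantile(fixed, 0.99)`; the window boundaries and the choice of quantile are not specified in this issue.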

Metadata

Labels

  • critical: Must have for paper
  • evaluation: Running experiments
