nyc-taxi-fare-prediction

Google Cloud Playground Prediction Competition

Competition on Kaggle: https://www.kaggle.com/c/new-york-city-taxi-fare-prediction

In this playground competition, hosted in partnership with Google Cloud and Coursera, you are tasked with predicting the fare amount (inclusive of tolls) for a taxi ride in New York City given the pickup and dropoff locations. A basic estimate based only on the distance between the two points results in an RMSE of $5-$8, depending on the model used. Your challenge is to do better than this using Machine Learning techniques!
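As a rough illustration of the distance-only baseline mentioned above, the sketch below computes the great-circle (haversine) distance between pickup and dropoff coordinates, fits a straight line fare = intercept + slope * distance with NumPy least squares, and reports RMSE. The choice of haversine distance, the function names, and the toy rows are assumptions for illustration only; they are not taken from this repository's notebooks, and RMSE on four toy rows says nothing about the real $5-$8 range.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


def haversine_km(lon1, lat1, lon2, lat2):
    """Vectorized great-circle distance in km between points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def rmse(y_true, y_pred):
    """Root mean squared error, the competition's evaluation metric."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_distance_baseline(distance_km, fare):
    """Least-squares fit of fare = intercept + slope * distance; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(distance_km), distance_km])
    coef, *_ = np.linalg.lstsq(X, fare, rcond=None)
    return float(coef[0]), float(coef[1])


# Hypothetical toy rows (not from train.csv): pickup/dropoff coordinates and fares.
pickup_lon = np.array([-73.9857, -73.9772, -73.9680, -74.0060])
pickup_lat = np.array([40.7484, 40.7527, 40.7851, 40.7128])
dropoff_lon = np.array([-73.9680, -73.9857, -73.9772, -73.9857])
dropoff_lat = np.array([40.7851, 40.7484, 40.7527, 40.7484])
fare = np.array([12.5, 6.0, 9.5, 16.0])

dist = haversine_km(pickup_lon, pickup_lat, dropoff_lon, dropoff_lat)
intercept, slope = fit_distance_baseline(dist, fare)
pred = intercept + slope * dist
print(f"RMSE on toy rows: {rmse(fare, pred):.2f}")
```

A linear fit on distance is deliberately the simplest possible model here; it is the kind of estimate the competition says Machine Learning should improve on.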

File descriptions

train.csv - Input features and target fare_amount values for the training set (about 55M rows).
test.csv - Input features for the test set (about 10K rows). Your goal is to predict fare_amount for each row.
sample_submission.csv - A sample submission file in the correct format (columns key and fare_amount). This file 'predicts' fare_amount to be $11.35 for all rows, which is the mean fare_amount from the training set.
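The constant baseline in sample_submission.csv can be reproduced with pandas, as in the minimal sketch below. The function name mean_fare_submission is hypothetical, and the optional nrows argument (reading only the first rows of train.csv, since the full file has about 55M rows) is an assumption made for convenience; a mean taken over a subset will not exactly equal the $11.35 computed over the whole training set.

```python
import io

import pandas as pd


def mean_fare_submission(train_source, test_source, nrows=None):
    """Build a constant-prediction submission: every test row gets the training mean fare.

    train_source / test_source: paths or file-like objects for train.csv / test.csv.
    nrows: optionally read only the first nrows rows of train.csv.
    """
    # Only fare_amount is needed from the (very large) training file.
    train = pd.read_csv(train_source, usecols=["fare_amount"], nrows=nrows)
    test = pd.read_csv(test_source, usecols=["key"])
    mean_fare = train["fare_amount"].mean()
    # Same two columns as sample_submission.csv; the scalar broadcasts to every row.
    return pd.DataFrame({"key": test["key"], "fare_amount": mean_fare})


# Tiny in-memory demo with hypothetical rows, to show the output layout.
demo_train = io.StringIO("key,fare_amount\na,10.0\nb,20.0\n")
demo_test = io.StringIO("key\nx\ny\n")
print(mean_fare_submission(demo_train, demo_test))

# Against the real files (assumed to sit in the working directory):
# submission = mean_fare_submission("train.csv", "test.csv", nrows=1_000_000)
# submission.to_csv("submission.csv", index=False)
```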

Data fields

key - Unique string identifying each row in both the training and test sets. Composed of pickup_datetime plus a unique integer, but this doesn't matter; it should just be used as a unique ID field. Required in your submission CSV. Not necessarily needed in the training set, but could be useful to simulate a 'submission file' while doing cross-validation within the training set.
pickup_datetime - timestamp value indicating when the taxi ride started.
pickup_longitude - float for longitude coordinate of where the taxi ride started.
pickup_latitude - float for latitude coordinate of where the taxi ride started.
dropoff_longitude - float for longitude coordinate of where the taxi ride ended.
dropoff_latitude - float for latitude coordinate of where the taxi ride ended.
passenger_count - integer indicating the number of passengers in the taxi ride.
fare_amount - float dollar amount of the cost of the taxi ride. This value is only in the training set; this is what you are predicting in the test set and it is required in your submission CSV.
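Since pickup_datetime arrives as a timestamp string, a common first step is to derive calendar features from it, and key can be carried along to build a submission-shaped frame for a held-out slice of the training set, as the key description suggests. The sketch below assumes timestamps formatted like "2009-06-15 17:26:21 UTC" (adjust PICKUP_FORMAT if the files differ); the function names and the specific features chosen are hypothetical, not the repository's actual pipeline.

```python
import pandas as pd

# Assumed timestamp layout, e.g. "2009-06-15 17:26:21 UTC"; change this if the data differs.
PICKUP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def add_time_features(df):
    """Return a copy of df with calendar features derived from pickup_datetime."""
    out = df.copy()
    ts = pd.to_datetime(out["pickup_datetime"], format=PICKUP_FORMAT)
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["weekday"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["hour"] = ts.dt.hour
    return out


def validation_submission(valid_df, predictions):
    """Mimic the submission layout (key, fare_amount) for a held-out slice of train.csv."""
    return pd.DataFrame(
        {"key": valid_df["key"].to_numpy(), "fare_amount": predictions}
    )
```

Comparing such a frame against the held-out fare_amount values (for example with RMSE) lets you check a model the same way the competition scores a submission.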

Dataset Download

https://www.kaggle.com/c/new-york-city-taxi-fare-prediction/data

RAPIDS Library

https://www.kaggle.com/cdeotte/rapids

Repository files

LICENSE
README.md
nyc-taxi-fare-rapids-dask-gpu.ipynb
nyc-taxi-fare.ipynb

allenkong221/nyc-taxi-fare-prediction