Eric-Ristol/bicing-predictor

Bicing Availability Predictor

Predict how many bikes will be available at a Bicing station in Barcelona 15 minutes from now.

This is a small but complete time-series ML project: data pipeline, feature engineering, proper time-based train/test split, a persistence baseline, three sklearn/LightGBM models, evaluation, saving, prediction, and tests.


Why this problem

Bicing is the public bike-sharing service in Barcelona. If you've ever cycled up to a station at 8:55 and found zero bikes, or rolled up to drop your bike off at 9:05 and found zero docks, you already know why predicting availability a few minutes ahead is useful.

A 15-minute horizon is short enough to be useful for routing decisions and long enough to be non-trivial (you can't just return the current count — demand shifts by neighbourhood and hour).


What's in the repo

bicing-availability-predictor/
├── data.py              synthetic snapshot generator + feature engineering
├── fetch_live.py        optional: pull real GBFS snapshots from the live API
├── train.py             baseline + LinearRegression + RandomForest + LightGBM, saves winner
├── predict.py           load model, predict bikes_available at t+15 min
├── main.py              CLI menu (I - VII)
├── test_pipeline.py     pytest tests
├── requirements.txt
├── api/
│   ├── app.py           FastAPI server (model loaded once, REST predictions)
│   └── static/
│       └── index.html   web demo (pick a station, see the prediction)
├── data/                generated snapshots live here
├── models/              saved model + feature list + comparison.csv
└── plots/               true-vs-predicted scatter + feature importance chart

Dataset

The default path uses synthetic snapshots generated by data.py. They match the real Bicing GBFS schema (station_id, lat, lon, capacity, timestamp, bikes_available, docks_available), so swapping in real data requires no changes elsewhere.

Three station archetypes are baked in:

  • residential: full at night, empties in the morning
  • central: empty at night, fills during the workday
  • university: bell-shaped fill pattern around midday

Weekends flatten the daily pattern. Gaussian noise is added so the series isn't deterministic.
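The archetype curves above could be sketched like this (an illustrative sketch, not the actual `data.py` implementation — the function name, cosine/Gaussian shapes, and noise level are all assumptions chosen to match the described behaviour):

```python
import numpy as np

def fill_fraction(station_type: str, hour: float, is_weekend: bool,
                  noise: float = 0.05) -> float:
    """Illustrative fill fraction (0..1) for one station archetype at a given hour."""
    if station_type == "residential":
        # Full at night (peak near 3:00), empties toward midday.
        base = 0.5 + 0.4 * np.cos(2 * np.pi * (hour - 3) / 24)
    elif station_type == "central":
        # Mirror image: empty at night, fills during the workday.
        base = 0.5 - 0.4 * np.cos(2 * np.pi * (hour - 3) / 24)
    else:
        # University: bell-shaped fill pattern around midday.
        base = 0.15 + 0.7 * np.exp(-((hour - 13) ** 2) / (2 * 3.0 ** 2))
    if is_weekend:
        # Weekends flatten the daily pattern toward the mean.
        base = 0.5 + 0.4 * (base - 0.5)
    # Gaussian noise so the series isn't deterministic; clip to a valid fraction.
    return float(np.clip(base + np.random.normal(0.0, noise), 0.0, 1.0))

bikes_now = round(fill_fraction("residential", hour=3, is_weekend=False) * 24)  # capacity 24
```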

If you want to train on real data, run python fetch_live.py --loop on your laptop for a few days and the CSV will grow with live observations.

Features

For each row (station, timestamp) the model sees:

  • hour, minute, dayofweek, is_weekend — time features
  • capacity, bikes_available, docks_available — current state
  • lag_1, lag_2, lag_3 — bikes available 15, 30, 45 minutes ago (per station)
  • rolling_1h — rolling mean of bikes available over the last hour
  • type_residential, type_central, type_university — one-hot station type

Target: bikes_available at t + 15 minutes (same station).

Models

Model              Why it's here
Persistence        Baseline — "the future will look like right now". Must beat it.
Linear Regression  Sanity check. Fast and interpretable.
Random Forest      Handles interactions and non-linearities without tuning.
LightGBM           Gradient-boosted trees — winner on most tabular time-series problems. Fast.

Latest results on the synthetic 14-day dataset (20 stations, 15-min snapshots):

Model             MAE    RMSE   R²
Persistence       2.44   3.12   0.79
LinearRegression  2.04   2.59   0.86
RandomForest      1.86   2.36   0.88
LightGBM          1.84   2.33   0.89

Metrics reported: MAE, RMSE, R². RMSE is the primary score because big misses (empty / full stations) hurt users the most.
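The persistence baseline and the three metrics can be reproduced in a few lines of sklearn (a sketch of the idea; `train.py` may structure this differently, and the sample arrays are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, and R² for one model's predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }

# Persistence baseline: predict that t+15 equals the current count.
y_now = np.array([5, 8, 2, 12])    # bikes_available at t
y_true = np.array([4, 9, 0, 12])   # bikes_available at t + 15 min
baseline_scores = evaluate(y_true, y_now)
```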

Train / test split

Time-based, never random. The last 20% of the timeline is the test set. Random splitting leaks future info into training on time-series data and produces optimistic-but-wrong numbers.
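That split can be expressed in a few lines (a sketch assuming a DataFrame with a `timestamp` column; the cutoff-by-position approach is one of several ways to do it):

```python
import pandas as pd

def time_split(df: pd.DataFrame, test_frac: float = 0.2):
    """Hold out the last `test_frac` of the timeline as the test set. No shuffling."""
    ts = df["timestamp"].sort_values()
    cutoff = ts.iloc[int(len(ts) * (1 - test_frac))]
    train = df[df["timestamp"] < cutoff]
    test = df[df["timestamp"] >= cutoff]
    return train, test
```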

How to run

pip install -r requirements.txt
python main.py         # interactive menu

Or drive each piece directly:

python data.py          # (nothing — functions only; use main.py option I)
python train.py         # generates data if missing, trains, saves best model
python predict.py       # asks you for a station id, prints a prediction
pytest -q               # runs the tests

Web demo (API)

After training, launch the API server:

python main.py          # pick option VI
# or directly:
uvicorn api.app:app --reload

Then open http://localhost:8000 in your browser. Pick a station from the dropdown, click Predict, and see the current bikes vs the 15-minute prediction with a visual capacity bar.

Endpoints:

  • GET / — the web demo
  • GET /stations — list all stations with metadata (type, capacity, coordinates)
  • POST /predict — send {"station_id": 5}, get back bikes now + predicted in 15 min
  • GET /health — server health check
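With the server running, the /predict endpoint can be called from Python using only the standard library (the request body follows the example above; the shape of the JSON response is whatever `api/app.py` returns and is not assumed here):

```python
import json
import urllib.request

def predict_bikes(station_id: int, base_url: str = "http://localhost:8000") -> dict:
    """POST {"station_id": ...} to /predict and return the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps({"station_id": station_id}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())

# With the server started via `uvicorn api.app:app`:
# print(predict_bikes(5))
```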

What I'd do next

  • Build a dashboard showing a live map with predicted availability per station (done: shipped as the FastAPI + web demo above instead of Streamlit).
  • Expand the horizon to a multi-step forecast (15 / 30 / 45 / 60 min) and plot error vs horizon.
  • Use weather features (rain → bike demand crashes). Barcelona publishes free weather data via the AEMET open API.
  • Deploy fetch_live.py on a cheap VM or a cron job and train on weeks of real Bicing history.
  • Tune LightGBM with n_estimators=500 + early stopping once real data is available.

Tests

pytest -q

Covers: CSV generation, feature matrix correctness, no train/test time leakage, model artifacts are saved, best model beats the persistence baseline, and predictions stay inside [0, capacity].
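The last check, keeping predictions inside [0, capacity], can be sketched as (a hypothetical helper; `test_pipeline.py` may structure this differently):

```python
import numpy as np

def clip_prediction(pred: float, capacity: int) -> float:
    """A station can't hold fewer than 0 or more than `capacity` bikes."""
    return float(np.clip(pred, 0, capacity))
```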


Built as part of my AI/ML portfolio. Feedback and issues welcome.
