Simulates real-time sensor data and writes it to MySQL (or logs the writes in dry-run mode).

Major improvements in `generator_v2.py` provide more realistic traffic data:
- Rush Hour Simulation
  - Morning: 7-9 AM
  - Lunch: 11 AM-1 PM
  - Evening: 4-6 PM
- Smart Data Distribution
  - 50% during rush hours (6 hours)
  - 30% during business hours (6 hours)
  - 15% during evening (4 hours)
  - 5% during overnight (8 hours)
- Rush Hours
  - Speeds reduced to 60-80% of normal
  - Vehicle counts increased by 150-200%
- Late Night (11 PM-5 AM)
  - Speeds increased to 110-120%
  - Vehicle counts reduced to 30-50%
- Normal Hours
  - Standard variations (90-110%)
- Expanded from a single sensor to 10,000 unique sensors
- Each sensor generates 2-5 readings in bursts
- Chronological order is maintained within batches
- Date-specific data generation (`--date` option)
- Synthetic timestamp generation
- Configurable time-period weights
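One way to implement the time-period weighting above is weighted sampling over hour buckets. This sketch uses illustrative names and hour groupings; the real `generator_v2.py` may organize it differently:

```python
import random

# Hypothetical period table mirroring the documented distribution:
# rush (6 h) 50%, business (6 h) 30%, evening (4 h) 15%, overnight (8 h) 5%
PERIODS = {
    "rush":      {"hours": [7, 8, 11, 12, 16, 17], "weight": 0.50},
    "business":  {"hours": [9, 10, 13, 14, 15, 18], "weight": 0.30},
    "evening":   {"hours": [19, 20, 21, 22], "weight": 0.15},
    "overnight": {"hours": [23, 0, 1, 2, 3, 4, 5, 6], "weight": 0.05},
}

def pick_hour(rng: random.Random) -> int:
    """Choose a period by weight, then an hour uniformly inside that period."""
    names = list(PERIODS)
    weights = [PERIODS[n]["weight"] for n in names]
    period = rng.choices(names, weights=weights, k=1)[0]
    return rng.choice(PERIODS[period]["hours"])
```

Over many draws this reproduces the 50/30/15/5 split, while each hour inside a period is equally likely.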
| Feature | generator.py | generator_v2.py |
|---|---|---|
| Sensors | Single fixed | 10,000 rotating |
| Timing | Real-time only | 24-hour distribution |
| Speed | Random normal | Time-based patterns |
| Vehicle Count | Fixed | Dynamic by time |
| Data Pattern | Random | Time-based realistic |
To change which generator is used, edit the import in `runner.py`:

```python
# Comment/uncomment the desired import:
from generator_v2 import generate_records  # Enhanced version
# from generator import generate_records   # Original version
```

Data compatibility:

- Both generators produce identical field structures
- The database schema remains the same
- The CSV format is unchanged
- Only the patterns and distributions differ
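Based on the sample output shown later in this README, the shared record structure might look like this (field names are taken from those samples; the real generators may emit additional columns):

```python
from datetime import datetime

def make_record(pgmid: str, vehicle_count: int, speed: float) -> dict:
    # Field names follow the sample output in this README; both generators
    # are documented to emit the same structure.
    return {
        "pgmid": pgmid,
        "vehicle_count": vehicle_count,
        "pepkspeed": round(speed, 1),
        # "HH:MM:SS.mmm", e.g. "08:15:37.919"
        "timestamp": datetime.now().strftime("%H:%M:%S.%f")[:-3],
    }
```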
## Setup

1. Create a virtualenv and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```

2. Copy `.env.example` to `.env` and edit the credentials. Set `DB_WRITE=true` to enable actual DB writes.
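For reference, a hypothetical `.env` might look like the following. The variable names are the ones mentioned elsewhere in this README, and the values are placeholders; the real `.env.example` is the authoritative template:

```ini
DB_WRITE=false        # set to true only after your IP is whitelisted
DB_HOST=your-db-host
DB_USER=your-user
DB_PASSWORD=your-password
DB_NAME=your-database
CSV_WRITE=true
CSV_PATH=data/sensor_readings.csv
NUM_RECORDS=100
RECORDS_PER_SECOND=1
```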
## Usage

```bash
# Generate 100 records with time-based patterns
python runner.py 100                  # Uses generator_v2.py by default

# Generate data for a specific date
python generator_v2.py 1000 --date 2025-10-27

# Run for a duration with progress updates
python runner_runtime.py --duration 120 --progress 30
```

Example output (generator_v2):

```
// Morning rush hour (8 AM) - higher vehicle count, lower speed
{"pgmid": "PMG02961", "vehicle_count": 8, "pepkspeed": 28.5, "timestamp": "08:15:37.919"}

// Late night (2 AM) - lower vehicle count, higher speed
{"pgmid": "PMG02961", "vehicle_count": 2, "pepkspeed": 45.2, "timestamp": "02:10:22.347"}
```

```bash
# Basic generation (no time patterns)
python runner.py 100 --use-original   # Uses generator.py

# With DB writes enabled
export DB_WRITE=true
python runner.py 100 --use-original
```

## Notes
- The generator uses a normal distribution for `pepeakspeed` around a configurable mean (default 40) with a standard deviation of 5.
- If you can't connect to the remote DB yet, the program will log the inserts it would perform.
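A minimal sketch of that sampling (the function name and the non-negative clamp are illustrative, not taken from the generator's code):

```python
import random

def sample_speed(rng: random.Random, mean: float = 40.0, stddev: float = 5.0) -> float:
    """Draw a speed from a normal distribution, clamped to stay non-negative."""
    return max(0.0, rng.gauss(mean, stddev))
```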
## CSV output

By default the project appends every generated record to `data/sensor_readings.csv` (controlled by `CSV_WRITE` and `CSV_PATH` in `.env`). This gives you an immediate local copy of what would be written to the DB. To disable CSV output, set `CSV_WRITE=false` in `.env`.
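The header-once append behavior can be sketched as follows (field names follow the sample output in this README; `csv_writer.py` itself may differ):

```python
import csv
import os

# Illustrative column set, based on the sample records in this README
FIELDNAMES = ["pgmid", "vehicle_count", "pepkspeed", "timestamp"]

def append_record(path: str, record: dict) -> None:
    """Append one record; write the header row only when creating the file."""
    new_file = not os.path.exists(path)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if new_file:
            writer.writeheader()
        writer.writerow(record)
```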
## Current behavior (what the programs do right now)

- `generator_v2.py` (recommended) produces realistic traffic data with time-based patterns, multiple sensors, and a configurable distribution across 24 hours.
- `generator.py` (original) produces basic simulated sensor readings at the rate set in `RECORDS_PER_SECOND`.
- `csv_writer.py` appends each generated record to the CSV at `CSV_PATH` for local inspection. A header row is written once, when the file is created.
- `db_writer.py` attempts to connect to the MySQL database only when `DB_WRITE=true` in `.env`. By default it is `false`, so the program logs the write action instead of attempting a network connection (useful while your IP is being whitelisted).
- `runner.py` coordinates generation, CSV writes, and DB writes (CSV first, then DB). It accepts an optional integer argument (number of records) or reads `NUM_RECORDS` from `.env`.
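The runner's CSV-first, DB-second flow can be sketched like this. It is a simplified illustration with injected writer callables, not the actual `runner.py` code:

```python
import os
from typing import Callable, Iterable

def run(records: Iterable[dict],
        write_csv: Callable[[dict], None],
        write_db: Callable[[dict], None]) -> int:
    """Coordinate writes as documented: CSV first, then DB only if DB_WRITE=true."""
    db_enabled = os.getenv("DB_WRITE", "false").lower() == "true"
    count = 0
    for record in records:
        write_csv(record)        # local CSV copy is always written first
        if db_enabled:
            write_db(record)     # network write only when explicitly enabled
        else:
            print(f"[dry-run] would insert: {record}")
        count += 1
    return count
```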
## Quick checklist to run locally

1. Create and activate a virtualenv, then install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```

2. Copy the env template and inspect it (DO NOT commit `.env`):

   ```bash
   cp .env.example .env
   ```

3. Run a short dry-run to produce the CSV and see the logs:

   ```bash
   python runner.py 100
   ```

4. When your IP is whitelisted and you want to push to the remote DB:

   - Set `DB_WRITE=true` in `.env`
   - Confirm `DB_HOST`, `DB_USER`, `DB_PASSWORD`, and `DB_NAME` are correct
   - Run: `python runner.py 100`
## Notes about GitHub push

The repository contains only code and `.env.example`. The `.env` file and `.venv/` are ignored via `.gitignore`.

Create a new repo and push (replace the URL with your repo):

```bash
git init
git add .
git commit -m "Initial project: generator, CSV writer, DB writer, runner, README"
git branch -M main
git remote add origin https://github.com/johndutra1/python_to_sql.git
git push -u origin main
```

If the remote already exists, set the URL and then push:

```bash
git remote set-url origin https://github.com/johndutra1/python_to_sql.git
git push -u origin main
```

If the push fails because of authentication, either use an HTTPS PAT (personal access token) with `git push`, or configure SSH keys and use an SSH remote URL.
## CI (GitHub Actions)

This repository includes a small GitHub Actions workflow (`.github/workflows/ci.yml`) that runs on pushes and pull requests to `main`. It:

- Tests on Python 3.11 and 3.12.
- Installs dependencies from `requirements.txt`.
- Executes a smoke run of the runner with writes disabled (it runs `python runner.py 1` with `DB_WRITE=false` and `CSV_WRITE=false`) to ensure imports and runtime start-up are OK.

This workflow is intentionally side-effect free (no DB connections, no CSV writes). It provides quick feedback that the code boots and that basic dependencies are resolvable.
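A workflow with that shape might look like the sketch below. This is illustrative only; the `.github/workflows/ci.yml` checked into the repo is authoritative:

```yaml
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  smoke:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      # Smoke run with all writes disabled, as described above
      - run: python runner.py 1
        env:
          DB_WRITE: "false"
          CSV_WRITE: "false"
```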