ActivitySpaceLab/gwm-data-processor

Gauteng Wellbeing Mapper – Data Pipeline

This repository packages everything required to download encrypted Qualtrics survey responses, decrypt them with the project private key, and export ready-to-analyse CSV tables (plus images) for the Gauteng Wellbeing Mapper study.

🚀 Quick start

  1. Navigate to pipeline_toolkit/ and copy .env.example to .env.
  2. Fill in the required secrets in .env:
    • QUALTRICS_API_TOKEN
    • QUALTRICS_BASE_URL
    • QUALTRICS_SURVEY_ID
    • Optional overrides for PRIVATE_KEY_PASSWORD / PRIVATE_KEY_PATH
  3. Place the RSA private key as pipeline_toolkit/secrets/private_key.pem (or point PRIVATE_KEY_PATH to your key).
  4. From the repository root, run the bundled launcher:
./run_pipeline.sh

The launcher creates a virtual environment, installs dependencies, loads your .env, and calls pipeline_runner.py. Structured CSVs are written to data/structured/, and participant HTML reports (plus interactive maps) to data/reports/. A JSON summary of the latest run is saved to pipeline_toolkit/last_run_summary.json.
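Before running the launcher it can help to confirm that the configuration is complete. The following is a minimal sketch, not part of the repository: the function name preflight_problems is hypothetical, and it assumes the .env values are passed in as a mapping (for example os.environ after the file has been loaded). The variable names and the default key location are the ones listed in the quick start.

```python
import os
from pathlib import Path

# Variable names and the default key location come from the quick start above.
# The helper itself is a hypothetical illustration, not a repository function.
REQUIRED_VARS = ("QUALTRICS_API_TOKEN", "QUALTRICS_BASE_URL", "QUALTRICS_SURVEY_ID")
DEFAULT_KEY_PATH = Path("pipeline_toolkit/secrets/private_key.pem")


def preflight_problems(env, repo_root=Path(".")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = [
        f"{name} is missing or empty" for name in REQUIRED_VARS if not env.get(name)
    ]
    # PRIVATE_KEY_PATH is optional; fall back to the default location if unset.
    key_path = Path(env.get("PRIVATE_KEY_PATH") or repo_root / DEFAULT_KEY_PATH)
    if not key_path.is_file():
        problems.append(f"private key not found at {key_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ):
        print("WARNING:", problem)
```

Run from the repository root, the script prints one warning per missing setting and prints nothing when everything is in place.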

📁 Directory overview

  • pipeline_toolkit/
    • pipeline_runner.py – orchestrates download → decrypt → structure → report.
    • requirements.txt – dependency list used by the launcher.
    • .env.example – template for required secrets.
    • secrets/ – drop private_key.pem here (ignored by git).
  • run_pipeline.sh – one-touch launcher at the repo root (loads .env, manages venv/deps, executes the runner).
  • qualtrics_tools/ – Qualtrics API download utilities (download_qualtrics_data.py).
  • decryption_tools/ – Automated RSA/AES decryption pipeline.
  • structure_tools/ – Data structuring (generate_survey_csvs.py), reporting (generate_participant_reports.py), and analytics helpers.
  • data/
    • raw/ – Fresh Qualtrics exports.
    • decrypted/ – Intermediate decrypted CSVs.
    • structured/ – Final CSVs (biweekly_survey.csv, initial_survey.csv, consent.csv, location_data.csv), extracted images, and processing_report.json.
    • reports/ – Timestamped participant HTML reports plus embedded folium map files.
  • validate_survey_data.py – Optional audit of the structured outputs.

🧭 Manual control (run steps individually)

# 1. Download from Qualtrics
python qualtrics_tools/download_qualtrics_data.py --output ./data/raw --all

# 2. Decrypt the encrypted payloads
PRIVATE_KEY_PASSWORD='•••' python decryption_tools/automated_decryption_pipeline.py \
  --input ./data/raw --output ./data/decrypted \
  --private-key pipeline_toolkit/secrets/private_key.pem

# 3. Build the analysis tables and extract images
python structure_tools/generate_survey_csvs.py \
  --input ./data/decrypted --output ./data/structured \
  --download-images --report --validate

# 4. Generate participant HTML reports (after structuring completes)
python structure_tools/generate_participant_reports.py \
  --input ./data/structured --output ./data/reports

Audit the final CSVs if needed:

python validate_survey_data.py --input ./data/structured --detailed-report
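The four manual steps above can also be chained from Python so that a failure in one step stops the rest. This is a sketch under stated assumptions: build_steps and run_steps are hypothetical helpers, the script paths and flags are copied from the commands above, and PRIVATE_KEY_PASSWORD is assumed to be set in the environment already (child processes inherit it).

```python
import subprocess
import sys


def build_steps(key_path="pipeline_toolkit/secrets/private_key.pem"):
    """Return the four manual commands from this README as argument lists."""
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        [py, "qualtrics_tools/download_qualtrics_data.py",
         "--output", "./data/raw", "--all"],
        [py, "decryption_tools/automated_decryption_pipeline.py",
         "--input", "./data/raw", "--output", "./data/decrypted",
         "--private-key", key_path],
        [py, "structure_tools/generate_survey_csvs.py",
         "--input", "./data/decrypted", "--output", "./data/structured",
         "--download-images", "--report", "--validate"],
        [py, "structure_tools/generate_participant_reports.py",
         "--input", "./data/structured", "--output", "./data/reports"],
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; check=True raises CalledProcessError on failure."""
    for cmd in steps:
        runner(cmd, check=True)


if __name__ == "__main__":
    run_steps(build_steps())
```

Passing the runner as a parameter keeps the command list easy to inspect or dry-run without invoking the real scripts.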

✅ Expected outputs

After a successful run you will have:

  • data/structured/biweekly_survey.csv
  • data/structured/initial_survey.csv
  • data/structured/consent.csv
  • data/structured/location_data.csv
  • data/structured/processing_report.json – summary generated by generate_survey_csvs.py
  • data/reports/index.html – directory listing of per-participant reports
  • data/reports/<timestamp>/<participant>.html – individual reports with embedded maps
  • pipeline_toolkit/last_run_summary.json – metadata for the latest execution
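
A quick way to confirm a run produced the fixed-name files listed above is to check for each path. The sketch below is illustrative only: missing_outputs is a hypothetical helper, and it skips the per-participant reports because their timestamped paths vary between runs.

```python
from pathlib import Path

# Fixed-name outputs from the list above; the timestamped per-participant
# reports are left out because their paths differ on every run.
EXPECTED_OUTPUTS = (
    "data/structured/biweekly_survey.csv",
    "data/structured/initial_survey.csv",
    "data/structured/consent.csv",
    "data/structured/location_data.csv",
    "data/structured/processing_report.json",
    "data/reports/index.html",
    "pipeline_toolkit/last_run_summary.json",
)


def missing_outputs(repo_root="."):
    """Return the expected output paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_outputs():
        print("missing:", rel)
```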

🔐 Secrets handling

  • .env (excluded from the git repository) stores the Qualtrics credentials and configuration (QUALTRICS_API_TOKEN, QUALTRICS_BASE_URL, QUALTRICS_SURVEY_ID, and optionally PRIVATE_KEY_PASSWORD / PRIVATE_KEY_PATH).
  • pipeline_toolkit/secrets/private_key.pem (excluded from the git repository) holds the RSA private key.

🛠 Requirements

  • Python 3.10+ (tested with 3.12)
  • Internet connectivity for Qualtrics downloads
  • pip packages, installed automatically by run_pipeline.sh (pandas, requests, cryptography, folium, markdown)

🆘 Troubleshooting

Symptom            Checks
Download fails     Ensure .env has QUALTRICS_API_TOKEN. Verify Qualtrics permissions.
Decryption fails   Confirm PRIVATE_KEY_PASSWORD and that private_key.pem matches incoming data.
No CSV output      Confirm data/decrypted/ contains *_decrypted_responses.csv; rerun structuring step.
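
The "No CSV output" check can be scripted. The following sketch assumes the default data/decrypted/ location and uses the *_decrypted_responses.csv pattern named in the table; decrypted_response_files is a hypothetical helper, not a repository function.

```python
from pathlib import Path


def decrypted_response_files(decrypted_dir="data/decrypted"):
    """Return the decrypted response CSVs, sorted by path (empty if none)."""
    directory = Path(decrypted_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*_decrypted_responses.csv"))


if __name__ == "__main__":
    files = decrypted_response_files()
    if files:
        for path in files:
            print("found:", path)
    else:
        print("No decrypted responses found; rerun the decryption step first.")
```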

For deeper context explore the READMEs inside pipeline_toolkit/, qualtrics_tools/, decryption_tools/, and structure_tools/.

📄 License

This project is distributed under the terms of the GNU General Public License v3.0. © 2025 John R.B. Palmer.
