A desktop application for Parkinson's disease real-world data analysis.
Built for dissertation-level research: it accepts any CSV dataset and produces descriptive insights, group comparisons, and optional exploratory predictions, all inside the app. No SPSS required.
| Tab | Purpose |
|---|---|
| ① Load Data | Load one or more CSVs, preview, auto-normalise column names |
| ② Configure | Map columns to roles (patient ID, time, group, outcome), build derived-variable rules, set QC thresholds |
| ③ QC & Clean | Transparent quality control — every flagged row shows why it was flagged |
| ④ Insights | Descriptive stats, group comparisons (effect sizes), time trends, progression analysis, narrative report |
| ⑤ Predict | Exploratory classification or regression with feature importance |
| ⑥ Export | Save cleaned CSV, markdown report, plots, and model results |
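
The table says the Load Data tab auto-normalises column names but does not define the rule. Judging by names used later (`healthcode`, `tap_count`), one plausible scheme is lowercasing plus underscores. The sketch below assumes exactly that. The function names are hypothetical and are not the API of `standardise.py`.

```python
import re


def normalise_column(name: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs into one underscore."""
    name = name.strip().lower()
    # Any run of characters outside a-z and 0-9 becomes a single "_"
    name = re.sub(r"[^a-z0-9]+", "_", name)
    # Drop underscores left at either end (e.g. from "(units)" or stray symbols)
    return name.strip("_")


def normalise_columns(columns):
    """Normalise every header and suffix repeats (_2, _3, ...) so names stay unique."""
    used = set()
    result = []
    for col in columns:
        base = normalise_column(str(col)) or "column"  # empty result gets a placeholder
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


print(normalise_columns(["healthCode", "createdOn", "tap count", "Tap-Count"]))
# ['healthcode', 'createdon', 'tap_count', 'tap_count_2']
```

Letters outside a-z (for example accented characters) also become underscores here, which is a simplification.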

Make sure you have Python 3.11 installed:

```
python --version
```

This should print `Python 3.11.x`. If not, download Python 3.11 from https://python.org.
Place the `pd_insight_studio` folder somewhere on your machine, e.g.:

```
C:\Users\YourName\Documents\pd_insight_studio
```
Open the folder in VS Code via **File > Open Folder** and select `pd_insight_studio`. Then open the VS Code terminal (Ctrl+\`):

```
python -m venv .venv
.venv\Scripts\activate
```

You should see `(.venv)` in the terminal prompt. Then install the dependencies:

```
pip install -r requirements.txt
```

This installs pandas, matplotlib, seaborn, scikit-learn, scipy, and the other required packages.

Note: `tkinter` is included with the standard Python installer on Windows, so no separate install is needed.

Start the app:

```
python main.py
```

The PD Insight Studio window will open.

- **Load Data** → Add your mPower tapping CSV(s). The app auto-normalises column names.
- **Configure** →
  - Set **Patient/Subject ID** → e.g. `healthcode` or `recordid`
  - Set **Time/Visit column** → e.g. `createdon` or `timestamp`
  - Set **Primary group variable** → e.g. `medtimingstring` (medication timing)
  - Set **Outcome** → e.g. `tap_count` or the derived tap rate
- **Rule Builder** →
  - Add a **Keyword Map Rule** with Source = `medtimingstring`, Output column = `med_state`, and these keyword mappings:
    - `before`, `immediately before` → `BEFORE`
    - `after`, `just after`, `a while after` → `AFTER`
    - `none`, `na`, `no` → `NO_MEDS`
  - Add a **Formula Rule**: `tap_count / duration_seconds` → `tap_rate`
- **QC** → Run QC to see how many rows pass or fail, and why.
- **Insights** → Click "Generate All Insights" to see:
  - Distribution of tap rate per medication group
  - Group comparison: BEFORE vs AFTER vs NO_MEDS effect sizes
  - Progression over time, if visit timestamps are mapped
- **Predict** → Optionally run a classification (PD vs Control proxy) or regression (tap rate).
- **Export** → Save cleaned data, plots, and reports to your chosen folder.
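
The two Rule Builder rules in the walkthrough can be sketched in plain Python. This is a minimal illustration, not the code in `rules.py`: the function names, the whole-word keyword matching, and returning `None` for unusable rows are all assumptions.

```python
import math
import re

# Illustrative mapping from the walkthrough: (output label, keywords), checked in order.
MED_STATE_MAP = [
    ("BEFORE", ["before", "immediately before"]),
    ("AFTER", ["after", "just after", "a while after"]),
    ("NO_MEDS", ["none", "na", "no"]),
]


def keyword_map(value, mapping, default=None):
    """Return the first label whose keyword occurs in value as a whole word or phrase."""
    if value is None:
        return default
    text = str(value).strip().lower()
    for label, keywords in mapping:
        for keyword in keywords:
            # \b word boundaries stop short keywords such as "no" matching inside "another"
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return label
    return default


def safe_ratio(numerator, denominator):
    """Formula rule numerator / denominator; None when missing, non-numeric, or divide-by-zero."""
    try:
        num, den = float(numerator), float(denominator)
    except (TypeError, ValueError):
        return None
    if not (math.isfinite(num) and math.isfinite(den)) or den == 0:
        return None
    return num / den


def apply_rules(rows):
    """Return copies of rows (dicts) with derived med_state and tap_rate columns."""
    derived = []
    for row in rows:
        new_row = dict(row)  # copy so the loaded data is never modified in place
        new_row["med_state"] = keyword_map(row.get("medtimingstring"), MED_STATE_MAP)
        new_row["tap_rate"] = safe_ratio(row.get("tap_count"), row.get("duration_seconds"))
        derived.append(new_row)
    return derived


rows = [
    {"medtimingstring": "Immediately before medication", "tap_count": "120", "duration_seconds": "20"},
    {"medtimingstring": "Another time", "tap_count": "90", "duration_seconds": "0"},
]
for r in apply_rules(rows):
    print(r["med_state"], r["tap_rate"])
# BEFORE 6.0
# None None
```

Whole-word matching is a deliberate choice in this sketch: with plain substring matching, the keyword `no` would also match "Another time" and mislabel it as NO_MEDS. Returning `None` instead of raising keeps unusable rows in the data so QC can flag them.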
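
For a visit-based clinical dataset, map the roles like this:

- `entity_id` → `patno`
- `time_col` → `infodt` or `visdate`
- `group_col` → `cohort` or `sex`
- `outcome_col` → `updrs_totscore` or similar

Then add a **Months Since Baseline** rule to convert visit dates into a numeric visit month.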
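
A Months Since Baseline rule can be sketched as below. Assumptions: baseline is each patient's earliest parseable date, dates arrive as `YYYY-MM-DD` or `MM/YYYY`, and the result counts whole calendar months (day of month ignored). The helper names and the `P1`/`P2` rows are made up for illustration.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match your own date columns.
DATE_FORMATS = ("%Y-%m-%d", "%m/%Y")


def parse_date(value):
    """Return a date for the first matching format, or None if nothing matches."""
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date()
        except ValueError:
            continue
    return None


def months_since_baseline(rows, id_col, date_col, out_col="months_since_baseline"):
    """Add out_col = calendar months between each row's date and that patient's earliest date."""
    # Pass 1: the earliest parseable date per patient is treated as baseline
    baseline = {}
    for row in rows:
        when = parse_date(row.get(date_col))
        if when is None:
            continue
        pid = row.get(id_col)
        if pid not in baseline or when < baseline[pid]:
            baseline[pid] = when
    # Pass 2: whole-month difference; day of month is ignored
    result = []
    for row in rows:
        new_row = dict(row)
        when = parse_date(row.get(date_col))
        base = baseline.get(row.get(id_col))
        if when is None or base is None:
            new_row[out_col] = None  # unparseable dates stay visible for QC
        else:
            new_row[out_col] = (when.year - base.year) * 12 + (when.month - base.month)
        result.append(new_row)
    return result


visits = [
    {"patno": "P1", "infodt": "02/2011"},
    {"patno": "P1", "infodt": "08/2011"},
    {"patno": "P1", "infodt": "02/2012"},
    {"patno": "P2", "infodt": "2011-05-20"},
]
print([v["months_since_baseline"] for v in months_since_baseline(visits, "patno", "infodt")])
# [0, 6, 12, 0]
```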
```text
pd_insight_studio/
├── main.py           # GUI entrypoint
├── standardise.py    # Column normalisation, type inference, time parsing
├── rules.py          # Rule builder — keyword map, formula, months-since-baseline
├── qc.py             # Quality control — transparent flagging
├── insights.py       # Descriptive stats, group comparisons, progression, narrative
├── plotting.py       # All matplotlib/seaborn figures
├── modeling.py       # Logistic/Ridge regression + Random Forest with metrics
├── requirements.txt
└── README.md
```
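
`insights.py` is described as producing group comparisons with effect sizes, but which effect size it reports is not stated. As an assumption, this sketch uses Hedges' g (Cohen's d with a small-sample correction) for every pair of groups, such as BEFORE vs AFTER. The function names and the example tap rates are hypothetical.

```python
import math
import statistics
from itertools import combinations


def cohens_d(a, b):
    """Mean difference (a - b) divided by the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("both groups are constant; effect size is undefined")
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d times the small-sample correction J = 1 - 3 / (4 * (n1 + n2) - 9)."""
    n = len(a) + len(b)
    return cohens_d(a, b) * (1 - 3 / (4 * n - 9))


def pairwise_effect_sizes(values_by_group):
    """Hedges' g for every pair of groups, keyed by (first_group, second_group)."""
    return {
        (g1, g2): hedges_g(values_by_group[g1], values_by_group[g2])
        for g1, g2 in combinations(sorted(values_by_group), 2)
    }


# Made-up tap rates, for illustration only
tap_rates = {
    "BEFORE": [5.1, 5.8, 6.0, 5.5],
    "AFTER": [6.2, 6.9, 7.1, 6.4],
    "NO_MEDS": [6.8, 7.0, 7.4, 6.9],
}
for pair, g in pairwise_effect_sizes(tap_rates).items():
    print(pair, round(g, 2))
```

A positive g means the first group in the pair has the higher mean.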

`requirements.txt`:

```text
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scipy>=1.10.0
scikit-learn>=1.3.0
pillow>=10.0.0
openpyxl>=3.1.0
```
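
To confirm the environment before launching, a standard-library-only checker could compare installed versions against the minimums in `requirements.txt` and confirm that `tkinter` imports. This script is not part of the project; its structure and function names are assumptions.

```python
import importlib.metadata as md
import re
import sys

# Minimum versions copied from requirements.txt (distribution name -> minimum).
MINIMUMS = {
    "pandas": "2.0.0", "numpy": "1.24.0", "matplotlib": "3.7.0", "seaborn": "0.12.0",
    "scipy": "1.10.0", "scikit-learn": "1.3.0", "pillow": "10.0.0", "openpyxl": "3.1.0",
}


def version_tuple(text):
    """Leading numeric release parts only: '2.1.4rc1' -> (2, 1, 4)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at a suffix such as 'rc1' or 'post1'
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding, so '1.24' counts as '1.24.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment():
    """Return a list of problems; an empty list means everything checked out."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} found; this README targets 3.11")
    try:
        import tkinter  # noqa: F401  (ships with the standard Windows installer)
    except ImportError:
        problems.append("tkinter cannot be imported")
    for dist, minimum in MINIMUMS.items():
        try:
            installed = md.version(dist)
        except md.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{dist} {installed} is older than the required {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

The version parsing is deliberately simple (pre-release suffixes are ignored); the third-party `packaging` library's `Version` class implements the full versioning rules if you need them.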

- Not hard-coded to mPower: any CSV dataset works, because you configure the column roles yourself.
- No silent deletions: QC flags rows with reasons, and you decide what to keep.
- Predictions are exploratory: all modelling output is clearly labelled as having no clinical validity.
- Large files: QC and modelling run in background threads so the UI does not freeze.
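
The "no silent deletions" principle can be illustrated with a small flagging sketch: each check is a (reason, predicate) pair, and every row is kept and annotated with the reasons it failed. The helper names and the 0-20 taps/s threshold are hypothetical, chosen only to show the pattern; real thresholds are set in the Configure tab.

```python
def missing(col):
    """Check that fails when col is absent or blank."""
    return lambda row: row.get(col) in (None, "")


def outside(col, low, high):
    """Check that fails when col is numeric but outside [low, high] (NaN also fails)."""
    def check(row):
        try:
            value = float(row.get(col))
        except (TypeError, ValueError):
            return False  # leave missing or non-numeric values to a missing() check
        return not (low <= value <= high)
    return check


def run_qc(rows, checks):
    """Flag every row with the reasons it failed; no row is ever dropped."""
    report = []
    for index, row in enumerate(rows):
        reasons = [reason for reason, failed in checks if failed(row)]
        report.append({"row": index, "passed": not reasons, "reasons": reasons})
    return report


# Hypothetical thresholds for illustration only, not clinically derived.
CHECKS = [
    ("missing patient ID", missing("healthcode")),
    ("tap_rate outside 0-20 taps/s", outside("tap_rate", 0, 20)),
]

rows = [{"healthcode": "a1", "tap_rate": 6.0}, {"healthcode": "", "tap_rate": 45.0}]
for entry in run_qc(rows, CHECKS):
    print(entry)
# {'row': 0, 'passed': True, 'reasons': []}
# {'row': 1, 'passed': False, 'reasons': ['missing patient ID', 'tap_rate outside 0-20 taps/s']}
```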
| Problem | Fix |
|---|---|
| `ModuleNotFoundError` | Make sure `.venv` is activated and that you ran `pip install -r requirements.txt` |
| Window doesn't open | Check that your Python version is 3.11; run `python -m tkinter` to verify that Tkinter works |
| Plots don't show | Ensure matplotlib is installed, then restart the app |
| Large file is slow to load | This is normal for files over 100k rows. Wait for loading to finish, or split the file |
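
If you choose to split a large file, a streaming sketch with the standard `csv` module avoids loading the whole file into memory. The function name, the `_partN` naming scheme, and UTF-8 input are assumptions.

```python
import csv
from pathlib import Path


def split_csv(path, rows_per_file=100_000, out_dir=None):
    """Split a CSV into numbered parts of at most rows_per_file data rows, header repeated."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    path = Path(path)
    out_dir = Path(out_dir) if out_dir else path.parent
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    with path.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return outputs
        dst, writer, count = None, None, 0
        try:
            for row in reader:
                if writer is None or count == rows_per_file:
                    if dst is not None:
                        dst.close()
                    part = out_dir / f"{path.stem}_part{len(outputs) + 1}{path.suffix}"
                    dst = part.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(dst)
                    writer.writerow(header)  # every part keeps the column names
                    outputs.append(part)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if dst is not None:
                dst.close()
    return outputs


# Example: split_csv("taps.csv", rows_per_file=50_000) writes taps_part1.csv, taps_part2.csv, ...
```

Splitting by row count can put one patient's visits in different parts, which would distort per-patient progression analysis; for longitudinal data, splitting by patient ID is the safer choice.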