Senior Data Scientist | Statistical Genetics | Machine Learning
Senior Data Scientist with 7+ years of experience designing, validating, and deploying data-driven models across biomedical and population-scale datasets. Strong background in statistics, applied machine learning, and bioinformatics, with experience spanning research, production pipelines, and stakeholder-facing analytics.
- Design end-to-end analytical pipelines (QC → modeling → validation → reporting)
- Translate complex statistical results into actionable insights
- Ensure reproducibility, scalability, and robustness of analyses
- Collaborate with cross-functional teams (research, engineering, product)
- Mentor junior researchers and data scientists
- Polygenic Risk Scores (PRS) & genetic risk prediction
- GWAS, WES/WGS, large-scale genomic data
- Epidemiology & biostatistics
- Multi-omics integration
- Predictive modeling for complex diseases
Languages
- R, Python, SQL, Bash
Machine Learning & Statistics
- Regularized regression (Lasso, Ridge, Elastic Net)
- Tree-based models (XGBoost, LightGBM)
- Model evaluation & calibration
- Feature engineering for structured data
Genomics & Bioinformatics
- PLINK, Hail, GWAS workflows
- WES/WGS analysis
- Pathway & gene-set enrichment
Data & Infrastructure
- Linux, HPC (SLURM)
- Cloud: AWS, GCP
- Version control: Git
- BI & Visualization: Power BI, Tableau, ggplot2, Matplotlib, Plotly
- Pathway-specific PRS in Epilepsy
  Developed and validated pathway-level PRS models across generalized and focal epilepsy cohorts.
- Scalable GWAS & PRS Pipelines
  Built reproducible pipelines handling large-scale genomic datasets with rigorous QC and validation.
- Risk Prediction & Analytics Dashboards
  Designed dashboards to communicate complex genetic risk results to non-technical stakeholders.
- Contributions to international consortia (ILAE, Epi25)
- Reviewer for peer-reviewed genetics and epidemiology journals
- Focus on translating genetic insights into interpretable risk models