JPonsa

About me

Intellectually curious Health Data Scientist passionate about the data revolution in healthcare. I am particularly interested in creating innovative AI/ML solutions that deliver high value in the pharmaceutical and healthcare sectors. After 6+ years of consultancy experience delivering complex data products and assisting large client organisations, I wish to expand my Data Science skills and knowledge. My goal is to pivot my career towards a Data Scientist role in the healthcare industry.

To learn more about me, please feel free to contact me or visit my LinkedIn profile.

Project summary

Main Portfolio Project

As part of my Health Data Science MSc dissertation at UCL, I built a Knowledge Graph Retrieval-Augmented Generation (KG-RAG) system that leverages Large Language Models to efficiently interrogate and analyse a large collection of clinical trial protocols from ClinicalTrials.gov.

Key learnings:

Deploying open-source Large Language Models (LLMs), such as Llama 3 or Mixtral 8x7B, on High-Performance Computing (HPC) infrastructure using vLLM.
Processing semi-structured clinical trial protocols using a NoSQL database (MongoDB).
Creating and hosting a Knowledge Graph using BioCypher and Neo4j AuraDB.
Implementing a ReAct design using DSPy, with custom tools that an LLM can use to query Knowledge Graphs and SQL databases.
Using high-level frameworks such as LlamaIndex and LangChain for text-to-SQL and text-to-Cypher.
Evaluating Large Language Models.

Do you want to know more about this project?

Full Portfolio

Below is a summary of a few projects showcasing my Data Science skills.

Skill \ Project	UCI Heart Disease	Card Fraud	Disaster Tweets	Causal Impact
Business question	Diagnose which patients have heart disease	Detect likely fraudulent transactions	Identify disaster events mentioned in text/tweets	Quantify the effect of the COVID lockdown on stock prices
Language	Python	Python	Python	Python / R
ML type	Classifier	Classifier	NLP Classifier	Time Series Regression
Data Engineering		PySpark
Feature Engineering		Time Series Features	Word Embedding
Over / Under sampling		SMOTE
Traditional ML	scikit-learn	scikit-learn		Causal Impact
Gradient Boosting	XGBoost	CatBoost
Deep Learning			LSTM, GRU, DistilBERT
Hyperparameter tuning		Optuna
Explainable ML	SHAP Values
User Interface	Streamlit
MLOps		MLflow	MLflow

Volunteering

I worked with the NHS Pycom on the development of nhspy-plothedots, a package for Statistical Process Control analysis and plotting. My main contribution was creating unit test scripts. This gave me an opportunity to (a) learn more about the package so that I can contribute in other areas in the future and (b) practise software development skills (e.g. unit testing, raising pull requests) that I have used in my professional career but that may not show up in my Data Science portfolio.
