JCharline/ei_cs_2023

eds-tutorial

About

This tutorial introduces some of the issues that arise when analyzing real-world data made available for research in clinical data warehouses. It is aimed at data scientists who have mastered the basics of Python programming and data analysis. The tutorial is divided into a series of short exercises and a final project. The short exercises each illustrate a specific issue, whereas the final project mimics an end-to-end research study of the kind that may be reported in a scientific article.

The data are fake, so this project can be shared freely without affecting patients’ privacy. A fake data generator is provided and can be tuned to illustrate various use cases. Its design is loosely inspired by the characteristics and issues observed while analyzing data from the Greater Paris University Hospitals.

The 2023 session for CentraleSupélec used version 0.0.1.

Getting started

Environment and kernel creation

Python, JupyterLab and an environment manager are recommended. Anaconda is one possible choice.

First, clone the project locally: git clone {URL}

If you use Conda as an environment manager, create a new Python environment with the required packages:

  1. conda create -n eds-tuto python=3.7
  2. conda activate eds-tuto
  3. pip install -r requirements.txt

Create and name a Jupyter kernel related to this virtual environment:

  4. pip install --user ipykernel
  5. python -m ipykernel install --user --name eds_tutorial

A kernel named eds_tutorial is now available in JupyterLab!
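To confirm that the setup worked, a quick check can be run from inside the activated environment. The following is only a minimal sketch, not part of the project: the helper names python_matches and registered_kernels are hypothetical, and the kernel lookup assumes that jupyter_client (normally installed alongside ipykernel) is available.

```python
import sys


def python_matches(expected=(3, 7), actual=None):
    """Return True if the interpreter's major.minor version equals `expected`."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(expected)


def registered_kernels():
    """Return the names of the Jupyter kernels visible to this environment.

    Returns an empty set if jupyter_client is not installed.
    """
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return set()
    # find_kernel_specs() maps kernel names to their resource directories.
    return set(KernelSpecManager().find_kernel_specs())


print("Python 3.7 interpreter:", python_matches())
print("eds_tutorial kernel registered:", "eds_tutorial" in registered_kernels())
```

If either line prints False, check that the eds-tuto environment is activated and rerun the corresponding step.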

NB: For VS Code users, to make the plots clearly visible, it is recommended to enable the Theme Matplotlib Plots option under Settings > Extensions > Jupyter.

Scientific libraries installation

The following scientific libraries, developed in the context of the Paris clinical data warehouse, may also help with some of the exercises:

  • eds-scikit: a set of tools to assist data scientists working on a clinical data warehouse (structured data).
  • edsnlp: a set of spaCy components that are used to extract information from clinical notes written in French (unstructured data).
  • edsteva: a set of tools to measure indicators describing data quality and its temporal variation (quality indicators).

To install these libraries:

  1. conda activate eds-tuto
  2. pip install edsnlp
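After installation, it can be useful to check which of these libraries are importable from the current environment. The sketch below uses only the standard library. The helper missing_modules is hypothetical, and the import names edsnlp, eds_scikit and edsteva are assumptions about how the three packages are imported. Since the steps above install only edsnlp, the other two may be reported as missing.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the three libraries listed above.
libraries = ["edsnlp", "eds_scikit", "edsteva"]

missing = missing_modules(libraries)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All libraries are importable.")
```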

Acknowledgement

We would like to thank Assistance Publique – Hôpitaux de Paris and AP-HP Foundation for funding this project.
