Description

This repository contains our code for a competition organised by Centrale Supelec and Illuin Technology. To learn more about the tasks, see the folder Explication dataset or the final presentation in presentation.

The competition had two parts. The first consisted of three tasks covering NER, NLI and text classification. The second was to build a search engine that finds patients based on filters and a search query.

Mainly used technologies

Transformers library by HuggingFace
SciBERT
BioBERT
ELECTRAMed
MiniLM-L6
Streamlit
Flask
Annoy

How to use

Evaluation

First, download the evaluation submodule:

$ git submodule init
$ git submodule update

Build dataset

The data is available at https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

The initial data must be laid out as follows:

medical_txt_parser
	├── Explication dataset/
	├── train_data/
		├── beth/
			├── ast/
				...
				└── record-13.ast
			├── concept/
				...
				└── record-13.con
			├── rel/
				...
				└── record-13.rel
			└── txt/
				...
				└── record-13.txt
		└── partners/
			├── ast/
				...
				└── record-10.ast
			├── concept/
				...
				└── record-10.con
			├── rel/
				...
				└── record-10.rel
			└── txt/
				...
				└── record-10.txt
	└── src/
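
Before building, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_paths` is hypothetical, and the mapping from folder to file suffix is inferred only from the example file names in the tree (for instance `record-13.con` inside `concept`).

```python
from pathlib import Path

# Folder names and suffixes are read off the tree above (assumption:
# every sub-folder holds at least one file with the suffix shown there).
SOURCES = ("beth", "partners")
SUBFOLDERS = {"ast": ".ast", "concept": ".con", "rel": ".rel", "txt": ".txt"}


def missing_paths(root):
    """Return one message per expected folder that is absent or has no matching file."""
    root = Path(root)
    problems = []
    for source in SOURCES:
        for folder, suffix in SUBFOLDERS.items():
            directory = root / "train_data" / source / folder
            if not directory.is_dir():
                problems.append(f"{directory} is missing")
            elif not any(directory.glob(f"*{suffix}")):
                problems.append(f"{directory} has no {suffix} files")
    return problems
```

Calling `missing_paths(".")` from the root of the project returns an empty list when all eight folders exist and each contains at least one file with the expected suffix.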

Then execute the following command to build the dataset from the root of the project:

$ ./src/data_merger.sh

To prepare the embeddings and clusters for the search API:

$ cd src
$ python -m clustering.prepare_embeddings

To launch the app, run the following from the root directory of the project:

$ python src/api.py
$ streamlit run app/search_engine.py
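
If both commands turn out to be long-running servers (an assumption, not something stated above), they cannot be run one after the other in a single shell. The sketch below shows one way to start them together from Python. The `launch` helper is hypothetical and not part of the repository, and `python -m streamlit run` is used as an equivalent of the `streamlit run` command.

```python
import subprocess
import sys

# Commands taken from the instructions above; sys.executable stands in for `python`.
API_CMD = [sys.executable, "src/api.py"]
APP_CMD = [sys.executable, "-m", "streamlit", "run", "app/search_engine.py"]


def launch(project_root=".", api_cmd=API_CMD, app_cmd=APP_CMD):
    """Start the API in the background, run the app in the foreground,
    and stop the API once the app exits. Returns the app's exit code."""
    api = subprocess.Popen(api_cmd, cwd=project_root)
    try:
        return subprocess.call(app_cmd, cwd=project_root)
    finally:
        api.terminate()  # stop the background API process
        api.wait()       # reap it so no zombie process is left behind
```

Calling `launch()` from the root directory of the project would then replace the two commands. Running each command in its own terminal achieves the same result without any extra code.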

Mustapha-AJEGHRIR/medical_txt_parser
