GitHub - KimChengSHEANG/MedCWI: Complex Word Identification (CWI) for French medical documents

Install dependencies

Tested on Python 3.7.5 with pyenv.

pip install -r requirements.txt

CNN model

To train the CNN model, run the following script:

python scripts/train_cnn.py

The first run takes around 30 minutes (not including training) to download resources and preprocess the data.

To customize the features, edit the features_args list in each training script, e.g.:

# Option 1: features containing only the CamemBERT embedding
features_args = ['CamemBertEmbeddingFeature']

# Option 2: features containing only the FastText embedding and word length
features_args = ['FastTextEmbeddingFeature', 'WordLengthFeature']

# n=1 trains the model once; n=5 trains it 5 times.
train_and_evaluate_n_times(features_args, n=1)

All model checkpoints and reports are saved to the folder models/FR/*

Features

FastTextEmbeddingFeature
CamemBertEmbeddingFeature
WordLengthFeature
WordSyllableFeature
VowelCountFeature
TFIDFFeature
WordRankFeature
LangGenFrequencyFeature
ClearFrequencyFeature
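
To give a rough idea of what the simpler word-level features could compute, here is a minimal sketch. The helper names (word_length, vowel_count, build_feature_vector) and the vowel set are hypothetical and only for illustration; they are not the repository's implementation of WordLengthFeature or VowelCountFeature, which may differ.

```python
# Hypothetical helpers for illustration only, not the repository's code.
# Assumed vowel set: plain and accented French vowels, with "y" counted as a vowel.
FRENCH_VOWELS = set("aeiouyàâäéèêëîïôöùûüÿœæ")


def word_length(word: str) -> int:
    """Number of characters in the word."""
    return len(word)


def vowel_count(word: str) -> int:
    """Number of vowel characters in the word, case-insensitive."""
    return sum(1 for ch in word.lower() if ch in FRENCH_VOWELS)


def build_feature_vector(word: str) -> list:
    """Concatenate the simple features into one numeric vector."""
    return [word_length(word), vowel_count(word)]


print(build_feature_vector("cardiomyopathie"))
```

In a setup like this, each feature contributes one or more numeric columns per word, and the selected features are concatenated before being passed to the model.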

CatBoost Model

python scripts/train_catboost.py

Results

The report of each training and evaluation run is stored in /models/CNN|CatBoost|XGBoost/*/reports.txt
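
To read all reports in one go, a small helper along the following lines may be convenient. This is a sketch that assumes the layout suggested by the path pattern above (a models directory, one subfolder per model type, one subfolder per run, each containing reports.txt); find_reports and print_reports are hypothetical names, not part of the repository.

```python
from pathlib import Path


def find_reports(models_root="models"):
    """Return every reports.txt found two levels below models_root, sorted by path."""
    # Assumed layout: <models_root>/<model type>/<run>/reports.txt
    return sorted(Path(models_root).glob("*/*/reports.txt"))


def print_reports(models_root="models"):
    """Print each report preceded by a header showing its path."""
    for report in find_reports(models_root):
        print(f"=== {report} ===")
        print(report.read_text(encoding="utf-8"))
```

Calling print_reports() from the repository root would print every report found. If your reports live elsewhere (for example under models/FR), pass that folder as models_root and adjust the glob pattern to match its depth.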
