tree-kernel

Code for a baseline method that automatically detects disease-chemical relationships in biomedical papers.

The method computes word embeddings over the training corpus, concatenates the embeddings of each disease-chemical pair into a single vector of about 100 dimensions, and uses these vectors to train an SVM with a quadratic kernel. Tree kernels were also tried, but they hurt classification performance compared to embeddings or bags of words.
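The feature construction and kernel described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the helper names `pair_vector` and `quadratic_kernel`, the toy vocabulary, the random vectors, and the choice of 50 dimensions per word (so that a pair gives 100) are all assumptions, and the kernel is assumed to be the standard degree-2 polynomial form.

```python
import random

EMBEDDING_DIM = 50  # assumption: two 50-d word vectors give a ~100-d pair vector


def pair_vector(embeddings, disease, chemical):
    """Concatenate the disease and chemical embeddings into one feature vector."""
    return embeddings[disease] + embeddings[chemical]  # list concatenation


def quadratic_kernel(x, y, coef0=1.0):
    """Degree-2 polynomial kernel: (x . y + coef0) ** 2."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** 2


# Random toy vectors standing in for embeddings learned from the training corpus.
rng = random.Random(0)
toy_embeddings = {
    word: [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for word in ("hypertension", "captopril", "nausea", "cisplatin")
}

x = pair_vector(toy_embeddings, "hypertension", "captopril")
y = pair_vector(toy_embeddings, "nausea", "cisplatin")

# Gram matrix over the two candidate pairs; an SVM works on values like these.
gram = [[quadratic_kernel(a, b) for b in (x, y)] for a in (x, y)]
```

With scikit-learn, the same kernel family is available through `sklearn.svm.SVC(kernel="poly", degree=2)`, which evaluates `(gamma * <x, y> + coef0) ** 2`; whether this repository uses that class or those settings is not stated here.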

The well-known CDR corpus (BioCreative) serves as both training and test corpus. The baseline reaches an accuracy of 80%. The whole experiment is self-contained: download the package and, from the package directory, type on the command line:

   python main.py

to run the experiment.
