GENERAL PURPOSE

The goal of this project is to set up a pipeline that automatically extracts relations between proteins from biomedical literature. An existing pipeline consists of various modules written in different programming languages; this project unifies them into one consistent pipeline written in Python 3.

USAGE

The pipeline is run with 'python3 main.py'. Before running it, consult config.py to adjust the pipeline's configuration. See example.py for an example of how the pipeline is run.
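config.py holds the input and output directory paths and the PMIDs to process. Assuming these settings are stored as module-level variables, a configuration might look roughly like the sketch below. The names INPUT_DIR, OUTPUT_DIR and PMIDS are hypothetical and may not match the ones used in the actual config.py.

```python
# config.py (hypothetical sketch; the real variable names may differ)
import os

# Directory from which articles are read when they are not downloaded from PubMed.
INPUT_DIR = os.path.join("data", "input")

# Directory where the pipeline writes its results.
OUTPUT_DIR = os.path.join("data", "output")

# PubMed IDs (as strings) of the articles to process; fill in your own.
PMIDS = []
```

Other modules would then read these values with a plain 'import config', for example config.PMIDS.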

STRUCTURE

The pipeline consists of several stages, each implemented as its own module. The pipeline as a whole is run from main.py.
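A rough sketch of how a driver script can chain independent stage modules: each stage takes the previous stage's output and returns its own. The function name run_pipeline is hypothetical; main.py's actual interface is not documented here.

```python
from typing import Any, Callable, Iterable


def run_pipeline(data: Any, stages: Iterable[Callable[[Any], Any]]) -> Any:
    """Feed the output of each stage into the next one, in order (hypothetical helper)."""
    for stage in stages:
        data = stage(data)
    return data


# Toy stages standing in for import, text processing, entity recognition, etc.
print(run_pipeline("  MDM2 binds p53 ", [str.strip, str.split]))
```

Keeping each stage a plain callable means a stage can be replaced or tested in isolation without touching the others.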

Configuration: All paths to the various input and output directories, as well as settings such as the PMIDs to process, can be specified in config.py.
Import: This module downloads articles from PubMed via Biopython or reads them from a file, and returns an Article.
Low-level text processing: This module uses NLTK for tokenisation and part-of-speech (PoS) tagging of the text.
Entity recognition: This module takes a supplied list of named entities and returns their positions in the text.
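The import stage can be sketched as two functions, one reading a local file and one fetching a PubMed abstract through Biopython's Entrez.efetch. The Article dataclass below is a hypothetical stand-in for the pipeline's own Article class, whose fields are not documented here, and the function names are likewise illustrative.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Article:
    """Hypothetical stand-in for the pipeline's Article class."""
    pmid: str
    text: str


def article_from_file(path: str, pmid: str) -> Article:
    """Read a plain-text article from disk."""
    return Article(pmid=pmid, text=Path(path).read_text(encoding="utf-8"))


def article_from_pubmed(pmid: str, email: str) -> Article:
    """Download a PubMed abstract as plain text via Biopython's Entrez module."""
    # Imported lazily so that reading from files works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="abstract", retmode="text")
    try:
        text = handle.read()
    finally:
        handle.close()
    return Article(pmid=pmid, text=text)
```

Downloading requires network access and a valid e-mail address; reading from a file does not.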
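Dictionary-based entity recognition of the kind described above can be sketched as an exact string search that reports character offsets. This is a simplified illustration, not necessarily the module's actual matching strategy: matching here is case-sensitive and an entity only matches when it is not embedded in a longer word. The function name find_entities is hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def find_entities(text: str, entities: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, entity) character spans for every occurrence of each entity."""
    spans = []
    for entity in set(entities):
        # Lookarounds stop "MDM" from matching inside "MDM2".
        pattern = re.compile(r"(?<!\w)" + re.escape(entity) + r"(?!\w)")
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity))
    return sorted(spans)


text = "p53 binds MDM2, and MDM2 degrades p53."
print(find_entities(text, ["p53", "MDM2"]))
# [(0, 3, 'p53'), (10, 14, 'MDM2'), (20, 24, 'MDM2'), (34, 37, 'p53')]
```

Character offsets (rather than token indices) keep the result independent of how the text-processing stage tokenises the text.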

