
Some tools built for collecting related medical information


FalsitaFine/info_collect


info_collect

Some tools written for collecting useful information.

We are working on a medical-related project for which we need a large amount of training data. Medical records are usually private and not accessible to us, so we collect less private data from social media instead.

This program generates a list of medical terms and collects text related to each term.
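The term-by-term collection loop described above might look roughly like the following minimal sketch. It is not the repo's code: `collect_for_terms`, `slugify`, and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever source (for example, a Twitter search) actually supplies text for a term.

```python
import os
import re
from typing import Callable, Dict, Iterable, List


def slugify(term: str) -> str:
    """Turn a term such as 'Breast Cancer' into a safe file name: 'breast_cancer'."""
    return re.sub(r"[^a-z0-9]+", "_", term.lower()).strip("_")


def collect_for_terms(
    terms: Iterable[str],
    fetch: Callable[[str], List[str]],
    out_dir: str,
) -> Dict[str, int]:
    """For each term, fetch related text snippets and save them to <out_dir>/<slug>.txt.

    `fetch` is a hypothetical stand-in for the real data source; it must
    return a list of strings for the given term.
    Returns a mapping from each term to the number of snippets written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts: Dict[str, int] = {}
    for term in terms:
        snippets = fetch(term)
        path = os.path.join(out_dir, slugify(term) + ".txt")
        with open(path, "w", encoding="utf-8") as fh:
            for text in snippets:
                # One snippet per line; embedded newlines are flattened to spaces.
                fh.write(text.replace("\n", " ") + "\n")
        counts[term] = len(snippets)
    return counts
```

Keeping the data source behind a callback means the same loop could be reused for any site without changing how files are laid out on disk.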

For example, the folders in this repo contain collected text related to "breast cancer" (the term list was generated by searching for "breast cancer" in the detail files). The dictionary file shows which terms are related to "breast cancer", and the twi_collection files contain text collected from Twitter.

mediterm.py: Generate a local medical dictionary by running a scraper.

related.py: For a given topic, find all related terms and generate a wordlist.

twi_collection.py: Collect text from Twitter.
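A minimal sketch of the related.py step, assuming a hypothetical layout in which each dictionary entry is stored as `<term>.txt` in a detail directory. `related_terms` and `write_wordlist` are illustrative names, not functions from this repo, and the plain case-insensitive substring search is just one simple way to "search 'breast cancer' in the detail files".

```python
import os
from typing import List


def related_terms(detail_dir: str, topic: str) -> List[str]:
    """Return terms whose detail file mentions `topic` (case-insensitive).

    Assumes a hypothetical layout: one UTF-8 file per term in
    `detail_dir`, named '<term>.txt', holding that term's description.
    """
    needle = topic.lower()
    found = []
    for name in sorted(os.listdir(detail_dir)):
        if not name.endswith(".txt"):
            continue  # ignore anything that is not a detail file
        with open(os.path.join(detail_dir, name), encoding="utf-8") as fh:
            if needle in fh.read().lower():
                found.append(name[: -len(".txt")])
    return found


def write_wordlist(terms: List[str], path: str) -> None:
    """Write one term per line so a collector can read the list back."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(term + "\n" for term in terms))
```

The resulting wordlist is what a collector such as twi_collection.py would then iterate over, one query per line.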

classify

A classifier demo is in the classify folder; it can be run with python3.

The idea is to use a neural network to classify a text sentence (this demo tries to classify whether a sentence is about "state fair" or "breast cancer"). With this kind of classification, we can tag the text files and make better decisions in later designs (for example, picking the best language model to use).
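The demo's exact network is not described here, so the following is a swapped-in stand-in rather than the repo's model: a single-neuron network (logistic regression) over bag-of-words counts, trained with stochastic gradient descent in pure Python. `BagOfWordsClassifier`, `tokenize`, and the `EXAMPLES` sentences are hypothetical and only illustrate the two-class "state fair" vs "breast cancer" setup.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BagOfWordsClassifier:
    """Single-neuron network (logistic regression) over word counts."""

    def __init__(self, labels: Tuple[str, str]):
        self.labels = labels  # labels[1] is treated as the "positive" class
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def _score(self, text: str) -> float:
        """Return the estimated probability that `text` belongs to labels[1]."""
        counts = Counter(tokenize(text))
        z = self.bias + sum(self.weights.get(w, 0.0) * c for w, c in counts.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid

    def fit(self, data: List[Tuple[str, str]], epochs: int = 200, lr: float = 0.1) -> None:
        """Train with per-example gradient descent on the log-loss."""
        for _ in range(epochs):
            for text, label in data:
                target = 1.0 if label == self.labels[1] else 0.0
                error = self._score(text) - target  # d(log-loss)/dz
                for w, c in Counter(tokenize(text)).items():
                    self.weights[w] = self.weights.get(w, 0.0) - lr * error * c
                self.bias -= lr * error

    def predict(self, text: str) -> str:
        return self.labels[1] if self._score(text) >= 0.5 else self.labels[0]


# Hypothetical toy training sentences, for illustration only.
EXAMPLES = [
    ("Tickets for the state fair go on sale Monday", "state fair"),
    ("We ate corn dogs and rode the ferris wheel at the fair", "state fair"),
    ("The livestock show at the state fair was packed", "state fair"),
    ("My mom finished chemotherapy for breast cancer", "breast cancer"),
    ("Get a mammogram to screen for breast cancer early", "breast cancer"),
    ("Her tumor was found during a routine mammogram", "breast cancer"),
]
```

Usage: create `clf = BagOfWordsClassifier(("state fair", "breast cancer"))`, call `clf.fit(EXAMPLES)`, then `clf.predict(...)` on a new sentence. A single neuron keeps the sketch short; tagging real collected text would need much more training data and likely a richer model.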
