Classifying Diseases by Clustering Phenotypes using Automated Text Analysis
A versatile tool to classify diseases using cluster analysis of published phenotypic data
What does PubClass do?
PubClass is a pipeline that will take your PubMed query and output clusters of phenotypes and/or genes based on their co-occurrence in the published literature.
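As a rough illustration of the query step, the sketch below builds request URLs for NCBI's public E-utilities service (ESearch to find PubMed IDs, EFetch to retrieve abstracts). The helper names `build_esearch_url` and `build_efetch_url` are hypothetical and are not taken from the PubClass code. The sketch only constructs URLs and makes no assumptions about how PubClass itself issues requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(query, retmax=100):
    """Return an ESearch URL that lists PubMed IDs matching the query.

    Hypothetical helper for illustration; not part of PubClass.
    """
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """Return an EFetch URL that retrieves plain-text abstracts for the IDs.

    Hypothetical helper for illustration; not part of PubClass.
    """
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("autism", retmax=5))
```

The resulting URLs could be opened with any HTTP client; the IDs returned by the first call would then be passed to the second.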
Why should you use PubClass?
The ability to subclassify a disease can directly affect diagnosis and treatment options for a patient. PubClass allows you to determine if a disease of interest can be further classified based on phenotypic data from the entire corpus of published data.
Autism is a brain-based disorder that affects how information is processed. Its exact cause is not known, but genetic factors are thought to be responsible for 70-90% of cases, and environmental factors also play a role.
Autism is a spectrum disorder, which means that patients can present with a wide range of conditions.
The field is currently split. Some researchers believe that a single common cause underlies all types of autism. Others hypothesize that many completely different problems can each result in a patient being diagnosed with autism.
A number of researchers have used cluster analysis to try to determine whether there are distinct subtypes of autism or a single type with varying conditions.
We take a literature-wide approach, processing all abstracts from autism-related publications. We will try cluster analysis using diseases, genes, or topics mentioned in the abstracts. Scientific analysis of the resulting clusters from all related literature may help shed light on the etiology of autism.
1. INPUT: Collect the user-supplied query (e.g. "autism") and additional clustering parameters.
2a. Extract abstracts and keywords from the results of NCBI's PubMed query.
2b. Load a dictionary to tokenize abstracts for clustering and cluster-metadata analysis.
3. Perform text processing (possibly just stemming and tokenization) to extract segments/tokens.
4. OUTPUT: Output disease-specific clusters based on co-occurrence of topics/tokens/segments.
5. Analyze the output clusters.
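The tokenization and clustering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PubClass implementation: the dictionary is assumed to be a set of lowercase single-word terms, stemming is omitted, and the clustering is a simple stand-in (terms are merged when the Jaccard similarity of the abstract sets they appear in reaches a threshold). The function names `tokenize` and `cluster_terms` and the `threshold` parameter are hypothetical.

```python
import re
from itertools import combinations


def tokenize(abstract, dictionary):
    """Return the set of dictionary terms that occur in one abstract."""
    words = set(re.findall(r"[a-z0-9]+", abstract.lower()))
    return {term for term in dictionary if term in words}


def cluster_terms(abstracts, dictionary, threshold=0.5):
    """Group terms whose sets of abstracts have Jaccard similarity >= threshold.

    Simplified stand-in for a real clustering algorithm (assumption).
    """
    # Map each term to the indices of the abstracts that mention it.
    occurrences = {term: set() for term in dictionary}
    for index, abstract in enumerate(abstracts):
        for term in tokenize(abstract, dictionary):
            occurrences[term].add(index)

    # Union-find structure: every term starts in its own cluster.
    parent = {term: term for term in dictionary}

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    # Merge every pair of terms that co-occur often enough.
    for a, b in combinations(sorted(dictionary), 2):
        union = occurrences[a] | occurrences[b]
        if not union:
            continue
        jaccard = len(occurrences[a] & occurrences[b]) / len(union)
        if jaccard >= threshold:
            parent[find(a)] = find(b)

    # Collect clusters, skipping terms that never appear in any abstract.
    clusters = {}
    for term in dictionary:
        if occurrences[term]:
            clusters.setdefault(find(term), set()).add(term)
    return sorted(sorted(cluster) for cluster in clusters.values())


if __name__ == "__main__":
    # Made-up example abstracts, not real PubMed data.
    abstracts = [
        "Epilepsy and seizures are common in autism.",
        "Seizures in children with epilepsy.",
        "Anxiety and insomnia in adolescents.",
        "Insomnia is linked to anxiety.",
    ]
    dictionary = {"epilepsy", "seizures", "anxiety", "insomnia"}
    print(cluster_terms(abstracts, dictionary))
    # -> [['anxiety', 'insomnia'], ['epilepsy', 'seizures']]
```

A threshold-based merge keeps the sketch dependency-free. A fuller pipeline would likely add stemming and multi-word term matching, and could swap in hierarchical or k-means clustering over the same co-occurrence data.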
Autism Clustering Results
Papers that have used cluster analysis
Hrdlicka M, Dudova I, Beranova I, Lisy J, Belsan T, Neuwirth J, Komarek V, Faladova L, Havlovicova M, Sedlacek Z, et al. Subtypes of autism by cluster analysis based on structural MRI data. Eur Child Adolesc Psychiatry. 2005;14(3):138–44.
Bitsika V, Sharpley CF, Orapeleng S. An exploratory analysis of the use of cognitive, adaptive and behavioural indices for cluster analysis of ASD subgroups. J Intellect Disabil Res. 2008;52(11):973–85.
Cuccaro ML, Tuchman RF, Hamilton KL, Wright HH, Abramson RK, Haines JL, Gilbert JR, Pericak-Vance M. Exploring the relationship between autism spectrum disorder and epilepsy using latent class cluster analysis. J Autism Dev Disord. 2012;42(8):1630–41.
Kitazoe N, Fujita N, Izumoto Y, Terada SI, Hatakenaka Y. Whether the Autism Spectrum Quotient consists of two different subgroups? Cluster analysis of the Autism Spectrum Quotient in general population. Autism. 2017;21(3):323–32.
Tanaka S, Oi M, Fujino H, Kikuchi M, Yoshimura Y, Miura Y, Tsujii M, Ohoka H. Characteristics of communication among Japanese children with autism spectrum disorder: a cluster analysis using the Children's Communication Checklist-2. Clin Linguist Phon. 2017;31(3):234–49.
People / Team
Syed Hussain Ather
Olaitan I. Awe