Text-Mining-Heathcare

Aim

To build a classification model that uses the text in a call's Summary and Description to assign each ticket to the appropriate Category (one of 5) and Subcategory (one of 20).

Procedures

Used the data.table, dplyr and plyr libraries to prepare the data for analysis, and the qdap and tm libraries to clean the text column: the text was converted into a corpus, then into a document-term matrix (DTM), and sparse terms were removed.
Used the wordcloud and ggplot2 libraries to visualize how the text column relates to the Category and Subcategory columns.
As part of feature engineering, removed terms that were correlated with common terms.
Built H2O-based random forest, gradient boosting, deep learning and XGBoost models and stacked them into a final ensemble, which achieved 99% accuracy on classifying categories and 65% accuracy on classifying subcategories.
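
The text preparation and feature-selection steps above were done in R with tm and qdap. The sketch below re-expresses the same ideas in plain Python (standard library only) so the mechanics are visible. It is not the project's code and rests on these assumptions. A simple regex tokenizer stands in for the tm/qdap cleaning steps. `remove_sparse_terms` follows the idea behind tm's `removeSparseTerms`: a term is kept only if it appears in more than a `(1 - sparse)` fraction of documents. "Removing correlated terms" is read here as greedily dropping any term whose absolute Pearson correlation with an already-kept term reaches a threshold. The ticket texts, the stopword set and the 0.9 threshold are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical ticket texts, for illustration only.
SAMPLE_TICKETS = [
    "insurance claim denied billing",
    "insurance claim error",
    "medication refill request",
    "medication refill billing",
    "reschedule the appointment request",
]


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a rough stand-in for tm/qdap cleaning)."""
    return re.findall(r"[a-z]+", text.lower())


def build_dtm(docs, stopwords=frozenset()):
    """Return (vocab, rows) where rows[i][j] counts term vocab[j] in document i."""
    counts = [Counter(t for t in tokenize(d) if t not in stopwords) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows


def remove_sparse_terms(vocab, rows, sparse):
    """Keep terms whose document frequency exceeds n_docs * (1 - sparse)."""
    n_docs = len(rows)
    keep = [j for j in range(len(vocab))
            if sum(1 for r in rows if r[j] > 0) > n_docs * (1 - sparse)]
    return [vocab[j] for j in keep], [[r[j] for j in keep] for r in rows]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated_terms(vocab, rows, threshold=0.9):
    """Greedily keep a term only if |r| < threshold against every term kept so far."""
    cols = list(zip(*rows))  # column-wise view of the DTM
    kept = []
    for j in range(len(vocab)):
        if all(abs(pearson(cols[j], cols[k])) < threshold for k in kept):
            kept.append(j)
    return [vocab[j] for j in kept], [[r[j] for j in kept] for r in rows]


if __name__ == "__main__":
    vocab, rows = build_dtm(SAMPLE_TICKETS, stopwords={"the"})
    vocab, rows = remove_sparse_terms(vocab, rows, sparse=0.75)
    print("after sparsity filter:", vocab)
    vocab, rows = drop_correlated_terms(vocab, rows, threshold=0.9)
    print("after correlation filter:", vocab)
```

In this toy data "insurance" always co-occurs with "claim" and "refill" always co-occurs with "medication", so the correlation filter keeps only the first term of each pair. Taking the absolute value means strongly anti-correlated terms are dropped as well; whether that suits the real ticket data is a judgment call.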

Architectshwet/Text-Mining-Heathcare

Folders and files

Latest commit

History

Repository files navigation

Text-Mining-Heathcare

Aim

Procedures

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages