The goal of this project is to develop or find a high-performing NLP model that classifies medical specialties from transcription text with high accuracy.
This project was inspired by the Kaggle dataset Medical Transcriptions (Boyle, 2019), which was scraped from mtsamples.com, a collection of transcribed medical reports (MTSamples.com, n.d.).
The project may help in understanding the language used in different kinds of medical transcriptions, and the resulting tool may be useful for other NLP analyses of medical transcription data.
Running the code requires the following libraries and packages: sklearn (installed as scikit-learn), numpy, pandas, xgboost, and keras, along with the Python standard-library modules random, re, and collections.
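As a rough illustration of the classification task described above, the following is a minimal sketch and not the project's actual pipeline. It assumes a TF-IDF representation with logistic regression from sklearn; the example texts and the two specialty labels are made up for demonstration and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for transcription text and specialty labels.
texts = [
    "patient reports chest pain and shortness of breath, ecg shows arrhythmia",
    "echocardiogram reveals reduced ejection fraction and chest pain on exertion",
    "mri of the knee shows a torn meniscus, patient scheduled for arthroscopy",
    "fracture of the left femur treated with knee and hip fixation",
]
labels = ["Cardiology", "Cardiology", "Orthopedic", "Orthopedic"]

# TF-IDF turns each transcription into a sparse word-weight vector;
# logistic regression then learns one linear decision boundary per class.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify an unseen (also made-up) snippet.
prediction = model.predict(["ecg and chest pain follow-up"])[0]
print(prediction)
```

On the real data, the same pipeline could be fitted on the transcription text and specialty columns loaded with pandas, and the linear model could be swapped for an xgboost or keras model for comparison.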
Boyle, T. (2019). Medical Transcriptions: Medical transcription data scraped from mtsamples.com. Retrieved February 22, 2021, from https://www.kaggle.com/tboyle10/medicaltranscriptions
MTSamples.com. (n.d.). Welcome to MTSamples. Retrieved February 22, 2021, from https://mtsamples.com/
RSREETech. (2020, July 16). Clinical Text Classification on Medical Transcription Kaggle Dataset #NLproc [Video file]. Retrieved April 14, 2021, from https://www.youtube.com/watch?v=EvHncKQ96jo&ab_channel=RSREETech