# Medical Named Entity Recognition
1. I want you to apply natural language processing tools to the unstructured medical data and produce clean outputs.
For example:
**Input:** Itching on the whole body might be a symptom of an underlying illness, such as liver disease, kidney disease, anemia, diabetes, thyroid problems and certain cancers. Nerve disorders. Examples include multiple sclerosis, pinched nerves and shingles (herpes zoster). Psychiatric conditions.

**Output:** Itching, illness, liver disease, kidney disease, anemia, diabetes, thyroid, cancer, Nerve disorders, sclerosis, pinched nerves, shingles, herpes zoster, Psychiatric conditions.

2. This is an interesting task and should not be very difficult. You can use NLP techniques such as sentence detection, word detection (tokenization), and stop-word removal to clean the data, and then use trained medical named entity recognition models to find medical concepts.

3. The following libraries can be explored for NLP tasks. Popular options for data cleaning include:

NLTK (Natural Language Toolkit): NLTK is a comprehensive library for NLP tasks, including data cleaning, tokenization, stemming, lemmatization, part-of-speech tagging, and more.

spaCy: spaCy is a fast and efficient NLP library that provides functionalities for tokenization, named entity recognition, and part-of-speech tagging, which can be useful for data cleaning.

TextBlob: TextBlob is built on top of NLTK and provides a simplified interface for common NLP tasks, including text cleaning and sentiment analysis.

gensim: While gensim is primarily known for topic modeling, it also offers tools for data preprocessing, such as text cleaning and word tokenization.

BeautifulSoup: BeautifulSoup is not an NLP library per se, but it is a widely used library for parsing and extracting data from HTML and XML documents, which can be helpful when dealing with web text data.

regex (re module): Python's built-in re module provides support for regular expressions, which are powerful tools for pattern matching and text cleaning.
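
The cleaning steps mentioned in point 2 (sentence detection, word detection, stop-word removal) can be sketched with only the built-in `re` module. This is a minimal illustration, not a replacement for NLTK or spaCy: the function names and the tiny `STOP_WORDS` set are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "as", "be", "might", "of", "on", "such", "the"}

def split_sentences(text):
    """Split on sentence-ending punctuation that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (inner hyphens allowed)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

text = "Itching on the whole body might be a symptom of an underlying illness. Nerve disorders."
for sentence in split_sentences(text):
    print(remove_stop_words(tokenize(sentence)))
```

A regex-based splitter like this will mis-split abbreviations such as "e.g.", which is one reason to prefer a library tokenizer on real data.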

4. For medical named entity recognition, explore this link: https://paperswithcode.com/task/medical-named-entity-recognition
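
Before reaching for a trained model, the expected output format can be illustrated with a plain dictionary lookup. The sketch below is not a trained NER model: it is a simple vocabulary matcher, and `MEDICAL_TERMS` and `find_medical_terms` are hypothetical names with a hand-picked vocabulary taken from the example output above.

```python
import re

# Hypothetical mini-vocabulary, copied from the example output above.
MEDICAL_TERMS = [
    "itching", "liver disease", "kidney disease", "anemia",
    "diabetes", "shingles", "herpes zoster", "pinched nerves",
]

def find_medical_terms(text, vocabulary):
    """Return vocabulary terms found in text, in order of first appearance."""
    # Try longer terms first so multi-word terms win over shorter overlaps.
    ordered = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    found = []
    for match in pattern.finditer(text):
        term = match.group(1).lower()
        if term not in found:  # keep each concept once
            found.append(term)
    return found

text = (
    "Itching on the whole body might be a symptom of an underlying illness, "
    "such as liver disease, kidney disease, anemia, diabetes. "
    "Examples include pinched nerves and shingles (herpes zoster)."
)
print(", ".join(find_medical_terms(text, MEDICAL_TERMS)))
```

A lookup like this only finds terms it already knows, so it serves as a baseline to compare against the trained models listed at the link above.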

In [1]:
import sqlite3

In [2]:
conn = sqlite3.connect("hierarchyMedicalConcepts.db")
cursor = conn.cursor()
# drop the table first and only then create
dropQuery= "DROP TABLE IF EXISTS hierarchyMedicalConcepts"
cursor.execute(dropQuery)
cursor.execute("""
    CREATE TABLE IF NOT EXISTS hierarchyMedicalConcepts(
        classDetailsId NVARCHAR(160),
        prefLabelParent NVARCHAR(160),
        distance1ID NVARCHAR(160) DEFAULT NULL,
        prefLabel1 NVARCHAR(160) DEFAULT NULL,
        distance2ID NVARCHAR(160) DEFAULT NULL,
        prefLabel2 NVARCHAR(160) DEFAULT NULL,
        distance3ID NVARCHAR(160) DEFAULT NULL,
        prefLabel3 NVARCHAR(160) DEFAULT NULL,
        distance4ID NVARCHAR(160) DEFAULT NULL,
        prefLabel4 NVARCHAR(160) DEFAULT NULL,
        distance5ID NVARCHAR(160) DEFAULT NULL,
        prefLabel5 NVARCHAR(160) DEFAULT NULL
    )
""")

<sqlite3.Cursor at 0x7bf47fa205c0>

In [3]:
# Inspect the table schema; use "SELECT * FROM hierarchyMedicalConcepts" to see the rows instead.
displayAllQuery = "PRAGMA table_info(hierarchyMedicalConcepts)"
cursor.execute(displayAllQuery)
results = cursor.fetchall()
print(results)

[(0, 'classDetailsId', 'NVARCHAR(160)', 0, None, 0), (1, 'prefLabelParent', 'NVARCHAR(160)', 0, None, 0), (2, 'distance1ID', 'NVARCHAR(160)', 0, 'NULL', 0), (3, 'prefLabel1', 'NVARCHAR(160)', 0, 'NULL', 0), (4, 'distance2ID', 'NVARCHAR(160)', 0, 'NULL', 0), (5, 'prefLabel2', 'NVARCHAR(160)', 0, 'NULL', 0), (6, 'distance3ID', 'NVARCHAR(160)', 0, 'NULL', 0), (7, 'prefLabel3', 'NVARCHAR(160)', 0, 'NULL', 0), (8, 'distance4ID', 'NVARCHAR(160)', 0, 'NULL', 0), (9, 'prefLabel4', 'NVARCHAR(160)', 0, 'NULL', 0), (10, 'distance5ID', 'NVARCHAR(160)', 0, 'NULL', 0), (11, 'prefLabel5', 'NVARCHAR(160)', 0, 'NULL', 0)]
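
Each tuple printed by `PRAGMA table_info` has the form (column index, name, declared type, not-null flag, default value, primary-key flag). As a small sketch of how to read that output, the following pulls out just the column names. It uses a throwaway in-memory table named `demo`, which is an assumption for this example and not part of the notebook's database.

```python
import sqlite3

# Throwaway in-memory database so the notebook's own table is left untouched.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE demo(classDetailsId NVARCHAR(160), prefLabel1 NVARCHAR(160) DEFAULT NULL)"
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn_demo.execute("PRAGMA table_info(demo)").fetchall()
column_names = [name for _cid, name, _type, _notnull, _default, _pk in columns]
print(column_names)
conn_demo.close()
```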


In [None]:
# Code here

# Nested dictionary creation for BioPortal ontology-generated hierarchy data

**Example Input:**

Class details
	id: http://purl.bioontology.org/ontology/SNMI/G-A425
	prefLabel: Malignant
	ontology: https://data.bioontology.org/ontologies/SNMI
Annotation details
	from: 1
	to: 9
	match type: PREF

	Hierarchy annotations
		Class details
			id: http://purl.bioontology.org/ontology/SNMI/G-A200
			prefLabel: Positive
			ontology: https://data.bioontology.org/ontologies/SNMI
			distance from originally annotated class: 1
		Class details
			id: http://purl.bioontology.org/ontology/SNMI/G-A000
			prefLabel: Severity of illness, NOS
			ontology: https://data.bioontology.org/ontologies/SNMI
			distance from originally annotated class: 2
		Class details
			id: http://purl.bioontology.org/ontology/SNMI/G
			prefLabel: GENERAL LINKAGE/MODIFIERS
			ontology: https://data.bioontology.org/ontologies/SNMI
			distance from originally annotated class: 3

**Example output in Nested Dictionary Form:**
hierarchies = {
    1: {
        'hierarchy_1': 'Positive',
        'hierarchy_2': 'Severity of illness, NOS',
        'hierarchy_3': 'GENERAL LINKAGE/MODIFIERS',
        'match_type': 'PREF'
    },
    2: {
        'hierarchy_1': 'Positive',
        'hierarchy_2': 'Severity of illness',
        'hierarchy_3': 'GENERAL LINKAGE/MODIFIERS',
        'match_type': 'Synon'
    }
}
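
One possible way to build a dictionary of this shape is sketched below. It assumes the annotations have already been parsed into a list of records, each with a `match_type` and a list of ancestors carrying `distance` and `prefLabel`. That record layout and the name `build_hierarchies` are assumptions for this example, not the BioPortal response format.

```python
def build_hierarchies(records):
    """Map 1-based record numbers to {'hierarchy_<distance>': label, 'match_type': ...}."""
    hierarchies = {}
    for index, record in enumerate(records, start=1):
        entry = {}
        # Walk ancestors from nearest to farthest so keys appear in distance order.
        for ancestor in sorted(record["hierarchy"], key=lambda a: a["distance"]):
            entry[f"hierarchy_{ancestor['distance']}"] = ancestor["prefLabel"]
        entry["match_type"] = record["match_type"]
        hierarchies[index] = entry
    return hierarchies

# Hypothetical pre-parsed record, mirroring the example input above.
records = [
    {
        "match_type": "PREF",
        "hierarchy": [
            {"distance": 1, "prefLabel": "Positive"},
            {"distance": 2, "prefLabel": "Severity of illness, NOS"},
            {"distance": 3, "prefLabel": "GENERAL LINKAGE/MODIFIERS"},
        ],
    },
]
print(build_hierarchies(records))
```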

Things to consider:
1. Data should be in English; translate or remove any non-English content.
2. Try to remove duplicates.
3. A very large Python dictionary may not fit in memory; consider writing the results to a JSON file instead.
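
Points 2 and 3 can be sketched together: drop repeated entries, then write the result to a JSON file. The helper name `deduplicate`, the sample entries, and the output file name are all assumptions for this example.

```python
import json
import os
import tempfile

def deduplicate(hierarchies):
    """Drop entries whose content repeats an earlier entry; renumber from 1."""
    seen = set()
    unique = {}
    for entry in hierarchies.values():
        fingerprint = json.dumps(entry, sort_keys=True)  # stable key for comparison
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique[len(unique) + 1] = entry
    return unique

# Hypothetical sample data: entries 1 and 2 are identical.
hierarchies = {
    1: {"hierarchy_1": "Positive", "match_type": "PREF"},
    2: {"hierarchy_1": "Positive", "match_type": "PREF"},
    3: {"hierarchy_1": "Pain", "match_type": "SYN"},
}
unique = deduplicate(hierarchies)

# Write to a temporary location; the file name is only an example.
path = os.path.join(tempfile.gettempdir(), "hierarchies.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(unique, f, indent=2)
print(len(unique), "entries written to", path)
```

Note that JSON object keys are always strings, so the integer keys come back as `"1"`, `"2"`, and so on when the file is loaded again.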




In [4]:
import urllib.request, urllib.error, urllib.parse
import json
import os
#from typing import None
from pprint import pprint

REST_URL = "http://data.bioontology.org"
API_KEY = "396993d0-4ce2-4123-93de-214e9b9ebcf2"
concepts_list=['sciatic pain, swollen feet, pregnancy']
# BIOPORTAL SHOWS Matches in 21 ontologies

def get_json(url):
    opener = urllib.request.build_opener()
    opener.addheaders = [('Authorization', 'apikey token=' + API_KEY)]
    return json.loads(opener.open(url).read())

def print_annotations(annotations, get_class=True):
    for result in annotations:
        class_details = result["annotatedClass"]
        if get_class:
            try:
                class_details = get_json(result["annotatedClass"]["links"]["self"])
                classDetailsId=class_details["@id"]
                prefLabelParent=str(class_details["prefLabel"])
            except urllib.error.HTTPError:
                print(f"Error retrieving {result['annotatedClass']['@id']}")
                continue
        print("Class details")
        print("\tid: " + class_details["@id"])
        print("\tprefLabel: " + str(class_details["prefLabel"]))
        print("\tontology: " + class_details["links"]["ontology"])

        print("Annotation details")
        for annotation in result["annotations"]:
            print("\tfrom: " + str(annotation["from"]))
            print("\tto: " + str(annotation["to"]))
            print("\tmatch type: " + annotation["matchType"])

        if result["hierarchy"]:
            print("\n\tHierarchy annotations")
            for annotation in result["hierarchy"]:
                try:
                    class_details = get_json(annotation["annotatedClass"]["links"]["self"])
                except urllib.error.HTTPError:
                    print(f"Error retrieving {annotation['annotatedClass']['@id']}")
                    continue
                pref_label = class_details["prefLabel"] or "no label"
                print("\t\tClass details")
                print("\t\t\tid: " + class_details["@id"])
                print("\t\t\tprefLabel: " + str(class_details["prefLabel"]))
                print("\t\t\tontology: " + class_details["links"]["ontology"])
                print("\t\t\tdistance from originally annotated class: " + str(annotation["distance"]))

                # Record this ancestor's label and ID in the slot for its distance (1-5).
                distance = int(annotation["distance"])
                if distance == 1:
                    prefLabel1 = pref_label
                    distance1ID = class_details["@id"]
                    # Reset the deeper levels to None (NULL) for each new annotated class.
                    prefLabel2 = distance2ID = None
                    prefLabel3 = distance3ID = None
                    prefLabel4 = distance4ID = None
                    prefLabel5 = distance5ID = None
                elif distance == 2:
                    prefLabel2 = pref_label
                    distance2ID = class_details["@id"]
                elif distance == 3:
                    prefLabel3 = pref_label
                    distance3ID = class_details["@id"]
                elif distance == 4:
                    prefLabel4 = pref_label
                    distance4ID = class_details["@id"]
                elif distance == 5:
                    prefLabel5 = pref_label
                    distance5ID = class_details["@id"]

            # Insert one row per annotated class once its hierarchy has been walked.
            insertQuery = (
                "INSERT INTO hierarchyMedicalConcepts "
                "(classDetailsId,prefLabelParent,prefLabel1,distance1ID,prefLabel2,distance2ID,"
                "prefLabel3,distance3ID,prefLabel4,distance4ID,prefLabel5,distance5ID) "
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
            )
            conn.execute(insertQuery, (classDetailsId, prefLabelParent, prefLabel1, distance1ID, prefLabel2, distance2ID, prefLabel3, distance3ID, prefLabel4, distance4ID, prefLabel5, distance5ID))
            conn.commit()  # persist the row so it survives closing the connection

        print("\n\n")

for text_to_annotate in concepts_list:
    # Annotate using the provided text
    annotations = get_json(REST_URL + "/annotator?text=" + urllib.parse.quote(text_to_annotate))

    # Print out annotation details
    print_annotations(annotations)

    # Annotate with hierarchy information
    annotations = get_json(REST_URL + "/annotator?max_level=5&text=" + urllib.parse.quote(text_to_annotate))
    print_annotations(annotations)

    # Annotate with prefLabel, synonym, definition returned
    annotations = get_json(REST_URL + "/annotator?include=prefLabel,synonym,definition&text=" + urllib.parse.quote(text_to_annotate))
    print_annotations(annotations, False)

Streaming output truncated to the last 5000 lines.


Class details
	id: http://purl.obolibrary.org/obo/PATO_0002210
	prefLabel: bulbous
	ontology: https://data.bioontology.org/ontologies/FLOPO
Annotation details
	from: 15
	to: 21
	match type: SYN

	Hierarchy annotations
		Class details
			id: http://purl.obolibrary.org/obo/PATO_0001865
			prefLabel: spheroid
			ontology: https://data.bioontology.org/ontologies/FLOPO
			distance from originally annotated class: 1
		Class details
			id: http://purl.obolibrary.org/obo/PATO_0002007
			prefLabel: convex 3-D shape
			ontology: https://data.bioontology.org/ontologies/FLOPO
			distance from originally annotated class: 2
		Class details
			id: http://purl.obolibrary.org/obo/PATO_0002266
			prefLabel: 3-D shape
			ontology: https://data.bioontology.org/ontologies/FLOPO
			distance from originally annotated class: 3
		Class details
			id: http://purl.obolibrary.org/obo/PATO_0000052
			prefLabel: shape
			ontology: https://data.bioon
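
The three annotator requests in the cell above differ only in their query parameters (`text`, `max_level`, `include`). As an optional sketch, the URL can be assembled with `urllib.parse.urlencode` instead of string concatenation. The helper name `annotator_url` is an assumption for this example; no request is sent here.

```python
from urllib.parse import quote, urlencode

REST_URL = "http://data.bioontology.org"

def annotator_url(text, **params):
    """Build an /annotator URL; extra keyword arguments become query parameters."""
    query = {"text": text, **params}
    # quote_via=quote encodes spaces as %20, matching urllib.parse.quote used above.
    return REST_URL + "/annotator?" + urlencode(query, quote_via=quote)

print(annotator_url("sciatic pain"))
print(annotator_url("sciatic pain", max_level=5))
print(annotator_url("sciatic pain", include="prefLabel,synonym,definition"))
```

With this approach the commas in the `include` value are percent-encoded as `%2C`, which a server decodes back to commas.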

In [6]:
cursor.execute("SELECT * FROM hierarchyMedicalConcepts")

<sqlite3.Cursor at 0x7bf47fa205c0>

In [7]:
rows = cursor.fetchall()

# Print the results
for row in rows:
    print(row)

('http://www.limics.fr/ontologies/ontoparonmed#DouleurSciatique', 'sciatic pain', 'http://www.limics.fr/ontologies/ontoparonmed#DouleurNevralgique', 'Nerve pain', 'http://www.limics.fr/ontologies/ontoparonmed#DouleurNeurologique', 'neurological pain', 'http://www.limics.fr/ontologies/ontoparonmed#DouleurSpecifiee', 'specified pain', 'http://www.limics.fr/ontologies/ontoparonmed#Douleur', 'Pain function', 'http://www.limics.fr/ontologies/ontoparonmed#SigneClinique', 'clinical sign')
('http://www.tcmkg.com/ISPO/ISPO_00002087', 'Sciatica, bilateral', 'http://www.tcmkg.com/ISPO/ISPO_00001076', 'Polyarthralgias', 'http://www.tcmkg.com/ISPO/ISPO_00001816', 'Pain', 'http://www.tcmkg.com/ISPO/ISPO_00002649', 'Musculoskeletal system symptoms', 'http://www.tcmkg.com/ISPO/ISPO_99999999', 'symptoms', None, None)
('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C53794', 'Adverse Event Associated with Pain', 'http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C53781', 'Adverse Event by CTCAE Cat

In [8]:
print(cursor.execute("SELECT count(*) FROM hierarchyMedicalConcepts").fetchall())

[(213,)]
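
Since one of the considerations listed earlier is removing repeated entries, it may be worth comparing the total row count with the number of distinct rows. The sketch below shows the idea on a throwaway in-memory table; the table name `concepts` and its sample rows are assumptions for this example, not data from the notebook.

```python
import sqlite3

# Throwaway in-memory table with one deliberately repeated row.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE concepts(classDetailsId TEXT, prefLabelParent TEXT)")
demo.executemany(
    "INSERT INTO concepts VALUES (?, ?)",
    [
        ("id-1", "sciatic pain"),
        ("id-1", "sciatic pain"),  # exact duplicate of the row above
        ("id-2", "Pain"),
    ],
)

total = demo.execute("SELECT count(*) FROM concepts").fetchone()[0]
distinct = demo.execute(
    "SELECT count(*) FROM (SELECT DISTINCT * FROM concepts)"
).fetchone()[0]
print(f"{total} rows, {distinct} distinct")
demo.close()
```

If the two counts differ on the real table, `SELECT DISTINCT *` gives the de-duplicated rows.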
