# Exercise 5

## Parsing MeSH Data (`desc2023.xml`)

We aim to extract the `DescriptorName` associated with the `DescriptorUI` `D007154` from the MeSH XML data.

### Steps:
1. Download the `desc2023.xml` file from the provided URL.
2. Read the XML file using the `xml.etree.ElementTree` module in Python.
3. Traverse the XML tree to identify the `DescriptorUI` with value `D007154`.
4. Extract and display the associated `DescriptorName`.
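`desc2023.xml` is a large file, so re-running step 1 on every notebook restart is wasteful. The sketch below downloads it only when no local copy exists. `ensure_local_copy` is a hypothetical helper that is not part of the original exercise. The `.part` temporary file is also an assumption: it keeps an interrupted download from being mistaken for a complete cached copy.

```python
import os
import urllib.request

# URL copied from the download cell of this notebook
MESH_URL = "https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2023.xml"


def ensure_local_copy(url, path):
    """Download `url` to `path` unless `path` already exists; return `path`.

    Hypothetical helper. The data is first written to `path + ".part"` and
    renamed only after urlretrieve returns, so a partial file never ends up
    at `path`.
    """
    if os.path.exists(path):
        return path  # reuse the cached copy without touching the network
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)  # rename the finished download into place
    return path
```

A call such as `ensure_local_copy(MESH_URL, "desc2023.xml")` could stand in for the plain `urlretrieve` call below.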

### Python Code Implementation:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Download the XML file
url = "https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2023.xml"
urllib.request.urlretrieve(url, "desc2023.xml")
```

```
('desc2023.xml', <http.client.HTTPMessage at 0x7f9f09aaf8b0>)
```

```python
def get_descriptor_name(file_name, descriptor_ui):
    """Return the DescriptorName string for descriptor_ui, or None if not found."""
    tree = ET.parse(file_name)
    root = tree.getroot()

    # Each DescriptorRecord is a direct child of the root DescriptorRecordSet
    for descriptor in root.findall('DescriptorRecord'):
        # findtext returns None instead of raising when an element is missing
        if descriptor.findtext('DescriptorUI') == descriptor_ui:
            return descriptor.findtext('DescriptorName/String')
    return None
```
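`ET.parse` reads the whole file into memory before the search begins. A possible alternative is a streaming sketch built on `ET.iterparse`. It handles each `DescriptorRecord` as soon as that record has been fully parsed, clears it after inspection, and stops at the first match. `find_descriptor_name_streaming` is a hypothetical name, and the sketch assumes the same `DescriptorRecord` / `DescriptorUI` / `DescriptorName/String` layout that the code above uses.

```python
import xml.etree.ElementTree as ET


def find_descriptor_name_streaming(file_name, descriptor_ui):
    """Stream the file and return DescriptorName/String for descriptor_ui, or None.

    Hypothetical alternative to get_descriptor_name: parses incrementally,
    clears each record once inspected, and returns at the first match.
    """
    with open(file_name, "rb") as fh:
        # An 'end' event fires once an element and all its children are parsed
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "DescriptorRecord":
                continue
            # findtext only looks at direct children, so nested DescriptorUIs
            # (e.g. inside cross-references) are not matched by mistake
            if elem.findtext("DescriptorUI") == descriptor_ui:
                return elem.findtext("DescriptorName/String")
            elem.clear()  # release the finished record's children
    return None
```

Usage: `find_descriptor_name_streaming("desc2023.xml", "D007154")`.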

```python
descriptor_ui_to_search = "D007154"
name = get_descriptor_name("desc2023.xml", descriptor_ui_to_search)
```

```python
if name:
    print(f"The DescriptorName associated with DescriptorUI {descriptor_ui_to_search} is: {name}")
else:
    print(f"No DescriptorName found for DescriptorUI {descriptor_ui_to_search}")
```

```
The DescriptorName associated with DescriptorUI D007154 is: Immune System Diseases
```


## Finding DescriptorUI for a Given DescriptorName
We aim to extract the `DescriptorUI` (MeSH Unique ID) associated with the `DescriptorName` "Nervous System Diseases" from the MeSH XML data.

```python
def get_descriptor_ui(file_name, descriptor_name_target):
    """Return the DescriptorUI whose DescriptorName equals descriptor_name_target, or None."""
    tree = ET.parse(file_name)
    root = tree.getroot()

    # Traverse the XML tree to find the desired DescriptorName.
    # findtext returns None (instead of raising) if DescriptorName/String is missing.
    for descriptor in root.findall('DescriptorRecord'):
        if descriptor.findtext('DescriptorName/String') == descriptor_name_target:
            return descriptor.findtext('DescriptorUI')
    return None
```
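Every helper in this notebook calls `ET.parse` again on the same large file. A minimal sketch that parses the file once and builds lookup dictionaries in both directions is shown below. `build_descriptor_index` is a hypothetical helper, and it assumes that each descriptor name belongs to exactly one `DescriptorUI`.

```python
import xml.etree.ElementTree as ET


def build_descriptor_index(source):
    """Parse `source` (a filename or binary file object) once.

    Returns (ui_to_name, name_to_ui). Hypothetical helper; assumes each
    DescriptorName string is used by only one DescriptorRecord.
    """
    root = ET.parse(source).getroot()
    ui_to_name, name_to_ui = {}, {}
    for record in root.findall("DescriptorRecord"):
        ui = record.findtext("DescriptorUI")
        name = record.findtext("DescriptorName/String")
        if ui is None or name is None:
            continue  # skip records missing either field
        ui_to_name[ui] = name
        name_to_ui[name] = ui
    return ui_to_name, name_to_ui
```

After a single call to `build_descriptor_index("desc2023.xml")`, both lookups in this exercise become dictionary accesses: `ui_to_name["D007154"]` and `name_to_ui["Nervous System Diseases"]`.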

```python
descriptor_name_to_search = "Nervous System Diseases"
ui = get_descriptor_ui("desc2023.xml", descriptor_name_to_search)
```

```python
if ui:
    print(f"The DescriptorUI associated with DescriptorName \"{descriptor_name_to_search}\" is: {ui}")
else:
    print(f"No DescriptorUI found for DescriptorName \"{descriptor_name_to_search}\"")
```

```
The DescriptorUI associated with DescriptorName "Nervous System Diseases" is: D009422
```


## Extracting DescriptorNames of Common Descendants in MeSH Data

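Our goal is to find the descriptor names in the MeSH hierarchy that are descendants of both "Nervous System Diseases" and `D007154`. The hierarchy is encoded in the `TreeNumber` values. A descendant's tree number extends its ancestor's tree number with additional dot-separated segments; for example, `C10.228` lies below `C10`. A descriptor can carry several tree numbers, so a single disease can sit under both branches at once.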

### Steps:
1. Extract the `TreeNumber` for both "Nervous System Diseases" and `D007154`.
2. Traverse the XML to find descendants (based on `TreeNumber`) for both terms.
3. Determine common descendants by intersecting both lists.
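The descendant test in step 2 can be sketched as a dot-delimited prefix check. `is_descendant` and `under_any` are hypothetical helper names, and the tree numbers in the example line are made up for illustration. Requiring `ancestor + "."` as the prefix prevents a plain `startswith` match on a look-alike such as a hypothetical `C100` under `C10`. It also excludes the ancestor term itself.

```python
def is_descendant(tree_number, ancestor):
    """Return True if tree_number lies strictly below ancestor.

    MeSH tree numbers are dot-separated paths, so the ancestor must be
    followed by '.' for the match to count.
    """
    return tree_number.startswith(ancestor + ".")


def under_any(tree_numbers, ancestors):
    """Return True if any of a descriptor's tree numbers is below any ancestor."""
    return any(is_descendant(tn, a) for tn in tree_numbers for a in ancestors)


# Hypothetical descriptor with tree numbers in two branches: a common descendant
example = ["C10.228", "C20.111"]
print(under_any(example, {"C10"}) and under_any(example, {"C20"}))  # True
```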


```python
def get_tree_numbers_for_descriptor(file_name, descriptor_ui=None, descriptor_name=None):
    """Collect the TreeNumbers of the descriptor matching descriptor_ui or descriptor_name."""
    tree = ET.parse(file_name)
    root = tree.getroot()
    tree_numbers = set()

    for descriptor in root.findall('DescriptorRecord'):
        ui = descriptor.findtext('DescriptorUI')
        name = descriptor.findtext('DescriptorName/String')

        if (descriptor_ui and descriptor_ui == ui) or (descriptor_name and descriptor_name == name):
            # A descriptor can appear at several places in the hierarchy,
            # so it may carry more than one TreeNumber
            for tree_number_element in descriptor.findall('TreeNumberList/TreeNumber'):
                tree_numbers.add(tree_number_element.text)

    return tree_numbers
```

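```python
def get_descendant_names(file_name, tree_numbers):
    """Return names of descriptors located strictly below any of the given tree numbers."""
    tree = ET.parse(file_name)
    root = tree.getroot()
    names = set()

    for descriptor in root.findall('DescriptorRecord'):
        for tree_number_element in descriptor.findall('TreeNumberList/TreeNumber'):
            tree_number = tree_number_element.text or ""
            # Require a '.' after the ancestor so that only true descendants match;
            # this also excludes the ancestor term itself
            if any(tree_number.startswith(target + ".") for target in tree_numbers):
                names.add(descriptor.findtext('DescriptorName/String'))
                break  # one matching TreeNumber is enough for this descriptor

    return names
```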
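A possible single-pass variant parses the file once and checks, for each record, whether it has a tree number below each of the two ancestor sets. It does not build two name sets and intersect them. `common_descendant_names` is a hypothetical helper. It should produce the same set as the intersection computed later in this notebook, assuming each name belongs to exactly one descriptor.

```python
import xml.etree.ElementTree as ET


def common_descendant_names(source, ancestors_a, ancestors_b):
    """Names of descriptors with a tree number below ancestors_a AND one below ancestors_b.

    Hypothetical helper; `source` is a filename or binary file object.
    """
    def below(tree_numbers, ancestors):
        # Dot-delimited prefix test: 'C10.228' is below 'C10', 'C10' itself is not
        return any(tn.startswith(a + ".") for tn in tree_numbers for a in ancestors)

    root = ET.parse(source).getroot()
    names = set()
    for record in root.findall("DescriptorRecord"):
        tree_numbers = [el.text for el in record.findall("TreeNumberList/TreeNumber") if el.text]
        if below(tree_numbers, ancestors_a) and below(tree_numbers, ancestors_b):
            names.add(record.findtext("DescriptorName/String"))
    return names
```

Usage: `common_descendant_names("desc2023.xml", tree_numbers_nervous, tree_numbers_d007154)`.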

```python
tree_numbers_nervous = get_tree_numbers_for_descriptor("desc2023.xml", descriptor_name="Nervous System Diseases")
tree_numbers_d007154 = get_tree_numbers_for_descriptor("desc2023.xml", descriptor_ui="D007154")
print(tree_numbers_nervous)
print(tree_numbers_d007154)
```

```
{'C10'}
{'C20'}
```


The MeSH tree numbers of "Nervous System Diseases" and D007154 are `C10` and `C20`, respectively. Each descriptor has exactly one tree number, at the top of its disease category.

```python
descendant_names_nervous = get_descendant_names("desc2023.xml", tree_numbers_nervous)
descendant_names_d007154 = get_descendant_names("desc2023.xml", tree_numbers_d007154)
```

```python
# Find intersection of the two sets to get common descendants
common_descendants = descendant_names_nervous.intersection(descendant_names_d007154)

print(common_descendants)
```

```
{'Multiple Sclerosis', 'Autoimmune Diseases of the Nervous System', 'Multiple Sclerosis, Relapsing-Remitting', 'Anti-N-Methyl-D-Aspartate Receptor Encephalitis', 'AIDS Dementia Complex', 'Giant Cell Arteritis', 'Encephalomyelitis, Acute Disseminated', 'Myelitis, Transverse', 'Multiple Sclerosis, Chronic Progressive', 'Encephalomyelitis, Autoimmune, Experimental', 'Demyelinating Autoimmune Diseases, CNS', 'AIDS Arteritis, Central Nervous System', 'Myasthenia Gravis, Neonatal', 'Myasthenia Gravis', 'Myasthenia Gravis, Autoimmune, Experimental', 'Nervous System Autoimmune Disease, Experimental', 'Lambert-Eaton Myasthenic Syndrome', 'POEMS Syndrome', 'Uveomeningoencephalitic Syndrome', 'Leukoencephalitis, Acute Hemorrhagic', 'Kernicterus', 'Polyradiculoneuropathy', 'Ataxia Telangiectasia', 'Guillain-Barre Syndrome', 'Vasculitis, Central Nervous System', 'Diffuse Cerebral Sclerosis of Schilder', 'Microscopic Polyangiitis', 'Autoimmune Hypophysitis', 'Mevalonate Kinase Deficiency', 'Stiff-Pe
```
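By default, Python randomizes string hashing per process. The iteration order of a set of strings, and therefore the printed set above, can change from one run to the next. A small sketch with a hypothetical `format_sorted` helper prints a count followed by the names in alphabetical order, which gives a reproducible and easier-to-scan listing.

```python
def format_sorted(names):
    """Return a count header followed by the names in alphabetical order, one per line."""
    ordered = sorted(names)
    return "\n".join([f"{len(ordered)} common descendants:"] + ordered)


print(format_sorted({"Myasthenia Gravis", "Kernicterus"}))
```

In the notebook, this would be called as `print(format_sorted(common_descendants))`.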

## Retrieved Results Overview

The results are conditions that MeSH classifies under both the **immune** (`C20`) and **nervous system** (`C10`) disease branches. They can be categorized as follows:

- **Autoimmune Disorders Affecting the Nervous System**: Conditions in which the immune system mistakenly targets and damages the body's own nervous system. Examples include:
  - Multiple Sclerosis
  - Miller Fisher Syndrome

- **Nervous System Inflammatory Conditions**: Diseases characterized by inflammation predominantly within the nervous system. An example is:
  - Transverse Myelitis

- **Immune Hemolytic Diseases Leading to Neurological Impairment**: Diseases where an immune response against certain blood components causes neurological issues. An example is:
  - Kernicterus

- **Hereditary Disorders Impacting Both Systems**: Inherited conditions with manifestations in both the immune and nervous systems. For instance:
  - Ataxia Telangiectasia

- **Neurological Manifestations from Severe Infections**: Some severe infections produce pronounced neurological symptoms. A notable example is:
  - AIDS Dementia Complex: a complication of advanced HIV infection that primarily affects the brain.