# Scraping an XML document

In [2]:
%pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [1]:
from lxml import etree

# Load and parse the XML file as bytes
with open("./xml-file/P16File-20.xml", "rb") as file:  # Open file in binary mode
    xml_content = file.read()

# Parse the XML content using lxml's etree
root = etree.fromstring(xml_content)

# Print the root as a pretty-formatted tree
print(etree.tostring(root, pretty_print=True).decode("utf-8"))

<us-patent-application lang="EN" dtd-version="v4.6 2022-02-17" file="US20230225638A1-20230720.XML" status="PRODUCTION" id="us-patent-application" country="US" date-produced="20230704" date-publ="20230720">
<us-bibliographic-data-application lang="EN" country="US">
<publication-reference>
<document-id>
<country>US</country>
<doc-number>20230225638</doc-number>
<kind>A1</kind>
<date>20230720</date>
</document-id>
</publication-reference>
<application-reference appl-type="utility">
<document-id>
<country>US</country>
<doc-number>18047627</doc-number>
<date>20221018</date>
</document-id>
</application-reference>
<us-application-series-code>18</us-application-series-code>
<classifications-ipcr>
<classification-ipcr>
<ipc-version-indicator><date>20060101</date></ipc-version-indicator>
<classification-level>A</classification-level>
<section>A</section>
<class>61</class>
<subclass>B</subclass>
<main-group>5</main-group>
<subgroup>1455</subgroup>
<symbol-position>F</symbol-position>
<classifica
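The cell above opens the file in binary mode (`"rb"`). This matters because `etree.fromstring()` accepts bytes that start with an XML declaration, but raises a `ValueError` for a Python `str` that carries an `encoding="..."` declaration. `etree.parse()` is an alternative that takes a filename or file object directly and returns a tree, and `.getroot()` gives the same kind of root element. Here is a minimal sketch, assuming a hypothetical two-element document and using `io.BytesIO` in place of the real file:

```python
import io
from lxml import etree

# Hypothetical stand-in for the patent file, with an XML declaration like the real one
sample = ('<?xml version="1.0" encoding="UTF-8"?>'
          '<us-patent-application><invention-title>Demo</invention-title></us-patent-application>')

# fromstring() accepts bytes...
doc_from_bytes = etree.fromstring(sample.encode("utf-8"))

# ...but rejects a str that carries an encoding declaration
try:
    etree.fromstring(sample)
    str_rejected = False
except ValueError:
    str_rejected = True

# parse() accepts a filename or a file-like object and returns an ElementTree
tree = etree.parse(io.BytesIO(sample.encode("utf-8")))
doc_from_parse = tree.getroot()

print(doc_from_bytes.tag, doc_from_parse.tag, str_rejected)
```

With the real file, the equivalent call would be `etree.parse("./xml-file/P16File-20.xml").getroot()`.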

## Selecting nodes 

In [2]:
# select the text of the <invention-title> element in the XML document
invention_title = root.xpath('//invention-title/text()')
invention_title  # note that xpath() returns a list, even when there is only one match

['Handheld Oximeter with Display of Real-Time, Average Measurements and Status Indicator']
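`xpath()` returns a list whenever the expression is a location path. If a single value is wanted, wrapping the path in the XPath `string()` function returns one string: the full text of the first match, or an empty string when nothing matches. `findtext()` is the ElementTree-style equivalent. A small sketch on a hypothetical miniature document (named `doc` so the notebook's `root` is not overwritten):

```python
from lxml import etree

# Hypothetical miniature document (not the real patent file)
doc = etree.fromstring(
    b"<us-patent-application><invention-title>Handheld Oximeter</invention-title></us-patent-application>"
)

as_list = doc.xpath("//invention-title/text()")      # a location path: list of text nodes
as_string = doc.xpath("string(//invention-title)")   # string(): one string for the first match
missing = doc.xpath("string(//no-such-element)")     # no match: empty string, so no IndexError
first_text = doc.findtext(".//invention-title")      # ElementTree-style alternative

print(as_list)
print(as_string, "|", missing, "|", first_text)
```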

In [3]:
claim_text = root.xpath('//claim-text/text()')
claim_text

['. An oximetry device comprising:\n',
 'a housing;',
 '\n',
 'a processor housed by the housing;',
 '\n',
 'a memory, housed by the housing, coupled to the processor;',
 '\n',
 'a display, housed by the housing and visible from an exterior of the housing, coupled to the processor; and',
 '\n',
 'a sensor head, housed by the housing and visible from an exterior of the housing, comprising at least a first source structure and at least a first detector structure,',
 '\n',
 'wherein the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number n of optical oximetry measurements of tissue to be measured,',
 '\n',
 'wherein when the n measurement is greater or less than a threshold amount, the processor control storage of a value for the n−1 measurement in the memory and storage of one in the memory,',
 '\n',
 'after the n optical oximetry measurements of the tissue are measured, the processor controls the at least f

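Note that `//claim-text/text()` returns only the text nodes that are direct children of a `<claim-text>` element. Text inside nested elements, such as the bold claim number in `<b>1</b>` or the text of a `<claim-ref>`, is skipped. Changing the last step to `//text()` returns every descendant text node as well.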
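The difference is easy to see on a single hypothetical claim shaped like claim 15 of this document (a bold number followed by a `<claim-ref>`):

```python
from lxml import etree

# Hypothetical claim modelled on the <claim-text> markup shown later in this notebook
demo_claim_text = etree.fromstring(
    b'<claim-text><b>15</b>. The device of '
    b'<claim-ref idref="CLM-00014">claim 14</claim-ref> wherein the tissues differ.</claim-text>'
)

direct = demo_claim_text.xpath("text()")     # only text nodes that are direct children
every = demo_claim_text.xpath(".//text()")   # every descendant text node, in document order

print(direct)
print(every)
print("".join(every))
```

`direct` misses both `15` and `claim 14`, while joining `every` gives back the full sentence.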

In [4]:
claim_text = root.xpath('//claim-text//text()')
claim_text

['1',
 '. An oximetry device comprising:\n',
 'a housing;',
 '\n',
 'a processor housed by the housing;',
 '\n',
 'a memory, housed by the housing, coupled to the processor;',
 '\n',
 'a display, housed by the housing and visible from an exterior of the housing, coupled to the processor; and',
 '\n',
 'a sensor head, housed by the housing and visible from an exterior of the housing, comprising at least a first source structure and at least a first detector structure,',
 '\n',
 'wherein the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number n of optical oximetry measurements of tissue to be measured,',
 '\n',
 'wherein when the n measurement is greater or less than a threshold amount, the processor control storage of a value for the n−1 measurement in the memory and storage of one in the memory,',
 '\n',
 'after the n optical oximetry measurements of the tissue are measured, the processor controls the at l

In [5]:
# join the text fragments into a single string and split it into individual claims
import re
def claims(claim_text):
    # Concatenate the fragments and drop the newline characters
    claim_text = ''.join(claim_text).replace('\n', '')

    # Split on the claim numbers '1.' to '25.' (this document has 23 claims);
    # the capturing group keeps the numbers in the result
    claims_list = re.split(r'\b([1-9]|1[0-9]|2[0-5])\.\s*', claim_text)

    # Strip surrounding whitespace and drop empty strings
    sections = [s.strip() for s in claims_list if s.strip()]

    # Drop the captured claim numbers, keeping only the claim text
    contents = [section for section in sections if not section.isdigit()]
    return contents

claims(claim_text=claim_text)


['An oximetry device comprising:a housing;a processor housed by the housing;a memory, housed by the housing, coupled to the processor;a display, housed by the housing and visible from an exterior of the housing, coupled to the processor; anda sensor head, housed by the housing and visible from an exterior of the housing, comprising at least a first source structure and at least a first detector structure,wherein the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number n of optical oximetry measurements of tissue to be measured,wherein when the n measurement is greater or less than a threshold amount, the processor control storage of a value for the n−1 measurement in the memory and storage of one in the memory,after the n optical oximetry measurements of the tissue are measured, the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a 
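In the output above, words separated only by a newline run together (`comprising:a housing`, `anda sensor head`), because `.replace('\n', '')` deletes the line breaks instead of turning them into spaces. One possible alternative, sketched on a hypothetical miniature claim: collapse every whitespace run into a single space, and strip the claim number only at the start of each claim's own text, which also avoids the hard-coded 1 to 25 range. `claim_body` is a hypothetical helper name, not part of lxml.

```python
import re
from lxml import etree

# Hypothetical miniature claim with the same nesting as the real <claim> elements
doc = etree.fromstring(b"""<claims>
<claim id="CLM-00001" num="00001">
<claim-text><b>1</b>. An oximetry device comprising:
<claim-text>a housing;</claim-text>
<claim-text>a display; and</claim-text>
<claim-text>a sensor head.</claim-text>
</claim-text>
</claim>
</claims>""")

def claim_body(claim):
    """Hypothetical helper: text of one <claim>, whitespace-normalized, leading number removed."""
    raw = "".join(claim.xpath("./claim-text//text()"))
    text = re.sub(r"\s+", " ", raw).strip()   # every whitespace run becomes one space
    return re.sub(r"^\d+\.\s*", "", text)     # drop the "1. " prefix, only at the very start

first_claim = doc.xpath("//claim")[0]
old_style = "".join(first_claim.xpath("./claim-text//text()")).replace("\n", "")

print(old_style)
print(claim_body(first_claim))
```

Because the helper works on one `<claim>` element at a time, numbers inside the text (such as "claim 14") are never treated as split points. On the real document, `[claim_body(c) for c in root.xpath('//claim')]` would give one string per claim.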

## Using Predicates and Attributes

In [6]:
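# say I want to see the claim text for id="CLM-00002" or id="CLM-00003"
# note that //claim[@id] selects all the claim elements that have an id attribute;
# a comparison inside the predicate, such as [@id="CLM-00002"], narrows it to specific values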

claim_text = root.xpath('//claim[@id="CLM-00002" or @id="CLM-00003"]/claim-text//text()')
claims(claim_text=claim_text)

['The device of claim 1 wherein after the m optical oximetry measurements of the tissue are measured, the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number p optical oximetry measurements of the tissue to be measured,wherein when a value of the p measurement is greater or less than the threshold amount, the processor controls storage of a value for a second sum of the first sum and a p−1 value for the q−1 measurement in the memory and storage of three in the memory, andafter the p optical oximetry measurements of the tissue are measured, the processor generates a second average value for the second sum divided by the stored value three and controls the display to display the second average value.',
 'The device of claim 2 wherein after the p optical oximetry measurements of the tissue are measured, the processor controls the at least first source structure and the at least first detector structure of the
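When the ids come from Python variables, lxml lets you pass them as XPath variables (keyword arguments to `xpath()`, referenced as `$name` in the expression) instead of formatting them into the string. A sketch on a hypothetical three-claim document; the variable names `a` and `b` are arbitrary:

```python
from lxml import etree

# Hypothetical three-claim document
doc = etree.fromstring(
    b'<claims>'
    b'<claim id="CLM-00001"><claim-text>first</claim-text></claim>'
    b'<claim id="CLM-00002"><claim-text>second</claim-text></claim>'
    b'<claim id="CLM-00003"><claim-text>third</claim-text></claim>'
    b'</claims>'
)

# The keyword arguments a and b are bound to $a and $b inside the expression
texts = doc.xpath('//claim[@id=$a or @id=$b]/claim-text//text()',
                  a="CLM-00002", b="CLM-00003")
print(texts)
```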

In [14]:
# say I want all the ids of the claims in a list
claim_ids = root.xpath('//claim/@id')
claim_ids

['CLM-00001',
 'CLM-00002',
 'CLM-00003',
 'CLM-00004',
 'CLM-00005',
 'CLM-00006',
 'CLM-00007',
 'CLM-00008',
 'CLM-00009',
 'CLM-00010',
 'CLM-00011',
 'CLM-00012',
 'CLM-00013',
 'CLM-00014',
 'CLM-00015',
 'CLM-00016',
 'CLM-00017',
 'CLM-00018',
 'CLM-00019',
 'CLM-00020',
 'CLM-00021',
 'CLM-00022',
 'CLM-00023']
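If only the number of claims is needed, the XPath `count()` function returns it directly; lxml returns XPath numbers as Python floats. This also shows the `[@id]` predicate, which keeps only elements that have the attribute. A sketch on a hypothetical document in which one claim lacks an id:

```python
from lxml import etree

# Hypothetical document: the third claim has no id attribute
doc = etree.fromstring(b'<claims><claim id="CLM-00001"/><claim id="CLM-00002"/><claim/></claims>')

n_claims = doc.xpath("count(//claim)")         # every <claim>
n_with_id = doc.xpath("count(//claim[@id])")   # only claims that have an id
id_count = len(doc.xpath("//claim/@id"))       # the list-based way, for comparison

print(n_claims, n_with_id, id_count)
```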

In [16]:
# However, there can be situations where we don't need a list and want to extract each claim's id and text one at a time
claim_elements = root.xpath('//claim')  # a new name, so the claims() helper above is not overwritten
for claim in claim_elements:
    claim_id = claim.get('id')  # Element.get() returns the attribute value as a plain string
    print(claim_id)
    claim_text = claim.xpath('./claim-text//text()')  # the context node is now the <claim> element, so the XPath starts with ./
    claim_text = ''.join(claim_text).replace('\n', '')

    # Split on the claim numbers '1.' to '25.'; the capturing group keeps the numbers in the result
    claims_list = re.split(r'\b([1-9]|1[0-9]|2[0-5])\.\s*', claim_text)

    # Strip surrounding whitespace and drop empty strings
    sections = [s.strip() for s in claims_list if s.strip()]

    # Drop the captured claim numbers, keeping only the claim text
    contents = [section for section in sections if not section.isdigit()]
    print(contents)

CLM-00001
['An oximetry device comprising:a housing;a processor housed by the housing;a memory, housed by the housing, coupled to the processor;a display, housed by the housing and visible from an exterior of the housing, coupled to the processor; anda sensor head, housed by the housing and visible from an exterior of the housing, comprising at least a first source structure and at least a first detector structure,wherein the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number n of optical oximetry measurements of tissue to be measured,wherein when the n measurement is greater or less than a threshold amount, the processor control storage of a value for the n−1 measurement in the memory and storage of one in the memory,after the n optical oximetry measurements of the tissue are measured, the processor controls the at least first source structure and the at least first detector structure of the sensor head 

In [None]:
# The same loop, this time reading the id with XPath and the num attribute with get()
claim_elements = root.xpath('//claim')  # a new name, so the claims() helper above is not overwritten
for claim in claim_elements:
    claim_id = claim.xpath('./@id')  # this also gets the required data, but note that it is a list
    num = claim.get('num')  # the get() method used earlier returns a string rather than a list
    print(claim_id)
    print(num)
    claim_text = claim.xpath('./claim-text//text()')  # the context node is now the <claim> element, so the XPath starts with ./
    claim_text = ''.join(claim_text).replace('\n', '')

    # Split on the claim numbers '1.' to '25.'; the capturing group keeps the numbers in the result
    claims_list = re.split(r'\b([1-9]|1[0-9]|2[0-5])\.\s*', claim_text)

    # Strip surrounding whitespace and drop empty strings
    sections = [s.strip() for s in claims_list if s.strip()]

    # Drop the captured claim numbers, keeping only the claim text
    contents = [section for section in sections if not section.isdigit()]
    print(contents)

['CLM-00001']
00001
['An oximetry device comprising:a housing;a processor housed by the housing;a memory, housed by the housing, coupled to the processor;a display, housed by the housing and visible from an exterior of the housing, coupled to the processor; anda sensor head, housed by the housing and visible from an exterior of the housing, comprising at least a first source structure and at least a first detector structure,wherein the processor controls the at least first source structure and the at least first detector structure of the sensor head to make a number n of optical oximetry measurements of tissue to be measured,wherein when the n measurement is greater or less than a threshold amount, the processor control storage of a value for the n−1 measurement in the memory and storage of one in the memory,after the n optical oximetry measurements of the tissue are measured, the processor controls the at least first source structure and the at least first detector structure of the se
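The two approaches also differ when an attribute is missing: `get()` returns `None` (or a default you supply), while `xpath('./@name')` returns an empty list. A small sketch on a hypothetical `<claim>` element; `status` is a made-up attribute name used only to show the missing case:

```python
from lxml import etree

# Hypothetical <claim> element with the same attributes as the real ones
demo = etree.fromstring(b'<claim id="CLM-00001" num="00001"/>')

print(demo.get("id"))             # a plain string
print(demo.xpath("./@id"))        # a list with one string
print(demo.get("status"))         # None: the attribute does not exist
print(demo.get("status", "n/a"))  # the optional second argument is returned instead of None
print(demo.xpath("./@status"))    # an empty list
print(dict(demo.attrib))          # every attribute at once, as a dict
```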

In [None]:
# say I want all ids starting with CLM, i.e. the claim ids - this is another way to get the ids
ids_CLM = root.xpath("//claim[starts-with(@id , 'CLM')]/@id") # here first I select all the claim elements whose id starts with CLM and then extract all the ids as a list
ids_CLM

['CLM-00001',
 'CLM-00002',
 'CLM-00003',
 'CLM-00004',
 'CLM-00005',
 'CLM-00006',
 'CLM-00007',
 'CLM-00008',
 'CLM-00009',
 'CLM-00010',
 'CLM-00011',
 'CLM-00012',
 'CLM-00013',
 'CLM-00014',
 'CLM-00015',
 'CLM-00016',
 'CLM-00017',
 'CLM-00018',
 'CLM-00019',
 'CLM-00020',
 'CLM-00021',
 'CLM-00022',
 'CLM-00023']

In [None]:
# Say I want the text of all <p> elements whose id contains the substring 'p-000'
p_text = root.xpath("//p[contains(@id, 'p-000')]//text()")
# Note: contains() is a substring test, so contains(@id, 'p-0001') would also match 'p-00010'; use @id='p-0001' for an exact match.
# For attributes that hold a space-separated list of tokens (such as an HTML class), match a whole token with
# //p[contains(concat(' ', @class, ' '), ' token ')]//text()
p_text

['An oximetry device sealed in a sheath directs a user to allow the oximetry device to make oximetry readings at a number of different tissue locations of a patient and average two or more of the oximetry readings by directing the lifts and placements of the oximetry device and sheath to and from the different tissue locations and detecting the lift and placements. The averages are generated and displayed on a display of the device for the oximetry readings if the lifts are made while use directions for the lifts are displayed on a display of the oximetry device. The averages are not generated if the lifts are not made while the user directions for the lifts are not displayed. The averages are simultaneously displayed with the oximetry readings which are instantaneous measurement for patient tissue.',
 'This patent application claims the benefit of U.S. patent application 63/262,680, filed Oct. 18, 2021. This application is incorporated by reference along with all other references cite
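Because `contains()` is a substring test, it can match more than intended. A sketch on hypothetical elements comparing a substring match, an exact match, and the whole-token match that the `concat()` trick gives for space-separated attributes such as an HTML-style `class` (the patent file itself uses single-valued `id`s):

```python
from lxml import etree

# Hypothetical elements; the class attribute stands in for any space-separated token list
doc = etree.fromstring(
    b'<d>'
    b'<p id="p-0001" class="note intro">a</p>'
    b'<p id="p-00010" class="introduction">b</p>'
    b'</d>'
)

substring_id = doc.xpath("//p[contains(@id, 'p-0001')]/text()")   # also matches p-00010
exact_id = doc.xpath("//p[@id='p-0001']/text()")                  # exact comparison

substring_cls = doc.xpath("//p[contains(@class, 'intro')]/text()")  # also matches 'introduction'
token_cls = doc.xpath(
    "//p[contains(concat(' ', normalize-space(@class), ' '), ' intro ')]/text()"
)  # padding with spaces means only the whole token 'intro' matches

print(substring_id, exact_id)
print(substring_cls, token_cls)
```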

In [81]:
# How about the <p> elements whose num attribute ends with '00'?
p_attr_endswith_00 = root.xpath("//p[substring(@num, string-length(@num) - string-length('00') + 1)='00']/text()")
p_attr_endswith_00

['An oximetry device sealed in a sheath directs a user to allow the oximetry device to make oximetry readings at a number of different tissue locations of a patient and average two or more of the oximetry readings by directing the lifts and placements of the oximetry device and sheath to and from the different tissue locations and detecting the lift and placements. The averages are generated and displayed on a display of the device for the oximetry readings if the lifts are made while use directions for the lifts are displayed on a display of the oximetry device. The averages are not generated if the lifts are not made while the user directions for the lifts are not displayed. The averages are simultaneously displayed with the oximetry readings which are instantaneous measurement for patient tissue.',
 'The power block may include one more magnets ',
 ' that are arranged in an arrangement, such as a square, a rectangular, or another arrangement. A system unit may also have one or more 

The XPath expression:

```xpath
//p[substring(@num, string-length(@num) - string-length('00') + 1)='00']/text()
```

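is used to select `<p>` elements whose `num` attribute ends with the characters `"00"`. XPath 1.0, the version lxml implements, has `starts-with()` but no `ends-with()`, so the suffix test has to be built from `substring()` and `string-length()`. Here is a breakdown of each part: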

1. **`//p`**:
   - This selects all `<p>` elements in the XML document, regardless of where they are located.

2. **`[substring(@num, string-length(@num) - string-length('00') + 1)='00']`**:
   - This filter applies a condition to each `<p>` element, specifically targeting those whose `num` attribute ends with `"00"`.
   - Here’s how it works:
     - `@num`: Refers to the `num` attribute of the `<p>` element.
     - `string-length(@num)`: Calculates the length of the `num` attribute's string.
     - `string-length('00')`: Calculates the length of the string `"00"`, which is `2`.
     - `string-length(@num) - string-length('00') + 1`: This expression calculates the starting position of the last two characters in the `num` attribute. For instance, if `num` is `"1200"`, the expression evaluates to `3`, which is the position where `"00"` starts in `"1200"`.
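     - `substring(@num, start)='00'`: With only two arguments, `substring()` returns everything from `start` to the end of the string, i.e. the last two characters of `@num`, and the comparison checks that they equal `"00"`.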

3. **`/text()`**:
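   - Finally, this part retrieves the text nodes that are direct children of each matching `<p>` element. Text inside child elements is not included, which is why a single paragraph can come back as several fragments, as in the output above.

### Summary
This expression selects the direct text nodes of all `<p>` elements in the XML document that have a `num` attribute ending with `"00"`.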
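The same suffix test can be parameterised with an XPath variable and checked against a plain Python filter. A sketch on hypothetical `<p>` elements, one of which has no `num` attribute (for that one `string-length(@num)` is 0, so `substring()` returns an empty string and the test is false):

```python
from lxml import etree

# Hypothetical paragraphs; the last one has no num attribute
doc = etree.fromstring(
    b'<d><p num="0100">a</p><p num="0101">b</p><p num="0200">c</p><p>d</p></d>'
)

# The ends-with test rebuilt from substring() and string-length();
# $suffix is bound from the keyword argument
xpath_hits = doc.xpath(
    "//p[substring(@num, string-length(@num) - string-length($suffix) + 1) = $suffix]/text()",
    suffix="00",
)

# The same filter done in Python, for comparison
python_hits = [p.text for p in doc.iter("p") if p.get("num", "").endswith("00")]

print(xpath_hits, python_hits)
```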

In [76]:
# say I want to select any element (*) whose id is 'h-0001'
id_item = root.xpath('//*[@id="h-0001"]/text()')
id_item

['CROSS-REFERENCE TO RELATED APPLICATIONS']

## Axes and some Alternatives 

### child

In [None]:
'''
<publication-reference>
<document-id>
<country>US</country>
<doc-number>20230225638</doc-number>
<kind>A1</kind>
<date>20230720</date>
</document-id>
</publication-reference>
'''
# I want to get the country data 
# //publication-reference/document-id - selects the document-id children of every publication-reference element
# document-id/child::country - selects the country children of document-id; child:: is the default axis, so it can be omitted
# Note: //publication-reference/child::document-id is also valid
country = root.xpath('//publication-reference/document-id/child::country/text()')
country

['US']
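`child::` is the default axis, so spelling it out does not change the result; what narrows the match is the `publication-reference` step. A sketch on a hypothetical excerpt that reuses the two `<document-id>` blocks from the top of this file:

```python
from lxml import etree

# Hypothetical excerpt with the publication and application <document-id> blocks
doc = etree.fromstring(b"""<root>
<publication-reference><document-id>
<country>US</country><doc-number>20230225638</doc-number>
</document-id></publication-reference>
<application-reference><document-id>
<country>US</country><doc-number>18047627</doc-number>
</document-id></application-reference>
</root>""")

abbreviated = doc.xpath("//publication-reference/document-id/doc-number/text()")
explicit = doc.xpath("//publication-reference/child::document-id/child::doc-number/text()")
every_doc_number = doc.xpath("//document-id/doc-number/text()")  # both blocks match

print(abbreviated, explicit, every_doc_number)
```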

### parent & ancestor

In [57]:
# Dependent claims reference another claim through a <claim-ref> element; independent claims do not
# I want to list these references
'''
<claim id="CLM-00015" num="00015">
<claim-text><b>15</b>. The device of <claim-ref idref="CLM-00014">claim 14</claim-ref> wherein the first and second tissues are different tissue of the patient.</claim-text>
</claim>
'''
# So, all I want is to find the claim elements that contain a claim-ref element
# (the results go into dependent_claims so that the claims() helper defined earlier is not overwritten)
# WAY 1 - a nested predicate: a claim whose claim-text child has a claim-ref child
print('WAY 1')
dependent_claims = root.xpath('//claim[claim-text[claim-ref]]')
for claim in dependent_claims:
    claim_id = claim.get('id')
    claim_ref_id = claim.xpath('.//claim-ref/@idref')[0]  # first reference only
    print(f"The claim {claim_id} references {claim_ref_id}")

# WAY 2 - start at claim-ref and climb two parent steps
print('WAY 2')
dependent_claims = root.xpath('//claim-ref/parent::claim-text/parent::claim')
for claim in dependent_claims:
    claim_id = claim.get('id')
    claim_ref_id = claim.xpath('.//claim-ref/@idref')[0]
    print(f"The claim {claim_id} references {claim_ref_id}")

# WAY 3 - claim is an ancestor, not the parent, of claim-ref (the parent of claim-ref is claim-text)
print('WAY 3')
dependent_claims = root.xpath('//claim-ref/ancestor::claim')
for claim in dependent_claims:
    claim_id = claim.get('id')
    claim_ref_id = claim.xpath('.//claim-ref/@idref')[0]
    print(f"The claim {claim_id} references {claim_ref_id}")

# WAY 4 - the reverse of WAY 3: claims that have a claim-ref descendant at any depth
print('WAY 4')
dependent_claims = root.xpath('//claim[.//claim-ref]')
for claim in dependent_claims:
    claim_id = claim.get('id')
    claim_ref_id = claim.xpath('.//claim-ref/@idref')[0]
    print(f"The claim {claim_id} references {claim_ref_id}")

WAY 1
The claim CLM-00002 references CLM-00001
The claim CLM-00003 references CLM-00002
The claim CLM-00004 references CLM-00001
The claim CLM-00006 references CLM-00005
The claim CLM-00007 references CLM-00005
The claim CLM-00008 references CLM-00005
The claim CLM-00009 references CLM-00005
The claim CLM-00010 references CLM-00005
The claim CLM-00011 references CLM-00010
The claim CLM-00012 references CLM-00011
The claim CLM-00013 references CLM-00005
The claim CLM-00015 references CLM-00014
The claim CLM-00016 references CLM-00014
The claim CLM-00017 references CLM-00014
The claim CLM-00018 references CLM-00014
The claim CLM-00019 references CLM-00014
The claim CLM-00020 references CLM-00019
The claim CLM-00021 references CLM-00019
The claim CLM-00022 references CLM-00021
The claim CLM-00023 references CLM-00021
WAY 2
The claim CLM-00002 references CLM-00001
The claim CLM-00003 references CLM-00002
The claim CLM-000
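lxml elements also expose these axes as Python methods: `getparent()` corresponds to `parent::*`, and `iterancestors('claim')` walks up the tree yielding `<claim>` ancestors, nearest first. A sketch on a hypothetical two-claim document that collects every reference rather than only the first one per claim:

```python
from lxml import etree

# Hypothetical two-claim document with the same markup as the real claims
doc = etree.fromstring(b"""<claims>
<claim id="CLM-00001" num="00001"><claim-text><b>1</b>. A device.</claim-text></claim>
<claim id="CLM-00002" num="00002"><claim-text><b>2</b>. The device of
<claim-ref idref="CLM-00001">claim 1</claim-ref> wherein it is handheld.</claim-text></claim>
</claims>""")

references = []
for ref in doc.iter("claim-ref"):
    parent = ref.getparent()                   # like parent::* in XPath
    owner = next(ref.iterancestors("claim"))   # like ancestor::claim[1]
    references.append((owner.get("id"), ref.get("idref")))
    print(parent.tag, owner.get("id"), "references", ref.get("idref"))

print(references)
```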

### descendants

In [None]:
# descendants can be children, grandchildren, etc.
print('Way 1')
claim_ref = root.xpath('//claim/descendant::claim-ref/@idref')
print(claim_ref)

print('Way 2')  # using //, which is shorthand for /descendant-or-self::node()/
claim_ref = root.xpath('//claim//claim-ref/@idref')
print(claim_ref)

Way 1
['CLM-00001', 'CLM-00002', 'CLM-00001', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00010', 'CLM-00011', 'CLM-00005', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00019', 'CLM-00019', 'CLM-00021', 'CLM-00021']
Way 2
['CLM-00001', 'CLM-00002', 'CLM-00001', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00005', 'CLM-00010', 'CLM-00011', 'CLM-00005', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00014', 'CLM-00019', 'CLM-00019', 'CLM-00021', 'CLM-00021']
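The two forms agree here, but they are not identical in general: because `//` expands to `/descendant-or-self::node()/`, a positional predicate after it applies per parent, not to the whole document. A sketch on a hypothetical document with two sections:

```python
from lxml import etree

# Hypothetical document: two sections, each with its own <p> children
doc = etree.fromstring(b"<d><s><p>a</p><p>b</p></s><s><p>c</p></s></d>")

# //p[1] means "the first <p> child of each element", so each section contributes one
first_per_parent = doc.xpath("//p[1]/text()")

# descendant::p[1] applies [1] to all descendants together: the first <p> overall
first_overall = doc.xpath("/descendant::p[1]/text()")

# Parentheses also make the predicate apply to the whole node-set
first_grouped = doc.xpath("(//p)[1]/text()")

print(first_per_parent, first_overall, first_grouped)
```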


### sibling

```xml
<heading id="h-0002" level="1">BACKGROUND OF THE INVENTION</heading>
<p id="p-0003" num="0002">This invention relates generally to optical systems that monitor parameters related to oxygen levels in tissue. More and sheaths for the optical probes that shield the optical probes from contaminants during use and specifically, the present invention relates to optical probes, such as compact, handheld oximeters, communicate status information to the optical probes regarding contaminant protection so that the optical probes are reusable.</p>
<p id="p-0004" num="0003">Oximeters are medical devices used to measure the oxygen saturation of tissue in humans and living things for various purposes. For example, oximeters are used for medical and diagnostic purposes in hospitals and other medical facilities (e.g., operating rooms for surgery, recovery room for patient monitoring, or ambulance or other mobile monitoring for, e.g., hypoxia); sports and athletic purposes at a sports arena (e.g., professional athlete monitoring); personal or at-home monitoring of individuals (e.g., general health monitoring, or person training for a marathon); and veterinary purposes (e.g., animal monitoring).</p>
<p id="p-0005" num="0004">In particular, assessing a patient's oxygen saturation, at both the regional and local levels, is important as it is an indicator of the state of the patient's health. Thus, oximeters are often used in clinical settings, such as during surgery and recovery, where it can be suspected that the patient's tissue oxygenation state is unstable. For example, during surgery, oximeters should be able to quickly deliver accurate oxygen saturation measurements under a variety of non-ideal conditions.</p>
<p id="p-0006" num="0005">Pulse oximeters and tissue oximeters are two types of oximeters that operate on different principles. A pulse oximeter requires a pulse in order to function. A pulse oximeter typically measures the absorbance of light due to pulsing arterial blood. In contrast, a tissue oximeter does not require a pulse in order to function, and can be used to make oxygen saturation measurements of a tissue flap that has been disconnected from a blood supply.</p>
<p id="p-0007" num="0006">Human tissue, as an example, includes a variety of light-absorbing molecules. Such chromophores include oxygenated hemoglobin, deoxygenated hemoglobin, melanin, water, lipid, and cytochrome. Oxygenated and deoxygenated hemoglobins are the dominant chromophores in tissue for much of the visible and near-infrared spectral range. Light absorption differs significantly for oxygenated and deoxygenated hemoglobins at certain wavelengths of light. Tissue oximeters can measure oxygen levels in human tissue by exploiting these light-absorption differences.</p>
<p id="p-0008" num="0007">Despite the success of existing oximeters, there is a continuing desire to improve oximeters by, for example, improving the reuse of oximeters; reducing or eliminating contamination during use; improving remote communication; improving measurement accuracy; reducing measurement time; lowering cost through reuse; reducing size, weight, or form factor; reducing power consumption; and for other reasons, and any combination of these.</p>
<p id="p-0009" num="0008">Therefore, there is a need for improved tissue oximetry devices and methods of shielding oximetry devices during use for reuse of the devices.</p>
```

In [69]:
# The XML above is used to distinguish between the adjacent sibling and general siblings
adj_heading_p = root.xpath('//heading[@id="h-0002"]/following-sibling::*[1][self::p]/text()')
print(adj_heading_p)

# the second following sibling, again only if it is a <p> element
adj_second_heading_p = root.xpath('//heading[@id="h-0002"]/following-sibling::*[2][self::p]/text()')
print(adj_second_heading_p)

['This invention relates generally to optical systems that monitor parameters related to oxygen levels in tissue. More and sheaths for the optical probes that shield the optical probes from contaminants during use and specifically, the present invention relates to optical probes, such as compact, handheld oximeters, communicate status information to the optical probes regarding contaminant protection so that the optical probes are reusable.']
['Oximeters are medical devices used to measure the oxygen saturation of tissue in humans and living things for various purposes. For example, oximeters are used for medical and diagnostic purposes in hospitals and other medical facilities (e.g., operating rooms for surgery, recovery room for patient monitoring, or ambulance or other mobile monitoring for, e.g., hypoxia); sports and athletic purposes at a sports arena (e.g., professional athlete monitoring); personal or at-home monitoring of individuals (e.g., general health monitoring, or person 

The XPath expression:

```xpath
//heading[@id="h-0002"]/following-sibling::*[1][self::p]/text()
```

can be broken down as follows:

1. **`//heading[@id="h-0002"]`**:
   - This part of the XPath selects a `<heading>` element anywhere in the XML document that has an attribute `id` with the value `"h-0002"`.
   
2. **`/following-sibling::*[1]`**:
   - The `/following-sibling::*[1]` part selects the *first sibling element* that follows the `<heading>` element with `id="h-0002"`. 
   - The `*` means "any element type," so it picks the first element of any type that immediately follows the `<heading>` element.

3. **`[self::p]`**:
   - This filter restricts the first following sibling element to only `<p>` elements.
   - So, even though the XPath found the first following sibling of any type, this part checks if the sibling is specifically a `<p>` element.

4. **`/text()`**:
   - Finally, `/text()` retrieves the text nodes directly inside the selected `<p>` element.

### Summary
In summary, this XPath expression:
- Looks for a `<heading>` element with `id="h-0002"`.
- Finds its first following sibling element.
- Checks if this sibling is a `<p>` element.
- If it is, it extracts and returns the text content of that `<p>` element.

If the first following sibling is not a `<p>` element, this XPath expression will return nothing because of the `[self::p]` condition.
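The order of the steps matters. `following-sibling::*[1][self::p]` takes the nearest sibling and then checks that it is a `<p>`, while `following-sibling::p[1]` skips any non-`<p>` siblings and takes the first `<p>`. A sketch on a hypothetical fragment where a `<figure>` sits between a heading and its paragraph:

```python
from lxml import etree

# Hypothetical fragment: a <figure> separates the first heading from its paragraph
doc = etree.fromstring(
    b'<description>'
    b'<heading id="h-0001">A</heading>'
    b'<figure/>'
    b'<p id="p-0001">first paragraph</p>'
    b'<heading id="h-0002">B</heading>'
    b'<p id="p-0002">second paragraph</p>'
    b'</description>'
)

adjacent_1 = doc.xpath('//heading[@id="h-0001"]/following-sibling::*[1][self::p]/text()')
first_p_1 = doc.xpath('//heading[@id="h-0001"]/following-sibling::p[1]/text()')
adjacent_2 = doc.xpath('//heading[@id="h-0002"]/following-sibling::*[1][self::p]/text()')

print(adjacent_1, first_p_1, adjacent_2)
```

The first query returns nothing because the nearest sibling of `h-0001` is the `<figure>`.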

In [None]:
# select now all the general p siblings of the heading element 
gen_heading_p = root.xpath('//heading[@id="h-0002"]/following-sibling::p/text()')
gen_heading_p  # Note that this returns the text of every following <p> sibling, including paragraphs under later headings, since they all share the same parent - hence the long list!

['This invention relates generally to optical systems that monitor parameters related to oxygen levels in tissue. More and sheaths for the optical probes that shield the optical probes from contaminants during use and specifically, the present invention relates to optical probes, such as compact, handheld oximeters, communicate status information to the optical probes regarding contaminant protection so that the optical probes are reusable.',
 'Oximeters are medical devices used to measure the oxygen saturation of tissue in humans and living things for various purposes. For example, oximeters are used for medical and diagnostic purposes in hospitals and other medical facilities (e.g., operating rooms for surgery, recovery room for patient monitoring, or ambulance or other mobile monitoring for, e.g., hypoxia); sports and athletic purposes at a sports arena (e.g., professional athlete monitoring); personal or at-home monitoring of individuals (e.g., general health monitoring, or person 

In [74]:
# Note that the headings are all siblings of each other, i.e. general siblings
headings = root.xpath('//heading[@id="h-0001"]/following-sibling::heading/text()')
headings

['BACKGROUND OF THE INVENTION',
 'BRIEF SUMMARY OF THE INVENTION',
 'DETAILED DESCRIPTION OF THE INVENTION']
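The reverse direction is `preceding-sibling`. It is a reverse axis, so `[1]` picks the nearest sibling before the node, which makes it a simple way to find the heading a paragraph belongs to. A sketch on a hypothetical, shortened copy of the description with placeholder paragraph text:

```python
from lxml import etree

# Hypothetical shortened description; paragraph texts are placeholders
doc = etree.fromstring(
    b'<description>'
    b'<heading id="h-0001">CROSS-REFERENCE TO RELATED APPLICATIONS</heading>'
    b'<p id="p-0002">x</p>'
    b'<heading id="h-0002">BACKGROUND OF THE INVENTION</heading>'
    b'<p id="p-0003">y</p>'
    b'<p id="p-0004">z</p>'
    b'</description>'
)

# Nearest heading before p-0004
section = str(doc.xpath('string(//p[@id="p-0004"]/preceding-sibling::heading[1])'))

# Group every paragraph id under its nearest preceding heading
sections = {}
for p in doc.xpath("//p"):
    heading = str(p.xpath("string(preceding-sibling::heading[1])"))
    sections.setdefault(heading, []).append(p.get("id"))

print(section)
print(sections)
```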