**XML**: XML stands for eXtensible Markup Language. It was designed to store and transport data and to be both human- and machine-readable. Accordingly, its design goals emphasize simplicity, generality, and usability across the Internet.
The XML file to be parsed in this tutorial is an RSS feed.

**RSS**: RSS (Rich Site Summary, often called Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information such as blog entries, news headlines, audio, and video. An RSS document is plain text formatted as XML.
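Because an RSS feed is just XML, it can be read with the same parser used in the rest of this tutorial. The sketch below is only an illustration: the feed text, the headlines, and the example.com links are made up here and are not taken from the real feed.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (hypothetical content, for illustration only)
sample_rss = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

feed_root = ET.fromstring(sample_rss)

# Every <item> under <channel> is one entry in the feed
headlines = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in feed_root.findall("./channel/item")
]

for entry in headlines:
    print(entry["title"], entry["link"])
```

Each entry ends up as a plain dictionary, which is a convenient shape if the items are later written out with the csv module.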

In [2]:
import csv 
import requests 
import xml.etree.ElementTree as ET

In [None]:
def loadRSS():
    # url of rss feed 
    url = 'http://www.hindustantimes.com/rss/topnews/rssfeed.xml'

In [3]:
tree = ET.parse('country_data.xml')

In [4]:
root = tree.getroot()

In [5]:
root.tag

'data'

In [6]:
root.attrib

{}

In [7]:
for child in root:
    print(child.tag,child.attrib)

country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}


In [11]:
# Child elements can be accessed by index, like items in nested Python lists
root[0][2].text

'141100'
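Since country_data.xml itself is not shown, the following sketch rebuilds a small stand-in from the outputs above so the indexing can be tried without the file. The child element names (rank, year, gdppc) and every value other than the country names and 141100 are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Stand-in for country_data.xml; element names and most values are assumed
sample = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
  </country>
</data>"""

sample_root = ET.fromstring(sample)

first_country = sample_root[0]   # first child of <data>
third_field = first_country[2]   # third child of that <country>

print(len(sample_root))                    # number of direct children
print(third_field.tag, third_field.text)   # same element as sample_root[0][2]
print(sample_root[-1].attrib["name"])      # negative indexes work as in lists
print([c.attrib["name"] for c in sample_root[0:2]])  # so does slicing
```

Indexing only reaches direct children, so sample_root[0][2] means "third child of the first child", not the third element in the whole document.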

In [None]:
# Alternatively, build the root directly from XML content held in a string
root = ET.fromstring(xml_content)
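The cell above assumes that xml_content already exists. As a minimal sketch of the difference between the two entry points, the example below defines a hypothetical xml_content string and parses it both ways; io.StringIO stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

xml_content = "<data><country name='Panama'/></data>"  # hypothetical content

# fromstring() parses text and returns the root Element directly
root_from_string = ET.fromstring(xml_content)

# parse() reads a file (or file-like object) and returns an ElementTree,
# so getroot() is still needed to reach the root Element
tree_from_file = ET.parse(io.StringIO(xml_content))
root_from_file = tree_from_file.getroot()

print(root_from_string.tag, root_from_file.tag)
```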

In [None]:
def extract_essential_contract_data(root):
    contract_data = {
        "contract_identification": {},
        "dates": {},
        "financial": {},
        "contractor": {},
        "line_items": [],
        "reference_data":{}
    }

    # Map the empty prefix (default namespace) to the root's namespace URI, if any
    ns = {"": root.tag.split('}')[0].strip('{') if '}' in root.tag else ""}

### Code explanation


The last line creates a dictionary ns that maps the empty prefix to the namespace URI (if any) found in the root element's tag. A mapping like this is what ElementTree expects when it searches a document that uses XML namespaces.

Let's break it down:

1) root.tag

root.tag is the tag name of the root element of the XML document. When the document uses a namespace, ElementTree stores the tag with the namespace URI in curly braces in front of the element name. For example, for a root element written as
            <ns:contract xmlns:ns="http://example.com/ns">...</ns:contract> (where ns is the namespace prefix)

* the full tag is {http://example.com/ns}contract.

2) root.tag.split('}')[0]

* split('}') splits the root.tag string at the } character, which separates the namespace URI from the tag name. For example, if root.tag is {http://example.com/ns}contract, the result of split('}') would be:
                ['{http://example.com/ns', 'contract']

* The [0] part selects the first item of the split result: '{http://example.com/ns'.

3) .strip('{')

* strip('{') removes the { character from the beginning of the string, which marks the start of the namespace. So, {http://example.com/ns becomes http://example.com/ns.

4) if '}' in root.tag else ""

* This part of the code checks whether there is a } character in root.tag. If there is, the tag includes a namespace, and the expression extracts the namespace URI (the portion between { and }).

* If there is no } (i.e., no namespace), the else "" ensures that an empty string "" is used as the namespace.

5) Putting it together

    - ns = {"": root.tag.split('}')[0].strip('{') if '}' in root.tag else ""}

* What it does: The line creates a dictionary ns with the key "" (an empty string, which ElementTree's search methods treat as the default namespace in Python 3.8 and later) and the value as the namespace URI extracted from root.tag. If no namespace exists in root.tag, the value is an empty string.

Example 1: If the root.tag is {http://example.com/ns}contract, the result would be:

            ns = {"": "http://example.com/ns"}
Example 2: If root.tag is simply contract (without a namespace), the result would be:

            ns = {"": ""}
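The two examples can be reproduced step by step. The sketch below splits the one-line expression into a small helper so each stage is visible; the helper name namespace_uri and the two sample documents are introduced here purely for illustration.

```python
import xml.etree.ElementTree as ET

def namespace_uri(tag):
    """Return the namespace URI embedded in an ElementTree tag, or ""."""
    if '}' not in tag:
        return ""                        # no namespace in this tag
    before_brace = tag.split('}')[0]     # e.g. '{http://example.com/ns'
    return before_brace.strip('{')       # e.g. 'http://example.com/ns'

# Hypothetical one-element documents matching Example 1 and Example 2
namespaced = ET.fromstring('<ns:contract xmlns:ns="http://example.com/ns"/>')
plain = ET.fromstring('<contract/>')

for element in (namespaced, plain):
    ns = {"": namespace_uri(element.tag)}
    print(repr(element.tag), '->', ns)
```

Note that the prefix ns from the source text does not appear in the tag: ElementTree resolves it to the URI while parsing, which is why the code works on the {uri}name form.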

**The ns dictionary is typically passed to methods that accept a namespace mapping, such as find() and findall(), when querying for elements that belong to a specific namespace.**
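As a rough sketch of how the mapping might be used, the example below searches a small contract document with and without ns. The document, its element names (id, line_item), and its values are hypothetical, and the empty-string key relies on Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract document that declares a default namespace
xml_content = """<contract xmlns="http://example.com/ns">
  <id>C-001</id>
  <line_item>Laptops</line_item>
  <line_item>Monitors</line_item>
</contract>"""

root = ET.fromstring(xml_content)
ns = {"": root.tag.split('}')[0].strip('{') if '}' in root.tag else ""}

# Without ns, the unqualified name does not match the namespaced element
missing = root.find("id")
print(missing)

# With ns, unprefixed names are looked up in the default namespace
contract_id = root.find("id", ns).text
line_items = [item.text for item in root.findall("line_item", ns)]
print(contract_id, line_items)
```

When the document has no namespace, ns is {"": ""} and the same find() and findall() calls fall back to plain tag names, so the extraction code does not need two separate paths.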