## XML Sitemap Generator

This notebook generates the sitemaps that are stored in the Platform webapp ``/sitemap`` directory and linked from the [Platform index sitemap](https://www.targetvalidation.org/sitemaps/1804/index.xml). The index sitemap is the file we submit to search engines (Google, Bing, Yahoo, Yandex, etc.) to improve our SEO.
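
For orientation, an index sitemap in the [sitemaps.org protocol](https://www.sitemaps.org/protocol.html) is a `<sitemapindex>` element with one `<sitemap><loc>` entry per child sitemap. The sketch below is illustrative only: the base URL is taken from the index sitemap link above, while the child file names (`diseases.xml`, `targets.xml`) are hypothetical placeholders, not the names the Platform actually uses.

```python
import xml.etree.ElementTree as ET

# Base URL taken from the index sitemap link above.
BASE = "https://www.targetvalidation.org/sitemaps/1804/"

# Hypothetical child sitemap file names, for illustration only.
child_files = ["diseases.xml", "targets.xml"]

# Root node of a sitemap index, as defined by the sitemaps.org protocol.
index = ET.Element("sitemapindex")
index.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

# One <sitemap><loc> entry per child sitemap.
for name in child_files:
    sitemap = ET.SubElement(index, "sitemap")
    ET.SubElement(sitemap, "loc").text = BASE + name

index_xml = ET.tostring(index, encoding="unicode")
print(index_xml)
```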

Run this notebook **at least 3 times per year (every other release)** so that search engines continue to index the Platform correctly.

Links and documentation:

* [JS sitemap generator (stored in webapp repo)](https://github.com/opentargets/webapp/blob/master/sitemap-generator.js)
* [Creating XML sitemap from list](https://stackoverflow.com/questions/16681543/create-xml-file-with-python-by-iterating-over-lists)
* [ElementTree XML API documentation](https://docs.python.org/3.4/library/xml.etree.elementtree.html#building-xml-documents)
* [Sitemap XML format](https://www.sitemaps.org/protocol.html)

In [2]:
diseases = [
    {
        "efo_id": "EFO_0000400",
        "efo_label": "diabetes mellitus"
    }, 
    {
        "efo_id": "EFO_0000305",
        "efo_label": "breast carcinoma"
    },
    {
        "efo_id": "EFO_0003060",
        "efo_label": "non small cell lung carcinoma"
    },
    {
        "efo_id": "EFO_0003843",
        "efo_label": "pain"
    },
    {
        "efo_id": "EFO_0000616",
        "efo_label": "neoplasm",
    },    
]

targets = [
    {
        "ensembl_id": "ENSG00000113580",
        "symbol": "NR3C1"
    },
    {
        "ensembl_id": "ENSG00000146648",
        "symbol": "EGFR"
    },
    {
        "ensembl_id": "ENSG00000095303",
        "symbol": "PTGS1"
    },
    {
        "ensembl_id": "ENSG00000131747",
        "symbol": "TOP2A"
    },
    {
        "ensembl_id": "ENSG00000091831",
        "symbol": "ESR1"
    },  
]

In [5]:
import xml.etree.ElementTree as ET

# create root node
urlset = ET.Element("urlset")

# set XML standards and validation properties
urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
urlset.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance")
urlset.set("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd")

# create one <url><loc> child of the root node for each entry in the diseases list
for disease in diseases:
    url = ET.SubElement(urlset, "url")
    loc = ET.SubElement(url, "loc")
    loc.text = "https://www.targetvalidation.org/disease/" + disease["efo_id"]
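
The cell above builds the tree in memory but stops before serializing it. One possible way to turn such a tree into sitemap XML is sketched below. The helper names `build_urlset` and `to_xml_bytes` are hypothetical (they are not part of the notebook or of the webapp generator), and the two EFO IDs are simply copied from the `diseases` list so that the block runs on its own.

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Hypothetical helper: return a <urlset> with one <url><loc> per URL."""
    urlset = ET.Element("urlset")
    urlset.set("xmlns", SITEMAP_NS)
    for href in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = href
    return urlset


def to_xml_bytes(root):
    """Hypothetical helper: serialize to UTF-8 bytes with an XML declaration."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


# Two IDs copied from the diseases list above, to keep the sketch self-contained.
example_ids = ["EFO_0000400", "EFO_0000305"]
disease_urls = [
    "https://www.targetvalidation.org/disease/" + efo_id for efo_id in example_ids
]

sitemap_xml = to_xml_bytes(build_urlset(disease_urls))
print(sitemap_xml.decode("utf-8"))
```

Writing to an in-memory buffer keeps the sketch free of assumptions about output paths; passing a file name to `ElementTree.write` instead would write the same bytes to disk.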


