# Extracting relevant EIC codes from all available codes

To use the ENTSO-E API, you need to reference various types of _Energy Identification Codes_ (EICs). There are different kinds of EICs, and I think the ones of interest to us, as users of ENTSO-E's API for power generation data, are the _Control Area_ (CTA) codes and the _Bidding Zone_ (BDZ) codes.

## Goal of this notebook

Extract the codes needed to query the ENTSO-E API for production data from generation plants.

I think what we need is a dictionary of _country codes_ (CTY), each with a dictionary of _control areas_ (CTA) and their respective EIC codes. Quite often the CTY and the CTA are the same, but not always, and in that case the API seems to require the CTA. I've never seen a CTA that belongs to two different countries.

## Introduction to EIC codes

An EIC code is 16 characters long and encodes different types of information in different characters within the code. For example, consider this EIC code: `10Y1001C--00003F`.
- The first two characters say who issued the code; `10` represents ENTSO-E, as can be seen in [this list](https://www.entsoe.eu/data/energy-identification-codes-eic/#eic-lio-websites).
- The next character says what type of object the code represents. We are interested in the `Y` codes, which represent a geographical area; for more details see [this document](https://docstore.entsoe.eu/Documents/EDI/Library/EIC_Short_Guide_and_FAQ_V3_Approved%20April%202016.pdf).
- The following 12 characters (digits, letters or dashes) identify the object that the code represents.
- The final character is a check character used to verify that the code is valid; it can be computed from the preceding 15 characters.
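
The check character can be reproduced with a short sketch. It assumes the weighted modulo-37 scheme from the EIC reference manual as I understand it: digits map to 0-9, letters to 10-35 and the dash to 36; the first 15 characters are weighted 16 down to 2; and the check value is 36 minus ((sum - 1) mod 37). The function name `eic_check_char` is my own, and the scheme is worth confirming against the specification before relying on it.

```python
import string

# Assumed alphabet: digits 0-9 -> 0..9, letters A-Z -> 10..35, dash -> 36
ALPHABET = string.digits + string.ascii_uppercase + "-"


def eic_check_char(first_15: str) -> str:
    """Compute the check character for the first 15 characters of an EIC."""
    if len(first_15) != 15:
        raise ValueError("expected exactly 15 characters")
    # Weights run from 16 (first character) down to 2 (fifteenth character)
    total = sum(
        ALPHABET.index(ch) * weight
        for ch, weight in zip(first_15.upper(), range(16, 1, -1))
    )
    return ALPHABET[36 - (total - 1) % 37]


# Compare the computed character with the last character of known codes
for code in ["10Y1001A1001A92E", "10Y1001A1001A016", "10YGB----------A"]:
    print(code, eic_check_char(code[:15]) == code[-1])
```

Under these assumptions, the sketch reproduces the final character of the codes quoted in this notebook.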

## References

- [PDF: Code specification and FAQ](https://docstore.entsoe.eu/Documents/EDI/Library/EIC_Short_Guide_and_FAQ_V3_Approved%20April%202016.pdf)
- [PDF: An overview of EICs from maps](https://www.entsoe.eu/fileadmin/user_upload/edi/library/downloads/Market_Areas_v2.0.pdf)
- [WEB: Documentation](https://www.entsoe.eu/data/energy-identification-codes-eic/#energy-identification-codes-eic-documentation)
- [XML: List of EIC codes](https://www.entsoe.eu/fileadmin/user_upload/edi/library/eic/allocated-eic-codes.zip)
- [WEB: ENTSO-E's transparency platform, which we want to access using an API](https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show)

## XML data download

We'll download the XML list of EIC codes from the references as a zip archive and extract it into `data/`.

In [210]:
import os
import urllib.request
import zipfile

# Create the target directory if it does not exist yet
os.makedirs("data", exist_ok=True)

file = "data/allocated-eic-codes"

# Download the zipped XML list of allocated EIC codes
urllib.request.urlretrieve(
    "https://www.entsoe.eu/fileadmin/user_upload/edi/library/eic/allocated-eic-codes.zip",
    file + ".zip",
)

# Extract the archive into data/
with zipfile.ZipFile(file + ".zip", "r") as zip_ref:
    zip_ref.extractall("data")

## XML data examples

This is one of the relevant entries from the XML list of EIC codes in the references that we will extract information from. I believe it to be an XML representation of the PDF that gives an overview of EICs from maps, also in the references.

Here we can see the United Kingdom with an EIC code of `10Y1001A1001A92E` that functions only as a Member State code. Since it does not function as a `Control Area`, I don't think we can use it with the API.

```xml
<EICCode_MarketDocument>
    <mRID>10Y1001A1001A92E</mRID>
    <docStatus>
        <value>A05</value>
    </docStatus>
    <attributeInstanceComponent.attribute>International</attributeInstanceComponent.attribute>
    <long_Names.name>United Kingdom</long_Names.name>
    <display_Names.name>UK</display_Names.name>
    <lastRequest_DateAndOrTime.date>2016-10-10</lastRequest_DateAndOrTime.date>
    <deactivationRequested_DateAndOrTime.date/>
    <eICCode_MarketParticipant.streetAddress>
        <townDetail/>
    </eICCode_MarketParticipant.streetAddress>
    <eICCode_MarketParticipant.aCERCode_Names.name/>
    <eICResponsible_MarketParticipant.mRID>10X1001A1001A515</eICResponsible_MarketParticipant.mRID>
    <description>Member State</description>
    <Function_Names>
        <name>Member State</name>
    </Function_Names>
</EICCode_MarketDocument>
```

Here are Northern Ireland and Great Britain, two control areas (CTAs) that are part of the United Kingdom.

```xml
<EICCode_MarketDocument>
    <mRID>10Y1001A1001A016</mRID>
    <docStatus><value>A05</value></docStatus>
    <attributeInstanceComponent.attribute>International</attributeInstanceComponent.attribute>
    <long_Names.name>Northern Ireland</long_Names.name>
    <display_Names.name>GB-NI</display_Names.name>
    <lastRequest_DateAndOrTime.date>2018-10-12</lastRequest_DateAndOrTime.date>
    <deactivationRequested_DateAndOrTime.date/>
    <eICCode_MarketParticipant.streetAddress>
        <townDetail>
            <country>GB</country>
        </townDetail>
    </eICCode_MarketParticipant.streetAddress>
    <eICCode_MarketParticipant.aCERCode_Names.name/>
    <description>Control Area, Scheduling Area</description>
    <Function_Names>
        <name>Market Balance Area</name>
    </Function_Names>
    <Function_Names>
        <name>Control Area</name>
    </Function_Names>
</EICCode_MarketDocument>

<EICCode_MarketDocument>
    <mRID>10YGB----------A</mRID>
    <docStatus>
        <value>A05</value>
    </docStatus>
    <attributeInstanceComponent.attribute>International</attributeInstanceComponent.attribute>
    <long_Names.name>Great Britain</long_Names.name>
    <display_Names.name>GB</display_Names.name>
    <lastRequest_DateAndOrTime.date>2016-10-10</lastRequest_DateAndOrTime.date>
    <deactivationRequested_DateAndOrTime.date/>
    <eICCode_MarketParticipant.streetAddress>
        <townDetail>
            <country>GB</country>
        </townDetail>
    </eICCode_MarketParticipant.streetAddress>
    <eICCode_MarketParticipant.aCERCode_Names.name/>
    <description>Bidding Zone, Control Area, Market Balance Area, Scheduling Area</description>
    <Function_Names>
        <name>Market Balance Area</name>
    </Function_Names>
    <Function_Names>
        <name>Control Area</name>
    </Function_Names>
    <Function_Names>
        <name>Bidding Zone</name>
    </Function_Names>
</EICCode_MarketDocument>
```
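
To check which functions an entry has, a minimal sketch with `xml.etree.ElementTree` might look like this. It parses a trimmed copy of the Great Britain entry above. The helper names `local_name` and `function_names` are hypothetical, and since I assume tags in the real file may carry an XML namespace prefix, the sketch compares local names only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Great Britain entry, for illustration only
SNIPPET = """
<EICCode_MarketDocument>
    <mRID>10YGB----------A</mRID>
    <long_Names.name>Great Britain</long_Names.name>
    <display_Names.name>GB</display_Names.name>
    <Function_Names><name>Market Balance Area</name></Function_Names>
    <Function_Names><name>Control Area</name></Function_Names>
    <Function_Names><name>Bidding Zone</name></Function_Names>
</EICCode_MarketDocument>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def function_names(entry):
    """Collect the text of every child of the entry's Function_Names elements."""
    return [
        name.text
        for child in entry
        if local_name(child.tag) == "Function_Names"
        for name in child
        if name.text
    ]


entry = ET.fromstring(SNIPPET.strip())
functions = function_names(entry)
print(functions)
print("Control Area" in functions)
```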

## Goal restated

Extract all CTAs and organize them into a data structure of this form.

```yaml
# an example of the data structure, represented as YAML
# here with only one country
countries:
  - abbrev: "UK"
    name: "United Kingdom"
    control_areas:
      - abbrev: "GB-NI"
        country: "GB"
        eic: "10Y1001A1001A016"
        name: "Northern Ireland"
      - abbrev: "GB"
        country: "GB"
        eic: "10YGB----------A"
        name: "Great Britain"
```
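
As a sketch of how this structure could be used from Python, the example below mirrors the YAML above as nested dicts and lists and looks up a control area's EIC code. The helper `find_eic` is hypothetical and not part of any library.

```python
# The YAML example above, written as plain Python dicts and lists
countries = [
    {
        "abbrev": "UK",
        "name": "United Kingdom",
        "control_areas": [
            {"abbrev": "GB-NI", "country": "GB",
             "eic": "10Y1001A1001A016", "name": "Northern Ireland"},
            {"abbrev": "GB", "country": "GB",
             "eic": "10YGB----------A", "name": "Great Britain"},
        ],
    },
]


def find_eic(countries, country_abbrev, cta_abbrev):
    """Return the EIC code of a control area, or None if it is not listed."""
    for country in countries:
        if country["abbrev"] != country_abbrev:
            continue
        for cta in country["control_areas"]:
            if cta["abbrev"] == cta_abbrev:
                return cta["eic"]
    return None


print(find_eic(countries, "UK", "GB-NI"))
```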

In [225]:
import collections
import pickle
import xml.etree.ElementTree as ET

# NOTE: Python XML Parsing documentation is available here
# https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml
tree = ET.parse(file + ".xml")
root = tree.getroot()

"""
Countries not listed by the allocated-eic-codes.xml

Hmm... countries sometimes have multiple Control Areas (CTA).
Also, sometimes a country has multiple bidding zones, and a bidding
zone can also have areas from multiple countries...

While clicking around in transparency.entsoe.eu,
in the URL we sometimes see query strings like:

    area.values=CTY|10YGR-HTSO-----Y!CTA|10YGR-HTSO-----Y
    
They use "|" to separate a key and a value,
and "!" to separate key/value pairs
"""

from entsoe.mappings import DOCSTATUS

control_areas = []
for eic_code in root:
    if "EICCode_MarketDocument" not in eic_code.tag:
        continue
        
    control_area_dict = {"abbrev": "", "eic": "", "name": "", "status": ""}
    control_area_bool = False
    for e in eic_code:
        # We want only EIC codes functioning as a Control Area.
        # An EIC code can have multiple functions though, such as:
        # - Bidding Zone
        # - Control Area
        # - Control Block
        # - Member State
        # - Market Balance Area
        # - Scheduling Area
        #
        # Match the tag's local name exactly, so that
        # eICResponsible_MarketParticipant.mRID does not overwrite the code.
        if e.tag.split("}")[-1] == "mRID":
            control_area_dict["eic"] = e.text
        elif "docStatus" in e.tag and len(e):
            control_area_dict["status"] = DOCSTATUS[e[0].text]
        elif "long_Names.name" in e.tag:
            control_area_dict["name"] = e.text
        elif "display_Names.name" in e.tag:
            control_area_dict["abbrev"] = e.text
        elif "streetAddress" in e.tag:
            if len(e) and len(e[0]):
                control_area_dict["country"] = e[0][0].text
        elif "Function_Names" in e.tag:
            for function_name in e:
                if "control area" in function_name.text.lower():
                    control_area_bool = True
    if not control_area_bool:
        continue
        
    # Infer the country when the XML entry does not provide one
    if "country" not in control_area_dict:
        # 1: if characters 4-5 of the EIC are letters, use them
        if control_area_dict["eic"][3:5].isalpha():
            control_area_dict["country"] = control_area_dict["eic"][3:5]
        # 2: otherwise use the part of the abbrev before "-" or "_"
        else:
            country_abbrev = control_area_dict["abbrev"].split("-")[0].split("_")[0]
            # 2+: strip the numeric index from abbreviations like DK1, DK2
            if len(country_abbrev) == 3 and country_abbrev[2].isnumeric():
                country_abbrev = country_abbrev[0:2]
            # Fix Malta, whose abbrev is MALTA_AREA
            if country_abbrev == "MALTA":
                country_abbrev = "MT"
            assert len(country_abbrev) == 2, country_abbrev
            control_area_dict["country"] = country_abbrev

    # Fix quirks in the abbreviations
    if control_area_dict["abbrev"] == "CA-------DENMARK":
        control_area_dict["abbrev"] = "DK"
    elif control_area_dict["abbrev"] == "MALTA_AREA":
        control_area_dict["abbrev"] = "MT"

    # Use UK rather than GB as the country key, matching entsoe.mappings
    if control_area_dict["country"] == "GB":
        control_area_dict["country"] = "UK"

    control_areas.append(control_area_dict)


# Construct the final result, a dictionary with countries
# and their respective control areas.
country_control_areas = {}
for cta in control_areas:
    if cta["country"] not in country_control_areas:
        country_control_areas[cta["country"]] = []
    country_control_areas[cta["country"]].append(cta)

# Sort dictionary by keys (a plain dict keeps insertion order in Python 3.7+)
country_control_areas = dict(sorted(country_control_areas.items()))


# Write result to a file
with open('country_control_areas.pickle', 'wb') as handle:
    pickle.dump(country_control_areas, handle) 
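
The `area.values` format noted in the cell above can be taken apart with plain string splitting. This is a sketch based only on the single example observed in the URL; the function name `parse_area_values` is my own, and I assume each key appears at most once.

```python
def parse_area_values(value):
    """Split 'KEY|VALUE!KEY|VALUE' into a {KEY: VALUE} dictionary."""
    pairs = {}
    # "!" separates the key/value pairs
    for pair in value.split("!"):
        # "|" separates a key from its value
        key, _, code = pair.partition("|")
        pairs[key] = code
    return pairs


areas = parse_area_values("CTY|10YGR-HTSO-----Y!CTA|10YGR-HTSO-----Y")
print(areas)
```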

In [226]:
from entsoe.mappings import DOMAIN_MAPPINGS

# countries in entsoe.mappings.DOMAIN_MAPPINGS but not in our result
for country in DOMAIN_MAPPINGS:
    if country not in country_control_areas:
        print(country)

print("-----")

# in our result but not in entsoe.mappings.DOMAIN_MAPPINGS
for country in country_control_areas:
    if country not in DOMAIN_MAPPINGS:
        print(country)

CY
ICELAND
-----
ES
MT
TR
UA


In [233]:
sorted(DOMAIN_MAPPINGS)

['AL',
 'AT',
 'BA',
 'BE',
 'BG',
 'CH',
 'CY',
 'CZ',
 'DE',
 'DK',
 'EE',
 'FI',
 'FR',
 'GR',
 'HR',
 'HU',
 'ICELAND',
 'IE',
 'IT',
 'LT',
 'LU',
 'LV',
 'ME',
 'MK',
 'NL',
 'NO',
 'PL',
 'PT',
 'RO',
 'RS',
 'SE',
 'SI',
 'SK',
 'UK']

In [232]:
sorted([cta["country"] for cta in control_areas])

['AL',
 'AT',
 'BA',
 'BE',
 'BG',
 'CH',
 'CZ',
 'DE',
 'DE',
 'DE',
 'DE',
 'DK',
 'DK',
 'DK',
 'EE',
 'ES',
 'FI',
 'FR',
 'GR',
 'HR',
 'HU',
 'IE',
 'IT',
 'LT',
 'LU',
 'LV',
 'ME',
 'MK',
 'MT',
 'NL',
 'NO',
 'PL',
 'PT',
 'RO',
 'RS',
 'SE',
 'SI',
 'SK',
 'TR',
 'UA',
 'UA',
 'UA',
 'UA',
 'UK',
 'UK']