<a href="https://colab.research.google.com/github/charlotter62/EU-ETS-EUTL/blob/main/A1_xml_accounts_byaccountID.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Downloading Accounts XML files


---


**Description**:

This notebook downloads account XML files from the [European Union Transaction Log](https://ec.europa.eu/clima/ets/account.do?languageCode=en), one file per account. Each account is identified by its registry code and account ID, and the script needs both as inputs to build the link from which that account's XML file can be downloaded. A companion notebook parses the downloaded account XML files to CSV: [xml-accounts-byaccountID_Parse_XML.ipynb](https://colab.research.google.com/drive/1nIU_lGZz-lnzHZ7lMQsQEY-kgPU-LB_E?usp=sharing)

**Author**: Charlotte Rivard
**Contact**: 15crivard@gmail.com
**Date**: 1/13/2022

*Please reach out with questions and coauthorship considerations if using this script for publications*

---

In [None]:
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


In [None]:
import pandas as pd

In [None]:
!pip install wget
import wget



In [None]:
import requests
from bs4 import BeautifulSoup
import time
import pandas as pd
import os
from socket import error as SocketError
import errno

In [None]:
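def isXML(filepath):
  # Return True if the first line of the file contains an XML declaration.
  # The "with" block closes the file handle, which the earlier version left open.
  with open(filepath, "r") as f:
    return "<?xml" in f.readline()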
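
`isXML` only inspects the first line, so a file that starts with an XML declaration but was cut off part way through would still pass. As a minimal sketch of a stricter check, the hypothetical helper `is_well_formed_xml` below (not part of the original notebook) parses the whole file with the standard library's `xml.etree.ElementTree` and reports whether parsing succeeded. The sample file contents are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def is_well_formed_xml(filepath):
    """Return True if the whole file parses as XML, False otherwise."""
    try:
        ET.parse(filepath)
        return True
    except (ET.ParseError, OSError):
        # ParseError: content is not well-formed; OSError: file unreadable.
        return False

# Illustration with two made-up files: one complete, one truncated.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.xml")
    bad = os.path.join(tmp, "bad.xml")
    with open(good, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?><root><item/></root>')
    with open(bad, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?><root><item>')  # cut off mid-document
    results = (is_well_formed_xml(good), is_well_formed_xml(bad))

print(results)  # (True, False)
```

Parsing every file costs more than reading one line, so this kind of check may be better suited to a verification pass after the downloads finish.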

In [None]:
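def patientDownload(link,savename):
  # Keep retrying until the saved file looks like XML.
  # Note: there is no retry limit, so a permanently failing link loops forever.
  success=False
  while not success:
    try:
      wget.download(link,savename)
      if isXML(savename):
        success=True
      else:
        # The response was saved but is not XML: discard it and retry.
        print("Download failed, attempting again")
        os.remove(savename)
        time.sleep(10)
    except SocketError as e:
      if e.errno != errno.ECONNRESET:
        raise # Not the error we are looking for
      # Connection reset by the server: wait, then retry.
      print("Download failed, attempting again")
      time.sleep(10)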
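
`patientDownload` retries indefinitely, which suits an unattended run but can hang on a link that never returns XML. The sketch below shows one way to cap the number of attempts. The name `retry_until_valid` and its parameters are hypothetical and not part of the original notebook. The download and validation steps are passed in as callables so the retry logic can be tried without network access.

```python
import time

def retry_until_valid(fetch, is_valid, max_attempts=5, wait_seconds=10,
                      sleep=time.sleep):
    """Call fetch() until is_valid() is True or the attempts run out.

    Returns the number of attempts used; raises RuntimeError on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            fetch()
            if is_valid():
                return attempt
        except ConnectionResetError:
            # Same condition as errno.ECONNRESET in patientDownload.
            pass
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts")

# Illustration with stand-ins: the "download" becomes valid on the third try.
calls = {"n": 0}

def fake_fetch():
    calls["n"] += 1

def fake_is_valid():
    return calls["n"] >= 3

attempts = retry_until_valid(fake_fetch, fake_is_valid, sleep=lambda s: None)
print(attempts)  # 3
```

In the notebook, `fetch` could wrap `wget.download(link, savename)` and `is_valid` could wrap `isXML(savename)`. The non-XML file would still need to be removed between attempts, as `patientDownload` does.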

In [None]:
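def getXMLbyAccountID(accountID,regcode):
  # Relies on the global variable workingdir, which must be set before calling.
  # Files are saved as "<regcode>_<accountID>_account.xml" in the "XML files" folder.
  savename = workingdir+"XML files/"+ regcode+"_"+str(accountID)+"_account.xml"
  if(os.path.exists(savename)):
    # Skip accounts that were already downloaded, so reruns resume where they stopped.
    print("File already exists! " + savename)
  else:
    print("Downloading..." + savename)
    # Export link for a single account; only the account ID and registry code vary.
    link = "https://ec.europa.eu/clima/ets/exportEntry.do?form=singleAccount&accountID="+str(accountID)+"&accountID="+str(accountID)+"&registryCode="+regcode+"&searchType=&action=details&languageCode=en&returnURL=accountHolder%3D%26search%3DSearch%26languageCode%3Den%26searchType%3Daccount%26currentSortSettings%3D"+"&exportType=1&exportAction=accountAll&exportOK=exportOK"
    print(link)
    patientDownload(link,savename)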
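
The export link is a base address followed by a fixed set of query parameters, of which only the account ID and registry code change. As a sketch of the same link built with `urllib.parse.urlencode` instead of string concatenation, the hypothetical helper `build_account_url` below (not part of the original notebook) lists the parameters in the order they appear in the notebook's link, including the repeated `accountID`. `urlencode` percent-encodes the nested `returnURL` value itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://ec.europa.eu/clima/ets/exportEntry.do"

def build_account_url(account_id, regcode):
    """Build the single-account XML export link from its query parameters."""
    params = [
        ("form", "singleAccount"),
        ("accountID", str(account_id)),
        ("accountID", str(account_id)),  # repeated, as in the notebook's link
        ("registryCode", regcode),
        ("searchType", ""),
        ("action", "details"),
        ("languageCode", "en"),
        # Written unencoded here; urlencode escapes "=" and "&" for us.
        ("returnURL", "accountHolder=&search=Search&languageCode=en"
                      "&searchType=account&currentSortSettings="),
        ("exportType", "1"),
        ("exportAction", "accountAll"),
        ("exportOK", "exportOK"),
    ]
    return BASE_URL + "?" + urlencode(params)

url = build_account_url(6206, "AT")
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["accountID"], query["registryCode"])
```

Keeping the parameters in a list makes it easier to see which parts of the link vary per account and avoids hand-writing the percent-encoding.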

Single test...

In [None]:
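# workingdir must be defined before getXMLbyAccountID is called
workingdir = "/gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/"
ID = 6206
getXMLbyAccountID(ID,"AT")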

https://ec.europa.eu/clima/ets/exportEntry.do?form=singleAccount&accountID=6206&accountID=6206&registryCode=AT&searchType=&action=details&languageCode=en&returnURL=accountHolder%3D%26search%3DSearch%26languageCode%3Den%26searchType%3Daccount%26currentSortSettings%3D&exportType=1&exportAction=accountAll&exportOK=exportOK


In [None]:
workingdir = "/gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/"
ID = int(101987)
getXMLbyAccountID(ID,"NO")

Downloading.../gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/XML files/NO_101987_account.xml
https://ec.europa.eu/clima/ets/exportEntry.do?form=singleAccount&accountID=101987&accountID=101987&registryCode=NO&searchType=&action=details&languageCode=en&returnURL=accountHolder%3D%26search%3DSearch%26languageCode%3Den%26searchType%3Daccount%26currentSortSettings%3D&exportType=1&exportAction=accountAll&exportOK=exportOK


Download all in loop...

In [None]:
workingdir = "/gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/"
accounts = pd.read_csv(workingdir + "all_AccountIDs_Regcodes.csv")
accounts

Unnamed: 0,Account.ID,Registry.Code
0,6206,AT
1,7412,AT
2,6337,AT
3,13396,IT
4,13397,IT
...,...,...
33044,25120,SE
33045,74418,SI
33046,64364,SI
33047,65524,SI
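
Because `getXMLbyAccountID` skips files that already exist, an interrupted run can be restarted without downloading everything again. As a rough sketch, the hypothetical helper `pending_accounts` below (not part of the original notebook) lists which accounts still lack a saved file, following the notebook's `<regcode>_<accountID>_account.xml` naming. It takes plain `(account_id, regcode)` pairs so it runs without pandas. With the table above, the pairs could come from `zip(accounts["Account.ID"], accounts["Registry.Code"])`.

```python
import os
import tempfile

def pending_accounts(pairs, xml_dir):
    """Return the (account_id, regcode) pairs with no saved XML file yet."""
    pending = []
    for account_id, regcode in pairs:
        filename = f"{regcode}_{int(account_id)}_account.xml"
        if not os.path.exists(os.path.join(xml_dir, filename)):
            pending.append((int(account_id), regcode))
    return pending

# Illustration in a temporary folder where one account is already saved.
with tempfile.TemporaryDirectory() as xml_dir:
    open(os.path.join(xml_dir, "AT_6206_account.xml"), "w").close()
    todo = pending_accounts([(6206, "AT"), (7412, "AT"), (13396, "IT")], xml_dir)

print(todo)  # [(7412, 'AT'), (13396, 'IT')]
```

Checking the length of the returned list before starting the loop gives a quick estimate of how much of the download remains.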


In [None]:
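# Download one XML file per row of the accounts table.
# Accounts whose file already exists are skipped, so the loop can be rerun after an interruption.
for i in range(len(accounts)):
  ID = int(accounts.loc[i,"Account.ID"])
  Reg = accounts.loc[i,"Registry.Code"]
  getXMLbyAccountID(ID,Reg)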

Downloading.../gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/XML files/NO_101992_account.xml
https://ec.europa.eu/clima/ets/exportEntry.do?form=singleAccount&accountID=101992&accountID=101992&registryCode=NO&searchType=&action=details&languageCode=en&returnURL=accountHolder%3D%26search%3DSearch%26languageCode%3Den%26searchType%3Daccount%26currentSortSettings%3D&exportType=1&exportAction=accountAll&exportOK=exportOK
Downloading.../gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/XML files/NO_101986_account.xml
https://ec.europa.eu/clima/ets/exportEntry.do?form=singleAccount&accountID=101986&accountID=101986&registryCode=NO&searchType=&action=details&languageCode=en&returnURL=accountHolder%3D%26search%3DSearch%26languageCode%3Den%26searchType%3Daccount%26currentSortSettings%3D&exportType=1&exportAction=accountAll&exportOK=exportOK
Downloading.../gdrive/MyDrive/Brookings/XML_downloads/xml-accounts-byaccountID/XML files/NO_101973_account.xml
https://ec.euro