# Cleaning the NUTS database

**Resources:**

https://github.com/sumtxt/regionaldata-guide-eu/blob/main/README.md

https://github.com/rOpenGov/regions/tree/0890afe9ffd3e0d05372a2b1eb6bfa00d20d7cb5

https://docs.ropensci.org/nuts/articles/nuts.html#nuts-codes

https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics

https://ec.europa.eu/statistical-atlas/viewer/?config=typologies.json&mids=BKGCNT,NUTS2024L2,CNTOVL&o=1,1,0.7&ch=NUTS&center=48.96257,17.22023,3&lcis=NUTS2024L2&

https://en.wikipedia.org/wiki/First-level_NUTS_of_the_European_Union#Czech_Republic

"eu nuts database"

In [2]:
import pandas as pd

## NUTS 2024 (not used)

In [3]:
df24 = pd.read_excel("datasets/nuts21-24.xlsx")
df24

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level,Country order,#
0,BE,BE1,Région de Bruxelles-Capitale/Brussels Hoofdste...,1,1,1
1,BE,BE10,Région de Bruxelles-Capitale/Brussels Hoofdste...,2,1,2
2,BE,BE100,Arr. de Bruxelles-Capitale/Arr. Brussel-Hoofdstad,3,1,3
3,BE,BE2,Vlaams Gewest,1,1,4
4,BE,BE21,Prov. Antwerpen,2,1,5
...,...,...,...,...,...,...
1577,SE,SE331,Västerbottens län,3,27,1578
1578,SE,SE332,Norrbottens län,3,27,1579
1579,SE,SEZ,Extra-Regio NUTS 1,1,27,1580
1580,SE,SEZZ,Extra-Regio NUTS 2,2,27,1581


In [32]:
df24.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1582 entries, 0 to 1581
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Country code   1582 non-null   object
 1   NUTS Code      1582 non-null   object
 2   NUTS label     1582 non-null   object
 3   NUTS level     1582 non-null   int64 
 4   Country order  1582 non-null   int64 
 5   #              1582 non-null   int64 
dtypes: int64(3), object(3)
memory usage: 74.3+ KB


In [33]:
# Preview of levels 1 and 2 only; df24 itself is left unfiltered
df24[df24["NUTS level"] < 3]

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level,Country order,#
0,BE,BE1,Région de Bruxelles-Capitale/Brussels Hoofdste...,1,1,1
1,BE,BE10,Région de Bruxelles-Capitale/Brussels Hoofdste...,2,1,2
3,BE,BE2,Vlaams Gewest,1,1,4
4,BE,BE21,Prov. Antwerpen,2,1,5
8,BE,BE22,Prov. Limburg (BE),2,1,9
...,...,...,...,...,...,...
1569,SE,SE31,Norra Mellansverige,2,27,1570
1573,SE,SE32,Mellersta Norrland,2,27,1574
1576,SE,SE33,Övre Norrland,2,27,1577
1579,SE,SEZ,Extra-Regio NUTS 1,1,27,1580


In [35]:
df24["Country code"].unique()

array(['BE', 'BG', 'CZ', 'DK', 'DE', 'EE', 'IE', 'EL', 'ES', 'FR', 'HR',
       'IT', 'CY', 'LV', 'LT', 'LU', 'HU', 'MT', 'NL', 'AT', 'PL', 'PT',
       'RO', 'SI', 'SK', 'FI', 'SE'], dtype=object)

In [38]:
df24[df24["NUTS Code"] == "RO42"]

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level,Country order,#
1476,RO,RO42,Vest,2,23,1477


In [48]:
df24[df24["NUTS label"] == "Garonne"]

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level,Country order,#


In [77]:
df24[df24["Country code"] == "SE"].sort_values("NUTS level").head(60)

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level,Country order,#
1547,SE,SE1,Östra Sverige,1,27,1548
1579,SE,SEZ,Extra-Regio NUTS 1,1,27,1580
1556,SE,SE2,Södra Sverige,1,27,1557
1568,SE,SE3,Norra Sverige,1,27,1569
1580,SE,SEZZ,Extra-Regio NUTS 2,2,27,1581
1569,SE,SE31,Norra Mellansverige,2,27,1570
1562,SE,SE22,Sydsverige,2,27,1563
1548,SE,SE11,Stockholm,2,27,1549
1557,SE,SE21,Småland med öarna,2,27,1558
1550,SE,SE12,Östra Mellansverige,2,27,1551


## NUTS 2021 (used)

Downloaded from: https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021.xlsx

In [56]:
df21 = pd.read_excel("datasets/NUTS2021.xlsx", sheet_name="NUTS & SR 2021")

In [57]:
# .copy() so that later column assignments do not hit a view of the original frame
df21 = df21[["Code 2021", "Country", "NUTS level 1", "NUTS level 2", "NUTS level 3", "NUTS level"]].copy()

In [59]:
df21["Country code"] = df21["Code 2021"].str[0:2]

In [60]:
df21

Unnamed: 0,Code 2021,Country,NUTS level 1,NUTS level 2,NUTS level 3,NUTS level,Country code
0,BE,Belgique/België,,,,0.0,BE
1,BE1,,Région de Bruxelles-Capitale/Brussels Hoofdste...,,,1.0,BE
2,BE10,,,Région de Bruxelles-Capitale/ Brussels Hoofdst...,,2.0,BE
3,BE100,,,,Arr. de Bruxelles-Capitale/Arr. Brussel-Hoofdstad,3.0,BE
4,BE2,,Vlaams Gewest,,,1.0,BE
...,...,...,...,...,...,...,...
2119,TRC33,,,,Şırnak,3.0,TR
2120,TRC34,,,,Siirt,3.0,TR
2121,TRZ,,Extra-Regio NUTS 1,,,1.0,TR
2122,TRZZ,,,Extra-Regio NUTS 2,,2.0,TR


In [62]:
# The sheet also lists Norway, Turkey and other regions that are not needed here;
# they follow the EU rows, so everything from position 1608 onward is dropped manually

df21 = df21.drop(df21.index[1608:])
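
The positional cut above depends on the row order of this particular workbook. A minimal sketch of a label-based alternative follows, assuming the 27 country codes returned by `df24["Country code"].unique()` earlier in the notebook are the ones to keep. `EU27_CODES`, `keep_eu27` and the toy frame are illustrative names and data, not part of the original notebook. Note that, unlike the positional drop, this also discards rows with a missing code.

```python
import pandas as pd

# Country codes copied from the df24["Country code"].unique() output above.
EU27_CODES = {
    "BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR", "HR",
    "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT", "PL", "PT",
    "RO", "SI", "SK", "FI", "SE",
}


def keep_eu27(df: pd.DataFrame, code_col: str = "Code 2021") -> pd.DataFrame:
    """Keep rows whose NUTS code starts with one of the EU27_CODES prefixes."""
    prefix = df[code_col].astype("string").str[:2]
    # Missing codes give a missing prefix, which isin() treats as "not in the set".
    return df[prefix.isin(EU27_CODES)].copy()


# Toy stand-in for the raw sheet (hypothetical rows, including one empty code).
toy = pd.DataFrame({"Code 2021": ["BE1", "SE332", "NO0", "TRC33", None]})
eu_only = keep_eu27(toy)
print(eu_only["Code 2021"].tolist())
```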

In [65]:
df21["NUTS label"] = df21["NUTS level 3"]
df21["NUTS label"] = df21["NUTS label"].fillna(df21["NUTS level 2"])
df21["NUTS label"] = df21["NUTS label"].fillna(df21["NUTS level 1"])
df21["NUTS label"] = df21["NUTS label"].fillna(df21["Country"])
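
The chained `fillna` calls above pick, for each row, the most specific label that is present. The same coalescing step can be sketched in one expression with `bfill(axis=1)`. The toy frame below is a hypothetical stand-in for `df21` that reuses labels from the outputs shown earlier; `label_cols` is an illustrative name.

```python
import pandas as pd

# One row per level, mimicking the layout of the "NUTS & SR 2021" sheet.
toy = pd.DataFrame({
    "Country": ["Belgique/België", None, None, None],
    "NUTS level 1": [None, "Vlaams Gewest", None, None],
    "NUTS level 2": [None, None, "Prov. Antwerpen", None],
    "NUTS level 3": [None, None, None, "Arr. de Bruxelles-Capitale/Arr. Brussel-Hoofdstad"],
})

# Most specific column first: bfill(axis=1) pulls the first non-null value
# of each row into the leftmost column, which is then selected.
label_cols = ["NUTS level 3", "NUTS level 2", "NUTS level 1", "Country"]
toy["NUTS label"] = toy[label_cols].bfill(axis=1).iloc[:, 0]
print(toy["NUTS label"].tolist())
```

The column order in `label_cols` matters: it encodes the same priority as the sequence of `fillna` calls.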

In [69]:
df21 = df21[["Country code", "Code 2021", "NUTS label", "NUTS level"]]

In [73]:
df21 = df21.rename(columns={"Code 2021" : "NUTS Code"})

In [74]:
df21

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level
0,BE,BE,Belgique/België,0.0
1,BE,BE1,Région de Bruxelles-Capitale/Brussels Hoofdste...,1.0
2,BE,BE10,Région de Bruxelles-Capitale/ Brussels Hoofdst...,2.0
3,BE,BE100,Arr. de Bruxelles-Capitale/Arr. Brussel-Hoofdstad,3.0
4,BE,BE2,Vlaams Gewest,1.0
...,...,...,...,...
1603,SE,SE331,Västerbottens län,3.0
1604,SE,SE332,Norrbottens län,3.0
1605,SE,SEZ,Extra-Regio NUTS 1,1.0
1606,SE,SEZZ,Extra-Regio NUTS 2,2.0


In [76]:
df21.to_csv("./datasets/clean/nuts21.csv", index=False)
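
`NUTS level` is stored as a float in the frame above (`0.0`, `1.0`, ...), and that is how it ends up in the CSV. A minimal sketch of a check that could run before the export, assuming the level always equals the code length minus the two-letter country prefix (which holds for every row shown in this notebook, including the `Z`/`ZZ` extra-regio codes). The toy frame is a hypothetical stand-in for `df21`.

```python
import pandas as pd

toy = pd.DataFrame({
    "NUTS Code": ["BE", "BE1", "BE10", "BE100", "SEZZ"],
    "NUTS level": [0.0, 1.0, 2.0, 3.0, 2.0],
})

# Nullable integer dtype: whole numbers without ".0", missing values stay <NA>.
toy["NUTS level"] = toy["NUTS level"].astype("Int64")

# Two characters for the country, then one character per NUTS level.
expected_level = toy["NUTS Code"].str.len() - 2
mismatches = toy[toy["NUTS level"] != expected_level]
print(mismatches)  # an empty frame means codes and levels agree
```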

In [163]:
df21[df21["Country code"] == "EL"].sort_values("NUTS level").head(60)

Unnamed: 0,Country code,NUTS Code,NUTS label,NUTS level
636,EL,EL,Ελλάδα,0.0
637,EL,EL3,Αττική,1.0
706,EL,ELZ,Extra-Regio NUTS 1,1.0
682,EL,EL6,Κεντρική Ελλάδα,1.0
646,EL,EL4,"Νησιά Αιγαίου, Κρήτη",1.0
659,EL,EL5,Βόρεια Ελλάδα,1.0
674,EL,EL53,Δυτική Μακεδονία,2.0
707,EL,ELZZ,Extra-Regio NUTS 2,2.0
683,EL,EL61,Θεσσαλία,2.0
666,EL,EL52,Κεντρική Μακεδονία,2.0


## Lookup

From:
- **df_lk2**, the less detailed one: https://cohesiondata.ec.europa.eu/2014-2020/ESIF-2014-2020-LOOKUP-TABLE-ERDF-ESF-CF-Programme-/466c-pqi8/about_data
- **df_lk**, the more detailed one: https://cohesiondata.ec.europa.eu/2014-2020/ESIF-2014-2020-Regional-coverage-NUTS-2010/vcwh-rdfj/about_data 

In [6]:
df_lk = pd.read_csv("datasets/nuts-lookup.csv")

In [16]:
df_lk["CCI"].nunique()  # 23043 rows in total, 544 unique CCI codes

544

In [17]:
df_lk2 = pd.read_csv("datasets/nuts-lookup2.csv")

In [19]:
df_lk2["CCI_code"].nunique()  # 4520 rows in total, 403 unique CCI codes

403

In [165]:
df_lk[df_lk["CCI"] == "2014FR06RDRP093"]

Unnamed: 0,Country,CCI,Programme short Title,Programme_long_Title,Programme version,Fund-All,Fund-sole,NUTS_code,NUTS_Title
13478,FR,2014FR06RDRP093,Provence-Alpes-Côte d'Azur - Rural Development,France - Rural Development Programme (Regional...,12.1,EAFRD,EAFRD,FR,France


In [110]:
df_lk2[df_lk2["CCI_code"] == "2014IT06RDRP013"]

Unnamed: 0,Country,CCI_code,Programme_Short_Title,programme_version,Fund-All,Fund-Sole,NUTS_(2010)_code,NUTS_title
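
The empty result above shows that a programme can be absent from one lookup, and the unique counts (544 vs 403) point the same way. A rough sketch of how the coverage gap could be listed with set differences; `df_a` and `df_b` are toy stand-ins for `df_lk` and `df_lk2`, and their rows are illustrative only, not a claim about which table contains which programme.

```python
import pandas as pd

# Toy stand-ins: the detailed lookup repeats a CCI once per region.
df_a = pd.DataFrame({"CCI": ["2014CZ16RFOP001", "2014CZ16RFOP001", "2014IT06RDRP013"]})
df_b = pd.DataFrame({"CCI_code": ["2014CZ16RFOP001"]})

cci_a = set(df_a["CCI"].dropna())
cci_b = set(df_b["CCI_code"].dropna())

only_in_a = sorted(cci_a - cci_b)  # programmes missing from the second lookup
only_in_b = sorted(cci_b - cci_a)  # programmes missing from the first lookup
print(only_in_a, only_in_b)
```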


In [23]:
df_lk.sample(10)

Unnamed: 0,Country,CCI,Programme short Title,Programme_long_Title,Programme version,Fund-All,Fund-sole,NUTS_code,NUTS_Title
5444,DE,2014DE16RFOP009,Nordrhein-Westfalen - ERDF,OP Nordrhein-Westfalen ERDF 2014-2020,2.1,ERDF,ERDF,DEA5,Arnsberg
14384,HU,2014HU05M2OP001,Human Resources Development - HU - ESF/ERDF,Human Resources Development Operational Programme,10.0,ERDF+ESF,ESF,HU311,Borsod-Abaúj-Zemplén
13430,TC,2014TC16RFCB002,Interreg V-A - Austria-Czech Republic,Interreg V-A - Austria-Czech Republic,3.0,ERDF,ERDF,CZ010,Hlavní město Praha
7741,TC,2014TC16RFTN003,Interreg V-B - Central Europe,Central Europe,2.1,ERDF,ERDF,SI,SLOVENIJA
17650,PL,2014PL16M1OP001,Infrastructure and Environment - PL - ERDF/CF,OP Infrastructure and Environment,17.0,ERDF+CF,ERDF,PL33,Świętokrzyskie
8765,FI,2014FI16M2OP001,Sustainable growth and jobs - FI - ERDF/ESF,Sustainable growth and jobs 2014-2020 - Struct...,6.0,ERDF+ESF,ESF,FI1D2,Pohjois-Savo
803,BE,2014BE05M9OP001,Wallonie-Bruxelles 2020.eu - ESF/YEI,ESF Operationnal Programme Wallonie-Bruxelles ...,10.0,ESF+YEI,YEI,BE1,RÉGION DE BRUXELLES-CAPITALE/BRUSSELS HOOFDSTE...
6211,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,9.0,ERDF,ERDF,CZ080,Moravskoslezský kraj
4870,GR,2014GR16M2OP001,Competitiveness Entrepreneurship and Innovatio...,"COMPETITIVENESS, ENTREPRENEURSHIP AND INNOVAT...",4.1,ERDF+ESF,ESF,EL24,Στερεά Ελλάδα (Sterea Ellada)
9793,DE,2014DE16M2OP001,Niedersachsen - ERDF/ESF,OP Niedersachsen ERDF/ESF 2014-2020,1.3,ERDF+ESF,ESF,DE916,Goslar
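
In the sample above, the `Country` column does not always match the NUTS prefix: the Greek programme is listed under `GR` while its region code starts with `EL`, and Interreg programmes use `TC` with regions from several countries. A small sketch, on a toy frame built from three of the sampled rows, that derives the country from the NUTS code and flags the mismatches; `NUTS country` and `country_matches` are illustrative column names.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["GR", "TC", "DE"],
    "NUTS_code": ["EL24", "CZ010", "DEA5"],
})

# The first two characters of a NUTS code are its country prefix.
toy["NUTS country"] = toy["NUTS_code"].str[:2]
toy["country_matches"] = toy["Country"] == toy["NUTS country"]
print(toy)
```

When joining against the NUTS table, the derived prefix is likely the safer key than `Country`.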


### Comparison between the two lookups for the same programme
Both are too detailed to be used directly with ABS; however, they are useful for manual fixes.

In [25]:
df_lk[df_lk["CCI"] == "2014CZ16RFOP001"]

Unnamed: 0,Country,CCI,Programme short Title,Programme_long_Title,Programme version,Fund-All,Fund-sole,NUTS_code,NUTS_Title
235,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,14.0,ERDF,ERDF,CZ064,Jihomoravský kraj
433,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,14.0,ERDF,ERDF,CZ053,Pardubický kraj
1197,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,14.0,ERDF,ERDF,CZ071,Olomoucký kraj
1232,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,14.0,ERDF,ERDF,CZ080,Moravskoslezský kraj
1349,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,14.0,ERDF,ERDF,CZ032,Plzeňský kraj
...,...,...,...,...,...,...,...,...,...
21485,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,12.0,ERDF,ERDF,CZ063,Kraj Vysočina
21527,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,4.0,ERDF,ERDF,CZ052,Královéhradecký kraj
21795,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,13.0,ERDF,ERDF,CZ042,Ústecký kraj
21897,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,Enterprise and Innovation for Competitiveness,2.0,ERDF,ERDF,CZ020,Středočeský kraj


In [26]:
df_lk2[df_lk2["CCI_code"] == "2014CZ16RFOP001"]

Unnamed: 0,Country,CCI_code,Programme_Short_Title,programme_version,Fund-All,Fund-Sole,NUTS_(2010)_code,NUTS_title
88,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ06,Jihovýchod
89,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ03,Jihozápad
90,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ08,Moravskoslezsko
91,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ05,Severovýchod
92,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ04,Severozápad
93,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ02,Střední Čechy
94,CZ,2014CZ16RFOP001,Enterprise and Innovation for Competitiveness ...,4.0,ERDF,ERDF,CZ07,Střední Morava
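
The two outputs above list the same programme at different granularity: NUTS 3 codes such as `CZ064`, repeated across programme versions, in `df_lk`, and NUTS 2 codes such as `CZ06` in `df_lk2`. A minimal sketch of how the detailed table could be collapsed to a coarser level by truncating codes and dropping duplicates, assuming both tables use the same NUTS version. `to_nuts_level` is a hypothetical helper and the toy frame reuses a few rows from the output above.

```python
import pandas as pd

detailed = pd.DataFrame({
    "CCI": ["2014CZ16RFOP001"] * 4,
    "Programme version": [14.0, 14.0, 12.0, 4.0],
    "NUTS_code": ["CZ064", "CZ053", "CZ063", "CZ052"],
})


def to_nuts_level(code: str, level: int) -> str:
    """Truncate a NUTS code to `level` (country prefix plus `level` characters).

    Codes that are already coarser than `level` are returned unchanged.
    """
    return code[: 2 + level]


detailed["NUTS2"] = detailed["NUTS_code"].map(lambda c: to_nuts_level(c, 2))

# One row per programme and NUTS 2 region, regardless of programme version.
nuts2 = (
    detailed[["CCI", "NUTS2"]]
    .drop_duplicates()
    .sort_values(["CCI", "NUTS2"])
    .reset_index(drop=True)
)
print(nuts2)
```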
