# Comparison of the uncertified Italian expenditure database against the European Commission payment database

This notebook compares the Italian figures for the _ERDF_ funding scheme over the _2007-2013_ programming period between the member-state database and the European Commission database. The aim is to assess the magnitude of the uncertified expenditure and to identify which funded programmes are most affected.

Let's start by importing the relevant libraries and reporting the package versions in use.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import pkg_resources
import pandas as pd
import numpy as np
import sobol_seq
import time
import types

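# Map import names to the distribution names used by pip, where they differ
poorly_named_packages = {
    "PIL": "Pillow",
    "sklearn": "scikit-learn"
}

def get_imports():
    """Yield the top-level package name of every module or class in the global namespace."""
    # Iterate over a snapshot so the namespace cannot change mid-loop
    for name, val in list(globals().items()):
        if isinstance(val, types.ModuleType):
            # e.g. matplotlib.pyplot -> matplotlib
            name = val.__name__.split(".")[0]
        elif isinstance(val, type):
            # Classes are attributed to the package that defines them
            name = val.__module__.split(".")[0]

        # Other globals keep their variable name and match no installed package
        yield poorly_named_packages.get(name, name)

imports = list(set(get_imports()))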

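# Keep only the installed distributions that match an imported package
requirements = []
for m in pkg_resources.working_set:
    if m.project_name in imports and m.project_name != "pip":
        requirements.append((m.project_name, m.version))

# Print the versions in requirements.txt format
for r in requirements:
    print("{}=={}".format(*r))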

The databases can then be loaded and the relevant figures singled out to ease the comparison.

In [None]:
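# Member-state database: total admitted EU payments per programme,
# keeping only the CCI codes that contain "161" or "162"
df = pd.read_excel('20181231 Pagamenti ammessi PO 2007-2013.xls', usecols=[0, 1, 2, 3, 4, 5, 7])
df_IT = df[df['CCI'].str.contains("161") | df['CCI'].str.contains("162")].groupby('CCI').PAGAMENTO_AMMESSO_UE.sum()

# Commission database and the older database
df_EC = pd.read_excel('2007-13 categorisation cohesion policy FIR.xlsx', sheet_name='Details')
old = pd.read_excel('Database_Final_UPD(3).xlsx')

# Commission figures for Italy under DG REGIO, converted from million euros to euros
df_EC_IT = df_EC[(df_EC['Country Cd'] == 'IT') & (df_EC.DG == 'REGIO')].groupby('Cci')['AR Community Amount in M. €'].sum() * 1e6
# Overall Italian ERDF total for 2007-2013 in the older database
old_IT = old.ERDF_TOTAL[(old.ProgrammingPeriod == '2007-2013') & (old.Country == 'IT')].sum()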
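
The filter-and-aggregate step above can be illustrated on a few invented rows. This is only a minimal sketch: the CCI codes and amounts are hypothetical and are not taken from either database, and a single regular expression (`"161|162"`) stands in for the two separate `str.contains` masks.

```python
import pandas as pd

# Hypothetical toy rows; the CCI codes and amounts are invented for illustration.
toy = pd.DataFrame({
    "CCI": ["2007IT161PO001", "2007IT161PO001", "2007IT162PO002", "2007IT051PO003"],
    "PAGAMENTO_AMMESSO_UE": [100.0, 50.0, 30.0, 999.0],
})

# Keep only the codes containing "161" or "162" (one regex instead of two masks).
mask = toy["CCI"].str.contains("161|162")

# Sum the admitted payments per programme.
toy_totals = toy[mask].groupby("CCI")["PAGAMENTO_AMMESSO_UE"].sum()
print(toy_totals)
```

Rows whose code matches neither pattern are dropped before the sum, so they never reach the comparison.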

Finally, the differences can be assessed programme by programme (each identified by its CCI code), and the most relevant trends identified visually.

In [None]:
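# Absolute difference per programme (Commission minus member state), in euros
diff = df_EC_IT - df_IT
# Column 0: absolute difference; column 1: difference as a percentage of the Commission figure
diff = pd.concat([diff, (diff * 100 / df_EC_IT).round(1)], axis=1)

# Red where the member-state figure exceeds the Commission one, blue otherwise
diff['colors'] = ['red' if x < 0 else 'blue' for x in diff[0]]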

# Draw plot
plt.figure(figsize=(14,10), dpi= 80)
plt.hlines(y=diff.index, xmin=0, xmax=diff[0], color=diff.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Programme$', xlabel='$Difference$ (€)')
plt.yticks(diff.index, fontsize=12)
plt.grid(linestyle='--', alpha=0.5)
plt.show()

In [None]:
# Draw plot
plt.figure(figsize=(14,10), dpi= 80)
plt.hlines(y=diff.index, xmin=0, xmax=diff[1], color=diff.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Programme$', xlabel='$Difference$ (%)')
plt.yticks(diff.index, fontsize=12)
plt.grid(linestyle='--', alpha=0.5)
plt.show()
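
To complement the plots, the programmes could also be ranked numerically by the size of their relative gap. The sketch below is one possible way to do this, not part of the original analysis: the programme labels and totals are invented, and the column names `abs_eur` and `pct` are hypothetical. It also shows that subtracting two Series aligns them on their index, so a programme present in only one database yields `NaN`.

```python
import pandas as pd

# Hypothetical totals per programme (EUR); labels and values are invented.
ec = pd.Series({"A": 200.0, "B": 100.0, "C": 50.0})   # Commission side
ms = pd.Series({"A": 150.0, "B": 110.0, "D": 40.0})   # member-state side

# Subtraction aligns on the index; programmes missing on either side give NaN.
gap = ec - ms
gap_pct = (gap * 100 / ec).round(1)
table = pd.concat([gap, gap_pct], axis=1, keys=["abs_eur", "pct"])

# Programmes that could not be matched across the two databases.
unmatched = table.index[table["abs_eur"].isna()].tolist()

# Drop unmatched programmes, then rank by the magnitude of the relative gap.
ranked = table.dropna().sort_values("pct", key=lambda s: s.abs(), ascending=False)
print(ranked)
print("Unmatched programmes:", unmatched)
```

Listing the unmatched programmes separately makes it easier to tell a genuine discrepancy from a code that appears in only one of the two sources.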