## Datasets used for creating a network

<p>Four datasets are created:
      a) Relationship data: one row per bond (ISIN) and quarter, linking the ECB to the bond it buys;
      b) Company nodes data: unique company names together with their industry classification;
      c) Sector nodes data: unique sector names;
      d) Country of risk nodes data: unique names of the countries of risk.</p>

### Libraries to import

In [1]:
import pandas as pd
import os

In [2]:
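directory = os.getcwd()
# Strip the last 7 characters of the working directory (presumably the notebook folder name)
# so that the data folder can be appended to the remaining path
path = directory[:-7] + '2. Data/5. Network Analysis - Data/Data/'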
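
Slicing the working directory by a fixed number of characters only works while the notebook folder name keeps that exact length. A minimal sketch of a less brittle alternative, assuming the `2. Data` folder sits one level above the notebook folder; the helper name `build_data_path` is hypothetical and not part of the original notebook:

```python
from pathlib import Path

def build_data_path(start: Path) -> Path:
    """Return the data folder located next to the folder `start`.

    Assumption: `start` is the folder holding the notebook and the
    '2. Data' folder lives one level above it.
    """
    return start.parent / '2. Data' / '5. Network Analysis - Data' / 'Data'

data_dir = build_data_path(Path.cwd())
# pandas readers and writers accept Path objects, so a file could be addressed as
# data_dir / 'ECB Portfolio - Company Information.xlsx'
```

With this variant, file names are appended with the `/` operator instead of string concatenation.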

### Import dataset

In [3]:
portfolio = pd.read_excel(path + 'ECB Portfolio - Company Information.xlsx')
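
The later cells rely on a fixed set of column names in the workbook. A small sketch of a check that could run right after loading, shown on a toy frame with made-up values; the helper `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical names, and the column list is taken from the cells in this notebook:

```python
import pandas as pd

# Columns referenced by the cells in this notebook
REQUIRED_COLUMNS = ['ISIN_CODE', 'QUARTER', 'ISSUER_NAME', 'SECTOR', 'NACE_CODE',
                    'COUNTRY_OF_RISK', 'COUPON_RATE', 'MATURITY_DATE']

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `df`."""
    return [col for col in required if col not in df.columns]

# Toy frame standing in for the real workbook (illustrative values only)
toy = pd.DataFrame({'ISIN_CODE': ['XX01'], 'QUARTER': ['2017Q2']})
print(missing_columns(toy))
```

An empty list means every column the notebook needs is present.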

### a) Relationship data

In [4]:
# functions
def create_relations(data):
    """Build one ECB -> bond 'Buys' edge per ISIN and quarter."""
    df = data[['ISIN_CODE', 'QUARTER', 'ISSUER_NAME']]
    # Count the rows (with a non-null issuer name) per bond and quarter
    rel = df.groupby(['ISIN_CODE', 'QUARTER'])['ISSUER_NAME'].count()
    rel = rel.reset_index()
    rel['from'] = 'ECB'
    rel['type'] = 'Buys'
    rel = rel.rename(columns={'ISSUER_NAME': 'number_of_issues'})
    return rel

def clean_data(file):
    """Return one row of issuer names and bond attributes per ISIN."""
    columns_to_keep = ['ISIN_CODE', 'ISSUER_NAME']
    company_name = file[columns_to_keep].drop_duplicates()
    company_name['ISSUER_NAME'] = company_name['ISSUER_NAME'].astype(str)
    # Collect the distinct issuer names of each ISIN into one ';'-separated string
    df = company_name.groupby('ISIN_CODE')['ISSUER_NAME'].apply(';'.join).reset_index()
    # Split into two name columns; this assumes exactly two columns result from the split
    df[['ISSUER_NAME_1', 'ISSUER_NAME_2']] = df['ISSUER_NAME'].str.split(';', expand=True)
    df = df.drop(columns='ISSUER_NAME')
    df = df.fillna('-')
    df2 = file[['ISIN_CODE', 'SECTOR', 'NACE_CODE', 'COUNTRY_OF_RISK', 'COUPON_RATE', 'MATURITY_DATE']].drop_duplicates()
    # Keep a single attribute row per ISIN after the merge
    isin_info = df.merge(df2, on='ISIN_CODE', how='left').drop_duplicates(subset=df.columns)
    return isin_info
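
To make the two steps easier to follow, here is a toy walk-through of the same pandas operations on made-up ISINs and issuer names (not taken from the portfolio). It is only a sketch of the technique: counting rows per ISIN and quarter, then joining and splitting the distinct issuer names of each ISIN.

```python
import pandas as pd

# Toy data (made-up ISINs and issuers, for illustration only)
toy = pd.DataFrame({
    'ISIN_CODE':   ['XX01', 'XX01', 'XX01', 'XX02'],
    'QUARTER':     ['2017Q2', '2017Q2', '2017Q3', '2017Q2'],
    'ISSUER_NAME': ['Alpha AG', 'Alpha AG', 'Alpha Holding', 'Beta SA'],
})

# Step 1: rows per (ISIN, quarter), as in create_relations
counts = (toy.groupby(['ISIN_CODE', 'QUARTER'])['ISSUER_NAME']
             .count()
             .reset_index(name='number_of_issues'))

# Step 2: distinct issuer names per ISIN, joined and then split, as in clean_data
names = toy[['ISIN_CODE', 'ISSUER_NAME']].drop_duplicates()
joined = names.groupby('ISIN_CODE')['ISSUER_NAME'].apply(';'.join).reset_index()
split = joined['ISSUER_NAME'].str.split(';', expand=True)
joined['ISSUER_NAME_1'] = split[0]
joined['ISSUER_NAME_2'] = split[1].fillna('-')  # '-' marks a missing second name
```

Here `XX01` carries two distinct issuer names and fills both name columns, while `XX02` has one name and gets the `-` placeholder.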
    

In [5]:
relation = create_relations(portfolio)
isin_info_data = clean_data(portfolio)

In [6]:
relationship_data = relation.merge(isin_info_data,on ='ISIN_CODE',how = "left").drop_duplicates()

In [7]:
relationship_data.head()

| | ISIN_CODE | QUARTER | number_of_issues | from | type | ISSUER_NAME_1 | ISSUER_NAME_2 | SECTOR | NACE_CODE | COUNTRY_OF_RISK | COUPON_RATE | MATURITY_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AT0000A0KSM6 | 2017Q2 | 2 | ECB | Buys | NOVOMATIC AG | - | Financial And Insurance Activities | K | Austria | 5.0 | 27/10/2017 |
| 1 | AT0000A0KSM6 | 2017Q3 | 13 | ECB | Buys | NOVOMATIC AG | - | Financial And Insurance Activities | K | Austria | 5.0 | 27/10/2017 |
| 2 | AT0000A0KSM6 | 2017Q4 | 3 | ECB | Buys | NOVOMATIC AG | - | Financial And Insurance Activities | K | Austria | 5.0 | 27/10/2017 |
| 3 | AT0000A0PHV9 | 2017Q2 | 2 | ECB | Buys | STRABAG SE | - | Construction | F | Austria | 4.75 | 25/05/2018 |
| 4 | AT0000A0PHV9 | 2017Q3 | 13 | ECB | Buys | STRABAG SE | - | Construction | F | Austria | 4.75 | 25/05/2018 |
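
A left merge silently multiplies rows if the right-hand table repeats a key, and silently leaves gaps if a key is missing. The following sketch shows how `validate` and `indicator` in `DataFrame.merge` can surface both cases; the two toy frames are made-up stand-ins for `relation` and `isin_info_data`:

```python
import pandas as pd

# Toy stand-ins for `relation` and `isin_info_data` (illustrative values only)
relation_toy = pd.DataFrame({'ISIN_CODE': ['XX01', 'XX01', 'XX02'],
                             'QUARTER': ['2017Q2', '2017Q3', '2017Q2']})
info_toy = pd.DataFrame({'ISIN_CODE': ['XX01'], 'SECTOR': ['Construction']})

merged = relation_toy.merge(
    info_toy,
    on='ISIN_CODE',
    how='left',
    validate='many_to_one',  # raises MergeError if info_toy repeats an ISIN
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
)

# ISINs that found no attribute row on the right-hand side
unmatched = merged.loc[merged['_merge'] == 'left_only', 'ISIN_CODE'].unique()
```

If `unmatched` is empty and no error is raised, the merge kept exactly one attribute row per relationship row.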


### b) Company nodes data

In [8]:
columns = ['ISSUER_NAME','NACE_CODE','SECTOR','COUNTRY_OF_RISK']
company = portfolio[columns]
company = company.drop_duplicates(subset = columns)
company.head()

| | ISSUER_NAME | NACE_CODE | SECTOR | COUNTRY_OF_RISK |
|---|---|---|---|---|
| 0 | Delhaize Group S.A. | G | Wholesale And Retail Trade; Repair Of Motor Ve... | Belgium |
| 2 | RESA SA | D | Electricity, Gas, Steam And Air Conditioning S... | Belgium |
| 3 | Cofinimmo S.A./N.V. | L | Real Estate Activities | Belgium |
| 4 | Elia Transmission Belgium | D | Electricity, Gas, Steam And Air Conditioning S... | Belgium |
| 5 | Fluvius System Operator CVBA | D | Electricity, Gas, Steam And Air Conditioning S... | Belgium |
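
Dropping duplicates over all four columns still leaves several rows for an issuer that appears with more than one sector or country, which would give duplicate node keys if `ISSUER_NAME` is used as the node identifier (an assumption about how the nodes are consumed later). A minimal sketch of a check on made-up data:

```python
import pandas as pd

# Toy stand-in for `company` (made-up issuers and attributes)
company_toy = pd.DataFrame({
    'ISSUER_NAME': ['Alpha AG', 'Alpha AG', 'Beta SA'],
    'NACE_CODE': ['K', 'F', 'L'],
    'SECTOR': ['Financial And Insurance Activities', 'Construction',
               'Real Estate Activities'],
    'COUNTRY_OF_RISK': ['Austria', 'Austria', 'Belgium'],
})

# Issuers that occupy more than one row after de-duplication
rows_per_issuer = company_toy.groupby('ISSUER_NAME').size()
ambiguous = rows_per_issuer[rows_per_issuer > 1].index.tolist()
```

An empty `ambiguous` list would indicate that each issuer name maps to exactly one node row.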


### c) Sector nodes data

In [9]:
columns = ['NACE_CODE','SECTOR']
sector = portfolio[columns]
sector = sector.drop_duplicates(subset = columns)
sector.head()

| | NACE_CODE | SECTOR |
|---|---|---|
| 0 | G | Wholesale And Retail Trade; Repair Of Motor Ve... |
| 2 | D | Electricity, Gas, Steam And Air Conditioning S... |
| 3 | L | Real Estate Activities |
| 15 | K | Financial And Insurance Activities |
| 17 | C | Manufacturing |


### d) Country of risk nodes data

In [10]:
country = portfolio[['COUNTRY_OF_RISK']]
country = country.drop_duplicates(subset = ['COUNTRY_OF_RISK'])
country.head()

| | COUNTRY_OF_RISK |
|---|---|
| 0 | Belgium |
| 42 | Netherlands |
| 48 | Portugal |
| 56 | Luxembourg |
| 86 | Slovakia |


### Save files

In [12]:
relationship_data.to_csv(path + 'relationship_2.csv',index = False)
company.to_csv(path + 'company_node.csv',index=False)
sector.to_csv(path + 'sector_node.csv',index=False)
country.to_csv(path + 'country_node.csv',index=False)
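
The files are written with `index=False`, so the DataFrame index is not stored as an extra column. A short round-trip sketch on a toy frame, using a temporary folder in place of `path`, shows what the saved file contains when read back:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the `sector` node table
nodes = pd.DataFrame({'NACE_CODE': ['K', 'F'],
                      'SECTOR': ['Financial And Insurance Activities', 'Construction']})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'sector_node.csv'
    nodes.to_csv(target, index=False)  # index=False: only the data columns are written
    reloaded = pd.read_csv(target)

print(reloaded.columns.tolist())
```

The reloaded frame has only the two data columns; without `index=False`, `read_csv` would also return the saved index as an `Unnamed: 0` column.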