## Company Information data combined with ECB portfolio data

<p>The revenue, industry classification and country of risk data were merged with the portfolio data so that all information sits in one file. Revenue was also converted to millions of dollars, because the carbon footprint calculation requires it in that unit.</p>

### Libraries to import

In [1]:
import datetime as dt
import pandas as pd
import os
from functions_file_2 import clean_file, clean_file_rev, clean_file_industry, merge_revenue, merge_country, clean_country


In [2]:
directory = os.getcwd()
url = directory[:-7] + "2. Data/"
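
The slice `directory[:-7]` only works while the notebook folder name is exactly seven characters long. A minimal sketch of a path-based alternative follows; the folder name `3. Code` and the `/project` root are hypothetical and chosen only to illustrate the idea.

```python
from pathlib import PurePosixPath

# Hypothetical working directory; "3. Code" is an assumed folder name (7 characters)
notebook_dir = PurePosixPath("/project/3. Code")

# Slice-based approach used in the cell above: drop the last 7 characters
sliced = str(notebook_dir)[:-7] + "2. Data/"

# Path-based alternative: go up one level, then into the data folder
data_dir = notebook_dir.parent / "2. Data"

print(sliced)
print(data_dir)
```

In a real notebook, `Path.cwd().parent / "2. Data"` would play the role of `data_dir`, provided the data folder is a sibling of the notebook folder.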

### Import Dataset

In [3]:
ecb_portfolio = pd.read_excel(url + "1. ECB Data/ECB_Portfolio_2017_2022.xlsx")

In [4]:
# revenue information
bonds_snp = pd.read_excel(url + "2. Company Information Data/Combined_Revenue_S&P.xlsx" )
mapping_revenue = pd.read_excel(url + "2. Company Information Data/3. Mapping/Mapping.xlsx",sheet_name = "Bonds_Revenue_Mapping",skiprows=4)


In [5]:
# industry classification information
sector_info = pd.read_excel(url + "2. Company Information Data/3. Mapping/Mapping.xlsx",sheet_name = "Bonds_SIC_Mapping",skiprows=4)


In [6]:
# country of risk
country_risk_data = pd.read_excel(url + "2. Company Information Data/Company's Country of Risk Info.xlsx")
mapping_country = pd.read_excel(url + "2. Company Information Data/3. Mapping/Mapping.xlsx",sheet_name = "Country_Mapping",skiprows=4)


### Portfolio with revenue

In [7]:
# clean the bonds data
bonds_snp = clean_file(bonds_snp)

In [8]:
# merge ecb portfolio with revenue data from S&P and mapped file
df = merge_revenue(ecb_portfolio,bonds_snp,mapping_revenue)

In [9]:
# Portfolio with total revenue
df_revenue = clean_file_rev(df)

#### Converting revenue to million format

In [10]:
# scale revenue by 1/1000 to express it in millions, rounded to two decimals
df_revenue['TOTAL__REVENUE'] = (df_revenue['TOTAL__REVENUE'] / 1000).round(2)
df_revenue.head()

Unnamed: 0,NCB,ISIN_CODE,ISSUER_NAME,MATURITY_DATE,COUPON_RATE,PUBLISHED_DATE,YEAR,MONTH,TOTAL__REVENUE
0,BE,BE0002178441,Delhaize Group S.A.,19/10/2018,4.25,2018-06-08,2018,6,71057.21
1,BE,BE0002189554,Delhaize Group S.A.,27/02/2020,3.125,2018-06-08,2018,6,71057.21
2,BE,BE0002239086,Elia System Operator S.A./N.V.,27/05/2024,1.375,2018-06-08,2018,6,0.0
3,BE,BE0002256254,RESA SA,22/07/2026,1.0,2018-06-08,2018,6,393.06
4,BE,BE0002269380,Cofinimmo S.A./N.V.,09/12/2024,2.0,2018-06-08,2018,6,297.38
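
Dividing by 1,000 yields millions only if the source figures are reported in thousands. That unit is an assumption here, since the notebook does not state it. The sketch below runs the same scaling on toy values that are not taken from the real data.

```python
import pandas as pd

# Toy values, assumed to be in thousands of dollars (illustrative only)
toy = pd.DataFrame({"TOTAL__REVENUE": [393060.0, 297380.0, 0.0]})

# Vectorised division; equivalent to applying x / 1000 to each value
toy["TOTAL__REVENUE"] = (toy["TOTAL__REVENUE"] / 1000).round(2)

print(toy)
```

Operating on the whole column at once avoids a Python-level call per row, which matters for a portfolio with several hundred thousand rows.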


### Portfolio with Industry Classification

In [11]:
sector_info = sector_info[['Unique_ISIN_Code','SIC_CODE','NACE_CODE','ECONOMIC_SECTOR']]


In [12]:
# merge with industry classification information
subset = ['NCB', 'ISIN_CODE', 'ISSUER_NAME', 'MATURITY_DATE', 'COUPON_RATE','PUBLISHED_DATE', 'YEAR', 'MONTH', 'TOTAL__REVENUE']
merge_sector = df_revenue.merge(sector_info,left_on = ['ISIN_CODE'],right_on= ['Unique_ISIN_Code'], how = "left").drop_duplicates(subset = subset)
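
A left merge keeps every portfolio row, but it duplicates a row whenever the mapping sheet lists the same ISIN more than once, which is presumably why `drop_duplicates` follows the merge. The sketch below reproduces that behaviour on toy frames. The ISIN values are made up, and the `indicator=True` check for unmatched bonds is an optional addition rather than part of the original workflow.

```python
import pandas as pd

# Toy portfolio and mapping tables (values are illustrative only)
portfolio = pd.DataFrame({"ISIN_CODE": ["AAA", "BBB", "CCC"],
                          "YEAR": [2018, 2018, 2018]})
mapping = pd.DataFrame({"Unique_ISIN_Code": ["AAA", "AAA", "BBB"],
                        "SIC_CODE": [5411, 5411, 4931]})

# Left merge: "AAA" appears twice in the mapping, so its row is duplicated
merged = portfolio.merge(mapping, left_on="ISIN_CODE",
                         right_on="Unique_ISIN_Code",
                         how="left", indicator=True)

# Keep one row per original portfolio record
deduped = merged.drop_duplicates(subset=["ISIN_CODE", "YEAR"])

# Bonds that found no match in the mapping table
unmatched = deduped.loc[deduped["_merge"] == "left_only", "ISIN_CODE"].tolist()

print(len(merged), len(deduped), unmatched)
```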


In [13]:
# data includes additional information about SIC Code, NACE Code, Economic Sector
df_company_info = clean_file_industry(merge_sector)

In [14]:
df_company_info.head()

Unnamed: 0,NCB,ISIN_CODE,ISSUER_NAME,MATURITY_DATE,COUPON_RATE,PUBLISHED_DATE,YEAR,MONTH,TOTAL__REVENUE,SIC_CODE,NACE_CODE,ECONOMIC_SECTOR
0,BE,BE0002178441,Delhaize Group S.A.,19/10/2018,4.25,2018-06-08,2018,6,71057.21,5411,G,Beverages
1,BE,BE0002189554,Delhaize Group S.A.,27/02/2020,3.125,2018-06-08,2018,6,71057.21,5411,G,Beverages
2,BE,BE0002239086,Elia System Operator S.A./N.V.,27/05/2024,1.375,2018-06-08,2018,6,0.0,6726,K,Other sectors
3,BE,BE0002256254,RESA SA,22/07/2026,1.0,2018-06-08,2018,6,393.06,4931,D,Utilities
4,BE,BE0002269380,Cofinimmo S.A./N.V.,09/12/2024,2.0,2018-06-08,2018,6,297.38,6531,L,Real estate


#### Filtering out data with no revenue information

<p>Revenue information for 2017 was not available for 155 individual companies, accounting for 59461 rows (18% of the total data). These rows were removed from the dataset. One company, 'Delta Lloyd NV', had negative revenue and was also removed.</p>

In [15]:
# count issuers with no revenue information (zero) or negative revenue
print(df_company_info[df_company_info['TOTAL__REVENUE'] <= 0]['ISSUER_NAME'].nunique())

# keep only rows with positive revenue
final_data_co_info = df_company_info[df_company_info['TOTAL__REVENUE'] > 0]

156
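
The row count and percentage quoted above can be derived from the same boolean mask that drives the filter. The sketch below shows this on a toy frame; the issuer names and revenue values are hypothetical, so the resulting numbers are not those of the real portfolio.

```python
import pandas as pd

# Toy data: issuer B has no revenue information (0), issuer C has negative revenue
toy = pd.DataFrame({
    "ISSUER_NAME": ["A", "A", "B", "C", "D"],
    "TOTAL__REVENUE": [10.0, 10.0, 0.0, -5.0, 3.5],
})

# Rows to be removed: zero or negative revenue
no_revenue = toy["TOTAL__REVENUE"] <= 0

n_issuers = toy.loc[no_revenue, "ISSUER_NAME"].nunique()  # affected issuers
n_rows = int(no_revenue.sum())                            # affected rows
share = n_rows / len(toy)                                 # fraction of all rows

# Keep only rows with positive revenue
kept = toy[~no_revenue]

print(n_issuers, n_rows, f"{share:.0%}")
```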


### Portfolio with country of risk

In [16]:
data = merge_country(final_data_co_info,country_risk_data,mapping_country)

In [17]:
df = clean_country(data)

### Classifying dates into quarters

In [18]:
data = df.copy()
data['QUARTER'] = df['PUBLISHED_DATE'].dt.to_period('Q')
data.head()

Unnamed: 0,NCB,ISIN_CODE,ISSUER_NAME,MATURITY_DATE,COUPON_RATE,PUBLISHED_DATE,YEAR,MONTH,TOTAL__REVENUE,SIC_CODE,NACE_CODE,ECONOMIC_SECTOR,COUNTRY_OF_RISK,QUARTER
0,BE,BE0002178441,Delhaize Group S.A.,19/10/2018,4.25,2018-06-08,2018,6,71057.21,5411,G,Beverages,Belgium,2018Q2
1,BE,BE0002189554,Delhaize Group S.A.,27/02/2020,3.125,2018-06-08,2018,6,71057.21,5411,G,Beverages,Belgium,2018Q2
2,BE,BE0002256254,RESA SA,22/07/2026,1.0,2018-06-08,2018,6,393.06,4931,D,Utilities,Belgium,2018Q2
3,BE,BE0002269380,Cofinimmo S.A./N.V.,09/12/2024,2.0,2018-06-08,2018,6,297.38,6531,L,Real estate,Belgium,2018Q2
4,BE,BE0002276450,Elia System Operator S.A./N.V.,07/04/2027,1.375,2018-06-08,2018,6,913.16,4911,D,Utilities,Belgium,2018Q2
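
`Series.dt.to_period('Q')` maps each date to its calendar quarter and requires a datetime column. The sketch below applies it to a few toy dates; the `pd.to_datetime` call is there only because the toy values start out as strings.

```python
import pandas as pd

# Toy publication dates (illustrative only), parsed into datetime64 values
toy = pd.DataFrame({
    "PUBLISHED_DATE": pd.to_datetime(["2018-06-08", "2019-01-15", "2020-12-31"])
})

# Map each date to its calendar quarter
toy["QUARTER"] = toy["PUBLISHED_DATE"].dt.to_period("Q")

quarters = toy["QUARTER"].astype(str).tolist()
print(quarters)  # ['2018Q2', '2019Q1', '2020Q4']
```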


### Export file

In [19]:
data.to_excel(url + "ECB Portfolio - Company Information.xlsx", index=False)