<a href="https://colab.research.google.com/github/PSarre/CSRD_Unusual_Disclosures/blob/main/Research_00_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 00 - ANALYSES FOR RESEARCH PRESENTATIONS

This notebook contains the analyses used for the research presentations, together with data insights drawn from the Automated Data Collection File.

## Content of file
* Merging the Automated Data Collection File with the Compustat and RepRisk datasets. *The resulting dataframe is named merged_df.*
* Creation of variables. *merged_df is saved to our Drive as Merged_Data.*
* Dataset analyses
* OLS regressions


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import numpy as np
from google.colab import drive
drive.mount('/content/drive',  force_remount=True)

Mounted at /content/drive


In [None]:
# Import Automated Data Collection file
excel_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Automated_Data_Collection.xlsx'

# Import Compustat data
compustat_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/Compustat_full2024_2023.csv'
eu_compustat_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/EU_comp.csv'

# Import Reprisk Incidents
reprisk_incidents_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/RepRisk_Incidents.csv'

# Import Reprisk Ratings
reprisk_ratings_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/RepRisk_Ratings.csv'

#Open excels
print('DATASETS \n ----------------------------------------------------------------')

print('Automated Data Collection')
df = pd.read_excel(excel_path)
print(df.info())

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

print('Compustat 2024')
compustat = pd.read_csv(compustat_path)
print(compustat.info())

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

print('EU Compustat')
eu_compustat = pd.read_csv(eu_compustat_path)
print(eu_compustat.head())

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

print('RepRisk Incidents')
reprisk_incidents = pd.read_csv(reprisk_incidents_path)
print(reprisk_incidents.info())

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

print('RepRisk Ratings')
reprisk_ratings = pd.read_csv(reprisk_ratings_path)
print(reprisk_ratings.info())

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

DATASETS 
 ----------------------------------------------------------------
Automated Data Collection
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 631 entries, 0 to 630
Data columns (total 18 columns):
 #   Column                             Non-Null Count  Dtype 
---  ------                             --------------  ----- 
 0   verified                           631 non-null    object
 1   company                            631 non-null    object
 2   isin                               631 non-null    object
 3   country                            631 non-null    object
 4   publication date                   631 non-null    object
 5   claim full CSRD compliance         631 non-null    int64 
 6   auditor                            631 non-null    object
 7   start PDF                          631 non-null    int64 
 8   end PDF                            631 non-null    int64 
 9   pages PDF                          631 non-null    int64 
 10  link                            



--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
RepRisk Incidents
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4173 entries, 0 to 4172
Columns: 135 entries, primary_isin to ungc_principle_10
dtypes: int64(6), object(129)
memory usage: 4.3+ MB
None
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
RepRisk Ratings
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208823 entries, 0 to 208822
Data columns (total 7 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   primary_isin                  208823 non-null  object
 1   date                          208823 non-null  object
 2   company_name                  208823 non-null  object
 3   headquarters_country_isocode  208823 non-null

In [None]:
# Rename columns with problematic names
df = df.rename(columns={'SASB industry \n(SICS® Industries)': 'SASB_industry'})
df = df.rename(columns={'SASB sector \n(SICS® Sectors)': 'SASB_sector'})

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 631 entries, 0 to 630
Data columns (total 18 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   verified                    631 non-null    object
 1   company                     631 non-null    object
 2   isin                        631 non-null    object
 3   country                     631 non-null    object
 4   publication date            631 non-null    object
 5   claim full CSRD compliance  631 non-null    int64 
 6   auditor                     631 non-null    object
 7   start PDF                   631 non-null    int64 
 8   end PDF                     631 non-null    int64 
 9   pages PDF                   631 non-null    int64 
 10  link                        631 non-null    object
 11  SASB_industry               631 non-null    object
 12  SASB_sector                 631 non-null    object
 13  ID                          631 non-null    object

## Merging Compustat and RepRisk with the Automated Data Collection in merged_df

### 1 - Cleaning the Compustat datasets and merging Compustat and EU_compustat

* Create year column
* Delete duplicates for the same company and year, keeping only the latest record
* Filter on 2024
* Merge datasets as compustat_2024
* Delete duplicates

In [None]:
# Creates 2024 datasets without duplicates

def process_compustat_data(df_input, isin_col='isin', datadate_col='datadate', filter_year=2024):
    df = df_input.copy()

    # Convert 'datadate' to datetime, handling various date formats
    df[datadate_col] = pd.to_datetime(df[datadate_col], errors='coerce')

    # Drop rows where 'datadate' could not be parsed
    df.dropna(subset=[datadate_col], inplace=True)

    # Create 'Fiscal_Year' based on the year of 'datadate'
    df['Fiscal_Year'] = df[datadate_col].dt.year

    # Create 'Company_Year' by concatenating isin and Fiscal_Year
    df['Company_Year'] = df[isin_col].astype(str) + '_' + df['Fiscal_Year'].astype(str)

    # Sort by 'Company_Year' and 'datadate' to keep the latest record for each 'Company_Year'
    df.sort_values(by=['Company_Year', datadate_col], ascending=[True, False], inplace=True)

    # Drop duplicates based on 'Company_Year', keeping the first (which is the latest datadate after sorting)
    initial_rows = len(df)
    df.drop_duplicates(subset=['Company_Year'], keep='first', inplace=True)
    final_rows = len(df)

    # Filter for the specified year
    observations_filtered_year = df[df['Fiscal_Year'] == filter_year]
    obs_filtered_year_size = len(observations_filtered_year)

    print(f"Initial number of rows: {initial_rows}")
    print(f"Number of rows after dropping duplicates based on Company_Year: {final_rows}")
    print(f"Number of observations in {filter_year}: {obs_filtered_year_size}")

    return observations_filtered_year


print('Processing Compustat_full2024_2023.csv:')
compustat_2024 = process_compustat_data(compustat.copy(), isin_col='isin', datadate_col='datadate', filter_year=2024)

print('\nProcessing EU_comp.csv:')
eu_compustat_2024 = process_compustat_data(eu_compustat.copy(), isin_col='isin', datadate_col='datadate', filter_year=2024)


Processing Compustat_full2024_2023.csv:
Initial number of rows: 1196
Number of rows after dropping duplicates based on Company_Year: 1196
Number of observations in 2024: 597

Processing EU_comp.csv:
Initial number of rows: 2701
Number of rows after dropping duplicates based on Company_Year: 2701
Number of observations in 2024: 539
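
As a quick sanity check of the deduplication logic in `process_compustat_data`, the sketch below applies the same idea to a toy frame: parse the dates, drop unparseable rows, and keep the latest record per company and calendar year. The data and the helper name `latest_per_company_year` are made up for illustration; only the column names `isin` and `datadate` mirror the notebook.

```python
import pandas as pd


def latest_per_company_year(frame, isin_col="isin", datadate_col="datadate"):
    """Keep the row with the latest datadate for each (isin, calendar year)."""
    out = frame.copy()
    # Unparseable dates become NaT and are dropped, as in the notebook
    out[datadate_col] = pd.to_datetime(out[datadate_col], errors="coerce")
    out = out.dropna(subset=[datadate_col])
    out["Fiscal_Year"] = out[datadate_col].dt.year
    # Sort ascending so that keep="last" retains the most recent record
    out = out.sort_values([isin_col, "Fiscal_Year", datadate_col])
    out = out.drop_duplicates(subset=[isin_col, "Fiscal_Year"], keep="last")
    return out.reset_index(drop=True)


# Hypothetical toy data: two 2024 records for AAA, one invalid date for BBB
toy = pd.DataFrame({
    "isin": ["AAA", "AAA", "AAA", "BBB"],
    "datadate": ["2024-03-31", "2024-12-31", "2023-12-31", "not a date"],
    "at": [10.0, 12.0, 9.0, 5.0],
})
toy_latest = latest_per_company_year(toy)
print(toy_latest)
```

Using a two-column key (`isin`, `Fiscal_Year`) instead of a concatenated `Company_Year` string is only a stylistic choice here; both identify the same groups.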




In [None]:
# Merging compustat_2024 and eu_compustat_2024

# Get initial number of rows in compustat_2024
initial_compustat_rows = len(compustat_2024)

# Concatenate the two dataframes
# pd.concat keeps the union of columns; columns missing from one dataframe are filled with NaN
merged_compustat = pd.concat([compustat_2024, eu_compustat_2024], ignore_index=True)

# Drop duplicates based on 'Company_Year' to ensure only unique company-year observations remain
# This assumes 'Company_Year' is a unique identifier for a company in a given year
merged_compustat.drop_duplicates(subset=['Company_Year'], keep='first', inplace=True)

# Calculate the number of new rows added
new_rows_added = len(merged_compustat) - initial_compustat_rows

# Identify the new rows added by finding rows in `merged_compustat` that are not in `compustat_2024`
# We can do this by using a merge operation or by checking `Company_Year` values
# For simplicity and to avoid potential issues with NaN values in other columns, let's compare 'Company_Year'
existing_company_years = compustat_2024['Company_Year'].unique()
newly_added_df = merged_compustat[~merged_compustat['Company_Year'].isin(existing_company_years)]

# Update compustat_2024 with the merged dataframe
compustat_2024 = merged_compustat

print(f"Number of new lines added from eu_compustat_2024: {new_rows_added}")

if new_rows_added > 0:
    print("Newly added lines:")
    # A bare expression inside an if-block is not rendered, so display it explicitly
    display(newly_added_df.head())
else:
    print("No new lines were added.")
compustat_2024.head()


Number of new lines added from eu_compustat_2024: 6
Newly added lines:


Unnamed: 0,fic,costat,datafmt,indfmt,consol,isin,datadate,conm,exchg,fyr,...,icapi,naicsh,nicon,ninc,pv,sich,tstkni,Fiscal_Year,Company_Year,sedol
0,AUT,A,HIST_STD,INDL,C,AT000000STR1,2024-12-31,STRABAG SE,273,12,...,,237.0,823.004,,,1600.0,2.779,2024,AT000000STR1_2024,
1,AUT,A,HIST_STD,INDL,C,AT00000AMAG3,2024-12-31,AMAG AUSTRIA METALL AG,273,12,...,,331313.0,43.199,,,3334.0,,2024,AT00000AMAG3_2024,
2,AUT,A,HIST_STD,INDL,C,AT00000FACC2,2024-12-31,FACC AG,273,12,...,,3364.0,6.355,,,3720.0,,2024,AT00000FACC2_2024,
3,AUT,A,HIST_STD,FS,C,AT0000606306,2024-12-31,RAIFFEISEN BANK INTERNATI AG,273,12,...,,522110.0,1720.0,,,6020.0,0.525,2024,AT0000606306_2024,
4,AUT,A,HIST_STD,INDL,C,AT0000609607,2024-12-31,PORR AG,273,12,...,,238.0,88.995,,,1700.0,1.227,2024,AT0000609607_2024,


### 2 - Cleaning RepRisk datasets

Goal: create a common RepRisk dataframe with:
* From RepRisk Incidents: the number of incidents in each category and in total for 2024
* From RepRisk Ratings: the company's rating in 2024, whether the rating changed within the year and, if so, from which score to which score
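
The incident counts described in this goal can be sketched on a toy table as follows. The data is invented, and the sketch assumes one row per incident with 'T'/'F' flags per category, which is how the notebook treats the RepRisk Incidents file. The `total_incidents` column is an illustrative name, not a column of the original dataset.

```python
import pandas as pd

# Hypothetical toy data: one row per incident, 'T'/'F' flags per category
toy_incidents = pd.DataFrame({
    "isin": ["AAA", "AAA", "BBB"],
    "environment": ["T", "F", "F"],
    "social": ["T", "T", "F"],
    "governance": ["F", "F", "T"],
})
flag_cols = ["environment", "social", "governance"]

# Turn the 'T'/'F' flags into booleans, then sum them per ISIN
flags = toy_incidents[flag_cols].eq("T")
counts = flags.groupby(toy_incidents["isin"]).sum()

# With one row per incident, the group size is the total number of incidents
counts["total_incidents"] = toy_incidents.groupby("isin").size()
print(counts)
```

The total is taken from the number of rows rather than from the sum of the flags, because a single incident can be flagged in several categories at once.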

In [None]:
# Create 2024 datasets

# RepRisk Incidents
print('RepRisk Incidents:')
unique_primary_isin_reprisk_incidents = reprisk_incidents['primary_isin'].nunique()
reprisk_incidents['incident_date'] = pd.to_datetime(reprisk_incidents['incident_date'], errors='coerce')
reprisk_incidents.dropna(subset=['incident_date'], inplace=True)
reprisk_incidents['incident_year'] = reprisk_incidents['incident_date'].dt.year
reprisk_incidents_2024 = reprisk_incidents[reprisk_incidents['incident_year'] == 2024].copy()
print(f"Number of RepRisk incidents in 2024: {len(reprisk_incidents_2024)}")
print(f"Number of unique ISINs in reprisk_incidents: {unique_primary_isin_reprisk_incidents}")
unique_isin_reprisk_incidents_2024 = reprisk_incidents_2024['primary_isin'].nunique()
print(f"Number of unique ISINs in reprisk_incidents_2024: {unique_isin_reprisk_incidents_2024}")

print('--------------------------------------------------------------------------------')
print('--------------------------------------------------------------------------------')

# RepRisk Ratings
print('RepRisk Ratings:')
unique_primary_isin_reprisk_ratings = reprisk_ratings['primary_isin'].nunique()
print(f"Number of unique ISINs in reprisk_ratings: {unique_primary_isin_reprisk_ratings}")
reprisk_ratings['date'] = pd.to_datetime(reprisk_ratings['date'], errors='coerce')
reprisk_ratings.dropna(subset=['date'], inplace=True)
reprisk_ratings['rating_year'] = reprisk_ratings['date'].dt.year
reprisk_ratings_2024 = reprisk_ratings[reprisk_ratings['rating_year'] == 2024].copy()
print(f"Number of RepRisk ratings in 2024: {len(reprisk_ratings_2024)}")
unique_isin_reprisk_ratings_2024 = reprisk_ratings_2024['primary_isin'].nunique()
print(f"Number of unique ISINs in reprisk_ratings_2024: {unique_isin_reprisk_ratings_2024}")


RepRisk Incidents:
Number of RepRisk incidents in 2024: 4164
Number of unique ISINs in reprisk_incidents: 338
Number of unique ISINs in reprisk_incidents_2024: 338
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
RepRisk Ratings:
Number of unique ISINs in reprisk_ratings: 569
Number of RepRisk ratings in 2024: 208254
Number of unique ISINs in reprisk_ratings_2024: 569


In [None]:
# Processing RepRisk Incidents

#Rename primary_isin into isin
reprisk_incidents_2024.rename(columns={'primary_isin': 'isin'}, inplace=True)

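# Count number of incidents by type
# Note: slicing from 'environment' to the last column also picks up the helper column
# 'incident_year' created above; it never equals 'T', so its count is always 0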
environment_col_index = reprisk_incidents_2024.columns.get_loc('environment')
incident_columns = reprisk_incidents_2024.columns[environment_col_index:].tolist()
print(f"Identified incident columns: {incident_columns[:5]}...") # Print first 5 for brevity
print(f"Total incident columns: {len(incident_columns)}")

# Create a dataset with for each isin the count of incidents
reprisk_incidents_2024_count = reprisk_incidents_2024.groupby('isin')[incident_columns].apply(lambda x: (x == 'T').sum())
print("Shape of the new DataFrame (reprisk_incidents_2024_count):")
print(reprisk_incidents_2024_count.shape)

Identified incident columns: ['environment', 'social', 'governance', 'cross_cutting', 'animal_mistreatment']...
Total incident columns: 124
Shape of the new DataFrame (reprisk_incidents_2024_count):
(338, 124)


In [None]:
# Processing RepRisk_Ratings

# Ensure the dataframe is sorted by primary_isin and date to easily get first and last ratings
reprisk_ratings_2024 = reprisk_ratings_2024.sort_values(by=['primary_isin', 'date']).copy()

# Group by primary_isin and aggregate the required information
reprisk_ratings_2024_count = reprisk_ratings_2024.groupby('primary_isin').agg(
    initial_rating=('reprisk_rating', 'first'),
    EOY_rating=('reprisk_rating', 'last'),
    first_date=('date', 'first'),
    last_date=('date', 'last')
).reset_index()

# Determine if rating changed within the year
reprisk_ratings_2024_count['rating_changed'] = reprisk_ratings_2024_count['initial_rating'] != reprisk_ratings_2024_count['EOY_rating']

# Rename 'primary_isin' to 'isin' for consistency
reprisk_ratings_2024_count.rename(columns={'primary_isin': 'isin'}, inplace=True)

print("reprisk_ratings_2024_count head:")
print(f"Shape of reprisk_ratings_2024_count: {reprisk_ratings_2024_count.shape}")

reprisk_ratings_2024_count.head()

reprisk_ratings_2024_count head:
Shape of reprisk_ratings_2024_count: (569, 6)


Unnamed: 0,isin,initial_rating,EOY_rating,first_date,last_date,rating_changed
0,AT000000STR1,A,A,2024-01-01,2024-12-31,False
1,AT00000AMAG3,AA,AA,2024-01-01,2024-12-31,False
2,AT00000FACC2,AA,AA,2024-01-01,2024-12-31,False
3,AT0000606306,CCC,B,2024-01-01,2024-12-31,True
4,AT0000609607,BBB,AA,2024-01-01,2024-12-31,True


In [None]:
# Assign numeric values to ratings

# Define the mapping for RepRisk ratings to a numerical scale
rating_map = {
    'AAA': 10,
    'AA': 9,
    'A': 8,
    'BBB': 7,
    'BB': 6,
    'B': 5,
    'CCC': 4,
    'CC': 3,
    'C': 2,
    'D': 1
}

# Create new columns with the numerical equivalent of the ratings
reprisk_ratings_2024_count['initial_rating_numeric'] = reprisk_ratings_2024_count['initial_rating'].map(rating_map)
reprisk_ratings_2024_count['EOY_rating_numeric'] = reprisk_ratings_2024_count['EOY_rating'].map(rating_map)

# Create a column for the rating change amount
reprisk_ratings_2024_count['rating_changed_amount'] = reprisk_ratings_2024_count['EOY_rating_numeric'] - reprisk_ratings_2024_count['initial_rating_numeric']

print("Updated reprisk_ratings_2024_count head with numeric ratings and change amount:")
reprisk_ratings_2024_count.head(3)

Updated reprisk_ratings_2024_count head with numeric ratings and change amount:


Unnamed: 0,isin,initial_rating,EOY_rating,first_date,last_date,rating_changed,initial_rating_numeric,EOY_rating_numeric,rating_changed_amount
0,AT000000STR1,A,A,2024-01-01,2024-12-31,False,8,8,0
1,AT00000AMAG3,AA,AA,2024-01-01,2024-12-31,False,9,9,0
2,AT00000FACC2,AA,AA,2024-01-01,2024-12-31,False,9,9,0
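
Because `.map` silently returns NaN for any label missing from `rating_map`, it can be worth listing the labels that failed to map. The sketch below shows one way to do that on invented data; the label 'NR' is a made-up placeholder for an unmapped rating, not a value known to occur in the RepRisk file.

```python
import pandas as pd

rating_map = {"AAA": 10, "AA": 9, "A": 8, "BBB": 7, "BB": 6,
              "B": 5, "CCC": 4, "CC": 3, "C": 2, "D": 1}

# Hypothetical toy data; 'NR' stands in for a label absent from rating_map
toy_ratings = pd.DataFrame({
    "isin": ["X1", "X2", "X3"],
    "initial_rating": ["CCC", "AA", "NR"],
    "EOY_rating": ["B", "AA", "A"],
})

for col in ["initial_rating", "EOY_rating"]:
    toy_ratings[f"{col}_numeric"] = toy_ratings[col].map(rating_map)

# Positive values mean the rating improved over the year
toy_ratings["rating_changed_amount"] = (
    toy_ratings["EOY_rating_numeric"] - toy_ratings["initial_rating_numeric"]
)

# Labels that were present in the data but missing from the mapping
is_unmapped = (
    toy_ratings["initial_rating"].notna()
    & toy_ratings["initial_rating_numeric"].isna()
)
unmapped = toy_ratings.loc[is_unmapped, "initial_rating"].unique().tolist()
print(unmapped)
```

An empty `unmapped` list on the real data would confirm that every rating label is covered by the mapping.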


### 3 - Merge all datasets

In [None]:
# Start with the main dataframe 'df'
merged_df = df.copy()
print(f"Shape of the main DataFrame (df): {merged_df.shape}")

# Merge with compustat_2024 on 'isin'
# Use a left merge to keep all entries from df
merged_df = pd.merge(merged_df, compustat_2024, on='isin', how='left')
print(f"Shape after merging with compustat_2024: {merged_df.shape}")

# Merge with reprisk_incidents_2024_count on 'isin'
merged_df = pd.merge(merged_df, reprisk_incidents_2024_count, on='isin', how='left')
print(f"Shape after merging with reprisk_incidents_2024_count: {merged_df.shape}")

# Merge with reprisk_ratings_2024_count on 'isin'
merged_df = pd.merge(merged_df, reprisk_ratings_2024_count, on='isin', how='left')
print(f"Shape after merging with reprisk_ratings_2024_count: {merged_df.shape}")

# Filter merged_df to keep only rows where 'CSRD_Collection' is 'Downloaded'
initial_rows_count = len(merged_df)
merged_df_analysis = merged_df[merged_df['CSRD_Collection'] == 'Downloaded'].copy()
filtered_rows_count = len(merged_df_analysis)


print(f"\nInitial number of rows: {initial_rows_count}")
print(f"Number of rows after filtering for 'CSRD_Collection' == 'Downloaded': {filtered_rows_count}")
print("\nMerged DataFrame (merged_df_analysis) info:")
merged_df_analysis.info()

print("\nMerged DataFrame (merged_df_analysis) head:")
merged_df_analysis.head()

Shape of the main DataFrame (df): (631, 18)
Shape after merging with compustat_2024: (631, 492)
Shape after merging with reprisk_incidents_2024_count: (631, 616)
Shape after merging with reprisk_ratings_2024_count: (631, 624)

Initial number of rows: 631
Number of rows after filtering for 'CSRD_Collection' == 'Downloaded': 519

Merged DataFrame (merged_df_analysis) info:
<class 'pandas.core.frame.DataFrame'>
Index: 519 entries, 0 to 628
Columns: 624 entries, verified to rating_changed_amount
dtypes: datetime64[ns](3), float64(567), int64(4), object(50)
memory usage: 2.5+ MB

Merged DataFrame (merged_df_analysis) head:


Unnamed: 0,verified,company,isin,country,publication date,claim full CSRD compliance,auditor,start PDF,end PDF,pages PDF,...,ungc_principle_10,incident_year,initial_rating,EOY_rating,first_date,last_date,rating_changed,initial_rating_numeric,EOY_rating_numeric,rating_changed_amount
0,yes,Netcompany,DK0060952919,Denmark,2025-01-29,1,EY,62,154,93,...,0.0,0.0,AAA,A,2024-01-01,2024-12-31,True,10.0,8.0,-2.0
1,yes,Tryg,DK0060636678,Denmark,2025-01-23,1,PwC,53,131,79,...,0.0,0.0,AA,AA,2024-01-01,2024-12-31,False,9.0,9.0,0.0
3,yes,Lundbeck,DK0061804697,Denmark,2025-02-04,1,PwC,59,144,86,...,,,,,NaT,NaT,,,,
4,yes,Vestas,DK0061539921,Denmark,2025-02-05,1,Deloitte,51,132,82,...,,,,,NaT,NaT,,,,
5,yes,Demant,DK0060738599,Denmark,2025-02-05,1,PwC,52,117,66,...,,,A,A,2024-01-01,2024-12-31,False,8.0,8.0,0.0
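
The shapes printed above stay at 631 rows after each merge, which suggests the right-hand tables are unique on `isin`. A minimal sketch of making that check explicit, and of counting how many rows actually found a match, uses the `validate` and `indicator` parameters of `pd.merge`. The toy frames are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: BBB has no match in the right-hand table
left = pd.DataFrame({"isin": ["AAA", "BBB", "CCC"], "company": ["A", "B", "C"]})
right = pd.DataFrame({"isin": ["AAA", "CCC"], "at": [100.0, 250.0]})

# validate raises MergeError if 'isin' is duplicated on the right-hand side;
# indicator adds a '_merge' column with 'both', 'left_only' or 'right_only'
checked = pd.merge(left, right, on="isin", how="left",
                   validate="many_to_one", indicator=True)

match_counts = checked["_merge"].value_counts()
print(match_counts)

# A left merge on a unique key must not change the number of rows
assert len(checked) == len(left)
```

The `_merge` column can be dropped once the match rate has been inspected.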


### 4 - Merge with CRSP-Compustat from the GVKEYs

In [None]:
# Creating list of gvkeys

gvkey = merged_df[merged_df['CSRD_Collection'] == 'Downloaded'].copy()
gvkey = gvkey['gvkey'].unique()
# Remove NaN values from the gvkey array before saving
gvkey = gvkey[~np.isnan(gvkey)]
print(gvkey)

gvkey_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/gvkey_list.txt'

# Save the gvkey list to a text file
np.savetxt(gvkey_path, gvkey, fmt='%d')

[327374. 274282. 225094. 204544. 295526.   8020. 213403. 211463. 273052.
 101539. 232646. 250951. 101130.  15552. 101922. 104761. 245207. 101020.
  12383. 296091. 101023. 326804.  15181. 234356. 208821. 225597. 101739.
 101557. 319309. 274957. 274373. 325660. 100778. 101178. 297224. 220833.
  15319. 102947. 220525. 214881.  24578. 101204. 226803. 221616. 320448.
 241336. 213118. 272817.  23667. 325699. 215406. 108147. 101363. 100751.
 112116. 313972. 254661. 234087. 101971. 272746. 211509. 100054.  25466.
  14140. 287932. 102477. 101529. 319659. 241637. 294508.  29789. 101828.
 221244.  61214. 295421. 241456.  15617. 247558. 100103.  13683. 321698.
 101718. 211452.  61440.  15505. 101276. 104755. 100737.  16349. 351491.
 320764. 318434. 327401. 100312. 101361. 100368. 101017. 100736. 101248.
 101714. 100022. 211453. 260840.  15784. 103260. 211415. 245628. 205865.
  17452. 221261.  15647. 102175. 100609. 347141.  15773. 220942. 315682.
 278145. 101434. 232630. 284521.  63120. 211503. 22

!!! At this stage, take the list of GVKEYs saved above and download the Compustat-CRSP Fundamentals Annual data from WRDS before running the next cells !!!
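
The list saved with `fmt='%d'` writes GVKEYs without leading zeros (for example 8020). If the WRDS query form expects six-character, zero-padded codes, which is an assumption worth confirming on the WRDS side, the list could be written as in the sketch below. The values and the output file name are illustrative only.

```python
import os
import tempfile

import numpy as np

# Hypothetical GVKEY values as they come out of a float column (with a NaN and a duplicate)
gvkey_values = np.array([8020.0, 101539.0, np.nan, 15552.0, 8020.0])

# Drop NaN, de-duplicate (np.unique also sorts) and cast to integers
clean = np.unique(gvkey_values[~np.isnan(gvkey_values)]).astype(int)

# Assumption: six-character, zero-padded codes, one per line
gvkey_codes = [f"{int(g):06d}" for g in clean]

out_path = os.path.join(tempfile.gettempdir(), "gvkey_list_example.txt")
with open(out_path, "w") as fh:
    fh.write("\n".join(gvkey_codes) + "\n")

print(gvkey_codes)
```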

In [None]:
# Import Compustat-CRSP fundamentals

CRSP_Compustat_path = '/content/drive/MyDrive/Master Thesis/01 - Data Collection/Datasets/CRSP_Compustat.csv'
CRSP_Compustat = pd.read_csv(CRSP_Compustat_path)
CRSP_Compustat.info()

gvkey = CRSP_Compustat['GVKEY'].unique()
gvkey

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Columns: 988 entries, GVKEY to ipodate
dtypes: float64(920), int64(28), object(40)
memory usage: 401.5+ KB


array([  8020,  10846,  12384,  13683,  14140,  15181,  15617,  24625,
        25466,  31142,  61214,  61616, 101204, 101276, 101973, 151933,
       212782, 220546, 220940, 238616, 241637, 245207, 272872, 277812,
       318005, 322576])

In [None]:
# Create the 2024 CRSP-Compustat dataset without duplicates
# The dataframe loaded above is named CRSP_Compustat (capital C); the lowercase spelling raised a NameError
# NOTE: this assumes CRSP_Compustat has 'isin' and 'datadate' columns; the info() output above only confirms 'GVKEY' and 'ipodate'

print('Processing CRSP_Compustat.csv:')
CRSP_compustat_2024 = process_compustat_data(CRSP_Compustat, isin_col='isin', datadate_col='datadate', filter_year=2024)

## VARIABLES CREATION

The variables for the later OLS regressions follow Rouen et al. (2022), *The evolution of ESG Reports and the Role of Voluntary Standards*.

link: https://www.ssrn.com/abstract=4227934

List of Variables

* Firm Size - ln(Market Value) using MKVALT or CSHO×PRCC_F

* Market-to-Book - MKVALT / CEQ

* ROE - NI / Average CEQ

* R&D Intensity - XRD / SALE

* Capex Intensity - CAPX / PPENT

* Leverage - DLTT / AT

* SG&A Intensity - XSGA / SALE

* Advertising Intensity - XAD / SALE

* ESG Score - Letter Grade from RepRisk

* Incidents - Count from RepRisk
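
The ratio definitions in this list can be sketched on a toy frame as below. The lowercase column names (`mkvalt`, `ceq`, `ni`, ...) are assumed Compustat mnemonics, `ceq_lag` is a hypothetical prior-year CEQ column needed for the average in ROE, and `safe_ratio` is an illustrative helper, not part of the notebook. The numbers are invented.

```python
import numpy as np
import pandas as pd


def safe_ratio(num, den):
    """Element-wise num / den, with NaN where the denominator is zero or missing."""
    den = den.where(den != 0)
    return num / den


# Hypothetical toy data; the second firm has zero market value, zero sales and missing R&D
toy = pd.DataFrame({
    "mkvalt": [1000.0, 0.0],
    "ceq": [500.0, 200.0],
    "ceq_lag": [300.0, 200.0],
    "ni": [80.0, -10.0],
    "xrd": [50.0, np.nan],
    "sale": [400.0, 0.0],
    "capx": [30.0, 5.0],
    "ppent": [150.0, 50.0],
    "dltt": [200.0, 40.0],
    "at": [2000.0, 400.0],
})

# The log is only defined for positive market values; others become NaN
toy["Firm_Size"] = np.log(toy["mkvalt"].where(toy["mkvalt"] > 0))
toy["Market_to_Book"] = safe_ratio(toy["mkvalt"], toy["ceq"])
toy["ROE"] = safe_ratio(toy["ni"], (toy["ceq"] + toy["ceq_lag"]) / 2)
toy["RD_Intensity"] = safe_ratio(toy["xrd"], toy["sale"])
toy["Capex_Intensity"] = safe_ratio(toy["capx"], toy["ppent"])
toy["Leverage"] = safe_ratio(toy["dltt"], toy["at"])

print(toy[["Firm_Size", "Market_to_Book", "ROE", "RD_Intensity",
           "Capex_Intensity", "Leverage"]])
```

SG&A and advertising intensity follow the same pattern with `xsga` and `xad` over `sale`. Whether missing R&D or advertising should stay NaN or be set to zero is a modelling choice that this sketch leaves open.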



NOTE TO MYSELF FOR NEXT STEPS:
* **The RepRisk datasets probably need to be updated: I think I only downloaded the ISINs I had back in July**
1. Create the variables
2. Dataset Analyses
3. Econometric recap table
4. Prepare the OLS


In [None]:
# Creating the variables

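# Firm Size: natural log of market value (MKVALT); non-positive values become NaN instead of -inf
merged_df['Firm_Size'] = np.log(merged_df['MKVALT'].where(merged_df['MKVALT'] > 0))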