ESGeasy Analysis

About

This is a first analysis of the available data. Here we explore three datasets:

  • esg_scores_history_rated.csv
  • companies_all.csv
  • environmental_data_history_all.csv

The problem to be solved: Which company best fits my values?

We are mainly interested in solving the investor/company matching problem. To do so, we considered two approaches, which will become clear later: one based on clustering and another based on ESG forecasting.

Requirements

First, clone the repository:

$ git clone <https://github.com/Hackganization/

To run the notebook, you need Jupyter Notebook, which can be installed with the Anaconda distribution: https://www.anaconda.com/products/individual

Once it is installed, make sure you have matplotlib, numpy, pandas, scikit-learn (imported as sklearn) and statsmodels installed.

To do so, simply write

$ conda install package

for each of the above libraries.
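To confirm that the installation worked, a quick check like the following can help. It is a minimal sketch that only uses the Python standard library; the function name missing_modules is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Import names of the libraries listed above. Note that scikit-learn is
# imported as "sklearn" even though the package is installed as "scikit-learn".
REQUIRED_MODULES = ["matplotlib", "numpy", "pandas", "sklearn", "statsmodels"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Any name printed as missing can then be installed with conda as described above.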

How to use

Now that you have the dependencies, open Jupyter with the command

$ jupyter notebook

and click on the desired notebook. Then open the Cell menu at the top of the notebook and choose Run All to run all cells.

Data Exploration

Let's start by looking at companies_all.csv. It contains data related to each company, such as Id, country, region, industry segment and more.

Next, we look at the frequency of each industry segment.
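Counting companies per segment can be sketched with pandas as follows. The inline CSV is a hypothetical stand-in for companies_all.csv, and the column names company_id, country and industry_segment are assumptions about the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for companies_all.csv; the real file has more columns
# and the column names used here are assumed.
SAMPLE_CSV = """company_id,country,industry_segment
1,BR,Banks
2,US,Banks
3,DE,Utilities
4,US,Banks
5,FR,Software
"""

companies = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Absolute and relative frequency of each industry segment,
# sorted from the most to the least common segment.
segment_counts = companies["industry_segment"].value_counts()
segment_share = companies["industry_segment"].value_counts(normalize=True)

print(segment_counts)
print(segment_share.round(2))
```

With the real file, replacing the StringIO object with the path to companies_all.csv should be enough, provided the segment column has the assumed name.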

We found that the companies are very unevenly distributed across industry segments, which could bias a prediction model. Our first idea was to cluster the companies, but we switched to time series forecasting, for reasons that will become clear later.

Next, we look at the ESG data in esg_scores_history_rated.csv.

It contains fields such as company_id, industry_segment, assessment_year, parent_aspect, score_weight and score_value. We will use parent_aspect to aggregate the data and obtain a score for each ESG dimension, but first let's look at how these scores are distributed.
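One way to aggregate by parent_aspect is a weighted average of score_value using score_weight. This is a sketch under that assumption; the notebook may aggregate differently, and the rows below are made-up values that only reuse the column names listed above.

```python
import pandas as pd

# Hypothetical rows using the columns named above; all values are made up.
scores = pd.DataFrame({
    "company_id":      [1, 1, 1, 1, 2, 2],
    "assessment_year": [2020, 2020, 2020, 2020, 2020, 2020],
    "parent_aspect":   ["Environmental", "Environmental", "Social",
                        "Governance", "Environmental", "Social"],
    "score_weight":    [1.0, 3.0, 2.0, 1.0, 2.0, 2.0],
    "score_value":     [40.0, 80.0, 60.0, 50.0, 30.0, 70.0],
})

# Weighted average per company, year and ESG dimension (assumed aggregation):
# sum(weight * value) / sum(weight).
scores["weighted"] = scores["score_weight"] * scores["score_value"]
keys = ["company_id", "assessment_year", "parent_aspect"]
sums = scores.groupby(keys)[["weighted", "score_weight"]].sum()

# One row per (company, year), one column per ESG dimension.
# A dimension with no rows for a company shows up as NaN.
dimension_scores = (sums["weighted"] / sums["score_weight"]).unstack("parent_aspect")

print(dimension_scores)
```

The resulting table has one column per dimension, which is a convenient shape for comparing distributions and for building one time series per company and dimension.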

The distributions look similar, but the median for the environmental dimension is noticeably lower than for the others. This may indicate that companies still give less attention to environmental issues.

Finally, we build a model to predict the next ESG scores from the historical data. This is useful for an investor who wants an estimate of ESG performance in the coming years. To do so, we used the SimpleExpSmoothing model from the statsmodels library. It is a simple and fast model, so it scales easily to large datasets. For more about the theory, see: Time series forecasting: principles and practices.

Once the future ESG values are predicted, including the global score, they are used to rank the companies for a potential investor according to their preferences. The final step is to combine the predicted ESG scores with the company details to serve the application, resulting in the Amostra_das_empresas.csv dataset.
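To illustrate the idea, here is a plain-Python sketch of the simple exponential smoothing recurrence followed by a preference-weighted ranking. It is not the statsmodels implementation (which may initialise and optimise the smoothing level differently), and the function names, the fixed alpha, the weighted-sum ranking rule and the sample histories are all assumptions made for illustration.

```python
def ses_forecast(history, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the
    first observation. The forecast for the next period is the last level.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def rank_companies(histories, preferences, alpha=0.5):
    """Rank companies by a preference-weighted mean of forecast scores.

    histories:   {company: {dimension: [score per year, oldest first]}}
    preferences: {dimension: weight} chosen by the investor (assumed input)
    """
    total_weight = sum(preferences.values())
    ranking = []
    for company, by_dimension in histories.items():
        predicted = {dim: ses_forecast(series, alpha)
                     for dim, series in by_dimension.items()}
        score = sum(preferences[dim] * predicted[dim]
                    for dim in preferences) / total_weight
        ranking.append((company, score))
    # Highest weighted score first.
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up score histories for two hypothetical companies.
histories = {
    "A": {"E": [40, 50, 60], "S": [70, 70, 70], "G": [60, 60, 60]},
    "B": {"E": [80, 80, 80], "S": [50, 50, 50], "G": [60, 60, 60]},
}

# An investor who cares mostly about the environmental dimension.
green_investor = {"E": 0.6, "S": 0.2, "G": 0.2}
print(rank_companies(histories, green_investor))
```

Changing the preference weights changes the ranking: with most of the weight on the social dimension, company A would come first in this toy example.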

Note that since this is a proof of concept, the dataset contains only a sample of 4 industry segments with 5 companies each.
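A sample of this shape could be drawn with pandas roughly as follows. The company table, the segment names and the choice of the first 5 companies per segment are hypothetical; how the actual sample was selected is not stated here.

```python
import pandas as pd

# Hypothetical company table: 6 segments with 7 companies each.
companies = pd.DataFrame({
    "company_id": range(42),
    "industry_segment": [f"segment_{i % 6}" for i in range(42)],
})

# Assumed selection of 4 segments; the real choice is not documented here.
chosen_segments = ["segment_0", "segment_1", "segment_2", "segment_3"]

# Keep only the chosen segments, then take the first 5 companies of each.
sample = (
    companies[companies["industry_segment"].isin(chosen_segments)]
    .groupby("industry_segment")
    .head(5)
)

print(sample["industry_segment"].value_counts())
```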

Technologies

  • Python
  • Docker
  • Pandas
  • Flask
  • Statsmodels
