<a href="https://colab.research.google.com/github/2series/rockwall_analytics/blob/master/PredictBusinessDistress_s8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predict Corporate Financial Distress Using Machine Learning Models
## June 17, 2020 
## About RIHAD VARIAWA
> As a Data Scientist and former head of global fintech research at Malastare.ai, I find fulfillment tackling challenges to solve complex problems using data

![](https://media.giphy.com/media/3o7TKSx0g7RqRniGFG/giphy.gif)

# Overview
**Financial distress** is the condition wherein a firm cannot generate revenue and meet its financial obligations to its creditors or lenders. If financial distress cannot be relieved, it may lead to bankruptcy, which in turn leads to liquidation or reorganization of the firm. 

> *There can be many internal and external reasons for corporate financial distress*

Budgeting is a crucial factor for an organization to make sure that it does not incur losses and remains financially safe and secure. Constant losses and the inability to break even indicate that a firm cannot sustain itself from its internal funds, so raising capital externally becomes the only survival option. Identifying these reasons in the early stages enables firms to draft remedies and strategies to turn things around

The ability to predict financial distress is of utmost importance for firms. Predicting difficulties in liquidation and the consequent financial distress helps them increase their potential, retain or grow their current investor base, and maximize their stock value

> *We attempted to analyze and predict the major features/reasons affecting the **financial health** of a company with various machine learning models*

# Problem Statement
The rapidly developing capital markets and the integration of the global economy have drastically increased the number of financially distressed companies. Usually, a company's entry into a financial crisis is revealed gradually, after a period in which its financial situation worsens. Hence, there is scope for prediction. *The unfavorable consequences of corporate financial failure and bankruptcy are innumerable*. Management, employees, stakeholders, and others are always concerned with their company’s financial health

> There is a need for a **financial distress prediction system** that identifies and predicts the reasons responsible for distress. Such predictions help management forecast operations, adjust strategies and strategic plans in time, and give investors analytics to optimize their investment choices

Identifying the predictors of **financial distress** of firms by using discriminant machine learning has an important practical significance

# About the Dataset
The dataset has various columns that give information on the sample companies, the period to which the data belongs, etc. The *financial distress* column is the target variable that determines if the company is financially distressed. A target value greater than -0.50 indicates that the company is healthy (0), and a target value less than -0.50 indicates that the company is financially distressed (1). The remaining columns are various financial and non-financial characteristics of the past time period for the sample companies that can be considered for predicting if a company is financially distressed
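As a minimal sketch of the labeling rule described above, the threshold of -0.50 can be applied to a list of target values. The function name and the sample values are hypothetical, and treating a value exactly equal to -0.50 as healthy is an assumption, since the text does not say how that boundary case is handled.

```python
# Threshold taken from the dataset description above.
DISTRESS_THRESHOLD = -0.50


def label_distress(score: float, threshold: float = DISTRESS_THRESHOLD) -> int:
    """Return 1 (distressed) if the score is below the threshold, else 0 (healthy)."""
    # Assumption: a score exactly equal to the threshold counts as healthy.
    return 1 if score < threshold else 0


# Made-up target values, for illustration only.
scores = [0.35, -0.49, -0.50, -0.51, -2.10]
labels = [label_distress(s) for s in scores]
print(labels)
```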

## Assumptions
This data could be regarded as a multivariate time series classification problem

# Workflow

> 1. Data pre-processing

The dataset contains masked data, as the actual feature names were unknown. After analyzing the data carefully, we observed that the values are closely scaled to each other. As the company and time columns do not have any significance in our prediction, we dropped them from further analysis. All the masked features play an important role in our analysis. From the list of variables, we selected the variables with high VIF (Variance Inflation Factor), *a test to detect the presence of multicollinearity*
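To make the VIF test concrete: the VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other variables. The sketch below computes it with the standard library only, so that it stays self-contained. In a notebook one would more likely call `variance_inflation_factor` from `statsmodels.stats.outliers_influence`. The helper names and the three small columns are made up for illustration and are not taken from the dataset.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result


# Made-up data: x2 is almost exactly 2 * x1, x3 is largely unrelated to both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9]
x3 = [5.0, 3.0, 2.0, 6.0, 5.0, 4.0]
vifs = vif([x1, x2, x3])
print([round(v, 2) for v in vifs])
```

The two nearly collinear columns get very large VIF values, while the unrelated column stays close to 1, which is the pattern the test is designed to expose.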

Algorithms are applied to understand the hidden patterns in data by developing models. Boosting algorithms improve the accuracy and performance of the models by transforming the weak learners into stronger ones

# Technologies Used
We applied the following technologies in our study

![](https://drive.google.com/uc?export=view&id=1uEyu_ykETkoBjG4IgnruxTQ130oqZ7vx)

# Process
We started by analyzing the dataset to refine the data, keeping the columns relevant to the study and dropping those that were not

## Model Building

> 2. Features, algorithms, parameter selection

We built the following 4 regression models and compared the accuracy levels they achieved

+ Linear Regression
+ Random Forest Regressor
+ Gradient Boosting Regressor
+ XGB Regressor

In the boosting technique, weak learners are trained sequentially. The predictors (models) are fitted one after another, such that each successive model tries to correct the flaws/mistakes of its predecessors
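The sequential idea can be sketched in a few lines. The example below is a toy gradient boosting loop for squared error, where each weak learner is a one-split decision stump fitted to the residuals left by the learners before it. It is an illustration under stated assumptions, not the Gradient Boosting Regressor or XGB Regressor used in this study: the function names, the learning rate, the number of rounds, and the data are all made up.

```python
def mse(y, predictions):
    """Mean squared error between observed values and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Train stumps sequentially; each one fits the current residuals."""
    base = sum(y) / len(y)           # start from the mean prediction
    predictions = [base] * len(y)
    stumps = []
    history = [mse(y, predictions)]  # training error before any stump is added
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
        history.append(mse(y, predictions))

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    return predict, history


# Made-up one-feature data with three rough plateaus.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
predict, history = boost(x, y)
print(round(history[0], 3), round(history[-1], 3))
```

The training error recorded in `history` shrinks as more stumps are added, which is the sense in which later learners correct the mistakes of earlier ones.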

Gradient boosting is a machine learning technique for *regression and classification* problems. The Gradient Boosting Regressor builds its prediction from a group of decision tree regressor models

XGBoost is an implementation of gradient-boosted decision trees designed to improve the speed and performance of the model

## Actual vs Predicted values

![](https://drive.google.com/uc?export=view&id=1Xo1QL3x1j_vQB2vhKT0nR9NLIrtSAVCd)

Inference - From the above plot, we can infer that the model captures up to 90% of the data points and predicts their behavior. However, there were a few outliers in the predicted values. These outliers are data points that differ significantly from the other data points

So we removed these outliers with the boxplot technique. The performance of the model can be viewed in the plot below
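The boxplot technique is commonly implemented as the 1.5 × IQR rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers. The sketch below assumes that this standard rule is what was applied. The whisker factor of 1.5, the function names, and the sample values are assumptions for illustration.

```python
from statistics import quantiles


def boxplot_bounds(values, whisker=1.5):
    """Return the lower and upper fences used by a standard boxplot."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles Q1, Q2 (median), Q3
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def remove_outliers(values, whisker=1.5):
    """Keep only the values that fall inside the boxplot fences."""
    low, high = boxplot_bounds(values, whisker)
    return [v for v in values if low <= v <= high]


# Made-up values with two obvious outliers (9.5 and -6.0).
predicted = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 9.5, 1.05, -6.0]
cleaned = remove_outliers(predicted)
print(cleaned)
```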

![](https://drive.google.com/uc?export=view&id=1i88Of8xy6uPv_mztK7XQ_RMjT8tqxO4T)

## Model Metrics
The criterion for deciding if a company is financially healthy or distressed depends on the value of the target variable. Values above -0.50 indicate that the company is healthy and financially secure. If the value is below -0.50, the company is treated as financially distressed. As we need to predict a number, we performed a regression analysis

+ Number of observations for DISTRESSED companies (1's): 136
+ Number of observations for HEALTHY companies (0's): 3536
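A quick piece of arithmetic on the two counts above shows how unevenly the classes are represented. The variable names are illustrative.

```python
distressed = 136   # observations labeled 1
healthy = 3536     # observations labeled 0

total = distressed + healthy
distressed_share = distressed / total

print(f"Total observations: {total}")
print(f"Share of distressed observations: {distressed_share:.2%}")
print(f"Healthy observations per distressed observation: {healthy / distressed:.0f}")
```

Distressed observations make up roughly 3.7% of the data, about one for every 26 healthy observations.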

RMSE (Root Mean Square Error): A metric that summarizes the differences between actual/observed values and predicted values *(aka RESIDUALS)*

> RMSE has been used as a standard statistical parameter to measure model performance in several natural sciences. The parameter indicates the standard deviation of the residuals, or how far the points are from the regression or modelled line. The following figure shows the residuals as green arrows and their location between the point data and the regression line

![](https://drive.google.com/uc?export=view&id=1bVFQVGsDnHIiH-196UwB7_yM455T8LXz)

> To calculate the RMSE, the following equation is used

![](https://drive.google.com/uc?export=view&id=1Dw-V-9wUJtIeJtGHOrjNEHKv_Y5tVxOh)

Where
+ n: number of samples
+ f: forecasts
+ o: observed values

It's used to check the performance of the model by telling how concentrated the data is around the line of best fit. It's calculated by subtracting the observed values from the predicted values, squaring the differences, summing them, dividing by the sample size, and taking the square root
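Using the symbols defined above (n samples, forecasts f, observed values o), the calculation corresponds to

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right)^2}$$

A minimal sketch of the same calculation in plain Python follows. The function name and the four sample pairs are made up for illustration.

```python
import math


def rmse(forecasts, observed):
    """Root mean square error between forecasts f and observed values o."""
    if len(forecasts) != len(observed) or not forecasts:
        raise ValueError("forecasts and observed must be non-empty and of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the squared errors are 1, 0, 4, 0, so the mean is 1.25.
forecasts = [2.0, 0.0, -1.0, 3.0]
observed = [1.0, 0.0, 1.0, 3.0]
print(round(rmse(forecasts, observed), 4))
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones.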

## Accuracy Achieved

|  Model                      |  RMSE         |
| --------------------------- |:-------------:|
| Linear Regression           |     1.76      | 
| Random Forest Regressor     |     1.31      |
| Gradient Boosting Regressor |     1.07      | 
| XGB Regressor               |     1.11      |

> We can see that the best-performing model is the Gradient Boosting Regressor, with the lowest RMSE of 1.07

# Conclusion
These inferences help to understand if a company is making profits or is prone to suffer from financial distress. With the Gradient Boosting Regressor model, we can analyze and predict the factors that affect a company's financing and plan a budget flow for the company

Understanding the early warning indicators and inferences of corporate financial distress is an important task. If management, stakeholders, or employees can predict the financial distress a company may face, they can take the necessary measures in time. It should be understood that corporate finance and forecasting are complex

> Forecasting helps in creating more predictable financial outcomes

This is where *Rockwall Analytics* makes its presence felt. Apart from providing actionable insights on customer service, risk modeling, fraud detection, customer segmentation, and more, it also predicts the features/factors that can be the major reasons for a firm to suffer financial distress. *Rockwall Analytics* integrates Artificial Intelligence, Data Science, and Machine Learning technologies into business operations to gain data intelligence

# About Rockwall Analytics 
*Rockwall Analytics* is a venture-funded startup 
 
Our mission is to enable organizations to make an easier transition into the field of Data Analytics and Artificial Intelligence. We aim to achieve this more efficiently than an organization might expect, were it to invest in the manpower and technology to build such complex platforms. For a universal need such as Data Science and AI, a dedicated focus is absolutely critical, and *Rockwall Analytics* empowers our clients with knowledge that they can apply to their core business offerings

![](https://drive.google.com/uc?export=view&id=1i7fzIUxz-oEs8V4uMdoZCQUl51NMrbVz)