# Accounting Analytics Lab 8-1: Predicting Bankruptcy Using Altman's Z

**Keywords:** Predicting Bankruptcy, Distress, Classification

**Insight:** In 1968, Edward Altman published a paper predicting bankruptcy using an analysis called Altman’s Z. The basic terminology is still used today, more than fifty years later.

Why is it a classification exercise? Because we are trying to put companies into classes: those that will go bankrupt and those that will not.

His analysis tests whether certain common business ratios can be used to distinguish bankrupt firms from a matched sample of non-bankrupt firms.

Altman considered more than 20 possible ratios of firm performance, but the resulting analysis found that bankruptcy could be predicted with a linear combination of five common business ratios:

1. ***X1 = Working capital / Total assets***: Measures liquid, cash-like assets (the liquidity level) relative to the size of the company.

2. ***X2 = Retained Earnings / Total assets***: Measures long-term profitability over the life of the company.

3. ***X3 = Earnings before interest and taxes / Total assets***: Measures recent, or short-term, profitability of the company.

4. ***X4 = Market value of stockholders’ equity / Book value of total debt owed***: Measures the long-term solvency of the company, or whether the company will have sufficient funds to pay its debt as it comes due.

5. ***X5 = Sales / Total Assets***: A measure of efficiency, or how well assets are utilized. 

**Required:**

1. Compute the Altman’s five factors used to predict bankruptcy.
2. Weight each of those factors using Altman’s bankruptcy prediction weights to arrive at Altman’s Z score.
3. Classify each company as either in the "Distress Zone", "Gray Zone" or "Safe Zone" for Bankruptcy using Altman’s classifications through use of a histogram.

**Ask the Question:** Which companies do we predict will go bankrupt?





In [1]:
# Import key library functions needed.
import pandas as pd
import numpy as np

## Part 1: Master the Data

Open the file *Lab_8_1_Data.xlsx*.

The financial data we will use was gathered for each publicly traded firm in the retail sector. The dataset contains details on 2,329 firms with all necessary data, covering fiscal years 2009 to 2017. The data dictionary for the dataset is as follows:

- *gvkey:* A unique code for each company given by Compustat, the data provider for this financial statement data

- *conm:* Company name

- *fyear:* The fiscal year

- *act:* Current assets ($ millions)

- *at:* Total assets  ($ millions)

- *ebit:* Earnings before interest and taxes  ($ millions)

- *lct:* Current liabilities  ($ millions)

- *lt:* Total liabilities  ($ millions)

- *ni:* Net income  ($ millions)

- *re:* Retained earnings  ($ millions)

- *sale:* Net sales  ($ millions)

- *ME:* Market value of equity  ($ millions)
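
Because *at* and *lt* serve as denominators in the ratios, it can help to check for missing values and zero denominators before computing anything. The following is a minimal sketch on a made-up three-row data frame; the values and the name `toy` are illustrative assumptions and are not taken from *Lab_8_1_Data.xlsx*.

```python
import numpy as np
import pandas as pd

# Made-up figures for illustration only (NOT from the lab data file).
toy = pd.DataFrame({
    'conm': ['FIRM A', 'FIRM B', 'FIRM C'],
    'at':   [100.0, 0.0, 250.0],    # total assets
    'lt':   [40.0, 10.0, 0.0],      # total liabilities
    'sale': [150.0, 20.0, np.nan],  # net sales, one value missing
})

# Count missing values in every column.
missing_per_column = toy.isna().sum()

# Count zeros in the columns that are used as ratio denominators.
denominators = ['at', 'lt']
zero_denominators = (toy[denominators] == 0).sum()

print(missing_per_column)
print(zero_denominators)
```

In pandas, dividing by a zero denominator produces `inf` or `NaN` rather than raising an error, so rows flagged here would carry those values into any ratio built on them.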


In [2]:
# Specify the Excel file that contains the lab data.
data_file = './Lab_8_1_Data.xlsx'

# Read the 'Sheet1' worksheet into a data frame.
data = pd.read_excel(data_file, sheet_name='Sheet1')

# View the first few rows
data.head()

| Unnamed: 0 | gvkey | conm | fyear | act | at | ebit | lct | lt | ni | re | sale | ME |
| :--- | ---: | :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 24218 | CALLOWAY'S NURSERY INC | 2012 | 8.038 | 23.143 | 2.371 | 5.697 | 15.954 | 0.609 | -2.489 | 45.551 | 6.26 |
| 1 | 14225 | FASTENAL CO | 2009 | 982.364 | 1327.358 | 296.643 | 119.509 | 136.515 | 184.357 | 1189.036 | 1930.33 | 6139.027 |
| 2 | 14225 | FASTENAL CO | 2010 | 1085.698 | 1468.283 | 429.724 | 162.185 | 185.771 | 265.356 | 1276.675 | 2269.471 | 8832.591 |
| 3 | 14225 | FASTENAL CO | 2011 | 1236.138 | 1684.948 | 574.803 | 187.818 | 225.972 | 357.929 | 1439.167 | 2766.859 | 12876.245 |
| 4 | 14225 | FASTENAL CO | 2012 | 1286.656 | 1815.832 | 673.288 | 204.174 | 255.472 | 420.536 | 1495.958 | 3133.577 | 13834.711 |


#### Step 1: Add columns X1 - X5 and compute their respective values

In [None]:
# X1 = Working Capital / Total Assets = (act - lct) / at
data = data.assign(X1 = lambda x: ((x['act'] - x['lct']) / x['at']))

# X2 = Retained Earnings / Total Assets = re / at
## TODO: Compute X2 and add it as a new column to the data

# X3 = Earnings before interest and taxes / Total Assets = ebit / at
## TODO: Compute X3 and add it as a new column to the data

# X4 = Market value of stockholder's equity / Book value of total debt owed = ME / lt
## TODO: Compute X4 and add it as a new column to the data

# X5 = Sales / Total Assets = sale / at
## TODO: Compute X5 and add it as a new column to the data

# View the updated data
## TODO: Show the first few rows of the updated data to confirm the columns have been added correctly

## Part 2: Perform the Analysis

Altman’s Z is a bankruptcy prediction using a linear combination of five common business ratios. In essence, the higher each of these ratios, the less likely the company is to go bankrupt. The original Z-score formula was as follows:

$Z = 1.2X_{1} + 1.4X_{2} + 3.3X_{3} + 0.6X_{4} + 1.0X_{5}$


Altman found he could accurately classify the firms into three "zones", or classes, using the following cutoff-based decision rules:

| Condition               | Classification                          |
| :---------------------- | --------------------------------------- |
| Z < 1.80                | Bankrupt, or "Distress Zone"            |
| Z >= 1.80 and Z < 3.0   | At risk of bankruptcy, or "Gray Zone"   |
| Z >= 3.0                | Nonbankrupt, or "Safe Zone"             |

Let's add a "Z" column and a column called "Class" that implements these decision rules.
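
As a quick check of the arithmetic, the formula can be worked through for two of the rows shown in the data preview (Calloway's Nursery 2012 and Fastenal 2009). This is a hedged sketch, not the lab's required solution: the helper name `altman_z` and the small `rows` data frame are assumptions made for illustration, and the input figures are copied from the preview.

```python
import pandas as pd

# Two rows copied from the data preview ($ millions).
rows = pd.DataFrame({
    'conm':  ["CALLOWAY'S NURSERY INC", 'FASTENAL CO'],
    'fyear': [2012, 2009],
    'act':   [8.038, 982.364],
    'at':    [23.143, 1327.358],
    'ebit':  [2.371, 296.643],
    'lct':   [5.697, 119.509],
    'lt':    [15.954, 136.515],
    're':    [-2.489, 1189.036],
    'sale':  [45.551, 1930.33],
    'ME':    [6.26, 6139.027],
})

def altman_z(df):
    """Hypothetical helper: original Altman Z from the five ratios."""
    x1 = (df['act'] - df['lct']) / df['at']  # working capital / total assets
    x2 = df['re'] / df['at']                 # retained earnings / total assets
    x3 = df['ebit'] / df['at']               # EBIT / total assets
    x4 = df['ME'] / df['lt']                 # market value of equity / total liabilities
    x5 = df['sale'] / df['at']               # sales / total assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

rows['Z'] = altman_z(rows)
print(rows[['conm', 'fyear', 'Z']])
```

Calloway's Nursery comes out at roughly 2.51, which falls in the "Gray Zone", while Fastenal comes out at roughly 31.2, well inside the "Safe Zone". Most of Fastenal's score comes from X4, because its market value of equity is about 45 times its total liabilities.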

In [None]:
# Add the Z column
data = data.assign(Z = lambda x: (1.2 * x['X1']) + (1.4 * x['X2']) + (3.3 * x['X3']) + (0.6 * x['X4']) + x['X5'])

# Put the decision rules into a small but separate function
classif = lambda z: 'Safe Zone' if z >= 3.0 else 'Gray Zone' if z >= 1.8 else 'Distress Zone'

# Add the Class column
data['Class'] = data['Z'].map(classif)

# View the updated data
data.head()
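
One detail worth knowing about the chained conditional above: comparisons against `NaN` evaluate to `False`, so a row with a missing Z falls through to 'Distress Zone'. An alternative sketch using `pd.cut` applies the same cutoffs but leaves missing scores unclassified. The Z values below are hypothetical boundary cases, not lab data, and whether the lab dataset contains missing scores at all is not something this sketch assumes.

```python
import numpy as np
import pandas as pd

# Hypothetical Z values chosen to hit each boundary (not from the lab data).
z = pd.Series([1.79, 1.8, 3.0, np.nan], name='Z')

# The decision rule as written in the notebook cell.
classif = lambda v: 'Safe Zone' if v >= 3.0 else 'Gray Zone' if v >= 1.8 else 'Distress Zone'
by_lambda = z.map(classif)

# Same cutoffs with pd.cut; right=False makes each bin include its lower edge,
# matching "Z >= 1.8 and Z < 3.0" for the Gray Zone.
zone_labels = ['Distress Zone', 'Gray Zone', 'Safe Zone']
by_cut = pd.cut(z, bins=[-np.inf, 1.8, 3.0, np.inf], right=False, labels=zone_labels)

print(pd.DataFrame({'Z': z, 'lambda': by_lambda, 'cut': by_cut}))
```

The two approaches agree on every non-missing value; they differ only in how a missing Z is labeled.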

Now let's use a histogram to view the number of occurrences in each class.

In [None]:
# Build a small data frame to hold counts of each class
plot_data = pd.DataFrame({'Classification':['Distress Zone', 'Gray Zone', 'Safe Zone'],
                          'Count':[data[data['Class'] == 'Distress Zone'].shape[0],
                                   data[data['Class'] == 'Gray Zone'].shape[0],
                                   data[data['Class'] == 'Safe Zone'].shape[0]]})

# Implement the histogram as a bar plot.
ax = plot_data.plot.bar(x='Classification', y='Count', rot=0)

# Display the summarized plot data to see exact counts
plot_data.head()
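
The same three-row summary can also be built with `value_counts`, which is one possible shortcut rather than the lab's prescribed approach. The sketch below uses a made-up *Class* column; reindexing against the full list of zones keeps a zero count for any zone that does not occur and fixes the bar order.

```python
import pandas as pd

zones = ['Distress Zone', 'Gray Zone', 'Safe Zone']

# Made-up classifications for illustration (no 'Distress Zone' rows on purpose).
classes = pd.Series(['Safe Zone', 'Gray Zone', 'Safe Zone', 'Safe Zone'], name='Class')

# Count each class, then reindex so every zone appears in a fixed order.
counts = classes.value_counts().reindex(zones, fill_value=0)

plot_data = pd.DataFrame({'Classification': zones, 'Count': counts.to_numpy()})
print(plot_data)
```

The resulting `plot_data` has the same two columns as the version built by hand, so it can be passed to the same `plot.bar` call.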

### Share the Story

We now have a bankruptcy prediction for every retail company with available data for each firm-year combination. Auditors can use this score to understand the financial risk facing their client. Investors can use this score to understand the financial risk they face if they invest in the firm. Banks and lenders can also use this score to decide whether the company will be around to pay the loan back when it is due.

### Assessment

Please answer the following questions.

**Question 1:** How many firms have an Altman Z-score less than 1.8 and fall in the "Distress Zone"?

*TODO:* Provide an answer based on your results

**Question 2:** How many firms have an Altman Z-score greater than or equal to 1.8, but less than 3.0 and fall in the "Gray Zone"?

*TODO:* Provide an answer based on your results


**Question 3:** Based on the Altman's Z formula, what is the general relationship between each of the five factors (or financial ratios) and the chance of going bankrupt?

*TODO:* Provide an answer based on your results

**Question 4:** With recent financial pressure from e-commerce firms like Amazon and brick-and-mortar stores like Walmart, would you predict that more or fewer firms would go bankrupt in more recent years?

*TODO:* Provide an answer based on your results and business knowledge

**Question 5:** How many distinct companies and fiscal years are considered in this dataset? How many total rows are contained in the dataset?

In [None]:
print('Number of distinct companies represented', data['conm'].unique().shape[0])
## TODO: Similar to above, print out the number of distinct fiscal years represented in the data
## TODO: Similar to above, print out the total number of rows in the dataset
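
When counting distinct companies, the choice of column can matter. The data dictionary describes *gvkey* as the unique company code, so a firm whose name changed would be counted twice by *conm* but once by *gvkey*. The sketch below illustrates this on a made-up data frame; whether any names actually change in the lab data is not known here, and the firm names and codes are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration: gvkey 1001 appears under two different names.
toy = pd.DataFrame({
    'gvkey': [1001, 1001, 1002, 1002],
    'conm':  ['ALPHA STORES', 'ALPHA RETAIL INC', 'BETA MART', 'BETA MART'],
    'fyear': [2009, 2010, 2009, 2010],
})

n_names = toy['conm'].nunique()   # distinct company names
n_ids = toy['gvkey'].nunique()    # distinct company identifiers
n_years = toy['fyear'].nunique()  # distinct fiscal years
n_rows = len(toy)                 # total firm-year rows

print('Distinct names:', n_names)
print('Distinct gvkeys:', n_ids)
print('Distinct fiscal years:', n_years)
print('Total rows:', n_rows)
```

Comparing the two company counts on the real data is a quick way to see whether the distinction matters for this dataset.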

## Part 3: Analyzing a Different Dataset

For this part you will use the above analyses as an example and undertake a similar analysis yourself. Specifically, you should accomplish the following:

1. Compute the Altman’s five factors used to predict bankruptcy.
2. Weight each of those factors using Altman’s bankruptcy prediction weights to arrive at Altman’s Z score.
3. Classify each company as either in the "Distress Zone", "Gray Zone" or "Safe Zone" for Bankruptcy using Altman’s classifications through use of a histogram.

Please perform this analysis using the data file entitled *Lab_8_1_Alt_Data.xlsx*.

#### Step 1: Perform the Analysis and Display Key Results

First, we read in the new data:

In [None]:
# Specify the Excel file that contains the new lab data.
## TODO: Create a variable called data_file and assign it the name of the new XLSX data file

# Read the relevant worksheet into a data frame.
## TODO: Read in the data file as a data frame and assign it to a variable named data.
##       ** Hint: Make sure to properly name the target worksheet within the data file.

# View the first few rows
## TODO: Display the first few rows of the data

Next, we add columns X1 through X5:

In [None]:
## TODO: Add columns containing X1 through X5 to the data and display the first few rows to confirm the result.

Now add a *Z* column containing Altman's Z for each row, followed by a *Class* column containing the respective bankruptcy classification.

In [None]:
# Add the Z column
## TODO: Add a new Z column containing the computed Z values

# Put the decision rules into a small but separate function
## TODO: construct a small function to implement Altman's Z decision rules

# Add the Class column
## TODO: Add a Class column which contains each firm's classification based on the decision rules

# View the updated data
## TODO: View the first few rows to confirm the addition and correctness of these two columns

Finally, construct a histogram to show the counts of each distinct bankruptcy classification.

In [None]:
# Build a small data frame to hold counts of each class
## TODO: Construct a three-row data frame called plot_data containing Classification and Count columns. The Classification
##       column should list the classes 'Distress Zone', 'Gray Zone', and 'Safe Zone', and the Count column should list the
##       corresponding counts in the dataset
## TODO: Construct plot_data

# Implement the histogram as a bar plot.
## TODO: Build a bar plot showing the differences in the counts for the plot_data data frame

# Display the summarized plot data to see exact counts
## TODO: Print out the plot_data data frame

### Assessment

Please answer the following questions.

**Question 1:** How many firms have an Altman Z-score less than 1.8 and fall in the "Distress Zone"?

*TODO:* Provide an answer based on your results

**Question 2:** How many firms have an Altman Z-score greater than or equal to 1.8, but less than 3.0 and fall in the "Gray Zone"?

*TODO:* Provide an answer based on your results

**Question 3:** Based on the Altman's Z formula, what is the general relationship between each of the five factors (or financial ratios) and the chance of going bankrupt?

*TODO:* Provide an answer based on your results

**Question 4:** For factor X4, what is the appropriate weight based on Altman's Z?

*TODO:* Provide an answer based on your results

**Question 5:** How many distinct companies and fiscal years are considered in this dataset? How many total rows are contained in the dataset?

In [None]:
## TODO: Print out the number of distinct companies represented, prefaced by identifying text.
## TODO: Similar to above, print out the number of distinct fiscal years represented in the data
## TODO: Similar to above, print out the total number of rows in the dataset