https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction



Taiwanese Bankruptcy Prediction
Donated on 6/27/2020
The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange.



Source
Deron Liang and Chih-Fong Tsai, deronliang '@' gmail.com; cftsai '@' mgt.ncu.edu.tw, National Central University, Taiwan
The data were obtained from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

Relevant Papers
Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.
https://www.sciencedirect.com/science/article/pii/S0377221716000412

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("data.zip")
df.shape

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df["Bankrupt?"].value_counts()
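
The counts above can also be read as class shares and an imbalance ratio, which is what later motivates stratified splitting and oversampling. The following is a minimal sketch on a toy label column; the values and the names `labels`, `shares`, and `imbalance_ratio` are illustrative assumptions, not figures from the dataset.

```python
import pandas as pd

# Toy label column (assumption): 1 marks a bankrupt company.
labels = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="Bankrupt?")

counts = labels.value_counts()                 # absolute class counts
shares = labels.value_counts(normalize=True)   # class shares that sum to 1
imbalance_ratio = counts.max() / counts.min()  # majority rows per minority row

print(shares)
print(imbalance_ratio)
```

On the real frame, the same calls would be applied to `df["Bankrupt?"]`.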

In [None]:
# Overlay per-class histograms for the first 30 columns (bankrupt drawn first, then non-bankrupt).
fig, axes = plt.subplots(5, 6, figsize=(12, 12))
df[df["Bankrupt?"] == 1].iloc[:, :30].hist(bins=50, alpha=.5, ax=axes);
df[df["Bankrupt?"] == 0].iloc[:, :30].hist(bins=50, alpha=.5, ax=axes);

In [None]:
# Same overlay for columns 30 to 59.
fig, axes = plt.subplots(5, 6, figsize=(12, 12))
df[df["Bankrupt?"] == 1].iloc[:, 30:60].hist(bins=50, alpha=.5, ax=axes);
df[df["Bankrupt?"] == 0].iloc[:, 30:60].hist(bins=50, alpha=.5, ax=axes);

In [None]:
df.dtypes.value_counts()

In [None]:
df.dtypes.describe()

In [None]:
df.describe()

In [None]:
df.describe().T.describe().round(2)

In [None]:
df.skew().describe()

In [None]:
df.kurt().describe()

In [None]:
corr = df.corr()

In [None]:
sns.heatmap(corr)

In [None]:
corr.describe().T.describe()
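
A heatmap over this many columns is hard to read, so one option is to flatten the correlation matrix into a ranked list of feature pairs. The sketch below is an assumption-laden illustration: `top_correlated_pairs` is a hypothetical helper and `toy` is made-up data, not part of the dataset.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(corr: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n column pairs with the largest absolute correlation (hypothetical helper)."""
    # Keep the strict upper triangle so each pair appears once and the diagonal is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Rank by magnitude so strong negative correlations are not pushed to the bottom.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy frame (assumption): "a" and "b" are nearly collinear, "c" is not.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [4.0, 1.0, 3.0, 2.0],
})
top = top_correlated_pairs(toy.corr(), n=2)
print(top)
```

With the notebook's matrix, the call would be `top_correlated_pairs(corr)`.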

In [None]:
sns.violinplot(data=df)

In [None]:
# Z-score standardization (label column included); any constant column has std 0 and becomes NaN.
df_scaling = (df - df.mean()) / df.std()
df_scaling.describe().round(2)
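
If any column is constant, the division above yields NaN for that column. A minimal sketch of one way to guard against this follows; `standardize` is a hypothetical helper and `toy` is made-up data, and dropping constant columns is just one possible choice.

```python
import pandas as pd

def standardize(frame: pd.DataFrame) -> pd.DataFrame:
    """Z-score each column, dropping constant columns (hypothetical helper)."""
    std = frame.std()
    varying = std[std > 0].index  # columns with non-zero spread
    return (frame[varying] - frame[varying].mean()) / std[varying]

# Toy frame (assumption): "flag" is constant and would become NaN under a plain z-score.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "flag": [1, 1, 1]})
scaled = standardize(toy)
print(scaled)
```

Applied to the notebook's data, this would be `standardize(df.drop(columns="Bankrupt?"))`, which also keeps the label out of the scaling.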

In [None]:
df_scaling.describe().T.describe().round(4)

In [None]:
sns.violinplot(data=df_scaling)

In [None]:
label_name = "Bankrupt?"

X_raw = df.drop(columns=label_name)
y_raw = df[label_name]

X_raw.shape, y_raw.shape

In [None]:
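from sklearn.model_selection import train_test_split

# Split before resampling so the test set contains only real observations.
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y_raw, test_size=0.33, stratify=y_raw, random_state=42)

In [None]:
from imblearn.over_sampling import SMOTE

# Oversample the minority class in the training split only. Resampling before
# the split would place synthetic points derived from test rows in the training set.
sm = SMOTE(random_state=42)
X_train, y_train = sm.fit_resample(X_train, y_train)
X_train.shape, y_train.shape

In [None]:
X_train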

In [None]:
from sklearn.ensemble import HistGradientBoostingClassifier

model = HistGradientBoostingClassifier()
model