
In [1]:
!pip install ydata-profiling



In [2]:
import pandas as pd
from ydata_profiling import ProfileReport

In [3]:
data = pd.read_csv('/content/African_crises_dataset.csv')

In [4]:
profile = ProfileReport(data, title="Ydata_Profiling Report", explorative=True)

In [5]:
# Hypothetical output name: the report is HTML, so it should not reuse the dataset's .csv path
profile.to_file('/content/African_crises_report.html')




In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1059 entries, 0 to 1058
Data columns (total 14 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   country_number                   1059 non-null   int64  
 1   country_code                     1059 non-null   object 
 2   country                          1059 non-null   object 
 3   year                             1059 non-null   int64  
 4   systemic_crisis                  1059 non-null   int64  
 5   exch_usd                         1059 non-null   float64
 6   domestic_debt_in_default         1059 non-null   int64  
 7   sovereign_external_debt_default  1059 non-null   int64  
 8   gdp_weighted_default             1059 non-null   float64
 9   inflation_annual_cpi             1059 non-null   float64
 10  independence                     1059 non-null   int64  
 11  currency_crises                  1059 non-null   int64  
 12  inflation_crises    

In [7]:
print(data.head())
print(data.describe())

   country_number country_code  country  year  systemic_crisis  exch_usd  \
0               1          DZA  Algeria  1870                1  0.052264   
1               1          DZA  Algeria  1871                0  0.052798   
2               1          DZA  Algeria  1872                0  0.052274   
3               1          DZA  Algeria  1873                0  0.051680   
4               1          DZA  Algeria  1874                0  0.051308   

   domestic_debt_in_default  sovereign_external_debt_default  \
0                         0                                0   
1                         0                                0   
2                         0                                0   
3                         0                                0   
4                         0                                0   

   gdp_weighted_default  inflation_annual_cpi  independence  currency_crises  \
0                   0.0              3.441456             0                0  

In [8]:
print(data.isnull().sum())

country_number                     0
country_code                       0
country                            0
year                               0
systemic_crisis                    0
exch_usd                           0
domestic_debt_in_default           0
sovereign_external_debt_default    0
gdp_weighted_default               0
inflation_annual_cpi               0
independence                       0
currency_crises                    0
inflation_crises                   0
banking_crisis                     0
dtype: int64


In [9]:
# drop_duplicates() returns a new DataFrame; the cells below still use `data`
data_no_duplicates = data.drop_duplicates()
print(data_no_duplicates.head())

   country_number country_code  country  year  systemic_crisis  exch_usd  \
0               1          DZA  Algeria  1870                1  0.052264   
1               1          DZA  Algeria  1871                0  0.052798   
2               1          DZA  Algeria  1872                0  0.052274   
3               1          DZA  Algeria  1873                0  0.051680   
4               1          DZA  Algeria  1874                0  0.051308   

   domestic_debt_in_default  sovereign_external_debt_default  \
0                         0                                0   
1                         0                                0   
2                         0                                0   
3                         0                                0   
4                         0                                0   

   gdp_weighted_default  inflation_annual_cpi  independence  currency_crises  \
0                   0.0              3.441456             0                0  

In [10]:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()

In [11]:
# Encode the banking_crisis labels as integers (LabelEncoder assigns codes in sorted label order)
for column in ['banking_crisis']:
    data[column] = label_encoder.fit_transform(data[column])

In [12]:
target_variable = ['systemic_crisis']
features = ['banking_crisis', 'sovereign_external_debt_default', 'exch_usd', 'inflation_annual_cpi']

In [13]:
from sklearn.model_selection import train_test_split

In [14]:
X_train, X_test, y_train, y_test = train_test_split(data[features], data[target_variable], test_size=0.2, random_state=42)

In [15]:
print(f"Training set size: {X_train.shape[0]} samples")
print(f"Test set size: {X_test.shape[0]} samples")

Training set size: 847 samples
Test set size: 212 samples


In [16]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix


In [17]:
model = LogisticRegression()

In [18]:
# Pass the target as a 1-D array; a single-column DataFrame triggers a DataConversionWarning
model.fit(X_train, y_train.values.ravel())


In [19]:
y_pred = model.predict(X_test)


In [20]:
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       195
           1       0.89      0.94      0.91        17

    accuracy                           0.99       212
   macro avg       0.94      0.97      0.95       212
weighted avg       0.99      0.99      0.99       212

[[193   2]
 [  1  16]]


In [21]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


In [22]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


Accuracy: 0.99


In [23]:
precision = precision_score(y_test, y_pred)
print(f"Precision: {precision:.2f}")


Precision: 0.89


In [24]:
recall = recall_score(y_test, y_pred)
print(f"Recall: {recall:.2f}")

Recall: 0.94


In [25]:
f1 = f1_score(y_test, y_pred)
print(f"F1-score: {f1:.2f}")

F1-score: 0.91


In [26]:
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[193   2]
 [  1  16]]
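
The scores printed in cells 22 to 25 can be checked by hand from this matrix. In scikit-learn's layout the rows are the true labels and the columns are the predictions, so the four counts are TN = 193, FP = 2, FN = 1, TP = 16. A short sketch of the arithmetic, using only those counts:

```python
# Counts read off the confusion matrix above: rows = true label, columns = predicted label.
tn, fp = 193, 2
fn, tp = 1, 16

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all predictions that are correct
precision = tp / (tp + fp)                          # share of predicted crises that were real
recall = tp / (tp + fn)                             # share of real crises that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
```

With only 17 positive cases in the test set, a single extra miss moves recall by about six points, so precision, recall and F1 say more about this model than the 0.99 accuracy does.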


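Feature Selection: The model above uses four hand-picked features out of the dataset's 14 columns. Scoring each candidate feature against the target and keeping only the informative ones, rather than choosing by hand or using every column, can reduce noise in the model and improve its generalization.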
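
One possible way to score features is scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`). The sketch below is only an illustration: it runs on synthetic data from `make_classification` as a stand-in for the crisis dataset, and the choice of `k=3` is an assumption, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data: 8 features, only 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0, random_state=42
)

# Keep the 3 features with the highest ANOVA F-score against the target.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print("F-scores per feature:", np.round(selector.scores_, 1))
print("Selected column indices:", selector.get_support(indices=True))
print("Shape before/after:", X.shape, X_selected.shape)
```

On the real data the selector should be fitted on the training split only, so that the test set does not influence which features are kept.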

Feature Engineering: Create new features from the existing ones that might carry more signal. This can involve transformations (for example a log of `exch_usd`), aggregations (for example per-country statistics), or combinations of existing columns.
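
As a rough sketch of what this could look like with pandas, the example below derives three new columns. The column names come from the notebook, but the rows are made-up toy values, and the 20% inflation cut-off is an arbitrary assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values; only the column names match the notebook's dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "exch_usd": [1.0, 1.1, 1.21, 50.0, 50.0, 75.0],
    "inflation_annual_cpi": [2.0, 3.0, 25.0, 5.0, 4.0, 30.0],
})
toy = toy.sort_values(["country", "year"])

# Transformation: log1p compresses the wide range of exchange-rate values.
toy["log_exch_usd"] = np.log1p(toy["exch_usd"])

# Within-group change: year-over-year change of the exchange rate, per country.
toy["exch_usd_pct_change"] = toy.groupby("country")["exch_usd"].pct_change()

# Thresholded indicator: flag years with inflation above an arbitrary 20% cut-off.
toy["high_inflation"] = (toy["inflation_annual_cpi"] > 20).astype(int)

print(toy)
```

The first year of each country has no previous value, so `exch_usd_pct_change` is `NaN` there and needs to be filled or dropped before fitting a model.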

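Data Augmentation: The positive class is scarce here (17 crisis cases out of 212 test samples). Image-style transformations such as rotation or flipping do not apply to tabular data; the closer equivalents are oversampling the minority class in the training set or generating synthetic minority samples, for example with SMOTE.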
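
For a tabular dataset, the simplest way to create extra samples is to redraw minority-class rows with replacement. The sketch below uses `sklearn.utils.resample` on a synthetic, imbalanced array that stands in for a training split; the 90/10 class ratio is an assumption for illustration.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(42)

# Synthetic, imbalanced stand-in for a training split: 90 negatives, 10 positives.
X_train = rng.normal(size=(100, 4))
y_train = np.array([0] * 90 + [1] * 10)

X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
X_maj, y_maj = X_train[y_train == 0], y_train[y_train == 0]

# Draw minority rows with replacement until both classes have the same size.
X_min_up, y_min_up = resample(
    X_min, y_min, replace=True, n_samples=len(y_maj), random_state=42
)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.concatenate([y_maj, y_min_up])

print("Class counts after oversampling:", np.bincount(y_balanced))
```

Oversampling should be applied after the train/test split and to the training portion only; otherwise copies of the same row can end up on both sides and inflate the test scores.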

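Algorithm Selection: Try different algorithms and compare them on the same data to see which one works best for this dataset. Because the classes are imbalanced, compare them on precision, recall, or F1 rather than accuracy alone. Other popular algorithms for classification tasks include Decision Trees, Random Forests, Support Vector Machines, Naive Bayes, and Gradient Boosting.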
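
A comparison of this kind might be set up with `cross_val_score`, as in the sketch below. It runs on synthetic imbalanced data from `make_classification` rather than the crisis dataset, and the hyperparameters (such as `n_estimators=50`) are placeholder assumptions, so the printed scores say nothing about which model suits the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 4 features, roughly 10% positive class.
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=42,
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
}

# 5-fold cross-validated F1 for each candidate (folds are stratified for classifiers).
results = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Cross-validation gives each model several train/test splits instead of one, which matters when the test set holds only a handful of positive cases.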
