# QCTO - Workplace Module

### Project Title: Africa Economic, Banking, and Systemic Crisis Data
#### Done By: Sharon Mokgadi Ramapuputla
#### Link to GitHub repo: https://github.com/Sharonramapuputla/Workplace
#### Link to Trello board: https://trello.com/b/yQkSU6ca/workplace

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Introduce the project, outline its goals, and explain its significance.
* **Details:** Include information about the problem domain, the specific questions or challenges the project aims to address, and any relevant background information that sets the stage for the work.
---

**Context**

This dataset is a derivative of Reinhart et al.'s Global Financial Stability dataset, which is available online at: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx.

The dataset will be valuable to those who seek to understand the dynamics of financial stability within the African context.

**Content**

The dataset specifically focuses on the Banking, Debt, Financial, Inflation and Systemic Crises that occurred, from 1860 to 2014, in 13 African countries, including Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe.

**Acknowledgements**

Reinhart, C., Rogoff, K., Trebesch, C. and Reinhart, V. (2019) Global Crises Data by Country.
[online] https://www.hbs.edu/behavioral-finance-and-financial-stability/data. Available at: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx [Accessed: 10 September 2024].

**Inspiration**

My inspiration stems from two questions: "Which factors are most associated with Systemic Crises in Africa?" and "At which annual rate of inflation does an Inflation Crisis become a practical certainty?"

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [45]:
# For data manipulation and analysis.
import pandas as pd
# For numerical operations.
import numpy as np
# For data visualization.
import matplotlib.pyplot as plt
import seaborn as sns
# For machine learning tasks such as train/test splitting, scaling, and modeling.
import sklearn

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Describe how the data was collected and provide an overview of its characteristics.
* **Details:** Mention sources of the data, the methods used for collection (e.g., APIs, web scraping, datasets from repositories), and a general description of the dataset including size, scope, and types of data available (e.g., numerical, categorical).
---

In [21]:
# Import necessary libraries
import pandas as pd

# Load the dataset from the CSV file
# Replace '/content/african_crises.csv' with the actual file path if it differs
df = pd.read_csv('/content/african_crises.csv')

# Display the first few rows of the dataset
print("Dataset Preview:")
print(df.head())

# Check the size of the dataset (number of rows and columns)
print("\nDataset Size (rows, columns):")
print(df.shape)

# Get an overview of the data types for each column
print("\nData Types and Missing Values:")
df.info()

# Check for missing values in the dataset
print("\nMissing Values in Each Column:")
print(df.isnull().sum())

# Provide a statistical summary of numerical features
print("\nStatistical Summary of Numerical Columns:")
print(df.describe())


Dataset Preview:
   case  cc3  country  year  systemic_crisis  exch_usd  \
0     1  DZA  Algeria  1870                1  0.052264   
1     1  DZA  Algeria  1871                0  0.052798   
2     1  DZA  Algeria  1872                0  0.052274   
3     1  DZA  Algeria  1873                0  0.051680   
4     1  DZA  Algeria  1874                0  0.051308   

   domestic_debt_in_default  sovereign_external_debt_default  \
0                         0                                0   
1                         0                                0   
2                         0                                0   
3                         0                                0   
4                         0                                0   

   gdp_weighted_default  inflation_annual_cpi  independence  currency_crises  \
0                   0.0              3.441456             0                0   
1                   0.0             14.149140             0                0   
2        

### **Sources of the Data**:
The dataset, titled **"Africa Economic, Banking, and Systemic Crisis Data"**, was sourced from **Kaggle**, a platform for sharing datasets. As noted in the Background Context, it is a derivative of the global crises data by Reinhart et al. (2019). It compiles information from various **financial and economic reports, databases**, and **public repositories** related to African countries' economic crises, banking failures, and other systemic issues.

### **Methods of Collection**:
The dataset was created by combining multiple data sources, such as:
- **Economic Reports**: Data from annual reports on GDP, inflation, and debt defaults published by global or regional organizations (e.g., World Bank, IMF).
- **Financial Institutions**: Records of banking crises and financial defaults from central banks or financial regulatory agencies.
- **Historical Records**: Information on currency and systemic crises extracted from historical datasets.
- **Manual Compilation**: Crisis data gathered and validated manually from academic papers, financial reports, and global databases.

There is no direct indication that APIs or web scraping were used for data collection; datasets of this kind are commonly compiled from publicly available financial reports and repositories.

### **General Description of the Dataset**:

1. **Size**:
   - **Rows**: 1,059 rows
   - **Columns**: 14 columns
   
2. **Scope**:
   - This dataset focuses on **economic crises** in 13 African countries. It records systemic, currency, banking, and inflation crises from the 19th to the 21st century (1860 to 2014), covering the major economic and banking events that affected each country's financial stability.
   
3. **Types of Data**:
   - **Numerical Data**:
     - Year, exchange rate (exch_usd), GDP-weighted default, and annual inflation rate.
     - Example columns: `year`, `exch_usd`, `gdp_weighted_default`, `inflation_annual_cpi`.
     
   - **Categorical Data**:
     - Country codes, country names, and the banking crisis label.
     - Example columns: `cc3` (country code), `country`, `banking_crisis` (`crisis` or `no_crisis`).
   
   - **Binary Data** (0/1):
     - Presence or absence of a systemic, currency, or inflation crisis, of a debt default, and of independence.
     - Example columns: `systemic_crisis`, `currency_crises`, `inflation_crises`, `domestic_debt_in_default`, `sovereign_external_debt_default`, `independence`.

### Summary:
- The dataset contains **1,059 entries** of economic data on the crises that affected African countries between 1860 and 2014.
- The data is mostly **numerical** (e.g., exchange rates, inflation, default rates), with some **categorical** (e.g., country names, crisis types) and **binary** (e.g., systemic crisis or not) values.
- It is designed to analyze and understand the patterns, triggers, and outcomes of different financial crises in African countries over time.

This dataset is suitable for both **historical analysis** and **predictive modeling** related to economic crises in Africa.
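
The grouping of columns into numerical, categorical, and binary types can be checked programmatically rather than by eye. The following is a minimal sketch, assuming a small hand-made frame that mimics a few of the dataset's columns; the helper name `classify_columns` is hypothetical and not part of pandas.

```python
import pandas as pd


def classify_columns(frame: pd.DataFrame) -> dict:
    """Group column names into binary, categorical, and numerical lists."""
    groups = {"binary": [], "categorical": [], "numerical": []}
    for name in frame.columns:
        column = frame[name]
        if pd.api.types.is_numeric_dtype(column):
            # A numeric column holding only 0 and 1 is treated as a binary flag.
            if set(column.dropna().unique()) <= {0, 1}:
                groups["binary"].append(name)
            else:
                groups["numerical"].append(name)
        else:
            groups["categorical"].append(name)
    return groups


# Illustrative rows only (assumption); these are not the full dataset.
sample = pd.DataFrame(
    {
        "country": ["Algeria", "Algeria", "Algeria"],
        "year": [1870, 1871, 1872],
        "systemic_crisis": [1, 0, 0],
        "exch_usd": [0.052264, 0.052798, 0.052274],
        "banking_crisis": ["crisis", "no_crisis", "no_crisis"],
    }
)
column_groups = classify_columns(sample)
print(column_groups)
```

On a very small sample, a continuous column that happens to contain only zeros would also be labelled binary, so the result is best read alongside the column descriptions.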

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [24]:
# Import necessary library
import pandas as pd

# Load the CSV file
df = pd.read_csv('/content/african_crises.csv')

# Display the first few rows of the dataset to give a sense of what the raw data looks like.
df.head()


| | case | cc3 | country | year | systemic_crisis | exch_usd | domestic_debt_in_default | sovereign_external_debt_default | gdp_weighted_default | inflation_annual_cpi | independence | currency_crises | inflation_crises | banking_crisis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | DZA | Algeria | 1870 | 1 | 0.052264 | 0 | 0 | 0.0 | 3.441456 | 0 | 0 | 0 | crisis |
| 1 | 1 | DZA | Algeria | 1871 | 0 | 0.052798 | 0 | 0 | 0.0 | 14.14914 | 0 | 0 | 0 | no_crisis |
| 2 | 1 | DZA | Algeria | 1872 | 0 | 0.052274 | 0 | 0 | 0.0 | -3.718593 | 0 | 0 | 0 | no_crisis |
| 3 | 1 | DZA | Algeria | 1873 | 0 | 0.05168 | 0 | 0 | 0.0 | 11.203897 | 0 | 0 | 0 | no_crisis |
| 4 | 1 | DZA | Algeria | 1874 | 0 | 0.051308 | 0 | 0 | 0.0 | -3.848561 | 0 | 0 | 0 | no_crisis |


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [25]:
# Check for missing values
missing_values = df.isnull().sum()
print("Missing Values in Each Column:")
print(missing_values)


Missing Values in Each Column:
case                               0
cc3                                0
country                            0
year                               0
systemic_crisis                    0
exch_usd                           0
domestic_debt_in_default           0
sovereign_external_debt_default    0
gdp_weighted_default               0
inflation_annual_cpi               0
independence                       0
currency_crises                    0
inflation_crises                   0
banking_crisis                     0
dtype: int64


In [28]:
# Example: fill any missing values in numeric columns with the column mean.
# The check above found no missing values, so this leaves df unchanged here.

# Calculate the mean only for numeric columns
numeric_cols = df.select_dtypes(include=['number']).columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

In [29]:
# Using IQR to identify outliers in a numerical column
Q1 = df['inflation_annual_cpi'].quantile(0.25)
Q3 = df['inflation_annual_cpi'].quantile(0.75)
IQR = Q3 - Q1
outlier_threshold_low = Q1 - 1.5 * IQR
outlier_threshold_high = Q3 + 1.5 * IQR

# Filter out the outliers into a separate frame (df itself is left unchanged)
df_no_outliers = df[df['inflation_annual_cpi'].between(outlier_threshold_low, outlier_threshold_high)]
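
The IQR rule used in the cell above can be wrapped in a small reusable helper so that the same fences apply to any numerical column. This is a sketch under stated assumptions: the function name `iqr_bounds` and the sample values are made up for illustration, and the outliers are flagged instead of dropped.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple:
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up inflation figures with one extreme value, for illustration only.
inflation = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 500.0])
lower, upper = iqr_bounds(inflation)

# Flag rather than drop, so the extreme rows stay available for inspection.
is_outlier = (inflation < lower) | (inflation > upper)
print(lower, upper, int(is_outlier.sum()))
```

Flagging keeps the extreme inflation years in the frame, which may matter for a project whose questions concern inflation crises.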


In [30]:
# Removing outliers from the dataset
# df = df[(df['column_name'] >= lower_bound) & (df['column_name'] <= upper_bound)]


In [31]:
# Check for negative values in columns where they should not be
print(df[df['exch_usd'] < 0])

# Example: correct errors by setting any negative exchange rates to zero
df['exch_usd'] = df['exch_usd'].clip(lower=0)


Empty DataFrame
Columns: [case, cc3, country, year, systemic_crisis, exch_usd, domestic_debt_in_default, sovereign_external_debt_default, gdp_weighted_default, inflation_annual_cpi, independence, currency_crises, inflation_crises, banking_crisis]
Index: []


In [32]:
# Standardize categorical entries
df['banking_crisis'] = df['banking_crisis'].str.lower().replace({'crisis': 'yes', 'no_crisis': 'no'})
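
The cell above standardizes the labels to `yes` and `no`. If a numeric flag is needed later, for example as a model input, the same idea extends to a 0/1 encoding. The sketch below assumes made-up labels with inconsistent casing and whitespace; the variable names are illustrative only.

```python
import pandas as pd

# Illustrative labels (assumption), including inconsistent casing and whitespace.
labels = pd.Series(["crisis", "No_Crisis", " no_crisis ", "CRISIS"])

# Normalise the text first, then map each known label to an integer flag.
cleaned = labels.str.strip().str.lower()
banking_crisis_flag = cleaned.map({"crisis": 1, "no_crisis": 0})

# Labels outside the mapping become NaN, so count them before relying on the flag.
unmapped = int(banking_crisis_flag.isna().sum())
print(banking_crisis_flag.tolist(), unmapped)
```

Unlike `replace`, `map` turns any unexpected label into a missing value, which makes stray categories easy to spot.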


In [35]:
# Filtering data for a specific country and year range
df_filtered = df[(df['country'] == 'Algeria') & (df['year'] >= 2000)]


In [41]:
# Checking if the code worked
df_filtered.head()

| | case | cc3 | country | year | systemic_crisis | exch_usd | domestic_debt_in_default | sovereign_external_debt_default | gdp_weighted_default | inflation_annual_cpi | independence | currency_crises | inflation_crises | banking_crisis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70 | 1 | DZA | Algeria | 2000 | 0 | 75.3428 | 0 | 0 | 0.0 | 0.3 | 1 | 0 | 0 | no |
| 71 | 1 | DZA | Algeria | 2001 | 0 | 77.8196 | 0 | 0 | 0.0 | 4.2 | 1 | 0 | 0 | no |
| 72 | 1 | DZA | Algeria | 2002 | 0 | 79.7234 | 0 | 0 | 0.0 | 1.43 | 1 | 0 | 0 | no |
| 73 | 1 | DZA | Algeria | 2003 | 0 | 72.6128 | 0 | 0 | 0.0 | 4.259 | 1 | 0 | 0 | no |
| 74 | 1 | DZA | Algeria | 2004 | 0 | 72.6137 | 0 | 0 | 0.0 | 3.972 | 1 | 0 | 0 | no |


In [36]:
# Select specific columns for analysis
df_reduced = df[['country', 'year', 'exch_usd', 'inflation_annual_cpi', 'systemic_crisis']]


In [42]:
# Checking if the code worked
df_reduced.head()

| | country | year | exch_usd | inflation_annual_cpi | systemic_crisis |
|---|---|---|---|---|---|
| 0 | Algeria | 1870 | 0.052264 | 3.441456 | 1 |
| 1 | Algeria | 1871 | 0.052798 | 14.14914 | 0 |
| 2 | Algeria | 1872 | 0.052274 | -3.718593 | 0 |
| 3 | Algeria | 1873 | 0.05168 | 11.203897 | 0 |
| 4 | Algeria | 1874 | 0.051308 | -3.848561 | 0 |


In [43]:
# Display the first few rows of the dataset
df.head()

| | case | cc3 | country | year | systemic_crisis | exch_usd | domestic_debt_in_default | sovereign_external_debt_default | gdp_weighted_default | inflation_annual_cpi | independence | currency_crises | inflation_crises | banking_crisis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | DZA | Algeria | 1870 | 1 | 0.052264 | 0 | 0 | 0.0 | 3.441456 | 0 | 0 | 0 | yes |
| 1 | 1 | DZA | Algeria | 1871 | 0 | 0.052798 | 0 | 0 | 0.0 | 14.14914 | 0 | 0 | 0 | no |
| 2 | 1 | DZA | Algeria | 1872 | 0 | 0.052274 | 0 | 0 | 0.0 | -3.718593 | 0 | 0 | 0 | no |
| 3 | 1 | DZA | Algeria | 1873 | 0 | 0.05168 | 0 | 0 | 0.0 | 11.203897 | 0 | 0 | 0 | no |
| 4 | 1 | DZA | Algeria | 1874 | 0 | 0.051308 | 0 | 0 | 0.0 | -3.848561 | 0 | 0 | 0 | no |


---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
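
One way to start on the statistics described above, and on the inflation question raised in the Background Context, is a correlation matrix plus the share of crisis years within inflation bands. The sketch below uses hand-made rows, not the real data, and the band edges (10 and 20) are arbitrary assumptions, so its output says nothing about the actual threshold.

```python
import pandas as pd

# Hand-made rows for illustration (assumption); the real analysis would use df.
toy = pd.DataFrame(
    {
        "inflation_annual_cpi": [2.0, 5.0, 8.0, 15.0, 18.0, 25.0, 40.0, 90.0],
        "inflation_crises": [0, 0, 0, 0, 0, 1, 1, 1],
        "systemic_crisis": [0, 0, 1, 0, 0, 1, 0, 1],
    }
)

# Pairwise Pearson correlations between the numeric columns.
correlations = toy.corr()

# Share of crisis years within each inflation band (band edges are arbitrary).
bands = pd.cut(
    toy["inflation_annual_cpi"],
    bins=[-float("inf"), 10, 20, float("inf")],
    labels=["<=10", "10-20", ">20"],
)
crisis_share = toy.groupby(bands, observed=False)["inflation_crises"].mean()
print(correlations.round(2))
print(crisis_share)
```

The same two tables could then be passed to Seaborn or Matplotlib for the heatmaps and bar charts this section calls for.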


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---
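
The notebook has not yet chosen a model, so the following is only a possible starting point: a minimal sketch of the split, scale, and fit workflow using logistic regression as an assumed candidate. The features and target are synthetic stand-ins generated with NumPy, not columns from the crisis dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two features and a binary target (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training data only to avoid leaking test information.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

# Predicted probability of the positive class for each held-out row.
test_probabilities = model.predict_proba(scaler.transform(X_test))[:, 1]
print(test_probabilities[:5])
```

With the real data, `X` would hold the selected feature columns and `y` a crisis indicator such as `systemic_crisis`; which features to include remains an open modeling decision.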


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---
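
The metrics named above are all available in `sklearn.metrics`. The sketch below computes them on made-up labels and predictions, chosen so that the counts are easy to verify by hand (3 true positives, 3 true negatives, 1 false positive, 1 false negative); it does not report results for any real model.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels and predictions (assumption), for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Each score compares the predicted labels against the true labels.
metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
print(metrics)
```

If crisis years turn out to be rare in the real data, accuracy alone can look high for a model that never predicts a crisis, so precision and recall are worth reporting alongside it.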

In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.

## Additional Sections to Consider

* ### Appendix:
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors:
If this is a group project, list the contributors and their roles or contributions to the project.
