# **Financial Risk Classification of S&P 500 Companies Using Machine Learning**

## **Objective**
This project applies supervised machine learning techniques to classify S&P 500 companies based on financial health indicators such as profit margins, debt levels, and return on equity. We aim to build a predictive model that categorizes each company as low, medium, or high financial risk, which can assist in investment decision-making and financial forecasting.

## **Methodology**
Our approach involves several key steps:
1. Data preprocessing and exploration of variables
2. Feature engineering to create meaningful predictors
3. 
4. 
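
To make the feature engineering and labeling steps more concrete, here is a minimal sketch of one possible way to turn financial ratios into low/medium/high risk labels. Everything in it is an assumption for illustration: the column names (`profit_margin`, `debt_to_equity`, `return_on_equity`), the toy values, the equal weighting of the three ratios, and the use of tertiles via `pd.qcut`. The notebook's actual labeling rule may differ.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only
# and are not taken from the actual dataset.
toy = pd.DataFrame({
    "profit_margin": [0.25, 0.10, 0.02, 0.18, -0.05, 0.30],
    "debt_to_equity": [0.4, 1.2, 2.5, 0.8, 3.0, 0.3],
    "return_on_equity": [0.22, 0.12, 0.03, 0.15, -0.10, 0.28],
})


def zscore(series):
    """Standardize a column to mean 0 and (sample) standard deviation 1."""
    return (series - series.mean()) / series.std()


# Assumed scoring rule: more debt raises risk, while higher margins
# and higher return on equity lower it. All three are weighted equally.
risk_score = (
    zscore(toy["debt_to_equity"])
    - zscore(toy["profit_margin"])
    - zscore(toy["return_on_equity"])
)

# Split the scores into three equally sized groups (tertiles).
toy["risk_label"] = pd.qcut(risk_score, q=3, labels=["low", "medium", "high"])
print(toy)
```

Quantile-based bins give balanced classes by construction, which is convenient for a classifier but means the labels are relative to the sample rather than tied to absolute thresholds.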

## **Data Overview**
The dataset includes numerous financial metrics that professionals and investors commonly use to value companies. It covers the companies that make up the S&P 500 (Standard & Poor's 500), a capitalization-weighted index of the top 500 publicly traded companies in the United States (that is, the 500 companies with the largest market capitalization). The S&P 500 is a useful index to study because it generally reflects the health of the overall U.S. stock market. The dataset was last updated in July 2020.
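
As a small illustration of what "capitalization-weighted" means, the sketch below computes index weights and a weighted return for three made-up companies. The tickers, market caps, and returns are hypothetical, and the sketch ignores the float adjustment that the real index applies.

```python
import pandas as pd

# Hypothetical market caps in billions of USD (illustrative values only).
market_caps = pd.Series({"AAA": 600.0, "BBB": 300.0, "CCC": 100.0})

# Each company's weight is its share of the total market cap.
weights = market_caps / market_caps.sum()

# Hypothetical one-day returns for the same companies.
returns = pd.Series({"AAA": 0.01, "BBB": -0.02, "CCC": 0.03})

# The index return is the weight-averaged return of its members.
index_return = (weights * returns).sum()

print(weights)
print(f"Index return: {index_return:.4f}")
```

Because weights scale with market cap, the largest company here drives most of the index movement even though the smallest one has the highest return.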

### **Data Source**
[S&P 500 Companies with Financial Information](https://www.kaggle.com/datasets/paytonfisher/sp-500-companies-with-financial-information?resource=download)

### 1. Install Dependencies
In this section, we install all required dependencies listed in `requirements.txt`. These packages are needed for data processing, visualization, and the machine learning algorithms used in our financial risk classification model.

In [None]:
# Install all required dependencies listed in requirements.txt
# %pip install -r requirements.txt
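
After installation, it can be useful to confirm which versions are present in the environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The package list is an assumption inferred from the imports used in this notebook; the actual contents of `requirements.txt` may differ.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list, inferred from the imports used in this notebook;
# the actual contents of requirements.txt may differ.
ASSUMED_PACKAGES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scipy", "scikit-learn", "torch",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(ASSUMED_PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```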

### 2. Setup and Dependencies
Here, we import all necessary Python libraries for:
- Data manipulation (pandas, numpy)
- Visualization (matplotlib, seaborn)
- Statistical analysis (scipy)
- Machine learning models (scikit-learn, PyTorch)

In [None]:
# Standard Library
import math
import os
import warnings

# Data Manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistics & Diagnostics
import scipy.stats as stats

# Machine Learning: Preprocessing, Metrics, Utilities
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler, PolynomialFeatures
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_absolute_error, mean_squared_error, r2_score,
    classification_report, confusion_matrix, ConfusionMatrixDisplay,
    roc_curve, auc, precision_recall_curve
)
from sklearn.utils import shuffle

# Machine Learning: Models
# Linear Models
from sklearn.linear_model import LinearRegression, LogisticRegression
# Tree-based Models
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier,
    BaggingClassifier, AdaBoostClassifier
)
# Other Models
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Deep Learning with PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Configuration
warnings.filterwarnings("ignore")
pd.set_option("display.float_format", "{:.2f}".format)
sns.set_style("whitegrid")

### 3. Load & Inspect Data
This section loads the financial dataset from a CSV file. We then inspect its structure by viewing the first few rows, listing the column data types, and counting missing values. This step is crucial for understanding the structure and quality of the data before proceeding with analysis.

In [None]:
# Load the dataset
data_file = os.path.join("..", "dataset", "financials.csv")
financial_data = pd.read_csv(data_file)

# Display first few rows of the dataset
# display(financial_data.head())
print(financial_data.head()) 

# Display the data types of the columns
print("\nData types:")
print(financial_data.dtypes)

# Check missing values
print("\nMissing values:")
print(financial_data.isnull().sum())
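
Raw missing-value counts are easier to interpret as a share of all rows. The following is a minimal sketch of a helper that reports both, shown on a toy DataFrame. The helper name `summarize_missing` and the toy column names and values are hypothetical and only serve as illustration; the same function could be applied to `financial_data`.

```python
import numpy as np
import pandas as pd


def summarize_missing(df):
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": 100 * counts / len(df),
    })
    # Keep only columns that actually have missing values.
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


# Toy data with illustrative column names and values.
toy = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "Price/Earnings": [15.2, np.nan, 22.1, np.nan],
    "Price/Book": [3.1, 4.0, np.nan, 2.2],
})

missing_summary = summarize_missing(toy)
print(missing_summary)
```

Columns with a high missing percentage are candidates for removal, while columns with only a few gaps may be better handled by imputation.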