# 1. Business Understanding

## 1.2 Problem Statement

Many investors, lenders, and business owners rely on intuition or outdated reports when evaluating a company’s financial position. This lack of real-time, data-driven analysis can lead to poor investment or lending decisions.

Our challenge is to develop a **data-powered tool** that automatically analyzes publicly available financial data (such as income statements, balance sheets, and cash flow statements) to assess a company’s **financial stability, profitability, and risk**.

This project will simplify financial decision-making by transforming raw numbers into actionable insights through **data analysis, visualization, and machine learning**.

---

## 1.3 Business Objectives

### Main Objective

To build a **data analysis and scoring system** that evaluates a company’s financial health using real-world financial data.

### Specific Objectives

1. To collect and preprocess financial data from **Yahoo Finance** (via the `yfinance` library) and the **Alpha Vantage API**.  
2. To analyze key financial metrics such as revenue growth, net income, debt-to-equity ratio, and cash flow trends.  
3. To build a **financial health scoring model** that assigns a score to each company based on performance indicators.  
4. To visualize financial insights using clear dashboards and charts for easier interpretation.  
5. To provide actionable recommendations for investors or business managers.
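
To make objective 3 concrete, a financial health score could start as a weighted average of indicators rescaled to a common 0 to 1 range. The sketch below is one possible approach, not a final design: the function `health_score`, the three indicators, and the weights in `WEIGHTS` are hypothetical placeholders, and the example figures are illustrative rather than real company data.

```python
import pandas as pd

# Hypothetical indicator weights (placeholders, not calibrated values).
# direction +1 means "higher is healthier", -1 means "lower is healthier".
WEIGHTS = {
    "roe": (0.3, +1),
    "current_ratio": (0.3, +1),
    "debt_to_equity": (0.4, -1),
}


def health_score(metrics: pd.DataFrame) -> pd.Series:
    """Score each row (one company) from 0 to 100 using min-max scaled indicators."""
    total = pd.Series(0.0, index=metrics.index)
    for column, (weight, direction) in WEIGHTS.items():
        values = metrics[column].astype(float)
        span = values.max() - values.min()
        if span > 0:
            # 0 = lowest value among the peers, 1 = highest value
            scaled = (values - values.min()) / span
        else:
            # No spread between companies: give every company a neutral contribution
            scaled = pd.Series(0.5, index=metrics.index)
        if direction < 0:
            scaled = 1 - scaled  # flip so that 1 always means "healthier"
        total += weight * scaled
    weight_sum = sum(weight for weight, _ in WEIGHTS.values())
    return (100 * total / weight_sum).round(1)


# Illustrative numbers only (not real company data)
example = pd.DataFrame(
    {"roe": [0.30, 0.10, 0.20], "current_ratio": [1.5, 0.8, 1.0], "debt_to_equity": [0.5, 2.0, 1.0]},
    index=["A", "B", "C"],
)
print(health_score(example))
```

Min-max scaling makes the score relative to the peer group being compared, so a company's score can change when companies are added or removed; scoring against fixed, absolute thresholds is the main alternative.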

---

## 1.4 Research Questions

1. What financial indicators most accurately represent a company’s health and stability?  
2. How do profitability, liquidity, and leverage ratios correlate with a company’s risk level?  
3. Can we build a model that classifies companies into categories such as _Healthy_, _Moderate_, and _At Risk_?  
4. How can visualizing financial trends help investors make better decisions?
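
For research question 3, a rule-based baseline could map a 0-100 health score onto the three categories before any machine-learning classifier is trained. The cut-offs of 40 and 70 and the `classify` helper below are hypothetical and would need calibrating against the data.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-offs on a 0-100 health score; they would need to be calibrated.
BINS = [-np.inf, 40, 70, np.inf]
LABELS = ["At Risk", "Moderate", "Healthy"]


def classify(scores: pd.Series) -> pd.Series:
    """Map numeric health scores to the categories At Risk / Moderate / Healthy."""
    # right=False closes each bin on the left, so 40 -> "Moderate" and 70 -> "Healthy".
    return pd.cut(scores, bins=BINS, labels=LABELS, right=False)


print(classify(pd.Series([85, 55, 20, 70, 40], index=list("ABCDE"))))
```

A trained classifier (for example, one built with `scikit-learn`) could later replace the fixed thresholds, and this baseline would give it something to beat.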

---

## 1.5 Success Criteria

- The system should accurately collect and clean financial data for multiple companies.  
- It should compute and visualize key financial ratios and trends.  
- The scoring model should produce health scores that are consistent with each company’s financial fundamentals.  
- The final output should be clear and explainable to both technical and non-technical users.

---

# 2. Data Understanding

We will use **real financial data** fetched directly from public APIs rather than pre-packaged Kaggle datasets.

---

## Datasets & Sources

| Source | Type of Data | Description |
| --- | --- | --- |
| **Yahoo Finance API (via yfinance)** | Company financials | Income statements, balance sheets, cash flow, and stock history |
| **Alpha Vantage API** | Company and macro data | Financial statements, ratios, and performance indicators |
| **World Bank Open Data (optional)** | Macroeconomic context | GDP, inflation, interest rates (for broader analysis) |
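
Besides the `alpha_vantage` package used later in this notebook, company fundamentals can be requested directly from Alpha Vantage's REST endpoint at `https://www.alphavantage.co/query`, with function names such as `OVERVIEW`, `INCOME_STATEMENT`, `BALANCE_SHEET`, and `CASH_FLOW`. The sketch below uses only the standard library (the `requests` package listed under Tools would work just as well); the helper names `av_url` and `fetch_av` and the `ALPHA_VANTAGE_API_KEY` environment variable are our own assumptions.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

AV_BASE_URL = "https://www.alphavantage.co/query"


def av_url(function: str, symbol: str, api_key: str) -> str:
    """Build an Alpha Vantage query URL, e.g. function="INCOME_STATEMENT"."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    return f"{AV_BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_av(function: str, symbol: str, api_key: Optional[str] = None) -> dict:
    """Call one Alpha Vantage endpoint and return the decoded JSON payload."""
    # Assumed environment variable name; the key should never be hard-coded.
    key = api_key or os.environ["ALPHA_VANTAGE_API_KEY"]
    with urllib.request.urlopen(av_url(function, symbol, key), timeout=30) as response:
        return json.load(response)
```

For example, `fetch_av("INCOME_STATEMENT", "AAPL")` should return a dictionary whose yearly figures sit under an `annualReports` key. The free tier is rate-limited, and limit or error messages come back as ordinary JSON, so it is worth checking the keys before parsing.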

---

## Dataset Overview

Each company dataset will include:

- **Revenue**  
- **Gross profit**  
- **Operating income**  
- **Net income**  
- **Total assets & liabilities**  
- **Cash flow from operations**  
- **Debt-to-equity ratio**  
- **Return on assets (ROA)** and **Return on equity (ROE)**  
- **Stock price performance** over time  

These metrics help us assess profitability, liquidity, leverage, and efficiency: the four main pillars of financial health.
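
The ratios behind these pillars are simple quotients of statement line items. The sketch below computes them from end-of-period figures. The grouping of ratios into pillars, the `Fundamentals` and `health_ratios` names, and the use of total debt (rather than total liabilities) in the debt-to-equity ratio are assumptions; many analysts also use average rather than year-end balances for ROA and ROE.

```python
import math
from dataclasses import dataclass


@dataclass
class Fundamentals:
    """Raw statement figures for one company and one period (same currency units)."""
    revenue: float
    gross_profit: float
    net_income: float
    total_assets: float
    total_liabilities: float
    current_assets: float
    current_liabilities: float
    total_debt: float


def health_ratios(f: Fundamentals) -> dict:
    """Compute one or more ratios per pillar from end-of-period figures."""
    equity = f.total_assets - f.total_liabilities  # book equity
    equity_ok = equity > 0  # D/E and ROE are not meaningful for zero or negative equity
    return {
        # Profitability
        "gross_margin": f.gross_profit / f.revenue,
        "net_margin": f.net_income / f.revenue,
        "roa": f.net_income / f.total_assets,
        "roe": f.net_income / equity if equity_ok else math.nan,
        # Liquidity
        "current_ratio": f.current_assets / f.current_liabilities,
        # Leverage
        "debt_to_equity": f.total_debt / equity if equity_ok else math.nan,
        # Efficiency
        "asset_turnover": f.revenue / f.total_assets,
    }


# Illustrative figures only (not real company data)
r = health_ratios(Fundamentals(1000, 400, 100, 2000, 1200, 600, 400, 500))
print(r)
```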

---

## Tools and Libraries

We’ll use the following tools for the analysis:

| Category | Libraries |
| --- | --- |
| **Data Collection** | `yfinance`, `alpha_vantage`, `requests`, `pandas` |
| **Data Cleaning & Processing** | `numpy`, `pandas` |
| **Visualization** | `matplotlib`, `seaborn`, `plotly` |
| **Modeling & Scoring** | `scikit-learn`, `statsmodels` |
| **Deployment (Optional)** | `joblib` for model serialization |

---

# 3. Data Preparation

In this section, we will import the necessary Python libraries and load financial data from Yahoo Finance, via the open-source `yfinance` library (an unofficial wrapper rather than an official Yahoo API), and from the Alpha Vantage API. This data forms the foundation of our analysis.

The data will include income statements, balance sheets, cash flow statements, and stock price history for each selected company. We will then explore its structure before cleaning and feature engineering.
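
yfinance does not always report the same line items for every company or every year, and some rows come back empty (for example, AAPL's `Interest Expense` in the income statement shown below). A small lookup with fallback labels avoids `KeyError`s when computing ratios later. The helper `line_item` is hypothetical; it assumes the yfinance layout of line items as rows and period end dates as columns.

```python
import numpy as np
import pandas as pd


def line_item(statement: pd.DataFrame, *labels: str) -> pd.Series:
    """Return the first of `labels` found in a yfinance-style statement.

    Rows are line items and columns are period end dates. If none of the labels
    exist, an all-NaN row is returned so downstream ratio code sees "missing"
    instead of raising a KeyError.
    """
    for label in labels:
        if label in statement.index:
            return statement.loc[label]
    return pd.Series(np.nan, index=statement.columns, name=labels[0] if labels else None)
```

For instance, `line_item(financial_data['KO']['balance_sheet'], 'Total Debt')` returns KO's total debt for each reported year. Fallback labels for other rows (equity, for example) should be checked against each statement's `.index` rather than assumed.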


In [1]:
!pip install yfinance



In [2]:
# 3.1 Importing Libraries

# Data manipulation and analysis
import pandas as pd
import numpy as np

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# For fetching financial data
import yfinance as yf

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)

print("Libraries imported successfully!")

Libraries imported successfully!


In [3]:
!pip install alpha_vantage
!pip install fmp-python




In [4]:
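# Alpha Vantage
import os

from alpha_vantage.timeseries import TimeSeries

# Read the API key from an environment variable rather than hard-coding it in the notebook
ALPHA_VANTAGE_API_KEY = os.environ["ALPHA_VANTAGE_API_KEY"]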

ts = TimeSeries(key=ALPHA_VANTAGE_API_KEY, output_format='pandas')

# Example: Fetch historical stock data for Apple
alpha_data, meta_data = ts.get_daily(symbol='AAPL', outputsize='full')
alpha_data.head()


| date | 1. open | 2. high | 3. low | 4. close | 5. volume |
| --- | --- | --- | --- | --- | --- |
| 2025-10-27 | 264.88 | 269.12 | 264.65 | 268.81 | 44900000.0 |
| 2025-10-24 | 261.19 | 264.13 | 259.18 | 262.82 | 38300000.0 |
| 2025-10-23 | 259.94 | 260.62 | 258.01 | 259.58 | 32800000.0 |
| 2025-10-22 | 262.65 | 262.85 | 255.43 | 258.45 | 45000000.0 |
| 2025-10-21 | 261.88 | 265.29 | 261.83 | 262.77 | 46700000.0 |
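
Alpha Vantage returns columns with numeric prefixes (`1. open`, `2. high`, and so on), and `head()` above lists the newest trading day first. Renaming the columns to yfinance's `Open`/`High`/`Low`/`Close`/`Volume` convention and sorting by date makes the two sources easier to line up. The helper `tidy_alpha_daily` below is a hypothetical sketch of that step.

```python
import pandas as pd


def tidy_alpha_daily(df: pd.DataFrame) -> pd.DataFrame:
    """Rename '1. open'-style columns to 'Open' etc. and sort rows oldest-first."""
    # "1. open" -> "open" -> "Open"; names without a numeric prefix are just title-cased
    renamed = df.rename(columns=lambda c: c.split(". ", 1)[-1].title())
    renamed.index = pd.to_datetime(renamed.index)
    return renamed.sort_index()
```

Applied to the frame above, `tidy_alpha_daily(alpha_data)` shares its price column names with `financial_data['AAPL']['history']`. Keep in mind that yfinance's `history()` adjusts prices for splits and dividends by default, while Alpha Vantage's daily series is, as far as we know, unadjusted, so small differences between the two are expected.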


In [5]:
# Define Companies to Analyze

# List of ticker symbols
tickers = ['AAPL', 'MSFT', 'KO']
company_names = {
    'AAPL': 'Apple Inc.',
    'MSFT': 'Microsoft Corporation',
    'KO': 'The Coca-Cola Company'
}

print("Companies selected for analysis:")
for ticker in tickers:
    print(f"  - {ticker}: {company_names[ticker]}")

Companies selected for analysis:
  - AAPL: Apple Inc.
  - MSFT: Microsoft Corporation
  - KO: The Coca-Cola Company


In [6]:
# Fetch Financial Data using yfinance

# Dictionary to store all company data
financial_data = {}

for ticker in tickers:
    print(f"\nFetching data for {ticker}...")
    
    # Create ticker object
    stock = yf.Ticker(ticker)
    
    # Store various financial statements and data
    financial_data[ticker] = {
        'info': stock.info,  # Company information
        'income_statement': stock.financials,  # Income statement (annual)
        'balance_sheet': stock.balance_sheet,  # Balance sheet (annual)
        'cash_flow': stock.cashflow,  # Cash flow statement (annual)
        'quarterly_income': stock.quarterly_financials,  # Quarterly income
        'quarterly_balance': stock.quarterly_balance_sheet,  # Quarterly balance
        'quarterly_cashflow': stock.quarterly_cashflow,  # Quarterly cash flow
        'history': stock.history(period='5y'),  # 5 years of stock price data
    }
    
    print(f" Data fetched successfully for {ticker}")


print("Data collection complete!")



Fetching data for AAPL...
 Data fetched successfully for AAPL

Fetching data for MSFT...
 Data fetched successfully for MSFT

Fetching data for KO...
 Data fetched successfully for KO
Data collection complete!
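
The nested `financial_data` dictionary is convenient for fetching but awkward for comparing companies. One option, sketched below with a hypothetical `to_long` helper, is to stack one statement type for all tickers into a single long table with one row per ticker, line item, and period, dropping missing values along the way.

```python
import pandas as pd


def to_long(financial_data: dict, statement: str) -> pd.DataFrame:
    """Stack one statement type for every ticker into columns ticker/line_item/period/value."""
    frames = []
    for ticker, data in financial_data.items():
        long = (
            data[statement]
            .rename_axis("line_item")  # name the row index (line items)
            .reset_index()
            .melt(id_vars="line_item", var_name="period", value_name="value")
            .dropna(subset=["value"])  # skip line items not reported for a period
            .assign(ticker=ticker)
        )
        frames.append(long)
    return pd.concat(frames, ignore_index=True)[["ticker", "line_item", "period", "value"]]
```

For example, `to_long(financial_data, 'income_statement')` yields a frame that can be filtered by `line_item` or reshaped with `pivot_table` for cross-company charts.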


In [7]:
# Explore the Data Structure

# Examine what we have for Apple as an example
ticker = 'AAPL'
print(f"\nData available for {ticker}:\n")

print("1. Company Info Keys (sample):")
info_keys = list(financial_data[ticker]['info'].keys())[:10]
print(f"   {info_keys}")

print(f"\n2. Income Statement Shape: {financial_data[ticker]['income_statement'].shape}")
print(f"   Metrics: {financial_data[ticker]['income_statement'].index.tolist()[:5]}...")

print(f"\n3. Balance Sheet Shape: {financial_data[ticker]['balance_sheet'].shape}")
print(f"   Metrics: {financial_data[ticker]['balance_sheet'].index.tolist()[:5]}...")

print(f"\n4. Cash Flow Shape: {financial_data[ticker]['cash_flow'].shape}")
print(f"   Metrics: {financial_data[ticker]['cash_flow'].index.tolist()[:5]}...")

print(f"\n5. Stock History Shape: {financial_data[ticker]['history'].shape}")
print(f"   Columns: {financial_data[ticker]['history'].columns.tolist()}")


Data available for AAPL:

1. Company Info Keys (sample):
   ['address1', 'city', 'state', 'zip', 'country', 'phone', 'website', 'industry', 'industryKey', 'industryDisp']

2. Income Statement Shape: (39, 4)
   Metrics: ['Tax Effect Of Unusual Items', 'Tax Rate For Calcs', 'Normalized EBITDA', 'Net Income From Continuing Operation Net Minority Interest', 'Reconciled Depreciation']...

3. Balance Sheet Shape: (68, 4)
   Metrics: ['Treasury Shares Number', 'Ordinary Shares Number', 'Share Issued', 'Net Debt', 'Total Debt']...

4. Cash Flow Shape: (53, 4)
   Metrics: ['Free Cash Flow', 'Repurchase Of Capital Stock', 'Repayment Of Debt', 'Issuance Of Debt', 'Issuance Of Capital Stock']...

5. Stock History Shape: (1256, 7)
   Columns: ['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits']
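
Statement shapes differ between companies (AAPL's income statement has 39 rows here, while the quality check below reports 47 for MSFT and 59 for KO), so cross-company comparisons should stick to line items that every company reports. A hypothetical `common_line_items` helper can find them:

```python
import pandas as pd


def common_line_items(financial_data: dict[str, dict[str, pd.DataFrame]], statement: str) -> list[str]:
    """Sorted line-item labels that appear in `statement` for every ticker."""
    label_sets = [set(data[statement].index) for data in financial_data.values()]
    if not label_sets:
        return []
    return sorted(set.intersection(*label_sets))
```

Calling `common_line_items(financial_data, 'balance_sheet')` would list the balance-sheet rows available for all three tickers.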


In [8]:
# Display Sample Financial Statements

# Display income statement for all three companies
print("="*80)
print("INCOME STATEMENTS (Most Recent Year)")
print("="*80)

for ticker in tickers:
    print(f"\n{company_names[ticker]} ({ticker}):")
    print("-" * 60)
    # Get the most recent year (first column)
    income_stmt = financial_data[ticker]['income_statement'].iloc[:, 0]
    print(income_stmt.head(10))  # Show first 10 rows

INCOME STATEMENTS (Most Recent Year)

Apple Inc. (AAPL):
------------------------------------------------------------
Tax Effect Of Unusual Items                                   0.00e+00
Tax Rate For Calcs                                            2.41e-01
Normalized EBITDA                                             1.35e+11
Net Income From Continuing Operation Net Minority Interest    9.37e+10
Reconciled Depreciation                                       1.14e+10
Reconciled Cost Of Revenue                                    2.10e+11
EBITDA                                                        1.35e+11
EBIT                                                          1.23e+11
Net Interest Income                                                NaN
Interest Expense                                                   NaN
Name: 2024-09-30 00:00:00, dtype: float64

Microsoft Corporation (MSFT):
------------------------------------------------------------
Tax Effect Of Unusual Items           
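
Objective 2 calls for revenue growth. Since yfinance orders statement columns from newest to oldest, a year-over-year change needs the periods in chronological order first. The `yoy_growth` helper below is a hypothetical sketch; applying it to revenue assumes the income-statement row is labelled `Total Revenue`.

```python
import pandas as pd


def yoy_growth(row: pd.Series) -> pd.Series:
    """Year-over-year % change of one statement row indexed by period end date."""
    chronological = row.sort_index()  # yfinance lists the newest period first
    return (chronological / chronological.shift(1) - 1) * 100


# Usage in this notebook (assumes the row label 'Total Revenue'):
# yoy_growth(financial_data['AAPL']['income_statement'].loc['Total Revenue'])
```

Growth for a year is `NaN` when either that year or the previous one is missing, which matters for the sparsely populated periods revealed by the missing-value counts in section 3.2.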

In [9]:
# Create a Summary DataFrame for Key Metrics

summary_data = []

for ticker in tickers:
    info = financial_data[ticker]['info']
    
    # Extract key metrics
    metrics = {
        'Ticker': ticker,
        'Company': company_names[ticker],
        'Sector': info.get('sector', 'N/A'),
        'Industry': info.get('industry', 'N/A'),
        'Market Cap': info.get('marketCap', None),
        'Revenue': info.get('totalRevenue', None),
        'Net Income': info.get('netIncomeToCommon', None),
        'Total Debt': info.get('totalDebt', None),
        'Total Cash': info.get('totalCash', None),
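        # info usually lacks 'totalAssets' for operating companies (hence None below);
        # the balance sheet's 'Total Assets' row is a more reliable source
        'Total Assets': info.get('totalAssets', None),
        'Current Ratio': info.get('currentRatio', None),
        # debtToEquity is reported as a percentage (154.49 ≈ 1.54x), whereas
        # returnOnEquity, returnOnAssets and profitMargins are decimal fractions
        'Debt to Equity': info.get('debtToEquity', None),
        'ROE': info.get('returnOnEquity', None),
        'ROA': info.get('returnOnAssets', None),
        'Profit Margin': info.get('profitMargins', None),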
    }
    
    summary_data.append(metrics)

# Create DataFrame
summary_df = pd.DataFrame(summary_data)
print("\nCompany Summary:")
print("="*80)
print(summary_df.T)  # Transpose for better readability


Company Summary:
                                   0                          1  \
Ticker                          AAPL                       MSFT   
Company                   Apple Inc.      Microsoft Corporation   
Sector                    Technology                 Technology   
Industry        Consumer Electronics  Software - Infrastructure   
Market Cap             3991248568320              4043822858240   
Revenue                 408624988160               281723994112   
Net Income               99280003072               101831999488   
Total Debt              101698002944               112184000512   
Total Cash               55372001280                94564999168   
Total Assets                    None                       None   
Current Ratio                   0.87                       1.35   
Debt to Equity                154.49                      32.66   
ROE                              1.5                       0.33   
ROA                             0.25        
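
Two details in this summary are worth handling explicitly. `Total Assets` is `None` for every company because yfinance's `info` dictionary does not appear to populate `totalAssets` for operating companies, so the balance sheet is a better source. `Debt to Equity` is on a percentage scale (AAPL's 154.49 corresponds to roughly 1.54 times equity), unlike ROE and ROA. The hypothetical `latest_value` helper below reads the most recent non-missing value of a statement row; the commented lines show how it might be applied, assuming the row is labelled `Total Assets`.

```python
import pandas as pd


def latest_value(statement: pd.DataFrame, label: str) -> float:
    """Most recent non-missing value of one line item in a yfinance-style statement."""
    if label not in statement.index:
        return float("nan")
    row = statement.loc[label].dropna().sort_index(ascending=False)  # newest period first
    return float(row.iloc[0]) if not row.empty else float("nan")


# Usage in this notebook (assumes the balance-sheet row label 'Total Assets'):
# summary_df['Total Assets'] = [latest_value(financial_data[t]['balance_sheet'], 'Total Assets') for t in tickers]
# summary_df['Debt to Equity (x)'] = pd.to_numeric(summary_df['Debt to Equity'], errors='coerce') / 100
```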

## 3.2 Data Cleaning

In [10]:
# Inspect Data Quality

for ticker in tickers:
    print(f"\n{company_names[ticker]} ({ticker}):")
    print("-" * 60)
    
    # Check income statement
    income = financial_data[ticker]['income_statement']
    print(f"Income Statement: {income.shape[0]} rows, {income.shape[1]} years")
    print(f"  Missing values: {income.isna().sum().sum()}")
    print(f"  Date range: {income.columns[-1]} to {income.columns[0]}")
    
    # Check balance sheet
    balance = financial_data[ticker]['balance_sheet']
    print(f"Balance Sheet: {balance.shape[0]} rows, {balance.shape[1]} years")
    print(f"  Missing values: {balance.isna().sum().sum()}")
    
    # Check cash flow
    cashflow = financial_data[ticker]['cash_flow']
    print(f"Cash Flow: {cashflow.shape[0]} rows, {cashflow.shape[1]} years")
    print(f"  Missing values: {cashflow.isna().sum().sum()}")
    
    # Check stock history
    history = financial_data[ticker]['history']
    print(f"Stock History: {len(history)} days")
    print(f"  Missing values: {history.isna().sum().sum()}")


Apple Inc. (AAPL):
------------------------------------------------------------
Income Statement: 39 rows, 4 years
  Missing values: 6
  Date range: 2021-09-30 00:00:00 to 2024-09-30 00:00:00
Balance Sheet: 68 rows, 4 years
  Missing values: 13
Cash Flow: 53 rows, 4 years
  Missing values: 17
Stock History: 1256 days
  Missing values: 0

Microsoft Corporation (MSFT):
------------------------------------------------------------
Income Statement: 47 rows, 4 years
  Missing values: 0
  Date range: 2022-06-30 00:00:00 to 2025-06-30 00:00:00
Balance Sheet: 78 rows, 5 years
  Missing values: 82
Cash Flow: 59 rows, 5 years
  Missing values: 60
Stock History: 1256 days
  Missing values: 0

The Coca-Cola Company (KO):
------------------------------------------------------------
Income Statement: 59 rows, 5 years
  Missing values: 76
  Date range: 2020-12-31 00:00:00 to 2024-12-31 00:00:00
Balance Sheet: 77 rows, 5 years
  Missing values: 79
Cash Flow: 58 rows, 5 years
  Missing values: 61
Stoc