# Portfolio Diversity Analysis Tutorial

## What is Portfolio Diversity?

Portfolio diversity measures how different your investments are from each other. Think of it like this:
- **High diversity**: Your stocks behave very differently (when one goes up, another might go down)
- **Low diversity**: Your stocks behave similarly (they all go up or down together)

**Why does this matter?** 
- Diverse portfolios tend to be less risky
- If one stock crashes, others might still do well
- It's like the saying "don't put all your eggs in one basket"

## What is ROA (Return on Assets)?

ROA tells us how efficiently a company uses its assets to generate profit. It's calculated as:

$$\text{ROA} = \frac{\text{Net Income}}{\text{Total Assets}}$$

Higher ROA = Company is better at making money with what it owns.
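
As a quick worked example of the formula above, here is a minimal sketch. The helper name `return_on_assets` and the figures are hypothetical and chosen only for illustration; they do not describe any real company.

```python
def return_on_assets(net_income: float, total_assets: float) -> float:
    """Return ROA = net income / total assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return net_income / total_assets


# Hypothetical company: 12 million in net income on 100 million of assets.
roa = return_on_assets(12_000_000, 100_000_000)
print(f"ROA: {roa:.2%}")  # ROA: 12.00%
```

An ROA of 0.12 means the company earned 12 cents of profit for every dollar of assets.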

## Understanding the Code

### Step 1: Import the Tools


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

**What this does:** Load the Python libraries we need for calculations and making charts.

### Step 2: Input Your Stock Data


In [None]:
roa_data = np.array([
    [0.5324, 0.5742, 0.3855, 0.0817, 0.1720],  # Stock 1: NVDA (5 years of ROA)
    [0.10, 0.11, 0.12, 0.13, 0.14],           # Stock 2: (5 years of ROA)
    # ... more stocks
])

**What this does:** 
- Each row = one stock
- Each column = one year of ROA data
- You can replace these numbers with real ROA data from companies you're interested in
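
To check that the array has the layout described above, one option is to inspect its shape and wrap it in a labelled table. This is only a sketch: it uses the two sample rows from this step, and the labels `"Stock 2"` and `"Year 1"` to `"Year 5"` are placeholders, not real tickers or dates.

```python
import numpy as np
import pandas as pd

roa_data = np.array([
    [0.5324, 0.5742, 0.3855, 0.0817, 0.1720],  # Stock 1: NVDA
    [0.10, 0.11, 0.12, 0.13, 0.14],            # Stock 2
])

# Rows are stocks, columns are years.
n_stocks, n_years = roa_data.shape
print(f"{n_stocks} stocks, {n_years} years")

# Placeholder labels, used only to make the table readable.
labels = ["NVDA", "Stock 2"]
years = [f"Year {i + 1}" for i in range(n_years)]
roa_table = pd.DataFrame(roa_data, index=labels, columns=years)
print(roa_table)
```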

### Step 3: Calculate Average ROA


In [None]:
avg_roa = np.mean(roa_data, axis=1)

**Formula:** 
$$\text{Average ROA} = \frac{\text{ROA}_{\text{Year 1}} + \text{ROA}_{\text{Year 2}} + \cdots + \text{ROA}_{\text{Year } n}}{n}$$

**What this does:** Finds the average ROA for each stock over all years.
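
The `axis=1` argument is what makes this a per-stock average. The sketch below, using only the two sample rows from Step 2, contrasts it with `axis=0`, which would average across stocks for each year instead.

```python
import numpy as np

roa_data = np.array([
    [0.5324, 0.5742, 0.3855, 0.0817, 0.1720],  # Stock 1: NVDA
    [0.10, 0.11, 0.12, 0.13, 0.14],            # Stock 2
])

avg_roa = np.mean(roa_data, axis=1)      # one value per row (per stock)
avg_by_year = np.mean(roa_data, axis=0)  # one value per column (per year)

print("Per stock:", np.round(avg_roa, 4))
print("Per year: ", np.round(avg_by_year, 4))
```

For the sample rows, the per-stock averages come out to roughly 0.3492 and 0.12.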

### Step 4: Calculate Portfolio Diversity

The diversity is calculated using correlation:

$$\text{Diversity} = 1 - \text{Average Correlation}$$

**Where correlation measures how similarly two stocks behave:**
- Correlation = +1: Stocks move exactly together
- Correlation = 0: Stocks move independently  
- Correlation = -1: Stocks move in opposite directions

Correlation measures how two variables move together.
In finance, it tells us if two stocks tend to rise and fall at the same time.

Formula for correlation (Pearson correlation coefficient) of two variables X and Y:

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\text{std}(X) \cdot \text{std}(Y)}$$

Where:
- $\text{Cov}(X, Y)$ is the covariance between X and Y
- $\text{std}(X)$ is the standard deviation of X
- $\text{std}(Y)$ is the standard deviation of Y

Written out for $n$ paired observations $(x_i, y_i)$, where $\bar{x}$ and $\bar{y}$ are the sample means:

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \cdot \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$$

In pandas, you can calculate the correlation between every pair of columns using:  
`df.corr()`

Because `df.corr()` compares columns, each stock must be a column of `df`.

**Interpretation:**  
$r > 0$: Positive correlation → as X increases, Y tends to increase.  
$r < 0$: Negative correlation → as X increases, Y tends to decrease.  
$|r|$ close to 1: Strong linear relationship.  
$|r|$ close to 0: Weak or no linear relationship.  
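
To make the formula concrete, the sketch below computes $r$ term by term and compares it with NumPy's built-in `np.corrcoef`. The helper `pearson_r` is a hypothetical name introduced here for illustration; the inputs are the two sample rows from Step 2.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation computed directly from the summation formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


nvda = [0.5324, 0.5742, 0.3855, 0.0817, 0.1720]
stock2 = [0.10, 0.11, 0.12, 0.13, 0.14]

r_manual = pearson_r(nvda, stock2)
r_numpy = np.corrcoef(nvda, stock2)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```

Both values agree and are strongly negative (about -0.88), because the first series mostly falls over the five years while the second rises steadily.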





In [None]:
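def portfolio_diversity(roa_data_t):
    """Return (diversity, corr_matrix) for ROA data with one COLUMN per stock.

    `roa_data` from Step 2 has one row per stock, so pass `roa_data.T`.
    """
    df = pd.DataFrame(roa_data_t)
    corr_matrix = df.corr()  # Pairwise correlation between columns (stocks)
    # Keep only the entries above the diagonal so each pair is counted once
    upper_tri = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    avg_corr = upper_tri.stack().mean()  # Average pairwise correlation
    diversity = 1 - avg_corr
    return diversity, corr_matrix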
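
One easy mistake is passing the data in the wrong orientation. The sketch below is a cross-check rather than the tutorial's own function: `diversity_from_rows` is a hypothetical helper that works directly on the row-per-stock array using `np.corrcoef`, and the result is compared with the pandas route on the transposed data.

```python
import numpy as np
import pandas as pd


def diversity_from_rows(roa_data):
    """Hypothetical helper: diversity when each ROW of roa_data is a stock."""
    corr = np.corrcoef(roa_data)                # np.corrcoef treats rows as variables
    pairs = np.triu_indices_from(corr, k=1)     # indices above the diagonal
    return 1 - corr[pairs].mean(), corr


roa_data = np.array([
    [0.5324, 0.5742, 0.3855, 0.0817, 0.1720],  # Stock 1: NVDA
    [0.10, 0.11, 0.12, 0.13, 0.14],            # Stock 2
])

diversity, corr = diversity_from_rows(roa_data)

# pandas correlates columns, so the array is transposed first.
corr_pd = pd.DataFrame(roa_data.T).corr().to_numpy()

print(f"Diversity: {diversity:.4f}")
print("Matrices agree:", np.allclose(corr, corr_pd))
```

With only these two sample rows the score is above 1, because their correlation is negative.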



**Diversity Scale:**
- **Above 1.0**: Average correlation is negative (stocks tend to move in opposite directions)
- **0.8 - 1.0**: Very diverse (excellent!)
- **0.6 - 0.8**: Good diversity
- **0.4 - 0.6**: Moderate diversity
- **0.0 - 0.4**: Low diversity (risky!)
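
The scale can be turned into a small lookup. This is a sketch with two assumptions not stated in the scale itself: each boundary value (0.4, 0.6, 0.8) belongs to the higher band, and scores above 1.0 are grouped with "Very diverse". The function name `diversity_label` is hypothetical.

```python
def diversity_label(score: float) -> str:
    """Map a diversity score to the bands of the scale above."""
    # Assumption: boundary values fall into the higher band.
    if score >= 0.8:
        return "Very diverse"
    if score >= 0.6:
        return "Good diversity"
    if score >= 0.4:
        return "Moderate diversity"
    return "Low diversity"


for score in (0.95, 0.7234, 0.5, 0.2):
    print(f"{score:.4f}: {diversity_label(score)}")
```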

## How to Use This Program

### Step 1: Gather ROA Data
1. Choose 5-10 companies you want to analyze
2. Find their ROA data for the past n years, using the same years for every company (use websites like Yahoo Finance or Google Finance)
3. Replace the numbers in `roa_data` with your real data

### Step 2: Run the Analysis
The program will:
1. Calculate average ROA for each stock
2. Show you which stocks perform best on average
3. Calculate how diverse your portfolio is
4. Create visual charts

### Step 3: Interpret Results

**Example Output:**


Average ROA for each stock:
Stock 1: 0.3492  (NVDA - highest average ROA)
Stock 2: 0.1200
Stock 3: 0.2000
...
Portfolio Diversity: 0.7234  (Good diversity!)



**Charts you'll get:**
1. **Bar Chart**: Shows average ROA for each stock
2. **Correlation Heatmap**: Shows how similar each pair of stocks is

## Reading the Correlation Heatmap

The heatmap uses colors to show correlation:
- **Red**: High positive correlation (stocks move together)
- **Blue**: Low or negative correlation (stocks move independently or in opposite directions)
- **White**: Neutral (values in the middle of the color scale)

**Goal:** You want to see more blue squares (low correlation) for better diversity.
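
The plotting code itself is not shown in this tutorial, so the following is only a sketch of how the two charts might be produced. The function name `plot_portfolio`, the labels, and the styling choices are assumptions. In particular, `cmap="coolwarm"` with `vmin=-1`, `vmax=1` and `center=0` is one way to get red for positive values, blue for negative values, and a pale color near zero.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def plot_portfolio(roa_data, labels):
    """Draw a bar chart of average ROA and a correlation heatmap side by side."""
    avg_roa = roa_data.mean(axis=1)
    # Transpose so that each stock becomes a column before calling corr().
    corr = pd.DataFrame(roa_data.T, columns=labels).corr()

    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))
    ax_bar.bar(labels, avg_roa)
    ax_bar.set_title("Average ROA")

    # Fixing the color range to [-1, 1] keeps colors comparable between runs.
    sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, center=0,
                annot=True, fmt=".2f", ax=ax_heat)
    ax_heat.set_title("ROA correlation")
    fig.tight_layout()
    return fig, corr


roa_data = np.array([
    [0.5324, 0.5742, 0.3855, 0.0817, 0.1720],  # Stock 1: NVDA
    [0.10, 0.11, 0.12, 0.13, 0.14],            # Stock 2
])
fig, corr = plot_portfolio(roa_data, ["NVDA", "Stock 2"])
```

In a notebook the figure appears under the cell; in a script, call `plt.show()` afterwards.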



# Python Libraries Installation Tutorial

## What are these libraries used for?

Before installing, let's understand what each library does:

- **NumPy (`numpy`)**: Mathematical operations and array handling
- **Pandas (`pandas`)**: Data analysis and manipulation (like Excel for Python)
- **Seaborn (`seaborn`)**: Beautiful statistical visualizations
- **Matplotlib (`matplotlib`)**: Basic plotting and charts

## Prerequisites

**Step 1: Install Python**
1. Go to [python.org](https://python.org)
2. Download Python 3.8 or newer
3. During installation, **check "Add Python to PATH"** ✅
4. Click "Install Now"

**Step 2: Verify Python Installation**
Open terminal (Mac) or command prompt (Windows) and type:


In [None]:
python --version

You should see something like `Python 3.11.0`. On macOS or Linux, the command may be `python3 --version` instead.

## Installation Methods

### Method 1: Individual Installation 

Open your terminal/command prompt and run these commands **one by one**:



In [None]:
pip install numpy

In [None]:
pip install pandas

In [None]:
pip install seaborn

In [None]:
pip install matplotlib
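
After the installs finish, a quick way to confirm they worked is to import each library from Python and print its version. This is a minimal sketch; the version numbers you see depend on what pip installed.

```python
import importlib

# Import each library by name and record its version string.
versions = {}
for name in ("numpy", "pandas", "seaborn", "matplotlib"):
    module = importlib.import_module(name)
    versions[name] = module.__version__

for name, version in versions.items():
    print(f"{name} {version}")
```

If a library is missing, the import raises `ModuleNotFoundError` naming the package that still needs to be installed.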