# NumPy Basics - Arrays and Vectorized Computation

Welcome to Chapter 4! This notebook will guide you through the fundamental concepts of NumPy and vectorized programming. We'll explore how NumPy enables efficient array-based operations that form the foundation of data analysis in Python.

## Learning Objectives

By the end of this chapter, you will master:
- **Array-based operations** for data munging, cleaning, subsetting, filtering, and transforming
- **Efficient descriptive statistics** and data aggregation techniques
- **Common algorithms** like sorting, unique operations, and set operations
- **Data alignment** and relational data manipulation for merging heterogeneous datasets
- **Conditional logic** expressed as array expressions
- **Group-wise data manipulation** techniques
- **Vectorized programming** concepts and best practices

Let's dive into the world of efficient numerical computing!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import randn
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print(f"NumPy version: {np.__version__}")
print("Setup complete! Ready to explore vectorized computation.")

## 1. Array-Based Operations for Data Munging and Cleaning

NumPy arrays enable vectorized operations that are both faster and more concise than traditional loops. Let's explore the fundamental operations for data manipulation.
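
As a warm-up, here is a minimal sketch of cleaning, filtering, and transforming without explicit loops. The `readings` array and the range limits are made up for illustration; they are not part of the chapter's data.

```python
import numpy as np

# Hypothetical readings with a missing value (NaN) and an obvious outlier
readings = np.array([1.5, 2.3, np.nan, 4.1, -0.8, 250.0, 2.9])

# Cleaning: replace NaN with the median of the valid entries
valid = ~np.isnan(readings)
cleaned = np.where(valid, readings, np.median(readings[valid]))

# Filtering: keep only values inside an assumed plausible range
in_range = cleaned[(cleaned > -10) & (cleaned < 10)]

# Transforming: standardize the remaining values in one expression
z_scores = (in_range - in_range.mean()) / in_range.std()

print("Cleaned: ", cleaned)
print("In range:", in_range)
print("Z-scores:", z_scores.round(2))
```

Each step operates on the whole array at once, which is the pattern to aim for in the practice cell.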

In [None]:
# Practice array-based operations here
# Create sample data for manipulation
sample_data = np.array([1.5, 2.3, -1.2, 4.1, -0.8, 3.7, 2.9, -2.1, 1.8, 0.5])
print("Sample data:", sample_data)

# Your array-based operations practice goes here...

## 2. Efficient Descriptive Statistics and Data Aggregation

NumPy provides highly optimized functions for computing statistics and aggregating data across different axes.
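
The `axis` argument is usually the part that needs practice. The sketch below uses a small, made-up `scores` table (students in rows, exams in columns) to show how `axis=0` and `axis=1` change what gets aggregated.

```python
import numpy as np

# Hypothetical scores: 3 students (rows) x 4 exams (columns)
scores = np.array([[70, 80, 90, 100],
                   [60, 65, 70, 75],
                   [88, 92, 96, 100]])

overall_mean = scores.mean()           # one number over all entries
exam_means = scores.mean(axis=0)       # collapse rows: one mean per exam
student_means = scores.mean(axis=1)    # collapse columns: one mean per student
best_exam = scores.argmax(axis=1)      # column index of each student's top score
running_total = scores.cumsum(axis=1)  # cumulative sum along each row

print("Per-exam means:   ", exam_means)
print("Per-student means:", student_means)
```

A useful rule of thumb: the axis you pass is the one that disappears from the result's shape.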

In [None]:
# Practice descriptive statistics here
# Create multi-dimensional sample data
stats_data = np.random.randn(5, 4)
print("Sample 2D data:")
print(stats_data)

# Your statistics and aggregation practice goes here...

## 3. Common Algorithms: Sorting, Unique, and Set Operations

Explore NumPy's efficient implementations of common algorithms used in data analysis.
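
The following sketch touches the main functions in this family. The arrays `ids`, `left`, and `right` are illustrative stand-ins, not the practice data.

```python
import numpy as np

ids = np.array([5, 2, 8, 2, 9, 1, 5])

sorted_ids = np.sort(ids)              # sorted copy; ids itself is unchanged
order = np.argsort(ids)                # indices that would sort ids
uniques, counts = np.unique(ids, return_counts=True)

left = np.array([1, 2, 3, 4])
right = np.array([3, 4, 5, 6])
common = np.intersect1d(left, right)   # values present in both
combined = np.union1d(left, right)     # values present in either
only_left = np.setdiff1d(left, right)  # values in left but not in right
is_member = np.isin(left, right)       # elementwise membership mask

print("Unique values and counts:", dict(zip(uniques.tolist(), counts.tolist())))
print("Intersection:", common)
```

Note that the set functions return sorted, de-duplicated results, while `np.isin` keeps the shape of its first argument.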

In [None]:
# Practice sorting, unique, and set operations here
# Sample data for algorithm practice
algo_data = np.array([5, 2, 8, 2, 9, 1, 5, 8, 3, 7, 2, 9])
set_a = np.array([1, 2, 3, 4, 5])
set_b = np.array([4, 5, 6, 7, 8])

print("Algorithm practice data:", algo_data)
print("Set A:", set_a)
print("Set B:", set_b)

# Your algorithm practice goes here...

## 4. Data Alignment and Relational Data Manipulation

Learn how to merge and join heterogeneous datasets using NumPy's broadcasting and indexing capabilities.
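
NumPy has no dedicated join function, so one common approach is to treat fancy indexing as a key lookup. The sketch below assumes integer product ids that double as positions in a price table; all names are hypothetical.

```python
import numpy as np

# Hypothetical "fact" records: each sale references a product id
sale_product_ids = np.array([2, 0, 1, 2])
sale_quantities = np.array([3, 1, 4, 2])

# Hypothetical "dimension" table: price per product, indexed by product id
product_prices = np.array([9.5, 20.0, 4.0])

# Fancy indexing acts like a key lookup (a simple join on product id)
sale_prices = product_prices[sale_product_ids]
sale_totals = sale_quantities * sale_prices

# Broadcasting: a (3, 1) column against a (4,) row produces a (3, 4) grid
region_multipliers = np.array([1.0, 1.1, 0.9])[:, np.newaxis]
regional_totals = region_multipliers * sale_totals

print("Sale totals:", sale_totals)
print("Regional totals shape:", regional_totals.shape)
```

For keys that are not contiguous integers, a typical extra step is to map them to positions first, for example with `np.searchsorted` on a sorted key array.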

In [None]:
# Practice data alignment and manipulation here
# Sample datasets for alignment practice
dataset_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dataset_b = np.array([10, 20, 30])
indices = np.array([0, 2, 1])

print("Dataset A:")
print(dataset_a)
print("Dataset B:", dataset_b)
print("Indices:", indices)

# Your data alignment practice goes here...

## 5. Expressing Conditional Logic as Array Expressions

Master the art of vectorized conditional operations using `np.where`, boolean indexing, and logical operations.
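
A short sketch of the three tools named above, using a made-up `temps` array and arbitrary cut-offs:

```python
import numpy as np

temps = np.array([[-1.2, 2.3, -0.8],
                  [1.5, -2.1, 3.7]])

is_positive = temps > 0                       # boolean mask, same shape as temps
positives = temps[is_positive]                # 1-D array of the selected values
floored = np.where(is_positive, temps, 0.0)   # keep positives, zero out the rest

# Nested np.where expresses an if / elif / else chain
labels = np.where(temps > 2, "high",
                  np.where(temps < -1, "low", "mid"))

in_band = (temps > -1) & (temps < 3)          # combine masks with & and |
num_in_band = in_band.sum()                   # each True counts as 1

print(labels)
print("Values in (-1, 3):", num_in_band)
```

The parentheses around each comparison matter: `&` and `|` bind more tightly than `>` and `<`.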

In [None]:
# Practice conditional logic here
# Sample data for conditional operations
condition_data = np.array([[-1.2, 2.3, -0.8], [1.5, -2.1, 3.7], [0.5, -1.8, 2.9]])
threshold = 0.0

print("Condition data:")
print(condition_data)
print(f"Threshold: {threshold}")

# Your conditional logic practice goes here...

## 6. Group-wise Data Manipulation

Explore techniques for performing operations on grouped data using advanced indexing and aggregation methods.
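
One possible pattern, sketched here on illustrative data, is to encode group labels as integers with `np.unique(..., return_inverse=True)` and then aggregate with `np.bincount`. It is not the only way to do group-wise work in NumPy, but it avoids looping over groups.

```python
import numpy as np

amounts = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
labels = np.array(["A", "B", "A", "C", "B"])

# Encode each label as an integer group code (A -> 0, B -> 1, C -> 2)
group_names, codes = np.unique(labels, return_inverse=True)

group_sums = np.bincount(codes, weights=amounts)  # sum per group
group_counts = np.bincount(codes)                 # number of members per group
group_means = group_sums / group_counts

# Index the per-group means by code to align them with the original rows
demeaned = amounts - group_means[codes]

for name, mean in zip(group_names, group_means):
    print(f"Group {name}: mean = {mean}")
```

The last step, `group_means[codes]`, is the same lookup-by-index idea used for data alignment, applied to group statistics.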

In [None]:
# Practice group-wise operations here
# Sample grouped data
values = np.array([10, 15, 20, 25, 30, 35, 40, 45])
groups = np.array(['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B'])

print("Values:", values)
print("Groups:", groups)

# Your group-wise manipulation practice goes here...

---

# 🎯 ULTIMATE CHALLENGE: Portfolio Risk Analysis with Monte Carlo Simulation

Now that you've mastered the fundamentals of NumPy, it's time for your final challenge! This exercise combines all the concepts you've learned in a real-world financial application.

## The Challenge: Multi-Asset Portfolio Risk Assessment

You are a quantitative analyst tasked with analyzing the risk and return characteristics of a diversified investment portfolio using Monte Carlo simulation. This challenge will test your ability to work with vectorized operations, statistical analysis, and complex data manipulations.

In [None]:
# PREBUILT DATA FOR THE CHALLENGE
# Do not modify this cell - use these arrays in your solution

# Portfolio of 8 assets with their characteristics
asset_names = np.array(['US_Stocks', 'EU_Stocks', 'Asia_Stocks', 'Bonds', 
                       'Real_Estate', 'Commodities', 'Crypto', 'Cash'])

# Historical annual returns (mean) for each asset
expected_returns = np.array([0.10, 0.08, 0.12, 0.04, 0.07, 0.06, 0.15, 0.02])

# Historical volatility (standard deviation) for each asset
volatilities = np.array([0.16, 0.18, 0.22, 0.05, 0.14, 0.24, 0.45, 0.01])

# Correlation matrix (8x8) between assets
correlation_matrix = np.array([
    [1.00, 0.75, 0.65, -0.20, 0.60, 0.30, 0.15, -0.05],
    [0.75, 1.00, 0.70, -0.15, 0.55, 0.25, 0.10, -0.03],
    [0.65, 0.70, 1.00, -0.25, 0.50, 0.35, 0.20, -0.08],
    [-0.20, -0.15, -0.25, 1.00, 0.10, -0.10, -0.05, 0.90],
    [0.60, 0.55, 0.50, 0.10, 1.00, 0.40, 0.25, 0.05],
    [0.30, 0.25, 0.35, -0.10, 0.40, 1.00, 0.30, -0.15],
    [0.15, 0.10, 0.20, -0.05, 0.25, 0.30, 1.00, -0.20],
    [-0.05, -0.03, -0.08, 0.90, 0.05, -0.15, -0.20, 1.00]
])

# Current portfolio weights (must sum to 1.0)
portfolio_weights = np.array([0.25, 0.20, 0.15, 0.20, 0.10, 0.05, 0.03, 0.02])

# Initial portfolio value
initial_portfolio_value = 1000000  # $1 million

# Simulation parameters
num_simulations = 10000
time_horizon = 252  # Trading days in a year
num_scenarios = 5   # Different market scenarios to test

print("📊 PORTFOLIO CHALLENGE DATA LOADED")
print(f"Assets: {len(asset_names)}")
print(f"Portfolio Value: ${initial_portfolio_value:,}")
print(f"Simulations: {num_simulations:,}")
print(f"Time Horizon: {time_horizon} days")
print("✅ Ready for your challenge!")

## 🎯 Your Mission: Complete These 10 Tasks

### **Task 1: Covariance Matrix Construction**
Using the correlation matrix and volatilities, construct the full covariance matrix for all assets. Remember: `Cov(i,j) = Corr(i,j) * Vol(i) * Vol(j)`
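
To see the formula in action before applying it to the challenge data, here is a sketch on three hypothetical assets. The names `toy_vols` and `toy_corr` are illustrative only.

```python
import numpy as np

# Toy inputs for three hypothetical assets (not the challenge data)
toy_vols = np.array([0.10, 0.20, 0.30])
toy_corr = np.array([[1.0, 0.5, 0.0],
                     [0.5, 1.0, -0.2],
                     [0.0, -0.2, 1.0]])

# The outer product gives Vol(i) * Vol(j) for every pair; multiply elementwise
toy_cov = toy_corr * np.outer(toy_vols, toy_vols)

# Sanity checks: the diagonal holds variances, and the matrix is symmetric
assert np.allclose(np.diag(toy_cov), toy_vols ** 2)
assert np.allclose(toy_cov, toy_cov.T)

print(toy_cov)
```

The same result can be written with broadcasting as `toy_corr * toy_vols[:, np.newaxis] * toy_vols[np.newaxis, :]`.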

### **Task 2: Portfolio Risk Metrics**
Calculate the portfolio's expected annual return and risk (standard deviation) using matrix operations and the portfolio weights.
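
In matrix form, the expected return is the dot product `w · μ` and the variance is `wᵀ Σ w`. The sketch below checks both on a two-asset toy portfolio with made-up numbers.

```python
import numpy as np

# Toy two-asset portfolio (hypothetical values, not the challenge data)
toy_weights = np.array([0.6, 0.4])
toy_returns = np.array([0.08, 0.03])
toy_cov = np.array([[0.04, 0.002],
                    [0.002, 0.0025]])

portfolio_return = toy_weights @ toy_returns              # w . mu
portfolio_variance = toy_weights @ toy_cov @ toy_weights  # w^T Sigma w
portfolio_vol = np.sqrt(portfolio_variance)

print(f"Expected return: {portfolio_return:.4f}")
print(f"Volatility:      {portfolio_vol:.4f}")
```

Because the two assets are only weakly correlated, the portfolio volatility comes out below the weighted average of the individual volatilities, which is the diversification effect the challenge explores.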

### **Task 3: Monte Carlo Setup**
Create a function that generates correlated random returns for all assets using the covariance matrix. Use `np.random.multivariate_normal()` for correlated sampling.
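
One possible shape for such a function is sketched below on toy inputs. Two assumptions are worth flagging: it uses the newer `Generator.multivariate_normal` method (from `np.random.default_rng`) rather than the legacy `np.random.multivariate_normal`, which takes the same `mean`, `cov`, and `size` arguments; and it treats daily returns as independent, so annual means and covariances are simply divided by the number of trading days. The function name and parameters are hypothetical.

```python
import numpy as np

def sample_daily_returns(annual_mean, annual_cov, num_days, rng, days_per_year=252):
    """Draw correlated daily returns with shape (num_days, num_assets)."""
    daily_mean = annual_mean / days_per_year
    daily_cov = annual_cov / days_per_year  # variance scales linearly with time
    return rng.multivariate_normal(daily_mean, daily_cov, size=num_days)

# Toy two-asset inputs (hypothetical values, not the challenge data)
rng = np.random.default_rng(0)
toy_mean = np.array([0.08, 0.03])
toy_cov = np.array([[0.04, 0.002],
                    [0.002, 0.0025]])

daily = sample_daily_returns(toy_mean, toy_cov, num_days=252, rng=rng)
print("Sample shape:", daily.shape)
print("Sample correlation:")
print(np.corrcoef(daily, rowvar=False).round(2))
```

Passing the generator in as an argument keeps the function reproducible and easy to test with a fixed seed.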

### **Task 4: Single Simulation Path**
Generate one complete simulation path showing daily portfolio values over the time horizon. Plot this path to visualize the portfolio's potential trajectory.

### **Task 5: Full Monte Carlo Simulation**
Run the complete Monte Carlo simulation with all 10,000 paths. Store the final portfolio values for analysis.

### **Task 6: Risk Analysis**
From your simulation results, calculate:
- Value at Risk (VaR) at 95% and 99% confidence levels
- Expected Shortfall (Conditional VaR)
- Maximum Drawdown statistics
- Probability of loss greater than 20%
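
Definitions of these metrics vary slightly between textbooks, so treat the following as one reasonable convention rather than the required answer. It works on hypothetical single-asset paths and measures losses relative to the starting value.

```python
import numpy as np

rng = np.random.default_rng(1)
start_value = 100.0

# Hypothetical simulated paths: 1,000 paths x 50 steps of small random returns
step_returns = rng.normal(0.0005, 0.01, size=(1000, 50))
values = start_value * np.cumprod(1 + step_returns, axis=1)
paths = np.concatenate([np.full((1000, 1), start_value), values], axis=1)
final_values = paths[:, -1]

losses = start_value - final_values                      # positive numbers are losses
var_95 = np.percentile(losses, 95)                       # loss exceeded in about 5% of paths
var_99 = np.percentile(losses, 99)
expected_shortfall_95 = losses[losses >= var_95].mean()  # average loss in the tail

running_peak = np.maximum.accumulate(paths, axis=1)      # highest value so far on each path
drawdowns = (running_peak - paths) / running_peak
max_drawdown = drawdowns.max(axis=1)                     # worst drawdown on each path

prob_big_loss = np.mean(final_values < 0.8 * start_value)  # share of paths losing over 20%

print(f"95% VaR: {var_95:.2f}   99% VaR: {var_99:.2f}")
print(f"95% Expected Shortfall: {expected_shortfall_95:.2f}")
print(f"Median max drawdown: {np.median(max_drawdown):.2%}")
```

`np.maximum.accumulate` is the vectorized way to get a running maximum, which is what makes the drawdown calculation loop-free.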

### **Task 7: Scenario Analysis**
Create 5 different market scenarios by modifying the expected returns:
1. **Bull Market**: Increase all equity returns by 50%
2. **Bear Market**: Decrease all equity returns by 40%
3. **High Inflation**: Increase commodity/real estate returns by 30%, decrease bonds by 20%
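4. **Economic Crisis**: Decrease all returns by 30%, increase correlations by 20% (keep the diagonal at 1 and clip off-diagonal values to the range [-1, 1], since scaling 0.90 by 1.2 would exceed 1)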
5. **Stagflation**: Flat equity returns, high commodity returns, negative bond returns

Run simulations for each scenario and compare results.

### **Task 8: Optimal Rebalancing**
Implement a dynamic rebalancing strategy:
- Every 21 days (monthly), check whether any asset weight deviates by more than 2% from its target
- If so, rebalance back to the original weights
- Compare performance with a buy-and-hold strategy

### **Task 9: Advanced Analytics**
Create a comprehensive analysis including:
- Rolling 30-day Sharpe ratios across all simulation paths
- Asset contribution to portfolio risk (component VaR)
- Efficient frontier analysis (test 3 different weight combinations)
- Stress testing (what happens if the worst-performing asset drops 50%?)

### **Task 10: Visualization and Reporting**
Create a professional dashboard with:
- Histogram of final portfolio values
- Time series plot of portfolio value percentiles (5th, 25th, 50th, 75th, 95th)
- Risk decomposition by asset class
- Scenario comparison table
- Executive summary with key risk metrics

---

## 🔍 Success Criteria
Your solution should demonstrate mastery of:
- ✅ **Vectorized operations** instead of loops
- ✅ **Efficient statistical calculations** using NumPy functions
- ✅ **Advanced indexing and boolean operations**
- ✅ **Broadcasting and array manipulation**
- ✅ **Conditional logic with `np.where` and boolean masks**
- ✅ **Group-wise analysis and aggregations**
- ✅ **Memory-efficient handling of large arrays**

## 💡 Hints
- Use `np.random.multivariate_normal()` for correlated random sampling
- Leverage `np.percentile()` for VaR calculations  
- Use `np.cumprod()` for cumulative returns
- Apply `np.newaxis` for proper broadcasting
- Utilize fancy indexing for rebalancing logic
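- Remember that daily expected returns should be `annual_return / 252` and daily volatilities `annual_volatility / np.sqrt(252)` (equivalently, divide the annual covariance matrix by 252)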
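
Several of these hints can be combined in a few lines. The sketch below uses a single hypothetical asset with made-up parameters and assumes independent, normally distributed daily returns; note that volatility is scaled by the square root of the number of days, not by the number of days itself.

```python
import numpy as np

rng = np.random.default_rng(2)
annual_return, annual_vol, days = 0.08, 0.16, 252

daily_return = annual_return / days
daily_vol = annual_vol / np.sqrt(days)  # volatility scales with sqrt(time)

# 500 hypothetical paths of daily simple returns for a single asset
simulated = rng.normal(daily_return, daily_vol, size=(500, days))

# Cumulative growth factor along each path
growth = np.cumprod(1 + simulated, axis=1)

# np.newaxis lines the shapes up: (2, 1, 1) * (1, 500, 252) -> (2, 500, 252)
start_values = np.array([1_000.0, 5_000.0])
values = start_values[:, np.newaxis, np.newaxis] * growth[np.newaxis, :, :]

# Percentile bands across paths for the first starting value, shape (3, 252)
percentile_bands = np.percentile(values[0], [5, 50, 95], axis=0)

print("Values shape:", values.shape)
print("Final-day 5th/50th/95th percentiles:", percentile_bands[:, -1].round(2))
```

The same `np.percentile(..., axis=0)` call is a natural starting point for the percentile time series plot in Task 10.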

**Good luck, Quantitative Analyst! Show me your NumPy mastery! 🚀**

In [None]:
# 🚀 YOUR SOLUTION WORKSPACE
# Complete all 10 tasks below using NumPy's vectorized operations

# Task 1: Covariance Matrix Construction
# Your code here...



In [None]:
# Task 2: Portfolio Risk Metrics
# Your code here...


In [None]:
# Task 3: Monte Carlo Setup
# Your code here...

In [None]:
# Tasks 4-10: Continue your implementation here
# Use additional cells as needed for each task
# Remember to use vectorized operations and avoid loops!

# Your complete solution goes here...