# 0 - Project kick off

### **Goals:**

1. Show that you can define application and business requirements (Data Architecture competency)
  - Clear list of initial requirements
2. Show that you have identified an interesting project that can cover many modules of the course, with several ways in mind to increase complexity to meet more challenging learning objectives and a way to obtain or generate useful data.

## 1: Define application & business requirements

### **Idea: Stock data & portfolio mgmt**

Portfolio optimization tool
- A new investor has researched a set of companies they like but doesn't know how much to invest in each
- The tool outputs how much of each company the investor should buy, depending on their risk tolerance
- It also outputs the expected return, plus any other metrics we find we need

GPT Output
To implement Modern Portfolio Theory (MPT) in a program, several key aspects need to be considered, including data acquisition, mathematical modeling, optimization algorithms, and user interface elements. Here’s a detailed breakdown of the necessary components:

### 1. Data Acquisition
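- **Historical Price Data**: Collect historical price data for the assets in the portfolio. This data is typically sourced from financial data providers such as Yahoo Finance, Alpha Vantage, or Nasdaq Data Link (formerly Quandl).
- **Risk-free Rate**: Obtain the current risk-free rate, often represented by government bond yields (e.g., the 3-month Treasury bill or 10-year Treasury note yield). Its period must match the return data (an annual rate with annualized returns, or a daily rate with daily returns).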
- **Market Data**: For some advanced models, you might also need market indices data for benchmarking.
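
As a rough illustration of the step from raw prices to the inputs the model needs, the sketch below turns a price series into simple returns and converts an annual risk-free rate to a daily one. It uses plain Python so it runs without third-party packages; the function names, the 252-trading-day convention, and the price values are assumptions made for this example, not part of any data provider's API.

```python
def simple_returns(prices):
    """Period-over-period simple returns: p[t] / p[t-1] - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def daily_risk_free_rate(annual_rate, trading_days=252):
    """Convert an annual rate to the equivalent compounded daily rate.

    Assumes 252 trading days per year, a common but not universal convention.
    """
    return (1.0 + annual_rate) ** (1.0 / trading_days) - 1.0


# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.96, 104.958]
returns = simple_returns(prices)        # one fewer element than prices
rf_daily = daily_risk_free_rate(0.01)   # 1% per year expressed per trading day
```

In the real tool `pandas` would do the same job with `pct_change()`; the point here is only that returns and the risk-free rate need to be on the same time scale before they are compared.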

### 2. Mathematical Modeling
- **Expected Returns**: Calculate the expected returns for each asset based on historical data. This can be done using methods such as the arithmetic mean or geometric mean of historical returns.
- **Covariance Matrix**: Compute the covariance matrix of asset returns to understand how the assets' returns move relative to each other.
- **Portfolio Returns and Variance**: Develop formulas to calculate the expected return and variance of a portfolio based on the weights of the assets.
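
The three quantities above can be sketched in plain Python as follows. The portfolio return is the weighted sum of the asset means, and the portfolio variance is the double sum of `w[i] * cov[i][j] * w[j]`. All function names and the toy return series are assumptions for illustration; the sample covariance uses the `n - 1` denominator.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def covariance_matrix(series):
    """Covariance matrix as a list of rows, one row per asset."""
    return [[covariance(a, b) for b in series] for a in series]


def portfolio_return(weights, mu):
    """Expected portfolio return: sum of weight * expected asset return."""
    return sum(w * m for w, m in zip(weights, mu))


def portfolio_variance(weights, cov):
    """Portfolio variance: sum over i, j of w[i] * cov[i][j] * w[j]."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Made-up per-period returns for two assets, for illustration only.
asset_a = [0.01, 0.03, -0.02, 0.02]
asset_b = [0.02, -0.01, 0.01, 0.00]

mu = [mean(asset_a), mean(asset_b)]
cov = covariance_matrix([asset_a, asset_b])
weights = [0.5, 0.5]

p_return = portfolio_return(weights, mu)
p_variance = portfolio_variance(weights, cov)
```

Because the two toy assets are negatively correlated, the 50/50 portfolio's variance comes out lower than either asset's own variance, which is the diversification effect MPT relies on.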

### 3. Optimization Algorithms
- **Efficient Frontier**: Implement algorithms to generate the efficient frontier, which is the set of optimal portfolios offering the highest expected return for a defined level of risk.
- **Optimization Techniques**: Use optimization techniques such as:
  - **Quadratic Programming**: Commonly used for mean-variance optimization to find the minimum variance portfolio.
  - **Linear Programming**: For certain types of constraints and objective functions.
  - **Heuristic Methods**: For more complex or non-convex optimization problems.
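
For intuition about what the optimizer is solving, the two-asset minimum-variance portfolio has a closed form: the weight on asset A is `(var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)`. The sketch below implements that formula in plain Python; the function names and input numbers are assumptions for illustration, and a real tool with more assets and constraints would use a quadratic programming solver instead.

```python
def two_asset_variance(w_a, var_a, var_b, cov_ab):
    """Variance of a portfolio holding w_a in asset A and 1 - w_a in asset B."""
    w_b = 1.0 - w_a
    return w_a ** 2 * var_a + w_b ** 2 * var_b + 2.0 * w_a * w_b * cov_ab


def min_variance_weights(var_a, var_b, cov_ab, long_only=True):
    """Closed-form minimum-variance weights for exactly two assets."""
    denom = var_a + var_b - 2.0 * cov_ab
    if denom <= 0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w_a = (var_b - cov_ab) / denom
    if long_only:
        # With one free weight the objective is convex in w_a, so clipping
        # the unconstrained optimum to [0, 1] gives the long-only optimum.
        w_a = min(1.0, max(0.0, w_a))
    return w_a, 1.0 - w_a


# Made-up annualized variances and covariance, for illustration only.
var_a, var_b, cov_ab = 0.04, 0.09, 0.006
w_a, w_b = min_variance_weights(var_a, var_b, cov_ab)
best_variance = two_asset_variance(w_a, var_a, var_b, cov_ab)
```

The lower-variance asset receives the larger weight here, but not all of it, because mixing in the second asset still reduces total variance.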

### 4. Constraints Handling
- **Budget Constraint**: Ensure that the sum of the portfolio weights equals 1.
- **Non-Negativity**: Often, portfolios are constrained to no short-selling, meaning weights must be non-negative.
- **Other Constraints**: Allow for additional constraints such as maximum or minimum weights for certain assets, sector constraints, etc.
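
One simple way to make these constraints testable is a small checker that reports which ones a candidate weight vector violates. This is a hypothetical helper sketched for this project (the name `check_weights`, its parameters, and the tolerance are all assumptions); an optimizer would receive the same rules as bounds and equality constraints.

```python
def check_weights(weights, min_weight=0.0, max_weight=1.0, tol=1e-9):
    """Return the names of violated constraints; an empty list means feasible."""
    problems = []
    # Budget constraint: weights must sum to 1.
    if abs(sum(weights) - 1.0) > tol:
        problems.append("budget")
    # Lower bound: min_weight=0.0 encodes the no-short-selling rule.
    if any(w < min_weight - tol for w in weights):
        problems.append("min_weight")
    # Upper bound: caps the share of any single asset.
    if any(w > max_weight + tol for w in weights):
        problems.append("max_weight")
    return problems


feasible = check_weights([0.5, 0.3, 0.2])
shorting = check_weights([0.6, 0.6, -0.2])
capped = check_weights([0.5, 0.3, 0.2], max_weight=0.4)
over_budget = check_weights([0.5, 0.5, 0.5])
```

Sector constraints would need an extra mapping from asset to sector and are left out of this sketch.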

### 5. Risk Measures
- **Standard Deviation**: Use standard deviation as a measure of portfolio risk.
- **Other Measures**: Consider other risk measures like Value at Risk (VaR) or Conditional Value at Risk (CVaR) for more sophisticated risk assessments.
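
A minimal sketch of historical (non-parametric) VaR and CVaR follows, assuming one common convention: take the worst `(1 - confidence)` share of observed returns, report the smallest loss in that tail as VaR and the average loss in that tail as CVaR. Other conventions (interpolated quantiles, parametric models) give slightly different numbers. The function name and the return values are made up for illustration.

```python
import math


def historical_var_cvar(returns, confidence=0.95):
    """Historical VaR and CVaR, both expressed as positive loss fractions."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    # Number of observations in the tail; the small epsilon guards against
    # floating-point error in (1 - confidence), and at least one is kept.
    n_tail = max(1, math.floor(len(losses) * (1.0 - confidence) + 1e-9))
    tail = losses[:n_tail]
    var = tail[-1]                # loss at the edge of the tail
    cvar = sum(tail) / len(tail)  # average loss within the tail
    return var, cvar


# Made-up per-period portfolio returns, for illustration only.
sample_returns = [-0.05, -0.03, 0.01, 0.02, -0.01,
                  0.03, 0.00, 0.015, -0.02, 0.04]
var_80, cvar_80 = historical_var_cvar(sample_returns, confidence=0.80)
```

With ten observations and 80% confidence the tail holds the two worst returns, so CVaR is always at least as large as VaR.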

### 6. User Interface
- **Input Interface**: Provide an interface for users to input their preferences, such as the list of assets, risk tolerance, investment horizon, etc.
- **Output Interface**: Display the optimized portfolio, including asset weights, expected return, risk (standard deviation), and other relevant metrics.
- **Visualization**: Use charts and graphs to visualize the efficient frontier, portfolio allocation, and risk-return profiles.
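
Since the tool is meant to tell the investor how much of each company to buy, the output step needs to turn optimized weights into amounts of money. The sketch below shows one possible shape for that; `allocate_budget`, the example weights, and the budget are assumptions for illustration and are not the result of any optimization.

```python
def allocate_budget(weights, budget):
    """Map {ticker: weight} to {ticker: amount}, rounded to 2 decimals.

    Rounding means the amounts may differ from the budget by a few cents.
    """
    return {ticker: round(w * budget, 2) for ticker, w in weights.items()}


# Hypothetical optimizer output and investor budget.
example_weights = {"AAPL": 0.5, "MSFT": 0.3, "GOOGL": 0.2}
amounts = allocate_budget(example_weights, budget=10_000)

# One line per company, e.g. for a plain-text summary in the UI.
report_lines = [f"{ticker}: {amount:,.2f}" for ticker, amount in amounts.items()]
```

Converting amounts into whole share counts would additionally need current prices and a rule for leftover cash, which this sketch does not cover.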

### 7. Programming Languages and Libraries
- **Programming Languages**: Common choices include Python, R, MATLAB, and Excel/VBA.
- **Libraries**:
  - **Python**: `numpy` for numerical calculations, `pandas` for data manipulation, `scipy` for optimization, `matplotlib` and `seaborn` for visualization, `cvxopt` or `pyportfolioopt` for optimization routines.
  - **R**: `quantmod` for data acquisition, `PerformanceAnalytics` for performance and risk analysis, `quadprog` for quadratic programming.

### Example Implementation Steps in Python
1. **Data Acquisition**:
    ```python
    import yfinance as yf
    tickers = ['AAPL', 'MSFT', 'GOOGL']
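    # Recent yfinance releases adjust prices by default and no longer return an
    # 'Adj Close' column; request adjusted prices explicitly and read 'Close'
    data = yf.download(tickers, start="2020-01-01", end="2023-01-01", auto_adjust=True)['Close']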
    returns = data.pct_change().dropna()
    ```
2. **Expected Returns and Covariance Matrix**:
    ```python
    import numpy as np
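    trading_days = 252  # approximate number of trading days per year
    # Annualize the daily statistics so they are comparable with an annual risk-free rate
    expected_returns = returns.mean() * trading_days
    covariance_matrix = returns.cov() * trading_days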
    ```
3. **Optimization**:
    ```python
    from scipy.optimize import minimize

    def portfolio_performance(weights, expected_returns, covariance_matrix):
        portfolio_return = np.sum(weights * expected_returns)
        portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))
        return portfolio_return, portfolio_volatility

    def negative_sharpe_ratio(weights, expected_returns, covariance_matrix, risk_free_rate):
        p_return, p_volatility = portfolio_performance(weights, expected_returns, covariance_matrix)
        return - (p_return - risk_free_rate) / p_volatility

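    num_assets = len(tickers)
    args = (expected_returns, covariance_matrix, 0.01)  # Example risk-free rate
    # Budget constraint: weights must sum to 1
    constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1},)
    # Long-only bounds: each weight between 0 and 1 (no short-selling)
    bounds = tuple((0, 1) for _ in range(num_assets))
    # Start the search from an equally weighted portfolio
    initial_weights = np.full(num_assets, 1.0 / num_assets)
    result = minimize(negative_sharpe_ratio, initial_weights, args=args, method='SLSQP', bounds=bounds, constraints=constraints)
    if not result.success:
        raise RuntimeError(result.message)
    optimized_weights = result.x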
    ```
4. **Visualization**:
    ```python
    import matplotlib.pyplot as plt

    # Sample random long-only portfolios to visualize the risk-return trade-off;
    # the upper-left edge of the resulting cloud approximates the efficient frontier
    def random_portfolios(num_portfolios, expected_returns, covariance_matrix):
        num_assets = len(expected_returns)
        results = np.zeros((3, num_portfolios))
        weights_record = []
        for i in range(num_portfolios):
            weights = np.random.random(num_assets)
            weights /= np.sum(weights)
            portfolio_return, portfolio_volatility = portfolio_performance(weights, expected_returns, covariance_matrix)
            results[0,i] = portfolio_return
            results[1,i] = portfolio_volatility
            results[2,i] = (portfolio_return - 0.01) / portfolio_volatility
            weights_record.append(weights)
        return results, weights_record

    results, weights = random_portfolios(5000, expected_returns, covariance_matrix)

    plt.scatter(results[1,:], results[0,:], c=results[2,:], cmap='YlGnBu', marker='o')
    plt.colorbar(label='Sharpe Ratio')
    plt.xlabel('Volatility')
    plt.ylabel('Return')
    # Mark the maximum-Sharpe portfolio found by the optimizer
    opt_return, opt_volatility = portfolio_performance(optimized_weights, expected_returns, covariance_matrix)
    plt.scatter(opt_volatility, opt_return, marker='*', color='r', s=100)
    plt.show()
    ```

These steps provide a foundation for implementing Modern Portfolio Theory in a program, allowing for the construction and analysis of optimal portfolios based on historical data and user-defined constraints.

```mermaid
graph TD;
    M[Movies]
    U[User]
    AS(Admin Status)
    UID("<u>User ID</u>")
    MID("<u>Movie ID</u>")
    C(Num Ratings)
    R[Rating]
    C(Count)
    V(Value)

    M-->MID
    U-->UID
    U-->AS
    R-->C
    R-->V
```

## 2: Link to modules & come up with some ways to (1) increase complexity, (2) generate/obtain data