# A Test Case for Refactoring a Basic Portfolio Optimization Workflow

We recreated a basic portfolio optimization workflow from 
functions that will eventually be called from an automated pipeline.
We then compared the results to a past project that used assets from the 
S&P 500 to ensure proper implementation. 
Next, we replaced the current statistical estimators,
which assume a normal distribution, 
with robust estimators like those used in past projects from 202410 through 202411.
Finally, we considered a slight variation that uses a Bayesian approach 
to incorporate expectations derived from other sources.
Eventually, the functions called here will become a key part of the 
initial Condor Funds web-based minimum viable prototype.

The goals here are to:
* create separate functions to support a basic portfolio optimization workflow
* call them from this project and compare against a past project to ensure consistency
* add alternate strategies from past projects and compare
* add an alternate Bayesian optimization approach and compare 
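
As a rough illustration of the last two goals, the sketch below contrasts classical estimators with robust ones and shows one simple way to blend a historical estimate with an outside expectation. The specific choices are assumptions made for illustration only: the median and the scaled median absolute deviation (MAD) stand in for "robust estimators", and a normal-normal conjugate update stands in for the "Bayesian approach". The past projects may use different methods. The return values and the helper name `posterior_mean` are hypothetical.

```python
import numpy as np

# Made-up daily returns with one extreme outlier (illustrative only).
returns = np.array([0.010, -0.005, 0.007, 0.002, -0.003, 0.004, -0.250])

# Classical estimators: sensitive to the single outlier.
mean_est = returns.mean()
std_est = returns.std(ddof=1)

# Robust alternatives (an assumed choice): median for location,
# scaled MAD for spread. The factor 1.4826 makes the MAD comparable
# to a standard deviation when the data are normally distributed.
median_est = np.median(returns)
mad = np.median(np.abs(returns - median_est))
robust_scale = 1.4826 * mad


def posterior_mean(sample_mean, sample_var, n, prior_mean, prior_var):
    """Normal-normal update of an expected return (hypothetical helper).

    Combines a historical sample mean with a prior expectation,
    weighting each by its precision (inverse variance).
    """
    data_precision = n / sample_var
    prior_precision = 1.0 / prior_var
    total_precision = data_precision + prior_precision
    return (data_precision * sample_mean + prior_precision * prior_mean) / total_precision


# Blend a historical estimate with an expectation from another source.
blended = posterior_mean(sample_mean=0.10, sample_var=0.04, n=4,
                         prior_mean=0.02, prior_var=0.01)

print(f"mean={mean_est:.4f}, median={median_est:.4f}")
print(f"std={std_est:.4f}, robust scale={robust_scale:.4f}")
print(f"blended expected return={blended:.4f}")
```

In this toy case the outlier pulls the mean and standard deviation far from the bulk of the data, while the median and MAD barely move. The blended estimate always lies between the historical mean and the prior expectation.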

### *Notice*
*We stress that none of our Jupyter Notebook projects are to be considered final and of publication quality unless otherwise stated. We also provide no warranty or guarantee of any kind. These projects are meant for testing code and learning concepts in a transparent and often evolving manner. If new methodologies, strategies or fundamental understandings are applied in the future, they will most likely be explored in future projects. For transparency and documentation, old projects will not be removed but instead will be deprecated (and clearly marked as such). Look for updates on GitHub https://github.com/Rtasseff/condor_test/tree/main/project*

## Setup

In [1]:
# data set
datasetName = 'sp500_combined.csv'

# Paths

# Analytics dir path *USER SET*
analyticsDir = '/Users/rtasseff/projects/condor_test/analytics'
# Data dir path *USER SET*
dataDir = '/Users/rtasseff/projects/tmp/data_analytics_v1'
import sys
# adding analytics to the system path
sys.path.insert(0, analyticsDir)

from data_mining import load
from functions import genStats
from functions import genFin
from functions import assetPreassess as apa
from functions import utils


import numpy as np
import matplotlib.pyplot as plt


## Background 

*** need to review all text ***
Modern Portfolio Theory (MPT) is a concept in finance that describes ways of diversifying and allocating assets in a financial portfolio in order to maximize the portfolio's expected return given the owner's risk tolerance. American economist Harry Markowitz first introduced MPT in a 1952 paper. The theory was intended to reduce idiosyncratic risk, which is the risk inherent in a particular investment due to its unique characteristics.

*** At some point we want to stress that mathematically we are using volatility as a statistical measure for 'risk', but the word 'risk' has some implied meaning that is not covered by this analysis. Then further expand on what it might mean in a later example ***

A key component of this framework is diversification. When using MPT, an investor bundles different types of investments together so that when some of the securities fall in value, others tend to rise and offset the losses. Thus, the overall portfolio holds roughly steady through short-term swings but, as markets rise overall, the portfolio rises along with the market's tide.

MPT argues that any given investment's risk and return characteristics should not be viewed alone but evaluated by how it affects the overall portfolio's risk and return. That is, an investor can construct a portfolio of multiple assets that will result in greater returns without a higher level of risk. As an alternative, starting with a desired level of expected return, the investor can construct a portfolio with the lowest possible risk that is capable of producing that return.
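
To make the portfolio-level view concrete, here is a minimal sketch of how expected return and volatility combine for a two-asset portfolio. All numbers are made up for illustration and the variable names are hypothetical. The portfolio's expected return is the weighted average of the asset returns, while its variance is `w' C w`, where `w` is the weight vector and `C` the covariance matrix.

```python
import numpy as np

# Illustrative inputs (assumed, not taken from the data set used here).
weights = np.array([0.5, 0.5])          # fraction of capital per asset
exp_returns = np.array([0.08, 0.12])    # expected annual returns
vols = np.array([0.20, 0.20])           # annual volatilities
corr = 0.2                              # correlation between the two assets

# Build the covariance matrix from volatilities and correlation.
cov = np.array([
    [vols[0] ** 2, corr * vols[0] * vols[1]],
    [corr * vols[0] * vols[1], vols[1] ** 2],
])

# Portfolio expected return: weighted average of the asset returns.
port_return = weights @ exp_returns

# Portfolio volatility: square root of w' C w.
port_vol = np.sqrt(weights @ cov @ weights)

print(f"portfolio return:     {port_return:.4f}")
print(f"portfolio volatility: {port_vol:.4f}")
```

Because the two assets are only weakly correlated, the portfolio volatility comes out below the volatility of either asset, even though the expected return is simply the average of the two. This is the diversification effect that MPT relies on.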

MPT uses precise financial mathematics to carefully construct the portfolio. The steps involved include:
1. portfolio selection - valuing and choosing the securities that might be included in the portfolio;
2. portfolio optimization - calculating the optimal allocation, or percentage, for each of the selected assets (i.e. choosing the right mix of assets to maximize return and/or minimize risk);
3. portfolio rebalancing - after the portfolio is implemented (i.e. assets are purchased) the value of individual assets will change over time and the holdings of various assets will need to be decreased or increased to restore the original allocation (i.e. to maintain the correct overall percentage of each asset); 
4. portfolio reallocation - periodically it is important to check on the behaviour of individual assets, as changes to the asset allocations may be needed.
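
Steps (2) and (3) above can be sketched in a few lines. This is a simplified illustration, not the workflow implemented in the `functions` package: it uses the closed-form minimum-variance solution (fully invested, short positions allowed) as a stand-in for optimization, and the helper names `min_variance_weights` and `rebalance_trades` as well as all numbers are hypothetical.

```python
import numpy as np


def min_variance_weights(cov):
    """Fully invested minimum-variance weights.

    Solves w = inv(C) 1 / (1' inv(C) 1). Weights sum to one;
    negative weights (short positions) are not ruled out.
    """
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()


def rebalance_trades(holdings_value, target_weights):
    """Amount to buy (+) or sell (-) per asset to restore the target weights."""
    total = holdings_value.sum()
    return target_weights * total - holdings_value


# Step (2): optimize on an illustrative covariance matrix.
cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])
target = min_variance_weights(cov)

# Step (3): asset values have drifted away from the target allocation.
holdings = np.array([7000.0, 3000.0])
trades = rebalance_trades(holdings, target)

print("target weights:", np.round(target, 4))
print("trades:", np.round(trades, 2))
```

The trades always sum to zero, since rebalancing only shifts value between assets without adding or withdrawing capital. A production workflow would typically add constraints (for example, no short selling) that require a numerical optimizer rather than a closed-form solution.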

*** Went off on a tangent here, need to focus this ***
In the analysis here, we focused on step (2). Other steps will be covered in future projects and Condor Funds materials. It is worth pointing out that step (4) is of important practical concern. Since the analysis here is entirely dependent on historical data, future events may change the statistical estimates of this data as they occur. In addition, there are many other market factors to consider for investment choices. The importance of these factors depends on the preferences of the individual investor. In particular, the time-frame for investment and the size (number of different assets) of the portfolio impact the likelihood that observed returns and risks mirror historical ones. Longer investment time-frames and bigger portfolios (e.g. a 'buy-and-hold' strategy over 10+ years on a well diversified portfolio) allow for noise and abnormal events to smooth out, or correct themselves, over time. In these cases, reallocation could be done by simply repeating step (2). Shorter time-frames (e.g. < 5 years) become increasingly reliant on other market insights not considered here. Of course there is no guarantee on any time-frame, as past performance does not guarantee future results. For example, a new technology could drive a once strong company into bankruptcy over the course of a decade. In this case there would be no chance of recovery over time. There are strategies accessible to even novice investors for periodically (sometimes without effort) repeating step (1) as well, but these will not be discussed here. 


In summary, applying MPT involves:
* valuing the securities that might be included in the portfolio;
* calculating the desired asset allocation, that is, the mix of assets;
* performing calculations to optimize the portfolio to get the maximum amount of return for the minimum amount of risk;
* using financial analysis to monitor the portfolio to see if it meets expectations, and then making changes to the individual securities or asset mix when market conditions warrant a change.

An important consideration in MPT is that, based on statistical measures such as variance and correlation, a single investment's performance is less important than how it impacts the entire portfolio.

MPT also assumes that investors are risk-averse, meaning that they prefer a less risky portfolio to a riskier one for a given level of return. As a practical matter, risk aversion implies that most people should invest in multiple asset classes (stocks, bonds, commodities, cash equivalents or cryptocurrencies for example).

## Data

In [21]:
# Read saved database
# select stocks of interest by symbol list *USER SET*
stocklist = ["MSFT", "NEE", "CVX"]

# get all data
assets = load.multiAssetHist_CSV(dataDir+'/'+datasetName)
print('Full data set of assets')
assets
assets.info


Full data set of assets


<bound method DataFrame.info of Symbol               A        AAL        AAPL        ABBV        ABNB  \
Date                                                                    
2014-04-01   36.972244  35.767616   17.015982   34.212463         NaN   
2014-04-02   37.175602  35.550785   17.044250   35.026749         NaN   
2014-04-03   37.129684  35.201973   16.926130   35.131809         NaN   
2014-04-04   36.454021  34.466633   16.707170   34.278126         NaN   
2014-04-07   35.653683  33.731285   16.444855   33.240589         NaN   
...                ...        ...         ...         ...         ...   
2024-03-22  147.210846  14.820000  172.046646  176.798874  167.860001   
2024-03-25  145.323914  14.920000  170.618591  176.878128  167.990005   
2024-03-26  144.175781  14.920000  169.480133  177.532028  167.389999   
2024-03-27  147.130981  15.300000  173.075241  178.681305  166.410004   
2024-03-28  145.273987  15.350000  171.247726  180.415100  164.960007   

Symbol            

In [24]:
# get subset

assetsSub = assets[stocklist]

print('----------------')
print('Subset of assets')
print('----------------')

assetsSub
assetsSub.info

----------------
Subset of assets
----------------


<bound method DataFrame.info of Symbol            MSFT        NEE         CVX
Date                                         
2014-04-01   35.011497  18.322134   78.053154
2014-04-02   34.952339  18.163750   78.282722
2014-04-03   34.664944  18.271915   78.079414
2014-04-04   33.701321  18.351105   77.921974
2014-04-07   33.642155  18.136713   76.964340
...                ...        ...         ...
2024-03-22  427.968048  61.779999  154.660004
2024-03-25  422.098633  62.610001  156.470001
2024-03-26  420.890808  61.430000  155.270004
2024-03-27  420.671204  63.790001  156.350006
2024-03-28  419.962494  63.910000  157.740005

[2516 rows x 3 columns]>
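
Optimization works on returns rather than raw prices, so a natural next step is to convert the price table into daily returns and estimate their mean and covariance. The sketch below is an assumption about how that could look, not the implementation in `genStats` or `genFin`. It rebuilds a small `DataFrame` from the first five rows shown above so that it runs on its own, and it assumes 252 trading days per year for annualization.

```python
import pandas as pd

# First five rows of the subset shown above, re-entered so the
# example is self-contained.
prices = pd.DataFrame(
    {
        "MSFT": [35.011497, 34.952339, 34.664944, 33.701321, 33.642155],
        "NEE": [18.322134, 18.163750, 18.271915, 18.351105, 18.136713],
        "CVX": [78.053154, 78.282722, 78.079414, 77.921974, 76.964340],
    },
    index=pd.to_datetime(
        ["2014-04-01", "2014-04-02", "2014-04-03", "2014-04-04", "2014-04-07"]
    ),
)

# Simple daily returns; the first row has no previous price, so drop it.
returns = prices.pct_change().dropna()

# Classical (sample) estimates of the inputs to the optimization.
mean_returns = returns.mean()
cov_matrix = returns.cov()

# Annualize under the assumed convention of 252 trading days per year.
TRADING_DAYS = 252
annual_mean = mean_returns * TRADING_DAYS
annual_cov = cov_matrix * TRADING_DAYS

print(returns.round(5))
print(annual_mean.round(4))
```

Five rows are far too few for meaningful estimates; on the real data the same calls would be applied to the full `assetsSub` table.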

In [None]:
# optimization




