## <center> <font color= 'blue'> S&P 500 Risk Optimizations Forecast </font> 
<div align="right">

[![S&P500-Optimizations-Forecast](https://img.shields.io/badge/EstebanMqz-README_>_S&P500_Risk_Optimizations_Forecast-black?style=square&logo=github&logoColor=black)](https://github.com/EstebanMqz/SP500-Risk-Optimizations-Forecast)
</div>



##### <font color= 'blue'> Abstract: <font>

<font color= 'black'> 
Time series modelling is a powerful forecasting tool, and the stock market is an interesting application because statistical estimators are of special interest there.<br> 
These estimators are used for general prediction purposes and to make decision-making more efficient.<br>
The industries where it can be applied are numerous, but the most common are the following:<br>

- Government
- Banking
- Insurance
- Energy
- Healthcare
- Telecommunications
- Retail
- Education

#### <font color= 'blue'> Description: <font>

<font color= 'black'>

Since Covid, data has changed in most industries, with few exceptions, and the markets are just another example.<br>

With this in mind, this repository automates tasks that describe the market from the start of the Covid period up to the date on which the user *(you)* executes it. <br>
Furthermore, it generates a variety of optimizations whose purpose is to forecast the mentioned period while incorporating newly generated data through the snippets provided by the scripts, which are executed by:&nbsp; [![S&P500-Optimizations-Forecast](https://img.shields.io/badge/Notebook-Run>All-black?style=square&logo=github&logoColor=black)](https://github.com/EstebanMqz/SP500-Risk-Optimizations-Forecast/blob/main/SP500-Risk-Optimized-Portfolios-ML.ipynb) <br>


<left>

##### <font color= 'blue'> Work Contact: <font>

[![Website](https://img.shields.io/badge/Website-ffffff?style=square&logo=opera&logoColor=red)](https://estebanmqz.com) [![LinkedIn](https://img.shields.io/badge/LinkedIn-041a80?style=square&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/esteban-m65381722210212839/) [![Portfolio](https://img.shields.io/badge/Github-Portfolio-010b38?style=square&logo=github&logoColor=black)](https://estebanmqz.github.io/Portfolio/) [![E-mail](https://img.shields.io/badge/Business-Mail-052ce6?style=square&logo=mail&logoColor=white)](mailto:esteban@esteban.com)
<br>

![GitHub Logo](https://github.com/EstebanMqz.png?size=50) [![Github](https://img.shields.io/badge/Github-000000?style=square&logo=github&logoColor=white)](https://github.com/EstebanMqz) 


<div class="alert alert-block alert-info">

##### <font color= 'blue'> Table of Contents: </font>

It is divided into the following sections:

0. *Requirements*:

1. *Data Extraction and Exploration*: 
    + $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$<br><br>

2. *Descriptive Statistics $\&$ Analytics*: 
    + $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$<br>
    + $x_{j\neq i}\in [x_1,x_{25}]_{{R_{Sortino_{+_{25}}}}} \subset x_i$ <br><br>

3. *Optimizations*:
    + ${max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}} \models$ 
    $\begin{Bmatrix} {max}_{\vec{w_{j}}} R_{Sharpe_{P}} \\ {min}_{\vec{w_{j}}} R_{\sigma^2_{P}} \\ {max}_{\vec{w_{j}}} R_{Calmar_{P}} \\ {max}_{\vec{w_{j}}} R_{Burke_{P}} \\ {max}_{\vec{w_{j}}} R_{Treynor_{P}} \\ {max}_{\vec{w_{j}}} R_{Jensen_{P}}
    \end{Bmatrix}$
    $\forall$  $x_{j\neq i}\in [x_1,x_{25}]_{{R_{Sortino_{+_{25}}}}}$ 
4. *Simulations*: 
$$ x_i \sim X_i \hookrightarrow {max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}}$$

5. *Forecast*: 
    + $X_{(t_1+t_2+..+t_n)} \hookrightarrow {max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}}$<br>

### <font color= 'blue'> Methodology: </font>

<font color= 'black'>

Data is extracted efficiently, then cleaned and explored, followed by Descriptive Statistics $\forall x_i\in [x_1,x_{500}] \hookrightarrow$ S&P 500 that tell the story of what happened and how. 

*This is complemented by theoretical demonstrations and experimental comparisons that favor the use of transformed data.* <br>

Moreover, estimators and statistical measures are modelled over common resampling periods, <br>
using some of the tools displayed in this `README.md`. <br>

As a result, the following optimizations are made, simulations of what their past behavior would have been are then generated, <br>
and the process concludes with the optimization's forecast from the simulated data:

<div class="alert alert-block alert-success">

Optimizations:

$$\bigg[{max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}} \models {max|min}_{\vec{w_{j\neq{i}}}} R_{k}\bigg] \forall x_{j\neq i}\in [x_1,x_{25}]_{{R_{Sortino_{+_{25}}}}}$$


Simulations:

$$\sum_{j\neq{i}}^{n} x_i \sim X_i \hookrightarrow {max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}}$$

Time series forecast:

$$X_{t_1+t_2+...+t_n} \hookrightarrow {max}_{\vec{w_{j\neq{i}}}} R_{Sortino_{P}}$$

#### <font color= 'blue'> 0. Modules: <font>

<font color= 'black'> 

The modules in the repository are imported into the list `mod`, in the order (`dt`, `fn`, `vs`). <br>


In [2]:
import glob
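# Sort the file names so the module order is deterministic, then bind the aliases used later on.
mod = [__import__(name[:-3]) for name in sorted(glob.glob('*.py'))]
dt, fn, vs = mod  # data.py, functions.py, visualizations.py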
glob.glob('*.py')

['data.py', 'functions.py', 'visualizations.py']
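
A minimal sketch of an alternative loader, assuming the same three files (`data.py`, `functions.py`, `visualizations.py`) sit in the working directory. `glob.glob` does not guarantee any order, so positional access such as `mod[0]` can be fragile. The helper `load_local_modules` below is hypothetical (it is not part of the repository) and keys each module by its file name instead.

```python
import importlib.util
from pathlib import Path


def load_local_modules(directory="."):
    """Import every *.py file in `directory` and return {file_stem: module}.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    modules = {}
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        modules[path.stem] = module
    return modules
```

With it, the aliases could be bound by name, for example `mods = load_local_modules()` followed by `dt, fn, vs = mods["data"], mods["functions"], mods["visualizations"]`.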

### <font color= 'blue'> 1. Requirements: <font>

<font color= 'black'> 

Libraries used for the project are written by <span style='color:teal'> [`fn.get_requirements`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/functions.py) <span style='color:black'> (mod) into `requirements.txt`:

<span style='color:gray'> *Skip to installation if you are not contributing to the project.* <font>

In [3]:
!pipreqs --encoding utf-8 "./" --force

docstring = """# -- -------------------------------------------------------------------------------------- -- # 
# -- project: S&P500-Risk-Optimized-Portfolios-PostCovid-ML                                 -- # 
# -- script: requirements.txt: txt file to download Python modules for execution            -- # 
# -- author: EstebanMqz                                                                     -- # 
# -- license: CC BY 3.0                                                                     -- # 
# -- repository: SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/requirements.txt    -- #                                  
# -- -------------------------------------------------------------------------------------- -- # 
\n
"""
mod[1].get_requirements(docstring)

with open(glob.glob('*.txt')[0], 'r') as file: print(file.read())

# -- -------------------------------------------------------------------------------------- -- # 
# -- project: S&P500-Risk-Optimized-Portfolios-PostCovid-ML                                 -- # 
# -- script: requirements.txt: txt file to download Python modules for execution            -- # 
# -- author: EstebanMqz                                                                     -- # 
# -- license: CC BY 3.0                                                                     -- # 
# -- repository: SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/requirements.txt    -- #                                  
# -- -------------------------------------------------------------------------------------- -- # 


fitter >= 1.2.3
matplotlib >= 3.5.3
numpy >= 1.24.3
pandas >= 1.4.4
plotly >= 5.6.0
scikit_learn >= 1.2.2
scipy >= 1.7.3
seaborn >= 0.11.2
tabulate >= 0.8.9
yahoofinancials >= 1.6
jupyter >= 1.0.0 
ipython >= 8.10.0 



INFO: Successfully saved requirements file in ./requirements.txt


<font color= 'black'>

Libraries in [`requirements.txt`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/requirements.txt) are installed:

In [4]:
%%capture
!pip install -r requirements.txt

#### <font color= 'blue'> 1.1 Load Libraries: <font>

<font color= 'black'> 

Once the libraries in `requirements.txt` are installed, they can be imported in a Python environment: 

In [5]:
import glob, os, pipreqs, sys, warnings

import numpy as np
import pandas as pd
pd.set_option("display.max_rows", None, "display.max_columns", None
              ,"display.max_colwidth", None, "display.width", None)

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("dark_background")
%matplotlib inline

import plotly.graph_objects as go 
import plotly.express as px

import scipy
import scipy.stats as st
from scipy import optimize
from scipy.optimize import minimize

import sklearn
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV 
from sklearn import metrics

from statsmodels.tsa.stattools import pacf 
from statsmodels.tsa.stattools import acf
import statsmodels.api as sm 

import re
from yahoofinancials import YahooFinancials 
from tabulate import tabulate
import IPython.display as d
import IPython.core.display

import ast
from io import StringIO
from fitter import Fitter, get_common_distributions, get_distributions 
import logging

import datetime 
import time

# `warnings` is already imported at the top of this cell; one filter silences every category, UserWarning included.
warnings.filterwarnings("ignore")

<br><br>

### <font color= 'blue'> 2. Data Extraction and Exploration: <font>

#### <font color= 'blue'> 2.1 Data Extraction: <font>

<font color= 'black'> 

In this section $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$ quotes are fetched:

<span style='color:gray'> *Large amounts of data from Yahoo Finance must be fetched in batches to avoid host disruptions (other sources could be used).* <br>

<span style='color:teal'> [fn.SP500_tickers](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/functions.py):

<font color= 'black'> 

In this case, the symbols are split into batches of 50, and the major indexes of several stock exchanges are included. <font>

In [96]:
# Call mod[0].symbols_index(50) 12 times, once per index, with a list comprehension (note: the name `list` shadows the Python builtin):
list = [mod[0].symbols_index(50) for _ in range(12)]

In [97]:
list[0]

SP500,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,MMM,AOS,ABT,ABBV,ACN,ATVI,ADM,ADBE,ADP,AAP,AES,AFL,A,APD,AKAM,ALK,ALB,ARE,ALGN,ALLE,LNT,ALL,GOOGL,GOOG,MO,AMZN,AMCR,AMD,AEE,AAL,AEP,AXP,AIG,AMT,AWK,AMP,ABC,AME,AMGN,APH,ADI,ANSS,AON,APA,AAPL,AMAT,APTV,ACGL,ANET,AJG
1,AIZ,T,ATO,ADSK,AZO,AVB,AVY,AXON,BKR,BALL,BAC,BBWI,BAX,BDX,WRB,BRK-B,BBY,BIO,TECH,BIIB,BLK,BK,BA,BKNG,BWA,BXP,BSX,BMY,AVGO,BR,BRO,BF-B,BG,CHRW,CDNS,CZR,CPT,CPB,COF,CAH,KMX,CCL,CARR,CTLT,CAT,CBOE,CBRE,CDW,CE,CNC
2,CNP,CDAY,CF,CRL,SCHW,CHTR,CVX,CMG,CB,CHD,CI,CINF,CTAS,CSCO,C,CFG,CLX,CME,CMS,KO,CTSH,CL,CMCSA,CMA,CAG,COP,ED,STZ,CEG,COO,CPRT,GLW,CTVA,CSGP,COST,CTRA,CCI,CSX,CMI,CVS,DHI,DHR,DRI,DVA,DE,DAL,XRAY,DVN,DXCM,FANG
3,DLR,DFS,DISH,DIS,DG,DLTR,D,DPZ,DOV,DOW,DTE,DUK,DD,DXC,EMN,ETN,EBAY,ECL,EIX,EW,EA,ELV,LLY,EMR,ENPH,ETR,EOG,EPAM,EQT,EFX,EQIX,EQR,ESS,EL,ETSY,RE,EVRG,ES,EXC,EXPE,EXPD,EXR,XOM,FFIV,FDS,FICO,FAST,FRT,FDX,FITB
4,FSLR,FE,FIS,FISV,FLT,FMC,F,FTNT,FTV,FOXA,FOX,BEN,FCX,GRMN,IT,GEHC,GEN,GNRC,GD,GE,GIS,GM,GPC,GILD,GL,GPN,GS,HAL,HIG,HAS,HCA,PEAK,HSIC,HSY,HES,HPE,HLT,HOLX,HD,HON,HRL,HST,HWM,HPQ,HUM,HBAN,HII,IBM,IEX,IDXX
5,ITW,ILMN,INCY,IR,PODD,INTC,ICE,IFF,IP,IPG,INTU,ISRG,IVZ,INVH,IQV,IRM,JBHT,JKHY,J,JNJ,JCI,JPM,JNPR,K,KDP,KEY,KEYS,KMB,KIM,KMI,KLAC,KHC,KR,LHX,LH,LRCX,LW,LVS,LDOS,LEN,LNC,LIN,LYV,LKQ,LMT,L,LOW,LYB,MTB,MRO
6,MPC,MKTX,MAR,MMC,MLM,MAS,MA,MTCH,MKC,MCD,MCK,MDT,MRK,META,MET,MTD,MGM,MCHP,MU,MSFT,MAA,MRNA,MHK,MOH,TAP,MDLZ,MPWR,MNST,MCO,MS,MOS,MSI,MSCI,NDAQ,NTAP,NFLX,NWL,NEM,NWSA,NWS,NEE,NKE,NI,NDSN,NSC,NTRS,NOC,NCLH,NRG,NUE
7,NVDA,NVR,NXPI,ORLY,OXY,ODFL,OMC,ON,OKE,ORCL,OGN,OTIS,PCAR,PKG,PARA,PH,PAYX,PAYC,PYPL,PNR,PEP,PFE,PCG,PM,PSX,PNW,PXD,PNC,POOL,PPG,PPL,PFG,PG,PGR,PLD,PRU,PEG,PTC,PSA,PHM,QRVO,PWR,QCOM,DGX,RL,RJF,RTX,O,REG,REGN
8,RF,RSG,RMD,RVTY,RHI,ROK,ROL,ROP,ROST,RCL,SPGI,CRM,SBAC,SLB,STX,SEE,SRE,NOW,SHW,SPG,SWKS,SJM,SNA,SEDG,SO,LUV,SWK,SBUX,STT,STLD,STE,SYK,SYF,SNPS,SYY,TMUS,TROW,TTWO,TPR,TRGP,TGT,TEL,TDY,TFX,TER,TSLA,TXN,TXT,TMO,TJX
9,TSCO,TT,TDG,TRV,TRMB,TFC,TYL,TSN,USB,UDR,ULTA,UNP,UAL,UPS,URI,UNH,UHS,VLO,VTR,VRSN,VRSK,VZ,VRTX,VFC,VTRS,VICI,V,VMC,WAB,WBA,WMT,WBD,WM,WAT,WEC,WFC,WELL,WST,WDC,WRK,WY,WHR,WMB,WTW,GWW,WYNN,XEL,XYL,YUM,ZBRA
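
The rows of 50 symbols shown above can be reproduced with plain slicing. This is a minimal sketch of the batching idea only; `batch_symbols` is a hypothetical helper and the symbols are placeholders, so it does not reflect how `symbols_index` is actually implemented.

```python
def batch_symbols(symbols, size=50):
    """Split a flat list of ticker symbols into consecutive batches of `size`."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [symbols[i:i + size] for i in range(0, len(symbols), size)]


# Example with placeholder symbols (not real tickers).
example = [f"SYM{i}" for i in range(120)]
batches = batch_symbols(example, 50)
print([len(b) for b in batches])  # [50, 50, 20]
```

The last batch is simply shorter, which matches the partially filled final rows in the tables of this section.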


In [98]:
list[1]

Dow_30,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
0,MMM,AXP,AMGN,AAPL,BA,CAT,CVX,CSCO,KO,DIS,DOW,GS,HD,HON,IBM,INTC,JNJ,JPM,MCD,MRK,MSFT,NKE,PG,CRM,TRV,UNH,VZ,V,WBA,WMT


In [99]:
list[2]

Nasdaq_100,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,ATVI,ADBE,ADP,ABNB,ALGN,GOOGL,GOOG,AMZN,AMD,AEP,AMGN,ADI,ANSS,AAPL,AMAT,ASML,AZN,TEAM,ADSK,BKR,BIIB,BKNG,AVGO,CDNS,CHTR,CTAS,CSCO,CTSH,CMCSA,CEG,CPRT,CSGP,COST,CRWD,CSX,DDOG,DXCM,FANG,DLTR,EBAY,EA,ENPH,EXC,FAST,FISV,FTNT,GILD,GFS,HON,IDXX
1,ILMN,INTC,INTU,ISRG,JD,KDP,KLAC,KHC,LRCX,LCID,LULU,MAR,MRVL,MELI,META,MCHP,MU,MSFT,MRNA,MDLZ,MNST,NFLX,NVDA,NXPI,ORLY,ODFL,PCAR,PANW,PAYX,PYPL,PDD,PEP,QCOM,REGN,RIVN,ROST,SGEN,SIRI,SBUX,SNPS,TMUS,TSLA,TXN,VRSK,VRTX,WBA,WBD,WDAY,XEL,ZM
2,ZS,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [100]:
list[3]

Russell_1000,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,TXG,MMM,ABT,ABBV,ACHC,ACN,ATVI,AYI,ADM,ADBE,ADP,ADT,AAP,WMS,ACM,AES,AFRM,AFL,AGCO,A,AGL,AGNC,AIG,AL,APD,ABNB,AKAM,ALK,ALB,ACI,AA,ARE,ALGN,ALLE,ALGM,LNT,ALSN,ALL,ALLY,ALNY,GOOGL,GOOG,AYX,ATUS,MO,AMZN,AMC,AMCR,AMD,DOX
1,AMED,AEE,AAL,AEP,AXP,AFG,AMH,AMT,AWK,COLD,AMP,ABC,AME,AMG,AMGN,APH,ADI,NLY,ANSS,AM,AR,AON,APA,AIRC,APO,AAPL,AMAT,APP,ATR,APTV,ARMK,ACGL,AMBP,ARES,ANET,AWI,ARW,AJG,ASH,AZPN,AIZ,AGO,T,TEAM,ATO,ADSK,AN,AZO,AVB,AGR
2,AVTR,AVY,CAR,AVT,AXTA,AXS,AXON,AZEK,AZTA,BKR,BALL,BAC,BOH,OZK,BBWI,BAX,BDX,BSY,WRB,BRK-B,BERY,BBY,BILL,BIO,TECH,BIIB,BMRN,BJ,BKI,BLK,BX,HRB,SQ,OWL,BK,BA,BOKF,BKNG,BAH,BWA,SAM,BXP,BSX,BYD,BFAM,BHF,BMY,BRX,AVGO,BR
3,BEPC,BRO,BF-A,BF-B,BRKR,BC,BLDR,BG,BURL,BWXT,CHRW,CABO,CACI,CDNS,CZR,CPT,CPB,COF,CPRI,CAH,CSL,CG,KMX,CCL,CARR,CRI,CVNA,CASY,CTLT,CAT,CBOE,CBRE,CCCS,CDW,CE,CNC,CNP,CDAY,CERT,CF,CHPT,CRL,SCHW,CHTR,CHE,CC,LNG,CHK,CVX,CMG
4,CHH,CB,CHD,CHDN,CIEN,CI,CINF,CTAS,CRUS,CSCO,C,CFG,CLVT,CLH,CLF,CLX,NET,CME,CMS,CNA,KO,CGNX,CTSH,COHR,COIN,CL,COLB,COLM,CMCSA,CMA,CBSH,ED,CAG,CNXC,CFLT,COP,STZ,CEG,COO,CPA,CPRT,CNM,GLW,CTVA,CSGP,COST,CTRA,COTY,CUZ,CR
5,CXT,CACC,CRWD,CCI,CCK,CSX,CUBE,CMI,CW,CVS,DHI,DHR,DRI,DAR,DDOG,DVA,DECK,DE,DH,DELL,DAL,XRAY,DVN,DXCM,FANG,DKS,DLR,DFS,DISH,DIS,DOCU,DLB,DG,DLTR,D,DPZ,DCI,DASH,DV,DEI,DOV,DOW,DOCS,DKNG,DRVN,DBX,DTM,DTE,DUK,DNB
6,DD,DXC,DT,EXP,EWBC,EGP,EMN,ETN,EBAY,ECL,EIX,EW,ELAN,ESTC,EA,ESI,ELV,EMR,EHC,EHAB,ENOV,ENPH,ENTG,ETR,NVST,EVA,EOG,EPAM,EPR,EQT,EFX,EQIX,EQH,ELS,EQR,ERIE,ESAB,WTRG,ESS,EL,ETSY,EEFT,EVR,RE,EVRG,ES,EXAS,EXEL,EXC,EXPE
7,EXPD,EXR,XOM,FG,FFIV,FDS,FAST,FRT,FDX,FICO,FNF,FITB,FAF,FCNCA,FHB,FHN,FR,FSLR,FE,FIS,FISV,FIVE,FIVN,FLT,FND,FLO,FLS,FMC,FNB,F,FTNT,FTV,FBIN,FOXA,FOX,BEN,FCX,FRPT,FYBR,CFR,FCN,GME,GLPI,GPS,GRMN,IT,GTES,GE,GEHC,GEN
8,GNRC,GD,GIS,GM,G,GNTX,GPC,GILD,DNA,GPN,GFS,GLOB,GL,GMED,GDDY,GS,GGG,GWW,LOPE,GPK,GO,GH,GWRE,GXO,HAL,HBI,THG,HOG,HIG,HAS,HE,HAYW,HCA,HR,PEAK,HEI-A,HEI,JKHY,HSY,HTZ,HES,HPE,HXL,DINO,HIW,HLT,HOLX,HD,HON,HZNP
9,HRL,HST,HHC,HWM,HPQ,HUBB,HUBS,HPP,HUM,HBAN,HII,HUN,H,IAC,IBM,ICUI,IDA,IEX,IDXX,ITW,ILMN,INCY,INFA,IR,INGR,PODD,IART,INTC,IBKR,ICE,IFF,IP,INTU,ISRG,IVZ,INVH,IONS,IPG,IPGP,IQV,IRM,ITT,JBL,J,JAMF,JHG,JAZZ,JBHT,JBGS,JEF


In [101]:
list[4]

FTSE_100,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,III.L,ABDN.L,ADM.L,AAF.L,AAL.L,ANTO.L,AHT.L,ABF.L,AZN.L,AUTO.L,AV.L,BME.L,BA.L,BARC.L,BDEV.L,BEZ.L,BKG.L,BP.L,BATS.L,BLND.L,BT-A.L,BNZL.L,BRBY.L,CNA.L,CCH.L,CPG.L,CTEC.L,CRH.L,CRDA.L,DCC.L,DGE.L,EDV.L,ENT.L,EXPN.L,FCIT.L,FLTR.L,FRAS.L,FRES.L,GLEN.L,GSK.L,HLN.L,HLMA.L,HL.L,HSX.L,HSBA.L,IHG.L,IMB.L,INF.L,IAG.L,ITRK.L
1,JD.L,JMAT.L,KGF.L,LAND.L,LGEN.L,LLOY.L,LSEG.L,MNG.L,MRO.L,MNDI.L,NG.L,NWG.L,NXT.L,OCDO.L,PSON.L,PSH.L,PSN.L,PHNX.L,PRU.L,RKT.L,REL.L,RTO.L,RMV.L,RIO.L,RR.L,RS1.L,SGE.L,SBRY.L,SDR.L,SMT.L,SGRO.L,SVT.L,SHEL.L,SMDS.L,SMIN.L,SN.L,SKG.L,SPX.L,SSE.L,STAN.L,STJ.L,TW.L,TSCO.L,ULVR.L,UU.L,UTG.L,VOD.L,WEIR.L,WTB.L,WPP.L


In [102]:
list[5]

IPC_35,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34
0,AC.MX,ALFA A.MX,ALSEA.MX,AMX L.MX,ASUR B.MX,BBAJIO O.MX,BIMBO A.MX,BOLSA A.MX,CEMEX CPO.MX,CUERVO.MX,ELEKTRA.MX,FEMSA UBD.MX,GAP B.MX,GCARSO A1.MX,GCC.MX,GFINBUR O.MX,GFNORTE O.MX,GMEXICO B.MX,GRUMA B.MX,KIMBER A.MX,KOF L.MX,LAB B.MX,LIVEPOL C-1.MX,MEGA CPO.MX,OMA B.MX,ORBIA.MX,PE&OLES.MX,PINFRA.MX,Q.MX,R A.MX,SITES B-1.MX,TLEVISA CPO.MX,VESTA.MX,VOLAR A.MX,WALMEX.MX


In [103]:
list[6]

DAX_40,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
0,ADS.DE,AIR.DE,ALV.DE,BAS.DE,BAYN.DE,BEI.DE,BMW.DE,BNR.DE,CBK.DE,CON.DE,1COV.DE,DTG.DE,DBK.DE,DB1.DE,DPW.DE,DTE.DE,EOAN.DE,FRE.DE,HNR1.DE,HEI.DE,HEN3.DE,IFX.DE,MBG.DE,MRK.DE,MTX.DE,MUV2.DE,P911.DE,PAH3.DE,QIA.DE,RHM.DE,RWE.DE,SAP.DE,SRT3.DE,SIE.DE,ENR.DE,SHL.DE,SY1.DE,VOW3.DE,VNA.DE,ZAL.DE


In [104]:
list[7]

IBEX_35,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
0,ANA.MC,ACX.MC,ACS.MC,AENA.MC,AMS.MC,MTS.MC,SAB.MC,SAN.MC,BKT.MC,BBVA.MC,CABK.MC,CLNX.MC,ENG.MC,ELE.MC,FDR.MC,FER.MC,GRF.MC,IAG.MC,IBE.MC,ITX.MC,IDR.MC,COL.MC,MAP.MC,MEL.MC,MRL.MC,NTGY.MC,PHM.MC,RED.MC,REP.MC,ROVI.MC,SGRE.MC,SLR.MC,TEF.MC


In [105]:
list[8]

CAC_40,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
0,AI.PA,AIR.PA,ALO.PA,MT.AS,CS.PA,BNP.PA,EN.PA,CAP.PA,CA.PA,ACA.PA,BN.PA,DSY.PA,ENGI.PA,EL.PA,ERF.PA,RMS.PA,KER.PA,OR.PA,LR.PA,MC.PA,ML.PA,ORA.PA,RI.PA,PUB.PA,RNO.PA,SAF.PA,SGO.PA,SAN.PA,SU.PA,GLE.PA,STLAP.PA,STMPA.PA,TEP.PA,HO.PA,TTE.PA,URW.PA,VIE.PA,DG.PA,VIV.PA,WLN.PA


In [106]:
list[9]

EUROSTOXX_50,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,ADS.DE,ADYEN.AS,AD.AS,AI.PA,AIR.PA,ALV.DE,ABI.BR,ASML.AS,CS.PA,BAS.DE,BAYN.DE,BBVA.MC,SAN.MC,BMW.DE,BNP.PA,CRG.IR,BN.PA,DB1.DE,DPW.DE,DTE.DE,ENEL.MI,ENI.MI,EL.PA,FLTR.IR,RMS.PA,IBE.MC,ITX.MC,IFX.DE,INGA.AS,ISP.MI,KER.PA,KNEBV.HE,OR.PA,LIN.DE,MC.PA,MBG.DE,MUV2.DE,RI.PA,PHIA.AS,PRX.AS,SAF.PA,SAN.PA,SAP.DE,SU.PA,SIE.DE,STLAM.MI,TTE.PA,DG.PA,VOW.DE,VNA.DE


In [107]:
list[10]

FTSEMIB_40,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
0,A2A.MI,AMP.MI,AZM.MI,BGN.MI,BMED.MI,BAMI.MI,BPE.MI,BZU.MI,CPR.MI,CNHI.MI,DIA.MI,ENEL.MI,ENI.MI,ERG.MI,RACE.MI,FBK.MI,G.MI,HER.MI,IP.MI,ISP.MI,INW.MI,IG.MI,IVG.MI,LDO.MI,MB.MI,MONC.MI,NEXI.MI,PIRC.MI,PST.MI,PRY.MI,REC.MI,SPM.MI,SRG.MI,STLAM.MI,STMMI.MI,TIT.MI,TEN.MI,TRN.MI,UCG.MI,UNI.MI


In [108]:
list[11]
# TODO: left-pad the values in list[11] with zeros so that every ticker is 7 characters long (e.g. 5.HK -> 0005.HK).

HANGSENG_73,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,5.HK,11.HK,388.HK,939.HK,1299.HK,1398.HK,2318.HK,2388.HK,2628.HK,3968.HK,3988.HK,2.HK,3.HK,6.HK,1038.HK,2688.HK,12.HK,16.HK,17.HK,101.HK,688.HK,823.HK,960.HK,1109.HK,1113.HK,1997.HK,2007.HK,6098.HK,1.HK,27.HK,66.HK,175.HK,241.HK,267.HK,288.HK,291.HK,316.HK,386.HK,669.HK,700.HK,762.HK,857.HK,868.HK,881.HK,883.HK,941.HK,968.HK,981.HK,992.HK,1044.HK
1,1088.HK,1093.HK,1177.HK,1211.HK,1378.HK,1810.HK,1876.HK,1928.HK,1929.HK,2020.HK,2269.HK,2313.HK,2319.HK,2331.HK,2382.HK,3690.HK,3692.HK,6862.HK,9618.HK,9633.HK,9888.HK,9988.HK,9999.HK,,,,,,,,,,,,,,,,,,,,,,,,,,,
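
The zero-padding mentioned in the cell comment can be sketched with `str.zfill`. This assumes Yahoo Finance expects four-digit Hong Kong codes (for example `0005.HK`), and `pad_hk_ticker` is a hypothetical helper, not a function from the repository.

```python
def pad_hk_ticker(ticker, width=7):
    """Left-pad a Hong Kong ticker such as '5.HK' with zeros up to `width` characters."""
    if not isinstance(ticker, str) or not ticker.endswith(".HK"):
        return ticker  # leave empty cells and other exchanges untouched
    return ticker.zfill(width)


raw = ["5.HK", "11.HK", "388.HK", "1299.HK", None]
print([pad_hk_ticker(t) for t in raw])
# ['0005.HK', '0011.HK', '0388.HK', '1299.HK', None]
```

Tickers that are already 7 characters long are returned unchanged, so the function can be applied to the whole table.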


<font color= 'black'> 

Adj. closes of the $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$ symbols are fetched for the optimizations and the time series forecast: <br>

In [119]:
test = list[11]
test


In [None]:
tickers = fn.SP500_tickers(50)
tickers[0][:5], tickers[-1][0:5], sum([len(i) for i in tickers])

<span style='color:gray'> *Note: Skip to 1.1.2 if you prefer using .csv creation date.* <font> &nbsp; 

<font color= 'black'> 

Six years ($6_Y$) of $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$ Adj. closes are fetched *(about 5 min)*:<br>

<span style='color:teal'> [dt.get_historical_price_data](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/data.py):

In [None]:
# Fetch 6 years of adjusted closes per symbol and join them column-wise.
SP_Assets_f = pd.concat([dt.get_historical_price_data(ticker, 6)
                         for batch in tickers for ticker in batch], axis=1)
SP_Assets_f.shape

<font color= 'black'> 

Adj. closes for $S\&P500$:

In [None]:
SP_f = dt.get_historical_price_data('^GSPC', 6)
SP_f = SP_f[SP_f.index.isin(SP_Assets_f.index)]
SP_f.shape
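
The two alignment steps above (column-wise `pd.concat` and the `isin` filter on the index) can be checked on a tiny example. This is a sketch with synthetic values; the symbol names `AAA`, `BBB` and `INDEX` are placeholders.

```python
import pandas as pd

# Synthetic adjusted closes; symbols and values are placeholders.
a = pd.Series([10.0, 10.5, 11.0],
              index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04"]), name="AAA")
b = pd.Series([20.0, 19.5],
              index=pd.to_datetime(["2023-01-03", "2023-01-04"]), name="BBB")
index_series = pd.Series([4000.0, 4010.0, 4020.0, 4030.0],
                         index=pd.to_datetime(["2023-01-02", "2023-01-03",
                                               "2023-01-04", "2023-01-05"]), name="INDEX")

# axis=1 performs an outer join on the dates: NaN appears where a symbol has no quote.
assets = pd.concat([a, b], axis=1)

# Keep only the index dates that also exist in the assets table.
index_aligned = index_series[index_series.index.isin(assets.index)]
print(assets.shape, index_aligned.shape)  # (3, 2) (3,)
```

The `NaN` created for `BBB` on the first date is the kind of gap that a later missing-values check has to deal with.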

<font color= 'black'> 

Fetched data is saved in [*Data*](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/tree/main/Data) subdirectory:<br>
+ `Assets_SP500.csv`
+ `SP500.csv`

In [None]:
SP_Assets_f.to_csv("Data/Assets_SP500.csv")   
SP_f.to_csv("Data/SP500.csv")
SP_f = pd.read_csv("Data/SP500.csv", index_col=0)
SP_Assets_f = pd.read_csv("Data/Assets_SP500.csv", index_col=0)

<font color= 'black'> 

Fetched $x_i$ data:

In [None]:
SP_Assets_f.head(8)

In [None]:
SP_Assets_f.tail(8)


<font color= 'black'> 

To skip data fetching if needed, a data reader is made available:

In [None]:
SP_r = pd.read_csv("Data/SP500.csv", index_col=0)
SP_Assets_r = pd.read_csv("Data/Assets_SP500.csv", index_col=0)
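
One detail of the CSV round trip is worth checking: with `index_col=0` alone the dates come back as strings, while `parse_dates=True` restores a `DatetimeIndex`. A small sketch with an in-memory buffer standing in for the files under `Data/` (the values are placeholders):

```python
import io

import pandas as pd

prices = pd.DataFrame(
    {"AAA": [10.0, 10.5], "BBB": [20.0, 19.5]},
    index=pd.to_datetime(["2023-01-02", "2023-01-03"]),
)

buffer = io.StringIO()  # in-memory stand-in for a .csv file
prices.to_csv(buffer)

buffer.seek(0)
as_text = pd.read_csv(buffer, index_col=0)  # index values are plain strings

buffer.seek(0)
as_dates = pd.read_csv(buffer, index_col=0, parse_dates=True)  # index is a DatetimeIndex

print(type(as_text.index[0]).__name__, type(as_dates.index).__name__)
```

String dates in ISO format still sort and slice correctly, so either form can work here; parsing them is only needed for date arithmetic or resampling.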

#### <font color= 'blue'> 2.2 Data Exploration <font>

<font color= 'black'> 

Defining Returns:

<div class="alert alert-block alert-info">

Accumulated Simple and Log Returns:

+ *Multiplicative - Additive*: <br>

$r_t = \bigg(\frac{P_{t+1}}{P_{t}}-1\bigg)$

Simple Returns $R_t$ are multiplicative because compounded interest is calculated with a product rather than a sum:

$$1 + \sum_{t=1}^{n} r_t = 1+r_1+r_2+...+r_n \neq \prod_{t=1}^{n} (1+r_t) = \prod_{t=1}^{n}\bigg(1+\frac{P_{t+1}}{P_{t}}-1\bigg) = \prod_{t=1}^{n}\frac{P_{t+1}}{P_{t}}$$

$\therefore$ Accumulated $R_t$ follows the compounding law, which reduces to $(1+r)^n$ for a constant $r_t = r$, and is *multiplicative*:  $$\prod_{t=1}^{n}(1+r_t) = \frac{P_{n+1}}{P_{1}}$$

On the other hand, accumulated $ln(1+r_t)$ follows the exponential law $\mathrm{e}^{a} \times \mathrm{e}^{b}= \mathrm{e}^{a + b}$ and is *additive*:

$$\sum_{t=1}^{n} ln(1+r_t) = ln\bigg(\prod_{t=1}^{n}(1+r_t)\bigg) = ln\bigg(\frac{P_{n+1}}{P_{1}}\bigg)$$

In conclusion:

$$\prod_{t=1}^{n}(1+r_t) = \bigg[{\mathrm{e}^{\sum_{t=1}^{n} ln (1+r_t) }} \bigg]$$
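
The relation between compounding simple returns and summing log returns can be verified numerically. A minimal sketch with four placeholder prices:

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 108.9])  # placeholder prices P_1..P_4
simple = prices[1:] / prices[:-1] - 1           # r_t = P_{t+1}/P_t - 1
log_ret = np.log1p(simple)                      # ln(1 + r_t)

total_gross = prices[-1] / prices[0]            # P_{n+1} / P_1
compounded = np.prod(1 + simple)                # product of (1 + r_t)
from_logs = np.exp(log_ret.sum())               # e^(sum of ln(1 + r_t))
naive_sum = 1 + simple.sum()                    # 1 + sum of r_t

print(total_gross, compounded, from_logs, naive_sum)
```

The first three values agree up to floating-point error, while the plain sum of simple returns overstates the accumulated result for this series.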

<font color= 'white'> 

#### Simple $R_t$ and $ln(1+ r_t)$ Returns:

Their Characteristics involve respectively:
+ *Multiplicative - Additive*:<br>
Simple Returns $R_t$ are multiplicative, whereas $ln(1+ r_t)$ are additive, as was previously demonstrated.


+ *Not Symmetric - Symmetric*:<br>
$R_t$ distribution can have $\pm$ skew, which leaves the $Mo$, median and $\mu$ not centered in $f(x)$.<br>
$ln(1+ r_t)$ distribution is closer to symmetric, which leaves the $Mo$, median and $\mu$ approximately centered in $f(x)$.

+ *Not Stationary - Stationary*:<br>
$R_t$ are treated as having a trend, so they are not stationary, nor *i.i.d*, and therefore correlated. $ln(1+ r_t)$ are treated as stationary, *i.i.d* and not correlated.

+ *Not Independent - Independent*: <br>
$R_t$ are not *i.i.d* because they are not stationary: they have a trend, so they are correlated and ultimately multiplicative.<br>
On the other hand, $ln(1+ r_t)$ are *i.i.d* because they are stationary: they do not have a trend, so they aren't correlated, which makes them additive.

Simple Returns $R_t$ are correlated (not i.i.d), so they shouldn't be used to generate continuous random variable simulations. <br>
On the other hand, Log Returns are i.i.d because their additive nature makes them uncorrelated, therefore they can be used to generate continuous random variables.
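
The symmetry point can be illustrated with simulated data. The sketch below assumes log returns are normally distributed (an assumption made only for this example; real market data can deviate from it) and compares the sample skewness of both return types.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumption for illustration: ln(1 + r_t) ~ Normal(0, 0.2).
log_returns = rng.normal(loc=0.0, scale=0.2, size=100_000)
simple_returns = np.expm1(log_returns)  # r_t = e^(ln(1 + r_t)) - 1

skew_log = stats.skew(log_returns)
skew_simple = stats.skew(simple_returns)
print(round(float(skew_log), 2), round(float(skew_simple), 2))
```

Under this assumption the log returns show a skewness close to zero, while the corresponding simple returns are positively skewed.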


<font color= 'black'> 

Nevertheless, both Returns will be compared $\forall$ $x_i\in [x_1,x_{500}]$ $\hookrightarrow$ $S\&P500$ in Data Exploration.<br>
And $\forall$ $x_{i}\in [x_1,x_{n=25}]$ $\hookrightarrow$ ${max|min}_{\vec{w_{j\neq{i}}}} R_{k}$ Descriptive Statistics.

<font color= 'black'> 

[Continuous](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html) random variable distributions in Scipy for $ln(1+ r_t)$:

In [None]:
continuous = [d for d in dir(st) if isinstance(getattr(st, d), getattr(st, "rv_continuous"))]
discrete = [d for d in dir(st) if isinstance(getattr(st, d), getattr(st, "rv_discrete"))]
pd.DataFrame(continuous).rename(columns={0:"Continuous"}).T

<font color= 'black'> 

[Discrete](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.html) random variable distributions in Scipy, which are not considered:

In [None]:
pd.DataFrame(discrete).T.rename(index={0:"Discrete"})
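
As a sketch of how a few of the continuous distributions listed above could be compared against a sample, the example below fits three of them by maximum likelihood and ranks them by the Kolmogorov-Smirnov statistic. The sample is synthetic, and the choice of candidates and of the KS statistic as criterion are assumptions for illustration, not the method used by the project.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
sample = rng.laplace(loc=0.0, scale=0.01, size=2_000)  # synthetic stand-in for ln(1 + r_t)

results = {}
for name in ["norm", "laplace", "logistic"]:
    dist = getattr(st, name)
    params = dist.fit(sample)                                 # fitted (loc, scale)
    ks_stat, p_value = st.kstest(sample, name, args=params)   # distance to the fitted CDF
    results[name] = ks_stat                                   # smaller means a closer fit

for name, ks_stat in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} KS statistic = {ks_stat:.4f}")
```

Because the parameters are estimated from the same sample, the p-values are optimistic, so the statistic is used here only to rank candidates.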

<font color= 'black'> 
The required date range is shorter, so the dates are modified for the rest of the project: 
<br><br>


<font color= 'gray'> 


Start Date modified:<br>
+ from $(2017-05-23)$ $\to$ $(2020-03-02)$ <br> 

Resulting dates:
+ from $(2020-03-02) \to (2023-05-19)$

<font color= 'black'> 

Variables that will be used for the rest of the project are defined for the mentioned dates:

In [None]:
rf, best, r_jump, start, end = .00169, 25, 0.05, "2020-03-02", SP_Assets_r.tail(1).index[0]
prices_start = SP_Assets_r.loc[start:end]
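
Label-based slicing with `.loc[start:end]` includes both endpoints. A small sketch on a synthetic business-day index covering the original date range (the column `AAA` and its values are placeholders):

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2017-05-23", "2023-05-19")  # business days only
prices = pd.DataFrame({"AAA": np.linspace(100, 200, len(dates))}, index=dates)

start, end = "2020-03-02", prices.index[-1]
window = prices.loc[start:end]  # both endpoints are inclusive

print(window.index[0].date(), window.index[-1].date(), len(window))
```

The same slice works on an index of ISO date strings as long as the index is sorted.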

<font color= 'black'> 

Symbols with missing values are located with a Data Quality Report (DQR):

<span style='color:teal'>[`dt.DQR`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/data.py)<br>

In [None]:
DQR_start = dt.DQR(prices_start).sort_values(by='Missing_Values', ascending=False)
DQR_start.head(10)

In [None]:
index_missing = (DQR_start[DQR_start['Missing_Values'] >= 1].index).values
index_missing, index_missing.shape[0]

<font color= 'black'> 

After removing the symbols with missing values, 498 columns are left. Prices have a new shape defined by *(actual_cols = original_cols - missing_cols)*:

In [None]:
prices_start = prices_start.drop(index_missing, axis=1)
len(prices_start.T), sum([len(i) for i in tickers]), index_missing.shape[0]
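
The missing-value check and the column drop can be sketched with pandas alone. `missing_report` is a hypothetical stand-in written for this example; the actual `dt.DQR` may report more fields.

```python
import numpy as np
import pandas as pd


def missing_report(prices):
    """Count missing values per column, sorted from most to fewest.

    Hypothetical stand-in for illustration; `dt.DQR` may report more fields.
    """
    report = pd.DataFrame({
        "Missing_Values": prices.isna().sum(),
        "Missing_Pct": prices.isna().mean() * 100,
    })
    return report.sort_values(by="Missing_Values", ascending=False)


prices = pd.DataFrame({
    "AAA": [1.0, 2.0, 3.0],
    "BBB": [np.nan, 2.0, 3.0],
    "CCC": [1.0, np.nan, np.nan],
})
report = missing_report(prices)
incomplete = report[report["Missing_Values"] >= 1].index
clean = prices.drop(columns=incomplete)
print(list(incomplete), list(clean.columns))  # ['CCC', 'BBB'] ['AAA']
```

The remaining column count equals the original count minus the number of incomplete columns, which is the relation stated above.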

<span style='color:teal'> [`dt.data_describe`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/data.py)<br>

+ <span style='color:teal'> [`fn.VaR`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/functions.py)

##### <font color= 'blue'> 2.2.1 Prices <font> 

<font color= 'black'>
Statistical descriptions of the prices, sorted by their Total Change, are shown for the period:

In [None]:
prices_stats = dt.data_describe(prices_start, 'Prices', .00169, start, end)
prices_stats = prices_stats.T.sort_values(by = 'Total_Change', ascending = False).T
prices_stats

<font color= 'white'> 

Simple Returns $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$ are:<br>
Not symmetric, they are skewed, they are not stationary and they are not i.i.d.

In [None]:
Stats_Simple, Simple_ret = dt.data_describe(prices_start, 'Simple', rf, start, end)[:2]
Stats_Simple = Stats_Simple.T.sort_values(by = 'Yr_Return', ascending = False).T
Stats_Simple
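
The statistics table includes a `Sortino` row, but the definition used by `dt.data_describe` is not shown here. The sketch below uses a common textbook definition under explicit assumptions: `rf` is a per-period rate, the downside deviation is computed from returns below `rf`, and there are 252 periods per year. `sortino_ratio` is a hypothetical function, not the repository's.

```python
import numpy as np


def sortino_ratio(returns, rf=0.0, periods=252):
    """Annualized Sortino ratio from periodic returns.

    Assumptions (not taken from the repository): `rf` is a per-period rate,
    downside deviation uses only returns below `rf`, and `periods` is the
    number of observations per year.
    """
    excess = np.asarray(returns, dtype=float) - rf
    downside = np.minimum(excess, 0.0)            # keep only the shortfalls
    downside_dev = np.sqrt(np.mean(downside ** 2))
    if downside_dev == 0:
        return float("nan")                       # no downside observed
    return float(excess.mean() / downside_dev * np.sqrt(periods))


daily = np.array([0.01, -0.02, 0.015, -0.005, 0.02])  # placeholder daily returns
print(round(sortino_ratio(daily), 3))
```

Unlike the standard deviation, the denominator ignores upside moves, which is why an asset can rank differently by Sortino ratio than by return or by total volatility.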

<font color= 'white'> 

Since quotes are sorted by Yr_Return, the following chart shows that some quotes that rank highly in terms of returns do not rank as well in terms of negative (downside) volatility, so people or firms holding them could still have lost.

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))
# Density histogram of yearly simple returns with a KDE overlay.
# sns.histplot is used because sns.distplot is deprecated since seaborn 0.11.
sns.histplot(Stats_Simple.T.Yr_Return, bins=50, stat="density", color='blue', label='$R_t$', ax=ax)
sns.kdeplot(Stats_Simple.T.Yr_Return, color='orange', linestyle="--", ax=ax)

plt.show()

In [None]:
df_col, df_index = Stats_Simple.T['Sortino'], (Stats_Simple.T['Sortino'].index)
x_arange, y_arange = np.arange(0, Stats_Simple.T['Sortino'].index.shape[0], 10), np.arange(round(Stats_Simple.T['Sortino'].min(), 2), round(Stats_Simple.T['Sortino'].max(), 2), .10)
title = (str(Stats_Simple.T['Sortino'].index.name) + " Simple Sortino Ratio from " + str(start) + " to " + str(end))
x_label, y_label = "Datasets $x_i$", "Sortino Ratio"

vs.cmap_bar(df_col, df_index, x_arange, y_arange, title, x_label, y_label)

<font color= 'black'> 

As stated, not all risk is bad: in this case the biggest winners had the most uncertainty, caused by rapid fluctuations that ended positively. Tesla's volatility was one of the highest, and its success placed it in the Top 10.

In [None]:
df_col, df_index = Stats_Simple.T['Yr_Std'].head(50), (Stats_Simple.T['Yr_Std'].head(50).index)
x_arange, y_arange = np.arange(0, Stats_Simple.T['Yr_Std'].head(50).index.shape[0], 1), np.arange(round(Stats_Simple.T['Yr_Std'].head(50).min(), 2), round(Stats_Simple.T['Yr_Std'].head(50).max(), 2), .05)
title = (str(Stats_Simple.T['Yr_Std'].head(50).index.name) + r" best 50 $R_t$ with $\sigma_{Yr}$ from " + str(start) + " to " + str(end))
x_label, y_label = "Datasets $x_i$", "Std. Deviation Yrly."

std_head = vs.cmap_bar(df_col, df_index, x_arange, y_arange, title, x_label, y_label)

df_col, df_index = Stats_Simple.T['Yr_Std'].tail(50), (Stats_Simple.T['Yr_Std'].tail(50).index)
x_arange, y_arange = np.arange(0, Stats_Simple.T['Yr_Std'].tail(50).index.shape[0], 1), np.arange(round(Stats_Simple.T['Yr_Std'].tail(50).min(), 2), round(Stats_Simple.T['Yr_Std'].tail(50).max(), 2), .05)
title = (str(Stats_Simple.T['Yr_Std'].tail(50).index.name) + r" worst 50 $R_t$ with $\sigma_{Yr}$ from " + str(start) + " to " + str(end))
x_label, y_label = "Datasets $x_i$", "Std. Deviation Yrly."

std_tail = vs.cmap_bar(df_col, df_index, x_arange, y_arange, title, x_label, y_label)

In [None]:
Stats_Log = dt.data_describe(prices_start, 'Log_returns', rf, start, end)[0]
Stats_Log

<span style='color:teal'> [`vs.cmap_bar`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/visualizations.py)<br>

<font color= 'white'> 

Total Price Changes $\frac{P_{{(2023-05-19)}}}{P_{(2020-03-02)}} - 1$ &nbsp; $\forall$ &nbsp; $x_i\in [x_1,x_{500}] \hookrightarrow S\&P500$ Statistical Descriptions:

In [None]:
df_col, df_index = prices_stats.T['Total_Change'], (prices_stats.T['Total_Change'].index)
x_arange, y_arange = np.arange(0, prices_stats.T['Total_Change'].index.shape[0], 10), np.arange(round(prices_stats.T['Total_Change'].min(), 2), round(prices_stats.T['Total_Change'].max(), 2), .25)
title = (str(prices_stats.T['Total_Change'].index.name) + " Total Change from " + str(start) + " to " + str(end))
x_label, y_label = "Datasets $x_i$", "Total Price Change"

vs.cmap_bar(df_col, df_index, x_arange, y_arange, title, x_label, y_label)
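
The total price change per symbol is the last price divided by the first price, minus one. A minimal sketch with placeholder symbols and values:

```python
import pandas as pd

prices = pd.DataFrame(
    {"AAA": [100.0, 120.0, 150.0], "BBB": [50.0, 45.0, 40.0]},
    index=pd.to_datetime(["2020-03-02", "2021-10-11", "2023-05-19"]),
)

total_change = prices.iloc[-1] / prices.iloc[0] - 1  # P_end / P_start - 1 per column
ranked = total_change.sort_values(ascending=False)   # same ordering idea as the bar chart
print(ranked.to_dict())
```

Only the first and last rows matter for this measure, so two symbols with the same total change can have very different paths in between.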

In [None]:
Simple.T.Simple_skew.plot(kind="bar", figsize=(20, 5), color="orange", alpha=.5, label="Simple Skew")
Log.T.Logret_skew.plot(kind="bar", figsize=(20, 5), color="yellow", alpha=.5, label="Log Skew")
# One tick every 10 assets from 0 to 500, rotated and sized for readability
plt.xticks(np.arange(0, 500, 10), rotation=70, fontsize=8)

plt.xlabel("Assets")
plt.ylabel("Skew")
plt.title("Simple vs. Log Skew")
plt.legend()
plt.show()

#Simple.T.Simple_kurtosis.plot(kind="bar", figsize=(20, 5), color="green", alpha=.5, label="Simple Kurtosis")


In [None]:
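def Dist_KDE(dataframe1, dataframe2, dist_label1, dist_label2, x_ticks, y_ticks):
    """
    Function to plot the histograms and kde of two return distributions on the same axes.
    Note: title, x_label and y_label are read from variables defined in the notebook, they are not arguments.
    Parameters:
    ----------
    dataframe1: dataframe or Series
        First set of returns (e.g. yearly Simple returns), plotted in red/orange.
    dataframe2: dataframe or Series
        Second set of returns (e.g. yearly Log returns), plotted in blue/teal.
    dist_label1, dist_label2: str
        Legend labels for each distribution.
    x_ticks: array-like
        Tick positions for the x axis.
    y_ticks: array-like
        Tick positions for the y axis (currently unused).
    Returns:
    -------
    None. The plot is displayed.
    """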
    fig, ax = plt.subplots(figsize=(20, 10))
    sns.distplot(dataframe1, bins=50, color = 'red', label = dist_label1)
    sns.distplot(dataframe2, bins=50, color = 'blue', label = dist_label2)

    sns.kdeplot(dataframe1, color = 'orange', linestyle="--")
    sns.kdeplot(dataframe2, color = 'teal', linestyle="--")

    plt.title(title,  fontsize=15)
    plt.grid(color='gray', linestyle='--')
    #plt.yticks(y_ticks)
    plt.xticks(x_ticks, rotation=45, fontsize=9)
    plt.xlabel(x_label, fontsize=15), plt.ylabel(y_label, fontsize=15)
    #plt.margins(x=0, y=0)
    #plt.tight_layout()
    ax.xaxis.label.set_color('red'), ax.yaxis.label.set_color('blue')
    ax.tick_params(axis='x', colors='white'), ax.tick_params(axis='y', colors='white')
    plt.legend()

    return plt.show()

def Yearly_Returns(Simple, Log, color):
    """
    Function to plot the distribution and kde of yearly Simple & Log returns, given as dataframes with stats (Yr_Return) in index and quotes in cols.
    Parameters:
    ----------
    Simple: dataframe
        Dataframe with simple returns.
    Log: dataframe
        Dataframe with log returns.
    color: str
        Color for plot ticks, labels and title text
    Returns:
    -------
    Plot with yearly returns.
    """
    fig, ax = plt.subplots(figsize=(20, 10))
    sns.distplot(Simple.T.Yr_Return, bins=50, color="red", label="Yearly Simple $R_t$")
    sns.distplot(Log.T.Yr_Return, bins=50, color="blue", label="Yearly Log $r_t$")

    sns.kdeplot(Simple.T.Yr_Return, color="orange", linestyle="--")
    sns.kdeplot(Log.T.Yr_Return, color="teal", linestyle="--")

    plt.title(r"$x_i\in [x_1,x_{500}]$ in S&P500 Yearly Returns", size=20).set_color(color)
    plt.xticks(np.arange(round(min(Simple.T.Yr_Return), 1)*1.5, round(max(Simple.T.Yr_Return), 1)*1.5, 0.05))
    plt.xticks(rotation=45)
    plt.xlabel("Yearly Returns")
    plt.ylabel("Frequency")
    ax.xaxis.label.set_color(color), ax.yaxis.label.set_color(color)
    ax.tick_params(axis='x', colors=color), ax.tick_params(axis='y', colors=color)


    plt.grid(color='gray', linestyle='--')
    plt.legend()

    plt.show()


def Stationarity(x, y, n):
    """
    Function that plots a time-series and its Trend, Seasonality and Residuals 
    returning the Augmented Dickey Fuller p-value for given n periods.

        Parameters
        ----------
        x: DateTime values from economic index (col). #data_raw['DateTime']
        y: Actual values from economic index (col). #data_raw['Actual']
        n: Periods for decomposition (int).

        Returns
        -------
        Tuple ("p-value:", p) with the Augmented Dickey Fuller p-value of y.
        The lines+marker Series, Trend, Seasonality and Residuals plots are displayed in a didactic graph with plotly.
    """

    decomposition = seasonal_decompose(y, period = n)

    trend = decomposition.trend
    seasonal = decomposition.seasonal
    residual = decomposition.resid

    fig = make_subplots(rows = 4, cols = 1, shared_xaxes = False, 
                        subplot_titles = ('Actual', 'Trend', 'Seasonal', 'Residuals'),
                        vertical_spacing = 0.15, row_width = [0.25, 0.25, 0.25, 0.25])

    fig.add_trace(go.Scatter(x=x, y=y, mode='lines+markers', name='Actual',
         line=dict(color='black'), marker=dict(symbol=2, color='black')))

    fig.add_trace(go.Scatter(x=x, y=trend, mode='lines+markers', name='Trend',
         line=dict(color='black'), marker=dict(symbol=2, color='blue')), row = 2, col = 1)

    fig.add_trace(go.Scatter(x=x, y=seasonal, mode='lines+markers', name='Seasonal',
         line=dict(color='black'), marker=dict(symbol=2, color='green')), row = 3, col = 1)

    fig.add_trace(go.Scatter(x=x, y=residual, mode='lines+markers', name='Residuals',
         line=dict(color='black'), marker=dict(symbol=2, color='gray')), row = 4, col = 1)

    fig.show()
    fig.show("png")

    return "p-value:", adfuller(y)[1]


def qq(index):
    """
    Function that graphs a QQ-plot intended to model economic index Actual values.

        Parameters
        ----------
        index: Actual values from economic index (col) 

        Returns
        -------
        QQ-plot for given data.
    """
    sm.qqplot(index, line= 'q', fit  = True)
    pylab.show()    


### <font color= 'lightblue'> 2. Descriptive Statistics: <font>

<font color= 'black'> 

Statistical descriptions are the foundation for extracting knowledge from data and communicating valuable insights.<br>
In this case, they establish a baseline against which new data can be analyzed at any given time, for example to evaluate<br>
whether, for reasons that aren't captured by data, it is more sensible to include $\vec{w_{i\neq j}}$ or to have $\vec{w_{i}}$ adjusted and/or discarded from ${max|min}_{\vec{w_i}} R_{j_+}$.

For future reference, fitted parameter estimators $f(\hat{X_i})$ will be obtained and their relative quality assessed.

<span style='color:teal'> [`vs.selection_data`](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/visualizations.py)

<font color= 'black'> 

$x_i\in [x_1 , x_{25}] \hookrightarrow R_{Sortino_{+_{25}}}$

In [None]:
pd.DataFrame(((Sortino.sort_values(by="sortino", ascending=False).head(25).T).iloc[7:, :]).mean(axis=1)).rename(columns={0:"Equiprob. xi mean"})

In [None]:
prices, r_log, summary_log = vs.selection_data(SP_Assets_r, "Log", rf, best, start, end)
prices, r_simple, summary_simple = vs.selection_data(SP_Assets_r, "Simple", rf, best, start, end)

<font color= 'black'> 

The Log Returns $r_t$ Data Selection on which optimizations will be performed is the following:

In [None]:
d.Markdown(tabulate(summary_log, headers='keys', tablefmt='pipe'))

[vs.BoxHist](https://github.com/EstebanMqz/SP500-Risk-Optimized-Portfolios-PostCovid-ML/blob/main/visualizations.py)

In [None]:
def BoxHist(data, output, bins, color, label, title, start, end):
    """Boxplot and Histogram of the selected returns data, assuming equiprobable weights.
    Parameters
    ----------
    data : DataFrame
        Data to plot.
    output: str
        'prices' or 'log_returns' string to return its stats.
    bins : int
        Number of bins for histogram.
    color : str
        Color for plots.
    label : str
        x-axis label for both plots and prefix of the title.
    title : str
        Title for both plots.
    start : str
        Start date for Stats calculations from dt.data_describe.
    end : str
        End date for Stats calculations from dt.data_describe.
    Returns
    -------
    None. Displays the Boxplot and Histogram with the data.describe() summary drawn as text.
    """
    plt.style.use("classic")
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(22, 8))
    data.plot.box(ax=ax1, color=color, vert=False)
    Box_Stats = pd.DataFrame(((dt.data_describe(data, output, .00169, start, end).sort_values(by="sortino", 
    ascending=False).head(25).T).iloc[7:, :]).mean(axis=1)).rename(columns={0:"Equiprob. xi mean"})

    plt.text(0.05, 0.05, data.describe().round(6).to_string(), transform=ax1.transAxes)

    ax1.set_xlabel(label)
    sns.histplot(data, bins=bins, kde=True, alpha=0.5, ax=ax2).legend().remove()
    for patch in ax2.patches:
        patch.set_facecolor(color)
    ax2.set_yticklabels(["{:.2f}%".format(x/10000) for x in ax2.get_yticks()])
    ax2.set_ylabel("Probability")
    ax2.set_xlabel(label)
    fig.suptitle(str(label) + title, fontsize=18)
    ax1.grid(color="gray", linestyle="--"), ax2.grid(color="lightgray", linestyle="--")
    #Face color for plots
    ax1.set_facecolor("lightgray"), ax2.set_facecolor("lightgray")

    plt.show()

In [None]:
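# output, start and end are required by the BoxHist signature defined above
BoxHist(r_log.mean()*252, 'Log_returns', 30, 'blue', r"$\mu_{Yr}{r_{t}}(x_i)$", r"$\in [x_1,x_{500}]$ $\hookrightarrow$ S&P500", start, end)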

In [None]:
def BoxHistTest(data, bins, color, label, title):
    """Boxplot and Histogram for given data
    Parameters
    ----------
    data : DataFrame
        Data to plot.
    bins : int
        Number of bins for histogram.
    color : str
        Color for plots.
    label : str
        x-axis label for both plots and prefix of the title.
    title : str
        Title for both plots.
    Returns
    -------
    None. Displays the Boxplot and Histogram of data.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(22, 8))
    data.plot.box(ax=ax1, color=color, vert=False)
    stats = pd.DataFrame(dt.data_describe(data).mean(axis=1).round(6)).iloc[3:].rename(columns={0:label}).dropna().to_string()
    plt.text(0.05, 0.05, stats, transform=ax1.transAxes)
    ax1.set_xlabel(label)
    sns.histplot(data, bins=bins, kde=True, alpha=0.5, ax=ax2).legend().remove()
    for patch in ax2.patches:
        patch.set_facecolor(color)
    ax2.set_yticklabels(["{:.2f}%".format(x/10000) for x in ax2.get_yticks()])
    ax2.set_ylabel("Probability")
    ax2.set_xlabel(label)
    fig.suptitle(str(label) + title, fontsize=18, fontweight="bold")
    ax1.grid(color="gray", linestyle="--"), ax2.grid(color="lightgray", linestyle="--")

    plt.show()

In [None]:
pd.DataFrame(dt.data_describe(r_simple.sample(100000, replace=True)).mean(axis=1)).iloc[3:].rename(columns={0:"μ(Rt)"}).dropna()

In [None]:
Xi = pd.DataFrame(r_simple.mean(axis=1)).sample(100000, replace=True)
Xi.sort_index(inplace=True)
Xi = Xi.groupby(Xi.index).mean()
Xi.head()

In [None]:
dt.data_describe(Xi)

In [None]:
BoxHistTest(r_simple.mean().to_frame(), 30, 'blue', "$\mu_{R_{t}}(x_i)$", " Simple Returns $X_i\sim N({\mu}_{R_t}, {\sigma^2}_{R_t})$ in S&P500")
#vs.BoxHist(r_simple.var().to_frame().sample(100000, replace=True).rename(columns={0:"σ(Rt)"}), 20, 'blue', "$\sigma^2_{R_{t}}(X_i)$", " Simple Returns Variance Simulations $X_i\in [X_1,X_{500}]$")

<font color= 'black'> 

Log Returns $r_{t_n}$ :

In [None]:
prices, r_log, summary_log = vs.selection_data(SP_Assets_r, "Log", rf, best, start, execution_date)
r_log.tail()

<font color= 'black'> 

$\therefore$ Random variables ${X_i\sim N(\mu_{r_{t}}, \sigma^2_{r_{t}})}$ :

In [None]:
vs.BoxHist(r_log.mean().to_frame().rename(columns={0:"mu(rt)"}), 20, '#0998eb', "$\mu_{r_{t}}(X_i)$", " Log Returns Mean Simulations$_{100k}$ $X_i\in [X_1,X_{500}]$")
#vs.BoxHist(r_log.var().to_frame().sample(100000, replace=True).rename(columns={0:"σ(Rt)"}), 20, 'lightblue', "$\sigma^2_{R_{t}}(X_i)$", " Log Returns Variance Simulations $X_i\in [X_1,X_{500}]$")

<font color= 'black'> 

The mean of Simple Returns differs from the mean of Log Returns just enough to distort the model if the two are used interchangeably.<br>
Because they are additive over time (compounding effects), among other factors, Log Returns are the most suitable for the model. 
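
The relation between both return types can be checked on a toy series. The following is a minimal sketch with made-up prices (an assumption, not data from the repository) showing that Simple Returns compound through a product while Log Returns accumulate through a sum:

```python
import numpy as np

# Hypothetical prices, for illustration only (not repository data).
prices = np.array([100.0, 102.0, 99.0, 105.0, 110.0])

simple = prices[1:] / prices[:-1] - 1      # R_t = P_t / P_{t-1} - 1
log = np.diff(np.log(prices))              # r_t = ln(P_t / P_{t-1})

total_simple = np.prod(1 + simple) - 1     # Simple Returns compound (product)
total_log = log.sum()                      # Log Returns accumulate (sum)

# Both describe the same total price change: exp(sum r_t) - 1 == prod(1 + R_t) - 1
print(total_simple, np.exp(total_log) - 1)
# The mean Log Return is never above the mean Simple Return, since ln(1 + x) <= x
print(simple.mean(), log.mean())
```

The gap between both means grows with volatility, which is why the choice of return type matters for the parameters of $X_i$.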

##### <font color= 'lightblue'> 

2.1.3 Cumulative Simple $\bold{\mu_{R_{t}}(X_i)}$ $\&$ Log $\bold{\mu_{r_{t}}(X_i)}$

<font color= 'black'> 

Simple Returns $R_{t}$ :

In [None]:
r_simple_acum = ((1+r_simple).cumprod()-1)
r_simple_acum = r_simple_acum.T[(r_simple_acum.T >= -1) & (r_simple_acum.T <= 1)].T
r_simple_acum = r_simple_acum.ffill()
r_simple_acum.tail()

<font color= 'black'> 

$\therefore$ Random variables ${X_i\sim N(\mu_{R_{t}}, \sigma^2_{R_{t}})}$ :

In [None]:
vs.BoxHist(r_simple_acum.mean().sample(100000, replace=True).rename(index={0:"Accum_mu(Rt)"}), 30, '#d1423d', "Accum. $\mu_{R_{t}}(X_i)$", "Simple Returns Mean Simulations$_{100k}$ $X_i\in [X_1,X_{500}]$")

In [None]:
vs.BoxHist(r_simple_acum.var().sample(100000, replace=True).rename(index={0:"Accum_s(Rt)"}), 30, '#eb1c15', "Accum. $\sigma^2_{R_{t}}(X_i)$", "Simple Returns Variance Simulations$_{100k}$ $X_i\in [X_1,X_{500}]$")

<font color= 'black'> 

Log Returns $r_{t_n}$ :

In [None]:
# Log returns are additive, so they accumulate with a sum instead of a compounded product
r_log_acum = r_log.cumsum()
r_log_acum.tail()

<font color= 'black'> 

$\therefore$ Random variables ${X_i\sim N(\mu_{r_{t}}, \sigma^2_{r_{t}})}$ :

In [None]:
vs.BoxHist(r_log_acum.mean().to_frame().sample(100000, replace=True).rename(columns={0:"Accum_mu(rt)"}), 20, '#0e7a04', "$\mu_{r_{t}}(X_i)$", " Log Returns Mean Simulations $X_i\in [X_1,X_{500}]$")
#vs.BoxHist(r_log.var().to_frame().sample(100000, replace=True).rename(columns={0:"σ(Rt)"}), 20, 'green', "$\sigma^2_{R_{t}}(X_i)$", " Log Returns Variance Simulations $X_i\in [X_1,X_{500}]$")

In [None]:
acum_mean = r_log_acum.mean(axis=1)
# Outliers: observations more than 2 standard deviations away from the mean
outliers = acum_mean[abs(acum_mean - np.mean(acum_mean)) > 2 * np.std(acum_mean)]
print("Outliers (share of observations):", len(outliers) / len(acum_mean))

In [None]:
vs.BoxHist(r_log_acum.var().to_frame().sample(100000, replace=True).rename(columns={0:"Accum_s(rt)"}), 20, '#25c716', "Accum. $\sigma^2_{r_{t}}(X_i)$", " Log Returns Variance Simulations $X_i\in [X_1,X_{500}]$")

<div class="alert alert-block alert-info">

Sharpe's Ratio measures the excess returns over a risk-free rate *($\small rf$)* per unit of risk *($\small \sigma$)* :
+ $R_{Sharpe} = \frac{\mu_i - {rf}}{\sigma_i(r_t)}$.
<br>

Sortino's Ratio measures the excess returns over a risk-free rate *($\small rf$)* per unit of negative risk *[$\sigma_{i}\small(r_{t\leq 0})$]* :
+ $R_{Sortino} = \frac{\mu_i - {rf}}{\sigma_{i}(r_{t\leq 0})}$ 
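
Both ratios can be sketched in a few lines. The function name, the sample returns and the use of the sample standard deviation (`ddof=1`) are assumptions for illustration and are not taken from the repository's `functions.py`; the negative risk follows the formula above, i.e. the deviation of the returns $r_t \leq 0$:

```python
import numpy as np

def sharpe_sortino(returns, rf=0.0):
    """Return (Sharpe, Sortino) for a 1-D array of periodic returns.

    rf is the risk-free rate per period. The negative risk is taken as the
    standard deviation of the returns <= 0, following the formula above.
    """
    returns = np.asarray(returns, dtype=float)
    excess = returns.mean() - rf
    sharpe = excess / returns.std(ddof=1)
    downside = returns[returns <= 0]           # only non-positive returns
    sortino = excess / downside.std(ddof=1)
    return sharpe, sortino

# Hypothetical daily returns, for illustration only.
r = np.array([0.01, -0.02, 0.015, 0.005, -0.01, 0.02, -0.005])
sharpe, sortino = sharpe_sortino(r, rf=0.0)
print(round(sharpe, 4), round(sortino, 4))
```

With these returns the Sortino Ratio is the larger of the two, since the non-positive returns are less dispersed than the full sample.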

To limit the risks associated with negative returns, Data Selection $\forall X_i\in [X_1,X_{500}] \rightarrow X_{P{_{R{max_{j}}}}}$ is based on the $S\&P500$ *Sortino's Ratio Top 25*:

In [None]:
fn.retSLog_Selection(SP_Assets_r, rf, best, start, execution_date)


In [None]:
vs.Selection_R_SLog_Plot(SP_Assets_r, rf, best, start, execution_date, r_jump)

##### <font color= 'blue'> <br> 2.2 Modelling $X_i$ <font>

In [None]:
def Stats(dataframe, Selection, r, P, percentiles, dist, title, color):
    """
    Stats is a function that resamples the returns of a Selection performed over a dataframe,
    plots them as Box-plots and fits distributions to each dataset.
    Parameters:
    ----------
    dataframe : dataframe
        Dataframe from which the Selection is made, in order to access Selection's original data.
    Selection : dataframe or Series
        Selection to Resample, with the selected quotes in its index. The period P must be longer than the one of the original data.
    r : str
        Type of return for the model: "Simple" (multiplicative) or "Log" (additive).
    P : str
        Period of Resample (e.g. "W" for Weekly, "M" for Monthly, "Q" for Quarterly,
        "Y" for Yearly, etc.) for Dataframe.resample (see refs.).
    percentiles : list
        List of Percentiles of Returns included in the describe dataframe (e.g. [.05, .25, .5, .75, .95]).
    dist : list
        Continuous Distributions to fit on datasets Xi
    title : str
        Title of the Box-plot
    color : str
        Color of the Box-plot.
    Returns:
    -------
    describe : dataframe
        Summary statistics (mean, std, min, max, percentiles, mode, skewness and kurtosis), in pos. [0].
    dist_fit : array
        Best fitted distribution with its params., AIC and BIC for each dataset Xi, in pos. [1].
    """
    
    if r == "Simple":
        Selection = (dataframe[Selection.index].pct_change()).iloc[1:, :].dropna(axis = 1)
    elif r == "Log":
        Selection = np.log(dataframe[Selection.index]).diff().iloc[1:, :].dropna(axis = 1)
    else:
        # Stop here instead of continuing with an unprocessed Selection
        raise ValueError("Please select a valid Return type: 'Simple' or 'Log'. Stats help command: help(vs.Stats)")
    
    Selection.index = pd.to_datetime(Selection.index)
    # Last observation of each period P
    Selection_Mo_r = Selection.resample(P).last()
    Selection_Mo_r.plot(kind = "box", figsize = (22, 13), title = title, color = color, fontsize = 13)
    
    for i in range(0, len(Selection_Mo_r.columns)):
        plt.text(x = i + 0.96 , y = Selection_Mo_r.iloc[:, i].mean() + .0075, s = str("$\mu$ = +") + str(round(Selection_Mo_r.iloc[:, i].mean(), 4)), fontsize = 6.5, fontweight = "bold", color = "lightgreen")
        plt.text(x = i + 0.98 , y = Selection_Mo_r.iloc[:, i].max() + .010, s = str("+") + str(round(Selection_Mo_r.iloc[:, i].max(), 3)), fontsize = 8.5, color = "green")
        plt.text(x = i + 0.98 , y = Selection_Mo_r.iloc[:, i].min() - .015, s = str(round(Selection_Mo_r.iloc[:, i].min(), 3)), fontsize = 8.5, color = "red")

    describe = Selection_Mo_r.describe(percentiles)
    # Statistics are rows and datasets are columns in describe, so new statistics are added as rows
    describe.loc["mode"] = Selection_Mo_r.mode().iloc[0, :]
    describe.loc["skewness"] = st.skew(Selection_Mo_r)
    describe.loc["kurtosis"] = st.kurtosis(Selection_Mo_r)

    dist_fit = np.empty(len(Selection_Mo_r.columns), dtype=object)
    
    for i in range(0, len(Selection.columns)):
        f = Fitter(pd.DataFrame(Selection_Mo_r.iloc[:, i]), distributions = dist, timeout=5)
        f.fit()
        params, AIC, BIC = [StringIO() for i in range(3)]
        (print(f.get_best(), file=params)), (print(f.get_best(method="aic"), file=AIC)), (print(f.get_best(method="bic"), file=BIC))
        params, AIC, BIC = [i.getvalue() for i in [params, AIC, BIC]]
        dist_fit[i] = (params + AIC + BIC).replace("\n", ", ")
    
    plt.title(title, fontsize = 20)
    plt.axhline(0, color = "red", lw = .5, linestyle = "--")
    plt.axhspan(0, Selection_Mo_r.min().min(), facecolor = "red", alpha = 0.2) 
    plt.axhspan(0, Selection_Mo_r.max().max(), facecolor = "green", alpha = 0.2)

    plt.xticks(rotation = 45)
    for i, t in enumerate(plt.gca().xaxis.get_ticklabels()):
        if (i % 2) != 0:
            t.set_color("lightgreen")
        else:
            t.set_color("white")
            
    plt.yticks(np.arange(round(Selection_Mo_r.min().min(), 1), round(Selection_Mo_r.max().max(), 1), 0.05))
    plt.grid(alpha = 0.5, linestyle = "--", color = "grey")
    IPython.core.display.clear_output() 
    return describe, dist_fit, plt.show()

In [None]:
Sortino25[2]

In [None]:
Selection.tail()

In [None]:
(SP_Assets_r.loc[start:today][Sortino25[2].index]).pct_change().iloc[1:, :].dropna(axis = 1).tail()

In [None]:
np.log(SP_Assets_r.loc[start:today][Sortino25[2].index]).diff().iloc[1:, :].dropna(axis = 1).tail()

In [None]:
SP_Assets_r.loc[start:today], Sortino25[2]

In [None]:
dist=([d for d in dir(st) if isinstance(getattr(st, d), getattr(st, "rv_continuous"))])[0:60]

def ret(dataframe, selection, r):
    if r == "Simple":
        returns = (dataframe[selection.index]).pct_change().iloc[1:, :].dropna(axis = 1)
    elif r == "Log":
        returns = np.log(dataframe[selection.index]).diff().iloc[1:, :].dropna(axis = 1)
    else:
        # Stop here: returns would otherwise be undefined below
        raise ValueError("Please select a valid Return type: 'Simple' or 'Log'.")
    
    returns.index = pd.to_datetime(returns.index)
    # Last observation of each month
    returns_Mo_r = returns.resample("M").last()
    returns_Mo_r.plot(kind = "box", figsize = (22, 13), title = "test", color = "yellow", fontsize = 13)

    return returns, returns_Mo_r.max()

ret(SP_Assets_r.loc[start:today], Sortino25[2], "Simple")[1]



<font color= 'black'> 

$r_{Log}(X_i)$:

In [None]:
Selection = np.log(dataframe[Selection.index]).diff().iloc[1:, :].dropna(axis = 1)

In [None]:
#Stats(dataframe, Selection, r, P, percentiles, dist, title, color):
describe_Wk = Stats(SP_Assets_r.loc[start:today], Sortino25[2], "Log", "W", [.025, .25, .5, .75, .95], dist, 
                    "$S&P$ 500 $r_{Log}(X_i)$ Selection Weekly Resampling from " + str(start) + " to " + str(today), "lightyellow")

In [None]:
describe_Wk[0]

In [None]:
describe_Mo = vs.Stats(SP_Assets_r.loc["2020-03-02":today], Sortino25[2], P[1][0],
                  "$X_i$ Selection Resamplings from $S&P$ 500 on a " + str(P[1][1]) + " basis from ", "2020-03-02", today,
                  [.025, .25, .5, .75, .95], dist, color=color[1])
                  

In [None]:
describe_Mo[0]

In [None]:
describe_Qt = vs.Stats(SP_Assets_r.loc["2020-03-02":today], Sortino25[2], P[2][0],
                  "$X_i$ Selection Resamplings from $S&P$ 500 on a " + str(P[2][1]) + " basis from ", "2020-03-02", today,
                  [.025, .25, .5, .75, .95], dist, color=color[2])

In [None]:
describe_Qt[0]

##### <font color= 'black'> Estimators Parameters:
$f(X_i)$ and $AIC$ $\&$ $BIC$: <br>

The distributions and parameters that best estimate $f(X_i)$ are obtained from $104$ distribution classes and instances for continuous random variables with the `Fitter` module  *(see refs.)*. <br>

The *Akaike $(AIC)$ $\&$ Bayesian $(BIC)$ Information Criteria* are estimators of the *relative quality* of fitted distributions, based on their maximized *Log-Likelihood* and penalized by their number of parameters.<br>
Minimum relative values for $AIC$ and $BIC$ are usually preferred and in this case, they are obtained to model $X_i$ resampled data on $W, M$   $\&$ $Q$ periods $P$.<br>
$BIC$ penalizes additional parameters more heavily than $AIC$ whenever $\ln(n) > 2$, so they tend to be used together to avoid under/over fitting and they are defined as follows:
+ $AIC = 2k - 2\ln(\hat{L})$<br>
+ $BIC = k\ln(n) - 2\ln(\hat{L})$<br>

*where:*<br>

 $k$ = Number of parameters in the model.<br>
 $n$ = Number of observations.<br>
 $\hat{L}$ = Maximized value of the likelihood function of the fitted distribution.<br>
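
As an illustration of the definitions above, the sketch below computes both criteria directly with `scipy.stats` for two candidate distributions. The synthetic sample, the helper name `aic_bic` and the choice of candidates are assumptions for the example; it is not meant to reproduce how `Fitter` computes its own values:

```python
import numpy as np
from scipy import stats

def aic_bic(log_likelihood, k, n):
    """AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L)."""
    aic = 2 * k - 2 * log_likelihood
    bic = k * np.log(n) - 2 * log_likelihood
    return aic, bic

# Synthetic heavy-tailed sample standing in for resampled returns (assumption).
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=500)

results = {}
for name, dist in {"norm": stats.norm, "t": stats.t}.items():
    params = dist.fit(x)                     # maximum-likelihood estimates
    ll = dist.logpdf(x, *params).sum()       # maximized log-likelihood
    results[name] = aic_bic(ll, k=len(params), n=len(x))

for name, (aic, bic) in results.items():
    print(f"{name}: AIC={aic:.1f}  BIC={bic:.1f}")
```

The candidate with the lowest value is preferred under each criterion. Since $\ln(500) > 2$, each extra parameter costs more under $BIC$ than under $AIC$ in this sample.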

In [None]:
dist_fit=pd.DataFrame([describe_Wk[1], describe_Mo[1], describe_Qt[1]]).T
dist_fit_format = fn.format_table(dist_fit, Sortino25[2])
dist_fit_format

### <font color= 'blue'> 

### 3. Descriptive and Prescriptive Analytics for $X_P$ 

##### <font color= 'blue'> <br>
3.1 $X_P$ Optimizations Models <font>

<span style='color:gray'> *Equal weighted datasets are omitted from the analysis for simplicity purposes.*

<div class="alert alert-block alert-info">

If we have $n$ *unequally* weighted datasets $X_i,\ i=1,2,\dots,n$, to model $X_P$ we need $\mu_P$ $\&$ $\sigma_P$.<br>

Their weighted average is:<br>

$$\mu_{P} = \frac{\sum_{i=1}^{n} w_{i} \mu_{{X_{i}}}}{\sum_{i=1}^{n} w_{i}}$$ 

If $$\sum_{i=1}^{n} w_{i} = 1$$ then: <br>

$$\mu_{P} = \sum_{i=1}^{n} w_{i} \mu_{{X_{i}}}$$ 

For the variance $\sigma^2_P$ we need to express the covariances $\sigma_{i,j}$ of the selection in $S\&P500$ *(A-Z)* quotes as a matrix, where the diagonal holds the variances $\sigma^2_{i}$ and ${\sigma_{i} \sigma_{j}}$ is the product of the units of risk of $X_i$ and $X_j$:<br>

$$\sigma_{i,j} = \left[\begin{array}{cccc}\sigma^2_{1} & \sigma_{1,2} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma^2_{2} & \cdots & \sigma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma^2_{n}\end{array}\right]$$

We also need the correlation coefficients $\rho_{i j}$ = $\frac{Cov(X_i, X_{j})}{\sigma_{i} \sigma_{j}}$ of $X_{i,j}$, which measure the strength and direction of the linear relationship between their fluctuations.<br>

Expressed and substituted as:

$$\sigma^2_P=\sum_{i=1}^{n}\sum_{j=1}^{n}w_{i}w_{j}\sigma_{i}\sigma_{j}\rho_{ij}$$ 

$$\sigma^2_P = \sum_{i=1}^{n}\sum_{j=1}^{n}w_{i}w_{j}Cov(X_i, X_j)$$

As a product of vectors $\times$ matrices:<br>

$$\sigma^2_{P} = \vec{w}^T \times Cov_{i,j} \times \vec{w}$$

Which is expressed as the following in its expanded form, with $Cov(X_i, X_j) = \sigma_{i}\sigma_{j}\rho_{ij}$:<br> 

$$\sigma^2_{P} = {\left[\begin{array}{cccc}w_{1} & w_{2} & \cdots & w_{n}\end{array}\right] \cdot \left[\begin{array}{cccc}\sigma^2_{1} & \sigma_{1}\sigma_{2}\rho_{1,2} & \cdots & \sigma_{1}\sigma_{n}\rho_{1,n} \\ \sigma_{2}\sigma_{1}\rho_{2,1} & \sigma^2_{2} & \cdots & \sigma_{2}\sigma_{n}\rho_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n}\sigma_{1}\rho_{n,1} & \sigma_{n}\sigma_{2}\rho_{n,2} & \cdots & \sigma^2_{n}\end{array}\right] \cdot \left[\begin{array}{c}w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{array}\right]}$$
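
The equivalence between the double sum and the matrix product can be verified numerically. This is a minimal sketch for three assets where the weights, means, deviations and correlations are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical inputs for three assets (assumptions, not repository data).
w = np.array([0.5, 0.3, 0.2])                 # weights, they sum to 1
mu = np.array([0.10, 0.07, 0.12])             # expected returns
sigma = np.array([0.20, 0.15, 0.30])          # standard deviations
rho = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.2],
                [0.5, 0.2, 1.0]])             # correlation matrix

cov = np.outer(sigma, sigma) * rho            # Cov(X_i, X_j) = sigma_i sigma_j rho_ij

mu_p = w @ mu                                 # sum_i w_i mu_i
var_matrix = w @ cov @ w                      # w^T Cov w

# Same variance written as the explicit double sum
var_sum = sum(w[i] * w[j] * sigma[i] * sigma[j] * rho[i, j]
              for i in range(3) for j in range(3))

print(mu_p, var_matrix, np.sqrt(var_matrix))
```

The matrix form is the one that scales to the full selection, since it avoids the explicit loop over every pair $(i, j)$.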

Now, the slope $\beta$ of $X_{P}$ against $X_{S\&P500}$ can be obtained, which is expressed as:<br>

$\beta = \frac{Cov(r_P,r_{S\&P500})}{Var(r_{S\&P500})}$

To compute some metrics that include units of sensitivity, the following are considered:<br>

+ $R_{Treynor} = \frac{Var(r_{S\&P500})(\mu_P - {rf})}{Cov(r_P,r_{S\&P500})}$<br>

or $P$ excess returns over the risk-free rate per unit of *slope*.

+ $R_{Jensen}({r_P, r_{t_{S\&P500}}}) = (\mu_P - {rf}) - \frac{Cov(r_P,r_{t_{S\&P500}})}{Var(r_{t_{S\&P500}})}(\mu_{t_{S\&P500}} - {rf})$<br>

or excess returns of $P$ over the risk-free rate minus the *slope* times the excess returns of a benchmark over the risk-free rate.
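
The three quantities above can be sketched as follows. The function name and the return series are hypothetical; the portfolio is built to move 1.5 times the benchmark plus a constant, so the expected slope is known beforehand:

```python
import numpy as np

def beta_treynor_jensen(r_p, r_m, rf=0.0):
    """Beta, Treynor ratio and Jensen's alpha from per-period returns."""
    r_p, r_m = np.asarray(r_p, float), np.asarray(r_m, float)
    beta = np.cov(r_p, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)
    treynor = (r_p.mean() - rf) / beta
    jensen = (r_p.mean() - rf) - beta * (r_m.mean() - rf)
    return beta, treynor, jensen

# Hypothetical returns: the portfolio moves 1.5x the benchmark plus 0.001 per period.
r_m = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
r_p = 1.5 * r_m + 0.001
beta, treynor, jensen = beta_treynor_jensen(r_p, r_m, rf=0.0)
print(beta, treynor, jensen)
```

Here the slope recovers the 1.5 factor and Jensen's measure recovers the constant 0.001, since the benchmark's mean return is zero in this sample.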





<span style='color:black'>

Optimizations $\forall w_i$ are made with `Scipy` and validated with `Numpy` from parameters $X_i \rightarrow X_P$ for:<br><br>
+ $R_{Treynor_{Arg_{max}}}$
+ $R_{Sharpe_{Arg_{max}}}$
+ $R_{Sortino_{Arg_{max}}}$
+ $\sigma^2_{P_{Arg_{min}}}$
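
A minimal sketch of two of these optimizations with `scipy.optimize.minimize` is shown next. The inputs are hypothetical annualized values for three assets, and the long-only bounds and full-investment constraint are assumptions for the example rather than the exact setup of `vs.Optimizer`:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical annualized inputs for three assets (assumptions).
mu = np.array([0.10, 0.07, 0.12])
sigma = np.array([0.20, 0.15, 0.30])
rho = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.2],
                [0.5, 0.2, 1.0]])
cov = np.outer(sigma, sigma) * rho
rf = 0.0169

def neg_sharpe(w):
    # minimize() minimizes, so the Sharpe ratio is negated to maximize it
    return -(w @ mu - rf) / np.sqrt(w @ cov @ w)

def variance(w):
    return w @ cov @ w

n = len(mu)
w0 = np.ones(n) / n                                        # equal-weight start
bounds = [(0.0, 1.0)] * n                                  # long-only weights
cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}      # weights sum to 1

w_sharpe = minimize(neg_sharpe, w0, method="SLSQP", bounds=bounds, constraints=cons).x
w_minvar = minimize(variance, w0, method="SLSQP", bounds=bounds, constraints=cons).x
print(np.round(w_sharpe, 4), np.round(w_minvar, 4))
```

The results can then be validated with plain `Numpy` by recomputing $\mu_P$ and $\sigma^2_P$ from the optimal weights and comparing them against the equal-weight starting point.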

In [None]:
def Optimizer(Assets, index, rf, title):
    Asset_ret = (Assets.pct_change()).iloc[1:, :].dropna(axis = 1)
    index_ret = index.pct_change().iloc[1:, :].dropna(axis = 1)
    index_ret = index_ret[index_ret.index.isin(Asset_ret.index)]

    mean_ret = Asset_ret.mean() * 252
    cov = Asset_ret.cov() * 252

    N = len(mean_ret)
    w0 = np.ones(N) / N
    bnds = ((0, None), ) * N
    cons = {"type" : "eq", "fun" : lambda weights : weights.sum() - 1}

    def Max_Sharpe(weights, mean_ret, rf, cov):
        # Negative Sharpe ratio, since optimize.minimize minimizes
        rp = np.dot(weights.T, mean_ret)
        sp = np.sqrt(np.dot(weights.T, np.dot(cov, weights)))
        return -(np.divide(np.subtract(rp, rf), sp))
    
    def Min_Var(weights, cov):
        return np.dot(weights.T, np.dot(cov, weights)) 
    
    def Max_Treynor(weights, mean_ret, rf, betas):
        # Negative Treynor ratio: (rp - rf) / Beta_P, where Beta_P = sum(w_i * Beta_i).
        # betas (Beta of each asset against the index) is an assumed input; this objective is not used below.
        rp = np.dot(weights.T, mean_ret)
        beta_p = np.dot(weights.T, betas)
        return -(np.divide(np.subtract(rp, rf), beta_p))
    
    #-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    opt_EMV = optimize.minimize(Max_Sharpe, w0, (mean_ret, rf, cov), 'SLSQP', bounds = bnds,
                                constraints = cons, options={"tol": 1e-10})
    
    W_EMV = pd.DataFrame(np.round(opt_EMV.x.reshape(1, N), 4), columns = Asset_ret.columns, index = ["Weights"])
    W_EMV[W_EMV <= 0.0] = np.nan
    W_EMV.dropna(axis = 1, inplace = True)

    RAssets = Asset_ret[Asset_ret.columns[Asset_ret.columns.isin(W_EMV.columns)]]
    # MuAssets = mean_ret[mean_ret.index.isin(W_EMV.columns)]
    R_EMV = pd.DataFrame((RAssets*W_EMV.values).sum(axis = 1), columns = ["$r_{Sharpe_{Arg_{max}}}$"])
    index_ret.rename(columns={index_ret.columns[0]: "$r_{mkt}$" }, inplace=True)
    R_EMV.insert(1, index_ret.columns[0], index_ret.values)

    Muopt_EMV = np.dot(opt_EMV.x.T, mean_ret) 
    Sopt_EMV = np.sqrt(np.dot(opt_EMV.x.T, np.dot(cov, opt_EMV.x)))
    # Columns (not rows): 0 = Sharpe portfolio returns, 1 = market returns
    Beta_EMV = np.divide((np.cov(R_EMV.iloc[:, 0], R_EMV.iloc[:, 1])[0][1]), R_EMV.iloc[:, 1].var())
    SR_EMV = (Muopt_EMV - rf) / Sopt_EMV

    #-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    opt_MinVar = optimize.minimize(Min_Var, np.ones(N) / N, (cov,), 'SLSQP', bounds = bnds,
                                   constraints = cons, options={"tol": 1e-10})

    W_MinVar = pd.DataFrame(np.round(opt_MinVar.x.reshape(1, N), 4), columns = Asset_ret.columns, index = ["Weights"])
    W_MinVar[W_MinVar <= 0.0] = np.nan
    W_MinVar.dropna(axis = 1, inplace = True)

    RAssets_MinVar = Asset_ret[Asset_ret.columns[Asset_ret.columns.isin(W_MinVar.columns)]]
    R_MinVar = pd.DataFrame((RAssets_MinVar*W_MinVar.values).sum(axis = 1), columns = ["$r_{Var_{Arg_{min}}}$"])
    R_EMV.insert(2, R_MinVar.columns[0], R_MinVar.values)

    Muopt_MinVar = np.dot(opt_MinVar.x.T, mean_ret) 
    Sopt_MinVar = np.sqrt(np.dot(opt_MinVar.x.T, np.dot(cov, opt_MinVar.x)))
    # Columns (not rows): 2 = Min. Variance portfolio returns, 1 = market returns
    Beta_MinVar = np.divide((np.cov(R_EMV.iloc[:, 2], R_EMV.iloc[:, 1])[0][1]), R_EMV.iloc[:, 1].var())
    SR_MinVar = (Muopt_MinVar - rf) / Sopt_MinVar 

    #-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    #opt_Traynor = 
    
    #-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Mu, Sigma, Beta, SR = [Muopt_EMV, Muopt_MinVar], [Sopt_EMV, Sopt_MinVar], [Beta_EMV, Beta_MinVar], [SR_EMV, SR_MinVar]
    index = ["$r_{P{Sharpe_{Arg_{max}}}}$", "$r_{Var_{Arg_{min}}}$"]
    Popt = [pd.DataFrame({r"$\mu_P$" : Mu[i], r"$\sigma_P$" : Sigma[i], r"$\beta_{P}$": Beta[i], "$R_{Sharpe}$" : SR[i]},
                          index = [index[i]]) for i in range(0, len(Mu))]
    
    Popt[0].index.name = title
    Popt[1].index.name = title
    R_EMV = R_EMV[[R_EMV.columns[1], R_EMV.columns[2], R_EMV.columns[0]]]
    #Get the cumulative returns with cumsum for rmkt, rEMV and rMinVar
    accum = R_EMV.cumsum()

    Argmax = [d.Markdown(tabulate(Popt[i], headers = "keys", tablefmt = "pipe")) for i in range(0, len(Popt))]
    R_EMV = d.Markdown(tabulate(R_EMV, headers = "keys", tablefmt = "pipe"))
    
    return Argmax, R_EMV, accum

In [None]:
bench_md = "$S\&P500_{{20_{03}-23_{05}}}$"
Argmax, R_EMV, accum = vs.Optimizer(SP_Assets_r.loc["2020-03-02":today], SP_r.loc["2020-03-02":today], 0.0169, bench_md)

Port = display(Argmax[0], Argmax[1])

In [None]:
d.Markdown(tabulate(accum.dropna()[0:10], headers = "keys", tablefmt = "pipe")) 
#Non sliced: d.Markdown(tabulate(accum.diff().dropna()[], headers = "keys", tablefmt = "pipe")) 

In [None]:

d.display(d.Markdown(tabulate(accum[0:10], headers = "keys", tablefmt = "pipe")))



In [None]:
vs.Accum_ts(accum)

##### <font color= 'blue'> Metrics: <font>

<div class="alert alert-block alert-info">

Confusion Matrix: <br> $\begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix}$

Metrics:

- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$ or the share of + and - samples that the classifier labels correctly.

- Precision: $\frac{TP}{TP + FP}$ or the ability of the classifier not to label - samples as +.

- Recall: $\frac{TP}{TP + FN}$ or the ability of the classifier to find all + samples.

- F1 Score: $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ or the balance between Precision and Recall through their harmonic mean.    

- ROC AUC: $\int_0^1 TPR \, d(FPR)$, the area under the curve of $TPR$ against $FPR$, or the ability of the classifier to rank + samples above - samples. Where a bigger number denotes a better model.<br><br><br>
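
The count-based metrics can be sketched directly from the entries of the Confusion Matrix. The function name and the counts are hypothetical; ROC AUC is left out because it requires the scores of the classifier rather than the four counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(accuracy, precision, round(recall, 4), round(f1, 4))
```

In this example Precision is higher than Recall, so the F1 Score falls between the two and closer to the lower one, as expected from a harmonic mean.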

#### <span style='color:lightyellow'> ~ *Past performance is not a guarantee of future results, the stock market tends to be irrational.* <font>

Note: <br>
Do not consider the results and/or their procedures as investment advice or a recommendation.