# 
## <p style="background-color:white;font-family:newtimeroman;color:coral;font-size:100%;text-align:center;border-radius:20px 60px;">MARKOWITZ's MODERN PORTFOLIO THEORY FOR S and P 500 TECHNOLOGY STOCKS INVESTMENT OPTIMIZATION</p>
![](https://3.bp.blogspot.com/-L3bXJ18DfgY/WLWMIFWnd5I/AAAAAAAABRQ/nWxTcueDJt8H8fiZesjUCctJ8UJE6nKmgCLcB/s640/Untitled.png)

By Adedayo Adeboye Abiodun


Technology stocks in the S&P 500 can move quickly, so allocation decisions benefit from a quantitative footing. This project applies Modern Portfolio Theory (MPT) to construct a diversified portfolio, using historical prices downloaded from Yahoo Finance and the list of S&P 500 constituents from Wikipedia to estimate risk, return, and asset weights.

The goal of this Python project is twofold: to identify promising technology stocks within the S&P 500, and to allocate capital among them so that the expected return is as high as possible for the risk taken. The workflow combines data analysis, statistical modeling, and numerical optimization.

The project is intended for both novice and seasoned investors as a hands-on exercise in applying quantitative methods to finance. By the end, you will have the tools to build and evaluate a technology portfolio of your own.

The sections below walk through data collection, exploratory analysis, Monte Carlo simulation, SciPy optimization, and the efficient frontier.

Here is the list of Python libraries used:
    
     1. datetime
     2. pandas
     3. yfinance
     4. numpy
     5. matplotlib.pyplot
     6. seaborn
     7. plotly.express
     8. scipy.stats.mstats
     9. scipy.optimize

In [1]:
import datetime as dt
from datetime import date
import pandas as pd
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from scipy.stats.mstats import gmean
import warnings
warnings.filterwarnings("ignore")
from scipy import optimize

ModuleNotFoundError: No module named 'yfinance'

In [None]:
 pip uninstall -y yfinance

In [None]:
pip install yfinance==0.1.83

### Web Scraping
Gathering data about the technology companies listed on the S&P 500 index from Wikipedia

In [None]:
sp500_url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

In [None]:
sp500_table = pd.read_html(sp500_url)

In [None]:
# Check how many tables were parsed from the page

len(sp500_table)

In [None]:
# Select the relevant data table (index 0 holds the current constituents)

sp500_table = sp500_table[0]

In [None]:
sp500_table

In [None]:
# Select only information technology dataframe

sp500_table_tech = sp500_table[sp500_table['GICS Sector'] == 'Information Technology']

In [None]:
sp500_table_tech.head(4)

In [None]:
# Generate tickers for IT companies

tickers_IT = sp500_table_tech['Symbol'].tolist()

In [None]:
# Download historical prices from Yahoo Finance
today = date.today()

sp500_table_tech_priceIT = yf.download(tickers_IT ,start = '2023-01-10', end = today)['Adj Close']

In [None]:
sp500_table_tech_priceIT

### DESCRIPTIVE ANALYSIS

In [None]:
sp500_table_tech.describe()

In [None]:
sp500_table_tech.info()

In [None]:
sp500_table_tech.notnull().sum()

### Exploratory Data Analysis

In [None]:
# List the distinct GICS sub-industries among the IT companies

tickers_ID = sp500_table_tech['GICS Sub-Industry'].unique()

In [None]:
tickers_ID = tickers_ID.tolist()

In [None]:
tickers_ID

In [None]:
len(tickers_ID)

In [None]:
# Check the correlation between symbols

plt.figure(figsize = (10,10))
sns.heatmap(sp500_table_tech_priceIT.corr(),annot=True, cmap="coolwarm")

In [None]:
# Check the data distribution

sp500_table_tech_priceIT.hist(bins = 50 ,figsize = (20,15))
plt.show()

In [None]:
# Apply the stack function to reshape the dataframe from wide to long format

sp500_table_tech_priceITranspos = pd.DataFrame(sp500_table_tech_priceIT.stack())

In [None]:
# Reset the index using reset_index function

sp500_table_tech_priceITranspos = sp500_table_tech_priceITranspos.reset_index()
sp500_table_tech_priceITranspos

In [None]:
# Rename the columns

sp500_table_tech_priceITranspos.rename({'level_1': 'Symbol', 0: 'Close'},
          axis = "columns", inplace = True)

In [None]:
sp500_table_tech_priceITranspos

#### Merge sp500_table_tech_priceITranspos with sp500_table_tech on Symbol

In [None]:
# merge

sp500_table_tech_priceITranspos  = sp500_table_tech_priceITranspos.merge(sp500_table_tech[['Symbol','GICS Sub-Industry']], on = 'Symbol')
# sp500_table_tech_priceITranspos['Date'] = pd.to_datetime(sp500_table_tech_priceITranspos['Date'])
sp500_table_tech_priceITranspos.head()

In [None]:
plt.figure(figsize=(15, 6))
ax = sns.countplot(y='GICS Sub-Industry', data=sp500_table_tech_priceITranspos)
plt.xticks(rotation=45)

In [None]:
# convert sp500_table_tech_priceITranspos to pivot_table

IndustryIT_pivot = pd.pivot_table(sp500_table_tech_priceITranspos, values = 'Close', index = ['Date'],columns = ['GICS Sub-Industry']).reset_index()
IndustryIT_pivot

In [None]:
IndustryIT_pivot.columns.name = None
IndustryIT_pivot

In [None]:
IndustryIT_pivot = IndustryIT_pivot.reset_index(drop = True)
IndustryIT_pivot

In [None]:
IndustryIT_pivot = IndustryIT_pivot.set_index('Date')
IndustryIT_pivot

### Selection Based on Correlation

In [None]:
plt.figure(figsize = (12,10))
sns.heatmap(IndustryIT_pivot.corr(),annot=True, cmap="coolwarm")

We've generated a correlation matrix for our sub-industries, with higher correlations indicating groups that move in tandem. In simple terms, highly correlated assets tend to rise or fall together. Note that the matrix above is computed on price levels; correlations of returns are usually the more relevant input for portfolio construction.

For a well-diversified portfolio, investors look for assets with low or negative correlation, which reduces overall risk. In a two-asset portfolio with negative correlation, when one falters, the other often holds up or gains. Conversely, a portfolio of highly positively correlated assets offers little diversification, so investors who hold one are accepting heightened risk in pursuit of greater returns.
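
To make the effect of correlation concrete, here is a minimal sketch using the standard two-asset portfolio variance formula. The function name `two_asset_volatility` and the input numbers (two assets with 20% volatility held 50/50) are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import numpy as np

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return float(np.sqrt(variance))

# Hypothetical inputs: both assets have 20% annual volatility, held 50/50.
for rho in (-0.5, 0.0, 0.9):
    vol = two_asset_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.3f}")
```

With these inputs the portfolio volatility rises as the correlation rises, and it only reaches the 20% of the individual assets when the correlation is exactly 1.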

In [None]:
# Check the statistical distribution of the data

IndustryIT_pivot.hist(bins = 50 ,figsize = (20,15))
plt.show()

Histograms show how the values of each series are distributed, for example whether they are skewed or contain extreme values. This helps when judging which statistical assumptions are reasonable for later modeling.

#  

### Selection Based on pct_change

Here we look at the percentage change between each value and the previous one in every column. This helps us analyze how each numeric series changes over time.
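
As a small illustration, the sketch below compares `pct_change` (simple returns) with the log returns used later in the notebook. The three prices are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 110.0, 99.0], name="price")

simple_ret = prices.pct_change()              # (p_t - p_{t-1}) / p_{t-1}
log_ret = np.log(prices / prices.shift(1))    # ln(p_t / p_{t-1})

print(pd.DataFrame({"simple": simple_ret, "log": log_ret}))

# Log returns add up over time: their sum is the log of the total price ratio
total_growth = float(np.exp(log_ret.dropna().sum()))
print(f"total growth factor: {total_growth:.2f}")
```

The first row is `NaN` in both columns because the first price has no predecessor. For small moves the two measures are close, which is why either can be used for a quick visual check.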

In [None]:
# px.line(IndustryIT_pivot * 100 / IndustryIT_pivot.iloc[0])

ret_port = IndustryIT_pivot.pct_change()
px.line(ret_port)

### Calculate Volatility and Sharpe Ratio of the Portfolio

In [None]:
np.random.seed(1)
# One random weight per sub-industry column (avoids hard-coding the asset count)
n_assets = len(IndustryIT_pivot.columns)
weights = np.random.random((n_assets, 1))
# Normalize so the weights sum to 1
weights /= np.sum(weights)
print(f'The normalized Weights :  {weights}')


# Daily log returns
log_return = np.log(IndustryIT_pivot / IndustryIT_pivot.shift(1))
log_return

# Expected return (weighted sum of mean daily returns),
# annualized by multiplying by 252 trading days
exp_return = log_return.mean().dot(weights)*252
print(f'\nExpected return of the portfolio is : {exp_return[0]}')


# Expected volatility (risk): sqrt(w' * annualized covariance * w)
exp_vol = np.sqrt(weights.T.dot(252*log_return.cov().dot(weights)))
print(f'\nVolatility of the portfolio: {exp_vol[0][0]}')


# Sharpe ratio (risk-free rate taken as zero here)
sr = exp_return / exp_vol
print(f'\nSharpe ratio of the portfolio: {sr[0][0]}')

Diversification Analysis: Normalized weights help assess portfolio diversification. When the weights are spread fairly evenly across holdings, no single holding dominates, which reduces exposure to asset-specific risk.

In summary, normalized weights describe the composition of a portfolio and feed directly into the return, volatility, and Sharpe ratio calculations above, aiding investors in making informed decisions regarding diversification, risk management, and performance evaluation.
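
For reference, the three quantities computed in the cell above can be written compactly. The symbols are introduced here for illustration: $w$ is the vector of normalized weights, $\bar{r}$ the vector of mean daily log returns, and $\Sigma$ the covariance matrix of daily log returns. The Sharpe ratio in that cell takes the risk-free rate to be zero.

$$
\mu_p = 252\, w^\top \bar{r}, \qquad
\sigma_p = \sqrt{252\, w^\top \Sigma\, w}, \qquad
\text{SR} = \frac{\mu_p}{\sigma_p}
$$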

##

### Selection Based on Monte Carlo Simulation

In a Monte Carlo Simulation, we randomly allocate weights to securities, evaluate returns and risks, and pinpoint the optimal weight combination for maximum return at a defined risk level. Alternatively, SciPy can solve this optimization problem.
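
The same idea can be sketched without an explicit Python loop. The example below is a vectorized variant run on synthetic returns, not the notebook's data: the four assets, the 500 days, and the names `daily`, `mean_ann`, and `cov_ann` are assumptions made for illustration, and the risk-free rate is taken as zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 500 days of synthetic daily log returns for 4 assets
daily = rng.normal(loc=0.0005, scale=0.01, size=(500, 4))
mean_ann = daily.mean(axis=0) * 252            # annualized mean return per asset
cov_ann = np.cov(daily, rowvar=False) * 252    # annualized covariance matrix

n_sims = 5_000
w = rng.random((n_sims, 4))
w /= w.sum(axis=1, keepdims=True)              # each row of weights sums to 1

port_ret = w @ mean_ann                                       # one return per row
port_vol = np.sqrt(np.einsum("ij,jk,ik->i", w, cov_ann, w))   # sqrt(w' C w) per row
sharpe = port_ret / port_vol                                  # risk-free rate assumed 0

best = sharpe.argmax()
print("best weights:", w[best].round(3))
print("Sharpe ratio:", round(float(sharpe[best]), 3))
```

Computing all portfolios at once with matrix operations is usually much faster than looping, which matters when the number of simulations is large.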

In [None]:
# number of simulations
n = 50_000
# n = 10

num_securities = len(IndustryIT_pivot.columns)

port_weights = np.zeros(shape=(n, num_securities))
port_volatility = np.zeros(n)
port_sr = np.zeros(n)
port_return = np.zeros(n)

# Annualize once, outside the loop (252 trading days per year)
mean_annual = log_return.mean().values * 252
cov_annual = log_return.cov().values * 252

for i in range(n):
    # Random weight for each security
    weights = np.random.random(num_securities)
    # Normalize so that the sum is one
    weights /= np.sum(weights)
    port_weights[i,:] = weights

    # Expected return (weighted sum of annualized mean returns)
    exp_ret = mean_annual.dot(weights)
    port_return[i] = exp_ret

    # Expected volatility (risk)
    exp_vol = np.sqrt(weights.T.dot(cov_annual.dot(weights)))
    port_volatility[i] = exp_vol

    # Sharpe ratio (risk-free rate taken as zero)
    sr = exp_ret / exp_vol
    port_sr[i] = sr

In [None]:
# Index of max Sharpe Ratio
max_sr = port_sr.max()
ind = port_sr.argmax()
# Return and Volatility at Max SR
max_sr_ret = port_return[ind]
max_sr_vol = port_volatility[ind]

In [None]:
plt.figure(figsize=(20,15))
plt.scatter(port_volatility,port_return,c=port_sr, cmap='plasma')
plt.colorbar(label='Sharpe Ratio')
plt.xlabel('Volatility', fontsize=15)
plt.ylabel('Return', fontsize=15)
plt.title('Efficient Frontier (Bullet Plot)', fontsize=15)
plt.scatter(max_sr_vol, max_sr_ret, c='blue', s=150, edgecolors='red', marker='o',
            label='Max Sharpe ratio Portfolio')
plt.legend();

The risk-return scatter plot of the simulated portfolios can provide several insights:

1. Risk-Return Relationship: Each point is one simulated portfolio, positioned by its volatility (x-axis) and expected return (y-axis). Portfolios toward the upper right offer higher returns with higher volatility, while those toward the lower left have lower returns and lower volatility.

2. Risk Diversification: Because each portfolio mixes the same assets in different proportions, the spread of points shows how much weighting alone can change overall risk. Combining assets with low or negative correlation is what pulls portfolio volatility down.

3. Outliers: Points far from the main cloud correspond to weight combinations with unusually high or low volatility or return. These may warrant further investigation, as they often reflect heavy concentration in a single asset.

4. Sector Analysis: The same kind of plot can be built for individual assets grouped by sector to reveal sector-specific patterns. Some sectors, such as technology or biotech, tend to exhibit higher volatility than more stable sectors like utilities or consumer staples.

5. Historical Trends: Repeating the simulation over different time windows shows how the risk-return trade-off has evolved, which can be valuable for assessing changing market conditions.

6. Portfolio Optimization: The color scale (Sharpe ratio) and the highlighted maximum-Sharpe point help you select a weight combination that aligns with your risk tolerance and return objectives.

While this plot provides useful insights, it should be used in conjunction with other financial analysis tools to make informed investment decisions. Historical volatility does not always predict future volatility, so risk assessment should be revisited as conditions change.

In [None]:
# port_weights columns follow IndustryIT_pivot.columns, so label with those names
for weight, industry in zip(port_weights[ind], IndustryIT_pivot.columns):
    print(f'{round(weight * 100, 2)} % of {industry} should be bought.')

# best portfolio return
print(f'\nMarkowitz optimal portfolio return is : {round(max_sr_ret * 100, 2)}% with volatility \
{max_sr_vol}')

#### Optimization

### Using SciPy to Maximize the Sharpe Ratio

In [None]:
log_mean = log_return.mean() * 252
cov = log_return.cov() * 252

In [None]:
# Some helper functions
def get_ret_vol_sr(weights):
    weights = np.array(weights)
    ret = log_mean.dot(weights)
    vol = np.sqrt(weights.T.dot(cov.dot(weights)))
    sr = ret / vol
    return np.array([ret, vol, sr])

# Negate the Sharpe ratio: SciPy minimizes, so minimizing -SR maximizes SR
def neg_sr(weights):
    return get_ret_vol_sr(weights)[-1] * -1

# Constraint helper: returns 0 when the weights sum to 1
def check_sum(weights):
    return np.sum(weights) - 1

# Constraints for the optimization problem
cons = {'type':'eq','fun':check_sum}
# Bounds on weights: each between 0 and 1 (long-only), one pair per asset
n_assets = len(log_mean)
bounds = tuple((0, 1) for _ in range(n_assets))
# Initial guess: equal weights, which already satisfy the sum-to-one constraint
init_guess = [1 / n_assets for _ in range(n_assets)]


# Call minimizer
opt_results = optimize.minimize(neg_sr, init_guess, constraints=cons, bounds=bounds, method='SLSQP')

In [None]:
optimal_weights = opt_results.x
# optimal_weights
# Weights are ordered like log_mean.index (the IndustryIT_pivot columns)
for industry, i in zip(log_mean.index, optimal_weights):
    print(f'Sub-industry {industry} has weight {np.round(i*100,2)} %')

In [None]:
mc_weights = port_weights[ind]
for st, i in zip(log_mean.index, mc_weights):
    print(f'Sub-industry {st} has weight {np.round(i*100,2)} %')

Diversification: The weight of an asset in a portfolio indicates how much exposure an investor has to that particular asset. A well-diversified portfolio often has assets with different weights to spread risk effectively.

Risk Assessment: A high weight in a single asset can indicate concentration risk. If that asset performs poorly, it can significantly impact the overall performance of the portfolio.

In summary, the weight of an asset in stock analysis is a crucial metric for understanding portfolio composition, risk, and performance. It can help investors make informed decisions about asset allocation and assess the impact of individual holdings on their overall investment strategy.

In [None]:
# Comparing two results we see that we get very close results
(optimal_weights - mc_weights)

In [None]:
get_ret_vol_sr(optimal_weights), get_ret_vol_sr(mc_weights)

print('For a given portfolio we have: (Using SciPy optimizer)\n \n')
for i, j in enumerate('Return Volatility SharpeRatio'.split()):
    print(f'{j} is : {get_ret_vol_sr(optimal_weights)[i]}\n')

print('For a given portfolio we have: (Using Monte Carlo)\n \n')
for i, j in enumerate('Return Volatility SharpeRatio'.split()):
    print(f'{j} is : {get_ret_vol_sr(mc_weights)[i]}\n')

So the Monte Carlo search and the optimizer give very close results. Monte Carlo works well here, but it becomes computationally heavy as the number of assets grows, which is where the SciPy optimizer is especially useful.

1. *SciPy Optimizer-Generated Weights*:
   - *Purpose*: SciPy optimizers are typically used in mathematical optimization problems to find the best set of parameters that minimize or maximize a specific objective function.
   - *Usage*: These weights are often used to optimize models in various domains, such as machine learning (e.g., neural network weights), engineering (e.g., optimizing control systems), and finance (e.g., portfolio optimization). They represent the optimal solution to a specific problem.

2. *Monte Carlo Simulation-Generated Weights*:
   - *Purpose*: Monte Carlo methods repeatedly sample random inputs and assess the outcomes. In finance they are often used for risk analysis and option pricing; in this notebook they serve as a random search over candidate weight vectors.
   - *Usage*: Each sampled weight vector is scored by return, volatility, and Sharpe ratio, and the best sample is kept. The result approximates the optimum, and the full set of samples also shows the range of risk-return outcomes that different allocations produce.

In summary, the SciPy optimizer solves the optimization problem directly and deterministically, while the Monte Carlo approach approximates the solution by random sampling and, as a by-product, maps out the space of feasible portfolios. The choice between them depends on the problem size and on whether you need only the optimum or also a picture of the alternatives.

### Frontier curve
Best return for given volatility or vice versa.

In [None]:
frontier_y = np.linspace(port_return.min(), port_return.max(), 100)
frontier_vol = []

def minimize_vol(weights):
    return get_ret_vol_sr(weights)[1]

for possible_ret in frontier_y:
    cons = ({'type':'eq','fun':check_sum},
            {'type':'eq','fun':lambda w:get_ret_vol_sr(w)[0] - possible_ret})
    result = optimize.minimize(minimize_vol, init_guess, method='SLSQP', constraints=cons, bounds=bounds)
    frontier_vol.append(result['fun'])

In [None]:
plt.figure(figsize=(20,15))
plt.scatter(port_volatility,port_return,c=port_sr, cmap='plasma')
plt.colorbar(label='Sharpe Ratio')
plt.xlabel('Volatility', fontsize=15)
plt.ylabel('Return', fontsize=15)
plt.title('Efficient Frontier', fontsize=15)
plt.scatter(max_sr_vol, max_sr_ret, c='blue', s=150, edgecolors='red', marker='o')

plt.plot(frontier_vol, frontier_y, c='magenta', ls='--', lw=3, label='Efficient Frontier')
plt.legend();

#### Frontier curve
Selecting the portfolio with the highest Sharpe ratio is a widely accepted approach, though personal preferences can also play a role in decision-making. The efficient frontier gives the maximum expected return achievable for a given level of risk. Any portfolio situated below the frontier is therefore generally considered suboptimal, because another portfolio offers a higher expected return at the same risk.
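
The leftmost point of the frontier is the minimum-variance portfolio. As a minimal sketch, it has a closed form when the only constraint is that the weights sum to one. This differs from the optimization above, which also bounds each weight between 0 and 1, so the closed form may return negative weights. The covariance matrix below is hypothetical.

```python
import numpy as np

# Hypothetical annualized covariance matrix for three assets
cov_example = np.array([
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
])

ones = np.ones(len(cov_example))
inv_cov_ones = np.linalg.solve(cov_example, ones)   # solves cov_example @ x = 1
w_min = inv_cov_ones / inv_cov_ones.sum()           # rescale so weights sum to 1
vol_min = float(np.sqrt(w_min @ cov_example @ w_min))

# Compare with an equally weighted portfolio of the same assets
w_equal = ones / len(ones)
vol_equal = float(np.sqrt(w_equal @ cov_example @ w_equal))

print("minimum-variance weights:", w_min.round(3))
print(f"volatility: {vol_min:.4f} (equal weights: {vol_equal:.4f})")
```

No fully invested portfolio of these assets can have lower volatility than `vol_min`, so every point on the frontier lies to its right.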


##
#### Portfolio selection by Sector Companies

In [None]:
# Sort so each symbol's rows are contiguous and in date order, then compute
# the daily log return per symbol (the +1 turns it into an approximate growth factor)
sp500_table_tech_priceITranspos = sp500_table_tech_priceITranspos.sort_values(['Symbol', 'Date'])
sp500_table_tech_priceITranspos['return'] = np.log(sp500_table_tech_priceITranspos.Close / sp500_table_tech_priceITranspos.groupby('Symbol').Close.shift(1)) + 1
# The first row of each symbol has no previous close, so its return is NaN
sp500_table_tech_priceITranspos.dropna(inplace = True)

In [None]:
risk_free = 0.034
# Annualize both the mean return (x 252) and the standard deviation (x sqrt(252))
# so the Sharpe ratio compares quantities on the same time scale
Industry_df = pd.DataFrame({'return' : (sp500_table_tech_priceITranspos.groupby('GICS Sub-Industry')['return'].mean() - 1) * 252, 'stdev' : sp500_table_tech_priceITranspos.groupby('GICS Sub-Industry')['return'].std() * np.sqrt(252)})
Industry_df['sharpe'] = (Industry_df['return'] - risk_free) / Industry_df['stdev']
plt.figure(figsize = (12,8))
ax = sns.barplot(x= Industry_df['sharpe'], y = Industry_df.index)

#### The Sharpe ratio

The Sharpe ratio is a common metric for evaluating risk-adjusted performance: the return in excess of the risk-free rate divided by volatility, with higher values indicating better compensation for risk. An annualized Sharpe ratio above 1 is commonly treated as acceptable. Here, we keep only the sub-industries whose Sharpe ratio is at least 1.
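
A minimal sketch of an annualized Sharpe ratio computed from daily returns is shown below. The helper name `annualized_sharpe` and the sample returns are hypothetical. The default `risk_free=0.034` mirrors the value used in this notebook, and 252 trading days per year is assumed.

```python
import numpy as np

def annualized_sharpe(daily_returns, risk_free=0.034, periods=252):
    """Annualized Sharpe ratio from a sequence of daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    ann_return = r.mean() * periods              # mean return scales with time
    ann_vol = r.std(ddof=1) * np.sqrt(periods)   # volatility scales with sqrt(time)
    return float((ann_return - risk_free) / ann_vol)

# Hypothetical daily returns, for illustration only
sample = np.array([0.010, -0.010, 0.020, 0.000])
print(f"annualized Sharpe ratio: {annualized_sharpe(sample):.2f}")
```

The return is scaled by 252 and the standard deviation by the square root of 252, so both sides of the ratio are expressed per year before they are compared with the annual risk-free rate.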

#

In [None]:
port_list = Industry_df[Industry_df['sharpe'] >= 1].index
port_list

In [None]:
sp500_table_tech_priceITranspos

Once we've compiled the list of sub-industries, we'll select the top-performing stock from each one. In practice, you would typically choose multiple stocks, but for the sake of simplicity in this example, I'll pick just one for illustration.

In [None]:
port_stock = []
return_stock = []
def get_stock(industry):
    # Symbols that belong to this sub-industry
    list_stocks = sp500_table_tech_priceITranspos[sp500_table_tech_priceITranspos['GICS Sub-Industry'] == industry]['Symbol'].unique()
    # Annualized geometric-mean return per symbol, best first
    performance = sp500_table_tech_priceITranspos.groupby('Symbol')['return'].apply(lambda x : (gmean(x) - 1) * 252).sort_values(ascending = False)

    # Keep the best-performing symbol from this sub-industry
    for i in range(len(performance)):
        if performance.index[i] in list_stocks:
            port_stock.append(performance.index[i])
            return_stock.append(performance.iloc[i])
            break

for industry in port_list:
    get_stock(industry)

return_stock

In [None]:
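# Keyword arguments are required by recent pandas; reorder columns to match port_stock and return_stock
port_df = sp500_table_tech_priceITranspos[sp500_table_tech_priceITranspos['Symbol'].isin(port_stock)].pivot(index='Date', columns='Symbol', values='return')[port_stock]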
port_df

### Portfolio risk and return calculation



In [None]:
return_pred = []
weight_pred = []
std_pred = []
# Covariance of daily returns, computed once outside the loop
port_cov = port_df.cov()
for i in range(1000):
    # Random weights that are non-negative and sum to 1
    random_matrix = np.array(np.random.dirichlet(np.ones(len(port_stock)),size=1)[0])
    # Annualized portfolio volatility
    sim_std = np.sqrt(np.dot(random_matrix.T, np.dot(port_cov,random_matrix))) * np.sqrt(252)
    sim_return = np.dot(return_stock, random_matrix)
    return_pred.append(sim_return)
    std_pred.append(sim_std)
    weight_pred.append(random_matrix)

In [None]:
pred_output = pd.DataFrame({'weight' : weight_pred , 'return' : return_pred, 'stdev' :std_pred })
pred_output['sharpe'] = (pred_output['return'] - risk_free) / pred_output['stdev']
pred_output.head()

In [None]:
# idxmax/idxmin return index labels, so select the rows with .loc
max_pos = pred_output.loc[pred_output.sharpe.idxmax()]
safe_pos = pred_output.loc[pred_output.stdev.idxmin()]

After running 1,000 simulations, we plot the results and mark two candidate portfolios: the one with the highest Sharpe ratio and the lowest-volatility one for risk-averse investors.

In [None]:
plt.subplots(figsize=(15,10))

plt.scatter(pred_output.stdev,pred_output['return'],c=pred_output.sharpe,cmap='OrRd')
plt.colorbar()
plt.xlabel('Volatility')
plt.ylabel('Return')

plt.scatter(max_pos.stdev,max_pos['return'],marker='^',color='r',s=500)
plt.scatter(safe_pos.stdev,safe_pos['return'],marker='<',color='g',s=500)
#ax.plot()

In [None]:
print("The highest Sharpe portfolio is {} Sharpe, at {} volatility".format(round(max_pos.sharpe, 3), round(max_pos.stdev, 3)))

for i in range(len(port_stock)):
    print("{} : {}%".format(port_stock[i],(max_pos.weight[i] * 100).round(3)))

In [None]:
print("The safest portfolio is {} risk, {} Sharpe".format(round(safe_pos.stdev, 3), round(safe_pos.sharpe, 3)))
for i in range(len(port_stock)):
    print("{} : {}%".format(port_stock[i],(safe_pos.weight[i] * 100).round(3)))

## Conclusion

In conclusion, when applying Markowitz's portfolio theory to the technology sector of the S&P 500, our portfolio selection was based on several key insights. First, we considered the correlation among sub-industries, favoring assets with low or negative correlation to reduce risk. This promotes a well-diversified portfolio. Investors seeking higher returns at the cost of increased risk may instead accept positively correlated holdings.

Additionally, we used percentage changes to better understand how the values in our dataset evolved over time, aiding in our analysis of stock performance. This allowed us to track the dynamics of the stocks in our portfolio.

Furthermore, we used a Monte Carlo simulation to randomly allocate weights to securities and SciPy optimization to solve for the weights directly, in both cases looking for the combination with the highest Sharpe ratio. This approach helps investors make data-driven decisions regarding their portfolio composition.

When constructing our portfolio, we considered the efficient frontier and aimed to select the portfolio with the highest Sharpe ratio. The efficient frontier represents the maximum expected return achievable for a given level of risk, and portfolios positioned below it are generally considered suboptimal. When screening sub-industries, we kept those with a Sharpe ratio of at least 1 to favor stronger risk-adjusted performance.

In summary, our approach to portfolio selection based on correlation, percentage changes, Monte Carlo Simulation, and the Sharpe ratio provides a comprehensive framework for constructing a technology S&P 500 portfolio that balances risk and return to meet investors' financial goals and risk tolerance.