<a href="https://colab.research.google.com/github/TemiOyee/Quantitative-Analysis-of-Stock-Market/blob/main/Quantitative_Analysis_of_Stock_Market.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Quantitative Analysis of Stock Market**

## **Problem Statement**

The stock market is a dynamic and complex environment, where the behavior of individual stocks and their interrelationships can have significant implications for investors, traders, and market participants. In this project, we aim to perform a comprehensive quantitative analysis of stock market data, leveraging various statistical and visualization techniques to uncover valuable insights. By analyzing historical stock data, we can gain a deeper understanding of market trends, volatility, correlations, and risk-return dynamics, enabling informed decision-making processes.

## **Aim and Objectives**
The primary aim of this project is to perform a comprehensive quantitative analysis of stock market data, leveraging various statistical, technical analysis, portfolio optimization, sentiment analysis, and machine learning techniques to uncover valuable insights and support data-driven decision-making processes. The objectives are as follows:


1. Compute and analyze descriptive statistics for individual stocks, such as mean, median, standard deviation, and quartile ranges.
2. Visualize and interpret time series data for stock closing prices, identifying trends, patterns, and potential market cycles.
3. Evaluate the volatility of different stocks by calculating and comparing their standard deviations.
4. Analyze the correlation between various stocks, highlighting potential diversification opportunities or sector-specific trends.
5. Perform a comparative analysis of stock performance by calculating the percentage change in closing prices over the given time period.
6. Investigate the risk-return trade-off by plotting the average daily return against the associated risk (standard deviation) for each stock.
7. Calculate and visualize popular technical analysis indicators, such as moving averages, Bollinger Bands, and the Relative Strength Index (RSI), to identify potential buy/sell signals.
8. Conduct Monte Carlo simulations to estimate the potential range of future stock prices based on historical data and various assumptions.
9. Implement portfolio optimization techniques, such as the Markowitz model, to construct an optimal portfolio based on risk-return preferences.


## **Implementation**
The implementation of this project leverages the Python programming language and several data analysis and visualization libraries, including Pandas, NumPy, SciPy, Plotly Express, and Plotly Graph Objects.

### **Data Loading and Exploration**

In this step, we import the necessary libraries, including Pandas, which is a popular data manipulation and analysis library. The pd.read_csv() function is used to load the stock market data from a CSV file named "stocks.csv" into a Pandas DataFrame called stocks_data.

To gain an initial understanding of the dataset, we print the first few rows using the head() method. This allows us to examine the column names and ensure that the data has been loaded correctly.

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_white"

# Load the dataset
stocks_data = pd.read_csv("stocks.csv")

# Display the first few rows of the dataset
print(stocks_data.head())

  Ticker        Date        Open        High         Low       Close  \
0   AAPL  2023-02-07  150.639999  155.229996  150.639999  154.649994   
1   AAPL  2023-02-08  153.880005  154.580002  151.169998  151.919998   
2   AAPL  2023-02-09  153.779999  154.330002  150.419998  150.869995   
3   AAPL  2023-02-10  149.460007  151.339996  149.220001  151.009995   
4   AAPL  2023-02-13  150.949997  154.259995  150.919998  153.850006   

    Adj Close    Volume  
0  154.414230  83322600  
1  151.688400  64120100  
2  150.639999  56007100  
3  151.009995  57450700  
4  153.850006  62199000  
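
As a quick sanity check after loading, a minimal sketch such as the following can confirm column types and look for missing values. It reads the five AAPL rows printed above from an in-memory string rather than from stocks.csv, so the names `sample_csv` and `sample` are illustrative only.

```python
import io
import pandas as pd

# The five AAPL rows shown above, embedded as text so the sketch runs without stocks.csv
sample_csv = """Ticker,Date,Open,High,Low,Close,Adj Close,Volume
AAPL,2023-02-07,150.639999,155.229996,150.639999,154.649994,154.414230,83322600
AAPL,2023-02-08,153.880005,154.580002,151.169998,151.919998,151.688400,64120100
AAPL,2023-02-09,153.779999,154.330002,150.419998,150.869995,150.639999,56007100
AAPL,2023-02-10,149.460007,151.339996,149.220001,151.009995,151.009995,57450700
AAPL,2023-02-13,150.949997,154.259995,150.919998,153.850006,153.850006,62199000
"""

# parse_dates converts the Date column to datetime64 while reading
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Date"])

missing_per_column = sample.isna().sum()               # missing values per column
rows_per_ticker = sample["Ticker"].value_counts()      # number of rows for each ticker
high_ge_low = (sample["High"] >= sample["Low"]).all()  # basic price consistency check

print(sample.dtypes)
print(missing_per_column)
print(rows_per_ticker)
```

Parsing dates at load time is one option; converting later with pd.to_datetime(), as done in the time series section, gives the same column type.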


### **Descriptive Statistics**
Descriptive statistics provide a concise summary of the key characteristics of the data.

In this section, we group the data by the 'Ticker' column and calculate various descriptive statistics for the 'Close' column, such as count, mean, standard deviation, minimum, and maximum values. The groupby() and describe() methods are used to achieve this.

In [2]:
# Descriptive Statistics for each stock
descriptive_stats = stocks_data.groupby('Ticker')['Close'].describe()

print(descriptive_stats)

        count        mean        std         min         25%         50%  \
Ticker                                                                     
AAPL     62.0  158.240645   7.360485  145.309998  152.077499  158.055000   
GOOG     62.0  100.631532   6.279464   89.349998   94.702501  102.759998   
MSFT     62.0  275.039839  17.676231  246.270004  258.742500  275.810013   
NFLX     62.0  327.614677  18.554419  292.760010  315.672493  325.600006   

               75%         max  
Ticker                          
AAPL    165.162506  173.570007  
GOOG    105.962503  109.459999  
MSFT    287.217506  310.649994  
NFLX    338.899994  366.829987  
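
The describe() output already includes the quartiles. If the median and interquartile range are wanted as explicit columns, a sketch along these lines could be used. The tickers and prices below are made up for illustration, and the `iqr` helper and the `cv` column are additions that do not appear in the notebook.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers
toy = pd.DataFrame({
    "Ticker": ["AAA"] * 5 + ["BBB"] * 5,
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0, 90.0, 110.0, 95.0, 105.0],
})

def iqr(series):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return series.quantile(0.75) - series.quantile(0.25)

# Named aggregation: each keyword becomes a column in the result
summary = toy.groupby("Ticker")["Close"].agg(
    mean="mean",
    median="median",
    std="std",
    iqr=iqr,
)

# Coefficient of variation makes dispersion comparable across price levels
summary["cv"] = summary["std"] / summary["mean"]
print(summary)
```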


### **Time Series Analysis**

Time series analysis is crucial for understanding the evolution of stock prices over time.

In this section, we first convert the 'Date' column to a datetime data type using pd.to_datetime(). Then, we pivot the data, setting the 'Date' column as the index and the 'Ticker' column as columns, with the 'Close' column as values.

Next, we create a subplot using make_subplots() from the Plotly library. We iterate over each stock ticker and add a trace (line plot) to the subplot, representing the closing prices over time. The go.Scatter() function is used to create each trace.

Finally, we update the layout of the plot with a title, axis labels, and a legend, and then display the interactive plot using fig.show().

In [3]:
# Time Series Analysis
stocks_data['Date'] = pd.to_datetime(stocks_data['Date'])
pivot_data = stocks_data.pivot(index='Date', columns='Ticker', values='Close')

# Create a subplot
fig = make_subplots(rows=1, cols=1)

# Add traces for each stock ticker
for column in pivot_data.columns:
    fig.add_trace(
        go.Scatter(x=pivot_data.index, y=pivot_data[column], name=column),
        row=1, col=1
    )

# Update layout
fig.update_layout(
    title_text='Time Series of Closing Prices',
    xaxis_title='Date',
    yaxis_title='Closing Price',
    legend_title='Ticker',
    showlegend=True
)

# Show the plot
fig.show()

### **Volatility Analysis**
Volatility analysis helps identify the riskiness of individual stocks.

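In this section, we calculate the standard deviation of the closing prices for each stock, which serves as a measure of volatility. The std() method is used to compute the standard deviation, and the resulting Series is sorted in descending order using sort_values(). Note that this measures dispersion in price levels, so it is expressed in price units and tends to be larger for higher-priced stocks.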

We then create a bar chart using Plotly Express (px.bar()), where the x-axis represents the stock tickers, and the y-axis shows the corresponding standard deviations. The chart is labeled with appropriate titles and axis labels, and the interactive plot is displayed using fig.show().

In [4]:
# Volatility Analysis
volatility = pivot_data.std().sort_values(ascending=False)

fig = px.bar(volatility,
             x=volatility.index,
             y=volatility.values,
             labels={'y': 'Standard Deviation', 'x': 'Ticker'},
             title='Volatility of Closing Prices (Standard Deviation)')

# Show the figure
fig.show()
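
Since the standard deviation of price levels depends on how expensive a stock is, a common alternative is to measure the standard deviation of daily returns. The sketch below uses two made-up tickers whose prices swing by the same amount per day. The 252-day annualization factor is an assumption, matching the convention used later in the portfolio section.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: both series swing by 3 per day,
# but "HIGH" trades near 300 and "LOW" trades near 30
prices = pd.DataFrame({
    "HIGH": [300.0, 303.0, 300.0, 303.0, 300.0],
    "LOW": [30.0, 33.0, 30.0, 33.0, 30.0],
})

price_volatility = prices.std()               # standard deviation of price levels
daily_returns = prices.pct_change().dropna()  # day-over-day percentage changes
return_volatility = daily_returns.std()       # standard deviation of daily returns

# Annualize under the assumption of 252 trading days per year
annualized_volatility = return_volatility * np.sqrt(252)

print(price_volatility)
print(return_volatility)
print(annualized_volatility)
```

In this toy data the two price-level standard deviations are identical, while the return-based measure shows that the cheaper stock moves far more in relative terms.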

### **Correlation Analysis**

Correlation analysis helps identify the strength and direction of the relationship between different stocks. In this section, we calculate the correlation matrix using the corr() method on the pivoted data.

We then create a heatmap using go.Heatmap() from Plotly Graph Objects, where the z parameter represents the correlation matrix values. The x and y parameters specify the x and y-axis labels, which are the stock tickers in this case. We also set a colorscale and add a colorbar title.

Additionally, we update the layout of the plot with a title and axis labels, and display the interactive heatmap using fig.show().

In [5]:
# Correlation Analysis
correlation_matrix = pivot_data.corr()

fig = go.Figure(data=go.Heatmap(
                    z=correlation_matrix,
                    x=correlation_matrix.columns,
                    y=correlation_matrix.columns,
                    colorscale='blues',
                    colorbar=dict(title='Correlation'),
                    ))

# Update layout
fig.update_layout(
    title='Correlation Matrix of Closing Prices',
    xaxis_title='Ticker',
    yaxis_title='Ticker'
)

# Show the figure
fig.show()
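
Correlations computed on price levels can be dominated by a shared trend. A sketch like the following, built on deliberately artificial data, compares the correlation of prices with the correlation of daily returns. The tickers and the `wiggle` pattern are hypothetical.

```python
import numpy as np
import pandas as pd

t = np.arange(20)
wiggle = np.where(t % 2 == 0, 1.0, -1.0)  # alternating +1 / -1

# Hypothetical prices: same upward trend, opposite day-to-day wiggles
prices = pd.DataFrame({
    "AAA": 100 + 2 * t + wiggle,
    "BBB": 100 + 2 * t - wiggle,
})

# Correlation of price levels (what corr() on the pivoted closing prices measures)
price_corr = prices.corr().loc["AAA", "BBB"]

# Correlation of daily returns
return_corr = prices.pct_change().dropna().corr().loc["AAA", "BBB"]

print(f"Price correlation:  {price_corr:.3f}")
print(f"Return correlation: {return_corr:.3f}")
```

Here the price series look almost perfectly correlated because they trend together, even though their day-to-day moves go in opposite directions. For diversification questions, the return-based figure is usually the more relevant one.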

### **Comparative Analysis**
Comparative analysis allows us to evaluate the performance of different stocks relative to each other. In this section, we calculate the percentage change in closing prices between the first and last trading days in the dataset. We use the iloc indexer to access the first and last rows of the pivoted data, and then calculate the percentage change.

We then create a bar chart using Plotly Express (px.bar()), where the x-axis represents the stock tickers, and the y-axis shows the corresponding percentage changes. The chart is labeled with appropriate titles and axis labels, and the interactive plot is displayed using fig.show().

In [6]:
# Calculating the percentage change in closing prices
percentage_change = ((pivot_data.iloc[-1] - pivot_data.iloc[0]) / pivot_data.iloc[0]) * 100

fig = px.bar(percentage_change,
             x=percentage_change.index,
             y=percentage_change.values,
             labels={'y': 'Percentage Change (%)', 'x': 'Ticker'},
             title='Percentage Change in Closing Prices')

# Show the plot
fig.show()
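
The first-to-last percentage change is the same quantity as the compounded daily returns, and it is not the same as their simple sum. A small worked example with made-up prices illustrates the difference:

```python
import pandas as pd

# Hypothetical closing prices for a single stock
prices = pd.Series([100.0, 110.0, 99.0, 120.0])

# First-to-last percentage change, as in the cell above
pct_change_total = (prices.iloc[-1] - prices.iloc[0]) / prices.iloc[0] * 100

daily_returns = prices.pct_change().dropna()

# Compounding the daily returns reproduces the total change
compounded = ((1 + daily_returns).prod() - 1) * 100

# Summing the daily returns does not
summed = daily_returns.sum() * 100

print(f"Total change:       {pct_change_total:.2f}%")
print(f"Compounded returns: {compounded:.2f}%")
print(f"Summed returns:     {summed:.2f}%")
```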

### **Daily Risk Vs. Return Analysis**
The daily risk vs. return analysis is a fundamental concept in portfolio management, as it helps investors understand the trade-off between risk and potential returns. In this section, we first calculate the daily returns for each stock using the pct_change() method on the pivoted data. We remove any rows with missing values using dropna().

Next, we calculate the average daily return and risk (standard deviation) of the daily returns for each stock using the mean() and std() methods, respectively.

We then create a DataFrame called risk_return_df containing the risk and average daily return for each stock.

To visualize the risk-return trade-off, we create a scatter plot using go.Scatter() from Plotly Graph Objects. The x-axis represents the risk (standard deviation), and the y-axis represents the average daily return. We set the mode parameter to 'markers+text' to display both markers and text labels (stock tickers) on the plot. The textposition parameter is set to "top center" to position the text labels above each marker.


Finally, we display the interactive scatter plot using fig.show().
This risk vs. return analysis provides a visual representation of the trade-off between risk and potential returns, allowing investors to identify stocks with favorable risk-return profiles based on their risk tolerance and investment objectives.


In [7]:
daily_returns = pivot_data.pct_change().dropna()

# Recalculating average daily return and standard deviation (risk)
avg_daily_return = daily_returns.mean()
risk = daily_returns.std()

# Creating a DataFrame for plotting
risk_return_df = pd.DataFrame({'Risk': risk, 'Average Daily Return': avg_daily_return})

fig = go.Figure()

# Add scatter plot points
fig.add_trace(go.Scatter(
    x=risk_return_df['Risk'],
    y=risk_return_df['Average Daily Return'],
    mode='markers+text',
    text=risk_return_df.index,
    textposition="top center",
    marker=dict(size=10)
))

# Update layout
fig.update_layout(
    title='Risk vs. Return Analysis',
    xaxis_title='Risk (Standard Deviation)',
    yaxis_title='Average Daily Return',
    showlegend=False
)

# Show the plot
fig.show()
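
One way to read such a scatter plot numerically is to annualize both axes and compare return per unit of risk. The sketch below does this for two hypothetical return series. The tickers, the returns, and the 252-day convention are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns with the same average but different variability
daily_returns = pd.DataFrame({
    "CALM": [0.01, -0.01, 0.02, 0.00],
    "WILD": [0.03, -0.03, 0.04, -0.02],
})
trading_days = 252  # assumed number of trading days per year

annual_return = daily_returns.mean() * trading_days
annual_risk = daily_returns.std() * np.sqrt(trading_days)

# Return earned per unit of risk taken (no risk-free rate subtracted here)
return_per_unit_risk = annual_return / annual_risk
ranking = return_per_unit_risk.sort_values(ascending=False)

print(ranking)
```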

### **Additional Analysis**

1. **Moving Averages**: Calculating and plotting moving averages (e.g., Simple Moving Average, Exponential Moving Average) can help identify trends and potential buy/sell signals.
2. **Bollinger Bands**: Plotting Bollinger Bands, which are volatility bands based on standard deviations from the moving average, can assist in identifying potential overbought or oversold conditions.
3. **Relative Strength Index (RSI)**: Calculating and visualizing the RSI, a momentum oscillator, can help identify overbought or oversold conditions and potential trend reversals.
4. **Monte Carlo Simulations:** Performing Monte Carlo simulations can help estimate the potential range of future stock prices based on historical data and various assumptions.
5. **Portfolio Optimization:** Implementing portfolio optimization techniques, such as the Modern Portfolio Theory (MPT) and Markowitz model, can assist in constructing an optimal portfolio based on risk-return preferences.


#### **Moving Averages**

In this code snippet, we calculate and visualize two popular moving averages: Simple Moving Average (SMA) and Exponential Moving Average (EMA).

The SMA is calculated by taking the mean of the last window_size observations, and it's computed using the rolling() method with window=window_size and mean(). In this example, we set window_size=20, which means the SMA is calculated based on the previous 20 trading days.

The EMA is a weighted moving average that gives more importance to recent observations. It's calculated using the ewm() (exponentially weighted) method with span=span and mean(). The span parameter determines the smoothing factor, alpha = 2 / (span + 1), so a higher span results in more smoothing.

After calculating the SMA and EMA, we create a line plot using Plotly Graph Objects. We add three traces to the plot: one for the stock price, one for the SMA, and one for the EMA. The go.Scatter() function is used to create each trace, and the mode='lines' parameter is set to display the data as lines.

Finally, we update the layout of the plot with a title, axis labels, and show the interactive plot using fig.show().


In [8]:
# Simple Moving Average over the previous 20 trading days
window_size = 20
sma_20 = pivot_data['AAPL'].rolling(window=window_size).mean()

# Exponential Moving Average with a 20-day span
span = 20
ema_20 = pivot_data['AAPL'].ewm(span=span, adjust=False).mean()

# Keep the averages in separate Series so pivot_data still holds only ticker columns
# Visualize the stock prices along with SMA and EMA
fig = go.Figure()
fig.add_trace(go.Scatter(x=pivot_data.index, y=pivot_data['AAPL'], mode='lines', name='Stock Price'))
fig.add_trace(go.Scatter(x=pivot_data.index, y=sma_20, mode='lines', name='SMA (20)'))
fig.add_trace(go.Scatter(x=pivot_data.index, y=ema_20, mode='lines', name='EMA (20)'))
fig.update_layout(title='Stock Price with SMA and EMA', xaxis_title='Date', yaxis_title='Price')
fig.show()
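
To make the two averages concrete, the following sketch computes them on a short made-up series with a window and span of 3, and checks the EMA against its recursive definition. The variable names and prices are illustrative.

```python
import pandas as pd

# Hypothetical prices; a short window keeps the numbers easy to follow
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
window_size = 3
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by span (0.5 here)

sma = prices.rolling(window=window_size).mean()
ema = prices.ewm(span=span, adjust=False).mean()

# Manual EMA recursion: ema_t = alpha * price_t + (1 - alpha) * ema_{t-1},
# seeded with the first price (this is what adjust=False does)
manual_ema = [prices.iloc[0]]
for price in prices.iloc[1:]:
    manual_ema.append(alpha * price + (1 - alpha) * manual_ema[-1])

print(pd.DataFrame({"price": prices, "sma": sma, "ema": ema, "manual_ema": manual_ema}))
```

The SMA is undefined (NaN) until a full window of observations is available, whereas the EMA produces a value from the first observation onward.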

#### **Bollinger Bands**

Bollinger Bands are a technical analysis tool used to measure the volatility of a stock price and identify potential overbought or oversold conditions.

In this code, we first calculate the moving average and standard deviation of the stock prices using the rolling() method with window=window_size. The window_size parameter determines the number of observations used for the calculation.
Next, we calculate the upper and lower Bollinger Bands using the following formulas:

* Upper Band = Moving Average + (2 * Standard Deviation)
* Lower Band = Moving Average - (2 * Standard Deviation)

We then create a line plot using Plotly Graph Objects, adding three traces: one for the stock price, one for the upper band, and one for the lower band. The go.Scatter() function is used to create each trace, and the mode='lines' parameter is set to display the data as lines.
Finally, we update the layout of the plot with a title, axis labels, and show the interactive plot using fig.show().

Bollinger Bands can be used to identify potential buy and sell signals. When the stock price touches or crosses the upper band, it may indicate an overbought condition, suggesting a potential sell signal. Conversely, when the stock price touches or crosses the lower band, it may indicate an oversold condition, suggesting a potential buy signal.

In [9]:
# Calculate Bollinger Bands
window_size = 20
moving_avg = pivot_data['AAPL'].rolling(window=window_size).mean()
std_dev = pivot_data['AAPL'].rolling(window=window_size).std()
upper_band = moving_avg + 2 * std_dev
lower_band = moving_avg - 2 * std_dev

# Visualize Bollinger Bands
fig = go.Figure()
fig.add_trace(go.Scatter(x=pivot_data.index, y=pivot_data['AAPL'], mode='lines', name='Stock Price'))
fig.add_trace(go.Scatter(x=pivot_data.index, y=upper_band, mode='lines', name='Upper Band'))
fig.add_trace(go.Scatter(x=pivot_data.index, y=lower_band, mode='lines', name='Lower Band'))
fig.update_layout(title='Bollinger Bands', xaxis_title='Date', yaxis_title='Price')
fig.show()
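
The buy/sell reading described above can be turned into boolean flags by comparing the price with each band. This is a minimal sketch on made-up prices with a 10-day window. The names `above_upper` and `below_lower` are hypothetical, and the flags are not a trading recommendation.

```python
import pandas as pd

# Hypothetical prices: flat at 10 with one spike up and one spike down
prices = pd.Series([10.0] * 9 + [14.0, 10.0, 6.0])
window_size = 10
num_std = 2

moving_avg = prices.rolling(window=window_size).mean()
std_dev = prices.rolling(window=window_size).std()
upper_band = moving_avg + num_std * std_dev
lower_band = moving_avg - num_std * std_dev

# Comparisons against NaN bands (the warm-up period) evaluate to False
above_upper = prices > upper_band  # possible overbought condition
below_lower = prices < lower_band  # possible oversold condition

print("Above upper band at positions:", list(prices.index[above_upper]))
print("Below lower band at positions:", list(prices.index[below_lower]))
```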

#### **Relative Strength Index (RSI)**

The Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. It is used to identify overbought or oversold conditions in the market.

In this code, we first calculate the daily price changes using the diff() method. We then separate the positive and negative changes into gain and loss variables, respectively, using the where() method.

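Next, we calculate the average gain and average loss over the specified window_size (typically 14 days) using the rolling() method with mean(). The relative strength (RS) is the ratio of the average gain to the average loss, and the RSI is then computed as 100 - 100 / (1 + RS).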

In [10]:
# Calculate RSI
window_size = 14
delta = pivot_data['AAPL'].diff()
gain = delta.where(delta > 0, 0)
loss = -delta.where(delta < 0, 0)
avg_gain = gain.rolling(window=window_size).mean()
avg_loss = loss.rolling(window=window_size).mean()
rs = avg_gain / avg_loss
rsi = 100 - (100 / (1 + rs))

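# Visualize RSI in its own panel, since it lies on a 0-100 scale unlike the price
fig = make_subplots(rows=2, cols=1, shared_xaxes=True)
fig.add_trace(go.Scatter(x=pivot_data.index, y=pivot_data['AAPL'], mode='lines', name='Stock Price'), row=1, col=1)
fig.add_trace(go.Scatter(x=pivot_data.index, y=rsi, mode='lines', name='RSI'), row=2, col=1)
fig.update_layout(title='Relative Strength Index (RSI)')
fig.update_xaxes(title_text='Date', row=2, col=1)
fig.update_yaxes(title_text='Price', row=1, col=1)
fig.update_yaxes(title_text='RSI', row=2, col=1)
fig.show()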

After calculating the RSI, we create a line plot using Plotly Graph Objects. We add two traces: one for the stock price and one for the RSI. The go.Scatter() function is used to create each trace, and the mode='lines' parameter is set to display the data as lines.

Finally, we update the layout of the plot with a title, axis labels, and show the interactive plot using fig.show().

The RSI oscillates between 0 and 100. Typically, an RSI value above 70 is considered overbought, indicating a potential sell signal, while an RSI value below 30 is considered oversold, indicating a potential buy signal.
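
The cell above averages gains and losses with a simple rolling mean. A frequently used variant smooths them exponentially instead (often called Wilder's smoothing). The sketch below approximates that variant with ewm(alpha=1 / window_size). The `compute_rsi` helper is hypothetical, and its seeding differs slightly from the classic formulation, which starts from a simple average of the first window.

```python
import pandas as pd

def compute_rsi(prices, window_size=14):
    """RSI with exponentially smoothed average gains and losses (a sketch)."""
    delta = prices.diff()
    gain = delta.clip(lower=0)     # positive moves, zeros elsewhere
    loss = (-delta).clip(lower=0)  # magnitudes of negative moves, zeros elsewhere

    # Exponential smoothing with alpha = 1 / window_size
    avg_gain = gain.ewm(alpha=1 / window_size, adjust=False, min_periods=window_size).mean()
    avg_loss = loss.ewm(alpha=1 / window_size, adjust=False, min_periods=window_size).mean()

    rs = avg_gain / avg_loss       # becomes inf when there are no losses
    return 100 - (100 / (1 + rs))  # inf maps to an RSI of 100

# Hypothetical price series for a quick check of the extremes
rising = pd.Series(range(1, 31), dtype=float)
falling = pd.Series(range(30, 0, -1), dtype=float)
mixed = pd.Series([10, 11, 10, 12, 11, 13, 12, 14, 13, 15,
                   14, 16, 15, 17, 16, 18, 17, 19, 18, 20], dtype=float)

print(compute_rsi(rising).iloc[-1])
print(compute_rsi(falling).iloc[-1])
print(compute_rsi(mixed).iloc[-1])
```

A series that only rises ends at the top of the scale and one that only falls ends at the bottom, which is a useful sanity check for any RSI implementation.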

#### **Monte Carlo Simulations**

Monte Carlo simulations are a computational technique used to model and predict the behavior of complex systems by simulating various scenarios based on random sampling. In the context of stock market analysis, Monte Carlo simulations can be used to estimate the potential range of future stock prices based on historical data and various assumptions.

In this code snippet, we first calculate the daily returns for a specific stock (in this case, 'AAPL') using the pct_change() method and removing any rows with missing values using dropna().

Next, we set the simulation parameters, including the number of simulations (num_simulations) and the number of trading days to simulate (num_trading_days).

We then initialize a two-dimensional array stock_price_sim with one row per simulation and one column per trading day, and set the first column to the last observed stock price from the dataset (pivot_data['AAPL'].iloc[-1]).

The Monte Carlo simulation is performed using two nested loops. The outer loop iterates over the number of simulations, and the inner loop iterates over the trading days.

For each trading day, we simulate the next stock price by taking the previous day's stock price and multiplying it by (1 + a randomly sampled daily return). The np.random.choice function is used to randomly sample a single daily return from the historical daily returns, with replace=True to allow sampling with replacement.

After performing all the simulations, we create a line plot using Plotly Graph Objects, where the x-axis represents the trading days, and the y-axis represents the simulated stock prices. We add one trace per plotted path using go.Scatter with mode='lines'; only the first num_paths_to_plot paths are drawn to keep the figure responsive.

Finally, we update the layout of the plot with a title, axis labels, and show the interactive plot using fig.show().

Monte Carlo simulations can provide valuable insights into the potential range of future stock prices by incorporating historical data and random sampling. However, it's important to note that these simulations are based on assumptions and may not accurately capture all the complexities and uncertainties of the real-world stock market.

In [11]:
import numpy as np

# Calculate daily returns
daily_returns = pivot_data['AAPL'].pct_change().dropna()

# Set simulation parameters
num_simulations = 1000
num_trading_days = 252

# Monte Carlo simulations: one row per simulated path, one column per trading day
stock_price_sim = np.zeros((num_simulations, num_trading_days))
stock_price_sim[:, 0] = pivot_data['AAPL'].iloc[-1]

for j in range(num_simulations):
    for i in range(1, num_trading_days):
        # Draw one historical daily return as a scalar (avoids assigning a 1-element array)
        sampled_return = np.random.choice(daily_returns.values, replace=True)
        stock_price_sim[j, i] = stock_price_sim[j, i-1] * (1 + sampled_return)

# Plot a subset of the simulated paths
num_paths_to_plot = 50
fig = go.Figure()
for j in range(num_paths_to_plot):
    fig.add_trace(go.Scatter(x=np.arange(num_trading_days), y=stock_price_sim[j], mode='lines', showlegend=False))
fig.update_layout(title='Monte Carlo Simulation of Stock Price', xaxis_title='Trading Days', yaxis_title='Stock Price')
fig.show()
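
The same bootstrap idea can be written without explicit loops by sampling all returns at once and compounding them with a cumulative product. The following is a sketch under stated assumptions: the historical returns and starting price are made up, and a seeded NumPy Generator is used so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is reproducible

# Hypothetical historical daily returns and last observed price
historical_returns = np.array([0.01, -0.02, 0.015, 0.003, -0.007, 0.012, -0.004])
last_price = 100.0
num_simulations = 1000
num_trading_days = 252

# Sample every return up front: one row per simulation, one column per future day
sampled = rng.choice(historical_returns,
                     size=(num_simulations, num_trading_days - 1),
                     replace=True)

# Compounding (1 + return) along each row carries the price forward day by day
future = last_price * np.cumprod(1 + sampled, axis=1)
paths = np.hstack([np.full((num_simulations, 1), last_price), future])

# Summarize the simulated distribution of the final-day price
final_prices = paths[:, -1]
p5, p50, p95 = np.percentile(final_prices, [5, 50, 95])
print(f"5th / 50th / 95th percentile of final price: {p5:.2f} / {p50:.2f} / {p95:.2f}")
```

Keeping every path in a two-dimensional array makes it straightforward to report percentiles of the simulated outcomes rather than a single trajectory.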



#### **Portfolio Optimization (Markowitz Model)**


This code implements the Markowitz Model, a prominent portfolio optimization technique from modern portfolio theory. The goal is to find asset weights that give the best trade-off between expected return and risk; here, that trade-off is measured by the Sharpe ratio, which the optimizer maximizes.

First, we calculate the expected returns and covariance matrix of the daily returns. We annualize these values by multiplying by the number of trading days in a year (252).

Next, we define a function portfolio_performance that calculates the Sharpe ratio of a given portfolio and returns its negative. The Sharpe ratio is a metric that measures the risk-adjusted return of a portfolio, taking into account both the expected return and volatility (risk).

We then set up the optimization problem using the minimize function from the SciPy library. We provide the portfolio_performance function as the objective function to be minimized (negated to maximize the Sharpe ratio). We set the initial guess for the weights as equal weights and define the bounds for the weights (between 0 and 1). Additionally, we add a constraint that the sum of the weights must be equal to 1.

The minimize function uses the Sequential Least Squares Programming (SLSQP) method to solve the optimization problem and find the optimal weights that maximize the Sharpe ratio.

Finally, we print the optimal portfolio weights for each asset.

In [12]:
import numpy as np
import pandas as pd
from scipy.optimize import minimize

# Daily returns for every ticker, one column per stock
# (select the ticker columns explicitly so no helper columns are included)
tickers = stocks_data['Ticker'].unique().tolist()
daily_returns_df = pivot_data[tickers].pct_change().dropna()

# Calculate expected returns and covariance matrix
expected_returns = daily_returns_df.mean() * 252  # Annualize returns
cov_matrix = daily_returns_df.cov() * 252  # Annualize covariance matrix

# Define the portfolio optimization function
def portfolio_performance(weights, expected_returns, cov_matrix, risk_free_rate=0.02):
    portfolio_return = np.sum(weights * expected_returns)
    portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    sharpe_ratio = (portfolio_return - risk_free_rate) / portfolio_volatility
    return -sharpe_ratio  # Negate to maximize the Sharpe ratio

# Optimize the portfolio
num_assets = len(expected_returns)
init_weights = np.ones(num_assets) / num_assets  # Equal weights as initial guess
bounds = [(0, 1) for _ in range(num_assets)]  # Set bounds for weights (0 to 1)
constraint = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}  # Sum of weights must be 1

optimal_weights = minimize(portfolio_performance, init_weights, args=(expected_returns, cov_matrix),
                            method='SLSQP', bounds=bounds, constraints=constraint)

print("Optimal Portfolio Weights:")
for ticker, weight in zip(daily_returns_df.columns, optimal_weights.x):
    print(f"{ticker}: {weight:.2f}")
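
To see what the objective function evaluates for one set of weights, the following worked example computes the return, volatility, and Sharpe ratio of a two-asset portfolio by hand. All numbers are hypothetical annualized figures chosen for easy arithmetic, and the 2% risk-free rate mirrors the default used above.

```python
import numpy as np

# Hypothetical annualized inputs for two uncorrelated assets
expected_returns = np.array([0.10, 0.20])
cov_matrix = np.array([[0.04, 0.00],
                       [0.00, 0.09]])  # volatilities of 20% and 30%
weights = np.array([0.5, 0.5])
risk_free_rate = 0.02

portfolio_return = weights @ expected_returns                   # weighted average return
portfolio_volatility = np.sqrt(weights @ cov_matrix @ weights)  # sqrt(w' * Cov * w)
sharpe_ratio = (portfolio_return - risk_free_rate) / portfolio_volatility

print(f"Return:     {portfolio_return:.4f}")
print(f"Volatility: {portfolio_volatility:.4f}")
print(f"Sharpe:     {sharpe_ratio:.4f}")
```

The portfolio volatility comes out below the 25% weighted average of the individual volatilities, which is the diversification effect the optimizer exploits when choosing weights.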

