<a href="https://colab.research.google.com/github/anaemcaro/QuantitativeAnalysis/blob/main/Quantitative_Analysis_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic Quantitative Analysis of Market Indexes
**Quantitative Analysis** of the stock market is the application of mathematics and statistics to understand, analyze and predict market behavior, and to make decisions about financial investments.

In this notebook, I am going to perform a Basic Quantitative Analysis to answer the following question about the indexes **S&P500**, **NASDAQ** and **Dow Jones**:

* **Do the three principal indexes behave the same way, even though they are a representation of different sectors of the market?**

These three indexes are also calculated with different methodologies. The result of this analysis can lead to a better understanding of the U.S. stock market, and answering this question can help an individual investor decide where to invest their savings and how to balance that investment to obtain better returns in the mid to long term.

To compare the indexes, I am going to perform the following quantitative analyses:
* **Descriptive Analysis**
* **Volatility Analysis**
* **Correlational Analysis**
* **Performance Analysis**
* **Daily Risk Vs. Return Analysis**
* **Time Series Analysis**

All the analyses are based on the daily closing price of each index over the last two years.


In [19]:
# Data manipulation
import pandas as pd

# Data Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio

In [20]:
pio.templates.default = "plotly_white"

In [21]:
# Load Dataset
# Write the correct path of the file
indexes = pd.read_csv('/content/drive/MyDrive/Assets/index_data.csv')
indexes['Date'] = pd.to_datetime(indexes['Date'])
indexes.set_index('Date', inplace=True)
indexes

| Date | DowJones | NASDAQ | SP500 |
|---|---|---|---|
| 2022-02-28 | 33892.601562 | 13751.400391 | 4373.939941 |
| 2022-03-01 | 33294.949219 | 13532.459961 | 4306.259766 |
| 2022-03-02 | 33891.351562 | 13752.019531 | 4386.540039 |
| 2022-03-03 | 33794.660156 | 13537.940430 | 4363.490234 |
| 2022-03-04 | 33614.800781 | 13313.440430 | 4328.870117 |
| ... | ... | ... | ... |
| 2024-02-20 | 38563.800781 | 15630.780273 | 4975.509766 |
| 2024-02-21 | 38612.238281 | 15580.870117 | 4981.799805 |
| 2024-02-22 | 39069.109375 | 16041.620117 | 5087.029785 |
| 2024-02-23 | 39131.531250 | 15996.820312 | 5088.799805 |


The dataset contains the daily closing price of each index over a two-year period (501 trading days, from 2022-02-28 to 2024-02-23).



## Descriptive Statistics Analysis

This analysis describes the characteristics of the data in terms of central tendency (mean), variability (standard deviation or std, minimum and maximum) and distribution (the 25%, 50% and 75% percentiles; the 50% percentile is the median).
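As a hedged illustration of what `describe()` reports, the sketch below recomputes the same statistics by hand with Python's standard `statistics` module for the first five S&P500 closes shown in the dataset preview. The `describe_like_pandas` helper is hypothetical, not part of pandas. It assumes pandas' defaults: a sample standard deviation (divisor n - 1) and linearly interpolated percentiles, which `statistics.stdev` and `statistics.quantiles(..., method="inclusive")` reproduce.

```python
import statistics

# First five S&P500 closing prices from the dataset (2022-02-28 to 2022-03-04)
closes = [4373.939941, 4306.259766, 4386.540039, 4363.490234, 4328.870117]

def describe_like_pandas(values):
    """Hypothetical helper: a stdlib approximation of pandas' Series.describe()."""
    # method="inclusive" interpolates linearly between data points, like pandas' default
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (divisor n - 1)
        "min": min(values),
        "25%": q25,
        "50%": q50,  # the median
        "75%": q75,
        "max": max(values),
    }

for name, value in describe_like_pandas(closes).items():
    print(f"{name:>5}: {value:.2f}")
```

On the real data, `indexes.describe()` does all of this for every column at once.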

In [22]:
# Descriptive Statistics
desc_stats = indexes.describe()
desc_stats

| | DowJones | NASDAQ | SP500 |
|---|---|---|---|
| count | 501.0 | 501.0 | 501.0 |
| mean | 33741.720625 | 12687.791115 | 4219.886035 |
| std | 1981.121175 | 1418.463676 | 325.163433 |
| min | 28725.509766 | 10213.290039 | 3577.030029 |
| 25% | 32799.921875 | 11466.980469 | 3970.040039 |
| 50% | 33684.53125 | 12500.570312 | 4158.240234 |
| 75% | 34517.730469 | 13751.400391 | 4451.140137 |
| max | 39131.53125 | 16041.620117 | 5088.799805 |


Here we can see that the **Dow Jones** has the highest standard deviation (1981), but it also moves in the highest price range, from 28726 to 39132, with an average (mean) of 33742.

Meanwhile, the **S&P500** has the lowest standard deviation (325) and also the lowest price range, from 3577 to 5089, with an average price of 4220.

The **NASDAQ** index is in between the other two, with a standard deviation of 1418 and an average price of 12688, moving between 10213 and 16042.

## Volatility Analysis

Because the standard deviation measures how far the values spread around the mean, it could be used as a measure of volatility. However, the three indexes trade in very different price ranges, so their standard deviations in absolute values can't be compared directly. A better approach is to use the **Coefficient of Variation (CV)**, which measures the variation of the price relative to its mean.
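Concretely, the CV is the standard deviation divided by the mean, multiplied by 100 to express it as a percentage. As a quick sanity check, this sketch plugs in the mean and standard deviation reported by `describe()` above; the `coefficient_of_variation` helper is an illustrative name, not a library function. The results should match the CV column computed in the next cell (about 5.87, 11.18 and 7.71).

```python
# Mean and standard deviation of the closing prices, as reported by describe()
summary = {
    "DowJones": {"mean": 33741.720625, "std": 1981.121175},
    "NASDAQ": {"mean": 12687.791115, "std": 1418.463676},
    "SP500": {"mean": 4219.886035, "std": 325.163433},
}

def coefficient_of_variation(std, mean):
    """Coefficient of variation expressed as a percentage of the mean."""
    return 100 * std / mean

for index_name, stats in summary.items():
    cv = coefficient_of_variation(stats["std"], stats["mean"])
    print(f"{index_name}: CV = {cv:.2f}%")
```

Because the CV is a ratio, it is unitless, which is what makes it comparable across indexes quoted at very different price levels.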

In [23]:
# Transpose the DataFrame so that each index is a row and each statistic is a column
desc_stats = desc_stats.transpose()

# Add a new column with the Coefficient of Variation (CV), multiplied by 100 to express it as a percentage
desc_stats['CV'] = 100 * desc_stats['std'] / desc_stats['mean']
desc_stats

| | count | mean | std | min | 25% | 50% | 75% | max | CV |
|---|---|---|---|---|---|---|---|---|---|
| DowJones | 501.0 | 33741.720625 | 1981.121175 | 28725.509766 | 32799.921875 | 33684.53125 | 34517.730469 | 39131.53125 | 5.871429 |
| NASDAQ | 501.0 | 12687.791115 | 1418.463676 | 10213.290039 | 11466.980469 | 12500.570312 | 13751.400391 | 16041.620117 | 11.179753 |
| SP500 | 501.0 | 4219.886035 | 325.163433 | 3577.030029 | 3970.040039 | 4158.240234 | 4451.140137 | 5088.799805 | 7.705503 |


In [24]:
# Bar chart of the standard deviation of each index
fig = px.bar(desc_stats,
             x=desc_stats.index,
             y='std',
             title='Standard Deviation')
# Set the axis titles explicitly (the 'x'/'y' keys of `labels` don't match the generated column names)
fig.update_layout(xaxis_title='Index', yaxis_title='Standard Deviation')

# Show the figure
fig.show()

# Bar chart of the Coefficient of Variation of each index
fig = px.bar(desc_stats,
             x=desc_stats.index,
             y='CV',
             title='Volatility of Closing Prices (Coefficient of Variation)')
fig.update_layout(xaxis_title='Index', yaxis_title='Coefficient of Variation (%)')

# Show the figure
fig.show()

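Here we can see that, although the Dow Jones has the highest standard deviation, it is the least volatile index (CV of about 6%). The most volatile of the three is the NASDAQ, with about 11%, and the S&P500 is in between with about 8%, even though it has the lowest standard deviation.

By this measure, all three indexes have low volatility: the standard deviation of their closing prices is only between about 6% and 11% of their mean price.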

## Correlation Analysis

The correlation analysis indicates how closely the variables (the indexes) are related.

In [25]:
corr_matrix = indexes.corr()
corr_matrix

| | DowJones | NASDAQ | SP500 |
|---|---|---|---|
| DowJones | 1.0 | 0.843789 | 0.926604 |
| NASDAQ | 0.843789 | 1.0 | 0.98127 |
| SP500 | 0.926604 | 0.98127 | 1.0 |


In [26]:
fig = go.Figure(data=go.Heatmap(
                    z=corr_matrix,
                    x=corr_matrix.columns,
                    y=corr_matrix.columns,
                    colorscale='blues',
                    colorbar=dict(title='Correlation'),
                    ))

# Update layout
fig.update_layout(
    title='Correlation Matrix of Closing Prices',
    xaxis_title='Index',
    yaxis_title='Index'
)

# Show the figure
fig.show()

The three indexes have a strong, positive linear correlation (all coefficients are above 0.84), which means that when one of them moves up or down, the other two are very likely to move in the same direction. This result is expected, because each index can be considered a representation of the market in general.
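By default, `DataFrame.corr()` computes the Pearson correlation coefficient. The sketch below spells that formula out in plain Python for the first five NASDAQ and S&P500 closes from the dataset; the `pearson` helper is an illustrative name, not a library function. One hedged caveat: these are correlations of price levels, and series that trend over the same period tend to look highly correlated partly because of that shared trend, so correlating the daily percentage changes (for example `daily_change.corr()`, using the `daily_change` DataFrame built in the risk vs. return section) is a common complementary check.

```python
import math

# First five closing prices of each index from the dataset (2022-02-28 to 2022-03-04)
nasdaq = [13751.400391, 13532.459961, 13752.019531, 13537.940430, 13313.440430]
sp500 = [4373.939941, 4306.259766, 4386.540039, 4363.490234, 4328.870117]

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors cancel out, so plain sums of deviations are enough
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

print(f"NASDAQ vs SP500 (first five days): {pearson(nasdaq, sp500):.3f}")
```

Five points are far too few to be meaningful; the sketch only shows the arithmetic behind each cell of the matrix.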

## Performance Analysis

In this analysis, I am going to compare the performance of the three indexes over the two years. Performance is calculated as the percentage change in each index's closing price from the beginning to the end of the period.
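The formula is the difference between the last and first closing prices, divided by the first closing price and multiplied by 100. A minimal sketch with hypothetical prices (not taken from the dataset) and an illustrative helper name:

```python
def percentage_change(first_close, last_close):
    """Percentage change between the first and last closing price of a period."""
    return (last_close - first_close) / first_close * 100

# Hypothetical closing prices, for illustration only
first_close = 4000.0
last_close = 4600.0
print(f"Change over the period: {percentage_change(first_close, last_close):.1f}%")
```

The next cell applies the same formula to every column at once by combining `indexes.iloc[0]` (first row) and `indexes.iloc[-1]` (last row).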

In [27]:
# Calculating the percentage change in closing prices during the timeframe
desc_stats['Change'] = ((indexes.iloc[-1] - indexes.iloc[0]) / indexes.iloc[0]) * 100
desc_stats

| | count | mean | std | min | 25% | 50% | 75% | max | CV | Change |
|---|---|---|---|---|---|---|---|---|---|---|
| DowJones | 501.0 | 33741.720625 | 1981.121175 | 28725.509766 | 32799.921875 | 33684.53125 | 34517.730469 | 39131.53125 | 5.871429 | 15.273625 |
| NASDAQ | 501.0 | 12687.791115 | 1418.463676 | 10213.290039 | 11466.980469 | 12500.570312 | 13751.400391 | 16041.620117 | 11.179753 | 16.179077 |
| SP500 | 501.0 | 4219.886035 | 325.163433 | 3577.030029 | 3970.040039 | 4158.240234 | 4451.140137 | 5088.799805 | 7.705503 | 15.90305 |


In [28]:
# Bar chart of the percentage change of each index over the period
fig = px.bar(desc_stats,
             x=desc_stats.index,
             y='Change',
             title='Percentage Change in Closing Prices')
fig.update_layout(xaxis_title='Index', yaxis_title='Percentage Change (%)')

# Show the plot
fig.show()

All three indexes changed similarly over the two-year period, gaining between 15% and 16% in value.

## Daily Risk Vs. Return Analysis

In this part, I am going to calculate the average daily percentage change in each index's closing price as a measure of Return, and its standard deviation as a measure of Risk.
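To make these two measures concrete, here is a stdlib sketch that mirrors `pct_change()` (times 100, without the leading NaN) followed by `mean()` and `std()`, applied to the first five Dow Jones closes from the dataset. It assumes pandas' default sample standard deviation, which `statistics.stdev` also uses; `daily_pct_change` is an illustrative helper name. The first changes should match the first rows of the `daily_change` output below (-1.763371, 1.791270, ...).

```python
import statistics

# First five DowJones closing prices from the dataset (2022-02-28 to 2022-03-04)
closes = [33892.601562, 33294.949219, 33891.351562, 33794.660156, 33614.800781]

def daily_pct_change(prices):
    """Day-over-day percentage change: (today - yesterday) / yesterday * 100."""
    return [(today - yesterday) / yesterday * 100
            for yesterday, today in zip(prices, prices[1:])]

changes = daily_pct_change(closes)
average_return = statistics.mean(changes)  # "Return": mean daily % change
risk = statistics.stdev(changes)           # "Risk": sample std of daily % changes

print("Daily changes (%):", [round(c, 6) for c in changes])
print(f"Average daily return: {average_return:.4f}%  Risk: {risk:.4f}")
```

With only four daily changes these two numbers are not representative of the full period; they only illustrate the calculation.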


In [29]:
# Calculate the daily percentage of change
daily_change =  indexes.pct_change().dropna()*100
daily_change

| Date | DowJones | NASDAQ | SP500 |
|---|---|---|---|
| 2022-03-01 | -1.763371 | -1.592132 | -1.547350 |
| 2022-03-02 | 1.791270 | 1.622466 | 1.864269 |
| 2022-03-03 | -0.285298 | -1.556710 | -0.525467 |
| 2022-03-04 | -0.532212 | -1.658302 | -0.793404 |
| 2022-03-07 | -2.372234 | -3.624010 | -2.951816 |
| ... | ... | ... | ... |
| 2024-02-20 | -0.166168 | -0.918315 | -0.600532 |
| 2024-02-21 | 0.125604 | -0.319307 | 0.126420 |
| 2024-02-22 | 1.183229 | 2.957152 | 2.112288 |
| 2024-02-23 | 0.159773 | -0.279272 | 0.034795 |


In [30]:
# Creating a DataFrame for plotting
risk_return = pd.DataFrame({'Risk': daily_change.std(), 'Average Daily Return': daily_change.mean()})
risk_return

| | Risk | Average Daily Return |
|---|---|---|
| DowJones | 0.992886 | 0.033356 |
| NASDAQ | 1.564551 | 0.042217 |
| SP500 | 1.192253 | 0.036619 |


In [31]:
fig = go.Figure()

# Add scatter plot points
fig.add_trace(go.Scatter(
    x=risk_return['Risk'],
    y=risk_return['Average Daily Return'],
    mode='markers+text',
    text=risk_return.index,
    textposition="top center",
    marker=dict(size=10)
))

# Update layout
fig.update_layout(
    title='Risk vs. Return Analysis',
    xaxis_title='Risk (Standard Deviation)',
    yaxis_title='Average Daily Return %',
    showlegend=False
)

# Show the plot
fig.show()

Here we can see that the Dow Jones has the lowest risk but also the lowest average daily return, while the NASDAQ has the highest values for both measures. The S&P500 is in between, with a risk (1.19) closer to that of the Dow Jones (0.99) than to that of the NASDAQ (1.56).

## Time Series Analysis

In [32]:
# Time Series Analysis
fig = make_subplots(rows=1, cols=1)
for column in indexes.columns:
    fig.add_trace(
        go.Scatter(x=indexes.index, y=indexes[column], name=column),
        row=1, col=1
    )

# Document the graph
fig.update_layout(
    title_text='Time Series of Closing Prices',
    xaxis_title='Date',
    yaxis_title='Closing Price',
    legend_title='Name',
    showlegend=True
)
fig.show()

In the graphic above, the fluctuations of the S&P500 and NASDAQ are almost impossible to see, because these two indexes move in a much lower price range than the Dow Jones.

To solve this, I decided to plot each index in a separate graph, to be able to see each behavior in its range of prices; I also made them smaller to be able to see them on the screen and facilitate the comparison.
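Another common way to compare series quoted at very different price levels in a single chart is to rebase each one to a common starting value, for example 100 on the first day, so that every line shows its change relative to its own starting price. This is a hedged alternative, not what this notebook does. The sketch uses the first three closes of each index from the dataset and an illustrative `rebase` helper.

```python
# First three closing prices of each index from the dataset (2022-02-28 to 2022-03-02)
closes = {
    "DowJones": [33892.601562, 33294.949219, 33891.351562],
    "NASDAQ": [13751.400391, 13532.459961, 13752.019531],
    "SP500": [4373.939941, 4306.259766, 4386.540039],
}

def rebase(prices, base=100.0):
    """Scale a price series so that its first value equals `base`."""
    first = prices[0]
    return [price / first * base for price in prices]

rebased = {name: rebase(prices) for name, prices in closes.items()}
for name, values in rebased.items():
    print(name, [round(v, 2) for v in values])
```

In the notebook itself, the same rebasing would be `indexes / indexes.iloc[0] * 100`, and the result could be passed to a single `px.line` call.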

In [33]:
# Make three different, smaller plots to make it easier to observe the patterns
for col in indexes.columns:
  fig = px.line(indexes, x=indexes.index, y=indexes[col],
                width=500, height=200)
  fig.update_layout(
    title_text='Time Series of Closing Prices '+col,
    xaxis_title='Date',
    yaxis_title='Closing Price',
    legend_title='Name',
    showlegend=True)
  fig.show()

Observations:

* The graphics are very similar for the three indexes in their own range of prices. This confirms the high correlation calculated before.
* In the first months of the period there was a downward trend; the indexes then recovered and now show an upward trend.

## Conclusion

This Basic Quantitative Analysis of the Dow Jones, S&P500 and NASDAQ indexes shows that they are strongly correlated and similar in terms of volatility, performance, risk and trends.

That means that all three of them are good options for investment.

Although I am not qualified to issue any financial recommendation, my conclusion from this analysis would be to invest in any or all three of the indexes, as they offer a higher return than a traditional bank account with relatively low risk.

**Note**: You can't buy an index directly. Instead, you buy an index-linked investment product, such as an index fund or an ETF, created by a financial institution to replicate the performance of the underlying index as closely as possible.

## Reflection

Here, I have chosen to perform a Basic Quantitative Analysis of the three most representative indexes in the stock market. It turned out that the three of them have very similar characteristics. I think this happens because the indexes, each one in its particular area, reflect the market in general.

That raises other questions:

* **What could be the impact on the economy if the indexes didn’t behave in a similar way? What event could cause such a behavior?**

* **What happens with this analysis if it is applied to other investments, such as stocks that are in and out of the indexes, or crypto?**

These questions could lead to a more in-depth understanding of the market and help investors who are willing to take more risks with their investments.
