# Financial data analysis with Python.

Python is one of the most widely used programming languages. Apart from making fun computer games like 'rock, paper, scissors', it can also be used to solve financial problems, such as analysing two of America's major stock indices, the NASDAQ100 and the SP500. That is what I am going to show you. I will walk through a few of the basics of financial data analysis in an interactive way, so that you can 'play' with the code yourself.

### Importing the data.

To start, it is necessary to import various Python libraries, such as 'pandas_datareader', with which the values of the NASDAQ100 and the SP500 will be imported.

In [None]:
import pandas as pd
import pandas_datareader.data as web  # remote data readers, including FRED
from datetime import datetime
from matplotlib import pyplot as plt

Secondly, a start and end date for the analysis must be defined. Below, the period from 1 June 2021 to 1 January 2022 has been chosen, but this can of course be changed to one's own preference.

In [None]:
start = datetime(2021, 6, 1)
end = datetime(2022, 1, 1)

Now comes the most important part of the analysis: importing the data. Below, we import the daily values of the NASDAQ100 and the SP500 from FRED ('Federal Reserve Economic Data'), from the start date to the end date. <br>
In addition, the function .dropna() is applied to the datasets just obtained, so that dates without a value are removed.

In [None]:
nasdaq_data = web.DataReader("NASDAQ100", "fred", start, end)
sp500_data = web.DataReader("SP500", "fred", start, end)
nasdaq_data = nasdaq_data.dropna()
sp500_data = sp500_data.dropna()
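
To see what .dropna() does without downloading anything, here is a minimal sketch on a tiny hand-made table. The dates and values are made up for illustration, and the gap in the middle is an assumption that mimics a day without an observation, such as a market holiday.

```python
import pandas as pd

# Made-up values; the middle date has no observation (NaN),
# similar to a day on which the market was closed.
sample = pd.DataFrame(
    {"NASDAQ100": [13654.6, float("nan"), 13675.8]},
    index=pd.to_datetime(["2021-06-01", "2021-06-02", "2021-06-03"]),
)

# Keep only the rows that have no missing values.
cleaned = sample.dropna()

print(len(sample), "rows before,", len(cleaned), "rows after")
```

Only the row with the missing value is removed; the other dates keep their original values.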

### Visualising the data.

Before analysing the data, we can also visualise it. Using the 'matplotlib' library we can plot the values in graphs.

In [None]:
plt.subplot(1,2,1)
plt.plot(nasdaq_data)
plt.title('NASDAQ')
plt.xticks(rotation=90)

plt.subplot(1,2,2)
plt.plot(sp500_data)
plt.title('SP500')
plt.xticks(rotation=90)

plt.show()

Now the analysis of the data can really begin. To start, we simply convert the values into index numbers, where the value on the first date equals 100.

In [None]:
#indices
nasdaq_index = []
index = nasdaq_data.iloc[0,0] / 100
for item in nasdaq_data['NASDAQ100']:
    nasdaq_index.append(item / index)
    
sp500_index = []
index = sp500_data.iloc[0,0] / 100
for item in sp500_data['SP500']:
    sp500_index.append(item / index)
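
The same conversion can also be written without a loop, because pandas can divide a whole column at once. The sketch below uses made-up values and a hypothetical series name, so it only illustrates the idea and is not the calculation on the real data.

```python
import pandas as pd

# Made-up index levels, only to illustrate the calculation.
prices = pd.Series(
    [200.0, 210.0, 190.0],
    index=pd.to_datetime(["2021-06-01", "2021-06-02", "2021-06-03"]),
    name="EXAMPLE",
)

# Divide every value by the first one and scale so the series starts at 100.
rebased = prices / prices.iloc[0] * 100

print(rebased)
```

A side effect of this approach is that the result is still a pandas Series with the dates as its index, so a plot of it shows dates on the horizontal axis instead of row numbers.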

Now the two indices can be compared directly, as they can be displayed in the same graph.

In [None]:
plt.plot(nasdaq_index)
plt.plot(sp500_index)
plt.legend(['NASDAQ100', 'SP500'])
plt.title('Stock prices in indices.')
plt.xticks(rotation=90)
plt.show()

### Analysing the data.

The levels of the NASDAQ100 and the SP500 change between the start and end date, of course, but how much do they change? How far do the values spread around their average? This can be expressed in a number, namely the variance: a measure of the dispersion of a set of numbers, in this case index values. It is given by the formula below.

$
\text{Variance} \\
$
$$
S^2 = \frac{\Sigma{(x_{i} -  \bar{x})^2}}{n - 1}
$$


Here, $x_{i}$ stands for the value at one moment in time, $\bar{x}$ is the mean of all the values and $n$ the number of values. These steps are also reflected in the code, where first the mean is calculated, then all values are looped through to sum the squared differences from the mean, and finally this sum is divided by the number of values minus one to calculate the variance.

In [None]:
# mean
nasdaq_mean = nasdaq_data['NASDAQ100'].mean()
sp500_mean = sp500_data['SP500'].mean()

# variance: sum of squared differences from the mean, divided by n - 1
squared_diffs = 0
for item in nasdaq_data['NASDAQ100']:
    squared_diffs += (item - nasdaq_mean)**2
nasdaq_variance = squared_diffs / (len(nasdaq_data['NASDAQ100']) - 1)

squared_diffs = 0
for item in sp500_data['SP500']:
    squared_diffs += (item - sp500_mean)**2
sp500_variance = squared_diffs / (len(sp500_data['SP500']) - 1)

print(nasdaq_variance)
print(sp500_variance)
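
The formula above divides by $n - 1$ (the sample variance), which gives a slightly different result from dividing by $n$ (the population variance). The sketch below shows both on a small list of made-up numbers and checks them against Python's built-in statistics module. The numbers are chosen only so that the arithmetic is easy to follow by hand.

```python
import statistics

# Made-up values, chosen so the arithmetic is easy to follow by hand.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(values) / len(values)
squared_diffs = sum((x - mean) ** 2 for x in values)

sample_variance = squared_diffs / (len(values) - 1)  # n - 1, as in the formula
population_variance = squared_diffs / len(values)    # n

# statistics.variance uses n - 1, statistics.pvariance uses n.
print(sample_variance, statistics.variance(values))
print(population_variance, statistics.pvariance(values))
```

The pandas method `Series.var()` also divides by $n - 1$ by default, so it can serve as a quick check on the loop.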


This value is large because it is computed on the raw index values, which run into the thousands of points, and because the variance is expressed in squared units. We are also looking at a fairly long period; over a shorter period the values usually move less, so the variance is normally smaller. <br>
In addition, another value can be calculated directly from the variance, namely the standard deviation. This is also a measure of the dispersion of a set of numbers, and it equals the square root of the variance.

$
\text{Standard deviation} \\
$
$$
S = \sqrt{\frac{\Sigma{(x_{i} -  \bar{x})^2}}{n - 1}}
$$

In [None]:
# standard deviation
nasdaq_stdev = nasdaq_variance**0.5
sp500_stdev = sp500_variance**0.5

print(nasdaq_stdev)
print(sp500_stdev)

So the conclusion is that the standard deviation of the NASDAQ100 is greater than that of the SP500. Part of this gap is a matter of scale, since the NASDAQ100 trades at a higher level than the SP500, but it also fits how the indices are built. The NASDAQ100 is spread over about 100 reasonably related shares, whereas the SP500 is spread over about 500 less related shares. If one share in the SP500 falls in price, it is less likely that the other shares will do the same, so the deviation is smaller. The risk is therefore lower, but so is the chance of a large profit, which may make the SP500 a good investment for more risk-averse people.
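
A standard deviation of raw index values depends on how high the index trades. One common way around this, which is an assumption here and not part of the analysis above, is to compare the standard deviation of the daily percentage changes instead. The sketch below uses two made-up series that move by the same percentages each day but trade at different levels.

```python
import pandas as pd

# Made-up levels: the second series is the first divided by ten,
# so both move by the same percentage every day.
high_level = pd.Series([10000.0, 10100.0, 9999.0, 10200.0])
low_level = high_level / 10

# On the raw levels, the higher-priced series looks far more dispersed.
print(high_level.std(), low_level.std())

# Daily percentage changes remove the effect of the price level.
high_returns = high_level.pct_change().dropna()
low_returns = low_level.pct_change().dropna()
print(high_returns.std(), low_returns.std())
```

On the raw levels the first series has the larger standard deviation, while on the percentage changes the two are practically identical. The same idea could be applied to `nasdaq_data` and `sp500_data` to compare the two indices on an equal footing.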

In addition to analysing the NASDAQ100 and the SP500 individually, the two indices can also be compared with each other. How are they related? If one rises, is it likely that the other will rise as well? These questions can be answered with the correlation, a coefficient that shows the degree to which two variables are related. There is a formula for this, but fortunately there is also a function in the pandas library that calculates it automatically.

$
\text{Pearson correlation coefficient} \\
$
$$
r = \frac{\Sigma{(x_{i} -  \bar{x})(y_{i} -  \bar{y})}}{\sqrt{\Sigma{(x_{i} -  \bar{x})^2\Sigma(y_{i} -\bar{y})^2}}}
$$

In [None]:
# correlation
correlation = nasdaq_data['NASDAQ100'].corr(sp500_data['SP500'])

print(correlation)
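
For readers who want to see the formula at work, the sketch below writes it out by hand in plain Python. The function name `pearson` and the three small lists are made up for illustration; on real data the pandas method used above is the more convenient choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations from the means.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator

# Made-up values for illustration only.
rising = [1.0, 2.0, 3.0, 4.0]
also_rising = [10.0, 20.0, 30.0, 40.0]
falling = [8.0, 6.0, 4.0, 2.0]

print(pearson(rising, also_rising))  # close to 1
print(pearson(rising, falling))      # close to -1
```

Two series that rise together give a value close to 1, and a series that falls while the other rises gives a value close to -1.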

This shows that the NASDAQ100 and the SP500 are strongly related. The correlation takes a value between -1 and 1. A negative correlation means that if one rises, the other is likely to fall. The high positive correlation found here shows that if one rises, the other is very likely to rise as well, which can also be seen in the graphs made earlier. It means that investing in both indices at the same time does little to spread risk: if you make a loss on one of them, there is a big chance that you will also make a loss on the other.

### The conclusion.

So, with the help of Python, financial data can easily be imported, visualised and analysed. This article shows only the tip of the iceberg; much more is possible in the area of visualisation and analysis. Possible follow-up steps include forecasting the data and letting the computer trade on the stock market without human intervention.


### The sources.

The following sources have been used to make this article: <br>
https://www.codecademy.com/learn/paths/finance-python <br>
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html <br>
https://github.com/dunovank/jupyter-themes#monospace-fonts-code-cells <br>
https://jupyterbook.org/content/math.html <br>
https://medium.com/analytics-vidhya/writing-math-equations-in-jupyter-notebook-a-naive-introduction-a5ce87b9a214 <br>
https://fred.stlouisfed.org/ <br>
https://www.investopedia.com/terms/v/variance.asp <br>
https://corporatefinanceinstitute.com/resources/knowledge/finance/correlation/ <br>