# Activity - Chapter 8
In this activity we explore whether the normal distribution can be used to model the daily returns of a stock. By the end of the activity you should have an opinion on whether the normal distribution is an appropriate model for daily stock returns.

In this example we will use the daily data for Microsoft (MSFT) stock provided by Yahoo Finance.

In [24]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

%matplotlib inline

1.	Using pandas, read the CSV file named “MSFT.csv” from the data folder. Name your DataFrame "msft".

In [5]:
msft = pd.read_csv('../data/MSFT.csv')

2.	Optionally, rename the columns so they are easier to work with (lowercase, with spaces replaced by underscores).

In [6]:
msft.rename(
    columns=lambda x: x.lower().replace(' ', '_'),
    inplace=True
)
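To see what the renaming function does, here is a minimal sketch on an empty DataFrame. The column headers are an assumption about what a Yahoo Finance export typically contains; the actual file may differ.

```python
import pandas as pd

# Assumed column headers for a Yahoo Finance export (the real file may differ)
toy = pd.DataFrame(
    columns=['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
)

# Same transformation as above: lowercase, spaces replaced by underscores
renamed = toy.rename(columns=lambda x: x.lower().replace(' ', '_'))

print(list(renamed.columns))
```

Under that assumption, "Adj Close" becomes "adj_close", which is the name used in the later steps.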

3.	Convert the date column to a proper datetime column

In [7]:
msft['date'] = pd.to_datetime(msft['date'])

4.	Set the date column ("Date" in the original file, "date" after renaming) as the index of the DataFrame

In [8]:
msft.set_index('date', inplace = True)

5.	In finance, the daily return of a stock is defined as the percentage change of its daily closing price. Create the "returns" column in the msft DataFrame by calculating the percent change of the Adj Close column (named "adj_close" if you renamed the columns in step 2). Use the pandas Series method pct_change.

In [9]:
msft['returns'] = msft['adj_close'].pct_change()
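As a quick illustration of what pct_change computes, the sketch below applies it to three made-up prices (hypothetical values chosen so the arithmetic is easy to follow) and compares the result with the same calculation written out by hand.

```python
import pandas as pd

# Hypothetical prices, not taken from the MSFT data
prices = pd.Series([100.0, 102.0, 96.9])

# pct_change computes (price_t - price_{t-1}) / price_{t-1}
toy_returns = prices.pct_change()

# The same calculation written explicitly with shift()
manual_returns = prices / prices.shift(1) - 1

print(toy_returns)
```

The first value is NaN because the first price has no previous day to compare with; the other two are approximately 0.02 and -0.05.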

6.	Restrict the analysis period to the dates between 2014-01-01 and 2018-12-31 (inclusive)

In [10]:
start_date = '2014-01-01'
end_date = '2018-12-31'
msft = msft.loc[start_date: end_date]
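Label-based slicing with .loc on a DatetimeIndex includes both endpoints, which is why the slice above covers the full period. A minimal sketch on a hypothetical five-day frame:

```python
import pandas as pd

# Hypothetical frame with one row per calendar day, 2013-12-30 to 2014-01-03
toy = pd.DataFrame(
    {'value': range(5)},
    index=pd.date_range('2013-12-30', periods=5, freq='D'),
)

# Both the start and the end label are included in the result
subset = toy.loc['2014-01-01':'2014-01-02']

print(subset)
```

Here subset contains the rows for 2014-01-01 and 2014-01-02, unlike positional slicing, which would exclude the end.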

7.	Use a histogram with 40 bins to visualize the distribution of the returns column. Does it look like a Normal distribution?

In [None]:
msft['returns'].hist(ec='k', bins=40);

8.	Calculate the descriptive statistics of the returns column

In [13]:
msft['returns'].describe()

count    1258.000000
mean        0.000996
std         0.014591
min        -0.092534
25%        -0.005956
50%         0.000651
75%         0.007830
max         0.104522
Name: returns, dtype: float64

9.	Create a random variable named R_rv to represent “The daily returns of the MSFT stock”. Use the mean and standard deviation of the returns column as the parameters of this distribution.

In [17]:
R_mean = msft['returns'].mean()
R_std = msft['returns'].std()

R_rv = stats.norm(
    loc = R_mean,
    scale = R_std
)

In [32]:
R_rv.mean()

0.0009959218960820999
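One rough way to use a normal model like this is to ask how likely the observed extremes would be under it. The sketch below rebuilds the distribution from the rounded values printed by describe() in step 8 (the variable names are illustrative, not part of the activity) and computes how many standard deviations the minimum and maximum lie from the mean. It is an informal check, not a formal test of normality.

```python
import scipy.stats as stats

# Rounded values copied from the describe() output in step 8
r_mean, r_std = 0.000996, 0.014591
r_min, r_max = -0.092534, 0.104522

# Same kind of frozen normal distribution as R_rv
model = stats.norm(loc=r_mean, scale=r_std)

# Distance of each extreme from the mean, in standard deviations
z_min = (r_min - r_mean) / r_std
z_max = (r_max - r_mean) / r_std

# Probability, under the normal model, of a return at least this extreme
p_below_min = model.cdf(r_min)
p_above_max = model.sf(r_max)

print(z_min, z_max)
print(p_below_min, p_above_max)
```

Both extremes lie more than six standard deviations from the mean. Keep the resulting probabilities in mind when comparing the model with the data in the next steps.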

10.	Plot the probability density of R_rv together with the histogram of the actual data. Use the plt.hist() function with the parameter density=True so that the real data and the theoretical distribution appear on the same scale.

In [None]:
fig, ax = plt.subplots()

ax.hist(
    x = msft['returns'],
    ec = 'k',
    bins = 40,
    density = True,
);

x_values = np.linspace(msft['returns'].min(), msft['returns'].max(), num=100)
densities = R_rv.pdf(x_values)
ax.plot(x_values, densities, color='r')
ax.grid();
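The reason density=True puts both curves on the same scale is that it rescales the bar heights so the total area of the histogram equals 1, just like the area under a probability density function. A minimal sketch using np.histogram on simulated data (the sample is randomly generated with made-up parameters, not the MSFT returns):

```python
import numpy as np

# Simulated sample; the parameters are illustrative only
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.001, scale=0.015, size=1000)

# With density=True the heights are densities rather than counts
heights, edges = np.histogram(sample, bins=40, density=True)

# Total area = sum of (height * bin width) over all bins
area = np.sum(heights * np.diff(edges))

print(area)
```

The area is 1 up to floating-point rounding. Without density=True the bar heights would be raw counts, which are not comparable to the values returned by R_rv.pdf.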

11.	After looking at the preceding plot, would you say that the Normal distribution provides an accurate model for the daily returns of the Microsoft stock?

12.	Additional practice: repeat the steps above with the “PG.csv” file, which contains the data for the Procter & Gamble stock.