## Basic Terms

** `Mean`: Sum of all values in the sample divided by the number of values in the sample. In finance, mean reversion is the theory that a stock's price tends to return to its mean value over time.

** `Median`: The value at the midpoint in a set of observed values.

** `Mode`: The most frequently occurring value in a set of values.

** `Variable`: Any characteristic that can be measured and that can change.

** `Range`: Difference between the lowest and highest values in a set of observations.

** `Variance`: Average of the squared differences of all values from the mean.

** `Standard deviation`: Square root of the variance; a measure of how far values typically spread around the mean. The volatility of a price is commonly quantified by the standard deviation of its returns.

** `Normal distribution`: A probability distribution symmetric around the mean, where data near the mean occur more frequently than data far from the mean. More specifically, approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
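
As a quick illustration of the terms above, the following sketch computes each statistic with Python's standard `statistics` module and then checks the normal-distribution percentages on simulated draws. The sample values and the simulated draws are made up for the example and are not taken from the stock data.

```python
import random
import statistics

# Hypothetical sample, chosen only so the results are easy to check by hand
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

mean = statistics.mean(sample)           # sum of values / number of values
median = statistics.median(sample)       # midpoint of the sorted values
mode = statistics.mode(sample)           # most frequent value
value_range = max(sample) - min(sample)  # highest minus lowest
variance = statistics.variance(sample)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)       # square root of the sample variance

print(mean, median, mode, value_range, variance, round(std_dev, 4))

# Empirical check of the normal-distribution rule using simulated draws
random.seed(0)
draws = [random.gauss(0, 1) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(x) <= k for x in draws) / len(draws)
    print(f"within {k} standard deviation(s): {share:.3f}")
```

For this sample the mean is 6, the median 5, the mode 4, the range 9, the sample variance 10, and the standard deviation about 3.16. The simulated shares should land close to 68%, 95%, and 99.7%, with small differences due to random sampling.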

In [1]:
# Import Modules
import pandas as pd
from pathlib import Path


In [2]:
# Set paths to csv files
tsla_csv_path = Path("../Resources/tsla.csv")
sp500_path = Path("../Resources/sp500.csv")

In [3]:
# Read in csv files
# parse_dates=True converts the "date" index to datetimes.
# infer_datetime_format is deprecated in pandas 2.0 and is no longer needed.
tsla_df = pd.read_csv(
    tsla_csv_path,
    index_col="date",
    parse_dates=True
)

sp500_df = pd.read_csv(
    sp500_path,
    index_col="date",
    parse_dates=True
)


In [6]:
# Calculate the daily percent changes and drop NaN values
tsla_df["pct_change"] = tsla_df["close"].pct_change()
tsla_df = tsla_df.dropna()
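
For intuition about what `pct_change()` computes: each value is the current close divided by the previous close, minus one. The sketch below reproduces that arithmetic in plain Python using five consecutive Tesla closes from this notebook's data. The helper name `pct_changes` is hypothetical and is not part of pandas.

```python
def pct_changes(prices):
    """Return the fractional change between each pair of consecutive prices."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

# Consecutive Tesla closes from 2014-05-21 through 2014-05-28
closes = [199.45, 204.88, 207.30, 211.56, 210.24]

for change in pct_changes(closes):
    print(round(change, 6))
```

Rounded to six places, these values agree with the `pct_change` column for 2014-05-22 through 2014-05-28. The first price in a series has no previous value to compare against, which is why pandas reports `NaN` there and why the notebook calls `dropna()`.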

In [7]:
tsla_df

date,close,pct_change
2014-05-21,199.45,0.021249
2014-05-22,204.88,0.027225
2014-05-23,207.30,0.011812
2014-05-27,211.56,0.020550
2014-05-28,210.24,-0.006239
...,...,...
2019-05-09,241.98,-0.011681
2019-05-10,239.52,-0.010166
2019-05-13,227.01,-0.052229
2019-05-14,232.31,0.023347


In [10]:
# Calculate the daily percent changes and drop n/a values
sp500_df["pct_change"] = sp500_df["close"].pct_change()

sp500_df = sp500_df.dropna()
sp500_df

date,close,pct_change
2014-05-21,1888.03,0.008116
2014-05-22,1892.49,0.002362
2014-05-23,1900.53,0.004248
2014-05-27,1911.91,0.005988
2014-05-28,1909.78,-0.001114
...,...,...
2019-05-09,2870.72,-0.003021
2019-05-10,2881.40,0.003720
2019-05-13,2811.87,-0.024131
2019-05-14,2834.41,0.008016


In [18]:
# Create a function named 'calculate_mean'.
# We choose names that will not conflict with Python built-ins (such as sum)
# or with any modules that may have been imported.

def calculate_mean(data_set):
    total = 0
    length = 0

    for value in data_set:
        total = total + value
        length = length + 1
    # Divide once, after the loop has seen every value
    return total / length





In [19]:
# Calculate the mean for Tesla.
tsla_mean = calculate_mean(tsla_df["pct_change"])

In [20]:
# Verify the calculated value with pandas mean() method
tsla_df["pct_change"].mean()

0.0005143162010646481

<img src="https://dm2302files.storage.live.com/y4mlFcKHdfKj1n_bo-Ls-K8gGNozI8HlsX0cnkSEnemn5B5cofBLPpvxUDnTUOELCY5NdVbUI-c-S1uASmMzakRuus4RY5ygi4wQsmrcapGKEP9OeNs5VeNxl_CwvkTUt9yqnjQzVoWXt6cV4luu_ipsS91IcUY-wZqxpoVpEuuKeS3Czj-KzGd6Z3pxBZlM1nr?width=1000&height=313&cropmode=none" width="1000" height="313" />

In [25]:
# Create a function named 'calculate_variance'.
# We choose a function name that will not conflict with any modules that may have been imported.
# Variance is the average of the squared differences from the mean.
# This function computes the sample variance, which divides by (length - 1)
# instead of length. That matches the default of the pandas var() method.
# We square each difference so that values above and below the mean
# do not cancel each other out. Squaring also keeps the variance non-negative,
# so its square root (the standard deviation) is always a real number.

def calculate_variance(data_set):
    total = 0
    length = 0
    mean_value = calculate_mean(data_set)
    for value in data_set:
        total = total + (value - mean_value) ** 2
        length = length + 1
    return total / (length - 1)
        


In [26]:
# Calculate the variance for Tesla.
tsla_variance = calculate_variance(tsla_df["pct_change"])
tsla_variance

0.000756907117811057

In [27]:
# Verify the calculated value with pandas var() method
tsla_df["pct_change"].var()

0.000756907117811057
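
The match with pandas is expected because `var()` defaults to the sample variance (`ddof=1`). As a small sketch of the difference between the two conventions, the hypothetical helper below takes a `ddof` argument, named by analogy with pandas. The return values are made-up numbers, not the Tesla data.

```python
import statistics

def variance(values, ddof=1):
    """Average squared deviation from the mean, dividing by (n - ddof)."""
    n = len(values)
    mean_value = sum(values) / n
    squared_deviations = sum((v - mean_value) ** 2 for v in values)
    return squared_deviations / (n - ddof)

# Hypothetical daily returns used only for illustration
returns = [0.02, -0.01, 0.03, 0.00, -0.04]

sample_var = variance(returns, ddof=1)      # divides by n - 1
population_var = variance(returns, ddof=0)  # divides by n

# Cross-check against the standard library
print(sample_var, statistics.variance(returns))
print(population_var, statistics.pvariance(returns))
```

The sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.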

In [28]:
# Create a function named 'calculate_standard_deviation'.
def calculate_standard_deviation(data_set):
    return (calculate_variance(data_set)) ** (1 / 2)


In [29]:
# Calculate the standard deviation for Tesla.
tsla_std = calculate_standard_deviation(tsla_df["pct_change"])
print(tsla_std)

0.027511945002326843


In [30]:
# Verify the calculated value with pandas std() method
tsla_df["pct_change"].std()

0.027511945002326843

In [31]:
# Create a function to check the most recent price against the mean price
# to determine if the stock is overvalued.
def recent_vs_mean(current_price, mean_price):
    if current_price > mean_price:
        print("Stock Overvalued")
    elif current_price < mean_price:
        print("stock undervalued")
    else:
        print("The stock is trading at its mean price")



In [37]:
# Get Tesla's most recent value. iloc[-1] selects the last row by position.
tsla_df_recent_value = tsla_df["close"].iloc[-1]
tsla_df_recent_value

tsla_df_mean_price = tsla_df["close"].mean()
tsla_df_mean_price

265.2580637450203

In [38]:
# Check if tesla is overvalued or not
recent_vs_mean(tsla_df_recent_value, tsla_df_mean_price)

stock undervalued


In [None]:
# Calculate mean for sp500


In [None]:
# Get most recent value for sp500.


In [None]:
# Calculate the standard deviation for the sp500


In [None]:
# Check if the sp500 is overvalued or not


In [40]:
# Create a function to compare the volatility with the underlying market
def compare_volatility(stock_std, market_std):
    if stock_std > market_std:
        print("The stock is more volatile than the market")
    elif stock_std < market_std:
        print("The stock is less volatile than the market")
    else:
        print("The stock is as volatile as the market")

In [42]:
# Compare the volatility of tesla and the sp500
sp500_std = sp500_df["pct_change"].std()
compare_volatility(tsla_std, sp500_std)

The stock is more volatile than the market
