# Analyzing Stock Data to Identify Investment Risk

Although the internet dates back to the 1970s, it was not until the 1990s that it became a mainstay in homes across the globe. And with the new age of information came the people who set up shop in that space.

It was during that time that companies began setting up virtual marketplaces to match buyers and sellers of goods and services, namely Jeff Bezos's Amazon and Pierre Omidyar's eBay. Although these two are household names, how do they hold up as investments today?

In this analysis, we will assess the risk and return of each of these e-commerce companies by calculating their rates of return along with key risk statistics such as variance, standard deviation, and correlation.

Before running any other code, run the cell below. It loads IPython's `autoreload` extension so that imported modules are reloaded automatically before each cell executes, which keeps any edits to helper modules in sync with the notebook.

In [1]:
%load_ext autoreload
%autoreload 2

## Risk Investment Analysis Helper Functions

To compare the risk and reward of the two companies, we must first set up basic functions to calculate the log return, variance, standard deviation, and correlation coefficient (we will go over why these are important later).

But first, let's import the necessary math functions and define a helper for displaying a decimal as a percentage:

In [2]:
from math import log, sqrt

def display_as_percentage(val):
    return '{:.1f}%'.format(val*100)

### Logarithmic Rate of Return

The first helper function that we need to write is the **logarithmic rate of return**, also known as the **continuously compounded return**, given by the formula below.


$$
r = \log(E) - \log(S) = \log\left(\frac{E}{S}\right)
$$

where 

- `r`: logarithmic rate of return
- `S`: starting price of investment
- `E`: ending price of investment

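This formula calculates the return on an investment over a given time period, assuming the earnings are continuously reinvested (compounded). Here `log` denotes the natural logarithm, which is what Python's `math.log` computes when called with a single argument.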

In [3]:
def calculate_log_return(start_price, end_price):
    return log(end_price / start_price)
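
One reason log returns are convenient is that they add up across consecutive periods. The following is a minimal sketch of that property, assuming made-up prices (`p0`, `p1`, `p2` are hypothetical and not taken from the stock data); it also shows how a log return can be converted back into a simple percentage change.

```python
from math import exp, isclose, log

def calculate_log_return(start_price, end_price):
    # Same definition as the helper above, repeated so this sketch runs on its own.
    return log(end_price / start_price)

# Hypothetical prices at the start and after each of two consecutive periods.
p0, p1, p2 = 100.0, 110.0, 99.0

r1 = calculate_log_return(p0, p1)       # first period
r2 = calculate_log_return(p1, p2)       # second period
r_total = calculate_log_return(p0, p2)  # both periods at once

# Log returns are additive: the two one-period returns sum to the two-period return.
print(isclose(r1 + r2, r_total))

# A log return converts back to a simple return via exp(r) - 1.
simple_total = exp(r_total) - 1
print(round(simple_total, 4))
```

The simple return here is just `p2 / p0 - 1`, so the conversion recovers the plain percentage change over the whole span.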

### Variance

To assess the risk involved in an investment, one key statistic is **variance**, which measures the spread of a dataset, or how far each value is from the mean. The greater the variance, the more spread out (variable) the data. Variance is given by the formula below.

$$\sigma^{2} = \frac{\Sigma(X_{i} - \bar{X})^{2}}{n}$$

- `σ²`: variance
- `Xi`: the ith value in the dataset
- `X̄`: the mean of the dataset
- `n`: the number of values in the dataset


In the context of investing, an asset with a high variance is generally a riskier one because its return can vary significantly in a short period of time, making it less stable and more unpredictable.

In [4]:
def calculate_variance(dataset):
    mean = sum(dataset)/len(dataset)
    numerator = 0
    for data in dataset:
        numerator += (data-mean) ** 2
    return numerator / len(dataset)
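
To make the idea of spread concrete, here is a small sketch that compares two hypothetical sets of monthly returns with the same mean but very different variability. The data is invented for illustration. Because the formula above divides by `n`, it is the population variance, so the sketch cross-checks it against `statistics.pvariance` from the standard library (rather than `statistics.variance`, which divides by `n - 1`).

```python
from math import isclose
from statistics import pvariance

def calculate_variance(dataset):
    # Population variance: mean squared deviation from the mean (divides by n).
    mean = sum(dataset) / len(dataset)
    return sum((value - mean) ** 2 for value in dataset) / len(dataset)

# Hypothetical monthly returns with the same mean (1.5%) but different spread.
steady_returns = [0.01, 0.02, 0.01, 0.02]
volatile_returns = [0.10, -0.08, 0.12, -0.08]

for label, returns in [("steady", steady_returns), ("volatile", volatile_returns)]:
    variance = calculate_variance(returns)
    # statistics.pvariance also divides by n, so the two results should agree.
    assert isclose(variance, pvariance(returns))
    print(label, variance)
```

Both series average the same return, yet the second has a far larger variance, which is the sense in which it would be considered riskier.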

### Standard Deviation


Although the variance is useful in determining the relative risk of an investment, it is not the easiest statistic to interpret because its unit is the square of the original data's unit. Instead, we can describe the spread of the dataset with the **standard deviation**.

The standard deviation is simply the square root of the variance, so it has the same unit as the original data.

$$\sigma = \sqrt{\frac{\Sigma(X_{i} - \bar{X})^{2}}{n}}$$

- `σ`: standard deviation
- `Xi`: the ith value in the dataset
- `X̄`: the mean of the dataset
- `n`: the number of values in the dataset

In [5]:
def calculate_stddev(dataset):
    variance = calculate_variance(dataset)
    return sqrt(variance)
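
As a quick illustration of why the standard deviation is easier to read, the sketch below computes it for a hypothetical list of monthly returns (invented data) and formats it as a percentage, the same unit as the returns themselves. It is cross-checked against `statistics.pstdev`, the standard library's population standard deviation.

```python
from math import isclose, sqrt
from statistics import pstdev

def display_as_percentage(val):
    return '{:.1f}%'.format(val * 100)

def calculate_stddev(dataset):
    # Square root of the population variance (divides by n, as in the formula above).
    mean = sum(dataset) / len(dataset)
    variance = sum((value - mean) ** 2 for value in dataset) / len(dataset)
    return sqrt(variance)

# Hypothetical monthly returns, expressed as decimals.
monthly_returns = [0.10, -0.08, 0.12, -0.08]

stddev = calculate_stddev(monthly_returns)
assert isclose(stddev, pstdev(monthly_returns))

# The result is in the same unit as the returns, so it reads naturally as a percentage.
print(display_as_percentage(stddev))
```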

### Calculating Correlation


Another important statistic for assessing risk is the correlation between the returns of two assets. Correlation is a measure of how closely two datasets are associated with each other. It is often represented by the correlation coefficient, a value that ranges between -1 and 1. Its sign indicates whether the correlation is positive or negative, and a value near 0 indicates little or no linear correlation.

In finance, two assets from the same industry generally have a positive correlation, as they are likely affected by similar external conditions. So for example, automobile stocks may be positively correlated with each other while oil stocks may be negatively correlated with automobile stocks.

When building a portfolio, it is generally a good idea to include assets that are not strongly correlated with each other. Correlated assets tend to fall together, so their losses compound, whereas assets that move independently are less likely to lose value at the same time. This allows for greater diversification and balances out the overall risk and return of the portfolio.
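
The effect of correlation on portfolio risk can be sketched with the standard two-asset portfolio variance formula, which is not part of the original analysis and is included here only as an illustration. The function name `portfolio_stddev`, the equal weights, and the 10% volatilities are all assumptions made for the example.

```python
from math import sqrt

def portfolio_stddev(weight_a, stddev_a, stddev_b, correlation):
    # Two-asset portfolio variance:
    #   w_a^2 * s_a^2 + w_b^2 * s_b^2 + 2 * w_a * w_b * rho * s_a * s_b
    weight_b = 1 - weight_a
    variance = (
        (weight_a * stddev_a) ** 2
        + (weight_b * stddev_b) ** 2
        + 2 * weight_a * weight_b * correlation * stddev_a * stddev_b
    )
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(variance, 0.0))

# Hypothetical portfolio: half in each asset, each with a 10% standard deviation.
for rho in (1.0, 0.5, 0.0, -1.0):
    print(rho, round(portfolio_stddev(0.5, 0.10, 0.10, rho), 4))
```

Under these assumptions the portfolio's standard deviation shrinks as the correlation falls: perfectly correlated assets offer no reduction in risk, while less correlated ones do.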

To get a single value that describes the relationship between two continuous variables, we use the Pearson correlation coefficient, which measures both the strength and direction of their linear relationship.

We use the formula shown below:

$$\rho = \frac{\text{cov}(X,Y)}{\sigma_x \sigma_y}$$

- `ρ`: Pearson correlation coefficient
- `cov(X, Y)`: covariance of X and Y, the mean of the products `(xi - x̄)(yi - ȳ)`
- `σx`, `σy`: standard deviations of X and Y
- `x̄`: mean of X
- `ȳ`: mean of Y

In [6]:
# Calculate Correlation Coefficient
def calculate_correlation(set_x, set_y):
    sum_x = sum(set_x)
    sum_y = sum(set_y)
    
    sum_x2 = sum([x ** 2 for x in set_x])
    sum_y2 = sum([y ** 2 for y in set_y])
    sum_xy = sum([x * y for x,y in zip(set_x, set_y)])
    
    n = len(set_x)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
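
The function above uses a computational form based on raw sums, which is algebraically equivalent to dividing the covariance by the product of the standard deviations. The sketch below, which assumes small hand-made datasets and a hypothetical helper name `correlation_from_definition`, computes the coefficient directly from that definition and checks that both approaches agree.

```python
from math import isclose, sqrt

def calculate_correlation(set_x, set_y):
    # Computational form, repeated from the cell above so the sketch runs on its own.
    n = len(set_x)
    sum_x, sum_y = sum(set_x), sum(set_y)
    sum_x2 = sum(x ** 2 for x in set_x)
    sum_y2 = sum(y ** 2 for y in set_y)
    sum_xy = sum(x * y for x, y in zip(set_x, set_y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def correlation_from_definition(set_x, set_y):
    # Definition form: covariance divided by the product of standard deviations.
    n = len(set_x)
    mean_x, mean_y = sum(set_x) / n, sum(set_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_x, set_y)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in set_x) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in set_y) / n)
    return cov / (std_x * std_y)

# Hypothetical data: one series moves with xs, the other moves against it.
xs = [1.0, 2.0, 3.0, 4.0]
ys_up = [2.0, 4.0, 6.0, 8.0]
ys_down = [8.0, 6.0, 4.0, 2.0]

for ys in (ys_up, ys_down):
    a = calculate_correlation(xs, ys)
    b = correlation_from_definition(xs, ys)
    assert isclose(a, b)
    print(round(a, 6))
```

A perfectly increasing linear relationship gives a coefficient of 1 and a perfectly decreasing one gives -1, which marks the two ends of the range.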

## Calculate Rate of Return

2. Let's start by calculating the logarithmic rates of return from the stock prices. Define a function called `get_returns()` that takes a parameter called `prices`, which will be a list of stock prices.

   The function will eventually return a list of log returns calculated from each adjacent pair of prices. For now, create a variable called `returns` inside the function body, and set it equal to an empty list.
   
   Next, use a `for` loop to iterate over the `prices` list, from the 1st element to the 2nd to last.

   You will be accessing the elements by their indices, so use the `range()` function to help generate the sequence of numbers to iterate over. Recall that Python uses zero-based indexing, meaning the index numbering of the list starts with `0` and ends with `n - 1`, where `n` is the number of elements in the list.
   
   Comment out the `for` loop for now so that the kernel doesn't throw an error at you.

In [7]:
amazon_prices = [1699.8, 1777.44, 2012.71, 2003.0, 1598.01, 1690.17, 1501.97, 1718.73, 1639.83, 1780.75, 1926.52, 1775.07, 1893.63]
ebay_prices = [35.98, 33.2, 34.35, 32.77, 28.81, 29.62, 27.86, 33.39, 37.01, 37.0, 38.6, 35.93, 39.5]

def get_returns(prices):
    returns = []
    for i in range(len(prices)-1):
        start_price = prices[i]
        end_price = prices[i+1]
        returns.append(calculate_log_return(start_price, end_price))
    return returns

4. As you iterate over each index `i`, the element in the `prices` list that is at position `i` will be the start price and the element with index `i + 1` will be the end price. Use the `calculate_log_return()` function to calculate the rate of return from the start and end prices. Then, append the rate of return to the `returns` list.

   After your for loop, return `returns` from the `get_returns()` function.

5. Use the `get_returns()` function to find the monthly log rates of return from the Amazon and eBay stock prices. Store those lists of returns in the variables `amazon_returns` and `ebay_returns`, respectively.

In [8]:
amazon_returns = get_returns(amazon_prices)
ebay_returns = get_returns(ebay_prices)
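
As a sanity check on the index-based loop, the same pairing of adjacent prices can be sketched with `zip()`. The name `get_returns_zip` is hypothetical and this is only an alternative formulation, not the approach the exercise asks for. Thirteen month-end prices should yield twelve monthly returns.

```python
from math import log

def display_as_percentage(val):
    return '{:.1f}%'.format(val * 100)

def get_returns_zip(prices):
    # zip(prices, prices[1:]) pairs each price with the one that follows it,
    # which gives the same (start, end) pairs as indexing with i and i + 1.
    return [log(end / start) for start, end in zip(prices, prices[1:])]

amazon_prices = [1699.8, 1777.44, 2012.71, 2003.0, 1598.01, 1690.17, 1501.97,
                 1718.73, 1639.83, 1780.75, 1926.52, 1775.07, 1893.63]

amazon_returns_zip = get_returns_zip(amazon_prices)

# One fewer return than prices, since each return needs a start and an end price.
print(len(amazon_prices), len(amazon_returns_zip))
print(display_as_percentage(amazon_returns_zip[0]))
```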

6. Time to print out the lists of monthly returns! Since rates of return are often expressed as percentages, use the `display_as_percentage()` function and a list comprehension to display each value in `amazon_returns` and `ebay_returns` as a percentage.

   How do the monthly returns of the two stocks compare? Are they on average profitable?

In [9]:
[display_as_percentage(price_return) for price_return in amazon_returns]

['4.5%',
 '12.4%',
 '-0.5%',
 '-22.6%',
 '5.6%',
 '-11.8%',
 '13.5%',
 '-4.7%',
 '8.2%',
 '7.9%',
 '-8.2%',
 '6.5%']

In [10]:
[display_as_percentage(price_return) for price_return in ebay_returns]

['-8.0%',
 '3.4%',
 '-4.7%',
 '-12.9%',
 '2.8%',
 '-6.1%',
 '18.1%',
 '10.3%',
 '-0.0%',
 '4.2%',
 '-7.2%',
 '9.5%']

7. Now, let's calculate the annual rate of return for each stock!

   Recall that log returns can easily be aggregated over time. Since `amazon_returns` and `ebay_returns` contain the monthly log returns for all 12 months in the past year, the annual return is simply the sum of all monthly returns. Use the `display_as_percentage()` function to help format the annual return as a percentage when you print out the results.
   
   How do the annual returns of the two stocks compare?

In [11]:
amzn_annual_return = sum(amazon_returns)
ebay_annual_return = sum(ebay_returns)
print(amzn_annual_return)
print(ebay_annual_return)

0.1079850248487889
0.09333744338468924
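
Summing the monthly log returns works because the intermediate prices cancel out, leaving the log of the last price over the first. The following sketch checks that identity on the price lists used above and also converts each annual log return into a simple percentage change. The helper name `annual_log_return` is an assumption introduced for this example.

```python
from math import exp, isclose, log

amazon_prices = [1699.8, 1777.44, 2012.71, 2003.0, 1598.01, 1690.17, 1501.97,
                 1718.73, 1639.83, 1780.75, 1926.52, 1775.07, 1893.63]
ebay_prices = [35.98, 33.2, 34.35, 32.77, 28.81, 29.62, 27.86,
               33.39, 37.01, 37.0, 38.6, 35.93, 39.5]

def annual_log_return(prices):
    # Sum of the log returns between each adjacent pair of prices.
    return sum(log(end / start) for start, end in zip(prices, prices[1:]))

for name, prices in [("Amazon", amazon_prices), ("eBay", ebay_prices)]:
    total = annual_log_return(prices)
    # The sum telescopes to log(last price / first price).
    assert isclose(total, log(prices[-1] / prices[0]))
    simple = exp(total) - 1  # equivalent simple return over the year
    print(name, '{:.1f}%'.format(total * 100), '{:.1f}%'.format(simple * 100))
```

The simple return is slightly higher than the log return for both stocks, which is expected for positive returns since `exp(r) - 1` exceeds `r` whenever `r` is positive.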


## Assess Investment Risk

8. Let's move on to assessing the risk of each investment! Start by calculating the variance of each stock's monthly returns. Use the `calculate_variance()` function we provided in the first task and print out the results.

   How do the variances of the two stocks compare? What does this tell you about their relative risk?

In [12]:
print(calculate_variance(amazon_returns))
print(calculate_variance(ebay_returns))

0.010738060556609724
0.007459046435081462


9. Now, calculate the standard deviation of each stock's monthly returns using the `calculate_stddev()` function, and print out the results.

   Recall that the standard deviation has the same unit of measurement as the original dataset, or the monthly returns in this case. Since rates of return are often expressed as a percentage, use `display_as_percentage()` to help format the standard deviation for easier interpretation.

In [13]:
print(display_as_percentage(calculate_stddev(amazon_returns)))
print(display_as_percentage(calculate_stddev(ebay_returns)))

10.4%
8.6%


10. Finally, calculate the correlation between the stock returns using the `calculate_correlation()` function, and print out the results.

    Are Amazon and eBay stock returns strongly or weakly correlated? Is the correlation positive or negative?

In [14]:
print(calculate_correlation(amazon_returns, ebay_returns))

0.6776978564073072
