---
## <font color=#FF8181>Unit 9 - Exercises: </font>

In the following exercises, we will draw on knowledge from previous units and test our understanding of __Time Series__. We will use YahooFinancials to run Time Series analysis on historical stock prices. Refer to the materials on Time Series and Aggregations for guidance on how to obtain data from YahooFinancials.

In [3]:
import pandas as pd
import numpy as np
from yahoofinancials import YahooFinancials

---
### <font color=#14F278> Task 1:  </font>

<font color=#14F278> **The purpose of Task 1 is to produce a Time Series of average yearly Apple stock prices for the years 2019-2021 via downsampling and present it in a pre-defined format. This exercise tests your knowledge on Time Series, as well as melting and sorting a dataframe. The task deliverable is to produce the dataframe below:** </font>

<center>
    <div>
        <img src="..\images\timeseries_003.png"/>
    </div>
</center>


__Steps: Create a python function `apple_downsample()` which does the following:__
- Scrapes historical monthly price data on Apple from __YahooFinancials__
- Constructs a DataFrame with the monthly price observations for period 2019 - 2021 - columns 'high', 'low' and 'close'
- Sets column 'formatted_date' to be your DataFrame index
- Downsamples DataFrame from monthly to annual level of granularity via averaging monthly observations
- Resets index to new DataFrame (to ensure that 'formatted_date' values are in a column)
- Melts the DataFrame, so that column names 'high', 'low', 'close' become entries of column __'price_type'__ and observations are stored in column __'avg_price'__
- Sorts melted DataFrame by 'formatted_date' and 'price_type' in ascending order
- Returns final sorted DataFrame

In [4]:
# Solution

def apple_downsample():
    from yahoofinancials import YahooFinancials
    # Fetch monthly observations covering 2019-2021
    data = YahooFinancials('AAPL').get_historical_price_data('2019-01-01', '2022-01-01', 'monthly')
    data = data['AAPL']['prices']
    df = pd.DataFrame(data)
    df = df[['formatted_date', 'high', 'low', 'close']]
    df['formatted_date'] = pd.to_datetime(df['formatted_date'], format = '%Y-%m-%d')
    df.set_index('formatted_date', inplace = True)
    # Downsample monthly -> annual by averaging; 'Y' labels each bin by its year end
    # (pandas 2.2+ emits a FutureWarning for 'Y' and recommends 'YE')
    df_downsample = df.resample('Y').mean()
    df_downsample.reset_index(inplace = True)
    # Reshape wide -> long: one row per (date, price_type)
    df_melt = df_downsample.melt(id_vars = ['formatted_date'], var_name = 'price_type', value_name = 'avg_price')
    df_melt.sort_values(['formatted_date', 'price_type'], ascending = [True, True], inplace = True)
    return df_melt

In [5]:
# Call the function and display returned objects
apple_downsample()


| | formatted_date | price_type | avg_price |
|---|---|---|---|
| 6 | 2019-12-31 | close | 53.304375 |
| 0 | 2019-12-31 | high | 55.11375 |
| 3 | 2019-12-31 | low | 48.476251 |
| 7 | 2020-12-31 | close | 97.094583 |
| 1 | 2020-12-31 | high | 104.093332 |
| 4 | 2020-12-31 | low | 86.300833 |
| 8 | 2021-12-31 | close | 141.688337 |
| 2 | 2021-12-31 | high | 148.499168 |
| 5 | 2021-12-31 | low | 132.827499 |
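
The downsample, melt and sort steps of Task 1 can also be tried offline on a small synthetic frame. The sketch below is illustrative only: the values are made up (not real Apple prices), the names `toy`, `yearly` and `long_df` are hypothetical, and it uses the `'YS'` alias (year-start labels) because that alias is accepted by both older and newer pandas versions, whereas the solution above labels each year by its last day.

```python
import pandas as pd

# Synthetic monthly observations for 2019-2020 (made-up values, not real prices)
dates = pd.date_range('2019-01-01', periods=24, freq='MS')
toy = pd.DataFrame(
    {'high': [float(i) for i in range(24)],
     'low': [float(i) for i in range(100, 124)]},
    index=dates,
)
toy.index.name = 'formatted_date'

# Downsample: 24 monthly rows are averaged into 2 yearly rows
yearly = toy.resample('YS').mean().reset_index()

# Melt wide -> long, then sort by date and price type
long_df = yearly.melt(
    id_vars=['formatted_date'],
    var_name='price_type',
    value_name='avg_price',
).sort_values(['formatted_date', 'price_type']).reset_index(drop=True)

print(long_df)
```

Each year contributes one row per price type, so the long frame has (number of years) x (number of price columns) rows.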


---
### <font color=#14F278> Task 2:</font>

<font color=#14F278> **The purpose of Task 2 is to analyse the deviation which linear interpolation produces on a Time Series. We know that linear interpolation effectively 'smooths out' a Time Series - in this exercise you will be asked to quantify this 'smoothing effect' in the context of Apple daily stock prices for Q4 2021. Task 2 aims to compare the real daily stock prices to linearly interpolated ones and assess the percentage deviation between the two. The deliverable is to produce the dataframe below (NB: image shows a part of the dataframe):** </font>


<center>
    <div>
        <img src="..\images\timeseries_004.png"/>
    </div>
</center>



__Steps: Create a python function `apple_upsample()` which does the following:__
- Scrapes historical monthly and daily price data on Apple from __YahooFinancials__
- Creates two dataframes - __monthly_prices__ and __daily_prices__:
    - __monthly_prices__ containing the monthly price observations for Q4 2021 - column 'close'
    - __daily_prices__ containing the daily price observations for Q4 2021 - column 'close'
- Sets column 'formatted_date' to be the index to __monthly_prices__ dataframe
- Upsamples __monthly_prices__ to daily level of granularity
- Fills in missing values via Linear Interpolation
- Resets index to __monthly_prices__ (to ensure that 'formatted_date' values are in a column)
- Performs a left merge of __monthly_prices__ onto __daily_prices__
- Renames column __close_x__ to __real_close__ and __close_y__ to __inter_close__
- Creates a new column __'perc_dev'__, calculating the deviation of the interpolated price value from the real price value on a daily basis 
     - __perc_dev__ = ((__inter_close__ - __real_close__) / __real_close__) * 100
- Returns the final DataFrame

In [None]:
# Solution 

def apple_upsample():
    from yahoofinancials import YahooFinancials
    data1 = YahooFinancials('AAPL').get_historical_price_data('2021-10-01', '2022-01-02', 'monthly')
    data2 = YahooFinancials('AAPL').get_historical_price_data('2021-10-01', '2022-01-02', 'daily')
    data1 = data1['AAPL']['prices']
    data2 = data2['AAPL']['prices']
    monthly_prices = pd.DataFrame(data1)[['formatted_date', 'close']]
    daily_prices = pd.DataFrame(data2)[['formatted_date', 'close']]
    monthly_prices['formatted_date'] = pd.to_datetime(monthly_prices['formatted_date'], format = '%Y-%m-%d')
    daily_prices['formatted_date'] = pd.to_datetime(daily_prices['formatted_date'], format = '%Y-%m-%d')
    monthly_prices.set_index('formatted_date', inplace = True)
    monthly_prices = monthly_prices.resample('D').mean()
    monthly_prices.interpolate(inplace = True)
    monthly_prices.reset_index(inplace = True)
    combined_df = pd.merge(daily_prices, monthly_prices, on = 'formatted_date', how = 'left')
    combined_df.rename(columns = {'close_x':'real_close', 'close_y':'inter_close'}, inplace = True)
    # Vectorised column arithmetic: same result as a row-wise apply, without the per-row overhead
    combined_df['perc_dev'] = (combined_df['inter_close'] - combined_df['real_close']) / combined_df['real_close'] * 100
    return combined_df

In [None]:
# Call the function and display returned objects
apple_upsample()
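
To see what the upsample, interpolate and `perc_dev` steps do without calling YahooFinancials, here is a minimal sketch on made-up numbers. The two month-start closes and the two "real" daily closes are hypothetical, as are the names `monthly`, `daily_interp`, `real` and `combined`. It assumes the default linear interpolation, which spaces the filled values evenly between known points.

```python
import pandas as pd

# Two hypothetical month-start closes (not real prices)
monthly = pd.DataFrame(
    {'close': [100.0, 131.0]},
    index=pd.to_datetime(['2021-10-01', '2021-11-01']),
)
monthly.index.name = 'formatted_date'

# Upsample to daily: the new rows are NaN until interpolation fills them.
# 31 daily steps between the two points, so each step adds 1.0 here.
daily_interp = monthly.resample('D').mean().interpolate(method='linear')

# Hypothetical observed closes on two days
real = pd.DataFrame({
    'formatted_date': pd.to_datetime(['2021-10-11', '2021-10-21']),
    'close': [100.0, 125.0],
})

# Left merge keeps only the days present in `real`
combined = real.merge(daily_interp.reset_index(), on='formatted_date', how='left')
combined = combined.rename(columns={'close_x': 'real_close', 'close_y': 'inter_close'})

# Percentage deviation of the interpolated value from the real value
combined['perc_dev'] = (
    (combined['inter_close'] - combined['real_close']) / combined['real_close'] * 100
)

print(combined)
```

A positive `perc_dev` means the interpolated line sits above the real price on that day, and a negative value means it sits below.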

---
### <font color=#14F278> Task 3:</font>

<font color=#14F278> **The purpose of Task 3 is to detect the dates in December 2021 when Apple stock prices increased by more than 2% compared to the previous day. This exercise tests your knowledge on performing rolling statistics on a Time Series, as well as filtering a dataframe via a Boolean Mask. The task deliverable is to produce the dataframe below:**</font>


<center>
    <div>
        <img src="..\images\timeseries_005.png"/>
    </div>
</center>


__Steps: Create a python function `apple_rolling_stats()` which does the following:__
- Scrapes historical daily price data on Apple from __YahooFinancials__
- Constructs a DataFrame with the daily price observations for December 2021 - column 'close'
- Sets column 'formatted_date' to be your DataFrame index
- Calculates the __percentage change__ of the Time Series across observations - use `.pct_change()` and store in column __perc_change__
- Takes the __first difference__ of the Time Series - use `.diff(1)` and store in column __first_diff__
- Filters the resulting DataFrame via __Boolean Mask__ to contain only daily observations with __percentage change__ above 2% (0.02)
- Resets index to new DataFrame (to ensure that 'formatted_date' values are in a column)
- Returns the final DataFrame

In [None]:
# Solution

def apple_rolling_stats():
    from yahoofinancials import YahooFinancials
    data = YahooFinancials('AAPL').get_historical_price_data('2021-12-01', '2022-01-01', 'daily')
    data = data['AAPL']['prices']
    df = pd.DataFrame(data)
    df = df[['formatted_date', 'close']]
    df['formatted_date'] = pd.to_datetime(df['formatted_date'], format = '%Y-%m-%d')
    df.set_index('formatted_date', inplace = True)
    df['perc_change'] = df['close'].pct_change()
    df['first_diff'] = df['close'].diff(1)
    mask = df['perc_change'] >0.02
    new_df = df[mask]
    new_df.reset_index(inplace = True)
    return new_df

In [None]:
# Call the function and display returned objects
apple_rolling_stats()
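
The percentage change, first difference and Boolean Mask steps can be checked on a tiny synthetic series. This is a sketch under stated assumptions: the four closes are made up, and the names `toy`, `mask` and `jumps` are hypothetical. Note that the first observation has no previous value, so its `perc_change` is NaN and the mask excludes it.

```python
import pandas as pd

# Four made-up daily closes (not real prices)
toy = pd.DataFrame(
    {'close': [100.0, 103.0, 101.0, 104.03]},
    index=pd.date_range('2021-12-01', periods=4, freq='D'),
)
toy.index.name = 'formatted_date'

# Relative change vs. the previous row, e.g. 103 / 100 - 1 = 0.03
toy['perc_change'] = toy['close'].pct_change()

# Absolute change vs. the previous row, e.g. 103 - 100 = 3.0
toy['first_diff'] = toy['close'].diff(1)

# Boolean Mask: True where the daily increase exceeds 2%; NaN compares as False
mask = toy['perc_change'] > 0.02
jumps = toy[mask].reset_index()

print(jumps)
```

Only the second and fourth days pass the filter; the third day is a decline and the first day has no prior observation to compare against.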

__NB__: *Solutions to these exercises are distributed separately in the form of a stand-alone unit at a later point in time. This is to ensure that consultants have had the chance to attempt the exercises autonomously, leveraging the reading materials and concept check solutions.*