# Data Structures and Processing

## Week 10: Time Series

### Remarks:

1. Install the libraries listed below using your package manager.  For example, to install `numpy`, run the following command in a terminal (on Ubuntu): `python3 -m pip install numpy`.

2. Make sure that you follow the conventions.  For example, `import numpy as np` imports the `numpy` package and sets `np` as its abbreviation.

3. Do not import packages without their short names unless it is intentional.  Doing so might lead to a namespace conflict, or to unintentionally using functions that two libraries provide under the same name with different implementations.

4. We assign `None` to variables and use `pass` in the body of functions where we expect a solution from you.  Please replace these values and statements with your solution.
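
As an illustration of remark 3, the sketch below shows two libraries that offer a function with the same name but different behaviour. The pair `math.sqrt` and `np.sqrt` is chosen only as an example and is not taken from the lecture material.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0])

# With short names it is always clear which implementation is called.
roots = np.sqrt(values)        # element-wise square root of the array
single_root = math.sqrt(4.0)   # scalar-only square root

# math.sqrt cannot handle a whole array.  After `from math import *` and
# `from numpy import *`, the bare name `sqrt` would refer to whichever
# module was imported last, which is easy to get wrong.
try:
    math.sqrt(values)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

print(roots, single_root, math_accepts_arrays)
```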

The exercises in this notebook are aligned with the material provided for the lecture.

### Load Libraries

In [1]:
import numpy as np
import pandas as pd
import datetime as dtime

## Task 1 - 1pt

Consider the file `IBM.csv` attached to this notebook, which contains IBM stock data since 2000.  Import the data from the file as a pandas `DataFrame` into a variable `ts1`.  Since no index is specified, the default incrementing integer index is assigned.  Notice that one of the columns is named `"Date"`.  We would like to use this column as the index of a new `DataFrame` named `df1`.

Your task is to use a pandas built-in function to set the `"Date"` column of `ts1` as the index and to store the result in `df1`.

In [2]:
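import pandas as pd

# Parse "Date" as datetimes so that df1 gets a DatetimeIndex, which the
# date comparisons and resampling in the later tasks rely on.
ts1 = pd.read_csv("IBM.csv", parse_dates=["Date"])

df1 = ts1.set_index("Date")

df1.head()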

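
Since `IBM.csv` may not be at hand, the following minimal sketch shows the same index change on a small in-memory CSV. The rows and the column names other than `"Date"` are made up for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for IBM.csv: made-up rows, assumed column names.
csv_text = """Date,Open,Close
2000-01-03,10.0,11.0
2000-01-04,11.5,11.2
2000-01-05,11.1,12.0
"""

toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(toy.index)          # default integer index

toy_indexed = toy.set_index("Date")
print(toy_indexed.index)  # index built from the "Date" column

# Rows can now be looked up by date instead of by position.
open_on_jan_4 = toy_indexed.loc[pd.Timestamp("2000-01-04"), "Open"]
print(open_on_jan_4)
```

Note that `set_index` returns a new `DataFrame` and leaves `toy` unchanged.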

## Task 2 - 1pt

Reconsider the `DataFrame` named `df1` from Task 1 above.

Your task is to write a function `average_until_months_end` that takes three arguments:

1. `df`, a pandas `DataFrame` (for example, `df1`, as defined above),
2. `ts`, a string containing a timestamp,
3. `col`, a column name,

and returns the mean value of the column `col` over the time range starting at the timestamp `ts` and ending at the end of that month.


In [None]:
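import pandas as pd
from pandas.tseries.offsets import MonthEnd

def average_until_months_end(df, ts, col):
    # Assumes that `df` has a DatetimeIndex.
    ts_dt = pd.to_datetime(ts)

    # MonthEnd(0) rolls forward to the end of the current month and
    # leaves a date that already is a month end unchanged.
    end_of_month = ts_dt + MonthEnd(0)

    # Keep the rows between `ts` and the end of the month, both inclusive.
    mask = (df.index >= ts_dt) & (df.index <= end_of_month)

    return df.loc[mask, col].mean()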
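
The behaviour of `MonthEnd(0)` is easy to confuse with that of `MonthEnd(1)`, so here is a small sketch of the difference, followed by the same kind of "until the end of the month" average on made-up data. The toy `DataFrame` and its values are assumptions for illustration only.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid = pd.Timestamp("2020-02-10")
last = pd.Timestamp("2020-02-29")

print(mid + MonthEnd(0))   # end of February 2020
print(last + MonthEnd(0))  # already a month end, stays the same
print(last + MonthEnd(1))  # moves on to the end of March 2020

# Made-up daily data: the value is the number of days since 2020-02-01.
idx = pd.date_range("2020-02-01", "2020-03-05", freq="D")
toy = pd.DataFrame({"Open": range(len(idx))}, index=idx)

# Rows from 2020-02-10 to 2020-02-29 carry the values 9, 10, ..., 28.
window = toy.loc[mid:mid + MonthEnd(0), "Open"]
print(len(window), window.mean())
```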

## Task 3 - 1 pt

Write a function `change_in_month` that takes three arguments:

1. `df`, the `DataFrame`,
2. `col`, the column name to consider,
3. `date`, a date as a string of the form `"YYYY-mm-dd"` or `"YYYY/mm/dd"`,

and returns the difference between the values at the beginning and at the end of the month that contains `date`.

In [None]:
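import pandas as pd
from pandas.tseries.offsets import MonthEnd

def change_in_month(df, col, date):
    date_dt = pd.to_datetime(date)

    # First and last calendar day of the month that contains `date`.
    start_month = date_dt.normalize().replace(day=1)
    end_month = start_month + MonthEnd(0)

    # Slice the month instead of looking up exact dates, since the first or
    # last calendar day may have no record.  Assumes a sorted DatetimeIndex.
    month_values = df.loc[start_month:end_month, col]

    return month_values.iloc[-1] - month_values.iloc[0]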
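
One point worth keeping in mind for this task is that the first and the last calendar day of a month need not have a record in stock data. The sketch below illustrates this on a made-up series that only has rows on business days. The data and the column name `"Close"` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical price series on business days only (made-up values).
idx = pd.bdate_range("2020-01-27", "2020-03-06")
toy = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# 2020-02-01 is a Saturday, so an exact lookup has no row to return.
try:
    toy.loc[pd.Timestamp("2020-02-01"), "Close"]
    exact_lookup_works = True
except KeyError:
    exact_lookup_works = False

# Slicing by the calendar bounds of the month keeps only rows that exist.
february = toy.loc["2020-02-01":"2020-02-29", "Close"]
change = february.iloc[-1] - february.iloc[0]

print(exact_lookup_works)
print(february.index[0].date(), february.index[-1].date())
print(change)
```

Here the slice runs from Monday 2020-02-03 to Friday 2020-02-28, the first and last rows that actually exist in February.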

## Task 4 - 1 pt

The file `IBM.csv` contains data that is already down-sampled to daily records, each reporting the `open`, `high`, `low` and `close` values, among other entries.

Your task is to down-sample the column `"Open"` to yearly periods, reporting its `ohlc`.  Finally, plot the values for `open` and `close`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df1.index = pd.to_datetime(df1.index)

open_yearly_ohlc = df1["Open"].resample("Y").ohlc()

close_yearly = df1["Close"].resample("Y").last()

plt.figure(figsize=(10,5))
plt.plot(open_yearly_ohlc["open"], label="Open")
plt.plot(close_yearly, label="Close")
plt.title("Yearly Open and Close Prices")
plt.xlabel("Year")
plt.ylabel("Price")
plt.legend()
plt.grid(True)
plt.show()
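
To see what `resample(...).ohlc()` returns, here is a minimal sketch on a made-up daily series that simply counts upwards. It passes a `YearEnd` offset object instead of a string alias, since, to the best of my knowledge, the string alias for year-end frequency has changed between pandas versions. The series itself is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily series covering 2020 and 2021: 0.0, 1.0, 2.0, ...
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
toy = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# One row per year with the first, largest, smallest and last value.
yearly = toy.resample(pd.offsets.YearEnd()).ohlc()

print(yearly)
```

Because the series is increasing, `open` equals `low` and `close` equals `high` in every year, which makes the four columns easy to check by eye.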

## Task 5 - 1 pt

Reconsider the `IBM.csv` file containing the daily IBM stock values.  Consider a rolling window of 1000 days and plot the mean values over that window.

In [None]:
import matplotlib.pyplot as plt

rolling_mean = df1["Open"].rolling(window=1000).mean()

plt.figure(figsize=(10,5))
plt.plot(df1["Open"], label = "Actual Open", alpha = 0.5)
plt.plot(rolling_mean, label = "1000-day Rolling Mean", color = "red")
plt.title("IBM Stock - 1000 Day Rolling Mean")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
plt.grid(True)
plt.show()
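
A rolling mean is easier to follow on a handful of values. The sketch below uses a made-up series and a window of 3 instead of 1000. Note that an integer `window` counts rows rather than calendar days, and that the first `window - 1` results are `NaN` unless `min_periods` is lowered.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=5, freq="D")
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Default: a result is produced only once 3 rows are available.
strict = toy.rolling(window=3).mean()

# With min_periods=1, shorter windows at the start are averaged as well.
lenient = toy.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": toy, "strict": strict, "lenient": lenient}))
```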

## Task 6 - 2pt

Reconsider the `IBM.csv` file containing the IBM stocks.  Consider a smoothing factor `alpha=0.01`, and plot the `ewm` average of the column `"Open"` together with the actual values in that column.

In [None]:
ewm_mean = df1["Open"].ewm(alpha=0.01).mean()

plt.figure(figsize=(10,5))
plt.plot(df1["Open"], label = "Actual Open", alpha = 0.5)
plt.plot(ewm_mean, label = "EWMA (alpha=0.01)", color = "red")
plt.title("IBM Stock - EWMA (alpha=0.01)")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
plt.grid(True)
plt.show()
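
To make the role of the smoothing factor concrete, the following sketch recomputes the exponentially weighted mean by hand on a short made-up series and compares it with pandas. It uses `alpha = 0.5` instead of `0.01` so that the numbers stay readable. Both the default `adjust=True` form and the recursive `adjust=False` form are shown.

```python
import numpy as np
import pandas as pd

alpha = 0.5
toy = pd.Series([1.0, 2.0, 3.0, 4.0])

# pandas result; adjust=True is the default.
ewm_default = toy.ewm(alpha=alpha).mean()

# Adjusted form by hand: a weighted average in which the observation
# i steps in the past gets the weight (1 - alpha) ** i.
manual = []
for t in range(len(toy)):
    weights = (1 - alpha) ** np.arange(t, -1, -1)  # oldest ... newest
    past = toy.iloc[: t + 1].to_numpy()
    manual.append(np.sum(weights * past) / np.sum(weights))

# Recursive form, which pandas uses when adjust=False:
# y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
ewm_recursive = toy.ewm(alpha=alpha, adjust=False).mean()
recursive = [toy.iloc[0]]
for x in toy.iloc[1:]:
    recursive.append((1 - alpha) * recursive[-1] + alpha * x)

print(np.allclose(ewm_default, manual))
print(np.allclose(ewm_recursive, recursive))
```

A small `alpha` such as `0.01` gives old observations a large weight, so the resulting curve reacts slowly and looks smooth.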

## Task 7 - 3pt

Using the stock data for IBM and Apple (AAPL), plot the stock price trends between 2007-12-11 and 2008-10-13.

1) Load the stock data from the provided `IBM.csv` and `AAPL.csv` files.

2) Filter the data to only include entries between 2007-12-11 and 2008-10-13.

3) Plot the closing prices of both IBM and AAPL over time on the same graph.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

ibm = pd.read_csv("IBM.csv", parse_dates=["Date"], index_col="Date")
aapl = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

ibm = ibm.sort_index()
aapl = aapl.sort_index()

start_date = "2007-12-11"
end_date = "2008-10-13"

ibm_filtered = ibm.loc[start_date:end_date]
aapl_filtered = aapl.loc[start_date:end_date]

plt.figure(figsize=(10,5))
plt.plot(ibm_filtered["Close"], label="IBM Close")
plt.plot(aapl_filtered["Close"], label="AAPL Close")
plt.title("Stock Prices: IBM vs AAPL (2007-12-11 to 2008-10-13)")
plt.xlabel("Date")
plt.ylabel("Closing Price")
plt.legend()
plt.grid(True)
plt.show()
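
The filtering step relies on label-based slicing of a sorted `DatetimeIndex`, which includes both end points. The sketch below checks this on a made-up daily series and compares the slice with an explicit boolean mask. The data is an assumption for illustration.

```python
import pandas as pd

idx = pd.date_range("2008-10-09", "2008-10-16", freq="D")
toy = pd.Series(range(len(idx)), index=idx)

# Label slicing with .loc keeps the start date and the end date.
window = toy.loc["2008-10-11":"2008-10-13"]
print(window)

# The same selection written as an explicit boolean mask.
start = pd.Timestamp("2008-10-11")
end = pd.Timestamp("2008-10-13")
mask = (toy.index >= start) & (toy.index <= end)
print(toy[mask].equals(window))
```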
