<div class="alert alert-block alert-info">
<b>Attention:</b> 
    
Please read and follow the instructions carefully to avoid point deduction.
    
You are encouraged to use class materials and online resources to help you with this assignment. However, copying code directly from Generative AI (ChatGPT, Llama, etc.) or coding websites (Stack Overflow, GitHub, etc.) is strictly forbidden. We TAs have used these tools to generate answers for this assignment, so we WILL know if you directly copy or plagiarize your code. If we suspect any dishonest conduct, we reserve the right to call you in during office hours for a code review. If you fail to explain your code, we reserve the right to give you a 0 for the assignment. 

Feel free to email us or come to our office hours if you have any questions regarding this assignment.
</div>

<h1>Problem 1: Product Operations within Ndarray using Numpy</h1>

<li><b>argmin</b> A numpy function that returns the indices of the minimum values along an axis
<li><b>argmax</b> A numpy function that returns the indices of the maximum values along an axis
<li><b>sort</b> A numpy function that returns a sorted copy of an array
<li><b>prod</b> A numpy function that returns the product of all elements in an array
<li><b>cumprod</b> A numpy function that, given an ndarray of size n, returns an array of size n in which element i is the product of elements 0 through i
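
As a quick illustration of the five functions listed above, here is a minimal sketch on a small made-up array. The variable names are chosen for this example only and are not part of the assignment.

```python
import numpy as np

# Hypothetical sample data, used only to illustrate the five functions
values = np.array([5, 3, 6, 2, 9])

lowest_at = np.argmin(values)          # index of the smallest element (2 sits at index 3)
highest_at = np.argmax(values)         # index of the largest element (9 sits at index 4)
ordered = np.sort(values)              # sorted copy; `values` itself is left unchanged
total_product = np.prod(values)        # 5 * 3 * 6 * 2 * 9
running_product = np.cumprod(values)   # element i is the product of values[0] through values[i]

print(lowest_at, highest_at, ordered, total_product, running_product)
```

Note that the last entry of the cumulative product always equals the result of `np.prod` on the same array.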

In [None]:
"""
Replace pass with the code that uses ndarray, np.argmin, np.prod to calculate 
the product of elements before the minimum element (inclusive)

You can assume that there is only one minimum element

The function should take an ndarray as input and return a numpy.array

"""

import numpy as np

def min_product(ndarray):
    min_index = np.argmin(ndarray)
    return np.prod(ndarray[:min_index + 1])


# arr = np.array([3, 2, 5, 1, 6, 4])
# print(min_product(arr))

In [6]:
import numpy as np
min_product(np.array([5,3,6,2,9])) # should print out 180

180

In [12]:
"""
Replace pass with the code that uses ndarray, np.argmax, np.cumprod to calculate 
the cumulative product of elements after the maximum element (inclusive)

You can assume that there is only one maximum element

The function should take an ndarray as input and return a numpy.array

"""

def max_cumprod(ndarray):
    max_index = np.argmax(ndarray)
    return np.cumprod(ndarray[max_index :])

In [13]:
import numpy as np
max_cumprod(np.array([10,20,1,9])) # should print out array([ 20,  20, 180])

array([ 20,  20, 180])

In [23]:
"""
Replace pass with the code that uses ndarray, np.sort, np.cumprod to calculate
the cumulative product of elements after sorting the original ndarray and selecting only the even elements

The function should take an ndarray as input and return a numpy.array

"""

def sort_cumprod(ndarray):
    # print(np.sort(ndarray))
    sorted_array = np.sort(ndarray)
    # Boolean mask keeps only the even elements without a Python-level loop
    return np.cumprod(sorted_array[sorted_array % 2 == 0])


In [24]:
import numpy as np
sort_cumprod(np.array([10,20,4,9,5])) # should print out array([  4,  40, 800])

array([  4,  40, 800])

<h1>Problem 2: Simple Moving Averages</h1>

Write a function that builds an ndarray from the data in a file, removes any NaN values, and returns the simple moving averages of the remaining values. Where fewer values than the window size are available at the start, return the average of all values available up to that point.

<p>You may consider making use of the following functions:
<li><b>np.genfromtxt:</b> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html
<li><b>np.isnan:</b> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isnan.html
<li><b>np.array:</b> https://numpy.org/doc/stable/reference/generated/numpy.array.html

<p>Test your function out using the attached apple.csv file. Your function can assume that the file structure is (date, price).

<p>Note that simple moving average is defined as (p(t) + p(t-1) + ... + p(t-n+1))/n given that window size is n.

<b>Example:</b><br>
If the window size is 3 and the data array is:<br>
np.array([1, 2, np.nan, 4, 8, np.nan])
<br>
The output array should be:<br>
array([1, 1.5, 2.33333333, 4.66666667])
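
The example above can also be reproduced without an explicit loop. The following is only a sketch of one possible approach, based on cumulative sums; the function name `sma_cumsum` is made up for this illustration and it works on an in-memory array rather than a file.

```python
import numpy as np

def sma_cumsum(values, window_size):
    """Simple moving average with a shorter window at the start (illustrative sketch)."""
    clean = values[~np.isnan(values)]           # drop NaN entries first
    csum = np.cumsum(clean)                     # csum[i] = clean[0] + ... + clean[i]
    window_sums = csum.copy()
    # From index window_size onward, subtract the part of the running sum
    # that has already left the window
    window_sums[window_size:] = csum[window_size:] - csum[:-window_size]
    # Number of values actually inside each window: 1, 2, ..., window_size, window_size, ...
    counts = np.minimum(np.arange(1, len(clean) + 1), window_size)
    return window_sums / counts

example = np.array([1, 2, np.nan, 4, 8, np.nan])
result = sma_cumsum(example, 3)
print(result)
```

For the sample array, the cleaned values are 1, 2, 4, 8, so the windows are (1), (1, 2), (1, 2, 4), and (2, 4, 8), which matches the expected output shown above.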

In [None]:
import numpy as np
import pandas as pd

def get_sma(file_name, window_size):
    # I tried multiple ways of implementing the subsections to deepen my understanding of what we're working with
    data = np.genfromtxt(file_name, delimiter=',', skip_header=1, usecols=1)

    # with open(file_name, 'r') as f:
    #     content = f.read().strip().split('\n')[1:]
    #     content = [float(cont.split(',')[1]) for cont in content if 'null' not in cont]
    #     # print(content)
        
    # data, window_size = np.array([1, 2, np.nan, 4, 8, np.nan]), 3 # testing if it works for the example

    # Keep only the non-NaN prices (~ is bitwise NOT: it flips each True to False and vice versa).
    # This keeps clean_data a 1-D ndarray, so the slices below are plain arrays.
    clean_data = data[~np.isnan(data)]

    # sma = np.zeros(len(clean_data))

    # for i in range(len(clean_data)):
    #     start_idx = max(0, i - window_size + 1)  
    #     sma[i] = np.mean(clean_data[start_idx:i + 1])

    sma = np.array([np.mean(clean_data[max(0, i - window_size + 1):i + 1]) for i in range(len(clean_data))])

    return sma

In [89]:
# Test
get_sma('apple.csv', window_size=10)

array([  0.421597  ,   0.410599  ,   0.39715667, ..., 157.8919998 ,
       156.7970001 , 156.0249999 ])

<h1>Problem 3: Moving Average Crossover Strategies - Pandas Grouping</h1>

In this question, we will conduct a simple trading analysis based on a stock's daily returns and a double moving average crossover strategy.<br>

Write a program that reads timeseries pricing data from a file into a pandas dataframe and then groups the data as follows:
<li>Adds three columns, daily returns and simple moving averages (window size = 50 and 200) to the file
<li>Drop rows with Null values
<li>Groups the data into four categories:
<ul>
<li>"Buy+" if the 50-day SMA is above or equal to the 200-day SMA and the daily return is positive or zero
<li>"Buy-" if the 50-day SMA is above or equal to the 200-day SMA but the daily return is negative
<li>"Sell-" if the 50-day SMA is under the 200-day SMA but the daily return is positive or zero
<li>"Sell+" if the 50-day SMA is under the 200-day SMA and the daily return is negative
</ul>
<li>Report the size and the mean of daily return of each group in dataframe
<li>Note the scale of the values returned by the pct_change function (You need to multiply the result by 100)
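
The four-way grouping described above can also be expressed with vectorized conditions instead of a row-by-row function. The sketch below assumes a tiny made-up dataframe named `toy` whose SMA and return values are invented for illustration; it uses `np.select` to assign the labels and then reports the size and mean of each group.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only so that every group appears at least once
toy = pd.DataFrame({
    "SMA_50": [10.0, 10.0, 9.0, 9.0, 11.0],
    "SMA_200": [9.5, 10.0, 9.5, 9.5, 10.0],
    "daily_return": [1.0, -2.0, 0.0, -1.0, 3.0],
})

uptrend = (toy["SMA_50"] >= toy["SMA_200"]).to_numpy()
non_negative = (toy["daily_return"] >= 0).to_numpy()

# Conditions are checked in order; rows matching none of them fall through to the default
conditions = [uptrend & non_negative, uptrend & ~non_negative, ~uptrend & non_negative]
labels = ["Buy+", "Buy-", "Sell-"]
toy["Group"] = np.select(conditions, labels, default="Sell+")

summary = toy.groupby("Group")["daily_return"].agg(["size", "mean"])
print(summary)
```

On a large dataframe this form usually runs faster than `apply(..., axis=1)`, because the comparisons operate on whole columns at once.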

In [1]:
datafile = "apple.csv"
import pandas as pd
import numpy as np
df = pd.read_csv(datafile).dropna()

In [2]:
# Adds three columns, daily returns and simple moving averages (window size = 50 and 200)

# Both implementations give the same returns (the loop version puts 0 in the first row, pct_change puts NaN)
# df["daily_return"] = [0] + [(df["Adj Close"].iloc[i] - df["Adj Close"].iloc[i-1]) / df["Adj Close"].iloc[i-1] * 100 
#                             for i in range(1, len(df))]
df["daily_return"] = df["Adj Close"].pct_change() * 100 # 100 for percentage


# df["SMA_50"] = [None] * len(df)
# df["SMA_200"] = [None] * len(df)

# for i in range(50, len(df)):
#     df["SMA_50"].iloc[i] = sum(df["Adj Close"].iloc[i-50:i]) / 50

# for i in range(200, len(df)):
#     df["SMA_200"].iloc[i] = sum(df["Adj Close"].iloc[i-200:i]) / 200


df["SMA_50"] = df["Adj Close"].rolling(window=50).mean()
df["SMA_200"] = df["Adj Close"].rolling(window=200).mean()
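
To make the scale of `pct_change` and the leading NaN values of `rolling` concrete, here is a small sketch on a made-up price series. The series `prices` and the window of 2 are assumptions for this illustration, standing in for the real "Adj Close" column and the 50/200-day windows.

```python
import pandas as pd

# Hypothetical prices, used only to show the shape of the results
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# pct_change returns fractions (0.10 for +10%), so multiply by 100 for percentages;
# the first row has no previous price and is therefore NaN
returns_pct = prices.pct_change() * 100

# rolling(window=2).mean() needs 2 observations, so the first row is NaN as well
sma_2 = prices.rolling(window=2).mean()

print(pd.DataFrame({"price": prices, "return_pct": returns_pct, "sma_2": sma_2}))
```

With a window of 200, the first 199 rows of the moving average are NaN in the same way, which is why the subsequent `dropna()` removes the start of the series.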

In [3]:
# Drop rows with null values
df = df.dropna()

In [13]:
def GroupColFunc(row):
    # print("row : ", row)
    if row["SMA_50"] >= row["SMA_200"]:
        return "Buy+" if row["daily_return"] >= 0 else "Buy-"
    else:
        return "Sell-" if row["daily_return"] >= 0 else "Sell+"

In [14]:
# Report the size and the mean of daily return of each group in dataframe

# GROUPBY ON ROWS
df["Group"] = df.apply(GroupColFunc, axis=1)

summary = df.groupby("Group")["daily_return"].agg(["size", "mean"])

print(summary)

       size      mean
Group                
Buy+   2952  1.885759
Buy-   2645 -1.876487
Sell+  1601 -2.246323
Sell-  1880  2.150327


In [15]:
# Report the size and the mean of daily return of each group in dataframe

# GROUPBY ON ROWS (the lambda maps each index label to its group, so no "Group" column is needed)
summary = df.groupby(lambda x: GroupColFunc(df.loc[x]))["daily_return"].agg(["size", "mean"])

print(summary)

       size      mean
Buy+   2952  1.885759
Buy-   2645 -1.876487
Sell+  1601 -2.246323
Sell-  1880  2.150327


Write a function that categorizes months into seasons and analyzes stock return data based on seasonal patterns
<li>Adds a column Season to the file
<li>Write a function that takes a dataframe and a season name as input
<li>After filtering by season, use groupby to also aggregate data by week number
<li>For each week in the season, calculate: 
        <ul>
            <li> Mean, Median, and Maximum Daily Return
            <li> Rolling 4-week average of daily returns 
    </ul>
<li>Drop rows with Null values and report results as a DataFrame
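
The steps above can be sketched end to end on a handful of made-up rows. Everything in this example is hypothetical: the dataframe `toy`, its dates and returns, and the `MONTH_TO_SEASON` dictionary, which is used here as an alternative to an if/elif function.

```python
import pandas as pd

# Hypothetical month-to-season lookup (same seasons as in the task description)
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Made-up dates and returns; 2021-03-01 is a Monday in ISO week 9
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-03-01", "2021-03-02", "2021-03-08", "2021-03-15",
        "2021-03-22", "2021-03-29", "2021-07-01",
    ]),
    "daily_return": [1.0, 3.0, -1.0, 2.0, 0.0, 4.0, 9.0],
})
toy["Season"] = toy["Date"].dt.month.map(MONTH_TO_SEASON)
toy["Week"] = toy["Date"].dt.isocalendar().week

# Filter by season, aggregate per week, then smooth the weekly means over 4 weeks
spring = toy[toy["Season"] == "Spring"]
weekly = spring.groupby("Week")["daily_return"].agg(["mean", "median", "max"])
weekly["rolling_4_week_avg"] = weekly["mean"].rolling(window=4).mean()
result = weekly.dropna()
print(result)
```

Because ISO week numbers repeat every year, grouping by week alone pools the same calendar week across all years in the data. The first three weeks of the season have no 4-week average and are removed by `dropna()`.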

In [101]:
# Change the data type of 'Date' to datetime and extract week number
# Hint: you can use dt.isocalendar().week to extract week number


# Write a function get_season() to group months into seasons
def get_season(month):
    if month in [12, 1, 2]:
        return "Winter"
    elif month in [3, 4, 5]:
        return "Spring"
    elif month in [6, 7, 8]:
        return "Summer"
    else:
        return "Fall"

# Apply get_season() to create a new 'Season' column
df["Date"] = pd.to_datetime(df["Date"])

df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week

df["Season"] = df["Month"].apply(get_season)
print(df["Season"].value_counts())

Season
Summer    2329
Spring    2293
Fall      2259
Winter    2197
Name: count, dtype: int64


In [108]:
def report_season(df, season_name):
    season_df = df[df["Season"] == season_name]
    
    grouped = season_df.groupby("Week")["daily_return"].agg(["mean", "median", "max"])
    
    grouped["rolling_4_week_avg"] = grouped["mean"].rolling(window=4).mean()

    return grouped.dropna()

In [109]:
# Test
report_season(df,'Spring')

          mean    median        max  rolling_4_week_avg
Week
12    0.186476 -0.027918   9.705293            0.211808
13    0.133297  0.000000  11.194065            0.131219
14    0.188592  0.000000   9.874100            0.199954
15   -0.226940 -0.062178   7.302845            0.070356
16    0.451598  0.398367  12.856433            0.136637
17    0.271123  0.000000  11.755775            0.171093
18    0.140196  0.216805   9.090900            0.158994
19    0.380970  0.251973  11.349445            0.310972
20   -0.063555 -0.111511   8.988821            0.182183
21   -0.055333 -0.075983   7.142972            0.100569
