<h1>Problem 1: Product Operations within Ndarray using Numpy</h1>
<li><b>argmin</b> A numpy function that returns the indices of the minimum values along an axis
<li><b>argmax</b> A numpy function that returns the indices of the maximum values along an axis
<li><b>sort</b> A numpy function that returns a sorted copy of an array
<li><b>prod</b> A numpy function that returns the product of all elements in an array
<li><b>cumprod</b> A numpy function that, given an ndarray of size n, returns an array of size n whose element i is the product of elements 0 through i
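
As a quick illustration of the five functions listed above, the following sketch applies each of them to the same small array. The array values and the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([5, 3, 6, 2, 9])

min_idx = np.argmin(a)     # 3: index of the smallest value (2)
max_idx = np.argmax(a)     # 4: index of the largest value (9)
sorted_a = np.sort(a)      # array([2, 3, 5, 6, 9]); `a` itself is unchanged
total = np.prod(a)         # 1620 = 5 * 3 * 6 * 2 * 9
running = np.cumprod(a)    # array([5, 15, 90, 180, 1620])
```

Note that np.argmin and np.argmax return positions rather than values, which is what makes them useful for slicing an array relative to its minimum or maximum.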

In [47]:
import numpy as np

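def min_product(ndarray):
    """Return the product of all elements up to and including the minimum."""
    # Index of the first occurrence of the smallest element
    min_index = np.argmin(ndarray)

    # Slice from the start through the minimum (inclusive, hence the + 1)
    elements_through_min = ndarray[:min_index + 1]

    return np.prod(elements_through_min)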
 

In [48]:
import numpy as np
min_product(np.array([5,3,6,2,9])) 

180

In [49]:
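def max_cumprod(arr):
    """Return the cumulative product from the maximum element to the end."""
    # Index of the first occurrence of the largest element
    max_index = np.argmax(arr)

    # Slice from the maximum (inclusive) through the last element
    elements_from_max = arr[max_index:]

    return np.cumprod(elements_from_max)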
 
 

In [50]:
import numpy as np
max_cumprod(np.array([10,20,1,9]))  

array([ 20,  20, 180])

In [52]:
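def sort_cumprod(arr):
    """Return the cumulative product of the even elements in ascending order."""
    # Keep only the even values, then sort them
    even_elements = np.sort(arr[arr % 2 == 0])

    return np.cumprod(even_elements)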

In [53]:
import numpy as np
sort_cumprod(np.array([10,20,4,9,5]))  

array([  4,  40, 800])

<h1>Problem 2: Simple Moving Averages</h1>
Write a function that constructs an ndarray from data in a file and returns the simple moving averages on an ndarray after removing any NaN values. 

<p>You may consider making use of the following functions:
<li><b>np.genfromtxt:</b> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html
<li><b>np.isnan:</b> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isnan.html
<li><b>np.array:</b> https://numpy.org/doc/stable/reference/generated/numpy.array.html

<p>Test your function out using the attached apple.csv file. Your function can assume that the file structure is (date,price).

<p>Note that the simple moving average at time t, for window size n, is defined as (p(t) + p(t-1) + ... + p(t-n+1))/n, where p(t) is the price at time t.

<b>Example:</b><br>
If the data array is:<br>
np.array([1,2,np.nan,4,8,np.nan])
<br>
and the window size is 3, the output array should be:<br>
array([2.33333333, 4.66666667])
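
The example above can be reproduced directly from the definition. The sketch below is only an illustration: the helper name sma_from_array is hypothetical, and it averages each window explicitly instead of relying on a library routine.

```python
import numpy as np

def sma_from_array(values, window_size):
    """Simple moving average of `values` after dropping NaN entries."""
    values = np.asarray(values, dtype=float)
    clean = values[~np.isnan(values)]
    # One average per full window: clean[i], ..., clean[i + window_size - 1]
    n_windows = len(clean) - window_size + 1
    return np.array([clean[i:i + window_size].mean() for i in range(n_windows)])

result = sma_from_array([1, 2, np.nan, 4, 8, np.nan], window_size=3)
# After NaN removal the data is [1, 2, 4, 8], so the two windows are
# (1 + 2 + 4) / 3 and (2 + 4 + 8) / 3.
```

The explicit loop keeps the code close to the formula. For long arrays a vectorized approach would likely be faster.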

In [56]:
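def get_sma(file_name, window_size):
    """Return the simple moving averages of the price column in a (date,price) CSV file."""
    # Read only the second column (price), skipping the header row
    data = np.genfromtxt(file_name, delimiter=',', skip_header=1, usecols=1)

    # Drop missing values before averaging
    data = data[~np.isnan(data)]

    # Convolving with a uniform kernel of weight 1/window_size averages each window;
    # mode='valid' keeps only positions where the window fits entirely inside the data
    sma = np.convolve(data, np.ones(window_size)/window_size, mode='valid')

    return sma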

In [57]:
# Test
get_sma('apple.csv', window_size=10)

array([  0.4259963,   0.4366279,   0.448176 , ..., 157.8919998,
       156.7970001, 156.0249999])

<h1>Problem 3: Moving Average Crossover Strategies - Pandas Grouping</h1>
In this question, we will conduct a simple trading analysis based on a stock's daily returns and a double moving average crossover strategy.<br>

Write a program that reads time-series pricing data from a file into a pandas dataframe and then groups the data as follows:
<li>Add three columns to the dataframe: daily returns and two simple moving averages (window sizes 50 and 200)
<li>Drop rows with null values
<li>Group the data into four categories:
<ul>
<li>"Buy+" if the 50-day SMA is above or equal to the 200-day SMA and the daily return is positive or zero
<li>"Buy-" if the 50-day SMA is above or equal to the 200-day SMA but the daily return is negative
<li>"Sell-" if the 50-day SMA is below the 200-day SMA but the daily return is positive or zero
<li>"Sell+" if the 50-day SMA is below the 200-day SMA and the daily return is negative
</ul>
<li>Report the size and the mean daily return of each group in the dataframe
<li>Note the scale of the values returned by the pct_change function: it returns fractional changes, so multiply the result by 100 to get percentages
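
To make the four categories and the pct_change scale concrete, here is a minimal sketch on made-up numbers. The toy values and the names toy, uptrend and gain are assumptions for illustration only; they do not come from apple.csv. The sketch also assumes the rows contain no null values.

```python
import numpy as np
import pandas as pd

# pct_change returns fractional changes, so multiply by 100 for percent.
prices = pd.Series([100.0, 102.0, 99.96])
returns_pct = prices.pct_change() * 100   # NaN, then about 2.0 and -2.0

# Toy rows, one for each category.
toy = pd.DataFrame({
    "SMA_50":       [10.0, 10.0, 9.0, 9.0],
    "SMA_200":      [9.5, 10.0, 9.5, 9.5],
    "Daily_Return": [1.2, -0.4, 0.0, -2.0],
})

uptrend = toy["SMA_50"] >= toy["SMA_200"]   # 50-day SMA at or above 200-day SMA
gain = toy["Daily_Return"] >= 0             # positive or zero daily return

conditions = [uptrend & gain, uptrend & ~gain, ~uptrend & gain, ~uptrend & ~gain]
choices = ["Buy+", "Buy-", "Sell-", "Sell+"]

# A string default keeps the result a string array.
toy["Group"] = np.select(conditions, choices, default="Unclassified")
# Group column: Buy+, Buy-, Sell-, Sell+
```

The first price has no predecessor, so its return is NaN. This is one reason the task asks for rows with null values to be dropped before grouping.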

In [58]:
datafile = "apple.csv"
import pandas as pd
import numpy as np


In [59]:
df = pd.read_csv(datafile).dropna()

In [60]:
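def GroupColFunc(filename):
    """Read price data from a CSV file and summarize the four crossover groups.

    Returns the size and the mean daily return (in percent) of each group.
    """
    df = pd.read_csv(filename)

    # Daily return in percent, plus 50-day and 200-day simple moving averages
    df['Daily_Return'] = df['Adj Close'].pct_change() * 100
    df['SMA_50'] = df['Adj Close'].rolling(window=50).mean()
    df['SMA_200'] = df['Adj Close'].rolling(window=200).mean()

    # The first 199 rows have no 200-day SMA yet; drop them and any other nulls
    df.dropna(inplace=True)

    conditions = [
        (df['SMA_50'] >= df['SMA_200']) & (df['Daily_Return'] >= 0),
        (df['SMA_50'] >= df['SMA_200']) & (df['Daily_Return'] < 0),
        (df['SMA_50'] < df['SMA_200']) & (df['Daily_Return'] >= 0),
        (df['SMA_50'] < df['SMA_200']) & (df['Daily_Return'] < 0)
    ]
    choices = ['Buy+', 'Buy-', 'Sell-', 'Sell+']

    # Each remaining row matches exactly one condition, so the default is never used.
    # A string default is passed because some NumPy versions reject mixing string
    # choices with the implicit integer default of 0.
    df['Group'] = np.select(conditions, choices, default='Unclassified')

    grouped_data = df.groupby('Group')
    group_sizes = grouped_data.size()
    group_mean_return = grouped_data['Daily_Return'].mean()

    return group_sizes, group_mean_return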
 
group_sizes, group_mean_return = GroupColFunc('apple.csv')

In [61]:
print("Group Sizes:")
print(group_sizes)
print("\nGroup Mean Daily Return:")
print(group_mean_return)

Group Sizes:
Group
Buy+     2952
Buy-     2645
Sell+    1531
Sell-    1785
dtype: int64

Group Mean Daily Return:
Group
Buy+     1.885759
Buy-    -1.876487
Sell+   -2.225904
Sell-    2.154447
Name: Daily_Return, dtype: float64


Write a function that takes a season and reports the size and the mean daily return of each group for that season:
<li>Change the data type of 'Date' to datetime
<li>Write a function get_season() to group months into seasons
<li>Apply get_season() to create a new 'Season' column
<li>Write a function that takes a dataframe and a season name as input
<li>Call the function you created in the first section to group the data
<li>Report the size and the mean daily return of each group for that season
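
The first three steps can be tried out on a handful of made-up dates before touching the real data. This is only a sketch: the lookup table MONTH_TO_SEASON and the frame demo are hypothetical names, and a dictionary lookup is used here in place of a get_season() function.

```python
import pandas as pd

# Hypothetical lookup table: month number -> season name.
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}

# Made-up dates, one per season.
demo = pd.DataFrame({"Date": ["2020-01-15", "2020-04-01", "2020-07-04", "2020-10-31"]})

# Strings become datetimes, which exposes the .dt accessor.
demo["Date"] = pd.to_datetime(demo["Date"])

# Map each month number to its season.
demo["Season"] = demo["Date"].dt.month.map(MONTH_TO_SEASON)
# Season column: Winter, Spring, Summer, Autumn
```

A function applied with .apply() gives the same result; the dictionary simply makes the month-to-season assignment visible in one place.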

In [62]:
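def GroupColFunc(df):
    """Same grouping as the earlier GroupColFunc, but takes a dataframe instead of a file name.

    Note: columns are added to df and rows are dropped from it in place.
    """
    # Daily return in percent, plus 50-day and 200-day simple moving averages
    df['Daily_Return'] = df['Adj Close'].pct_change() * 100
    df['SMA_50'] = df['Adj Close'].rolling(window=50).mean()
    df['SMA_200'] = df['Adj Close'].rolling(window=200).mean()

    # Drop rows where the returns or moving averages are not yet defined
    df.dropna(inplace=True)

    conditions = [
        (df['SMA_50'] >= df['SMA_200']) & (df['Daily_Return'] >= 0),
        (df['SMA_50'] >= df['SMA_200']) & (df['Daily_Return'] < 0),
        (df['SMA_50'] < df['SMA_200']) & (df['Daily_Return'] >= 0),
        (df['SMA_50'] < df['SMA_200']) & (df['Daily_Return'] < 0)
    ]
    choices = ['Buy+', 'Buy-', 'Sell-', 'Sell+']

    # A string default is passed because some NumPy versions reject mixing string
    # choices with the implicit integer default of 0; it is never actually used here.
    df['Group'] = np.select(conditions, choices, default='Unclassified')

    grouped_data = df.groupby('Group')
    group_sizes = grouped_data.size()
    group_mean_return = grouped_data['Daily_Return'].mean()

    return group_sizes, group_mean_return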

In [63]:
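def convert_to_datetime(df):
    """Convert the 'Date' column from strings to datetime values."""
    df['Date'] = pd.to_datetime(df['Date'])
    return df


def get_season(month):
    """Map a month number (1-12) to a season name."""
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        # Remaining months: 9, 10, 11
        return 'Autumn'


def add_season_column(df):
    """Add a 'Season' column derived from the month of each 'Date'."""
    df['Month'] = df['Date'].dt.month
    df['Season'] = df['Month'].apply(get_season)
    # 'Month' was only a helper column
    df.drop(columns=['Month'], inplace=True)
    return df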


In [64]:
def report_season(df, season):
 
    season_df = df[df['Season'] == season]
    
    group_sizes, group_mean_return = GroupColFunc(season_df)
    
    return group_sizes, group_mean_return


In [65]:
df = pd.read_csv('apple.csv')
df = convert_to_datetime(df)
df = add_season_column(df)
spring_group_sizes, spring_group_mean_return = report_season(df, 'Spring')

 
report_season(df,'Spring')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Daily_Return'] = df['Adj Close'].pct_change() * 100

(Group
 Buy+     789
 Buy-     696
 Sell+    330
 Sell-    342
 dtype: int64,
 Group
 Buy+     3.035015
 Buy-    -2.002003
 Sell+   -2.472196
 Sell-    2.636507
 Name: Daily_Return, dtype: float64)
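
The "A value is trying to be set on a copy of a slice" warning in the output above appears because report_season passes a filtered subset of df to GroupColFunc, which then adds columns to that subset. One common way to avoid the warning, sketched here on made-up data rather than apple.csv, is to take an explicit copy of the subset before modifying it. Whether the warning is raised at all can depend on the pandas version.

```python
import pandas as pd

# Made-up prices; the column names mirror the ones used above.
df = pd.DataFrame({
    "Season": ["Spring", "Winter", "Spring", "Spring"],
    "Adj Close": [10.0, 11.0, 12.0, 15.0],
})

# .copy() makes the subset an independent dataframe, so adding
# columns to it is unambiguous and leaves `df` untouched.
spring = df[df["Season"] == "Spring"].copy()
spring["Daily_Return"] = spring["Adj Close"].pct_change() * 100
```

Here spring holds the prices 10, 12 and 15, so its returns are NaN, 20 and 25 percent, while df keeps its original two columns.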