# Module 5 Assignment 

A few things you should keep in mind when working on assignments:

1. Run the first code cell to import modules needed by this assignment before proceeding to problems.
2. Make sure you fill in any place that says `# YOUR CODE HERE`. Do not write your answer anywhere else other than where it says `# YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.
3. Each problem has an autograder cell below the answer cell. Run the autograder cell to check your answer. If there's anything wrong in your answer, the autograder cell will display error messages.
4. Before you submit your assignment, make sure everything runs as expected. Go to the menubar, select Kernel, and Restart & Run all. If the notebook runs through the last code cell without an error message, you've answered all problems correctly.
5. Make sure that you save your work (in the menubar, select File → Save and CheckPoint).

-----

# Run Me First!

In [1]:
import pandas as pd
import numpy as np

from nose.tools import assert_equal, assert_true, assert_false
from numpy.testing import assert_array_almost_equal, assert_array_equal


# Problem 1: Read in a dataset

For this problem you will read in a dataset using Pandas. In the cell below, the function *read_data* has parameter "file_path" which contains a path to a dataset.
- Use the *read_csv* function from Pandas to read in the dataset from the file path and return the resulting 
DataFrame.

In [2]:
def read_data(file_path):
    '''
    Reads in a dataset using pandas.
    
    Parameters
    ----------
    file_path : string containing path to a file
    
    Returns
    -------
    Pandas DataFrame with data read in from the file path
    '''
    # YOUR CODE HERE
    df = pd.read_csv(file_path)
    return df

In [3]:
df = read_data('data/dow_jones_index.data')
assert_equal(len(df), 750, msg="The dataset should have 750 rows. Your solution only has %s"%len(df))
assert_equal(set(df.columns.tolist()), set(['quarter', 'stock', 'date', 'open', 'high', 'low', 'close',
                                            'volume', 'percent_change_price', 'percent_change_volume_over_last_wk',
                                            'previous_weeks_volume', 'next_weeks_open', 'next_weeks_close',
                                            'percent_change_next_weeks_price', 'days_to_next_dividend',
                                            'percent_return_next_dividend']), 
             msg="Your column names do not match the solutions")

ans = df.head().values.tolist()
sol = [[1, 'AA', '1/7/2011', 15.82, 16.72, 15.78, 16.42, 239655616, 3.7926699999999998, np.nan, np.nan,
        16.71, 15.97, -4.42849, 26, 0.182704],
       [1, 'AA', '1/14/2011', 16.71, 16.71, 15.64, 15.97, 242963398, -4.42849, 1.380223028, 239655616.0,
        16.19, 15.79, -2.4706599999999996, 19, 0.187852],
       [1, 'AA', '1/21/2011', 16.19, 16.38, 15.6, 15.79, 138428495, -2.4706599999999996, -43.02495926,
        242963398.0, 15.87, 16.13, 1.63831, 12, 0.189994],
       [1, 'AA', '1/28/2011', 15.87, 16.63, 15.82, 16.13, 151379173, 1.63831, 9.355500109, 138428495.0,
        16.18, 17.14, 5.93325, 5, 0.18598900000000002],
       [1, 'AA', '2/4/2011', 16.18, 17.39, 16.18, 17.14, 154387761, 5.93325, 1.987451735, 151379173.0,
        17.33, 17.37, 0.230814, 97, 0.175029]]

assert_array_equal(ans, sol, err_msg="Your answer does not match the solution.")

print("2 random rows of the dow jones dataset:")
df.sample(2)

AssertionError: 
Arrays are not equal
Your answer does not match the solution.
Mismatched elements: 4 / 80 (5%)
 x: array([['1', 'AA', '1/7/2011', '15.82', '16.72', '15.78', '16.42',
        '239655616', '3.79267', 'nan', 'nan', '16.71', '15.97',
        '-4.42849', '26', '0.182704'],...
 y: array([['1', 'AA', '1/7/2011', '15.82', '16.72', '15.78', '16.42',
        '239655616', '3.7926699999999998', 'nan', 'nan', '16.71',
        '15.97', '-4.42849', '26', '0.182704'],...

# Problem 2: Selecting first n rows of a DataFrame

In the code cell below the function *get_head_rows* accepts 2 parameters *df* which is a DataFrame, and *n*, which is an integer.

For this problem:
- Return first *n* rows of _df_

In [4]:
def get_head_rows(df, n):
    '''    
    Parameters
    ----------
    df: Pandas DataFrame
    n: integer
    Returns
    -------
    returns first n rows of df
    '''
    # YOUR CODE HERE
    
    return df.head(n)

In [5]:
assert_equal(get_head_rows(df, 3).shape[0], 3, msg="You did not return first 3 rows")
assert_equal(get_head_rows(df, 4).shape[0], 4, msg="You did not return first 4 rows")


# Problem 3: Selecting stocks under certain price

In the code cell below the function *get_stocks_by_price* accepts two parameters: *df* which is a DataFrame, and *price_cut*, which is a float.

For this problem:
- Return all data with close price less than *price_cut*.

In [6]:
def get_stocks_by_price(df, price_cut):
    '''
    
    Parameters
    ----------
    df: Pandas DataFrame containing data from Problem 1's solution
    price_cut: float.
    Returns
    -------
    
    Return all data with close price less than price_cut.
    '''
    # YOUR CODE HERE
    
    descpercent = df["percent_change_price"]/100
    price_cut_change = df["open"] * abs(descpercent)
    price_cut = df["open"] - price_cut_change
    close_price = df["close"]
    return df[close_price < price_cut]

In [7]:
descpercent = df["percent_change_price"]/100

In [8]:
price_cut_change = df["open"] * abs(descpercent)
price_cut_change

0      0.600000
1      0.740001
2      0.400000
3      0.260000
4      0.960000
         ...   
745    2.410001
746    2.099997
747    1.149999
748    0.980000
749    1.869998
Length: 750, dtype: float64

In [9]:
price_cut = df["open"] - price_cut_change
price_cut

0      15.220000
1      15.969999
2      15.790000
3      15.610000
4      15.220000
         ...    
745    77.809999
746    81.180003
747    79.780001
748    79.020000
749    76.780002
Length: 750, dtype: float64

In [10]:
close_price = df["close"]
close_price

0      16.42
1      15.97
2      15.79
3      16.13
4      17.14
       ...  
745    82.63
746    81.18
747    79.78
748    79.02
749    76.78
Name: close, Length: 750, dtype: float64

In [11]:
df[close_price < price_cut]

Unnamed: 0,quarter,stock,date,open,high,low,close,volume,percent_change_price,percent_change_volume_over_last_wk,previous_weeks_volume,next_weeks_open,next_weeks_close,percent_change_next_weeks_price,days_to_next_dividend,percent_return_next_dividend
2,1,AA,1/21/2011,16.19,16.38,15.60,15.79,138428495,-2.470660,-43.024959,242963398.0,15.87,16.13,1.638310,12,0.189994
6,1,AA,2/18/2011,17.39,17.68,17.28,17.28,80023895,-0.632547,-30.226696,114691279.0,16.98,16.68,-1.766780,83,0.173611
7,1,AA,2/25/2011,16.98,17.15,15.96,16.68,132981863,-1.766780,66.177694,80023895.0,16.81,16.58,-1.368230,76,0.179856
8,1,AA,3/4/2011,16.81,16.94,16.13,16.58,109493077,-1.368230,-17.663150,132981863.0,16.58,16.03,-3.317250,69,0.180941
15,1,AXP,1/28/2011,46.05,46.27,43.42,43.86,51427274,-4.755700,32.460101,38824728.0,44.13,43.82,-0.702470,68,0.410397
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
735,2,WMT,6/17/2011,52.91,53.29,51.79,52.82,68996550,-0.170100,17.448141,58746396.0,52.70,52.41,-0.550285,54,0.700492
743,2,XOM,5/13/2011,83.01,83.76,79.42,80.87,99678100,-2.578000,-12.413910,113805856.0,80.22,81.57,1.682870,89,0.581180
746,2,XOM,6/3/2011,83.28,83.75,80.18,81.18,78616295,-2.521610,15.221032,68230855.0,80.93,79.78,-1.420980,68,0.578960
747,2,XOM,6/10/2011,80.93,81.87,79.72,79.78,92380844,-1.420980,17.508519,78616295.0,80.00,79.02,-1.225000,61,0.589120


In [12]:
(close_price < price_cut).value_counts()

False    561
True     189
dtype: int64

In [13]:
assert_equal(get_stocks_by_price(df, 12).shape[0], 7, msg="You did not return a correct Pandas DataFrame object")
assert_true(get_stocks_by_price(df, 12).close.max()<12, msg="You did not return a correct Pandas DataFrame object")


AssertionError: 189 != 7 : You did not return a correct Pandas DataFrame object

# Problem 4: Get highest and lowest close price for a stock

In the code cell below the function *get_high_low_close* accepts two parameters: *df* which is a DataFrame, and *symbol*, which is a string.

For this problem:
- Return highest and lowest close price of stock represented by *symbol*.

In [14]:
def get_high_low_close(df, symbol):
    '''
    Get highest and lowest close price of a stock.
    
    Parameters
    ----------
    df: Pandas DataFrame containing data from Problem 1's solution
    symbol: stock symbol
    
    Returns
    -------
    returns two values, highest and lowest close price of the stock.
    '''
    # YOUR CODE HERE
    
    dsym = df[df['stock'] == symbol]

    highest_close = dsym.close.max()

    lowest_close = dsym.close.min()

    return highest_close, lowest_close


In [15]:
assert_equal(get_high_low_close(df, 'AA'), (17.92, 14.72), msg="You did not return correct highest and lowest close of AA")
assert_equal(get_high_low_close(df, 'IBM'), (170.58, 147.93), msg="You did not return correct highest and lowest close of IBM")
