# Analysis of Daily Stock Price Data

KATE expects your code to define variables with specific names, each holding the answer to one of the questions below.

KATE will run your notebook from top to bottom and check the latest value of those variables, so make sure you don't overwrite them.

* Remember to uncomment the line assigning the variable to your answer and don't change the variable or function names.
* Use copies of the original or previous DataFrames to make sure you do not overwrite them by mistake.
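
As a rough illustration of why copies help, the sketch below uses a small made-up DataFrame (not the AAPL data; the names `original` and `working` are hypothetical) to show that changes made to a `.copy()` do not reach the original.

```python
import pandas as pd

# Hypothetical toy data, not the AAPL dataset
original = pd.DataFrame({"Close": [10.0, 11.0, 12.0]})

# Work on an independent copy so the original stays untouched
working = original.copy()
working["Close"] = working["Close"] * 2

print(original["Close"].tolist())  # values in the original are unchanged
print(working["Close"].tolist())   # only the copy holds the doubled values
```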

You will find instructions below about how to define each variable.

Once you're happy with your code, upload your notebook to KATE to check your feedback.

In [1]:
import pandas as pd

First, we will load the dataset from `data/AAPL.csv` into a DataFrame.

In [3]:
df = pd.read_csv('AAPL.csv')  # adjust the path (e.g. 'data/AAPL.csv') to where the file is stored
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2015-06-30,125.57,126.120003,124.860001,125.43,115.597382,44370700
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800
2,2015-07-02,126.43,126.690002,125.769997,126.440002,116.528198,27211000
3,2015-07-06,124.940002,126.230003,124.849998,126.0,116.122704,28060400
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800


This data, in its raw format, is the same as that which can be retrieved from a number of financial websites.

Before starting the exercise, let's add some additional data columns, calculated from the raw data. Don't worry if you aren't familiar with the methods used in the following cell.

In [4]:
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Weekday'] = df['Date'].dt.day_name()
df['Change %'] = (df['Adj Close'].pct_change() * 100)
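
If the methods in the cell above are unfamiliar, the following minimal sketch may help. It applies the same calls to a three-row made-up frame (the name `toy` and the prices are assumptions for illustration): the `.dt` accessor extracts date parts, and `.pct_change()` gives the fractional change from the previous row, which is why the first row has no value.

```python
import pandas as pd

# Hypothetical toy data with made-up prices
toy = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Adj Close": [100.0, 110.0, 99.0],
})

toy["Date"] = pd.to_datetime(toy["Date"])    # strings -> datetime64
toy["Weekday"] = toy["Date"].dt.day_name()   # e.g. 'Thursday'

# Change relative to the previous row, as a percentage.
# The first row has no previous row, so it is NaN.
toy["Change %"] = toy["Adj Close"].pct_change() * 100

print(toy)
```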

In [5]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %
0,2015-06-30,125.57,126.120003,124.860001,125.43,115.597382,44370700,2015,6,30,Tuesday,
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800,2015,7,1,Wednesday,0.932794
2,2015-07-02,126.43,126.690002,125.769997,126.440002,116.528198,27211000,2015,7,2,Thursday,-0.126392
3,2015-07-06,124.940002,126.230003,124.849998,126.0,116.122704,28060400,2015,7,6,Monday,-0.347979
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800,2015,7,7,Tuesday,-0.246031


Avoid modifying `df` itself in the subsequent questions.

## Dataset stats

#### 1. What's the mean of the values in the `Adj Close` column?

Store the answer in a variable called `mean_adj_close`

In [6]:
# Add your code below
df1 = df.copy()  # work on a copy so df is left untouched
mean_adj_close = df1['Adj Close'].mean()
mean_adj_close


167.04975667513898

#### 2. What's the minimum value in the `Low` column?

Store the answer in a variable called `min_low`

In [7]:
# Add your code below
min_low = df1['Low'].min()
min_low

89.470001

#### 3. What's the maximum value in the `High` column?

Store the answer in a variable called `max_high`

In [8]:
# Add your code below
max_high = df1['High'].max()
max_high

372.380005

#### 4. What's the difference between `max_high` and `min_low` (i.e. `max_high - min_low`)?

Store the answer in a variable called `price_range`

In [9]:
# Add your code below
price_range = max_high-min_low
price_range

282.91000399999996

#### 5. How many rows are there in the DataFrame?

Store the answer in a variable called `entries`

In [10]:
# Add your code below
entries = df1.shape[0]
entries


1259

#### 6. On how many days (i.e. number of rows) was `Change %` greater than zero?

Store the answer in a variable called `positive_days`

In [11]:
# Add your code below
df2 = df.copy()
positive = df2['Change %'] > 0  # boolean mask; the NaN in the first row compares as False
positive_days = df2[positive].shape[0]  # number of rows where the mask is True
positive_days

671

#### 7. On how many days (i.e. number of rows) has `Adj Close` been greater than the value in the final row?

Store the answer in a variable called `days_higher`

*Hint: we can use list-style positional indexing with `.iloc`, e.g. `.iloc[-1]`, to get the last value in a Series, such as a specific column of a DataFrame*
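
To illustrate the hint, here is a small sketch on a made-up Series (the name `prices` and its values are hypothetical). It takes the last value with `.iloc[-1]`, compares every entry against it, and counts the matches by summing the resulting boolean Series.

```python
import pandas as pd

# Hypothetical toy prices, not the AAPL data
prices = pd.Series([5.0, 9.0, 7.0, 8.0, 6.0])

last_value = prices.iloc[-1]      # positional access to the final entry
higher = prices > last_value      # boolean Series, True where the value is larger
count_higher = int(higher.sum())  # each True counts as 1

print(last_value, count_higher)
```

Summing a boolean mask and taking `.shape[0]` of the filtered rows should give the same count; either approach works here.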

In [12]:
# Add your code below
last_adj_close = df1['Adj Close'].iloc[-1]  # value in the final row
higher_mask = df1['Adj Close'] > last_adj_close  # True where Adj Close exceeds it
days_higher = df1[higher_mask].shape[0]
days_higher

2

## Dataset sorting and filtering

#### 8. Create a new DataFrame called `df_2020` which is the same as `df` but contains only the rows where `Year == 2020`. 

Use `set_index('Date', inplace=True)` to set the `Date` column as the row index.
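
One detail worth keeping in mind: with `inplace=True`, `set_index` modifies the DataFrame it is called on and returns `None`, so its result should not be assigned to a variable. A minimal sketch on made-up data (the names `toy`, `by_date` and `result` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, not the AAPL dataset
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-12-31", "2020-01-02"]),
    "Year": [2019, 2020],
})

by_date = toy.copy()  # keep the original frame untouched
result = by_date.set_index("Date", inplace=True)  # modifies by_date, returns None

only_2020 = by_date[by_date["Year"] == 2020]  # boolean-mask filter on Year

print(result)            # None
print(only_2020.index)   # DatetimeIndex named 'Date'
```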

In [13]:
# Add your code below
df2 = df.copy()
df2.set_index('Date', inplace=True)  # modifies df2 in place and returns None
date_mask = df2['Year'] == 2020
df_2020 = df2[date_mask]
df_2020



Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %
2020-01-02,296.239990,300.600006,295.190002,300.350006,298.829956,33870100,2020,1,2,Thursday,2.281644
2020-01-03,297.149994,300.579987,296.500000,297.429993,295.924713,36580700,2020,1,3,Friday,-0.972206
2020-01-06,293.790009,299.959991,292.750000,299.799988,298.282715,29596800,2020,1,6,Monday,0.796825
2020-01-07,299.839996,300.899994,297.480011,298.390015,296.879883,27218000,2020,1,7,Tuesday,-0.470303
2020-01-08,297.160004,304.440002,297.160004,303.190002,301.655548,33019800,2020,1,8,Wednesday,1.608619
...,...,...,...,...,...,...,...,...,...,...,...
2020-06-23,364.000000,372.380005,362.269989,366.529999,366.529999,53038900,2020,6,23,Tuesday,2.134479
2020-06-24,365.000000,368.790009,358.519989,360.059998,360.059998,48155800,2020,6,24,Wednesday,-1.765204
2020-06-25,360.700012,365.000000,357.570007,364.839996,364.839996,34380600,2020,6,25,Thursday,1.327556
2020-06-26,364.410004,365.320007,353.019989,353.630005,353.630005,51314200,2020,6,26,Friday,-3.072577


#### 9. Continuing with `df_2020`, calculate the `.mean()` of `Change %` for entries where `Weekday == Monday`.

Store the value in a variable called `mean_change_mon_2020`.

In [14]:
# Add your code below
weekday_mask = df_2020['Weekday'] == 'Monday'
monday = df_2020[weekday_mask]  # only the Monday rows of 2020
mean_change_mon_2020 = monday['Change %'].mean()


When you have calculated `mean_change_mon_2020`, uncomment and run the cell below to view its value:

In [15]:
mean_change_mon_2020

0.2918877852311579

#### 10. Calculate the sum of the `Volume` column in `df_2020` for entries where `Month == 3`.

Store the value in a variable called `total_volume_march_2020`.
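
The filter-then-aggregate pattern used in this question and the previous one can be sketched on made-up data as follows (the name `toy` and the volumes are hypothetical). `.loc` with a boolean mask and a column name selects the matching rows of a single column in one step.

```python
import pandas as pd

# Hypothetical toy data, not the AAPL dataset
toy = pd.DataFrame({
    "Month": [2, 3, 3, 4],
    "Volume": [100, 200, 300, 400],
})

march_mask = toy["Month"] == 3                       # True for the March rows
march_total = toy.loc[march_mask, "Volume"].sum()    # sum Volume over those rows

print(march_total)
```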

In [16]:
# Add your code below
march_mask = df_2020['Month'] == 3
march = df_2020[march_mask]  # only the March 2020 rows
total_volume_march_2020 = march['Volume'].sum()

When you have calculated `total_volume_march_2020`, uncomment and run the cell below to view its value:

In [17]:
total_volume_march_2020

1570018100

#### 11. Using `df_2020`, determine when `Adj Close` was the highest.

- look at the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html) for the `.idxmax()` method and use it for this task 
- this will only work if the row index has been set to the `Date` as instructed earlier in the assignment

Store the value in a variable called `year_high_timestamp`
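
As a small sketch of how `.idxmax()` behaves, the made-up frame below (the name `toy` and the prices are hypothetical) has dates as its row index, so the method returns the date label of the largest value rather than its position.

```python
import pandas as pd

# Hypothetical toy data indexed by date
toy = pd.DataFrame(
    {"Adj Close": [300.0, 320.0, 310.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# idxmax returns the index label of the maximum, here a Timestamp
peak_date = toy["Adj Close"].idxmax()

print(peak_date)
```

With a default integer index, the same call would return a row number instead, which is why the `Date` index matters for this question.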


In [18]:
# Add your code below
high = df_2020['Adj Close']
year_high_timestamp = high.idxmax()  # index label (a date) of the highest value
year_high_timestamp

Timestamp('2020-06-23 00:00:00')

#### 12. Create a DataFrame called `df_top_10` which contains the 10 entries from `df` with the highest positive `Change %` values.
- consider all entries in `df` rather than `df_2020`
- remember to avoid modifying `df` or any other stored DataFrames 
- `.copy()` can be used to copy a DataFrame to a new variable
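
One way to approach this, sketched on made-up data (the name `toy` and its values are hypothetical), is to sort in descending order and take the first rows. `DataFrame.nlargest` is an alternative that should select the same rows; neither call modifies the frame it is called on, and missing values do not end up at the top.

```python
import pandas as pd

# Hypothetical toy data; the None becomes NaN, like the first row of 'Change %'
toy = pd.DataFrame({"Change %": [1.5, None, -2.0, 4.0, 0.5]})

# Option 1: sort descending (NaN is placed last by default), then take the top rows
top_2_sorted = toy.sort_values(by="Change %", ascending=False).head(2)

# Option 2: nlargest picks the rows with the largest values directly
top_2_nlargest = toy.nlargest(2, "Change %")

print(top_2_sorted)
print(top_2_nlargest)
```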

In [19]:
# Add your code below
df3 = df.copy()
sort = df3.sort_values(by='Change %', ascending=False)  # largest change first, NaN last
df_top_10 = sort.head(10)
df_top_10

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %
1184,2020-03-13,264.890015,279.920013,252.949997,277.970001,277.219574,92683000,2020,3,13,Friday,11.980825
1191,2020-03-24,236.360001,247.690002,234.300003,246.880005,246.213516,71882800,2020,3,24,Tuesday,10.032544
1175,2020-03-02,282.279999,301.440002,277.720001,298.809998,298.003296,85349300,2020,3,2,Monday,9.310065
1200,2020-04-06,250.899994,263.109985,249.380005,262.470001,261.761414,50455100,2020,4,6,Monday,8.723748
1181,2020-03-10,277.140015,286.440002,269.369995,285.339996,284.569672,71322500,2020,3,10,Tuesday,7.202155
879,2018-12-26,148.300003,157.229996,146.720001,157.169998,154.059814,58582500,2018,12,26,Wednesday,7.042139
902,2019-01-30,163.25,166.149994,160.229996,165.25,161.979935,61109800,2019,1,30,Wednesday,6.833477
271,2016-07-27,104.269997,104.349998,102.75,102.949997,96.822357,92344800,2016,7,27,Wednesday,6.49631
401,2017-02-01,127.029999,130.490005,127.010002,128.75,122.367752,111985000,2017,2,1,Wednesday,6.098075
778,2018-08-01,199.130005,201.759995,197.309998,201.5,196.137955,67935700,2018,8,1,Wednesday,5.891019


#### 13. How many entries in `df_top_10` were *not* on a Monday?
Store the value in a variable called `top_10_not_mon`

In [20]:
# Add your code below
day_mask = df_top_10['Weekday'] != 'Monday'
frame = df_top_10[day_mask]  # rows that did not fall on a Monday
top_10_not_mon = len(frame)



When you have calculated `top_10_not_mon`, uncomment and run the cell below to inspect it:

In [21]:
top_10_not_mon

8

## Dataset manipulation

#### 14. Create a new DataFrame called `df_var`, which is the same as `df` but with an additional column `Variation %`, which is equal to:

((`High` - `Low`) / `Close`) * 100

- be sure to use `Close` rather than `Adj Close` in this question
- do not modify `df` but create a copy: `df_var = df.copy()`
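
As a quick sanity check of the formula, the sketch below applies it to the `High`, `Low` and `Close` values from the first row of the data shown earlier (2015-06-30). The single-row frame `row` is built by hand purely for illustration.

```python
import pandas as pd

# Values copied from the first row of the dataset (2015-06-30)
row = pd.DataFrame({
    "High": [126.120003],
    "Low": [124.860001],
    "Close": [125.43],
})

# ((High - Low) / Close) * 100, computed column-wise
row["Variation %"] = (row["High"] - row["Low"]) / row["Close"] * 100

variation_pct = row["Variation %"].iloc[0]
print(variation_pct)  # roughly 1.0045, i.e. the day's range was about 1% of the close
```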

In [22]:
# Add your code below
df_var=df.copy()
df_var['Variation %']=(df_var['High']-df_var['Low'])/df_var['Close'] * 100
df_var

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %,Variation %
0,2015-06-30,125.570000,126.120003,124.860001,125.430000,115.597382,44370700,2015,6,30,Tuesday,,1.004546
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800,2015,7,1,Wednesday,0.932794,0.750398
2,2015-07-02,126.430000,126.690002,125.769997,126.440002,116.528198,27211000,2015,7,2,Thursday,-0.126392,0.727622
3,2015-07-06,124.940002,126.230003,124.849998,126.000000,116.122704,28060400,2015,7,6,Monday,-0.347979,1.095242
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800,2015,7,7,Tuesday,-0.246031,1.893552
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1254,2020-06-23,364.000000,372.380005,362.269989,366.529999,366.529999,53038900,2020,6,23,Tuesday,2.134479,2.758305
1255,2020-06-24,365.000000,368.790009,358.519989,360.059998,360.059998,48155800,2020,6,24,Wednesday,-1.765204,2.852308
1256,2020-06-25,360.700012,365.000000,357.570007,364.839996,364.839996,34380600,2020,6,25,Thursday,1.327556,2.036507
1257,2020-06-26,364.410004,365.320007,353.019989,353.630005,353.630005,51314200,2020,6,26,Friday,-3.072577,3.478217


Once you have calculated `df_var`, you can uncomment and run the cell below to inspect it:

In [23]:
df_var.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %,Variation %
0,2015-06-30,125.57,126.120003,124.860001,125.43,115.597382,44370700,2015,6,30,Tuesday,,1.004546
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800,2015,7,1,Wednesday,0.932794,0.750398
2,2015-07-02,126.43,126.690002,125.769997,126.440002,116.528198,27211000,2015,7,2,Thursday,-0.126392,0.727622
3,2015-07-06,124.940002,126.230003,124.849998,126.0,116.122704,28060400,2015,7,6,Monday,-0.347979,1.095242
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800,2015,7,7,Tuesday,-0.246031,1.893552


#### 15. Create a new DataFrame called `df_var_value`, which is the same as `df_var` but with an additional column `Traded Value`, equal to:
`Volume * Adj Close`

- do not modify `df_var` but create a copy: `df_var_value = df_var.copy()`
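
For a feel of the magnitudes involved, this sketch multiplies the `Volume` and `Adj Close` values from the first row of the data shown earlier (2015-06-30). The single-row frame `row` is built by hand for illustration only.

```python
import pandas as pd

# Values copied from the first row of the dataset (2015-06-30)
row = pd.DataFrame({
    "Volume": [44370700],
    "Adj Close": [115.597382],
})

# Element-wise product of the two columns; int * float gives a float column
row["Traded Value"] = row["Volume"] * row["Adj Close"]

traded_value = row["Traded Value"].iloc[0]
print(f"{traded_value:.6e}")  # on the order of 5e9
```

Because the result is a large float, pandas may display it in scientific notation, which is purely a display choice and does not change the stored value.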

In [24]:
# Add your code below
df_var_value=df_var.copy()
df_var_value['Traded Value']=df_var_value['Volume']*df_var_value['Adj Close']
df_var_value

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %,Variation %,Traded Value
0,2015-06-30,125.570000,126.120003,124.860001,125.430000,115.597382,44370700,2015,6,30,Tuesday,,1.004546,5.129137e+09
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800,2015,7,1,Wednesday,0.932794,0.750398,3.528132e+09
2,2015-07-02,126.430000,126.690002,125.769997,126.440002,116.528198,27211000,2015,7,2,Thursday,-0.126392,0.727622,3.170849e+09
3,2015-07-06,124.940002,126.230003,124.849998,126.000000,116.122704,28060400,2015,7,6,Monday,-0.347979,1.095242,3.258450e+09
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800,2015,7,7,Tuesday,-0.246031,1.893552,5.438177e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1254,2020-06-23,364.000000,372.380005,362.269989,366.529999,366.529999,53038900,2020,6,23,Tuesday,2.134479,2.758305,1.944035e+10
1255,2020-06-24,365.000000,368.790009,358.519989,360.059998,360.059998,48155800,2020,6,24,Wednesday,-1.765204,2.852308,1.733898e+10
1256,2020-06-25,360.700012,365.000000,357.570007,364.839996,364.839996,34380600,2020,6,25,Thursday,1.327556,2.036507,1.254342e+10
1257,2020-06-26,364.410004,365.320007,353.019989,353.630005,353.630005,51314200,2020,6,26,Friday,-3.072577,3.478217,1.814624e+10


Now uncomment and run the cell below to view `df_var_value`:

In [25]:
df_var_value.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Year,Month,Day,Weekday,Change %,Variation %,Traded Value
0,2015-06-30,125.57,126.120003,124.860001,125.43,115.597382,44370700,2015,6,30,Tuesday,,1.004546,5129137000.0
1,2015-07-01,126.900002,126.940002,125.989998,126.599998,116.675667,30238800,2015,7,1,Wednesday,0.932794,0.750398,3528132000.0
2,2015-07-02,126.43,126.690002,125.769997,126.440002,116.528198,27211000,2015,7,2,Thursday,-0.126392,0.727622,3170849000.0
3,2015-07-06,124.940002,126.230003,124.849998,126.0,116.122704,28060400,2015,7,6,Monday,-0.347979,1.095242,3258450000.0
4,2015-07-07,125.889999,126.150002,123.769997,125.690002,115.837006,46946800,2015,7,7,Tuesday,-0.246031,1.893552,5438177000.0
