# Analysis of Daily Stock Price Data

KATE expects your code to define variables with specific names that correspond to certain things we are interested in.

KATE will run your notebook from top to bottom and check the latest value of those variables, so make sure you don't overwrite them.

* Remember to uncomment the line assigning the variable to your answer and don't change the variable or function names.
* Use copies of the original or previous DataFrames to make sure you do not overwrite them by mistake.

You will find instructions below about how to define each variable.

Once you're happy with your code, upload your notebook to KATE to check your feedback.

In [1]:
import pandas as pd

First, we will load the dataset from `data/AAPL.csv` into a DataFrame.

In [2]:
df = pd.read_csv('data/AAPL.csv')
df.head()

In its raw format, this data is the same as what many financial websites provide for download.

Before starting the exercise, let's add some additional data columns, calculated from the raw data. Don't worry if you aren't familiar with the methods used in the following cell.

In [3]:
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Weekday'] = df['Date'].dt.day_name()
df['Change %'] = (df['Adj Close'].pct_change() * 100)
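
The following is a minimal sketch of what the cell above does. It runs on a hypothetical three-row `sample` DataFrame with invented prices, not on `data/AAPL.csv`. It shows that the `.dt` accessor extracts date parts once `Date` is parsed. It also shows that `pct_change()` compares each row with the previous one, so the first row is left as `NaN`.

```python
import pandas as pd

# Hypothetical three-day sample standing in for data/AAPL.csv
sample = pd.DataFrame({
    'Date': ['2020-03-02', '2020-03-03', '2020-03-04'],
    'Adj Close': [100.0, 110.0, 99.0],
})

# Parse the date strings so the .dt accessor becomes available
sample['Date'] = pd.to_datetime(sample['Date'])
sample['Year'] = sample['Date'].dt.year
sample['Month'] = sample['Date'].dt.month
sample['Day'] = sample['Date'].dt.day
sample['Weekday'] = sample['Date'].dt.day_name()

# pct_change() divides each value by the previous one and subtracts 1;
# the first row has no predecessor, so its change is NaN
sample['Change %'] = sample['Adj Close'].pct_change() * 100

print(sample)
```

Because the first `Change %` value is always `NaN`, later counts and means over that column skip it.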

In [4]:
df.head()

Avoid modifying `df` itself in the subsequent questions.

## Dataset stats

#### 1. What's the mean of the values in the `Adj Close` column?

Store the answer in a variable called `mean_adj_close`

In [5]:
# Add your code below
# mean_adj_close = ...
mean_adj_close = df['Adj Close'].mean()
mean_adj_close

#### 2. What's the minimum value in the `Low` column?

Store the answer in a variable called `min_low`

In [6]:
# Add your code below
# min_low = ...
min_low = df['Low'].min()
min_low

#### 3. What's the maximum value in the `High` column?

Store the answer in a variable called `max_high`

In [7]:
# Add your code below
# max_high = ...
max_high = df['High'].max()
max_high


#### 4. What's the difference between `min_low` and `max_high`?  

Store the answer in a variable called `price_range`

In [8]:
# Add your code below
# price_range = ...
price_range = max_high - min_low
price_range
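
Questions 1 to 4 rely on single-column reductions. The small sketch below uses hypothetical `High` and `Low` values. Its variable names end in `_example` so they do not overwrite the variables KATE checks. It shows that the range runs from the lowest `Low` to the highest `High` over the whole period, not within a single day.

```python
import pandas as pd

# Hypothetical daily highs and lows
prices = pd.DataFrame({
    'High': [12.5, 13.0, 12.8],
    'Low': [11.9, 12.2, 11.5],
})

min_low_example = prices['Low'].min()    # lowest Low over the whole period
max_high_example = prices['High'].max()  # highest High over the whole period

# The two extremes usually fall on different days
price_range_example = max_high_example - min_low_example
print(price_range_example)  # 1.5
```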



#### 5. How many rows are there in the DataFrame?

Store the answer in a variable called `entries`

In [9]:
# Add your code below
# entries = ...
entries = df.shape[0]
entries
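
`df.shape` is a `(rows, columns)` tuple, so `shape[0]` is the row count, and `len(df)` gives the same number. Here is a quick check on a hypothetical four-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with four rows and two columns
sample = pd.DataFrame({'Low': [1.0, 2.0, 3.0, 4.0], 'High': [1.5, 2.5, 3.5, 4.5]})

n_rows = sample.shape[0]  # first element of the (rows, columns) tuple
print(n_rows, len(sample))  # 4 4
print(sample.shape)         # (4, 2)
```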


#### 6. On how many days (i.e. number of rows) was `Change %` greater than zero?

Store the answer in a variable called `positive_days`

In [10]:
# Add your code below
# positive_days = ...
change = df['Change %']
positive_days = change[change > 0].count()
positive_days
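
Counting with a boolean mask works because `True` sums as 1. The sketch below uses a hypothetical `Change %` series that starts with `NaN`, as `pct_change()` produces. Comparisons against `NaN` evaluate to `False`, so that row is not counted. Zero is excluded too, since the question asks for values strictly greater than zero.

```python
import numpy as np
import pandas as pd

# Hypothetical Change % values; the first is NaN, as pct_change() produces
change_example = pd.Series([np.nan, 1.2, -0.4, 0.0, 2.5])

# NaN > 0 and 0.0 > 0 are both False, so only 1.2 and 2.5 count
mask = change_example > 0
positive_days_example = int(mask.sum())  # True counts as 1, False as 0
print(positive_days_example)  # 2
```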

#### 7. On how many days (i.e. number of rows) has `Adj Close` been greater than the value in the final row?

Store the answer in a variable called `days_higher`

*Hint: we can use positional indexing with `.iloc`, e.g. `.iloc[-1]`, to get the last value in a Series, such as a single column of a DataFrame.*

In [11]:
# Add your code below
# days_higher = ...
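# Compare every Adj Close value with the one in the final row and count the True results
days_higher = (df['Adj Close'] > df['Adj Close'].iloc[-1]).sum()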
days_higher
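
The same mask-and-sum pattern answers question 7 once the final value is read with `.iloc[-1]`. Here is a minimal sketch on a hypothetical `Adj Close` series:

```python
import pandas as pd

# Hypothetical Adj Close series; the final value is 11.0
adj_close = pd.Series([10.0, 12.0, 9.0, 13.0, 11.0])

last_value = adj_close.iloc[-1]  # positional access to the final row
# Compare every row against the final value and count the True results
days_higher_example = int((adj_close > last_value).sum())
print(days_higher_example)  # 2 (the 12.0 and 13.0 rows)
```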

## Dataset sorting and filtering

#### 8. Create a new DataFrame called `df_2020` which is the same as `df` but contains only the rows where `Year == 2020`. 

Use `set_index('Date', inplace=True)` to set the `Date` column as the row index.

In [12]:
# Add your code below
# df_2020 = ...
df_2020 = df[df['Year'] == 2020].copy()  # copy so that df itself is never modified
df_2020.set_index('Date', inplace=True)
print(df_2020)
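
Here is a sketch of the filter-then-index pattern on hypothetical rows. The names `sample` and `sample_2020` are for illustration only. Calling `.copy()` on the filtered result makes it an explicitly independent DataFrame. The in-place `set_index` then changes only the copy, and the original keeps its `Date` column and all of its rows.

```python
import pandas as pd

# Hypothetical rows spanning two years
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2019-12-31', '2020-01-02', '2020-01-03']),
    'Year': [2019, 2020, 2020],
    'Adj Close': [10.0, 11.0, 12.0],
})

# Keep only the 2020 rows as an independent DataFrame
sample_2020 = sample[sample['Year'] == 2020].copy()
# Move Date from a column into the row index of the copy
sample_2020.set_index('Date', inplace=True)

print(sample_2020)
print(len(sample), list(sample.columns))  # the original is unchanged
```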


#### 9. Continuing with `df_2020`, calculate the `.mean()` of `Change %` for entries where `Weekday == Monday`.

Store the value in a variable called `mean_change_mon_2020`.

In [26]:
# Add your code below
mondays_2020 = df_2020[df_2020['Weekday'] == 'Monday']
mean_change_mon_2020 = mondays_2020['Change %'].mean()


When you have calculated `mean_change_mon_2020`, uncomment and run the cell below to view its value:

In [27]:
mean_change_mon_2020
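
Question 9's filter and column selection can also be written as a single `.loc` call that takes a boolean mask for the rows and a column name. Here is a sketch with hypothetical 2020 rows, indexed by date as `df_2020` is:

```python
import pandas as pd

# Hypothetical 2020 rows with a DatetimeIndex, like df_2020
sample_2020 = pd.DataFrame(
    {'Weekday': ['Monday', 'Tuesday', 'Monday'], 'Change %': [1.0, -2.0, 3.0]},
    index=pd.to_datetime(['2020-03-02', '2020-03-03', '2020-03-09']),
)

# Row mask and column name in one .loc call
mondays_change = sample_2020.loc[sample_2020['Weekday'] == 'Monday', 'Change %']
print(mondays_change.mean())  # 2.0
```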

#### 10. Calculate the sum of the `Volume` column in `df_2020` for entries where `Month == 3`.

Store the value in a variable called `total_volume_march_2020`.

In [28]:
# Add your code below
march_2020 = df_2020[df_2020['Month'] == 3]
total_volume_march_2020 = march_2020['Volume'].sum()



When you have calculated `total_volume_march_2020`, uncomment and run the cell below to view its value:

In [29]:
total_volume_march_2020
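
The same `.loc[mask, column]` pattern applies to question 10, with `.sum()` in place of `.mean()`. This sketch uses hypothetical values:

```python
import pandas as pd

# Hypothetical 2020 rows with a DatetimeIndex, like df_2020
sample_2020 = pd.DataFrame(
    {'Month': [2, 3, 3], 'Volume': [1000, 2500, 1500]},
    index=pd.to_datetime(['2020-02-28', '2020-03-02', '2020-03-03']),
)

# Sum Volume only over the March rows
march_volume = sample_2020.loc[sample_2020['Month'] == 3, 'Volume'].sum()
print(march_volume)  # 4000
```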

#### 11. Using `df_2020`, determine when `Adj Close` was the highest.

- look at the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html) for the `.idxmax()` method and use it for this task 
- this returns a date only if the row index has been set to `Date`, as instructed in question 8; with the default index it returns an integer row label instead

Store the value in a variable called `year_high_timestamp`


In [30]:
# Add your code below
year_high_timestamp = df_2020['Adj Close'].idxmax()
year_high_timestamp
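
`idxmax()` returns the index *label* of the maximum, so what you get back depends on the index. This sketch uses hypothetical values to contrast a `DatetimeIndex` with the default integer index:

```python
import pandas as pd

values = [10.0, 14.0, 12.0]
dates = pd.to_datetime(['2020-03-02', '2020-03-03', '2020-03-04'])

# With a DatetimeIndex, idxmax() returns the date label of the maximum
by_date = pd.Series(values, index=dates)
print(by_date.idxmax())  # 2020-03-03 00:00:00

# With the default RangeIndex, the same call returns an integer row label
by_position = pd.Series(values)
print(by_position.idxmax())  # 1
```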


#### 12. Create a DataFrame called `df_top_10` which contains the 10 entries from `df` with the highest positive `Change %` values.
- consider all entries in `df` rather than `df_2020`
- remember to avoid modifying `df` or any other stored DataFrames 
- `.copy()` can be used to copy a DataFrame to a new variable

In [31]:
# Add your code below
# Create a copy of the original DataFrame
df_top_10 = df.copy()

# Sort the copy in descending order based on 'Change %'
df_top_10 = df_top_10.sort_values(by='Change %', ascending=False)

# Select the top 10 entries with the highest positive 'Change %' values
df_top_10 = df_top_10.head(10)

# Display the new DataFrame
print(df_top_10)
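
`DataFrame.nlargest(n, columns)` is a more compact way to express sort-then-head. In this hypothetical sample the values are distinct, so both approaches return the same rows. With tied values, the two may order the tied rows differently. The leading `NaN` (as in `Change %`) is placed last by `sort_values`, so it never reaches the top.

```python
import numpy as np
import pandas as pd

# Hypothetical Change % values, with NaN in the first row as pct_change() produces
sample = pd.DataFrame({'Change %': [np.nan, 3.0, -1.0, 5.0, 2.0]})

# Sort descending (NaN goes last by default), then keep the first two rows
top_sorted = sample.sort_values(by='Change %', ascending=False).head(2)
# Same selection in one call
top_nlargest = sample.nlargest(2, 'Change %')

print(top_nlargest)
```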


#### 13. How many entries in `df_top_10` were *not* on a Monday?
Store the value in a variable called `top_10_not_mon`

In [32]:
# Add your code below
# 'Date' was already converted to datetime when the extra columns were added,
# so dt.weekday can be used directly (Monday == 0)
top_10_not_mon = len(df_top_10[df_top_10['Date'].dt.weekday != 0])

print("Entries in df_top_10 that were not on a Monday:", top_10_not_mon)


When you have calculated `top_10_not_mon`, uncomment and run the cell below to inspect it:

In [33]:
top_10_not_mon
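
Because `df` already has a `Weekday` column, comparing it against the string `'Monday'` is an alternative to `dt.weekday`, where Monday is 0 and Sunday is 6. This sketch on hypothetical dates checks that both approaches agree:

```python
import pandas as pd

# Hypothetical dates: two Mondays and three other weekdays
top = pd.DataFrame({'Date': pd.to_datetime(
    ['2020-03-02', '2020-03-06', '2020-03-10', '2020-03-16', '2020-03-19'])})
top['Weekday'] = top['Date'].dt.day_name()

# Using the precomputed Weekday column
by_name = int((top['Weekday'] != 'Monday').sum())
# Using dt.weekday, where Monday is 0 and Sunday is 6
by_number = int((top['Date'].dt.weekday != 0).sum())
print(by_name, by_number)  # 3 3
```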

## Dataset manipulation

#### 14. Create a new DataFrame called `df_var`, which is the same as `df` but with an additional column `Variation %`, which is equal to:

((`High` - `Low`) / `Close`) * 100

- be sure to use `Close` rather than `Adj Close` in this question
- do not modify `df` but create a copy: `df_var = df.copy()`

In [34]:
# Add your code below
# Create a copy of the original DataFrame
df_var = df.copy()

# Calculate the 'Variation %' and add it as a new column
df_var['Variation %'] = ((df_var['High'] - df_var['Low']) / df_var['Close']) * 100

# Display the new DataFrame
print(df_var)

Once you have calculated `df_var`, you can uncomment and run the cell below to inspect it:

In [35]:
df_var.head()
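
Here is a worked example of the formula on hypothetical prices. With `High` 110, `Low` 100 and `Close` 105, the variation is (110 - 100) / 105 * 100 ≈ 9.52. The sketch below applies the formula to a copy and leaves the source DataFrame unchanged:

```python
import pandas as pd

# Hypothetical daily prices
sample = pd.DataFrame({'High': [110.0, 52.0], 'Low': [100.0, 48.0], 'Close': [105.0, 50.0]})

# Work on a copy so the source DataFrame keeps its original columns
sample_var = sample.copy()
sample_var['Variation %'] = (sample_var['High'] - sample_var['Low']) / sample_var['Close'] * 100

print(sample_var['Variation %'].round(2).tolist())  # [9.52, 8.0]
```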

#### 15. Create a new DataFrame called `df_var_value`, which is the same as `df_var` but with an additional column `Traded Value`, equal to:
`Volume * Adj Close`

- do not modify `df_var` but create a copy: `df_var_value = df_var.copy()`

In [36]:
# Add your code below
# Create a copy of the df_var DataFrame
df_var_value = df_var.copy()

# Calculate the 'Traded Value' and add it as a new column
df_var_value['Traded Value'] = df_var_value['Volume'] * df_var_value['Adj Close']

# Display the new DataFrame
print(df_var_value)

Now uncomment and run the cell below to view `df_var_value`:

In [37]:
df_var_value.head()
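
Column arithmetic in pandas is element-wise, so `Volume * Adj Close` multiplies the two values row by row. This sketch uses hypothetical values:

```python
import pandas as pd

# Hypothetical volumes and adjusted closing prices
sample_var = pd.DataFrame({'Volume': [1000, 2000], 'Adj Close': [10.5, 20.0]})

# Copy first, then add the new column to the copy only
sample_var_value = sample_var.copy()
sample_var_value['Traded Value'] = sample_var_value['Volume'] * sample_var_value['Adj Close']

print(sample_var_value['Traded Value'].tolist())  # [10500.0, 40000.0]
```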