In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("gla09.ipynb")

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Guided Learning Activity 09: Parameter Estimation

This Guided Learning Activity is designed for you to complete alongside a Data Ambassador from the course. You might find that it feels like a combination of the lectures and lab assignment. Whether you are participating live or watching the recording of the live meeting, let the Data Ambassador guide you through the following tasks. There will be moments for you to reflect and explore your own ideas as a way to solidify concepts and skills introduced by your instructor. Keep in mind that this is not a graded assignment for MATH 108 by default. If you have any concerns about participation, reach out to your instructor.

---

## Learning Objectives

1. Outline the process of estimating a parameter using a bootstrap confidence interval.
2. Explore various topics in finance and stock portfolio management.
3. Access stock market data from Yahoo Finance using `yfinance`.
4. Generate a bootstrap confidence interval for various parameters.
5. Interpret confidence intervals.

---

## Configure the Notebook

Run the following code cell to set up the notebook.

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

---

## Parameters vs. Statistics

* A **parameter** is a measurement of the population
    * From a philosophical perspective, we consider the parameter to be a fixed value
        * In practice, many populations change over time, so treating the parameter as fixed is a simplification
* A **statistic** is some measurement on a sample
    * The value of the statistic varies with the samples
    * Random samples give random statistics
* Our goal is to have you estimate a parameter using sample data
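
The distinction above can be illustrated with a small simulation. This is a minimal sketch, assuming a made-up population of 10,000 values (the numbers, variable names, and seed are illustrative assumptions, not course data). The population mean is computed once and stays fixed, while each random sample produces a different sample mean.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up measurements
population = rng.normal(loc=50, scale=10, size=10_000)
parameter = np.mean(population)  # one fixed value for the whole population

# Three random samples of 100 values each give three different statistics
statistics = [np.mean(rng.choice(population, size=100, replace=False))
              for _ in range(3)]

print("Parameter (population mean):", round(parameter, 2))
print("Statistics (sample means):", np.round(statistics, 2))
```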

---

## A Parameter Estimation Process

```mermaid
graph TD;
    A["What is the value of some\n parameter for a given population?"]
    A --> B{"Do we have all the related\n data on the population?"}
    B -->|"Yes"| C["Calculate the parameter."]
    B -->|"No"| D["Generate a random sample\n from the population."]
    D --> E["Calculate a related statistic on the sample."]
    E --> F["Store the statistic value."]
    F --> G["Resample (with replacement) from the sample."]
    G --> H["Calculate the same statistic on the resample."]
    H --> I["Store the statistic value."]
    I -->|"Repeat Many Times"| G
    I --> J["Generate a distribution from stored statistics."]
    J --> k["Use the center and spread of the\n distribution to estimate the parameter."]
```
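
The "No" branch of the diagram can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sample is made-up data, the statistic is the mean, and 1,000 repetitions is an arbitrary choice. The tasks later in this notebook use `Table` methods instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original random sample from a population (made-up values)
sample = rng.exponential(scale=3.0, size=200)
sample_statistic = np.mean(sample)  # statistic on the original sample

# Resample with replacement many times, storing the statistic each time
resampled_statistics = np.array([
    np.mean(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(1_000)
])

# The center and spread of the stored statistics inform the estimate
print("Sample mean:", sample_statistic)
print("Center of resampled means:", np.mean(resampled_statistics))
print("Spread of resampled means:", np.std(resampled_statistics))
```

Each resample has the same size as the original sample, so the spread of the stored statistics reflects the variability expected from samples of that size.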

---

### Task 01 📍🔎

<!-- BEGIN QUESTION -->

In the parameter estimation process outlined above, list some pros and cons of resampling from the original random sample from the population.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

## Stock Data

* A **portfolio manager** makes investment decisions for individuals or institutions  
* An **investment strategy** is a plan for how to choose those investments  
* For example, a portfolio manager might:  
    * Focus on large, stable companies  
    * Target fast-growing tech startups  
    * Invest in funds that track the entire stock market  
    * Prioritize sustainable or ethical companies  
    * etc.  
* A portfolio manager can choose between strategies by estimating expected returns using historical data
    * **Returns** are the profits from an investment 

---

### Yahoo Finance

<center><img src="Yahoo!_Finance_logo_2021.png" width=150px alt="Yahoo Finance logo"></center>

*  [Yahoo Finance](https://finance.yahoo.com/) is a financial news and data platform that provides financial news, data, and commentary
*  [`yfinance`](https://yfinance-python.org/) is a Python library used to access data from Yahoo Finance

---

### Task 02 📍

Run the command `!pip install yfinance` to install `yfinance` on this system, and import `yfinance` as `yf`.

In [None]:
...

In [None]:
grader.check("task_02")

---

### Downloading Stock Data

* A **stock** (or **share**) is a unit of ownership of a company
* Stocks are traded on stock exchanges, meaning that individuals or organizations come together to buy and sell stock
    *  The price of a stock varies over any given trading day
* The `yfinance` function `download` allows you to download stock information for an organization over a specified period of time
* The `download` function has several parameters. We will focus on a few for this activity:
    * `tickers` (`str`) allows you to specify the ticker for the company that you want to download information for
        * An organization is referenced by its ticker, a unique alphabetic name that identifies the stock
        * For example, the ticker for Apple is `'AAPL'`
    * `start` (`str`) allows you to specify the start date for the stock data and uses the format `'YYYY-MM-DD'`
    * `end` (`str`) allows you to specify the end date for the stock data and uses the format `'YYYY-MM-DD'`
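
Since `start` and `end` both expect `'YYYY-MM-DD'` strings, one way to build them is with Python's standard `datetime` module. The helper below is hypothetical (it is not part of `yfinance`), and it approximates a year as 365 days, so treat it as a sketch rather than the required approach.

```python
from datetime import date, timedelta

# Hypothetical helper (not part of yfinance): build 'YYYY-MM-DD' strings
def date_range_strings(years, today=None):
    """Return (start, end) strings covering roughly the last `years` years."""
    end = today or date.today()
    start = end - timedelta(days=365 * years)  # approximation; ignores leap days
    return start.isoformat(), end.isoformat()

# A fixed date is passed here only to make the example reproducible
start, end = date_range_strings(5, today=date(2025, 1, 15))
print(start, end)
```

The two strings could then be passed to the `start` and `end` parameters described above.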
    

---

### Task 03 📍

Using the `yf.download` function, download Apple's daily stock information for the last 5 years as `AAPL_df`, and use `Table.from_df` to create a `Table` called `AAPL` containing the downloaded information.

**Notes:** 
* We've utilized the parameters `progress`, `auto_adjust`, and `multi_level_index` to clean up the response you get from the `download` function.
* This function will download the data as a [Pandas](https://pandas.pydata.org/) `DataFrame`.
* We don't work directly with Pandas in MATH 108, so `Table.from_df` converts that `DataFrame` into a `Table`.

In [None]:
AAPL_df = yf.download(tickers=..., 
                      start=..., 
                      end=...,
                      progress=False, # Turn off the download progress bar
                      auto_adjust=True, # Adjust all Open, High, Low, and Close values automatically
                      multi_level_index=False) # To make the DataFrame to Table conversion easier
AAPL = Table.from_df(AAPL_df.reset_index()) # Needed to include the dates in the Table
AAPL

In [None]:
grader.check("task_03")

---

### Reading the Data

The data from `yfinance` in the `AAPL` table should show you:
* `'Date'`: The date of the stock trading information
* `'Close'`: The closing price of the stock for the given day
* `'High'`: The highest price of the stock for the given day
* `'Low'`: The lowest price of the stock for the given day
* `'Open'`: The opening price of the stock for the given day
* `'Volume'`: The total number of shares of the stock traded for the day

---

### Task 04 📍

Download Tesla's (`'TSLA'`) daily stock information in the same way you did for Apple to create the `Table` `TSLA`.

In [None]:
TSLA_df = yf.download(tickers=..., 
                      start=..., 
                      end=...,
                      progress=False, 
                      auto_adjust=True, 
                      multi_level_index=False)
TSLA = Table.from_df(TSLA_df.reset_index())
TSLA

In [None]:
grader.check("task_04")

---

### Task 05 📍

Create a table called `closings` which contains all the closing prices for the Apple and Tesla stocks for the last five years. The table should have the following 3 columns, in the presented order:
* `'Date'`: The date
* `'AAPL'`: Apple's closing stock price
* `'TSLA'`:  Tesla's closing stock price

In [None]:
...

In [None]:
grader.check("task_05")

---

## Choosing Between Investment Strategies

Suppose that you are a portfolio manager who is trying to decide between the following two strategies:

1. Invest your client's money based on Apple's stock.
2. Invest your client's money based on Tesla's stock.

You can only speculate what will happen in the future, but you can use historical data to inform your speculation.

---

### Task 06 📍🔎

<!-- BEGIN QUESTION -->

Run the following line plot to see the trends in closing prices for Apple and Tesla stock over the last five years.

In [None]:
closings.iplot('Date', title='Closing Prices Over Time')

Compare and contrast the trend of Apple and Tesla's closing stock prices over the provided time period. In particular, how does this graphic inform your choice of using an Apple- or Tesla-based strategy for your client's investments? 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

### Daily Returns

* A simple **return** for a stock from one time period to the next is defined by subtracting the previous closing price from the current closing price, then dividing by the previous closing price. 
* Analyzing returns lets us focus on the relative changes in stock prices over time, rather than being distracted by the fact that Apple and Tesla have different price scales.
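
In symbols (notation assumed here for illustration), if $P_t$ is the closing price on day $t$, the simple return is $(P_t - P_{t-1}) / P_{t-1}$. A minimal sketch with made-up prices shows the arithmetic using array slicing:

```python
import numpy as np

# Made-up closing prices for four consecutive trading days
closing_prices = np.array([100.0, 102.0, 99.96, 104.958])

previous = closing_prices[:-1]  # every price except the last
current = closing_prices[1:]    # every price except the first

# One return per pair of consecutive days, so there is one fewer
# return than there are prices
simple_returns = (current - previous) / previous
print(simple_returns)
```

With these prices the three returns work out to approximately 2%, -2%, and 5%.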

---

### Task 07 📍

Create a table called `returns` that contains the daily returns for Apple and Tesla stock based on the data in `closings` to show the trend in the returns for both stocks over the last five years. `returns` should have the following 3 columns, in the presented order:

* `'Date'`: The date
* `'AAPL'`: Apple's return for the given date
* `'TSLA'`: Tesla's return for the given date

Observe the way the returns trend for both stocks.

**Notes**:
* If you have an array called `arr`, then
    * `arr[1:]` would provide you with a new array that contains everything in `arr` except the first item.
    * `arr[:-1]` would provide you with a new array that contains everything in `arr` except the last item.
* In the template code, we've used `iplot` to provide a better default image for the trends. 

In [None]:
AAPL_closings = ...
TSLA_closings = ...
AAPL_returns = ...
TSLA_returns = ...
returns = ...
returns

returns.iplot('Date', title='Returns Over Time')

In [None]:
grader.check("task_07")

---

### Parameters

* Considering the graph and your own experience, you might have an opinion on which stock to base the investments on
* Data Science involves using data to make decisions
* Since the returns vary for both stocks over time, you need summary measurements to compare them
    * An **average return** could be a useful measure to capture the expected return
    * Another consideration is the **standard deviation (volatility)** of the returns to capture the spread of the return values (risk)
* Estimate the average return and volatility of each stock and use those estimates to make your strategy decision.
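
To see why both measures matter, consider two made-up sets of daily returns (hypothetical values, not real stock data). The sketch assumes volatility is computed with `np.std`, which gives the population standard deviation.

```python
import numpy as np

# Made-up daily returns for two hypothetical stocks
steady_returns = np.array([0.01, -0.01, 0.02, 0.00, 0.01])
jumpy_returns = np.array([0.08, -0.07, 0.10, -0.09, 0.01])

for name, r in [("steady", steady_returns), ("jumpy", jumpy_returns)]:
    print(name, "average:", np.mean(r), "volatility:", np.std(r))
```

Both arrays have the same average return, but the second varies far more from day to day, so the average alone would hide the difference in risk.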

---

### Task 08 📍

Calculate the average return and volatility of the Apple and Tesla stocks over the last five years using the following assignments.

In [None]:
...

In [None]:
grader.check("task_08")

---

### Estimating the Parameters

* We want to understand an investment's true long-term behavior (the **population**)
* But we only have access to historical data (a **sample**)
* Simply using historical averages as fixed values **falsely** assumes the past fully represents the future
    * Past data gives us a **guess** of future return and volatility, not a guarantee
    * Estimating helps account for **uncertainty** in using a sample to predict the future
* The sample average return and sample volatility could have turned out differently if we used a different time period, used opening prices, etc
* To express the uncertainty in these estimates, it's standard to create an interval estimate based on them
* Resampling returns from the past five years using sampling with replacement gives us a structured way to introduce variability in the statistics

---

### Task 09 📍

Create a function `bootstrap_average_returns(tbl, values_label, n)` that creates `n` bootstrap resamples from the original sample of return data in a provided table `tbl` where the return data is in the column `values_label` and returns an array of average returns based on the resamples.

In [None]:
def bootstrap_average_returns(tbl, values_label, n):
    ...

# Visualize the distribution of resampled Apple returns with n=500
Table().with_column('Average Returns', bootstrap_average_returns(returns, 'AAPL', 500)).hist()
plt.title('Distribution of Resampled Average Returns')
plt.show()

In [None]:
grader.check("task_09")

---

### Task 10 📍

Create a function `bootstrap_volatilities(tbl, values_label, n)` that creates `n` bootstrap resamples from the original sample of return data in a provided table `tbl` where the return data is in the column `values_label` and returns an array of volatilities (standard deviations of the returns) based on the resamples.

In [None]:
def bootstrap_volatilities(tbl, values_label, n):
    ...

# Visualize the distribution of resampled Apple volatilities with n=500
Table().with_column('Volatilities', bootstrap_volatilities(returns, 'AAPL', 500)).hist()
plt.title('Distribution of Resampled Volatilities')
plt.show()

In [None]:
grader.check("task_10")

---

### Confidence Interval

* A confidence interval is an interval estimate for a parameter based on sample data
* In MATH 108, you create a confidence interval by finding the range that covers the middle portion of the resampled statistics
* The range depends on the confidence level
    * The standard level is 95%
    * A 95% confidence interval will include the middle 95% of the resampled statistics
    * A larger (smaller) confidence level will lead to a wider (narrower) confidence interval.
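
The "middle 95%" idea can be sketched with percentiles. This example uses `np.percentile` on a made-up array standing in for resampled statistics, and the helper name `middle_interval` is hypothetical. The `datascience` library also provides its own `percentile` function, which may be what your instructor expects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for an array of resampled statistics (made-up values)
resampled_statistics = rng.normal(loc=0.001, scale=0.0005, size=5_000)

def middle_interval(values, level):
    """Return the endpoints of the middle `level` percent of `values`."""
    tail = (100 - level) / 2  # percent cut from each end
    return np.percentile(values, tail), np.percentile(values, 100 - tail)

lower_95, upper_95 = middle_interval(resampled_statistics, 95)
lower_99, upper_99 = middle_interval(resampled_statistics, 99)

print("95% interval width:", upper_95 - lower_95)
print("99% interval width:", upper_99 - lower_99)
```

For a 95% level, 2.5% of the values are cut from each end, and the 99% interval comes out wider than the 95% one.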

---

### Task 11 📍

Create 95% confidence interval estimates for Apple and Tesla's average returns using `bootstrap_average_returns` and `n=5000`.

**Notes**: 
* This code will take about a minute to run.
* Your intervals should include the average return values from [Task 08](#Task-08-📍)

In [None]:
AAPL_returns = ...
AAPL_ave_return_ci_lower = ...
AAPL_ave_return_ci_upper = ...

print(f"95% Bootstrap CI for AAPL average return:\
({AAPL_ave_return_ci_lower:.5f}, {AAPL_ave_return_ci_upper:.5f})")

TSLA_returns = ...
TSLA_ave_return_ci_lower = ...
TSLA_ave_return_ci_upper = ...

print(f"95% Bootstrap CI for TSLA average return:\
({TSLA_ave_return_ci_lower:.5f}, {TSLA_ave_return_ci_upper:.5f})")

In [None]:
grader.check("task_11")

---

### Task 12 📍

Create 95% confidence interval estimates for Apple and Tesla's volatility using `bootstrap_volatilities` and `n=5000`.

**Notes**: 
* This code will take about a minute to run.
* Your intervals should include the volatility values from [Task 08](#Task-08-📍)

In [None]:
AAPL_volatilites = ...
AAPL_volatility_ci_lower = ...
AAPL_volatility_ci_upper = ...

print(f"95% Bootstrap CI for AAPL volatility:\
({AAPL_volatility_ci_lower:.5f}, {AAPL_volatility_ci_upper:.5f})")

TSLA_volatilites = ...
TSLA_volatility_ci_lower = ...
TSLA_volatility_ci_upper = ...

print(f"95% Bootstrap CI for TSLA volatility:\
({TSLA_volatility_ci_lower:.5f}, {TSLA_volatility_ci_upper:.5f})")

In [None]:
grader.check("task_12")

---

### Interpreting Confidence

* Remember, we are assuming there is a true, fixed value for the average return and volatility of Apple and Tesla stocks.  
* These "true" parameters can't actually be calculated — not unless we had all the data from now until the end of time (or until Apple and Tesla cease to exist).  
* While this assumption may seem unhelpful, it provides a philosophical foundation that allows us to estimate and make informed decisions based on partial data.
* What Does "Confidence" Mean? A 95% confidence interval means:  
  > "If we repeated this analysis many times, 95% of the calculated intervals would contain the true value."  
  It's not a guarantee — it's a statement about the method, not a single outcome.
* Average Return CI  
  * This interval estimates the range for the true average daily return.  
  * Example: `(0.00004, 0.00206)` means we're 95% confident the true average daily return is between about 0.004% and 0.206%.
* Volatility CI  
  * This interval estimates how much the return varies from day to day.  
  * Example: `(0.01711, 0.01932)` means daily returns typically vary by 1.7% to 1.9%.
* CI Width and Reliability  
  * A narrow CI implies a more precise estimate  
  * A wide CI implies greater uncertainty  
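
The "repeated many times" interpretation can be checked by simulation when the true value is known. The sketch below assumes a simulated population with a known mean (something we never have for real stock returns) and counts how often a 95% bootstrap interval contains it. The trial counts and sizes are arbitrary choices that keep the run time short.

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean = 10.0  # known only because the population is simulated
trials, resamples, sample_size = 200, 500, 100

covered = 0
for _ in range(trials):
    sample = rng.normal(loc=true_mean, scale=2.0, size=sample_size)
    # Draw all resamples at once: each row of idx is one bootstrap resample
    idx = rng.integers(0, sample_size, size=(resamples, sample_size))
    means = sample[idx].mean(axis=1)
    lower, upper = np.percentile(means, [2.5, 97.5])
    covered += lower <= true_mean <= upper

coverage = covered / trials
print("Fraction of intervals containing the true mean:", coverage)
```

The fraction should land near 0.95 rather than exactly on it, since it is itself based on a limited number of random trials.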

---

### Task 13 📍🔎

<!-- BEGIN QUESTION -->

Based on the estimates for average return and volatility, which strategy would you recommend?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

## Reflection

In this notebook, you explored parameter estimation by distinguishing between parameters and statistics, then learning to estimate a parameter using bootstrap resampling. After reviewing a conceptual process for parameter estimation, you applied the method in a practical context by analyzing real-world stock market data retrieved from Yahoo Finance using the `yfinance` Python library. You generated bootstrap confidence intervals and interpreted the results, deepening your understanding of statistical inference.

---

## License

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a>.

<img src="./by-nc-sa.png" width=100px>