**Table of contents**<a id='toc0_'></a>    
- [Meeting with Adam: Parameter Studies in RL](#toc1_)    
  - [Summary](#toc1_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Meeting with Adam: Parameter Studies in RL](#toc0_)

This is it, the final week of the capstone project. After completing this week's notebook, you will have successfully implemented a complete reinforcement learning system, from problem specification to careful selection of meta-parameters. Today we'll discuss that last part: the careful selection of meta-parameters.

Recall all the key parameters affecting performance that we identified in an earlier part of the capstone. We need to pick good values for each of these if we have any hope of building a successful learning agent. In practice, we will likely have to use rules of thumb for many of these choices. However, in some cases we have the opportunity to study the impact of the parameters to gain insight. For example, when learning in our simulator, we can test our agent with many different configurations of the parameters. This can help us identify a good range for a particular parameter, and it might also help us set the parameter when we deploy the agent on the moon. In research, we might test our algorithms on a variety of simulated problems with many different parameter settings. This can provide insight into how our algorithms might behave in general.

Running such scientific studies is useful not only for scientists but also in industry. In both cases, it is important to truly understand the methods you deploy.

Let's think about how we might better understand how our algorithm behaves with different parameters. We can pick a range for each parameter and test several values in that range. We can then visualize the results with a parameter sensitivity curve.

- On the y-axis we have some performance measure. For example, if we ran the agent for 50 episodes this measure could be the sum of the returns over these 50 episodes. We call this the total return. We then average this across multiple runs (e.g. 30) to get an average total return. 
- On the x-axis we have the values of the parameter we are testing. We do a complete set of runs for each value of the meta-parameter. This means we run the agent for the allocated number of steps, say 10,000 steps, and repeat this for the desired number of runs, say 30 runs.
- We compute the total return for each run and average those numbers over 30 runs. We plot these averages for each chosen meta-parameter to obtain the sensitivity curve.
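
The procedure in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not the notebook's actual experiment code: `run_agent` is a hypothetical stand-in that fakes one full run of an agent with a made-up performance curve plus noise, and the names `sweep`, `step_sizes`, and `num_runs` are assumptions introduced here.

```python
import numpy as np

def run_agent(step_size, seed, num_episodes=50):
    """Hypothetical stand-in for one full run of a learning agent.

    Returns the total return (sum of the per-episode returns). The shape of
    the curve and the noise level are made up purely for illustration.
    """
    rng = np.random.default_rng(seed)
    # Made-up average return per episode, best at step_size = 0.01.
    mean_episode_return = -100.0 * (np.log10(step_size) + 2.0) ** 2
    episode_returns = mean_episode_return + rng.normal(0.0, 5.0, size=num_episodes)
    return episode_returns.sum()

def sweep(step_sizes, num_runs=30):
    """Average total return and its standard error for each step size."""
    means, std_errs = [], []
    for step_size in step_sizes:
        # One total return per run; each run uses a different seed.
        totals = np.array([run_agent(step_size, seed) for seed in range(num_runs)])
        means.append(totals.mean())
        std_errs.append(totals.std(ddof=1) / np.sqrt(num_runs))
    return np.array(means), np.array(std_errs)

step_sizes = np.array([1e-4, 1e-3, 1e-2, 1e-1, 1.0])
means, std_errs = sweep(step_sizes)
for step_size, mean, err in zip(step_sizes, means, std_errs):
    print(f"step_size={step_size:g}: average total return {mean:.1f} +/- {err:.1f}")
```

Plotting `means` against `step_sizes` (typically with a logarithmic x-axis) gives the sensitivity curve. The standard error is not mentioned in the text above; it is included here as one common way to show how much the average varies across runs.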

<p align="center">
  <img width="700" height="400" src="imgs/c4m6-sensitivity-curve.png">
</p>

This curve provides insight into how the algorithm behaves for a range of its meta-parameters, as well as how difficult it might be to pick those parameters.
If the curve is very pointed, it indicates that there is a narrow range of good parameters. If you did not know this ahead of time, it is unlikely you would find this good parameter setting. Even if you managed to pick a meta-parameter very near the best one, the performance could be much worse. So even though the algorithm can perform well in the best case, in practice it could perform significantly worse.

You might think: once I have found this good setting, I can just use that meta-parameter and get the good performance. Unfortunately, for a new problem the best meta-parameter is likely different. Rather, this analysis suggests you might have a hard time picking the meta-parameter, so we need to be careful in deployment.

On the other hand, if the range of good parameters is broad, it is more likely you will be successful in choosing a good one. Further, it might indicate that your algorithm is not too sensitive to its meta-parameter in general. This is even more likely to be true if you observe similar sensitivity to the meta-parameter across problems.

We do have to pay attention to a couple of factors to produce meaningful curves. First, we have to test a sufficient number of values for the meta-parameter. Otherwise, our approximation to the true parameter sensitivity curve will be poor. To see why, imagine that the true curve with respect to alpha has a narrow peak. If we subsample only a few points to test, we may accidentally jump over the best value of the parameter. Second, we need to test a sufficiently wide range of the parameter. If the best value we find lies at one end of the range we chose, we may be missing out on better values of the meta-parameter beyond it.

Don't worry, we will not ask you to exhaustively test each combination of the meta-parameters. You will only sweep over one of the parameters for your Expected Sarsa agent. That way you will gain some experience doing a parameter study, but you won't have to wait hours for your program to run. We will suggest good values for the remaining parameters.
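
To make the point about testing too few values concrete, here is a small sketch. The function `true_curve` is a hypothetical, sharply peaked sensitivity curve invented for this example (it does not come from any real agent), and the two grids are arbitrary choices.

```python
import numpy as np

def true_curve(alpha):
    """Hypothetical sensitivity curve with a narrow peak at alpha = 0.03.

    Performance is scaled so that 1.0 is the best possible value.
    """
    distance = np.log10(alpha) - np.log10(0.03)
    return np.exp(-(distance ** 2) / (2 * 0.1 ** 2))

coarse = np.array([1e-3, 1e-2, 1e-1, 1.0])   # 4 values, one per decade
fine = np.logspace(-3, 0, num=31)            # 31 values, ten steps per decade

best_coarse = coarse[np.argmax(true_curve(coarse))]
best_fine = fine[np.argmax(true_curve(fine))]

print(f"coarse grid: best alpha {best_coarse:g}, performance {true_curve(best_coarse):.3f}")
print(f"fine grid:   best alpha {best_fine:g}, performance {true_curve(best_fine):.3f}")
```

With this made-up curve, the coarse grid steps right over the peak and reports a best value whose performance is close to zero, while the fine grid lands near the peak. How fine the grid needs to be depends on how narrow the true peak is, which is usually not known in advance.
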

It is important to note that we do not use parameter sweeps to actually select parameters for real problems. Rather, this is a strategy for understanding our algorithms in simplified settings. In deployment, it is typically not feasible to systematically test the agent with many meta-parameters. How could we test landing the module on the moon over and over with bad meta-parameter settings that might cause repeated crashes? You would be fired in no time. That is why it is so important to understand our algorithms and how they might behave, especially when deploying in the real world or, in this case, in outer space.

## <a id='toc1_1_'></a>[Summary](#toc0_)

Choosing the right hyperparameters (meta-parameters) is crucial for a successful Reinforcement Learning agent.

**Why Study Hyperparameters?**

- **Performance**: Good values lead to a successful agent.
- **Insight**: Understand how your algorithm behaves and its sensitivity to different settings.
- **Deployment**: Helps estimate good parameter ranges for real-world scenarios (e.g., deploying on the moon).

**How to Study Hyperparameters: Parameter Sensitivity Curve**

1. **Select a Parameter**: Choose one hyperparameter to test (e.g., learning rate, discount factor).
2. **Define a Range**: Pick several values for that parameter across a wide range.
3. **Run Experiments**: For each parameter value:
   - Run the agent for a set number of steps/episodes (e.g., 10,000 steps).
   - Repeat this process multiple times (e.g., 30 runs) to get average results.
4. **Measure Performance**: Calculate the total return (sum of returns over all episodes) for each run.
5. **Average & Plot**: Average the total returns across your repeated runs for each parameter value. Plot these averages (Y-axis) against the parameter values (X-axis).

**Interpreting the Curve**

- **Pointed Curve**: Indicates a narrow range of good parameters. This means:
  - It's hard to find the optimal setting.
  - Even slightly off-optimal settings can lead to much worse performance.
  - The algorithm is sensitive to this parameter.
- **Broad Curve**: Indicates a wide range of good parameters. This means:
  - It's easier to pick a good setting.
  - The algorithm is less sensitive to this parameter.
  - This is generally desirable for robustness.
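
One rough way to put a number on "pointed" versus "broad" is to count how many of the tested values perform close to the best one. The sketch below is an assumption of this note, not a standard measure: the function name `fraction_near_best`, the 10% tolerance, and both example curves are made up for illustration.

```python
import numpy as np

def fraction_near_best(avg_returns, tolerance=0.1):
    """Fraction of tested values whose average return is close to the best.

    "Close" means within `tolerance` times the best-to-worst spread of the
    best average return. A small fraction suggests a pointed curve.
    """
    avg_returns = np.asarray(avg_returns, dtype=float)
    best, worst = avg_returns.max(), avg_returns.min()
    if best == worst:
        return 1.0  # Completely flat curve: every value is equally good.
    threshold = best - tolerance * (best - worst)
    return float(np.mean(avg_returns >= threshold))

# Made-up average total returns for five tested parameter values.
pointed = [-900, -850, -100, -800, -950]   # only one value does well
broad = [-300, -115, -100, -110, -280]     # several values do well

print(fraction_near_best(pointed))
print(fraction_near_best(broad))
```

A larger fraction means more of the tested values are nearly as good as the best one. The result depends on which values were tested, so it is only meaningful when comparing curves measured on the same grid.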

**Important Considerations**

- **Sufficient Values**: Test enough points to accurately represent the curve.
- **Sufficient Range**: Cover a wide enough range to find the optimal point.
- **Purpose**: Parameter sweeps are for understanding algorithms in simplified settings, not for directly selecting parameters for complex, real-world deployment (where crashes are costly!).
- **Real-World Deployment**: Requires deep understanding of the algorithm's behavior, as systematic testing with risky parameters isn't feasible.
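
The "Sufficient Range" consideration above can be checked mechanically after a sweep: if the best average return sits at the smallest or largest tested value, the range was probably too narrow. This is a minimal sketch; the helper name `best_is_at_edge` and the example numbers are hypothetical.

```python
import numpy as np

def best_is_at_edge(param_values, avg_returns):
    """Return True if the best-performing value is the smallest or largest tested."""
    order = np.argsort(param_values)  # sort results by parameter value
    sorted_returns = np.asarray(avg_returns, dtype=float)[order]
    best_index = int(np.argmax(sorted_returns))
    return best_index == 0 or best_index == len(sorted_returns) - 1

# Made-up sweep where performance keeps improving up to the largest value tested.
print(best_is_at_edge([0.001, 0.01, 0.1], [-500, -300, -100]))
# Made-up sweep where the best value lies in the interior of the range.
print(best_is_at_edge([0.001, 0.01, 0.1, 1.0], [-500, -100, -300, -900]))
```

When the check returns `True`, a reasonable next step is to extend the range in that direction and rerun the sweep, unless the parameter has a hard limit at that end.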