# CPSC 330 Lecture 20

Outline:

- 👋
- **Turn on recording**
- Announcements (5 min)
- Activity: explaining `GridSearchCV` (15 min)
- Principles of good explanations (15 min)
- Break (5 min)
- ML and decision-making (5 min)
- Decision-making activity (15 min)

## Learning objectives

- When communicating about applied ML, tailor an explanation to the intended audience.
- Apply best practices of technical communication, such as bottom-up explanations and reader-centric writing.
- Given an ML problem, analyze the decision being made and the objectives.
- Avoid the pitfall of thinking about ML as coding in isolation; build the habit of relating your work to the surrounding context and stakeholders.

## Announcements (5 min)

- hw5 grades posted this morning, hw6 grades coming very soon.
- hw8 posted, due Monday 11:59pm.
  - If your dataset is too big, feel free to subset; see https://piazza.com/class/kb2e6nwu3uj23?cid=564
  - If you're having trouble finding datasets, feel free to start a Piazza thread; it's fine if you use the same dataset as another classmate/group
- I will drop your lowest two assignment grades: https://piazza.com/class/kb2e6nwu3uj23?cid=560
- The next 3 lectures will involve activities based on Google Docs
  - I understand from the initial survey in September that there are a (very small) number of you who are unable to access Google Docs
  - Sorry about that! You will still be able to see the results because I'll be sharing my screen; you just won't be able to post to the document.
- Almost there, hang in there!

## Attribution

The content of this lecture is adapted from [DSCI 542](https://github.com/UBC-MDS/DSCI_542_comm-arg), created by [David Laing](https://davidklaing.com/).

## Why should I care about effective communication?

- Most ML practitioners work in an organization with more than one person.
- There will very likely be stakeholders other than yourself.
- Those people need to understand what you're doing because:
  - their state of mind may change the way you do things (see below)
  - your state of mind may change the way they do things (interpreting your results)
- In my experience, ML suffers from some particular communication issues:
  - overstating one's results / being unable to articulate the limitations
  - being unable to explain the predictions
  - and the reason is: these things are actually very hard to explain!
    - Why did CatBoost make that prediction?
    - Can we trust test error?
    - What does it mean if `predict_proba` outputs 0.9?
    - Etc.

## Activity: explaining `GridSearchCV` (15 min)


Below are two possible explanations of `GridSearchCV` pitched to different audiences. Read them both and then follow the instructions at the end.

#### Explanation 1

Machine learning algorithms, like an airplane's cockpit, typically involve a bunch of knobs and switches that need to be set.

![](https://i.pinimg.com/236x/ea/43/f3/ea43f3c7f3a8c92d884ce012c77628fd--cockpit-gauges.jpg)

For example, check out the documentation of the popular random forest algorithm [here](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html). Here's a list of the function arguments, along with their default values (from the documentation):

> class sklearn.ensemble.RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)

Holy cow, that's a lot of knobs and switches! As a machine learning practitioner, how am I supposed to choose `n_estimators`? Should I leave it at the default of 100? Or try 1000? What about `criterion` or `class_weight` for that matter? Should I trust the defaults?

Enter [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) to save the day. The general strategy here is to choose the settings that perform best on the specific task of interest. So I can't say `n_estimators=100` is better than `n_estimators=1000` without knowing what problem I'm working on. For a specific problem, you usually have a numerical score that measures performance. `GridSearchCV` is part of the popular [scikit-learn](https://scikit-learn.org/) Python machine learning library. It works by searching over various settings and telling you which one worked best on your problem.

The "grid" in "grid search" comes from the fact that it tries all possible combinations on a grid. For example, if you want it to consider setting `n_estimators` to 100, 150 or 200, and you want it to consider setting `criterion` to `'gini'` or `'entropy'`, then it will search over all 6 possible combinations in a grid of 3 possible values by 2 possible values:

|                    | `criterion='gini'` | `criterion='entropy'` |
|----------------------|--------|---------|
| `n_estimators=100` |    1     |    2     |
| `n_estimators=150` |    3     |    4     |
| `n_estimators=200` |    5     |    6     |

Here is a code sample that uses `GridSearchCV` to select from the 6 options we just mentioned. The problem being solved is classifying images of handwritten digits into the 10 digit categories (0-9). I chose this because the dataset is conveniently built in to scikit-learn:

```python
# imports
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# load a dataset
data = datasets.load_digits()
X = data['data']
y = data['target']

# set up the grid search
grid_search = GridSearchCV(RandomForestClassifier(random_state=123),
                           param_grid={
                                'n_estimators': [100, 150, 200],
                                'criterion': ['gini', 'entropy']
                           })

# run the grid search
grid_search.fit(X, y)
grid_search.best_params_
```

```
{'criterion': 'gini', 'n_estimators': 100}
```

As we can see from the output above, the grid search selected `criterion='gini', n_estimators=100`, which was one of our 6 options above (specifically Option 1).

By the way, these "knobs" we've been setting are called [_hyperparameters_](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) and the process of setting these hyperparameters automatically is called [_hyperparameter optimization_](https://en.wikipedia.org/wiki/Hyperparameter_optimization) or _hyperparameter tuning_.

~400 words, not including code.

<br><br><br><br><br><br>

#### Explanation 2

https://medium.com/datadriveninvestor/an-introduction-to-grid-search-ff57adcc0998

~400 words, not including code.

<br><br><br><br><br><br>

#### Discussion questions:

- What do you like about each explanation?
- What do you dislike about each explanation?
- What do you think is the intended audience for each explanation?
- Which explanation do you think is more effective overall for someone on Day 1 of CPSC 330?
- Each explanation has an image. Which one is more effective? What are the pros/cons?
- Each explanation has some sample code. Which one is more effective? What are the pros/cons?

After you're done reading, take ~5 min to consider the discussion questions above. Paste your answer to **at least one** of the above questions in the Google doc [link to be posted in the Zoom chat] under the appropriate question heading.

## Principles of good explanations (15 min)

#### Concepts *then* labels, not the other way around

The first explanation starts with an analogy for the concept (and the label is left until the very end):

> Machine learning algorithms, like an airplane's cockpit, typically involve a bunch of knobs and switches that need to be set.

In the second explanation, the first sentence is wasted on anyone who doesn't already know what "hyperparameter tuning" means:

> Grid search is the process of performing hyper parameter tuning in order to determine the optimal values for a given model. 

The effectiveness of these different statements depends on your audience.

See [this video](https://twitter.com/ProfFeynman/status/899963856549625858?s=20): "I learned very early the difference between knowing the name of something and knowing something." -Richard Feynman.

#### Bottom-up explanations

The [Curse of Knowledge](https://en.wikipedia.org/wiki/Curse_of_knowledge) leads to *top-down* explanations:

![](img/top_down.png)

- When you know something well, you think about things in the context of all your knowledge. 
- Those lacking the context, or frame of mind, cannot easily understand. 

There is another way: *bottom-up* explanations:

![](img/bottom_up.png)

When you're brand new to a concept, you benefit from analogies, concrete examples and familiar patterns.


#### New ideas in small chunks

The first explanation has a hidden conceptual skeleton:

1. The concept of setting a bunch of values.
2. Random forest example.
3. The problem / pain point.
4. The solution.
5. How it works - high level.
6. How it works - written example.
7. How it works - code example.
8. The name of what we were discussing all this time.

#### Approach from all angles

When we're trying to draw mental boundaries around a concept, it's helpful to see examples on all sides of those boundaries. If we were writing a longer explanation, it might have been better to show more, e.g.

- Performance with and without hyperparameter tuning. 
- Other types of hyperparameter tuning (e.g. `RandomizedSearchCV`).
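
As one way to show "the other side of the boundary", here is a minimal sketch contrasting which settings a grid search tries with which settings a randomized search might try. It uses only the Python standard library rather than scikit-learn, and the helper names `grid_candidates` and `random_candidates` are made up for illustration. It does not fit any models; it only lists the candidate settings.

```python
import itertools
import random

# The same hyperparameter options used in the first explanation.
param_grid = {
    "n_estimators": [100, 150, 200],
    "criterion": ["gini", "entropy"],
}


def grid_candidates(grid):
    """Every combination of the listed values (what a grid search tries)."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]


def random_candidates(grid, n_iter, seed=0):
    """A random subset of the combinations (the idea behind randomized search)."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)


all_combos = grid_candidates(param_grid)               # all 3 x 2 = 6 settings
some_combos = random_candidates(param_grid, n_iter=3)  # only 3 of the 6

print(len(all_combos), "settings tried by grid search")
print(len(some_combos), "settings tried by randomized search")
```

The trade-off is that the randomized version tries fewer settings, so it is cheaper but may miss the best one. In scikit-learn, `RandomizedSearchCV` controls the number of sampled settings with its `n_iter` argument.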

#### Reuse your running examples

Effective explanations often use the same example throughout the text and code. This helps readers follow the line of reasoning.

#### When experimenting, show the results asap

The first explanation shows the output of the code, whereas the second does not. This is easy to do and makes a big difference.

#### Interesting to you != useful to the reader (aka it's not about you)

Here is something I deleted from my explanation:

> Some hyperparameters, like `n_estimators` are numeric. Numeric hyperparameters are like the knobs in the cockpit: you can tune them continuously. `n_estimators` is numeric. Categorical hyperparameters are like the switches in the cockpit: they can take on (two or more) distinct values. `criterion` is categorical. 

It's a very elegant analogy! But is it helpful?

And furthermore, what is my hidden motivation for wanting to include it? Elegance, art, and the pursuit of higher beauty? Or _making myself look smart_? So maybe another name for this principle could be **It's not about you.**

## Break (5 min)

REMINDER TO RESUME RECORDING

## ML and decision-making (5 min)

- There is often a wide gap between what people care about and what ML can do.
- To understand what ML can do, let's think about what **decisions** will be made using ML. 


#### Decisions involve a few key pieces

- The **decision variable**: the variable that is manipulated through the decision.
  - E.g. how much should I sell my house for? (numeric)
  - E.g. should I sell my house? (categorical)
- The decision-maker's **objectives**: the variables that the decision-maker ultimately cares about, and wishes to manipulate indirectly through the decision variable.
  - E.g. my total profit, time to sale, etc.
- The **context**: the variables that mediate the relationship between the decision variable and the objectives.
  - E.g. the housing market, cost of marketing it, my timeline, etc.
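
To make the three pieces concrete, here is a toy sketch of the house-selling example. Every number and the `simulate_sale` function are invented for illustration; a real analysis would replace the made-up context model with data or a trained model.

```python
def simulate_sale(listing_price, market_value, weekly_cost):
    """Hypothetical context model: pricing above market value slows the sale."""
    overpricing = max(0.0, listing_price - market_value) / market_value
    weeks_to_sale = 2 + 400 * overpricing           # made-up relationship
    profit = listing_price - weekly_cost * weeks_to_sale
    return {"profit": profit, "weeks_to_sale": weeks_to_sale}


# Context: variables that mediate between the decision and the objectives.
market_value = 500_000   # invented
weekly_cost = 5_000      # invented cost of keeping the house on the market

# Decision variable: the listing price (numeric), with a few alternatives.
candidate_prices = [480_000, 500_000, 520_000, 540_000]

# Objectives: profit and time to sale, computed for each alternative.
outcomes = {
    price: simulate_sale(price, market_value, weekly_cost)
    for price in candidate_prices
}

# If profit is the only objective, pick the alternative that maximizes it.
best_price = max(outcomes, key=lambda price: outcomes[price]["profit"])
print(best_price, outcomes[best_price])
```

Note that the "best" alternative depends on which objective the decision-maker cares about: someone in a hurry might weigh `weeks_to_sale` more heavily than `profit`.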

#### How does this inform you as an ML practitioner?

Questions you have to answer:

- Who is the decision maker?
- What are their objectives?
- What are their alternatives?
- What is their context?
- What data do I need?

## Decision-making activity (15 min)

Consider the avocado price dataset from hw7. Let's say you work for Whole Foods, and they are wondering whether they should order more avocados this week or wait until next week.

Answer the following questions:

1. What are your decision variable(s) here?
2. Is the decision numeric or categorical? What are the alternatives? 
3. What are the objective(s)?

and then

4. What data do you need here?
5. What output might you show them from the model you trained in hw7?
6. How does the output connect to the decisions?
7. How would you present your results? What would you advise?

Take 5-10 min for this activity, and then we'll discuss afterwards. Paste your answer to **at least one** of the above questions in the Google doc [link to be posted in the Zoom chat] under the appropriate question heading.

## Summary

- Principles of effective communication
  - Concepts then labels, not the other way around.
  - Bottom-up explanations.
  - New ideas in small chunks.
  - Approach from all angles.
  - Reuse your running examples.
  - When experimenting, show the results asap.
  - It's not about you.
- Decision-making
  - Decision variables, objectives, and context.
  - How does ML fit in?
  
Next class we'll talk about communicating probabilities in your predictions, and we'll also talk about principles of effective visualizations. 

Note: if we have extra time, we can do some parts of Lecture 24.