# CPSC 330 Lecture 20

Outline:

- Announcements
- Gradient boosted trees, `Catboost` (15 min)
- Model comparison (15 min)
- Break (5 min)
- Big data sets: `SGDClassifier` and `SGDRegressor` (10 min)
- Combining multiple tables (10 min)
- True/False questions (15 min)

Reminder to self: **turn on recording!**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge, LogisticRegression, SGDClassifier, SGDRegressor
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.dummy import DummyClassifier, DummyRegressor
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import RFE

In [3]:
from sklearn.metrics import plot_confusion_matrix

In [4]:
from pandas_profiling import ProfileReport

- These next 3 cells could just be replaced with `import load_process_data`.
- But this way I can make changes to that file and not need to restart my kernel each time.
- So it's convenient for development.

In [5]:
%load_ext autoreload

In [6]:
%autoreload 1

In [7]:
%aimport load_process_data

ModuleNotFoundError: No module named 'load_process_data'

In [None]:
from plot_classifier import plot_classifier

In [None]:
plt.rcParams['font.size'] = 16

# NOTE

- this year we already introduced these in lecture 9
- but we should still talk about how CatBoost handles categoricals, and some other details below.

In [None]:
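import xgboost as xgb   # used below as xgb.XGBRegressor / xgb.XGBClassifier
import lightgbm as lgb  # used below as lgb.LGBMRegressor / lgb.LGBMClassifier
from catboost import CatBoostClassifier, CatBoostRegressor, Pool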

In [None]:
def show_scores_factory(X_train, y_train, X_test, y_test):
    def show_scores(model, **fit_kwargs):
        model.fit(X_train, y_train, **fit_kwargs);
        return model.score(X_test, y_test)
    return show_scores

## Announcements

Same as last class:

- hw7 has been posted, due Sunday evening.
  - I tried to make it shorter than previous assignments.
- Tutorials are still happening as scheduled, on Collaborate Ultra.
- Change to course grading scheme per Dean's directive; see [here](https://piazza.com/class/k1gx4b3djbv3ph?cid=319).

Today's class:

- A bunch of random things I wanted to cover at some point.
- Next week: communication and ethics (hopefully).

## Gradient boosted trees (15 min)

Recently a lot of the winning models have been one of the following:

|  Name   |  GitHub |  web/docs | Year | GitHub stars |
|---------|---------|---------|---------|--------------|
|  XGBoost | [link](https://github.com/dmlc/xgboost) | [link](https://xgboost.ai/) | 2016 | 19k |
| LightGBM |  [link](https://github.com/microsoft/LightGBM) | [link](https://lightgbm.readthedocs.io/en/latest/) |  2017 | 11k |
| CatBoost | [link](https://github.com/catboost/catboost) | [link](https://catboost.ai/) | 2017 | 5k |

When I checked, all three repos had been updated in the last 24 hours (i.e., they are under active development).

- All of these implement [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting), which is beyond the scope of the course.
  - And not covered in 340, but it probably should be!
- These 3 packages are fairly similar. Today I'll focus on CatBoost.
- I believe CatBoost and LightGBM seem to be performing better than XGBoost these days, but they are similar.

- I avoided these earlier in the course because I thought they'd be hard to install.
- It seems a lot of the installation issues have been solved since I tried XGBoost 1-2 years ago: this time it was easy!
- I think next time I might put these a lot earlier and use CatBoost instead of sklearn random forests everywhere, because I think it's a better choice. 

#### Categorical variables

- Of these 3, CatBoost is special because of how it handles categorical variables - hence "Cat".
- Thinking back to decision trees, we talked about splits like `Humidity > 1.2` or `Longitude > 90`. 
- Because we OHE categorical variables, it will give splits like `Location_Sydney > 0.5`, where that number 0.5 could be anywhere between 0 and 1. 
- (First off, there is no reason sklearn's decision trees or random forests need to require OHE; they could have easily learned rules like `Location == 'Sydney'`, which is equivalent to `Location_Sydney > 0.5` after OHE. I believe they didn't do this to be consistent with other sklearn estimators. However, I wonder if this makes performance worse, in particular for random forests.)
- However, is OHE the best encoding? Do we want 100 columns if there are 100 categories? And what about `drop='first'` and `handle_unknown='ignore'`? It seems like a lot of hassle.
- CatBoost uses a more sophisticated encoding from categorical to numeric; see [here](https://catboost.ai/docs/concepts/algorithm-main-stages_cat-to-numberic.html#algorithm-main-stages_cat-to-numberic).
  - From the documentation: "Before each split is selected in the tree (see Choosing the tree structure), categorical features are transformed to numerical. This is done using various statistics on combinations of categorical features and combinations of categorical and numerical features."
  - This makes me think the conversion is done separately _at each split_.
  - So we can't just do this transformation in advance, it is entangled deeply with the algorithm. 
  - And also this makes sense, since the popularity of each value will change as you split the data.
  - Importantly, if there are 100 possible categories, you may not end up with 100 columns.
  - (From my reading of the documentation it looks like you end up with some smaller number of columns, depending on the type of _targets_: one per class for classification, and for regression this is controlled by a hyperparameter $k$.)
  - (Interestingly, the number of columns does not seem to depend on the number of possible categories of the categorical variable!)
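
To make the idea of a target-based encoding concrete, here is a minimal sketch of an "ordered target statistic": each row's category is replaced by a smoothed average of the targets of the *earlier* rows that share that category. This is a simplification for illustration only, not CatBoost's actual implementation; the function name `ordered_target_statistic`, the `prior` value, and the toy data are all assumptions.

```python
import pandas as pd


def ordered_target_statistic(categories, targets, prior=0.5):
    """Encode each row using only the targets of earlier rows with the same category.

    Hypothetical helper for illustration; the smoothing constants are assumptions.
    """
    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running count of rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        s = sums.get(category, 0.0)
        n = counts.get(category, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded.append((s + prior) / (n + 1))
        # Only now add the current row, so a row never "sees" its own target.
        sums[category] = s + target
        counts[category] = n + 1
    return encoded


# Made-up toy data in the spirit of the Rain in Australia dataset.
toy_df = pd.DataFrame({
    "Location": ["Sydney", "Perth", "Sydney", "Sydney", "Perth"],
    "RainTomorrow": [1, 0, 1, 0, 0],
})
toy_df["Location_encoded"] = ordered_target_statistic(
    toy_df["Location"], toy_df["RainTomorrow"]
)
print(toy_df)
```

Note that the encoding produces a single numeric column here, no matter how many distinct locations there are.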


In [None]:
X_train_housing, X_valid_housing, X_test_housing, y_train_housing, y_valid_housing, y_test_housing = load_process_data.load_housing()

In [None]:
X_train_housing.head()

In [None]:
y_train_housing_log = np.log(y_train_housing)
y_valid_housing_log = np.log(y_valid_housing)
y_test_housing_log  = np.log(y_test_housing)

In [None]:
show_scores_housing = show_scores_factory(X_train_housing, y_train_housing_log, X_valid_housing, y_valid_housing_log)

In [None]:
show_scores_housing(Ridge())

In [None]:
show_scores_housing(RandomForestRegressor())

In [None]:
show_scores_housing(xgb.XGBRegressor())

In [None]:
show_scores_housing(lgb.LGBMRegressor())

In [None]:
show_scores_housing(CatBoostRegressor(), verbose=0)

And now we try with proper handling of categorical variables:

In [None]:
X_train_housing_cat, X_valid_housing_cat, X_test_housing_cat, y_train_housing, y_valid_housing, y_test_housing, categorical_features = load_process_data.load_housing(ohe=False)

In [None]:
X_train_housing_cat.head()

In [None]:
X_train_housing_cat[categorical_features].head()

In [None]:
print(", ".join(categorical_features))

In [None]:
cat = CatBoostRegressor(cat_features=categorical_features)
cat.fit(X_train_housing_cat, y_train_housing_log, verbose=100);

In [None]:
cat.score(X_valid_housing_cat, y_valid_housing_log)

In this case the code ran slower and it didn't seem to help, but I suspect it will in some cases!

#### Feature importances

- The syntax for getting feature importances is a bit clunky.
- You first need to create a `catboost.Pool` object storing the data:

In [None]:
train_data_pool = Pool(X_train_housing_cat, y_train_housing_log, cat_features=categorical_features)

And then you can get feature importances:

In [None]:
importances = cat.get_feature_importance(train_data_pool)

And we can display them as usual:

In [None]:
importances = pd.DataFrame(data=importances, index=X_train_housing_cat.columns, columns=["Importance"])
importances.sort_values(by="Importance", ascending=False)

In [None]:
print(", ".join(categorical_features))

- You can also get SHAP values, either through the SHAP package or built in to CatBoost.
- With proper handling of categorical variables, we can see an importance for the entire categorical variable, not just one possible value of it, which is kind of nice.

## Model comparison (15 min)

Very rough rule of thumb, when working on a problem with **tabular data**, try the following, in order: 

1. `DummyRegressor` or `DummyClassifier`
2. `HuberRegressor` or `LogisticRegression`
3. `CatBoostRegressor` or `CatBoostClassifier`
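
As a rough sketch of this workflow, the snippet below runs the three steps on a small dataset bundled with sklearn. To keep it self-contained it uses sklearn's `GradientBoostingClassifier` as a stand-in for `CatBoostClassifier` (an assumption for illustration, not a claim that the two behave identically); the dataset choice and the `_demo` variable names are also mine.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=123
)

# Try the models in order of increasing complexity.
models = {
    "dummy": DummyClassifier(strategy="prior"),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Stand-in for CatBoostClassifier so the example needs only sklearn.
    "gradient boosting": GradientBoostingClassifier(random_state=123),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_demo, y_train_demo)
    scores[name] = model.score(X_test_demo, y_test_demo)
    print(f"{name}: {scores[name]:.3f}")
```

The dummy score tells you what "no learning" looks like, so you can judge whether the fancier models are worth their extra cost.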

We'll now do a "bake-off" between different models and different datasets in the course, following the outlier scavenger hunt from L19.

#### Census data (lecture 7)

In [None]:
X_train_adult, y_train_adult, X_test_adult, y_test_adult, _ = load_process_data.load_adult()
X_train_adult.shape

In [None]:
X_train_adult.head()

In [None]:
show_scores_adult = show_scores_factory(X_train_adult, y_train_adult, X_test_adult, y_test_adult)

In [None]:
show_scores_adult(DummyClassifier(strategy="prior"))

In [None]:
show_scores_adult(LogisticRegression(max_iter=1000))

In [None]:
show_scores_adult(xgb.XGBClassifier())

In [None]:
show_scores_adult(lgb.LGBMClassifier())

In [None]:
show_scores_adult(CatBoostClassifier(), verbose=100)

In [None]:
X_train_adult_cat, y_train_adult, X_test_adult_cat, y_test_adult, cat_features_adult = load_process_data.load_adult(ohe=False)

In [None]:
X_train_adult_cat.head()

In [None]:
cat = CatBoostClassifier(cat_features=cat_features_adult)
cat.fit(X_train_adult_cat, y_train_adult, verbose=100);

In [None]:
cat.score(X_test_adult_cat, y_test_adult)

# TODO

compare on AUC as well, accuracy might not be as meaningful
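
One possible way to do the AUC comparison is with `roc_auc_score` applied to the predicted probabilities. The sketch below uses synthetic, imbalanced data from `make_classification` rather than the census data, so the data and the `_demo` names are assumptions; the point is only to show why accuracy alone can be misleading.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 90% of examples in class 0.
X_demo, y_demo = make_classification(
    n_samples=2000,
    n_informative=5,
    n_clusters_per_class=1,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

results = {}
for name, model in [
    ("dummy", DummyClassifier(strategy="prior")),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]:
    model.fit(X_train_demo, y_train_demo)
    accuracy = model.score(X_test_demo, y_test_demo)
    # AUC needs scores/probabilities for the positive class, not hard predictions.
    auc = roc_auc_score(y_test_demo, model.predict_proba(X_test_demo)[:, 1])
    results[name] = (accuracy, auc)
    print(f"{name}: accuracy={accuracy:.3f}, AUC={auc:.3f}")
```

The dummy model gets high accuracy just by predicting the majority class, but its constant predicted probabilities give an AUC of 0.5.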

#### Movie review data (lecture 5)

In [None]:
X_train_imdb, y_train_imdb, X_test_imdb, y_test_imdb = load_process_data.load_movie()
X_train_imdb.shape

In [None]:
type(X_train_imdb)

In [None]:
show_scores_movie = show_scores_factory(X_train_imdb, y_train_imdb, X_test_imdb, y_test_imdb)

In [None]:
show_scores_movie(DummyClassifier(strategy="prior"))

In [None]:
show_scores_movie(LogisticRegression(max_iter=1000))

In [None]:
show_scores_movie(xgb.XGBClassifier())

LightGBM seems to kill my kernel in this case... too many features?? Or maybe it can't handle sparse matrices as input? That seems surprising though.

In [None]:
# show_scores_movie(lgb.LGBMClassifier())

In [None]:
show_scores_movie(CatBoostClassifier(), verbose=100)

There are no categorical features here (they are all word counts), so no need to do another experiment.

#### Rain in Australia data (lecture 16)

In [None]:
X_train_rain, y_train_rain, X_test_rain, y_test_rain, _ = load_process_data.load_rain()
X_train_rain.shape

In [None]:
show_scores_rain = show_scores_factory(X_train_rain, y_train_rain, X_test_rain, y_test_rain)

In [None]:
show_scores_rain(DummyClassifier(strategy="prior"))

In [None]:
show_scores_rain(LogisticRegression(max_iter=1000))

In [None]:
show_scores_rain(xgb.XGBClassifier(verbosity=1))

In [None]:
show_scores_rain(lgb.LGBMClassifier())

In [None]:
show_scores_rain(CatBoostClassifier(), verbose=100)

In [None]:
X_train_rain_cat, _, X_test_rain_cat, _, cat_features_rain = load_process_data.load_rain(ohe=False)
X_train_rain_cat.shape

In [None]:
X_train_rain_cat.head()

In [None]:
cat = CatBoostClassifier(iterations=500, cat_features=cat_features_rain)
cat.fit(X_train_rain_cat, y_train_rain, verbose=50);

In [None]:
cat.score(X_test_rain_cat, y_test_rain)

#### Discussion

- Overall CatBoost looks like a good bet. 
- We didn't do any hyperparameter tuning, which might change the results. Maybe CatBoost just has better defaults.
  - In fact, we did not discuss hyperparameters at all for these fancy models.
  - There are a ton of them!
  - For CatBoost, a main hyperparameter controlling the fundamental tradeoff is `iterations`. 
    - This is similar to `n_estimators` in the sklearn random forests: larger = more complex models.
  - Another main hyperparameter is `learning_rate` but its interpretation is a bit beyond our scope.
  - These models should all be compatible with `RandomizedSearchCV`, etc.
  - But CatBoost, at least, does not work out of the box for feature selection with `RFE`/`RFECV`
- But I think it's pretty good. A bit slow though.
  - Now that speed is an issue, we should note that increasing complexity via `iterations` also increases runtime.
  - This is not always true, e.g. `gamma` in SVM and (for the most part) `C` in `LogisticRegression`.
  - But in fact often we did get increased runtime with more complexity, e.g. more features from `CountVectorizer`. 
- I'm surprised that feeding in the raw categorical features doesn't seem to help in many cases. 
  - Isn't that the point of CatBoost?!
  - It does seem to help with the Rain data.
  - I will need to look into this more at some point.
    - Maybe the code needs to be run for longer in these cases?
    - Hyperparameter tuning should be done in each case, at least a bit.
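
As a minimal sketch of what "at least a bit" of tuning could look like, the snippet below runs `RandomizedSearchCV` over the number of boosting rounds and the learning rate. It uses sklearn's `GradientBoostingClassifier`, where `n_estimators` plays the role that `iterations` plays in CatBoost, as a stand-in so the example needs only sklearn; the dataset, the candidate values, and the search budget are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=123
)

# Candidate values to sample from (chosen arbitrarily for illustration).
param_distributions = {
    "n_estimators": [10, 50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=123),
    param_distributions,
    n_iter=5,         # only try 5 random combinations
    cv=3,             # 3-fold cross-validation on the training set
    random_state=123,
)
search.fit(X_train_demo, y_train_demo)

print("best hyperparameters:", search.best_params_)
print("cross-validation score:", round(search.best_score_, 3))
print("test score:", round(search.score(X_test_demo, y_test_demo), 3))
```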

## Break (5 min)

## Big data sets: `SGDClassifier` and `SGDRegressor` (10 min)


- Something we haven't discussed in this course is huge data sets.
- There could be a couple problems arising from huge data sets:

1. The code is too slow.
2. The dataset doesn't fit in memory - I can't even load it with `pd.read_csv`. 

- "SGD" (stochastic gradient descent) can help with both of these problems.
- But we'll focus on using it to solve problem (1).
- There is a fancy way to implement `fit` that can be a lot faster for big data sets.
  - You can think of it as quickly finding "approximately" the best coefficients when calling `fit`.
  - That is not quite true but it may be a useful way of thinking.
  - Much more on this in CPSC 340 and much, much more on this in CPSC 440/540.
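
As a rough illustration of the "approximately the best coefficients" idea, the sketch below fits least squares on synthetic data in two ways: exactly, with `np.linalg.lstsq`, and approximately, with a single pass of stochastic gradient descent that looks at one example at a time. The data, the learning rate, and the single pass are assumptions for illustration; this is not how sklearn implements its SGD estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem with known coefficients.
n, d = 10_000, 5
X_toy = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_toy = X_toy @ w_true + 0.1 * rng.normal(size=n)

# Exact least squares solution, computed from the full dataset at once.
w_exact, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)

# SGD: visit the examples in random order and nudge the coefficients
# a little after each one, using the gradient of that example's squared error.
w_sgd = np.zeros(d)
learning_rate = 0.01
for i in rng.permutation(n):
    residual = X_toy[i] @ w_sgd - y_toy[i]
    w_sgd -= learning_rate * residual * X_toy[i]

print("exact:", np.round(w_exact, 2))
print("SGD:  ", np.round(w_sgd, 2))
```

Each SGD update touches a single row, so the cost per step does not grow with the size of the dataset, which is where the speedup on huge datasets comes from.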

- SGD can be used in many contexts.
- In sklearn, it's built in as follows:

In [8]:
from sklearn.linear_model import SGDClassifier, SGDRegressor

- `SGDRegressor` is basically equivalent to `Ridge`.
- `SGDRegressor(loss='huber')` is basically equivalent to `HuberRegressor`.
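- `SGDClassifier(loss='log')` is basically equivalent to `LogisticRegression`, except the regularization parameter is called `alpha` instead of `C` (like `Ridge`). Larger `alpha` means more regularization, which is the opposite direction from `C`. (In scikit-learn 1.1 and later this loss is spelled `loss='log_loss'`.)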
- With other settings they are equivalent to other models, but this is good enough.

- For regular sized datasets, use the original functions, as these ones can be a bit more finicky. 
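
Although we are focusing on problem (1), here is a minimal sketch of how the SGD estimators can also help with problem (2): `partial_fit` updates the model using one chunk of data at a time, so the full dataset never needs to be in memory at once. In this sketch the chunks are just slices of a synthetic in-memory array (an assumption to keep the example self-contained); in practice they might come from something like `pd.read_csv(..., chunksize=...)`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for a dataset that is too big to load at once.
X_big, y_big = make_classification(
    n_samples=20_000,
    n_features=20,
    n_informative=5,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)
X_train_big, y_train_big = X_big[:15_000], y_big[:15_000]
X_test_big, y_test_big = X_big[15_000:], y_big[15_000:]

clf = SGDClassifier(random_state=0)
# partial_fit must be told about all classes, since one chunk might not contain them all.
classes = np.unique(y_train_big)

chunk_size = 1_000
for start in range(0, len(X_train_big), chunk_size):
    X_chunk = X_train_big[start:start + chunk_size]
    y_chunk = y_train_big[start:start + chunk_size]
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

test_accuracy = clf.score(X_test_big, y_test_big)
print(f"test accuracy after one pass over the chunks: {test_accuracy:.3f}")
```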

Let's load the [Sentiment140 dataset](http://help.sentiment140.com/home), which contains tweets labeled with sentiment associated with a brand, product, or topic. (I don't think we've looked at this dataset before - using it here because it's large.) You can download the data from [here](https://www.kaggle.com/ferno2/training1600000processednoemoticoncsv).

In [10]:
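# Supply the column names explicitly via `names`.
tweets_df = pd.read_csv('data/training.1600000.processed.noemoticon.csv',
                        encoding="ISO-8859-1",
                        names=["label", "id", "date", "no_query", "name", "text"])
# Labels are coded as 0 (negative) and 4 (positive).
tweets_df['label'] = tweets_df['label'].map({0: 'neg', 4: 'pos'})
# Keep only rows with a recognized label (any other code became NaN in the map above).
tweets_df = tweets_df[tweets_df['label'].isin(['pos', 'neg'])]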

In [11]:
tweets_df_train, tweets_df_test = train_test_split(tweets_df)

In [12]:
tweets_df_train.head()

|        | label | id | date | no_query | name | text |
|--------|-------|----|------|----------|------|------|
| 794225 | neg | 2326914448 | Thu Jun 25 07:40:39 PDT 2009 | NO_QUERY | richardchisholm | @Spacefrog29 alright then, im clean i swear! ... |
| 292094 | neg | 1995895449 | Mon Jun 01 14:12:55 PDT 2009 | NO_QUERY | jolizevette | @strawintogold I had class this morning |
| 500684 | neg | 2186927863 | Mon Jun 15 19:32:22 PDT 2009 | NO_QUERY | _AlexaJordan | @Ryan_ADT Yea totally~ Just can't make it toni... |
| 517786 | neg | 2191384005 | Tue Jun 16 05:09:40 PDT 2009 | NO_QUERY | Iam_Angela | Where is everybody this morning? I've got not ... |
| 470474 | neg | 2176430433 | Mon Jun 15 03:51:16 PDT 2009 | NO_QUERY | mileythomson | leaving for college soon. i have a communicati... |


In [13]:
tweets_df_train.shape

(1200000, 6)

Holy cow!

In [None]:
vec = CountVectorizer(stop_words='english')

X_train = vec.fit_transform(tweets_df_train['text']) 
y_train = tweets_df_train['label']

X_test = vec.transform(tweets_df_test['text']) 
y_test = tweets_df_test['label']

In [None]:
X_train.shape

In [None]:
type(X_train)

Here is the fraction of elements that are nonzero. Having a sparse matrix really helps!!

In [None]:
X_train.nnz/np.prod(X_train.shape)
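
To see what this fraction means, here is a tiny made-up corpus where the numbers can be checked by hand. The three documents and the `_toy` names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_docs = ["good movie", "bad movie", "good good fun"]

toy_vec = CountVectorizer()
X_toy_counts = toy_vec.fit_transform(toy_docs)  # scipy sparse matrix

# 3 documents x 4 distinct words (bad, fun, good, movie).
print(X_toy_counts.shape)
print(X_toy_counts.toarray())

# Only the nonzero entries are actually stored; each document uses 2 distinct words.
density = X_toy_counts.nnz / (X_toy_counts.shape[0] * X_toy_counts.shape[1])
print(density)
```

With a realistic vocabulary of many thousands of words and only a handful of words per tweet, the fraction becomes tiny, which is why the sparse representation saves so much memory.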

Now let's train a classifier. I'll use `%%time` instead of `%timeit` because I want to keep the output, and it gets lost with `%timeit`.

In [None]:
lr = LogisticRegression()

In [None]:
%%time

lr.fit(X_train, y_train);

In [None]:
lr.score(X_train, y_train)

In [None]:
lr.score(X_test, y_test)

In [None]:
lr_sgd = SGDClassifier(loss="log")  # spelled loss="log_loss" in scikit-learn 1.1 and later

In [None]:
%%time

lr_sgd.fit(X_train, y_train);

In [None]:
lr_sgd.n_iter_

In [None]:
lr_sgd.score(X_train, y_train)

In [None]:
lr_sgd.score(X_test, y_test)

- `SGDClassifier` was about 10x faster than `LogisticRegression`, but the accuracy is slightly lower.
- In fact, we can control the speed vs. _train_ accuracy tradeoff in both cases using the hyperparameters.
  - The main ones are `max_iter` (higher is slower) and/or `tol` (lower is slower)
  - (This is the same for both `LogisticRegression` and `SGDClassifier`)
- In general, `LogisticRegression` will get slightly higher _train_ accuracy (may or may not correspond to better validation/test)
- But in some cases your dataset is so big that `LogisticRegression` is not feasible, and then `SGDClassifier` can save the day.

Random comment: the last time I tried this was a few months ago with sklearn 0.21.3 and scipy 1.3.1; now I'm on sklearn 0.22.1 and scipy 1.4.1. I wonder if `LogisticRegression`, or the underlying optimizer in scipy, was improved in these recent upgrades, because it seems to be faster now than last time.

## Combining multiple tables (10 min)


- Take a look at the [Formula 1 race data set](https://www.kaggle.com/cjgdev/formula-1-race-data-19502017) from Kaggle. 
- The dataset contains **multiple CSV files**.
- Let's read in one of them:

In [None]:
racing_results_df = pd.read_csv("data/formula-1-race-data-19502017/results.csv", index_col=0)
racing_results_df

- Let's say we want to predict the `milliseconds` column, namely the total length of time it takes a driver to finish a race. 
- In that case, we should not have access to most of these other columns. 
- But we would have the `raceId` and `driverId`:

In [None]:
racing_results_df_subset = racing_results_df[['raceId', 'driverId', 'milliseconds']]
racing_results_df_subset

In [None]:
racing_results_df_subset.sort_values(by="driverId")

# TODO

for fun, try to predict milliseconds just based on raceId and driverId

- Now we need some features to predict the race time. 
- Enter the other tables:

In [None]:
racing_drivers_df = pd.read_csv("data/formula-1-race-data-19502017/drivers.csv", 
                                encoding='latin-1', index_col=0,
                               parse_dates=['dob'])
racing_drivers_df

- Can we use the driver's nationality and age as features?
- `pd.merge` can take care of this for us.

In [None]:
merged_df = pd.merge(racing_results_df_subset, racing_drivers_df, on="driverId")
merged_df

- The `on` keyword told it which column to use to match up the rows of the two dataframes.
- Note that the first 5 rows have the same `driverId`, so they pulled the same data from `racing_drivers_df`.
- Now we could keep only the columns we plan to encode:

In [None]:
merged_df_subset = merged_df[['raceId', 'driverId', 'milliseconds', 'dob', 'nationality']]
merged_df_subset

Now we can process the `dob` column to get age:

In [None]:
# Age in years as of today (approximating a year as 365 days).
ages = (pd.Timestamp.now() - merged_df_subset["dob"]).dt.total_seconds() / 3600 / 24 / 365
merged_df_age = merged_df_subset.assign(age=ages)
merged_df_age

- So far we got information for each driver.
- Likewise, we can get information about the races, and use those as well.

In [None]:
racing_races_df = pd.read_csv("data/formula-1-race-data-19502017/races.csv", encoding='latin-1', index_col=0)
racing_races_df

Etc.

- Those of you who have taken CPSC 304 or have some other database training will recognize this type of multi-table situation, with foreign keys connecting the tables.
- `pd.merge` supports several types of joins, see the documentation:

In [None]:
?pd.merge
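
As a small sketch of how the join types differ, the snippet below merges two made-up tables in which one driver has no race results and one result has no matching driver. The toy tables and their values are assumptions, loosely modelled on the Formula 1 tables above.

```python
import pandas as pd

toy_results = pd.DataFrame({
    "raceId": [1, 1, 2],
    "driverId": [10, 20, 30],  # driver 30 is missing from toy_drivers
    "milliseconds": [5_400_000, 5_410_000, 5_600_000],
})
toy_drivers = pd.DataFrame({
    "driverId": [10, 20, 40],  # driver 40 has no results
    "nationality": ["British", "German", "Finnish"],
})

# Inner join (the default): keep only driverIds present in both tables.
inner = pd.merge(toy_results, toy_drivers, on="driverId", how="inner")

# Left join: keep every result; missing driver info becomes NaN.
left = pd.merge(toy_results, toy_drivers, on="driverId", how="left")

# Outer join: keep every row from both tables.
outer = pd.merge(toy_results, toy_drivers, on="driverId", how="outer")

print(inner, left, outer, sep="\n\n")
```

For building a training set, a left join from the table that holds the target is often the safer choice, since it never silently drops examples; the resulting missing values can then be handled by imputation.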

## True/False questions (15 min)

1. `CatBoost` is effective because, when increasing `iterations`, you lower the training error and the approximation error. 
2. `CatBoost` is likely to be popular in 10 years.
3. The primary motivation for using `SGDClassifier` or `SGDRegressor` is speed.
4. In multi-class logistic regression, if the coefficient for feature 10, class 2 is positive, that means increasing the value of feature 10 _decreases_ the predicted probability of class 1 (a different class).
5. If we are dealing with data from multiple sources, our strategy is to first combine them as a preprocessing step, and then build a model.


## Course review (15 min)

#### Learning objectives

Here are the course learning outcomes I came up with when proposing this new course:

1. Identify problems that may be addressed with machine learning.
2. Select the appropriate machine learning tool for a problem.
3. Transform data of various types into usable features.
4. Apply standard tools implementing supervised and unsupervised learning techniques.
5. Describe core differences between training, validation, and testing regimes.
6. Effectively communicate the results of a machine learning pipeline.
7. Be realistic about the limitations of individual approaches and machine learning as a whole. 
8. Create reproducible workflows and pipelines.

- How did we do? 
- Hopefully OK, except we skipped the last point (that will likely be its own new course).
- I would also add:

9. Identify and avoid scenarios in which training and testing data are accidentally mixed (the "Golden Rule").
10. Employ good habits for applying ML, such as starting an analysis with a baseline estimator.

because I think they are important enough to make it to the course-level list.

#### What did we cover?

I see the course roughly like this (not in order):

Part 1: Supervised learning on tabular data

- Overfitting, train/validation/test/deployment, cross-validation
- Feature preprocessing, pipelines, imputation, OHE, etc
- The Golden Rule, various ways to accidentally violate it
- Classification metrics: confusion matrix, precision/recall, ROC, AUC
- Regression metrics: MSE, MAPE
- Regression: transforming the targets
- Feature importances, feature selection

Part 2: Other data types (non-tabular)

- Time series
- Right-censored data / survival analysis
- Computer vision with deep learning
- Language data, text preprocessing
- Ratings data

Part 3: Other stuff

- Some Python (numpy, pandas, scipy sparse matrices)
- Hyperparameter optimization
- Ensembles
- Outlier detection
- Clustering
- A bunch of models: 
  - Dummy*
  - linear models (ridge, lasso, huber, logistic regression, SGD*)
  - tree-based models (random forest, gradient boosted trees)
  - KNN classifier/regressor
  - pre-trained deep learning models
- Communicating your results (including visualizations)
- ML skepticism
- Ethics for ML

#### What would I do differently?

Lots of things, of course! Here are some important ones:

- Introduce `Pipeline` earlier.
- Throughout the course, default to cross-validation instead of train/valid split.
- Find a dataset with multi-class classification for a section of the course.
- Spend more time on quantifying the uncertainty in one's results (scores, predictions, feature importances, etc).
- Add a lecture on deploying a trained model.
- Skip some of the content on text preprocessing.
- Skip some of the content on SVMs.
- Skip `Lasso`?

I'm sure you have other suggestions - feel free to drop me an email, submit my contact form anonymously at mikegelbart.com, or drop them in the course evaluations.

#### 330 vs. 340

- Just talked about it - see recording.

# TODO - add this

#### What was not covered

- Deployment
- Big data, distributed computing
- How ML methods work (CPSC 340)
- Probabilistic methods
- A lot of unsupervised learning, semi-supervised

## Unsolicited advice: working with others (20 min)

- I sometimes end my courses with "unsolicited life advice".
- I won't repeat myself here because some of you took CPSC 340 with me. But if you're interested [it's on YouTube](https://www.youtube.com/watch?v=_7zYxpzrKmQ&list=PLWmXHcz_53Q02ZLeAxigki1JZFfCO6M-b&index=34&t=0s).
- Instead of general life advice I'll try a different topic this time: unsolicited advice on _working with others_.
- These are just my opinions. They may not be complete, or correct. Follow my advice at your own risk!

<br><br>

#### Don't lead with blame - investigate first

Leading with blame:

> Hey Malcolm, you were supposed to submit this form by the deadline - why didn't you?

Instead, try this:

> Hey Malcolm, from my end it looks like the form hasn't been submitted - can you shed some light on the situation?

- Blaming others is very embarrassing and damaging if the blame is not deserved.
- And also not great if the blame _is_ deserved.

<br><br>

#### The fundamental attribution error

- https://en.wikipedia.org/wiki/Fundamental_attribution_error
- If you miss a deadline: “I was too busy moving apartments.”
- If your teammate misses a deadline: “They are incompetent.”
- This is a known psychological phenomenon, so try to correct for this. Are you sure you know why they missed the deadline?

<br><br>

#### Don't procrastinate on disappointing others

- This can be highly damaging, and is a serious form of procrastination.
- If you need to break a commitment, communicate this right away. For example, if you:
  - can't get your work done on time,
  - need to pull out of a project, or
  - need to move your organization to another city.
- Consider how much better this is for the person being disappointed.

<br><br>

#### Your opinion is not special

- If you disagree with someone, why do you think you're more likely to be right than the other person?
  - After all, there's a symmetry to the situation.
- I think most people are in denial about this.
  - That is, if you take an issue (e.g. "will lowering taxes improve the economy?", or religious beliefs), people on opposing sides are likely each more than 50% confident that they are right.
- A good question to ask yourself: is there data? E.g. if you are always on time and your co-worker is always late, then OK to trust your opinion on scheduling.
- My approach:
  - For critical decisions: try to "average" different opinions, including my own, based on trustworthiness.
  - For most decisions: do it my way because life is more fun that way.
  
<br><br>

## Conclusion

That's all, folks! Thank you for your active participation and supportive attitude.