# Into the Great Unknown...
So far, we've learned about a wealth of machine learning tools for answering all kinds of questions automatically. However, every experiment up to this point has been "safe": the datasets and techniques were pre-selected and had predictable results. Now it is time to journey into the wild. The final project is described below, but first, we should talk about some of the non-technical things that can go wrong when bringing machine learning into the real world.

## 1. Dataset Overfitting
One of the biggest ways that machine learning can go horribly wrong is when the deployment environment is significantly different from the training environment. Some classic examples:
- Self-driving car data was predominantly collected during the day with favorable weather conditions. Night-time snow could lead to fatal accidents.
- Training images for different classes were collected during different seasons, leading to spurious correlations between the labels and background features.

## 2. Leaked Labels
Supervised training data can accidentally provide information that is not normally available at test-time. A common mistake is including future data in a sequence prediction or forecasting task. For example, instead of predicting $t_n$ from $\{t_1, \ldots, t_{n-1}\}$, the algorithm is trained to solve the trivial problem of predicting $t_n$ from $\{t_1, \ldots, t_n\}$.
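
The forecasting mistake can be made concrete with a small sketch. The helper `make_windows` and its `leak` flag are hypothetical names introduced only for this illustration. The sketch assumes one-step-ahead forecasting from a fixed-length window of past values.

```python
import numpy as np

def make_windows(series, window, leak=False):
    """Build (features, target) pairs for one-step-ahead forecasting.

    With leak=False, the features for target series[i] are the `window`
    values strictly before it. With leak=True, the window is shifted by one
    step so that it contains series[i] itself, reproducing the mistake.
    """
    series = np.asarray(series, dtype=float)
    shift = 1 if leak else 0
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window + shift : i + shift])
        y.append(series[i])
    return np.array(X), np.array(y)

series = np.arange(10.0)  # toy sequence 0, 1, ..., 9
X_ok, y_ok = make_windows(series, window=3)
X_bad, y_bad = make_windows(series, window=3, leak=True)

# In the leaky version the last feature column is a copy of the target.
print("leaky last column equals target:  ", np.array_equal(X_bad[:, -1], y_bad))
print("correct last column equals target:", np.array_equal(X_ok[:, -1], y_ok))
```

A quick check like the last two lines (comparing each feature column against the target) is a cheap way to catch this kind of leak before training.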

## 3. Non-Stationary Data/Non-Stationary Problems
When forces outside the scope of the model change the distribution of the inputs, or the outputs associated with a given input, performance can be worse than expected. For example:
- A model is trained to predict ocean temperature based on depth, latitude, longitude, and day of the year, but the model doesn't account for global climate change over multiple years.
- Diseases occur in a diagnostic dataset with roughly equal probability (not representative of the natural distribution), causing anthrax to be diagnosed on the evidence of a runny nose.
- All diagnostic data are for patients with known diseases, so no patient is ever diagnosed as having nothing wrong with them.
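
The effect of an unrepresentative disease distribution can be sketched with Bayes' rule. Every number below is made up purely for illustration (none is a real medical statistic), and the `posterior` helper is a hypothetical name.

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(disease | symptom) for each disease in `prior`."""
    joint = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(joint.values())
    return {d: p / total for d, p in joint.items()}

# Made-up values of P(runny nose | disease), for illustration only.
likelihood = {"cold": 0.9, "anthrax": 0.5}

# Prior implied by a dataset where diseases appear equally often.
balanced_prior = {"cold": 0.5, "anthrax": 0.5}
# Made-up prior in which anthrax is extremely rare.
natural_prior = {"cold": 0.999999, "anthrax": 0.000001}

post_balanced = posterior(balanced_prior, likelihood)
post_natural = posterior(natural_prior, likelihood)

print("P(anthrax | runny nose), balanced prior:", post_balanced["anthrax"])
print("P(anthrax | runny nose), natural prior: ", post_natural["anthrax"])
```

With the balanced prior, the runny nose alone makes anthrax look plausible. With a prior that reflects rarity, the same evidence barely moves the estimate.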

## 4. Feedback Effects
Sometimes the trained models cause a shift in the input data they collect. For example, prediction of interesting phenomena (say, the presence of whales) causes deployment of additional resources (say, boats and drones), in turn affecting the presence of the phenomenon (say, driving the whales away). These effects can generally be framed as reinforcement learning problems, but many are unfortunately very big, difficult to model, and not easily solved by existing techniques. This means the best tools for prevention are common sense and vigilance.

## 5. Wrong Objectives/Metrics
One of the subtlest problems with deployment of machine learning is measuring the wrong objective. This can show up in a lot of devious ways:
- Minimizing mean squared error doesn't model bimodal distributions well.
- Data reconstruction may not be capturing the task-relevant features.
- Optimizing to minimize cost may not account for safety or environmental impact.
- Minimizing Euclidean distance assumes features have equal weight and are translation-invariant.
- Perhaps not all cases/training points should have equal weight.
- Perhaps not all forms of error should have equal weight.
- Reinforcement learning agents can game the system by exploiting a poorly specified reward.

[Credit to OpenAI for the GIF](https://openai.com/blog/faulty-reward-functions/)

![RL Fail Video](./rl_fail.gif "Poorly Specified Rewards lead to Bad Behavior")
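
Returning to the first bullet in this section, here is a minimal sketch of why mean squared error struggles with bimodal targets. The data are synthetic (two Gaussian clusters chosen for illustration), and the "model" is just a single constant prediction scanned over a grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bimodal targets: half near -1, half near +1.
y = np.concatenate([rng.normal(-1.0, 0.1, 500), rng.normal(1.0, 0.1, 500)])

# Scan constant predictions and compute the MSE of each one.
candidates = np.linspace(-2.0, 2.0, 401)
mse = np.array([np.mean((y - c) ** 2) for c in candidates])
best = candidates[np.argmin(mse)]

# Fraction of the data that lies within 0.5 of the MSE-optimal prediction.
near_best = np.mean(np.abs(y - best) < 0.5)

print("MSE-optimal constant:", best)
print("mean of the targets: ", y.mean())
print("fraction of targets near the optimum:", near_best)
```

The MSE-optimal constant sits at the mean of the targets, between the two modes, where almost none of the data actually lies.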


# Assignment:
1. Pick a Dataset/Problem.
2. Ask a question that can be answered using Machine Learning (prediction, interpolation, interpretation, decision, etc.).
3. Identify the structure of the inputs and outputs, and choose an evaluation metric.
4. Apply a baseline (e.g. random guessing, or always outputting the same value) and measure performance.
5. Apply one of the techniques from the course, or a combination.
6. Visualize the model performance, model output, or model application.
7. Summarize (and present) your findings (5 minute presentation).

Summary: Find Data, Pick Model, Do Science
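
Step 4 above can be sketched as follows for a classification problem, using the "always output the same value" baseline. The labels are a hand-made toy example, the helper names are hypothetical, and accuracy is assumed as the evaluation metric. Your own project may call for a different metric.

```python
import numpy as np

def majority_class_baseline(y_train):
    """Return the most frequent label in the training set."""
    values, counts = np.unique(y_train, return_counts=True)
    return values[np.argmax(counts)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy labels, for illustration only.
y_train = np.array([0, 0, 0, 1, 0, 0, 1, 0])
y_test = np.array([0, 1, 0, 0])

constant = majority_class_baseline(y_train)
baseline_pred = np.full_like(y_test, constant)  # same answer for every input
baseline_acc = accuracy(y_test, baseline_pred)

print("baseline prediction:", constant)
print("baseline accuracy:  ", baseline_acc)
```

Any model from step 5 should be judged against this number. On imbalanced data, a high accuracy can simply mean the model has learned to output the majority class.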

# Stretch Goals:
- Test Multiple Techniques/Models and Compare Performance.
- Explore Failure Cases of a Trained Model.
- Look for Cutting-Edge techniques for related problems, or test some modifications to improve performance.
- Export Outputs for use by others, or provide an API to the trained model. (Serialize either the outputs or the model.)
- Explore Additional Problems that can be answered with the same data, or identify shortcomings of the existing data.
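
For the export stretch goal, one simple option is to serialize the model parameters to a file and reload them elsewhere. The sketch below assumes a hypothetical linear model stored as a plain dictionary and uses JSON from the standard library. Models from a specific library usually have their own save and load mechanisms, which are not shown here.

```python
import json
import os
import tempfile

# Hypothetical trained parameters of a tiny linear model.
model = {"weights": [0.5, -1.25], "bias": 2.0}

def predict(model, x):
    """Linear prediction: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]

# Write the parameters to disk, then read them back.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    json.dump(model, f)
with open(path) as f:
    restored = json.load(f)

# The restored model should give the same prediction as the original.
x = [1.0, 2.0]
print(predict(model, x), predict(restored, x))
```

A human-readable format like JSON makes it easy for others to inspect the exported parameters, at the cost of only supporting simple data types.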