# Exercise: Big Picture, Decision Making, Mini Project

We have revisited what we have learned so far and discussed the importance of model assessment and parameter optimization.

The goal of this exercise is to take what we have learned a step further.

Answer the questions below.
The code snippets already contained in the notebook provide hints.

- **For each question, give the answer by adding it to this cell.**
- **Submit your answers through this [form](https://forms.gle/qzdiHYvuhFZoZHUn8).**


## Questions
### Big Picture
You are responsible for the implementation and rollout of an ML-based system.

1. Based on what you have learned so far, go through this [project checklist](https://tdgunes.com/COMP6246-2018Fall/lab1/extra1_3.pdf) (taken from the Hands-On ML book) and select the three individual items (specific actions, not headlines) that you consider most important across the entire checklist, i.e. three in total.
   **In addition, add your choice to this [form](https://forms.gle/nZ8rzFKh72irCfn86).**

### Decision Making
An ML system is to be used to automate a classification problem. Perform a simple analysis by computing the [expected value](https://en.wikipedia.org/wiki/Expected_value), defined as $\textrm{expected value} = \sum_{\textrm{all possible events}} \textrm{probability of event} \times \textrm{value of event}$ (side note: this is similar to [risk assessment](https://en.wikipedia.org/wiki/Risk#Risk_assessment_and_analysis)).
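The sum above can be sketched in a few lines of Python. This is only an illustration: the helper name `expected_value` and the two-outcome game are hypothetical and are not the values of this exercise.

```python
def expected_value(outcomes):
    """Return the sum of probability * value over (probability, value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    # The events must cover all possibilities, so the probabilities sum to 1
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical game: win 10 with probability 0.25, lose 2 with probability 0.75
game = [(0.25, 10.0), (0.75, -2.0)]
print(expected_value(game))  # 0.25 * 10 + 0.75 * (-2) = 1.0
```

A positive result means the game pays off on average, a negative one means it loses money on average.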
   
As an example, revisit our toy classification problem for the moons data set and consider the [confusion matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) (or see [Wikipedia](https://en.wikipedia.org/wiki/Confusion_matrix)) below.
For each possible outcome, we expect the following value, i.e. profit or loss:
- The model correctly predicts a positive outcome: 300
- The model correctly predicts a negative outcome: 100
- The model falsely predicts a positive outcome: -2000
- The model falsely predicts a negative outcome: -200


2. What is the expected value?
3. Based on your assessment, do you recommend using the ML system? Explain your reasoning.

### Mini Project
You are tasked with predicting the value of assets.

As an example, we consider the [California housing prices](https://scikit-learn.org/stable/datasets/index.html#california-housing-dataset) data set.

4. List three specific tasks you consider important in tackling this problem.
5. Perform three specific tasks (not necessarily the same as in question 4) to approach a solution and describe your findings.

## Answers
### Big Picture
1. TBA

### Decision Making
2. TBA
3. TBA

### Mini Project
4. TBA
5. TBA

## Examples

### Big Picture

*see above*

### Decision Making

In [1]:
from sklearn.datasets import make_moons
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import numpy as np

plt.rcParams['figure.figsize'] = (10, 6)

%matplotlib inline

In [2]:
# given data
X, y = make_moons(n_samples=200, noise=0.4, random_state=123)
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)
# model
# the focus of this exercise is **not** to optimize the model
# you can leave the parameters as is
model = DecisionTreeClassifier(random_state=123)
model.fit(X_train, y_train)
# prediction
yp_test = model.predict(X_test)

In [3]:
help(confusion_matrix)

Help on function confusion_matrix in module sklearn.metrics.classification:

confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)
    Compute confusion matrix to evaluate the accuracy of a classification
    
    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
    is equal to the number of observations known to be in group :math:`i` but
    predicted to be in group :math:`j`.
    
    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
    
    Read more in the :ref:`User Guide <confusion_matrix>`.
    
    Parameters
    ----------
    y_true : array, shape = [n_samples]
        Ground truth (correct) target values.
    
    y_pred : array, shape = [n_samples]
        Estimated targets as returned by a classifier.
    
    labels : array, shape = [n_classes], optional
        List of labels to index the m

In [4]:
# please note that the confusion matrix on Wikipedia is transposed (rows and columns swapped)
conf_mat = confusion_matrix(y_test, yp_test)
conf_mat

array([[17,  7],
       [ 1, 25]])
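As a hint for the expected-value question, the help text above states which cell of the matrix holds which outcome. A minimal sketch of unpacking the four counts and turning them into relative frequencies could look as follows. The counts are copied from the output above so that the snippet runs on its own, and the dictionary name `probabilities` is an illustrative choice; in the notebook you would use `conf_mat` directly.

```python
import numpy as np

# Counts copied from the output above; in the notebook, use `conf_mat` directly
conf_mat = np.array([[17, 7],
                     [1, 25]])

# For binary problems scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = conf_mat.ravel()

# Relative frequency of each outcome on the test set,
# used here as an estimate of its probability
n_total = conf_mat.sum()
probabilities = {
    "true positive": tp / n_total,
    "true negative": tn / n_total,
    "false positive": fp / n_total,
    "false negative": fn / n_total,
}

for outcome, p in probabilities.items():
    print(f"{outcome}: {p:.2f}")
```

Weighting each of these frequencies with the value of its outcome and summing the products gives the expected value asked for in question 2.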

### Mini Project

In [5]:
import pandas as pd
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
data

# as dataframe
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['price'] = data['target']  # median house value, in units of $100,000

# as plain arrays
X = data['data']
y = data['target']
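One possible first task for the mini project is to establish a baseline that later models have to beat. The sketch below is an assumption about how this could be done, not a prescribed solution: the helper name `baseline_rmse` is hypothetical, and the choice of a mean predictor versus an untuned linear regression is only one of many reasonable baselines.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def baseline_rmse(X, y, cv=5):
    """Cross-validated RMSE of a mean predictor and a plain linear regression."""
    models = {
        "dummy": DummyRegressor(strategy="mean"),  # always predicts the training mean
        "linear": LinearRegression(),              # simple untuned reference model
    }
    results = {}
    for name, model in models.items():
        # scikit-learn reports the negated MSE so that larger is always better
        neg_mse = cross_val_score(model, X, y, cv=cv,
                                  scoring="neg_mean_squared_error")
        # Convert each fold to RMSE, then average over the folds
        results[name] = float(np.sqrt(-neg_mse).mean())
    return results
```

With the arrays from the cell above, `baseline_rmse(X, y)` returns one RMSE per model in the same units as the target. Any model you build afterwards should clearly improve on the `"dummy"` entry.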