# Naive Bayes

The skiing season is open. To decide reliably when to go skiing, you could use a classifier such as Naive Bayes, trained on your observations from last year. Your notes include the following attributes:

- *The weather:* The attribute `weather` can have the following three values:
    - `sunny`
    - `rainy`
    - `snow`
- *The snow level:* The attribute `snow level` can have the following two values:
    - $\geq 50$ (There are at least 50 cm of snow)
    - $< 50$ (There are less than 50 cm of snow)
    
Assume you wanted to go skiing 8 times during the previous year. Here is the table with your decisions:

In [1]:
import pandas as pd
df = pd.DataFrame([
    ["sunny", "< 50", "no"],
    ["rainy", "< 50", "no"],
    ["rainy", ">= 50", "no"],
    ["snow", ">= 50", "yes"],
    ["snow", "< 50", "no"],
    ["sunny", ">= 50", "yes"],
    ["snow", ">= 50", "yes"],
    ["rainy", "< 50", "yes"]],
    columns = ["weather","snow level", "ski?"])
df

| | weather | snow level | ski? |
|---|---|---|---|
| 0 | sunny | < 50 | no |
| 1 | rainy | < 50 | no |
| 2 | rainy | >= 50 | no |
| 3 | snow | >= 50 | yes |
| 4 | snow | < 50 | no |
| 5 | sunny | >= 50 | yes |
| 6 | snow | >= 50 | yes |
| 7 | rainy | < 50 | yes |


a) Compute the *a priori* probabilities for both classes `ski = yes` and `ski = no` (on the training set)!

#### TODO

$\text{Count}_{total}=8$

$\text{Count}_{no}=4 \Rightarrow P_{no}=0.5$

$\text{Count}_{yes}=4 \Rightarrow P_{yes}=0.5$

In [5]:
import numpy as np

# A priori probabilities: relative frequency of each class in the training set
p_no = (df["ski?"] == "no").mean()
p_yes = (df["ski?"] == "yes").mean()

print(p_no)
print(p_yes)

0.5
0.5


b) Compute the conditional distributions for the two classes for each attribute.

#### TODO

$P(\text{weather=sunny | yes}) = \frac{\text{number of sunny days with ski=yes}}{\text{number of days with ski=yes}} =\frac{1}{4}$

$P(\text{weather=sunny | no}) = \frac{1}{4}$

$P(\text{weather=snow | yes}) = \frac{2}{4} = \frac{1}{2}$

$P(\text{weather=snow | no}) = \frac{1}{4}$

$P(\text{weather=rainy | yes}) = \frac{1}{4}$

$P(\text{weather=rainy | no}) = \frac{2}{4} = \frac{1}{2}$

$P(\text{snow level} < 50 \mid \text{yes}) = \frac{1}{4}$

$P(\text{snow level} < 50 \mid \text{no}) = \frac{3}{4}$

$P(\text{snow level} \geq 50 \mid \text{yes}) = \frac{3}{4}$

$P(\text{snow level} \geq 50 \mid \text{no}) = \frac{1}{4}$

In [26]:
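def Cond_prob(df, col, val, label):
    # P(col = val | ski? = label): share of rows with this value among the rows of the class
    in_class = df["ski?"] == label
    return ((df[col] == val) & in_class).sum() / in_class.sum()

# Lookup table: priors under "no"/"yes", conditionals under the key val + label
prob_cond = dict()
prob_cond["no"] = p_no
prob_cond["yes"] = p_yes
for col in df.columns[:-1]:
    # unique() keeps the order of first appearance, so the printout is reproducible
    for val in df[col].unique():
        for label in df["ski?"].unique():
            text = val + label
            prob_cond[text] = Cond_prob(df, col, val, label)
            print(col, val, label, ":", prob_cond[text])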

weather sunny no : 0.25
weather sunny yes : 0.25
weather rainy no : 0.5
weather rainy yes : 0.25
weather snow no : 0.25
weather snow yes : 0.5
snow level < 50 no : 0.75
snow level < 50 yes : 0.25
snow level >= 50 no : 0.25
snow level >= 50 yes : 0.75
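
As a cross-check of the hand calculation in b), the same conditional distributions can be obtained as exact fractions. This is only a sketch using the standard library; the names `rows`, `class_counts`, and `cond` are illustrative assumptions and do not appear in the notebook.

```python
from collections import Counter
from fractions import Fraction

# Training data copied from the table above: (weather, snow level, ski?)
rows = [
    ("sunny", "< 50", "no"),
    ("rainy", "< 50", "no"),
    ("rainy", ">= 50", "no"),
    ("snow", ">= 50", "yes"),
    ("snow", "< 50", "no"),
    ("sunny", ">= 50", "yes"),
    ("snow", ">= 50", "yes"),
    ("rainy", "< 50", "yes"),
]
attributes = ["weather", "snow level"]

# Number of training rows per class
class_counts = Counter(row[-1] for row in rows)

# cond[(attribute, value, label)] = P(attribute = value | ski? = label)
cond = {}
for idx, attr in enumerate(attributes):
    pair_counts = Counter((row[idx], row[-1]) for row in rows)
    for (value, label), n in pair_counts.items():
        cond[(attr, value, label)] = Fraction(n, class_counts[label])

for key, p in sorted(cond.items()):
    print(key, p)
```

A combination that never occurs in the training data gets no entry in `cond` (its estimated probability would be 0); this does not happen for the table above.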


c) Decide for the following weather and snow conditions whether to go skiing or not! Use the Naive Bayes classifier to find the decision.

In [27]:
dfq = pd.DataFrame([
    ["sunny",">= 50"],
    ["rainy","< 50"],
    ["snow","< 50"]],
    columns=df.columns[:2],
    index = ["day "+x for x in ["A","B","C"]])
dfq

| | weather | snow level |
|---|---|---|
| day A | sunny | >= 50 |
| day B | rainy | < 50 |
| day C | snow | < 50 |


#### TODO

In [30]:
prob_new = []
for i in dfq.to_numpy():
    dump = []
    # Keys into prob_cond: attribute value + class label
    texty = [i[0]+"yes", i[1]+"yes"]
    textn = [i[0]+"no", i[1]+"no"]
    # Unnormalized posteriors: P(weather | class) * P(snow level | class) * P(class)
    dump.append((prob_cond[texty[0]]*prob_cond[texty[1]])*prob_cond["yes"])
    dump.append((prob_cond[textn[0]]*prob_cond[textn[1]])*prob_cond["no"])
    prob_new.append(dump)

# Position 0 holds the score for "yes", position 1 the score for "no"
for i in range(len(prob_new)):
    if np.argmax(prob_new[i]) == 0:
        print("Skiing is ok for index", i)
    else:
        print("Skiing is not ok for index", i)

Skiing is ok for index 0
Skiing is not ok for index 1
Skiing is not ok for index 2
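
The decision in c) only compares unnormalized scores. To see how confident each decision is, the scores can be normalized into posterior probabilities. The following is a minimal sketch that hard-codes the priors from a) and the conditional probabilities from b); the names `prior`, `likelihood`, `queries`, and `posterior` are assumptions made for this illustration.

```python
from fractions import Fraction as F

# Priors from a) and conditional probabilities from b)
prior = {"yes": F(1, 2), "no": F(1, 2)}
likelihood = {
    ("sunny", "yes"): F(1, 4), ("sunny", "no"): F(1, 4),
    ("rainy", "yes"): F(1, 4), ("rainy", "no"): F(1, 2),
    ("snow", "yes"): F(1, 2), ("snow", "no"): F(1, 4),
    ("< 50", "yes"): F(1, 4), ("< 50", "no"): F(3, 4),
    (">= 50", "yes"): F(3, 4), (">= 50", "no"): F(1, 4),
}
queries = {
    "day A": ("sunny", ">= 50"),
    "day B": ("rainy", "< 50"),
    "day C": ("snow", "< 50"),
}

def posterior(weather, snow):
    # Naive Bayes: P(class | x) is proportional to P(weather | class) * P(snow | class) * P(class)
    joint = {c: prior[c] * likelihood[(weather, c)] * likelihood[(snow, c)] for c in prior}
    evidence = sum(joint.values())  # normalizing constant P(x)
    return {c: joint[c] / evidence for c in joint}

posteriors = {day: posterior(*x) for day, x in queries.items()}
for day, post in posteriors.items():
    decision = max(post, key=post.get)
    print(day, decision, post["yes"])
```

For the three query days this gives $P(\text{yes} \mid x) = \frac{3}{4}$, $\frac{1}{7}$, and $\frac{2}{5}$, which is consistent with the decisions printed above.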


# Support Vector Machines

Given the following data points in $\mathbb{R}^2$ and their associated class labels:

In [None]:
import pandas as pd
from plotly import graph_objects as go
df = pd.DataFrame([
    [2.0, 1.0, -1],
    [3.5, 0.5, -1],
    [3.5, 2.0, -1],
    [1.0, 3.0, +1],
    [2.0, 3.0, +1],
    [3.0, 4.0, +1]],
    columns=["$X_1$", "$X_2$", "$Y$"],
    index=range(1,7))
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["$X_1$"][:3],y=df["$X_2$"][:3],text=list(range(1,4)),mode="markers",marker=dict(color="red")))
fig.add_trace(go.Scatter(x=df["$X_1$"][3:],y=df["$X_2$"][3:],text=list(range(4,7)),mode="markers",marker=dict(color="blue")))
fig.show()
df

a) Geometrically identify the support vectors for the maximum margin hyperplane (MMH) that linearly separates this data set into the classes given by $Y$.
Based on the support vectors, clearly show how you computed the exact values of $\vec{w}$, $b$, and the width of the margin $d$ (do not only give the result).

Hint: Verify your result. Each support vector must score exactly $+1$ or $-1$ under $\vec{w} \cdot \vec{x} + b$, according to its class.
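
The check from the hint can be sketched in a few lines of plain Python. The helpers `functional_margins` and `margin_width` are hypothetical names, and the candidate `(w_cand, b_cand)` is just the horizontal line $x_2 = 2.5$ scaled so that its closest points score $\pm 1$. It is deliberately not claimed to be the MMH; substitute your own $\vec{w}$ and $b$.

```python
import math

# Data points and labels from the table above (points 1 to 6)
points = [(2.0, 1.0), (3.5, 0.5), (3.5, 2.0), (1.0, 3.0), (2.0, 3.0), (3.0, 4.0)]
labels = [-1, -1, -1, +1, +1, +1]

def functional_margins(w, b, points, labels):
    # y_i * (w . x_i + b): must be >= 1 for every point, and == 1 for support vectors
    return [y * (w[0] * x1 + w[1] * x2 + b) for (x1, x2), y in zip(points, labels)]

def margin_width(w):
    # Width d = 2 / ||w||, valid once (w, b) is scaled so the closest points score +-1
    return 2.0 / math.hypot(w[0], w[1])

# Hypothetical candidate (NOT the MMH): the line x2 = 2.5, i.e. 2 * x2 - 5 = 0
w_cand, b_cand = (0.0, 2.0), -5.0

margins = functional_margins(w_cand, b_cand, points, labels)
on_margin = [i + 1 for i, m in enumerate(margins) if math.isclose(m, 1.0)]

print("functional margins:", margins)
print("points scoring exactly +-1:", on_margin)
print("margin width:", margin_width(w_cand))
```

For this candidate, points 3, 4, and 5 score exactly $\pm 1$ and the width is 1. Passing the check only confirms that the hyperplane separates the data and is scaled correctly; it does not show that the margin is maximal.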

#### TODO
*(You can add the hyperplane to the plot by adding another trace to the figure above.)*

b) Compute the class $y_i$ for the new points $x_1 = (1,3.5), x_2 = (3,0.5), x_3 = (0,0)$ on the basis of your derived MMH $(w,b)$. Show how you arrived at your solution.
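
Once $(\vec{w}, b)$ is known, a new point is assigned the sign of $\vec{w} \cdot \vec{x} + b$. The sketch below shows this rule; `classify` is a hypothetical helper, and the hyperplane `(w_cand, b_cand)` is a placeholder (the line $x_2 = 2.5$), not the derived MMH, so the printed classes are only valid for that placeholder.

```python
def classify(w, b, x):
    # Decision rule of a linear classifier: sign of w . x + b
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

# Placeholder hyperplane (assumption): replace with your own (w, b) from a)
w_cand, b_cand = (0.0, 2.0), -5.0

# New points from the exercise
new_points = {"x_1": (1.0, 3.5), "x_2": (3.0, 0.5), "x_3": (0.0, 0.0)}

predictions = {name: classify(w_cand, b_cand, x) for name, x in new_points.items()}
for name, y in predictions.items():
    print(name, y)
```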

#### TODO

c) Given the following data, perform an analysis with respect to linear separability.

In [None]:
dfq = pd.DataFrame([
    [-1.5 , -3.  ,  1.  ],
    [-1.  ,  0.  ,  1.  ],
    [-0.5 ,  1.  ,  1.  ],
    [ 0.  ,  2.  ,  1.  ],
    [ 0.5 ,  1.  ,  1.  ],
    [ 1.  ,  2.  ,  1.  ],
    [ 1.5 ,  3.75,  1.  ],
    [-1.  , -2.  , -1.  ],
    [-0.5 , -1.  , -1.  ],
    [ 0.  , -3.  , -1.  ],
    [ 0.5 , -0.5 , -1.  ],
    [ 1.  , -2.  , -1.  ],
    [ 1.5 ,  1.5 , -1.  ]],
    columns = df.columns)
dfq

First, plot the data as a colored scatter plot in Python to verify that the data is not linearly separable.

Then consider the following transformations:

 - $\phi_1(x_1, x_2) = (x^2_1, x_2)$
 - $\phi_2(x_1, x_2) = (x^3_1 - 2x_1, x_2)$
 - $\phi_3(x_1, x_2) = (x^3_1, x_2)$
 
For each transformation, create a new scatter plot.


Include all four plots in the Jupyter notebook and discuss which of the transformations can be used to make the data linearly separable.
For those transformations that make the data linearly separable, add a hyperplane (not necessarily the MMH!) to the plot that separates the data.
All plots must be labeled with the appropriate transformation.
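
One possible way to cross-check the visual impression from the plots is the perceptron algorithm, which is not part of the exercise: it finishes an epoch without mistakes only if a separating line exists. The sketch below applies each transformation and runs this check. The names `transformations` and `perceptron_separates` and the cap `max_epochs=1000` are assumptions; reaching the cap only suggests that the data is not linearly separable and does not prove it.

```python
# Data from the table above: (x1, x2, y)
data = [
    (-1.5, -3.0, 1), (-1.0, 0.0, 1), (-0.5, 1.0, 1), (0.0, 2.0, 1),
    (0.5, 1.0, 1), (1.0, 2.0, 1), (1.5, 3.75, 1),
    (-1.0, -2.0, -1), (-0.5, -1.0, -1), (0.0, -3.0, -1),
    (0.5, -0.5, -1), (1.0, -2.0, -1), (1.5, 1.5, -1),
]

# The untransformed data plus the three transformations from the exercise
transformations = {
    "identity": lambda x1, x2: (x1, x2),
    "phi_1": lambda x1, x2: (x1**2, x2),
    "phi_2": lambda x1, x2: (x1**3 - 2 * x1, x2),
    "phi_3": lambda x1, x2: (x1**3, x2),
}

def perceptron_separates(points, labels, max_epochs=1000):
    """Return (w, b) if a separating line is found within max_epochs, else None."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (u, v), y in zip(points, labels):
            # A point on the wrong side (or exactly on the line) triggers an update
            if y * (w[0] * u + w[1] * v + b) <= 0:
                w[0] += y * u
                w[1] += y * v
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None

labels = [y for _, _, y in data]
results = {}
for name, phi in transformations.items():
    points = [phi(x1, x2) for x1, x2, _ in data]
    results[name] = perceptron_separates(points, labels)
    found = results[name] is not None
    print(name, "separating line found" if found else "no separating line found within the cap")
```

If a line is found, the returned `(w, b)` describes a separating hyperplane $w_1 u + w_2 v + b = 0$ in the transformed coordinates, which can be added to the corresponding plot. It is generally not the MMH.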

In [None]:
# TODO