## Additional Learning Resources
Refer to the [scikit-learn documentation](https://scikit-learn.org/stable/) and the [Pandas user guide](https://pandas.pydata.org/docs/) for detailed explanations of the functions used in this notebook.
For a quick refresher on splitting data:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


# Visualization of the Loss Function for Logistic Regression


In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import plotly.graph_objects as go

ModuleNotFoundError: No module named 'plotly.graph_objects'

In [None]:
df = pd.read_csv('all_penguins_clean.csv')
df = df.loc[df['Species'] != 'Chinstrap', ['Species', 'Culmen Length (mm)']]
df = df.dropna()
df.head()
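
If `all_penguins_clean.csv` is not at hand, a synthetic stand-in can keep the rest of the notebook runnable. The sketch below uses `make_blobs` (already imported above) to draw one-dimensional culmen lengths around two cluster centers. The centers, spread, sample size, the `Adelie` label, and the names `X_demo`, `y_demo`, and `df_demo` are illustrative assumptions, not values taken from the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Hypothetical cluster centers (in mm) for the two species; not measured values.
centers = [[39.0], [47.5]]
X_demo, y_demo = make_blobs(n_samples=200, centers=centers,
                            cluster_std=2.5, random_state=42)

# Mirror the column names used in the notebook so later cells work unchanged.
df_demo = pd.DataFrame({
    'Species': pd.Series(y_demo).map({0: 'Adelie', 1: 'Gentoo'}),
    'Culmen Length (mm)': X_demo[:, 0],
})
```

Assigning `df = df_demo` would let the following cells run on the synthetic data, although the fitted weights and the useful grid range for the loss surface will differ from those of the real penguins.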

In [None]:
X = df[['Culmen Length (mm)']]
y = df['Species']

In [None]:
plt.scatter(X['Culmen Length (mm)'], y)

In [None]:
from sklearn.linear_model import LogisticRegression
# convert y to 0/1 (1 = Gentoo)
y = (y == 'Gentoo').astype(int)

# standardize X to have a mean of 0 and a standard deviation of 1
X = (X - X.mean())/X.std()

# check out the optimal solution (minimum of the loss function)
# penalty=None turns off regularization (scikit-learn >= 1.2; older versions used penalty='none')
m = LogisticRegression(penalty=None).fit(X, y)
print(f'Bias/ w0: {m.intercept_}')
print(f'Feature Weight/ w1: {m.coef_}')

In [None]:
def sigmoid(a):
    return 1/(1+np.exp(-a))
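
For large negative inputs, `np.exp(-a)` overflows and NumPy emits a `RuntimeWarning`, even though the result still rounds to 0. One possible alternative, assuming SciPy is installed (scikit-learn depends on it), is `scipy.special.expit`, which computes the same logistic function without the warning. The names `a_demo` and `s_demo` are only for illustration.

```python
import numpy as np
from scipy.special import expit

# Extreme inputs that would trigger an overflow warning in 1/(1+np.exp(-a))
a_demo = np.array([-1000.0, 0.0, 1000.0])
s_demo = expit(a_demo)  # logistic sigmoid, evaluated element-wise
```

For the moderate values on the parameter grid used in this notebook, the hand-written `sigmoid` behaves the same way.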

In [None]:
def predict(w0, w1, x):
    return sigmoid(w0+w1*x)

In [None]:
def log_loss(y, pred):
    return -np.mean(y*np.log(pred) + (1-y)*np.log(1-pred))
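
If a prediction is exactly 0 or 1, `np.log` returns `-inf` and the loss above becomes `inf` or `nan`. A minimal sketch of a guarded variant, assuming a small clipping constant `eps` (the name `safe_log_loss` and the value `1e-15` are choices made here, not part of the notebook):

```python
import numpy as np

def safe_log_loss(y, pred, eps=1e-15):
    """Mean binary cross-entropy with predictions clipped away from 0 and 1."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(y * np.log(pred) + (1 - y) * np.log(1 - pred))

# Toy labels and predictions, including probabilities of exactly 0 and 1
y_toy = np.array([0, 1, 1, 0])
p_toy = np.array([0.1, 0.8, 1.0, 0.0])
loss_toy = safe_log_loss(y_toy, p_toy)
```

Clipping changes the loss only negligibly for predictions strictly between 0 and 1, so the shape of the surface plotted later is unaffected.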

In [None]:
# define a parameter grid for w0 and w1
w0 = np.linspace(-1, 0, 50)
w1 = np.linspace(5, 8, 50)

losses = []

# for every combination of w0 and w1 calculate the log loss
for i in w0:
    for j in w1:
        ypred = predict(i, j, X['Culmen Length (mm)'])
        losses.append(log_loss(y, ypred))

# transform the list into a two dimensional grid: rows follow w1, columns follow w0
# (the layout go.Surface expects for z when x=w0 and y=w1)
losses = np.array(losses).reshape((len(w1), len(w0)), order='F')
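
The nested loop can also be expressed with NumPy broadcasting, which avoids the reshape step. The following is a sketch on synthetic data rather than the penguin measurements. `x_demo`, `y_demo`, `w0_grid`, `w1_grid`, and `loss_grid` are hypothetical names, and the two grids deliberately have different lengths so the axis order is visible in the result's shape.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)  # stand-in for a standardized feature
y_demo = (x_demo + rng.normal(scale=0.5, size=200) > 0).astype(int)

w0_grid = np.linspace(-1, 0, 50)
w1_grid = np.linspace(5, 8, 40)

# Broadcast to shape (len(w1_grid), len(w0_grid), n_samples)
logits = w0_grid[None, :, None] + w1_grid[:, None, None] * x_demo[None, None, :]
probs = 1 / (1 + np.exp(-logits))
probs = np.clip(probs, 1e-15, 1 - 1e-15)  # guard against log(0)

# Average over samples: rows follow w1, columns follow w0
loss_grid = -np.mean(y_demo * np.log(probs) + (1 - y_demo) * np.log(1 - probs), axis=2)
```

Here `loss_grid[j, i]` holds the loss for `w0_grid[i]` and `w1_grid[j]`, which matches the orientation `go.Surface` uses for `z` when `x` is the `w0` axis and `y` is the `w1` axis.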

In [None]:
fig = go.Figure(data=[go.Surface(x=w0, y=w1, z=losses, colorscale="ice", cmin=0, cmax=losses.max())])

fig.update_layout(
    autosize=True,
    scene=dict(
        zaxis_title='Log Loss',
        yaxis_title='w1',
        xaxis_title='w0'
    ),
    title=dict(
        text="Log Loss of a Logistic Regression with 2 Parameters",
        font=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig.write_html("ml.html", 
               include_plotlyjs='cdn',
               full_html=False,
               default_height='700px', 
               config=dict(displayModeBar=False)
)

In [None]:
# Practice: implement the steps discussed above
