<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych. Programu Operacyjnego Polska Cyfrowa na lata 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Projekt współfinansowany ze środków Unii Europejskiej w ramach Europejskiego Funduszu Rozwoju Regionalnego
Program Operacyjny Polska Cyfrowa na lata 2014-2020,
Oś Priorytetowa nr 3 "Cyfrowe kompetencje społeczeństwa" Działanie  nr 3.2 "Innowacyjne rozwiązania na rzecz aktywizacji cyfrowej"
Tytuł projektu:  „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

# Logistic regression

In this exercise you will train a logistic regression model via gradient descent in two simple scenarios.

The general setup is as follows:
* we are given a set of pairs $(x, y)$, where $x \in R^D$ is a vector of real numbers representing the features, and $y \in \{0,1\}$ is the target,
* for a given $x$ we model the probability of $y=1$ by $h(x):=g(w^Tx)$, where $g$ is the sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$,
* to find the right $w$ we will optimize the so called logarithmic loss: $J(w) = -\frac{1}{n}\sum_{i=1}^n y_i \log{h(x_i)} + (1-y_i) \log{(1-h(x_i))}$,
* with the loss function in hand we can improve our guesses iteratively:
    * $w_j^{t+1} = w_j^{t} - \eta \cdot \frac{\partial J(w)}{\partial w_j}$

* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).

Let's start with the simplest example - linear separated points on a plane.

In [20]:
import numpy as np

np.random.seed(123)

# these parametrize the line
a = 0.3
b = -0.2
c = 0.001

# True/False mapping
def lin_rule(x, noise=0.):
    return a * x[0] + b * x[1] + c + noise < 0.

# Just for plotting
def get_y_fun(a, b, c):
    def y(x):
        return - x * a / b - c / b
    return y

lin_fun = get_y_fun(a, b, c)

In [21]:
# Training data

n = 500
range_points = 1
sigma = 0.05

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)
y = [lin_rule(x, sigma * np.random.normal()) for x in X]

print(X[:10])
print(y[:10])

[[ 0.39293837 -0.42772133]
 [-0.54629709  0.10262954]
 [ 0.43893794 -0.15378708]
 [ 0.9615284   0.36965948]
 [-0.0381362  -0.21576496]
 [-0.31364397  0.45809941]
 [-0.12285551 -0.88064421]
 [-0.20391149  0.47599081]
 [-0.63501654 -0.64909649]
 [ 0.06310275  0.06365517]]
[np.False_, np.True_, np.False_, np.False_, np.False_, np.True_, np.False_, np.True_, np.True_, np.False_]


Let's plot the data.

In [22]:
import plotly.express as px

# plotly has a problem with coloring boolean values, hence stringify
# see https://community.plotly.com/t/plotly-express-scatter-color-not-showing/25962
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 1])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.show()

Now, let's implement and train a logistic regression model. You should obtain accuracy of at least 96%.

In [23]:
################################################################
# TODO: Implement logistic regression and compute its accuracy #
################################################################

Let's visually asses our model. We can do this by using our estimates for $a,b,c$.

In [24]:
#################################################################
# TODO: Pass your estimates for a,b,c to the get_y_fun function #
#################################################################
lin_fun2 = get_y_fun(...)

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 1])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.add_scatter(x=x_range, y=list(map(lin_fun2, x_range)), name='estimated border')
fig.show()

TypeError: get_y_fun() missing 2 required positional arguments: 'b' and 'c'

Let's now complicate the things a little bit and make our next problem nonlinear.

In [25]:
# Parameters of the ellipse
s1 = 1.
s2 = 2.
r = 0.75
m1 = 0.15
m2 = 0.125

# 0/1 mapping, checks whether we are inside the ellipse
def circle_rule(x, y, noise=0.):
    return 1 if s1 * (x - m1) ** 2 + s2 * (y - m2) ** 2 + noise < r ** 2 else 0

In [27]:
# Training data

n = 500
range_points = 1

sigma = 0.1

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)

y = [circle_rule(x, y, sigma * np.random.normal()) for x, y in X]

print(X[:10])
print(y[:10])

[[ 0.18633789  0.87560968]
 [-0.81999293  0.61838609]
 [ 0.22604784  0.28001611]
 [ 0.9846182  -0.35783437]
 [-0.27962406  0.07170775]
 [ 0.2501677  -0.37650776]
 [ 0.41264707 -0.8357508 ]
 [-0.61039043 -0.97349628]
 [ 0.49924022  0.89579621]
 [ 0.537422   -0.65425777]]
[0, 0, 1, 0, 1, 1, 0, 0, 0, 0]


Let's plot the data.

In [28]:
import plotly.graph_objects as go

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), 0.003)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), 0.003)
contour =  go.Contour(
        z=np.vectorize(circle_rule)(*np.meshgrid(xgrid, ygrid, indexing="ij")),
        x=xgrid,
        y=ygrid
    )
fig.add_trace(contour)
fig.show()

Now, let's train a logistic regression model to tackle this problem. Note that we now need a nonlinear decision boundary. You should obtain accuracy of at least 90%.

Hint:
<sub><sup><sub><sup><sub><sup>
Use feature engineering.
</sup></sub></sup></sub></sup></sub>

In [60]:

# esentially, a training loop: So for each epoch,for each input pair X[0], X[1], we produce a forward pass, and then calculate the loss, and then calculate the gradient for each parameter, and at end of
# loop, divide each gradient, and divide loss by number of inputs. Update parametrers, and start next epoch:


# PARAMETERS:
a,b,c,d,e = 0.01,-0.01,0.02,-0.04,0.03 # let's try to break symmetry
lr = 0.2

def produceF(x1,x2):
  firstTerm = a * (x1 - b)**2
  secondTerm = c * (x2 - d)**2
  thirdTerm = e

  return firstTerm + secondTerm + thirdTerm


def sigmoid(f):
  return 1 / (1 + np.exp(-f))

def logistic_regression(_X):
  _F = produceF(_X[:,0],_X[:,1])
  _G = sigmoid(_F)
  return _G

EPOCHS = 200
loss = 0.0
localGrad = 0.0

gA,gB,gC,gD,gE = 0.0,0.0,0.0,0.0,0.0


for epoch in range(EPOCHS):
  loss = 0
  localGrad = 0.0
  for i in range(n):
    x1 = X[i][0]
    x2 = X[i][1]
    label = y[i]
    #print("For these inputs: ", x1, " and ", x2, " it should output this: ", label)
    # now produce the f and g
    f = produceF(x1,x2)
    g = sigmoid(f)
    # now compute loss and gradients for input x1 x2
    loss += (label * np.log(g)) + ((1 - label)*np.log(1 - g))

    gToG = (label / g) - ((1 - label) / (1 - g))
    gToSigm = g * (1 - g)
    localG = gToG * gToSigm
    gA += localG * (x1 - b)**2
    gB += localG * 2 * (x1 - b) * a
    gC += localG * (x2 - d)**2
    gD += localG * 2 * (x2 - d) * c
    gE += localG
  # now we divide the gradients and the loss by -n
  gA /= -n
  gB /= -n
  gC /= -n
  gD /= -n
  gE /= -n
  loss /= -n

  a -= lr * gA
  b -= lr * gB
  c -= lr * gC
  d -= lr * gD
  e -= lr * gE



  print("For epoch: ", epoch, " loss is: ", loss, " with params: ", a, b, c, d, e)



For epoch:  0  loss is:  0.703436695370978  with params:  -0.013715646390303047 -0.009788430369321184 -0.008293489311747611 -0.039671870288120456 -0.01320590469356913
For epoch:  1  loss is:  0.6877634799313747  with params:  -0.036262229375259115 -0.010078866337498887 -0.03539237217784479 -0.039810616833451926 -0.05328680777597788
For epoch:  2  loss is:  0.6739055390631746  with params:  -0.057751045156886165 -0.010843215431076038 -0.061419794869812874 -0.04040643758115202 -0.09053334444108777
For epoch:  3  loss is:  0.6616199298101237  with params:  -0.07823919281670455 -0.012055519890572384 -0.08643471187988422 -0.041449045965445575 -0.12512856300545805
For epoch:  4  loss is:  0.6507230162372818  with params:  -0.09778166609944153 -0.013690382745784714 -0.11049335110438964 -0.04292639251555531 -0.15724714957177546
For epoch:  5  loss is:  0.6410521460057298  with params:  -0.11643086988726098 -0.01572367205022073 -0.13364879273627972 -0.04482571103272997 -0.1870545038603119
For e

Let's visually asses our model.

Contrary to the previous scenario, converting our weights to parameters of the ground truth curve may not be straightforward. It's easier to just provide predictions for a set of points in $R^2$.

In [61]:
h = .02

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), h)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), h)

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")
X_plot = np.c_[xx.ravel(), yy.ravel()]

print(X_plot.shape)

_X = np.concatenate([X_plot, X_plot**2], axis=1)

preds = logistic_regression(_X)
print(preds.shape)


(10000, 2)
(10000,)


In [62]:
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")

contour = go.Contour(z=preds.reshape(len(xgrid), len(ygrid)), x=xgrid, y=ygrid)
fig.add_trace(contour)
fig.show()

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>