# Linear and logistic regression

As ever, let's first load the data:

In [None]:
import pandas as pd
import numpy as np
import sklearn.datasets as ds
import matplotlib.pyplot as plt

# Regression data
dataset_reg = ds.fetch_california_housing(as_frame=True)
# description
print(dataset_reg.DESCR)
X_reg = dataset_reg['data']
y_reg = dataset_reg['target']
print(X_reg.head())
print(y_reg.head())

In [None]:
# Classification data
dataset_class = ds.load_breast_cancer()
X_class = pd.DataFrame(data=dataset_class['data'], columns=dataset_class['feature_names'])
y_class = pd.DataFrame(data=dataset_class['target'], columns=['target'])
print(dataset_class.DESCR)
print(X_class.head())
print(y_class.head())

## Linear regression

In [None]:
from sklearn.linear_model import LinearRegression
   
# We single out one independent variable: median income (MedInc).
# The double brackets keep x_val a 2-D DataFrame, which fit() expects.
x_val = X_reg[['MedInc']]

# Creation of the LinearRegression object
lin_r = LinearRegression()

# Fitting the data
lin_r.fit(x_val, y_reg)

# Obtaining predictions
prediction = lin_r.predict(x_val)

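# Plotting the actual values (blue) and the predicted values (red)
plt.scatter(x_val, y_reg, color='blue', label='actual')
plt.scatter(x_val, prediction, color='red', label='predicted')
plt.xlabel('MedInc')
plt.ylabel('MedHouseVal')
plt.legend()
plt.show()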

Notice how the predictions lie on a straight line. It is the line that minimises the sum of squared vertical distances (the residuals) between the line and the observations.
We can obtain the parameter estimates for $\beta_0$ and $\beta_1$ as follows:

In [None]:
# beta_1 / the slope
print(lin_r.coef_)

# beta_0 / the intercept
print(lin_r.intercept_)
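To see where such estimates come from, here is a minimal sketch of the closed-form least-squares solution for a single independent variable, $\hat\beta_1 = S_{xy} / S_{xx}$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$. It uses plain Python on made-up toy numbers, not the California housing data, and the variable names are chosen for illustration only.

```python
# Toy data, made up for illustration (not the California housing data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope: co-variation of x and y divided by the variation of x
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
beta_1 = s_xy / s_xx

# Intercept: the fitted line passes through the point of means
beta_0 = y_mean - beta_1 * x_mean

# Sum of squared residuals, the quantity that least squares minimises
ssr = sum((yi - (beta_0 + beta_1 * xi)) ** 2 for xi, yi in zip(x, y))

print("beta_0:", beta_0)
print("beta_1:", beta_1)
print("sum of squared residuals:", ssr)
```

Any other choice of slope or intercept would give a larger sum of squared residuals on these numbers.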

## Logistic regression

In [None]:
from sklearn.linear_model import LogisticRegression

# Again, we select a particular independent variable for our analysis
x_val = X_class[['mean perimeter']]

# Creating the LogisticRegression object
log_r = LogisticRegression(solver='liblinear')

# Fitting the data
log_r.fit(x_val, y_class.values.reshape(-1,))

# And... prediction
prediction = log_r.predict(x_val)
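Note that `predict` returns class labels, not probabilities. In a binary logistic regression the probability of class 1 is modelled as $1 / (1 + e^{-(\beta_0 + \beta_1 x)})$, and the label follows from comparing that probability with 0.5. The sketch below illustrates the idea in plain Python. The coefficients are hypothetical values picked for illustration, not the ones fitted above (the fitted model exposes its probabilities through `predict_proba`).

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen for illustration only
beta_0 = 15.0
beta_1 = -0.17

def predict_class(x, threshold=0.5):
    """Return 1 if the modelled probability of class 1 exceeds the threshold."""
    probability = sigmoid(beta_0 + beta_1 * x)
    return 1 if probability > threshold else 0

# Three made-up values of the independent variable
for value in [60.0, 90.0, 120.0]:
    probability = sigmoid(beta_0 + beta_1 * value)
    print(value, round(probability, 3), predict_class(value))
```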

In this case, a scatter plot like the one above would not be very informative, since the outcome takes only two possible values. Instead, we can count the number of correct predictions:

In [None]:
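# y_class is a one-column DataFrame, so take the column as a 1-D array
# that lines up element by element with the 1-D predictions
actual = y_class['target'].values

# Comparing the arrays gives True/False per observation; summing counts the True values
correct = (prediction == actual).sum()

print("#Correct: ", correct, " out of ", len(prediction))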

There are many criteria for evaluating regression and classification models. The plot and the count above are only a first, simple check of our results, both computed on the same data the models were fitted to.
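As a rough illustration of such criteria, the sketch below computes accuracy for a classification, and mean squared error and $R^2$ for a regression, in plain Python on made-up toy values. The function names are chosen for this example. scikit-learn offers equivalents in `sklearn.metrics` (`accuracy_score`, `mean_squared_error`, `r2_score`).

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the actual labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """One minus the ratio of residual variation to total variation."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy classification labels, made up for illustration
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 0]
print("accuracy:", accuracy(labels_true, labels_pred))

# Toy regression values, made up for illustration
values_true = [3.0, 5.0, 7.0, 9.0]
values_pred = [2.5, 5.0, 7.5, 9.0]
print("MSE:", mean_squared_error(values_true, values_pred))
print("R^2:", r_squared(values_true, values_pred))
```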

Notice how similar the creation, fitting, and prediction steps are for both models. In general, the art lies not in coding the models, but in selecting the right one for the occasion and comparing the alternatives.
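The shared create, fit, predict pattern means that one helper can drive either model. The sketch below is an illustration on small synthetic arrays rather than the datasets loaded earlier, and `fit_and_predict` is a hypothetical helper name, not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def fit_and_predict(model, X, y):
    """Fit any scikit-learn estimator and predict on the same data."""
    # fit() returns the fitted estimator itself, so the calls can be chained
    return model.fit(X, y).predict(X)

# Synthetic data, made up for illustration
X = np.arange(10).reshape(-1, 1)        # one independent variable, 10 observations
y_cont = 2.0 * X.ravel() + 1.0          # continuous target for the regression
y_bin = (X.ravel() >= 5).astype(int)    # binary target for the classification

reg_pred = fit_and_predict(LinearRegression(), X, y_cont)
clf_pred = fit_and_predict(LogisticRegression(), X, y_bin)

print(reg_pred)
print(clf_pred)
```

Only the estimator object and the target change between the two calls; everything else stays the same.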