<a href="https://colab.research.google.com/github/MicahMeadows/CSC-781-GoogleColab/blob/main/Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Logistic Regression
Logistic regression is a model that estimates the probability of a yes/no outcome from one or more input features. It's used in many different areas, including finance, healthcare, and marketing. In this report, we'll use the breast cancer dataset that ships with sklearn to learn more about logistic regression and how it can be useful.

Difference between Logistic Regression and Linear Regression:

Linear regression and logistic regression differ in the kind of outcome they predict. Linear regression is used when we want to predict a number, like how much money someone will make. Logistic regression is used when we want to predict whether something will happen or not, like whether someone has cancer or not. Linear regression fits a straight line, while logistic regression passes that line through a sigmoid, which gives a curve that looks like an S and stays between 0 and 1. The two also make different distributional assumptions: linear regression assumes normally distributed errors, whereas logistic regression treats the outcome as a Bernoulli (yes/no) variable.
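
As a rough illustration of the difference, the sketch below feeds the same linear score through both models. The weight `w`, bias `b`, and inputs `x` are made-up values chosen for illustration, not fitted to any data.

```python
import numpy as np

# Hypothetical parameters and inputs, chosen only for illustration
w, b = 2.0, -1.0
x = np.array([-3.0, 0.0, 0.5, 3.0])

# Linear regression: the prediction is the linear score itself (unbounded)
linear_output = w * x + b

# Logistic regression: the same score is squashed into (0, 1) by the sigmoid
logistic_output = 1 / (1 + np.exp(-linear_output))

print(linear_output)
print(logistic_output)
```

The linear output can be any real number, while the logistic output always stays strictly between 0 and 1, so it can be read as a probability.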

In [None]:
from sklearn import datasets # dataset
from sklearn.metrics import roc_auc_score # scoring
from sklearn.model_selection import train_test_split # data manipulation
import numpy as np # data manipulation

#### Sigmoid
Our sigmoid function converts the linear output of the model into a probability. It is the S-shaped function that maps any input to a value between 0 and 1. Generally, a result below 0.5 is classified as 0 and a result of 0.5 or above as 1; however, it is also possible to move the boundary so that, for instance, anything above 0.7 results in 1 and anything below results in 0.

In [None]:
def sigmoid(x):
  return 1 / (1 + np.exp(-x))
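
To make the decision boundary concrete, here is a small sketch that applies two different thresholds to the same sigmoid outputs. The input scores and the helper name `classify` are illustrative assumptions, not part of the model defined in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def classify(probs, threshold=0.5):
    # Hypothetical helper: label 1 when the probability reaches the threshold
    return (probs >= threshold).astype(int)

scores = np.array([-2.0, 0.0, 0.5, 2.0])  # made-up linear scores
probs = sigmoid(scores)

labels_default = classify(probs)          # boundary at 0.5
labels_strict = classify(probs, 0.7)      # boundary moved to 0.7
print(probs.round(3), labels_default, labels_strict)
```

Raising the threshold makes the classifier more reluctant to output 1, which trades fewer false positives for more false negatives.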

## Logistic Regression Model
Here is our custom logistic regression model. Its `fit` method learns the weights and bias by gradient descent, and its `predict` method turns the resulting probabilities into class labels.

In [None]:
class LogisticRegression:
  def __init__(self, lr=.001, n_iters=1000):
    self.lr = lr
    self.n_iters = n_iters
    self.weights = None
    self.bias = None
  
  def fit(self, X, y):
    n_samples, n_features = X.shape
    self.weights = np.zeros(n_features)
    self.bias = 0

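    # Gradient descent: repeat n_iters updates of the weights and bias
    for _ in range(self.n_iters):
      # Predicted probabilities for the current parameters
      linear_preds = np.dot(X, self.weights) + self.bias
      preds = sigmoid(linear_preds)

      # Gradients of the average log loss with respect to weights and bias
      dw = (1 / n_samples) * np.dot(X.T, (preds - y))
      db = (1 / n_samples) * np.sum(preds - y)

      # Step against the gradient, scaled by the learning rate
      self.weights = self.weights - self.lr * dw
      self.bias = self.bias - self.lr * db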
    
  def predict(self, X):
    linear_pred = np.dot(X, self.weights) + self.bias
    y_pred = sigmoid(linear_pred)
    # Threshold the probabilities at 0.5 to get class labels
    class_pred = [0 if p < .5 else 1 for p in y_pred]
    return class_pred
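
The `dw` and `db` expressions in `fit` match the gradient of the average log loss (binary cross-entropy). The notebook does not name the loss, so reading it that way is an assumption. The sketch below checks the formulas against finite differences on random data; the helper names `log_loss` and `analytic_grad` and all the data are made up for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def log_loss(w, b, X, y):
    # Average binary cross-entropy for parameters (w, b)
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(w, b, X, y):
    # Same expressions as dw and db in the fit loop
    p = sigmoid(X @ w + b)
    dw = X.T @ (p - y) / len(y)
    db = np.mean(p - y)
    return dw, db

# Small random problem, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)
b = 0.1

dw, db = analytic_grad(w, b, X, y)

# Central finite differences of the loss, one parameter at a time
eps = 1e-6
num_dw = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    num_dw[j] = (log_loss(w + step, b, X, y) - log_loss(w - step, b, X, y)) / (2 * eps)
num_db = (log_loss(w, b + eps, X, y) - log_loss(w, b - eps, X, y)) / (2 * eps)

grad_matches = bool(np.allclose(dw, num_dw, atol=1e-6) and abs(db - num_db) < 1e-6)
print(grad_matches)
```

If the analytic and numerical gradients agree, each update in `fit` is a step downhill on the log loss.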


#### Data setup
Here we load the breast cancer dataset from sklearn and split it into training (80%) and testing (20%) data. In this dataset the target is 0 for malignant and 1 for benign.

In [None]:
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=1234)


#### Regression!
This is where we fit our regression model on the training data and use it to predict the labels of our test data!

In [None]:
clf = LogisticRegression(lr=.01)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

#### Scoring Methods
accuracy(): returns the fraction of predictions that are correct. The first input is the predicted values, the second input is the actual values.

auc_score(): returns the area under the ROC curve (AUC). The first input is the predicted values, the second input is the actual values. Note that sklearn's `roc_auc_score` itself takes the true labels first and the predicted scores second.

In [None]:
def accuracy(y_pred, y_test):
  return np.sum(y_pred == y_test) / len(y_test)

def auc_score(y_pred, y_test):
  # roc_auc_score expects the true labels first, then the predicted scores
  score = roc_auc_score(y_test, y_pred)
  return score
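
To show what the two scores measure, the sketch below computes both by hand on a tiny made-up example, without sklearn. The arrays `y_true` and `y_hat` are hypothetical, and the pairwise formula is the standard rank-based definition of AUC, not necessarily the code that `roc_auc_score` runs internally.

```python
import numpy as np

# Made-up labels and hard 0/1 predictions, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0])
y_hat = np.array([0, 1, 1, 1, 0, 0])

# Accuracy: fraction of predictions that match the true labels
acc = np.mean(y_hat == y_true)

# AUC from its pairwise definition: the chance that a random positive
# is scored above a random negative, counting ties as one half
pos = y_hat[y_true == 1]
neg = y_hat[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc = np.mean((diff > 0) + 0.5 * (diff == 0))

# With hard labels this reduces to the mean of the per-class recalls
tpr = np.mean(pos == 1)
tnr = np.mean(neg == 0)
print(acc, auc, (tpr + tnr) / 2)
```

When the predictions are hard 0/1 labels and the true labels are used as the ground truth, the AUC is the average of the recall on each class. Passing predicted probabilities instead would let the score reflect every possible threshold.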

#### Running Scoring
Below we can see that our model scored decently well on the test data. With our regular scoring we achieved 92.1% accuracy, and with the AUC test we get 92.3%! Note that the AUC here is computed from hard 0/1 predictions rather than from predicted probabilities.

These results are decent for our simple test. However, with an error rate of about 8%, I would likely not use this method to actually determine whether patients have breast cancer.

In [None]:
acc = accuracy(y_pred, y_test)
print(f'accuracy: {acc}')

auc = auc_score(y_pred, y_test)
print(f'auc score: {auc}')

accuracy: 0.9210526315789473
auc score: 0.9226190476190476


## Conclusion
In conclusion, we have explored logistic regression and its use in predicting binary outcomes. By using the sklearn breast cancer dataset, we can understand how logistic regression can help us identify potential cases of breast cancer based on various factors. Logistic regression is a powerful tool with a wide range of applications in fields like finance, healthcare, marketing, and more. Understanding logistic regression and its capabilities can help us make better-informed decisions in a variety of contexts.