# Theory

Logistic regression is a statistical technique used for classification tasks. It predicts the probability of an event occurring based on one or more predictor variables. In machine learning, logistic regression is often used to predict a binary outcome, such as a yes/no response or a 0/1 label.

Here is how logistic regression works:

* The model is trained on a labeled dataset, where the goal is to predict a binary outcome based on the values of the predictor variables.

* The model estimates the probability that the event will occur, using a logistic function that maps a weighted combination of the predictor variables to a probability between 0 and 1.

* The model makes a prediction based on a decision threshold. For example, if the probability of the event occurring is greater than 0.5, the model might predict that the event will occur (e.g. a "yes" response), while if the probability is less than 0.5, the model might predict that the event will not occur (e.g. a "no" response).
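The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the weights and bias are hypothetical, hand-picked values rather than parameters fitted to data, and the helper names `sigmoid`, `predict_proba`, and `predict_label` are chosen here for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for a single sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Apply the decision threshold to the estimated probability."""
    return 1 if predict_proba(features, weights, bias) > threshold else 0


# Hypothetical parameters, assumed for illustration only (not fitted to any data).
weights = [0.8, -0.4]
bias = -0.2

sample = [2.0, 1.0]  # z = -0.2 + 0.8 * 2.0 - 0.4 * 1.0 = 1.0
p = predict_proba(sample, weights, bias)
label = predict_label(sample, weights, bias)
print(f"probability={p:.3f}, label={label}")  # probability=0.731, label=1
```

Training a real model amounts to choosing the weights and bias that make these probabilities agree with the labels in the training set as closely as possible.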

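One advantage of logistic regression is that it is a relatively simple and efficient algorithm, and its results are easy to interpret: each coefficient describes how a feature shifts the log-odds of the event. The decision boundary is linear in the input features, so non-linear classification requires transforming the features first (for example, by adding polynomial or interaction terms). The choice of solver and regularization strength affects how the model is fitted, not whether the boundary is linear.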

There are also several variations of logistic regression that can be used for more specialized tasks, such as multinomial logistic regression (for predicting a categorical outcome with more than two classes) and ordinal logistic regression (for predicting an ordinal outcome, where the classes have a meaningful order).
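For the multinomial case, a common formulation replaces the logistic function with the softmax function, which turns one score per class into probabilities that sum to 1. The sketch below assumes that formulation; the class scores are hypothetical values, not the output of a fitted model.

```python
import math


def softmax(scores):
    """Convert one real-valued score per class into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in math.exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes (assumed for illustration).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], predicted_class)
```

With only two classes, softmax reduces to the logistic function applied to the difference between the two scores, which is why binary logistic regression is a special case.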

![lregression.png](attachment:024fbc0e-dcf4-4598-b981-8c762a75fca8.png)


## Scikit-learn

In scikit-learn, logistic regression is implemented as a linear model that predicts the probability of a binary outcome. It models the relationship between a dependent variable and one or more independent variables by fitting a logistic curve to the data.

Here is a general outline of the steps for using logistic regression in scikit-learn:

* Load the data: You will need to load your data into a NumPy array or pandas DataFrame.

* Preprocess the data: You may need to clean and transform the data before fitting the model. This may include tasks such as imputing missing values, encoding categorical variables, and scaling the features.

* Split the data into training and test sets: You will need to split the data into a training set and a test set in order to evaluate the performance of the model on data it has not seen.

* Train the model: Use the training data to fit a logistic regression model using the `LogisticRegression` class. You can specify hyperparameters such as `C` (the inverse of the regularization strength) and the solver to use.

* Make predictions: Use the trained model to make predictions on the test data.

* Evaluate the model: Use metrics such as accuracy, precision, and recall to assess the performance of the model.
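To make the evaluation step concrete, the sketch below computes accuracy, precision, and recall by hand from true and predicted labels. The label lists are made-up values for illustration, and `classification_metrics` is a hypothetical helper; in practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, and `recall_score` for the same purpose.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs),
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels, assumed for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Accuracy alone can be misleading when one class is much more common than the other, which is why precision and recall are usually reported alongside it.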

Here is an example of how you might use logistic regression in scikit-learn to fit a model and make predictions:


In [32]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import pandas as pd

In [33]:
data = load_breast_cancer()
dataset = pd.DataFrame(data=data['data'], columns=data['feature_names'])

In [40]:
x = dataset.copy()
y = data['target']

In [41]:
x.shape

(569, 30)

In [19]:
y.shape

(569,)

In [42]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

In [78]:
model = LogisticRegression(C=1.0, solver='lbfgs', max_iter=209)

In [79]:
model.fit(x_train, y_train)

LogisticRegression(max_iter=209)

In [87]:
y_pred = model.predict(x_test)

In [88]:
accuracy = model.score(x_test, y_test)

In [89]:
f"Test accuracy: {accuracy:.2f}"

'Test accuracy: 0.95'
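The example above skips the preprocessing step and reports only accuracy. The sketch below is one possible variation that covers the remaining steps from the outline: it scales the features inside a pipeline, reads the predicted probabilities, applies a 0.5 threshold explicitly, and reports precision and recall. The `random_state=0`, `stratify=y`, and `max_iter=1000` settings are assumptions added here for reproducibility and convergence; they are not part of the original example, and the scores will differ from the run shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the features and the binary target as NumPy arrays.
X, y = load_breast_cancer(return_X_y=True)

# Fixed seed and stratified split (assumptions, for a reproducible sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The scaler is fitted on the training data only, then applied to the test data.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000),
)
clf.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1.
proba = clf.predict_proba(X_test)[:, 1]
y_pred = (proba > 0.5).astype(int)

print(f"accuracy:  {accuracy_score(y_test, y_pred):.2f}")
print(f"precision: {precision_score(y_test, y_pred):.2f}")
print(f"recall:    {recall_score(y_test, y_pred):.2f}")
```

Putting the scaler and the model in one pipeline keeps the test set out of the scaling statistics, and scaled features generally help the `lbfgs` solver converge in fewer iterations.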