# PythonLab

![img-xacLdjti8mF3b09esg9tBKdb.png](attachment:img-xacLdjti8mF3b09esg9tBKdb.png)

## Regression

In this notebook, we will learn how to implement linear and logistic regression models using the Scikit-learn library in Python. We will start by exploring the dataset and then move on to building the models.

Linear regression is a type of statistical modeling that aims to establish a relationship between a dependent variable (also known as the response variable) and one or more independent variables (also known as predictors or explanatory variables).

The goal of linear regression is to find the linear equation that best describes the relationship between the dependent variable and the independent variable(s). With a single independent variable, the equation takes the form:

$$y = mx + b$$

where:

- $y$ is the dependent variable
- $x$ is the independent variable
- $m$ is the slope of the line
- $b$ is the y-intercept

The slope ($m$) represents the change in $y$ for each one-unit change in $x$, while the y-intercept ($b$) is the value of $y$ when $x$ is equal to zero.
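To make the slope and intercept concrete, here is a minimal sketch that recovers $m$ and $b$ with the ordinary least-squares closed form. It assumes a small synthetic dataset generated around $y = 2x + 1$; the data and the helper name `fit_line` are illustrative, not part of the original notebook.

```python
import numpy as np

def fit_line(x, y):
    """Return (m, b) minimizing the sum of squared errors for y ≈ m*x + b (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through the point of means
    b = y_mean - m * x_mean
    return m, b

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)  # synthetic data around y = 2x + 1

m, b = fit_line(x, y)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# Cross-check against NumPy's degree-1 polynomial fit, which also uses least squares
m_np, b_np = np.polyfit(x, y, deg=1)
print(f"np.polyfit: m = {m_np:.2f}, b = {b_np:.2f}")
```

Both approaches solve the same least-squares problem, so they should agree up to floating-point error; the estimates land close to, but not exactly on, 2 and 1 because of the added noise.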

Logistic regression is a statistical model used to predict the probability of a binary outcome, such as whether a customer will buy a product, whether a patient has a disease, or whether a student will pass or fail an exam. It is similar to linear regression, but instead of predicting a continuous outcome variable, it predicts the probability that an event occurs.

The logistic regression model uses a sigmoid function to transform a linear combination of input variables into a value between 0 and 1, which represents the probability of the event occurring. The sigmoid function is an S-shaped curve that maps any real-valued number to a value between 0 and 1. The equation for the sigmoid function is:

$$f(z) = \frac{1}{1+e^{-z}}$$

where $z$ is a linear combination of the input variables, known as the logit (or log-odds):

$$z = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_px_p$$

Here, $x_1, x_2, \ldots, x_p$ are the input variables, $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the coefficients or weights assigned to each input variable, and $p$ is the number of input variables.
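The two formulas can be combined in a few lines of NumPy. This sketch assumes two input variables and made-up coefficient values ($\beta_0 = -1$, $\beta_1 = 0.5$, $\beta_2 = 2$) purely for illustration; the helper names `sigmoid` and `predict_probability` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of them) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_probability(X, beta0, betas):
    """Compute the logit z = beta0 + X @ betas, then squash it with the sigmoid."""
    z = beta0 + X @ betas
    return sigmoid(z)

# Illustrative coefficients for p = 2 input variables
beta0 = -1.0
betas = np.array([0.5, 2.0])

X = np.array([
    [0.0, 0.0],  # z = -1.0
    [2.0, 0.0],  # z =  0.0, so the probability is exactly 0.5
    [1.0, 1.5],  # z =  2.5
])
probs = predict_probability(X, beta0, betas)
print(probs)
```

A logit of 0 corresponds to a probability of 0.5, and larger logits give probabilities closer to 1. In practice, `scipy.special.expit` computes the same function and avoids overflow warnings for very negative $z$.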

The logistic regression model estimates the coefficients so that the predicted probabilities best fit the training data. The estimation is done using maximum likelihood estimation, which finds the coefficient values that maximize the likelihood of the observed outcomes under the model.
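Maximum likelihood estimation needs an objective to maximize. For binary labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i = f(z_i)$, the log-likelihood is $\ell(\beta) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$. The sketch below evaluates it for two candidate coefficient vectors on a tiny made-up dataset and cross-checks it against scikit-learn's `log_loss`, which with `normalize=False` returns the summed negative log-likelihood. The data and the helper `log_likelihood` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import log_loss

def log_likelihood(beta, X, y):
    """Sum of y*log(p) + (1-y)*log(1-p), where p = sigmoid(beta[0] + X @ beta[1:])."""
    z = beta[0] + X @ beta[1:]
    p = 1.0 / (1.0 + np.exp(-z))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Tiny one-feature dataset: larger x values tend to have label 1
X = np.array([[-2.0], [-1.0], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 1, 1, 1])

beta_flat = np.array([0.0, 0.0])  # ignores x: p = 0.5 for every sample
beta_fit = np.array([0.5, 2.0])   # probability rises as x grows

ll_flat = log_likelihood(beta_flat, X, y)
ll_fit = log_likelihood(beta_fit, X, y)
# The second vector explains the labels better, so its log-likelihood is higher
print(ll_flat, ll_fit)

# Cross-check: log_loss with normalize=False is the negative summed log-likelihood
p_fit = 1.0 / (1.0 + np.exp(-(beta_fit[0] + X @ beta_fit[1:])))
assert np.isclose(ll_fit, -log_loss(y, p_fit, normalize=False))
```

A solver such as the one inside `LogisticRegression` searches over $\beta$ for the vector that makes this quantity as large as possible (scikit-learn also adds a regularization penalty by default).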

The process of building a linear/logistic regression model involves:

1) Collecting and preparing the data: This involves identifying the dependent and independent variables, gathering the data, and cleaning and preparing the data for analysis.

2) Exploratory Data Analysis (EDA): This involves visualizing and exploring the data to identify patterns, trends, and relationships between the variables.

3) Splitting the data: The data is split into a training set and a test set to evaluate the performance of the model.

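4) Building the model: For linear regression, the model is built by finding the coefficients (in the single-variable case, the slope and y-intercept) that minimize the sum of squared differences between the predicted and actual values of y in the training set. For logistic regression, the coefficients are found by maximum likelihood estimation, as described above.

5) Evaluating the model: The model's performance is assessed on the test set. For linear regression, common metrics are the coefficient of determination (R-squared) and the Root Mean Squared Error (RMSE); for logistic regression, classification metrics such as accuracy are used instead.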

6) Making predictions: Once the model is built and evaluated, it can be used to make predictions on new data by plugging in values for the independent variable(s) to calculate the predicted value of the dependent variable.
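The six steps above can be sketched end to end for linear regression. This is a minimal sketch on synthetic data from scikit-learn's `make_regression` (an assumption for illustration; it is not the dataset used later in this notebook), and it reports RMSE by taking the square root of `mean_squared_error`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1) Collect/prepare: synthetic data with 3 predictors and Gaussian noise
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# 2) EDA (minimal): check shapes and the spread of the target
print(X.shape, y.shape, y.std())

# 3) Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 4) Build: ordinary least squares fit on the training set
model = LinearRegression().fit(X_train, y_train)

# 5) Evaluate on the held-out test set
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}, R-squared: {r2:.3f}")

# 6) Predict for a new, unseen input (here, all predictors set to 0)
X_new = np.zeros((1, 3))
print(model.predict(X_new))
```

Because the noise standard deviation is 10, the test RMSE should come out in that neighborhood; taking the square root of MSE puts the error back in the units of the target.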


### Step 1: Importing the Required Libraries
The first step is to import the required libraries. In this notebook, we will be using NumPy, Pandas, Matplotlib, and Scikit-learn.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.model_selection import train_test_split
```

### Step 2: Loading the Dataset
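The next step is to load the dataset. In this notebook, we will be using the Breast Cancer dataset, which we can load with the `load_breast_cancer` function from Scikit-learn. It contains 569 samples, each described by 30 numeric features and labeled with a binary target (0 = malignant, 1 = benign).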

```python
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer dataset: features as a DataFrame, binary target as a Series
breast_cancer = load_breast_cancer()
X_breast_cancer = pd.DataFrame(breast_cancer.data)
y_breast_cancer = pd.Series(breast_cancer.target)
```
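The DataFrame built above has integer column labels (0 to 29). As an optional, illustrative variation, the sketch below attaches the dataset's own `feature_names` and `target_names` (both are attributes of the object returned by `load_breast_cancer`) so the columns and class labels are readable; the variable names `X_named` and `y_named` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Same features as X_breast_cancer, but with descriptive column names
X_named = pd.DataFrame(data.data, columns=data.feature_names)
y_named = pd.Series(data.target, name="target")

print(X_named.shape)              # (number of samples, number of features)
print(list(X_named.columns[:3]))  # first few feature names
print(list(data.target_names))    # class names in label order: 0, then 1
```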

### Step 3: Exploring the Dataset
Before we move on to building the models, let's explore the dataset to gain a better understanding of it.

```python
# Display the first five rows of the feature DataFrame
print(X_breast_cancer.head())

# Display descriptive statistics (count, mean, std, min, quartiles, max) for each feature
print(X_breast_cancer.describe())
```

```
      0      1       2       3        4        5       6        7       8   \
0  17.99  10.38  122.80  1001.0  0.11840  0.27760  0.3001  0.14710  0.2419
1  20.57  17.77  132.90  1326.0  0.08474  0.07864  0.0869  0.07017  0.1812
2  19.69  21.25  130.00  1203.0  0.10960  0.15990  0.1974  0.12790  0.2069
3  11.42  20.38   77.58   386.1  0.14250  0.28390  0.2414  0.10520  0.2597
4  20.29  14.34  135.10  1297.0  0.10030  0.13280  0.1980  0.10430  0.1809

        9   ...     20     21      22      23      24      25      26      27  \
0  0.07871  ...  25.38  17.33  184.60  2019.0  0.1622  0.6656  0.7119  0.2654
1  0.05667  ...  24.99  23.41  158.80  1956.0  0.1238  0.1866  0.2416  0.1860
2  0.05999  ...  23.57  25.53  152.50  1709.0  0.1444  0.4245  0.4504  0.2430
3  0.09744  ...  14.91  26.50   98.87   567.7  0.2098  0.8663  0.6869  0.2575
4  0.05883  ...  22.54  16.67  152.20  1575.0  0.1374  0.2050  0.4000  0.1625

       28       29
0  0.4601  0.11890
1
```
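Two other quick checks are worth running before modeling: whether any feature values are missing, and how the two classes are balanced. A minimal sketch, assuming the same `load_breast_cancer` data (reloaded here so the snippet runs on its own):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target).map(dict(enumerate(data.target_names)))  # 0/1 -> class names

# Total number of missing cells across all 30 feature columns
print("Missing values:", X.isna().sum().sum())

# Count and share of each class
print(y.value_counts())
print(y.value_counts(normalize=True).round(3))
```

The class proportions also give a baseline for the accuracy reported later: always predicting the majority class (benign) would already be right about 63% of the time.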

### Step 4: Building the Linear Regression Model
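The next step is to build the linear regression model with the LinearRegression class from Scikit-learn. Note that the breast cancer target is binary (0 = malignant, 1 = benign), so the linear model's predictions are continuous scores rather than class labels; it serves as a simple baseline before the logistic regression model in the next step.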

```python
# Split the Breast Cancer dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X_breast_cancer, y_breast_cancer, test_size=0.2, random_state=42)

# Create a linear regression object and fit it on the training set
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Predict the target values on the testing set
y_pred = linear_reg.predict(X_test)

# Calculate the mean squared error and R2 score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display the mean squared error and R2 score.
# R2 measures goodness of fit: 1 is a perfect fit, 0 means the model does no better
# than always predicting the mean of y_test, and negative values mean it does worse.
print('Mean Squared Error:', mse)
print('R2 Score:', r2)
```

```
Mean Squared Error: 0.0641088624702944
R2 Score: 0.7271016126223564
```
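Two properties of these metrics are easy to check on toy numbers: RMSE is the square root of MSE (so it is in the units of the target), and R2 is not bounded below by 0 on held-out data. The arrays below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0])
y_close = np.array([1.1, 1.9, 3.2])     # small errors
y_reversed = np.array([3.0, 2.0, 1.0])  # worse than always predicting the mean (2.0)

mse = mean_squared_error(y_true, y_close)
rmse = np.sqrt(mse)
print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}")
print("R2 (close):", r2_score(y_true, y_close))
print("R2 (reversed):", r2_score(y_true, y_reversed))  # -3.0
```

Applied to the result above, the square root of the reported MSE of about 0.064 gives an RMSE of roughly 0.25 on the 0/1 target.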


### Step 5: Building the Logistic Regression Model
The final step is to build the logistic regression model. We will use the LogisticRegression class from Scikit-learn for this purpose.

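```python
from sklearn.linear_model import LogisticRegression

# Raise max_iter: with unscaled features, the default lbfgs solver may need many iterations to converge
clf = LogisticRegression(random_state=42, max_iter=10000)
clf.fit(X_train, y_train)
```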




```
LogisticRegression(max_iter=10000, random_state=42)
```
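The raw features are on very different scales (for example, column 3 holds values around 1000 while column 4 holds values around 0.1), which can slow down gradient-based solvers. A common alternative to raising `max_iter`, sketched here as an assumption rather than the notebook's method, is to standardize the features inside a pipeline, which typically lets the solver converge within its default iteration limit:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler is fit on the training split only, then applied to both splits
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
scaled_clf.fit(X_train, y_train)

print(f"Accuracy: {scaled_clf.score(X_test, y_test):.2f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of the training step, which matters once cross-validation is added.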

```python
from sklearn.metrics import accuracy_score

# Predict class labels on the test set and compare them with the true labels
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
```


```
Accuracy: 0.96
```
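To connect the fitted model back to the sigmoid formula at the top of the notebook, the sketch below checks that, for this binary problem, `predict_proba` equals the sigmoid of `decision_function` (which returns $z = \beta_0 + \beta_1x_1 + \dots + \beta_px_p$ for each sample), and that thresholding the probability at 0.5 reproduces `predict`. It refits the same model so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(random_state=42, max_iter=10000).fit(X_train, y_train)

z = clf.decision_function(X_test)            # intercept_ + X_test @ coef_ for each sample
p_manual = 1.0 / (1.0 + np.exp(-z))          # sigmoid of the logit
p_sklearn = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)

print(np.allclose(p_manual, p_sklearn))
print(np.array_equal((p_sklearn > 0.5).astype(int), clf.predict(X_test)))
```

Moving the 0.5 threshold trades false positives against false negatives, which can matter in a medical setting like this one.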
