# Logistic regression

Logistic regression is a popular statistical model used for binary classification tasks in machine learning. It is named after the logistic function, also known as the sigmoid function, which is used to transform the linear regression output into a probability value between 0 and 1.



In logistic regression, the goal is to estimate the probability that an instance belongs to a particular class (e.g., class 1 or class 0). The model calculates a weighted sum of the input features, applies the logistic function to the result, and produces a probability score. If the probability is above a certain threshold (e.g., 0.5), the instance is classified as belonging to class 1; otherwise, it is classified as belonging to class 0.
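
The computation described above can be sketched in a few lines of NumPy. The weights, bias, and input rows below are made up for illustration and do not come from a fitted model; the helper names are likewise just for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, bias):
    """Probability of class 1: sigmoid of the weighted sum of the features."""
    return sigmoid(X @ weights + bias)

def predict_class(X, weights, bias, threshold=0.5):
    """Assign class 1 when the probability is above the threshold, else class 0."""
    return (predict_proba(X, weights, bias) > threshold).astype(int)

# Hypothetical parameters and inputs, chosen only for illustration
weights = np.array([2.0, -1.0])
bias = 0.0
X_demo = np.array([[1.0, 0.0],    # weighted sum =  2.0 -> probability above 0.5
                   [0.0, 3.0]])   # weighted sum = -3.0 -> probability below 0.5

print(predict_proba(X_demo, weights, bias))
print(predict_class(X_demo, weights, bias))
```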



Logistic regression assumes a linear relationship between the input features and the log-odds of the target variable. Despite the name "regression," logistic regression is a classification algorithm, not a regression algorithm.
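
Written out, the linearity assumption looks as follows. The symbols are notation chosen here for illustration: $p$ is the probability of class 1, $x_1, \dots, x_n$ are the input features, and $w_1, \dots, w_n$ and $b$ are the learned weights and intercept.

$$
\log\frac{p}{1-p} = b + w_1 x_1 + \dots + w_n x_n
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(b + w_1 x_1 + \dots + w_n x_n)}}
$$

The left-hand form is the log-odds; solving it for $p$ gives the sigmoid on the right.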

# Logistic regression can be used in various scenarios, including:

(1) Binary classification: When you have a binary outcome variable and want to predict the probability of an instance belonging to a particular class. For example, predicting whether an email is spam or not spam.

(2) Customer churn prediction: Identifying customers who are likely to cancel their subscription or stop using a service.

(3) Credit risk assessment: Determining the likelihood of a customer defaulting on a loan or credit card payment.

(4) Medical diagnosis: Predicting the probability of a patient having a specific disease based on their symptoms and medical history.

(5) Sentiment analysis: Classifying text or reviews as positive or negative based on the sentiment expressed.
    

Because the model is linear in the log-odds, it can underfit when the true relationship is nonlinear. In that case, adding transformed features (for example polynomial or interaction terms) or switching to a more flexible model such as a decision tree or neural network may be more appropriate.

In [1]:
import numpy as np 

In [2]:
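np.random.seed(0) 
# 100 samples with 2 standard-normal features
X = np.random.randn(100,2) 
# Labels are drawn at random (0 or 1), independently of X, so there is no signal to learn
y = np.random.randint(0,2,100) 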

In [3]:
from sklearn.model_selection import train_test_split 

In [4]:
X_train , X_test , y_train , y_test = train_test_split(X,y,test_size = 0.2 , random_state = 0) 

In [5]:
from sklearn.linear_model import LogisticRegression 

In [6]:
lr = LogisticRegression() 

In [7]:
lr.fit(X_train , y_train) 

LogisticRegression()

In [8]:
y_pred = lr.predict(X_test) 

In [9]:
from sklearn.metrics import r2_score , accuracy_score

In [10]:
# NOTE: R² is a regression metric; on 0/1 class labels it is not a meaningful measure of classifier quality
print("r2-score:" , r2_score(y_test , y_pred))

r2-score: -1.6262626262626263


In [11]:
print("accuracy score:" , accuracy_score(y_test , y_pred)) 

accuracy score: 0.35
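
An accuracy around chance level is expected here, because the labels were generated at random and carry no information about `X`. For a classifier, a confusion matrix is usually more informative than R². The sketch below rebuilds the same synthetic data and reports both accuracy and the confusion matrix; the names `model`, `cm`, and `acc` are chosen for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Same synthetic setup as above: labels are random and independent of the features
np.random.seed(0)
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print("accuracy:", acc)
```

The diagonal of the matrix counts correct predictions, so accuracy equals the diagonal sum divided by the number of test samples.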


# Project 2 (classifying the Iris dataset)

In [12]:
import numpy as np 
import pandas  as pd

In [13]:
from sklearn.datasets import load_iris 
from sklearn.linear_model import LogisticRegression 

In [14]:
iris = load_iris() 

In [15]:
df = pd.DataFrame(data = iris.data , columns = iris.feature_names)

In [16]:
df['target'] = iris.target

In [17]:
df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2
146,6.3,2.5,5.0,1.9,2
147,6.5,3.0,5.2,2.0,2
148,6.2,3.4,5.4,2.3,2


In [18]:
x = df.drop(columns = ['target'])  # `columns=` already selects the axis, so `axis = 1` is redundant
y = df['target'] 

In [19]:
from sklearn.model_selection import train_test_split 

In [20]:
x_train , x_test , y_train ,y_test = train_test_split(x,y,test_size = 0.2,random_state = 0) 

In [21]:
from sklearn.linear_model import LogisticRegression 

In [22]:
lr = LogisticRegression() 

In [23]:
lr.fit(x_train , y_train) 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


LogisticRegression()
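
The warning above means the solver reached its iteration limit before converging. Following the two suggestions in the message, one possible remedy is to standardize the features and raise `max_iter`. This is a sketch, not the notebook's own code: the value 1000 and the names `scaled_lr`, `Xtr`, `Xte`, `ytr`, `yte` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris_X, iris_y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(iris_X, iris_y, test_size=0.2, random_state=0)

# Scaling happens inside the pipeline, so the scaler is fitted on the training split only
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(Xtr, ytr)

test_accuracy = scaled_lr.score(Xte, yte)
print("test accuracy:", test_accuracy)
```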

In [24]:
y_pred = lr.predict(x_test) 

In [25]:
from sklearn.metrics import accuracy_score , r2_score

In [26]:
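# NOTE: R² is a regression metric; for class labels, accuracy (next cell) is the more appropriate measure
print("r2_score :" , r2_score(y_test , y_pred)) 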

r2_score : 1.0


In [27]:
print("Accuracy_score:" , accuracy_score(y_test , y_pred))

Accuracy_score: 1.0
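
Iris has three classes, so this is a multiclass problem rather than the binary case described at the top. scikit-learn's `LogisticRegression` handles this by returning one probability per class. The sketch below shows how those probabilities relate to the predicted label; `max_iter=1000` and the variable names are choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris_X, iris_y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(iris_X, iris_y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)

proba = clf.predict_proba(Xte)   # shape: (n_test_samples, n_classes)
# The predicted label is the class with the highest probability
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(proba[:3].round(3))
print(labels_from_proba[:3], clf.predict(Xte)[:3])
```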


# Project 3 (predicting customer churn in a telecommunications company)

In [28]:
import numpy as np 
import pandas as pd 

In [29]:
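# Load the telecom churn data (local path; adjust to wherever the CSV is stored)
df = pd.read_csv('C:\\Users\\hp\\Downloads\\telecom.csv')
# Drop the identifier column, plus TotalCharges and gender, which this notebook does not use as features
df = df.drop(columns=['customerID', 'TotalCharges', 'gender'])
df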

Unnamed: 0,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,Churn
0,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,No
1,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,No
2,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,Yes
3,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,No
4,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,0,Yes,Yes,24,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,No
7039,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,No
7040,0,Yes,Yes,11,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,No
7041,1,Yes,No,4,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,Yes


In [30]:
df['MultipleLines'].value_counts()

No                  3390
Yes                 2971
No phone service     682
Name: MultipleLines, dtype: int64

In [31]:
df['InternetService'].value_counts()

Fiber optic    3096
DSL            2421
No             1526
Name: InternetService, dtype: int64

In [32]:
df['OnlineSecurity'].value_counts()

No                     3498
Yes                    2019
No internet service    1526
Name: OnlineSecurity, dtype: int64

In [33]:
df['DeviceProtection'].value_counts()

No                     3095
Yes                    2422
No internet service    1526
Name: DeviceProtection, dtype: int64

In [34]:
df['Contract'].value_counts()

Month-to-month    3875
Two year          1695
One year          1473
Name: Contract, dtype: int64

In [35]:
df['PaymentMethod'].value_counts()

Electronic check             2365
Mailed check                 1612
Bank transfer (automatic)    1544
Credit card (automatic)      1522
Name: PaymentMethod, dtype: int64

In [36]:
x= df.drop(columns = ['Churn'])
y= df['Churn']

In [37]:
from sklearn.model_selection import train_test_split

In [38]:
x_train, x_test, y_train, y_test = train_test_split(x,y , test_size = 0.2 , random_state = 42)


In [39]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder

In [40]:
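transformer = ColumnTransformer(transformers = [
    # NOTE: this entry fails when the transformer is fitted: 'Churn' was dropped from x above,
    # and LabelEncoder is meant for the target vector, not for feature columns
    ('tnf2', LabelEncoder(),['Churn']),
    # NOTE: from scikit-learn 1.2 onward the `sparse` argument is named `sparse_output`
    ('tnf1',OneHotEncoder(sparse = False), [ 'Partner', 'Dependents','PhoneService', 'MultipleLines','InternetService','OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies','Contract','PaperlessBilling','PaymentMethod'] )
    
#     ("ordi",OrdinalEncoder(categories=[["Male","Female"]]),["gender"])
], remainder = 'passthrough')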

In [41]:
transformer

ColumnTransformer(remainder='passthrough',
                  transformers=[('tnf2', LabelEncoder(), ['Churn']),
                                ('tnf1', OneHotEncoder(sparse=False),
                                 ['Partner', 'Dependents', 'PhoneService',
                                  'MultipleLines', 'InternetService',
                                  'OnlineSecurity', 'OnlineBackup',
                                  'DeviceProtection', 'TechSupport',
                                  'StreamingTV', 'StreamingMovies', 'Contract',
                                  'PaperlessBilling', 'PaymentMethod'])])

In [42]:
from sklearn.linear_model import LogisticRegression

In [43]:
lr = LogisticRegression()

In [44]:
# Fails: x_train still holds raw strings because the transformer above was never applied
lr.fit(x_train, y_train)


ValueError: could not convert string to float: 'No'
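
The `ValueError` arises because the classifier receives string columns such as 'Yes' and 'No' directly. One way to make sure encoding always runs before fitting is to chain the preprocessing and the classifier in a `Pipeline`. The sketch below uses a tiny hand-made frame with a few of the same column names so that it runs without the CSV; the rows are invented, and the names `toy`, `preprocess`, and `churn_model` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented rows that only mimic the structure of the telecom data
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "PaperlessBilling": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
    "tenure": [1, 34, 2, 45, 2, 24, 72, 4],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 84.80, 103.20, 74.40],
    "Churn": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
})

features = toy.drop(columns=["Churn"])
target = toy["Churn"]   # string labels are accepted as the target

categorical = ["Contract", "PaperlessBilling"]
numeric = ["tenure", "MonthlyCharges"]

preprocess = ColumnTransformer(transformers=[
    # handle_unknown="ignore" avoids errors on categories not seen during fitting
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", StandardScaler(), numeric),
])

churn_model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# The pipeline encodes the string columns before they reach the classifier
churn_model.fit(features, target)
predictions = churn_model.predict(features)
print(predictions)
```

With the real data, the same pipeline would be fitted on `x_train` and evaluated on `x_test`, with the full list of categorical columns in place of the two used here.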

In [None]:
df.info()