### sklearn.linear_model.LogisticRegression

`class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)`

Parameters:

**penalty** : {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

-   `None`: no penalty is added;
-   `'l2'`: add an L2 penalty term (the default choice);
-   `'l1'`: add an L1 penalty term;
-   `'elasticnet'`: add both L1 and L2 penalty terms.

Warning

Some penalties may not work with some solvers. See the `solver` parameter below for which penalties each solver supports.

New in version 0.19: l1 penalty with SAGA solver (allowing ‘multinomial’ + L1)

Deprecated since version 1.2: The ‘none’ option was deprecated in version 1.2 and will be removed in 1.4. Use `None` instead.
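
The following is a minimal sketch of how the penalty choices behave. It uses the standardized breast cancer data that appears later in this notebook, and C=0.05 is an arbitrary value picked only to make the effect visible. With `'l1'`, some coefficients should end up exactly zero, while `'l2'` only shrinks them. `'elasticnet'` has to be paired with `solver='saga'` and an `l1_ratio`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Standardize the 30 features so the penalty acts on comparable scales
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# 'l2' (default solver 'lbfgs'): shrinks every coefficient toward zero
l2_clf = LogisticRegression(penalty="l2", C=0.05).fit(X, y)

# 'l1' needs a solver that supports it, e.g. 'liblinear'; it zeroes out some coefficients
l1_clf = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X, y)

# 'elasticnet' only works with 'saga' and mixes L1 and L2 through l1_ratio
en_clf = LogisticRegression(
    penalty="elasticnet", C=0.05, solver="saga", l1_ratio=0.5, max_iter=5000
).fit(X, y)

n_zero_l2 = int(np.sum(l2_clf.coef_ == 0))
n_zero_l1 = int(np.sum(l1_clf.coef_ == 0))
n_zero_en = int(np.sum(en_clf.coef_ == 0))
print(f"zero coefficients  l2: {n_zero_l2}  l1: {n_zero_l1}  elasticnet: {n_zero_en}")
```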

**dual** : bool, default=False

Dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features.
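
Here is a rough illustration on a hypothetical random dataset with more features than samples, which is the case where the dual formulation is meant to help. `dual=True` fits with `solver='liblinear'`, while another solver such as `'lbfgs'` rejects `dual=True` with a `ValueError`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical wide data: 20 samples, 100 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.array([0, 1] * 10)

# Dual formulation: only available with solver='liblinear' and penalty='l2'
dual_clf = LogisticRegression(solver="liblinear", penalty="l2", dual=True).fit(X, y)
print(dual_clf.coef_.shape)

# Any other solver refuses dual=True
try:
    LogisticRegression(solver="lbfgs", dual=True).fit(X, y)
    lbfgs_dual_ok = True
except ValueError as err:
    lbfgs_dual_ok = False
    print(f"lbfgs: {err}")
```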

**tol** : float, default=1e-4

Tolerance for stopping criteria.

**C** : float, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
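
One way to see the effect of C is to fit the same model with very different C values and compare the size of the coefficient vector. The sketch below assumes the standardized breast cancer data, and the three C values are arbitrary. A smaller C means stronger L2 regularization, so the coefficient norm should shrink as C decreases.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

norms = {}
for C in [0.001, 1.0, 1000.0]:
    # max_iter raised so the weakly regularized fit (large C) can converge
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:>8}: ||coef|| = {norms[C]:.3f}")
```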

**fit_intercept** : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
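
The next sketch uses hypothetical `make_classification` data. With `fit_intercept=False`, the fitted `intercept_` is zero and `decision_function` reduces to `X @ coef_`, so the decision boundary is forced through the origin.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

with_b = LogisticRegression(fit_intercept=True).fit(X, y)
without_b = LogisticRegression(fit_intercept=False).fit(X, y)

print("intercept with fit_intercept=True :", with_b.intercept_)
print("intercept with fit_intercept=False:", without_b.intercept_)

# Without an intercept, the decision function is just the linear term
scores = without_b.decision_function(X)
print(np.allclose(scores, X @ without_b.coef_.ravel()))
```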

**intercept_scaling** : float, default=1

Useful only when the solver ‘liblinear’ is used and `self.fit_intercept` is set to True. In this case, `x` becomes `[x, self.intercept_scaling]`, i.e. a “synthetic” feature with constant value equal to `intercept_scaling` is appended to the instance vector. The intercept becomes `intercept_scaling * synthetic_feature_weight`.

Note that the synthetic feature weight is subject to L1/L2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), increase intercept_scaling.
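
The effect of intercept_scaling can be sketched on hypothetical toy data. The features carry no signal and about 90% of the labels are 1, so an unregularized intercept would sit near log(0.9 / 0.1) ≈ 2.2. The fit uses liblinear with a small C (0.01, chosen arbitrarily so the penalty dominates). With the default `intercept_scaling=1`, the intercept should be pulled toward zero. With `intercept_scaling=100`, it should come out noticeably larger in magnitude.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: uninformative features, roughly 90% positive labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (rng.random(200) < 0.9).astype(int)

intercepts = {}
for scaling in [1, 100]:
    # liblinear appends a constant feature equal to intercept_scaling and penalizes its weight
    clf = LogisticRegression(solver="liblinear", C=0.01, intercept_scaling=scaling).fit(X, y)
    intercepts[scaling] = float(clf.intercept_[0])
    print(f"intercept_scaling={scaling:>3}: intercept_ = {intercepts[scaling]:.3f}")
```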

**class_weight** : dict or ‘balanced’, default=None

Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are assumed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
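
The "balanced" formula can be checked directly. This minimal sketch uses a hypothetical label vector with six 0s and two 1s and compares the formula with scikit-learn's `compute_class_weight` helper. By hand, the weights are 8 / (2 · 6) ≈ 0.667 for class 0 and 8 / (2 · 2) = 2.0 for class 1.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # hypothetical imbalanced labels

# Formula from the text: n_samples / (n_classes * np.bincount(y))
manual = len(y) / (len(np.unique(y)) * np.bincount(y))

# scikit-learn helper for the same 'balanced' weights
helper = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)

print(manual)
print(helper)
```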

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: *class_weight=‘balanced’*

**random_state** : int, RandomState instance, default=None

Used when `solver` == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. See [Glossary](https://scikit-learn.org/stable/glossary.html#term-random_state) for details.
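
Here is a minimal sketch, on hypothetical `make_classification` data, of why random_state matters for the shuffling solvers. Two `'saga'` fits with the same random_state should give matching coefficients, while a different seed may give slightly different ones. `max_iter=5` is deliberately too small, so the solver stops before converging and the shuffle order still shows in the result. The resulting ConvergenceWarning is silenced.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

def fit_saga(seed):
    # Hypothetical helper: a deliberately short saga run with a given shuffling seed
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        return LogisticRegression(solver="saga", max_iter=5, random_state=seed).fit(X, y).coef_

same_a = fit_saga(0)
same_b = fit_saga(0)
other = fit_saga(1)

print("same seed, same coefficients:", np.allclose(same_a, same_b))
print("max difference with another seed:", np.abs(same_a - other).max())
```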

**solver** : {‘lbfgs’, ‘liblinear’, ‘newton-cg’, ‘newton-cholesky’, ‘sag’, ‘saga’}, default=’lbfgs’

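Algorithm to use in the optimization problem. Default is ‘lbfgs’. When choosing a solver, check which penalties it supports:

-   `'lbfgs'`: `'l2'`, `None`;
-   `'liblinear'`: `'l1'`, `'l2'`;
-   `'newton-cg'`: `'l2'`, `None`;
-   `'newton-cholesky'`: `'l2'`, `None`;
-   `'sag'`: `'l2'`, `None`;
-   `'saga'`: `'elasticnet'`, `'l1'`, `'l2'`, `None`.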
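
Solver/penalty compatibility can also be probed empirically. This sketch uses hypothetical `make_classification` data, tries each solver with `'l1'` and `'l2'`, and records whether `fit` raises a `ValueError`. `'newton-cholesky'` is skipped because it only exists in scikit-learn 1.2 and later.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

solvers = ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
penalties = ["l1", "l2"]

supported = {}
for solver in solvers:
    for penalty in penalties:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", ConvergenceWarning)
                LogisticRegression(solver=solver, penalty=penalty, max_iter=1000).fit(X, y)
            supported[(solver, penalty)] = True
        except ValueError:
            # Unsupported combinations are rejected at fit time
            supported[(solver, penalty)] = False

for (solver, penalty), ok in supported.items():
    print(f"{solver:>10} + {penalty}: {'ok' if ok else 'ValueError'}")
```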

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Load the Wisconsin breast cancer dataset
cancer = load_breast_cancer()
```

```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Use StandardScaler() to transform each feature to mean 0 and variance 1
# (the scaler is fitted on the full dataset, before the train/test split)
scaler = StandardScaler()
data_scaled = scaler.fit_transform(cancer.data)

x_train, x_test, y_train, y_test = train_test_split(
    data_scaled, cancer.target, test_size=0.3, random_state=0
)
```
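
The scaler above is fitted on the full dataset before `train_test_split`, so statistics from the test rows leak into the transformed training data. A minimal leakage-free sketch keeps the same split parameters and wraps the scaler and the model in a `Pipeline`, so the scaler is fitted on the training rows only. The resulting accuracy may differ slightly from the numbers in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()

# Split the raw data first so the scaler never sees the test rows
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=0
)

# The pipeline fits StandardScaler on x_train only and reuses its statistics on x_test
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(x_train, y_train)

acc = accuracy_score(y_test, pipe.predict(x_test))
print(f"accuracy: {acc}")
```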


```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Train and predict with logistic regression
# (defaults: penalty='l2', C=1.0, solver='lbfgs'; intercept_scaling only matters for 'liblinear')
lr_clf = LogisticRegression()

lr_clf.fit(x_train, y_train)

lr_pred = lr_clf.predict(x_test)
# Predicted probability of the positive class (label 1), used for ROC AUC
lr_pred_proba = lr_clf.predict_proba(x_test)[:, 1]

# Measure accuracy and ROC AUC
print(f'accuracy: {accuracy_score(y_test, lr_pred)}')
print(f'roc_auc : {roc_auc_score(y_test, lr_pred_proba)}')
```

```
accuracy: 0.9766081871345029
roc_auc : 0.9947089947089947
```


```python
from sklearn.model_selection import GridSearchCV

params = {'penalty': ['l2', 'l1'],
          'C': [0.01, 0.1, 1, 5, 10]}

# lr_clf uses the default 'lbfgs' solver, which does not support 'l1',
# so every 'l1' combination fails and is scored as nan
grid_clf = GridSearchCV(lr_clf, param_grid=params, cv=3, scoring='accuracy', verbose=False)
grid_clf.fit(x_train, y_train)
print(f'best hyperparameters: {grid_clf.best_params_}, best mean accuracy: {grid_clf.best_score_}')
```

```
best hyperparameters: {'C': 1, 'penalty': 'l2'}, best mean accuracy: 0.982437153489785
```


```
15 fits failed out of a total of 30.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
15 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\msi\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\msi\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 1162, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "C:\Users\msi\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py", line 54, in _check_solver
    raise ValueError(
ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

 0.97491836        nan 0.97241209
```
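
The nan scores above come from the `'l1'` grid points, because the default `'lbfgs'` solver only supports `'l2'` or no penalty. One way to search over both penalties is to switch the estimator to `solver='liblinear'`, which supports both `'l1'` and `'l2'`. The sketch below assumes the same preprocessing and split as the cells above. It sets `error_score='raise'` so that any remaining incompatible combination fails loudly instead of being scored as nan. The best parameters it finds may differ from those above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
data_scaled = StandardScaler().fit_transform(cancer.data)
x_train, x_test, y_train, y_test = train_test_split(
    data_scaled, cancer.target, test_size=0.3, random_state=0
)

# 'liblinear' supports both 'l1' and 'l2', so every grid point can be fitted
params = {"penalty": ["l2", "l1"], "C": [0.01, 0.1, 1, 5, 10]}
grid_clf = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid=params,
    cv=3,
    scoring="accuracy",
    error_score="raise",  # fail instead of silently recording nan
)
grid_clf.fit(x_train, y_train)
print(f"best hyperparameters: {grid_clf.best_params_}, best mean accuracy: {grid_clf.best_score_}")
```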