<a href="https://colab.research.google.com/github/ChoeTaeBin/Machine-Learnig/blob/main/LogisticRgression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Logistic regression can be used for classification, and it can also compute the probability of each class.

In [1]:
import pandas as pd
fish = pd.read_csv('https://bit.ly/fish_csv_data')
fish.head()  # show the first 5 rows

Unnamed: 0,Species,Weight,Length,Diagonal,Height,Width
0,Bream,242.0,25.4,30.0,11.52,4.02
1,Bream,290.0,26.3,31.2,12.48,4.3056
2,Bream,340.0,26.5,31.1,12.3778,4.6961
3,Bream,363.0,29.0,33.5,12.73,4.4555
4,Bream,430.0,29.0,34.0,12.444,5.134


In [4]:
fish_input = fish[['Weight','Length', 'Diagonal', 'Height', 'Width']].to_numpy()
fish_target = fish['Species'].to_numpy()

print(fish_input[:5])
print(fish_target[:5])

[[242.      25.4     30.      11.52     4.02  ]
 [290.      26.3     31.2     12.48     4.3056]
 [340.      26.5     31.1     12.3778   4.6961]
 [363.      29.      33.5     12.73     4.4555]
 [430.      29.      34.      12.444    5.134 ]]
['Bream' 'Bream' 'Bream' 'Bream' 'Bream']


In [6]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

train_input, test_input, train_target, test_target = train_test_split(fish_input, fish_target, random_state=42)  # split into training and test sets
# Standardize the features using statistics fitted on the training set only
ss = StandardScaler()
ss.fit(train_input)
train_scaled = ss.transform(train_input)
test_scaled = ss.transform(test_input)
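
As a rough illustration of what `StandardScaler` does to one column, the sketch below subtracts the training mean and divides by the training (population) standard deviation, then reuses those same two numbers for the test data. The training column is just the five `Weight` values shown above, and the test value `300.0` is a made-up number for illustration; neither is the notebook's actual split.

```python
import statistics

# Illustrative column: the Weight values of the first five rows shown earlier.
train_weights = [242.0, 290.0, 340.0, 363.0, 430.0]
# Hypothetical test value, not taken from the notebook.
test_weights = [300.0]

# "fit": learn the mean and population standard deviation from the training data.
mean = statistics.fmean(train_weights)
std = statistics.pstdev(train_weights)

# "transform": apply the training statistics to both sets.
train_col_scaled = [(w - mean) / std for w in train_weights]
test_col_scaled = [(w - mean) / std for w in test_weights]

print(mean, std)
print(train_col_scaled)
print(test_col_scaled)
```

The test set is scaled with the training statistics on purpose, so the model sees test data on the same scale it was trained on.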

In [7]:
# Binary classification
import numpy as np

bream_smelt_indexes = (train_target == 'Bream') | (train_target == 'Smelt')  # boolean indexing to keep only the Bream and Smelt rows
train_input_bream_smelt = train_scaled[bream_smelt_indexes]
train_target_bream_smelt = train_target[bream_smelt_indexes]

bream_smelt_indexes = (test_target == 'Bream') | (test_target == 'Smelt')
test_input_bream_smelt = test_scaled[bream_smelt_indexes]
test_target_bream_smelt = test_target[bream_smelt_indexes]

In [13]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(train_input_bream_smelt, train_target_bream_smelt)

print('<Prediction>\n', lr.predict(test_input_bream_smelt[:5]))
print('<Probability>\n', lr.predict_proba(test_input_bream_smelt[:5]))
print('<Answer>\n', test_target_bream_smelt[:5])

<Prediction>
 ['Smelt' 'Bream' 'Smelt' 'Bream' 'Bream']
<Probability>
 [[3.95673649e-02 9.60432635e-01]
 [9.99418084e-01 5.81915885e-04]
 [2.57680368e-02 9.74231963e-01]
 [9.94091561e-01 5.90843851e-03]
 [9.93797733e-01 6.20226656e-03]]
<Answer>
 ['Smelt' 'Bream' 'Smelt' 'Bream' 'Bream']
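
The two probability columns follow the order of `lr.classes_`, which scikit-learn keeps sorted, so here the first column should be Bream and the second Smelt. The sketch below assumes that order and recovers the predictions from the printed probabilities by picking the larger column in each row.

```python
# Assumed column order (sorted class labels, as in lr.classes_).
classes = ['Bream', 'Smelt']

# Probabilities copied from the output above.
proba = [
    [3.95673649e-02, 9.60432635e-01],
    [9.99418084e-01, 5.81915885e-04],
    [2.57680368e-02, 9.74231963e-01],
    [9.94091561e-01, 5.90843851e-03],
    [9.93797733e-01, 6.20226656e-03],
]

# The predicted class is the one with the highest probability in each row.
predicted = [classes[row.index(max(row))] for row in proba]
print(predicted)
```

The result matches what `lr.predict` printed for the same five samples.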


Computing the probabilities directly with the sigmoid (logistic) function

In [14]:
from scipy.special import expit

decisions = lr.decision_function(test_input_bream_smelt[:5])
print(expit(decisions))

[9.60432635e-01 5.81915885e-04 9.74231963e-01 5.90843851e-03
 6.20226656e-03]
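
These values equal the second column of `predict_proba`, because in a binary problem `decision_function` scores the positive class (the second entry of `lr.classes_`, here Smelt). A minimal sketch of what `expit` computes, sigmoid(z) = 1 / (1 + e^(-z)), is given below. The `z` values are made up for illustration and are not the notebook's decision values.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical decision values, not taken from the notebook.
z_values = [-3.0, 0.0, 3.0]

for z in z_values:
    p_pos = sigmoid(z)    # probability of the positive class
    p_neg = 1.0 - p_pos   # probability of the negative class
    print(z, p_neg, p_pos)
```

A negative `z` gives a probability below 0.5 (the negative class wins), and a positive `z` gives one above 0.5.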


In [15]:
# Multiclass classification

lr = LogisticRegression(C=20, max_iter=1000)  # the larger C is, the weaker the regularization
lr.fit(train_scaled, train_target)
print('train score: ',lr.score(train_scaled,train_target))
print('test score: ',lr.score(test_scaled, test_target))

train score:  0.9327731092436975
test score:  0.925


In [16]:
print('<Prediction>\n', lr.predict(test_scaled[:5]))
print('<Probability>\n', lr.predict_proba(test_scaled[:5]))
print('<Answer>\n', test_target[:5])

<Prediction>
 ['Perch' 'Smelt' 'Pike' 'Roach' 'Perch']
<Probability>
 [[7.24962119e-06 1.35114379e-02 8.41283507e-01 3.14327385e-04
  1.35660262e-01 6.67136890e-03 2.55184724e-03]
 [7.14996969e-09 2.55552480e-03 4.39079515e-02 3.37984180e-05
  7.30920812e-03 9.46188255e-01 5.25482718e-06]
 [1.86538035e-05 2.79615816e-06 3.40568865e-02 9.34811277e-01
  1.50466238e-02 1.60341031e-02 2.96594224e-05]
 [1.09323066e-02 3.40497502e-02 3.05540604e-01 6.60901798e-03
  5.66580634e-01 6.87213267e-05 7.62189659e-02]
 [4.48967205e-06 3.67276045e-04 9.04006272e-01 2.41283576e-03
  8.94706737e-02 2.40961911e-03 1.32883359e-03]]
<Answer>
 ['Perch' 'Smelt' 'Pike' 'Whitefish' 'Perch']
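
Four of the five samples are classified correctly; the fourth one is predicted as Roach although it is a Whitefish. The sketch below looks at that row more closely. It assumes the seven columns follow the sorted class labels that `lr.classes_` would report (Bream, Parkki, Perch, Pike, Roach, Smelt, Whitefish), an order that is not printed in the notebook but is consistent with the predictions above.

```python
# Assumed column order (sorted class labels, as in lr.classes_).
classes = ['Bream', 'Parkki', 'Perch', 'Pike', 'Roach', 'Smelt', 'Whitefish']

# Fourth row of the probabilities printed above.
row4 = [1.09323066e-02, 3.40497502e-02, 3.05540604e-01, 6.60901798e-03,
        5.66580634e-01, 6.87213267e-05, 7.62189659e-02]

# Rank the classes for this sample from most to least probable.
ranked = sorted(zip(classes, row4), key=lambda pair: pair[1], reverse=True)
print(ranked[:3])

# Predictions and answers copied from the output above.
predicted = ['Perch', 'Smelt', 'Pike', 'Roach', 'Perch']
answers = ['Perch', 'Smelt', 'Pike', 'Whitefish', 'Perch']

# Fraction of these five samples that were classified correctly.
accuracy = sum(p == a for p, a in zip(predicted, answers)) / len(answers)
print(accuracy)
```

Under that assumed order, the true class Whitefish is only the third most probable for this sample, behind Roach and Perch.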


Computing the probabilities directly with the softmax function

In [17]:
from scipy.special import softmax
decision = lr.decision_function(test_scaled[:5])
proba = softmax(decision, axis=1)
print(proba)

[[7.24962119e-06 1.35114379e-02 8.41283507e-01 3.14327385e-04
  1.35660262e-01 6.67136890e-03 2.55184724e-03]
 [7.14996969e-09 2.55552480e-03 4.39079515e-02 3.37984180e-05
  7.30920812e-03 9.46188255e-01 5.25482718e-06]
 [1.86538035e-05 2.79615816e-06 3.40568865e-02 9.34811277e-01
  1.50466238e-02 1.60341031e-02 2.96594224e-05]
 [1.09323066e-02 3.40497502e-02 3.05540604e-01 6.60901798e-03
  5.66580634e-01 6.87213267e-05 7.62189659e-02]
 [4.48967205e-06 3.67276045e-04 9.04006272e-01 2.41283576e-03
  8.94706737e-02 2.40961911e-03 1.32883359e-03]]
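
The result is identical to the `predict_proba` output for the same samples. For reference, softmax turns a row of decision values z_1, ..., z_k into probabilities e^(z_i) / (e^(z_1) + ... + e^(z_k)). Below is a minimal pure-Python sketch of that formula for a single row; the input scores are made up for illustration and are not the notebook's decision values.

```python
import math

def softmax(scores):
    """Return e^(z_i) / sum_j e^(z_j) for one row of scores."""
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical decision values for three classes, not taken from the notebook.
z = [1.0, 2.0, 3.0]
probs = softmax(z)

print(probs)
print(sum(probs))
```

The largest score always receives the largest probability, and each row of probabilities sums to 1, which is why `axis=1` is needed when applying `scipy.special.softmax` to the two-dimensional `decision` array.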
