<a href="https://colab.research.google.com/github/a22106/ImageClassification/blob/main/DeeplearningPyTorch/Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Classification Theory
* Regression: find a line that is as close as possible to the data points
* Classification: find a line that separates the data points into classes

## Basic Steps
1. Load in some data: `X, Y = get_data()`
2. Instantiate the model: `model = MyLinearClassifier()`
3. Train the model: `model.fit(X, Y)`
4. Make predictions: `model.predict(X_test)`
5. Evaluate accuracy: `model.score(X, Y)`
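The five steps above can be sketched with scikit-learn's `LogisticRegression`, which exposes the same `fit` / `predict` / `score` methods. This is a minimal sketch under assumptions. `get_data()` and `MyLinearClassifier` are hypothetical names from the outline, so synthetic data from `make_classification` and `LogisticRegression` stand in for them. The data is also split so that `score` can report both train and test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Load in some data (synthetic 2-feature data stands in for the hypothetical get_data())
X, Y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# 2. Instantiate the model (stands in for the hypothetical MyLinearClassifier)
model = LogisticRegression()

# 3. Train the model
model.fit(X_train, Y_train)

# 4. Make predictions: each prediction is a class label, 0 or 1
predictions = model.predict(X_test)

# 5. Evaluate accuracy on both the training and the test split
train_acc = model.score(X_train, Y_train)
test_acc = model.score(X_test, Y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```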

# Accuracy
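* Each prediction is either right or wrong, so we measure the fraction of correct predictions
* We compute both the train accuracy (on the training set) and the test accuracy (on held-out data)

$$\text{accuracy} = \frac{\#\ \text{correct}}{\#\ \text{total}}$$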
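As a quick check of the formula, here is a minimal NumPy sketch. `accuracy` is a hypothetical helper name, and the label arrays are made up for illustration.

```python
import numpy as np

def accuracy(targets, predictions):
    """Fraction of predictions that exactly match the targets (correct / total)."""
    targets = np.asarray(targets)
    predictions = np.asarray(predictions)
    return np.mean(targets == predictions)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 correct out of 4 -> 0.75
```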

# Line
* Separates the data points into classes

$$w_1x_1 + w_2x_2 + b=0$$

![](http://i.imgur.com/aYznYzr.png)

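# Why make things complicated?
* Why not just write the line as $y = mx + b$? Because $y$ is now the target: it represents the color (class) of each data point, so both axes are inputs, $x_1$ and $x_2$.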

# Exercise
* Rearrange into the form $y = mx + b'$ (use $b'$ to distinguish it from the original $b$)

$$w_1x_1 + w_2x_2 + b = 0$$

$$\Rightarrow\ x_2 = \left(-\frac{w_1}{w_2}\right)x_1 + \left(-\frac{b}{w_2}\right) = mx_1 + b'$$
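A worked check of the rearrangement, as a small Python sketch (`line_slope_intercept` is a hypothetical helper). With $w_1 = 1$, $w_2 = 2$, $b = -4$, the line $x_1 + 2x_2 - 4 = 0$ becomes $x_2 = -0.5x_1 + 2$. Because the derivation divides by $w_2$, this form only exists when $w_2 \neq 0$.

```python
def line_slope_intercept(w1, w2, b):
    """Rewrite w1*x1 + w2*x2 + b = 0 as x2 = m*x1 + b_prime."""
    if w2 == 0:
        raise ValueError("w2 must be nonzero to solve for x2")
    m = -w1 / w2        # slope
    b_prime = -b / w2   # intercept b'
    return m, b_prime

m, b_prime = line_slope_intercept(1.0, 2.0, -4.0)
print(m, b_prime)  # -0.5 2.0
```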



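# The Decision Rule
* How do we use the line to classify?

Let $a = w_1x_1 + w_2x_2 + b$. Then

$$\hat{y} = \begin{cases} 1 & \text{if } a \geq 0 \\ 0 & \text{if } a < 0 \end{cases}$$

* With this line (equation), building a prediction model is straightforward
* Plug in $x$: if the point lies on or above the line, predict 1; if it lies below, predict 0 (this above/below reading assumes $w_2 > 0$)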

![](http://i.imgur.com/aYznYzr.png)
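The decision rule can be sketched in NumPy by computing $a$ for every row of an input matrix at once. The weights, bias, and points below are hypothetical values chosen so that one point falls on each side of the line and one falls exactly on it.

```python
import numpy as np

def predict(X, w, b):
    """Hard decision rule: predict 1 where a = w1*x1 + w2*x2 + b >= 0, else 0."""
    a = X @ w + b
    return (a >= 0).astype(int)

w = np.array([1.0, 2.0])   # hypothetical weights w1, w2
b = -4.0                   # hypothetical bias
X = np.array([[0.0, 3.0],  # a = 0 + 6 - 4 = 2  -> 1
              [0.0, 1.0],  # a = 0 + 2 - 4 = -2 -> 0
              [4.0, 0.0]]) # a = 4 + 0 - 4 = 0  -> 1 (on the line)
print(predict(X, w, b))  # [1 0 1]
```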

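* This can be written more compactly with the step function $u$ ($u(a) = 1$ if $a \geq 0$, else $0$):
$$\hat{y}=u(a),\ a=w_1x_1 + w_2x_2+b$$
* In deep learning, the step is usually replaced by the smooth sigmoid $\sigma$:
$$\hat{y}=\sigma(a),\ a=w_1x_1+w_2x_2+b$$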
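A sketch comparing the two activations on a few values of $a$ (both function names are our own, not a library API). The sigmoid is a smooth, increasing version of the step: it returns exactly 0.5 at $a = 0$ and approaches 0 and 1 as $a$ moves away from 0.

```python
import numpy as np

def step(a):
    """Heaviside step u(a): 1 if a >= 0 else 0."""
    return (np.asarray(a) >= 0).astype(float)

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), a smooth version of the step."""
    return 1.0 / (1.0 + np.exp(-np.asarray(a, dtype=float)))

a = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(step(a))
print(np.round(sigmoid(a), 3))
```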

* The functions $u$ and $\sigma$ applied to $a$ are called activation functions
![](https://miro.medium.com/max/666/1*nrxtwp6rzqdFhgYh0x-eVw.png)

## Probabilistic Interpretation
* The sigmoid output is interpreted as the probability that $y = 1$ given $x$:
$$p(y=1|x)=\sigma(w_1x_1+w_2x_2+b)$$

* Since $\sigma(a) \geq 0.5$ exactly when $a \geq 0$, the decision rule becomes:
$$\hat{y} = \begin{cases} 1 & \text{if } p(y=1|x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$
* A model of this form, a linear function passed through the sigmoid, is called logistic regression
* The sigmoid is also called the logistic function
* The input to the sigmoid, $a = w_1x_1+w_2x_2+b$, is called the `logit` (or the `activation`)
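The logit and the probability are two views of the same quantity: the sigmoid maps the logit $a$ to $p = \sigma(a)$, and the log-odds $\log\frac{p}{1-p}$ maps it back. A NumPy sketch (helper names are illustrative) also checks that thresholding $p$ at 0.5 gives the same decisions as thresholding $a$ at 0.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(p):
    """Inverse of the sigmoid: the log-odds log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # logits
p = sigmoid(a)                               # p(y=1|x)
print(np.allclose(logit(p), a))              # True: logit undoes the sigmoid
print((p >= 0.5).astype(int))                # [0 0 1 1 1]
print((a >= 0).astype(int))                  # [0 0 1 1 1]: same decisions
```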







# Logistic Regression with > 2 Inputs
* What if we have more than 2 inputs?
* Or hundreds or thousands of them?
* Use vector notation: collect the $D$ weights into $w$ and the $D$ inputs into $x$

$$p(y=1|x)=\sigma(w^Tx+b)=\sigma\left(\sum^D_{d=1}w_dx_d+b\right)$$

<img width="200" src="https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FI1HTv%2FbtrGlhfxtUl%2FOYFG3okB5dHvIDSJRPFEs0%2Fimg.png">
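In PyTorch, $\sigma(w^Tx+b)$ for many inputs at once is what `nn.Linear(D, 1)` followed by `torch.sigmoid` computes. The sketch below, with hypothetical sizes `N = 5`, `D = 30` and random inputs, checks that the module's output matches the sum written out as a matrix-vector product.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D = 5, 30                 # hypothetical sizes: 5 samples, 30 input features
X = torch.randn(N, D)

# nn.Linear(D, 1) stores w as a (1, D) weight matrix and b as a (1,) bias
linear = nn.Linear(D, 1)
p_module = torch.sigmoid(linear(X))          # shape (N, 1)

# The same computation written out as sigma(w^T x + b) for every row of X
w = linear.weight.view(-1)                   # shape (D,)
b = linear.bias                              # shape (1,)
p_manual = torch.sigmoid(X @ w + b)          # shape (N,)

print(torch.allclose(p_module.view(-1), p_manual))  # True
```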


# Loss
* In linear regression, targets and predictions are real numbers, so we use the Mean Squared Error
* In classification, the target is a category (class)
* Multiclass classification: cross-entropy loss
* Binary classification: binary cross-entropy loss
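For binary targets $y_i \in \{0, 1\}$ and predicted probabilities $\hat{p}_i = p(y_i=1|x_i)$, the binary cross-entropy averaged over $N$ samples is

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\right]$$

A small sketch, with made-up probabilities, compares this formula written by hand against PyTorch's `nn.BCELoss`, which expects probabilities (outputs after the sigmoid).

```python
import torch
import torch.nn as nn

targets = torch.tensor([1.0, 0.0, 1.0, 0.0])
probs = torch.tensor([0.9, 0.2, 0.6, 0.4])   # hypothetical model outputs p(y=1|x)

# Binary cross-entropy written out by hand (mean over samples)
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()

# PyTorch's built-in version: BCELoss(input_probabilities, targets)
builtin = nn.BCELoss()(probs, targets)

print(manual.item(), builtin.item())
```

If the model outputs logits instead of probabilities, `nn.BCEWithLogitsLoss` combines the sigmoid and the loss in a single, numerically more stable step.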

In [1]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

In [2]:
from sklearn.datasets import load_breast_cancer

In [3]:
data = load_breast_cancer()

In [4]:
type(data)

sklearn.utils.Bunch

In [5]:
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [6]:
data.data.shape

(569, 30)

In [7]:
data.target_names

array(['malignant', 'benign'], dtype='<U9')

In [8]:
data.target.shape

(569,)
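Putting the pieces together, here is a minimal sketch of training logistic regression on this dataset: `nn.Linear` plus `nn.Sigmoid`, trained with binary cross-entropy, then scored with accuracy. The split size, feature standardization, Adam optimizer, learning rate, and epoch count are all assumptions of this sketch, not settings taken from the notebook.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

torch.manual_seed(0)

# Split the 569 samples into train and test sets (test_size is an assumption)
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.33, random_state=42)

# Standardize features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# PyTorch expects float32 tensors; targets reshaped to (N, 1) to match the model output
X_train_t = torch.from_numpy(X_train.astype(np.float32))
X_test_t = torch.from_numpy(X_test.astype(np.float32))
y_train_t = torch.from_numpy(y_train.astype(np.float32).reshape(-1, 1))

# Logistic regression: sigma(w^T x + b) with D = 30 inputs
N, D = X_train.shape
model = nn.Sequential(nn.Linear(D, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Full-batch training loop (epoch count is an assumption)
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

# Accuracy = correct / total, thresholding p(y=1|x) at 0.5
with torch.no_grad():
    p_train = model(X_train_t).numpy().ravel()
    p_test = model(X_test_t).numpy().ravel()
train_acc = np.mean((p_train >= 0.5) == y_train)
test_acc = np.mean((p_test >= 0.5) == y_test)
print(f"train acc: {train_acc:.3f}, test acc: {test_acc:.3f}")
```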

In [None]:
data.data.cla