# loss categorical_crossentropy

There are two ways to use cross-entropy loss:
- categorical_crossentropy
- sparse_categorical_crossentropy

## categorical_crossentropy
Use this when the y values are one-hot encoded:
```
1,0,0
0,1,0
0,0,1
```

Output layer setup
```
model.add(Dense(3, activation="softmax")) # output layer: one unit per class
```
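
To make the role of the 3-unit softmax layer concrete, here is a minimal NumPy sketch of what the softmax activation computes. The `logits` values are made up for illustration; the point is only that each row becomes a probability distribution over the 3 classes.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum does not change the result but avoids overflow.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for 2 samples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax(logits)
print(probs)              # one probability per class, per sample
print(probs.sum(axis=1))  # each row sums to 1
```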

Loss setup
```
model.compile(..., loss='categorical_crossentropy')
```
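
If the labels are stored as integer class indices, they have to be converted to one-hot rows before `categorical_crossentropy` can be used. The sketch below does this with plain NumPy; the `labels` array is a made-up example. Keras also ships `tf.keras.utils.to_categorical`, which performs the same kind of conversion.

```python
import numpy as np

# Hypothetical integer class labels for 4 samples and 3 classes.
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels yields one row per sample.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot)
```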


## sparse_categorical_crossentropy
Use this when the y values are integer class indices (not one-hot encoded):
```
0
1
2
```

Output layer setup
```
model.add(Dense(3, activation="softmax")) # output layer: 3 units (the number of classes), not 1
```

Loss setup
```
model.compile(..., loss='sparse_categorical_crossentropy')
```
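
The two losses compute the same quantity and differ only in the label format they expect. The NumPy sketch below illustrates this with made-up predictions. The function names mirror the Keras loss names, but these are simplified re-implementations for illustration (no clipping against `log(0)`), not the Keras code.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    """Per-sample loss when labels are one-hot rows."""
    return -np.sum(y_onehot * np.log(probs), axis=1)

def sparse_categorical_crossentropy(y_index, probs):
    """Per-sample loss when labels are integer class indices."""
    # Pick the predicted probability of the true class for each row.
    return -np.log(probs[np.arange(len(y_index)), y_index])

# Hypothetical softmax outputs for 3 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
y_index = np.array([0, 1, 2])   # integer labels
y_onehot = np.eye(3)[y_index]   # the same labels, one-hot encoded

loss_cat = categorical_crossentropy(y_onehot, probs)
loss_sparse = sparse_categorical_crossentropy(y_index, probs)
print(loss_cat)
print(loss_sparse)
print(np.allclose(loss_cat, loss_sparse))
```

Because the values agree, the choice between the two comes down to how y is stored: the sparse variant avoids building the one-hot matrix.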





# iris_dnn with category index

The code below is based on the code in dnn_iris_and_optimizer.ipynb.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.layers import Dense

In [2]:
# !wget https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris_with_category_index.csv
  

--2021-11-26 06:32:44--  https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/deep_learning/iris_with_category_index.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2218 (2.2K) [text/plain]
Saving to: ‘iris_with_category_index.csv’


2021-11-26 06:32:44 (34.8 MB/s) - ‘iris_with_category_index.csv’ saved [2218/2218]



### Download the practice data

In [5]:
!wget https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/library/flawed_iris.csv

--2021-11-26 06:39:31--  https://raw.githubusercontent.com/dhrim/MDC_2021/master/material/library/flawed_iris.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2782 (2.7K) [text/plain]
Saving to: ‘flawed_iris.csv’


2021-11-26 06:39:32 (48.0 MB/s) - ‘flawed_iris.csv’ saved [2782/2782]



```
Drop the added column, or process the categorical data
Remove missing values and outliers
```

In [39]:
flawed_iris = pd.read_csv("flawed_iris.csv")
flawed_iris.head()

index,septal_length,septal_width,petal_length,petal_width,color,class
0,6.4,2.8,5.6,2.2,light,2.0
1,5.0,2.3,3.3,1.0,medium,1.0
2,4.9,2.5,4.5,1.7,medium,2.0
3,4.9,3.1,1.5,0.1,dark,0.0
4,5.7,3.8,1.7,0.3,dark,0.0


In [40]:
flawed_iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  117 non-null    float64
 1   septal_width   118 non-null    object 
 2   petal_length   117 non-null    float64
 3   petal_width    118 non-null    float64
 4   color          117 non-null    object 
 5   class          119 non-null    float64
dtypes: float64(4), object(2)
memory usage: 5.8+ KB


In [41]:
flawed_iris.describe()

statistic,septal_length,petal_length,petal_width,class
count,117.0,117.0,118.0,119.0
mean,5.809402,3.523077,1.683051,0.957983
std,1.597735,2.102682,3.172567,0.817136
min,-5.8,-6.1,-1.0,0.0
25%,5.0,1.5,0.3,0.0
50%,5.8,4.2,1.35,1.0
75%,6.4,5.1,1.9,2.0
max,14.5,6.9,23.3,2.0


In [42]:
flawed_iris.isnull().any()

septal_length    True
septal_width     True
petal_length     True
petal_width      True
color            True
class            True
dtype: bool

In [43]:
flawed_iris.isnull().sum()

septal_length    3
septal_width     2
petal_length     3
petal_width      2
color            3
class            1
dtype: int64

In [44]:
# Drop the categorical color column
flawed_iris.drop(columns='color', inplace=True)
flawed_iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  117 non-null    float64
 1   septal_width   118 non-null    object 
 2   petal_length   117 non-null    float64
 3   petal_width    118 non-null    float64
 4   class          119 non-null    float64
dtypes: float64(4), object(1)
memory usage: 4.8+ KB


In [45]:
# Drop every row that contains a null value
flawed_iris.dropna(inplace=True)
flawed_iris.isnull().any()

septal_length    False
septal_width     False
petal_length     False
petal_width      False
class            False
dtype: bool

In [46]:
flawed_iris.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 109 entries, 0 to 119
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  109 non-null    float64
 1   septal_width   109 non-null    object 
 2   petal_length   109 non-null    float64
 3   petal_width    109 non-null    float64
 4   class          109 non-null    float64
dtypes: float64(4), object(1)
memory usage: 5.1+ KB


In [49]:
flawed_iris.septal_width = flawed_iris.septal_width.astype(float)
flawed_iris.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 109 entries, 0 to 119
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  109 non-null    float64
 1   septal_width   109 non-null    float64
 2   petal_length   109 non-null    float64
 3   petal_width    109 non-null    float64
 4   class          109 non-null    float64
dtypes: float64(5)
memory usage: 5.1 KB


In [50]:
flawed_iris['class'] = flawed_iris['class'].astype(int)
flawed_iris.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 109 entries, 0 to 119
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   septal_length  109 non-null    float64
 1   septal_width   109 non-null    float64
 2   petal_length   109 non-null    float64
 3   petal_width    109 non-null    float64
 4   class          109 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 5.1 KB


### ML


In [51]:
data = flawed_iris.to_numpy()
print(data.shape)
print(data[:5])

(109, 5)
[[6.4 2.8 5.6 2.2 2. ]
 [5.  2.3 3.3 1.  1. ]
 [4.9 2.5 4.5 1.7 2. ]
 [4.9 3.1 1.5 0.1 0. ]
 [5.7 3.8 1.7 0.3 0. ]]
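
The notes above list outlier removal as a step, but the cells shown only drop missing values, and the earlier `describe()` output contains negative minimums (for example -5.8) and a `petal_width` maximum of 23.3. The sketch below shows one possible way to filter such rows from an array laid out like `data` (4 feature columns followed by the class index). The helper name, the stand-in rows, and the 0 to 10 range are all assumptions chosen for illustration.

```python
import numpy as np

def drop_out_of_range_rows(data, low, high, feature_cols=4):
    """Keep only rows whose first `feature_cols` values all lie strictly between low and high."""
    features = data[:, :feature_cols]
    keep = np.all((features > low) & (features < high), axis=1)
    return data[keep]

# Small stand-in array with the same column layout as `data`;
# the two bad rows are invented to mimic the extremes seen in describe().
sample = np.array([[6.4, 2.8, 5.6, 2.2, 2.0],
                   [-5.8, 3.0, 1.4, 0.2, 0.0],   # negative length
                   [5.0, 2.3, 3.3, 23.3, 1.0],   # implausibly large width
                   [4.9, 3.1, 1.5, 0.1, 0.0]])

# Assumed plausible range for iris measurements in cm: 0 < value < 10.
cleaned = drop_out_of_range_rows(sample, low=0.0, high=10.0)
print(cleaned.shape)
```

A fixed range is the simplest choice here; a rule based on quantiles would be an alternative when no sensible physical bounds are known.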


In [52]:
x = data[:,:4]
y = data[:,4:]

split_index = 100

train_x, test_x = x[:split_index], x[split_index:]
train_y, test_y = y[:split_index], y[split_index:]

print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

(100, 4)
(100, 1)
(9, 4)
(9, 1)
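
The split above takes the first 100 rows for training and the remaining 9 for testing. That works when the file is already in random order, which the first few rows suggest, but if the rows were sorted by class the test set would contain a single class. A hedged sketch of a shuffled split follows; `shuffled_split` and the stand-in arrays are hypothetical and only reuse the shapes printed above.

```python
import numpy as np

def shuffled_split(x, y, n_train, seed=0):
    """Shuffle x and y with the same permutation, then split at n_train."""
    rng = np.random.default_rng(seed)   # fixed seed keeps the split reproducible
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    return x[:n_train], x[n_train:], y[:n_train], y[n_train:]

# Stand-in arrays with the same shapes as x and y: (109, 4) and (109, 1).
x_demo = np.arange(109 * 4, dtype=float).reshape(109, 4)
y_demo = (np.arange(109) % 3).reshape(109, 1)

tr_x, te_x, tr_y, te_y = shuffled_split(x_demo, y_demo, n_train=100)
print(tr_x.shape, tr_y.shape, te_x.shape, te_y.shape)
```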


In [57]:
model = keras.Sequential()
model.add(Dense(10, activation='relu', input_shape=(4,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation="softmax")) # 3 units (the number of classes), not 1

# model.compile(optimizer="SGD", loss="categorical_crossentropy", metrics=["accuracy"])
model.compile(optimizer="SGD", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()

model.fit(train_x, train_y, epochs=1000, verbose=0, batch_size=20)

loss, acc = model.evaluate(test_x, test_y)
print("loss=", loss)
print("acc=", acc)
              


Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_18 (Dense)            (None, 10)                50        
                                                                 
 dense_19 (Dense)            (None, 10)                110       
                                                                 
 dense_20 (Dense)            (None, 3)                 33        
                                                                 
Total params: 193
Trainable params: 193
Non-trainable params: 0
_________________________________________________________________
loss= 0.13655050098896027
acc= 0.8888888955116272


In [55]:
y_ = model.predict(test_x)
print(y_)
print(np.argmax(y_, axis=1))

[[2.0619889e-06 3.7133968e-01 6.2865829e-01]
 [6.2921448e-05 9.6769321e-01 3.2243997e-02]
 [9.9377257e-01 1.5139990e-03 4.7134534e-03]
 [5.1070299e-09 1.4416112e-02 9.8558390e-01]
 [9.9822420e-01 4.8085640e-04 1.2949773e-03]
 [1.6331625e-05 8.1785947e-01 1.8212414e-01]
 [9.9514019e-01 1.3507839e-03 3.5089601e-03]
 [9.9605042e-01 9.4928429e-04 3.0002440e-03]
 [1.0254380e-02 7.2422260e-01 2.6552305e-01]]
[2 1 0 2 0 1 0 0 1]
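
To check the predicted class indices against the true labels by hand, the `argmax` result can be compared with the label column. The sketch below uses the first three probability rows printed above. The `true_y` values are hypothetical, since the actual `test_y` is not shown, and are given shape (n, 1) to match how `y` was sliced.

```python
import numpy as np

# First three probability rows copied from the prediction output.
probs = np.array([[2.0619889e-06, 3.7133968e-01, 6.2865829e-01],
                  [6.2921448e-05, 9.6769321e-01, 3.2243997e-02],
                  [9.9377257e-01, 1.5139990e-03, 4.7134534e-03]])

# Hypothetical true labels (NOT the real test_y), shaped (n, 1).
true_y = np.array([[2], [1], [1]])

pred = np.argmax(probs, axis=1)   # shape (n,)
# Flatten the labels first: comparing (n,) with (n, 1) would broadcast to (n, n).
correct = pred == true_y.ravel()
print(pred)
print(correct.mean())             # fraction of matching predictions
```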
