# The data used here is the Prayer of the Faithful (보편지향기도) dataset created by Brothers 김도휘 and 김명찬.

In [195]:
import pandas as pd
from sklearn.model_selection import train_test_split

## Read the prayer texts from the CSV file
def read_data(path_to_file):
    df = pd.read_csv(path_to_file, dtype=str)
    return df

df = read_data('../../data/pray456_v3.csv')

In [196]:
# Save a copy; to_csv writes the DataFrame index as the first column, which serves as a row id
df.to_csv('../../data/pray456_v3withid.csv')

In [197]:
X = list(df['content'])
y = list(df['label'])
print(len(X))
print(len(y))

774
774


In [198]:
X[0]

'주님, 대림시기를 맞는 교회가 회개와 화해의 생활을 하며 저희에게  오실 아기 예수님을 기쁜 마음으로 맞이할 수 있도록 도와주소서.'

## Encode the labels (y)

In [199]:
from sklearn.preprocessing import LabelEncoder

print(type(y[0]), y[:5])

# Integer-encode the string labels ('1'..'4' become 0..3)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(y)
print(integer_encoded[:5])
y = integer_encoded
print(y[:5])

<class 'str'> ['1', '2', '3', '4', '1']
[0 1 2 3 0]
[0 1 2 3 0]
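
`LabelEncoder` sorts the distinct labels and numbers them from 0, so any integer prediction made later has to be mapped back before it can be compared with the labels in the CSV. A minimal sketch of that round trip, using a made-up `toy_labels` list rather than the dataset itself:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels in the same '1'..'4' string format as the CSV
toy_labels = ['1', '2', '3', '4', '1']

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_labels)  # classes are sorted, then numbered from 0

# classes_[i] is the original label that integer code i stands for
print(list(encoder.classes_))
print(list(encoded))

# Map integer codes (for example, model predictions) back to the original labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted encoder around (here `encoder`, in the notebook `label_encoder`) is what makes the reverse mapping possible.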


## Split to train/test

In [200]:
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1212)
print(len(x_train), len(x_test), len(y_train), len(y_test))
print(y_train[:5])

619 155 619 155
[3 2 3 1 0]


In [201]:
len(x_train), len(y_train), len(x_test), len(y_test)

(619, 619, 155, 155)
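
The split above is purely random, so the class proportions in the 155 test rows may differ somewhat from those in the full data. One option, shown here only as a sketch on made-up toy data, is the `stratify` argument of `train_test_split`, which keeps the per-class proportions the same in both parts:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy data: 40 samples in 4 balanced classes (hypothetical, not the prayer dataset)
toy_X = [f"sample {i}" for i in range(40)]
toy_y = [i % 4 for i in range(40)]

tr_X, te_X, tr_y, te_y = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=1212, stratify=toy_y
)

# With stratification, each class contributes equally to the 8 test samples
print(len(tr_X), len(te_X))
print(Counter(te_y))
```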

In [202]:
x_train[0]

'주님, 정신적 육체적으로 고통받는 형제들을 위하여 기도하오니  주님께서 몸소 그들을 위로하여 주시고  저희가 그들의 어려움을 함께 나누며 살아갈 수 있도록 도와주소서.'

## Build the TF-IDF matrix

In [203]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [204]:
tfidf = TfidfVectorizer()

In [205]:
x_train = tfidf.fit_transform(x_train)
x_test = tfidf.transform(x_test)

In [206]:
print(x_train.shape)

(619, 2877)
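
With its default settings, `TfidfVectorizer` splits text into word tokens, so every inflected Korean form (a stem plus its particle or ending) becomes a separate feature. A possible alternative, which this notebook does not use, is character n-grams via `analyzer="char_wb"`. The sketch below compares the two on two made-up sentences; the `ngram_range` value is an arbitrary choice for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sentences, not taken from the dataset
docs = ["주님 교회를 도와주소서", "주님 교회가 하나 되게 하소서"]

word_tfidf = TfidfVectorizer()  # default: word tokens of two or more characters
char_tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))

word_matrix = word_tfidf.fit_transform(docs)
char_matrix = char_tfidf.fit_transform(docs)

# '교회를' and '교회가' are distinct word features,
# but both contain the character bigram '교회'
print(sorted(word_tfidf.vocabulary_))
print("교회" in char_tfidf.vocabulary_)
print(word_matrix.shape, char_matrix.shape)
```

Character n-grams usually produce more columns than word tokens, so whether they help on this dataset would have to be checked empirically.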


## Logistic Regression

In [207]:
from sklearn.linear_model import LogisticRegression

In [208]:
model = LogisticRegression()

## Train

In [209]:
model.fit(x_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
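
The vectorizer and the classifier are fitted in separate cells above, which means new text must be passed through `tfidf.transform` by hand before calling `model.predict`. A `Pipeline` is one way to bundle the two steps. The sketch below uses a tiny made-up English corpus purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus with two classes (hypothetical, for illustration only)
texts = [
    "church unity prayer",
    "church faith prayer",
    "world peace leaders",
    "world leaders justice",
]
labels = [0, 0, 1, 1]

# fit() learns the vocabulary and the classifier weights in one call
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# predict() vectorizes raw text with the fitted vocabulary before classifying
print(clf.predict(["prayer for the church", "peace for world leaders"]))
```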

## Test

In [210]:
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

In [211]:
predict = model.predict(x_test)
accuracy = accuracy_score(y_test, predict)

In [212]:
print('Accuracy : ',accuracy)
print(classification_report(y_test, predict))

Accuracy :  0.8193548387096774
             precision    recall  f1-score   support

          0       0.86      1.00      0.93        38
          1       0.89      0.79      0.84        42
          2       0.80      0.78      0.79        41
          3       0.71      0.71      0.71        34

avg / total       0.82      0.82      0.82       155
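
The report gives per-class precision and recall but does not show which classes are confused with each other. A confusion matrix fills that gap. The sketch below uses made-up label arrays, not the predictions from this notebook:

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for four classes
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_hat = [0, 0, 1, 3, 2, 3, 3, 2]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)

# The diagonal holds the correct predictions, so its share of the total is the accuracy
print(cm.trace() / cm.sum())
```

In the notebook, the equivalent call would take `y_test` and `predict` as its two arguments.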



## Train with Pytorch MLP

In [213]:
import math

import torch
from torch import nn, optim
import torch.nn.functional as F

lr = 0.5  # learning rate passed to Adam below (PyTorch's default for Adam is 1e-3)
epochs = 10  # how many epochs to train for
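
The learning rate is fixed at a single value here. One way to choose it, sketched below on synthetic data with a plain linear classifier (both are assumptions for illustration, not the model or data of this notebook), is to train briefly with several candidates and compare the resulting loss:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Synthetic two-class data (hypothetical): the class is the sign of the first feature
features = torch.randn(200, 10)
targets = (features[:, 0] > 0).long()


def final_loss(learning_rate, steps=50):
    """Train a fresh linear classifier with Adam and return the last training loss."""
    torch.manual_seed(0)  # identical initial weights for every candidate
    net = nn.Linear(10, 2)
    opt = optim.Adam(net.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(features), targets)
        loss.backward()
        opt.step()
    return loss.item()


for candidate in (1e-3, 1e-2, 1e-1):
    print(candidate, final_loss(candidate))
```

Which value works best depends on the model and the data, so the ranking on this toy problem should not be read as a recommendation for the MLP trained later.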


In [214]:
# Convert the sparse TF-IDF matrices to dense NumPy arrays
x_train = x_train.toarray()
x_test = x_test.toarray()


In [216]:
print(type(x_train), x_train[:5])
print(type(y_train), y_train[:5])

<class 'numpy.ndarray'> [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
<class 'numpy.ndarray'> [3 2 3 1 0]


In [221]:
# Convert to torch tensors
x_train, y_train, x_test, y_test = map(
    torch.tensor, (x_train, y_train, x_test, y_test)
)

x_train = x_train.float()
x_test = x_test.float()

print(x_train.shape, x_test.shape)

torch.Size([619, 2877]) torch.Size([155, 2877])
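
The tensors above are later fed to the model in one piece (full-batch training). If mini-batches are wanted instead, `TensorDataset` and `DataLoader` can wrap the same tensors. A minimal sketch with placeholder tensors; the batch size of 4 is arbitrary:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors with the same column count as the TF-IDF matrix
dummy_x = torch.zeros(10, 2877)
dummy_y = torch.zeros(10, dtype=torch.long)  # CrossEntropyLoss expects integer (long) labels

loader = DataLoader(TensorDataset(dummy_x, dummy_y), batch_size=4, shuffle=True)

# 10 rows with batch_size=4 yield batches of 4, 4 and 2 rows
for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)
```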




In [222]:

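class MultiLayerPerceptron(nn.Module):
    """One-hidden-layer MLP: 2877 TF-IDF features -> 2000 hidden units -> 4 classes."""

    def __init__(self):
        super().__init__()
        self.fc_hidden = nn.Linear(2877, 2000)  # input size = TF-IDF vocabulary size
        self.relu = nn.ReLU()  # instead of the Heaviside step function
        self.fc_output = nn.Linear(2000, 4)  # one logit per class

    def forward(self, x):
        output = self.fc_hidden(x)
        output = self.relu(output)
        output = self.fc_output(output)
        return output  # raw logits; no softmax is applied here


def accuracy(out, y):
    """Return the fraction of rows whose arg-max logit matches the label."""
    preds = torch.argmax(out, dim=1)
    return (preds == y).float().mean()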
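
The class above hard-codes the input width of 2877, which only matches this particular TF-IDF vocabulary. A variant that takes its sizes as arguments could look like the sketch below; the name `ConfigurableMLP` and the hidden size of 64 are hypothetical choices, not part of the notebook:

```python
import torch
from torch import nn


class ConfigurableMLP(nn.Module):  # hypothetical name
    """One-hidden-layer MLP whose layer sizes are passed in by the caller."""

    def __init__(self, in_features, hidden_features, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden_features),
            nn.ReLU(),
            nn.Linear(hidden_features, num_classes),
        )

    def forward(self, x):
        # Returns raw logits; nn.CrossEntropyLoss applies log-softmax internally
        return self.net(x)


# Read the input width from the data instead of typing it in
dummy = torch.zeros(5, 2877)
mlp = ConfigurableMLP(in_features=dummy.shape[1], hidden_features=64, num_classes=4)
logits = mlp(dummy)
print(logits.shape)
```

If the vectorizer is refitted and the vocabulary changes, only the tensor passed in changes and the model definition stays the same.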


In [223]:

model = MultiLayerPerceptron()
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.0061,  0.0070, -0.0039,  ...,  0.0146,  0.0164,  0.0029],
        [-0.0155,  0.0058, -0.0077,  ...,  0.0048, -0.0105, -0.0028],
        [-0.0142, -0.0164,  0.0131,  ...,  0.0051,  0.0110, -0.0015],
        ...,
        [ 0.0040,  0.0032,  0.0037,  ...,  0.0106, -0.0078, -0.0070],
        [-0.0165,  0.0021,  0.0096,  ..., -0.0127,  0.0133, -0.0153],
        [ 0.0063, -0.0147, -0.0052,  ..., -0.0167,  0.0061,  0.0004]],
       requires_grad=True)
Parameter containing:
tensor([-0.0058,  0.0171,  0.0057,  ..., -0.0184, -0.0048, -0.0126],
       requires_grad=True)
Parameter containing:
tensor([[ 0.0015,  0.0113,  0.0058,  ..., -0.0176,  0.0194, -0.0005],
        [ 0.0020,  0.0025,  0.0067,  ..., -0.0087, -0.0220,  0.0061],
        [-0.0126,  0.0159, -0.0052,  ...,  0.0121, -0.0117, -0.0155],
        [-0.0158, -0.0111,  0.0079,  ...,  0.0091,  0.0208,  0.0061]],
       requires_grad=True)
Parameter containing:
tensor([-0.0166,  0.0134, -0.0066, -0.0065], requires_grad=True)

In [224]:
# criterion = nn.BCELoss()
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = optim.Adam(model.parameters(), lr=lr)

model.train()

for epoch in range(epochs):
    optimizer.zero_grad()
    # Forward pass on the full training set (no mini-batches)
    y_pred = model(x_train)

    # Compute loss
    loss = criterion(y_pred, y_train)

    # Backward pass and parameter update
    loss.backward()
    optimizer.step()

    # Log, in order: train loss, train accuracy, test loss, test accuracy
    print("Epoch:", epoch, criterion(model(x_train), y_train), accuracy(model(x_train), y_train), criterion(model(x_test), y_test), accuracy(model(x_test), y_test))


Epoch: 0 tensor(88.1919, grad_fn=<NllLossBackward>) tensor(0.2633) tensor(85.6346, grad_fn=<NllLossBackward>) tensor(0.2452)
Epoch: 1 tensor(356.3346, grad_fn=<NllLossBackward>) tensor(0.6252) tensor(325.7858, grad_fn=<NllLossBackward>) tensor(0.4323)
Epoch: 2 tensor(332.7526, grad_fn=<NllLossBackward>) tensor(0.5089) tensor(393.7271, grad_fn=<NllLossBackward>) tensor(0.4968)
Epoch: 3 tensor(69.9571, grad_fn=<NllLossBackward>) tensor(0.7318) tensor(172.0058, grad_fn=<NllLossBackward>) tensor(0.6387)
Epoch: 4 tensor(3.3112, grad_fn=<NllLossBackward>) tensor(0.9661) tensor(101.9493, grad_fn=<NllLossBackward>) tensor(0.7290)
Epoch: 5 tensor(0.7965, grad_fn=<NllLossBackward>) tensor(0.9935) tensor(82.6395, grad_fn=<NllLossBackward>) tensor(0.7355)
Epoch: 6 tensor(0.6857, grad_fn=<NllLossBackward>) tensor(0.9919) tensor(81.1506, grad_fn=<NllLossBackward>) tensor(0.7419)
Epoch: 7 tensor(0.8406, grad_fn=<NllLossBackward>) tensor(0.9855) tensor(86.5473, grad_fn=<NllLossBackward>) tensor(0.7226