### Naive Bayes Classifier Task
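### Predicting the Emotion Expressed in a Sentence
##### Multiclass Classification
- As a remote (online) counselor, we collected emotion data from patients who sent us messages.
- Each message is labeled with an emotion.
- We build a model to help determine which kind of psychological treatment may suit patients who send the same kind of message in the future.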

In [1]:
import pandas as pd

data = pd.read_csv('./datasets/feeling.csv')
data

Unnamed: 0,message;feeling
0,im feeling quite sad and sorry for myself but ...
1,i feel like i am still looking at a blank canv...
2,i feel like a faithful servant;love
3,i am just feeling cranky and blue;anger
4,i can have for a treat or if i am feeling fest...
...,...
17995,i just had a very brief time in the beanbag an...
17996,i am now turning and i feel pathetic that i am...
17997,i feel strong and good overall;joy
17998,i feel like this was such a rude comment and i...


In [2]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18000 entries, 0 to 17999
Data columns (total 1 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   message;feeling  18000 non-null  object
dtypes: object(1)
memory usage: 140.8+ KB


In [3]:
data.isna().sum()

message;feeling    0
dtype: int64

In [4]:
fe_df = pd.DataFrame(data)

# Split each row on ';' into the message and its emotion, and add them as new columns
fe_df[['Message', 'Feeling']] = fe_df['message;feeling'].str.split(';', expand=True)

# Drop the original combined 'message;feeling' column
fe_df.drop('message;feeling', axis=1, inplace=True)
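
As an alternative to splitting after loading, the delimiter can be given to `pd.read_csv` so the two fields are separated while parsing. The sketch below is only an illustration: it assumes the file is a plain `;`-delimited text file with a `message;feeling` header, and it uses a hypothetical in-memory string built from two rows shown above instead of `./datasets/feeling.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./datasets/feeling.csv (two rows copied from the output above)
csv_text = (
    "message;feeling\n"
    "i feel like a faithful servant;love\n"
    "i am just feeling cranky and blue;anger\n"
)

# sep=';' tells pandas to split the two fields while reading
sample_df = pd.read_csv(io.StringIO(csv_text), sep=';')

# Rename to match the column names used in this notebook
sample_df.columns = ['Message', 'Feeling']
print(sample_df)
```

If a message itself could contain `;`, neither approach is safe without extra handling, so this is worth checking on the real file.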

In [5]:
fe_df

Unnamed: 0,Message,Feeling
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant,love
3,i am just feeling cranky and blue,anger
4,i can have for a treat or if i am feeling festive,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall,joy
17998,i feel like this was such a rude comment and i...,anger


In [6]:
fe_co_df = fe_df.copy()

In [7]:
fe_co_df = fe_co_df.rename(columns={'Feeling': 'Target'})
fe_co_df

Unnamed: 0,Message,Target
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant,love
3,i am just feeling cranky and blue,anger
4,i can have for a treat or if i am feeling festive,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall,joy
17998,i feel like this was such a rude comment and i...,anger


In [8]:
fe_co_df['Target'].value_counts()

Target
joy         6066
sadness     5216
anger       2434
fear        2149
love        1482
surprise     653
Name: count, dtype: int64
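
The counts above show a clear class imbalance. A small sketch that turns them into shares, and into the accuracy a model would get by always predicting the most frequent class, gives a reference point for any accuracy score. The counts are copied from the output above; the variable names are illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({
    'joy': 6066, 'sadness': 5216, 'anger': 2434,
    'fear': 2149, 'love': 1482, 'surprise': 653,
})

# Share of each class in the whole dataset
shares = counts / counts.sum()
print(shares.round(3))

# Accuracy of a trivial model that always predicts the largest class
majority_baseline = shares.max()
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```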

In [9]:
from sklearn.preprocessing import LabelEncoder
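# Encode the emotion labels in the Target column as integers with LabelEncoder.
# (scikit-learn classifiers also accept string labels; integer codes are used here for convenience.)
# Classes are sorted alphabetically: anger=0, fear=1, joy=2, love=3, sadness=4, surprise=5.
feel_encoder = LabelEncoder()
# Encode the Target values and write the integer codes back to the Target column.
targets = feel_encoder.fit_transform(fe_co_df.loc[:, 'Target'])
fe_co_df['Target'] = targets
# print(feel_encoder.classes_)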

In [10]:
# Drop the column that is no longer needed.
# fe_df = fe_df.drop('Feeling', axis=1)
# fe_df

In [11]:
from sklearn.model_selection import train_test_split

# Split the data into training and test sets, stratified by the target.
X_train, X_test, y_train, y_test = \
train_test_split(fe_co_df.Message, 
                 fe_co_df.Target, 
                 stratify=fe_co_df.Target, 
                 test_size=0.2, 
                 random_state=124)
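
To see what `stratify` does, the following sketch splits a small, deliberately imbalanced toy dataset and compares the class proportions in the two parts. The toy data and variable names are hypothetical; only `test_size` and `random_state` mirror the call above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 samples of class 0 and 20 samples of class 1
toy_y = pd.Series([0] * 80 + [1] * 20)
toy_X = pd.Series([f"message {i}" for i in range(100)])

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, stratify=toy_y, test_size=0.2, random_state=124
)

# Both parts keep the 80/20 class ratio of the full data
print(y_tr.value_counts(normalize=True).sort_index())
print(y_te.value_counts(normalize=True).sort_index())
```

Without `stratify`, a rare class such as `surprise` could end up over- or under-represented in the test set purely by chance.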

In [12]:
from sklearn.feature_extraction.text import CountVectorizer

c_vct = CountVectorizer()
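# Exploratory only: fit on the full corpus to inspect the counts and the vocabulary.
# fit_transform accepts any iterable of strings; a plain Python list is passed here.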
freq = c_vct.fit_transform(fe_co_df.Message.tolist())
print(freq.toarray())
print(c_vct.vocabulary_)

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [13]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# CountVectorizer(): converts each text into a vector of word counts
# MultinomialNB(): multinomial Naive Bayes classifier
m_nb_pipe = Pipeline([('count_vectorizer', CountVectorizer()), ('multinomial_NB', MultinomialNB())])
m_nb_pipe.fit(X_train, y_train)
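
Because the pipeline ends in a probabilistic classifier, it can also report a probability for each class rather than just the winning label. The sketch below repeats the same two pipeline steps on a hypothetical four-message toy dataset, which keeps it small enough to check by hand. String labels are used directly here to keep the output readable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy training data
toy_messages = [
    "i feel sad and alone",
    "i feel so sad today",
    "i feel happy and strong",
    "i feel good and happy",
]
toy_targets = ["sadness", "sadness", "joy", "joy"]

toy_pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('multinomial_NB', MultinomialNB()),
])
toy_pipe.fit(toy_messages, toy_targets)

# One probability per class, in the order given by classes_
proba = toy_pipe.predict_proba(["i feel happy"])
print(toy_pipe.classes_)
print(proba.round(3))
```

Looking at `predict_proba` for individual messages can show when the model is nearly undecided between two emotions.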

In [14]:
prediction = m_nb_pipe.predict(X_test)

In [15]:
m_nb_pipe.score(X_test, y_test)

0.7536111111111111
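
A single accuracy figure can hide weak performance on the rarer classes, so a confusion matrix and a macro-averaged F1 score are common complements. The sketch below uses small hypothetical label arrays so the numbers can be verified by hand; in this notebook, `y_test` and the `prediction` array from In [14] would take their place.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical true and predicted labels for three classes
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
# Macro averaging weights every class equally, regardless of its size
macro_f1 = f1_score(y_true, y_pred, average='macro')

print(acc)
print(cm)
print(round(macro_f1, 3))
```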

In [16]:
m_nb_pipe.predict([fe_df.iloc[5450].Message])

array([1])
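
The prediction is an integer code, so it has to be mapped back to an emotion name to be useful. `LabelEncoder` sorts its classes alphabetically, which makes the mapping reproducible. The sketch below refits a separate encoder on the six class names from this dataset; in the notebook itself, `feel_encoder.inverse_transform(...)` would serve the same purpose.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Refit a stand-alone encoder on the six emotion names used in this dataset
encoder = LabelEncoder().fit(['joy', 'sadness', 'anger', 'fear', 'love', 'surprise'])

# Code-to-label mapping (classes are stored in sorted order)
mapping = {code: str(label) for code, label in enumerate(encoder.classes_)}
print(mapping)
# {0: 'anger', 1: 'fear', 2: 'joy', 3: 'love', 4: 'sadness', 5: 'surprise'}

# Decode a predicted code such as the array([1]) shown above
decoded = encoder.inverse_transform(np.array([1]))
print(decoded)
# ['fear']
```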