### Naive Bayes Classifier Task
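### Predicting the Emotion Expressed in a Sentence
##### Multiclass Classification
- As a remote counselor, we collected emotion data from patients who sent messages.
- Each message is labeled with an emotion.
- We build a model to help determine which kind of psychotherapy may suit a patient who sends the same kind of message in the future.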

In [1]:
import pandas as pd
# feel_df = pd.read_csv('./datasets/feeling.csv', sep=';')
feel_df = pd.read_csv('./datasets/feeling.csv')
feel_df

Unnamed: 0,message;feeling
0,im feeling quite sad and sorry for myself but ...
1,i feel like i am still looking at a blank canv...
2,i feel like a faithful servant;love
3,i am just feeling cranky and blue;anger
4,i can have for a treat or if i am feeling fest...
...,...
17995,i just had a very brief time in the beanbag an...
17996,i am now turning and i feel pathetic that i am...
17997,i feel strong and good overall;joy
17998,i feel like this was such a rude comment and i...


In [2]:
feel_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18000 entries, 0 to 17999
Data columns (total 1 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   message;feeling  18000 non-null  object
dtypes: object(1)
memory usage: 140.8+ KB


In [4]:
# Check for missing values
feel_df.isna().sum()

message;feeling    0
dtype: int64

In [7]:
# Check how the raw string is stored
feel_df.iloc[1]['message;feeling']

'i feel like i am still looking at a blank canvas blank pieces of paper;sadness'

In [11]:
# Split each string on the semicolon (;) and store the emotion part in a new feature.
feel_df['feeling'] = feel_df['message;feeling'].str.split(';').str[-1]
feel_df

Unnamed: 0,message;feeling,feeling
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant;love,love
3,i am just feeling cranky and blue;anger,anger
4,i can have for a treat or if i am feeling fest...,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall;joy,joy
17998,i feel like this was such a rude comment and i...,anger


In [17]:
# Keep only the part before the semicolon in the existing message;feeling column
feel_df['message;feeling'] = feel_df['message;feeling'].str.split(';').str[0]
feel_df

Unnamed: 0,message;feeling,feeling
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant,love
3,i am just feeling cranky and blue,anger
4,i can have for a treat or if i am feeling festive,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall,joy
17998,i feel like this was such a rude comment and i...,anger


In [18]:
# Check that the extracted message text was stored correctly
feel_df.iloc[1]['message;feeling']

'i feel like i am still looking at a blank canvas blank pieces of paper'
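
The two-step split above works when every row contains exactly one semicolon. As a minimal sketch, assuming some messages could contain a semicolon of their own (not verified for `feeling.csv`), splitting once from the right keeps the message intact. The toy rows and the names `raw` and `parts` are made up for illustration.

```python
import pandas as pd

# Toy rows in the same "message;feeling" layout (made-up data, not from feeling.csv).
raw = pd.DataFrame({
    'message;feeling': [
        'i feel like a faithful servant;love',
        'i am tired; i feel cranky and blue;anger',  # extra semicolon inside the message
    ]
})

# Split only once, starting from the right, so the last semicolon separates the label.
parts = raw['message;feeling'].str.rsplit(';', n=1, expand=True)
parts.columns = ['message', 'feeling']
print(parts)
```

With `str.split(';').str[0]`, the second toy message would be cut off at its first semicolon; `rsplit` with `n=1` avoids that.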

In [19]:
# Rename the feature
feel_df.rename(columns={'message;feeling': 'message'}, inplace=True)
feel_df

Unnamed: 0,message,feeling
0,im feeling quite sad and sorry for myself but ...,sadness
1,i feel like i am still looking at a blank canv...,sadness
2,i feel like a faithful servant,love
3,i am just feeling cranky and blue,anger
4,i can have for a treat or if i am feeling festive,joy
...,...,...
17995,i just had a very brief time in the beanbag an...,sadness
17996,i am now turning and i feel pathetic that i am...,sadness
17997,i feel strong and good overall,joy
17998,i feel like this was such a rude comment and i...,anger


In [20]:
from sklearn.preprocessing import LabelEncoder

# Encode feeling to create the target feature and store it
feel_encoder = LabelEncoder()
targets = feel_encoder.fit_transform(feel_df.feeling)
feel_df['target'] = targets
feel_df

Unnamed: 0,message,feeling,target
0,im feeling quite sad and sorry for myself but ...,sadness,4
1,i feel like i am still looking at a blank canv...,sadness,4
2,i feel like a faithful servant,love,3
3,i am just feeling cranky and blue,anger,0
4,i can have for a treat or if i am feeling festive,joy,2
...,...,...,...
17995,i just had a very brief time in the beanbag an...,sadness,4
17996,i am now turning and i feel pathetic that i am...,sadness,4
17997,i feel strong and good overall,joy,2
17998,i feel like this was such a rude comment and i...,anger,0


In [29]:
# Check the original labels behind the encoded feeling values
feel_encoder.classes_

array(['anger', 'fear', 'joy', 'love', 'sadness', 'surprise'],
      dtype=object)
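
`LabelEncoder` sorts the labels alphabetically, and each integer code is the position of its label in `classes_`. A minimal sketch of mapping codes back to label names with `inverse_transform`, using a standalone encoder (the names `labels`, `encoder`, `codes`, and `decoded` are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# The six emotion labels seen in classes_ above, in arbitrary order.
labels = ['sadness', 'love', 'anger', 'joy', 'fear', 'surprise']

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# Each code is the index of the label in the alphabetically sorted classes_.
print(dict(zip(labels, codes)))

# inverse_transform maps integer codes back to the original label strings.
decoded = encoder.inverse_transform(np.array([4, 3, 0]))
print(decoded)
```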

In [21]:
# Drop the feeling feature, which is no longer needed
feel_df = feel_df.drop(labels=['feeling'], axis=1)
feel_df

Unnamed: 0,message,target
0,im feeling quite sad and sorry for myself but ...,4
1,i feel like i am still looking at a blank canv...,4
2,i feel like a faithful servant,3
3,i am just feeling cranky and blue,0
4,i can have for a treat or if i am feeling festive,2
...,...,...
17995,i just had a very brief time in the beanbag an...,4
17996,i am now turning and i feel pathetic that i am...,4
17997,i feel strong and good overall,2
17998,i feel like this was such a rude comment and i...,0


In [23]:
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
X_train, X_test, y_train, y_test = \
train_test_split(feel_df.message, 
                 feel_df.target, 
                 stratify=feel_df.target, 
                 test_size=0.2, 
                 random_state=124)
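
The `stratify` argument keeps the class proportions the same in the training and test sets. A small sketch of this effect on made-up, imbalanced labels (the data and the names `X`, `y`, `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative only):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 rows of class 0 and 20 rows of class 1.
y = pd.Series([0] * 80 + [1] * 20)
X = pd.Series([f'message {i}' for i in range(100)])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=124)

# Both splits keep the original 80/20 class ratio.
print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```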

In [24]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Without a pipeline, the vectorizer has to be applied manually at every step;
# with a pipeline, vectorization is handled automatically on fit and predict.
m_nb_pipe = Pipeline([('count_vectorizer', CountVectorizer()), ('multinomial_NB', MultinomialNB())])
m_nb_pipe.fit(X_train, y_train)
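
To make the comment above concrete, here is a rough sketch of what the pipeline does internally, on a handful of made-up messages: the vectorizer is fitted on the training text only, the test text is transformed with the same vocabulary, and the classifier works on the resulting count matrices. All data and variable names below are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up messages; 0 = joy, 1 = sadness (toy labels, not the notebook's encoding).
train_texts = ['i feel sad and alone', 'i feel happy and strong',
               'i feel so sad today', 'i feel happy today']
train_labels = [1, 0, 1, 0]
test_texts = ['feeling sad and alone', 'happy and strong']

# Manual version: fit the vectorizer on training text, reuse it for test text.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(train_texts)
X_test_counts = vectorizer.transform(test_texts)
nb = MultinomialNB().fit(X_train_counts, train_labels)
manual_pred = nb.predict(X_test_counts)

# Pipeline version: the same two steps chained behind one fit/predict call.
pipe = Pipeline([('count_vectorizer', CountVectorizer()),
                 ('multinomial_NB', MultinomialNB())])
pipe.fit(train_texts, train_labels)
pipe_pred = pipe.predict(test_texts)

print(manual_pred, pipe_pred)
```

Both versions give the same predictions; the pipeline only removes the need to call `fit_transform` and `transform` by hand.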

In [25]:
# Run prediction on the test set
prediction = m_nb_pipe.predict(X_test)

In [26]:
# score() computes accuracy directly from X_test, without using the predictions made earlier
m_nb_pipe.score(X_test, y_test)

0.7536111111111111
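
A single accuracy value can hide classes that the model predicts poorly, which matters in a multiclass task. As a hedged sketch, a confusion matrix and per-class report could be used to look closer; the toy `y_true` and `y_pred` below are made up and do not reflect the notebook's results.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up true and predicted codes for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_true, y_pred,
                            target_names=['anger', 'fear', 'joy'],
                            zero_division=0))
```

In the notebook itself, the same calls could be given `y_test` and `prediction`, with `feel_encoder.classes_` as `target_names`.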

In [None]:
# Test the model with a single message
# message = feel_df.iloc[3].message
message = "I want go home because I'm tired"
print(message)
print(feel_encoder.classes_[m_nb_pipe.predict([message])])
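
Besides the single predicted label, `predict_proba` shows how confident the model is in each class. A minimal sketch on a tiny made-up training set with string labels (the data and the name `toy_pipe` are assumptions, separate from `m_nb_pipe`):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training messages with string labels (not taken from feeling.csv).
texts = [
    'i feel sad and tired',
    'i feel so alone and sad',
    'i feel happy and strong',
    'i feel joyful today',
    'i feel angry and cranky',
    'i feel so angry today',
]
labels = ['sadness', 'sadness', 'joy', 'joy', 'anger', 'anger']

toy_pipe = Pipeline([('count_vectorizer', CountVectorizer()),
                     ('multinomial_NB', MultinomialNB())])
toy_pipe.fit(texts, labels)

message = "I want go home because I'm tired"
proba = toy_pipe.predict_proba([message])[0]

# Print the classes from most to least probable.
for label, p in sorted(zip(toy_pipe.classes_, proba), key=lambda pair: -pair[1]):
    print(f'{label}: {p:.3f}')
```

Words the vectorizer never saw during training are ignored, so a message with few known words yields probabilities close to the class priors.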
