<a href="https://colab.research.google.com/github/Mario-RJunior/classificador-sentimento-NLP/blob/main/classificador_sentimento_RNC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Classifier

## 1) Introduction

In this project we build a sentiment classifier using a dataset of posts from Twitter users.

The dataset has the following columns:
- ***sentiment***: sentiment label, negative or positive (the raw file encodes these as 0 and 4).
- ***id***: identification number of the post.
- ***date***: date of the post.
- ***query***: search query used.
- ***user***: user who wrote the post.
- ***text***: the text we want to classify.

Next, we import the libraries used in this study.

## 2) Importing the libraries

In [1]:
# Import the libraries
import numpy as np
import math
import re
import pandas as pd
from bs4 import BeautifulSoup
from google.colab import drive
import zipfile
import seaborn as sns
import spacy as sp
import string
import random
import matplotlib.pyplot as plt

In [2]:
# Select TensorFlow 2.x (Colab magic)
%tensorflow_version 2.x

# Import TensorFlow
import tensorflow as tf

# Show the TensorFlow version
tf.__version__

'2.4.0'

In [3]:
# Import specific TensorFlow modules
from tensorflow.keras import layers
import tensorflow_datasets as tfds

## 3) Data preprocessing

### 3.1) Loading the files

In [4]:
# Mount Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# Path to the archive
path = '/content/drive/MyDrive/NLP/trainingandtestdata.zip'

# Extract the archive into the current directory;
# the context manager closes the file even if extraction fails
with zipfile.ZipFile(path, mode='r') as zip_object:
    zip_object.extractall('./')
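
The next cell reads `train.csv` from a `trainingandtestdata/` folder, so it can help to check the archive's member names before extracting. The following is a minimal sketch of that check, not part of the original notebook. It builds a small stand-in archive on the fly, and the file contents and the temporary directory are hypothetical.

```python
import os
import tempfile
import zipfile

# Build a tiny stand-in archive so the sketch runs anywhere (hypothetical contents).
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'sample.zip')
with zipfile.ZipFile(archive_path, mode='w') as zf:
    zf.writestr('trainingandtestdata/train.csv', '0,1,date,NO_QUERY,user,text\n')

# List the members first, then extract them into the working directory.
with zipfile.ZipFile(archive_path, mode='r') as zf:
    members = zf.namelist()
    zf.extractall(workdir)

# Path where the first member was written.
extracted = os.path.join(workdir, members[0])
print(members, os.path.exists(extracted))
```

With the real archive, `namelist()` would show whether the CSV files sit inside a subfolder, which determines the path to pass to `pd.read_csv`.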

In [9]:
# Column names to use in the dataframe
cols = ['sentiment', 'id', 'date', 'query', 'user', 'text']

In [15]:
# Build the dataframe
train_data = pd.read_csv('/content/trainingandtestdata/train.csv',
                        encoding='latin1',
                        names=cols,
                        header=None,
                        engine='python')

In [16]:
# Show the shape of the dataframe
train_data.shape

(1600000, 6)
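
The file has no header row, which is why the column names are passed through `names` together with `header=None`. The sketch below illustrates that behaviour on a two-row in-memory sample; the rows are made up for illustration, and `usecols` is an optional addition (not used in the notebook) that keeps only the columns a classifier would need.

```python
import io

import pandas as pd

cols = ['sentiment', 'id', 'date', 'query', 'user', 'text']

# Two hypothetical rows in the same headerless layout as train.csv.
sample = (
    '0,1,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,user_a,"so tired, what a long day"\n'
    '4,2,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,user_b,having a great morning\n'
)

# header=None tells pandas the first line is data; names supplies the labels.
# usecols (an assumption, not in the notebook) loads only label and text.
sample_df = pd.read_csv(io.StringIO(sample),
                        names=cols,
                        header=None,
                        usecols=['sentiment', 'text'])

print(sample_df)
```

On the full 1.6 million rows, loading fewer columns should reduce memory use, at the cost of losing the metadata columns.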

In [17]:
# Show the first rows
train_data.head()

|  | sentiment | id | date | query | user | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |


In [18]:
# Show the last rows
train_data.tail()

|  | sentiment | id | date | query | user | text |
|---|---|---|---|---|---|---|
| 1599995 | 4 | 2193601966 | Tue Jun 16 08:40:49 PDT 2009 | NO_QUERY | AmandaMarie1028 | Just woke up. Having no school is the best fee... |
| 1599996 | 4 | 2193601969 | Tue Jun 16 08:40:49 PDT 2009 | NO_QUERY | TheWDBoards | TheWDB.com - Very cool to hear old Walt interv... |
| 1599997 | 4 | 2193601991 | Tue Jun 16 08:40:49 PDT 2009 | NO_QUERY | bpbabe | Are you ready for your MoJo Makeover? Ask me f... |
| 1599998 | 4 | 2193602064 | Tue Jun 16 08:40:49 PDT 2009 | NO_QUERY | tinydiamondz | Happy 38th Birthday to my boo of alll time!!! ... |
| 1599999 | 4 | 2193602129 | Tue Jun 16 08:40:50 PDT 2009 | NO_QUERY | RyanTrevMorris | happy #charitytuesday @theNSPCC @SparksCharity... |


In [19]:
# Show the distinct class values
train_data['sentiment'].unique()

array([0, 4])
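
The raw labels are 0 and 4, while a binary classifier usually expects targets of 0 and 1. One possible way to remap them and check the class balance is sketched below. It uses a small hypothetical dataframe called `labels` rather than `train_data`, and the mapping itself is an assumption about how the labels might be handled later on.

```python
import pandas as pd

# Hypothetical stand-in for the 'sentiment' column of train_data.
labels = pd.DataFrame({'sentiment': [0, 4, 4, 0, 4]})

# Map the raw encoding (0 = negative, 4 = positive) onto 0/1 targets.
labels['sentiment'] = labels['sentiment'].map({0: 0, 4: 1})

# Count how many examples fall in each class after remapping.
counts = labels['sentiment'].value_counts().sort_index()
print(counts.to_dict())
```

An explicit dictionary with `map` turns any unexpected label into `NaN`, which makes stray values easy to spot.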