# Naive Bayes - Trabalho

## Questão 1

Implemente um classifador Naive Bayes para o problema de predizer a qualidade de um carro. Para este fim, utilizaremos um conjunto de dados referente a qualidade de carros, disponível no [UCI](https://archive.ics.uci.edu/ml/datasets/car+evaluation). Este dataset de carros possui as seguintes features e classe:

** Attributos **
1. buying: vhigh, high, med, low
2. maint: vhigh, high, med, low
3. doors: 2, 3, 4, 5, more
4. persons: 2, 4, more
5. lug_boot: small, med, big
6. safety: low, med, high

** Classes **
1. unacc, acc, good, vgood

## Questão 2
Crie uma versão de sua implementação usando as funções disponíveis na biblioteca SciKitLearn para o Naive Bayes ([veja aqui](http://scikit-learn.org/stable/modules/naive_bayes.html)) 

## Questão 3

Analise a acurácia dos dois algoritmos e discuta a sua solução.

In [2]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

In [3]:
class NaiveBayes:
    def __init__(self):
        self.lEncoder = LabelEncoder()
        self.X = None; self.y = None
        self.classProb = None; self.likeTable = {}
    
    # Separa o dataset pelas classes
    def separateByClass(self):
        separated = {}
        for i in range(len(self.y)):
            if (self.y[i] not in separated):
                separated[self.y[i]] = []
            separated[self.y[i]].append(self.X[i])
        return separated
    
    # Contrói a tabela de verossimilhança
    def makeLikeTable(self):
        separateClass = self.separateByClass()
        classSizes = [len(separateClass[i]) for i in separateClass.keys()] 
        self.classProb = np.array(classSizes) / sum(classSizes)
        
        self.likeTable = {}
        for label in separateClass.keys():
            auxiliar = np.column_stack(separateClass[label])
            for attribute,idx in zip(auxiliar, range(len(auxiliar))):
                counts = np.asarray(np.unique(attribute, return_counts=True)).T
                for i in range(4):
                    self.likeTable[(label, idx, i)] = 0
                for count_it in counts:
                    self.likeTable[(label, idx, count_it[0])] = count_it[1] / len(separateClass[label])

    # Calcula a probabilidade de cada entrada, através da MLE
    def calculateProbability(self, inputVector):
        separateClass = self.separateByClass()
        
        probabilities = {}
        for label,_ in separateClass.items():
            probabilities[label] = self.classProb[label]
            for i in range(len(inputVector)):
                probabilities[label] *= self.likeTable[label, i, inputVector[i]]
                
        return probabilities
    
    # Treinamento, com criação da tabela de verossimilhança
    def fit(self, X_train, y_train):
        self.X = X_train
        self.y = y_train
        
        self.makeLikeTable()
    
    # Calcula as probabilidades preditas
    def predict(self, inputArray):
        ''' Return a list of predictions for each row in inputArray correspondent to
        the label of the class with the maximum probability '''
        predictions = []
        for row in inputArray:
            probabilities = self.calculateProbability(row)
            predictions.append(max(probabilities, key=probabilities.get))
        return predictions

In [9]:
dataset = pd.read_csv("carData.csv")
dataset.head()

Unnamed: 0,vhigh,vhigh.1,2,2.1,small,low,unacc
0,vhigh,vhigh,2,2,small,med,unacc
1,vhigh,vhigh,2,2,small,high,unacc
2,vhigh,vhigh,2,2,med,low,unacc
3,vhigh,vhigh,2,2,med,med,unacc
4,vhigh,vhigh,2,2,med,high,unacc


In [18]:
for i in range(0, dataset.shape[1]):
    dataset.iloc[:,i] = LabelEncoder().fit_transform(dataset.iloc[:,i])

# Split do dataset em 80% para treino e 20% para teste
X_train, X_test, y_train, y_test = train_test_split(dataset.iloc[:,:-1], dataset.iloc[:,-1], test_size=0.2)

print(len(dataset))
print(len(X_train))
print(len(X_test))

1727
1381
346


In [35]:
from sklearn.naive_bayes import MultinomialNB
#Implementação do sklearn
mult = MultinomialNB()
mult.fit(X_train.values,y_train.values)



MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [36]:
prediction = mult.predict(X_test.values)
print(prediction)

[2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2]


In [37]:
from sklearn.metrics import accuracy_score, classification_report

In [38]:
accuracy_mult = accuracy_score(y_true = y_test,y_pred = prediction)
print(accuracy_mult)

0.658959537572


In [39]:
# Utilização da classe Naive Bayes criada
myNaive = NaiveBayes()
myNaive.fit(X_train.values, y_train.values)
y_pred = myNaive.predict(X_test.values)

print("Naive Bayes:")
print("Total Accuracy: {}%".format(accuracy_score(y_true=y_test, y_pred=y_pred)))

print("\nClassification Report:")
print(classification_report(y_true=y_test, y_pred=y_pred, target_names=["unacc", "acc", "good", "vgood"]))

Naive Bayes:
Total Accuracy: 0.7369942196531792%

Classification Report:
             precision    recall  f1-score   support

      unacc       0.49      0.93      0.64        85
        acc       0.47      0.53      0.50        17
       good       1.00      0.73      0.85       228
      vgood       0.00      0.00      0.00        16

avg / total       0.80      0.74      0.74       346



  'precision', 'predicted', average, warn_for)
