# Naive Bayes - Assignment

## Question 1

Implement a Naive Bayes classifier for the problem of predicting the quality of a car. To this end, we will use a car-evaluation dataset available from the [UCI](https://archive.ics.uci.edu/ml/datasets/car+evaluation) repository. The dataset has the following features and class:

**Attributes**
1. buying: vhigh, high, med, low
2. maint: vhigh, high, med, low
3. doors: 2, 3, 4, 5more
4. persons: 2, 4, more
5. lug_boot: small, med, big
6. safety: low, med, high

**Classes**
1. unacc, acc, good, vgood
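
For reference, the decision rule that a Naive Bayes classifier applies to this dataset can be written as follows, where $x_1, \dots, x_6$ are the six attribute values of a car and $c$ ranges over the four classes. The conditional-independence assumption is what lets the likelihood factor into one term per attribute. Estimating each term by relative frequency is one common choice, stated here as an assumption rather than a requirement of the assignment.

$$
\hat{c} = \arg\max_{c \in \{\text{unacc},\, \text{acc},\, \text{good},\, \text{vgood}\}} P(c) \prod_{i=1}^{6} P(x_i \mid c),
\qquad
P(x_i = v \mid c) \approx \frac{\#\{\text{training rows with class } c \text{ and attribute } i = v\}}{\#\{\text{training rows with class } c\}}
$$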

## Question 2
Create a version of your implementation using the functions that the scikit-learn library provides for Naive Bayes ([see here](http://scikit-learn.org/stable/modules/naive_bayes.html)).

## Question 3

Analyze the accuracy of the two algorithms and discuss your solution.
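
A comparison is only meaningful if both classifiers are scored on the same held-out rows and the labels never reach the prediction step. The sketch below shows one way to set that up. The toy rows, the `random_state` value, and the helper name `accuracy` are illustrative assumptions, not part of the assignment.

```python
from sklearn.model_selection import train_test_split

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

# Made-up rows in the dataset's layout: six attributes followed by the class.
rows = [
    ["low", "low", "4", "4", "big", "high", "vgood"],
    ["vhigh", "vhigh", "2", "2", "small", "low", "unacc"],
    ["med", "low", "4", "more", "med", "high", "good"],
    ["high", "med", "3", "4", "med", "med", "acc"],
    ["vhigh", "high", "2", "2", "small", "med", "unacc"],
] * 4

X = [row[:-1] for row in rows]  # attributes only
y = [row[-1] for row in rows]   # class labels, kept apart from X

# A fixed random_state makes the split repeatable, so both models see the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))  # 16 4
print(accuracy(["acc", "good"], ["acc", "unacc"]))  # 0.5
```

Both classifiers would then be fitted on `X_train`, `y_train` and scored on `X_test`, `y_test`, with the results reported on the same scale (fraction or percentage).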

In [354]:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn import model_selection as ms
import math
import random
from collections import Counter

### Question 1 (Solution)

In [346]:
class NaiveBayes:
    
    def __init__(self):
        self._summaries = None
        self._totalSize = 0
        self._classValues = []
    
    def splitDataset(self, dataset, splitRatio):
        trainSize = int(len(dataset) * splitRatio)
        trainSet = []
        copy = list(dataset)
        while len(trainSet) < trainSize:
            index = random.randrange(len(copy))
            trainSet.append(copy.pop(index))
        
        return [trainSet, copy]
    
    def separateByClass(self, dataset):
        separated = {}
        for i in range(len(dataset)):
            vector = dataset[i]
            if (vector[-1] not in separated):
                separated[vector[-1]] = []
                self._classValues.append(vector[-1])
            separated[vector[-1]].append(vector)
        return separated
    

    def calculateProbability(self, x, attribute):
        cnt = Counter(attribute)
        return cnt[x]/len(attribute.index)

    def calculateClassProbabilities(self, inputVector):
        probabilities = {}
        for classValue in self._classValues:
            attributes = pd.DataFrame(self._summaries[classValue])
            # Start from the class prior P(class), then multiply in each P(x_i | class).
            probabilities[classValue] = len(attributes) / self._totalSize
            for i in range(len(inputVector)):
                x = inputVector[i]
                probabilities[classValue] *= self.calculateProbability(x, attributes.iloc[:, i])
        return probabilities
    
    def fit(self, dataset):
        self._totalSize = len(dataset)
        self._classValues = []  # reset so that refitting does not duplicate class labels
        self._summaries = self.separateByClass(dataset)
    
    def predict(self, inputVector):
        probabilities = self.calculateClassProbabilities(inputVector)
        bestLabel, bestProb = None, -1
        for classValue, probability in probabilities.items():
            if bestLabel is None or probability > bestProb:
                bestProb = probability
                bestLabel = classValue
        return bestLabel

    def getPredictions(self, testSet):
        predictions = []
        for i in range(len(testSet)):
            result = self.predict(testSet[i])
            predictions.append(result)
        return predictions
    
    def getAccuracy(self, testSet, predictions):
        correct = 0
        for i in range(len(testSet)):
            # The class label is the last element of each row.
            if testSet[i][-1] == predictions[i]:
                correct += 1
        return (correct/float(len(testSet))) * 100.0
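
`calculateProbability` returns 0 whenever a value never occurs with a class in the training rows, and a single zero wipes out the whole product for that class. A common remedy is Laplace (add-one) smoothing. The sketch below is one possible variant, not part of the class above. The function name `smoothed_probability` and the parameters `n_categories` and `alpha` are hypothetical names introduced for illustration.

```python
from collections import Counter

def smoothed_probability(x, column, n_categories, alpha=1.0):
    """Estimate P(x | class) with additive (Laplace) smoothing.

    column       -- values of one attribute among the training rows of one class
    n_categories -- number of distinct values the attribute can take
    alpha        -- pseudo-count added to every category (assumed default: 1)
    """
    counts = Counter(column)
    return (counts[x] + alpha) / (len(column) + alpha * n_categories)

# Toy column: the "safety" attribute for four made-up rows of a single class.
safety = ["high", "high", "med", "high"]
print(smoothed_probability("high", safety, n_categories=3))  # (3 + 1) / (4 + 3)
print(smoothed_probability("low", safety, n_categories=3))   # (0 + 1) / (4 + 3), no longer zero
```

The smoothed estimates over all three `safety` values still sum to 1, so they remain a valid probability distribution.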

In [347]:
import csv

def loadCsv(filename):
    # Read every row as a list of strings; the context manager closes the file.
    with open(filename, "r", newline="") as f:
        return list(csv.reader(f))

df = loadCsv('carData.csv')


In [349]:
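NB = NaiveBayes()
trainSet, testSet = NB.splitDataset(df, 0.8)
NB.fit(trainSet)
# Pass only the six attributes to the classifier. If the class label stays in the
# row, it is scored like any other attribute, matches only its own class, and the
# accuracy comes out as a meaningless 100.0.
testFeatures = [row[:-1] for row in testSet]
predictions = NB.getPredictions(testFeatures)
print("Handmade NaiveBayes accuracy: {0}".format(NB.getAccuracy(testSet, predictions)))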


In [359]:
def splitXY(dataset):
    X = []
    Y = []
    for vector in dataset:
        # Slice instead of calling remove(): remove() deletes the first matching
        # value and mutates the caller's rows in place.
        X.append(vector[:-1])
        Y.append(vector[-1])
    return X, Y

In [360]:
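from sklearn.preprocessing import OrdinalEncoder

NB = GaussianNB()
X, Y = splitXY(df)
# GaussianNB needs numeric input, so map each category string to an integer code first.
X = OrdinalEncoder().fit_transform(X)
# train_test_split returns X_train, X_test, y_train, y_test (in that order). Unpacking
# them as X_train, y_train, X_test, y_test raises "ValueError: Found input variables
# with inconsistent numbers of samples: [1382, 346]".
X_train, X_test, y_train, y_test = ms.train_test_split(X, Y, test_size=0.2)
NB.fit(X_train, y_train)
# score() returns a fraction in [0, 1], whereas getAccuracy above returns a percentage.
print("Sklearn Gaussian NaiveBayes accuracy: {0}".format(NB.score(X_test, y_test)))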
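
`GaussianNB` models every feature as normally distributed, which suits continuous measurements, whereas the six car attributes are categorical. scikit-learn also provides `CategoricalNB`, which estimates one categorical distribution per feature and class. The following is a minimal sketch on made-up rows, assuming the attributes are integer-coded with `OrdinalEncoder` first. The rows and variable names are hypothetical.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows in the dataset's layout: six attributes, then the class label.
rows = [
    ["low", "low", "4", "4", "big", "high", "vgood"],
    ["vhigh", "vhigh", "2", "2", "small", "low", "unacc"],
    ["med", "low", "4", "more", "med", "high", "good"],
    ["high", "med", "3", "4", "med", "med", "acc"],
    ["vhigh", "high", "2", "2", "small", "med", "unacc"],
]
X_raw = [row[:-1] for row in rows]
y = [row[-1] for row in rows]

# CategoricalNB expects non-negative integer codes, one column per attribute.
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw).astype(int)

model = CategoricalNB(alpha=1.0)  # alpha=1.0 is add-one (Laplace) smoothing
model.fit(X, y)

# Encode a new car with the same encoder before predicting.
query = encoder.transform([["vhigh", "vhigh", "2", "2", "small", "low"]]).astype(int)
print(model.predict(query))
print(model.classes_)  # class labels in sorted order
```

Fitting the encoder on all available rows before splitting keeps the integer codes consistent between training and test data, because the encoder by default rejects categories it has not seen.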