# Clojure Decision Tree
- A01173359 - Mario Emilio Jiménez Vizcaíno
- A01656159 - Juan Sebastián Rodríguez Galarza
- A01656257 - Kevin Torres Martínez

We want to predict the quality of red wine from six independent variables whose relationship to quality is nonlinear. We consider a decision tree the most suitable algorithm for this problem because our input dataset is labeled and the prediction depends on several continuous variables.

We use the [Red Wine Quality dataset on Kaggle](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009).
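To make the point about continuous variables concrete, here is a minimal sketch of how a decision tree can score a candidate threshold on one continuous feature. It uses Gini impurity, which is one common splitting criterion and not necessarily the one used later in this notebook. The helper names `gini` and `split-impurity` are hypothetical, and the three sample rows reuse alcohol and quality values that appear in the dataset previews.

```clojure
;; Hypothetical helpers for illustration; not part of the original notebook.
(defn gini
  "Gini impurity of a collection of class labels (0.0 for an empty collection)."
  [labels]
  (let [n (count labels)]
    (if (zero? n)
      0.0
      (- 1.0 (reduce + (map (fn [c] (let [p (/ (double c) n)] (* p p)))
                            (vals (frequencies labels))))))))

(defn split-impurity
  "Weighted Gini impurity after splitting non-empty rows on
  (feature row) <= threshold. feature and label are functions,
  for example keywords."
  [rows feature label threshold]
  (let [{left true right false} (group-by #(<= (feature %) threshold) rows)
        n (count rows)
        weighted (fn [part]
                   (* (/ (double (count part)) n)
                      (gini (map label part))))]
    (+ (weighted left) (weighted right))))

;; Sample rows with values taken from the dataset previews.
(def sample-rows
  [{:alcohol 9.4 :quality 5}
   {:alcohol 9.8 :quality 5}
   {:alcohol 11.5 :quality 7}])

;; Splitting at alcohol <= 10.0 separates the two quality classes cleanly.
(split-impurity sample-rows :alcohol :quality 10.0)
;; => 0.0
```

A tree builder would evaluate many such thresholds per feature and keep the one with the lowest impurity.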

# Columns

In [1]:
(require '[cemerick.pomegranate :refer [add-dependencies]])
(add-dependencies :coordinates '[[org.clojure/data.csv "0.1.2"]])
(require '[clojure.data.csv :as csv])

nil

In [3]:
(def wineQuality
  (with-open [in-file (clojure.java.io/reader "winequality-red.csv")]
    ;; doall forces the lazy CSV rows to be read before the file is closed
    (doall (csv/read-csv in-file))))

#'user/wineQuality

In [4]:
(defrecord Wine [fixed_acidity volatile_acidity citrid_acid chlorides sulphates alcohol quality])

user.Wine

In [6]:
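;; Builds a Wine from one CSV row, keeping only the columns we use
(defn vectorToWine [v]
    (Wine.
        (Double/parseDouble (nth v 0))    ; fixed acidity
        (Double/parseDouble (nth v 1))    ; volatile acidity
        (Double/parseDouble (nth v 2))    ; citric acid
        (Double/parseDouble (nth v 4))    ; chlorides
        (Double/parseDouble (nth v 9))    ; sulphates
        (Double/parseDouble (nth v 10))   ; alcohol
        (Integer/parseInt (nth v 11))))   ; quality (the label)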

#'user/vectorToWine
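The positional indices above depend on the column order of the CSV file. As an alternative, the rows could be keyed by column name using the header row. This is only a sketch: `rows->maps` is a hypothetical helper, and the header strings in the example are assumed rather than read from the actual file.

```clojure
;; Hypothetical helper; not part of the original notebook.
(defn rows->maps
  "Turns parsed CSV rows (header row first) into maps keyed by column name."
  [[header & rows]]
  (map #(zipmap header %) rows))

;; Example with an assumed two-column header.
(rows->maps [["alcohol" "quality"]
             ["9.4" "5"]])
;; => ({"alcohol" "9.4", "quality" "5"})
```

Values stay as strings, so they would still need `Double/parseDouble` or `Integer/parseInt` before use.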

In [8]:
(def data (map vectorToWine (rest wineQuality))) ; rest skips the CSV header row
(take 5 data)

(#user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality 5} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.88, :citrid_acid 0.0, :chlorides 0.098, :sulphates 0.68, :alcohol 9.8, :quality 5} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.76, :citrid_acid 0.04, :chlorides 0.092, :sulphates 0.65, :alcohol 9.8, :quality 5} #user.Wine{:fixed_acidity 11.2, :volatile_acidity 0.28, :citrid_acid 0.56, :chlorides 0.075, :sulphates 0.58, :alcohol 9.8, :quality 6} #user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality 5})

# Splitting the dataset

In total we have 1,599 rows. The first 1,279 (about 80% of the total) are used to train the machine learning model. The remaining 320 (about 20% of the total) are used for testing.
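The 1,279 / 320 figures follow from taking the floor of 80% of 1,599. A minimal sketch of that arithmetic, using a hypothetical `split-dataset` helper that is not part of the notebook (the cells below use `take` and `drop` directly):

```clojure
;; Hypothetical helper; not part of the original notebook.
(defn split-dataset
  "Splits rows into [training testing], keeping the first
  floor(train-fraction * n) rows for training."
  [train-fraction rows]
  (let [n-train (long (Math/floor (double (* train-fraction (count rows)))))]
    (split-at n-train rows)))

;; With 1,599 rows and a 0.8 fraction: floor(1279.2) = 1279 training rows.
(let [[training testing] (split-dataset 0.8 (range 1599))]
  [(count training) (count testing)])
;; => [1279 320]
```

Like `take` and `drop`, this keeps the rows in file order and does not shuffle them first.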

In [10]:
(def trainingData (take 1279 data))
(take 5 trainingData)

(#user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality 5} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.88, :citrid_acid 0.0, :chlorides 0.098, :sulphates 0.68, :alcohol 9.8, :quality 5} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.76, :citrid_acid 0.04, :chlorides 0.092, :sulphates 0.65, :alcohol 9.8, :quality 5} #user.Wine{:fixed_acidity 11.2, :volatile_acidity 0.28, :citrid_acid 0.56, :chlorides 0.075, :sulphates 0.58, :alcohol 9.8, :quality 6} #user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality 5})

In [11]:
(def testingData (drop 1279 data))
(take 5 testingData)

(#user.Wine{:fixed_acidity 9.8, :volatile_acidity 0.3, :citrid_acid 0.39, :chlorides 0.062, :sulphates 0.57, :alcohol 11.5, :quality 7} #user.Wine{:fixed_acidity 7.1, :volatile_acidity 0.46, :citrid_acid 0.2, :chlorides 0.077, :sulphates 0.64, :alcohol 10.4, :quality 6} #user.Wine{:fixed_acidity 7.1, :volatile_acidity 0.46, :citrid_acid 0.2, :chlorides 0.077, :sulphates 0.64, :alcohol 10.4, :quality 6} #user.Wine{:fixed_acidity 7.9, :volatile_acidity 0.765, :citrid_acid 0.0, :chlorides 0.084, :sulphates 0.68, :alcohol 10.9, :quality 6} #user.Wine{:fixed_acidity 8.7, :volatile_acidity 0.63, :citrid_acid 0.28, :chlorides 0.096, :sulphates 0.63, :alcohol 10.2, :quality 6})