# Decision tree
- A01173359 - Mario Emilio Jiménez Vizcaíno
- A01656159 - Juan Sebastián Rodríguez Galarza
- A01656257 - Kevin Torres Martínez

We want to predict the quality of red wine from 6 independent variables whose relationship to quality is non-linear. A decision tree is a good fit for this problem because our input dataset is labeled and the prediction depends on several continuous variables.

We use the [Red Wine Quality dataset on Kaggle](https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009).

## Introduction

This notebook uses the following dependencies:
- clojure.data.csv to read the CSV file and obtain the data for the decision tree
- decision-tree.core to build the decision tree and make predictions with it (implemented by Miyoshi Ryota)
- clojupyter.display to render an SVG
- clojure.inspector to inspect the resulting tree
- clj-http to request the rendered graph from QuickChart

In [22]:
(require '[clojupyter.misc.helper :as helper])
(helper/add-dependencies '[mrcsce/decision-tree "0.1.0"])
(helper/add-dependencies '[org.clojure/data.csv "0.1.2"])
(helper/add-dependencies '[clj-http "3.11.0"])
(require '[clojure.data.csv :as csv])
(require '[decision-tree.core :as dt])
(require '[clojupyter.display :as display])
(require '[clojure.inspector :as inspector])
(require '[clj-http.client :as client])

nil

In [2]:
(def wineQuality
    (with-open [in-file (clojure.java.io/reader "winequality-red.csv")]
        (doall (csv/read-csv in-file))))

#'user/wineQuality

In [3]:
(defrecord Wine [fixed_acidity volatile_acidity citrid_acid chlorides sulphates alcohol quality])

user.Wine

In [4]:
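;; Builds a Wine from one CSV row (a vector of strings), keeping only the
;; six feature columns used here plus the quality score (column 11).
(defn vectorToWine [v]
    (Wine.
        (Double/parseDouble (nth v 0))   ; fixed acidity
        (Double/parseDouble (nth v 1))   ; volatile acidity
        (Double/parseDouble (nth v 2))   ; citric acid
        (Double/parseDouble (nth v 4))   ; chlorides
        (Double/parseDouble (nth v 9))   ; sulphates
        (Double/parseDouble (nth v 10))  ; alcohol
        ;; Scores of 6 or more are labeled "buena calidad" (good quality),
        ;; the rest "mala calidad" (poor quality).
        (if (< 5.5 (Integer/parseInt (nth v 11))) "buena calidad" "mala calidad")))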

#'user/vectorToWine

In [5]:
;; Skip the CSV header row, then convert every remaining row.
(def data (map vectorToWine (rest wineQuality)))
(take 5 data)

(#user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality "mala calidad"} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.88, :citrid_acid 0.0, :chlorides 0.098, :sulphates 0.68, :alcohol 9.8, :quality "mala calidad"} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.76, :citrid_acid 0.04, :chlorides 0.092, :sulphates 0.65, :alcohol 9.8, :quality "mala calidad"} #user.Wine{:fixed_acidity 11.2, :volatile_acidity 0.28, :citrid_acid 0.56, :chlorides 0.075, :sulphates 0.58, :alcohol 9.8, :quality "buena calidad"} #user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality "mala calidad"})

## Splitting the dataset

The dataset has 1,599 rows in total. The first 1,279 (80% of the total) are used to train the machine learning model, and the remaining 320 (20% of the total) are used for testing.
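The 1,279 / 320 cut-off can also be derived from the ratio instead of being hard-coded. The following is a minimal sketch, assuming a hypothetical helper named `split-dataset` that is not part of the original notebook; it uses `(range 1599)` as a stand-in for the real rows so that it runs on its own.

```clojure
;; Hypothetical helper (not in the original notebook): splits coll into
;; [training testing], where training holds the first `ratio` share of the items.
(defn split-dataset [ratio coll]
  (let [n (long (* ratio (count coll)))]  ; truncates, e.g. 0.8 * 1599 -> 1279
    (split-at n coll)))

;; Stand-in for the 1,599 wine rows.
(def example-rows (range 1599))

(let [[training testing] (split-dataset 0.8 example-rows)]
  (println (count training) (count testing)))
```

Like the `take` / `drop` cells below, this keeps the rows in file order, so it reproduces the same split when given the real `data` sequence.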

In [6]:
(def trainingData (take 1279 data))
(take 5 trainingData)

(#user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality "mala calidad"} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.88, :citrid_acid 0.0, :chlorides 0.098, :sulphates 0.68, :alcohol 9.8, :quality "mala calidad"} #user.Wine{:fixed_acidity 7.8, :volatile_acidity 0.76, :citrid_acid 0.04, :chlorides 0.092, :sulphates 0.65, :alcohol 9.8, :quality "mala calidad"} #user.Wine{:fixed_acidity 11.2, :volatile_acidity 0.28, :citrid_acid 0.56, :chlorides 0.075, :sulphates 0.58, :alcohol 9.8, :quality "buena calidad"} #user.Wine{:fixed_acidity 7.4, :volatile_acidity 0.7, :citrid_acid 0.0, :chlorides 0.076, :sulphates 0.56, :alcohol 9.4, :quality "mala calidad"})

In [7]:
(def testingData (drop 1279 data))
(take 5 testingData)

(#user.Wine{:fixed_acidity 9.8, :volatile_acidity 0.3, :citrid_acid 0.39, :chlorides 0.062, :sulphates 0.57, :alcohol 11.5, :quality "buena calidad"} #user.Wine{:fixed_acidity 7.1, :volatile_acidity 0.46, :citrid_acid 0.2, :chlorides 0.077, :sulphates 0.64, :alcohol 10.4, :quality "buena calidad"} #user.Wine{:fixed_acidity 7.1, :volatile_acidity 0.46, :citrid_acid 0.2, :chlorides 0.077, :sulphates 0.64, :alcohol 10.4, :quality "buena calidad"} #user.Wine{:fixed_acidity 7.9, :volatile_acidity 0.765, :citrid_acid 0.0, :chlorides 0.084, :sulphates 0.68, :alcohol 10.9, :quality "buena calidad"} #user.Wine{:fixed_acidity 8.7, :volatile_acidity 0.63, :citrid_acid 0.28, :chlorides 0.096, :sulphates 0.63, :alcohol 10.2, :quality "buena calidad"})

In [8]:
(def tree (dt/make-decision-tree trainingData 3 :quality))

#'user/tree
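To make concrete how a tree turns continuous variables into a class label, here is a small sketch of walking a hand-built tree. It assumes nodes are maps with the keys `:feature`, `:threshold`, `:left`, `:right` and `:predict` (the keys read by the rendering cell at the end of this notebook), and that the left branch holds values above the threshold, as the edge labels in that cell suggest. The library's actual convention is not confirmed here, the function `walk-tree` is hypothetical, and the thresholds in `toy-tree` are invented for illustration rather than learned from the data.

```clojure
;; Hypothetical tree walk (not the library's implementation).
;; A node without :feature is treated as a leaf holding the predicted class.
(defn walk-tree [node sample]
  (if (nil? (:feature node))
    (:predict node)
    (if (> (get sample (:feature node)) (:threshold node))
      (recur (:left node) sample)     ; assumption: left means "above threshold"
      (recur (:right node) sample))))

;; Hand-built tree with made-up thresholds, for illustration only.
(def toy-tree
  {:feature :alcohol
   :threshold 10.5
   :left  {:predict "buena calidad"}
   :right {:feature :volatile_acidity
           :threshold 0.6
           :left  {:predict "mala calidad"}
           :right {:predict "buena calidad"}}})

(println (walk-tree toy-tree {:alcohol 11.5 :volatile_acidity 0.3}))
(println (walk-tree toy-tree {:alcohol 9.4 :volatile_acidity 0.7}))
```

Each internal node compares a single variable against a single threshold, which is why no scaling or linear relationship between the inputs and the label is required.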

In [9]:
;; Returns the fraction of samples in testColl whose predicted label
;; matches the actual :quality label.
(defn testCollection [tree testColl]
    (let [predictData (pmap #(dt/predict tree %) testColl)
          realData (map :quality testColl)
          correctPredictions (map (fn [p r] (if (= p r) 1 0)) predictData realData)]
        (/ (reduce + correctPredictions) (count predictData))))

#'user/testCollection

In [10]:
(* 100.0 (testCollection tree testingData))

71.25
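With 320 test samples, an accuracy of 71.25% corresponds to 228 correct predictions. Accuracy alone does not show which class the errors fall on, so a per-class breakdown can be useful. The sketch below assumes a hypothetical helper `confusion-counts` (not part of the original notebook) and uses a few invented labels in place of the real predictions.

```clojure
;; Hypothetical helper: counts how often each [predicted actual] pair occurs.
(defn confusion-counts [predicted actual]
  (frequencies (map vector predicted actual)))

;; Invented labels for illustration, not results from the wine data.
(def example-predicted ["buena calidad" "mala calidad" "buena calidad" "mala calidad"])
(def example-actual    ["buena calidad" "buena calidad" "buena calidad" "mala calidad"])

(println (confusion-counts example-predicted example-actual))
```

In the notebook, the two sequences would be the `predictData` and `realData` values computed inside `testCollection`.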

In [18]:
(inspector/inspect-tree tree)

#object[javax.swing.JFrame 0x682eaa4a "javax.swing.JFrame[frame0,0,0,400x600,layout=java.awt.BorderLayout,title=Clojure Inspector,resizable,normal,defaultCloseOperation=HIDE_ON_CLOSE,rootPane=javax.swing.JRootPane[,8,31,384x561,layout=javax.swing.JRootPane$RootLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=16777673,maximumSize=,minimumSize=,preferredSize=],rootPaneCheckingEnabled=true]"]

In [39]:
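;; Emits a Graphviz DOT statement for a leaf node, labeled with its predicted class.
(defn generateLeaf [node leaf-name]
    (str leaf-name "[label=\"" (:predict node) "\"]"))

;; Recursively emits DOT statements for a subtree: the node itself, its two
;; outgoing edges (labeled with the split threshold), and both children.
(defn generateTree [node root-name]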
    (let [left-node (:left node)
          left-name (str root-name "l")
          right-node (:right node)
          right-name (str root-name "r")]
        (clojure.string/join ";" [
            (str root-name "[label=\"" (:feature node) "\"]")
            (str root-name "->" left-name "[label=\"> " (:threshold node) "\"]")
            (str root-name "->" right-name "[label=\"< " (:threshold node) "\"]")
            (if (nil? (:feature left-node))
                (generateLeaf left-node left-name)
                (generateTree left-node left-name))
            (if (nil? (:feature right-node))
                (generateLeaf right-node right-name)
                (generateTree right-node right-name))])))


(def nombre (generateTree tree "r"))

(display/html (:body (client/get (str "https://quickchart.io/graphviz?graph=digraph{" nombre "}"))))
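The DOT string placed in the URL above contains quotes, spaces, brackets and `>` characters. If a client does not escape these on its own, the query value can be percent-encoded first. This is a minimal sketch using the JDK's `java.net.URLEncoder`; the helper name `graphviz-url` is hypothetical and the example graph is a trivial stand-in for the generated tree.

```clojure
;; Hypothetical helper: wraps the DOT statements in digraph{...} and
;; percent-encodes the result so it is safe to use as a query value.
(defn graphviz-url [dot-body]
  (str "https://quickchart.io/graphviz?graph="
       (java.net.URLEncoder/encode (str "digraph{" dot-body "}") "UTF-8")))

(println (graphviz-url "a->b"))
```

In the notebook, the result of `(graphviz-url nombre)` would be passed to `client/get` in place of the hand-built string.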