<center> <h1 style="background-color:seagreen; color:white" >Classification with an Artificial Neural Network (Multilayer Perceptron) in Spark Using PySpark</h1> 


<center> <h2 style="background-color:DarkKhaki; color:white" >Initializing PySpark</h2>

In [2]:
import findspark

# Make the Spark installation visible to the Jupyter Notebook
# (call this before importing pyspark)
findspark.init()

import pyspark
from pyspark.sql import SparkSession

# Start a Spark session
spark = SparkSession.builder.appName("MultilayerPerceptron").getOrCreate()

### Hyperparameters

> + **layers**: sizes of the network layers, from the input layer to the output layer
> + **seed**: random seed
> + **stepSize**: how much the weights are updated at each iteration (learning rate). (default: 0.03)


---

<center> <h2 style="background-color:DarkKhaki; color:white" >Applying the Multilayer Perceptron to the Iris Dataset</h2>

### Loading the Iris Dataset

In [3]:
iris = spark.read.csv("../Material_do_Curso/iris.csv",
                      header=True, inferSchema=True, sep=",")
print(f"Number of records in the dataset: {iris.count()}")
iris.show(5, truncate=False)

Number of records in the dataset: 150
+-----------+----------+-----------+----------+-----------+
|sepallength|sepalwidth|petallength|petalwidth|class      |
+-----------+----------+-----------+----------+-----------+
|5.1        |3.5       |1.4        |0.2       |Iris-setosa|
|4.9        |3.0       |1.4        |0.2       |Iris-setosa|
|4.7        |3.2       |1.3        |0.2       |Iris-setosa|
|4.6        |3.1       |1.5        |0.2       |Iris-setosa|
|5.0        |3.6       |1.4        |0.2       |Iris-setosa|
+-----------+----------+-----------+----------+-----------+
only showing top 5 rows



### Building the Feature Vector with VectorAssembler

In [4]:
from pyspark.ml.feature import VectorAssembler

In [5]:
asb = VectorAssembler(inputCols=["sepallength", "sepalwidth",
                                 "petallength", "petalwidth"],
                      outputCol="independente")

iris_asb = asb.transform(iris)
iris_asb.show(5)

+-----------+----------+-----------+----------+-----------+-----------------+
|sepallength|sepalwidth|petallength|petalwidth|      class|     independente|
+-----------+----------+-----------+----------+-----------+-----------------+
|        5.1|       3.5|        1.4|       0.2|Iris-setosa|[5.1,3.5,1.4,0.2]|
|        4.9|       3.0|        1.4|       0.2|Iris-setosa|[4.9,3.0,1.4,0.2]|
|        4.7|       3.2|        1.3|       0.2|Iris-setosa|[4.7,3.2,1.3,0.2]|
|        4.6|       3.1|        1.5|       0.2|Iris-setosa|[4.6,3.1,1.5,0.2]|
|        5.0|       3.6|        1.4|       0.2|Iris-setosa|[5.0,3.6,1.4,0.2]|
+-----------+----------+-----------+----------+-----------+-----------------+
only showing top 5 rows
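
To make the `independente` column concrete, here is a minimal plain-Python sketch of what the assembling step does for each row: it gathers the values of the input columns, in the order given by `inputCols`, into a single vector. The `assemble` helper and the dictionary rows are hypothetical and only illustrate the idea; this is not how Spark implements `VectorAssembler`.

```python
# Plain-Python illustration only; `assemble` is a hypothetical helper,
# not part of PySpark.
input_cols = ["sepallength", "sepalwidth", "petallength", "petalwidth"]

# First two rows of the Iris dataset shown above, as dictionaries.
rows = [
    {"sepallength": 5.1, "sepalwidth": 3.5, "petallength": 1.4,
     "petalwidth": 0.2, "class": "Iris-setosa"},
    {"sepallength": 4.9, "sepalwidth": 3.0, "petallength": 1.4,
     "petalwidth": 0.2, "class": "Iris-setosa"},
]


def assemble(row, cols, output_col):
    """Return a copy of `row` with the values of `cols` gathered into one list."""
    new_row = dict(row)
    new_row[output_col] = [float(row[c]) for c in cols]
    return new_row


assembled = [assemble(r, input_cols, "independente") for r in rows]
for r in assembled:
    print(r["class"], r["independente"])
```

The original columns are kept and the new vector column is appended, which mirrors the shape of the table above.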



                                                                                

### Encoding the **class** Label with StringIndexer

In [6]:
from pyspark.ml.feature import StringIndexer

In [7]:
ind = StringIndexer(inputCol="class", outputCol="dependente")
iris_asb = ind.fit(iris_asb).transform(iris_asb)
iris_asb.show(5)

                                                                                

+-----------+----------+-----------+----------+-----------+-----------------+----------+
|sepallength|sepalwidth|petallength|petalwidth|      class|     independente|dependente|
+-----------+----------+-----------+----------+-----------+-----------------+----------+
|        5.1|       3.5|        1.4|       0.2|Iris-setosa|[5.1,3.5,1.4,0.2]|       0.0|
|        4.9|       3.0|        1.4|       0.2|Iris-setosa|[4.9,3.0,1.4,0.2]|       0.0|
|        4.7|       3.2|        1.3|       0.2|Iris-setosa|[4.7,3.2,1.3,0.2]|       0.0|
|        4.6|       3.1|        1.5|       0.2|Iris-setosa|[4.6,3.1,1.5,0.2]|       0.0|
|        5.0|       3.6|        1.4|       0.2|Iris-setosa|[5.0,3.6,1.4,0.2]|       0.0|
+-----------+----------+-----------+----------+-----------+-----------------+----------+
only showing top 5 rows
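
The mapping from class name to numeric index can be sketched in plain Python. By default `StringIndexer` orders labels by descending frequency, so the most frequent label receives index 0.0. In Iris every class appears 50 times; the sketch below assumes ties are broken alphabetically, which agrees with the output above (`Iris-setosa` maps to 0.0) but should be checked against the documentation of the Spark version in use. `build_index` is a hypothetical helper, not a PySpark function.

```python
from collections import Counter


def build_index(labels):
    """Map each distinct label to a float index, most frequent first.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# The Iris dataset has 50 rows for each of its three classes.
labels = (["Iris-setosa"] * 50
          + ["Iris-versicolor"] * 50
          + ["Iris-virginica"] * 50)

index = build_index(labels)
print(index)
```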



### Choosing the Independent and Dependent Variables for Training the Model

### Importing the PySpark Module for Data Preprocessing

### Splitting the Data into Training and Test Sets

The dataset is split into training and test sets: **70%** of the rows are used to train the model and **30%** to test it.

In [9]:
iris_train, iris_test = iris_asb.randomSplit([0.7, 0.3])
print(f"Number of training rows: {iris_train.count()}")
print(f"Number of test rows: {iris_test.count()}")

Number of training rows: 99
Number of test rows: 51
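
The counts above (99 and 51) are close to, but not exactly, 70% and 30% of 150 rows. As far as I understand, `randomSplit` assigns each row to a split at random according to the weights instead of cutting the data at an exact position, so the split sizes vary between runs unless a seed is given. A minimal plain-Python sketch of that idea follows; the `random_split` helper is hypothetical and does not reproduce Spark's actual sampler.

```python
import random


def random_split(rows, train_weight, seed):
    """Send each row to train with probability `train_weight`, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_weight else test).append(row)
    return train, test


rows = list(range(150))  # stand-in for the 150 Iris rows

for seed in (1, 2, 3):
    train, test = random_split(rows, 0.7, seed)
    # The two sizes always add up to 150 but may differ between seeds.
    print(seed, len(train), len(test))
```

In PySpark the seed is the second argument of `randomSplit`, for example `iris_asb.randomSplit([0.7, 0.3], seed=42)`, which makes the split repeatable for the same input data.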


---

<center> <h1 style="background-color:DarkKhaki; color:white" >Importing the PySpark Module to Build the Artificial Neural Network Model </h1>

In [10]:
from pyspark.ml.classification import MultilayerPerceptronClassifier

#### Instantiating the Estimator and Fitting the Model

In [11]:
# Instantiate the MultilayerPerceptronClassifier estimator
obj_mlp = MultilayerPerceptronClassifier(maxIter=10, layers=[4, 5, 4, 3],
                                         featuresCol="independente", labelCol="dependente")

# Fit the model on the training data
model_mlp = obj_mlp.fit(iris_train)
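
The `layers` list `[4, 5, 4, 3]` reads from input to output: 4 input features (the four measurements in `independente`), two hidden layers with 5 and 4 units, and 3 output units, one per Iris class. Assuming a fully connected network with one bias per unit, the number of trainable parameters can be worked out as in the sketch below (`count_parameters` is a hypothetical helper, not a PySpark function).

```python
def count_parameters(layers):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layers, layers[1:]):
        total += n_in * n_out  # weights between two consecutive layers
        total += n_out         # one bias per unit in the receiving layer
    return total


layers = [4, 5, 4, 3]
print(count_parameters(layers))  # (4*5 + 5) + (5*4 + 4) + (4*3 + 3) = 64
```

With only 64 parameters the network is small, so the first and last entries of `layers` matter most: they must match the number of features and the number of classes.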

                                                                                

#### Making Predictions with the Fitted Model

In [12]:
previsao_test = model_mlp.transform(iris_test)
previsao_test.select("dependente", "prediction").show(5, truncate=True)

+----------+----------+
|dependente|prediction|
+----------+----------+
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
+----------+----------+
only showing top 5 rows



                                                                                

### Evaluating the Model

In [13]:
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

In [14]:
avaliar = MulticlassClassificationEvaluator(labelCol="dependente",
                                            predictionCol="prediction",
                                            metricName="accuracy")

acuracia = avaliar.evaluate(previsao_test)

print(f"Accuracy: {acuracia}")

Accuracy: 0.6666666666666666
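
Accuracy is the fraction of test rows whose `prediction` equals the label in `dependente`. With the 51 test rows from the split above, a value of 0.6666... corresponds to 34 correct predictions out of 51. A plain-Python sketch of the calculation follows; the label lists are made up for illustration and are not the notebook's actual predictions.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction matches the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up example: 51 rows, of which the first 34 are predicted correctly.
labels = [0.0] * 51
predictions = [0.0] * 34 + [1.0] * 17

print(accuracy(labels, predictions))
```

Because accuracy is a single number, it hides which classes are being confused; on a test set of 51 rows each misclassified row moves the score by about 2 percentage points.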



### Reading the Model's Hyperparameter Values

In [15]:
print(model_mlp.getMaxIter())
print(model_mlp.getLayers())
print(model_mlp.getStepSize())

10
[4, 5, 4, 3]
0.03


---

### Changing the Model's Hyperparameters

In [27]:
# Param map that overrides maxIter for a single call to fit()
parunico = {obj_mlp.maxIter: 1000}

In [28]:
model_mlp = obj_mlp.fit(iris_train, parunico)

23/03/05 10:22:58 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
23/03/05 10:22:59 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.125
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.0625
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.03125
23/03/05 10:23:00 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.0234375
23/03/05

In [29]:
print(model_mlp.getMaxIter())
print(model_mlp.getLayers())
print(model_mlp.getStepSize())

1000
[4, 5, 4, 3]
0.03
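
The output above shows `maxIter` at 1000 while `layers` keeps the value set on the estimator and `stepSize` keeps its default. That is consistent with how Spark ML resolves parameters: defaults are overridden by values set on the estimator, which are in turn overridden by the param map passed to `fit`. A plain-Python sketch of that precedence using ordinary dictionaries follows; the `resolve_params` helper is hypothetical and the dictionaries only mimic the values seen in this notebook.

```python
def resolve_params(defaults, estimator_params, fit_param_map=None):
    """Merge parameter sources; later sources win over earlier ones."""
    resolved = dict(defaults)
    resolved.update(estimator_params)
    resolved.update(fit_param_map or {})
    return resolved


# Illustrative values only, mirroring the ones printed in this notebook.
defaults = {"maxIter": 100, "stepSize": 0.03}
estimator_params = {"maxIter": 10, "layers": [4, 5, 4, 3]}
fit_param_map = {"maxIter": 1000}

print(resolve_params(defaults, estimator_params))
print(resolve_params(defaults, estimator_params, fit_param_map))
```

The estimator itself is left unchanged by the param map, so a later `obj_mlp.fit(iris_train)` without the extra argument would go back to `maxIter=10`.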


In [30]:
previsao_test = model_mlp.transform(iris_test)
previsao_test.select("dependente", "prediction").show(5, truncate=True)

+----------+----------+
|dependente|prediction|
+----------+----------+
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
|       0.0|       0.0|
+----------+----------+
only showing top 5 rows



In [31]:
acuracia = avaliar.evaluate(previsao_test)

print(f"Accuracy: {acuracia}")

Accuracy: 0.9411764705882353
