<h2>Decision Forest Example</h2>
In this example, you build a Decision Forest model from a training dataset and apply it to a test dataset to evaluate its performance. The load_example_data function loads the housing_train and housing_test tables into your database.

<i>NOTE: You must have a connection to Teradata Vantage that has the Teradata analytic functions installed.</i>

In [1]:
# Replace user, passwd, and host with your cluster's connection details
from teradataml.context.context import create_context
from teradataml.dataframe.dataframe import DataFrame
from teradataml.data.load_example_data import load_example_data
from teradataml.analytics.sqle.DecisionForestPredict import DecisionForestPredict
from teradataml.analytics.mle.DecisionForest import DecisionForest

user = "xxxxx"
passwd = "xxxxx"
host = "xxxxx"
td_context = create_context(host = host, username = user, password = passwd)

# Load the example tables into the database
load_example_data("decisionforestpredict", ["housing_train", "housing_test"])

# Response variable (homestyle) on the left, predictor columns on the right
formula = "homestyle ~ driveway + recroom + fullbase + gashw + airco + prefarea + price + lotsize + bedrooms + bathrms + stories + garagepl"
housing_train = DataFrame.from_table("housing_train")

# Train a classification forest of 50 trees with fixed seeds
rft_model = DecisionForest(data=housing_train,
                           formula=formula,
                           tree_type="classification",
                           ntree=50,
                           tree_size=100,
                           nodesize=1,
                           variance=0.0,
                           max_depth=12,
                           maxnum_categorical=20,
                           mtry=3,
                           mtry_seed=100,
                           seed=100)
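
The formula string above follows the `response ~ predictor + predictor + ...` pattern. With a long predictor list, assembling the string from column names can help avoid typos. The sketch below is one way to do that; `build_formula` is a hypothetical helper written for illustration and is not part of teradataml.

```python
def build_formula(response, predictors):
    """Join column names into a 'response ~ a + b + c' formula string.

    Hypothetical helper for illustration; not part of teradataml.
    """
    if not predictors:
        raise ValueError("at least one predictor column is required")
    return f"{response} ~ " + " + ".join(predictors)


# Predictor columns taken from the formula used in this example
predictors = ["driveway", "recroom", "fullbase", "gashw", "airco", "prefarea",
              "price", "lotsize", "bedrooms", "bathrms", "stories", "garagepl"]

formula = build_formula("homestyle", predictors)
print(formula)
```

The resulting string is identical to the hand-written formula, so it can be passed to the `formula` argument unchanged.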





In [2]:
rft_model

############ STDOUT Output ############

                                             message
0   Each tree will contain approximately 100 points.
1                 Computing 50 classification trees.
3                           Decision forest created.
4                  Poisson sampling parameter: 0.203
5                   Query finished in 1.358 seconds.
6                 Each worker is computing 50 trees.


############ predictive_model Output ############

     worker_ip task_index  tree_num                                               tree
0  172.24.0.11          1        17  {"responseCounts_":{"Eclectic":64,"bungalow":1...
1  172.24.0.11          1        19  {"responseCounts_":{"Eclectic":55,"bungalow":1...
2  172.24.0.11          1        20  {"responseCounts_":{"Eclectic":61,"bungalow":9...
3  172.24.0.11          1        21  {"responseCounts_":{"Eclectic":46,"bungalow":1...
4  172.24.0.11          1        23  {"responseCounts_":{"Eclectic":74,"bungalow":1...
5  172.24.0.1

In [3]:
housing_test = DataFrame.from_table("housing_test")

In [4]:
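# Score the test rows; id_column identifies each row and terms carries
# the actual homestyle value through to the output for comparison
decision_forest_predict_out = DecisionForestPredict(object=rft_model,
                                                    newdata=housing_test,
                                                    id_column="sn",
                                                    detailed=False,
                                                    terms=["homestyle"])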

In [5]:
decision_forest_predict_out

############ STDOUT Output ############

  homestyle   sn prediction  confidence_lower  confidence_upper
0  Eclectic  440   Eclectic              0.94              0.94
1  Eclectic  255   Eclectic              0.96              0.96
2   Classic  260    Classic              0.86              0.86
3  Eclectic  301   Eclectic              0.94              0.94
4   Classic  459    Classic              0.82              0.82
5  Eclectic  469   Eclectic              0.92              0.92
6  Eclectic   38   Eclectic              0.84              0.84
7  Eclectic  364   Eclectic              0.96              0.96
8   Classic   13    Classic              0.92              0.92
9   Classic  463    Classic              0.82              0.82
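
The prediction table places the actual `homestyle` next to the model's `prediction`, so a simple accuracy figure is the fraction of rows where the two agree. The sketch below computes it in plain Python over only the ten rows displayed above. The result therefore describes that sample, not the full housing_test table. The helper name `accuracy` is illustrative and not part of teradataml.

```python
# (actual homestyle, predicted homestyle) pairs copied from the ten rows shown above
rows = [
    ("Eclectic", "Eclectic"),
    ("Eclectic", "Eclectic"),
    ("Classic", "Classic"),
    ("Eclectic", "Eclectic"),
    ("Classic", "Classic"),
    ("Eclectic", "Eclectic"),
    ("Eclectic", "Eclectic"),
    ("Eclectic", "Eclectic"),
    ("Classic", "Classic"),
    ("Classic", "Classic"),
]


def accuracy(pairs):
    """Return the fraction of (actual, predicted) pairs that agree."""
    if not pairs:
        raise ValueError("no rows to evaluate")
    correct = sum(1 for actual, predicted in pairs if actual == predicted)
    return correct / len(pairs)


sample_accuracy = accuracy(rows)
print(f"Accuracy on the displayed rows: {sample_accuracy:.2f}")
```

All ten displayed predictions match the actual class. A sample this small says little about overall performance, so the same comparison should be run over every row of the prediction result before drawing conclusions.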

#### For more information on the Teradata analytic functions, refer to the [Teradata Documentation](https://docs.teradata.com/) and search for Teradata Python Package.

Copyright 2019 Teradata. All rights reserved.