This is an experiment using XGBoost on the iris data set that ships with scikit-learn. Let's begin by loading the data and checking its dimensions: the number of samples (rows) and features (columns), along with the class names.


In [1]:
from sklearn.datasets import load_iris

iris = load_iris()

numSamples, numFeatures = iris.data.shape
print(numSamples)
print(numFeatures)
print(list(iris.target_names))

150
4
['setosa', 'versicolor', 'virginica']


Now we will split our data into training and test sets, allotting 80% for training and 20% for testing. Holding out a test set lets us validate the model: we can see how it performs on data it has not seen before and check whether it has overfit the training data. Note that the target names correspond to integer labels:<br>
- 0 corresponds to 'setosa'<br>
- 1 corresponds to 'versicolor'<br>
- 2 corresponds to 'virginica'<br>

In [2]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)
print(y_test)

[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0]
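
To make the printed labels easier to read, the integers can be decoded back into species names and tallied. The following is a minimal sketch using only the standard library; the label list is copied from the output above, and the variable names (`test_labels`, `counts`) are illustrative rather than part of the original notebook.

```python
from collections import Counter

# Class names in label order, as printed by iris.target_names.
target_names = ['setosa', 'versicolor', 'virginica']

# Test labels copied from the printed y_test output above.
test_labels = [2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1,
               0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0]

# Count how many test samples fall into each class.
counts = Counter(target_names[label] for label in test_labels)
for name in target_names:
    print(name, counts[name])
```

The test set holds 30 samples (20% of 150), and the classes are not evenly represented, which is expected for a purely random split. Passing `stratify=iris.target` to `train_test_split` is one way to keep the class proportions the same in both sets.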


We will now load the data into XGBoost's DMatrix format and define the training parameters. We use the 'multi:softmax' objective because this is a multiclass classification problem with three classes. Fine-tuning the parameters will come with experimentation.

In [3]:
import xgboost as xgb

train = xgb.DMatrix(X_train, label=y_train)
test = xgb.DMatrix(X_test, label=y_test)

In [4]:

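param = {
    'max_depth': 4,                 # maximum depth of each tree
    'eta': 0.3,                     # learning rate (step size shrinkage)
    'objective': 'multi:softmax',   # multiclass; predict() returns class labels
    'num_class': 3}                 # setosa, versicolor, virginica
epochs = 10  # number of boosting rounds

model = xgb.train(param, train, epochs)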
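
The 'multi:softmax' objective can be pictured as follows: the model produces one raw score per class, softmax turns those scores into probabilities, and the prediction is the class with the highest probability. The sketch below illustrates that idea in plain Python. The scores are made-up numbers for illustration, not output from the model above, and `softmax` is a helper defined here rather than an XGBoost function.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one sample, one per class (0, 1, 2).
raw_scores = [0.2, 2.1, -0.5]

probabilities = softmax(raw_scores)
# The predicted class is the index of the largest probability.
predicted_class = probabilities.index(max(probabilities))
print(probabilities)
print(predicted_class)
```

If the per-class probabilities are wanted instead of only the winning label, setting the objective to 'multi:softprob' makes XGBoost return them.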

Let's have our model predict on our test data.

In [5]:
predictions = model.predict(test)
print(predictions)

[2. 1. 0. 2. 0. 2. 0. 1. 1. 1. 2. 1. 1. 1. 1. 0. 1. 1. 0. 0. 2. 1. 0. 0.
 2. 0. 0. 1. 1. 0.]


Now we can test the accuracy of our predictions.

In [6]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, predictions)

1.0
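
Accuracy here is simply the fraction of test samples whose predicted label equals the true label. As a sanity check, the same number can be reproduced by hand. The sketch below copies the labels and predictions from the printed outputs above; the variable names are illustrative.

```python
# True labels copied from the printed y_test output.
true_labels = [2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1,
               0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0]

# Predictions copied from the printed model output (floats, as XGBoost returns them).
predicted = [2., 1., 0., 2., 0., 2., 0., 1., 1., 1., 2., 1., 1., 1., 1.,
             0., 1., 1., 0., 0., 2., 1., 0., 0., 2., 0., 0., 1., 1., 0.]

# Count matching positions and divide by the number of samples.
correct = sum(1 for t, p in zip(true_labels, predicted) if t == p)
accuracy = correct / len(true_labels)
print(correct, len(true_labels), accuracy)
```

All 30 predictions match, which gives the 1.0 reported above. Because this rests on a single small split, the score may vary with a different `random_state`.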