# Delayed prequential evaluation with timestamped data
Delayed prequential evaluation is designed specifically for stream settings. Samples are analysed sequentially, in order of arrival, and each sample serves two purposes: it is first used to test the model, and it is later used to update the model, but only once its label is available according to its timestamps (arrival time and label-available time).

Each sample is first used to test the model, that is, to make a prediction. The same sample is then used to train the model (partial fit) once its label becomes available after a certain delay. This way the model is always tested on samples it has not seen yet and updated only with samples whose labels are available.
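
The test-then-train loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not scikit-multiflow's implementation: `MajorityClassModel` and `delayed_prequential` are hypothetical names, timestamps are plain integers, and the delay is the same for every sample, so labels become available in arrival order.

```python
from collections import Counter, deque


class MajorityClassModel:
    """Toy incremental model (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def partial_fit(self, x, y):
        self.counts[y] += 1


def delayed_prequential(samples, model, delay):
    """Run a delayed test-then-train loop and return the accuracy.

    samples: iterable of (arrival_time, x, y), sorted by arrival_time.
    delay:   constant time between a sample's arrival and its label's availability.
    """
    pending = deque()  # (available_time, x, y) for samples still waiting for their label
    correct = 0
    total = 0
    for arrival, x, y in samples:
        # 1) Train on every pending sample whose label is available by now.
        while pending and pending[0][0] <= arrival:
            _, past_x, past_y = pending.popleft()
            model.partial_fit(past_x, past_y)
        # 2) Test on the new sample before its label is used for training.
        if model.predict(x) == y:
            correct += 1
        total += 1
        # 3) Queue the sample; its label becomes available after the delay.
        pending.append((arrival + delay, x, y))
    return correct / total if total else 0.0


# Four samples arriving at t = 0, 1, 2, 3 with a label delay of 2 time units.
samples = [(0, None, "a"), (1, None, "a"), (2, None, "a"), (3, None, "b")]
model = MajorityClassModel()
accuracy = delayed_prequential(samples, model, delay=2)
print(accuracy)  # 0.25
```

In this toy run the model cannot predict anything until the first label becomes available at t = 2, which is why only the third sample is classified correctly.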

In [36]:
%matplotlib notebook
import numpy as np
import pandas as pd
from skmultiflow.data import TemporalDataStream
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequentialDelayed

In [38]:
# Columns used to get the data, label and time from iris_timestamp dataset
DATA_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL_COLUMN = "label"
TIME_COLUMN = "timestamp"

In [39]:
# Read a csv with stream data
data = pd.read_csv("D:/Streaming data set/streaming-datasets-master/iris_timestamp.csv")

In [40]:
# Convert time column to datetime
data[TIME_COLUMN] = pd.to_datetime(data[TIME_COLUMN])

In [41]:
# Sort data by time
data = data.sort_values(by=TIME_COLUMN)

In [42]:
# Get X, y and time
X = data[DATA_COLUMNS].values
y = data[LABEL_COLUMN].values
time = data[TIME_COLUMN].values

In [43]:
# Set a delay of 1 day
delay_time = np.timedelta64(1, "D")
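
As a small illustration of what this delay means, the sketch below adds a one-day `np.timedelta64` to a few made-up arrival timestamps to get the time at which each label would become available. The timestamps are invented for the example, and the assumption that the stream derives label availability as arrival time plus delay is stated here for illustration only.

```python
import numpy as np

# Made-up arrival timestamps (not taken from iris_timestamp.csv).
arrival_times = np.array(
    ["2020-01-01T08:00", "2020-01-01T20:00", "2020-01-02T09:00"],
    dtype="datetime64[m]",
)
delay_time = np.timedelta64(1, "D")

# Assumed rule: a label becomes available one delay after its sample arrives.
available_times = arrival_times + delay_time

# Labels that are already available at the moment the third sample arrives.
labels_ready = available_times <= arrival_times[2]
print(available_times)
print(labels_ready)
```

Under this assumption, only the first sample's label could be used for training when the third sample arrives.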

In [44]:
# Set the stream
stream = TemporalDataStream(X, y, time, sample_delay=delay_time, ordered=False)

In [45]:
# Set the model
ht = HoeffdingTreeClassifier()

In [46]:
# Set the evaluator: pre-train on the first half of the data,
# then test and train one sample at a time on the rest
evaluator = EvaluatePrequentialDelayed(batch_size=1,
                                       pretrain_size=X.shape[0]//2,
                                       max_samples=X.shape[0],
                                       output_file='results_delay.csv',
                                       metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])

In [47]:
# Run evaluation
evaluator.evaluate(stream=stream, model=ht, model_names=['HT'])

Prequential Evaluation Delayed
Evaluating 1 target(s).
Pre-training on 150 sample(s).
Evaluating...
 ###################- [95%] [2.03s]
Processed samples: 300
Mean performance:
HT - Accuracy     : 0.5882
HT - Kappa        : 0.3824
HT - Precision: 0.5938
HT - Recall: 0.5882
HT - F1 score: 0.5465


[HoeffdingTreeClassifier(binary_split=False, grace_period=200,
                         leaf_prediction='nba', max_byte_size=33554432,
                         memory_estimate_period=1000000, nb_threshold=0,
                         no_preprune=False, nominal_attributes=None,
                         remove_poor_atts=False, split_confidence=1e-07,
                         split_criterion='info_gain', stop_mem_management=False,
                         tie_threshold=0.05)]