# Prequential evaluation with delayed labels: timestamped data

Delayed prequential evaluation is designed specifically for stream settings. Each sample serves two purposes: samples are analysed sequentially, in order of arrival, and a sample is used to update the model only once its label becomes available, as determined by its timestamps (arrival time and available time).

The method first uses each sample to test the model, that is, to make a prediction. The same sample is then used to train the model (partial fit) once its label becomes available after a certain delay. This way the model is always tested on samples it has not seen yet and updated only on samples whose labels are available.
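
The test-then-train loop with delayed labels can be sketched in plain Python. This is a minimal illustration, not the skmultiflow implementation: `MajorityClassModel`, `prequential_delayed`, the integer timestamps and the rule "a label is usable once its available time is less than or equal to the current arrival time" are all assumptions made for the example.

```python
from collections import Counter, deque


class MajorityClassModel:
    """Toy incremental classifier (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def partial_fit(self, x, y):
        self.counts[y] += 1


def prequential_delayed(samples, delay, model):
    """Return accuracy over (arrival_time, x, y) tuples sorted by arrival time.

    Each label is assumed to become available at arrival_time + delay.
    """
    pending = deque()  # (available_time, x, y), in order of availability
    correct = 0
    tested = 0
    for arrival, x, y in samples:
        # 1. Train on every queued sample whose label is available by now.
        while pending and pending[0][0] <= arrival:
            _, past_x, past_y = pending.popleft()
            model.partial_fit(past_x, past_y)
        # 2. Test on the new sample before its label is used for training.
        if model.predict(x) == y:
            correct += 1
        tested += 1
        # 3. Queue the sample until its label becomes available.
        pending.append((arrival + delay, x, y))
    return correct / tested if tested else 0.0


# Hypothetical stream: (arrival_time, features, label)
samples = [(0, "a", 1), (1, "b", 1), (2, "c", 0), (3, "d", 1), (4, "e", 1)]
accuracy = prequential_delayed(samples, delay=2, model=MajorityClassModel())
print(accuracy)  # 0.4
```

With a delay of 2, the first two predictions are made before any label has arrived, which is why a longer delay tends to lower the early accuracy.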

class **skmultiflow.evaluation.EvaluatePrequentialDelayed(n_wait=200, max_samples=100000, batch_size=1, pretrain_size=200, max_time=inf, metrics=None, output_file=None, show_plot=False, restart_stream=True, data_points_for_classification=False)**

### The first example demonstrates how to evaluate one model

In [1]:
import numpy as np
import pandas as pd
from skmultiflow.data import TemporalDataStream
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequentialDelayed

In [2]:
# Columns used to get the data, label and time from iris_timestamp dataset
DATA_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL_COLUMN = "label"
TIME_COLUMN = "timestamp"

In [3]:
# Read a csv with stream data
data = pd.read_csv("D:/Streaming data set/streaming-datasets-master/iris_timestamp.csv")

In [4]:
# Convert time column to datetime
data[TIME_COLUMN] = pd.to_datetime(data[TIME_COLUMN])

In [5]:
# Sort data by time
data = data.sort_values(by=TIME_COLUMN)

In [6]:
# Get X, y and time
X = data[DATA_COLUMNS].values
y = data[LABEL_COLUMN].values
time = data[TIME_COLUMN].values

In [7]:
# Set a delay of 1 day
delay_time = np.timedelta64(1, "D")

In [8]:
# Set the stream
stream = TemporalDataStream(X, y, time, sample_delay=delay_time, ordered=False)
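
To see what a fixed delay means for individual samples, the sketch below adds a `np.timedelta64` delay to a few arrival times and checks which labels would be usable at a given moment. The timestamps and the name `now` are made up for illustration, and the assumption that the available time equals arrival time plus the delay reflects how a fixed `sample_delay` is understood here rather than a verified detail of `TemporalDataStream`.

```python
import numpy as np

# Hypothetical arrival times (not taken from iris_timestamp.csv)
arrival = np.array(
    ["2020-01-01T00:00", "2020-01-01T12:00", "2020-01-02T06:00"],
    dtype="datetime64[m]",
)

# Same fixed delay as in the cell above
delay = np.timedelta64(1, "D")

# Assumed rule: a label becomes available one delay after the sample arrives
available = arrival + delay

# Which labels can be used for training at this point in time?
now = np.datetime64("2020-01-02T06:00")
labels_ready = available <= now

print(available)
print(labels_ready)
```

Only the first sample's label is ready at `now`: it arrived more than one day earlier, while the other two are still waiting.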

In [9]:
# Set the model
ht = HoeffdingTreeClassifier()

In [10]:
# Set the evaluator
evaluator = EvaluatePrequentialDelayed(batch_size=1,
                                pretrain_size=X.shape[0]//2,
                                max_samples=X.shape[0],
                                output_file='results_delay.csv',
                                metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])

In [11]:
# Run evaluation
evaluator.evaluate(stream=stream, model=ht, model_names=['HT'])

Prequential Evaluation Delayed
Evaluating 1 target(s).
Pre-training on 150 sample(s).
Evaluating...
 ###################- [95%] [1.21s]
Processed samples: 300
Mean performance:
HT - Accuracy     : 0.5882
HT - Kappa        : 0.3824
HT - Precision: 0.5938
HT - Recall: 0.5882
HT - F1 score: 0.5465


[HoeffdingTreeClassifier(binary_split=False, grace_period=200,
                         leaf_prediction='nba', max_byte_size=33554432,
                         memory_estimate_period=1000000, nb_threshold=0,
                         no_preprune=False, nominal_attributes=None,
                         remove_poor_atts=False, split_confidence=1e-07,
                         split_criterion='info_gain', stop_mem_management=False,
                         tie_threshold=0.05)]
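
The kappa score reported above corrects accuracy for agreement expected by chance. The following sketch computes Cohen's kappa from two label lists using the standard definition; `cohen_kappa` is a hypothetical helper written for this illustration, not the routine skmultiflow uses internally.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Observed agreement (this is the accuracy)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined here; returning 0.0 is a choice made for this sketch
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

As a rough check, if the chance agreement were 1/3 (three equally frequent classes, which is an assumption about this dataset), an accuracy of 0.5882 would give a kappa of about 0.382, in line with the values reported above.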

### The second example demonstrates how to compare two models (Hoeffding Tree and Naive Bayes)

In [12]:
from skmultiflow.bayes import NaiveBayes

In [13]:
# Set a delay of 30 minutes
delay_time = np.timedelta64(30, "m")
# Set the stream
stream = TemporalDataStream(X, y, time, sample_delay=delay_time, ordered=False)
# Set the models
ht = HoeffdingTreeClassifier()
nb = NaiveBayes()

evaluator = EvaluatePrequentialDelayed(batch_size=1,
                                pretrain_size=X.shape[0]//2,
                                max_samples=X.shape[0],
                                output_file='results_delay.csv',
                                metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])

# Run evaluation
evaluator.evaluate(stream=stream, model=[ht, nb], model_names=['HT', 'NB'])

Prequential Evaluation Delayed
Evaluating 1 target(s).
Pre-training on 150 sample(s).
Evaluating...
Processed samples: 300
Mean performance:
HT - Accuracy     : 0.5926
HT - Kappa        : 0.3889
HT - Precision: 0.5931
HT - Recall: 0.5926
HT - F1 score: 0.5487
NB - Accuracy     : 0.5926
NB - Kappa        : 0.3889
NB - Precision: 0.5931
NB - Recall: 0.5926
NB - F1 score: 0.5487


[HoeffdingTreeClassifier(binary_split=False, grace_period=200,
                         leaf_prediction='nba', max_byte_size=33554432,
                         memory_estimate_period=1000000, nb_threshold=0,
                         no_preprune=False, nominal_attributes=None,
                         remove_poor_atts=False, split_confidence=1e-07,
                         split_criterion='info_gain', stop_mem_management=False,
                         tie_threshold=0.05),
 NaiveBayes(nominal_attributes=None)]

### Very Fast Decision Rules classifier

In [29]:
from skmultiflow.rules import VeryFastDecisionRulesClassifier

dr = VeryFastDecisionRulesClassifier()

No Nominal attributes have been defined, will consider all attributes as numerical


In [30]:
evaluator = EvaluatePrequentialDelayed(batch_size=1,
                                pretrain_size=X.shape[0]//2,
                                max_samples=X.shape[0],
                                output_file='results_delay.csv',
                                metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])

# Run evaluation
evaluator.evaluate(stream=stream, model=[dr], model_names=['DR'])

Prequential Evaluation Delayed
Evaluating 1 target(s).
Pre-training on 150 sample(s).
Evaluating...
Processed samples: 300
Mean performance:
DR - Accuracy     : 0.5926
DR - Kappa        : 0.3889
DR - Precision: 0.5931
DR - Recall: 0.5926
DR - F1 score: 0.5487


[VeryFastDecisionRulesClassifier(drift_detector=None, expand_confidence=1e-07,
                                 expand_criterion='info_gain', grace_period=200,
                                 max_rules=1000, min_weight=100,
                                 nb_prediction=True, nb_threshold=0,
                                 nominal_attributes=[], ordered_rules=True,
                                 remove_poor_atts=False,
                                 rule_prediction='first_hit',
                                 tie_threshold=0.05)]