In [None]:
from pprint import pprint
from river import datasets
from river import compose
from river import linear_model
from river import metrics
from river import preprocessing
from river import stream

We select the Elec2 dataset from the River library:

In [None]:
print(datasets.Elec2())

We prepare a generator that converts each row into a dictionary of float values. The "class" column is encoded as a binary label: 1 when it is "UP" and 0 when it is "DOWN".

In [None]:
# Helper for one-hot encoding the class label (kept for reference; the converters below use a binary 0/1 label instead)
def one_hot_encode_class(x):
    if x == 'DOWN':
        return {'class_DOWN': 1, 'class_UP': 0}
    elif x == 'UP':
        return {'class_DOWN': 0, 'class_UP': 1}
    else:
        return {'class_DOWN': 0, 'class_UP': 0}

params = {
    'converters': {
        'date': float,
        'day': float,
        'period': float,
        'nswprice': float,
        'nswdemand': float,
        'vicprice': float,
        'vicdemand': float,
        'transfer': float,
        'class': lambda x: 1 if x == 'UP' else 0
    }
}

dataset = stream.iter_csv('electricity.csv', target='class', **params)
for x, y in dataset:
    print(x, y)



# The task requires a logistic regression classifier
The selected set includes various features such as date, day, time period, electricity prices in New South Wales, electricity demand in New South Wales, electricity prices in Victoria, electricity demand in Victoria, electricity transfer.

The dataset is presented as a "Binary classification" task. The goal is to predict whether the price of electricity will rise or fall.

In order to predict the change in electricity prices, a model based on logistic regression was created. The model is built as a Pipeline, in which the data is first processed through feature standardization and then fed into the logistic regression model.
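
To make the two pipeline stages concrete, here is a minimal from-scratch sketch of online standardization followed by logistic regression. It is not River's implementation: the class names `RunningScaler` and `TinyLogisticRegression`, the helper `run_pipeline`, the learning rate, and the synthetic samples are all assumptions made for illustration.

```python
import math


class RunningScaler:
    """Standardizes features with a running mean and variance (Welford's update).

    Assumes every sample carries the same feature keys.
    """

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def learn_one(self, x):
        self.n += 1
        for key, value in x.items():
            old_mean = self.mean.get(key, 0.0)
            new_mean = old_mean + (value - old_mean) / self.n
            self.m2[key] = self.m2.get(key, 0.0) + (value - old_mean) * (value - new_mean)
            self.mean[key] = new_mean

    def transform_one(self, x):
        scaled = {}
        for key, value in x.items():
            std = math.sqrt(self.m2.get(key, 0.0) / self.n) if self.n else 0.0
            scaled[key] = (value - self.mean.get(key, 0.0)) / std if std > 0 else 0.0
        return scaled


class TinyLogisticRegression:
    """Logistic regression trained by stochastic gradient descent on the log loss."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def learn_one(self, x, y):
        error = self.predict_proba_one(x) - y  # gradient of the log loss w.r.t. z
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


def run_pipeline(samples, lr=0.1):
    """Predict-then-learn loop: scale, predict, then update scaler and classifier."""
    scaler, clf = RunningScaler(), TinyLogisticRegression(lr)
    correct = total = 0
    for x, y in samples:
        y_pred = clf.predict_one(scaler.transform_one(x))
        correct += int(y_pred == bool(y))
        scaler.learn_one(x)
        clf.learn_one(scaler.transform_one(x), y)
        total += 1
    return correct / total if total else 0.0


# Synthetic stream: feature "b" is on a much larger scale than feature "a"
samples = [
    ({"a": float(i % 10), "b": 1000.0 * ((7 * i) % 10)}, int(i % 10 >= 5))
    for i in range(500)
]
print(f"Progressive accuracy: {run_pipeline(samples):.2%}")
```

Standardizing first keeps the large-scale feature from dominating the gradient steps, which is the usual motivation for placing a scaler in front of a linear model.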

The following metrics are used to assess the quality of the model: accuracy, area under the ROC curve (ROC AUC), precision, recall (also called sensitivity) and F1. Accuracy measures the proportion of correct predictions. ROC AUC assesses the model's ability to discriminate between the two classes. Precision measures the proportion of correctly predicted positive labels out of all predicted positive labels. Recall measures the proportion of correctly predicted positive labels out of all actual positive labels. F1 is the harmonic mean of precision and recall.
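
The definitions above can be checked by hand on a small example. The sketch below computes the metrics from confusion-matrix counts. The function name `classification_metrics` and the toy labels are assumptions for illustration, and this is not how River computes its metrics internally. For hard 0/1 predictions the ROC curve has a single operating point, so the trapezoidal area reduces to the mean of the true positive rate and the true negative rate. The sketch uses that shortcut.

```python
def classification_metrics(y_true, y_pred):
    """Compute basic binary classification metrics from hard 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Single-threshold ROC AUC: area under (0,0) -> (FPR, TPR) -> (1,1)
    roc_auc_hard = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc_hard": roc_auc_hard,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.2f}")
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so every metric comes out at 0.75.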

In [None]:
dataset = stream.iter_csv('electricity.csv', target='class', **params)
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression()
)
metricAcc = metrics.Accuracy()
metricROCAUC = metrics.ROCAUC()
metricPrecision = metrics.Precision()
metricRecall = metrics.Recall()
metricF = metrics.F1()

In [None]:
%%time
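for x, y in dataset:
    # Progressive validation: predict first, then learn from the same sample
    y_pred = model.predict_one(x)
    # update() and learn_one() mutate the object in place, so the return value is not needed
    # (newer River releases return None from learn_one, which would break reassignment)
    metricAcc.update(y, y_pred)
    # ROCAUC receives hard labels here; it also accepts model.predict_proba_one(x)
    metricROCAUC.update(y, y_pred)
    metricPrecision.update(y, y_pred)
    metricRecall.update(y, y_pred)
    metricF.update(y, y_pred)
    model.learn_one(x, y)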

In [None]:
print(metricAcc)
print(metricROCAUC)
print(metricPrecision)
print(metricRecall)
print(metricF)

In [None]:
dataset = stream.iter_csv('electricity.csv', target='class', **params)
n = 0
for x, y in dataset:
    y_pred = model.predict_one(x)
    # Metrics and model are updated in place and keep their state from the full pass
    metricAcc.update(y, y_pred)
    metricROCAUC.update(y, y_pred)
    metricPrecision.update(y, y_pred)
    metricRecall.update(y, y_pred)
    metricF.update(y, y_pred)
    model.learn_one(x, y)

    n += 1
    if n > 100:
        break

print(f"Step {n} - Accuracy: {metricAcc}, ROCAUC: {metricROCAUC}, Precision: {metricPrecision}, Recall: {metricRecall}, F1-Score: {metricF}")

First iteration:

Step 101 - Accuracy: Accuracy: 83.76%, ROCAUC: ROCAUC: 82.77%, Precision: Precision: 84.00%, Recall: Recall: 76.24%, F1-Score: F1: 79.93%

Second iteration:

Step 101 - Accuracy: Accuracy: 83.76%, ROCAUC: ROCAUC: 82.76%, Precision: Precision: 84.02%, Recall: Recall: 76.20%, F1-Score: F1: 79.92%

Third iteration:

Step 101 - Accuracy: Accuracy: 83.75%, ROCAUC: ROCAUC: 82.75%, Precision: Precision: 84.03%, Recall: Recall: 76.15%, F1-Score: F1: 79.90%

Fourth iteration:

Step 101 - Accuracy: Accuracy: 83.75%, ROCAUC: ROCAUC: 82.73%, Precision: Precision: 84.05%, Recall: Recall: 76.09%, F1-Score: F1: 79.87%

15th iteration:

Step 101 - Accuracy: Accuracy: 83.69%, ROCAUC: ROCAUC: 82.57%, Precision: Precision: 84.21%, Recall: Recall: 75.45%, F1-Score: F1: 79.59%

30th iteration:

Step 101 - Accuracy: Accuracy: 83.70%, ROCAUC: ROCAUC: 82.47%, Precision: Precision: 84.43%, Recall: Recall: 74.91%, F1-Score: F1: 79.38%

Across repeated executions of the cell, most metrics decline slightly but stay at nearly the same level. The metric objects are not reset between executions, so each reported value is cumulative over the full pass plus every replay of the first 101 samples.

Accuracy is moderate and stays within 83.7-83.8%.

ROC AUC falls from 82.8% to 82.5%.

Precision rises from 84.0% to 84.4%.

Recall falls from 76.2% to 74.9%.

F1 falls from 79.9% to 79.4%.

The Logistic Regression model gives stable results, but a recall of about 75% means it misses roughly a quarter of the positive cases. There is room for improvement by using more advanced models or by adjusting the model's hyperparameters.
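
The small changes between executions follow from simple arithmetic on a cumulative metric. The sketch below bounds how far a running accuracy can move after a batch of extra samples. The function name `max_accuracy_shift` is hypothetical, and the figure of 45,312 previously seen samples is an assumption (the size River reports for Elec2; the local `electricity.csv` may differ).

```python
def max_accuracy_shift(n_seen, n_new):
    """Upper bound on the change of a running accuracy after n_new more samples.

    If c of n_seen predictions were correct, the new accuracy lies between
    c / (n_seen + n_new) and (c + n_new) / (n_seen + n_new), so it can move
    by at most n_new / (n_seen + n_new) in either direction.
    """
    return n_new / (n_seen + n_new)


n_seen = 45_312  # assumed number of samples in the full pass
n_new = 101      # samples replayed per execution of the cell
print(f"Maximum shift per execution: {max_accuracy_shift(n_seen, n_new):.4%}")
```

Under these assumptions a single execution can move the accuracy by at most about 0.22 percentage points, which is consistent with the near-constant values reported above. Precision and recall use different denominators, so this bound applies to accuracy only.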


# Model modification

Next, the logistic regression model is replaced with a perceptron.
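
For readers unfamiliar with the perceptron, here is a minimal sketch of the textbook update rule: weights change only when a prediction is wrong. The class name `TinyPerceptron` and the AND-gate data are assumptions for illustration, and River's `linear_model.Perceptron` may be implemented differently.

```python
class TinyPerceptron:
    """Textbook perceptron for binary labels 0/1 on dict-based features."""

    def __init__(self, lr=1.0):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_one(self, x):
        score = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        error = y - self.predict_one(x)  # -1, 0 or +1
        if error == 0:
            return  # correct prediction: no update
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.lr * error * v
        self.bias += self.lr * error


# Linearly separable toy data: logical AND
and_gate = [
    ({"a": 0.0, "b": 0.0}, 0),
    ({"a": 0.0, "b": 1.0}, 0),
    ({"a": 1.0, "b": 0.0}, 0),
    ({"a": 1.0, "b": 1.0}, 1),
]

model = TinyPerceptron()
for _ in range(20):  # replay the same samples several times
    for x, y in and_gate:
        model.learn_one(x, y)

print([model.predict_one(x) for x, _ in and_gate])
```

Replaying the same separable samples lets the perceptron eventually classify all of them correctly. The same effect is worth keeping in mind when a fixed slice of a stream is fed to the model repeatedly.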

In [None]:
dataset = stream.iter_csv('electricity.csv', target='class', **params)
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.Perceptron()
)
metricAcc = metrics.Accuracy()
metricROCAUC = metrics.ROCAUC()
metricPrecision = metrics.Precision()
metricRecall = metrics.Recall()
metricF = metrics.F1()

In [None]:
%%time
for x, y in dataset:
    # Progressive validation: predict first, then learn from the same sample
    y_pred = model.predict_one(x)
    # update() and learn_one() mutate the object in place, so the return value is not needed
    metricAcc.update(y, y_pred)
    metricROCAUC.update(y, y_pred)
    metricPrecision.update(y, y_pred)
    metricRecall.update(y, y_pred)
    metricF.update(y, y_pred)
    model.learn_one(x, y)

In [None]:
print(metricAcc)
print(metricROCAUC)
print(metricPrecision)
print(metricRecall)
print(metricF)

In [None]:
dataset = stream.iter_csv('electricity.csv', target='class', **params)
n = 0
for x, y in dataset:
    y_pred = model.predict_one(x)
    # Metrics and model are updated in place and keep their state from the full pass
    metricAcc.update(y, y_pred)
    metricROCAUC.update(y, y_pred)
    metricPrecision.update(y, y_pred)
    metricRecall.update(y, y_pred)
    metricF.update(y, y_pred)
    model.learn_one(x, y)

    n += 1
    if n > 100:
        break

print(f"Step {n} - Accuracy: {metricAcc}, ROCAUC: {metricROCAUC}, Precision: {metricPrecision}, Recall: {metricRecall}, F1-Score: {metricF}")

First iteration:

Step 101 - Accuracy: Accuracy: 91.08%, ROCAUC: ROCAUC: 90.87%, Precision: Precision: 89.49%, Recall: Recall: 89.48%, F1-Score: F1: 89.48%

Second iteration:

Step 101 - Accuracy: Accuracy: 91.07%, ROCAUC: ROCAUC: 90.86%, Precision: Precision: 89.48%, Recall: Recall: 89.47%, F1-Score: F1: 89.48%

Third iteration:

Step 101 - Accuracy: Accuracy: 91.08%, ROCAUC: ROCAUC: 90.86%, Precision: Precision: 89.48%, Recall: Recall: 89.47%, F1-Score: F1: 89.47%

Fourth iteration:

Step 101 - Accuracy: Accuracy: 91.07%, ROCAUC: ROCAUC: 90.86%, Precision: Precision: 89.47%, Recall: Recall: 89.46%, F1-Score: F1: 89.47%

15th iteration:

Step 101 - Accuracy: Accuracy: 91.09%, ROCAUC: ROCAUC: 90.86%, Precision: Precision: 89.44%, Recall: Recall: 89.43%, F1-Score: F1: 89.43%

30th iteration:

Step 101 - Accuracy: Accuracy: 91.30%, ROCAUC: ROCAUC: 91.06%, Precision: Precision: 89.62%, Recall: Recall: 89.61%, F1-Score: F1: 89.62%

50th iteration:

Step 101 - Accuracy: Accuracy: 91.64%, ROCAUC: ROCAUC: 91.40%, Precision: Precision: 89.95%, Recall: Recall: 89.94%, F1-Score: F1: 89.95%

100th iteration:

Step 101 - Accuracy: Accuracy: 92.40%, ROCAUC: ROCAUC: 92.14%, Precision: Precision: 90.70%, Recall: Recall: 90.69%, F1-Score: F1: 90.70%

145th iteration:

Step 101 - Accuracy: Accuracy: 93.00%, ROCAUC: ROCAUC: 92.72%, Precision: Precision: 91.32%, Recall: Recall: 91.31%, F1-Score: F1: 91.31%

After repeated executions of the loop, the model's metrics began to improve.

On the first execution of the loop its accuracy was 91.08%, and on the 145th execution it was 93.00%, a gain of about 1.9 percentage points.

The metrics are flat for the first 15 executions and then show a clear upward trend. Because every execution replays the same first 101 samples and the metric objects are never reset, this trend shows the model fitting those samples better with each pass rather than improving on unseen data.

The other metrics improved on a similar scale: ROC AUC from 90.87% to 92.72%, precision from 89.49% to 91.32%, recall from 89.48% to 91.31%, and F1 from 89.48% to 91.31%.

The Perceptron model achieved better results than the Logistic Regression model used earlier. Already on the first execution its accuracy was 91.08% against 83.76%, and its recall was 89.48% against 76.24%, so it classifies the data in this set more effectively.
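
The predict-then-learn loop used for both models can be factored into one helper so that each model is evaluated under identical conditions with freshly created state. The sketch below assumes only that a model exposes `predict_one` and `learn_one`. The names `progressive_accuracy` and `MajorityClass`, and the synthetic stream, are hypothetical and stand in for the River objects used above.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_accuracy(samples, model, limit=None):
    """Score each sample before the model learns from it.

    `limit` optionally stops after that many samples, like the n > 100 check above.
    """
    correct = total = 0
    for x, y in samples:
        if limit is not None and total >= limit:
            break
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0


# Synthetic stream in which three out of four labels are 1
samples = [({"f": float(i)}, 1 if i % 4 else 0) for i in range(100)]
print(f"Baseline accuracy: {progressive_accuracy(samples, MajorityClass()):.2%}")
```

Passing a new model instance on every call avoids carrying weights or metric state from one run into the next, which makes comparisons between models easier to interpret.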
