# 9. Fourth model implementation

This notebook aims to implement the `ClassifierChain` with a genetic algorithm, as described in the article `[d9e6797] LinearOrderingProblembasedClassifierChainusingGeneticAlgorithm.pdf`.

## 9.1. Recapping the implementation

After reading the article again, here's the main idea of the implementation:

* The order of the labels is chosen based on which order shows a better "fitness score". Choosing a better order is handled by the genetic algorithm, so we don't have to worry about it (apart from ensuring that the parameters are the same as the ones the article used).
* The fitness score is where the "linear ordering problem" (LOP) comes in. Here's how it works:
  * Take a label order, for example, `[1, 4, 2, 3]`. It has `n=4` labels.
  * Build an `n x n` matrix. Both the rows and the columns are indexed by `[1, 4, 2, 3]`.
  * Each cell of this matrix holds the **conditional entropy** of the label in the row, given the label in the column. For example, the cell `[1, 4]` holds the conditional entropy of label `1`, given label `4`.
  * Now sum the upper triangle of this matrix. That's the **fitness score**.
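The steps above can be sketched with NumPy. This is a minimal sketch, assuming the conditional entropies have already been collected into a square array `cond_entropy` where `cond_entropy[i, j]` is the conditional entropy of label `i` given label `j` (the orientation described above). `lop_fitness` is a hypothetical helper name, and labels are 0-indexed here, unlike the `[1, 4, 2, 3]` example.

```python
from typing import List

import numpy as np


def lop_fitness(cond_entropy: np.ndarray, order: List[int]) -> float:
    """Sum the strict upper triangle of `cond_entropy` after permuting it by `order`."""
    # np.ix_ reorders rows and columns together: row k and column k both refer to label order[k]
    reordered = cond_entropy[np.ix_(order, order)]
    # k=1 zeroes the diagonal and everything below it, keeping only cells above the diagonal
    return float(np.triu(reordered, k=1).sum())


# toy 4x4 matrix, only to show the mechanics (these are not real entropies)
toy = np.arange(16, dtype=float).reshape(4, 4)
print(lop_fitness(toy, [0, 1, 2, 3]))  # 1 + 2 + 3 + 6 + 7 + 11 = 30.0
print(lop_fitness(toy, [3, 2, 1, 0]))  # reversing the order sums the original lower triangle: 60.0
```

Reordering rows and columns with the same permutation is what makes "position in the chain" and "position in the matrix" coincide.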

## 9.2. Setup

In [120]:
import copy
import math
from typing import Any, Dict, List, Optional, cast

import numpy as np
import pandas as pd
import pygad
import sklearn.metrics as metrics
from numpy.typing import NDArray
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from skmultilearn.dataset import load_dataset
from skmultilearn.problem_transform import ClassifierChain


In [2]:
desired_datasets = ["scene", "emotions", "birds"]

datasets = {}
for dataset_name in desired_datasets:
    print(f"getting dataset `{dataset_name}`")
    
    full_dataset = load_dataset(dataset_name, "undivided")
    X, y, _, _ = full_dataset

    train_dataset = load_dataset(dataset_name, "train")
    X_train, y_train, _, _ = train_dataset

    test_dataset = load_dataset(dataset_name, "test")
    X_test, y_test, _, _ = test_dataset

    datasets[dataset_name] = {
        "X": X,
        "y": y,
        "X_train": X_train,
        "y_train": y_train,
        "X_test": X_test,
        "y_test": y_test,
        "rows": X.shape[0],
        "labels_count": y.shape[1]
    }

for name, info in datasets.items():
    print("===")
    print(f"information for dataset `{name}`")
    print(f"rows: {info['rows']}, labels: {info['labels_count']}")


getting dataset `scene`
scene:undivided - exists, not redownloading
scene:train - exists, not redownloading
scene:test - exists, not redownloading
getting dataset `emotions`
emotions:undivided - exists, not redownloading
emotions:train - exists, not redownloading
emotions:test - exists, not redownloading
getting dataset `birds`
birds:undivided - exists, not redownloading
birds:train - exists, not redownloading
birds:test - exists, not redownloading
===
information for dataset `scene`
rows: 2407, labels: 6
===
information for dataset `emotions`
rows: 593, labels: 6
===
information for dataset `birds`
rows: 645, labels: 19


## 9.3. Playing around with entropy calculation

In [22]:
y = datasets["scene"]["y_train"]
y.todense()

matrix([[1, 0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0],
        ...,
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1]], dtype=int64)

In [4]:
y.shape

(1211, 6)

In [5]:
label_count = y.shape[1]
rows_count = y.shape[0]

probs = []

for label in range(label_count):
    instances_with_label = y[:, label].todense().sum()
    probs.append(instances_with_label / rows_count)

probs

[0.1874483897605285,
 0.13625103220478943,
 0.16267547481420314,
 0.16184971098265896,
 0.22873658133773742,
 0.18497109826589594]

In [6]:
entropies = []

for prob in probs:
    entropy = -1 * prob * math.log(prob, 2)
    entropies.append(entropy)

entropies

[0.45276933803928565,
 0.3918117707258502,
 0.4261985731270658,
 0.42522342518043577,
 0.4868065668618529,
 0.45033585713511676]

In [7]:
joint_entropies = []

for label in range(label_count):
    joint_entropies.append([])

    for column_position in range(label_count):
        and_prob = probs[label] * probs[column_position]
        joint_entropy = -1 * and_prob * math.log(and_prob, 2)
        joint_entropies[label].append(joint_entropy)

joint_entropies

[[0.169741766696809,
  0.13513477517031391,
  0.15354470329775657,
  0.15298803284199744,
  0.19481601760076195,
  0.1681639729896544],
 [0.13513477517031391,
  0.10676951638276679,
  0.12180816135339254,
  0.12135175245007311,
  0.15594958218271368,
  0.13383257891815417],
 [0.15354470329775657,
  0.12180816135339254,
  0.1386641104971626,
  0.1381535384751864,
  0.17667869399503078,
  0.1520930175359874],
 [0.15298803284199744,
  0.12135175245007311,
  0.1381535384751864,
  0.13764457693701967,
  0.17605365473154738,
  0.15154077228645788],
 [0.19481601760076195,
  0.15594958218271368,
  0.17667869399503078,
  0.17605365473154738,
  0.22270093975348185,
  0.19305342973037357],
 [0.1681639729896544,
  0.13383257891815417,
  0.1520930175359874,
  0.15154077228645788,
  0.19305342973037357,
  0.16659823616559233]]

In [8]:
def cond_entropy(x, y):
    return joint_entropies[x][y] - entropies[y]

cond_entropy(4,2)

-0.24951987913203505

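Okay, I managed to implement the calculation. However, the article gives the impression that, when `x` and `y` are the same, the conditional entropy should be zero, and that's not what I'm getting. The values are even negative, which should never happen for the entropy of a discrete variable.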

In [9]:
cond_entropy(0,0)

-0.2830275713424767

In [10]:
cond_entropy(1,1)

-0.28504225434308345

In [11]:
cond_entropy(2,2)

-0.28753446262990323

**I think I understood what is wrong**. The entropy is a sum over every value that the variable may take, with one `-p * log2(p)` term per value. Since the labels are binary, the values are either 0 or 1, so we have to calculate the term for 0, then the term for 1, and then sum them. The same goes for the joint entropy, which has one term per pair of values.
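For a binary label with P(label = 1) = p, this gives H = -(p · log2(p) + (1 - p) · log2(1 - p)). A small sketch of that formula, using the usual convention that 0 · log2(0) = 0 (the helper name `binary_entropy` is hypothetical, not from the article):

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a 0/1 variable with P(value = 1) = p."""
    total = 0.0
    for prob in (1 - p, p):  # probability of value 0, then of value 1
        if prob > 0:  # 0 * log2(0) is taken as 0
            total -= prob * math.log2(prob)
    return total


print(binary_entropy(0.5))  # 1.0, the maximum for a binary variable
print(binary_entropy(0.1874483897605285))  # label 0 of `scene`, ~0.6961
```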

In [12]:
label_count = y.shape[1]
rows_count = y.shape[0]

probs = []

for label in range(label_count):
    instances_with_label = y[:, label].todense().sum() # value = 1
    instances_without_label = rows_count - instances_with_label # value = 0

    probs.append([
        instances_without_label / rows_count,  # value = 0
        instances_with_label / rows_count,    # value = 1
    ])

probs

[[0.8125516102394715, 0.1874483897605285],
 [0.8637489677952106, 0.13625103220478943],
 [0.8373245251857968, 0.16267547481420314],
 [0.838150289017341, 0.16184971098265896],
 [0.7712634186622626, 0.22873658133773742],
 [0.815028901734104, 0.18497109826589594]]

In [13]:
entropies = []

for prob in probs:
    results = []
    for value in [0,1]:
        prob_for_value = prob[value]
        summand = prob_for_value * math.log(prob_for_value, 2)
        results.append(summand)
    
    entropy = -1 * sum(results)
    entropies.append(entropy)

entropies

[0.6961030672262447,
 0.5743357592196299,
 0.6406718924240619,
 0.6387163439963871,
 0.7758023710944532,
 0.690832038687419]

In [14]:
joint_entropies = []

# for prob in probs:
#     for value_i in [0,1]:
#         for value_j in [0,1]:
#             and_prob = prob[value_i] * prob[value_j]

#             joint_entropy = -1 * and_prob * math.log(and_prob, 2)
#             joint_entropies.append(joint_entropy)

for x in range(label_count):
    joint_entropies.append([])

    for y in range(label_count):
        results = []
        for value_i in [0,1]:
            for value_j in [0,1]:
                and_prob = probs[x][value_i] * probs[y][value_j]
                summand = and_prob * math.log(and_prob, 2)
                results.append(summand)
                print(f"for x={x}, y={y}, value_i={value_i}, value_j={value_j}, and_prob={and_prob}, math.log(and_prob, 2) = {math.log(and_prob, 2)}, summand={summand}")

        joint_entropy = -1 * sum(results)
        joint_entropies[x].append(joint_entropy)

joint_entropies

for x=0, y=0, value_i=0, value_j=0, and_prob=0.660240119302758, math.log(and_prob, 2) = -0.5989372887101777, summand=-0.3954424269528782
for x=0, y=0, value_i=0, value_j=1, and_prob=0.1523114909367135, math.log(and_prob, 2) = -2.714903306758502, summand=-0.4135109704014011
for x=0, y=0, value_i=1, value_j=0, and_prob=0.1523114909367135, math.log(and_prob, 2) = -2.714903306758502, summand=-0.4135109704014011
for x=0, y=0, value_i=1, value_j=1, and_prob=0.035136898823815, math.log(and_prob, 2) = -4.8308693248068275, summand=-0.169741766696809
for x=0, y=1, value_i=0, value_j=0, and_prob=0.7018406146246798, math.log(and_prob, 2) = -0.5107846578024762, summand=-0.35848941817294666
for x=0, y=1, value_i=0, value_j=1, and_prob=0.11071099561479174, math.log(and_prob, 2) = -3.175129579803602, summand=-0.3515217569860321
for x=0, y=1, value_i=1, value_j=0, and_prob=0.16190835317053082, math.log(and_prob, 2) = -2.626750675850801, summand=-0.425292876116582
for x=0, y=1, value_i=1, value_j=1, and

[[1.3922061344524894,
  1.2704388264458748,
  1.3367749596503065,
  1.3348194112226317,
  1.471905438320698,
  1.3869351059136636],
 [1.2704388264458748,
  1.14867151843926,
  1.2150076516436918,
  1.2130521032160169,
  1.350138130314083,
  1.265167797907049],
 [1.3367749596503065,
  1.2150076516436918,
  1.2813437848481235,
  1.279388236420449,
  1.416474263518515,
  1.3315039311114807],
 [1.3348194112226317,
  1.2130521032160169,
  1.279388236420449,
  1.2774326879927738,
  1.4145187150908403,
  1.3295483826838062],
 [1.471905438320698,
  1.350138130314083,
  1.4164742635185146,
  1.4145187150908403,
  1.5516047421889063,
  1.4666344097818722],
 [1.3869351059136636,
  1.265167797907049,
  1.3315039311114807,
  1.329548382683806,
  1.4666344097818722,
  1.3816640773748379]]

In [15]:
import numpy as np

def get_joint_entropy(x, y, probs):
    results = []
    for value_i in [0,1]:
        for value_j in [0,1]:
            and_prob = probs[x][value_i] * probs[y][value_j]
            # summand = and_prob * np.log2(and_prob) 

            if and_prob > 0:  # Avoid taking the log of 0
                summand = and_prob * np.log2(and_prob)
                results.append(summand)
    
    return -1 * sum(results)

get_joint_entropy(0, 0, probs)

1.3922061344524894

In [16]:
def cond_entropy(x, y):
    return joint_entropies[x][y] - entropies[y]

In [17]:
print(joint_entropies[1][1])
print(entropies[1])
print(entropies[1])
print(entropies[1]+ entropies[1])


1.14867151843926
0.5743357592196299
0.5743357592196299
1.1486715184392597


In [18]:
for label in range(label_count):
    for column_position in range(label_count):
        print(cond_entropy(label,column_position))

0.6961030672262447
0.6961030672262449
0.6961030672262446
0.6961030672262446
0.6961030672262448
0.6961030672262446
0.5743357592196301
0.5743357592196301
0.57433575921963
0.5743357592196298
0.5743357592196298
0.57433575921963
0.6406718924240618
0.640671892424062
0.6406718924240616
0.6406718924240619
0.6406718924240619
0.6406718924240616
0.638716343996387
0.638716343996387
0.6387163439963871
0.6387163439963867
0.6387163439963871
0.6387163439963871
0.7758023710944533
0.775802371094453
0.7758023710944527
0.7758023710944532
0.7758023710944532
0.7758023710944532
0.6908320386874189
0.6908320386874192
0.6908320386874188
0.6908320386874188
0.690832038687419
0.6908320386874188


In [19]:
print(cond_entropy(1,3))
print(cond_entropy(3,1))

0.5743357592196298
0.638716343996387


Honestly, I think I got it right... I was expecting the conditional entropy of a label given itself to be zero, as the article states that:

> If condEntropy(X/Y) = 0, X has no uncertainty in Y's presence, i.e., X can be determined entirely by Y.

So if X and Y are the same variable, knowing Y obviously determines X. However, the calculations as I implemented them, following the formulas in the article (which I could verify [here](https://www.ece.tufts.edu/ee/194NIT/lect01.pdf), [here](https://en.wikipedia.org/wiki/Joint_entropy), [here](https://gist.github.com/kudkudak/dabbed1af234c8e3868e) and [here](https://www.cs.cmu.edu/~venkatg/teaching/ITCS-spr2013/notes/lect-jan17.pdf)), always give a joint entropy of X,X equal to the entropy of X plus the entropy of X. And since the conditional entropy is the joint entropy minus the entropy of Y, it always comes out as the entropy of X.

The reason is how the joint probability is computed: the code multiplies the marginal probabilities (`probs[x][value_i] * probs[y][value_j]`), which gives the true joint probability only when the two labels are independent. Under that assumption, the joint entropy is the sum of the two individual entropies, so the joint entropy of X,X is twice the entropy of X, and subtracting the entropy of Y (here X itself, which is not zero) always leaves the entropy of X.
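Written out, this is what happens when the joint probability of two labels is taken as the product of their marginal probabilities, as in the code above, with $p(a)$ and $p(b)$ the probabilities of $X = a$ and $Y = b$:

$$
\begin{aligned}
H(X, Y) &= -\sum_{a}\sum_{b} p(a)\,p(b)\,\log_2\big(p(a)\,p(b)\big) \\
&= -\sum_{a} p(a)\log_2 p(a) \sum_{b} p(b) \;-\; \sum_{b} p(b)\log_2 p(b) \sum_{a} p(a) \\
&= H(X) + H(Y),
\end{aligned}
$$

so $H(X \mid Y) = H(X, Y) - H(Y) = H(X)$ for every pair of labels, including $Y = X$. The product $p(a)\,p(b)$ is the true joint probability only for independent labels. For a label paired with itself, the true joint probability $P(X = a, X = b)$ is zero whenever $a \neq b$, which gives $H(X, X) = H(X)$ and $H(X \mid X) = 0$.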

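Another Python implementation can be seen [here](https://datascience.stackexchange.com/questions/58565/conditional-entropy-calculation-in-python-hyx).

_So my implementation follows the formulas, but with one caveat_: because the joint probability is computed as the product of the marginal probabilities, it implicitly assumes that the labels are independent. That also explains the weird thing I noticed: the final conditional entropy values are identical (up to floating-point error) across each row (same X), because under independence H(X|Y) = H(X) for every Y. Estimating the joint probability from how often both values occur together in the rows of `y` would instead give H(X|X) = 0, as the article suggests.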
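For comparison, here is a sketch that estimates the joint probabilities from co-occurrence counts in the label matrix instead of multiplying marginals. It assumes dense 0/1 label columns (e.g. one column of `np.asarray(y.todense())`); the helper names are hypothetical and this is not the article's code. With it, a label given itself has zero conditional entropy, while two labels that happen to be independent in the sample still give H(X|Y) = H(X).

```python
import numpy as np


def empirical_entropy(a: np.ndarray) -> float:
    """H(A) in bits for a 0/1 column."""
    h = 0.0
    for value in (0, 1):
        p = np.mean(a == value)
        if p > 0:  # 0 * log2(0) is taken as 0
            h -= p * np.log2(p)
    return float(h)


def empirical_joint_entropy(a: np.ndarray, b: np.ndarray) -> float:
    """H(A, B) in bits, from how often each pair of values occurs in the same row."""
    h = 0.0
    for value_a in (0, 1):
        for value_b in (0, 1):
            # fraction of rows where A == value_a AND B == value_b (not p(a) * p(b))
            p = np.mean((a == value_a) & (b == value_b))
            if p > 0:
                h -= p * np.log2(p)
    return float(h)


def empirical_cond_entropy(a: np.ndarray, b: np.ndarray) -> float:
    """H(A | B) = H(A, B) - H(B)."""
    return empirical_joint_entropy(a, b) - empirical_entropy(b)


x = np.array([0, 0, 1, 1])
z = np.array([0, 1, 0, 1])  # independent of x in this tiny sample
print(empirical_cond_entropy(x, x))  # 0.0: x fully determines itself
print(empirical_cond_entropy(x, z))  # 1.0: knowing z tells nothing about x
```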

## 9.4. Establishing stable code for entropy calculation

Now that we are comfortable with the entropy calculation, let's write more "stable" code for it.

In [20]:
# by Copilot:

class Entropy:
    def __init__(self, probs):
        self.probs = probs
        self.label_count = len(probs)
        self.entropies = self._get_entropies()
        self.joint_entropies = self._get_joint_entropies()
    
    def _get_entropies(self):
        entropies = []

        for prob in self.probs:
            results = []
            for value in [0,1]:
                prob_for_value = prob[value]
                summand = prob_for_value * math.log(prob_for_value, 2)
                results.append(summand)
            
            entropy = -1 * sum(results)
            entropies.append(entropy)

        return entropies

    def _get_joint_entropies(self):
        joint_entropies = []

        for x in range(self.label_count):
            joint_entropies.append([])

            for y in range(self.label_count):
                results = []
                for value_i in [0,1]:
                    for value_j in [0,1]:
                        and_prob = self.probs[x][value_i] * self.probs[y][value_j]
                        summand = and_prob * math.log(and_prob, 2)
                        results.append(summand)
                        # print(f"for x={x}, y={y}, value_i={value_i}, value_j={value_j}, and_prob={and_prob}, math.log(and_prob, 2) = {math.log(and_prob, 2)}, summand={summand}")

                joint_entropy = -1 * sum(results)
                joint_entropies[x].append(joint_entropy)

        return joint_entropies

    def get_cond_entropy(self, x, y):
        return self.joint_entropies[x][y] - self.entropies[y]

    def get_cond_entropy_matrix(self):
        matrix = []

        for i in range(self.label_count):
            matrix.append([])
            for j in range(self.label_count):
                matrix[i].append(self.get_cond_entropy(i,j))
        
        return matrix


In [47]:
Probabilities = Dict[int, Dict[int, float]]

def calculate_probabilities(y: NDArray[np.int64]) -> Probabilities:
    dense_y = y.todense()

    label_count = dense_y.shape[1]
    rows_count = dense_y.shape[0]

    probs = {}

    for label in range(label_count):
        probs[label] = {}
        # turn the (rows, 1) matrix column into a flat 1-D array
        y_label_specific = np.asarray(dense_y[:, label]).reshape(-1)

        possible_values = np.unique(y_label_specific)

        for value in possible_values:
            instances_with_label = np.count_nonzero(y_label_specific == value)
            probs[label][value] = instances_with_label / rows_count
    
    return probs

y = datasets["scene"]["y_train"]
probs = calculate_probabilities(y)
probs

{0: {0: 0.8125516102394715, 1: 0.1874483897605285},
 1: {0: 0.8637489677952106, 1: 0.13625103220478943},
 2: {0: 0.8373245251857968, 1: 0.16267547481420314},
 3: {0: 0.838150289017341, 1: 0.16184971098265896},
 4: {0: 0.7712634186622626, 1: 0.22873658133773742},
 5: {0: 0.815028901734104, 1: 0.18497109826589594}}
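The same per-label probabilities can also be computed without the loops. A minimal sketch, assuming `y` is a scipy sparse (or NumPy) 0/1 label matrix like the ones returned by `load_dataset`; `label_probabilities` is a hypothetical name. Unlike `calculate_probabilities`, it always returns both values 0 and 1, so a label that never (or always) occurs yields a zero probability that any entropy code must skip.

```python
import numpy as np
import scipy.sparse as sp


def label_probabilities(y) -> np.ndarray:
    """Return an array of shape (label_count, 2): column 0 is P(label = 0), column 1 is P(label = 1)."""
    # column means of a 0/1 matrix are the fraction of rows where each label is present
    p1 = np.asarray(y.mean(axis=0)).reshape(-1)
    return np.column_stack([1 - p1, p1])


# label 0 is present in 2 of 4 rows, label 1 in 1 of 4 rows
y_toy = sp.csr_matrix([[1, 0], [0, 0], [1, 1], [0, 0]])
print(label_probabilities(y_toy))
```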

In [52]:
Entropies = Dict[int, float]

def calculate_entropies(probabilities: Probabilities) -> Entropies:
    entropies = {}

    for label, calculated_probabilities in probabilities.items():
        results = []
        for _, prob in calculated_probabilities.items():
            summand = prob * math.log(prob, 2)
            results.append(summand)
        
        entropy = -1 * sum(results)
        entropies[label] = entropy

    return entropies

entropies = calculate_entropies(probs)
entropies

{0: 0.6961030672262447,
 1: 0.5743357592196299,
 2: 0.6406718924240619,
 3: 0.6387163439963871,
 4: 0.7758023710944532,
 5: 0.690832038687419}

In [56]:
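def calculate_joint_probability(probabilities: Probabilities, label_x: int, label_y: int):
    # despite its name, this returns the joint *entropy* H(X, Y) in bits
    # note: the joint probability of each pair of values is taken as the product
    # of the marginal probabilities, which assumes the two labels are independent
    results = []
    
    for _, prob_i in probabilities[label_x].items():
        for _, prob_j in probabilities[label_y].items():
            and_prob = prob_i * prob_j

            if and_prob > 0:  # avoid taking the log of 0
                summand = and_prob * np.log2(and_prob)
                results.append(summand)
    
    joint_entropy = -1 * sum(results)
    return joint_entropy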

calculate_joint_probability(probs, 0, 0)

1.3922061344524894

In [78]:
def calculate_conditional_entropy(probabilities: Probabilities, entropies: Entropies, label_x: int, label_y: int):
    joint_entropy = calculate_joint_probability(probabilities, label_x, label_y)
    entropy = entropies[label_y]
    return joint_entropy - entropy

calculate_conditional_entropy(probs, entropies, 3, 1)

0.638716343996387

## 9.5. Actually solving the LOP

Now that we can easily calculate the entropies, let's focus on solving the linear ordering problem (LOP) that the article suggests.

Recapping again, the article proposes that:
* We get an `n x n` matrix, where `n` is the number of labels.
* The rows and columns are the labels.
* Each cell in the matrix is the conditional entropy of the label in the row, given the label in the column.
* We sum the upper triangle of the matrix. That's the fitness score.

In [91]:
label_count = y.shape[1]

label_order = [1, 3, 2, 0, 5, 4]
# trying a random order to see if it works

matrix = {}

for row_i in label_order:
    matrix[row_i] = {}
    for row_j in label_order:
        if row_i == row_j:
            matrix[row_i][row_j] = 0
            # this is to match the table described in the paper
            # but in reality we _have_ a >0 conditional entropy for a label with itself
            continue

        cond_entropy = calculate_conditional_entropy(probs, entropies, row_i, row_j)
        matrix[row_i][row_j] = cond_entropy
    
t = pd.DataFrame(matrix)
t

|   | 1 | 3 | 2 | 0 | 5 | 4 |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.638716 | 0.640672 | 0.696103 | 0.690832 | 0.775802 |
| 3 | 0.574336 | 0.0 | 0.640672 | 0.696103 | 0.690832 | 0.775802 |
| 2 | 0.574336 | 0.638716 | 0.0 | 0.696103 | 0.690832 | 0.775802 |
| 0 | 0.574336 | 0.638716 | 0.640672 | 0.0 | 0.690832 | 0.775802 |
| 5 | 0.574336 | 0.638716 | 0.640672 | 0.696103 | 0.0 | 0.775802 |
| 4 | 0.574336 | 0.638716 | 0.640672 | 0.696103 | 0.690832 | 0.0 |


In [105]:
matrix

{1: {1: 0,
  3: 0.5743357592196298,
  2: 0.5743357592196298,
  0: 0.5743357592196296,
  5: 0.5743357592196298,
  4: 0.5743357592196298},
 3: {1: 0.638716343996387,
  3: 0,
  2: 0.6387163439963869,
  0: 0.638716343996387,
  5: 0.6387163439963869,
  4: 0.6387163439963871},
 2: {1: 0.6406718924240618,
  3: 0.6406718924240616,
  2: 0,
  0: 0.6406718924240618,
  5: 0.6406718924240614,
  4: 0.6406718924240619},
 0: {1: 0.6961030672262445,
  3: 0.6961030672262446,
  2: 0.6961030672262446,
  0: 0,
  5: 0.6961030672262446,
  4: 0.6961030672262448},
 5: {1: 0.6908320386874189,
  3: 0.6908320386874188,
  2: 0.6908320386874186,
  0: 0.6908320386874189,
  5: 0,
  4: 0.690832038687419},
 4: {1: 0.775802371094453,
  3: 0.7758023710944532,
  2: 0.7758023710944527,
  0: 0.7758023710944533,
  5: 0.7758023710944532,
  4: 0}}

In [103]:
matrix_size_n = t.shape[0]

upper_triangle_sum = 0
for row_position in range(matrix_size_n):
    for column_position in range(matrix_size_n):
        if column_position > row_position:
            upper_triangle_sum += t.iloc[row_position,column_position]

upper_triangle_sum

10.650709340745188

In [104]:
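# manual check: summing the upper triangle of the table above by hand (values rounded to 6 decimals)
(0.638716+0.640672+0.696103+0.690832+0.775802) + \
    (0.640672+0.696103+0.690832+0.775802) + \
        (0.696103+0.690832+0.775802) + (0.690832+0.775802) + 0.775802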

10.650706999999999

As before, let's now consolidate this into _final code_.

In [108]:
LOPMatrix = Dict[int, Dict[int, float]]

def build_lop_matrix(
    label_order: List[int],
    probabilities: Probabilities,
    entropies: Entropies
) -> LOPMatrix:
    matrix = {}

    for row_i in label_order:
        matrix[row_i] = {}
        for row_j in label_order:
            if row_i == row_j:
                matrix[row_i][row_j] = 0
                # this is to match the table described in the paper
                # but in reality we _have_ a >0 conditional entropy for a label with itself
                continue

            cond_entropy = calculate_conditional_entropy(probabilities, entropies, row_i, row_j)
            matrix[row_i][row_j] = cond_entropy
        
    return matrix

label_order = [1, 3, 2, 0, 5, 4]
# trying a random order to see if it works
lop_matrix = build_lop_matrix(label_order, probs, entropies)
lop_matrix

{1: {1: 0,
  3: 0.5743357592196298,
  2: 0.5743357592196298,
  0: 0.5743357592196296,
  5: 0.5743357592196298,
  4: 0.5743357592196298},
 3: {1: 0.638716343996387,
  3: 0,
  2: 0.6387163439963869,
  0: 0.638716343996387,
  5: 0.6387163439963869,
  4: 0.6387163439963871},
 2: {1: 0.6406718924240618,
  3: 0.6406718924240616,
  2: 0,
  0: 0.6406718924240618,
  5: 0.6406718924240614,
  4: 0.6406718924240619},
 0: {1: 0.6961030672262445,
  3: 0.6961030672262446,
  2: 0.6961030672262446,
  0: 0,
  5: 0.6961030672262446,
  4: 0.6961030672262448},
 5: {1: 0.6908320386874189,
  3: 0.6908320386874188,
  2: 0.6908320386874186,
  0: 0.6908320386874189,
  5: 0,
  4: 0.690832038687419},
 4: {1: 0.775802371094453,
  3: 0.7758023710944532,
  2: 0.7758023710944527,
  0: 0.7758023710944533,
  5: 0.7758023710944532,
  4: 0}}

In [123]:
def calculate_lop(lop_matrix: LOPMatrix) -> float:
    matrix_size_n = len(lop_matrix)
    # the conversion to a dataframe is not necessary,
    # but makes it easier to find the element we want
    # by its position in the rows or columns
    # instead of the actual column or row label.
    # note: pd.DataFrame(dict_of_dicts) uses the outer keys as *columns*,
    # so lop_df.iloc[r, c] is the conditional entropy of the c-th label given the r-th label
    # (the transpose of how lop_matrix itself is indexed)
    lop_df = pd.DataFrame(lop_matrix)

    upper_triangle_sum = 0.0
    for row_position in range(matrix_size_n):
        for column_position in range(matrix_size_n):
            if column_position > row_position:
                conditional_entropy = lop_df.iloc[row_position, column_position]
                upper_triangle_sum += cast(float, conditional_entropy)
    
    return upper_triangle_sum

calculate_lop(lop_matrix)

10.650709340745188
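The double loop in `calculate_lop` can also be written with `np.triu`, which (with `k=1`) zeroes the diagonal and everything below it. This sketch keeps exactly the same orientation as `calculate_lop`, since it goes through the same `pd.DataFrame(lop_matrix)` conversion, whose columns are the outer keys of the dictionary; `calculate_lop_vectorized` is a hypothetical name.

```python
from typing import Dict

import numpy as np
import pandas as pd

LOPMatrix = Dict[int, Dict[int, float]]


def calculate_lop_vectorized(lop_matrix: LOPMatrix) -> float:
    # same conversion as calculate_lop: outer keys become columns, inner keys become rows
    values = pd.DataFrame(lop_matrix).to_numpy(dtype=float)
    # keep only the cells strictly above the diagonal and add them up
    return float(np.triu(values, k=1).sum())


# toy two-label matrix: lop_matrix[3][1] lands in row 1, column 3 of the DataFrame
toy = {1: {1: 0.0, 3: 0.5}, 3: {1: 0.7, 3: 0.0}}
print(calculate_lop_vectorized(toy))  # 0.7
```

Note that the result is `0.7` (that is, `toy[3][1]`) and not `0.5`, which is a direct consequence of the DataFrame orientation.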

### 9.5.1. Why sum the upper triangle?

The article proposes that we sum the upper triangle of the LOP matrix and use that sum as the fitness score.

My interpretation is that, since the article states that labels with more uncertainty (higher entropy) should be placed towards the end of the chain, summing the upper triangle (all the elements above the diagonal) gives a "higher weight" to the labels towards the end of the chain.

More concretely, `calculate_lop` reads the matrix through `pd.DataFrame(lop_matrix)`, whose columns are the outer keys of `lop_matrix`. Each column therefore holds conditional entropies *of* that column's label (the transpose of the orientation described in 9.1), and the column at position k (counting from 0) contributes k cells to the upper triangle. If the labels towards the end of the chain really have higher entropies, they are counted more times, and we end up with a higher fitness score.
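This can be made precise for the implementation above. Because the joint probabilities are computed as products of marginals, every conditional entropy in column k of the DataFrame built inside `calculate_lop` equals the plain entropy of the label at position k of the order, and that column contributes k cells (positions counted from 0) to the upper triangle. The fitness is therefore the weighted sum Σ_k k · H(label at position k), and by the rearrangement inequality it is maximized by sorting the labels by increasing entropy. In other words, under this computation the GA is searching for an order that a plain sort would give directly. A sketch using the `scene` entropies printed earlier (the helper names are hypothetical):

```python
from typing import Dict, List

# entropies of the `scene` training labels, as printed by calculate_entropies above
scene_entropies: Dict[int, float] = {
    0: 0.6961030672262447,
    1: 0.5743357592196299,
    2: 0.6406718924240619,
    3: 0.6387163439963871,
    4: 0.7758023710944532,
    5: 0.690832038687419,
}


def independent_lop(order: List[int], entropies: Dict[int, float]) -> float:
    """Fitness under the product-of-marginals assumption: sum of position * entropy."""
    return sum(position * entropies[label] for position, label in enumerate(order))


def best_order(entropies: Dict[int, float]) -> List[int]:
    """Order that maximizes independent_lop: lowest entropy first, highest entropy last."""
    return sorted(entropies, key=lambda label: entropies[label])


print(independent_lop([1, 3, 2, 0, 5, 4], scene_entropies))  # ~10.6507, same as calculate_lop above
print(best_order(scene_entropies))  # [1, 3, 2, 5, 0, 4]
```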

## 9.6. Putting it all together

In [128]:
def test_solution(y: NDArray[np.int64], label_order: List[int]) -> float:
    probs = calculate_probabilities(y)
    entropies = calculate_entropies(probs)
    lop_matrix = build_lop_matrix(label_order, probs, entropies)
    return calculate_lop(lop_matrix)

y = datasets["scene"]["y_train"]
label_order = [4, 1, 3, 5, 2, 0]
test_solution(y, label_order)

9.967467469102132

In [149]:
label_count = y.shape[1]
label_space = np.arange(label_count)

def fitness_func(ga_instance: Any, solution: Any, solution_idx: Any) -> float:
    probs = calculate_probabilities(y)
    entropies = calculate_entropies(probs)
    lop_matrix = build_lop_matrix(solution, probs, entropies)
    return calculate_lop(lop_matrix)

ga_model = pygad.GA( #type:ignore
    gene_type=int,
    gene_space=label_space,
    save_best_solutions=False,
    fitness_func=fitness_func,
    allow_duplicate_genes=False, # very important, otherwise we will have duplicate labels in the ordering
    num_genes=label_count,

    # set up
    num_generations=5,
    sol_per_pop=3,

    # following what the article describes
    keep_elitism=1, # also following what the article describes, but we have to double check [TODO]
    parent_selection_type="rws", # following what the article describes
    # mutation_probability=0.005, # following what the article describes

    # the following settings are fixed
    # they were chosen for no particular reason
    # they are being kept as fixed to simplify the model
    num_parents_mating=2,
    crossover_type="scattered",
    mutation_type="random",
    mutation_by_replacement=True,
    mutation_num_genes=1,
)

ga_model.run()

solution, _, _ = ga_model.best_solution()
display(solution)

test_solution(y, solution)

array([2, 0, 1, 5, 3, 4])

10.351147933185574
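With only 6 labels, as in `scene`, there are 6! = 720 possible orders, so the GA's answer can be sanity-checked by exhaustive search. A minimal sketch: `brute_force_order` is a hypothetical helper that takes any fitness function over an order (for instance `lambda order: test_solution(y, list(order))`). It is only practical for small label counts; `birds`, with 19 labels, has about 1.2 × 10^17 orders.

```python
import itertools
from typing import Callable, Sequence, Tuple


def brute_force_order(
    label_count: int, fitness: Callable[[Sequence[int]], float]
) -> Tuple[Tuple[int, ...], float]:
    """Evaluate every permutation of range(label_count) and return the best one and its fitness."""
    best_order: Tuple[int, ...] = tuple(range(label_count))
    best_fitness = fitness(best_order)
    for order in itertools.permutations(range(label_count)):
        value = fitness(order)
        if value > best_fitness:  # pygad maximizes fitness, so we keep the largest value
            best_order, best_fitness = order, value
    return best_order, best_fitness


# toy fitness: position-weighted sum, largest when heavier labels come last
weights = [3.0, 1.0, 2.0]


def toy_fitness(order: Sequence[int]) -> float:
    return sum(i * weights[label] for i, label in enumerate(order))


print(brute_force_order(3, toy_fitness))  # ((1, 2, 0), 8.0)
```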

In [152]:
class ClassifierChainWithLOPAndGA():
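    def __init__(self, base_classifier: Any, num_generations: int = 5, random_state: Optional[int] = None) -> None:
        self.base_classifier = base_classifier
        self.num_generations = num_generations

        # initialized here so that the `is None` checks in test_solution and predict
        # raise the intended exception instead of an AttributeError before fit() is called
        self.probs: Optional[Probabilities] = None
        self.entropies: Optional[Entropies] = None
        self.best_classifier: Optional[ClassifierChain] = None

        if random_state is None:
            self.random_state = np.random.randint(0, 1000)
        else:
            self.random_state = random_state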
    
    def fit(self, X: Any, y: Any):
        self.probs = calculate_probabilities(y)
        self.entropies = calculate_entropies(self.probs)
        self.label_count = y.shape[1]

        # use self.label_count (not the notebook-level `label_count`),
        # so the model also works for datasets with a different number of labels
        label_space = np.arange(self.label_count)
        solutions_per_population = math.ceil(self.label_count / 2)

        ga_model = pygad.GA( #type:ignore
            gene_type=int,
            gene_space=label_space,
            random_seed=self.random_state,
            save_best_solutions=False,
            fitness_func=self.model_fitness_func,
            allow_duplicate_genes=False, # very important, otherwise we will have duplicate labels in the ordering
            num_genes=self.label_count,

            # set up
            num_generations=self.num_generations,
            sol_per_pop=solutions_per_population,

            # following what the article describes
            keep_elitism=1, # also following what the article describes, but we have to double check [TODO]
            parent_selection_type="rws", # following what the article describes
            # mutation_probability=0.005, # following what the article describes

            # the following settings are fixed
            # they were chosen for no particular reason
            # they are being kept as fixed to simplify the model
            num_parents_mating=2,
            crossover_type="scattered",
            mutation_type="random",
            mutation_by_replacement=True,
            mutation_num_genes=1,
        )

        ga_model.run()

        solution, _, _ = ga_model.best_solution()

        best_classifier = ClassifierChain(
            classifier=copy.deepcopy(self.base_classifier),
            require_dense=[False, True],
            order=solution,
        )

        best_classifier.fit(X, y)

        self.best_classifier = best_classifier
    
    def model_fitness_func(self, ga_instance: Any, solution: Any, solution_idx: Any) -> float:
        return self.test_solution(solution)

    def test_solution(self, label_order: List[int]) -> float:
        if self.probs is None or self.entropies is None:
            raise Exception("probabilities and entropies must be calculated before testing a solution")
        
        lop_matrix = build_lop_matrix(label_order, self.probs, self.entropies)
        return calculate_lop(lop_matrix)
    
    def predict(self, X: Any) -> Any:
        if self.best_classifier is None:
            raise Exception("model was not trained yet")

        return self.best_classifier.predict(X)

In [162]:
X = datasets["scene"]["X_train"]
y = datasets["scene"]["y_train"]
X_test = datasets["scene"]["X_test"]
y_test = datasets["scene"]["y_test"]

m = ClassifierChainWithLOPAndGA(RandomForestClassifier(random_state=42), num_generations=20)
m.fit(X, y)
preds = m.predict(X_test)

hamming_loss = metrics.hamming_loss(y_test, preds)
f1_score = metrics.f1_score(y_test, preds, average="macro")

print(f"hamming loss: {hamming_loss}")
print(f"f1 score: {f1_score}")


hamming loss: 0.2801003344481605
f1 score: 0.056223522137844796


In [163]:
preds

<1196x6 sparse matrix of type '<class 'numpy.float64'>'
	with 835 stored elements in Compressed Sparse Column format>

In [170]:
X = datasets["emotions"]["X_train"]
y = datasets["emotions"]["y_train"]
X_test = datasets["emotions"]["X_test"]
y_test = datasets["emotions"]["y_test"]

m = ClassifierChainWithLOPAndGA(RandomForestClassifier(random_state=42), num_generations=20)
m.fit(X, y)
other_preds = m.predict(X_test)

hamming_loss = metrics.hamming_loss(y_test, other_preds.todense())
f1_score = metrics.f1_score(y_test, other_preds, average="macro")

print(f"hamming loss: {hamming_loss}")
print(f"f1 score: {f1_score}")


hamming loss: 0.5734323432343235
f1 score: 0.056981860448637754
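One thing that may be worth checking before trusting these scores (an assumption, not something verified in this notebook): whether the installed scikit-multilearn `ClassifierChain` returns its predictions in the original label order or in the chain order passed via `order`. If they come back in chain order, column k holds the prediction for label `order[k]`, and the columns can be put back in place with the inverse permutation, `np.argsort(order)`. A small sketch with a hypothetical helper:

```python
import numpy as np
import scipy.sparse as sp


def restore_label_order(preds, order) -> sp.csr_matrix:
    """Reorder prediction columns from chain order back to the original label order.

    Assumes column k of `preds` is the prediction for label order[k].
    """
    inverse = np.argsort(order)  # inverse[j] = position of label j in the chain
    return sp.csr_matrix(preds)[:, inverse]


order = [2, 0, 1]
# one row, in chain order: predictions for labels 2, 0, 1
chain_preds = sp.csr_matrix([[1, 0, 0]])
print(restore_label_order(chain_preds, order).toarray())  # [[0 0 1]]
```

With the model above, the order the chain was built with is the GA's `solution` inside `fit` (presumably also available as `m.best_classifier.order`); comparing the metrics before and after reordering would show whether this matters for the installed version.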
