<a href="https://colab.research.google.com/github/David-Medina/LenguajeNatural/blob/main/TextModel_Optimization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Text Model Optimization Using Differential Evolution**

The `microtc` library will be used to train a text model and predict the class of each input. The goal is to optimize that model: the `TextModel` constructor in this library takes many parameters, and this notebook uses a Differential Evolution (DE) algorithm to search for the combination of parameter values that gives the best text model for the input data.

In [None]:
pip install microtc



The `TextModel` constructor parameters used by the algorithm are the following:


* `num_option` (str) – Transformations on numbers (none | group | delete)

* `usr_option` (str) – Transformations on users (none | group | delete)

* `url_option` (str) – Transformations on urls (none | group | delete) 

* `emo_option` (str) – Transformations on emojis and emoticons (none | group | delete)

* `hashtag_option` (str) – Transformations on hashtags (none | group | delete)

* `ent_option` (str) – Transformations on entities (none | group | delete)

* `weighting` (str) – Weighting scheme (tfidf | tf | entropy)

* `lc` (bool) – Lower case

* `del_dup` (bool) – Remove repeated characters, e.g., hooola -> hola

* `del_punc` (bool) – Remove punctuation symbols

* `del_diac` (bool) – Remove diacritics

* `token_min_filter` (int or float) – Keep only the tokens that appear more often than this value (used by the weighting class)

* `token_max_filter` (int or float) – Keep only the tokens that appear less often than this value (used by the weighting class)

* `token_list` (list) – Tokens to generate: positive values are character q-grams, negative values are word n-grams
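To get a rough sense of scale, the snippet below counts the combinations of only the categorical and boolean parameters listed above, in plain Python. The token filters and `token_list` are left out because this list does not fix their candidate values. `SEARCH_SPACE` is a name introduced here for illustration only.

```python
from math import prod

# Candidate values for the categorical and boolean TextModel parameters
# listed above (token filters and token_list are deliberately left out).
SEARCH_SPACE = {
    "num_option": ["none", "group", "delete"],
    "usr_option": ["none", "group", "delete"],
    "url_option": ["none", "group", "delete"],
    "emo_option": ["none", "group", "delete"],
    "hashtag_option": ["none", "group", "delete"],
    "ent_option": ["none", "group", "delete"],
    "weighting": ["tfidf", "tf", "entropy"],
    "lc": [True, False],
    "del_dup": [True, False],
    "del_punc": [True, False],
    "del_diac": [True, False],
}

# Number of distinct configurations = product of the option counts
n_configs = prod(len(values) for values in SEARCH_SPACE.values())
print(n_configs)  # 34992 (= 3**7 * 2**4)
```

Even this partial count would mean 34,992 model fits for an exhaustive search, which is one reason to prefer a heuristic search.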

## **1. Representation of the `TextModel`**

The proposed algorithm represents each individual (that is, each `TextModel` configuration) as an array of 0s and 1s. The bits encode the value chosen for each parameter; a parameter with more than two options is encoded with a group of bits.

As shown in the previous section, there are 14 parameters, grouped as follows:

* 7 `str`
* 4 `bool`
* 2 `int/float`
* 1 `list`

Every `str` parameter has exactly 3 options (option 0, 1 or 2), so each one is encoded with a pair of bits (00, 01 or 10 in binary).

The `bool` parameters are binary options, so a single bit suffices: `1` represents `True` and `0` represents `False`.

Finally, the `int/float` parameters (the minimum and maximum token filters) are encoded like the `bool` ones: if the bit is `1`, the filter takes a random number drawn from a predefined range; otherwise, the filter is not set. In the `DifferentialEvolution` class below, a single bit switches both filters on or off. The token list is encoded like the `str` parameters, with a pair of bits, but all 4 combinations are valid, each one selecting a predefined list of q-grams and/or word n-grams.





### **Creating an individual**

Based on the explanation above, each individual is an **array with 21 cells**: the first 14 for the 7 non-binary parameters (2 bits each), the next 4 for the `bool` parameters (1 bit each), one bit that enables both token filters, and the last 2 for the token list.
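The sketch below writes the 21-bit layout down as a lookup table that mirrors the slices used by `calculate_fitness` in the `DifferentialEvolution` class further below. `LAYOUT`, `random_individual` and `genes` are hypothetical helpers in plain Python. Unlike `create_random_ind`, which always produces exactly 10 zeros and 11 ones, `random_individual` draws each bit independently; it is an alternative, not the notebook's method.

```python
import random

# Hypothetical layout table: parameter name -> (start, stop) bit positions,
# mirroring the slices used in DifferentialEvolution.calculate_fitness.
LAYOUT = {
    "num_option": (0, 2),
    "usr_option": (2, 4),
    "url_option": (4, 6),
    "emo_option": (6, 8),
    "hashtag_option": (8, 10),
    "ent_option": (10, 12),
    "weighting": (12, 14),
    "lc": (14, 15),
    "del_dup": (15, 16),
    "del_punc": (16, 17),
    "del_diac": (17, 18),
    "token_filters": (18, 19),  # one bit enables both min and max filters
    "token_list": (19, 21),
}
IND_LENGTH = 21


def random_individual(rng=random):
    """Draw each of the 21 bits independently, 0 or 1 with equal probability."""
    return [rng.randint(0, 1) for _ in range(IND_LENGTH)]


def genes(ind):
    """Split an individual into its per-parameter bit groups."""
    return {name: ind[a:b] for name, (a, b) in LAYOUT.items()}


# The slices must cover positions 0..20 with no gaps or overlaps.
covered = sorted(i for a, b in LAYOUT.values() for i in range(a, b))
assert covered == list(range(IND_LENGTH))

ind = random_individual(random.Random(0))
print(genes(ind)["weighting"], genes(ind)["token_list"])
```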

In [None]:
import numpy as np

In [None]:
def create_random_ind():
  # 21 bits containing exactly 10 zeros and 11 ones, in random order
  ind = np.ones(21).astype(int)
  ind[:10] = 0
  np.random.shuffle(ind)

  return ind

In [None]:
ind = create_random_ind()
print(ind)
print(ind.shape)

[0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1]
(21,)


### **Fixing an individual**

Each non-binary parameter has 3 options but is encoded with 2 bits, so one of the four bit combinations, 11, is invalid. When it appears, it is treated as a `random` option: when the `TextModel` is constructed, one of the three options for that parameter is chosen at random (a number between 1 and 3, in decimal). See the *Fitness* section for details.
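A minimal sketch of this 2-bit decoding with the `11` fallback, in plain Python (`decode_pair` is a hypothetical helper name). Reading the pair as a binary number gives an index from 0 to 3; any index beyond the end of the option list falls back to a random choice.

```python
import random

NON_BINARY_OPTIONS = ["none", "group", "delete"]


def decode_pair(bits, options=NON_BINARY_OPTIONS, rng=random):
    """Map a 2-bit gene to an option.

    00 -> options[0], 01 -> options[1], 10 -> options[2]; any code without
    a matching option (11 for three options) picks one uniformly at random.
    """
    code = 2 * bits[0] + bits[1]  # read the pair as a binary number 0..3
    if code < len(options):
        return options[code]
    return rng.choice(options)


print(decode_pair([0, 1]))  # group
print(decode_pair([1, 1]))  # one of none / group / delete, chosen at random
```

Passing only `["tfidf", "tf"]` as options reproduces how the notebook currently handles `weighting`: `entropy` is disabled, so both `10` and `11` pick one of the two at random.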

## **2. Mutation**

Each individual is mutated with the standard Differential Evolution formula:

$$v' = x_1 + F(x_2  - x_3)$$

where $x_1$, $x_2$ and $x_3$ are three distinct individuals chosen at random from the population and $F$ is a random value in the range $[0, 1]$.

Since individuals contain only 0s and 1s, the result has to be mapped back to binary: values below 0.5 become 0 and the rest become 1.
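The mutation and the 0.5 threshold can be checked by hand on three short individuals. The sketch below (plain Python, hypothetical name `binary_de_mutation`) computes $v'$ in floating point before thresholding. The order matters with NumPy: assigning the float result into an integer array first truncates it toward zero, which effectively moves the threshold from 0.5 to 1.

```python
def binary_de_mutation(x1, x2, x3, F):
    """v' = x1 + F * (x2 - x3), computed in floating point and then
    mapped back to {0, 1} with a 0.5 threshold."""
    v = [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]
    return [1 if value >= 0.5 else 0 for value in v]


# Worked example with F = 0.5:
# raw v' = [1 + 0.5*(0-1), 0 + 0.5*(1-0), 1 + 0.5*(1-0)] = [0.5, 0.5, 1.5]
print(binary_de_mutation([1, 0, 1], [0, 1, 1], [1, 0, 0], 0.5))  # [1, 1, 1]

# With F = 0.4 the raw values are [0.6, 0.4, 1.4], so the middle bit stays 0
print(binary_de_mutation([1, 0, 1], [0, 1, 1], [1, 0, 0], 0.4))  # [1, 0, 1]
```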

In [None]:
def mutation(population):

  n = len(population)       # size of population
  d = len(population[0])    # size of individual

  vPopulation = np.zeros((n, d), int)   # Mutated population

  for i in range(n):

      # three distinct random individuals (requires n >= 3)
      n1, n2, n3 = np.random.choice(n, 3, replace=False)

      # compute v' in floating point; assigning it directly into the int
      # array would truncate the values before the 0.5 threshold is applied
      F = np.random.uniform(0, 1)
      v = population[n1] + F * (population[n2] - population[n3])

      # map back to binary: values below 0.5 become 0, the rest become 1
      vPopulation[i] = (v >= 0.5).astype(int)

  return vPopulation

## **3. Crossover**

Crossover also follows the standard DE rules: it builds a trial vector $u$ by mixing each mutated vector $v'$ with the corresponding individual $x$ of the current population. For every position, a random value in $[0, 1]$ is compared with the crossover rate $CR$: if the random value is smaller, the bit is taken from $v'$; otherwise, it is taken from $x$. One randomly chosen position is always taken from $v'$, so that $u$ inherits at least one bit from the mutated vector.
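A plain-Python sketch of this binomial crossover (`binomial_crossover` is a hypothetical name). The guaranteed position `j_rand` is drawn from all `d` indices; note that `np.random.randint(0, d - 1)` excludes its upper bound, so it can never pick the last position.

```python
import random


def binomial_crossover(x, v, CR, rng=random):
    """Build the trial vector u: u[j] = v[j] if rand < CR or j == j_rand,
    otherwise u[j] = x[j]."""
    d = len(x)
    j_rand = rng.randrange(d)  # any position 0..d-1 is always taken from v
    return [v[j] if (rng.random() < CR or j == j_rand) else x[j] for j in range(d)]


x = [0] * 10
v = [1] * 10
print(binomial_crossover(x, v, 0.3))  # mostly 0s, with at least one 1
```

With `CR = 1` the trial vector is a copy of `v`; with `CR = 0` exactly one bit (`j_rand`) comes from `v`.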

In [None]:
def crossover(population, vPopulation, CR):

  n = len(population)
  d = len(population[0])
  uPopulation = np.zeros((n, d), int)

  for i in range(n):
      l = np.random.randint(d)   # position always taken from v (0..d-1)
      for j in range(d):
          randFloat = np.random.random()
          if randFloat < CR or j == l:
              uPopulation[i][j] = vPopulation[i][j]
          else:
              uPopulation[i][j] = population[i][j]

  return uPopulation

In [None]:
population = [create_random_ind(), create_random_ind(), create_random_ind()]
print(population[0])
vPopulation = mutation(population)
print(vPopulation[0])
uPopulation = crossover(population, vPopulation, 0.3)
print(uPopulation[0])

[0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 1 0 1 0 1]
[1 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0]
[0 1 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1]


## **4. Fitness**

The fitness of each individual is the weighted F1 score of its `TextModel`. A model is built with the parameter values encoded in the individual, then trained and evaluated.
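`f1_score(..., average='weighted')` computes the F1 score of each class and averages them, weighting each class by its support (its number of true instances). The plain-Python re-derivation below (hypothetical helper `weighted_f1`) only shows what the fitness measures; the notebook itself relies on scikit-learn.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), averaged with weights equal
    to each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# class 0: F1 = 2/3, class 1: F1 = 0.8, equal support -> (2/3 + 0.8) / 2 = 11/15
print(weighted_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```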

In [None]:
from microtc.textmodel import TextModel
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

In [None]:
def calculate_fitness(ind, train, test, y, yt):
  
  # Non-binary parameters [none | group | delete] & [tfidf | tf | entropy]

  non_binary_options = ['none', 'group', 'delete']
  weighting_options = ['tfidf', 'tf', 'entropy']

  parameters = []

  for idx in range(0, 12, 2):
    bits = ''.join(ind[idx:idx+2].astype(str))
    
    if bits == '00':
      parameters.append(non_binary_options[0])
    elif bits == '01':
      parameters.append(non_binary_options[1])
    elif bits == '10':
      parameters.append(non_binary_options[2])
    else:
      num = np.random.randint(1, 4)
      parameters.append(non_binary_options[num-1])

  # weighting: 'entropy' is currently disabled, so '10' and '11'
  # both pick 'tfidf' or 'tf' at random
  bits = ''.join(ind[12:14].astype(str))
  if bits == '00':
    parameters.append(weighting_options[0])
  elif bits == '01':
    parameters.append(weighting_options[1])
  else:
    num = np.random.randint(1, 3)
    parameters.append(weighting_options[num-1])
    
  # Binary parameters

  for bit in ind[14:]:
    if bit:
      parameters.append(True)
    else:
      parameters.append(False)

  print(parameters)

  # int/float parameters

  min_range = (1, 50)
  max_range = (51, 100)
  token_min_filter = None
  token_max_filter = None

  if parameters[-2]: 
    token_min_filter = np.random.randint(min_range[0], min_range[1]+1)

  if parameters[-1]:
    token_max_filter = np.random.randint(max_range[0], max_range[1]+1)

  # Construct Text Model (the token filters are passed only when they are set)

  kwargs = dict(num_option=parameters[0], usr_option=parameters[1], url_option=parameters[2],
                emo_option=parameters[3], hashtag_option=parameters[4], ent_option=parameters[5],
                weighting=parameters[6], lc=parameters[7], del_dup=parameters[8],
                del_punc=parameters[9], del_diac=parameters[10])
  if token_min_filter:
    kwargs['token_min_filter'] = token_min_filter
  if token_max_filter:
    kwargs['token_max_filter'] = token_max_filter

  mtc = TextModel(**kwargs)
    
  # Train Text Model

  mtc.fit(train)
  X = mtc.transform(train)
  Xt = mtc.transform(test)

  clf = LinearSVC().fit(X, y)
  yp = clf.predict(Xt)

  return f1_score(yt, yp, average='weighted')


## **5. Selection**

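Selection is a one-to-one tournament: each trial vector $u_i$ is compared with its parent $x_i$, and the one with the higher fitness survives (the parent is kept on ties).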

In [None]:
def selection(population, fitness, uPopulation, uFitness):

    n = len(population)
    d = len(population[0])

    newPopulation = np.zeros((n, d), int)   # keep the bits as integers
    newFitness = []

    for i in range(n):

        if fitness[i] >= uFitness[i]:
            newPopulation[i] = population[i]
            newFitness.append(fitness[i])
        else:
            newPopulation[i] = uPopulation[i]
            newFitness.append(uFitness[i])

    return newPopulation, newFitness

## **Differential Evolution Class**

As a reminder, the DE algorithm is:

```
create random population
calculate fitness
get elite

for i in 0 .. maxIterations - 1:
  obtain population V → mutation
  obtain population U → crossover
  calculate fitness of U
  select the best → compare population U and population
  get elite
```
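To see the whole loop run end to end without `microtc`, the sketch below applies the same steps to a toy problem. Assumptions: the fitness is OneMax (the number of 1-bits), standing in for the F1 score; `binary_de` and `onemax` are hypothetical names; and the three mutation donors are also required to differ from the target individual `i`, as in textbook DE, whereas the notebook only requires them to differ from each other.

```python
import random


def onemax(ind):
    """Toy fitness (assumption): the number of 1-bits, standing in for the F1 score."""
    return sum(ind)


def binary_de(fitness_fn, d=21, n=10, CR=0.4, max_iterations=30, seed=0):
    """Binary DE: mutation + 0.5 threshold, binomial crossover, greedy selection."""
    rng = random.Random(seed)

    # create random population and calculate fitness
    population = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    fitness = [fitness_fn(ind) for ind in population]
    history = [max(fitness)]  # elite fitness before the first iteration

    for _ in range(max_iterations):
        trials = []
        for i in range(n):
            # mutation: three distinct donors, all different from i (needs n >= 4)
            r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
            F = rng.uniform(0, 1)
            v = [1 if population[r1][k] + F * (population[r2][k] - population[r3][k]) >= 0.5 else 0
                 for k in range(d)]

            # binomial crossover: one position is always taken from v
            j_rand = rng.randrange(d)
            u = [v[k] if (rng.random() < CR or k == j_rand) else population[i][k]
                 for k in range(d)]
            trials.append(u)

        # selection: a trial replaces its parent only if it is strictly better
        trial_fitness = [fitness_fn(u) for u in trials]
        for i in range(n):
            if trial_fitness[i] > fitness[i]:
                population[i], fitness[i] = trials[i], trial_fitness[i]

        history.append(max(fitness))  # elite fitness after this iteration

    best = max(range(n), key=lambda i: fitness[i])
    return population[best], fitness[best], history


best, best_fitness, history = binary_de(onemax)
print(best_fitness, history[:5])
```

Because a trial only replaces its parent when it is strictly better, the elite fitness recorded in `history` can never decrease.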

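The functions implemented above are now combined into a class that searches for the most accurate text model. The class takes the following parameters:

* `numTests` - number of independent runs of the algorithm; the best result over all runs is kept.
* `maxIterations` - maximum number of iterations per run.
* `n` - number of individuals per population.
* `CR` - crossover rate.
* `results_file` - path of the text file where the progress and the best model are written.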

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from microtc.textmodel import TextModel
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC
import numpy as np

class DifferentialEvolution:

  def __init__(self, numTests, maxIterations, n, CR, results_file):
    self.numTests = numTests
    self.maxIterations = maxIterations
    self.n = n
    self.CR = CR
    self.results_file = results_file
    self.ind_length = 21

  def set_train_test(self, data):

    self.X = data.tweet
    self.y = data.emotion

    self.train, self.test, self.y_train, self.yt = train_test_split(self.X, self.y, test_size=0.33, random_state=42)
    print(f"train: {self.train.shape} | test: {self.test.shape} | y: {self.y_train.shape} | yt: {self.yt.shape}")

  def create_random_individual(self):
    ind = np.ones(self.ind_length).astype(int)
    ind[:10] = 0
    np.random.shuffle(ind)

    return ind

  def create_random_population(self):

    population = []

    for i in range(self.n):
      population.append(self.create_random_individual())

    return np.array(population)

  def mutation(self, population):

    n = len(population)       # size of population
    d = len(population[0])    # size of individual

    vPopulation = np.zeros((n, d), int)   # Mutated population

    for i in range(n):

        # three distinct random individuals (requires n >= 3)
        n1, n2, n3 = np.random.choice(n, 3, replace=False)

        # compute v' in floating point; assigning it directly into the int
        # array would truncate the values before the 0.5 threshold is applied
        F = np.random.uniform(0, 1)
        v = population[n1] + F * (population[n2] - population[n3])

        # map back to binary: values below 0.5 become 0, the rest become 1
        vPopulation[i] = (v >= 0.5).astype(int)

    return vPopulation

  def crossover(self, population, vPopulation, CR):

    n = len(population)
    d = len(population[0])
    uPopulation = np.zeros((n, d), int)

    for i in range(n):
        l = np.random.randint(d)   # position always taken from v (0..d-1)
        for j in range(d):
            randFloat = np.random.random()
            if randFloat < CR or j == l:
                uPopulation[i][j] = vPopulation[i][j]
            else:
                uPopulation[i][j] = population[i][j]

    return uPopulation

  def calculate_fitness(self, ind, train, test, y, yt):

    # Non-binary parameters [none | group | delete] & [tfidf | tf | entropy]

    non_binary_options = ['none', 'group', 'delete']
    weighting_options = ['tfidf', 'tf', 'entropy']

    parameters = []

    for idx in range(0, 12, 2):
      bits = ''.join(ind[idx:idx+2].astype(str))
      
      if bits == '00':
        parameters.append(non_binary_options[0])
      elif bits == '01':
        parameters.append(non_binary_options[1])
      elif bits == '10':
        parameters.append(non_binary_options[2])
      else:
        num = np.random.randint(1, 4)
        parameters.append(non_binary_options[num-1])

    # weighting: 'entropy' is currently disabled, so '10' and '11'
    # both pick 'tfidf' or 'tf' at random
    bits = ''.join(ind[12:14].astype(str))
    if bits == '00':
      parameters.append(weighting_options[0])
    elif bits == '01':
      parameters.append(weighting_options[1])
    else:
      num = np.random.randint(1, 3)
      parameters.append(weighting_options[num-1])
      
    # Binary parameters

    for bit in ind[14:-3]:
      if bit:
        parameters.append(True)
      else:
        parameters.append(False)

    # int/float parameters

    min_range = (1, 3)
    max_range = (10, 15)
    token_min_filter = None
    token_max_filter = None

    if ind[-3]: 
      token_min_filter = np.random.randint(min_range[0], min_range[1]+1)
      token_max_filter = np.random.randint(max_range[0], max_range[1]+1)

    bits = ''.join(ind[-2:].astype(str))
    token_list = None
      
    if bits == '00':
      token_list = [[2, 1], -1, 3, 4]
    elif bits == '01':
      token_list = [2, 1]
    elif bits == '10':
      token_list = [-2, -1]
    else:
      token_list = [1]


    # Construct Text Model

    if token_min_filter and token_max_filter:
      mtc = TextModel(num_option=parameters[0], usr_option=parameters[1], url_option=parameters[2], emo_option=parameters[3], hashtag_option=parameters[4], 
                      ent_option=parameters[5], weighting=parameters[6], lc=parameters[7], del_dup=parameters[8], del_punc=parameters[9], del_diac=parameters[10],
                      token_min_filter=token_min_filter, token_max_filter=token_max_filter, 
                      token_list=token_list
                      )
      
    else:
      mtc = TextModel(num_option=parameters[0], usr_option=parameters[1], url_option=parameters[2], emo_option=parameters[3], hashtag_option=parameters[4], 
                        ent_option=parameters[5], weighting=parameters[6], lc=parameters[7], del_dup=parameters[8], del_punc=parameters[9], del_diac=parameters[10],
                        token_list=token_list
                        )


    # Train and evaluate the Text Model: mean weighted F1 over 5-fold
    # cross-validation on the full data set (the train, test, y and yt
    # arguments are not used). The TextModel is fitted on all of self.X
    # before cross-validation, so its token weights also see the validation folds.

    mtc.fit(self.X)
    X = mtc.transform(self.X)
    clf = LinearSVC()
    scores = cross_val_score(clf, X, self.y, cv=5, scoring='f1_weighted')

    return np.mean(scores), mtc

  def selection(self, population, fitness, models, uPopulation, uFitness, uModels):

    n = len(population)
    d = len(population[0])

    newPopulation = np.zeros((n, d), int)   # keep the bits as integers
    newFitness = []
    newModels = []

    for i in range(n):

        if fitness[i] >= uFitness[i]:
            newPopulation[i] = population[i]
            newFitness.append(fitness[i])
            newModels.append(models[i])
        else:
            newPopulation[i] = uPopulation[i]
            newFitness.append(uFitness[i])
            newModels.append(uModels[i])

    return newPopulation, newFitness, newModels

  def get_elite(self, population, fitness, models):

    idx = np.argmax(fitness)
    return population[idx], fitness[idx], models[idx]

  def describe_model(self, f, model, fitness):

    f.write(f"-------------------------------------\n")

    f.write(f"Best TextModel:\n\n")
    f.write(f"\tnum_option: {model.num_option}\n")
    f.write(f"\tusr_option: {model.usr_option}\n")
    f.write(f"\turl_option: {model.url_option}\n")
    f.write(f"\temo_option: {model.emo_option}\n")
    f.write(f"\thashtag_option: {model.hashtag_option}\n")
    f.write(f"\tent_option: {model.ent_option}\n")
    f.write(f"\tweighting: {model.weighting}\n")
    f.write(f"\tlc: {model.lc}\n")
    f.write(f"\tdel_dup: {model.del_dup}\n")
    f.write(f"\tdel_punc: {model.del_punc}\n")
    f.write(f"\tdel_diac: {model.del_diac}\n")
    f.write(f"\ttoken_min_filter: {model.token_min_filter}\n")
    f.write(f"\ttoken_max_filter: {model.token_max_filter}\n\n")

    f.write(f"F1-score: {fitness}")
  

  def run(self):

    f = open(self.results_file, "w")

    f.write(f"Number of tests: {self.numTests}\nMaximum iterations: {self.maxIterations}\nSize of population: {self.n}\nCR: {self.CR}\n---------------------------------------\n")
    
    masterElite, masterModelElite, masterFitnessElite = None, None, 0.0

    m = 0

    while m < self.numTests:

      print(f"m: {m}")

      f.write(f"Test #{m+1}\n\n")

      population = self.create_random_population()
      fitnessAndModel = np.array([ self.calculate_fitness(population[j,:], self.train, self.test, self.y, self.yt) for j in range(self.n) ])
      fitness = fitnessAndModel[:, 0]
      models = fitnessAndModel[:, 1]
      elite, fitnessElite, modelElite = self.get_elite(population, fitness, models)

      k = 0

      while k < self.maxIterations:

        print(f"\tk: {k}")

        vPopulation = self.mutation(population)
        uPopulation = self.crossover(population, vPopulation, self.CR)
        uFitnessAndModel = np.array([ self.calculate_fitness(uPopulation[j,:], self.train, self.test, self.y, self.yt) for j in range(self.n) ])
        uFitness = uFitnessAndModel[:, 0]
        uModels = uFitnessAndModel[:, 1]
        population, fitness, models = self.selection(population, fitness, models, uPopulation, uFitness, uModels)
        elite, fitnessElite, modelElite = self.get_elite(population, fitness, models)

        if k%5 == 0:
          f.write(f"k: {k} | fitnessElite: {fitnessElite} | elite: {elite}\n")

        print(f"k: {k} | fitnessElite: {fitnessElite} | elite: {elite}\n")

        k += 1

      if fitnessElite > masterFitnessElite:
        masterElite = elite
        masterFitnessElite = fitnessElite
        masterModelElite = modelElite

      f.write(f"\nmasterElite --> f1-score: {masterFitnessElite} | ind: {masterElite}\n\n")

      m += 1

    self.describe_model(f, masterModelElite, masterFitnessElite)

    f.close()

    return masterElite, masterFitnessElite, masterModelElite


### **Testing the algorithm**

In [None]:
import pandas as pd

In [None]:
data_route = '/content/drive/MyDrive/NLP/train.tsv'
results_route = '/content/drive/MyDrive/NLP/results'

To tune the algorithm itself, the following runs look for the crossover rate `CR` that gives the best results.

In [None]:
DE = DifferentialEvolution(1, 15, 5, 0.4, f"{results_route}_new_04.txt")
DE.set_train_test(pd.read_table(data_route))
elite, fitness, model = DE.run()
print(f"CR: 0.4\n\telite: {elite}\n\tfitness: {fitness}")

train: (3834,) | test: (1889,) | y: (3834,) | yt: (1889,)
m: 0
	k: 0
k: 0 | fitnessElite: 0.5690269411811275 | elite: [1. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1.]

	k: 1
k: 1 | fitnessElite: 0.5690269411811275 | elite: [1. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1.]

	k: 2
k: 2 | fitnessElite: 0.5690269411811275 | elite: [1. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1.]

	k: 3
k: 3 | fitnessElite: 0.6120780449208507 | elite: [0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0.]

	k: 4
k: 4 | fitnessElite: 0.6120780449208507 | elite: [0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0.]

	k: 5
k: 5 | fitnessElite: 0.6120780449208507 | elite: [0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0.]

	k: 6
k: 6 | fitnessElite: 0.6168376666664065 | elite: [0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

	k: 7
k: 7 | fitnessElite: 0.6168376666664065 | elite: [0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0

In [None]:
for CR in np.linspace(0.1, 1, num=10)[:-1]:
  print(f"CR: {CR}")
  DE = DifferentialEvolution(1, 15, 5, CR, f"{results_route}-CR{int(CR*10)}.txt")
  DE.set_train_test(pd.read_table(data_route))
  elite, fitness, model = DE.run()
  print(f"CR: {CR}\n\telite: {elite}\n\tfitness: {fitness}")

CR: 0.1
train: (3834,) | test: (1889,) | y: (3834,) | yt: (1889,)
m: 0
	k: 0
k: 0 | fitnessElite: 0.4869402681837238 | elite: [0. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0.]

	k: 1
k: 1 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 2
k: 2 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 3
k: 3 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 4
k: 4 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 5
k: 5 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 6
k: 6 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 1.]

	k: 7
k: 7 | fitnessElite: 0.5248672755773183 | elite: [1. 0. 0. 0. 1. 0. 0. 0. 0.

k: 12 | fitnessElite: 0.5807275182573645 | elite: [1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]

	k: 13
k: 13 | fitnessElite: 0.5807275182573645 | elite: [1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]

	k: 14
k: 14 | fitnessElite: 0.5807275182573645 | elite: [1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]

CR: 0.4
	elite: [1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]
	fitness: 0.5807275182573645
CR: 0.5
train: (3834,) | test: (1889,) | y: (3834,) | yt: (1889,)
m: 0


	k: 0
k: 0 | fitnessElite: 0.4977425332031296 | elite: [1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0.]

	k: 1
k: 1 | fitnessElite: 0.5420035230530461 | elite: [0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 1. 0.]

	k: 2
k: 2 | fitnessElite: 0.5638617633065357 | elite: [1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 0.]

	k: 3
k: 3 | fitnessElite: 0.5711142300202257 | elite: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 0. 1. 0.]

	k: 4
k: 4 | fitnessElite: 0.5712963638925574 | elite: [0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]

	k: 5
k: 5 | fitnessElite: 0.5712963638925574 | elite: [0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]

	k: 6
k: 6 | fitnessElite: 0.5712963638925574 | elite: [0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]

	k: 7
k: 7 | fitnessElite: 0.5712963638925574 | elite: [0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]

	k: 8
k: 8 | fitnessElite: 0.571

	k: 0
k: 0 | fitnessElite: 0.5937335646444118 | elite: [0. 0. 0. 1. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 1
k: 1 | fitnessElite: 0.5937335646444118 | elite: [0. 0. 0. 1. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 2
k: 2 | fitnessElite: 0.6101347004965008 | elite: [1. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 3
k: 3 | fitnessElite: 0.6129095348613064 | elite: [0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 4
k: 4 | fitnessElite: 0.6129095348613064 | elite: [0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 5
k: 5 | fitnessElite: 0.6129095348613064 | elite: [0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 6
k: 6 | fitnessElite: 0.6129095348613064 | elite: [0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 7
k: 7 | fitnessElite: 0.6129219501848213 | elite: [0. 0. 1. 0. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0.]

	k: 8
k: 8 | fitnessElite: 0.612

k: 0 | fitnessElite: 0.5955742392807267 | elite: [1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 0.]

	k: 1
k: 1 | fitnessElite: 0.5955742392807267 | elite: [1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 0.]

	k: 2
k: 2 | fitnessElite: 0.5955742392807267 | elite: [1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 0.]

	k: 3
k: 3 | fitnessElite: 0.6197161568256784 | elite: [0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

	k: 4
k: 4 | fitnessElite: 0.6229748555625161 | elite: [0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

	k: 5
k: 5 | fitnessElite: 0.6229748555625161 | elite: [0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

	k: 6
k: 6 | fitnessElite: 0.6229748555625161 | elite: [0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

	k: 7
k: 7 | fitnessElite: 0.6229748555625161 | elite: [0. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

	k: 8
k: 8 | fitnessElite: 0.622974855