# Voice leading reinforcement learning agents.

## Introduction.
[...]

$$
\mathbb{E}[\text{return}|\alpha, s]
\ =\ 
\text{reward}+\underset{\ \beta\ \in\ \mathcal{A}_{\alpha(s)}\!\!}{\text{max}}\mathbb{E}\left[\text{return}|\beta, \alpha(s)\right]
$$

$$
v(\alpha, s)
\ =\ 
R(\alpha)+\underset{\ \beta\ \in\ \mathcal{A}_{\alpha(s)}\!\!}{\text{max}}v\big(\beta, \alpha(s)\big)
$$
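
To make the recursion above concrete, here is a minimal sketch on a toy problem in which, as with the agents later in this notebook, an action is simply the choice of the next state. The five integer states, the `toy_reward` function, and the terminal goal state are illustrative assumptions and not part of the notebook's actual environment. The goal is treated as terminal so that the undiscounted return stays finite.

```python
# Toy illustration of v(alpha, s) = R(alpha) + max_beta v(beta, alpha(s)).
# Assumptions (not from the notebook): five integer states, a terminal goal
# state, and a hypothetical reward that pays 10 for reaching the goal and
# charges the remaining distance otherwise.

STATES = range(5)
GOAL = 3


def toy_reward(next_state):
    """Hypothetical reward for moving to `next_state`."""
    return 10.0 if next_state == GOAL else -float(abs(next_state - GOAL))


def bellman_sweeps(sweeps=10):
    """Repeatedly apply the Bellman backup to every (state, action) pair."""
    value = {(s, a): 0.0 for s in STATES for a in STATES}
    for _ in range(sweeps):
        new_value = {}
        for s in STATES:
            for a in STATES:
                # The goal is terminal: no future return once it is reached.
                future = 0.0 if a == GOAL else max(value[(a, b)] for b in STATES)
                new_value[(s, a)] = toy_reward(a) + future
        value = new_value
    return value


value = bellman_sweeps()
print(value[(0, GOAL)])  # moving straight to the goal
print(value[(0, 1)])     # detour through state 1, then act greedily
```

In this toy setting the detour is worth less than the direct move because it pays a distance penalty before collecting the same goal reward.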

In [1]:
import copy

import random
import math
import numpy as np
from inspect import isfunction

import torch
import torch.nn as nn

from tqdm import tqdm

___

## Classes for various aspects of music theory.
The Python classes we define in this section collect the aspects of music theory relevant to the problem of voice leading. Using the MIDI standard encoding, for instance, every note can be assigned an integer value between $0$ and $127$. In this way, a solution to any voice leading problem can be encoded completely numerically. However, the reward functions for the sequence of step-by-step actions that constitute a proposed solution to a voice leading problem depend on music-theoretical considerations. We will use the classes defined in this section to evaluate rewards for our agent's actions.
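
As a small illustration of the numerical encoding, the sketch below decodes MIDI values into readable note names. The helper `midi_to_name` and the sample `solution` are hypothetical and exist only for this example; the octave numbering assumes the common convention in which MIDI value 60 is C4.

```python
# Sketch: MIDI note numbers -> note names, assuming the convention that
# MIDI value 60 is C4 (so octave = value // 12 - 1).
SHARP_NAMES = ['C', 'C♯', 'D', 'D♯', 'E', 'F', 'F♯', 'G', 'G♯', 'A', 'A♯', 'B']


def midi_to_name(value):
    """Return a sharp-spelled note name such as 'C4' for a MIDI value."""
    if not 0 <= value <= 127:
        raise ValueError('MIDI note values lie between 0 and 127')
    octave = value // 12 - 1
    return f'{SHARP_NAMES[value % 12]}{octave}'


# A two-voice solution encoded purely as integers, then decoded for reading.
solution = [[60, 64], [59, 65], [60, 64]]
print([[midi_to_name(v) for v in chord] for chord in solution])
```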

### Classes related to harmony and melody.

#### Class: `Notes`
Parent(s): *none*

Constructor arguments: *none*

In [2]:
class Notes():
    def __init__(self):
        
        valmod12_to_class = {0: ('C', 'C'),
            1: ('C♯', 'D♭'),
            2: ('D', 'D'),
            3: ('D♯', 'E♭'),
            4: ('E', 'E'),
            5: ('F', 'F'),
            6: ('F♯', 'G♭'),
            7: ('G', 'G'),
            8: ('G♯', 'A♭'),
            9: ('A', 'A'),
            10: ('A♯', 'B♭'),
            11: ('B', 'B')}
        self.valmod12_to_class = valmod12_to_class
        
        all_note_class_names = []
        for key in self.valmod12_to_class:
            class_pair = self.valmod12_to_class[key]
            all_note_class_names.append(class_pair[0])
            all_note_class_names.append(class_pair[1])
        self.all_note_class_names = all_note_class_names
        
        class_to_valmod12 = {}
        for key in self.valmod12_to_class:
            class_pair = self.valmod12_to_class[key]
            for entry in class_pair:
                class_to_valmod12.update({entry: key})
        self.class_to_valmod12 = class_to_valmod12
        
        value_to_class = {}
        for value in range(128):
            valmod12 = value%12
            class_pair = self.valmod12_to_class[valmod12]
            value_to_class.update({value: class_pair})
        self.value_to_class = value_to_class
        
        note_to_value = {}
        for value in self.value_to_class:
            class_pair = self.value_to_class[value]
            sharp_class = class_pair[0]
            flat_class = class_pair[1]
            valmod12 = value%12
            octave = -1 + int((value - valmod12)/12)
            note_to_value.update({sharp_class+'{}'.format(octave): value})
            note_to_value.update({flat_class+'{}'.format(octave): value})
        self.note_to_value = note_to_value

Testing:

In [3]:
notes = Notes()
print(notes.valmod12_to_class[8][0] == 'G♯')
print(notes.class_to_valmod12['E♭'] == 3)
print(notes.value_to_class[54] == ('F♯', 'G♭'))
print(notes.note_to_value['E♭2'] == 39)

True
True
True
True


#### Class: `Scales`
Parent(s): *none*

Constructor arguments: *none*

In [4]:
class Scales():
    def __init__(self):
        
        # Construct modern mode degrees, ascending and descending, as attributes:
        self.long_step_sequence = [2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2]
        
        self.mode_start = {'Ionian': 0,
            'Dorian': 1,
            'Phrygian': 2,
            'Lydian': 3,
            'Mixolydian': 4,
            'Aeolian': 5,
            'Locrian': 6}
        
        modern_mode_steps = {}
        for key, value in self.mode_start.items():
            mode = key
            start_position = value
            current_mode_steps = [self.long_step_sequence[i] for i in range(start_position, start_position+7)]
            modern_mode_steps.update({mode: current_mode_steps})
        self.modern_mode_steps = modern_mode_steps
        
        updown_mode_degrees = {}
        for key, value in self.modern_mode_steps.items():
            mode = key
            step_sequence = value
            degree_sequence = [0]
            for i, step in enumerate(step_sequence):
                scale_degree = degree_sequence[i]
                new_scale_degree = (scale_degree + step)%12
                degree_sequence.append(new_scale_degree)
                rev_degree_sequence = degree_sequence[::-1]
            updown_mode_degrees.update({mode: {'up': copy.deepcopy(degree_sequence),
                                     'down': copy.deepcopy(rev_degree_sequence)}})
        
        # Construct Major mode degrees, ascending and descending, as attributes:
        major_updown = updown_mode_degrees['Ionian']
        updown_mode_degrees.update({'Major': copy.deepcopy(major_updown)})

        # Construct Natural minor mode degrees, ascending and descending, as attributes:
        natural_minor_updown = updown_mode_degrees['Aeolian']
        updown_mode_degrees.update({'Natural_minor': copy.deepcopy(natural_minor_updown)})

        # Construct Harmonic minor mode degrees, ascending and descending, as attributes:
        harmonic_minor_steps = [2, 1, 2, 2, 1, 3, 1]
        harmonic_minor_degree_sequence = [0]
        for i, step in enumerate(harmonic_minor_steps):
            scale_degree = harmonic_minor_degree_sequence[i]
            new_scale_degree = (scale_degree + step)%12
            harmonic_minor_degree_sequence.append(new_scale_degree)
            rev_harmonic_minor_degree_sequence = harmonic_minor_degree_sequence[::-1]
        updown_mode_degrees.update({'Harmonic_minor': {'up': copy.deepcopy(harmonic_minor_degree_sequence),
                                                     'down': copy.deepcopy(rev_harmonic_minor_degree_sequence)}})

        # Construct Melodic minor mode degrees, ascending and descending, as attributes:
        # Ascending melodic minor: natural minor with raised sixth and seventh degrees.
        melodic_minor_steps_up = [2, 1, 2, 2, 2, 2, 1]
        melodic_minor_degrees_up = [0]
        for i in range(7):
            current_degree = melodic_minor_degrees_up[i]
            next_degree = (current_degree + melodic_minor_steps_up[i])%12
            melodic_minor_degrees_up.append(next_degree)
        # Descending melodic minor coincides with descending natural minor.
        melodic_minor_steps_down = [2, 2, 1, 2, 2, 1, 2]
        melodic_minor_degrees_down = [0]
        for i in range(7):
            current_degree = melodic_minor_degrees_down[i]
            next_degree = (current_degree - melodic_minor_steps_down[i])%12
            melodic_minor_degrees_down.append(next_degree)
        updown_mode_degrees.update({'Melodic_minor': {'up': copy.deepcopy(melodic_minor_degrees_up),
                                                    'down': copy.deepcopy(melodic_minor_degrees_down)}})

        # Combine all ascending and descending mode degrees into attribute dictionary:
        self.updown_mode_degrees = updown_mode_degrees

        # Collect all modes constructed as list attribute:
        mode_list = [key for key in self.updown_mode_degrees]
        self.mode_list = mode_list

    # Method for querying the ascending/descending mode degree dictionary attribute:
    def updown_degrees(self, mode):
        assert mode in self.mode_list
        output = self.updown_mode_degrees[mode]
        return output


Testing:

In [5]:
scales = Scales()

#### Class: `Key`
Parent(s): *none*

Constructor arguments:
* *root* = `'C'`, 
* *mode* = `'Major'`

**Important.** The constructor for the `Key` class constructs an instance each of the `Notes` and `Scales` classes as attributes of `Key`.

In [6]:
class Key():
    def __init__(self,
                 root = 'C',
                 mode = 'Major'):
        
        self.notes = Notes()
        self.scales = Scales()
        
        assert root in self.notes.all_note_class_names
        assert mode in self.scales.mode_list
        
        self.root_class = root
        self.root_valmod12 = self.notes.class_to_valmod12[self.root_class]
        
        self.mode = mode
        
        self.scale_degrees = self.scales.updown_degrees(mode = self.mode)
        self.up_degrees = self.scale_degrees['up']
        self.triad_degrees = [self.up_degrees[i] for i in [0,2,4]]
        self.triad_valsmod12 = [(self.root_valmod12 + degree)%12 for degree in self.triad_degrees]

Testing:

In [7]:
key = Key(root = 'E', mode = 'Melodic_minor')
print(key.triad_degrees == [0, 3, 7])
print(key.triad_valsmod12 == [4, 7, 11])

True
True


## Classes for rewards.

### Classes for progress-to-final-interval rewards.

#### Class: `SmallProgtoFinSchema`
Parent(s): *none*

Constructor arguments: *none*

In [8]:
class SmallProgtoFinSchema():
    def __init__(self):
        pass
        
    def reward(self,
        chord_0 = np.array([73, 76]),
        chord_1 = np.array([72, 74]),
        final_chord = np.array([35, 45])):
        
        assert len(chord_0) == len(chord_1) == len(final_chord)
        
        n = len(final_chord)
        
        centroid_0 = np.sum(chord_0)/n
        centroid_1 = np.sum(chord_1)/n
        final_centroid = np.sum(final_chord)/n
        
        needed_change = final_centroid - centroid_0
        #needed_direction = np.sign(needed_change)
        
        actual_change = centroid_1 - centroid_0
        #actual_direction = np.sign(actual_change)
        
        if actual_change == 0.:
            change_ratio = -100.
        else:
            change_ratio = needed_change/actual_change
        
        return change_ratio

Testing:

In [9]:
small_prog_to_fin_scheme = SmallProgtoFinSchema()

print(small_prog_to_fin_scheme.reward(chord_0 = np.array([73, 76]),
                                      chord_1 = np.array([73, 75]),
                                      final_chord = np.array([35, 45])))

print(small_prog_to_fin_scheme.reward(chord_0 = np.array([73, 76]),
                                      chord_1 = np.array([73, 75]),
                                      final_chord = np.array([73, 77])))

print(small_prog_to_fin_scheme.reward(chord_0 = np.array([73, 76]),
                                      chord_1 = np.array([73, 76]),
                                      final_chord = np.array([74, 78])))

69.0
-1.0
-100.0


___

## Agent classes.

#### Class: `RandomChord`
Parent(s): *none*

Constructor arguments:
* *chord_size* = `3`
* *lower_limit* = `0`
* *upper_limit* = `127`

**Remark.** It occurs to me that I was introducing inductive biases into the agent with the way I had an earlier `randchord` function written. I was randomly selecting an `np.array` $[i_0, i_1, \dots, i_{n-1}]$ by selecting $i_0$, then selecting $i_1>i_0$, and so on, within a given range. With a bit of thought, though, it becomes clear that this method does not sample uniformly from the set of admissible chords. For instance, if we draw a pair of integers $i_0 < i_1$ from the set $[0,1,2]$ using this method, then the probabilities end up being
$$\rho([0,1])=\tfrac{1}{4},\ \ \ \ \ \ \rho([0,2])=\tfrac{1}{4},\ \ \ \ \ \ \text{and}\ \ \ \ \ \ \rho([1,2])=\tfrac{1}{2}.$$
The `RandomChord` class avoids this bias by enumerating every admissible chord once and sampling uniformly from that list.
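
The probabilities above can be checked exactly with a short enumeration. This is a sketch under one assumption about the sequential scheme: each note is drawn uniformly from the values that still leave room for the remaining, higher notes. The function names are hypothetical and are not used elsewhere in the notebook.

```python
from fractions import Fraction
from itertools import combinations


def sequential_distribution(lower, upper, size):
    """Exact chord probabilities when notes are drawn one at a time, lowest first."""
    partial = {(): Fraction(1)}
    for position in range(size):
        # Leave room above for the notes that still have to be drawn.
        highest_allowed = upper - (size - 1 - position)
        extended = {}
        for chord, probability in partial.items():
            start = chord[-1] + 1 if chord else lower
            choices = range(start, highest_allowed + 1)
            for note in choices:
                extended[chord + (note,)] = probability / len(choices)
        partial = extended
    return partial


def uniform_distribution(lower, upper, size):
    """Exact chord probabilities when every admissible chord is equally likely."""
    chords = list(combinations(range(lower, upper + 1), size))
    return {chord: Fraction(1, len(chords)) for chord in chords}


print(sequential_distribution(0, 2, 2))
print(uniform_distribution(0, 2, 2))
```

The sequential scheme favors chords whose lowest note is high, since fewer completions share that branch's probability, whereas sampling from the full list of combinations gives each chord probability $\tfrac{1}{3}$.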

In [10]:
class RandomChord():
    def __init__(self, chord_size = 3, lower_limit = 0, upper_limit = 127):
        assert isinstance(chord_size, int)
        assert chord_size > 0
        assert isinstance(lower_limit, int)
        assert isinstance(upper_limit, int)
        assert lower_limit <= upper_limit
        assert chord_size <= upper_limit - lower_limit + 1
        
        self.chord_size = chord_size
        self.lower_limit = lower_limit
        self.upper_limit = upper_limit
        
        admissible_chords = [[k] for k in range(self.lower_limit, self.upper_limit + 2 - self.chord_size)]
        for i in range(1, self.chord_size):
            new_admissible_chords = []
            for running_chord in admissible_chords:
                new_lower_limit = running_chord[-1] + 1
                for k in range(new_lower_limit, self.upper_limit + 2 - self.chord_size + i):
                    new_chord = running_chord + [k] 
                    new_admissible_chords.append(new_chord)
            admissible_chords = new_admissible_chords
        self.admissible_chords = admissible_chords
        
    
    def sample(self):
        chord = random.choice(self.admissible_chords)
        return chord

Testing:

In [11]:
random_chord = RandomChord(chord_size = 3, lower_limit = 0, upper_limit = 5)

for i in range(4):
    output = random_chord.sample()
    print(output[0]<output[1]<output[2])

True
True
True
True


#### Class: `ActionValue_01`
Parent(s): `torch.nn.Module`

Constructor arguments:
* *chord_size* = `3`
* *lower_limit* = `0`
* *upper_limit* = `5`
* *layer_count* = `8`
* *layer_features* = `1000`

**Remark: `ReLU` versus `Softmax`.** Because we're implicitly using the *greedy policy*, which, at each state $s$, always selects the action $\alpha$ that maximizes the action-value $v_{\text{greed}}(s,\alpha)$, it might appear that the neural network that approximates $v_{\text{greed}}(s,\alpha)$ should use *softmax* activation at its final layer. However, the specific value of $v_{\text{greed}}(s,\alpha)$ is also important. This action-value function $v_{\text{greed}}(s,\alpha)$ is supposed to output the *expected return* $\mathbb{E}_{\text{greed}}[G|s,\alpha]$, which is a (potentially weighted) sum of all future rewards that the agent will obtain under the greedy policy. Because we've already specified our rewards implicitly in the various reward functions we defined above, we will run into trouble if we use softmax. Indeed, $0\le \text{softmax}(x)\le 1$, whereas our reward functions can take all sorts of real values, sometimes negative. Thus it makes more sense to use an activation such as `ReLU` or `LeakyReLU` in our neural network; the implementation below uses `nn.ELU` between layers and leaves the final layer's output unbounded.
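
The point about magnitudes can be seen in a few lines of plain Python. The `raw_values` below are hypothetical action values chosen on the scale of the rewards used in this notebook; they are not outputs of the actual network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(x - shift) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for three candidate actions, on the scale of the
# rewards used in this notebook (a penalty of -100, a goal bonus of 1000).
raw_values = [-100.0, 0.2, 1000.0]
squashed = softmax(raw_values)

print(squashed)  # every entry lies in [0, 1] and the entries sum to 1
print(raw_values.index(max(raw_values)) == squashed.index(max(squashed)))
```

Softmax preserves which action is largest, so the greedy choice would be unchanged, but it discards the scale of the values, and that scale is exactly what a return estimate has to carry.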

In [12]:
def onehot_tensor(index, length):
    assert isinstance(index, int)
    assert isinstance(length, int)
    # A valid one-hot position must be strictly less than the vector length.
    assert 0 <= index < length
    
    onehot = torch.Tensor([float(i == index) for i in range(length)])
    
    return onehot

In [13]:
class ActionValue_01(nn.Module):
    def __init__(self,
                 chord_size = 3,
                 lower_limit = 0,
                 upper_limit = 5,
                 layer_count = 8,
                 layer_features = 1000):
        super().__init__()
        
        self.chord_size = chord_size
        self.lower_limit = lower_limit
        self.upper_limit = upper_limit
        
        self.random_chord = RandomChord(chord_size = self.chord_size,
                                        lower_limit = self.lower_limit,
                                        upper_limit = self.upper_limit)
        
        self.admissible_chord_count = len(self.random_chord.admissible_chords)
        
        self.index_to_chord = {i: chord for i, chord in enumerate(self.random_chord.admissible_chords)}
        self.chord_to_index = {tuple(chord): i for i, chord in enumerate(self.random_chord.admissible_chords)}
        self.chord_to_tensor = {tuple(chord): onehot_tensor(index, self.admissible_chord_count) \
                                for index, chord in enumerate(self.random_chord.admissible_chords)}
        
        
        assert isinstance(layer_count, int)
        assert layer_count > 0
        assert isinstance(layer_features, int)
        assert layer_features > 0
        
        self.layer_count = layer_count
        self.layer_features = layer_features
        
        self.layers = nn.ModuleList()
        # Critical here: the `3` in `in_features = 3 * self.admissible_chord_count` comes from
        # concatenating three one-hot vectors:
        # 0: current state
        # 1: next state
        # 2: final (goal) state
        # The action variable is implicit in the assignment step `current state ← next state`.
        self.layers.append(nn.Linear(in_features = 3 * self.admissible_chord_count, out_features = self.layer_features))
        for k in range(self.layer_count-2):
            self.layers.append(nn.Linear(in_features = self.layer_features, out_features = self.layer_features))
        self.layers.append(nn.Linear(in_features = self.layer_features, out_features = 1))
        
        self.activation = nn.ELU()
        
        
    def forward(self, x):
        features = x
        for i, layer in enumerate(self.layers):
            #print(i)
            activated_features = self.activation(features)
            features = layer(activated_features)
            
        return features
    
    
    def greedy_action(self, chord_0, final_chord):
        assert isinstance(chord_0, list)
        assert isinstance(final_chord, list)
        assert len(chord_0) == len(final_chord) == self.random_chord.chord_size
        for i in range(self.random_chord.chord_size):
            assert isinstance(chord_0[i], int)
            assert self.random_chord.lower_limit <= chord_0[i] <= self.random_chord.upper_limit
            assert isinstance(final_chord[i], int)
            assert self.random_chord.lower_limit <= final_chord[i] <= self.random_chord.upper_limit
            
        onehot_0 = self.chord_to_tensor[tuple(chord_0)]
        final_onehot = self.chord_to_tensor[tuple(final_chord)]

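        # Track the largest predicted action value seen so far and the chord that attains it.
        max_value = -math.inf
        chord_that_maximizes = None
        # Action selection needs no gradients, so skip building the autograd graph.
        with torch.no_grad():
            for index in range(self.admissible_chord_count):
                test_chord = self.index_to_chord[index]
                test_onehot = self.chord_to_tensor[tuple(test_chord)]

                # Network input: (current chord, candidate next chord, goal chord).
                model_input = torch.cat((onehot_0,
                                         test_onehot,
                                         final_onehot))

                model_output = self.forward(model_input)

                if model_output.item() > max_value:
                    max_value = model_output.item()
                    chord_that_maximizes = test_chord

        return chord_that_maximizes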
    
    
    def epsilon_greedy_action(self, epsilon, chord_0, final_chord):
        assert isinstance(epsilon, float)
        assert 0.0 <= epsilon <= 1.0
        
        greedy_or_random = np.random.choice(['greedy', 'random'], p=[1-epsilon, epsilon])
        
        if greedy_or_random == 'greedy':
            output_chord = self.greedy_action(chord_0, final_chord)
        elif greedy_or_random == 'random':
            output_chord = self.random_chord.sample()
        
        return output_chord

Testing:

In [14]:
action_value = ActionValue_01(chord_size = 3,
                              lower_limit = 36,
                              upper_limit = 60,
                              layer_count = 4,
                              layer_features = 500)

x = torch.rand(3 * action_value.admissible_chord_count)
print(action_value(x))

g_a = action_value.greedy_action([36, 38, 40], [37, 39, 44])
print(g_a)

trial_count = 20
eps = 0
for i in tqdm(range(trial_count)):
    e_g_a = action_value.epsilon_greedy_action(.1, [36, 38, 40], [37, 39, 44])
    if e_g_a == g_a:
        eps += 1
print(1-(eps/trial_count))

tensor([0.0072], grad_fn=<AddBackward0>)
[36, 54, 57]


100%|███████████████████████████████████████████| 20/20 [00:22<00:00,  1.13s/it]

0.050000000000000044
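
The $\epsilon$-greedy rule used by `epsilon_greedy_action` can also be sketched without the network. Here a hypothetical dictionary `toy_values` stands in for the model's predictions, and the helper `epsilon_greedy` is an illustrative stand-in, not the class method itself.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick the best-scoring action with probability 1 - epsilon, else a uniform random one."""
    if rng.random() < epsilon:
        return rng.choice(sorted(values))
    return max(values, key=values.get)


# Hypothetical action values for three candidate next chords.
toy_values = {(36, 38, 40): 0.1, (37, 39, 44): 0.7, (36, 54, 57): 0.3}
rng = random.Random(0)

picks = [epsilon_greedy(toy_values, 0.1, rng) for _ in range(1000)]
greedy_share = picks.count((37, 39, 44)) / len(picks)
print(greedy_share)
```

Because the random branch can itself land on the greedy action, the long-run share of greedy picks is about $1-\epsilon+\epsilon/|\mathcal{A}|$ rather than $1-\epsilon$.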





### "First species" voice leading reinforcement learning agent.

#### Class: `Agent_01`

Constructor arguments:
* *action_value* = `ActionValue_01`, 
* *start_chord* = `[0, 1, 2]`, 
* *final_chord* = `[3, 4, 5]`

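**What it does.** Stores an episode as a growing list of chords together with their one-hot tensors, extends the episode by one $\epsilon$-greedy action via `next_interval`, and scores the most recent transition via `last_reward`.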

**Remark: Tips for future `Agent_##`s.** I

In [15]:
class Agent_01():
    def __init__(self,
                 action_value = ActionValue_01(),
                 start_chord = [0, 1, 2],
                 final_chord = [3, 4, 5]):
        
        assert isinstance(action_value, ActionValue_01)
        
        self.action_value = action_value
        
        self.chord_size = self.action_value.random_chord.chord_size
        
        self.lower_limit = self.action_value.random_chord.lower_limit
        self.upper_limit = self.action_value.random_chord.upper_limit
        
        self.chord_to_tensor = self.action_value.chord_to_tensor
        
        assert isinstance(start_chord, list)
        assert isinstance(final_chord, list)
        assert len(start_chord) == len(final_chord) == self.chord_size
        for i in range(self.chord_size):
            assert isinstance(start_chord[i], int)
            assert isinstance(final_chord[i], int)
            assert self.lower_limit <= start_chord[i] <= self.upper_limit
            assert self.lower_limit <= final_chord[i] <= self.upper_limit
        
        self.start_chord = start_chord
        self.start_tensor = self.chord_to_tensor[tuple(self.start_chord)]
        
        self.final_chord = final_chord
        self.final_tensor = self.chord_to_tensor[tuple(self.final_chord)]
        
        self.chord_episode = [self.start_chord]
        self.tensor_episode = [self.start_tensor]
        
        self.small_prog_to_fin_scheme = SmallProgtoFinSchema()
        
        
    def next_interval(self, epsilon = 0.1):
        chord_0 = self.chord_episode[-1]
        
        # Use the agent's own action-value model rather than a global variable.
        next_chord = self.action_value.epsilon_greedy_action(epsilon, chord_0, self.final_chord)
        next_tensor = self.chord_to_tensor[tuple(next_chord)]
        
        self.chord_episode.append(next_chord)
        self.tensor_episode.append(next_tensor)
    

    def last_reward(self):
        assert len(self.chord_episode) > 1
        
        last_chord = self.chord_episode[-2]
        last_action = self.chord_episode[-1]
        
        if last_action == self.final_chord:
            reward = 1000.0
        else:
            reward = self.small_prog_to_fin_scheme.reward(chord_0 = np.array(last_chord),
                                                          chord_1 = np.array(last_action),
                                                          final_chord = np.array(self.final_chord))
            
        return reward

Testing:

In [16]:
action_value = ActionValue_01(chord_size = 3,
                              lower_limit = 36,
                              upper_limit = 60,
                              layer_count = 4,
                              layer_features = 500)

agent = Agent_01(action_value = action_value,
                 start_chord = [36, 38, 40],
                 final_chord = [37, 39, 44])

print(agent.tensor_episode, '\n')

agent.next_interval()
print(agent.tensor_episode)
print(agent.last_reward(), '\n')

agent.next_interval()
print(agent.tensor_episode)
print(agent.last_reward(), '\n')

agent.next_interval()
print(agent.tensor_episode)
print(agent.last_reward(), '\n')

agent.next_interval()
print(agent.tensor_episode)
print(agent.last_reward(), '\n')

[tensor([0., 0., 0.,  ..., 0., 0., 0.])] 

[tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.])]
0.2 

[tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.])]
-100.0 

[tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.])]
-100.0 

[tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor([0., 0., 0.,  ..., 0., 0., 0.])]
-100.0 



___

## Training loop(s).

### Attempt 1.
#### Verdict(s): 

In [28]:
action_value = ActionValue_01(chord_size = 1,
                              lower_limit = 36,
                              upper_limit = 48,
                              layer_count = 5,
                              layer_features = 100)

print(action_value.random_chord.admissible_chords)
print(len(action_value.random_chord.admissible_chords))

agent = Agent_01(action_value = action_value,
                 start_chord = [36],
                 final_chord = [39])

[[36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48]]
13


In [49]:
episode_count = 10000
present_bias = 1.0
learning_rate = 0.0001

epsilon = 0.25

loss_function = nn.MSELoss().double()

# Optimizers specified in the torch.optim package
optimizer = torch.optim.SGD(action_value.parameters(), lr = learning_rate)

goal_count = 0
total_actions = 0

for episode_number in tqdm(range(episode_count)):
    
    # print('\n________________________________________________________________________\n')
    
    print('Epsiode count:', episode_number, '\n')
    
    random_starting_chord = action_value.random_chord.sample()
    starting_tensor = action_value.chord_to_tensor[tuple(random_starting_chord)]
    
    agent.chord_episode = [random_starting_chord]
    agent.tensor_episode = [starting_tensor]
    
    agent.final_chord = action_value.random_chord.sample()
    agent.final_tensor = agent.chord_to_tensor[tuple(agent.final_chord)]
    
    episode_loss = 0.0
    action_count = 0
    
    reward_list = []
    
    while (agent.chord_episode[-1] != agent.final_chord) and (len(agent.chord_episode) <= 1000):
        # print('Starting chord:', random_starting_chord, '   ', 'Final goal chord:', agent.final_chord)
        # print('Running state count:', len(agent.chord_episode))
        
        agent.next_interval(epsilon = epsilon)
        
        reward = torch.Tensor([agent.last_reward()])
        reward_list.append(reward)
        
        action_count += 1

        if len(agent.chord_episode) >= 3:
            
            previous_tensor_state = agent.tensor_episode[-3]
            present_tensor_state = agent.tensor_episode[-2]
            next_tensor_state = agent.tensor_episode[-1]
            
            # print('Current chord window:', [agent.chord_episode[-3],
            #            agent.chord_episode[-2],
            #            agent.chord_episode[-1]])
            
            #print(len(previous_tensor_state))
            #print(len(present_tensor_state))
            #print(len(agent.final_tensor))
            
            previous_expected_return = action_value(torch.cat([previous_tensor_state,
                                                               present_tensor_state,
                                                               agent.final_tensor]))
            
            present_expected_return = action_value(torch.cat([present_tensor_state,
                                                              next_tensor_state,
                                                              agent.final_tensor]))
            present_expected_return = torch.Tensor([present_expected_return.item()])
            
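            # Reward earned by the transition previous → present, which is the
            # action whose value `previous_expected_return` estimates.
            present_reward = reward_list[-2]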
            
            # zero the parameter gradients
            optimizer.zero_grad()
            
            # Compute the loss and its gradients
            loss = loss_function(present_reward + present_bias * present_expected_return,
                                 previous_expected_return).double()
           
            episode_loss += loss.item()

            loss.backward()

            # Adjust learning weights
            optimizer.step()
            
            # print('Loss at this action:', loss.item(), '\n')
            

    if (agent.chord_episode[-1] == agent.final_chord) and len(agent.chord_episode)>1:
        
        goal_count += 1
        
        print('**********************************\n',)
        
        present_tensor_state = agent.tensor_episode[-2]
        next_tensor_state = agent.tensor_episode[-1]

        present_expected_return = action_value(torch.cat([present_tensor_state,
                                                          next_tensor_state,
                                                          agent.final_tensor]))

        present_reward = reward_list[-1]
            
        # zero the parameter gradients
        optimizer.zero_grad()

        # Compute the loss and its gradients
        loss = loss_function(present_reward, present_expected_return).double()

        episode_loss += loss.item()

        loss.backward()

        # Adjust learning weights
        optimizer.step()
            
        # print('Loss at this action:', loss.item(), '\n')
    
    total_actions += action_count
        
    print('Goals per spisode:', goal_count/(1+episode_number))
    print('Average loss:', episode_loss/(1+action_count))    
    print('Actions per episode:', total_actions/(1+episode_number))    
        
     
    # if episode_number%100 == 0:
    #     print('Total loss this episode:', episode_loss)
    #print('\n', reward_list)


  0%|                                         | 1/10000 [00:00<26:59,  6.17it/s]

Epsiode count: 0 

**********************************

Goals per spisode: 1.0
Average loss: 78802.21589396788
Actions per episode: 83.0
Epsiode count: 1 

**********************************

Goals per spisode: 1.0
Average loss: 1951445.5
Actions per episode: 42.5
Epsiode count: 2 



  0%|                                         | 4/10000 [00:00<11:41, 14.25it/s]

**********************************

Goals per spisode: 1.0
Average loss: 151404.3881854672
Actions per episode: 41.333333333333336
Epsiode count: 3 

**********************************

Goals per spisode: 1.0
Average loss: 154474.00364783642
Actions per episode: 40.25
Epsiode count: 4 

**********************************

Goals per spisode: 1.0
Average loss: 346025.3376459414
Actions per episode: 35.2
Epsiode count: 5 

**********************************

Goals per spisode: 1.0
Average loss: 376974.66891334736
Actions per episode: 31.5
Epsiode count: 6 



  0%|                                         | 9/10000 [00:00<10:57, 15.21it/s]

**********************************

Goals per spisode: 1.0
Average loss: 88482.47525822461
Actions per episode: 36.285714285714285
Epsiode count: 7 

**********************************

Goals per spisode: 1.0
Average loss: 97458.22467444938
Actions per episode: 39.0
Epsiode count: 8 

**********************************

Goals per spisode: 1.0
Average loss: 309883.1791539727
Actions per episode: 36.44444444444444
Epsiode count: 9 

**********************************

Goals per spisode: 1.0
Average loss: 2014890.875
Actions per episode: 32.9
Epsiode count: 10 



  0%|                                        | 12/10000 [00:00<11:48, 14.10it/s]

**********************************

Goals per spisode: 1.0
Average loss: 113829.03996726393
Actions per episode: 34.0
Epsiode count: 11 

**********************************

Goals per spisode: 1.0
Average loss: 66888.29246556846
Actions per episode: 38.25
Epsiode count: 12 



  0%|                                        | 14/10000 [00:01<14:27, 11.51it/s]

**********************************

Goals per spisode: 1.0
Average loss: 62151.689068076645
Actions per episode: 42.53846153846154
Epsiode count: 13 

**********************************

Goals per spisode: 1.0
Average loss: 119120.02790651533
Actions per episode: 42.857142857142854
Epsiode count: 14 

**********************************

Goals per spisode: 1.0
Average loss: 2106221.75
Actions per episode: 40.06666666666667
Epsiode count: 15 

Goals per spisode: 0.9375
Average loss: 0.0
Actions per episode: 37.5625
Epsiode count: 16 

**********************************

Goals per spisode: 0.9411764705882353
Average loss: 340475.19162069884
Actions per episode: 36.1764705882353
Epsiode count: 17 

**********************************

Goals per spisode: 0.9444444444444444
Average loss: 543780.4444444445
Actions per episode: 34.611111111111114
Epsiode count: 18 



  0%|                                        | 23/10000 [00:01<10:02, 16.57it/s]

**********************************

Goals per spisode: 0.9473684210526315
Average loss: 48263.73321018249
Actions per episode: 39.1578947368421
Epsiode count: 19 

**********************************

Goals per spisode: 0.95
Average loss: 176490.74495197576
Actions per episode: 38.6
Epsiode count: 20 

Goals per spisode: 0.9047619047619048
Average loss: 0.0
Actions per episode: 36.76190476190476
Epsiode count: 21 

**********************************

Goals per spisode: 0.9090909090909091
Average loss: 251811.68700085432
Actions per episode: 35.95454545454545
Epsiode count: 22 

**********************************

Goals per spisode: 0.9130434782608695
Average loss: 88791.0120340222
Actions per episode: 36.91304347826087
Epsiode count: 23 



  0%|                                        | 30/10000 [00:01<08:10, 20.33it/s]

**********************************

Goals per spisode: 0.9166666666666666
Average loss: 90090.54704696313
Actions per episode: 37.75
Epsiode count: 24 

**********************************

Goals per spisode: 0.92
Average loss: 185183.21194305643
Actions per episode: 37.28
Epsiode count: 25 

**********************************

Goals per spisode: 0.9230769230769231
Average loss: 1833638.625
Actions per episode: 35.88461538461539
Epsiode count: 26 

**********************************

Goals per spisode: 0.9259259259259259
Average loss: 1481745.4166666667
Actions per episode: 34.629629629629626
Epsiode count: 27 

**********************************

Goals per spisode: 0.9285714285714286
Average loss: 213932.25983792543
Actions per episode: 34.107142857142854
Epsiode count: 28 

Goals per spisode: 0.896551724137931
Average loss: 0.0
Actions per episode: 32.93103448275862
Epsiode count: 29 

**********************************

Goals per spisode: 0.9
Average loss: 82090.70433616417
Actions p

  0%|▏                                       | 37/10000 [00:02<10:18, 16.10it/s]

**********************************

Goals per spisode: 0.8823529411764706
Average loss: 34473.8549655864
Actions per episode: 36.205882352941174
Epsiode count: 34 

Goals per spisode: 0.8571428571428571
Average loss: 0.0
Actions per episode: 35.17142857142857
Epsiode count: 35 

Goals per spisode: 0.8333333333333334
Average loss: 0.0
Actions per episode: 34.19444444444444
Epsiode count: 36 

**********************************

Goals per spisode: 0.8378378378378378
Average loss: 58862.7368940957
Actions per episode: 35.83783783783784
Epsiode count: 37 

**********************************

Goals per spisode: 0.8421052631578947
Average loss: 439607.02477375924
Actions per episode: 35.1578947368421
Epsiode count: 38 



  0%|▏                                       | 39/10000 [00:02<13:10, 12.59it/s]

**********************************

Goals per spisode: 0.8461538461538461
Average loss: 39782.190865264725
Actions per episode: 38.30769230769231
Epsiode count: 39 

Goals per spisode: 0.825
Average loss: 0.0
Actions per episode: 37.35
Epsiode count: 40 

**********************************

Goals per spisode: 0.8292682926829268
Average loss: 221509.87210856937
Actions per episode: 37.0
Epsiode count: 41 

**********************************

Goals per spisode: 0.8333333333333334
Average loss: 247581.59913333258
Actions per episode: 36.595238095238095
Epsiode count: 42 

**********************************

Goals per spisode: 0.8372093023255814
Average loss: 1948434.375
Actions per episode: 35.76744186046512
Epsiode count: 43 



  0%|▏                                       | 46/10000 [00:03<11:32, 14.37it/s]

**********************************

Goals per spisode: 0.8409090909090909
Average loss: 54263.11728654272
Actions per episode: 37.40909090909091
Epsiode count: 44 

**********************************

Goals per spisode: 0.8444444444444444
Average loss: 1000187.35
Actions per episode: 36.666666666666664
Epsiode count: 45 

**********************************

Goals per spisode: 0.8478260869565217
Average loss: 76247.14305361167
Actions per episode: 37.43478260869565
Epsiode count: 46 



  1%|▏                                       | 51/10000 [00:03<11:04, 14.98it/s]

**********************************

Goals per spisode: 0.851063829787234
Average loss: 49913.8132611812
Actions per episode: 39.276595744680854
Epsiode count: 47 

Goals per spisode: 0.8333333333333334
Average loss: 0.0
Actions per episode: 38.458333333333336
Epsiode count: 48 

**********************************

Goals per spisode: 0.8367346938775511
Average loss: 314347.5127431277
Actions per episode: 38.0
Epsiode count: 49 

**********************************

Goals per spisode: 0.84
Average loss: 184962.12068965516
Actions per episode: 37.8
Epsiode count: 50 

**********************************

Goals per spisode: 0.8431372549019608
Average loss: 156896.96600024018
Actions per episode: 37.68627450980392
Epsiode count: 51 

**********************************

Goals per spisode: 0.8461538461538461
Average loss: 192037.36906491048
Actions per episode: 37.44230769230769
Epsiode count: 52 



  1%|▏                                       | 53/10000 [00:03<11:54, 13.92it/s]

**********************************

Goals per spisode: 0.8490566037735849
Average loss: 73644.33314537848
Actions per episode: 38.132075471698116
Epsiode count: 53 

Goals per spisode: 0.8333333333333334
Average loss: 0.0
Actions per episode: 37.425925925925924
Epsiode count: 54 



  1%|▏                                       | 58/10000 [00:04<13:32, 12.24it/s]

**********************************

Goals per spisode: 0.8363636363636363
Average loss: 35483.995031827195
Actions per episode: 40.4
Epsiode count: 55 

**********************************

Goals per spisode: 0.8392857142857143
Average loss: 332963.2120057003
Actions per episode: 39.964285714285715
Epsiode count: 56 

**********************************

Goals per spisode: 0.8421052631578947
Average loss: 218543.79752257752
Actions per episode: 39.70175438596491
Epsiode count: 57 

**********************************

Goals per spisode: 0.8448275862068966
Average loss: 109561.5084726878
Actions per episode: 39.91379310344828
Epsiode count: 58 

Goals per spisode: 0.8305084745762712
Average loss: 0.0
Actions per episode: 39.23728813559322
Epsiode count: 59 

**********************************

Goals per spisode: 0.8333333333333334
Average loss: 338488.2803065926
Actions per episode: 38.833333333333336
Epsiode count: 60 



  1%|▏                                       | 62/10000 [00:04<10:30, 15.76it/s]

**********************************

Goals per spisode: 0.8360655737704918
Average loss: 136476.36474927998
Actions per episode: 38.83606557377049
Epsiode count: 61 

**********************************

Goals per spisode: 0.8387096774193549
Average loss: 174069.71600202768
Actions per episode: 38.67741935483871
Epsiode count: 62 

**********************************

Goals per spisode: 0.8412698412698413
Average loss: 695193.3462325803
Actions per episode: 38.15873015873016
Epsiode count: 63 

**********************************

Goals per spisode: 0.84375
Average loss: 932010.8
Actions per episode: 37.625
Epsiode count: 64 

**********************************

Goals per spisode: 0.8461538461538461
Average loss: 126410.86934855444
Actions per episode: 37.61538461538461
Epsiode count: 65 



  1%|▎                                       | 66/10000 [00:04<09:15, 17.89it/s]

**********************************

Goals per spisode: 0.8484848484848485
Average loss: 93343.88226586889
Actions per episode: 37.81818181818182
Epsiode count: 66 

**********************************

Goals per spisode: 0.8507462686567164
Average loss: 111411.17362241256
Actions per episode: 37.88059701492537
Epsiode count: 67 



  1%|▎                                       | 73/10000 [00:04<08:29, 19.48it/s]

**********************************

Goals per spisode: 0.8529411764705882
Average loss: 49678.838593104556
Actions per episode: 38.85294117647059
Epsiode count: 68 

**********************************

Goals per spisode: 0.855072463768116
Average loss: 1779058.0
Actions per episode: 38.30434782608695
Epsiode count: 69 

**********************************

Goals per spisode: 0.8571428571428571
Average loss: 1707981.5
Actions per episode: 37.77142857142857
Epsiode count: 70 

**********************************

Goals per spisode: 0.8591549295774648
Average loss: 161299.02325191468
Actions per episode: 37.61971830985915
Epsiode count: 71 

**********************************

Goals per spisode: 0.8611111111111112
Average loss: 255073.16748565633
Actions per episode: 37.31944444444444
Epsiode count: 72 

**********************************

Goals per spisode: 0.863013698630137
Average loss: 225724.19736842104
Actions per episode: 37.054794520547944
Epsiode count: 73 

***********************

  1%|▎                                       | 78/10000 [00:05<12:06, 13.66it/s]

**********************************

Goals per spisode: 0.868421052631579
Average loss: 77993.25685234918
Actions per episode: 38.64473684210526
Epsiode count: 76 

**********************************

Goals per spisode: 0.8701298701298701
Average loss: 77691.63152740622
Actions per episode: 38.96103896103896
Epsiode count: 77 

**********************************

Goals per spisode: 0.8717948717948718
Average loss: 318922.7189140501
Actions per episode: 38.62820512820513
Epsiode count: 78 



  1%|▎                                       | 82/10000 [00:05<11:02, 14.97it/s]

**********************************

Goals per spisode: 0.8734177215189873
Average loss: 80739.22700859758
Actions per episode: 38.87341772151899
Epsiode count: 79 

**********************************

Goals per spisode: 0.875
Average loss: 1057813.4375
Actions per episode: 38.425
Epsiode count: 80 

**********************************

Goals per spisode: 0.8765432098765432
Average loss: 317208.4308361319
Actions per episode: 38.098765432098766
Epsiode count: 81 

**********************************

Goals per spisode: 0.8780487804878049
Average loss: 77413.25788026127
Actions per episode: 38.31707317073171
Epsiode count: 82 



  1%|▎                                       | 84/10000 [00:05<10:27, 15.81it/s]

**********************************

Goals per spisode: 0.8795180722891566
Average loss: 74504.28401374258
Actions per episode: 38.54216867469879
Epsiode count: 83 

**********************************

Goals per spisode: 0.8809523809523809
Average loss: 434781.91794139147
Actions per episode: 38.17857142857143
Epsiode count: 84 

**********************************

Goals per spisode: 0.8823529411764706
Average loss: 46067.20339956621
Actions per episode: 38.870588235294115
Epsiode count: 85 



  1%|▎                                       | 88/10000 [00:06<11:37, 14.21it/s]

**********************************

Goals per spisode: 0.8837209302325582
Average loss: 184663.74567930028
Actions per episode: 38.66279069767442
Epsiode count: 86 

**********************************

Goals per spisode: 0.8850574712643678
Average loss: 109413.07057086247
Actions per episode: 38.632183908045974
Epsiode count: 87 

**********************************

Goals per spisode: 0.8863636363636364
Average loss: 104879.695590684
Actions per episode: 38.61363636363637
Epsiode count: 88 

**********************************

Goals per spisode: 0.8876404494382022
Average loss: 133051.56212113952
Actions per episode: 38.49438202247191
Epsiode count: 89 



  1%|▎                                       | 90/10000 [00:06<14:31, 11.38it/s]

**********************************

Goals per spisode: 0.8888888888888888
Average loss: 44079.937422157585
Actions per episode: 39.31111111111111
Epsiode count: 90 

**********************************

Goals per spisode: 0.8901098901098901
Average loss: 80002.49179218858
Actions per episode: 39.472527472527474
Epsiode count: 91 



  1%|▎                                       | 92/10000 [00:06<16:51,  9.80it/s]

**********************************

Goals per spisode: 0.8913043478260869
Average loss: 48206.076039199834
Actions per episode: 40.15217391304348
Epsiode count: 92 

**********************************

Goals per spisode: 0.8924731182795699
Average loss: 1382756.0
Actions per episode: 39.74193548387097
Epsiode count: 93 

**********************************

Goals per spisode: 0.8936170212765957
Average loss: 231783.07253161736
Actions per episode: 39.5
Epsiode count: 94 



  1%|▍                                       | 96/10000 [00:07<20:21,  8.11it/s]

**********************************

Goals per spisode: 0.8947368421052632
Average loss: 24532.79445878531
Actions per episode: 41.77894736842105
Epsiode count: 95 

**********************************

Goals per spisode: 0.8958333333333334
Average loss: 81964.59848471037
Actions per episode: 42.010416666666664
Epsiode count: 96 

**********************************

Goals per spisode: 0.8969072164948454
Average loss: 370604.47968884156
Actions per episode: 41.70103092783505
Epsiode count: 97 



  1%|▍                                       | 98/10000 [00:07<18:08,  9.10it/s]

**********************************

Goals per spisode: 0.8979591836734694
Average loss: 71487.0274229678
Actions per episode: 42.03061224489796
Epsiode count: 98 

**********************************

Goals per spisode: 0.898989898989899
Average loss: 51639.84106730687
Actions per episode: 42.656565656565654
Epsiode count: 99 



  1%|▍                                      | 103/10000 [00:07<14:11, 11.63it/s]

**********************************

Goals per spisode: 0.9
Average loss: 152130.2693160209
Actions per episode: 42.55
Epsiode count: 100 

**********************************

Goals per spisode: 0.900990099009901
Average loss: 146819.26924211314
Actions per episode: 42.45544554455446
Epsiode count: 101 

**********************************

Goals per spisode: 0.9019607843137255
Average loss: 359166.94808490376
Actions per episode: 42.15686274509804
Epsiode count: 102 

**********************************

Goals per spisode: 0.9029126213592233
Average loss: 107677.34684606007
Actions per episode: 42.16504854368932
Epsiode count: 103 



  1%|▍                                      | 105/10000 [00:08<17:18,  9.53it/s]

**********************************

Goals per spisode: 0.9038461538461539
Average loss: 53366.64808173368
Actions per episode: 42.75961538461539
Epsiode count: 104 

**********************************

Goals per spisode: 0.9047619047619048
Average loss: 76195.97394399335
Actions per episode: 43.00952380952381
Epsiode count: 105 

**********************************

Goals per spisode: 0.9056603773584906
Average loss: 303099.40972561203
Actions per episode: 42.74528301886792
Epsiode count: 106 

**********************************

Goals per spisode: 0.9065420560747663
Average loss: 289370.315255885
Actions per episode: 42.48598130841121
Epsiode count: 107 



  1%|▍                                      | 110/10000 [00:08<18:03,  9.13it/s]

**********************************

Goals per spisode: 0.9074074074074074
Average loss: 30858.270512538966
Actions per episode: 44.092592592592595
Epsiode count: 108 

**********************************

Goals per spisode: 0.908256880733945
Average loss: 145022.00016456575
Actions per episode: 44.03669724770642
Epsiode count: 109 

**********************************

Goals per spisode: 0.9090909090909091
Average loss: 116316.83139440946
Actions per episode: 44.06363636363636
Epsiode count: 110 



  1%|▍                                      | 112/10000 [00:08<16:46,  9.82it/s]

**********************************

Goals per spisode: 0.9099099099099099
Average loss: 99898.38841009566
Actions per episode: 44.16216216216216
Epsiode count: 111 

**********************************

Goals per spisode: 0.9107142857142857
Average loss: 149241.9806905699
Actions per episode: 44.089285714285715
Epsiode count: 112 

Goals per spisode: 0.9026548672566371
Average loss: 0.0
Actions per episode: 43.69911504424779
Epsiode count: 113 

**********************************

Goals per spisode: 0.9035087719298246
Average loss: 95844.6871003974
Actions per episode: 43.80701754385965
Epsiode count: 114 



  1%|▍                                      | 118/10000 [00:09<12:18, 13.38it/s]

**********************************

Goals per spisode: 0.9043478260869565
Average loss: 76277.06673556684
Actions per episode: 44.07826086956522
Epsiode count: 115 

**********************************

Goals per spisode: 0.9051724137931034
Average loss: 570659.4761132813
Actions per episode: 43.76724137931034
Epsiode count: 116 

**********************************

Goals per spisode: 0.905982905982906
Average loss: 1215808.1875
Actions per episode: 43.41880341880342
Epsiode count: 117 

**********************************

Goals per spisode: 0.9067796610169492
Average loss: 78919.5306288511
Actions per episode: 43.610169491525426
Epsiode count: 118 

**********************************

Goals per spisode: 0.907563025210084
Average loss: 89023.14188339046
Actions per episode: 43.739495798319325
Epsiode count: 119 



  1%|▍                                      | 123/10000 [00:09<15:27, 10.65it/s]

**********************************

Goals per spisode: 0.9083333333333333
Average loss: 38522.13429074776
Actions per episode: 44.86666666666667
Epsiode count: 120 

**********************************

Goals per spisode: 0.9090909090909091
Average loss: 187104.09013883962
Actions per episode: 44.743801652892564
Epsiode count: 121 

**********************************

Goals per spisode: 0.9098360655737705
Average loss: 237812.94431344324
Actions per episode: 44.557377049180324
Epsiode count: 122 

**********************************

Goals per spisode: 0.9105691056910569
Average loss: 109373.59392094846
Actions per episode: 44.60162601626016
Epsiode count: 123 

**********************************

Goals per spisode: 0.9112903225806451
Average loss: 37860.31693897085
Actions per episode: 45.70161290322581
Epsiode count: 124 



  1%|▍                                      | 127/10000 [00:10<18:50,  8.73it/s]

**********************************

Goals per spisode: 0.912
Average loss: 53336.8346836929
Actions per episode: 46.352
Epsiode count: 125 

**********************************

Goals per spisode: 0.9126984126984127
Average loss: 144597.63638208297
Actions per episode: 46.32539682539682
Epsiode count: 126 

**********************************

Goals per spisode: 0.9133858267716536
Average loss: 301296.9199258007
Actions per episode: 46.110236220472444
Epsiode count: 127 

**********************************

Goals per spisode: 0.9140625
Average loss: 266140.87745729496
Actions per episode: 45.9140625
Epsiode count: 128 



  1%|▌                                      | 134/10000 [00:10<10:10, 16.15it/s]

**********************************

Goals per spisode: 0.9147286821705426
Average loss: 133693.7132496824
Actions per episode: 45.89922480620155
Epsiode count: 129 

**********************************

Goals per spisode: 0.9153846153846154
Average loss: 127962.20286347393
Actions per episode: 45.89230769230769
Epsiode count: 130 

**********************************

Goals per spisode: 0.916030534351145
Average loss: 1791844.1666666667
Actions per episode: 45.55725190839695
Epsiode count: 131 

Goals per spisode: 0.9090909090909091
Average loss: 0.0
Actions per episode: 45.21212121212121
Epsiode count: 132 

**********************************

Goals per spisode: 0.9097744360902256
Average loss: 520908.29525499494
Actions per episode: 44.93984962406015
Epsiode count: 133 

**********************************

Goals per spisode: 0.9104477611940298
Average loss: 555793.6111111111
Actions per episode: 44.66417910447761
Epsiode count: 134 

**********************************

Goals per spisod

  1%|▌                                      | 137/10000 [00:11<14:11, 11.58it/s]

**********************************

Goals per spisode: 0.9124087591240876
Average loss: 73168.20195545656
Actions per episode: 45.43065693430657
Epsiode count: 137 

**********************************

Goals per spisode: 0.9130434782608695
Average loss: 401919.9836007632
Actions per episode: 45.18840579710145
Epsiode count: 138 

Goals per spisode: 0.9064748201438849
Average loss: 0.0
Actions per episode: 44.86330935251799
Epsiode count: 139 



  1%|▌                                      | 143/10000 [00:11<13:06, 12.54it/s]

**********************************

Goals per spisode: 0.9071428571428571
Average loss: 47241.88490518392
Actions per episode: 45.56428571428572
Epsiode count: 140 

**********************************

Goals per spisode: 0.9078014184397163
Average loss: 191817.6382576724
Actions per episode: 45.4468085106383
Epsiode count: 141 

**********************************

Goals per spisode: 0.9084507042253521
Average loss: 324156.39637454203
Actions per episode: 45.23943661971831
Epsiode count: 142 

**********************************

Goals per spisode: 0.9090909090909091
Average loss: 115184.87685816135
Actions per episode: 45.27272727272727
Epsiode count: 143 



  1%|▌                                      | 145/10000 [00:11<12:41, 12.94it/s]

**********************************

Goals per spisode: 0.9097222222222222
Average loss: 155706.79460687024
Actions per episode: 45.201388888888886
Epsiode count: 144 

**********************************

Goals per spisode: 0.9103448275862069
Average loss: 123067.76068329712
Actions per episode: 45.19310344827586
Epsiode count: 145 

**********************************

Goals per spisode: 0.910958904109589
Average loss: 100634.21744619441
Actions per episode: 45.26712328767123
Epsiode count: 146 



  2%|▌                                      | 151/10000 [00:11<09:55, 16.55it/s]

**********************************

Goals per spisode: 0.9115646258503401
Average loss: 114569.24576531768
Actions per episode: 45.29251700680272
Epsiode count: 147 

**********************************

Goals per spisode: 0.9121621621621622
Average loss: 2106434.25
Actions per episode: 44.99324324324324
Epsiode count: 148 

**********************************

Goals per spisode: 0.912751677852349
Average loss: 173240.26393872115
Actions per episode: 44.88590604026846
Epsiode count: 149 

**********************************

Goals per spisode: 0.9133333333333333
Average loss: 196554.94354515924
Actions per episode: 44.74666666666667
Epsiode count: 150 

**********************************

Goals per spisode: 0.9139072847682119
Average loss: 264185.8614628551
Actions per episode: 44.562913907284766
Epsiode count: 151 



  2%|▌                                      | 154/10000 [00:12<10:33, 15.53it/s]

**********************************

Goals per spisode: 0.9144736842105263
Average loss: 55953.65368675143
Actions per episode: 44.91447368421053
Epsiode count: 152 

**********************************

Goals per spisode: 0.9150326797385621
Average loss: 228593.22365318966
Actions per episode: 44.751633986928105
Epsiode count: 153 

**********************************

Goals per spisode: 0.9155844155844156
Average loss: 1504937.3333333333
Actions per episode: 44.47402597402598
Epsiode count: 154 



  2%|▌                                      | 159/10000 [00:12<10:37, 15.44it/s]

**********************************

Goals per spisode: 0.9161290322580645
Average loss: 52619.774225110225
Actions per episode: 44.83870967741935
Epsiode count: 155 

**********************************

Goals per spisode: 0.9166666666666666
Average loss: 163335.70833333334
Actions per episode: 44.73717948717949
Epsiode count: 156 

**********************************

Goals per spisode: 0.9171974522292994
Average loss: 1771105.375
Actions per episode: 44.45859872611465
Epsiode count: 157 

**********************************

Goals per spisode: 0.9177215189873418
Average loss: 229325.775
Actions per episode: 44.29746835443038
Epsiode count: 158 

**********************************

Goals per spisode: 0.9182389937106918
Average loss: 112753.22449753416
Actions per episode: 44.270440251572325
Epsiode count: 159 



  2%|▋                                      | 163/10000 [00:12<08:28, 19.34it/s]

**********************************

Goals per spisode: 0.91875
Average loss: 242967.22916666666
Actions per episode: 44.1
Epsiode count: 160 

**********************************

Goals per spisode: 0.9192546583850931
Average loss: 1566456.875
Actions per episode: 43.83229813664596
Epsiode count: 161 

**********************************

Goals per spisode: 0.9197530864197531
Average loss: 150863.24870698995
Actions per episode: 43.72222222222222
Epsiode count: 162 

**********************************

Goals per spisode: 0.9202453987730062
Average loss: 146054.9719445056
Actions per episode: 43.61963190184049
Epsiode count: 163 

**********************************

Goals per spisode: 0.9207317073170732
Average loss: 123504.5170171028
Actions per episode: 43.542682926829265
Epsiode count: 164 



  2%|▋                                      | 169/10000 [00:13<09:00, 18.19it/s]

**********************************

Goals per spisode: 0.9212121212121213
Average loss: 78446.40541545113
Actions per episode: 43.593939393939394
Epsiode count: 165 

Goals per spisode: 0.9156626506024096
Average loss: 0.0
Actions per episode: 43.33132530120482
Epsiode count: 166 

**********************************

Goals per spisode: 0.9161676646706587
Average loss: 254439.10222031275
Actions per episode: 43.15568862275449
Epsiode count: 167 

**********************************

Goals per spisode: 0.9166666666666666
Average loss: 95075.46348836519
Actions per episode: 43.13690476190476
Epsiode count: 168 

**********************************

Goals per spisode: 0.9171597633136095
Average loss: 80187.95787010067
Actions per episode: 43.171597633136095
Epsiode count: 169 

**********************************

Goals per spisode: 0.9176470588235294
Average loss: 163744.58984451983
Actions per episode: 43.04705882352941
Epsiode count: 170 

**********************************

Goals per spis

  2%|▋                                      | 172/10000 [00:13<15:46, 10.38it/s]

**********************************

Goals per spisode: 0.9186046511627907
Average loss: 25599.910379100915
Actions per episode: 44.29651162790697
Epsiode count: 172 

Goals per spisode: 0.9132947976878613
Average loss: 0.0
Actions per episode: 44.040462427745666
Epsiode count: 173 



  2%|▋                                      | 176/10000 [00:14<15:59, 10.24it/s]

**********************************

Goals per spisode: 0.9137931034482759
Average loss: 33824.05434337048
Actions per episode: 44.82183908045977
Epsiode count: 174 

**********************************

Goals per spisode: 0.9142857142857143
Average loss: 1982802.75
Actions per episode: 44.57142857142857
Epsiode count: 175 

**********************************

Goals per spisode: 0.9147727272727273
Average loss: 82189.97301936065
Actions per episode: 44.6875
Epsiode count: 176 

**********************************

Goals per spisode: 0.9152542372881356
Average loss: 327902.74430134345
Actions per episode: 44.51412429378531
Epsiode count: 177 



  2%|▋                                      | 180/10000 [00:14<14:26, 11.34it/s]

**********************************

Goals per spisode: 0.9157303370786517
Average loss: 114017.7116472327
Actions per episode: 44.51123595505618
Epsiode count: 178 

**********************************

Goals per spisode: 0.9162011173184358
Average loss: 391508.43792205834
Actions per episode: 44.324022346368714
Epsiode count: 179 

**********************************

Goals per spisode: 0.9166666666666666
Average loss: 59411.54419159456
Actions per episode: 44.583333333333336
Epsiode count: 180 



  2%|▋                                      | 182/10000 [00:14<12:59, 12.59it/s]

**********************************

Goals per spisode: 0.9171270718232044
Average loss: 395748.51620596397
Actions per episode: 44.39779005524862
Epsiode count: 181 

**********************************

Goals per spisode: 0.9175824175824175
Average loss: 99978.24784483777
Actions per episode: 44.41208791208791
Epsiode count: 182 






KeyboardInterrupt: 