# Policy Optimisation

We will use the ```xcs``` Python library, which implements the XCS (eXtended Classifier System) algorithm.

In [1]:
%pip install xcs

Collecting xcs
  Downloading xcs-1.0.0-py3-none-any.whl.metadata (221 bytes)
Downloading xcs-1.0.0-py3-none-any.whl (53 kB)
Installing collected packages: xcs
Successfully installed xcs-1.0.0
Note: you may need to restart the kernel to use updated packages.


In [1]:
from xcs import XCSAlgorithm
from xcs.scenarios import MUXProblem, ScenarioObserver

The problem that we want to solve is the $11$-bit _Multiplexer Problem_. Three of the bits form an address that selects one of the remaining eight data bits, and the correct answer is the value of the selected bit. The following is adapted from the ```xcs``` library tutorial.
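To make the task concrete, here is a minimal sketch of the target function the learner has to approximate. The `multiplexer` helper is hypothetical and written only for illustration; it is not part of the ```xcs``` library, and it assumes the address bits come first and index the data bits from left to right.

```python
def multiplexer(bits: str, address_size: int = 3) -> bool:
    """Return the value of the data bit selected by the leading address bits.

    Hypothetical helper for illustration only (not part of the xcs library).
    Assumes the address bits come first and index the data bits left to right.
    """
    expected_length = address_size + 2 ** address_size
    if len(bits) != expected_length:
        raise ValueError(f"expected {expected_length} bits, got {len(bits)}")
    address = int(bits[:address_size], 2)   # e.g. "110" -> 6
    data = bits[address_size:]              # the remaining 2**address_size bits
    return data[address] == "1"


# Address 000 selects the first data bit, which is 1 here.
print(multiplexer("00010000000"))  # True
# Address 110 selects the seventh data bit, which is 1 here.
print(multiplexer("11000000010"))  # True
```

With three address bits there are $2^3 = 8$ data bits, which gives the $3 + 8 = 11$ bits of the problem.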

In [None]:
# 11-bit multiplexer problem: 3 address bits select one of 8 data bits.
# The argument 50000 is the number of training cycles.
scenario = ScenarioObserver(MUXProblem(50000))

The ```XCSAlgorithm``` object has some reasonable defaults...

In [3]:
algorithm = XCSAlgorithm()

That can be easily explored and modified:

In [4]:
print(algorithm.learning_rate)

0.15


We create a new model for the scenario and run it on the scenario to learn a set of rules:

In [5]:
model = algorithm.new_model(scenario)
model.run(scenario, learn=True)

How many rules does the learned model have?

In [6]:
# Number of rules in the learned model
print(len(model))

125


What are the rules with the best fitness?

In [None]:
# Print every rule whose fitness exceeds 0.5, together with its fitness.
# In each condition the first three bits are the address and the remaining eight are the data bits;
# "#" is a wildcard that matches either value.
# The algorithm has found a "policy", or better, a set of rules that define the behavior of the system.

for rule in model:
    if rule.fitness > .5:
        print(f"{rule.condition} => {rule.action}\t {rule.fitness}")

0001####### => True	 0.8157891624151895
010##1##### => True	 0.6819925224215757
0000####### => True	 0.8352575487357687
010##0##### => True	 0.533365324489684
100####1### => True	 0.6799140909959224
110######0# => True	 0.924125694219482
101####11## => True	 0.6046614038820011
001#1###### => True	 0.5054741105193185
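The conditions printed above use `#` as a wildcard. As a rough illustration of how such a condition is compared with an input, the sketch below defines a hypothetical `matches` helper. It is written from scratch for this example and is not the ```xcs``` library's own matching code.

```python
def matches(condition: str, situation: str) -> bool:
    """Check whether a ternary condition covers a bit string.

    Hypothetical helper for illustration only (not the xcs library's matching
    code). A "#" in the condition matches either bit; any other character
    must equal the corresponding bit of the situation.
    """
    if len(condition) != len(situation):
        raise ValueError("condition and situation must have the same length")
    return all(c == "#" or c == s for c, s in zip(condition, situation))


# The rule "0001#######" covers every input that starts with 0001.
print(matches("0001#######", "00010000000"))  # True
print(matches("0001#######", "00000000000"))  # False

# The number of wildcards indicates how general a rule is.
print("0001#######".count("#"))  # 7
```

A rule with more wildcards covers more inputs, so a compact rule set favours conditions that fix only the address bits and the single data bit they select.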


We can also print the entire model:

In [None]:
# Print detailed statistics for every rule in the model

print(model)

#000#100000 => True
    Time Stamp: 47267
    Average Reward: 1e-05
    Error: 1e-05
    Fitness: 1e-05
    Experience: 0
    Action Set Size: 1
    Numerosity: 1
1#101##11## => True
    Time Stamp: 49934
    Average Reward: 0.875
    Error: 0.13541666666666669
    Fitness: 0.012127251467608253
    Experience: 0
    Action Set Size: 1
    Numerosity: 1
#1#1####### => True
    Time Stamp: 49980
    Average Reward: 0.5409023702712538
    Error: 0.44178429973899974
    Fitness: 0.013072332759138655
    Experience: 20
    Action Set Size: 13.889454579696539
    Numerosity: 1
10110##1### => True
    Time Stamp: 49911
    Average Reward: 0.22111904633586232
    Error: 0.21316479786352743
    Fitness: 0.013392499213254656
    Experience: 0
    Action Set Size: 1
    Numerosity: 1
11#1#####1# => True
    Time Stamp: 49964
    Average Reward: 0.7888013153089439
    Error: 0.26852694147825207
    Fitness: 0.013424274955039827
    Experience: 47
    Action Set Size: 11.560363270925128
    Numeros