main_py3_memory_profiler_log.txt
Loaded experiments configuration from 'configuration.py' :
configuration['policies'] = [{'archtype': <class 'SMPyBandits.Policies.UCBalpha.UCBalpha'>, 'params': {'alpha': 1}}, {'archtype': <class 'SMPyBandits.Policies.MOSS.MOSS'>, 'params': {}}, {'archtype': <class 'SMPyBandits.Policies.MOSSH.MOSSH'>, 'params': {'horizon': 1000}}, {'archtype': <class 'SMPyBandits.Policies.MOSSAnytime.MOSSAnytime'>, 'params': {'alpha': 1.35}}, {'archtype': <class 'SMPyBandits.Policies.DMED.DMEDPlus'>, 'params': {}}, {'archtype': <class 'SMPyBandits.Policies.Thompson.Thompson'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}}, {'archtype': <class 'SMPyBandits.Policies.klUCB.klUCB'>, 'params': {'klucb': <built-in function klucbBern>}}, {'archtype': <class 'SMPyBandits.Policies.klUCBPlusPlus.klUCBPlusPlus'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>}}, {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'best'}}, {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}}, {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'best'}}, {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}}, {'archtype': <class 'SMPyBandits.Policies.BayesUCB.BayesUCB'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}}, {'archtype': <class 'SMPyBandits.Policies.AdBandits.AdBandits'>, 'params': {'alpha': 0.5, 'horizon': 1000}}, {'archtype': <class 'SMPyBandits.Policies.ApproximatedFHGittins.ApproximatedFHGittins'>, 'params': {'alpha': 0.5, 'horizon': 1100}}, {'archtype': <class 'SMPyBandits.Policies.UCBoost.UCB_lb'>, 'params': {}}]
====> TURNING NOPLOTS MODE ON <=====
====> TURNING DEBUG MODE ON <=====
plots/ is already a directory here...
Number of policies in this comparison: 16
Time horizon: 1000
Number of repetitions: 16
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
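For reference, a minimal configuration.py reproducing the settings echoed above could look like the sketch below. The key names ('horizon', 'repetitions', 'n_jobs', 'environment', 'policies') follow the dictionaries dumped in this log; the exact file used for this run is not shown here, so treat this as an illustration, not the original configuration.

    # Hypothetical minimal configuration.py matching the parameters echoed above:
    # horizon 1000, 16 repetitions, 1 job, 9 Bernoulli arms, and a few of the 16 policies.
    from SMPyBandits.Arms import Bernoulli
    from SMPyBandits.Policies import UCBalpha, MOSS, Thompson

    configuration = {
        "horizon": 1000,       # Time horizon T
        "repetitions": 16,     # Number of repetitions
        "n_jobs": 1,           # Number of jobs for parallelization
        "environment": [       # One MAB problem with 9 Bernoulli arms
            {"arm_type": Bernoulli,
             "params": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},
        ],
        "policies": [          # Same 'archtype'/'params' structure as dumped above
            {"archtype": UCBalpha, "params": {"alpha": 1}},
            {"archtype": MOSS, "params": {}},
            {"archtype": Thompson, "params": {}},  # defaults to a Beta posterior
        ],
    }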
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>, 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]} ...
- with 'arm_type' = <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>
- with 'params' = [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]
- with 'arms' = [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]
- with 'means' = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
- with 'nbArms' = 9
- with 'maxArm' = 0.9
- with 'minArm' = 0.1
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 7.52 ...
- an Optimal Arm Identification factor H_OI(mu) = 48.89% ...
- with 'arms' represented as: $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)^*]$
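For context, the complexity constant printed above is the Lai & Robbins lower-bound constant C(mu) = sum over suboptimal arms of (mu* - mu_k) / kl(mu_k, mu*), with kl the Bernoulli Kullback-Leibler divergence. The short sketch below recomputes it for these nine means, together with one plausible definition of the H_OI factor that reproduces the 48.89% above; the helper names are illustrative, not SMPyBandits' internal functions.

    # Recompute the two quantities printed above for means 0.1 .. 0.9 (best arm mu* = 0.9).
    from math import log

    means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    mu_star = max(means)

    def kl_bern(p, q, eps=1e-15):
        """Bernoulli Kullback-Leibler divergence kl(p, q), clipped away from 0 and 1."""
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

    # Lai & Robbins constant: sum of gap / kl over the suboptimal arms.
    C_mu = sum((mu_star - mu) / kl_bern(mu, mu_star) for mu in means if mu < mu_star)

    # Assumed definition of H_OI: sum of 1 - (mu* - mu) over suboptimal arms,
    # normalized by the number of arms; it gives 48.89% for this problem.
    H_OI = sum(1 - (mu_star - mu) for mu in means if mu < mu_star) / len(means)

    print("C(mu) = {:.2f}".format(C_mu))   # -> 7.52
    print("H_OI  = {:.2%}".format(H_OI))   # -> 48.89%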
Number of environments to try: 1
Evaluating environment: MAB(nbArms: 9, arms: [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)], minArm: 0.1, maxArm: 0.9)
- Adding policy #1 = {'archtype': <class 'SMPyBandits.Policies.UCBalpha.UCBalpha'>, 'params': {'alpha': 1}} ...
Creating this policy from a dictionary 'self.cfg['policies'][0]' = {'archtype': <class 'SMPyBandits.Policies.UCBalpha.UCBalpha'>, 'params': {'alpha': 1}} ...
- Adding policy #2 = {'archtype': <class 'SMPyBandits.Policies.MOSS.MOSS'>, 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][1]' = {'archtype': <class 'SMPyBandits.Policies.MOSS.MOSS'>, 'params': {}} ...
- Adding policy #3 = {'archtype': <class 'SMPyBandits.Policies.MOSSH.MOSSH'>, 'params': {'horizon': 1000}} ...
Creating this policy from a dictionary 'self.cfg['policies'][2]' = {'archtype': <class 'SMPyBandits.Policies.MOSSH.MOSSH'>, 'params': {'horizon': 1000}} ...
- Adding policy #4 = {'archtype': <class 'SMPyBandits.Policies.MOSSAnytime.MOSSAnytime'>, 'params': {'alpha': 1.35}} ...
Creating this policy from a dictionary 'self.cfg['policies'][3]' = {'archtype': <class 'SMPyBandits.Policies.MOSSAnytime.MOSSAnytime'>, 'params': {'alpha': 1.35}} ...
- Adding policy #5 = {'archtype': <class 'SMPyBandits.Policies.DMED.DMEDPlus'>, 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][4]' = {'archtype': <class 'SMPyBandits.Policies.DMED.DMEDPlus'>, 'params': {}} ...
- Adding policy #6 = {'archtype': <class 'SMPyBandits.Policies.Thompson.Thompson'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}} ...
Creating this policy from a dictionary 'self.cfg['policies'][5]' = {'archtype': <class 'SMPyBandits.Policies.Thompson.Thompson'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}} ...
- Adding policy #7 = {'archtype': <class 'SMPyBandits.Policies.klUCB.klUCB'>, 'params': {'klucb': <built-in function klucbBern>}} ...
Creating this policy from a dictionary 'self.cfg['policies'][6]' = {'archtype': <class 'SMPyBandits.Policies.klUCB.klUCB'>, 'params': {'klucb': <built-in function klucbBern>}} ...
- Adding policy #8 = {'archtype': <class 'SMPyBandits.Policies.klUCBPlusPlus.klUCBPlusPlus'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>}} ...
Creating this policy from a dictionary 'self.cfg['policies'][7]' = {'archtype': <class 'SMPyBandits.Policies.klUCBPlusPlus.klUCBPlusPlus'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>}} ...
- Adding policy #9 = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'best'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][8]' = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'best'}} ...
- Adding policy #10 = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][9]' = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitch'>, 'params': {'horizon': 1000, 'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}} ...
- Adding policy #11 = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'best'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][10]' = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'best'}} ...
- Adding policy #12 = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][11]' = {'archtype': <class 'SMPyBandits.Policies.klUCBswitch.klUCBswitchAnytime'>, 'params': {'klucb': <built-in function klucbBern>, 'threshold': 'delayed'}} ...
- Adding policy #13 = {'archtype': <class 'SMPyBandits.Policies.BayesUCB.BayesUCB'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}} ...
Creating this policy from a dictionary 'self.cfg['policies'][12]' = {'archtype': <class 'SMPyBandits.Policies.BayesUCB.BayesUCB'>, 'params': {'posterior': <class 'SMPyBandits.Policies.Posterior.Beta.Beta'>}} ...
- Adding policy #14 = {'archtype': <class 'SMPyBandits.Policies.AdBandits.AdBandits'>, 'params': {'alpha': 0.5, 'horizon': 1000}} ...
Creating this policy from a dictionary 'self.cfg['policies'][13]' = {'archtype': <class 'SMPyBandits.Policies.AdBandits.AdBandits'>, 'params': {'alpha': 0.5, 'horizon': 1000}} ...
- Adding policy #15 = {'archtype': <class 'SMPyBandits.Policies.ApproximatedFHGittins.ApproximatedFHGittins'>, 'params': {'alpha': 0.5, 'horizon': 1100}} ...
Creating this policy from a dictionary 'self.cfg['policies'][14]' = {'archtype': <class 'SMPyBandits.Policies.ApproximatedFHGittins.ApproximatedFHGittins'>, 'params': {'alpha': 0.5, 'horizon': 1100}} ...
- Adding policy #16 = {'archtype': <class 'SMPyBandits.Policies.UCBoost.UCB_lb'>, 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][15]' = {'archtype': <class 'SMPyBandits.Policies.UCBoost.UCB_lb'>, 'params': {}} ...
- Evaluating policy #1/16: UCB($\alpha=1$) ...
Estimated order by the policy UCB($\alpha=1$) after 1000 steps: [2 0 1 3 5 4 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 80.25% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 73.46% (relative success)...
- Evaluating policy #2/16: MOSS ...
Estimated order by the policy MOSS after 1000 steps: [0 2 4 1 5 7 3 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 70.37% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 68.52% (relative success)...
- Evaluating policy #3/16: MOSS-H($T=1000$) ...
Estimated order by the policy MOSS-H($T=1000$) after 1000 steps: [1 5 6 7 0 2 3 4 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 35.80% (relative success)...
==> Gestalt distance from optimal ordering: 55.56% (relative success)...
==> Mean distance from optimal ordering: 45.68% (relative success)...
- Evaluating policy #4/16: MOSS-Anytime($\alpha=1.35$) ...
Estimated order by the policy MOSS-Anytime($\alpha=1.35$) after 1000 steps: [0 3 1 2 5 6 4 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 80.25% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 79.01% (relative success)...
- Evaluating policy #5/16: DMED$^+$(Bern) ...
Estimated order by the policy DMED$^+$(Bern) after 1000 steps: [8 0 1 6 4 7 3 5 2] ...
==> Optimal arm identification: 33.33% (relative success)...
==> Manhattan distance from optimal ordering: 35.80% (relative success)...
==> Gestalt distance from optimal ordering: 44.44% (relative success)...
==> Mean distance from optimal ordering: 40.12% (relative success)...
- Evaluating policy #6/16: Thompson ...
Estimated order by the policy Thompson after 1000 steps: [4 3 2 1 0 5 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 65.43% (relative success)...
==> Gestalt distance from optimal ordering: 44.44% (relative success)...
==> Mean distance from optimal ordering: 54.94% (relative success)...
- Evaluating policy #7/16: kl-UCB ...
Estimated order by the policy kl-UCB after 1000 steps: [3 0 1 4 5 2 6 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #8/16: kl-UCB$^{++}$($T=1000$) ...
Estimated order by the policy kl-UCB$^{++}$($T=1000$) after 1000 steps: [3 4 5 6 0 1 2 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 40.74% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 53.70% (relative success)...
- Evaluating policy #9/16: kl-UCB-switch($T=1000$) ...
Estimated order by the policy kl-UCB-switch($T=1000$) after 1000 steps: [2 4 3 0 1 7 6 5 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 60.49% (relative success)...
==> Gestalt distance from optimal ordering: 44.44% (relative success)...
==> Mean distance from optimal ordering: 52.47% (relative success)...
- Evaluating policy #10/16: kl-UCB-switch($T=1000$, delayed f) ...
Estimated order by the policy kl-UCB-switch($T=1000$, delayed f) after 1000 steps: [0 1 2 3 4 5 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 95.06% (relative success)...
==> Gestalt distance from optimal ordering: 88.89% (relative success)...
==> Mean distance from optimal ordering: 91.98% (relative success)...
- Evaluating policy #11/16: kl-UCB-switch ...
Estimated order by the policy kl-UCB-switch after 1000 steps: [1 4 0 2 3 5 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 70.99% (relative success)...
- Evaluating policy #12/16: kl-UCB-switch(delayed f) ...
Estimated order by the policy kl-UCB-switch(delayed f) after 1000 steps: [0 1 6 2 3 5 7 4 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #13/16: BayesUCB ...
Estimated order by the policy BayesUCB after 1000 steps: [4 0 1 2 3 6 5 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #14/16: AdBandits($T=1000$, $\alpha=0.5$) ...
Estimated order by the policy AdBandits($T=1000$, $\alpha=0.5$) after 1000 steps: [7 0 5 8 4 1 2 3 6] ...
==> Optimal arm identification: 77.78% (relative success)...
==> Manhattan distance from optimal ordering: 25.93% (relative success)...
==> Gestalt distance from optimal ordering: 55.56% (relative success)...
==> Mean distance from optimal ordering: 40.74% (relative success)...
- Evaluating policy #15/16: ApprFHG($T=1100$) ...
Estimated order by the policy ApprFHG($T=1100$) after 1000 steps: [0 1 3 2 6 4 5 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 85.19% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 81.48% (relative success)...
- Evaluating policy #16/16: $\mathrm{UCB}_{d=d_{lb}}$($c=0$) ...
Estimated order by the policy $\mathrm{UCB}_{d=d_{lb}}$($c=0$) after 1000 steps: [1 3 2 0 4 7 5 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 70.99% (relative success)...
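The ordering-quality percentages above compare each policy's estimated ranking of the arms against the true increasing order [0, 1, ..., 8]. Below is a hedged sketch of how two of these scores can be computed (a normalized Manhattan distance and a Gestalt similarity via difflib.SequenceMatcher); the function names are illustrative, but the formulas reproduce, for example, the 80.25% and 66.67% reported for UCB(alpha=1).

    # Illustrative re-computation of two ordering-quality scores used above.
    # 'order' is a permutation of arm indexes, as estimated by a policy after T steps.
    from difflib import SequenceMatcher

    def manhattan_score(order):
        """1 - (sum_i |order[i] - i|) / (n^2 / 2): 100% means perfectly sorted."""
        n = len(order)
        return 1.0 - sum(abs(pos - i) for i, pos in enumerate(order)) / (n ** 2 / 2.0)

    def gestalt_score(order):
        """Gestalt pattern-matching similarity against the identity permutation."""
        return SequenceMatcher(None, list(order), list(range(len(order)))).ratio()

    order_ucb = [2, 0, 1, 3, 5, 4, 7, 6, 8]   # estimated order of UCB(alpha=1) above
    print("{:.2%}".format(manhattan_score(order_ucb)))   # -> 80.25%
    print("{:.2%}".format(gestalt_score(order_ucb)))     # -> 66.67%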
Giving the vector of final regrets ...
For policy #0 called 'UCB($\alpha=1$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 47.2
Mean of last regrets R_T = 56.9
Median of last regrets R_T = 53.1
Max of last regrets R_T = 68.8
STD of last regrets R_T = 8.18
For policy #1 called 'MOSS' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 30.5
Mean of last regrets R_T = 44.1
Median of last regrets R_T = 42.7
Max of last regrets R_T = 55.5
STD of last regrets R_T = 6.68
For policy #2 called 'MOSS-H($T=1000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 32.9
Mean of last regrets R_T = 49
Median of last regrets R_T = 46.3
Max of last regrets R_T = 76.4
STD of last regrets R_T = 10.9
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 33.5
Mean of last regrets R_T = 47.2
Median of last regrets R_T = 46.5
Max of last regrets R_T = 72.8
STD of last regrets R_T = 9.75
For policy #4 called 'DMED$^+$(Bern)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 23.1
Mean of last regrets R_T = 35.7
Median of last regrets R_T = 34.8
Max of last regrets R_T = 51.1
STD of last regrets R_T = 7.69
For policy #5 called 'Thompson' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 23.2
Mean of last regrets R_T = 30.6
Median of last regrets R_T = 28.6
Max of last regrets R_T = 43.3
STD of last regrets R_T = 6.18
For policy #6 called 'kl-UCB' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 20.9
Mean of last regrets R_T = 37.4
Median of last regrets R_T = 37.8
Max of last regrets R_T = 63.9
STD of last regrets R_T = 9.45
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 24.3
Mean of last regrets R_T = 34.9
Median of last regrets R_T = 30.2
Max of last regrets R_T = 61.6
STD of last regrets R_T = 10
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 17.9
Mean of last regrets R_T = 37.2
Median of last regrets R_T = 36.1
Max of last regrets R_T = 64.5
STD of last regrets R_T = 9.95
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 23.5
Mean of last regrets R_T = 34.8
Median of last regrets R_T = 33.3
Max of last regrets R_T = 49.3
STD of last regrets R_T = 6.94
For policy #10 called 'kl-UCB-switch' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 25.3
Mean of last regrets R_T = 32.4
Median of last regrets R_T = 30.5
Max of last regrets R_T = 43
STD of last regrets R_T = 5.29
For policy #11 called 'kl-UCB-switch(delayed f)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 22.2
Mean of last regrets R_T = 34.8
Median of last regrets R_T = 33.2
Max of last regrets R_T = 48.6
STD of last regrets R_T = 8.47
For policy #12 called 'BayesUCB' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 10.1
Mean of last regrets R_T = 20.9
Median of last regrets R_T = 20.1
Max of last regrets R_T = 35.5
STD of last regrets R_T = 5.93
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 12.2
Mean of last regrets R_T = 20.5
Median of last regrets R_T = 19.6
Max of last regrets R_T = 33.6
STD of last regrets R_T = 5.06
For policy #14 called 'ApprFHG($T=1100$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 41
Mean of last regrets R_T = 51.9
Median of last regrets R_T = 48
Max of last regrets R_T = 73
STD of last regrets R_T = 10.1
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 17.8
Mean of last regrets R_T = 24.8
Median of last regrets R_T = 24.6
Max of last regrets R_T = 30.5
STD of last regrets R_T = 3.61
Giving the final ranks ...
Final ranking for this environment #0 :
- Policy 'AdBandits($T=1000$, $\alpha=0.5$)' was ranked 1 / 16 for this simulation (last regret = 20.463).
- Policy 'BayesUCB' was ranked 2 / 16 for this simulation (last regret = 20.75).
- Policy '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' was ranked 3 / 16 for this simulation (last regret = 24.831).
- Policy 'Thompson' was ranked 4 / 16 for this simulation (last regret = 30.525).
- Policy 'kl-UCB-switch' was ranked 5 / 16 for this simulation (last regret = 32.244).
- Policy 'kl-UCB-switch($T=1000$, delayed f)' was ranked 6 / 16 for this simulation (last regret = 34.663).
- Policy 'kl-UCB-switch(delayed f)' was ranked 7 / 16 for this simulation (last regret = 34.819).
- Policy 'kl-UCB$^{++}$($T=1000$)' was ranked 8 / 16 for this simulation (last regret = 34.881).
- Policy 'DMED$^+$(Bern)' was ranked 9 / 16 for this simulation (last regret = 35.675).
- Policy 'kl-UCB-switch($T=1000$)' was ranked 10 / 16 for this simulation (last regret = 37.138).
- Policy 'kl-UCB' was ranked 11 / 16 for this simulation (last regret = 37.338).
- Policy 'MOSS' was ranked 12 / 16 for this simulation (last regret = 43.756).
- Policy 'MOSS-Anytime($\alpha=1.35$)' was ranked 13 / 16 for this simulation (last regret = 47.119).
- Policy 'MOSS-H($T=1000$)' was ranked 14 / 16 for this simulation (last regret = 49).
- Policy 'ApprFHG($T=1100$)' was ranked 15 / 16 for this simulation (last regret = 51.888).
- Policy 'UCB($\alpha=1$)' was ranked 16 / 16 for this simulation (last regret = 56.638).
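The ranking above orders the 16 policies by their average regret at the final time step, best first. A minimal sketch of that aggregation is given below; the names and placeholder data are assumptions, and the Evaluator may average a slightly different estimator of R_T, which is why its values differ marginally from the 'Mean of last regrets' figures reported earlier.

    # Sketch: rank policies by mean final regret, smallest (best) first.
    import numpy as np

    def final_ranking(last_regrets):
        """last_regrets maps a policy name to its array of final regrets (one per repetition)."""
        means = {name: float(np.mean(values)) for name, values in last_regrets.items()}
        return sorted(means.items(), key=lambda kv: kv[1])

    # Usage with placeholder arrays; a real run would use the 16 stored R_T values per policy.
    rng = np.random.default_rng(0)
    fake = {"policyA": rng.uniform(15, 35, 16), "policyB": rng.uniform(40, 70, 16)}
    for rank, (name, mean_regret) in enumerate(final_ranking(fake), start=1):
        print("- Policy {!r} was ranked {} / {} (mean final regret = {:.3f})".format(
            name, rank, len(fake), mean_regret))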
Giving the mean and std running times ...
For policy #2 called 'MOSS-H($T=1000$)' ...
628 ms ± 13.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
679 ms ± 24.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #5 called 'Thompson' ...
716 ms ± 54 ms per loop (mean ± std. dev. of 16 runs)
For policy #14 called 'ApprFHG($T=1100$)' ...
722 ms ± 85.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #1 called 'MOSS' ...
811 ms ± 127 ms per loop (mean ± std. dev. of 16 runs)
For policy #10 called 'kl-UCB-switch' ...
814 ms ± 27.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
819 ms ± 10.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #4 called 'DMED$^+$(Bern)' ...
819 ms ± 87.5 ms per loop (mean ± std. dev. of 16 runs)
For policy #12 called 'BayesUCB' ...
821 ms ± 77.1 ms per loop (mean ± std. dev. of 16 runs)
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
828 ms ± 33.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #0 called 'UCB($\alpha=1$)' ...
832 ms ± 196 ms per loop (mean ± std. dev. of 16 runs)
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
839 ms ± 50.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #11 called 'kl-UCB-switch(delayed f)' ...
863 ms ± 65.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
871 ms ± 75.2 ms per loop (mean ± std. dev. of 16 runs)
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
872 ms ± 73.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #6 called 'kl-UCB' ...
904 ms ± 86.6 ms per loop (mean ± std. dev. of 16 runs)
Done for simulations main.py ...
Filename: /home/lilian/ownCloud/owncloud.crans.org/Crans/These_2016-17/src/SMPyBandits/SMPyBandits/Environment/Evaluator.py
Line # Mem usage Increment Line Contents
================================================
728 180.559 MiB 180.148 MiB @profile
729 def delayed_play(env, policy, horizon,
730 random_shuffle=random_shuffle, random_invert=random_invert, nb_random_events=nb_random_events,
731 seed=None, allrewards=None, repeatId=0,
732 useJoblib=False):
733 """Helper function for the parallelization."""
734 180.559 MiB 0.000 MiB start_time = time.time()
735 180.559 MiB 0.000 MiB start_memory = getCurrentMemory(thread=useJoblib)
736 # Give a unique seed to random & numpy.random for each call of this function
737 180.559 MiB 0.000 MiB try:
738 180.559 MiB 0.000 MiB if seed is not None:
739 random.seed(seed)
740 np.random.seed(seed)
741 except (ValueError, SystemError):
742 print("Warning: setting random.seed and np.random.seed seems to not be available. Are you using Windows?") # XXX
743 # We have to deepcopy because this function is Parallel-ized
744 180.559 MiB 0.000 MiB if random_shuffle or random_invert:
745 env = deepcopy(env) # XXX this uses a LOT of RAM memory!!!
746 180.559 MiB 0.000 MiB means = env.means
747 180.559 MiB 0.000 MiB if env.isDynamic:
748 means = env.newRandomArms()
749 180.559 MiB 0.258 MiB policy = deepcopy(policy) # XXX this uses a LOT of RAM memory!!!
750
751 180.559 MiB 0.000 MiB indexes_bestarm = np.nonzero(np.isclose(env.means, env.maxArm))[0]
752
753 # Start game
754 180.559 MiB 0.000 MiB policy.startGame()
755 180.559 MiB 0.000 MiB result = Result(env.nbArms, horizon, indexes_bestarm=indexes_bestarm, means=means) # One Result object, for every policy
756
757 # XXX Experimental support for random events: shuffling or inverting the list of arms, at these time steps
758 180.559 MiB 0.000 MiB t_events = [i * int(horizon / float(nb_random_events)) for i in range(nb_random_events)]
759 180.559 MiB 0.000 MiB if nb_random_events is None or nb_random_events <= 0:
760 random_shuffle = False
761 random_invert = False
762
763 180.559 MiB 0.000 MiB prettyRange = tqdm(range(horizon), desc="Time t") if repeatId == 0 else range(horizon)
764 180.559 MiB 0.000 MiB for t in prettyRange:
765 180.559 MiB 0.008 MiB choice = policy.choice()
766
767 # XXX do this quicker!?
768 180.559 MiB 0.000 MiB if allrewards is None:
769 180.559 MiB 0.000 MiB reward = env.draw(choice, t)
770 else:
771 reward = allrewards[choice, repeatId, t]
772
773 180.559 MiB 0.000 MiB policy.getReward(choice, reward)
774
775 # Finally we store the results
776 180.559 MiB 0.145 MiB result.store(t, choice, reward)
777
778 # XXX Experimental : shuffle the arms at the middle of the simulation
779 180.559 MiB 0.000 MiB if random_shuffle and t in t_events:
780 indexes_bestarm = env.new_order_of_arm(shuffled(env.arms))
781 result.change_in_arms(t, indexes_bestarm)
782 if repeatId == 0:
783 print("\nShuffling the arms at time t = {} ...".format(t)) # DEBUG
784 # XXX Experimental : invert the order of the arms at the middle of the simulation
785 180.559 MiB 0.000 MiB if random_invert and t in t_events:
786 indexes_bestarm = env.new_order_of_arm(env.arms[::-1])
787 result.change_in_arms(t, indexes_bestarm)
788 if repeatId == 0:
789 print("\nInverting the order of the arms at time t = {} ...".format(t)) # DEBUG
790
791 # Print the quality of estimation of arm ranking for this policy, just for 1st repetition
792 180.559 MiB 0.000 MiB if repeatId == 0 and hasattr(policy, 'estimatedOrder'):
793 180.426 MiB 0.000 MiB order = policy.estimatedOrder()
794 180.426 MiB 0.000 MiB print("\nEstimated order by the policy {} after {} steps: {} ...".format(policy, horizon, order))
795 180.426 MiB 0.000 MiB print(" ==> Optimal arm identification: {:.2%} (relative success)...".format(weightedDistance(order, env.means, n=1)))
796 180.426 MiB 0.000 MiB print(" ==> Manhattan distance from optimal ordering: {:.2%} (relative success)...".format(manhattan(order)))
797 # print(" ==> Kendell Tau distance from optimal ordering: {:.2%} (relative success)...".format(kendalltau(order)))
798 # print(" ==> Spearman distance from optimal ordering: {:.2%} (relative success)...".format(spearmanr(order)))
799 180.426 MiB 0.000 MiB print(" ==> Gestalt distance from optimal ordering: {:.2%} (relative success)...".format(gestalt(order)))
800 180.426 MiB 0.000 MiB print(" ==> Mean distance from optimal ordering: {:.2%} (relative success)...".format(meanDistance(order)))
801
802 # Finally, store running time and consumed memory
803 180.559 MiB 0.000 MiB result.running_time = time.time() - start_time
804 180.559 MiB 0.000 MiB result.memory_consumption = getCurrentMemory(thread=useJoblib) - start_memory
805 180.559 MiB 0.000 MiB return result
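The table above is the standard line-by-line report produced by the memory_profiler package for the @profile-decorated delayed_play function in Evaluator.py. A minimal, self-contained example of producing the same kind of report is sketched below; the toy function is hypothetical and only stands in for the real profiled code.

    # Minimal sketch using the real memory_profiler package (pip install memory-profiler).
    from memory_profiler import profile

    @profile
    def build_big_list(n=1_000_000):
        """Toy function: the 'Increment' column will show the allocation of this list."""
        data = [x * x for x in range(n)]
        return sum(data)

    if __name__ == "__main__":
        build_big_list()
    # Running `python this_script.py` prints a table with the same columns as above:
    # Line #, Mem usage, Increment, Line Contents.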