
Add Tranquilizer (K67R) #1126

Merged: 32 commits, Sep 3, 2017
Conversation

@MHakem (Member) commented Aug 24, 2017

This implements the Tranquilizer strategy as referenced in issue #1103 (one of the strategies in Axelrod's second tournament).

Reverse engineering was a joint contribution with Will Guo.

This strategy is stochastic, so verifying that the program is a correct implementation of the Fortran code is not immediate. However, as suggested by @drvinceknight, I have run the fingerprint for the Tranquilizer program:

[image: k67r fingerprint]

We added tranquiliser to the import so that it is known to the Axelrod
library.
Add to axelrod second, fix test_tranquiliserii
No longer needed after an upstream change.
Add more tests
…e names), added to axelrod_second and test_axelrod_second and deleted standalone tranquiliser.py and test_tranquiliser.py files
@drvinceknight (Member)

Here is the fingerprint for the Fortran strategy:

https://github.com/Axelrod-Python/Axelrod-fingerprint#k67r

@drvinceknight (Member) left a comment

Great work @MHakem: some initial comments on style. 👍

Has a variable 'NK' which increases each time a move is played whilst in state FD = 2. It has an initial value of 1.
Has a variable 'AD' with an initial value of 5
Has a variable 'NO' with an initial value of 0
When FD = 0:
Member

Currently your docs are failing on the automated testing: https://travis-ci.org/Axelrod-Python/Axelrod/jobs/267887570

This is because of how Sphinx formats things. The following will fix it:


    Has a variable 'NO' with an initial value of 0

    The strategy follows the following algorithm::

        When FD = 0:

            If the opponent's last move (J) was Cooperate, increase the value of C by 1
            If Score (K) < 1.75 * Move Number (M), play opponent's last move
            If (1.75 * M) <= K < (2.25 * M):

                Calculate Probability P:
                P = 0.25 + C/M - 0.25*S + (K - L)/100 + 4/M
                Where L is the opponent's score so far
                If Random (R) <= P:
                    Cooperate
                Else:
                    Defect

            If K >= (2.25 * M):

                Calculate probability P:
                P = 0.95 - (AD + NO - 5)/15 + 1/M**2 - J/4
                Where J is the opponent's last move
                If Random (R) <= P:
                    Cooperate
                Else:
                    Set FD = 1
                    Defect

        When FD = 1:

            Set FD = 2
            Set the variable 'AD':
            AD = ((AD * AK) + 3 - (3 * J) + (2 * JA) - (JA * J)) / (AK + 1)
            Where JA is the strategy's last move and J is the opponent's last move (C = 0, D = 1)
            Increase the value of AK by 1
            Cooperate

        When FD = 2:

            Set FD = 0
            Set the variable 'NO':
            NO = ((NO * NK) + 3 - (3 * J) + (2 * JA) - (JA * J)) / (NK + 1)
            Where JA is the strategy's last move and J is the opponent's last move (C = 0, D = 1)
            Increase the value of NK by 1
            Cooperate

    Tranquilizer came in 27th place in Axelrod's second tournament.


Names:

- Craig Feathers: [Axelrod1980]_
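For orientation, here is a condensed Python transcription of the FD = 0 branch quoted above (illustrative only: variable names mirror the Fortran description, and actions are assumed to be encoded as C = 0, D = 1, as the docstring states):

    import random

    # M = move number, K = own score, L = opponent's score,
    # C = opponent's cooperation count, S = consecutive opponent defections,
    # J = opponent's last move (0 or 1); FD, AD, NO as described above.
    if K < 1.75 * M:
        move = J  # mirror the opponent's last move
    elif K < 2.25 * M:
        P = 0.25 + C / M - 0.25 * S + (K - L) / 100 + 4 / M
        move = 0 if random.random() <= P else 1
    else:
        P = 0.95 - (AD + NO - 5) / 15 + 1 / M ** 2 - J / 4
        if random.random() <= P:
            move = 0  # cooperate
        else:
            FD = 1  # a "good" defection: move to state 1
            move = 1  # defect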
Member

No need for this line: Craig is the name of the author and you've got it in the above docstring.



Has a variable, 'FD' which can be 0, 1 or 2. It has an initial value of 0
Has a variable 'S', which counts the consecutive number of times the opponent has played D (i.e. it is reset to 0 if the opponent plays C). It has an initial value of 0.
Member

Can you reduce these to 80 characters please (PEP8).


if self.FD == 2:
self.FD = 0
self.ratio_FD2 = ((self.ratio_FD2 * self.ratio_FD2_count + 3 - 3 * self.dict[opponent.history[-1]]) + 2 * self.dict[
Member

This (and similar lines) are breaking PEP8 (limit to 80 characters) in quite a few places. This is mainly because this is such a mouthful.

Try and fix it by modifying things so that you create the summation in multiple steps perhaps.
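For example, something along these lines (a sketch only, reusing the names from the current draft) keeps each line under 80 characters:

    opponent_move = self.dict[opponent.history[-1]]
    my_move = self.dict[self.history[-1]]
    numerator = (self.ratio_FD2 * self.ratio_FD2_count
                 + 3 - 3 * opponent_move
                 + 2 * my_move
                 - my_move * opponent_move)
    self.ratio_FD2 = numerator / (self.ratio_FD2_count + 1)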

self.dict = {C: 0, D: 1}


def update_stateFD(self, opponent): # Calculates the ratioFD values and P values, as well as sets the states of FD at the start of each turn
Member

Take the comment you have at the end of this and put it in a docstring:

def update(self, opponent):
    """
    Calculates the ...
    """

@@ -5,6 +5,8 @@
import axelrod
from .test_player import TestPlayer

from axelrod.interaction_utils import compute_final_score
Member

I don't believe you are using this in your tests (so you can delete the line).

Remove unnecessary code.

@MHakem (Member, Author) commented Aug 24, 2017

Thanks for the styling suggestions @drvinceknight.

self.ratio_FD2_count += 1
elif self.FD == 1:
self.FD = 2
self.ratio_FD1 = ((self.ratio_FD1 * self.ratio_FD1_count)
Member

There's a syntax error here. You have opened two brackets and only closed one.

It's probably always a good idea to run your tests locally (even if you only run the strategy specific ones) to make sure small typos like this don't get in :) (They get picked up by the automated test runners anyway so it's not a big deal but it'll help you get feedback faster :)) 👍
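Based on the AD formula in the docstring, the balanced expression would presumably look something like (a sketch only):

    self.ratio_FD1 = ((self.ratio_FD1 * self.ratio_FD1_count
                       + 3 - 3 * self.dict[opponent.history[-1]]
                       + 2 * self.dict[self.history[-1]]
                       - self.dict[self.history[-1]]
                       * self.dict[opponent.history[-1]])
                      / (self.ratio_FD1_count + 1))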

@marcharper (Member)

Although the fingerprints are quite similar, they seem a little off on the right side. Do we have a standard in this case for "strategies are equivalent"? I think we need an official policy for the Fortran translations.

@drvinceknight (Member) commented Aug 25, 2017

Although the fingerprints are quite similar, they seem a little off on the right side. Do we have a standard in this case for "strategies are equivalent"? I think we need an official policy for the Fortran translations.

The fingerprint is just there as a quick indication that this is worth spending time to review. We do have an official policy for the Fortran strategies which is just the same policy as for other strategies: we need to review the actual code and ensure the logic is correct.

@MHakem (a student working with me) has spent quite a while working with this and reverse engineering the strategy, so I'm pretty certain it's going to be, if not correct (which it seems it's not quite), then very close to correct, and it just needs one of us to go through with a fine tooth comb. :)

Prisoner's Dilemma" paper: The rule normally cooperates but
is ready to defect if the other player defects too often.
Thus the rule tends to cooperate for the first dozen or two moves
if the other player is cooperating, but then it throws in a
Member

The docs are still breaking; Sphinx is super sensitive to blank space :)

Here you need to remove the space before the if.


- Has a variable, 'FD' which can be 0, 1 or 2. It has an initial value of 0
- Has a variable 'S', which counts the consecutive number of
times the opponent has played D (i.e. it is reset to 0 if the opponent
Member

In all of this, when you bullet overhangs it needs to start at the same place:

- Has a variable 'S', which counts the consecutive number of 
  times the opponent has played D (i.e. it is reset to 0 if the opponent 

- Has a variable 'AD' with an initial value of 5
- Has a variable 'NO' with an initial value of 0

Has a variable 'NO' with an initial value of 0
Member

This needs to be a bullet (so remove the blank line in between this and the previous bullet).

@MHakem (Member, Author) commented Aug 25, 2017

[image: flowchart]

Dictionaries have been implemented to allow for the calculation of the AD and NO values, which both require the numerical value of C and D.

current_score = K
self.history = JA
opponent.history = J
consecutive_defections = S
randomValue = R
ratio_FD1 = AD
ratio_FD2 = NO
ratio_FD1_count = AK
ratio_FD2_count = NK
P = P (I'm in agreement with @meatballs that P most likely means probability)
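A minimal sketch of that encoding in use (illustrative):

    self.dict = {C: 0, D: 1}
    J = self.dict[opponent.history[-1]]   # opponent's last move as 0/1
    JA = self.dict[self.history[-1]]      # own last move as 0/1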

I'll investigate the reason behind the disparity with regards to the fingerprints.

@meatballs (Member)

Nice work, @MHakem !!!

@marcharper (Member)

There are some supposed match outcomes for Tranquilizer here; they may be worth testing (but they may not be valid).

@drvinceknight (Member)

Sorry for taking a while to get to this @MHakem, my plan is to go over things in detail tomorrow morning (at a conference and have some down time tomorrow). 👍

@drvinceknight (Member) left a comment

I've definitely found one clear error @MHakem. Of course, there might be others but it's definitely a step in the correct direction.

return self.history[-1]
else:
if self.score == "good":
self.num_turns_after_good_defection = 1
Member

This (the change of state from 0 to 1) took me a while to find. Could it (it might not be worth it so just a suggestion) be moved to inside the update_state method? That way anything relating to updating the state will be in there.

@MHakem (Member, Author), Aug 29, 2017

EDIT: I just realized that the state can only update given R <= P, which is not known within the update_state method. Would it be worth setting the value of R and checking within that method?

Member

Ah yes good point. Let's leave it as it is for now. Let's aim to get the code to be right first :) 👍

else:
Tranquilizer.update_state(self, opponent)
if opponent.history[-1] == D:
self.consecutive_defections += 1
Member

Can we call this opponent_consecutive_defections?

if self.consecutive_defections == 0:
return C
else:
return self.history[-1]
@drvinceknight (Member), Aug 29, 2017

I don't see where this line comes from in the description at #1103 and/or in the Fortran code. I believe this is causing an error as it allows for the strategy to defect three times in a row.

Here is a particular match against Bully (http://axelrod.readthedocs.io/en/stable/_modules/axelrod/strategies/titfortat.html#Bully) that shows difference of behaviour.

Here is what the Fortran strategy does (note that this happens across all seeds so there's no variation here):

>>> import axelrod as axl
>>> import axelrod_fortran as axlf
>>> fortran_player = axlf.Player("k67r")
>>> bully = axl.Bully()
>>> for seed in range(10):
...     axl.seed(seed)
...     match = axl.Match((fortran_player, bully), turns=8)
...     results = match.play()
...     print(list(zip(*results))[0])  # Printing only the actions of `k67r`
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)

Note how its last 3 turns are always two Ds followed by a C.

Here is your implementation:

>>> axl_player = axl.Tranquilizer()
>>> bully = axl.Bully()
>>> for seed in range(10):
...     axl.seed(seed)
...     match = axl.Match((axl_player, bully), turns=8)
...     results = match.play()
...     print(list(zip(*results))[0])
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, C)
(C, D, D, C, C, D, D, D)
(C, D, D, C, C, D, D, C)

We see here that the strategy sometimes defects 3 times in a row (explicitly said to not happen).

Member Author

"But as long as TRANQUILIZER is maintaining an average payoff of at least 2.25 points per move, it will never defect twice in succession". I may have made a calculation error, but I do believe that the average score per turn is lower than 2.25.

If R <= P, then the program will return the value of K67R (which holds either 0 or 1). The only other times the value of K67R changes are when R >= P, when the opponent's last move was a cooperation (i.e. consecutive_defections = 0), or when the score is lower than 1.75 per turn.

From lines 402 to 405, we consider the case of R <= P: when this holds, the only process that could actually have changed the value of K67R is whether the opponent's last move was a cooperation. If it was, then K67R is changed to 0, hence return C; if it wasn't, then the value of K67R is unaltered and remains the same as last turn, so return self.history[-1].

@drvinceknight (Member), Aug 29, 2017

Thanks for explaining that. It makes sense but as you can see it's not matching up with the results against Bully above. I'm not entirely sure where/why, I suggest you see if you can track it down and I'll do the same.

@drvinceknight (Member), Aug 29, 2017

When this is fixed, let's add the following test for behaviour against Bully:

actions = [(C, D), (D, D), (D, C), (C, C), (C, D), (D, D), (D, C), (C, C)]
self.versus_test(axl.Bully(), expected_actions=actions, seed=0)

@drvinceknight (Member)

If helpful, I was able to identify the above incorrect result against Bully by using the new fingerprints recently added to the library (you can access these if you pull the latest master branch and merge or rebase; ping me on our Slack channel if you'd like help with this, but you don't necessarily need to: you can just aim to get the matches against Bully working for now).

Here is how they work:

>>> opponents = [s() for s in axl.basic_strategies]
>>> fortran_tf_v_short = axl.TransitiveFingerprint(fortran_player, opponents=opponents)
>>> fortran_tf_v_short.fingerprint()
>>> fortran_tf_v_short.plot(display_names=True);

[image: fortran fingerprint]

>>> axl_tf_v_short = axl.TransitiveFingerprint(axl_player, opponents=opponents)
>>> axl_tf_v_short.fingerprint()
>>> axl_tf_v_short.plot(display_names=True);

[image: axl fingerprint]

>>> import matplotlib.pyplot as plt
>>> plt.imshow(fortran_tf_v_short.data - axl_tf_v_short.data)  # manually looking at the difference

[image: difference]

(The names don't look great because I'm using a small strategy list but I was still able to make out that something wasn't right with Bully).

@drvinceknight (Member) left a comment

@Nikoleta-v3 and I have just finished a big session reading through and debugging and we have found a couple of things wrong (in particular these fix the results against Bully but there are still other things not quite right). Our suggestion is to properly separate the stochastic state and the cooperative state updates.

Let me know if this is unclear.

two_turns_after_good_defection_ratio and the probability values,
as well as sets the value of num_turns_after_good_defection.
"""
self.current_score = compute_final_score(zip(self.history, opponent.history))
Member

Let's remove this from this method and just do it in the strategy (see my comment there).


def strategy(self, opponent: Player) -> Action:

current_score = compute_final_score(zip(self.history, opponent.history))
Member

Change this to:

self.current_score = compute_final_score(zip(self.history, opponent.history))

remove the same calculation in update_state (see my comment there).

self.score = "good"
elif (self.current_score[0] / ((len(self.history)) + 1)) >= 1.75:
self.probability = (
(.25 + (opponent.cooperations / ((len(self.history)) + 1)))
Member

This actually needs to be opponent.cooperations + 1: the Fortran strategies are passed a dummy initial "cooperation" so the C in that code is in fact opponent.cooperations + 1.

if len(self.history) == 0:
return C
else:
Tranquilizer.update_state(self, opponent)
Member

This should be:

self.update_state(opponent)

return C
else:
Tranquilizer.update_state(self, opponent)
if opponent.history[-1] == D:
Member

This is causing a problem with your current code. Your update_state method also uses opponent_consecutive_defections (for the case when FD is 0); however, because of the order in which you have done things, it is using the count of consecutive defections from the previous round.

The way I suggest you fix this (and this will require a bit of work) is to leave this where it is but:

  • Rename update_state to be update_cooperative_state and ONLY do the cases corresponding to FD==1 and FD==2 there.
  • Move the code for the other state (the probabilistic one for the calculation of self.probability) to a method called update_stochastic_state and that can be called after you have updated self.opponent_consecutive_defections.
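A skeleton of that split (a sketch only; bodies elided, names as suggested above):

    def update_cooperative_state(self, opponent):
        """Handle only the FD == 1 and FD == 2 transitions."""
        ...

    def update_stochastic_state(self, opponent):
        """Compute self.probability for the FD == 0 case; call this only
        after self.opponent_consecutive_defections has been updated."""
        ...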

…ify probability equation and hence modify tests.
@drvinceknight (Member) left a comment

I'm suggesting quite a bit of a refactor here @MHakem. I suggest you add the extra test first and then commit that (checking that the tests pass). After you've done that, make small piecemeal changes, checking things as you go and committing as you see fit.

Perhaps first, though, have a read through what I'm suggesting (which I think cleans up the code quite a bit) before you implement it :) Any questions: get in touch! :)

"two_turns_after_good_defection_ratio": 0,
"one_turn_after_good_defection_ratio_count": 1,
"two_turns_after_good_defection_ratio_count": 1})

Member

Can you add:

opponent = axelrod.Bully()
actions = [(D, C), (D, D), (C, D), (C, C), (D, C), (D, D), (C, D), (C, C)]
self.versus_test(opponent, ...

Member

@MHakem can you add this test please.


if len(self.history) == 0:
return C
else:
Member

There's no need for else, you can just unindent everything (the previous return ends the function there and then).
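I.e. something like (sketch):

    if len(self.history) == 0:
        return C
    # no else needed: the return above already ends the function
    self.update_cooperative_state(opponent)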


self.current_score = compute_final_score(zip(self.history, opponent.history))

if len(self.history) == 0:
Member

Move this to before the self.current_score (basically on the first move: do nothing at all and just return a C).

return C
else:
self.update_cooperative_state(opponent)
if opponent.history[-1] == D:
Member

Let's move this (the update of consecutive opponent defections) to inside update_cooperative_state.

if len(self.history) == 0:
return C
else:
self.update_cooperative_state(opponent)
Member

Before this line, let's go for a:

if self.num_turns_after_good_defection in [1, 2]:
    self.update_cooperative_state(opponent)
    return C

This is a nice quick and easy to debug way to say that if we're in a cooperative state, update the cooperative state and then return a C. The whole rest of the code can then be cleaned up quite a lot to consider the case of when we're not in a cooperative state.

Member Author

I think I may have misunderstood something here, but how did you come to reason that given self.num_turns_after_good_defection != 0, the response would be to return C? I found the response to be dependent on the action of the opponent on the last turn (due to the fact that the score may drop below 2.25 and hence allow for double defection).

Member

We're both a bit wrong here.

These lines: https://github.com/Axelrod-Python/TourExec/blob/v0.3.0/src/strategies/k67r.f#L29:

545   K67R = 0
      IF (ABS(FD - 1.5) .EQ. .5) GOTO 599

mean that if FD is 1 or 2 (which is what ABS(FD - 1.5) .EQ. .5 is equivalent to) then we go to 599 (which is just a return).

Where I am a bit wrong is that this needs to happen after we update the cooperative states. So: if after updating the cooperative states the state is in [1, 2] then return C:

self.update_cooperative_state(opponent)
if  self.num_turns_after_good_defection in [1, 2]:
    return C

This is essentially what lines 15 to 29 do in the Fortran code.
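A quick sanity check of that equivalence:

    >>> [fd for fd in (0, 1, 2) if abs(fd - 1.5) == 0.5]
    [1, 2]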

self.opponent_consecutive_defections += 1
else:
self.opponent_consecutive_defections = 0
self.update_stochastic_state(opponent)
@drvinceknight (Member), Aug 30, 2017

We can get rid of update_stochastic_state completely I think and just have the logic for the stochastic state directly here. The rest of the strategy then just becomes (please check carefully):

if (self.current_score[0] / (len(self.history) + 1)) >= 2.25:
    probability = (
        (.95 - (((self.one_turn_after_good_defection_ratio)
        + (self.two_turns_after_good_defection_ratio) - 5) / 15))
        + (1 / ((len(self.history) + 1) ** 2))
        - (self.dict[opponent.history[-1]] / 4)
        )
    # I have plans for changing this a bit but let's take that one step at a time
    if random.random() <= probability:
        return C
    self.num_turns_after_good_defection = 1
    return D
if (self.current_score[0] / (len(self.history) + 1)) >= 1.75:
    probability = (
        (.25 + ((opponent.cooperations + 1) / (len(self.history) + 1)))
        - (self.opponent_consecutive_defections * .25)
        + ((self.current_score[0] - self.current_score[1]) / 100)
        + (4 / (len(self.history) + 1))
        )
    if random.random() <= probability:
        return C
    self.num_turns_after_good_defection = 1
    return D
return opponent.history[-1]

Member

Note that this in turn gets rid of a few of the internal variables (self.probability and self.score); you should change the init method to reflect that.

@drvinceknight (Member) left a comment

A few minor tweaks that will require test changes.

Definitely looking good now!

self.two_turns_after_good_defection_ratio= 0
self.one_turn_after_good_defection_ratio_count = 1
self.two_turns_after_good_defection_ratio_count = 1
self.current_score = 0
Member

Let's remove this (it doesn't need to be an attribute and can just be calculated on the fly).

self.dict = {C: 0, D: 1}


def update_cooperative_state(self, opponent):
Member

This now updates more than the cooperative state (it also updates the consecutive defections) so let's go back to calling it:

def update_state

self.update_cooperative_state(opponent)
if self.num_turns_after_good_defection in [1, 2]:
return C

Member

Let's move self.current_score = compute_final_score(zip(self.history, opponent.history)) to here, i.e. after the code block that returns C when we're in a cooperative state.

)
if random.random() <= probability:
return C
self.num_turns_after_good_defection = 1
Member

This should not be here. As you can see here: https://github.com/Axelrod-Python/TourExec/blob/v0.3.0/src/strategies/k67r.f#L38 we don't change state if this random test fails.
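So this branch should presumably end up as (sketch):

    if random.random() <= probability:
        return C
    return D  # state is left unchanged when this random test fails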

Member

This will require unit tests to be fixed.

@drvinceknight (Member) left a comment

A further request for the change of the docstring.

one-quarter of the time.


- Has a variable, 'FD' which can be 0, 1 or 2. It has an initial value of 0
Member

Can you remove all of this part of the docstring and replace it with something along the lines of:

The strategy starts by cooperating this strategy has 3 states.

At the start of the strategy it updates it's states:

- It counts the number of consecutive defections by the opponent.
- If it was in state 2 it moves to state 0 and calculates the following quantities [INCLUDE THE CALCULATIONS]
- If it was in state 1 it moves to state 2 and calculates the following quantities [INCLUDE THE CALCULATIONS]

If after this it is in state 1 or 2 then it cooperates.

If it is in state 0 it will potentially perform 1 of the 2 following stochastic tests:

1. If [CONDITION ON THE SCORE] then it calculates a value [DETAILS ABOUT CALCULATION OF probability] and will cooperate if a random sampled number is less than that value. If it does not cooperate then the strategy moves to state 1 and defects.
2. If [CONDITION ON THE SCORE] then it calculates a value  [DETAILS ABOUT CALCULATION OF probability] and will cooperate if a random sampled number is less than that value. If not, it defects.

If none of the above holds the player simply plays tit for tat.

@drvinceknight (Member)

We have fingerprinted our suggested changes and the strategies match up (10000 reps).

axl:

[image]

axlf:

[image]

The difference (at most a 200th of a percent, which is just noise):

[image]

@drvinceknight (Member) left a comment

Getting there @MHakem.

Mainly style changes from me at this point as well as that request for the extra test against Bully (see comment with the code).


"""
Submitted to Axelrod's second tournament by Craig Feathers

Member

Let's start with this:

Description given in Axelrod's "More Effective Choice in the 
Prisoner's Dilemma" paper: The rule normally cooperates but 
is ready to defect if the other player defects too often. 
Thus the rule tends to cooperate for the first dozen or two moves
if the other player is cooperating, but then it throws in a 
defection. If the other player continues to cooperate, then defections 
become more frequent. But as long as Tranquilizer is maintaining an 
average payoff of at least 2.25 points per move, it will never defect 
twice in succession and it will not defect more than 
one-quarter of the time.

"""
Submitted to Axelrod's second tournament by Craig Feathers

This strategy is based on the reverse engineering of the
Member

This strategy -> This implementation


- It counts the number of consecutive defections by the opponent.
- If it was in state 2 it moves to state 0 and calculates the
following quantities two_turns_after_good_defection_ratio and
Member

These need two spaces:

- If it was ...
  following...
  two_turns_...

also though include the formulae for two_turns_after_good_defection_ratio and two_turns_after_good_defection_ratio_count.

following quantities two_turns_after_good_defection_ratio and
two_turns_after_good_defection_ratio_count.
- If it was in state 1 it moves to state 2 and calculates the
following quantities one_turn_after_good_defection_ratio and
Member

Needs to have two extra spaces (as above). Also as above, include the formulae for the two quantities.

If none of the above holds the player simply plays tit for tat.


The strategy follows the following algorithm::
Member

Remove all of this algorithm (it's just a repetition of above).


def __init__(self):
super().__init__()
self.num_turns_after_good_defection = 0 # equal to FD variable in Fortran code
Member

... in original Fortran code

def __init__(self):
super().__init__()
self.num_turns_after_good_defection = 0 # equal to FD variable in Fortran code
self.opponent_consecutive_defections = 0
Member

Can you add a similar inline comment to all of these please?

# equal to ...

- current_score[1]) / 100)
+ (4 / ((len(self.history)) + 1))
)
if random.random() <= probability:
Member

A note for whoever reviews this after me: we could be tempted to use random_choice but probability is not really a well defined probability (it can be greater than 1). I suggest we leave it as it is for the sake of simplicity of the code.
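For instance (hypothetical numbers): on the second move against an opponent who has cooperated, with equal scores, the 1.75 branch gives probability = .25 + 2/2 - 0 + 0/100 + 4/2 = 3.25, which is well above 1, so random.random() <= probability always holds and the player cooperates.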


opponent = axelrod.Defector()
actions = [(C, D)] + [(D, D)] * 20
self.versus_test(opponent, expected_actions=actions, attrs={"num_turns_after_good_defection": 0,
Member

We need to fix the PEP8 on all of these. Two options; I have a slight preference for option 2:

Option 1:

    opponent = axelrod.Defector()
    actions = [(C, D)] + [(D, D)] * 20
    self.versus_test(opponent, expected_actions=actions,
                     attrs={"num_turns_after_good_defection": 0,
                            "one_turn_after_good_defection_ratio": 5,
                            "two_turns_after_good_defection_ratio": 0,
                            "one_turn_after_good_defection_ratio_count": 1,
                            "two_turns_after_good_defection_ratio_count": 1})

Option 2:

    opponent = axelrod.Defector()
    actions = [(C, D)] + [(D, D)] * 20
    expected_attrs = {"num_turns_after_good_defection": 0,
                      "one_turn_after_good_defection_ratio": 5,
                      "two_turns_after_good_defection_ratio": 0,
                      "one_turn_after_good_defection_ratio_count": 1,
                      "two_turns_after_good_defection_ratio_count": 1}
    self.versus_test(opponent, expected_actions=actions, attrs=expected_attrs)

"two_turns_after_good_defection_ratio": 0,
"one_turn_after_good_defection_ratio_count": 1,
"two_turns_after_good_defection_ratio_count": 1})

Member

@MHakem can you add this test please.

return C
self.num_turns_after_good_defection = 1
return D
elif (current_score[0] / ((len(self.history)) + 1)) >= 1.75:
Member

This does not need to be an elif: just an if. Also include a blank line to separate things out a bit (just after the return D).
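I.e. (sketch):

        self.num_turns_after_good_defection = 1
        return D

    if (current_score[0] / (len(self.history) + 1)) >= 1.75: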

@drvinceknight (Member)

For completeness, here are some more fingerprint differences which show agreement to within a percent. This, coupled with careful examination of the implementation, means that I'm now sure this is correctly implemented.

  • reps = 20k, opponents = axl.basic_strategies:

[image]

  • reps = 20k, opponents = random spectrum:

[image]

@drvinceknight (Member) left a comment

One final thing from me: could you modify the table here: https://github.com/Axelrod-Python/Axelrod/blob/master/docs/reference/overview_of_strategies.rst#axelrods-second-tournament

to note that the strategy is implemented (see how the other implemented strategies are marked).

@drvinceknight (Member)

I believe this is now an accurate implementation of the Fortran strategy, based on the fingerprints but more so because @Nikoleta-v3 and I very carefully examined the original code.

Fortran strategy K67R from Axelrod's second tournament.
Reversed engineered by: Owen Campbell, Will Guo and Mansour Hakem.

The strategy starts by cooperating this strategy has 3 states.
Member

The strategy starts by cooperating and has 3 states.


The strategy starts by cooperating this strategy has 3 states.

At the start of the strategy it updates it's states:
Member

its rather than it's

"""
Calculates the ratio values for the one_turn_after_good_defection_ratio,
two_turns_after_good_defection_ratio and the probability values,
as well as sets the value of num_turns_after_good_defection.
Member

'and' instead of 'as well as'


def strategy(self, opponent: Player) -> Action:

if len(self.history) == 0:
Member

might be neater as if not self.history

@meatballs (Member)

Looking good! Just some minor stuff from me, mainly grammatical nitpicking!!

@meatballs merged commit 3fd3250 into Axelrod-Python:master on Sep 3, 2017