# Introduction to basic agents and games
This tutorial gives a brief introduction to the basic agents and payoff matrices implemented in the tomsup package. It gives an overview of each agent's decision-making function and its parameters. For an introduction to the more complex theory of mind (ToM) agent, see *introduction_to_tom.ipynb*. For more detail on the individual agents, see `ts.valid_agents()`.

In [2]:
# Assuming you are in the GitHub folder, change the path (not relevant if tomsup is installed via pip)
import os
os.chdir("..") # go out of the tutorials folder
import tomsup as ts # and import tomsup

In [None]:
# Create the competitive penny game payoff matrix; the cells below refer to it as `penny`
penny = ts.PayoffMatrix(name = "penny_competitive")

## The Random Bias (RB) Agent
The RB agent is the simplest possible agent. It simply selects randomly between options 0 and 1, with a specified bias parameter. The bias parameter must be between 0 and 1, and specifies the probability of the RB agent choosing 1. If nothing else is specified, RB uses a default bias parameter of 0.5.
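
The sampling rule described above can be sketched in plain Python, independent of tomsup. This is only an illustration of a biased coin flip, not the package's implementation; the function name `rb_choice` and the `rng` argument are assumptions introduced for this example.

```python
import random


def rb_choice(bias=0.5, rng=random):
    """Return 1 with probability `bias`, otherwise 0 (hypothetical helper)."""
    if not 0 <= bias <= 1:
        raise ValueError("bias must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this is true with probability `bias`
    return 1 if rng.random() < bias else 0


# Simulate many rounds with a seeded generator to check the long-run share of 1s
rng = random.Random(0)
choices = [rb_choice(0.8, rng) for _ in range(10_000)]
share_of_ones = sum(choices) / len(choices)
print(share_of_ones)  # should land close to the bias of 0.8
```

Note that the opponent's previous choice plays no role here, which is why the RB agent behaves the same way in every round.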

In [None]:
#Create the RB agent
Sir_RB = ts.RB(bias = 0.8)
#Have it make its choice in the first round
Sir_RB.compete(p_matrix = penny, agent = 0, op_choice = None)
#Have it make its choice in the second round.
Sir_RB.compete(p_matrix = penny, agent = 0, op_choice = 1)

## The Tit-for-Tat (TFT) Agent
The TFT agent rewards cooperative behavior with cooperation, and punishes deceptive behavior with deception. This is operationalized generally as choosing what the opponent chose in the last round. The TFT agent has one parameter: the probability of it copying the opponent. By default, this probability is 1, but it can be lowered to allow for probabilistic behavior.
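
The copying rule can be sketched in plain Python as follows. This is a minimal illustration rather than tomsup's code. Two details are assumptions made for this sketch because the text above does not specify them: when the agent does not copy, it picks the opposite option, and in the first round (no previous opponent choice) it picks at random. The name `tft_choice` is hypothetical.

```python
import random


def tft_choice(op_choice, copy_prob=1.0, rng=random):
    """Copy the opponent's last choice with probability `copy_prob` (hypothetical helper)."""
    if op_choice is None:
        # ASSUMPTION: with no history, choose uniformly at random
        return rng.randint(0, 1)
    if rng.random() < copy_prob:
        return op_choice
    # ASSUMPTION: not copying means choosing the other of the two options
    return 1 - op_choice


print(tft_choice(op_choice=1))                 # always copies when copy_prob is 1
print(tft_choice(op_choice=1, copy_prob=0.8))  # copies in roughly 80% of rounds
```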

In [None]:
#Create the TFT agent
Sir_TFT = ts.TFT(copy_prob = 0.8)
#Have it make its choice in the first round
Sir_TFT.compete(p_matrix = penny, agent = 0, op_choice = None)
#Have it make its choice in the second round.
Sir_TFT.compete(p_matrix = penny, agent = 0, op_choice = 1)
#Have it make its choice in the third round.
Sir_TFT.compete(p_matrix = penny, agent = 0, op_choice = 1)

## The Win-Stay Lose-Switch (WSLS) Agent
The WSLS agent uses the Win-Stay Lose-Switch strategy: it selects the same option in the round after one it won, and switches options after a round it lost. Winning a round is operationalized generally as getting more points than the average of the payoff matrix, which is usually consistent with game-specific operationalizations. During the first round, the WSLS agent selects a random option. The WSLS agent has two parameters: its probability of staying when winning and its probability of switching when losing. By default, both parameters are 1, but they can be changed to allow for probabilistic behavior.
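
The win/lose rule can be sketched in plain Python. This is an illustration under stated assumptions, not tomsup's implementation: the payoff table below is a hypothetical matching-pennies style matrix for one player (indexed as `payoffs[own_choice][opponent_choice]`), and the helper names `mean_payoff` and `wsls_choice` are introduced only for this example.

```python
import random


def mean_payoff(payoffs):
    """Average over all cells of a payoff table; used as the win threshold."""
    values = [p for row in payoffs for p in row]
    return sum(values) / len(values)


def wsls_choice(prev_choice, prev_reward, threshold,
                prob_stay=1.0, prob_switch=1.0, rng=random):
    """Win-stay lose-switch with optional probabilistic behavior (hypothetical helper)."""
    if prev_choice is None:
        # First round: no previous outcome, so choose at random
        return rng.randint(0, 1)
    if prev_reward > threshold:
        # Won the last round: stay with probability prob_stay
        stay = rng.random() < prob_stay
    else:
        # Lost the last round: switch with probability prob_switch
        stay = not (rng.random() < prob_switch)
    return prev_choice if stay else 1 - prev_choice


# HYPOTHETICAL payoffs for one player, not taken from tomsup
payoffs = [[1, -1], [-1, 1]]
threshold = mean_payoff(payoffs)

print(wsls_choice(prev_choice=1, prev_reward=1, threshold=threshold))   # won, so stays
print(wsls_choice(prev_choice=1, prev_reward=-1, threshold=threshold))  # lost, so switches
```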

In [None]:
#Create the WSLS agent
Sir_WSLS = ts.WSLS(prob_stay = 0.9, prob_switch = 1)
#Have it make its choice in the first round
Sir_WSLS.compete(p_matrix = penny, agent = 0, op_choice = None)

#Have it make its choice in the second round.
Sir_WSLS.compete(p_matrix = penny, agent = 0, op_choice = 1)
#Have it make its choice in the third round.
Sir_WSLS.compete(p_matrix = penny, agent = 0, op_choice = 1)



## The Q-Learning (QL) Agent
The QL agent implements a simple reinforcement learning algorithm to update its beliefs about the value of the two choice options based on experience. Each turn, it updates only the believed value of the option it chose, depending on the reward it gets. It has two important model parameters. One is the learning rate, which must be between 0 and 1, and determines the size of the belief updates (where 0 is no update and 1 is a complete update to the last experienced value). The other is the behavioral temperature, which must be above 0, and which determines the degree to which the behavior of the QL agent is random. Lower temperature values result in more deterministic behavior, where the agent simply chooses the option with the highest believed value, while higher temperatures give more probabilistic behavior. During the first round, the QL agent makes a choice based on its starting beliefs, which can also be specified as a model parameter. By default, the QL agent uses a medium learning rate of 0.5, a very low temperature of 0.001, and agnostic starting beliefs of 0.5 for both options.
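
The two parameters can be illustrated with a small plain-Python sketch. It assumes a softmax (logistic) choice rule over the two believed values, which is a common way to implement a behavioral temperature but is not confirmed by the text above as tomsup's exact rule. The helper names `prob_of_one` and `ql_update` are hypothetical.

```python
import math


def prob_of_one(values, b_temp):
    """Softmax probability of choosing option 1 (ASSUMED choice rule)."""
    diff = (values[0] - values[1]) / b_temp
    if diff > 700:
        # Avoid overflow in math.exp; option 0 is overwhelmingly preferred
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def ql_update(values, choice, reward, learning_rate):
    """Move only the chosen option's value toward the received reward."""
    updated = list(values)
    updated[choice] += learning_rate * (reward - updated[choice])
    return updated


values = [0.9, 0.5]  # starting beliefs for options 0 and 1
p_one = prob_of_one(values, b_temp=0.3)
print(p_one)  # about 0.21: option 0 is favored, but not deterministically

# Suppose the agent chose option 0 and received a reward of -1
values_after = ql_update(values, choice=0, reward=-1, learning_rate=0.7)
print(values_after)  # only the first entry changes, to roughly -0.43
```

With a learning rate of 1 the chosen value would jump straight to the reward, and with a temperature near 0 the probability of choosing the higher-valued option approaches 1.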

In [None]:
#Create the QL agent
Sir_QL = ts.QL(learning_rate = 0.7, b_temp = 0.3, expec_val = [0.9, 0.5])
#Have it make its choice in the first round
Sir_QL.compete(p_matrix = penny, agent = 0, op_choice = None)
#Have it make its choice in the second round.
Sir_QL.compete(p_matrix = penny, agent = 0, op_choice = 1)
#Have it make its choice in the third round.
Sir_QL.compete(p_matrix = penny, agent = 0, op_choice = 1)
