Skip to content

Latest commit

 

History

History
156 lines (133 loc) · 5.65 KB

README.md

File metadata and controls

156 lines (133 loc) · 5.65 KB

Why choose Sheepshead as an Example?

In this tutorial we use a certain variant of the bavarian card game sheeps head(Schafkopf) to learn and understand basic machine learning/reinforcement learning techniques. Sheepshead:

  • is an imperfect information game (you do not know the cards of your enemys)
  • this makes it much harder compared to chess, go etc.
  • you have many diffrent possibilities of dealing 32 cards to 4 players:
    • 32 x 24 x 16 = 12.288
  • you have to decide in step1 what game(DURCHMARSCH OR NOT) you choose (for RAMSCH) which is very hard to learn!
  • if you extend this game to RUF you have to learn to play with another player in a team.

Sheepshead Rules

Here is a quick overview of the rules. There are 4 player having each 8 cards. rules_schafkopf

The game consists of 2 phases:

  1. Declaration Phase
    • for simplicity only weg/weiter is allowed for each player
    • this means it is alwayse played RAMSCH
    • In this RAMSCH variant hearts is trump!
    • each player plays on his own.
  2. Playing Phase
    • ech player plays a card after each other
    • afer a round the highest card is evaulated
    • the player with the highest card gets the trick(all 4 cards)
    • if all cards are played you count the points in your tricks
    • normally you aim to have no tricks at all!
    • for more details see here
  3. Money Phase : Ramsch
    • Case 1 DURCHMARSCH: if one player has >90 points
      • the one with >90 points wins 45 cents
      • all others loose 15 cents
    • Case 2 NORMAL GAME: nobody has >90 points
      • if you have 0 points you get 15cents
      • if you have <30 points you get 10cents
      • if you have >30 points and you did not loose you get 5cents
      • the looser gets all minus cents what other players earned

Monte Carlo Tree Search

A Monte Carlo Tree Search algorithm consists of the following 4 steps mcts_algorithm

An example for Schafkopf would be: mcts_nodes_explained

if you want to save the mcts trees use option "save_tree": 1

Subsampling:

  • Unfortunately Sheepshead is an imperfect information game.
  • This means we do not know the cards of the other players.
  • That is why we can estimate (subsample) the cards of our enemys.
  • This means we can take a best guess (based on the cards that were played already) mcts_nodes_explained

Monte Carlo Tree Search Player

Player type: "MCTS_OFF_10_50"

  • OFF:
    • OFF means no subsampling: it is assumed that the MCTS Player knows all other cards
    • if you put a number here e.g. MCTS_20_10_50 --> the other player cards are subsampled 20 times and for each subsample the best card is evaluated
  • 10:
    • number of playouts (the higher the better)
    • if this number is higher the tree is more deep
  • 50: ucb constant
    • if you choose 1 --> exploitation
    • if you choose 1000 --> exploration

Exploration vs Exploitation

Let's take this example:

ddd

  • Player 1 has ucb_const = 1: MCTS_20_50_1: ddd ddd

  • If Player has ucb_const = 1000: MCTS_20_50_1000: ddd ddd

  • You can see that the tree for ucb=1 is more deep and the apprently best option GA is exploited more!

  • The GA has 23 visits and many other options as e.g. HA, H9, EU is only visited once

  • For ucb=1000 you can see that the tree is not so deep but more spread out.

  • The GA has 7 visits which is only one more compared to all other nodes

  • You should find a good balance between exploraion and exploitation that is why I advise to use ucb_const=200: ddd

Proximal Policy Optimization

will soon come

Transformers Network

will come in the future

Example Game

Max's hand: [EO, GO, EU, GU, HU, EA, E9, SA] type: RANDOM
Lea's hand: [HO, H9, EK, GA, GX, GK, SX, S8] type: RANDOM
Jo's hand: [SU, HA, H8, H7, EX, G9, G7, S7] type: RANDOM
Tim's hand: [SO, HX, HK, E8, E7, G8, SK, S9] type: RANDOM
0-0: Max RANDOM declares weg
0-1: Lea RANDOM declares weg
0-2: Jo RANDOM declares weg
0-3: Tim RANDOM declares weg

1-4: Max RANDOM plays E9
1-5: Lea RANDOM plays H9
1-6: Jo RANDOM plays EX
1-7: Tim RANDOM plays S9
        Winner: Lea with H9 --> 10

2-8: Lea RANDOM plays GX
2-9: Jo RANDOM plays HA
2-10: Tim RANDOM plays G8
2-11: Max RANDOM plays GU
        Winner: Max with GU --> 23

3-12: Max RANDOM plays HU
3-13: Lea RANDOM plays S8
3-14: Jo RANDOM plays H8
3-15: Tim RANDOM plays SK
        Winner: Max with HU --> 6

4-16: Max RANDOM plays EA
4-17: Lea RANDOM plays SX
4-18: Jo RANDOM plays G7
4-19: Tim RANDOM plays E8
        Winner: Max with EA --> 21

5-20: Max RANDOM plays GO
5-21: Lea RANDOM plays EK
5-22: Jo RANDOM plays SU
5-23: Tim RANDOM plays SO
        Winner: Max with GO --> 12

6-24: Max RANDOM plays SA
6-25: Lea RANDOM plays HO
6-26: Jo RANDOM plays S7
6-27: Tim RANDOM plays E7
        Winner: Lea with HO --> 14

7-28: Lea RANDOM plays GK
7-29: Jo RANDOM plays G9
7-30: Tim RANDOM plays HX
7-31: Max RANDOM plays EO
        Winner: Max with EO --> 17

8-32: Max RANDOM plays EU
8-33: Lea RANDOM plays GA
8-34: Jo RANDOM plays H7
8-35: Tim RANDOM plays HK
        Winner: Max with EU --> 17

Max's hand: [] offhand: [[GU, GX, HA, G8], [HU, S8, H8, SK], [EA, SX, G7, E8], [GO, EK, SU, SO], [EO, GK, G9, HX], [EU, GA, H7, HK]] points: 96 --> 45$
Lea's hand: [] offhand: [[E9, H9, EX, S9], [SA, HO, S7, E7]] points: 24 --> -15$
Jo's hand: [] offhand: [] points: 0 --> -15$
Tim's hand: [] offhand: [] points: 0 --> -15$