Add error count to contrib/ci_engine #802

Closed
Debilski opened this issue Jun 27, 2024 · 9 comments · Fixed by #813

Comments

@Debilski
Member

The CI engine should take note when a bot fails with a fatal error to help with debugging. (Also the seed should be stored with the game info.)

@otizonaizit
Member

Well yes, the CI engine needs a serious revamp. Should I work on it, or are you already doing stuff? I'd like it to at least read the same config file as pelita-server, so that we don't have to specify the list of players twice.

@Debilski
Member Author

Yeah, I’ll do some minor refactorings later to make it more useful. Scores so far:

                # name      matches  score (1/0/-1)
                aspp2021_4  219   0.88
                aspp2023_2  217   0.75
                aspp2022_2  218   0.64
                aspp2021_3  218   0.61
                aspp2021_1  217   0.45
                aspp2019_3  218   0.37
                aspp2022_0  217   0.31
                aspp2023_4  217   0.18
                aspp2021_0  217  -0.02
                aspp2021_2  218  -0.32
                aspp2019_1  218  -0.37
                aspp2022_1  217  -0.41
                aspp2019_4  217  -0.47
                aspp2022_4  218  -0.49
                aspp2022_3  217  -0.58
                aspp2019_2  217  -0.59
                aspp2019_0  218  -0.94

@otizonaizit
Member

Oh, and the TU players are not performing at all? Wouldn't the results be easier to interpret if, instead of the score, one showed the win percentage? Otherwise it is difficult to distinguish a bot that draws all the time from a bot that wins or loses with 50% probability. The win percentage would also be more independent of the number of matches than the score is.
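
To make the ambiguity concrete, here is a minimal sketch. It assumes the score is the mean result per match with win = 1, draw = 0, loss = -1, as the "(1/0/-1)" header suggests. The function names `score` and `win_rate` and the two example records are made up for illustration and are not taken from ci_engine.

```python
def score(wins, draws, losses):
    """Mean result per match, counting win=1, draw=0, loss=-1 (assumed definition)."""
    matches = wins + draws + losses
    return (wins - losses) / matches


def win_rate(wins, draws, losses):
    """Fraction of all matches that were won."""
    matches = wins + draws + losses
    return wins / matches


# Two hypothetical bots with the same score but very different behaviour.
always_draws = (0, 100, 0)   # draws every match
coin_flipper = (50, 0, 50)   # wins or loses with equal probability

for name, record in [("always_draws", always_draws), ("coin_flipper", coin_flipper)]:
    print(f"{name}: score={score(*record):.2f} win_rate={win_rate(*record):.0%}")
```

Both records give a score of 0, while the win rate separates them (0% against 50%). Showing wins, draws and losses next to the score would resolve the same ambiguity.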

@Debilski
Member Author

(I forgot to add the TU players to the config)

Yeah, the output has a much bigger table with all this info (more or less). But it is 2d and needs to be shrunk :)

@Debilski
Member Author


┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                         ┃ # Matches ┃ # Wins ┃ # Draws ┃ # Losses ┃ Score                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ aspp2021_4                   │ 262       │ 233    │ 6       │ 23       │ 0.8015267175572519   │
│ tube2024_0                   │ 139       │ 118    │ 8       │ 13       │ 0.7553956834532374   │
│ bayes_avengers               │ 139       │ 116    │ 8       │ 15       │ 0.7266187050359713   │
│ aspp2023_2                   │ 248       │ 201    │ 8       │ 39       │ 0.6532258064516129   │
│ aspp2021_3                   │ 251       │ 191    │ 3       │ 57       │ 0.5338645418326693   │
│ aspp2022_2                   │ 266       │ 200    │ 1       │ 65       │ 0.5075187969924813   │
│ tube2024_1                   │ 139       │ 99     │ 4       │ 36       │ 0.45323741007194246  │
│ shake_dat_botty              │ 139       │ 95     │ 3       │ 41       │ 0.38848920863309355  │
│ aspp2021_1                   │ 256       │ 174    │ 6       │ 76       │ 0.3828125            │
│ aspp2019_3                   │ 257       │ 169    │ 3       │ 85       │ 0.32684824902723736  │
│ trilobots                    │ 138       │ 80     │ 7       │ 51       │ 0.21014492753623187  │
│ aspp2022_0                   │ 251       │ 146    │ 8       │ 97       │ 0.1952191235059761   │
│ tube2024_3                   │ 139       │ 80     │ 3       │ 56       │ 0.17266187050359713  │
│ aspp2023_4                   │ 258       │ 131    │ 14      │ 113      │ 0.06976744186046512  │
│ too_bot_to_handle            │ 140       │ 72     │ 2       │ 66       │ 0.04285714285714286  │
│ aspp2021_0                   │ 242       │ 97     │ 10      │ 135      │ -0.15702479338842976 │
│ drbabydangers                │ 138       │ 48     │ 18      │ 72       │ -0.17391304347826086 │
│ group4_2022_this_time_moving │ 138       │ 53     │ 6       │ 79       │ -0.18840579710144928 │
│ dogues_de_bordeaux           │ 138       │ 43     │ 4       │ 91       │ -0.34782608695652173 │
│ aspp2021_2                   │ 256       │ 68     │ 29      │ 159      │ -0.35546875          │
│ tube2024_2                   │ 139       │ 41     │ 6       │ 92       │ -0.3669064748201439  │
│ aspp2019_1                   │ 243       │ 35     │ 67      │ 141      │ -0.43621399176954734 │
│ aspp2022_1                   │ 266       │ 53     │ 32      │ 181      │ -0.48120300751879697 │
│ aspp2022_4                   │ 244       │ 29     │ 43      │ 172      │ -0.5860655737704918  │
│ aspp2019_4                   │ 245       │ 48     │ 1       │ 196      │ -0.6040816326530613  │
│ aspp2022_3                   │ 251       │ 44     │ 6       │ 201      │ -0.6254980079681275  │
│ aspp2019_2                   │ 256       │ 35     │ 2       │ 219      │ -0.71875             │
│ aspp2019_0                   │ 139       │ 2      │ 8       │ 129      │ -0.9136690647482014  │
└──────────────────────────────┴───────────┴────────┴─────────┴──────────┴──────────────────────┘

aspp2019_0 is definitely a little underwhelming. (They remove the nodes occupied by enemy bots from the graph and simply stop when that leaves the graph disconnected. I’m close to helping them out a little so that they perform better. :) )
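
The failure mode can be sketched with a tiny graph. This is not the aspp2019_0 code, only an illustration under assumptions: the maze is a plain adjacency dict, and the helpers `shortest_path` and `next_step` are hypothetical names. The fallback shown (ignore the enemies when no enemy-free route exists) is just one possible way to avoid standing still.

```python
from collections import deque


def shortest_path(graph, start, goal, blocked=frozenset()):
    """Breadth-first search from start to goal, skipping nodes in `blocked`.

    Returns the list of nodes from start to goal, or None if goal is unreachable.
    """
    if start in blocked or goal in blocked:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in blocked and neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def next_step(graph, start, goal, enemies):
    """Prefer a path that avoids enemies; fall back to the plain shortest path."""
    path = shortest_path(graph, start, goal, blocked=frozenset(enemies))
    if path is None:
        # Removing the enemy nodes disconnected the graph: ignore enemies instead of stopping.
        path = shortest_path(graph, start, goal)
    if path is None or len(path) < 2:
        return start
    return path[1]


# A corridor a - b - c: an enemy on b cuts the bot at a off from the target at c.
corridor = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(shortest_path(corridor, "a", "c", blocked={"b"}))  # None: no enemy-free route
print(next_step(corridor, "a", "c", enemies={"b"}))      # b: falls back and still moves
```

Whether walking toward the enemy is a good fallback is a separate question; picking a different target or retreating would work just as well for avoiding a bot that never moves.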

@otizonaizit
Member

otizonaizit commented Jul 1, 2024 via email

@Debilski
Member Author

Debilski commented Jul 1, 2024

But the logic already does that. It just takes a while.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Name                         ┃ # Matches ┃ # Wins ┃ # Draws ┃ # Losses ┃ Score  ┃ ELO  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ aspp2021_4                   │ 669       │ 569    │ 25      │ 75       │  0.738 │ 1965 │
│ tube2024_0                   │ 669       │ 563    │ 35      │ 71       │  0.735 │ 1993 │
│ bayes_avengers               │ 669       │ 561    │ 37      │ 71       │  0.732 │ 1920 │
│ aspp2023_2                   │ 669       │ 524    │ 22      │ 123      │  0.599 │ 1839 │
│ tube2024_1                   │ 668       │ 470    │ 27      │ 171      │  0.448 │ 1858 │
│ aspp2021_3                   │ 670       │ 479    │ 11      │ 180      │  0.446 │ 1722 │
│ shake_dat_botty              │ 670       │ 464    │ 12      │ 194      │  0.403 │ 1680 │
│ aspp2022_2                   │ 670       │ 457    │ 11      │ 202      │  0.381 │ 1767 │
│ aspp2021_1                   │ 670       │ 422    │ 10      │ 238      │  0.275 │ 1645 │
│ aspp2019_3                   │ 669       │ 418    │ 9       │ 242      │  0.263 │ 1616 │
│ trilobots                    │ 668       │ 401    │ 22      │ 245      │  0.234 │ 1697 │
│ aspp2022_0                   │ 670       │ 398    │ 21      │ 251      │  0.219 │ 1528 │
│ tube2024_3                   │ 671       │ 400    │ 10      │ 261      │  0.207 │ 1633 │
│ too_bot_to_handle            │ 668       │ 359    │ 6       │ 303      │  0.084 │ 1566 │
│ aspp2023_4                   │ 669       │ 321    │ 25      │ 323      │ -0.003 │ 1415 │
│ group4_2022_this_time_moving │ 669       │ 304    │ 22      │ 343      │ -0.058 │ 1492 │
│ aspp2021_0                   │ 669       │ 262    │ 19      │ 388      │ -0.188 │ 1362 │
│ drbabydangers                │ 671       │ 222    │ 84      │ 365      │ -0.213 │ 1381 │
│ dogues_de_bordeaux           │ 669       │ 254    │ 12      │ 403      │ -0.223 │ 1454 │
│ aspp2021_2                   │ 669       │ 198    │ 47      │ 424      │ -0.338 │ 1327 │
│ tube2024_2                   │ 668       │ 187    │ 35      │ 446      │ -0.388 │ 1272 │
│ aspp2019_1                   │ 669       │ 103    │ 179     │ 387      │ -0.425 │ 1262 │
│ aspp2022_1                   │ 669       │ 123    │ 97      │ 449      │ -0.487 │ 1228 │
│ aspp2022_4                   │ 669       │ 81     │ 104     │ 484      │ -0.602 │ 1182 │
│ aspp2022_3                   │ 669       │ 121    │ 14      │ 534      │ -0.617 │ 1147 │
│ aspp2019_4                   │ 669       │ 120    │ 8       │ 541      │ -0.629 │ 1146 │
│ aspp2019_2                   │ 670       │ 109    │ 7       │ 554      │ -0.664 │ 1128 │
│ aspp2019_0                   │ 671       │ 10     │ 29      │ 632      │ -0.927 │  778 │
└──────────────────────────────┴───────────┴────────┴─────────┴──────────┴────────┴──────┘
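
The table is sorted by score, yet the ELO column does not follow the same order (tube2024_0 has the highest rating but not the highest score). A rating of this kind depends on the strength of the opponents, which a plain win/loss average does not. The thread does not say how ci_engine computes the column, so the following is only a sketch of the standard Elo update; the K-factor of 32 and the function names are assumptions.

```python
def expected_score(rating_a, rating_b):
    """Expected result for A against B under the standard Elo model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, result_a, k=32.0):
    """Return the new (rating_a, rating_b) after one match.

    result_a is 1.0 for a win of A, 0.5 for a draw and 0.0 for a loss.
    k=32 is a common choice, not a value taken from ci_engine.
    """
    delta = k * (result_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Beating a stronger opponent gains more points than beating a weaker one.
print(elo_update(1500, 1900, 1.0))
print(elo_update(1500, 1100, 1.0))
```

Under this model, a win against a much higher-rated bot moves the rating by almost the full K, while a win against a much lower-rated bot barely changes it.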

@Debilski
Member Author

Debilski commented Jul 1, 2024

(By the way, parallelisation is pretty much a non-issue. Thanks to having a proper database, we can just run a bunch of ci_engines at the same time.)
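
A rough sketch of why a shared database makes this easy, assuming an SQLite file and a simplified, made-up `games` schema (the real ci_engine schema may differ). Threads stand in for separate ci_engine processes here; each one opens its own connection and appends its results.

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    """Create the (hypothetical) results table if it does not exist yet."""
    conn = sqlite3.connect(path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS games (player1 TEXT, player2 TEXT, result INTEGER)"
            )
    finally:
        conn.close()


def worker(path, matches):
    """Stand-in for one ci_engine: open its own connection and record results."""
    conn = sqlite3.connect(path, timeout=30)  # wait for the write lock instead of failing
    try:
        for player1, player2, result in matches:
            with conn:  # one transaction per recorded game
                conn.execute("INSERT INTO games VALUES (?, ?, ?)", (player1, player2, result))
    finally:
        conn.close()


def run_demo():
    """Run two workers against the same database file and count the stored games."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ci.sqlite")
        init_db(path)
        batches = [
            [("bot_a", "bot_b", 0), ("bot_a", "bot_c", 1)],
            [("bot_b", "bot_c", -1), ("bot_c", "bot_a", 0)],
        ]
        threads = [threading.Thread(target=worker, args=(path, batch)) for batch in batches]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        conn = sqlite3.connect(path)
        try:
            return conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
        finally:
            conn.close()


print(run_demo())
```

The workers never talk to each other; the database serialises the writes, and any statistics are computed from the combined table afterwards.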

@Debilski
Member Author

Debilski commented Aug 6, 2024

For reference, it is now possible to extract all games with errors from the database:

select * from games where json_extract(final_state, '$.num_errors') != '[0,0]';
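
The same query can be run from Python with the standard `sqlite3` module. In this sketch only the `games` table, the `final_state` column and the `num_errors` key come from the query above; the `id` and `seed` columns and the two rows are invented for the example, and an SQLite build with the JSON functions is assumed.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, seed INTEGER, final_state TEXT)")

# Made-up rows: the second game has one error for the first team.
rows = [
    (1, 1234, {"num_errors": [0, 0]}),
    (2, 5678, {"num_errors": [1, 0]}),
]
for game_id, seed, state in rows:
    conn.execute(
        "INSERT INTO games VALUES (?, ?, ?)",
        (game_id, seed, json.dumps(state, separators=(",", ":"))),
    )

# json_extract returns the array as JSON text, so it is compared to the string '[0,0]'.
games_with_errors = conn.execute(
    "SELECT id, seed FROM games "
    "WHERE json_extract(final_state, '$.num_errors') != '[0,0]'"
).fetchall()
print(games_with_errors)
conn.close()
```

Selecting the seed together with the game id is what makes a failing game reproducible, assuming the seed is stored alongside the game as requested at the top of this issue.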
