Towards the end of August 2024, Lichess forbade people from choosing their color in lobby seeks.
Time to investigate how much of an effect it had using the Lichess database dumps.

In [1]:
import pandas as pd


def read_csv(file: str) -> pd.DataFrame:
    df = pd.read_csv(
        file,
        engine="pyarrow",
        dtype_backend="pyarrow",
    )
    df["white"] = df["white"].str.lower()
    df["black"] = df["black"].str.lower()
    return df


def split_black_white(df: pd.DataFrame):
    return df.groupby(["white"])["id"].count(), df.groupby(["black"])["id"].count()


def combine_black_white(white_games: pd.Series, black_games: pd.Series) -> pd.DataFrame:
    games = pd.merge(
        white_games, black_games, left_index=True, right_index=True, how="outer"
    )
    games.columns = ["white", "black"]
    games.fillna(0, inplace=True)
    games["total"] = games["white"] + games["black"]
    return games


df = read_csv("F:\\lichess_dbs\\lichess_db_standard_rated_2024-08.pgn.zst.csv")

In [2]:
games = combine_black_white(*split_black_white(df))


A picky player is an active player* who has played more than 80% of their games as a single color.
###### *An active player is a player who has played at least 10 games in a given month. All later references to players or the playerbase mean active players only.
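
To make the definition concrete, here is a minimal pandas-free sketch of the same rule on made-up data. The player names, the `classify` helper and its parameters are assumptions for illustration only. The thresholds mirror the inclusive 0.2 to 0.8 band used in the filter, so a player at exactly 80% is not counted as picky.

```python
from collections import Counter

# Hypothetical toy games as (white, black) pairs; not real Lichess data.
toy_games = (
    [("alice", "bob")] * 9      # alice almost always has white
    + [("bob", "alice")] * 1
    + [("carol", "dave")] * 5   # carol and dave alternate evenly
    + [("dave", "carol")] * 5
    + [("erin", "frank")] * 3   # too few games to count as active
)


def classify(games, min_games=10, low=0.2, high=0.8):
    """Label each player as 'inactive', 'picky' or 'balanced'."""
    white = Counter(w for w, _ in games)
    black = Counter(b for _, b in games)
    labels = {}
    for player in white.keys() | black.keys():
        total = white[player] + black[player]
        if total < min_games:
            labels[player] = "inactive"
            continue
        black_share = black[player] / total
        # Picky means the black share falls outside the inclusive [low, high] band.
        is_picky = black_share < low or black_share > high
        labels[player] = "picky" if is_picky else "balanced"
    return labels


labels = classify(toy_games)
for name in sorted(labels):
    print(name, labels[name])
```

Here alice (90% white) and bob (90% black) come out as picky, carol and dave as balanced, and erin and frank as inactive.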

In [3]:
picky = games.loc[(games['total']>=10) & (~(games['black']/games['total']).between(0.2,0.8))]
print('picky players:', len(picky), 'total players:', len(games.loc[games['total']>=10]))
print('picky players games:', picky['total'].sum(), 'total players games:', games.loc[games['total']>=10]['total'].sum())

picky players: 5232 total players: 1154317
picky players games: 588553 total players games: 182163175


There were 5232 picky players in August. That is 0.45% of the active players that month.

In [5]:
5232 / 1154317 * 100

0.453255041725973

They accounted for 0.32% of the games played by active players in August.

In [6]:
588553  / 182163175 * 100

0.3230910967598144

Going into this, I assumed that among the picky players, those favouring white far outnumbered those who preferred black. Turns out I was right: about 73.5% of them prefer white.

In [7]:
prefers_white = picky.loc[(picky['black']/picky['total'])<=0.2]
prefers_black = picky.loc[(picky['black']/picky['total'])>=0.8]
print(len(prefers_white), len(prefers_white) / (len(prefers_white)+len(prefers_black)) * 100)

3846 73.5091743119266


Do it all again for September.

In [8]:
del df
df = read_csv(
    "F:\\lichess_dbs\\lichess_db_standard_rated_2024-09.pgn.zst.csv"
)  # Reuse variables because RIP my RAM

In [9]:
games = combine_black_white(*split_black_white(df))

Some players still managed to remain picky in September by finding the yin to their yang and challenging them directly, over and over ☯️

In [10]:
picky_sept = games.loc[(games['total']>=10) & (~(games['black']/games['total']).between(0.2,0.8))]
print('picky players:', len(picky_sept), 'total players:', len(games.loc[games['total']>=10]))
print('picky players games:', picky_sept['total'].sum(), 'total players games:', games.loc[games['total']>=10]['total'].sum())

picky players: 420 total players: 1165865
picky players games: 21032 total players games: 173066143


There were still 420 picky players in September. That is 0.036% of the active players in September. (A 92% drop from August)🔻

In [12]:
print(420 / 1165865 * 100)
print((0.45-0.036)/0.45 * 100)

0.036024754152496216
92.0


They were responsible for 0.012% of the games played by active players in September. (A 96% drop from August)🔻

In [15]:
print(21032 / 173066143 * 100)
print((0.32-0.012)/0.32 * 100)

0.012152579144263936
96.25
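
The two drop figures above are computed from percentages that were already rounded. As a cross-check, here is a minimal sketch that recomputes them from the raw counts printed earlier. The helper names `share_pct` and `relative_drop_pct` are made up for this example.

```python
def share_pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return part / whole * 100


def relative_drop_pct(before, after):
    """Relative decrease from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Raw counts copied from the August and September outputs above.
aug_player_share = share_pct(5232, 1154317)
sep_player_share = share_pct(420, 1165865)
aug_game_share = share_pct(588553, 182163175)
sep_game_share = share_pct(21032, 173066143)

player_drop = relative_drop_pct(aug_player_share, sep_player_share)
game_drop = relative_drop_pct(aug_game_share, sep_game_share)
print(round(player_drop, 1), round(game_drop, 1))
```

Both values should land within a fraction of a percentage point of the 92% and 96% quoted above, so the rounding does not change the picture.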


Of the 420 picky players in September, 294 (=70%) were not among August's picky players. Either they developed a sudden urge to play one color in September, or they returned from a hiatus, or their accounts are brand new and trying to avoid repaying their color debt.

In [16]:
print('new picky players in Sept:', len(picky_sept.loc[~picky_sept.index.isin(picky.index)]))
print(294 / 420 * 100)

new picky players in Sept: 294
70.0


A reformer is a former picky player who was active in September and is no longer picky, having adopted the ways of forced 50/50.

In [23]:
reformers = games.loc[(games['total']>=10) & (games.index.isin(picky.index)) & (~games.index.isin(picky_sept.index))]
print(len(reformers), reformers['total'].sum())

3429 428864


Of the 5232 picky players in August, 3429 (=65.5%) have reformed (!).

In [18]:
print(3429 / 5232 * 100)

65.53899082568807


What about the remaining 1803 picky players?
1. 773 did not play a single game in September. Maybe they were unwilling to adapt.
2. 904 never became active in September (fewer than 10 games). Maybe they tried some games, saw they could not force their choice and gave up.
3. 126 remained picky in September as well, as detailed earlier ☯️


In [39]:
print(len(picky.loc[~picky.index.isin(games.index)])) # August picky players that have not played in September
print(len(games.loc[(games.index.isin(picky.index) & (games['total'] < 10))])) # August picky players that did not become active in September
print(len(games.loc[(games.index.isin(picky.index) & (games.index.isin(picky_sept.index)))])) # Remained picky

773
904
126
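
The four groups (reformed, absent, inactive, still picky) should cover every August picky player exactly once. Here is a minimal sketch of that bookkeeping with plain sets on made-up data. The function name `partition_august_picky` and the toy players are assumptions for illustration, not part of the notebook. The last line checks that the counts reported above add up.

```python
def partition_august_picky(picky_aug, sept_totals, picky_sep, min_games=10):
    """Split August's picky players by what they did in September.

    picky_aug   : set of players who were picky in August
    sept_totals : dict mapping player -> number of games played in September
    picky_sep   : set of players who were picky in September
    """
    absent = {p for p in picky_aug if p not in sept_totals}
    inactive = {
        p for p in picky_aug if p in sept_totals and sept_totals[p] < min_games
    }
    still_picky = picky_aug & picky_sep
    # Everyone left was active in September and is no longer picky.
    reformed = picky_aug - absent - inactive - still_picky
    return {
        "absent": absent,
        "inactive": inactive,
        "still_picky": still_picky,
        "reformed": reformed,
    }


# Hypothetical toy data, not real Lichess accounts.
picky_aug = {"a", "b", "c", "d"}
sept_totals = {"b": 4, "c": 30, "d": 25, "e": 12}
picky_sep = {"c", "e"}  # "e" is newly picky, so it is not part of the split

groups = partition_august_picky(picky_aug, sept_totals, picky_sep)

# The groups are disjoint and cover every August picky player.
assert sum(len(g) for g in groups.values()) == len(picky_aug)

# Same check on the counts reported in this notebook.
assert 3429 + 773 + 904 + 126 == 5232
```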


Endnote: Because we are limited to data from the database exports, we can only look at rated standard games. Casual and anonymous games are outside our purview.
Likewise, we cannot look at games played against the Lichess AI (which I saw some accounts use as a sort of cope).