# Clash Royale: What Deck is Best?

[Clash Royale](https://clashroyale.com/) is a free-to-play real-time strategy mobile game released by [Supercell](https://supercell.com/). After the success of their game [Clash of Clans](https://supercell.com/en/games/clashofclans/), they released Clash Royale in early 2016 using many of the same characters, items, and mechanics. In a match of Clash Royale, each player selects a deck of 8 'cards'. Each card corresponds to a character with unique stats and abilities. These cards cost 'elixir' to use, a currency that slowly replenishes throughout the match. The goal of the game is to use your cards to simultaneously destroy the opponent's towers and defend your own. Each player has a 'King Tower' which is flanked by two 'Princess Towers'. The player who destroys their opponent's 'King Tower' first wins. If time runs out, the player who has destroyed more of their opponent's Princess Towers wins.

<div style='width:100%;text-align:center;'>
    <img src='images/Game1.png' width=300px style='padding-right:20px;'>
    <img src='images/Game2.png' width=300px style='padding-left:20px;'>
</div>

There are many different types of matches that occur in Clash Royale, but I will be focusing on Ladder matches. The Ladder, also known as Trophy Road, is the central focus of the game, making up most of the player's progression. Ladder matches are 1v1 matches in which the winner wins 'trophies' and the loser loses trophies. Players start with 0 trophies, and they gain more by winning matches. The Ladder is divided into 20 arenas, which players unlock by reaching a certain number of trophies (e.g. Arena 1 requires 0 trophies, Arena 2 requires 300, ... Arena 20 requires 7500). When a player unlocks a new arena, they also unlock 4 to 8 new cards that they can add to their deck. At the start of Arena 1, the player only has 8 cards available, so every deck is the same; however, as more cards are unlocked, decks become very customizable.

Currently, I am in Arena 15, but I have been using the same deck since Arena 7. It used to perform quite well, but now it's outdated, so my progress has begun to stagnate. There are dozens more cards available to me now, so I wanted to create a new deck, but decision anxiety is preventing me from committing to anything. How would I know if a deck is good enough? Should I use a popular deck or create my own? What if switching out a couple cards would make me the best player in the world? The best way to answer these questions is through data.

## Data Collection and Parsing

There aren't a lot of datasets out there for Clash Royale matches, and the ones that exist are very outdated. This is an issue because Supercell recently [released](https://clashroyale.com/blog/release-notes/new-update-october-2022.html) new cards and mechanics, which would have meaningful effects on the data and conclusions. Thus, I chose to create my own by scraping Clash Royale's [developer API](https://developer.clashroyale.com/#/). This will allow me to get data on millions of matches from the past few days alone.

### Clan Approach

In Clash Royale, players can join a 'Clan' - a group of players that trade cards, chat, and fight wars against other Clans. My initial approach was to loop through hundreds of clans, get every player in each clan, and then scan their recent matches for Ladder matches fought since the most recent [balancing update](https://clashroyale.com/blog/release-notes/balance-changes.html) (December 7). (*Note: there was an update on December 11 that included a small balance change to one card, but since this was a very small change and I wanted to maximize data, I chose to include matches both before and after this update.*)

I first import `requests` to get the data, `datetime` to ensure matches were played after the 12/7 update, and `pandas` to use DataFrames.

```python
import requests
from datetime import datetime
import pandas as pd
```

I create constants for the API URL, my API auth token and resulting auth header, and the cutoff time.

```python
import os

API_URL = 'https://api.clashroyale.com/v1'
# Read the token from an environment variable instead of hard-coding a secret
# (the variable name CR_API_TOKEN is my own choice)
API_TOKEN = os.environ['CR_API_TOKEN']
AUTH_HEADER = {'Authorization': f'Bearer {API_TOKEN}'}
CUTOFF_TIME = datetime(2022, 12, 8) # Midnight December 8
```

I define a helper function to convert time from the format used by the API into a `datetime` object.

```python
def to_datetime(date_str):
    """
    Parameters:
        date_str (string): A date in the format given by the CR API
    Returns:
        date (datetime): The same date as a datetime object
    """

    return datetime.strptime(date_str, '%Y%m%dT%H%M%S.000Z')
```
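A quick sanity check of the format string: the helper applied to a made-up timestamp in the same shape. Because `'.000Z'` is matched literally, a timestamp with non-zero milliseconds would raise a `ValueError`; `to_datetime_lenient` is a hypothetical variant that uses `%f`, which accepts 1 to 6 fractional digits.

```python
from datetime import datetime

def to_datetime(date_str):
    # Same parsing as above: the '.000Z' suffix must match exactly
    return datetime.strptime(date_str, '%Y%m%dT%H%M%S.000Z')

def to_datetime_lenient(date_str):
    # Hypothetical variant: %f accepts any 1-6 digit fraction, so '.123Z' also parses
    return datetime.strptime(date_str, '%Y%m%dT%H%M%S.%fZ')

CUTOFF_TIME = datetime(2022, 12, 8)

sample = '20221210T153045.000Z'  # made-up timestamp in the API's format
parsed = to_datetime(sample)
print(parsed)                 # 2022-12-10 15:30:45
print(parsed >= CUTOFF_TIME)  # True
print(to_datetime_lenient('20221210T153045.123Z'))  # 2022-12-10 15:30:45.123000
```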

To avoid storing information for hundreds or thousands of clans in memory, I create a generator for clan tags. The `/clans` API endpoint returns information about clans, including the clan tag, which is a string that uniquely identifies a Clan. We will need this later. I choose to generate clans with at least 40 members (max for a clan is 50) because the API requires at least one filter, and because this will help maximize the number of players I can get per call to the API.

```python
def get_clans(num_clans):
    """
    Parameters:
        num_clans (int): The maximum number of clans to return tags for
    Yields:
        tag (string): Clan tag for a single clan
    """

    min_members = 40 # Clans with at least 40 members
    url = API_URL + f'/clans?minMembers={min_members}&limit={num_clans}'

    response = requests.get(url, headers=AUTH_HEADER).json()
    for clan in response['items']:
        yield clan['tag'] # Yield clan tags one at a time
```
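`get_clans` fetches all of its clans in a single response, so any memory savings from a generator happen on the caller's side only. A generator pays off more when results arrive in pages, because the next page is only requested if the caller keeps iterating. The sketch below assumes a paged response shape of `{'items': [...], 'paging': {'cursors': {'after': ...}}}` and an `after` query parameter; I have not verified these against the Clash Royale API, so treat them as assumptions. The `fetch` argument is a hypothetical stand-in for `requests.get(...).json()`, which keeps the sketch testable offline.

```python
from itertools import islice

def iter_clan_tags(fetch, params):
    """Lazily yield clan tags page by page.

    fetch(params) -> dict stands in for the HTTP call (assumption).
    Pages are assumed to look like {'items': [...], 'paging': {'cursors': {'after': ...}}}.
    """
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        page = fetch(params)
        for clan in page['items']:
            yield clan['tag']
        # Assumed cursor field; stop when there is no next page
        after = page.get('paging', {}).get('cursors', {}).get('after')
        if not after:
            return
        params['after'] = after

# Fake two-page "API" for illustration only
PAGES = {
    None: {'items': [{'tag': '#AAA'}, {'tag': '#BBB'}], 'paging': {'cursors': {'after': 'p2'}}},
    'p2': {'items': [{'tag': '#CCC'}], 'paging': {'cursors': {}}},
}
calls = []

def fake_fetch(params):
    calls.append(params.get('after'))
    return PAGES[params.get('after')]

first_two = list(islice(iter_clan_tags(fake_fetch, {'minMembers': 40}), 2))
print(first_two)  # ['#AAA', '#BBB']
print(calls)      # [None] (the second page was never requested)
```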

Players also have unique tags that we can use to identify them. Once we have clan information, we can use the Clan's tag to generate a list of the tags of each member using the `/clans/{tag}/members` API endpoint. (*Note: the `%23` and `[1:]` in the following code are there because player and clan tags have the format `#XXXXXXXX` (where each X is alphanumeric), and the `#` must be URL-encoded. In my opinion, it is easier to strip the `#` and prepend `%23` than to URL-encode the entire string.*)

```python
def player_tags_from_clan(clan_tag):
    """
    Parameters:
        clan_tag (string): The tag of the clan to search
    Returns:
        tags (list): The tags of valid players in the clan
    """

    url = API_URL + f'/clans/%23{clan_tag[1:]}/members' # API call to get clan members
    response = requests.get(url, headers=AUTH_HEADER).json()

    tags = [] # List to store tags in
    for player in response['items']: # Loop through members
        # If member has not been online since the update, no point in getting their matches
        if to_datetime(player['lastSeen']) >= CUTOFF_TIME:
            tags.append(player['tag'])

    return tags
```
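For comparison, the standard library's `urllib.parse.quote` can do the encoding instead of the manual `%23` prefix. The manual version is fine as long as tags stay alphanumeric; `quote` would also cover any other reserved character. `encode_tag` and `members_path` are hypothetical helper names, and `#ABC123` is a made-up tag.

```python
from urllib.parse import quote

def encode_tag(tag):
    # safe='' makes quote() percent-encode every reserved character, including '#'
    return quote(tag, safe='')

def members_path(clan_tag):
    # Hypothetical helper: the same path as in player_tags_from_clan, built with quote()
    return f'/clans/{encode_tag(clan_tag)}/members'

tag = '#ABC123'  # made-up clan tag
print(encode_tag(tag))    # %23ABC123
print(members_path(tag))  # /clans/%23ABC123/members
# Same result as the manual '%23' + tag[1:] approach
print(members_path(tag) == f'/clans/%23{tag[1:]}/members')  # True
```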

Now that we have players, we can easily get information about their recent battles. Supercell stores up to 35 battles for each player, which we can access using the `/players/{tag}/battlelog` API endpoint.

```python
def battles_from_player(player_tag):
    """
    Parameters:
        player_tag (string): The tag of the player to search
    Returns:
        battles (list): The player's recent battles
    """

    url = API_URL + f'/players/%23{player_tag[1:]}/battlelog'

    return requests.get(url, headers=AUTH_HEADER).json()
```

For each battle, we need to specify what data to put in our dataframe and how to parse it. I first filter out any battles that aren't Ladder matches or that occurred before the recent update. I choose to keep the time information, the trophy count, deck, and score for both players, whether or not the player won, and an ID to uniquely identify each battle by its time and player tag. (The assumption is that no two matches will be played by the same player at the same time, leading to a unique value). We then add this to a 1-row dataframe and append it to an existing dataframe for every match.

```python
def add_battle_to_df(df, battle):
    """
    Parameters:
        df (DataFrame): DataFrame to accumulate match info
        battle (dictionary): Match information returned from the CR API
    Returns:
        df (DataFrame): df with an added row containing the new battle's information
    """

    if battle['gameMode']['name'] != 'Ladder': # Verify ladder match
        return df

    time = to_datetime(battle['battleTime'])
    if time < CUTOFF_TIME: # Check if match is after most recent update
        return df

    row = pd.DataFrame() # Create row to append

    row['time'] = [time] # Add time column

    # Get player and opponent data
    player = battle['team'][0]
    opponent = battle['opponent'][0]
    row['player_trophies'] = player['startingTrophies']
    row['opponent_trophies'] = opponent['startingTrophies']

    # Get name of cards for decks on each side
    player_deck = [c['name'] for c in player['cards']]
    opponent_deck = [c['name'] for c in opponent['cards']]
    # Convert to tuple and sort so elements are hashable, compatible with groupby
    row['player_deck'] = [tuple(sorted(player_deck))]
    row['opponent_deck'] = [tuple(sorted(opponent_deck))]

    # Get final score for both sides and whether player won (None for a draw)
    row['player_score'] = player['crowns']
    row['opponent_score'] = opponent['crowns']
    row['win'] = 1 if player['crowns'] > opponent['crowns']\
                    else 0 if player['crowns'] < opponent['crowns']\
                    else None

    # Unique ID for this battle: time + player tag. The raw string is used instead of
    # hash(), because Python salts string hashes per process, so hash() values would
    # not match across sessions (and could in principle collide)
    row['battle_id'] = battle['battleTime'] + player['tag']

    return pd.concat([df, row], ignore_index=True) # Add row to DataFrame
```
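One caveat with calling `pd.concat` for every battle: each call copies the whole accumulated DataFrame, so building N rows this way costs roughly O(N²) work, which starts to hurt at the scale of millions of matches. A common alternative is to collect plain dicts in a list and build the DataFrame once at the end. The sketch below is my own restructuring, not the code used above: `battle_to_row` is a hypothetical name, the toy battle is made up (with 2-card "decks" for brevity), and it keeps the same filters and columns as `add_battle_to_df`, using the raw time-plus-tag string as the ID.

```python
from datetime import datetime
import pandas as pd

CUTOFF_TIME = datetime(2022, 12, 8)

def battle_to_row(battle):
    """Return a dict for one valid Ladder battle, or None if it should be skipped."""
    if battle['gameMode']['name'] != 'Ladder':
        return None
    time = datetime.strptime(battle['battleTime'], '%Y%m%dT%H%M%S.000Z')
    if time < CUTOFF_TIME:
        return None
    player, opponent = battle['team'][0], battle['opponent'][0]
    p, o = player['crowns'], opponent['crowns']
    return {
        'time': time,
        'player_trophies': player['startingTrophies'],
        'opponent_trophies': opponent['startingTrophies'],
        'player_deck': tuple(sorted(c['name'] for c in player['cards'])),
        'opponent_deck': tuple(sorted(c['name'] for c in opponent['cards'])),
        'player_score': p,
        'opponent_score': o,
        'win': 1 if p > o else 0 if p < o else None,
        'battle_id': battle['battleTime'] + player['tag'],
    }

# Toy battle in the shape add_battle_to_df reads (values are made up)
toy = {
    'gameMode': {'name': 'Ladder'},
    'battleTime': '20221210T153045.000Z',
    'team': [{'tag': '#P1', 'startingTrophies': 5100, 'crowns': 2,
              'cards': [{'name': 'Knight'}, {'name': 'Archers'}]}],
    'opponent': [{'tag': '#P2', 'startingTrophies': 5080, 'crowns': 1,
                  'cards': [{'name': 'Giant'}, {'name': 'Arrows'}]}],
}

rows = []  # appending to a list is cheap: no DataFrame copies inside the loop
for battle in [toy]:
    row = battle_to_row(battle)
    if row is not None:
        rows.append(row)

battles = pd.DataFrame(rows)  # build the DataFrame once, at the end
print(battles[['player_deck', 'win', 'battle_id']])
```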

Now I can tie all of these together into a single nested loop. I keep a DataFrame `battles` that accumulates every added row. Note that for every battle, I add the battle both from the player's perspective and from the opponent's perspective. This way, every deck that was played shows up in the `player_deck` column, which makes analysis much easier later on.

```python
battles = pd.DataFrame()

# Loop through clans, players, matches
for clan_tag in get_clans(1000):
    for player_tag in player_tags_from_clan(clan_tag):
        for battle in battles_from_player(player_tag):
            battles = add_battle_to_df(battles, battle) # Add battle

            # Swap positions
            player, opponent = battle['team'], battle['opponent']
            battle['team'], battle['opponent'] = opponent, player

            battles = add_battle_to_df(battles, battle) # Add swapped battle
```
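To see why storing both perspectives helps: once every battle appears from each side, a deck's win rate is a single `groupby` over `player_deck`, with no need to also look at `opponent_deck`. A toy sketch with made-up two-card "decks" and no draws (the real `win` column uses `None` for draws, which `mean()` skips):

```python
import pandas as pd

# Made-up sorted two-card "decks" to keep the example short
HOG = ('Hog Rider', 'The Log')
GOLEM = ('Golem', 'Night Witch')
XBOW = ('Tesla', 'X-Bow')

# Three toy battles, each stored once from the winner's side
one_sided = pd.DataFrame({
    'player_deck': [HOG, HOG, GOLEM],
    'opponent_deck': [GOLEM, XBOW, XBOW],
    'win': [1, 1, 1],
})

# Mirrored copy of each battle: swap the deck columns and flip the result
mirrored = one_sided.rename(columns={'player_deck': 'opponent_deck',
                                     'opponent_deck': 'player_deck'})
mirrored['win'] = 1 - mirrored['win']
both = pd.concat([one_sided, mirrored], ignore_index=True)

# Win rate per deck is now one groupby over a single column
win_rate = both.groupby('player_deck')['win'].mean()
print(win_rate)
```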

This gives a good amount of data - around 300,000 matches. This is good, but for better results, I want somewhere on the order of a few million matches. Unfortunately, the API endpoint for getting clans caps the number of clans returned at 896 for some reason (I know that this isn't all of the clans because my clan has 50 members and it is never listed in the results). Another drawback of this approach is that matches are unevenly distributed across trophy count - there were many more matches between players around 5000 trophies than there were between players around 500. (The reason for this is that prior to the recent major update, the end of the ladder was at 5000 trophies.)
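The unevenness is easy to measure once the data is in a DataFrame: bin `player_trophies` into 100-trophy buckets and count rows per bucket. A sketch on a few made-up trophy counts, using the same `range(0, 7501, 100)` edges that the spidering step uses:

```python
import pandas as pd

# Made-up trophy counts standing in for battles['player_trophies']
trophies = pd.Series([480, 4990, 5005, 5010, 5050, 5099, 6320])

# right=False makes left-closed bins such as [5000, 5100), matching pd.Interval(i, j, 'left')
bins = pd.cut(trophies, bins=list(range(0, 7501, 100)), right=False)
counts = bins.value_counts().sort_index()

# Only print the non-empty buckets
print(counts[counts > 0])
```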

### Spidering Approach

My second approach was to spider (or crawl) through matches. I first use a variant of the clan approach above, but instead of saving battle data, I only record each player's tag and trophy count. I then use these values to get a tag for one player in every 100-trophy interval. Once I have these tags, I run a modified BFS on their battles that stops after a fixed number of layers (a depth-limited BFS). I first add these initial tags to a queue, and for each one, I get the tags for every player they've fought a battle against. I add each of these players to a separate queue and do the same thing for that queue, making sure to avoid duplicate players. This means the number of players I can query isn't limited by the API, and because players fight matches against other players of similar trophy count, I will likely get a much more even distribution of data.
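The layer-by-layer idea can be sketched on a toy graph, independent of the API. Here `neighbors` stands in for "opponents in this player's battle log", the graph is made up, and `layered_bfs` is a hypothetical name. The optional `stop_after` limit is only checked between layers, so a layer is never left half-explored.

```python
def layered_bfs(start, neighbors, max_depth, stop_after=None):
    """Visit nodes one full layer at a time, for at most max_depth layers.

    If stop_after is given, the search stops at the first layer boundary
    where at least that many nodes have been visited.
    """
    seen = set()
    visited = []
    current = list(start)
    for _ in range(max_depth):
        nxt = []
        for node in current:
            if node in seen:  # skip players we have already expanded
                continue
            seen.add(node)
            visited.append(node)
            nxt.extend(neighbors(node))
        if stop_after is not None and len(visited) >= stop_after:
            break  # limit reached, but only after finishing this layer
        current = nxt
    return visited

# Toy "who played whom" graph (made up)
graph = {
    'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'E'],
    'D': ['B', 'F'], 'E': ['C'], 'F': ['D'],
}
print(layered_bfs(['A'], graph.__getitem__, max_depth=2))                 # ['A', 'B', 'C']
print(layered_bfs(['A'], graph.__getitem__, max_depth=10, stop_after=2))  # ['A', 'B', 'C']
```

In the second call the limit of 2 is passed partway through layer two, but `C` is still visited, which is the behavior the spidering loop relies on.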

I get the intervals I want using `range` and use them to form a list of `pd.Interval` objects that need to be matched with a player tag. This list will be depleted as intervals are paired with player tags, and we will stop searching once all intervals have an associated player (i.e. `len(missing_intervals) == 0`). Then I request the list of clans as before, but this time, instead of a high minimum member count, I choose a low minimum score. A clan's score is based on the trophy count of its members, so this will prevent us from ruling out players with fewer than 1000 trophies, who are less likely to be in a clan. I loop through these, and for the same reasons as mentioned prior, I make sure that the clan has a minimum trophy requirement of 0. I then go through each member and check if they are part of an interval for which a player is needed, as well as if they have any recent battles to spider from. I then add this player's tag to the `players` dictionary.

```python
intervals = range(0, 7501, 100) # Get trophy intervals
players = {} # Dict to map trophy interval to player tag
# List of intervals that do not yet have a player associated
missing_intervals = [pd.Interval(i, j, 'left') for i, j in zip(intervals, intervals[1:])]

# Fetch clan data
url = API_URL + '/clans?minScore=1&limit=1000'
response = requests.get(url, headers=AUTH_HEADER).json()

for clan in response['items']: # Loop through clans
    if clan['requiredTrophies'] == 0: # Make sure there is no trophy requirement
        # Fetch full member records: player_tags_from_clan only returns tags,
        # but we also need each member's trophy count
        members_url = API_URL + f'/clans/%23{clan["tag"][1:]}/members'
        members = requests.get(members_url, headers=AUTH_HEADER).json()['items']
        for player in members: # Loop through clan members
            if to_datetime(player['lastSeen']) < CUTOFF_TIME: # Skip inactive members
                continue
            for interval in missing_intervals: # Check intervals
                if player['trophies'] in interval\
                        and len(battles_from_player(player['tag'])) > 0:
                    players[interval] = player['tag'] # Add to players
                    missing_intervals.remove(interval) # Remove from missing intervals
                    break # Intervals don't overlap; also avoids iterating a list we just modified

    if len(missing_intervals) == 0: # If we have found enough players, stop searching
        break
```
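Scanning `missing_intervals` for every member works, but pandas can also look up the bucket directly. A sketch using `pd.IntervalIndex.get_indexer`, with made-up trophy counts and tags; it tracks missing buckets as a set of positions, which sidesteps removing items from a list while iterating over it. It leaves out the battle-log check from the loop above.

```python
import pandas as pd

# Same left-closed buckets as above: [0, 100), [100, 200), ..., [7400, 7500)
buckets = pd.IntervalIndex.from_breaks(list(range(0, 7501, 100)), closed='left')

# get_indexer finds every value's bucket position in one call; -1 means no bucket
trophies = [0, 250, 4999, 5000, 7499, 7500]  # made-up trophy counts
tags = ['#A', '#B', '#C', '#D', '#E', '#F']   # made-up player tags
positions = buckets.get_indexer(trophies)     # positions 0, 2, 49, 50, 74, -1

# Track missing buckets as a set of positions instead of a list of Intervals
missing = set(range(len(buckets)))
players = {}
for tag, pos in zip(tags, positions):
    if pos != -1 and pos in missing:
        players[buckets[pos]] = tag  # key is the pd.Interval, as in the loop above
        missing.discard(pos)

print(len(players), len(missing))  # 5 70
```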

Now that I have starting tags for each trophy level, I can start spidering from each of them. I put these tags in `current_queue`, and initialize `next_queue` to `[]`. I am using a slightly unusual version of BFS. With vanilla BFS, if I stopped iterating at a certain number of battles, I might be partway through a layer, so the trophy levels whose players had already been expanded would have been searched one layer deeper than the rest and would have many times more battles. This method ensures that I only stop searching once an entire layer has been searched.

As before, I initialize a DataFrame to store the battle data. I set the depth to 5, which, at a rate of roughly 15 valid battles per player, should result in a few million battles. For each layer, I loop through `current_queue` until it is empty. I make sure the tag is new, and then I get the battles for that tag. I add the opponent of each battle to `next_queue` and add the battle's data (if the battle is valid) to the DataFrame. After each layer, I remove all duplicates and check whether we have more than 1 million battles. If not, I set `current_queue` to `next_queue` and clear `next_queue` for the next iteration.

```python
current_queue = list(players.values()) # Tags we are currently iterating
next_queue = [] # List to store current_queue's descendants
seen = set() # Stores tags we have already visited
depth = 5 # How deep to traverse in the 'tree'
battles = pd.DataFrame() # Accumulates battle rows like before

for _ in range(depth): # Loop through each layer
    while len(current_queue) > 0:
        tag = current_queue.pop()
        if tag in seen: # Make sure tag has not been visited
            continue
        seen.add(tag)

        player_battles = battles_from_player(tag) # Get battles
        for battle in player_battles:
            next_queue.append(battle['opponent'][0]['tag']) # Add to queue

            battles = add_battle_to_df(battles, battle) # Add battle to the DataFrame
            player, opponent = battle['team'], battle['opponent'] # Swap
            battle['team'], battle['opponent'] = opponent, player
            battles = add_battle_to_df(battles, battle) # Add swapped battle

    # Remove duplicate battles (an empty DataFrame has no battle_id column yet)
    if not battles.empty:
        battles = battles.drop_duplicates(subset=['battle_id'])

    if len(battles) > 1000000: # Check limit
        break

    current_queue = next_queue # Shift queues
    next_queue = []
```
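Because the spider can reach both players of a battle, the same battle may be fetched twice (once from each player's log), and each fetch adds two perspectives. The time-plus-tag ID is what lets `drop_duplicates` collapse this back to one row per perspective. A toy check with made-up tags; `perspective_rows` is a hypothetical stand-in for the two `add_battle_to_df` calls.

```python
import pandas as pd

def perspective_rows(battle_time, team_tag, opp_tag):
    # Hypothetical minimal rows: the battle as logged plus its swapped copy,
    # each keyed by battle time + the tag on the 'team' side (as in add_battle_to_df)
    return [
        {'battle_id': battle_time + team_tag, 'player': team_tag},
        {'battle_id': battle_time + opp_tag, 'player': opp_tag},
    ]

t = '20221210T153045.000Z'  # made-up battle time
rows = []
rows += perspective_rows(t, '#A', '#B')  # found in A's battle log
rows += perspective_rows(t, '#B', '#A')  # the same battle, found again in B's log

battles = pd.DataFrame(rows)
deduped = battles.drop_duplicates(subset=['battle_id'])
print(len(battles), len(deduped))  # 4 2
```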