# Clash of Clans: How many builders do you *really* need?

### (or, how should I spend those green gems?) <img src="builder.png" style="float:center;width:100px" />

Hello everyone! I'm an avid clasher at mid-level Town Hall 8 with 4 builders. Recently I discovered (like so many other [people](https://www.reddit.com/r/ClashOfClans/comments/2psnf3/strategy_lab_time_longer_than_builder_time_what/)) that at my level research time, not build time, is the limiting factor for progress. This made me wonder: is it really worth saving up for the fifth builder, or should I just spend gems on barracks and collector boosts, on finishing research and hero upgrades sooner, and so on? To settle this, I decided to do a bit of simple data analysis using the upgrade time data available on the [Clash of Clans wiki](http://clashofclans.wikia.com/wiki/Clash_of_Clans_Wiki).

This next section contains the bit of Python I used to prepare the dataset for visualization and analysis. If you aren't interested, just skip down to the [results section](#Results).

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd

In [2]:
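# Load the upgrade data and exclude town hall 11.
# .copy() makes each filtered frame independent, so adding columns to it
# later does not trigger pandas' SettingWithCopyWarning.
building_df = pd.read_csv("building_upgrade_data.csv")
building_df = building_df[building_df["town_hall"] != 11].copy()
research_df = pd.read_csv("research_data.csv")
research_df = research_df[research_df["town_hall"] != 11].copy()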

In [3]:
# CONSTANTS
HOURS_PER_DAY = 24.0
MIN_PER_DAY = HOURS_PER_DAY * 60
SEC_PER_DAY = MIN_PER_DAY * 60
UNIT_MAP = {"seconds": SEC_PER_DAY, "minutes": MIN_PER_DAY,
            "hours": HOURS_PER_DAY, "days": 1.0}

In [4]:
# These functions parse time strings such as "1 days 12 hours" into days

def parse_time(t):
    # t is a [value, unit] pair, e.g. ["12", "hours"]
    return int(t[0]) / UNIT_MAP[t[1]]

def chunks(l, n):
    # Yield successive n-sized pieces of the list l
    for i in range(0, len(l), n):
        yield l[i:i + n]

def parse_time_string(s):
    # Split into [value, unit] pairs and add up their lengths in days
    return sum(map(parse_time, chunks(s.split(' '), 2)))
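
To make the parsing step concrete, here is a small stand-alone sketch of the same idea. The sample strings are my own assumptions about what the `build_time` column looks like (value and unit separated by spaces, units always plural), and `UNIT_DIVISORS` and `to_days` are hypothetical names used only for this illustration.

```python
# Hypothetical stand-alone version of the time-string parser.
# Assumes strings like "1 days 12 hours": value/unit pairs split by spaces,
# with plural unit names only.
UNIT_DIVISORS = {"seconds": 86400.0, "minutes": 1440.0,
                 "hours": 24.0, "days": 1.0}

def to_days(time_string):
    tokens = time_string.split(" ")
    # Pair every value (even positions) with the unit that follows it
    pairs = zip(tokens[0::2], tokens[1::2])
    return sum(int(value) / UNIT_DIVISORS[unit] for value, unit in pairs)

for sample in ["30 minutes", "12 hours", "1 days 12 hours", "14 days"]:
    print(sample, "->", to_days(sample), "days")
```

A string with a unit that is not in the map (for example a singular "1 day") would raise a `KeyError`, so the input data has to be normalized first.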

In [5]:
building_df["build_days"] = building_df["build_time"].map(parse_time_string)
research_df["research_days"] = research_df["research_time"].map(parse_time_string)

In [6]:
def get_build_time(df):
    """This calculates total build time per town hall level"""
    build_time = {}
    grouped = df.groupby(["type"])
    for name, group in grouped:
        regrouped = group.groupby("town_hall")
        prev_quant = group.iloc[0]["quantity"]
        for rname, rgroup in regrouped:
            quant = rgroup["quantity"].iloc[0]
            build_days = quant * rgroup["build_days"].sum()
            build_time.setdefault(rname, 0)
            build_time[rname] += build_days
            # This adds time to each town hall level based on new structure acquisition
            if quant > prev_quant:
                diff = quant - prev_quant
                catch_up_days = diff * group[group["town_hall"] < rname]["build_days"].sum()
                build_time[rname] += catch_up_days
                prev_quant = quant
    return pd.Series(build_time)
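
The catch-up logic is the least obvious part, so here is a plain-Python sketch of the same bookkeeping for a single structure type. The numbers are made up for illustration (they are not from the wiki), and `total_build_days` is a hypothetical helper, not the function used above.

```python
# Made-up rows for one structure type: (town_hall, quantity, build_days).
# Town hall 1 allows 2 copies with a 1-day upgrade; town hall 2 allows
# 3 copies and unlocks a 2-day upgrade.
rows = [(1, 2, 1.0), (2, 3, 2.0)]

def total_build_days(rows):
    totals = {}
    prev_quantity = rows[0][1]
    for town_hall in sorted({r[0] for r in rows}):
        level_rows = [r for r in rows if r[0] == town_hall]
        quantity = level_rows[0][1]
        # Every copy needs each upgrade unlocked at this town hall
        totals[town_hall] = quantity * sum(r[2] for r in level_rows)
        if quantity > prev_quantity:
            # Newly unlocked copies must also catch up on all earlier upgrades
            earlier = sum(r[2] for r in rows if r[0] < town_hall)
            totals[town_hall] += (quantity - prev_quantity) * earlier
            prev_quantity = quantity
    return totals

print(total_build_days(rows))
```

At town hall 2 the three copies each need the 2-day upgrade (6 days), and the one new copy also needs the 1-day upgrade from town hall 1, for 7 days in total.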

In [7]:
build_times = get_build_time(building_df)

In [13]:
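# Get research times by town hall, don't forget to add lab upgrade time
# .copy() avoids pandas' SettingWithCopyWarning when adding a column below
lab_build_days = building_df.groupby("type").get_group("laboratory")[["town_hall","build_days"]].copy()
research_times = research_df.groupby("town_hall")["research_days"].sum()
# .values adds by position, so this assumes one laboratory row per town hall,
# in the same order and number as the town halls in research_times
lab_build_days["total_time"] = lab_build_days["build_days"] + research_times.values
research_times = lab_build_days.set_index("town_hall")["total_time"]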
times = pd.concat([research_times, build_times], axis=1)
times.columns = ["research_time", "build_time"]

In [16]:
# Each town hall's share of the total time (a fraction of 1, despite the "percent" name)
times["percent_research_time"] = times["research_time"] / times["research_time"].sum()
times["percent_build_time"] = times["build_time"] / times["build_time"].sum()
times = times.fillna(0)
times

| town_hall | research_time | build_time | percent_research_time | percent_build_time |
|---|---|---|---|---|
| 1 | 0.0 | 0.12037 | 0.0 | 4.6e-05 |
| 2 | 0.0 | 1.012847 | 0.0 | 0.000389 |
| 3 | 1.270833 | 6.432407 | 0.001775 | 0.002472 |
| 4 | 3.208333 | 23.502431 | 0.004481 | 0.009031 |
| 5 | 15.5 | 80.557986 | 0.021649 | 0.030956 |
| 6 | 11.0 | 58.583333 | 0.015364 | 0.022512 |
| 7 | 57.0 | 224.815162 | 0.079611 | 0.08639 |
| 8 | 157.0 | 453.761111 | 0.21928 | 0.174367 |
| 9 | 213.0 | 850.764583 | 0.297495 | 0.326924 |
| 10 | 258.0 | 902.783681 | 0.360346 | 0.346913 |
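
As a quick sanity check on the share columns, the arithmetic can be redone by hand. This sketch simply copies the `research_time` values (in days) from the table above into a dictionary; `research_days` and `shares` are hypothetical names for this illustration.

```python
# research_time in days per town hall, copied from the table above
research_days = {1: 0.0, 2: 0.0, 3: 1.270833, 4: 3.208333, 5: 15.5,
                 6: 11.0, 7: 57.0, 8: 157.0, 9: 213.0, 10: 258.0}

total = sum(research_days.values())
# Share of all research time that falls at each town hall level
shares = {th: days / total for th, days in research_days.items()}

print(round(total, 2), "days of research in total")
print(round(shares[8], 5), "of it happens at town hall 8")
```

The town hall 8 share should agree with the `percent_research_time` column to within rounding.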


# Results