# How the "How Datawrapper developers perceive the cost of function names" chart was made

I used Google Forms for the survey.

The questions looked like this:
    
![Screenshot of the survey questions in Google Forms](google_form.png)

The survey produced this CSV of answers:

``` csv
"Timestamp","check()","compute()","fetch()","find()","filter()","get()","lookup()","parse()","pick()","search()","select()","read()"
"2021/12/17 4:07:23 PM GMT+1","5","10","1","8","3","2","7","5","2","7","2","5"
"2021/12/17 4:21:02 PM GMT+1","8","1","6","8","5","10","7","6","9","4","10","8"
"2021/12/17 4:28:28 PM GMT+1","10","8","3","6","8","8","3","5","9","2","1","3"
"2021/12/17 4:29:49 PM GMT+1","10","9","5","9","10","10","10","6","8","5","4","3"
"2021/12/17 4:38:33 PM GMT+1","9","4","1","5","8","10","4","3","7","2","6","4"
"2021/12/17 6:58:27 PM GMT+1","1","6","2","10","7","2","3","3","5","9","5","4"
"2021/12/20 11:59:00 AM GMT+1","1","4","10","7","8","2","6","5","2","9","3","7"
"2021/12/21 1:09:32 PM GMT+1","9","4","5","3","6","8","6","7","8","7","8","7"
"2022/01/03 7:27:27 PM GMT+1","9","1","2","7","6","10","4","3","8","4","7","5"
```

To turn this CSV into one that can serve as the data source for the chart, I used Python's [pandas](https://pandas.pydata.org/) data analysis library. The code examples use the [IPython](https://ipython.org/) interactive interpreter syntax.

First, let's import pandas and read the CSV into a DataFrame:

```python
In [2]: import pandas as pd
   ...: src = pd.read_csv('form.csv', index_col='Timestamp')
   ...: src
```

| Timestamp | check() | compute() | fetch() | find() | filter() | get() | lookup() | parse() | pick() | search() | select() | read() |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021/12/17 4:07:23 PM GMT+1 | 5 | 10 | 1 | 8 | 3 | 2 | 7 | 5 | 2 | 7 | 2 | 5 |
| 2021/12/17 4:21:02 PM GMT+1 | 8 | 1 | 6 | 8 | 5 | 10 | 7 | 6 | 9 | 4 | 10 | 8 |
| 2021/12/17 4:28:28 PM GMT+1 | 10 | 8 | 3 | 6 | 8 | 8 | 3 | 5 | 9 | 2 | 1 | 3 |
| 2021/12/17 4:29:49 PM GMT+1 | 10 | 9 | 5 | 9 | 10 | 10 | 10 | 6 | 8 | 5 | 4 | 3 |
| 2021/12/17 4:38:33 PM GMT+1 | 9 | 4 | 1 | 5 | 8 | 10 | 4 | 3 | 7 | 2 | 6 | 4 |
| 2021/12/17 6:58:27 PM GMT+1 | 1 | 6 | 2 | 10 | 7 | 2 | 3 | 3 | 5 | 9 | 5 | 4 |
| 2021/12/20 11:59:00 AM GMT+1 | 1 | 4 | 10 | 7 | 8 | 2 | 6 | 5 | 2 | 9 | 3 | 7 |
| 2021/12/21 1:09:32 PM GMT+1 | 9 | 4 | 5 | 3 | 6 | 8 | 6 | 7 | 8 | 7 | 8 | 7 |
| 2022/01/03 7:27:27 PM GMT+1 | 9 | 1 | 2 | 7 | 6 | 10 | 4 | 3 | 8 | 4 | 7 | 5 |


Second, we calculate the arithmetic mean of each column, which we will need later:

```python
In [4]: mean = src.mean()
   ...: mean
```

```
check()      6.888889
compute()    5.222222
fetch()      3.888889
find()       7.000000
filter()     6.777778
get()        6.888889
lookup()     5.555556
parse()      4.777778
pick()       6.444444
search()     5.444444
select()     5.111111
read()       5.111111
dtype: float64
```
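Here is a quick worked check of the first value. With $n = 9$ responses, the mean of column $j$ is the sum of its scores divided by 9. For `check()`, using the scores from the CSV above:

``` math
\begin{aligned}
\bar{x}_j &= \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \quad n = 9 \\
\bar{x}_{\text{check()}} &= \frac{5 + 8 + 10 + 10 + 9 + 1 + 1 + 9 + 9}{9} = \frac{62}{9} \approx 6.888889
\end{aligned}
```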

Now for the heavy lifting: we have to count how many times each score (1 to 10) was assigned to each function. This is exactly what `Series.value_counts()` does. For a single column, `value_counts()` produces this result:

```python
In [5]: src['check()'].value_counts()
```

```
9     3
10    2
1     2
5     1
8     1
Name: check(), dtype: int64
```

This means the value `9` appears three times, the value `10` twice, and so on.
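The following sketch shows what `value_counts()` tallies. It reproduces the count for the `check()` column in pure Python, using the standard library's `collections.Counter`. This only illustrates the idea; pandas does not implement `value_counts()` this way internally.

```python
from collections import Counter

# The nine check() answers, copied from the survey CSV above.
check_scores = [5, 8, 10, 10, 9, 1, 1, 9, 9]

# Counter tallies how often each score occurs, like value_counts() does
# for a single column.
counts = Counter(check_scores)

# most_common() sorts by frequency, highest first; ties keep the order
# in which the scores were first seen.
for score, n in counts.most_common():
    print(score, n)  # prints: 9 3, 10 2, 1 2, 5 1, 8 1 (one pair per line)
```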

Now we just have to call `value_counts()` for each column using [DataFrame.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html):

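```python
In [6]: res = src.apply(lambda x: x.value_counts())
   ...: res
```

|  | check() | compute() | fetch() | find() | filter() | get() | lookup() | parse() | pick() | search() | select() | read() |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.0 | 2.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN |
| 2 | NaN | NaN | 2.0 | NaN | NaN | 3.0 | NaN | NaN | 2.0 | 2.0 | 1.0 | NaN |
| 3 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | 2.0 | 3.0 | NaN | NaN | 1.0 | 2.0 |
| 4 | NaN | 3.0 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | 2.0 | 1.0 | 2.0 |
| 5 | 1.0 | NaN | 2.0 | 1.0 | 1.0 | NaN | NaN | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 |
| 6 | NaN | 1.0 | 1.0 | 1.0 | 2.0 | NaN | 2.0 | 2.0 | NaN | NaN | 1.0 | NaN |
| 7 | NaN | NaN | NaN | 2.0 | 1.0 | NaN | 2.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 |
| 8 | 1.0 | 1.0 | NaN | 2.0 | 3.0 | 2.0 | NaN | NaN | 3.0 | NaN | 1.0 | 1.0 |
| 9 | 3.0 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 2.0 | 2.0 | NaN | NaN |
| 10 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN |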
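Every respondent scored every function exactly once, so the counts in each column should add up to the number of responses (nine). A quick check along these lines can catch a mis-read file early. The sketch below runs it on a hypothetical two-column excerpt containing the `check()` and `get()` answers copied from the CSV above:

```python
import pandas as pd

# Hypothetical excerpt: the nine check() and get() answers from the CSV above.
src = pd.DataFrame({
    'check()': [5, 8, 10, 10, 9, 1, 1, 9, 9],
    'get()': [2, 10, 8, 10, 10, 2, 2, 8, 10],
})

res = src.apply(lambda x: x.value_counts())

# Each column's counts must add up to the number of responses.
# sum() skips the NaN gaps, so no fillna() is needed for this check.
assert (res.sum() == len(src)).all()
print(res.sum())
```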


Not bad. A `NaN` means that nobody gave that score to that function, so the next step is to replace those `NaN` values with zeros:

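```python
In [7]: res = res.fillna(0)
   ...: res
```

|  | check() | compute() | fetch() | find() | filter() | get() | lookup() | parse() | pick() | search() | select() | read() |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 2.0 | 2.0 | 1.0 | 0.0 |
| 3 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 | 3.0 | 0.0 | 0.0 | 1.0 | 2.0 |
| 4 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 2.0 | 1.0 | 2.0 |
| 5 | 1.0 | 0.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 |
| 6 | 0.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 2.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 |
| 8 | 1.0 | 1.0 | 0.0 | 2.0 | 3.0 | 2.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.0 | 1.0 |
| 9 | 3.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 |
| 10 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |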
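The counts are floats (`2.0`, `0.0`) because the `NaN` placeholders force a floating-point dtype, and `fillna(0)` keeps that dtype. If you'd prefer whole numbers in the exported CSV, one option is to cast after filling. This was not done for this chart. The sketch uses the same kind of hypothetical two-column excerpt:

```python
import pandas as pd

# Hypothetical excerpt: the nine check() and get() answers from the CSV above.
src = pd.DataFrame({
    'check()': [5, 8, 10, 10, 9, 1, 1, 9, 9],
    'get()': [2, 10, 8, 10, 10, 2, 2, 8, 10],
})

res = src.apply(lambda x: x.value_counts())
print(res.dtypes)  # float64 for both columns, because missing scores are NaN

# Replace the NaN gaps with 0, then cast the counts back to integers.
res = res.fillna(0).astype(int)
print(res)
```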


Then we swap rows and columns (transpose the DataFrame) so that each function becomes a row:

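```python
In [8]: res = res.T
   ...: res
```

|  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| check() | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 2.0 |
| compute() | 2.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| fetch() | 2.0 | 2.0 | 1.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| find() | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| filter() | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 | 1.0 | 3.0 | 0.0 | 1.0 |
| get() | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 4.0 |
| lookup() | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 | 1.0 |
| parse() | 0.0 | 0.0 | 3.0 | 0.0 | 3.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| pick() | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 2.0 | 0.0 |
| search() | 0.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 |
| select() | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| read() | 0.0 | 0.0 | 2.0 | 2.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 |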
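`value_counts()` only reports scores that actually occur. In this survey every score from 1 to 10 was given at least once, so all ten columns are present. With fewer responses, some score might never be picked, and its column would silently go missing from the chart. If you always want the full 1 to 10 scale, one safeguard is `DataFrame.reindex()` with `fill_value=0`. The sketch below demonstrates it on a hypothetical mini survey:

```python
import pandas as pd

# Hypothetical mini survey in which nobody picked 3, 6, 7 or 9.
src = pd.DataFrame({
    'check()': [5, 8, 10, 1],
    'get()': [2, 10, 8, 4],
})

res = src.apply(lambda x: x.value_counts()).fillna(0).T

# Force one column per score on the 1-10 scale; unused scores get 0.
res = res.reindex(columns=range(1, 11), fill_value=0)
print(res)
```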


Almost done. We just have to add the mean values as a new column and sort the DataFrame by it, highest mean first:

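```python
In [10]: res['mean'] = mean
    ...: res.sort_values('mean', ascending=False, inplace=True)
    ...: res
```

|  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| find() | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 7.000000 |
| check() | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 2.0 | 6.888889 |
| get() | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 4.0 | 6.888889 |
| filter() | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 | 1.0 | 3.0 | 0.0 | 1.0 | 6.777778 |
| pick() | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 3.0 | 2.0 | 0.0 | 6.444444 |
| lookup() | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 | 1.0 | 5.555556 |
| search() | 0.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | 5.444444 |
| compute() | 2.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 5.222222 |
| select() | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.111111 |
| read() | 0.0 | 0.0 | 2.0 | 2.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 5.111111 |
| parse() | 0.0 | 0.0 | 3.0 | 0.0 | 3.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 4.777778 |
| fetch() | 2.0 | 2.0 | 1.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3.888889 |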
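The line `res['mean'] = mean` works even though `mean` was computed before the transpose, because pandas aligns a Series with the DataFrame by index label (here, the function names), not by position. Below is a small illustration with made-up numbers, in which the two objects list the functions in different orders:

```python
import pandas as pd

# Hypothetical values: the rows of `res` and the entries of `mean`
# are deliberately in different orders.
res = pd.DataFrame({'votes': [1, 2, 3]}, index=['get()', 'find()', 'check()'])
mean = pd.Series({'check()': 6.9, 'find()': 7.0, 'get()': 6.8})

# Assigning a Series aligns on the index labels, not on position.
res['mean'] = mean

# Without inplace=True, sort_values() returns a new, sorted DataFrame.
res = res.sort_values('mean', ascending=False)
print(res)
```

Here each function keeps its own mean after the assignment, and the sort puts `find()` first.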


Then we save it to a file:

```python
In [12]: res.to_csv('result.csv', index_label='function')
```

This is the resulting CSV, which we can upload to the Datawrapper app:

``` csv
function,1,2,3,4,5,6,7,8,9,10,mean
find(),0.0,0.0,1.0,0.0,1.0,1.0,2.0,2.0,1.0,1.0,7.0
check(),2.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,3.0,2.0,6.888888888888889
get(),0.0,3.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,4.0,6.888888888888889
filter(),0.0,0.0,1.0,0.0,1.0,2.0,1.0,3.0,0.0,1.0,6.777777777777778
pick(),0.0,2.0,0.0,0.0,1.0,0.0,1.0,3.0,2.0,0.0,6.444444444444445
lookup(),0.0,0.0,2.0,2.0,0.0,2.0,2.0,0.0,0.0,1.0,5.555555555555555
search(),0.0,2.0,0.0,2.0,1.0,0.0,2.0,0.0,2.0,0.0,5.444444444444445
compute(),2.0,0.0,0.0,3.0,0.0,1.0,0.0,1.0,1.0,1.0,5.222222222222222
select(),1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,5.111111111111111
read(),0.0,0.0,2.0,2.0,2.0,0.0,2.0,1.0,0.0,0.0,5.111111111111111
parse(),0.0,0.0,3.0,0.0,3.0,2.0,1.0,0.0,0.0,0.0,4.777777777777778
fetch(),2.0,2.0,1.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,3.888888888888889
```
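The `mean` column is exported with full floating-point precision (for example `6.888888888888889`). If the chart only needs two decimal places, one option is to round just that column before saving, because `DataFrame.round()` accepts a dict that maps column names to decimals. The sketch below uses two rows of the result, trimmed to a few columns, as a hypothetical stand-in for `res`. It calls `to_csv()` without a path, so pandas returns the CSV text instead of writing `result.csv`:

```python
import pandas as pd

# Hypothetical stand-in for `res`: two rows of the result, trimmed to a few columns.
res = pd.DataFrame(
    {1: [0.0, 2.0], 2: [0.0, 0.0], 'mean': [7.0, 62 / 9]},
    index=['find()', 'check()'],
)

# Round only the 'mean' column; the count columns are left untouched.
rounded = res.round({'mean': 2})

# With no path argument, to_csv() returns the CSV as a string.
csv_text = rounded.to_csv(index_label='function')
print(csv_text)
```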