## Hardware Details
[GCP](https://cloud.google.com/) VM: [n1-highmem-16](https://cloud.google.com/compute/docs/machine-types#n1_machine_types) (16 vCPUs, 104 GB memory)

In [1]:
cat(system("lscpu", intern=TRUE), sep='\n')

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:              0
CPU MHz:               2300.000
BogoMIPS:              4600.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyperviso

In [2]:
cat(system("cat /proc/meminfo | head -n1", intern=TRUE), sep='\n')

MemTotal:       107091244 kB


In [4]:
R.version

               _                           
platform       x86_64-redhat-linux-gnu     
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          6.0                         
year           2019                        
month          04                          
day            26                          
svn rev        76424                       
language       R                           
version.string R version 3.6.0 (2019-04-26)
nickname       Planting of a Tree          

## Basic functions

In [3]:
library(stringi)
library(microbenchmark)
library(dplyr)
library(bit64)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: bit
Attaching package bit
package:bit (c) 2008-2012 Jens Oehlschlaegel (GPL-2)
creators: bit bitwhich
coercion: as.logical as.integer as.bit as.bitwhich which
operator: ! & | xor != ==
querying: print length any all min max range sum summary
bit access: length<- [ [<- [[ [[<-
for more help type ?bit

Attaching package: ‘bit’

The following object is masked from ‘package:base’:

    xor

Attaching package bit64
package:bit64 (c) 2011-2012 Jens Oehlschlaegel
creators: integer64 seq :
coercion: as.integer64 as.vector as.logical as.integer as.double as.character as.bin
logical operator: ! & | xor != == < <= >= >
arithmetic operator: + - * / %/% %% ^
math: sign abs sqrt log log2 log10
math: floor ceiling trunc round
querying: is.integer64 is.vector [is.atomic} [length] f

In [4]:
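# Build a synthetic table with `rowCount` rows.
createTable <- function(rowCount) {
    gc()  # run garbage collection before allocating the new columns
    data.frame(
        # two random lowercase letters: at most 26^2 = 676 distinct buckets
        bucket = stri_rand_strings(rowCount, 2, pattern = "[a-z]"),
        qty = as.integer64(sample(1:100, rowCount, replace = TRUE)),
        risk = as.integer64(sample(1:10, rowCount, replace = TRUE)),
        weight = runif(rowCount, 0, 2),  # uniform weights in [0, 2]
        stringsAsFactors = FALSE
    )
}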
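
For a rough sense of how large these tables get, the sketch below estimates the in-memory size of the four columns. It assumes 8 bytes per element in every column: `integer64` and double values take 8 bytes each, and on a 64-bit build each `character` element is an 8-byte pointer into R's shared string cache (there are at most 676 distinct two-letter strings). The helper name `estimateTableGiB` is hypothetical, and the estimate ignores temporary copies made while building the table or running the query.

```r
# Approximate size of the table built by createTable(), in GiB.
# Assumption: 4 columns x 8 bytes per element; per-object overhead is ignored.
estimateTableGiB <- function(rowCount, columns = 4, bytesPerElement = 8) {
    rowCount * columns * bytesPerElement / 1024^3
}

rowCounts <- c(`10k` = 1e4, `100k` = 1e5, `1M` = 1e6,
               `10M` = 1e7, `100M` = 1e8, `1B` = 1e9)
estimates <- sapply(rowCounts, estimateTableGiB)
print(signif(estimates, 3))
```

Under these assumptions the 1B-row table alone needs roughly 30 GiB of the VM's 104 GB, before any intermediate allocations.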

In [5]:
# Per bucket: row count, totals, plain means and weight-weighted means.
executeQuery <- function(t) {
    t %>%
        group_by(bucket) %>%
        summarise(
            NR = n(),
            TOTAL_QTY = sum(qty), AVG_QTY = mean(qty),
            TOTAL_RISK = sum(risk), AVG_RISK = mean(risk),
            WEIGHTED_QTY = weighted.mean(qty, weight),
            WEIGHTED_RISK = weighted.mean(risk, weight)
        )
}
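
To make the meaning of each output column concrete, the same aggregation can be sketched in base R on a tiny hand-built table. This is not the benchmarked code: the values in `tiny` are made up, plain integers stand in for `integer64`, and `aggregateBucket` is a hypothetical helper.

```r
# Tiny hand-built table (hypothetical values, plain integers instead of integer64).
tiny <- data.frame(
    bucket = c("aa", "aa", "bb"),
    qty    = c(10L, 30L, 5L),
    risk   = c(1L, 3L, 2L),
    weight = c(1.0, 3.0, 0.5),
    stringsAsFactors = FALSE
)

# One row of aggregates for a single bucket, mirroring the columns of executeQuery().
aggregateBucket <- function(d) {
    data.frame(
        bucket = d$bucket[1],
        NR = nrow(d),
        TOTAL_QTY = sum(d$qty), AVG_QTY = mean(d$qty),
        TOTAL_RISK = sum(d$risk), AVG_RISK = mean(d$risk),
        WEIGHTED_QTY = weighted.mean(d$qty, d$weight),
        WEIGHTED_RISK = weighted.mean(d$risk, d$weight),
        stringsAsFactors = FALSE
    )
}

# Split by bucket, aggregate each piece, and stack the results.
reference <- do.call(rbind, lapply(split(tiny, tiny$bucket), aggregateBucket))
print(reference)
```

For bucket `aa` the weighted quantity is (10 * 1 + 30 * 3) / (1 + 3) = 25, while the plain mean is 20, which shows how `weight` shifts the result toward the heavier row.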

## 10k

In [6]:
t <- createTable(10 * 1000)

In [7]:
summary(microbenchmark(executeQuery(t), times = 100))

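| expr | min | lq | mean | median | uq | max | neval |
|---|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| executeQuery(t) | 265.5714 | 280.2492 | 288.175 | 285.3571 | 291.2224 | 402.6351 | 100 |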
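
The rendered result table does not show a time unit. When no unit is given, `summary()` on a `microbenchmark` result chooses one automatically, so values reported in different sections of this notebook are not necessarily on the same scale. A minimal sketch of pinning the unit with the `unit` argument follows; the workload `sum(sqrt(1:1000))` is a hypothetical stand-in for `executeQuery(t)`.

```r
library(microbenchmark)

# Hypothetical stand-in workload; the notebook benchmarks executeQuery(t).
mb <- microbenchmark(sum(sqrt(1:1000)), times = 20)

# Same measurements, reported in a fixed unit instead of an automatic one.
inSeconds <- summary(mb, unit = "s")
inMillis  <- summary(mb, unit = "ms")

print(inSeconds[, c("expr", "median", "neval")])
print(inMillis[, c("expr", "median", "neval")])
```

Using the same fixed unit for every table size would make the rows directly comparable across sections.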


## 100k

In [8]:
t <- createTable(100 * 1000)

In [9]:
summary(microbenchmark(executeQuery(t), times = 100))

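| expr | min | lq | mean | median | uq | max | neval |
|---|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| executeQuery(t) | 1.730475 | 1.830667 | 1.881672 | 1.87219 | 1.934549 | 2.047844 | 100 |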


## 1M

In [10]:
t <- createTable(1000 * 1000)

In [11]:
summary(microbenchmark(executeQuery(t), times = 100))

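| expr | min | lq | mean | median | uq | max | neval |
|---|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| executeQuery(t) | 18.63896 | 19.50142 | 19.86172 | 19.78953 | 20.2076 | 21.54839 | 100 |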


## 10M

In [12]:
t <- createTable(10 * 1000 * 1000)

In [13]:
summary(microbenchmark(executeQuery(t), times = 100))

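| expr | min | lq | mean | median | uq | max | neval |
|---|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| executeQuery(t) | 190.6445 | 193.0429 | 195.8631 | 194.4574 | 196.0971 | 219.2777 | 100 |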


## 100M
This benchmark is run only 10 times instead of 100.

In [14]:
t <- createTable(100 * 1000 * 1000)

In [15]:
summary(microbenchmark(executeQuery(t), times = 10))

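| expr | min | lq | mean | median | uq | max | neval |
|---|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| executeQuery(t) | 1577.265 | 2097.603 | 2060.511 | 2107.59 | 2131.207 | 2137.213 | 10 |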
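
One way to read the results is to look at how the median grows with each tenfold increase in rows. The sketch below uses the medians reported for 100k through 100M and assumes those four tables share the same time unit, which the output itself does not confirm. The 10k result is left out because its magnitude suggests a different unit.

```r
# Medians copied from the result tables above (unit assumed identical across sizes).
medians <- c(`100k` = 1.87219, `1M` = 19.78953, `10M` = 194.4574, `100M` = 2107.59)

# Ratio of each median to the one for the previous (10x smaller) table.
growth <- medians[-1] / medians[-length(medians)]
print(round(growth, 2))
```

Ratios close to 10 would indicate that, under this assumption, run time grows roughly linearly with the number of rows.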


## 1B
As with 100M, this benchmark is run only 10 times instead of 100.

In [16]:
t <- createTable(1000 * 1000 * 1000)

In [None]:
summary(microbenchmark(executeQuery(t), times = 10))