# Understanding vectorization

In [1]:
using BenchmarkTools

In [2]:
v_64 = rand(10_000_000);
v_32 = Array{Float32}(rand(10_000_000));

In [3]:
v_16 = Array{Float16}(rand(10_000_000));

In [4]:
# size in RAM, in bytes
sizeof(v_64)

80000000
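
The 80 MB figure above is the element count times the element width. As a rough sketch, the same check can be made for all three element types. The length `n` and the names `a_64`, `a_32`, `a_16` are hypothetical, chosen smaller than the vectors above so the block runs instantly.

```julia
# Hypothetical smaller length than the 10_000_000 used above, to keep this quick.
n = 1_000

a_64 = rand(n)                     # Float64: 8 bytes per element
a_32 = Array{Float32}(rand(n))     # Float32: 4 bytes per element
a_16 = Array{Float16}(rand(n))     # Float16: 2 bytes per element

# sizeof on an array reports the bytes occupied by its elements.
bytes = (sizeof(a_64), sizeof(a_32), sizeof(a_16))
println(bytes)
```

Halving the element width halves the data that has to be moved from memory, which matters for a loop as simple as a sum.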

In [5]:
function sum_elements(v)
    # Accumulator has the same type as the elements, so the loop is type-stable
    aux = zero(eltype(v))
    for v_k in v
        aux += v_k
    end
    return aux
end

sum_elements (generic function with 1 method)
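
Because `aux` starts as `zero(eltype(v))`, the accumulator has the same precision as the elements. That is harmless for `Float64`, but it can matter for a narrow type such as the `Float16` vector created earlier. The sketch below illustrates the effect on a small constant vector; `sum_elements_wide` and `x` are hypothetical names introduced only for this illustration, and `sum_elements` is repeated so the block runs on its own.

```julia
function sum_elements(v)
    aux = zero(eltype(v))
    for v_k in v
        aux += v_k
    end
    return aux
end

# Hypothetical variant: accumulate in a caller-chosen (wider) type T.
function sum_elements_wide(::Type{T}, v) where {T}
    aux = zero(T)
    for v_k in v
        aux += v_k
    end
    return aux
end

x = fill(Float16(0.25), 100_000)         # exact sum is 25_000

narrow = sum_elements(x)                 # Float16 accumulator
wide   = sum_elements_wide(Float32, x)   # Float32 accumulator

# The Float16 accumulator stops growing once 0.25 is smaller than the
# rounding granularity at its current magnitude; the Float32 one does not.
println((narrow, wide))
```

Under these assumptions the narrow sum ends up far below the true total, while the wide sum is exact. This is one reason to be careful before benchmarking a `Float16` reduction.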

In [6]:
@benchmark sum_elements(v_64)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     9.179 ms (0.00% GC)
  median time:      11.251 ms (0.00% GC)
  mean time:        12.052 ms (0.00% GC)
  maximum time:     53.751 ms (0.00% GC)
  --------------
  samples:          412
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

In [7]:
@benchmark sum_elements(v_32)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     9.475 ms (0.00% GC)
  median time:      12.394 ms (0.00% GC)
  mean time:        12.908 ms (0.00% GC)
  maximum time:     35.987 ms (0.00% GC)
  --------------
  samples:          384
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

In [8]:
function sum_elements_simd(v)
    aux = zero(eltype(v))
    # @simd allows the additions to be reordered, so the compiler may vectorize the loop
    @simd for v_k in v
        aux += v_k
    end
    return aux
end

sum_elements_simd (generic function with 1 method)

In [9]:
@benchmark sum_elements_simd(v_64)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     4.903 ms (0.00% GC)
  median time:      5.364 ms (0.00% GC)
  mean time:        5.619 ms (0.00% GC)
  maximum time:     9.929 ms (0.00% GC)
  --------------
  samples:          873
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

In [32]:
@benchmark sum_elements_simd(v_32)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     2.433 ms (0.00% GC)
  median time:      2.459 ms (0.00% GC)
  mean time:        2.530 ms (0.00% GC)
  maximum time:     7.492 ms (0.00% GC)
  --------------
  samples:          1907
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
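
One plausible reading of these timings: without `@simd` the compiler has to add the elements strictly in order, because floating-point addition is not associative. With `@simd` it may keep several partial sums in a vector register. On the Haswell CPU reported by `versioninfo()`, a 256-bit register holds four `Float64` or eight `Float32` values, which would be consistent with the `Float32` run being about twice as fast. The sketch below checks that the two functions agree up to rounding rather than bit for bit. The input `w` is hypothetical, and the function bodies are repeated so the block runs on its own.

```julia
function sum_elements(v)
    aux = zero(eltype(v))
    for v_k in v
        aux += v_k
    end
    return aux
end

function sum_elements_simd(v)
    aux = zero(eltype(v))
    @simd for v_k in v
        aux += v_k
    end
    return aux
end

w = Array{Float32}(rand(100_000))   # hypothetical test input

s_plain = sum_elements(w)
s_simd  = sum_elements_simd(w)

# Reordered additions can round differently, so compare with a tolerance.
println(isapprox(s_plain, s_simd; rtol = 1e-2))
```

When timing with BenchmarkTools, interpolating the global variable, as in `@benchmark sum_elements_simd($v_32)`, avoids measuring global-variable overhead. That overhead is likely the source of the 16-byte allocation reported in the results above.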

### Changing "weights" in a vector

In [None]:
function read_quora_train_data(path)

In [34]:
X_train = readcsv("/Users/david/Documents/Datasets/Quora_question_pairs/train.csv")

404291×6 Array{Any,2}:
       "id"        "qid1"        "qid2"  …   "is_duplicate"
      0           1             2           0              
      1           3             4           0              
      2           5             6           0              
      3           7             8           0              
      4           9            10        …  0              
      5          11            12           1              
      6          13            14           0              
      7          15            16           1              
      8          17            18           0              
      9          19            20        …  0              
     10          21            22           0              
     11          23            24           1              
      ⋮                                  ⋱  ⋮              
 404278      537919        169786           0              
 404279      537920        537921        …  0              
 404280      5379

In [38]:
X_train[1,:]

6-element Array{Any,1}:
 "id"          
 "qid1"        
 "qid2"        
 "question1"   
 "question2"   
 "is_duplicate"

In [39]:
X_train[723,:]

6-element Array{Any,1}:
  721                                                                                                                                        
 1438                                                                                                                                        
 1439                                                                                                                                        
     "How does Quora quickly mark questions as needing improvement?"                                                                         
     "Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds…"
    1                                                                                                                                        

In [37]:
versioninfo()

Julia Version 0.6.0-dev.2069
Commit ff9a949 (2017-01-13 02:17 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)


In [27]:
function read_Quora_train()
    # readcsv returns one Array{Any,2}; row 1 holds the column names
    data = readcsv("/Users/david/Documents/Datasets/Quora_question_pairs/train.csv")

    # Skip the header row and convert each column to a concrete element type
    id = Array{Int64}(data[2:end,1])
    qid1 = Array{Int64}(data[2:end,2])
    qid2 = Array{Int64}(data[2:end,3])
    questions1 = String[string(x) for x in data[2:end,4]]
    questions2 = String[string(x) for x in data[2:end,5]]
    y_train = Array{Int16}(data[2:end,6])

    return id, qid1, qid2, questions1, questions2, y_train
end

read_Quora_train (generic function with 1 method)
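
`readcsv` returns a single `Array{Any,2}` whose first row is the header, which is why every column is sliced with `2:end` and converted to a concrete type. The sketch below reproduces that step on a tiny made-up table, so it runs without the CSV file. The values and the names `ids`, `questions`, `labels` are illustrative and not taken from the dataset.

```julia
# Made-up stand-in for the matrix returned by readcsv: header row, then data.
data = Any["id" "question1" "is_duplicate";
           0    "q a"       0;
           1    "q b"       1]

ids       = Array{Int64}(data[2:end, 1])                 # drop header, convert to Int64
questions = String[string(x) for x in data[2:end, 2]]    # force every entry to String
labels    = Array{Int16}(data[2:end, 3])                 # 0/1 target as Int16

println(eltype(ids), " ", eltype(questions), " ", eltype(labels))
```

The conversion also matters for the vectorization theme of this notebook: an `Array{Any}` holds boxed values, whereas `Array{Int64}` or `Array{Int16}` is a contiguous buffer that a tight loop can work through efficiently.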

In [28]:
id, qid1, qid2, questions1, questions2, y_train = read_Quora_train();

In [15]:
typeof(questions2[722])

String

In [19]:
questions2[722]

"Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds…"

In [22]:
questions1[723]

"How will I contact a good hacker?"

In [23]:
y_train[723]

1

### Examples with target = 1

In [208]:
questions1[8]

"How can I be a good geologist?"

In [190]:
y_train[6]

1

In [209]:
questions1[6]

"Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?"

In [210]:
questions2[6]

"I'm a triple Capricorn (Sun, Moon and ascendant in Capricorn) What does this say about me?"

In [211]:
questions1[8]

"How can I be a good geologist?"

In [212]:
questions2[8]

"What should I do to be a great geologist?"

In [213]:
unique(y_train)

2-element Array{Int16,1}:
 0
 1
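
With only the labels 0 and 1 present, a natural follow-up is to count the pairs marked as duplicates and locate them. Below is a small sketch on a made-up label vector. `y_demo` is hypothetical; substituting `y_train` would run the same steps on the real data.

```julia
y_demo = Int16[0, 0, 1, 0, 1, 0]   # made-up labels, not from the dataset

# Number of pairs labelled 1.
n_dup = count(y -> y == 1, y_demo)

# Positions of those pairs, usable to index questions1 and questions2.
dup_idx = [i for i in 1:length(y_demo) if y_demo[i] == 1]

println(n_dup, " of ", length(y_demo), " at ", dup_idx)
```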