Julia is a relatively new programming language (first released in 2012) designed for numerical and scientific computing.

In [5]:
a = 1 + 1

2

In [6]:
A = [
1 0 1
1 1 0
0 1 0]

3×3 Array{Int64,2}:
 1  0  1
 1  1  0
 0  1  0

In [7]:
B = [
0 0 1 1
1 1 0 0
1 0 1 0]

3×4 Array{Int64,2}:
 0  0  1  1
 1  1  0  0
 1  0  1  0

In [8]:
C = A * B

3×4 Array{Int64,2}:
 1  0  2  1
 1  1  1  1
 1  1  0  0

Suppose we have a small corpus of four word forms:
"walk", "walks", "walked" (past tense), and "walked" (past participle).

In [9]:
# #wa wal alk lk# lks ks# lke ked ed#
C = [
1 1 1 1 0 0 0 0 0 # walk
1 1 1 0 1 1 0 0 0 # walks
1 1 1 0 0 0 1 1 1 # walked_past
1 1 1 0 0 0 1 1 1 # walked_part
];

In [10]:
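# columns: WALK PRESENT PAST SINGULAR PARTICIPLE
S_feature = [
1 1 0 0 0 # walk
1 1 0 1 0 # walks
1 0 0 0 0 # WALK only; matches none of the four forms above
1 0 1 0 0 # walked_past
1 0 1 0 1 # walked_part
];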

In [11]:
L = [
-1.52  -0.69   0.05  -0.31   1.60   0.23 # WALK
-0.92  -0.86   1.30   0.01   0.22  -0.58 # PRESENT
-0.69   0.98   2.94  -0.06   1.10   2.10 # PAST
-0.01   0.69  -0.26  -0.56  -0.89  -1.09 # SINGULAR
-1.37  -0.98   0.81   0.01   0.56   0.31 # PARTICIPLE
];

In [12]:
S1 = L[1:1,:] + L[2:2,:]                  # walk
S2 = L[1:1,:] + L[2:2,:] + L[4:4,:]       # walks
S3 = L[1:1,:] + L[3:3,:]                  # walked_past
S4 = L[1:1,:] + L[3:3,:] + L[5:5,:]       # walked_part

S = vcat(S1, S2, S3, S4)

4×6 Array{Float64,2}:
 -2.44  -1.55  1.35  -0.3   1.82  -0.35
 -2.45  -0.86  1.09  -0.86  0.93  -1.44
 -2.21   0.29  2.99  -0.37  2.7    2.33
 -3.58  -0.69  3.8   -0.36  3.26   2.64
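
The four row sums above can also be written as one matrix product of a binary word-by-feature matrix with `L`. The sketch below assumes a hypothetical indicator matrix `W` (one row per word form, columns in the same order as the rows of `L`); `W`, `S_product`, and `S_summed` are illustrative names and not part of the original notebook.

```julia
# Lexome vectors, one row per semantic feature (copied from the cell above)
L = [
-1.52  -0.69   0.05  -0.31   1.60   0.23   # WALK
-0.92  -0.86   1.30   0.01   0.22  -0.58   # PRESENT
-0.69   0.98   2.94  -0.06   1.10   2.10   # PAST
-0.01   0.69  -0.26  -0.56  -0.89  -1.09   # SINGULAR
-1.37  -0.98   0.81   0.01   0.56   0.31   # PARTICIPLE
]

# Hypothetical indicator matrix: W[i, j] = 1 if word i carries feature j
# columns: WALK PRESENT PAST SINGULAR PARTICIPLE
W = [
1 1 0 0 0   # walk
1 1 0 1 0   # walks
1 0 1 0 0   # walked_past
1 0 1 0 1   # walked_part
]

# One product replaces the four explicit sums
S_product = W * L

# The explicit sums from the notebook cell, for comparison
S_summed = vcat(
    L[1:1,:] + L[2:2,:],
    L[1:1,:] + L[2:2,:] + L[4:4,:],
    L[1:1,:] + L[3:3,:],
    L[1:1,:] + L[3:3,:] + L[5:5,:],
)

@show S_product ≈ S_summed
```

Writing the semantic matrix as a product makes it easier to add word forms: each new form is one more row of `W`.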

In [13]:
using LinearAlgebra
F = pinv(C)*S

9×6 Array{Float64,2}:
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.303077  -0.605385   -0.0469231   0.0930769    0.306923  -0.238462
 -0.156538   0.0423077  -0.153462   -0.233462    -0.291538  -0.664231
 -0.156538   0.0423077  -0.153462   -0.233462    -0.291538  -0.664231
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513

In [14]:
G = pinv(S)*C

6×9 Array{Float64,2}:
  0.0735187   0.0735187   0.0735187  …  -0.0265705  -0.0265705  -0.0265705
  0.186802    0.186802    0.186802       0.225962    0.225962    0.225962
  0.0125078   0.0125078   0.0125078      0.153566    0.153566    0.153566
 -0.156934   -0.156934   -0.156934      -0.0817396  -0.0817396  -0.0817396
  0.699641    0.699641    0.699641       0.0501479   0.0501479   0.0501479
 -0.376047   -0.376047   -0.376047   …   0.107702    0.107702    0.107702
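
Because `pinv` gives a least-squares solution, `C * F` can reproduce `S` only as far as the cues distinguish the forms. The two "walked" rows of `C` are identical, so their predicted semantic vectors must coincide. The sketch below re-enters `C` and `S` from the cells above and checks this; the explicit `rtol` is an added assumption (the notebook uses the default tolerance), and `Shat` and `shared` are illustrative names.

```julia
using LinearAlgebra

C = [
1 1 1 1 0 0 0 0 0   # walk
1 1 1 0 1 1 0 0 0   # walks
1 1 1 0 0 0 1 1 1   # walked_past
1 1 1 0 0 0 1 1 1   # walked_part
]

S = [
-2.44  -1.55  1.35  -0.30  1.82  -0.35   # walk
-2.45  -0.86  1.09  -0.86  0.93  -1.44   # walks
-2.21   0.29  2.99  -0.37  2.70   2.33   # walked_past
-3.58  -0.69  3.80  -0.36  3.26   2.64   # walked_part
]

# Assumption: an explicit tolerance, so the zero singular value caused by the
# duplicated "walked" rows is discarded reliably
F = pinv(C; rtol = 1e-8) * S

Shat = C * F

# Forms with a unique cue pattern are reproduced exactly
@show Shat[1:2, :] ≈ S[1:2, :]

# The two homophonous forms both receive the average of their semantic vectors
shared = (S[3:3, :] + S[4:4, :]) / 2
@show Shat[3:3, :] ≈ shared
@show Shat[4:4, :] ≈ shared
```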

In [15]:
# comprehension: predict the semantic vector of a form from its cues
# take the form "walk" (cues #wa, wal, alk, lk#)
C_val = [1 1 1 1 0 0 0 0 0]

1×9 Array{Int64,2}:
 1  1  1  1  0  0  0  0  0

In [16]:
Shat_val = C_val * F

1×6 Array{Float64,2}:
 -2.44  -1.55  1.35  -0.3  1.82  -0.35

In [17]:
using Statistics
# evaluation
rS = cor(S, Shat_val, dims=2)

4×1 Array{Float64,2}:
 1.0
 0.9218760403270171
 0.8657589393769567
 0.9084650742003979
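
With `dims=2`, `cor` treats every row as one variable, so the result holds the Pearson correlation of each row of `S` with the predicted vector. The sketch below recomputes the correlations by hand and picks the best-matching word; `pearson`, `words`, `r_manual`, and `r_builtin` are illustrative names that do not appear in the notebook.

```julia
using Statistics

S = [
-2.44  -1.55  1.35  -0.30  1.82  -0.35   # walk
-2.45  -0.86  1.09  -0.86  0.93  -1.44   # walks
-2.21   0.29  2.99  -0.37  2.70   2.33   # walked_past
-3.58  -0.69  3.80  -0.36  3.26   2.64   # walked_part
]
Shat_val = [-2.44 -1.55 1.35 -0.30 1.82 -0.35]   # predicted vector for "walk"
words = ["walk", "walks", "walked_past", "walked_part"]

# Pearson correlation of two vectors, written out explicitly
pearson(x, y) = sum((x .- mean(x)) .* (y .- mean(y))) /
    sqrt(sum((x .- mean(x)).^2) * sum((y .- mean(y)).^2))

r_manual = [pearson(S[i, :], vec(Shat_val)) for i in 1:size(S, 1)]
r_builtin = vec(cor(S, Shat_val, dims=2))

@show r_manual ≈ r_builtin
@show words[argmax(r_manual)]   # the word whose meaning is closest to the prediction
```

Comprehension counts as correct when the row with the highest correlation belongs to the word that was presented.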

In [18]:
# production evaluation
# take "WALK" again
S_val = L[1:1,:] + L[2:2,:]

1×6 Array{Float64,2}:
 -2.44  -1.55  1.35  -0.3  1.82  -0.35

In [19]:
Chat_val = S_val * G

1×9 Array{Float64,2}:
 1.0  1.0  1.0  1.0  -2.40912e-16  …  -5.24964e-16  -5.24964e-16

In [20]:
round.(Chat_val, digits=2)

1×9 Array{Float64,2}:
 1.0  1.0  1.0  1.0  -0.0  -0.0  -0.0  -0.0  -0.0

Comparison of basic numeric operations (Julia vs. R)

In [21]:
# matrix multiplication
A = randn(20000, 2000)
B = randn(2000, 2000)
@time C = A * B;

# in R
# 1.183169 mins

  0.723391 seconds (2 allocations: 305.176 MiB, 1.22% gc time)


In [23]:
# pair-wise correlation
A = randn(20000, 2000)
B = randn(20000, 2000)

@time C = cor(A, B, dims=2);

# in R
# 

 11.521570 seconds (39 allocations: 3.577 GiB, 1.39% gc time)


In [70]:
# sparse matrix multiplication
using SparseArrays

function sparseN(N, n)
  sparse(
    rand(1:N*10, n),
    rand(1:N, n),
    randn(Float64, n),
    N*10,
    N)
end

A = sparseN(2000, 12000)

@show length(A.nzval) / A.m / A.n
B = randn(2000, 2000)
@time C = A * B

A = Array(A)
@time C = A * B

A = sparseN(2000, 1200000)

@show length(A.nzval) / A.m / A.n
B = randn(2000, 2000)
@time C = A * B

A = Array(A)
@time C = A * B;

(length(A.nzval) / A.m) / A.n = 0.000299975
  0.184816 seconds (2 allocations: 305.176 MiB, 15.56% gc time)
  0.806342 seconds (2 allocations: 305.176 MiB, 3.60% gc time)
(length(A.nzval) / A.m) / A.n = 0.029556875000000003
  5.291147 seconds (2 allocations: 305.176 MiB, 0.54% gc time)
  0.708202 seconds (2 allocations: 305.176 MiB, 0.15% gc time)
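
In the timings above, the sparse product was faster than the dense one at a density of about 0.0003 and slower at about 0.03, so whether the sparse representation pays off depends on how many entries are nonzero. The sketch below is a small, hedged illustration of the same comparison: the sizes, the seed, and the names `density`, `C_sparse`, and `C_dense` are assumptions chosen so it runs instantly, and it checks correctness rather than speed.

```julia
using SparseArrays, Random

Random.seed!(1)              # hypothetical seed, only to make the sketch repeatable
A = sprandn(200, 20, 0.01)   # 200×20 sparse matrix, roughly 1% of entries nonzero
B = randn(20, 20)

# nnz counts the stored entries; length is the total number of entries
density = nnz(A) / length(A)
@show density

C_sparse = A * B             # sparse-times-dense product
C_dense = Array(A) * B       # same product after converting A to a dense array
@show C_sparse ≈ C_dense
```

Wrapping either product in `@time`, as in the cell above, shows which representation is faster for a given density on a given machine.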


In [82]:
# pseudo-inverse (pinv) of a large matrix
function sparseNBit(N, n)
  sparse(
    rand(1:N*10, n),
    rand(1:N, n),
    ones(Int64, n),
    N*10,
    N)
end

C = sparseNBit(2000, 1200000)
S = randn(20000, 2000);

In [80]:
@time F = pinv(Array(C)) * S;

# in R
# 

 10.537358 seconds (37 allocations: 1.669 GiB, 3.01% gc time)


In [81]:
@time G = pinv(S) * Array(C);

# in R
# 

 64.161852 seconds (353 allocations: 1.669 GiB, 0.36% gc time)


In [89]:
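# Normal equations (in Julia, C' is the transpose of C):
# C * F = S
# C' * C * F = C' * S                (left-multiply both sides by C')
# F = inv(C' * C) * C' * S           (requires C' * C to be invertible)
# pinv is used below in place of inv in case C' * C is singular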
Ca = Array(C)
@time F = pinv(Ca'Ca) * Ca' * S;

 94.764339 seconds (47 allocations: 610.652 MiB, 0.16% gc time)


In [90]:
@time G = pinv(S'S) * S' * C;

 24.568936 seconds (41 allocations: 610.652 MiB, 0.67% gc time)
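
The normal-equations form used in the two cells above agrees with the pseudo-inverse when the matrix has full column rank. The sketch below checks this on a small random problem; `X`, `Y`, the seed, and the sizes are illustrative assumptions, and the backslash operator is included only as a third point of comparison (it is not used in the notebook).

```julia
using LinearAlgebra, Random

Random.seed!(1)      # hypothetical seed, only to make the sketch repeatable
X = randn(50, 5)     # tall matrix; full column rank with probability 1
Y = randn(50, 3)

W_pinv   = pinv(X) * Y                # pseudo-inverse solution
W_normal = inv(X' * X) * X' * Y       # normal equations
W_solve  = X \ Y                      # Julia's built-in least-squares solve

@show W_pinv ≈ W_normal
@show W_pinv ≈ W_solve
```

When the matrix is rank-deficient, as the small `C` with two identical "walked" rows is, `inv(X' * X)` fails or becomes numerically unreliable, which is one reason to keep `pinv` in the formula.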


In [91]:
using JudiLing

@time F = JudiLing.make_transform_matrix(C, S);
@time G = JudiLing.make_transform_matrix(S, C);

  3.406613 seconds (83 allocations: 650.188 MiB, 4.57% gc time)
  3.709776 seconds (24 allocations: 244.157 MiB, 3.64% gc time)


Path finding algorithms \
learn_paths

In [100]:
# #wa wal alk lk# lks ks# lke ked ed#
C = [
1 1 1 1 0 0 0 0 0 # walk
1 1 1 0 1 1 0 0 0 # walks
1 1 1 0 0 0 1 1 1 # walked_past
1 1 1 0 0 0 1 1 1 # walked_part
];

In [119]:
L = [
-1.52  -0.69   0.05  -0.31   1.60   0.23 # WALK
-0.92  -0.86   1.30   0.01   0.22  -0.58 # PRESENT
-0.69   0.98   2.94  -0.06   1.10   2.10 # PAST
-0.01   0.69  -0.26  -0.56  -0.89  -1.09 # SINGULAR
-1.37  -0.98   0.81   0.01   0.56   0.31 # PARTICIPLE
]

S1 = L[1:1,:] + L[2:2,:]                  # walk
S2 = L[1:1,:] + L[2:2,:] + L[4:4,:]       # walks
S3 = L[1:1,:] + L[3:3,:]                  # walked_past
S4 = L[1:1,:] + L[3:3,:] + L[5:5,:]       # walked_part

S = vcat(S1, S2, S3, S4);

In [120]:
F = pinv(C) * S

9×6 Array{Float64,2}:
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.712308  -0.314872    0.465641   -0.131026     0.504359  -0.0371795
 -0.303077  -0.605385   -0.0469231   0.0930769    0.306923  -0.238462
 -0.156538   0.0423077  -0.153462   -0.233462    -0.291538  -0.664231
 -0.156538   0.0423077  -0.153462   -0.233462    -0.291538  -0.664231
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513
 -0.252692   0.248205    0.666026    0.00935897   0.488974   0.865513

In [110]:
G = pinv(S) * C

6×9 Array{Float64,2}:
  0.0735187   0.0735187   0.0735187  …  -0.0265705  -0.0265705  -0.0265705
  0.186802    0.186802    0.186802       0.225962    0.225962    0.225962
  0.0125078   0.0125078   0.0125078      0.153566    0.153566    0.153566
 -0.156934   -0.156934   -0.156934      -0.0817396  -0.0817396  -0.0817396
  0.699641    0.699641    0.699641       0.0501479   0.0501479   0.0501479
 -0.376047   -0.376047   -0.376047   …   0.107702    0.107702    0.107702

In [104]:
# t1
Y1 = [
1 0 0 0 0 0 0 0 0 # walk
1 0 0 0 0 0 0 0 0 # walks
1 0 0 0 0 0 0 0 0 # walked_past
1 0 0 0 0 0 0 0 0 # walked_part
]

M1 = pinv(C) * Y1

9×9 Array{Float64,2}:
 0.282051   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.282051   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.282051   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.153846   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0769231  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0769231  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0512821  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0512821  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0512821  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

In [107]:
# t2-6
Y2 = [
0 1 0 0 0 0 0 0 0 # walk
0 1 0 0 0 0 0 0 0 # walks
0 1 0 0 0 0 0 0 0 # walked_past
0 1 0 0 0 0 0 0 0 # walked_part
]

M2 = pinv(C) * Y2;

Y3 = [
0 0 1 0 0 0 0 0 0 # walk
0 0 1 0 0 0 0 0 0 # walks
0 0 1 0 0 0 0 0 0 # walked_past
0 0 1 0 0 0 0 0 0 # walked_part
]

M3 = pinv(C) * Y3;

Y4 = [
0 0 0 1 0 0 0 0 0 # walk
0 0 0 0 1 0 0 0 0 # walks
0 0 0 0 0 0 1 0 0 # walked_past
0 0 0 0 0 0 1 0 0 # walked_part
]

M4 = pinv(C) * Y4;

Y5 = [
0 0 0 0 0 0 0 0 0 # walk
0 0 0 0 0 1 0 0 0 # walks
0 0 0 0 0 0 0 1 0 # walked_past
0 0 0 0 0 0 0 1 0 # walked_part
]

M5 = pinv(C) * Y5;

Y6 = [
0 0 0 0 0 0 0 0 0 # walk
0 0 0 0 0 0 0 0 0 # walks
0 0 0 0 0 0 0 0 1 # walked_past
0 0 0 0 0 0 0 0 1 # walked_part
]

M6 = pinv(C) * Y6;

In [108]:
# WALK PAST SINGULAR
S_val = L[1:1,:] + L[3:3,:] + L[4:4,:]

1×6 Array{Float64,2}:
 -2.22  0.98  2.73  -0.93  1.81  1.24

In [111]:
Chat = S_val * G

1×9 Array{Float64,2}:
 1.0  1.0  1.0  -1.0  1.0  1.0  1.0  1.0  1.0

In [112]:
Y1hat = Chat * M1
# #wa

## #wa

1×9 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

In [113]:
Y2hat = Chat * M2
# wal

## #wa wal

1×9 Array{Float64,2}:
 0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

In [114]:
Y3hat = Chat * M3
# alk

## #wa wal alk

1×9 Array{Float64,2}:
 0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0

In [115]:
Y4hat = Chat * M4
# lks lke

## #wa wal alk lks
## #wa wal alk lke

1×9 Array{Float64,2}:
 0.0  0.0  0.0  -1.0  1.0  0.0  1.0  0.0  0.0

In [116]:
Y5hat = Chat * M5
# ks# ked

## #wa wal alk lks ks#
## #wa wal alk lke ked

1×9 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0

In [117]:
Y6hat = Chat * M6
# ed#

## #wa wal alk lks ks#
## #wa wal alk lke ked ed#

1×9 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
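
The step-by-step predictions above can be chained into complete paths automatically. The sketch below is a simplified illustration of that idea and not the JudiLing `learn_paths` implementation: `position_matrix`, `overlaps`, `build_paths`, the 0.5 threshold, and the explicit `rtol` are all assumptions made for this example. A cue is kept at position t if its predicted support exceeds the threshold, and a path is extended only when the new trigram overlaps the previous one by two characters.

```julia
using LinearAlgebra

# Trigram cues in the column order of C, and the trigram sequence of each form
cues = ["#wa", "wal", "alk", "lk#", "lks", "ks#", "lke", "ked", "ed#"]
walked = ["#wa", "wal", "alk", "lke", "ked", "ed#"]
forms = [
    ["#wa", "wal", "alk", "lk#"],          # walk
    ["#wa", "wal", "alk", "lks", "ks#"],   # walks
    walked,                                # walked_past
    walked,                                # walked_part
]
C = [Float64(cue in form) for form in forms, cue in cues]

S = [
-2.44  -1.55  1.35  -0.30  1.82  -0.35   # walk
-2.45  -0.86  1.09  -0.86  0.93  -1.44   # walks
-2.21   0.29  2.99  -0.37  2.70   2.33   # walked_past
-3.58  -0.69  3.80  -0.36  3.26   2.64   # walked_part
]

# Assumption: explicit tolerance because C has two identical rows
Cinv = pinv(C; rtol = 1e-8)
G = pinv(S) * C

# Y_t has a 1 where a form has the given cue at position t (Y1 to Y6 in the text)
function position_matrix(t)
    Y = zeros(length(forms), length(cues))
    for (i, form) in enumerate(forms)
        if t <= length(form)
            Y[i, findfirst(==(form[t]), cues)] = 1.0
        end
    end
    return Y
end
Ms = [Cinv * position_matrix(t) for t in 1:6]

# Two trigrams can be adjacent if the last two characters of the first
# equal the first two characters of the second
overlaps(a, b) = a[2:3] == b[1:2]

function build_paths(Chat, Ms; threshold = 0.5)
    open_paths = Vector{Vector{String}}()
    done = Vector{Vector{String}}()
    for (t, M) in enumerate(Ms)
        Yhat = vec(Chat * M)                  # predicted support for each cue at position t
        candidates = cues[Yhat .> threshold]
        if t == 1
            open_paths = [[c] for c in candidates]
        else
            open_paths = [vcat(p, c) for p in open_paths for c in candidates
                          if overlaps(p[end], c)]
        end
        # A path that ends in a word boundary "#" is complete
        append!(done, filter(p -> endswith(p[end], "#"), open_paths))
        open_paths = filter(p -> !endswith(p[end], "#"), open_paths)
    end
    return done
end

S_val = [-2.22 0.98 2.73 -0.93 1.81 1.24]   # WALK + PAST + SINGULAR
paths = build_paths(S_val * G, Ms)
foreach(p -> println(join(p, " ")), paths)
```

For this semantic vector the sketch ends with two complete paths, one spelling "walks" and one spelling "walked".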

In [122]:
# synthesis-by-analysis
chat_can1 = [1 1 1 0 1 1 0 0 0]
chat_can2 = [1 1 1 0 0 0 1 1 1]

1×9 Array{Int64,2}:
 1  1  1  0  0  0  1  1  1

In [123]:
shat_can1 = chat_can1 * F

1×6 Array{Float64,2}:
 -2.45  -0.86  1.09  -0.86  0.93  -1.44

In [124]:
shat_can2 = chat_can2 * F

1×6 Array{Float64,2}:
 -2.895  -0.2  3.395  -0.365  2.98  2.485

In [130]:
@show cor(shat_can1, S_val, dims=2)
@show cor(shat_can2, S_val, dims=2)

# "walked" wins: its predicted semantic vector correlates best with S_val
# corrected

1×1 Array{Float64,2}:
 0.8260925961736686

1×1 Array{Float64,2}:
 0.9326808118255102
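
The selection step can be written as a small loop over the candidate forms: map each candidate back to semantics with `F`, correlate it with the target, and keep the best one. This is a minimal sketch under the same toy data; `labels`, `candidates`, `scores`, `best`, and the explicit `rtol` are illustrative assumptions.

```julia
using LinearAlgebra, Statistics

C = [
1 1 1 1 0 0 0 0 0   # walk
1 1 1 0 1 1 0 0 0   # walks
1 1 1 0 0 0 1 1 1   # walked_past
1 1 1 0 0 0 1 1 1   # walked_part
]
S = [
-2.44  -1.55  1.35  -0.30  1.82  -0.35   # walk
-2.45  -0.86  1.09  -0.86  0.93  -1.44   # walks
-2.21   0.29  2.99  -0.37  2.70   2.33   # walked_past
-3.58  -0.69  3.80  -0.36  3.26   2.64   # walked_part
]

# Assumption: explicit tolerance because C has two identical rows
F = pinv(C; rtol = 1e-8) * S

S_val = [-2.22 0.98 2.73 -0.93 1.81 1.24]   # WALK + PAST + SINGULAR

# Candidate forms produced by the path search
labels = ["walks", "walked"]
candidates = [
    [1 1 1 0 1 1 0 0 0],   # #wa wal alk lks ks#
    [1 1 1 0 0 0 1 1 1],   # #wa wal alk lke ked ed#
]

# Comprehend each candidate and compare it with the intended meaning
scores = [cor(vec(c * F), vec(S_val)) for c in candidates]
best = labels[argmax(scores)]

@show scores
@show best
```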

