 ## بسم الله الرّحمٰن الرّحيم 

# Theory of Concentrism in the Qur'an using Bayesian Optimization & Large Language Model

_by Al-Ahmadgaid B. Asaad_

#### Install Python Libraries

There are Julia's equivalent for the following Python libraries, but for this paper the author decided to use the official one which is in Python.

In [11]:
using Pkg

ENV["PYTHON"]="" # necessary for Conda.pip
Pkg.build("PyCall")

[32m[1m    Building[22m[39m Conda ─→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/b19db3927f0db4151cb86d073689f2428e524576/build.log`
[32m[1m    Building[22m[39m PyCall → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/9816a3826b0ebf49ab4926e2b18842ad8b5c8f04/build.log`


In [12]:
using Conda

Conda.pip_interop(true)
Conda.pip("install", "sentence-transformers")
Conda.pip("install", "umap-learn")

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mRunning `conda config --set pip_interop_enabled true --file /Users/al-ahmadgaidasaad/.julia/conda/3/aarch64/condarc-julia.yml` in root environment
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mRunning `pip install sentence-transformers` in root environment




[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mRunning `pip install umap-learn` in root environment




#### Load Libraries

In [59]:
using BOSS
using CairoMakie
using Clustering
using DataFrames
using Distributions
using Distances
using Optimization
using PyCall
using QuranTree
using Statistics
using Turing
using Yunir

In [14]:
# load the python libraries
sentence_transformers = pyimport("sentence_transformers")
umap_py = pyimport("umap.umap_")
UMAP = umap_py.UMAP

PyObject <class 'umap.umap_.UMAP'>

#### Load CL-Arabert Embedding Model

In [16]:
model_path = "/Users/al-ahmadgaidasaad/Documents/School/Islamic Studies/ma-thesis/codes/notebooks/models/CL-Arabert"
emodel = sentence_transformers.SentenceTransformer(model_path);

No sentence-transformers model found with name /Users/al-ahmadgaidasaad/Documents/School/Islamic Studies/ma-thesis/codes/notebooks/models/CL-Arabert. Creating a new one with mean pooling.


#### Load Qur'an Data

In [18]:
_, tnzl = load(QuranData());
tnzl_tbl = table(tnzl)

Tanzil Quran Text (Uthmani)
(C) 2008-2010 Tanzil.net

[1m6236×3 DataFrame[0m
[1m  Row [0m│[1m chapter [0m[1m verse [0m[1m form                              [0m
      │[90m Int64   [0m[90m Int64 [0m[90m String                            [0m
──────┼───────────────────────────────────────────────────
    1 │       1      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
    2 │       1      2  ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ
    3 │       1      3  ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
    4 │       1      4  مَٰلِكِ يَوْمِ ٱلدِّينِ
    5 │       1      5  إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ
    6 │       1      6  ٱهْدِنَا ٱلصِّرَٰطَ ٱلْمُسْتَقِيمَ
    7 │       1      7  صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُو…
    8 │       2      1  بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ الٓمٓ
    9 │       2      2  ذَٰلِكَ ٱلْكِتَٰبُ لَا رَيْبَ فِيهِ هُدًى لِّلْمُتَّقِينَ
   10 │       2      3  ٱلَّذِينَ يُؤْمِنُونَ بِٱلْغَيْبِ وَيُقِيمُونَ ٱلصَّل…
   11 │       2      4  وَٱ

In [148]:
surah2 = verses(tnzl_tbl[2]);

In [149]:
surah2_emb = emodel.encode(surah2)

286×768 Matrix{Float32}:
 0.531018  -0.132496    -0.756551   …  -0.305054   0.989819   0.201078
 0.784264  -0.187889     0.303152       0.0520253  0.646556   0.616321
 0.677887  -0.778916    -0.807045      -0.0719035  1.12943    0.859909
 0.506211  -0.325038    -0.502814       0.459864   1.43758    0.466729
 0.269263  -0.194165    -0.539852      -0.0628042  0.77668    0.554814
 0.467806  -0.373838    -0.0348215  …   0.392261   1.14069    0.293043
 0.886688  -0.527031    -0.530024      -0.0749496  0.983624   0.308669
 0.60995   -0.301923    -0.17381        0.103315   0.984032   0.482254
 0.412077  -0.309258    -0.512169       0.890515   0.994584   0.67909
 0.996288  -0.699789    -0.477904      -0.24042    0.772055   0.535724
 0.123426  -0.530353     0.13315    …   0.0387213  0.998667   0.42128
 0.153852   0.00949113  -0.474826       0.17242    0.753725  -0.188628
 0.563945  -0.655017    -0.349274      -0.40291    1.35588    0.388349
 ⋮                                  ⋱   ⋮             

In [None]:
struct Slicer
    num_slices::Int64
    min_ayahs::Int64
end

In [None]:
function gen_slices(slicer::Slicer, ayahs::Vector{String})
    ayah_len = length(ayahs)
    if (slicer.slices < ayah_len)
        error("`slices` should be less than the length of `ayahs` vector to slice.")
    end
    rand(Uniform(slicer.min_ayahs, ayah_len - slicer.min_ayahs), )     
end

In [197]:
ayahs = surah2_emb;

num_slices = 7

7

In [198]:
dir_samples = rand(Dirichlet(repeat([1.5], num_slices - 1)), 10_000)

6×10000 Matrix{Float64}:
 0.0747496  0.197962   0.147077   …  0.0953913  0.124093   0.28173
 0.263974   0.0614421  0.204169      0.0699034  0.150389   0.0880941
 0.274278   0.150874   0.245379      0.105307   0.0711559  0.293795
 0.320078   0.232029   0.0152451     0.0477861  0.28238    0.143587
 0.0113221  0.067032   0.360862      0.173374   0.198752   0.0865986
 0.0555989  0.290661   0.0272685  …  0.508238   0.17323    0.106195

In [199]:
midpoints = Int64.(floor.(size(ayahs)[1] .* dir_samples))
midpoints = mapslices(sort, midpoints, dims=1)
midpoints = unique(midpoints, dims=2)

6×9999 Matrix{Int64}:
  3  17    4    2  17   15   12    5  …   5   19   5    6   12   13  20  24
 15  19    7   19  23   22   24   24      9   23  29   22   19   19  35  25
 21  43   42   22  26   34   27   35     58   28  48   25   23   27  43  30
 75  56   58   35  61   36   45   45     63   50  54   44   26   30  49  41
 78  66   70   62  70   67   45   68     68   61  67   47   86   49  56  80
 91  83  103  143  86  109  131  107  …  80  104  79  139  116  145  80  84

In [200]:
slices = Vector{Matrix{Float32}}[]
mp_size = size(midpoints)
for j in 1:mp_size[2]
    slice = Matrix{Float32}[]
    for i in 1:mp_size[1]
        if i == 1
            push!(slice, ayahs[1:midpoints[i, j],:])
        elseif i < mp_size[1]
            push!(slice, ayahs[(midpoints[i-1, j]+1):midpoints[i, j],:])
        else
            push!(slice, ayahs[(midpoints[i-1, j]+1):midpoints[i, j],:])
            push!(slice, ayahs[(midpoints[i, j]+1):end,:])
        end
    end
    push!(slices, slice)
end

In [201]:
slices[1]

7-element Vector{Matrix{Float32}}:
 [0.5310184 -0.13249622 … 0.9898191 0.20107773; 0.784264 -0.18788888 … 0.64655566 0.6163212; 0.6778873 -0.7789163 … 1.1294347 0.8599088]
 [0.50621146 -0.3250376 … 1.4375801 0.4667288; 0.269263 -0.19416487 … 0.7766804 0.5548137; … ; 0.53529483 -0.5275592 … 1.324009 0.65387005; 0.61365443 -0.45799717 … 1.0048498 0.7528942]
 [0.5153532 -0.33468977 … 1.3602252 0.49015307; 0.88209534 -0.53245246 … 0.73670083 0.30808163; … ; 0.70786935 -0.8578767 … 1.2227405 0.45505133; 0.41623342 -0.1308928 … 0.9314575 0.59851867]
 [0.69335663 -0.16880381 … 0.9878541 0.5400404; 0.6340914 0.0046891314 … 0.8854246 0.368674; … ; 0.6736406 -0.55540085 … 1.0537224 0.22417146; 0.7874164 -0.34338334 … 1.283298 0.55426145]
 [0.60748917 -0.4051675 … 1.3984671 0.5379322; -0.306053 -0.42617646 … 0.46820095 0.75357646; 0.6771596 -0.3709957 … 0.4407325 0.242626]
 [0.47408026 -0.6757015 … 0.73481303 0.47399515; 0.47682613 -0.5640961 … 1.489575 0.43350336; … ; 0.6923062 -0.13564682 … 0.9

### Summarizing Embeddings

In [202]:
function quantile_summary(v::Vector)
    sv = sort(v)

    min = minimum(sv)
    q1 = quantile(sv, 0.25)
    med = median(sv)
    q3 = quantile(sv, 0.5)
    max = maximum(sv)

    return [min, q1, med, q3, max]
end

quantile_summary (generic function with 1 method)

In [203]:
fivenums = Vector{Matrix{Float32}}[]
j = 1
for slice in slices
    fivenum = Matrix{Float32}[]
    for i in 1:size(slice)[1]
        @info i
        push!(fivenum, mapslices(quantile_summary, slice[i], dims=1))
    end
    j += 1
    @info j
    push!(fivenums, fivenum)
end

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m1
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m2
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m3
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m4
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m5
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m6
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m7
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m2
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m1
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m2
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m3
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m4
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m5
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m6
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m7
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m3
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m1
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m2
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m3
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m4
[36m[1m[ [22m[39m[36m[1mInfo: [22

LoadError: ArgumentError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer

In [187]:
slices[62][2]

0×768 Matrix{Float32}

In [189]:
dir_samples[:,62]

4-element Vector{Float64}:
 0.02978849839191926
 0.10918418123048566
 0.8304270317458531
 0.030600288631742086

In [190]:
midpoints[:, 62]

4-element Vector{Int64}:
   8
   8
  31
 237

In [174]:
mapslices(quantile_summary, slices[1][1]; dims=1)

5×768 Matrix{Float64}:
 0.123426  -0.857877    -0.872551  …  -0.434731   0.646556  -0.188628
 0.467806  -0.530353    -0.512169     -0.253229   0.885425   0.308669
 0.563945  -0.364934    -0.397269     -0.0452077  0.994584   0.482254
 0.563945  -0.364934    -0.397269     -0.0452077  0.994584   0.482254
 0.996288   0.00949113   0.303152      0.890515   1.43758    0.869055

In [175]:
mapslices(quantile_summary, slices[1][2]; dims=1)

5×768 Matrix{Float64}:
 0.0219057  -0.537929   -0.971374   …  -0.267718   0.673117  -0.194758
 0.30872    -0.391888   -0.493111      -0.0735184  1.01886    0.210183
 0.6057     -0.232318   -0.298277       0.0588869  1.15339    0.411876
 0.6057     -0.232318   -0.298277       0.0588869  1.15339    0.411876
 1.07862     0.0353266   0.0669256      0.416928   1.67765    0.862111

In [167]:
quantile(1:3, 0.2)

1.4000000000000001

In [168]:
?quantile

search: [0m[1mq[22m[0m[1mu[22m[0m[1ma[22m[0m[1mn[22m[0m[1mt[22m[0m[1mi[22m[0m[1ml[22m[0m[1me[22m c[0m[1mq[22m[0m[1mu[22m[0m[1ma[22m[0m[1mn[22m[0m[1mt[22m[0m[1mi[22m[0m[1ml[22m[0m[1me[22m [0m[1mq[22m[0m[1mu[22m[0m[1ma[22m[0m[1mn[22m[0m[1mt[22m[0m[1mi[22m[0m[1ml[22m[0m[1me[22m! [0m[1mq[22m[0m[1mu[22mote [0m[1mq[22m[0m[1mu[22m[0m[1ma[22m[0m[1mn[22m[0m[1mt[22m[0m[1mi[22m[0m[1ml[22m[0m[1me[22m_summary angle evalfile



```
quantile(itr, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
```

Compute the quantile(s) of a collection `itr` at a specified probability or vector or tuple of probabilities `p` on the interval [0,1]. The keyword argument `sorted` indicates whether `itr` can be assumed to be sorted.

Samples quantile are defined by `Q(p) = (1-γ)*x[j] + γ*x[j+1]`, where `x[j]` is the j-th order statistic of `itr`, `j = floor(n*p + m)`, `m = alpha + p*(1 - alpha - beta)` and `γ = n*p + m - j`.

By default (`alpha = beta = 1`), quantiles are computed via linear interpolation between the points `((k-1)/(n-1), x[k])`, for `k = 1:n` where `n = length(itr)`. This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R and NumPy default.

The keyword arguments `alpha` and `beta` correspond to the same parameters in Hyndman and Fan, setting them to different values allows to calculate quantiles with any of the methods 4-9 defined in this paper:

  * Def. 4: `alpha=0`, `beta=1`
  * Def. 5: `alpha=0.5`, `beta=0.5` (MATLAB default)
  * Def. 6: `alpha=0`, `beta=0` (Excel `PERCENTILE.EXC`, Python default, Stata `altdef`)
  * Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
  * Def. 8: `alpha=1/3`, `beta=1/3`
  * Def. 9: `alpha=3/8`, `beta=3/8`

!!! note
    An `ArgumentError` is thrown if `v` contains `NaN` or [`missing`](@ref) values. Use the [`skipmissing`](@ref) function to omit `missing` entries and compute the quantiles of non-missing values.


# References

  * Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", *The American Statistician*, Vol. 50, No. 4, pp. 361-365
  * [Quantile on Wikipedia](https://en.m.wikipedia.org/wiki/Quantile) details the different quantile definitions

# Examples

```jldoctest
julia> using Statistics

julia> quantile(0:20, 0.5)
10.0

julia> quantile(0:20, [0.1, 0.5, 0.9])
3-element Vector{Float64}:
  2.0
 10.0
 18.000000000000004

julia> quantile(skipmissing([1, 10, missing]), 0.5)
5.5
```

---

```
quantile(v, w::AbstractWeights, p)
```

Compute the weighted quantiles of a vector `v` at a specified set of probability values `p`, using weights given by a weight vector `w` (of type `AbstractWeights`). Weights must not be negative. The weights and data vectors must have the same length. `NaN` is returned if `x` contains any `NaN` values. An error is raised if `w` contains any `NaN` values.

With [`FrequencyWeights`](@ref), the function returns the same result as `quantile` for a vector with repeated values. Weights must be integers.

With non `FrequencyWeights`,  denote $N$ the length of the vector, $w$ the vector of weights, $h = p (\sum_{i \leq N} w_i - w_1) + w_1$ the cumulative weight corresponding to the probability $p$ and $S_k = \sum_{i \leq k} w_i$ the cumulative weight for each observation, define $v_{k+1}$ the smallest element of `v` such that $S_{k+1}$ is strictly superior to $h$. The weighted $p$ quantile is given by $v_k + \gamma (v_{k+1} - v_k)$ with  $\gamma = (h - S_k)/(S_{k+1} - S_k)$. In particular, when all weights are equal, the function returns the same result as the unweighted `quantile`.

---

```
quantile(d::UnivariateDistribution, q::Real)
```

Evaluate the (generalized) inverse cumulative distribution function at `q`.

For a given `0 ≤ q ≤ 1`, `quantile(d, q)` is the smallest value `x` in the support of `d` for which `cdf(d, x) ≥ q`.

See also: [`cquantile`](@ref), [`invlogcdf`](@ref), and [`invlogccdf`](@ref).

---

```
quantile(chains[; q = [0.025, 0.25, 0.5, 0.75, 0.975], append_chains = true, kwargs...])
```

Compute the quantiles for each parameter in the chain.

Setting `append_chains=false` will return a vector of dataframes containing the quantiles for each chain.
