# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
!wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
!dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
!apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
!apt update -q
!apt install cuda gcc-6 g++-6 -y -q
!ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
!ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++

--2024-06-16 01:41:02--  https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
Resolving developer.nvidia.com (developer.nvidia.com)... 152.199.39.144
Connecting to developer.nvidia.com (developer.nvidia.com)|152.199.39.144|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://developer.nvidia.com/downloads/compute/cuda/9.0/prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb [following]
--2024-06-16 01:41:03--  https://developer.nvidia.com/downloads/compute/cuda/9.0/prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://developer.download.nvidia.com/compute/cuda/9.0/secure/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb?hTSm1BgArPTea8jN3PgDgav1a-Cc6zmrMAwoqPLSRI620agorXN2ubzeUOLQTVuxxbyUpMB9MlxottJfmNfSj

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.8.2" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.8.2 on the current Colab Runtime...
2024-06-16 01:45:35 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz [135859273/135859273] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...
Installing Julia package CUDA...
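
The install cell derives the download URL from `JULIA_VERSION` by keeping only the major and minor components for the directory name. As a minimal sketch of that same logic in Julia (the helper name `julia_download_url` is hypothetical and not part of the notebook):

```julia
# Hypothetical helper mirroring the URL logic of the shell cell above.
function julia_download_url(version::AbstractString)
    v = VersionNumber(version)
    series = "$(v.major).$(v.minor)"   # e.g. "1.8.2" -> "1.8"
    base_url = "https://julialang-s3.julialang.org/bin/linux/x64"
    return "$base_url/$series/julia-$version-linux-x86_64.tar.gz"
end

println(julia_download_url("1.8.2"))
```

This can be handy for checking that a tarball exists for a given version before changing `JULIA_VERSION`.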


# Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system:

In [None]:
versioninfo()

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/lib64-nvidia
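
The same information can be checked programmatically. The sketch below assumes a minimum version of 1.8.0 (an illustrative threshold, not a requirement stated by this notebook) and also prints the thread count, which shows whether `JULIA_NUM_THREADS` from the install cell took effect in the running kernel.

```julia
# Assumed minimum version; adjust to match JULIA_VERSION in the install cell.
const MIN_VERSION = v"1.8.0"

# True when the given version is at least the assumed minimum.
meets_minimum(v::VersionNumber) = v >= MIN_VERSION

println("Julia version:     ", VERSION)
println("Meets minimum:     ", meets_minimum(VERSION))
println("Threads available: ", Threads.nthreads())
```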


In [None]:
using BenchmarkTools

M = rand(2^11, 2^11)

@btime $M * $M;

  317.809 ms (2 allocations: 32.00 MiB)
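
To put the timing in context, it can be converted to an approximate throughput. A dense product of two n×n matrices takes about 2n³ floating-point operations, so a rough sketch (the helper names are hypothetical) is:

```julia
# Approximate operation count for a dense n×n matrix product.
matmul_flops(n) = 2 * n^3

# Throughput in GFLOP/s for a product that took `seconds`.
gflops(n, seconds) = matmul_flops(n) / seconds / 1e9

n = 2^11
elapsed = 0.317809   # seconds, taken from the @btime output above
println(round(gflops(n, elapsed), digits=1), " GFLOP/s")
```

For the run above this comes to roughly 54 GFLOP/s on the CPU. The figure will vary between Colab runtimes.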


In [None]:
try
    using CUDA
catch
    println("No GPU found.")
else
    run(`nvidia-smi`)
    # Create a new random matrix directly on the GPU:
    M_on_gpu = CUDA.CURAND.rand(2^11, 2^11)
    @btime $M_on_gpu * $M_on_gpu; nothing
end

Sun Jun 16 01:52:14 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P8               9W /  70W |      3MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

LoadError: InterruptException:

In [None]:
using Pkg
Pkg.add("CSV"), using CSV
Pkg.add("DataFrames"), using DataFrames
Pkg.add("StatsModels"), using StatsModels
Pkg.add("GLM"), using GLM
Pkg.add("Random"), using Random
Pkg.add("MLDataUtils"), using MLDataUtils
Pkg.add("MLBase"), using MLBase
Pkg.add("FixedEffectModels"), using FixedEffectModels
Pkg.add("Lasso"), using Lasso
Pkg.add("MLJ"), using MLJ
Pkg.add("DecisionTree"), using DecisionTree
Pkg.add("RData"), using RData
Pkg.add("GLMNet"), using GLMNet
Pkg.add("PrettyTables"), using PrettyTables
Pkg.add("StatsBase")

    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
   Installed TranscodingStreams ─ v0.10.9
   Installed WeakRefStrings ───── v1.4.2
   Installed CodecZlib ────────── v0.7.4
   Installed FilePathsBase ────── v0.9.21
   Installed WorkerUtilities ──── v1.6.1
   Installed CSV ──────────────── v0.10.14
    Updating `~/.julia/environments/v1.8/Project.toml`
  [336ed68f] + CSV v0.10.14
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [336ed68f] + CSV v0.10.14
  [944b1d66] + CodecZlib v0.7.4
  [48062228] + FilePathsBase v0.9.21
  [3bb67fe8] + TranscodingStreams v0.10.9
  [ea10d353] + WeakRefStrings v1.4.2
  [76eceee3] + Wo

(nothing, nothing)

In [None]:
Pkg.add("StatsBase")

   Resolving package versions...
    Updating `~/.julia/environments/v1.8/Project.toml`
⌅ [2913bbd2] + StatsBase v0.33.21
  No Changes to `~/.julia/environments/v1.8/Manifest.toml`


In [None]:
using Pkg, CSV, DataFrames, StatsModels, GLM, Random, RData, MLDataUtils, MLBase, FixedEffectModels, Lasso, MLJ, DecisionTree, GLMNet, PrettyTables

In [None]:
path = "/content/processed_esti.csv"
df = CSV.read(path, DataFrame)

Row,y,w,gender_female,gender_male,gender_transgender,ethnicgrp_asian,ethnicgrp_black,ethnicgrp_mixed_multiple,ethnicgrp_other,ethnicgrp_white,partners1,postlaunch,msm,age,imd_decile
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,1,1,0,1,0,0,0,1,0,0,0,1,0,27,5
2,0,0,0,1,0,0,0,0,0,1,0,0,0,19,6
3,0,1,0,1,0,0,1,0,0,0,0,1,0,26,4
4,0,0,1,0,0,0,0,0,0,1,1,0,0,20,2
5,1,1,1,0,0,1,0,0,0,0,0,1,0,24,3
6,1,1,0,1,0,0,0,0,0,1,0,1,0,24,2
7,1,1,1,0,0,0,0,0,0,1,0,0,0,24,4
8,0,0,0,1,0,0,1,0,0,0,0,1,0,21,2
9,0,1,0,1,0,0,0,0,0,1,1,0,0,27,2
10,1,1,1,0,0,0,1,0,0,0,0,0,0,21,6


In [38]:
# Prepare the variables we are going to use: convert `age` and `imd_decile` to categorical, then summarize the data.
df[:, :age] = categorical(df[:, :age]);
df[:, :imd_decile] = categorical(df[:, :imd_decile]);
describe(df)

Row,variable,mean,min,median,max,nmissing,eltype
Unnamed: 0_level_1,Symbol,Union…,Any,Union…,Any,Int64,DataType
1,y,0.351926,0.0,0.0,1.0,0,Float64
2,w,,0.0,,1.0,0,"CategoricalValue{Int64, UInt32}"
3,gender_female,0.584244,0.0,1.0,1.0,0,Int64
4,gender_male,0.413456,0.0,0.0,1.0,0,Int64
5,gender_transgender,0.00230017,0.0,0.0,1.0,0,Int64
6,ethnicgrp_asian,0.0638298,0.0,0.0,1.0,0,Int64
7,ethnicgrp_black,0.0862565,0.0,0.0,1.0,0,Int64
8,ethnicgrp_mixed_multiple,0.0885566,0.0,0.0,1.0,0,Int64
9,ethnicgrp_other,0.013226,0.0,0.0,1.0,0,Int64
10,ethnicgrp_white,0.748131,0.0,1.0,1.0,0,Int64


In [39]:
using StatsBase, StatsModels
# We define the `poly` function as provided by the documentation of the `StatsModels` package:

# syntax: best practice to define a _new_ function
poly(x, n) = x^n

# type of model where syntax applies: here this applies to any model type
const POLY_CONTEXT = Any

# struct for behavior
struct PolyTerm{T,D} <: AbstractTerm
    term::T
    deg::D
end

Base.show(io::IO, p::PolyTerm) = print(io, "poly($(p.term), $(p.deg))")

# for `poly` use at run-time (outside @formula), return a schema-less PolyTerm
poly(t::Symbol, d::Int) = PolyTerm(term(t), term(d))

# for `poly` use inside @formula: create a schemaless PolyTerm and apply_schema
function StatsModels.apply_schema(t::FunctionTerm{typeof(poly)},
                                  sch::StatsModels.Schema,
                                  Mod::Type{<:POLY_CONTEXT})
    apply_schema(PolyTerm(t.args...), sch, Mod)
end

# apply_schema to internal Terms and check for proper types
function StatsModels.apply_schema(t::PolyTerm,
                                  sch::StatsModels.Schema,
                                  Mod::Type{<:POLY_CONTEXT})
    term = apply_schema(t.term, sch, Mod)
    isa(term, ContinuousTerm) ||
        throw(ArgumentError("PolyTerm only works with continuous terms (got $term)"))
    isa(t.deg, ConstantTerm) ||
        throw(ArgumentError("PolyTerm degree must be a number (got $(t.deg))"))
    PolyTerm(term, t.deg.n)
end

function StatsModels.modelcols(p::PolyTerm, d::NamedTuple)
    col = modelcols(p.term, d)
    reduce(hcat, [col.^n for n in 1:p.deg])
end

# the basic terms contained within a PolyTerm (for schema extraction)
StatsModels.terms(p::PolyTerm) = terms(p.term)
# names variables from the data that a PolyTerm relies on
StatsModels.termvars(p::PolyTerm) = StatsModels.termvars(p.term)
# number of columns in the matrix this term produces
StatsModels.width(p::PolyTerm) = p.deg

StatsBase.coefnames(p::PolyTerm) = coefnames(p.term) .* "^" .* string.(1:p.deg)

# output
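
To see what the `modelcols` method above produces without going through `@formula`, here is a minimal stand-alone sketch. The helper `poly_columns` and the toy `age` vector are hypothetical and only mirror the body of `modelcols` and `coefnames`; they are not part of `StatsModels`.

```julia
# Hypothetical stand-alone helper: one column per power 1..deg, as in modelcols above.
poly_columns(col::AbstractVector, deg::Integer) = reduce(hcat, [col .^ n for n in 1:deg])

age = [1.0, 2.0, 3.0]            # toy data, made up for illustration
A = poly_columns(age, 3)         # 3×3 matrix with columns age, age.^2, age.^3
println(size(A))

# Column names built the same way as the coefnames method above.
names = "age" .* "^" .* string.(1:3)
println(names)
```

Each row of `A` holds the successive powers of one observation, which is why `width` returns `p.deg`.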


In [40]:
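# Construct the data: split `df` into `y`, `d` (the `w` column) and the remaining columns `X`
coerce!(df, :y => MLJ.Continuous, :w => Multiclass)
y, X = unpack(df, ==(:y))   # `y` is column :y, `X` holds the rest
d, X = unpack(X, ==(:w))    # `d` is column :w, `X` holds the rest
# Drop one indicator from each of the gender and ethnic-group sets
select!(X, Not([:gender_male, :ethnicgrp_white]))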

Row,gender_female,gender_transgender,ethnicgrp_asian,ethnicgrp_black,ethnicgrp_mixed_multiple,ethnicgrp_other,partners1,postlaunch,msm,age,imd_decile,n
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,0,0,0,0,1,0,0,1,0,27,5,1
2,0,0,0,0,0,0,0,0,0,19,6,1
3,0,0,0,1,0,0,0,1,0,26,4,1
4,1,0,0,0,0,0,1,0,0,20,2,1
5,1,0,1,0,0,0,0,1,0,24,3,1
6,0,0,0,0,0,0,0,1,0,24,2,1
7,1,0,0,0,0,0,0,0,0,24,4,1
8,0,0,0,1,0,0,0,1,0,21,2,1
9,0,0,0,0,0,0,1,0,0,27,2,1
10,1,0,0,1,0,0,0,0,0,21,6,1
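
The cell above drops `gender_male` and `ethnicgrp_white`. The notebook does not say why; one plausible reason (an assumption here) is to treat them as reference categories, since a full set of indicator columns sums to the intercept column and makes the design matrix rank-deficient. A small sketch with made-up toy data:

```julia
using LinearAlgebra

# Toy indicator columns, made up for illustration (five observations).
female    = [1, 0, 1, 0, 0]
male      = [0, 1, 0, 1, 0]
trans     = [0, 0, 0, 0, 1]
intercept = ones(Int, 5)

full    = hcat(intercept, female, male, trans)   # all categories kept
reduced = hcat(intercept, female, trans)         # one category dropped

println("all indicators: rank ", rank(full), " of ", size(full, 2), " columns")
println("one dropped:    rank ", rank(reduced), " of ", size(reduced, 2), " columns")
```

With all indicators the matrix has four columns but rank three; after dropping one, the rank equals the number of columns.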


In [41]:
X

Row,gender_female,gender_transgender,ethnicgrp_asian,ethnicgrp_black,ethnicgrp_mixed_multiple,ethnicgrp_other,partners1,postlaunch,msm,age,imd_decile,n
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,0,0,0,0,1,0,0,1,0,27,5,1
2,0,0,0,0,0,0,0,0,0,19,6,1
3,0,0,0,1,0,0,0,1,0,26,4,1
4,1,0,0,0,0,0,1,0,0,20,2,1
5,1,0,1,0,0,0,0,1,0,24,3,1
6,0,0,0,0,0,0,0,1,0,24,2,1
7,1,0,0,0,0,0,0,0,0,24,4,1
8,0,0,0,1,0,0,0,1,0,21,2,1
9,0,0,0,0,0,0,1,0,0,27,2,1
10,1,0,0,1,0,0,0,0,0,21,6,1


In [44]:
using StatsBase, StatsModels
# We define the `poly` function as provided by the documentation of the `StatsModels` package:

# syntax: best practice to define a _new_ function
poly(x, n) = x^n

# type of model where syntax applies: here this applies to any model type
const POLY_CONTEXT = Any

# struct for behavior
struct PolyTerm{T,D} <: AbstractTerm
    term::T
    deg::D
end

Base.show(io::IO, p::PolyTerm) = print(io, "poly($(p.term), $(p.deg))")

# for `poly` use at run-time (outside @formula), return a schema-less PolyTerm
poly(t::Symbol, d::Int) = PolyTerm(term(t), term(d))

# for `poly` use inside @formula: create a schemaless PolyTerm and apply_schema
function StatsModels.apply_schema(t::FunctionTerm{typeof(poly)},
                                  sch::StatsModels.Schema,
                                  Mod::Type{<:POLY_CONTEXT})
    apply_schema(PolyTerm(t.args...), sch, Mod)
end

# apply_schema to internal Terms and check for proper types
function StatsModels.apply_schema(t::PolyTerm,
                                  sch::StatsModels.Schema,
                                  Mod::Type{<:POLY_CONTEXT})
    term = apply_schema(t.term, sch, Mod)
    isa(term, ContinuousTerm) ||
        throw(ArgumentError("PolyTerm only works with continuous terms (got $term)"))
    isa(t.deg, ConstantTerm) ||
        throw(ArgumentError("PolyTerm degree must be a number (got $(t.deg))"))
    PolyTerm(term, t.deg.n)
end

function StatsModels.modelcols(p::PolyTerm, d::NamedTuple)
    col = modelcols(p.term, d)
    reduce(hcat, [col.^n for n in 1:p.deg])
end

# the basic terms contained within a PolyTerm (for schema extraction)
StatsModels.terms(p::PolyTerm) = terms(p.term)
# names variables from the data that a PolyTerm relies on
StatsModels.termvars(p::PolyTerm) = StatsModels.termvars(p.term)
# number of columns in the matrix this term produces
StatsModels.width(p::PolyTerm) = p.deg

StatsBase.coefnames(p::PolyTerm) = coefnames(p.term) .* "^" .* string.(1:p.deg)

# output


┌───────────────┬────────────────────┬─────────────────┬─────────────────┬──────────────────────────┬─────────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ gender_female │ gender_transgender │ ethnicgrp_asian │ ethnicgrp_black │ ethnicgrp_mixed_multiple │ ethnicgrp_other │ partners1  │ postlaunch │ msm        │ age        │ imd_decile │
│ Float64       │ Float64            │ Float64         │ Float64         │ Float64                  │ Float64         │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │
│ Continuous    │ Continuous         │ Continuous      │ Continuous      │ Continuous               │ Continuous      │ Continuous │ Continuous │ Continuous │ Continuous