# JuliaCon 2024
<br>

## End-to-End AI (E2EAI) with Julia, K0s, and Argo Workflow
<br>

#####     Presenter: Paulito Palmes, IBM Research
##### Collaborators: SUNRISE-6G EU Partners
#####          Date: July 11, 2024 

# OUTLINE

- ### The Motivations Behind E2EAI (End-to-End AI)
- ### Components of E2EAI
- ### The Julia AI/ML Solution Use-case
- ### The Future

# The Motivations Behind E2EAI

- Current paradigms for deploying AI solutions do not tightly integrate IaC (Infrastructure as Code) and MLOps
- Without that integration, it is difficult to:
  - identify the optimal infrastructure
  - predict resource viability and feasibility
  - infer the cost of deployment
  - identify performance bottlenecks and perform root-cause analysis

# End-to-End AI (E2EAI)
<center>
    <img  src="./yamlcontents.png" width="600"/>
</center>

- E2EAI is a unified framework that tightly integrates MLOps and IaC
  - a single YAML file describes both the IaC and the MLOps: Infrastructure + ML Pipeline + LifeCycle Management
  - reliance on YAML workflow templates implies zero to minimal coding
  - a collection of YAML files can be used as input to an LLM for intent-driven E2EAI

# Components of E2EAI
<center>
    <img  src="./e2eai-components.png" width="700"/>
    <img  src="./eu-funding.png" width="200"/>
</center>

- SUNRISE-6G
  - SUstainable federatioN of Research Infrastructures \
    for Scaling-up Experimentation in 6G
  - Horizon Europe (SNS JU) EU Project (3 years)

# The Julia AI/ML Solution Use-case

- AutoMLPipeline workflow
- Integrating AutoMLPipeline in E2EAI

### Load ML pipeline preprocessing components and models

In [24]:
using AutoMLPipeline;
import PythonCall; const PYC=PythonCall; warnings = PYC.pyimport("warnings"); warnings.filterwarnings("ignore")

#### Decomposition
pca = skoperator("PCA"); fa  = skoperator("FactorAnalysis"); ica = skoperator("FastICA")
#### Scaler 
rb   = skoperator("RobustScaler"); pt   = skoperator("PowerTransformer"); norm = skoperator("Normalizer")
mx   = skoperator("MinMaxScaler"); std  = skoperator("StandardScaler")
#### categorical preprocessing
ohe = OneHotEncoder()
#### Column selector
catf = CatFeatureSelector(); numf = NumFeatureSelector(); disc = CatNumDiscriminator()
#### Learners
rf = skoperator("RandomForestClassifier"); gb = skoperator("GradientBoostingClassifier"); lsvc = skoperator("LinearSVC")
svc = skoperator("SVC"); mlp = skoperator("MLPClassifier")
ada = skoperator("AdaBoostClassifier"); sgd = skoperator("SGDClassifier")
skrf_reg = skoperator("RandomForestRegressor"); skgb_reg = skoperator("GradientBoostingRegressor")
jrf = RandomForest(); tree = PrunedTree()
vote = VoteEnsemble(); stack = StackEnsemble(); best = BestLearner();

### Prepare dataset for classification

In [25]:
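# Make sure that the input features are a DataFrame and the target output is a 1-D vector.
using AutoMLPipeline
profbdata = getprofb()
X = profbdata[:,2:end]          # features: every column except the first
Y = profbdata[:,1] |> Vector;   # target: the first column (Home.Away)
head(x)=first(x,10)             # helper: show the first 10 rows
head(profbdata)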

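| Row | Home.Away | Favorite_Points | Underdog_Points | Pointspread | Favorite_Name | Underdog_name | Year |
|---|---|---|---|---|---|---|---|
|  | String7 | Int64 | Int64 | Float64 | String3 | String3 | Int64 |
| 1 | away | 27 | 24 | 4.0 | BUF | MIA | 89 |
| 2 | at_home | 17 | 14 | 3.0 | CHI | CIN | 89 |
| 3 | away | 51 | 0 | 2.5 | CLE | PIT | 89 |
| 4 | at_home | 28 | 0 | 5.5 | NO | DAL | 89 |
| 5 | at_home | 38 | 7 | 5.5 | MIN | HOU | 89 |
| 6 | at_home | 34 | 20 | 6.0 | DEN | KC | 89 |
| 7 | away | 31 | 21 | 6.0 | LAN | ATL | 89 |
| 8 | at_home | 24 | 27 | 2.5 | NYJ | NE | 89 |
| 9 | away | 16 | 13 | 1.5 | PHX | DET | 89 |
| 10 | at_home | 40 | 14 | 3.5 | LAA | SD | 89 |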


### Pipeline to transform categorical features to one-hot encoding

In [26]:
pohe = catf |> ohe
tr = fit_transform!(pohe,X,Y)
head(tr)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
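
The 56 output columns are consistent with one indicator column per distinct value of the two categorical features in `X` (`Favorite_Name` and `Underdog_name`). As a minimal sketch of that idea in plain Julia, independent of AutoMLPipeline, the hypothetical helper `onehot` below encodes a single column; it is an illustration only, not the package's `OneHotEncoder` implementation.

```julia
# Minimal one-hot encoder (hypothetical helper, not part of AutoMLPipeline):
# one Float64 indicator column per distinct level.
function onehot(values::AbstractVector)
    levels = unique(values)   # levels in order of first appearance
    encoded = zeros(Float64, length(values), length(levels))
    for (row, v) in enumerate(values)
        encoded[row, findfirst(==(v), levels)] = 1.0
    end
    return levels, encoded
end

# Toy input loosely modeled on the Favorite_Name column.
levels, encoded = onehot(["BUF", "CHI", "CLE", "BUF"])
```

Each row of `encoded` contains exactly one `1.0`, in the column of that row's level.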


### Pipeline to transform numerical features with PCA and ICA and concatenate the results

In [27]:
pdec = (numf |> pca) + (numf |> ica)
tr = fit_transform!(pdec,X,Y)
head(tr)

| Row | x1 | x2 | x3 | x4 | x1_1 | x2_1 | x3_1 | x4_1 |
|---|---|---|---|---|---|---|---|---|
|  | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 2.47477 | 7.87074 | -1.10495 | 0.902431 | 0.433184 | 0.0796294 | 1.21188 | -0.696399 |
| 2 | -5.47113 | -3.82946 | -2.08342 | 1.00524 | -0.851126 | -0.566853 | 1.16879 | -0.17776 |
| 3 | 30.4068 | -10.8073 | -6.12339 | 0.883938 | -1.91215 | 2.99234 | 1.1004 | -1.30033 |
| 4 | 8.18372 | -15.507 | -1.43203 | 1.08255 | -1.70283 | 0.955814 | 1.18439 | 0.499145 |
| 5 | 16.6176 | -6.68636 | -1.66597 | 0.978243 | -0.88172 | 1.66475 | 1.19639 | -0.180815 |
| 6 | 10.2588 | 5.22112 | 0.0731649 | 0.928496 | 0.44797 | 0.909697 | 1.24403 | -0.328454 |
| 7 | 7.13435 | 5.60902 | 0.368661 | 0.939797 | 0.510885 | 0.601575 | 1.24843 | -0.232919 |
| 8 | -1.16369 | 10.3011 | -2.15564 | 0.86957 | 0.449727 | -0.33492 | 1.18641 | -1.07025 |
| 9 | -6.38764 | -4.92017 | -3.57339 | 0.986345 | -1.20905 | -0.673366 | 1.13179 | -0.498682 |
| 10 | 17.0567 | 0.672 | -3.29448 | 0.879581 | -0.486706 | 1.57433 | 1.16653 | -1.04527 |
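
In AutoMLPipeline, `|>` chains pipeline steps and `+` concatenates the outputs of parallel branches, which is why the result has two groups of four columns (`x1..x4` and `x1_1..x4_1`). The sketch below mimics that composition with ordinary functions on a matrix. `center`, `squares`, `branch_a`, `branch_b`, and `concat` are hypothetical stand-ins, not the package's transformers or its operator overloads.

```julia
using Statistics

# Hypothetical stand-ins for two transformers.
center(X) = X .- mean(X; dims = 1)   # subtract each column's mean
squares(X) = X .^ 2                  # element-wise square

# `|>` chains steps: the output of one step is the input of the next.
branch_a(X) = X |> center |> squares
branch_b(X) = X |> squares

# Analogue of `+`: run both branches on the same input and join the columns.
concat(X) = hcat(branch_a(X), branch_b(X))

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
features = concat(X)   # 3 rows, 2 + 2 columns
```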


### More complex pipeline with robust scaling and power transform

In [28]:
ppt = (numf |> rb |> ica) + (numf |> pt |> pca)
tr = fit_transform!(ppt,X,Y)
head(tr)

| Row | x1 | x2 | x3 | x4 | x1_1 | x2_1 | x3_1 | x4_1 |
|---|---|---|---|---|---|---|---|---|
|  | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0797428 | 0.433233 | 0.696339 | -1.21189 | -0.64552 | 1.40289 | -0.0284468 | 0.111773 |
| 2 | -0.566905 | -0.851057 | 0.177854 | -1.1688 | -0.832404 | 0.475629 | -1.14881 | -0.01702 |
| 3 | 2.99223 | -1.91236 | 1.30034 | -1.10032 | 1.54491 | 1.65258 | -1.35967 | -2.57866 |
| 4 | 0.955637 | -1.70299 | -0.49905 | -1.18435 | 1.32065 | 0.563565 | -2.05839 | -0.74898 |
| 5 | 1.6647 | -0.881889 | 0.180796 | -1.19634 | 1.1223 | 1.45555 | -0.88864 | -0.776195 |
| 6 | 0.909794 | 0.447895 | 0.328349 | -1.24401 | 0.277462 | 1.70936 | 0.00130938 | 0.0768767 |
| 7 | 0.601674 | 0.510834 | 0.232825 | -1.24842 | 0.0977821 | 1.58007 | -0.0364638 | 0.258464 |
| 8 | -0.334787 | 0.449855 | 1.07021 | -1.18643 | -1.31815 | 1.27463 | 0.00789964 | -0.0553192 |
| 9 | -0.67344 | -1.20894 | 0.498816 | -1.1318 | -1.29056 | 0.326316 | -1.31916 | -0.511818 |
| 10 | 1.57436 | -0.486784 | 1.04522 | -1.1665 | 0.318224 | 1.76616 | -0.28608 | -1.02674 |
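
For intuition about the `rb` step, here is a minimal sketch of robust scaling in plain Julia. It assumes `rb` behaves like scikit-learn's `RobustScaler` with default settings, that is, it centers on the median and divides by the interquartile range. The function name `robust_scale` is hypothetical.

```julia
using Statistics

# Robust scaling of one feature: center on the median, scale by the IQR.
# Assumption: this mirrors scikit-learn's RobustScaler defaults.
function robust_scale(x::AbstractVector{<:Real})
    iqr = quantile(x, 0.75) - quantile(x, 0.25)
    return (x .- median(x)) ./ iqr
end

# The outlier (100.0) barely affects how the other values are scaled.
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because the median and the IQR ignore extreme values, the four ordinary points stay within one unit of zero while the outlier remains far away.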


### Evaluating complex pipeline with RandomForest learner

In [29]:
prf = (catf |> ohe) + (numf |> rb |> fa) + (numf |> pt |> pca) |> rf
crossvalidate(prf,X,Y,"accuracy_score")

fold: 1, 0.6716417910447762
fold: 2, 0.6567164179104478
fold: 3, 0.75
fold: 4, 0.6865671641791045
fold: 5, 0.6716417910447762
fold: 6, 0.5970149253731343
fold: 7, 0.5970149253731343
fold: 8, 0.7205882352941176
fold: 9, 0.6865671641791045
fold: 10, 0.6119402985074627
errors: 0


(mean = 0.6649692712906058, std = 0.05105709742517328, folds = 10, errors = 0)
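
The summary tuple can be reproduced from the per-fold scores. The sketch below uses only Julia's `Statistics` standard library. The reported `std` appears to be the sample standard deviation (dividing by `n - 1`), which is what `std` computes by default.

```julia
using Statistics

# Per-fold accuracies copied from the crossvalidate output above.
folds = [0.6716417910447762, 0.6567164179104478, 0.75,
         0.6865671641791045, 0.6716417910447762, 0.5970149253731343,
         0.5970149253731343, 0.7205882352941176, 0.6865671641791045,
         0.6119402985074627]

# Mean and sample standard deviation across the folds.
stats = (mean = mean(folds), std = std(folds), folds = length(folds))
```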

### Evaluating complex pipeline with Linear SVM learner

In [30]:
plsvc = ((numf |> rb |> pca)+(numf |> rb |> fa)+(numf |> rb |> ica)+(catf |> ohe )) |> lsvc
crossvalidate(plsvc,X,Y,"accuracy_score")

fold: 1, 0.746268656716418
fold: 2, 0.7313432835820896
fold: 3, 0.7647058823529411
fold: 4, 0.7611940298507462
fold: 5, 0.7761194029850746
fold: 6, 0.6567164179104478
fold: 7, 0.746268656716418
fold: 8, 0.7941176470588235
fold: 9, 0.7164179104477612
fold: 10, 0.7164179104477612
errors: 0


(mean = 0.740956979806848, std = 0.03870925271242532, folds = 10, errors = 0)
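
The fold scores above are multiples of 1/67 or 1/68 (for example, 0.75 = 51/68), which suggests about 672 rows split into 10 nearly equal folds. The sketch below shows one way to build such a partition. The row count 672 is an inference from those scores, `kfold_indices` is a hypothetical helper, and `crossvalidate` may assign rows to folds differently.

```julia
using Random

# Split indices 1:n into k shuffled folds whose sizes differ by at most one.
function kfold_indices(n::Integer, k::Integer; rng = Random.default_rng())
    shuffled = randperm(rng, n)
    base, extra = divrem(n, k)
    folds = Vector{Vector{Int}}()
    start = 1
    for i in 1:k
        len = base + (i <= extra ? 1 : 0)   # the first `extra` folds get one more row
        push!(folds, shuffled[start:start + len - 1])
        start += len
    end
    return folds
end

# Assumed dataset size: 672 rows, 10 folds.
folds = kfold_indices(672, 10; rng = MersenneTwister(1))
fold_sizes = length.(folds)
```

Each fold serves once as the test set while the remaining folds are used for training.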

### Parallel search of the best ML pipeline

In [31]:
using Random, DataFrames, Distributed
nprocs() == 1 && addprocs()
@everywhere using DataFrames; @everywhere using AutoMLPipeline
@everywhere begin
    import PythonCall; const PYC=PythonCall; warnings = PYC.pyimport("warnings"); warnings.filterwarnings("ignore")
end
@everywhere begin
  profbdata = getprofb(); X = profbdata[:,2:end]; Y = profbdata[:,1] |> Vector;
end
@everywhere begin
  jrf  = RandomForest(); ohe  = OneHotEncoder(); catf = CatFeatureSelector(); numf = NumFeatureSelector()
  tree = PrunedTree(); ada  = skoperator("AdaBoostClassifier"); disc = CatNumDiscriminator()
  sgd  = skoperator("SGDClassifier"); std  = skoperator("StandardScaler"); lsvc = skoperator("LinearSVC")
end

learners = @sync @distributed (vcat) for learner in [jrf,ada,sgd,lsvc,tree]
   pcmc = disc |> ((catf |> ohe) + (numf |> std)) |> learner
   println(learner.name[1:end-4])
   mean,sd,_ = crossvalidate(pcmc,X,Y,"accuracy_score",3)
   DataFrame(name=learner.name[1:end-4],mean=mean,sd=sd)
end;

      From worker 4:	AdaBoostClassifier
      From worker 5:	SGDClassifier
      From worker 3:	rf
      From worker 7:	prunetree
      From worker 6:	LinearSVC
      From worker 5:	│                for entry (13, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 5:	│                for entry (65, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.BaseFilters ~/.julia/packages/AMLPipelineBase/FFCPY/src/basefilters.jl:106
      From worker 5:	│                for entry (161, 2) = SF.
      From worker 5:	│                Patching value to PIT.
      From worker 5:	└ @ AMLPipelineBase.

### Best Pipeline

In [32]:
@show sort!(learners,:mean,rev=true);

sort!(learners, :mean, rev = true) = 5×3 DataFrame
 Row │ name                mean      sd
     │ String              Float64   Float64
─────┼─────────────────────────────────────────
   1 │ LinearSVC           0.721726  0.0112349
   2 │ SGDClassifier       0.703869  0.0272772
   3 │ AdaBoostClassifier  0.66369   0.0201306
   4 │ rf                  0.65625   0.0389187
   5 │ prunetree           0.584821  0.0425866
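
The search-and-rank pattern used above can be sketched with the `Distributed` standard library alone. `fake_score` is a hypothetical stand-in for `crossvalidate`, and its scores are placeholders rounded from the table above rather than new results. With no extra workers added, the loop simply runs on the main process.

```julia
using Distributed

# Hypothetical stand-in for `crossvalidate`: looks up a placeholder score.
@everywhere function fake_score(name::String)
    scores = Dict("LinearSVC" => 0.72, "rf" => 0.66, "prunetree" => 0.58)
    return (name = name, mean = scores[name])
end

# Each iteration yields a one-element vector; `vcat` reduces them into one vector.
results = @distributed (vcat) for name in ["rf", "prunetree", "LinearSVC"]
    [fake_score(name)]
end

# Rank candidates from best to worst mean score.
ranked = sort(results; by = r -> r.mean, rev = true)
```

When a reducer such as `vcat` is given, `@distributed` blocks until all iterations finish, so the result is ready for sorting as soon as the loop returns.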


# E2EAI Application

## Infrastructure Creation Automation

<center><img src="./k0s.png" width="500"/></center>

<center><img src="./cluster.png" width="800"/></center>

## AI as a Service: Zero Coding Using Workflow Template
<center><img src="./template.png" width="1000"/></center>

## Explicit ML Pipeline

<center><img src="./mlpipeline.png" width="600"/></center>

## Optimal Pipeline Discovery by AutoML
<center><img src="./lowcode.png" width="500"/></center>

## Low vs High Pipeline Complexity
<center><img src="./low-high-comp.png" width="300"/></center>

### Low Complexity Pipeline
<center><img src="./low-comp.png" width="500"/></center>

### High Complexity Pipeline 
<center><img src="./high-comp.png" width="500"/></center>

# The Future

<center> 
    <img  src="./e2eai-sphere.png" width="500"/> 
    <img  src="./flowchart.png" width="500"/> 
</center>


- ### Unified Control Plane
- ### Intent-Driven E2EAI

# Acknowledgement

<center> <img  src="./eu-funding.png" width="300"/> </center>


#### This work has been funded by the SUNRISE-6G project, Grant number 101139257, co-funded by the European Union and Smart Networks and Services Joint Undertaking (SNS JU).

### DISCLAIMER

##### "Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union and Smart Networks and Services Joint Undertaking (SNS JU). Neither the European Union nor the granting authority can be held responsible for them."

<br>
<br>
<br>


<center>
     <font size="+5">THANK YOU!</font>
</center>