SKaiNET is an open-source deep learning framework written in Kotlin, designed with developers in mind to enable the creation of modern AI-powered applications with ease.
SKaiNET aims to democratize "Edge AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
Important
About the name
“SKaiNET” is a working project name chosen early in the project’s life as part of a personal learning and experimentation effort, before any trademark considerations were known.
The name is not intended to reference, infringe, or imply association with any existing trademarks, companies, or products. It is not a commercial brand and is not claimed or assignable to any company or organization that contributors may be affiliated with.
If a naming conflict arises, the project name may be changed in the future.
SKaiNET uses a hybrid backend strategy that separates development iteration from production deployment.
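One way to picture that separation in plain Kotlin (illustrative only; `Backend`, `referenceBackend`, and `fastBackend` below are invented for this sketch and are not SKaiNET APIs): the model code targets one abstraction, and the backend behind it can be swapped between a readable reference implementation for iteration and a tuned one for deployment.

```kotlin
// Illustrative sketch: one model definition, two interchangeable backends.
fun interface Backend {
    fun dense(x: FloatArray, w: Array<FloatArray>, b: FloatArray): FloatArray
}

// Naive reference backend: easy to read and debug during development
val referenceBackend = Backend { x, w, b ->
    FloatArray(w.size) { j ->
        b[j] + w[j].indices.sumOf { k -> (w[j][k] * x[k]).toDouble() }.toFloat()
    }
}

// "Production" backend: same contract, different implementation strategy
// (in SKaiNET this role is played by compiled/optimized backends)
val fastBackend = Backend { x, w, b ->
    val out = b.copyOf()
    for (j in w.indices) for (k in x.indices) out[j] += w[j][k] * x[k]
    out
}

// The model code is written once against the Backend abstraction
fun forward(x: FloatArray, backend: Backend): FloatArray {
    val w = arrayOf(floatArrayOf(1f, 2f), floatArrayOf(3f, 4f))
    val b = floatArrayOf(0.5f, -0.5f)
    return backend.dense(x, w, b)
}
```

Both backends must produce identical results for the same model, which is what makes the development-to-deployment handoff safe.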
- Built-in Data Loaders: MNIST, Fashion-MNIST, CIFAR-10
- I/O Formats: GGUF, ONNX, JSON, Image (JPEG, PNG, etc.)
- Transformation DSL: Compose complex preprocessing pipelines including image resizing, normalization, and tensor conversion, using a type-safe Kotlin DSL.
```kotlin
// Data Transformation Pipeline
val transform = transforms<PlatformBitmapImage, Tensor<FP32, Float>> {
    resize(224, 224)
    centerCrop(200, 200)
    toTensor(ctx)
    normalize(ctx, mean = floatArrayOf(0.485f, 0.456f, 0.406f), std = floatArrayOf(0.229f, 0.224f, 0.225f))
}
val processedTensor = transform.apply(rawImage)
```

```kotlin
// Data loaders
val ds = MNIST.load(train = true)
val (x, y) = ds.nextBatch(64)
```
```kotlin
// Type-safe tensor creation via tensor DSL
val ctx = DefaultNeuralNetworkExecutionContext()
val mask = data<FP32, Float>(ctx) {
    tensor {
        shape(3, 3) {
            from(
                1f, 0f, 0f,
                1f, 1f, 0f,
                1f, 1f, 1f,
            )
        }
    }
}
val t = tensor<FP32, Float>(ctx, FP32::class) {
    tensor {
        shape(2, 3) {
            from(
                0f, 1f, 2f,
                10f, 11f, 12f
            )
        }
    }
}
println("shape=${t.shape} first=${t.data[0,0]}")
```

- Kotlin DSLs for Data, Neural Nets, Graphs, and Pipelines
```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```

For complex architectures with arbitrary wiring (like YOLO or ResNet), use the Graph DSL:
```kotlin
val program = dag {
    val x = input<FP32>("input", spec)
    val c1 = conv2d(x, w1, b1, padding = 1 to 1)
    val c2 = conv2d(c1, w2, b2, padding = 1 to 1)
    val sum = add(x, c2)
    output(relu(sum))
}
```

Read the Graph DSL Documentation for more details.
- Kotlin Notebook support: Explorer and Notebook-friendly APIs
```kotlin
// Works smoothly in Kotlin Notebooks
display(model.summary())
println(ds.describe())
```

- MLIR/StableHLO Backend: Lowering from the high-level Kotlin DSL to the MLIR StableHLO dialect.
- Optimization Passes: Extensible transformation API for optimizing the compiled IR.
  - ConstantFoldingPass: Folds arithmetic operations with constant operands.
  - OperationFusionPass: Fuses multiple ops (e.g., Add + ReLU) into efficient kernels.
  - DeadCodeEliminationPass: Removes unused computations.
```kotlin
// Applying Compiler Optimizations
val optimizer = StableHloOptimizer.createDefault()
val optimizedModule = optimizer.optimize(mlirModule)
```

- Arduino C Code Generation: Export models to standalone, optimized C99 code with static memory allocation.
```kotlin
// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)
```

Read the Deep Technical Explanation for more details.
- Clean APIs, growing docs, Maven Central artifacts
- Get productive in minutes with minimal deps
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.9.1")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.9.1")
}
// Ready to build & run in ~8 minutes
```

- From Kotlin code in apps, libraries, CLIs
- In Kotlin Notebooks for quick exploration
- With sample projects to learn patterns
Gradle (Kotlin DSL):
```kotlin
dependencyResolutionManagement {
    repositories {
        mavenCentral()
    }
}

dependencies {
    // minimal dependency with simple CPU backend
    implementation("sk.ainet.core:SKaiNET-lang-core:0.9.1")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.9.1")
    // simple model zoo
    implementation("sk.ainet.core:SKaiNET-lang-models:0.9.1")
    // Optional I/O (e.g., GGUF loader, JSON)
    implementation("sk.ainet.core:SKaiNET-io-core:0.9.1")
    implementation("sk.ainet.core:SKaiNET-io-gguf:0.9.1")
}
```

Maven:
```xml
<dependency>
    <groupId>sk.ainet.core</groupId>
    <artifactId>SKaiNET-lang-core</artifactId>
    <version>0.9.1</version>
</dependency>
```

- See examples
- Kotlin Notebook: https://github.com/SKaiNET-developers/SKaiNET-notebook
- SafeTensors: Native support for the SafeTensors format for secure and fast model loading.
- Generalized Weight Loading: Improved I/O pipeline with `WeightMapper` and progress tracking.
- JVM Vector API: Optimized tensor kernels for the JVM using SIMD instructions.
- Llama & GGUF: Enhanced tokenizer and ingestion logic for Llama-based models.
```kotlin
// Example: Loading SafeTensors weights
val loader = SafeTensorsParametersLoader(ctx)
loader.load("model.safetensors", model)
```

See CHANGELOG.md for the full list.
- KLlama (Llama 2 port): Initial version supporting GGUF models with `mmap` for zero-copy loading.
- Quantization & BitNet: Support for `Q8_0`, `Q4_K`, and BitNet/Ternary (`TQ1_0`, `TQ2_0`) formats.
- Streaming & I/O: Added streaming support for GGUF/ONNX and improved GGUF metadata loading.
- Advanced Operations: Added `LeakyReLU`, `ELU`, `AvgPool2d`, `Conv1d`, and `Conv3d`.
- Optimizers & Metrics: New `Adam` and `AdamW` optimizers and `Accuracy` metrics.
- Datasets & Transforms: Support for CIFAR-10, Fashion-MNIST, and a new Data Transform API.
```kotlin
// Example: Streaming inference with KLlama (GGUF)
val llama = KLlama.load("path/to/model.gguf")
llama.generate("Once upon a time") { token ->
    print(token) // streaming output
}
```

- WASM/JS: Initial support for web-based deployments.
- GGUF-only: Simplified I/O by focusing on GGUF (removed legacy formats).
See CHANGELOG.md for the full list.
- Autograd Engine: Initial support for automatic differentiation and reverse-mode gradients using `DefaultGradientTape`.
- Optimization & Training: New `SgdOptimizer` and training DSL to build and run training loops.
- Loss Functions: Added `MSELoss` and `CrossEntropyLoss` with configurable reduction strategies.
```kotlin
// Example training step with Autograd
val loss = MSELoss()
val optimizer = sgd(lr = 0.01)
val (tape, l) = record { loss.forward(model.forward(x, ctx), y, ctx) }
tape.computeGradients(targets = listOf(l), sources = model.parameters())
optimizer.step()
```

- Improved Graph DSL with better wiring and recording support.
- Stability improvements for StableHLO and CUDA backends.
See CHANGELOG.md for the full list.
- StableHLO and CUDA support via IREE

```kotlin
// Compile model to StableHLO and run on CUDA
val ir = Compile.toStableHlo(model)
println(ir.pretty())
```

- Arduino C99 code generation
```kotlin
// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)
```

- KSP-based TracingOps generation for recording pipelines.
- Improved HLO implementation and CUDA backend strategy.
See CHANGELOG.md for the full list.
- Kolmogorov–Arnold Networks (KAN/AKN) preview in the NN DSL
```kotlin
val model = nn {
    input(64)
    dense(out = 64)
    // KAN layer (preview) with residual when dims match
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}
```

- Training/Eval phases made easy
```kotlin
val base = DefaultNeuralNetworkExecutionContext() // default = EVAL
val yTrain = train(base) { ctx -> model.forward(x, ctx) }
val yEval = eval(base) { ctx -> model.forward(x, ctx) }
```

- Dropout and BatchNorm layers
```kotlin
val y = x
    .let { dropout(p = 0.1).forward(it, ctx) }
    .let { batchNorm(numFeatures = 64).forward(it, ctx) }
```

- Conv2D + MaxPool in the NN DSL
```kotlin
val model = nn {
    conv2d(outChannels = 16, kernel = 3)
    maxPool2d(kernel = 2)
    dense(out = 10)
}
```

- Data API with MNIST loader and JSON dataset support
```kotlin
val ds = MNIST.load(train = true) // platform-aware loader
val (batchX, batchY) = ds.nextBatch(64)
```

- GGUF model loading (initial)
```kotlin
val gguf = GGUF.read("/path/to/model.gguf")
println("Tensors: ${gguf.tensors.size}")
```

- SIMD/Vector API acceleration on the JVM; MatMul, tril, and pooling ops; forward hooks and simple tape recording; unified tensor creation contexts; nested data blocks returning tensors.
SKaiNET includes an initial KAN layer implementation that you can wire into the NN DSL. A KAN layer expands each input feature by a learnable grid of basis coefficients and then mixes them with a linear projection, with optional bias and residual connection.
- Current status: experimental/preview. API and behavior may change.
- Forward path uses broadcasted basis expansion and a matmul mixing step. `gridSize`, `useBias`, `useResidual`, and a custom `baseActivation` are supported. The `degree` parameter is reserved for future spline/basis functions and is not yet used.
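As a rough illustration of the basis-expansion-plus-mixing idea, here is a toy forward pass in plain Kotlin. This is not the framework's actual kernel: `kanForward`, the tanh basis, and the grid placement below are invented for this sketch; the real layer works on tensors with learnable initializers.

```kotlin
import kotlin.math.tanh

// Toy KAN-style forward pass: each input feature is expanded over a grid
// of basis functions, then the expanded features are mixed linearly.
fun kanForward(
    x: FloatArray,        // input features, size d
    w: Array<FloatArray>, // mixing weights, shape (outDim, d * gridSize)
    gridSize: Int
): FloatArray {
    val d = x.size
    // Basis expansion: evaluate each feature against gridSize shifted tanh bases
    val expanded = FloatArray(d * gridSize)
    for (i in 0 until d) {
        for (g in 0 until gridSize) {
            val center = -1f + 2f * g / (gridSize - 1) // grid points in [-1, 1]
            expanded[i * gridSize + g] = tanh(x[i] - center)
        }
    }
    // Linear mixing: out[j] = sum_k w[j][k] * expanded[k]
    return FloatArray(w.size) { j ->
        var acc = 0f
        for (k in expanded.indices) acc += w[j][k] * expanded[k]
        acc
    }
}
```

The expansion grows the feature dimension from `d` to `d * gridSize`, which is why the mixing weights have `d * gridSize` columns per output unit.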
Quick usage example:
```kotlin
val model = nn {
    input(64)
    dense(out = 64)
    // Add a KAN layer that keeps the same dimensionality and uses a residual connection
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}
```

Notes and limitations:
- Works with the default CPU backend; performance tuning and specialized kernels may arrive later.
- Residuals are applied only when `outputDim == inputDim`.
- You can customize initializers for the mixing weights, basis, and bias via the DSL block.
See source for details:
- SKaiNET-lang/SKaiNET-kan/src/commonMain/kotlin/sk/ainet/lang/kan/KanDsl.kt
- SKaiNET-lang/SKaiNET-kan/src/commonMain/kotlin/sk/ainet/lang/kan/KanLayer.kt
Minimize cosine distance between tensors with just a few lines:
```kotlin
skainet(ctx) {
    val a = tensor(1f, 0f, 0f).withRequiresGrad()
    val b = tensor(0f, 1f, 0f)
    // Record and compute gradients
    val (tape, distance) = record { a.cosineDistance(b) }
    tape.computeGradients(targets = listOf(distance), sources = listOf(a))
    // Optimize
    val optimizer = sgd(lr = 0.5)
    optimizer.addParameter(a)
    optimizer.step()
    println("Distance decreased to: ${a.cosineDistance(b).data.get()}")
}
```

- Q1 2026: Full StableHLO coverage for Vision & NLP ops.
- Q2 2026: K2 Compiler Plugin for implicit context propagation.
- Q3 2026: IREE-based WebGPU backend for high-performance browser AI.
- Q4 2026: Federated learning support for multi-device training.
- GitHub Discussions: Ask questions and suggest features as issues.
We love contributions! Whether it's a new operator, documentation, or a bug fix:
- Read our Contribution Guide.
- Check the Good First Issues.
MIT — see LICENCE.
