Summary
GgufModelMetadata.from(), and any other code that reads reader.fields with the idiom (value as? Number)?.toInt(), silently gets null for GGUF metadata stored as uint32 / uint64. In Kotlin the unsigned types (UInt, ULong, UShort, UByte) do not extend kotlin.Number, so the as? Number cast yields null (and an is Number check is false).
Modern GGUF files (anything produced by recent llama.cpp converters) store dimensions and counts as uint32. The result: contextLength, embeddingLength, headCount, layerCount, vocabSize (fallback), bosTokenId, eosTokenId, etc. are all populated as null instead of the real values, and the model loader falls back to defaults (e.g. blockCount = 0 → a transformer with zero layers).
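A minimal, self-contained illustration of the language rule behind this, in plain Kotlin with no SKaiNET code. legacyToInt is a hypothetical name for the idiom quoted above:

```kotlin
// Hypothetical name for the (value as? Number)?.toInt() idiom.
fun legacyToInt(value: Any?): Int? = (value as? Number)?.toInt()

fun main() {
    val signed: Any? = 8192     // boxed kotlin.Int, which extends Number
    val unsigned: Any? = 8192u  // boxed kotlin.UInt, which does not

    println(legacyToInt(signed))           // 8192
    println(legacyToInt(unsigned))         // null
    println(unsigned is Number)            // false
    println((unsigned as? UInt)?.toInt())  // 8192
}
```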
Where
skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/GgufModelMetadata.kt:179
```kotlin
private fun Map<String, Any?>.getInt(vararg keys: String): Int? {
    for (key in keys) {
        val value = this[key]
        when (value) {
            is Number -> return value.toInt() // ← UInt/ULong fall through
            is String -> value.toIntOrNull()?.let { return it }
        }
    }
    return null
}
```
getIntList (line 190) has the same bug for the list-of-numbers case.
Reproduction
```kotlin
val md = GgufModelMetadata.from(mapOf(
    "general.architecture" to "llama",
    "llama.context_length" to 8192u, // UInt: what the reader actually emits
    "llama.embedding_length" to 4096u,
    "llama.block_count" to 32u
))
md.contextLength   // null, expected 8192
md.embeddingLength // null, expected 4096
md.layerCount      // null, expected 32
```
The existing GgufModelMetadataTokenizerTest only uses Int literals, which is why this never tripped a test.
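To see why Int-only tests pass while real files fail, the quoted helper can be exercised on its own. This sketch copies the private getInt from GgufModelMetadata.kt:179 unchanged and feeds it an Int and a UInt under the same key:

```kotlin
// Verbatim copy of the private helper quoted above (GgufModelMetadata.kt:179).
private fun Map<String, Any?>.getInt(vararg keys: String): Int? {
    for (key in keys) {
        val value = this[key]
        when (value) {
            is Number -> return value.toInt() // ← UInt/ULong fall through
            is String -> value.toIntOrNull()?.let { return it }
        }
    }
    return null
}

fun main() {
    // Int literal, as in GgufModelMetadataTokenizerTest: the Number branch matches.
    val intFields: Map<String, Any?> = mapOf("llama.block_count" to 32)
    // UInt, as the reader emits for uint32: neither branch matches.
    val uintFields: Map<String, Any?> = mapOf("llama.block_count" to 32u)

    println(intFields.getInt("llama.block_count"))   // 32
    println(uintFields.getInt("llama.block_count"))  // null
}
```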
Impact
- Anyone calling GgufModelMetadata.from(reader) on a real-world GGUF gets a GgufModelMetadata with most numeric fields null.
- The same idiom is repeated downstream: e.g. SKaiNET-transformers UnifiedModelLoader.peek had to introduce a local workaround. Every consumer of reader.fields is exposed to the same trap.
Proposed fix (target: hotfix/0.22.2)
- Add public top-level extensions on Map<String, Any?> in sk.ainet.io.gguf (new file, e.g. GgufFieldAccessors.kt):
  - getInt(vararg keys: String): Int?
  - getLong(vararg keys: String): Long?
  - getString(vararg keys: String): String?
  - getIntList(vararg keys: String): List<Int>?
  - getStringList(vararg keys: String): List<String>?

  The numeric ones accept Int/UInt/Long/ULong/Short/UShort/Byte/UByte values as well as numeric Strings.
- Delete the buggy private helpers in GgufModelMetadata.kt and route through the new public ones.
- Add a regression test that drives GgufModelMetadata.from with UInt and ULong values (both list and scalar).
- Bump VERSION_NAME to 0.22.2.
This is non-breaking: it only adds new public API and makes existing methods return correct values.
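One possible shape for GgufFieldAccessors.kt, sketched under assumptions the issue does not pin down: list-valued fields are assumed to arrive as List<*>, out-of-range values return null rather than being truncated, and a list with any unconvertible element returns null as a whole. The private helpers (toLongOrNullLenient, toIntOrNullLenient, toIntListOrNull) and the "example." keys are hypothetical, and the package declaration is omitted so the snippet runs standalone:

```kotlin
// Hypothetical sketch of GgufFieldAccessors.kt. In the library it would start with
// `package sk.ainet.io.gguf`; omitted here so the snippet runs standalone.

// Converts one metadata value to Long: signed and unsigned integer boxes plus
// numeric strings. Anything else, including a ULong above Long.MAX_VALUE, is null.
private fun Any?.toLongOrNullLenient(): Long? = when (this) {
    is Long -> this
    is Int -> toLong()
    is Short -> toLong()
    is Byte -> toLong()
    is ULong -> if (this <= Long.MAX_VALUE.toULong()) toLong() else null
    is UInt -> toLong()
    is UShort -> toLong()
    is UByte -> toLong()
    is String -> toLongOrNull()
    else -> null
}

// Narrows to Int, returning null instead of truncating out-of-range values.
private fun Any?.toIntOrNullLenient(): Int? =
    toLongOrNullLenient()
        ?.takeIf { it in Int.MIN_VALUE.toLong()..Int.MAX_VALUE.toLong() }
        ?.toInt()

// Converts every element or returns null, so one bad element cannot shift indices.
private fun List<*>.toIntListOrNull(): List<Int>? {
    val out = ArrayList<Int>(size)
    for (element in this) out.add(element.toIntOrNullLenient() ?: return null)
    return out
}

// Each accessor returns the value of the first key that converts successfully.
fun Map<String, Any?>.getLong(vararg keys: String): Long? =
    keys.firstNotNullOfOrNull { this[it].toLongOrNullLenient() }

fun Map<String, Any?>.getInt(vararg keys: String): Int? =
    keys.firstNotNullOfOrNull { this[it].toIntOrNullLenient() }

fun Map<String, Any?>.getString(vararg keys: String): String? =
    keys.firstNotNullOfOrNull { this[it] as? String }

fun Map<String, Any?>.getIntList(vararg keys: String): List<Int>? =
    keys.firstNotNullOfOrNull { (this[it] as? List<*>)?.toIntListOrNull() }

fun Map<String, Any?>.getStringList(vararg keys: String): List<String>? =
    keys.firstNotNullOfOrNull { key ->
        val list = this[key] as? List<*> ?: return@firstNotNullOfOrNull null
        list.filterIsInstance<String>().takeIf { it.size == list.size }
    }

fun main() {
    // Values shaped like the reader's output; "example." keys are hypothetical.
    val fields: Map<String, Any?> = mapOf(
        "general.architecture" to "llama",
        "llama.context_length" to 8192u,
        "llama.block_count" to 32u,
        "example.big_count" to 5_000_000_000uL,
        "example.int_list" to listOf(1u, 2u, 3u),
    )
    println(fields.getString("general.architecture"))               // llama
    println(fields.getInt("llama.context_length"))                  // 8192
    println(fields.getInt("example.missing", "llama.block_count"))  // 32
    println(fields.getLong("example.big_count"))                    // 5000000000
    println(fields.getInt("example.big_count"))                     // null
    println(fields.getIntList("example.int_list"))                  // [1, 2, 3]
}
```

Returning null on overflow keeps a single "not usable" contract for callers; truncating the way Number.toInt() does would be the alternative, at the cost of silently wrong dimensions.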
Notes
- A downstream stopgap is already in SKaiNET-transformers develop (UnifiedModelLoader.toIntValue); it can be removed once consumers adopt 0.22.2.