Propose Lagoon %real release. #45

Open: wants to merge 3 commits into base: main

190 changes: 190 additions & 0 deletions UIPS/UIP-0122.md
@@ -0,0 +1,190 @@
---
uip: "0122"
title: "Lagoon IEEE 754 Reals"
description: Support jetted array operations
author: ~lagrev-nocfep
status: Draft
type: Standards Track
category: Arvo
created: 2024-03-26
---

## Abstract

Nock and the software built on it are designed to facilitate computing as a legible state machine. At-scale scientific computing has never really been part of the design criteria for the Urbit ecosystem. While it's hard to imagine density functional theory quantum chemistry packages preferring Hoon, it is far easier to envision a world of machine learning and even LLMs built on top of or adjacent to Urbit. At their core, essentially all “scientific computing” approaches rely heavily on linear algebra packages: classically BLAS and LAPACK, but today many others as well. It behooves us to provide this capability to future developers building on Mars.

Lagoon has matured through several refactors into a developer-friendly interface. It is time to prepare and release a subset of completed array-operational work to the developer community. Since Lagoon relies on jets to operate reasonably quickly, we will prepare and release `%real`-valued operations first.

## Motivation

Lagoon (Linear AlGebra in hOON) will offer BLAS-like vector and matrix operations (like NumPy's linear algebra essentials, but not solvers). This will give all developers a common and performant interface, built on a common language of data types and operators for representing vector, matrix, and tensor data. By focusing on `%real`s first, we can vet the interface further in production and lay the groundwork for expansion to further types in subsequent releases.

## Specification

### Data Types

Lagoon defines the following data types for `%real`-valued numbers, which are IEEE 754 floats compatible with the corresponding `@r` types in Hoon. Other prospective types for which some work has been done are commented out.

```hoon
::  /sur/lagoon
::::  Types for Lagoon compatibility
::
|%
+$  ray               ::  $ray:  n-dimensional array
  $:  =meta           ::  descriptor
      data=@ux        ::  data, row-major order
  ==
::
+$  prec  [a=@ b=@]   ::  fixed-point precision, a+b+1=bloq
+$  meta              ::  $meta:  metadata for a $ray
  $:  shape=(list @)  ::  list of dimension lengths
      =bloq           ::  logarithm of bitwidth
      =kind           ::  name of data type
      fxp=(unit prec)  ::  fixed-point scale
      ::  Review thread on the fxp field:
      ::    Contributor: do we want to comment out fxp?
      ::    Contributor Author: How finicky are state upgrades going to be with adding a value here, I guess is the question.
      ::    Contributor Author: The real PITA is that we will have to s///g all the test cases.
      ::    Contributor: yea makes sense. lets keep it then.
  ==
::
+$  kind              ::  $kind:  type of array scalars
  $?  %real           ::  IEEE 754 float
      :: %uint        ::  unsigned integer
      :: %int2        ::  2s-complement integer
      :: %cplx        ::  BLAS-compatible packed floats
      :: %unum        ::  unum/posit
      :: %fixp        ::  fixed-precision
  ==
::
+$  baum              ::  $baum:  ndray with metadata
  $:  =meta           ::
      data=ndray      ::
  ==
::
+$  ndray             ::  $ndray:  n-dim array as nested list
  $@  @               ::  single item
      (list ndray)    ::  nonempty list of children, in row-major order
::
+$  slice  (unit [(unit @) (unit @)])
--
```

The core data type is a `+$ray`, a pair of metadata (`+$meta`) and data stored as a bit-aligned atom. The metadata tracked for each array are:

1. `shape=(list @)`, the dimensions of an array.
2. `bloq`, the block size of each array element in standard Hoon $2^n$ terms (a bloq of $n$ means $2^n$ bits, so a bloq of 5 is a 32-bit element).
3. `kind`, whether the value is `%real` or otherwise. (Only `%real` will be supported in this UIP's release.)
4. `fxp=(unit prec)`, the binary-point scale of a fixed-precision number as a unit. (This will always be `~` for `%real`s.)
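
To make the metadata concrete, here is a minimal sketch of how `shape` and `bloq` determine element width, element count, and row-major strides. It is written in Python purely for illustration and is not Lagoon's implementation; the helper names `bit_width`, `row_major_strides`, and `flat_index` are hypothetical.

```python
from math import prod


def bit_width(bloq):
    """A bloq of n denotes blocks of 2**n bits."""
    return 1 << bloq


def row_major_strides(shape):
    """Strides, counted in elements, for a row-major array of this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return list(reversed(strides))


def flat_index(shape, index):
    """Position of a multi-dimensional index in the flattened row-major data."""
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


shape = [2, 3]   # two rows, three columns
bloq = 5         # 32-bit elements

print(bit_width(bloq))            # 32
print(prod(shape))                # 6 elements in total
print(row_major_strides(shape))   # [3, 1]
print(flat_index(shape, [1, 2]))  # 5, the last element
```

In row-major order the last dimension varies fastest, which is why its stride is 1.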

`data` are in CBLAS-like row-major order, with the leading value of the array in the most significant position bitwise. A `+$ray`'s `data` atom also carries a leading `1` bit, set one position above the most significant bit of the array data. This allows us to represent leading zeroes, which an atom would otherwise drop.
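
The layout described above can be sketched as follows. This is an illustration in Python under stated assumptions, not Lagoon's code: it assumes 32-bit elements (a bloq of 5), and the helper names `pack` and `unpack` are hypothetical. The first value is deliberately `0.0` to show why the leading `1` bit is needed.

```python
import struct

BLOQ = 5            # assumed: 2**5 = 32-bit IEEE 754 singles
WIDTH = 1 << BLOQ


def f32_bits(x):
    """IEEE 754 single-precision bit pattern of x, as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(b):
    """Inverse of f32_bits."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def pack(values):
    """Pack values row-major, first value most significant, under a leading 1 bit."""
    data = 1  # the leading 1 bit
    for v in values:
        data = (data << WIDTH) | f32_bits(v)
    return data


def unpack(data):
    """Recover the values; the leading 1 bit tells us how many elements there are."""
    count = (data.bit_length() - 1) // WIDTH
    mask = (1 << WIDTH) - 1
    return [bits_f32((data >> (WIDTH * (count - 1 - i))) & mask)
            for i in range(count)]


packed = pack([0.0, 1.0, 2.0])
print(hex(packed))     # 0x1000000003f80000040000000
print(unpack(packed))  # [0.0, 1.0, 2.0]
```

Without the leading `1` bit, the all-zero first element would vanish from the integer and the element count could not be recovered from the data alone.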

There is also a `+$ndray` type corresponding to an unpacked list of the same data.
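
As a rough illustration of the relationship between the nested-list form and the flat row-major form, the Python sketch below flattens a nested list, infers its shape, and rebuilds it. It is not Lagoon's implementation; the function names are hypothetical, and it assumes a rectangular, nonempty array.

```python
def flatten(ndray):
    """Flatten a nested list into a row-major list of scalars."""
    if not isinstance(ndray, list):
        return [ndray]
    flat = []
    for child in ndray:
        flat.extend(flatten(child))
    return flat


def infer_shape(ndray):
    """Read dimension lengths by walking down the first child at each level."""
    shape = []
    node = ndray
    while isinstance(node, list):
        shape.append(len(node))
        node = node[0]
    return shape


def nest(flat, shape):
    """Rebuild nested lists from row-major data (len(flat) must equal the product of shape)."""
    if len(shape) == 1:
        return list(flat)
    chunk = len(flat) // shape[0]
    return [nest(flat[i * chunk:(i + 1) * chunk], shape[1:])
            for i in range(shape[0])]


nested = [[1, 2, 3], [4, 5, 6]]
print(flatten(nested))      # [1, 2, 3, 4, 5, 6]
print(infer_shape(nested))  # [2, 3]
print(nest(flatten(nested), infer_shape(nested)) == nested)  # True
```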

### Operators

At the current time, Lagoon defines the following operators, a handful of which are for internal library convenience. This interface is the result of several complete refactors and we do not foresee additional major alterations in the course of producing the final product.

- `++print`
- `++slog`
- `++to-tank`
- `++get-term`
- `++squeeze`
- `++submatrix`
- `++product`
- `++gather`
- `++get-item`
- `++set-item`
- `++get-row`
- `++set-row`
- `++get-col`
- `++set-col`
- `++get-bloq-offset`
- `++get-item-number`
- `++strides`
- `++get-dim`
- `++get-item-index`
- `++ravel`
- `++en-ray`
- `++de-ray`
- `++get-item-baum`
- `++fill`
- `++spac`
- `++unspac`
- `++scalar-to-ray`
- `++eye`
- `++zeros`
- `++ones`
- `++iota`
- `++magic`
- `++range`
- `++linspace`
- `++urge`
- `++scale`
- `++max`
- `++argmax`
- `++min`
- `++argmin`
- `++cumsum`
- `++prod`
- `++reshape`
- `++stack`
- `++hstack`
- `++vstack`
- `++transpose`
- `++diag`
- `++trace`
- `++dot`
- `++mmul`
- `++abs`
- `++add-scalar`
- `++sub-scalar`
- `++mul-scalar`
- `++div-scalar`
- `++mod-scalar`
- `++add`
- `++sub`
- `++mul`
- `++div`
- `++mod`
- `++pow-n`
- `++pow`
- `++exp`
- `++log`
- `++gth`
- `++gte`
- `++lth`
- `++lte`
- `++mpow-n`
- `++is-close`
- `++any`
- `++all`
- `++fun-scalar`
- `++trans-scalar`
- `++el-wise-op`
- `++bin-op`

The IEEE 754 rounding mode is a property of the operation rather than of the value(s) involved. To effect this correctly, `/lib/lagoon` implements a `++lake` wrapper in a door-like pattern to modify the core's behavior. The chosen rounding mode then propagates to the gates in the core.
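
The idea that rounding belongs to the operation, not the operands, can be sketched in Python. This is an analogy only: the `Lake` class is a hypothetical stand-in for a door, and Python's `decimal` contexts stand in for IEEE 754 binary rounding, which they are not. Hoon's standard `@r` doors take a rounding tag of `%n`, `%u`, `%d`, or `%z`; that `++lake` accepts the same tags is an assumption made here for illustration.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# Assumed mapping from Hoon-style rounding tags to decimal rounding modes.
ROUNDING = {
    "n": ROUND_HALF_EVEN,  # nearest, ties to even
    "u": ROUND_CEILING,    # up
    "d": ROUND_FLOOR,      # down
    "z": ROUND_DOWN,       # toward zero
}


class Lake:
    """The rounding mode is configured on the core; every operation inherits it."""

    def __init__(self, mode="n", digits=4):
        self.ctx = Context(prec=digits, rounding=ROUNDING[mode])

    def add(self, a, b):
        return self.ctx.add(a, b)


a, b = Decimal("1"), Decimal("0.0006")
print(Lake("u").add(a, b))  # 1.001
print(Lake("d").add(a, b))  # 1.000
```

The same two operands give different results depending only on how the core was configured, which is the behavior the door-like wrapper is meant to provide.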

### Jetting

Jets will resolve at the level of (e.g.) `++add` rather than `++bin-op`.

Jets are built on top of [SoftBLAS](https://github.com/urbit/SoftBLAS), which implements BLAS operations on top of SoftFloat. The build process for `vere` has been modified to link SoftBLAS (in the `lagoon-jets` branch of `urbit/vere`).

Because jets need to be compiled into the `vere` binary at the current time, `/lib/lagoon` needs to ship on the `%base` desk. However, we propose a departure from past convention: jet the library in `/lib` rather than move it into `/sys`.

In the long run, as more possibilities for userspace jets become available, `/lib/lagoon` may be moved out of `%base`. However, one change in policy at a time is all that we propose now.

### Rationale for `%real`

IEEE 754 `%real`s are a well-understood, well-vetted component in the numerical ecosystem for scientific computing and Urbit's `@r` affordances.

We have comprehensive unit test coverage of `/lib/lagoon` and SoftBLAS, lending credence to verified behavior.

Taken together, these lead us to believe that introducing `%real` to the developer community first is a safe and

Contributor: safe and ...?

Contributor Author: “You're killing me, Smalls!”


## Resources

Primary development is taking place in [`urbit/numerics`](https://github.com/urbit/numerics).

Details are available in `~mopfel-winrux/numeric-computation-and-machine-learning` and [in summary form](https://gist.github.com/sigilante/f9b0d6b5d5a7675f96415e35c0b22d95).

## Copyright

Copyright and related rights waived via [CC0](../LICENSE.md).