
Matrix dialect #180

Merged
merged 385 commits into from
Mar 29, 2023

Conversation

NeuralCoder3 (Collaborator)

A simple matrix dialect.
It exposes a matrix type Mat: Π [n: .Nat, S: «n; .Nat», T: *] -> * such that
Mat (n, S, T) is an n-dimensional tensor of element type T with extents S_0, ..., S_{n-1} (and hence S_0 * ... * S_{n-1} entries).

The matrix operations are:

  • shape -- returns the size along the ith dimension
  • constMat -- returns a new matrix
  • read -- reads an entry of a matrix
  • insert -- replaces an entry of a matrix
  • init -- creates a matrix but does not initialize its entries
  • prod -- computes the matrix-matrix product of two two-dimensional matrices (over a floating point type)
  • transpose -- transposes a two-dimensional matrix
  • sum -- sums up all entries of a matrix
  • mapReduce -- performs an arbitrary mapping and reduction operation (see below)
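
For intuition, the semantics of the basic accessors can be sketched in plain Python. This is an illustrative model using nested lists, fixed to two dimensions for brevity; the names and the in-place behavior mirror the description above, but it is not dialect code:

```python
def init(n_rows, n_cols):
    """init: allocate a matrix without initializing its entries
    (None stands in for undefined memory)."""
    return [[None] * n_cols for _ in range(n_rows)]

def const_mat(n_rows, n_cols, value):
    """constMat: a fresh matrix with every entry set to value."""
    return [[value] * n_cols for _ in range(n_rows)]

def shape(m, i):
    """shape: the size along the i-th dimension."""
    return (len(m), len(m[0]))[i]

def read(m, i, j):
    """read: fetch one entry."""
    return m[i][j]

def insert(m, i, j, v):
    """insert: replace one entry in place, mirroring the
    pointer-based, side-effecting implementation."""
    m[i][j] = v
    return m
```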

All operations are inside the memory monad, allowing for a matrix implementation involving side effects.
In fact, the current matrices are nested pointers to arrays that are manipulated in-place.

An alternative might be an immutable array implementation, such as a skew binary random-access list or one of Haskell's array implementations (e.g., diff arrays).

MapReduce

mapReduce is inspired by Einstein summation notation and implementations like

  • Tensorflow / XLA: einsum
  • Pytorch: einsum
  • NumPy: einsum
  • Halide
  • Haskell: Tensor DSL
  • Ricci Calculus
  • Einstein Notation
  • Pytorch DSL

It takes m input matrices, a zero element, and a combination function.
The combination function takes the accumulator (initially the zero element) and one element from each input matrix, and returns the new accumulator.
The result is a matrix.

Pseudocode:

out_matrix = init()
for output_indices:
  acc = zero
  for input_indices:
    elements[0..m] = read(matrix[0..m], indices)
    acc = f(acc, elements[0..m])
  insert(out_matrix, output_indices, acc)
return out_matrix
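
The pseudocode can be made executable. The following is a hypothetical Python model, not the dialect itself: matrices are dicts from index tuples to values, and index variables are named by strings, as in einsum:

```python
from itertools import product

def map_reduce(out_names, out_sizes, red_names, red_sizes, zero, f, inputs):
    """Model of mapReduce: inputs is a list of (matrix, index_names) pairs;
    out_names/red_names name the output and reduction indices."""
    sizes = dict(zip(out_names + red_names, out_sizes + red_sizes))
    out = {}
    for out_idx in product(*(range(sizes[n]) for n in out_names)):
        env = dict(zip(out_names, out_idx))
        acc = zero                                    # acc = zero
        for red_idx in product(*(range(sizes[n]) for n in red_names)):
            env.update(zip(red_names, red_idx))
            # elements[0..m] = read(matrix[0..m], indices)
            elems = [m[tuple(env[n] for n in names)] for m, names in inputs]
            acc = f(acc, elems)                       # acc = f(acc, elements)
        out[out_idx] = acc                            # insert(out_matrix, ...)
    return out

# Usage: the matrix product C[i,j] = sum_k A[i,k] * B[k,j]
A = {(i, k): 1.0 for i in range(2) for k in range(3)}
B = {(k, j): 2.0 for k in range(3) for j in range(2)}
C = map_reduce(("i", "j"), (2, 2), ("k",), (3,), 0.0,
               lambda acc, es: acc + es[0] * es[1],
               [(A, ("i", "k")), (B, ("k", "j"))])
```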

Optimization Pipeline

The matrix operations and type are translated using a staging approach that allows intercepting the process at different levels.

High-Level Rewrites

First, high-level operations like transpose, sum, and prod are rewritten into the mapReduce form.
To do so, pre-defined functions of the form internal_mapRed_matrix_[name] are looked up; these functions must agree in type with the corresponding axiom.
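
To illustrate what these rewrites compute, here are the corresponding loop nests spelled out in Python. The internal_mapRed_matrix_* functions themselves are Thorin-level definitions; the names and shapes below are only illustrative:

```python
def map_red_prod(A, B):
    """prod: out indices (i, j), reduce over k, accumulator starts at 0."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def map_red_transpose(A):
    """transpose: a pure index remapping, no reduction at all."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def map_red_sum(A):
    """sum: empty output index set, reduce over every index."""
    return sum(v for row in A for v in row)
```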

High-Level Externalization

Alternatively, certain operations like prod could be dispatched to external libraries such as BLAS.
This is, however, not implemented in the current version.

Medium-Level Lowering

The next step is to lower mapReduce to affine for loops.
The conceptual idea corresponds to the pseudocode above.
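
A Python analogue of the lowered form is sketched below. The loop threads the accumulator through explicitly rather than mutating it, in the style of an affine for loop; this is a sketch of the idea, not the actual pass output:

```python
def affine_for(lo, hi, acc, body):
    """Model of an affine for loop: iterate i in [lo, hi),
    threading the accumulator through the loop body."""
    for i in range(lo, hi):
        acc = body(i, acc)
    return acc

def lowered_sum(A):
    """What lowering sum of a 2-D matrix to affine loops might look like."""
    n, m = len(A), len(A[0])
    return affine_for(0, n, 0.0,
        lambda i, acc: affine_for(0, m, acc,
            lambda j, acc2: acc2 + A[i][j]))
```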

Low-Level Lowering

The last step is to eliminate all remnants of the matrix dialect.
We remove the remaining internal_mapRed_ functions (due to a missing association dialect).

Afterward, we lower the low-level matrix operations and types.

  • The matrix type is replaced by a pointer to n nested arrays.
  • init becomes alloc.
  • read becomes lea + load.
  • insert becomes lea + store.
  • constMat becomes alloc + pack + store.
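
The address arithmetic behind lea + load/store can be modeled with a flat buffer and strides. Row-major layout is assumed here purely for illustration; as noted above, the dialect actually lowers to nested arrays:

```python
def lea(strides, idx):
    """Model of lea: compute the linear offset of a multi-index."""
    return sum(s * i for s, i in zip(strides, idx))

def lowered_read(buf, strides, idx):
    """read lowered to lea + load."""
    return buf[lea(strides, idx)]

def lowered_insert(buf, strides, idx, v):
    """insert lowered to lea + store."""
    buf[lea(strides, idx)] = v
```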

Low-Level Functional Lowering

We could lower the matrix to a functional array representation like Haskell arrays or random access lists at this point.

Additional Operations

One could implement further operations, either deeply (as axioms with their own lowerings) or shallowly (as definitions in terms of the existing operations):

  • parallel versions of other operations
  • a specialized map
  • a fold (the functional term for reduce)
  • zipWith (a map on two matrices)
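
Each of these is an instance of the mapReduce scheme. A shallow Python sketch with plain nested lists (hypothetical helpers, not dialect code):

```python
def mat_map(f, A):
    """map: apply f entrywise; no reduction indices."""
    return [[f(x) for x in row] for row in A]

def zip_with(f, A, B):
    """zipWith: a map over two matrices of the same shape."""
    return [[f(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def fold(f, zero, A):
    """fold: reduce over all entries, starting from zero."""
    acc = zero
    for row in A:
        for x in row:
            acc = f(acc, x)
    return acc
```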

Known Issues

Edge cases like zero inputs or outputs are not handled correctly in every case for mapReduce.


NeuralCoder3 commented Mar 17, 2023

The current issue is in lower_matrix_mediumlevel.cpp : counting_for.
Specifically, the computation of the accumulator type fails.
This is probably caused by uninitialized components in the accumulator generated in line 241.

Fixed in fd73b1e

@NeuralCoder3 NeuralCoder3 marked this pull request as ready for review March 20, 2023 12:42
@leissa (Member) left a comment

Really cool :)

Review comments (all resolved) on:

  • dialects/affine/affine.h
  • dialects/core/be/ll/ll.cpp
  • dialects/matrix/matrix.h
  • dialects/matrix/matrix.thorin
  • dialects/matrix/passes/lower_matrix_mediumlevel.cpp
  • dialects/matrix/passes/lower_matrix_mediumlevel.h
leissa commented Mar 27, 2023

Side note: I need to adjust my email settings. Sometimes I only notice days later that you tagged me as a reviewer ... Sorry for that.

@leissa leissa merged commit 905cf5a into AnyDSL:master Mar 29, 2023
@leissa leissa deleted the matrix_dialect branch March 29, 2023 19:30