Add concept of Problem (#32)
* Use Problem as validation of consistency between assignment and formats
* Centralize code generation
* Use Problem validation before invoking code generation
* Add option to CLI to choose TACO as the tensor compiler
* Use `Result` from the `returns` library in more places
drhagen authored Dec 2, 2023
1 parent dd2c73f commit 4593e68
Showing 22 changed files with 430 additions and 212 deletions.
10 changes: 7 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -6,9 +6,9 @@ Tensors are n-dimensional generalizations of matrices. Instead of being confined

In a dense tensor, each element is explicitly stored in memory. If the vast majority of elements are zero, then this is an inefficient layout, taking far more memory to store and far more time to operate on. There are many different sparse tensor formats, each one better or worse depending on which elements of the tensor are nonzero.
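The trade-off described above can be made concrete with a plain-Python sketch (for intuition only, not Tensora's internal representation): a dense layout stores every element explicitly, while a dictionary-of-keys (DOK) layout stores only the nonzeros.

```python
# Minimal illustration of dense vs. sparse (dictionary-of-keys) storage.
# This is a sketch for intuition, not Tensora's actual data structures.

def dense_matrix(rows, cols, entries):
    """Store every element explicitly, zeros included."""
    matrix = [[0.0] * cols for _ in range(rows)]
    for (i, j), value in entries.items():
        matrix[i][j] = value
    return matrix

def dok_matrix(entries):
    """Store only the nonzero elements, keyed by coordinate."""
    return {coord: value for coord, value in entries.items() if value != 0.0}

entries = {(0, 0): 1.0, (2, 3): 5.0}
dense = dense_matrix(1000, 1000, entries)
sparse = dok_matrix(entries)

stored_dense = sum(len(row) for row in dense)   # 1,000,000 stored values
stored_sparse = len(sparse)                     # 2 stored values
```

With two nonzeros in a million-element matrix, the dense layout stores half a million times more values; which sparse format is best then depends on where the nonzeros fall.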

Many tensor kernels (functions that perform a specific algebraic calculation between tensors with specific sparse formats) have been written to solve specific problems. Recently, the [Tensor Algebra Compiler](http://tensor-compiler.org/) (taco) was invented to automate the construction and optimization of tensor kernels for arbitrary algebraic expressions for arbitrary sparse formats. taco takes an algebraic expression and description of the format of each tensor in the expression and returns a C function that efficiently evaluates the given expression for those tensor arguments.
Many tensor kernels (functions that perform a specific algebraic calculation between tensors with specific sparse formats) have been written to solve specific problems. Recently, the [Tensor Algebra Compiler](http://tensor-compiler.org/) (TACO) was invented to automate the construction and optimization of tensor kernels for arbitrary algebraic expressions and arbitrary sparse formats. TACO takes an algebraic expression and a description of the format of each tensor in the expression and returns a C function that efficiently evaluates the given expression for those tensor arguments.

Tensora is a Python wrapper around two tensor algebra compilers: the original taco library and an independent Python implementation in Tensora. Tensora has a central class `Tensor` that simply has a pointer to a taco tensor held in C memory, managed by the `cffi` package. Tensora exposes functions that take a string of an algebraic expression and return a Python function the performs that operation in fast C code. In order to do that, the string is parsed and passed to the tensor algebra compiler; the C code generated by the tensor algebra compiler is compiled "on the fly" by `cffi` and then wrapped by code that provides good error handling.
Tensora is a Python wrapper around two tensor algebra compilers: the original TACO library and an independent Python implementation in Tensora. Tensora has a central class `Tensor` that simply holds a pointer to a taco tensor in C memory, managed by the `cffi` package. Tensora exposes functions that take a string of an algebraic expression and return a Python function that performs that operation in fast C code. To do that, the string is parsed and passed to the tensor algebra compiler; the C code generated by the tensor algebra compiler is compiled "on the fly" by `cffi` and then wrapped by code that provides good error handling.

Tensora comes with a command line tool `tensora`, which provides the C code to the user for given algebraic expressions and tensor formats.

@@ -157,7 +157,7 @@ There is also `evaluate_tensora` and `evaluate_taco` that have identical interfa

## Getting the C code

The `tensora` CLI tool emits the C code for a given algebraic expression, tensor formats, and kernel type. It comes installed with Tensora and can be run like:
The `tensora` CLI tool emits the C code for a given algebraic expression, tensor formats, and kernel type. It can emit the C code it generates itself, or the C code generated by TACO. It comes installed with Tensora and can be run like:

```bash
tensora 'y(i) = A(i,j) * x(j)' -f A:ds -t compute -o kernel.c
```

@@ -183,6 +183,10 @@ Here is the output of `tensora --help` for reference:
│ be generated. Can be │
│ mentioned multiple times. │
│ [default: compute] │
│ --compiler -c [tensora|taco] The tensor algebra compiler │
│ to use to generate the │
│ kernel. │
│ [default: tensora] │
│ --output -o PATH The file to which the kernel │
│ will be written. If not │
│ specified, prints to standard │
2 changes: 1 addition & 1 deletion src/tensora/__init__.py
@@ -1,4 +1,4 @@
from .compile import TensorCompiler
from .format import Format, Mode
from .function import evaluate, evaluate_taco, evaluate_tensora, tensor_method
from .generate import TensorCompiler
from .tensor import Tensor
80 changes: 46 additions & 34 deletions src/tensora/cli.py
@@ -4,13 +4,14 @@
from typing import Annotated, Optional

import typer
from parsita import Failure, Success
from parsita import ParseError
from returns.result import Failure, Success

from .desugar.exceptions import DiagonalAccessError, NoKernelFoundError
from .expression import parse_assignment
from .format import Format, Mode, parse_format
from .format import parse_named_format
from .generate import TensorCompiler, generate_c_code
from .kernel_type import KernelType
from .native import generate_c_code_from_parsed
from .problem import make_problem

app = typer.Typer()

@@ -43,6 +44,14 @@ def tensora(
help="The type of kernel that will be generated. Can be mentioned multiple times.",
),
] = [KernelType.compute],
tensor_compiler: Annotated[
TensorCompiler,
typer.Option(
"--compiler",
"-c",
help="The tensor algebra compiler to use to generate the kernel.",
),
] = TensorCompiler.tensora,
output_path: Annotated[
Optional[Path],
typer.Option(
@@ -61,48 +70,51 @@
case Failure(error):
typer.echo(f"Failed to parse assignment:\n{error}", err=True)
raise typer.Exit(1)
case Success(sugar):
sugar = sugar
case Success(parsed_assignment):
pass
case _:
raise NotImplementedError()

# Parse formats
parsed_formats = {}
for target_format_string in target_format_strings:
split_format = target_format_string.split(":")
if len(split_format) != 2:
typer.echo(
f"Format must be of the form 'target:format_string': {target_format_string}",
err=True,
)
raise typer.Exit(1)

target, format_string = split_format
match parse_named_format(target_format_string):
case Failure(ParseError(_) as error):
typer.echo(f"Failed to parse format:\n{error}", err=True)
raise typer.Exit(1)
case Failure(error):
typer.echo(str(error), err=True)
raise typer.Exit(1)
case Success((target, format)):
pass
case _:
raise NotImplementedError()

if target in parsed_formats:
typer.echo(f"Format for {target} was mentioned multiple times", err=True)
raise typer.Exit(1)

match parse_format(format_string):
case Failure(error):
typer.echo(f"Failed to parse format:\n{error}", err=True)
typer.Exit(1)
case Success(format):
parsed_formats[target] = format
parsed_formats[target] = format

# Fill in missing formats with dense formats
# Use the order of variable_orders to determine the parameter order
formats = {}
for variable_name, order in sugar.variable_orders().items():
if variable_name in parsed_formats:
formats[variable_name] = parsed_formats[variable_name]
else:
formats[variable_name] = Format((Mode.dense,) * order, tuple(range(order)))
# Validate and standardize assignment and formats
match make_problem(parsed_assignment, parsed_formats):
case Failure(error):
typer.echo(str(error), err=True)
raise typer.Exit(1)
case Success(problem):
pass
case _:
raise NotImplementedError()

# Generate code
try:
code = generate_c_code_from_parsed(sugar, formats, kernel_types)
except (DiagonalAccessError, NoKernelFoundError) as error:
typer.echo(error, err=True)
raise typer.Exit(1)
match generate_c_code(problem, kernel_types, tensor_compiler):
case Failure(error):
typer.echo(str(error), err=True)
raise typer.Exit(1)
case Success(code):
pass
case _:
raise NotImplementedError()

if output_path is None:
typer.echo(code)
92 changes: 26 additions & 66 deletions src/tensora/compile.py
@@ -1,5 +1,5 @@
__all__ = [
"taco_kernel",
"generate_library",
"allocate_taco_structure",
"taco_structure_to_cffi",
"take_ownership_of_arrays",
@@ -8,20 +8,18 @@
]

import re
import subprocess
import tempfile
import threading
from enum import Enum, auto
from pathlib import Path
from typing import Any, List, Tuple
from typing import Any
from weakref import WeakKeyDictionary

from cffi import FFI
from returns.result import Failure, Success

from .expression import deparse_to_taco
from .expression.ast import Assignment
from .format import Format
from .native import generate_c_code_from_parsed
from .generate import TensorCompiler, generate_c_code
from .kernel_type import KernelType
from .problem import Problem

lock = threading.Lock()

@@ -97,69 +95,31 @@
tensor_lib = tensor_cdefs.dlopen(None)


def format_to_taco_format(format: Format):
return (
"".join(mode.character for mode in format.modes)
+ ":"
+ ",".join(map(str, format.ordering))
)


class TensorCompiler(Enum):
taco = auto()
tensora = auto()
def generate_library(
problem: Problem, compiler: TensorCompiler = TensorCompiler.tensora
) -> tuple[list[str], Any]:
"""Generate source, compile it, and load it.

def taco_kernel(
expression: Assignment,
formats: dict[str, Format],
compiler: TensorCompiler = TensorCompiler.tensora,
) -> Tuple[List[str], Any]:
"""Call taco with expression and compile resulting function.
Given an expression and a set of formats:
(1) call out to taco to get the source code for the evaluate function that runs that expression for those formats
Given a problem:
(1) invoke the tensor algebra compiler to generate C code for evaluate
(2) parse the signature in the source to determine the order of arguments
(3) compile the source with cffi
(4) return the list of parameter names and the compiled library
Because compilation can take a non-trivial amount of time, the result of this function is cached by a
`functools.lru_cache`, which is configured to store the results of the 256 most recent calls to this function.
Args:
expression: An expression that can parsed by taco.
formats: A frozen set of pairs of strings. It must be a frozen set because `lru_cache` requires that the
arguments be hashable and therefore immutable. The first element of each pair is a variable name; the second
element is the format in taco format (e.g. 'dd:1,0', 'dss:0,1,2'). Scalar variables must not be listed because
taco does not understand them having a format.
problem: A valid tensor algebra expression and associated tensor formats
compiler: The tensor algebra compiler to use to generate the C code
Returns:
A tuple where the first element is the list of variable names in the order they appear in the function
signature, and the second element is the compiled FFILibrary which has a single method `evaluate` which expects
cffi pointers to taco_tensor_t instances in order specified by the list of variable names.
"""
match compiler:
case TensorCompiler.taco:
expression_string = deparse_to_taco(expression)
format_strings = frozenset(
(parameter_name, format_to_taco_format(format))
for parameter_name, format in formats.items()
if format.order != 0 # Taco does not like formats for scalars
)
# Call taco to write the kernels to standard out
result = subprocess.run(
[taco_binary, expression_string, "-print-evaluate", "-print-nocolor"]
+ [f"-f={name}:{format}" for name, format in format_strings],
capture_output=True,
text=True,
)

if result.returncode != 0:
raise RuntimeError(result.stderr)

source = result.stdout
case TensorCompiler.tensora:
source = generate_c_code_from_parsed(expression, formats)
match generate_c_code(problem, [KernelType.evaluate], compiler):
case Failure(error):
raise error
case Success(source):
pass

# Determine signature
# 1) Find function by name and capture its parameter list
@@ -197,7 +157,7 @@ def taco_kernel(


def allocate_taco_structure(
mode_types: Tuple[int, ...], dimensions: Tuple[int, ...], mode_ordering: Tuple[int, ...]
mode_types: tuple[int, ...], dimensions: tuple[int, ...], mode_ordering: tuple[int, ...]
):
"""Allocate all parts of a taco tensor except growable arrays.
@@ -285,12 +245,12 @@ def allocate_taco_structure(


def taco_structure_to_cffi(
indices: List[List[List[int]]],
vals: List[float],
indices: list[list[list[int]]],
vals: list[float],
*,
mode_types: Tuple[int, ...],
dimensions: Tuple[int, ...],
mode_ordering: Tuple[int, ...],
mode_types: tuple[int, ...],
dimensions: tuple[int, ...],
mode_ordering: tuple[int, ...],
):
"""Build a cffi taco tensor from Python data.
@@ -485,5 +445,5 @@ def take_ownership_of_tensor(cffi_tensor) -> None:
take_ownership_of_tensor_members(cffi_tensor)


def weakly_increasing(list: List[int]):
def weakly_increasing(list: list[int]):
return all(x <= y for x, y in zip(list, list[1:]))
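Step (2) of the `generate_library` docstring — recovering the argument order by parsing the generated signature — can be sketched with a regular expression over a C signature (a hypothetical simplification; the source string below is illustrative, not actual generated output):

```python
import re

def parameter_names(source: str) -> list[str]:
    """Extract parameter names, in order, from the evaluate signature."""
    # Capture the parameter list of the function named 'evaluate'.
    signature = re.search(r"int\s+evaluate\s*\(([^)]*)\)", source)
    if signature is None:
        raise ValueError("no evaluate function found")
    # Each parameter looks like 'taco_tensor_t *name'; keep the final
    # identifier and strip the pointer star.
    return [
        parameter.split()[-1].lstrip("*")
        for parameter in signature.group(1).split(",")
    ]

source = "int evaluate(taco_tensor_t *y, taco_tensor_t *A, taco_tensor_t *x) { return 0; }"
names = parameter_names(source)  # ['y', 'A', 'x']
```

The recovered order matters because the caller must pass cffi `taco_tensor_t` pointers in exactly the order the generated function declares them.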
21 changes: 14 additions & 7 deletions src/tensora/desugar/best_algorithm.py
@@ -1,15 +1,22 @@
__all__ = ["best_algorithm"]

from returns.result import Failure, Result, Success

from ..format import Format
from ..iteration_graph.iteration_graph import IterationGraph
from . import ast
from .exceptions import NoKernelFoundError
from .exceptions import DiagonalAccessError, NoKernelFoundError
from .to_iteration_graphs import to_iteration_graphs


def best_algorithm(assignment: ast.Assignment, formats: dict[str, Format]) -> IterationGraph:
match next(to_iteration_graphs(assignment, formats), None):
case None:
raise NoKernelFoundError()
case graph:
return graph
def best_algorithm(
assignment: ast.Assignment, formats: dict[str, Format | None]
) -> Result[IterationGraph, DiagonalAccessError | NoKernelFoundError]:
try:
match next(to_iteration_graphs(assignment, formats), None):
case None:
return Failure(NoKernelFoundError())
case graph:
return Success(graph)
except DiagonalAccessError as e:
return Failure(e)
2 changes: 1 addition & 1 deletion src/tensora/desugar/to_iteration_graphs.py
@@ -249,7 +249,7 @@ def merge_assignment(


def to_iteration_graphs(
assignment: ast.Assignment, formats: dict[str, Format]
assignment: ast.Assignment, formats: dict[str, Format | None]
) -> Iterator[ig.IterationGraph]:
output_format = formats[assignment.target.name]
output_layers = {
2 changes: 1 addition & 1 deletion src/tensora/expression/__init__.py
@@ -1,4 +1,4 @@
from . import ast
from .deparse_to_taco import deparse_to_taco
from .exceptions import InconsistentVariableSizeError, MutatingAssignmentError
from .exceptions import InconsistentDimensionsError, MutatingAssignmentError
from .parser import parse_assignment
4 changes: 2 additions & 2 deletions src/tensora/expression/ast.py
@@ -203,7 +203,7 @@ class Assignment:
expression: Expression

def __post_init__(self):
from .exceptions import InconsistentVariableSizeError, MutatingAssignmentError
from .exceptions import InconsistentDimensionsError, MutatingAssignmentError

target_name = self.target.name

@@ -215,7 +215,7 @@ def __post_init__(self):

for variable in rest:
if first.order != variable.order:
raise InconsistentVariableSizeError(self, first, variable)
raise InconsistentDimensionsError(self, first, variable)

variable_orders[name] = first.order

4 changes: 2 additions & 2 deletions src/tensora/expression/exceptions.py
@@ -1,4 +1,4 @@
__all__ = ["MutatingAssignmentError", "InconsistentVariableSizeError"]
__all__ = ["MutatingAssignmentError", "InconsistentDimensionsError"]

from dataclasses import dataclass

@@ -17,7 +17,7 @@ def __str__(self):


@dataclass(frozen=True, slots=True)
class InconsistentVariableSizeError(Exception):
class InconsistentDimensionsError(Exception):
assignment: Assignment
first: Variable
second: Variable