# Quick Start Guide



## Basics

In [1]:
import mlx.core as mx
a = mx.array([1, 2, 3, 4])
a.shape

(4,)

In [2]:
a.dtype

mlx.core.int32

In [3]:
b = mx.array([1.0, 2.0, 3.0, 4.0])
b.dtype

mlx.core.float32

Operations in MLX are lazy: the outputs of MLX operations are not computed until they are needed. To force an array to be evaluated, use `eval()`. Arrays are also evaluated automatically in a few cases. For example, inspecting a scalar with `array.item()`, printing an array, or converting an array to a `numpy.ndarray` all automatically evaluate the array.

In [7]:
c = a + b  # c is not evaluated yet
mx.eval(c)  # c is evaluated now
print(c)

# or

c = a + b  # c is not evaluated yet
print(c)  # printing evaluates c

# or

import numpy as np

c = a + b  # c is not evaluated yet
np.array(c)  # converting to a NumPy array evaluates c
print(c)  # c is already evaluated at this point

array([2, 4, 6, 8], dtype=float32)
array([2, 4, 6, 8], dtype=float32)
array([2, 4, 6, 8], dtype=float32)


## Function and Graph Transformations

MLX has standard function transformations like `grad()` and `vmap()`. Transformations can be composed arbitrarily. For example, `grad(vmap(grad(fn)))` is allowed.

In [8]:
x = mx.array(0.0)
mx.sin(x)


array(0, dtype=float32)

In [11]:
mx.grad(mx.sin)(x)

array(1, dtype=float32)

In [12]:
mx.grad(mx.grad(mx.sin))(x)

array(-0, dtype=float32)

Here, `mlx.core.grad` returns a function that computes the gradient of `fun`:


`mlx.core.grad(fun: function, argnums: Optional) -> function`


# Lazy Evaluation

## Why Lazy Evaluation?

When we perform operations in MLX, no computation actually happens. Instead a compute graph is recorded. The actual computation only happens if an `eval()` is performed.
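To make the idea of recording a graph concrete, here is a toy sketch in plain Python. It is not how MLX is implemented; the `Lazy` class and its `evaluations` counter are invented purely for illustration. Each arithmetic operation only creates a node that remembers its inputs, and work happens when `eval()` is called.

```python
class Lazy:
    """A node in a toy compute graph: an operation plus its inputs."""

    evaluations = 0  # counts how many operations have actually run

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        # Record the addition; do not compute it.
        return Lazy(lambda x, y: x + y, (self, other))

    def __mul__(self, other):
        # Record the multiplication; do not compute it.
        return Lazy(lambda x, y: x * y, (self, other))

    def eval(self):
        # Compute this node (and whatever it depends on) at most once.
        if self.value is None:
            args = [node.eval() for node in self.inputs]
            self.value = self.op(*args)
            Lazy.evaluations += 1
        return self.value


a = Lazy(value=2.0)
b = Lazy(value=3.0)
c = a * b + a       # two operations recorded, none executed
unused = a * a      # recorded, but never evaluated

print(Lazy.evaluations)  # 0
print(c.eval())          # 8.0
print(Lazy.evaluations)  # 2
```

Note that `unused` is never computed, because nothing ever asks for its value.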

MLX uses lazy evaluation because it has some nice features, some of which are described below.

## Transforming Compute Graphs

Lazy evaluation lets us record a compute graph without actually doing any computations. This is useful for function transformations like `grad()` and `vmap()` and for graph optimizations.

Currently, MLX does not compile and rerun compute graphs. They are all generated dynamically. However, lazy evaluation makes it much easier to integrate compilation for future performance enhancements.

## Only Compute What You Use!

In MLX you do not need to worry much about computing outputs that are never used. Similarly, lazy evaluation can be beneficial for saving memory while keeping the code simple. Say you have a very large model `Model` derived from `mlx.nn.Module`. You can instantiate this model with `model = Model()`. Typically, this will initialize all of the weights as `float32`, but the initialization does not actually compute anything until you perform an `eval()`. If you update the model with `float16` weights before evaluating, your maximum memory consumption will be half of what it would be if the `float32` weights had been computed eagerly.

## When to Evaluate?

A common question is when to use `eval()`. The trade-off is between letting graphs get too large and not batching enough useful work. For example:

In [15]:
for _ in range(100):
    a = a+b
    mx.eval(a)
    b = b*2
    mx.eval(b)

This is a bad idea because there is some fixed overhead with each graph evaluation. On the other hand, there is some slight overhead which grows with the compute graph size, so extremely large graphs (while computationally correct) can be costly.

Luckily, a wide range of compute graph sizes work pretty well with MLX: anything from a few tens of operations to many thousands of operations per evaluation should be okay.

Most numerical computations have an iterative outer loop (e.g. the iteration in stochastic gradient descent). A natural and usually efficient place to use `eval()` is at each iteration of this outer loop.