<a href="https://colab.research.google.com/github/Oceanman15/mojo-playground/blob/main/Modular_Mojo_Colab_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install max --index-url https://dl.modular.com/public/nightly/python/simple/

Looking in indexes: https://dl.modular.com/public/nightly/python/simple/
Collecting max
  Downloading https://dl.modular.com/public/nightly/python/max-25.4.0-py3-none-manylinux_2_34_x86_64.whl (285.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m285.0/285.0 MB[0m [31m4.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: max
Successfully installed max-25.4.0


This import registers the %%mojo cell magic, so Mojo code can be built and run directly from a notebook cell:

In [2]:
import max.support.notebook

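The following is a basic Mojo GPU example, a block-wise sum reduction: each of the 8 blocks loads 4 consecutive input values (one per thread), adds them with warp.sum, and writes the block's total to the output buffer. It should run in any GPU-enabled Colab session, including the T4, L4, and A100 instances.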
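At the core of the kernel below, warp.sum combines one value from each thread of a block. As an illustration only, the following plain-Python simulation models one common GPU technique for such a warp-level sum, a shuffle-down tree reduction. The helper names shuffle_down and warp_tree_sum are hypothetical, and this is an assumed model, not Modular's implementation of warp.sum.

```python
# Illustrative CPU simulation of a shuffle-down tree reduction across one warp.
# This is an ASSUMED model of how a warp-level sum can be computed on a GPU;
# it is not Modular's implementation of warp.sum.

def shuffle_down(lanes: list[float], offset: int) -> list[float]:
    # Lane i receives the value held by lane i + offset; lanes whose source
    # would be past the end keep their own value (as CUDA's __shfl_down_sync does).
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def warp_tree_sum(lanes: list[float]) -> float:
    # Lane count must be a power of two, like a hardware warp (32 on NVIDIA GPUs).
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    values = list(lanes)
    offset = n // 2
    while offset > 0:
        shifted = shuffle_down(values, offset)
        # Every lane adds its partner's value in lockstep: log2(n) steps in total.
        values = [a + b for a, b in zip(values, shifted)]
        offset //= 2
    return values[0]  # lane 0 ends up holding the full sum


# Block 0 of the example: its 4 threads hold 0, 1, 2, 3.
print(warp_tree_sum([0.0, 1.0, 2.0, 3.0]))  # 6.0
```

With 4 lanes the sum takes log2(4) = 2 steps, and in this model only lane 0 is guaranteed to hold the total, which is consistent with the kernel writing the result from the thread where thread_idx.x == 0.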

In [5]:
%%mojo
from gpu import thread_idx, block_idx, warp
from gpu.host import DeviceContext
from layout import Layout, LayoutTensor
from math import iota

# aliases for dtype, blocks and threads per block:
alias dtype = DType.float32
alias threads = 4
alias blocks = 8
alias element_in = blocks * threads

def main():
    var ctx = DeviceContext()

    # initialise input and output buffers
    var in_buffer = ctx.enqueue_create_buffer[dtype](element_in)
    var out_buffer = ctx.enqueue_create_buffer[dtype](blocks)

    # fill the input with 0, 1, ..., element_in - 1 and zero the output
    with in_buffer.map_to_host() as bufferio:
        iota(bufferio.unsafe_ptr(), element_in)

    var _ = out_buffer.enqueue_fill(0)

    # wrap the raw buffers in LayoutTensors
    # input: a row-major (blocks x threads) view, one row per block
    alias layout = Layout.row_major(blocks, threads)
    # name the tensor type with an alias so the kernel signature below can
    # refer to it
    alias InTensor = LayoutTensor[dtype, layout, MutableAnyOrigin]
    var in_tensor = InTensor(in_buffer)

    # output: one element per block
    alias out_layout = Layout.row_major(blocks)
    # same for the output tensor type
    alias OutTensor = LayoutTensor[dtype, out_layout, MutableAnyOrigin]
    var out_tensor = OutTensor(out_buffer)
    # kernel taking the input and output LayoutTensors as arguments.
    # Lesson learnt: the kernel's parameter types must also be declared with
    # an alias, which is why the Mojo example defines the extra InTensor and
    # OutTensor aliases.
    # Each thread loads one element of its block's row, warp.sum adds the
    # values across the block's threads, and thread 0 writes the total.
    fn reduce_sum(in_tensor: InTensor, out_tensor: OutTensor):
        var value = in_tensor.load[1](block_idx.x, thread_idx.x)
        value = warp.sum(value)
        if thread_idx.x == 0:
            out_tensor[block_idx.x] = value


    ctx.enqueue_function[reduce_sum](
        in_tensor,
        out_tensor,
        grid_dim=blocks,
        block_dim=threads,
    )

    with out_buffer.map_to_host() as host_buffer:
        print(host_buffer)



HostBuffer([6.0, 22.0, 38.0, 54.0, 70.0, 86.0, 102.0, 118.0])
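The printed values can be checked on the CPU. A minimal plain-Python sketch, assuming the same sizes as the Mojo cell (8 blocks of 4 threads, input filled by iota); reference_block_sums is a hypothetical helper name, not part of MAX:

```python
# Plain-Python reference for the reduce_sum kernel (CPU only, no GPU needed).
# ASSUMPTION: same sizes as the Mojo cell, blocks = 8 and threads = 4.
BLOCKS = 8
THREADS = 4


def reference_block_sums(blocks: int, threads: int) -> list[float]:
    # Equivalent of iota: the input buffer holds 0, 1, ..., blocks * threads - 1.
    data = [float(i) for i in range(blocks * threads)]
    # Row-major (blocks x threads) view: row b holds the values read by block b.
    rows = [data[b * threads:(b + 1) * threads] for b in range(blocks)]
    # Each block adds its row; the block total corresponds to out_tensor[b].
    return [sum(row) for row in rows]


print(reference_block_sums(BLOCKS, THREADS))
# [6.0, 22.0, 38.0, 54.0, 70.0, 86.0, 102.0, 118.0]
```

Block b reads 4b, 4b + 1, 4b + 2 and 4b + 3, so its total is 16b + 6, which produces the same sequence as the HostBuffer output.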



The next cell computes the Mandelbrot set, again on the GPU in Colab:

In [4]:
%%mojo


UsageError: %%mojo is a cell magic, but the cell body is empty.
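For reference, the Mandelbrot set is usually computed with the escape-time iteration z ← z² + c starting from z = 0: a point c belongs to the set if the orbit stays bounded, and once |z| exceeds 2 it is known to escape. The following CPU-only Python sketch shows that iteration. The grid size, plane bounds and MAX_ITER = 50 are arbitrary choices, and escape_time and mandelbrot_grid are hypothetical names, not the notebook's Mojo code.

```python
# Plain-Python escape-time sketch of the Mandelbrot set (CPU only).
# ASSUMPTION: grid size, plane bounds and MAX_ITER are illustrative choices.
MAX_ITER = 50


def escape_time(c: complex, max_iter: int = MAX_ITER) -> int:
    # Iterate z <- z*z + c from z = 0; count steps until |z| > 2 (the point
    # then escapes) or the iteration cap is reached.
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter  # treated as "in the set" at this iteration cap


def mandelbrot_grid(width: int, height: int,
                    re_min=-2.0, re_max=1.0,
                    im_min=-1.5, im_max=1.5) -> list[list[int]]:
    # Every pixel is computed independently, which is what makes the problem
    # a good fit for a GPU (for example, one thread per pixel).
    grid = []
    for row in range(height):
        im = im_min + (im_max - im_min) * row / (height - 1)
        grid.append([
            escape_time(complex(re_min + (re_max - re_min) * col / (width - 1), im))
            for col in range(width)
        ])
    return grid


# Coarse ASCII rendering: '#' marks points that never escaped.
for line in mandelbrot_grid(60, 21):
    print("".join("#" if n == MAX_ITER else "." for n in line))
```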
