---
title: '2: Basic Types'
category: '1: Setup'
usage: Some basic types to get you started with Mojo
---

# Basic Types
_This is in very early stages and under heavy development_

## PythonObject
To understand how Mojo can interact with the Python ecosystem, and why Mojo can do certain things so much faster, let's start by running code through the Python interpreter to get a [PythonObject](https://docs.modular.com/mojo/MojoPython/PythonObject.html) back:

In [1]:
x = Python.evaluate('5 + 10')
print(x)

15


`x` is represented in memory the same way as if we ran this in Python:

In [2]:
%%python
x = 5 + 10
print(x)

15


_In the Mojo playground, `%%python` runs the cell straight through the Python interpreter._

`x` is actually a pointer to `heap` allocated memory.

::: tip CS Fundamentals
`stack` and `heap` memory are really important concepts to understand, [this YouTube video](https://www.youtube.com/watch?v=_8-ht2AKyH4) does a fantastic job of explaining it visually. 

If the video doesn't make sense, for now you can use the mental model that:

- `stack` memory is very fast but small, and the size of each value must be known at compile time
- `pointer` is an address used to look up a value somewhere else in memory
- `heap` memory is huge and the size can change at runtime, but the data has to be reached through a pointer, which is slower

We'll be revisiting these concepts a lot, don't worry if it's not clicking yet.
:::

We can access all the Python built-in functions by importing `builtins`:

In [3]:
let py = Python.import_module("builtins")

py.print("using python keywords")

using python keywords


We can now use the `type` builtin from Python to see what the dynamic type of `x` was:

In [4]:
py.print(py.type(x))

<class 'int'>


We can also use the Python `id` builtin to read the address stored on the `stack`, which is what lets us reach the memory on the `heap`:

In [5]:
py.print(py.id(x))

140659155257632


This is pointing to a C object in Python, and Mojo behaves the same way when using a `PythonObject`. Accessing the value uses the address to look up the data on the `heap`, which comes with a performance cost.
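
The idea that a name holds an address rather than the value itself can be observed from plain Python. This is a minimal sketch rather than Mojo code. It assumes CPython, where `id` returns the object's memory address, and the variable names are made up for illustration.

```python
# Two names bound to the same heap object share one address.
first = [5, 10]
second = first            # copies the address, not the list itself

same_address = id(first) == id(second)

# Mutating through one name is visible through the other,
# because both names look up the same object on the heap.
second.append(15)

print(same_address)       # True
print(first)              # [5, 10, 15]
```

Every access through `first` or `second` has to follow the address to the heap first, which is the indirection cost described above.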

This is a simplified representation of how the `C Object` being pointed to would look if it were a Python dict:

In [6]:
%%python
heap = {
    44601345678945: {
        "type": "int",
        "ref_count": 1,
        "size": 1,
        "digit": 15,
        #...
    }
    #...
}

On the stack `x` could be represented like:

In [7]:
%%python
[
    {"frame": "main", "x": 44601345678945 }
]

`x` contains an address that is pointing to the heap object

In Python we can change the type like:

In [8]:
x = "mojo"

`x` now points to a C object with a different representation:

In [9]:
%%python
heap = {
    "a": {
        "type": "string",
        "ref_count": 1,
        "size": 4,
        "ascii": True,
        # utf-8 / ascii for "mojo"
        "value": [109, 111, 106, 111]
        # ...
    }
}

Mojo lets us do this with a `PythonObject` as well, and it works exactly the same way as it would in a Python program.

This allows the program to do some convenient things for us:
- once the `ref_count` goes to zero it will be de-allocated from the heap during garbage collection, so the OS can use that memory for something else
- an integer can grow beyond 64 bits by increasing `size`
- we can dynamically change the `type`
- the data can be large or small, we don't have to think about when we should allocate to the heap

However, this also comes with a penalty: a lot of extra memory is used for the extra fields, and it takes CPU instructions to allocate the data, retrieve it, garbage collect it, and so on.
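
The conveniences and the memory penalty listed above can be measured from Python itself. The following is a rough sketch that assumes CPython. `sys.getsizeof` and `sys.getrefcount` are standard library calls, the exact sizes vary by platform and version, and the variable names are illustrative only.

```python
import sys

# The whole object is measured, not just the numeric value it holds.
small_size = sys.getsizeof(5 + 10)
big_size = sys.getsizeof(2 ** 100)   # does not fit in 64 bits, so more digits are stored

# Each extra name bound to an object raises its reference count.
data = [1, 2, 3]
count_before = sys.getrefcount(data)
alias = data
count_after = sys.getrefcount(data)

# The same name can be rebound to an object of a different type.
x = 5 + 10
type_before = type(x).__name__
x = "mojo"
type_after = type(x).__name__

print(small_size, big_size)          # both larger than the 8 bytes a 64-bit value needs
print(count_after - count_before)    # 1
print(type_before, type_after)       # int str
```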

In Mojo we can remove all that overhead:

## Mojo 🔥

In [10]:
x = 5 + 10
print(x)

15


We've just unlocked our first Mojo optimization! Instead of looking up an object on the heap via an address, `x` is now just a value on the stack with 64 bits that can be passed through registers.

This has numerous performance implications:

- All the expensive allocation, garbage collection, and indirection is no longer required
- The compiler can do huge optimizations when it knows what the numeric type is
- The value can be passed straight into registers for mathematical operations
- There is no overhead associated with compiling to bytecode and running through an interpreter
- The data can now be packed into a vector for huge performance gains

That last one is very important in today's world. Let's see how Mojo gives us the power to take advantage of modern hardware.

## SIMD

SIMD stands for `Single Instruction, Multiple Data`. Modern hardware contains special registers that allow you to perform the same operation on multiple values in a single instruction, greatly improving performance. Let's take a look:

In [11]:
from DType import DType

y = SIMD[DType.uint8, 4](1, 2, 3, 4)
print(y)

[1, 2, 3, 4]


In the definition, `[DType.uint8, 4]` are known as parameters, which means they must be known at compile time, while `(1, 2, 3, 4)` are the arguments, which can be known at compile time or at runtime.

This is now a vector of four 8-bit numbers packed into 32 bits. We can perform a single instruction across all of it instead of 4 separate instructions:

In [12]:
y *= 10
print(y)

[10, 20, 30, 40]
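
To make clear what the single instruction computes, here is a scalar model in Python that performs the four multiplications one after another. It is only a sketch: `multiply_lanes` is a hypothetical helper, and masking each result to 8 bits is an assumption about how a `uint8` lane behaves on overflow, which this guide does not state.

```python
def multiply_lanes(lanes, factor, bits=8):
    """Multiply every lane by factor, keeping only the low `bits` bits of each result."""
    mask = (1 << bits) - 1
    return [(value * factor) & mask for value in lanes]


y = multiply_lanes([1, 2, 3, 4], 10)
print(y)  # [10, 20, 30, 40]
```

The loop inside the list comprehension runs once per lane, whereas SIMD hardware applies the multiplication to all lanes at once.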


::: tip CS Fundamentals
Binary is how your computer stores data, with each bit representing a `0` or `1`. Memory is typically byte-addressable, meaning that each unique memory address points to one byte, which consists of 8 bits.

This is how the numbers 1 to 4 are represented in hardware as `uint8` values:

- 1 = `00000001`
- 2 = `00000010`
- 3 = `00000011`
- 4 = `00000100`

In RAM, binary `1` and `0` represent charged or uncharged capacitors, indicating ON or OFF states.

[Check this video](https://www.youtube.com/watch?v=RrJXLdv1i74) if you want more information on binary.
:::

We're packing the data together with SIMD on the stack so it can be passed into a SIMD register like this:

`00000001` `00000010` `00000011` `00000100`
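
The packing can be reproduced with plain Python integers. This sketch only illustrates the bit layout. The big-endian byte order is chosen so the printed bits match the line above, and the real order of lanes inside a register is a hardware detail not covered here.

```python
values = [1, 2, 3, 4]

# Each value rendered as one 8-bit lane.
lanes = " ".join(f"{v:08b}" for v in values)

# The same four bytes viewed as a single 32-bit number.
packed = int.from_bytes(bytes(values), "big")

print(lanes)              # 00000001 00000010 00000011 00000100
print(f"{packed:032b}")   # 00000001000000100000001100000100
print(hex(packed))        # 0x1020304
```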

The SIMD registers in modern CPUs are huge. Let's see how big our SIMD register is in the playground:

In [13]:
from TargetInfo import simd_bit_width
print(simd_bit_width())

512


That means we could pack 64 8-bit numbers together and perform a calculation on all of them with a single instruction.
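
The arithmetic behind that number is a single division. As a small sketch in Python, with `lanes_per_register` being a hypothetical helper name and 512 taken from the playground output above:

```python
def lanes_per_register(register_bits, element_bits):
    """How many elements of a given width fit into one SIMD register."""
    return register_bits // element_bits


simd_bits = 512  # the width reported by the playground
for element_bits in (8, 16, 32, 64):
    print(element_bits, lanes_per_register(simd_bits, element_bits))
```

Wider element types leave room for fewer lanes, so the smallest type that can hold your data gives the most work per instruction.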

## Scalars

Scalar just means a single value. You'll notice that in Mojo all the numerics are SIMD scalars:

In [14]:
var x = UInt8(1)
x = "will cause an error"

error: Expression [14]:20:9: cannot implicitly convert 'StringLiteral' value to 'SIMD[ui8, 1]' in assignment
    x = "will cause an error"
        ^~~~~~~~~~~~~~~~~~~~~


`UInt8` is just an alias for `SIMD[DType.uint8, 1]`. You can see all the [numeric SIMD types imported by default here](https://docs.modular.com/mojo/MojoStdlib/SIMD.html).

Also notice that trying to change the type throws an error. This is because Mojo is `strongly typed`: once `x` is declared as a `UInt8`, it cannot later hold a string.

If we use existing Python modules, we get back a `PythonObject` that behaves in the same `loosely typed` way as it does in Python:

In [15]:
np = Python.import_module("numpy")

arr = np.ndarray([5])
print(arr)
arr = "this will work fine"
print(arr)

[0.   0.25 0.5  0.75 1.  ]
this will work fine


## Strings
In Mojo the heap allocated `String` type isn't imported by default:

In [16]:
from String import String

s = String("Mojo🔥")
print(s)

Mojo🔥


`String` is actually a pointer to `heap` allocated data. This means we can load a huge amount of data into it, and change the size of the data dynamically at runtime.

Let's cause a type error so you can see the data type underlying the String:

In [17]:
x = s.buffer
x = 20

error: Expression [17]:22:10: cannot implicitly convert 'DynamicVector[SIMD[si8, 1]]' value to 'PythonObject' in assignment
    x = s.buffer
        ~^~~~~~~


`DynamicVector` is similar to a Python list. Here it's storing `int8` values that represent the characters. Let's print the first character:

In [18]:
print(s[0])

M


Now let's take a look at the decimal representation:

In [19]:
from String import ord

print(ord(s[0]))

77


That's the ASCII code [shown in this table](https://www.ascii-code.com/).

We can build our own string this way by pushing 78, which is `N`, and 79, which is `O`:
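
The same mapping between characters and ASCII codes can be checked in Python with the `ord` and `chr` builtins. This is a short sketch with illustrative variable names, not Mojo code:

```python
code = ord("M")                            # character to ASCII code
letters = (chr(78), chr(79))               # ASCII codes back to characters
built = bytes([78, 79]).decode("ascii")    # a string built from raw codes

print(code)       # 77
print(letters)    # ('N', 'O')
print(built)      # NO
```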

In [20]:
from Vector import DynamicVector

var vec = DynamicVector[Int8](2)

vec.push_back(78)
vec.push_back(79)

We can use a `StringRef` to get a pointer to the same location in memory:

In [21]:
from Pointer import DTypePointer
from DType import DType

let vec_str_ref = StringRef(DTypePointer[DType.int8](vec.data).address, vec.size)
print(vec_str_ref)

NO


Because it points to the same location in `heap` memory, changing the original vector will also change the value retrieved by the reference:

In [22]:
vec[1] = 78
print(vec_str_ref)

NN


We can create a `deep copy` of the data as a `String`, which allocates its own memory on the heap:

In [23]:
from String import String
let vec_str = String(vec_str_ref)

print(vec_str)

NN


Now that we've made a copy of the data in a new location in `heap` memory, we can modify the original and it won't affect our copy:

In [24]:
vec[0] = 65
vec[1] = 65
print(vec_str)

NN
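
The difference between a reference and a deep copy can be mimicked in Python with `bytearray`, `memoryview` and `bytes`. This is an analogy under stated assumptions, not how Mojo implements `StringRef` or `String`: the `memoryview` plays the role of the reference, and `bytes(...)` plays the role of the deep copy.

```python
vec = bytearray([78, 79])        # mutable buffer holding "NO"
view = memoryview(vec)           # shares the buffer, like a reference

vec[1] = 78                      # change the original buffer
seen_by_view = bytes(view).decode("ascii")

copy = bytes(view)               # independent copy in new memory

vec[0] = 65                      # overwrite the original with "AA"
vec[1] = 65

print(seen_by_view)                  # NN
print(copy.decode("ascii"))          # NN
print(bytes(view).decode("ascii"))   # AA
```

The view follows every change to the original buffer, while the copy keeps the value it had when it was made.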


The other string type is `StringLiteral`. It's written directly into the binary, and when the program starts it's loaded into `read-only` memory, which means it's constant and lives for the duration of the program:

In [25]:
let lit = "This is my StringLiteral"
print(lit)

This is my StringLiteral


We can also take a `StringRef` to the literal, which points to the same data rather than making a heap allocated deep copy:

In [26]:
var lit_ref = StringRef(lit)
print(lit_ref)

## Tips

One thing to be aware of is that each of these emoji takes four bytes in UTF-8, so we need a slice of 4 bytes for it to print correctly:

In [27]:
emoji = String("🔥😀")
print("fire:", emoji[0:4])
print("smiley:", emoji[4:8])

fire: 🔥
smiley: 😀
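
The byte counts can be confirmed in Python by encoding the text to UTF-8 explicitly. This is a brief sketch, not Mojo code. Note that Python's `str` indexes by code point, so the slicing is done on the encoded bytes to mirror the byte slices above.

```python
text = "\U0001F525\U0001F600"          # the fire and smiley emoji
encoded = text.encode("utf-8")

fire = encoded[0:4].decode("utf-8")    # first four bytes
smiley = encoded[4:8].decode("utf-8")  # next four bytes

# UTF-8 is variable width: different characters need 1 to 4 bytes.
widths = [len(ch.encode("utf-8")) for ch in "M\u00e9\u20ac\U0001F525"]

print(len(text), len(encoded))   # 2 8
print(fire, smiley)
print(widths)                    # [1, 2, 3, 4]
```

Slicing in the middle of a multi-byte character would produce bytes that cannot be decoded, which is why the slice boundaries matter.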


Check out [Maxim Zaks' blog post](https://mzaks.medium.com/counting-chars-with-simd-in-mojo-140ee730bd4d) for more details.

You can also initialize SIMD with a single argument:

In [28]:
z = SIMD[DType.uint8, 4](1)
print(z)

[1, 1, 1, 1]


Or do it in a loop:

In [29]:
for i in range(3):
    print(SIMD[DType.uint16, 4](i))

[0, 0, 0, 0]
[1, 1, 1, 1]
[2, 2, 2, 2]


## Exercises
1. Create a SIMD of `DType.uint8` that is 16 elements (16 bytes) wide with each value set to 2, then multiply it by 8 and print it
2. Create a loop using SIMD that prints four rows of data that looks like this:
    [1,0,0,0]
    [0,1,0,0]
    [0,0,1,0]
    [0,0,0,1]

## Solutions
### Exercise 1

In [30]:
print(SIMD[DType.uint8, 16](2) * 8)

[16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]


### Exercise 2

In [31]:
for i in range(4):
    simd = SIMD[DType.uint8, 4](0)
    simd[i] = 1
    print(simd)

[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
