# Tutorial 1: Tensor Basics and Operations

This notebook introduces the `Tensor` type, which is the foundational data structure for all neural network operations in this project.

## What is a Tensor?

A tensor is a multi-dimensional array. Think of it as a generalization of:
- **Scalar** (0D): A single number
- **Vector** (1D): An array of numbers `[1, 2, 3]`
- **Matrix** (2D): A grid of numbers
- **3D+ Tensor**: Higher-dimensional grids

In deep learning, tensors represent:
- **Inputs**: Text, images, audio (as numbers)
- **Weights**: Learned parameters of the model
- **Activations**: Intermediate computations
- **Gradients**: Derivatives of the loss with respect to the weights, which tell the optimizer how to update them

## Setup

First, let's import the local-code-model package:

In [None]:
import (
    "fmt"
    "github.com/scttfrdmn/local-code-model"
)

// Alias for convenience. The package is referred to as `main` in every cell below.
type Tensor = main.Tensor

## Creating Tensors

### 1. From a Slice (Most Common)

In [None]:
// Create a 1D tensor (vector)
vec := main.NewTensor([]int{3}, []float64{1.0, 2.0, 3.0})
fmt.Printf("Vector shape: %v\n", vec.Shape())
fmt.Printf("Vector data: %v\n\n", vec.Data())

// Create a 2D tensor (matrix)
matrix := main.NewTensor([]int{2, 3}, []float64{
    1.0, 2.0, 3.0,  // Row 0
    4.0, 5.0, 6.0,  // Row 1
})
fmt.Printf("Matrix shape: %v\n", matrix.Shape())
fmt.Printf("Matrix data: %v\n", matrix.Data())

### 2. Special Constructors

In [None]:
// All zeros
zeros := main.Zeros(2, 3)
fmt.Printf("Zeros (2x3):\n%v\n\n", zeros.Data())

// All ones
ones := main.Ones(2, 3)
fmt.Printf("Ones (2x3):\n%v\n\n", ones.Data())

// Random values (Xavier initialization)
random := main.RandN(2, 3)
fmt.Printf("Random (2x3):\n%v\n", random.Data())

## Tensor Shape and Indexing

A tensor's shape lists the size of each dimension, and it determines which operations are valid:

In [None]:
// Create a 3D tensor: (batch_size=2, sequence_length=3, embedding_dim=4)
embeddings := main.RandN(2, 3, 4)

fmt.Printf("Shape: %v\n", embeddings.Shape())
fmt.Printf("Dimensions: %d\n", len(embeddings.Shape()))
fmt.Printf("Total elements: %d\n\n", len(embeddings.Data()))

// Accessing elements
val := embeddings.At(0, 1, 2)  // batch 0, position 1, dimension 2
fmt.Printf("Element at [0,1,2]: %.4f\n", val)
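To make the indexing above concrete, here is a minimal sketch of how a multi-dimensional index can map onto a flat data slice. It assumes a row-major layout, which matches the row-by-row order of the matrix data shown earlier, but whether `At` is implemented this way in the package is an assumption. The helpers `strides` and `flatIndex` are hypothetical names, not part of the package.

```go
package main

import "fmt"

// strides returns row-major strides for a shape: the number of flat
// elements to skip when the index along each axis increases by one.
// (Hypothetical helper, not part of the package.)
func strides(shape []int) []int {
	s := make([]int, len(shape))
	step := 1
	for i := len(shape) - 1; i >= 0; i-- {
		s[i] = step
		step *= shape[i]
	}
	return s
}

// flatIndex maps a multi-dimensional index to an offset in the flat slice.
// It panics on a rank mismatch or an out-of-range index.
func flatIndex(shape, idx []int) int {
	if len(shape) != len(idx) {
		panic("flatIndex: index rank does not match shape rank")
	}
	offset := 0
	for i, st := range strides(shape) {
		if idx[i] < 0 || idx[i] >= shape[i] {
			panic("flatIndex: index out of range")
		}
		offset += idx[i] * st
	}
	return offset
}

func main() {
	shape := []int{2, 3, 4} // (batch, sequence, embedding)
	fmt.Println(strides(shape))                   // [12 4 1]
	fmt.Println(flatIndex(shape, []int{0, 1, 2})) // 6
	fmt.Println(flatIndex(shape, []int{1, 2, 3})) // 23
}
```

Under this layout, element `[0,1,2]` of a (2, 3, 4) tensor sits at offset `0*12 + 1*4 + 2*1 = 6` in the flat data.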

## Core Operations

### Matrix Multiplication (MatMul)

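Matrix multiplication is the most important operation in neural networks. The inner dimensions must match: an (m x k) matrix times a (k x n) matrix yields an (m x n) matrix: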

In [None]:
// Matrix A: 2x3
A := main.NewTensor([]int{2, 3}, []float64{
    1, 2, 3,
    4, 5, 6,
})

// Matrix B: 3x2
B := main.NewTensor([]int{3, 2}, []float64{
    1, 2,
    3, 4,
    5, 6,
})

// C = A @ B (result: 2x2)
C := A.MatMul(B)

fmt.Printf("A (%v) @ B (%v) = C (%v)\n", A.Shape(), B.Shape(), C.Shape())
fmt.Printf("C =\n")
for i := 0; i < 2; i++ {
    for j := 0; j < 2; j++ {
        fmt.Printf("  %.0f", C.At(i, j))
    }
    fmt.Println()
}
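For readers who want to see what the multiplication computes, the following is a plain-Go sketch of the textbook triple loop on row-major flat slices. It is not the package's `MatMul`, which may be implemented quite differently (blocked, parallel, or otherwise optimized); `matMul` is a hypothetical helper.

```go
package main

import "fmt"

// matMul multiplies an m x k matrix a by a k x n matrix b, both stored
// as row-major flat slices, and returns the m x n product.
// (Hypothetical helper, not the package's MatMul.)
func matMul(a, b []float64, m, k, n int) []float64 {
	if len(a) != m*k || len(b) != k*n {
		panic("matMul: data length does not match dimensions")
	}
	c := make([]float64, m*n)
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			// c[i][j] is the dot product of row i of a and column j of b.
			var sum float64
			for p := 0; p < k; p++ {
				sum += a[i*k+p] * b[p*n+j]
			}
			c[i*n+j] = sum
		}
	}
	return c
}

func main() {
	a := []float64{1, 2, 3, 4, 5, 6}   // 2x3, same values as A above
	b := []float64{1, 2, 3, 4, 5, 6}   // 3x2, same values as B above
	fmt.Println(matMul(a, b, 2, 3, 2)) // [22 28 49 64]
}
```

The first entry, for example, is `1*1 + 2*3 + 3*5 = 22`.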

### Element-wise Operations

In [None]:
x := main.NewTensor([]int{2, 2}, []float64{1, 2, 3, 4})
y := main.NewTensor([]int{2, 2}, []float64{10, 20, 30, 40})

// Addition
sum := x.Add(y)
fmt.Printf("x + y = %v\n", sum.Data())

// Subtraction
diff := x.Sub(y)
fmt.Printf("x - y = %v\n", diff.Data())

// Element-wise multiplication (Hadamard product)
prod := x.Mul(y)
fmt.Printf("x * y = %v\n", prod.Data())

// Scalar multiplication
scaled := x.Scale(5.0)
fmt.Printf("x * 5 = %v\n", scaled.Data())

### Broadcasting

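Broadcasting lets an element-wise operation combine tensors of different but compatible shapes by reusing the smaller tensor along the missing dimension: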

In [None]:
// Matrix + Vector (broadcasts vector across rows)
matrix := main.NewTensor([]int{2, 3}, []float64{1, 2, 3, 4, 5, 6})
vec := main.NewTensor([]int{3}, []float64{10, 20, 30})

result := matrix.Add(vec)
fmt.Printf("Matrix shape: %v\n", matrix.Shape())
fmt.Printf("Vector shape: %v\n", vec.Shape())
fmt.Printf("Result shape: %v\n", result.Shape())
fmt.Printf("Result: %v\n", result.Data())
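The matrix-plus-vector case can be sketched in plain Go as follows. This assumes NumPy-style broadcasting, where the vector is matched against the last dimension of the matrix and reused for every row; whether the package's `Add` follows exactly these rules is an assumption. `addRowVector` is a hypothetical helper.

```go
package main

import "fmt"

// addRowVector adds a length-cols vector to every row of a rows x cols
// matrix stored as a row-major flat slice.
// (Hypothetical helper illustrating one broadcasting case.)
func addRowVector(mat []float64, rows, cols int, vec []float64) []float64 {
	if len(mat) != rows*cols || len(vec) != cols {
		panic("addRowVector: incompatible shapes")
	}
	out := make([]float64, len(mat))
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			// The same vec[j] is reused for every row i.
			out[i*cols+j] = mat[i*cols+j] + vec[j]
		}
	}
	return out
}

func main() {
	mat := []float64{1, 2, 3, 4, 5, 6} // 2x3
	vec := []float64{10, 20, 30}       // length 3
	fmt.Println(addRowVector(mat, 2, 3, vec)) // [11 22 33 14 25 36]
}
```

The vector is never copied into a full 2x3 matrix; the inner loop simply indexes it by column.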

## Activation Functions

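Activation functions are non-linear transformations. ReLU and GELU apply to each element independently, while softmax normalizes along one dimension. They are what lets a network model non-linear relationships: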

In [None]:
x := main.NewTensor([]int{5}, []float64{-2.0, -1.0, 0.0, 1.0, 2.0})

// ReLU: max(0, x)
relu := x.ReLU()
fmt.Printf("ReLU(%v) = %v\n", x.Data(), relu.Data())

// GELU: Gaussian Error Linear Unit (smoother than ReLU)
gelu := x.GELU()
fmt.Printf("GELU(%v) = %v\n", x.Data(), gelu.Data())

// Softmax: converts to probability distribution
logits := main.NewTensor([]int{3}, []float64{1.0, 2.0, 3.0})
probs := logits.Softmax(0)
fmt.Printf("Softmax(%v) = %v\n", logits.Data(), probs.Data())
fmt.Printf("Sum of probabilities: %.4f\n", probs.Sum())
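The formulas behind these three functions can be sketched in plain Go. The GELU here uses the exact form `0.5 * x * (1 + erf(x / sqrt(2)))`; many implementations use a tanh approximation instead, and which one the package's `GELU` uses is not stated in this notebook. The softmax subtracts the maximum logit before exponentiating, a common trick to avoid overflow. All three function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// relu returns max(0, x).
func relu(x float64) float64 { return math.Max(0, x) }

// gelu returns the exact Gaussian Error Linear Unit, x * Phi(x),
// where Phi is the standard normal CDF.
func gelu(x float64) float64 {
	return 0.5 * x * (1 + math.Erf(x/math.Sqrt2))
}

// softmax converts logits to probabilities that sum to 1. Subtracting
// the maximum first leaves the result unchanged but avoids overflow.
func softmax(logits []float64) []float64 {
	maxV := math.Inf(-1)
	for _, v := range logits {
		if v > maxV {
			maxV = v
		}
	}
	out := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		out[i] = math.Exp(v - maxV)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	fmt.Println(relu(-2), relu(2))                    // 0 2
	fmt.Printf("%.4f %.4f\n", gelu(-1), gelu(1))      // -0.1587 0.8413
	fmt.Printf("%.4f\n", softmax([]float64{1, 2, 3})) // [0.0900 0.2447 0.6652]
}
```

Note that GELU, unlike ReLU, lets small negative inputs through with a small negative output, which is what "smoother" means in practice.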

## Reshaping and Transposing

In [None]:
// Create 1D tensor
flat := main.NewTensor([]int{12}, []float64{
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
})

// Reshape to 2D
mat := flat.Reshape(3, 4)
fmt.Printf("Reshaped to %v:\n", mat.Shape())
for i := 0; i < 3; i++ {
    for j := 0; j < 4; j++ {
        fmt.Printf("%5.0f", mat.At(i, j))
    }
    fmt.Println()
}

// Transpose (swap dimensions)
transposed := mat.Transpose(0, 1)
fmt.Printf("\nTransposed to %v:\n", transposed.Shape())
for i := 0; i < 4; i++ {
    for j := 0; j < 3; j++ {
        fmt.Printf("%5.0f", transposed.At(i, j))
    }
    fmt.Println()
}
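Reshape and transpose are easy to confuse, so here is a small plain-Go sketch of the difference on a row-major flat slice. A reshape keeps the flat data in the same order and only changes how indices are interpreted; a transpose changes which element each index refers to. This sketch materializes the transposed data in a new slice, while a real implementation might only swap strides; `transpose2D` is a hypothetical helper, not the package's `Transpose`.

```go
package main

import "fmt"

// transpose2D returns the transpose of a rows x cols row-major matrix
// as a new cols x rows row-major slice.
// (Hypothetical helper; copies data rather than swapping strides.)
func transpose2D(data []float64, rows, cols int) []float64 {
	if len(data) != rows*cols {
		panic("transpose2D: data length does not match dimensions")
	}
	out := make([]float64, len(data))
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			// Element (i, j) of the input becomes element (j, i) of the output.
			out[j*rows+i] = data[i*cols+j]
		}
	}
	return out
}

func main() {
	flat := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	// Viewed as 3x4, the flat order is unchanged by the reshape.
	// Transposing to 4x3 reorders the elements:
	fmt.Println(transpose2D(flat, 3, 4)) // [1 5 9 2 6 10 3 7 11 4 8 12]
}
```

Reshaping the same data to 4x3 would instead give rows `1 2 3`, `4 5 6`, and so on, which is a different matrix from the transpose.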

## Aggregation Operations

In [None]:
data := main.NewTensor([]int{2, 3}, []float64{
    1, 2, 3,
    4, 5, 6,
})

// Sum all elements
total := data.Sum()
fmt.Printf("Sum: %.0f\n", total)

// Mean
avg := data.Mean()
fmt.Printf("Mean: %.2f\n", avg)

// Variance and standard deviation
variance := data.Variance()
stddev := data.StdDev()
fmt.Printf("Variance: %.4f\n", variance)
fmt.Printf("Std Dev: %.4f\n", stddev)
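Variance comes in two common flavors: population variance divides by `n`, and sample variance divides by `n - 1`. The notebook does not say which one `Variance()` uses, so the sketch below computes both in plain Go for the same data, letting you compare against the output of the cell above. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// variance returns the mean squared deviation from the mean.
// With sample=false it divides by n (population variance);
// with sample=true it divides by n-1 (sample variance).
func variance(xs []float64, sample bool) float64 {
	m := mean(xs)
	var ss float64
	for _, x := range xs {
		ss += (x - m) * (x - m)
	}
	n := float64(len(xs))
	if sample {
		n--
	}
	return ss / n
}

func main() {
	data := []float64{1, 2, 3, 4, 5, 6}
	fmt.Printf("mean: %.2f\n", mean(data))                           // mean: 3.50
	fmt.Printf("population variance: %.4f\n", variance(data, false)) // 2.9167
	fmt.Printf("sample variance: %.4f\n", variance(data, true))      // 3.5000
	fmt.Printf("population std dev: %.4f\n", math.Sqrt(variance(data, false)))
}
```

For this data the sum of squared deviations is 17.5, so the two conventions give 17.5/6 and 17.5/5 respectively.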

## Key Takeaways

1. **Tensors are multi-dimensional arrays** - the foundation of deep learning
2. **Shape matters** - operations require compatible shapes
3. **MatMul is king** - most neural network computation is matrix multiplication
4. **Activations add non-linearity** - without them, networks are just linear functions
5. **Reshaping is common** - data flows through different shapes in a network

## Next Steps

Now that you understand tensors, you're ready for the next notebooks:
- **Notebook 2**: Build the attention mechanism from scratch
- **Notebook 3**: Train a complete transformer model

## Exercise

Try creating a simple linear layer: `y = x @ W + b`

Where:
- `x` is input (batch_size=2, input_dim=3)
- `W` is weights (input_dim=3, output_dim=4)
- `b` is bias (output_dim=4)
- `y` is output (batch_size=2, output_dim=4)

In [None]:
// TODO: Implement linear layer
// x := main.RandN(2, 3)
// W := main.RandN(3, 4)
// b := main.RandN(4)
// y := ???