# Homework 3: Building an NDArray library

In this homework, you will build a simple backing library for the processing that underlies most deep learning systems: the n-dimensional array (a.k.a. the NDArray).  Up until now, you have largely been using numpy for this purpose, but this homework will walk you through developing what amounts to your own (albeit much more limited) variant of numpy, which will support both CPU and GPU backends.  What's more, unlike numpy (and even variants like PyTorch), you won't simply call out to existing highly-optimized implementations of matrix multiplication or other manipulation code, but will actually write your own versions that are reasonably competitive with the highly optimized code backing these standard libraries (by some measure, i.e., "only 2-3x slower" ... which is a whole lot better than naive code that can easily be 100x slower).  This class will ultimately be integrated into `needle`, but for this assignment you need to focus _only_ on the ndarray module, as this will be the only subject of the tests.

**Note**: To avoid exhausting limited GPU resources in Colab, start by using CPU runtime for coding and testing non-GPU functions. Switch to GPU runtime when testing CUDA or GPU-accelerated code. This approach ensures efficient GPU usage and prevents running out of resources during critical tasks.


In [1]:
# Code to set up the assignment
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/
!mkdir -p 10714
%cd /content/drive/MyDrive/10714
%cd /content/drive/MyDrive/10714/DL-Systems-Project

!pip3 install --upgrade --no-deps git+https://github.com/dlsys10714/mugrade.git
!pip3 install pybind11

Mounted at /content/drive
/content/drive/MyDrive
/content/drive/MyDrive/10714
/content/drive/MyDrive/10714/DL-Systems-Project
Collecting git+https://github.com/dlsys10714/mugrade.git
  Cloning https://github.com/dlsys10714/mugrade.git to /tmp/pip-req-build-hkw5pm_y
  Running command git clone --filter=blob:none --quiet https://github.com/dlsys10714/mugrade.git /tmp/pip-req-build-hkw5pm_y
  Resolved https://github.com/dlsys10714/mugrade.git to commit ac73f725eb2ce0e2c6a38fa540035ee970b8b873
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: mugrade
  Building wheel for mugrade (setup.py) ... done
  Created wheel for mugrade: filename=mugrade-1.3-py3-none-any.whl size=3708 sha256=a33cf60b74708d4bf626a403a5af050b7d7e702cb8762620cb13c530a348c2f0
  Stored in directory: /tmp/pip-ephem-wheel-cache-lezxcoyk/wheels/df/c7/14/2b747145fc762900af3ff05bd0c9192c506e70db3ef3890239
Successfully built mugrade
Installing collected packages: mugrade
Su

In [2]:
!ls

build		Makefile	   python     src
CMakeLists.txt	proj_andrew.ipynb  README.md  tests


In [3]:
!make

  Compatibility with CMake < 3.10 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python: /usr/local/bin/python (found version "3.12.12") found components: Development Interpreter Development.Module Development.Embed
-- Performing Test HAS_FLTO_AUTO
-- Performing Te

The `make` command reads the Makefile in the current directory. The Makefile contains rules that define how to build targets (such as executables or libraries). For each target, `make` compares the timestamp of the target file with the timestamps of its dependencies (such as .c, .cpp, or .h files). If the target is missing, or any dependency is newer than the target, `make` rebuilds the target; otherwise it leaves the target alone.
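
The timestamp check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how `make` is implemented; the helper `needs_rebuild` and the file names `example.cc` and `example.so` are hypothetical and do not come from this project's Makefile.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.cc")  # hypothetical source file
    lib = os.path.join(tmp, "example.so")  # hypothetical build output

    # Case 1: the target does not exist yet, so it must be built.
    open(src, "w").close()
    missing_target = needs_rebuild(lib, [src])

    # Case 2: the target is newer than its dependency, so nothing to do.
    open(lib, "w").close()
    os.utime(src, (100, 100))  # set (atime, mtime) explicitly
    os.utime(lib, (200, 200))
    up_to_date = not needs_rebuild(lib, [src])

    # Case 3: the dependency was edited after the last build.
    os.utime(src, (300, 300))
    stale = needs_rebuild(lib, [src])
```

All three flags end up `True`. In practice this is why re-running `!make` after editing only one source file is usually much faster than the first build.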

In [4]:
%set_env PYTHONPATH ./python
%set_env NEEDLE_BACKEND nd

env: PYTHONPATH=./python
env: NEEDLE_BACKEND=nd


In [5]:
import sys
sys.path.append('./python')

## Int8 quantization checks

These cells validate the new int8 quantization path (post-training).
They run quickly on CPU/GPU in Colab and ensure the quantized linear layer
tracks the float32 reference within a small tolerance.
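
To make concrete what such checks look for, here is a minimal NumPy sketch of symmetric post-training int8 quantization. It is an assumption-laden illustration, not the project's implementation: the helpers `quantize_int8` and `dequantize`, the symmetric scaling scheme, and the array shapes are all hypothetical choices and may differ from what `tests/hw3/test_quantization.py` exercises.

```python
import numpy as np


def quantize_int8(x, axis=None):
    """Symmetric int8 quantization (hypothetical helper).

    axis=None uses one scale for the whole tensor; otherwise one scale is
    computed per slice along the non-negative `axis` (per-channel).
    """
    x = np.asarray(x, dtype=np.float32)
    if axis is None:
        max_abs = np.max(np.abs(x))
    else:
        reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
        max_abs = np.max(np.abs(x), axis=reduce_axes, keepdims=True)
    # Guard against an all-zero input, then map max |x| to 127.
    scale = np.asarray(np.maximum(max_abs, 1e-8) / 127.0, dtype=np.float32)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)  # activations
w = rng.standard_normal((8, 16)).astype(np.float32)  # weights, (out, in)

# Round trip: the error per element is at most about half a quantization step.
q_x, s_x = quantize_int8(x)
round_trip_error = np.max(np.abs(x - dequantize(q_x, s_x)))

# Per-channel: one scale per output row of the weight matrix.
q_w, s_w = quantize_int8(w, axis=0)

# Linear layer: compare float32 weights against dequantized int8 weights.
y_ref = x @ w.T
y_quant = x @ dequantize(q_w, s_w).T
linear_error = np.max(np.abs(y_ref - y_quant))
```

Here `q_w` keeps the weight shape `(8, 16)` while `s_w` has shape `(8, 1)`, so the scales broadcast across each row. Per-channel scales are a common choice for weights because a single large row would otherwise force a coarse step size on every other row.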


In [7]:
!python3 -m pytest -v tests/hw3/test_quantization.py

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/DL-Systems-Project
plugins: typeguard-4.4.4, langsmith-0.4.42, anyio-4.11.0
collected 3 items

tests/hw3/test_quantization.py::test_quantize_round_trip PASSED          [ 33%]
tests/hw3/test_quantization.py::test_per_channel_quantization_shapes PASSED [ 66%]
tests/hw3/test_quantization.py::test_linear_quantized_matches_float PASSED [100%]

