# Ultimate Migration Guide: MATLAB → Python (NumPy/SciPy) — For Scientific Computing

**Audience:** MATLAB-heavy scientists/engineers.  
**Goal:** Show that *nothing in MATLAB is irreplaceable* for scientific computing.  
**Scope:** Core language + NumPy, SciPy, Matplotlib, pandas, scikit-image; no Simulink/toolboxes required.  
**Tip:** Run cells top-to-bottom. Keep this open as a *cheat sheet* and reference.

## 0. TL;DR: Big Picture Mapping

- **MATLAB** is a proprietary array language. **Python** is a general language; use **NumPy/SciPy/Matplotlib/pandas** for MATLAB-like work.
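- **Indexing:** MATLAB is **1-based** and ranges **include** both endpoints (`a(1:3)` is three elements); Python is **0-based** and slices **exclude** the stop index (`a[0:3]` is the same three elements). MATLAB's `end` corresponds to index `-1` in Python.
- **Arrays:** MATLAB uses `double` matrices by default. Python uses the **NumPy `ndarray`**; floating-point data defaults to `float64`, but integer inputs such as `np.arange(5)` produce integer arrays, so set `dtype` explicitly when it matters.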
- **Linear algebra:** MATLAB's `A\b`, `eig`, `svd` → **`numpy.linalg` / `scipy.linalg`**; sparse via **`scipy.sparse`**.
- **Plotting:** MATLAB `plot`, `imshow` → **Matplotlib** (`pyplot.plot`, `pyplot.imshow`); use **scikit-image** for image processing.
- **Tables:** MATLAB `table` → **pandas `DataFrame`**.
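- **Toolboxes:** Many common toolboxes (optimization, signal processing, image processing, statistics and machine learning) have counterparts in SciPy, scikit-image, and scikit-learn.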
- **Scripts & functions:** MATLAB `.m` → **.py modules**, **Jupyter notebooks**, or **packages**.

## 1. Install & Environment (Conda or pip)

Use **Conda (recommended for workshops)** or `pip`. Either way, you can get a full scientific stack quickly.

### Option A — Conda (recommended)
```bash
# Fresh environment
conda create -n sci python=3.12 numpy scipy matplotlib pandas scikit-image jupyter numba cython
conda activate sci
jupyter lab  # or: jupyter notebook
```

### Option B — pip (inside a virtualenv)
```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -U pip wheel
pip install numpy scipy matplotlib pandas scikit-image jupyter numba cython
jupyter lab
```

> **Tip:** If you rely on Fortran-accelerated routines (BLAS/LAPACK), conda-forge builds are painless.
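
Once the environment is active, a quick check that the stack is actually installed can save confusion later. The sketch below is one possible way to do it using only the standard library; the helper name `installed_versions` is made up for this guide, and the package list simply mirrors the install commands above.

```python
from importlib import metadata

# Distribution names as used by pip/conda (note: scikit-image, not skimage).
packages = ["numpy", "scipy", "matplotlib", "pandas", "scikit-image", "numba"]

def installed_versions(names):
    """Return {name: version string, or None if the package is missing}."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

report = installed_versions(packages)
for name, version in report.items():
    print(f"{name:14s} {version or 'NOT INSTALLED'}")
```

To see which BLAS/LAPACK build NumPy is linked against, `numpy.show_config()` prints that information.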

## 2. Language Fundamentals: Indexing, Slicing, Shapes
**MATLAB vs Python (NumPy) essentials**.

In [None]:
import numpy as np

a = np.arange(1, 11)  # 1..10
print("a:", a)
print("shape:", a.shape)  # 1D vector

A = np.arange(1, 13).reshape(3, 4, order="C")  # 3x4 row-major by default
print("\nA:\n", A)
print("A shape:", A.shape)

# Indexing differences
print("\nIndexing examples:")
print("Python a[0] (MATLAB a(1)) ->", a[0])
print("Python a[0:3] (MATLAB a(1:3)) ->", a[0:3])
print("Python a[::2] (MATLAB a(1:2:end)) ->", a[::2])

# Rows/cols (remember 0-based)
print("\nRow 0, all cols  (MATLAB A(1,:)) ->", A[0, :])
print("All rows, col 1    (MATLAB A(:,2)) ->", A[:, 1])

# Boolean masking
mask = A % 2 == 0
print("\nEven mask:\n", mask)
print("A[mask] (MATLAB A(mod(A,2)==0); NumPy returns row-major order) ->", A[mask])

# Transpose & conjugate: A.T is transpose; A.conj().T is MATLAB A'
B = (A + 1j*A).conj().T
print("\nConjugate transpose like MATLAB A' -> shape:", B.shape)
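
MATLAB code that indexes with `end` is a frequent source of off-by-one errors when ported. The following sketch lists a few common patterns next to a plausible Python translation; the array `a` is the same 1..10 vector used above.

```python
import numpy as np

a = np.arange(1, 11)          # 1..10, like MATLAB a = 1:10

last = a[-1]                  # MATLAB a(end)
last_three = a[-3:]           # MATLAB a(end-2:end)
all_but_last = a[:-1]         # MATLAB a(1:end-1)
reversed_a = a[::-1]          # MATLAB a(end:-1:1)
middle = a[2:5]               # MATLAB a(3:5): the start shifts by one, the stop does not

print(last, last_three, all_but_last, reversed_a, middle)
```

A useful rule of thumb: subtract 1 from the MATLAB start index and leave the stop index alone.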

### Broadcasting vs. Implicit Expansion
MATLAB has supported implicit expansion since R2016b; NumPy's equivalent is **broadcasting**, which aligns shapes from the trailing dimension and stretches any axis of length 1.

In [None]:
x = np.arange(4).reshape(4,1)   # 4x1
y = np.arange(3).reshape(1,3)   # 1x3
S = x + y                       # 4x3 via broadcasting
print(S)
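
A typical use of broadcasting is centering or normalizing data, which in older MATLAB code was written with `bsxfun` or `repmat`. This is a small sketch with made-up data; the variable names are illustrative only.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])              # 3 samples x 2 features

col_mean = X.mean(axis=0)                # shape (2,), one mean per column
Xc = X - col_mean                        # (3,2) - (2,) broadcasts over rows

row_sum = X.sum(axis=1, keepdims=True)   # shape (3,1); keepdims keeps the axis
Xn = X / row_sum                         # (3,2) / (3,1) broadcasts over columns

print(Xc)
print(Xn.sum(axis=1))                    # each row now sums to 1
```

`keepdims=True` matters here: without it `row_sum` would have shape `(3,)`, which cannot be broadcast against the `(3, 2)` array along the rows.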

## 3. Elementwise vs Matrix Operations

- MATLAB `.*`, `./`, `.^` → on NumPy arrays the plain operators `*`, `/`, `**` are already elementwise.
- Matrix multiply: MATLAB `*` → Python **`@`** or `np.matmul`.

In [None]:
A = np.arange(1, 7).reshape(2, 3)
B = np.arange(1, 7).reshape(3, 2)

print("Elementwise square (MATLAB A.^2):\n", A**2)
print("\nMatrix product (MATLAB A*B):\n", A @ B)
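
Because the operator meanings are swapped relative to MATLAB, it can help to see all four cases side by side. The sketch below uses a small 2x2 matrix; `np.linalg.matrix_power` is the NumPy counterpart of MATLAB's `A^k` for integer `k`.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])                    # 1-D: neither a row nor a column vector

elementwise = A * A                         # MATLAB A.*A
matrix_prod = A @ A                         # MATLAB A*A
elem_power = A ** 2                         # MATLAB A.^2 (same as A * A)
mat_power = np.linalg.matrix_power(A, 2)    # MATLAB A^2  (same as A @ A)

Av = A @ v                                  # matrix-vector product, result is 1-D
inner = v @ v                               # inner product of two 1-D arrays, a scalar
```

Note that 1-D arrays have no row or column orientation, so `A @ v` works without a transpose.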

## 4. Shapes & Sizes (Common Queries)

In [None]:
X = np.random.default_rng(0).normal(size=(5, 3, 2))

print("shape (MATLAB size(X)):", X.shape)
print("ndim  (MATLAB ndims):", X.ndim)
print("size along axis 0 (MATLAB size(X,1)):", X.shape[0])
print("num elements (MATLAB numel):", X.size)
print("flatten (MATLAB X(:) is column-major):", X.ravel(order="F"))
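
Flattening and reshaping are where the column-major (MATLAB) versus row-major (NumPy default) difference becomes visible. A minimal sketch on a 2x3 matrix:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

c_flat = M.ravel()             # row-major: 1 2 3 4 5 6
f_flat = M.ravel(order="F")    # column-major, like MATLAB M(:): 1 4 2 5 3 6

# reshape follows the same rule: MATLAB reshape(1:6, 2, 3) fills columns first.
c_shape = np.arange(1, 7).reshape(2, 3)              # [[1 2 3], [4 5 6]]
f_shape = np.arange(1, 7).reshape(2, 3, order="F")   # [[1 3 5], [2 4 6]]
```

When porting code that relies on `A(:)` or `reshape`, passing `order="F"` is usually the most direct way to reproduce MATLAB's element order.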

## 5. Linear Algebra (Dense): `numpy.linalg` / `scipy.linalg`
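- Solve `Ax=b`: MATLAB `x = A\b` → **`np.linalg.solve(A, b)`** (square, nonsingular `A`)
- Least squares: MATLAB `A\b` for tall A → **`np.linalg.lstsq(A, b, rcond=None)[0]`** (`lstsq` returns a tuple; the solution is its first element)
- `eig`, `svd`, `qr`, `chol` → **`np.linalg.eig`, `svd`, `qr`, `cholesky`** (or their `scipy.linalg` counterparts). Note that `np.linalg.cholesky` returns the lower-triangular factor, while MATLAB's `chol` returns the upper one.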

In [None]:
import numpy as np
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

x = np.linalg.solve(A, b)
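w, V = np.linalg.eig(A)       # eigenvalues first, then eigenvectors (MATLAB: [V, D] = eig(A))
U, s, VT = np.linalg.svd(A)   # s is a 1-D array; VT is V transposed (MATLAB: [U, S, V] = svd(A))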

print("solve Ax=b:", x)
print("\neigvals:", w)
print("svd singular values:", s)
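
For overdetermined systems, where MATLAB's backslash silently switches to least squares, NumPy asks you to call `lstsq` explicitly. The sketch below fits a straight line to four made-up points that lie exactly on `y = 1 + 2t`, so the recovered coefficients are easy to check.

```python
import numpy as np

# Fit y = c0 + c1*t to four points (overdetermined: 4 equations, 2 unknowns).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])           # exactly y = 1 + 2t
A = np.column_stack([np.ones_like(t), t])    # design matrix, shape (4, 2)

coef, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print("coefficients:", coef)
print("rank:", rank)
```

`lstsq` returns four values; only the first is the solution vector that `A\b` would give in MATLAB.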

## 6. Sparse Matrices: `scipy.sparse`

In [None]:
import numpy as np
from scipy import sparse

row = np.array([0, 1, 2, 2])
col = np.array([0, 1, 0, 2])
data = np.array([10.0, 20.0, 30.0, 40.0])

S = sparse.coo_matrix((data, (row, col)), shape=(3, 3)).tocsr()
print(S)
print("S @ [1,2,3] ->", (S @ np.array([1.0, 2.0, 3.0])).tolist())
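
Sparse matrices are most useful when you solve with them. As a sketch, assuming a small tridiagonal system chosen purely for illustration, `scipy.sparse.linalg.spsolve` plays the role of MATLAB's backslash on a sparse matrix:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 5
# Tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals.
T = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

x = spsolve(T, b)                  # sparse direct solve, like MATLAB T\b for sparse T
print("x:", x)
print("residual norm:", np.linalg.norm(T @ x - b))
```

CSC (or CSR) format is used here because direct solvers work on compressed formats; COO is convenient for assembly but is converted before solving.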

## 7. Plotting: Matplotlib (familiar enough)
- `plot`, `xlabel`, `ylabel`, `title`, `legend` match MATLAB concepts.
- Multiple `plot()` calls draw on the same axes.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 200)
y1 = np.sin(x)
y2 = np.cos(x)

plt.figure()
plt.plot(x, y1, label="sin")
plt.plot(x, y2, label="cos")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Sine & Cosine (Matplotlib)")
plt.legend()
plt.show()

### Images

In [None]:
import numpy as np
import matplotlib.pyplot as plt

img = np.linspace(0, 1, 256*256, dtype=np.float64).reshape(256,256)
plt.figure()
plt.imshow(img, cmap="gray")
plt.title("Grayscale gradient (imshow)")
plt.axis("off")
plt.show()
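
The examples above use the MATLAB-like `pyplot` state machine. For multi-panel figures, Matplotlib's object-oriented interface tends to be clearer. The sketch below is one way to write a `subplot(1,2,i)` layout; the `Agg` backend line is an assumption made so the code also runs without a display, and you would normally omit it in Jupyter.

```python
import io

import matplotlib
matplotlib.use("Agg")              # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2*np.pi, 200)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))   # like subplot(1,2,1) and subplot(1,2,2)
axes[0].plot(x, np.sin(x), label="sin")
axes[0].set_xlabel("x")
axes[0].set_title("Line plot")
axes[0].legend()

axes[1].imshow(np.outer(np.sin(x), np.cos(x)), cmap="gray")
axes[1].set_title("Image")
axes[1].axis("off")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)   # a file path works the same way
```

Each `Axes` object carries its own labels and title, so there is no need to track which subplot is "current".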

## 8. Tables & Metadata: pandas DataFrame = MATLAB `table`

In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "id": np.arange(5),
    "value": np.random.default_rng(0).normal(size=5),
    "label": list("abcde"),
})
display(df.head())

print("\nDescribe:")
print(df.describe(numeric_only=True))

print("\nFilter (MATLAB df(df.value>0,:))")
print(df[df["value"] > 0])
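
Beyond filtering, the table operations most often needed when porting are adding a column, sorting rows, and summarizing by group. A short sketch on made-up data (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

df["double"] = 2 * df["value"]                        # new column, like T.double = 2*T.value
by_value = df.sort_values("value", ascending=False)   # like sortrows(T, 'value', 'descend')
group_mean = df.groupby("group")["value"].mean()      # per-group mean

print(by_value)
print(group_mean)
```

`groupby` returns a result indexed by the group labels, so `group_mean["a"]` looks up the mean for group `a` directly.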

## 9. File I/O: MAT, CSV, HDF5, NetCDF

In [None]:
from scipy.io import savemat, loadmat
import numpy as np
import tempfile, os

# MAT files (written to a temporary directory, which is cleaned up automatically)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.mat")
    savemat(path, {"A": np.arange(6).reshape(2,3), "note": "hello"})
    mat = loadmat(path)
print("Loaded from .mat:", [k for k in mat.keys() if not k.startswith('__')])

# CSV
import pandas as pd, io
csv_buf = io.StringIO()
pd.DataFrame({"x":[1,2,3], "y":[10,20,30]}).to_csv(csv_buf, index=False)
csv_buf.seek(0)
print("\nCSV preview:")
print(csv_buf.getvalue())

# HDF5 via pandas (to_hdf) or h5py; NetCDF via netCDF4/xarray (not shown here).
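
One surprise when reading MAT files is that `loadmat` returns everything as at least 2-D, mirroring MATLAB, so a vector comes back as `1 x n` and a scalar as `1 x 1`. The sketch below shows the `squeeze_me=True` option that drops those singleton dimensions; the file name is hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "shapes.mat")       # hypothetical file name
    savemat(path, {"v": np.arange(3.0), "s": 3.5})

    raw = loadmat(path)                             # MATLAB-style 2-D shapes
    squeezed = loadmat(path, squeeze_me=True)       # singleton dimensions dropped

print("default:  v", raw["v"].shape, " s", raw["s"].shape)
print("squeezed: v", squeezed["v"].shape, " s =", float(squeezed["s"]))
```

Also note that `loadmat` does not read MATLAB v7.3 files, which are HDF5-based; those need an HDF5 reader such as h5py.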

## 10. Random Numbers

In [None]:
import numpy as np

rng = np.random.default_rng(42)  # Generator (recommended)
print("normal(0,1, size=3):", rng.normal(size=3))
print("integers 0..9:", rng.integers(0, 10, size=5))
print("choice:", rng.choice([10, 20, 30], size=4, replace=True))
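
Seeding works much like MATLAB's `rng(seed)`, except that the state lives in a generator object rather than globally. The sketch below checks that two generators with the same seed agree, and shows one way to derive independent streams for parallel workers via `SeedSequence.spawn`.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
# Same seed, same stream: the two draws are identical.
same = np.array_equal(rng_a.normal(size=5), rng_b.normal(size=5))

# Independent child streams for parallel workers, derived from one root seed.
children = np.random.SeedSequence(42).spawn(2)
rng_1, rng_2 = (np.random.default_rng(s) for s in children)
different = not np.array_equal(rng_1.random(5), rng_2.random(5))

print("same seed reproduces:", same)
print("child streams differ:", different)
```

The numbers will not match MATLAB's output for the same seed, since the two use different default generators.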

## 11. Optimization, Roots, Least-Squares: `scipy.optimize`

In [None]:
import numpy as np
from scipy import optimize

# Nonlinear root: solve f(x)=0 on [1, 2]; the bracket must enclose a sign change (f(1) > 0 > f(2))
f = lambda x: x*np.cos(x) - 0.25
root = optimize.root_scalar(f, bracket=[1, 2])
print("root ~", root.root)

# Nonlinear least squares
def model(p, x):
    a, b = p
    return a*np.exp(b*x)

x = np.linspace(0,1,20)
y = 2.0*np.exp(1.5*x) + 0.05*np.random.default_rng(0).normal(size=len(x))

def residuals(p):
    return model(p, x) - y

res = optimize.least_squares(residuals, x0=[1.0, 1.0])
print("fit params:", res.x)
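
For plain curve fitting, `scipy.optimize.curve_fit` is a convenient wrapper that also returns a covariance estimate. The sketch below refits the same exponential model with it; note that `curve_fit` expects the model as `f(x, *params)`, so the argument order differs from the `model(p, x)` function above.

```python
import numpy as np
from scipy import optimize

def exp_model(x, a, b):
    """Model y = a * exp(b * x); curve_fit expects x first, then parameters."""
    return a * np.exp(b * x)

x = np.linspace(0, 1, 20)
noise = 0.05 * np.random.default_rng(0).normal(size=x.size)
y = exp_model(x, 2.0, 1.5) + noise

popt, pcov = optimize.curve_fit(exp_model, x, y, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))        # one-sigma parameter uncertainties
print("a, b =", popt)
print("std errors =", perr)
```

The fitted parameters should land close to the true values `a = 2.0` and `b = 1.5` used to generate the data.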

## 12. ODEs/PDE Building Blocks: `scipy.integrate`

In [None]:
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # dy/dt = -y, y(0)=1
    return -y

sol = solve_ivp(rhs, (0, 5), y0=[1.0], dense_output=True)
print("y(5) ~", sol.sol(5.0)[0])
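
Higher-order ODEs are handled the same way as with `ode45`: rewrite them as a first-order system. As a sketch, the harmonic oscillator `x'' = -x` with `x(0)=1`, `x'(0)=0` has the exact solution `x(t) = cos(t)`, which makes the result easy to verify. The tolerances are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def oscillator(t, state):
    """x'' = -x written as a first-order system: state = [x, v]."""
    x, v = state
    return [v, -x]

t_eval = np.linspace(0, np.pi, 50)
sol = solve_ivp(oscillator, (0, np.pi), y0=[1.0, 0.0],
                method="RK45", t_eval=t_eval, rtol=1e-8, atol=1e-10)

x_end, v_end = sol.y[:, -1]          # sol.y has shape (n_states, n_times)
print("x(pi) ~", x_end, " (exact: -1)")
```

The default tolerances of `solve_ivp` are fairly loose (`rtol=1e-3`), so tightening them is worth considering whenever accuracy matters.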

## 13. Signal & Image Processing
- **Signal:** `scipy.signal`
- **FFT:** `numpy.fft` / `scipy.fft`
- **Image:** `scikit-image`

In [None]:
import numpy as np
from scipy import fft

fs = 200.0
t = np.arange(0, 1, 1/fs)
sig = np.sin(2*np.pi*20*t) + 0.5*np.sin(2*np.pi*40*t)

f = fft.rfftfreq(len(t), 1/fs)
S = np.abs(fft.rfft(sig))

import matplotlib.pyplot as plt
plt.figure()
plt.plot(f, S)
plt.xlabel("Frequency [Hz]")
plt.ylabel("|FFT|")
plt.title("Simple FFT")
plt.show()
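
To illustrate `scipy.signal`, the sketch below low-pass filters the same two-tone signal so that the 40 Hz component is suppressed while the 20 Hz component passes. The filter order and the 30 Hz cutoff are illustrative values, not recommendations.

```python
import numpy as np
from scipy import fft, signal

fs = 200.0
t = np.arange(0, 1, 1/fs)
sig = np.sin(2*np.pi*20*t) + 0.5*np.sin(2*np.pi*40*t)

# 4th-order Butterworth low-pass with a 30 Hz cutoff (illustrative values).
b, a = signal.butter(4, 30.0, btype="low", fs=fs)
filtered = signal.filtfilt(b, a, sig)     # zero-phase: filters forward and backward

freqs = fft.rfftfreq(len(t), 1/fs)
before = np.abs(fft.rfft(sig))
after = np.abs(fft.rfft(filtered))
i20 = np.argmin(np.abs(freqs - 20))
i40 = np.argmin(np.abs(freqs - 40))
print("40 Hz peak kept:", after[i40] / before[i40])
print("20 Hz peak kept:", after[i20] / before[i20])
```

Passing `fs` to `butter` lets you give the cutoff in Hz instead of as a fraction of the Nyquist frequency, which is what MATLAB's `butter` expects.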

## 14. Performance: Vectorize, Numba, Cython
- Prefer **vectorized NumPy** ops.
- Use **Numba** `@njit` for fast loops when vectorization is awkward.
- For the absolute edge, **Cython** / custom C/C++.

In [None]:
import numpy as np

# Vectorized: fast
def pairwise_distances_vectorized(X):
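    # Euclidean distances between rows of X, computed from the Gram matrix:
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    G = X @ X.T
    diag = np.diag(G)
    D2 = diag[:,None] + diag[None,:] - 2*G
    D2[D2 < 0] = 0.0  # clip tiny negatives caused by round-off
    return np.sqrt(D2, dtype=X.dtype)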

X = np.random.default_rng(0).normal(size=(300, 8))
D = pairwise_distances_vectorized(X)
print("D shape:", D.shape)
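
Whenever you replace a loop with a vectorized formula, it is worth checking the result against a trusted reference. The sketch below compares a Gram-matrix distance function with SciPy's `cdist`; the function name `pairwise_gram` is made up for this comparison.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pairwise_gram(X):
    """Euclidean distances between rows via ||x||^2 + ||y||^2 - 2 x.y."""
    G = X @ X.T
    sq_norms = np.diag(G)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * G
    return np.sqrt(np.maximum(D2, 0.0))     # clip round-off negatives before sqrt

X = np.random.default_rng(0).normal(size=(300, 8))
D_fast = pairwise_gram(X)
D_ref = cdist(X, X)                          # reference implementation in SciPy

max_err = np.max(np.abs(D_fast - D_ref))
print("max abs difference:", max_err)
```

The Gram-matrix trick can lose precision when two rows are nearly identical, because it subtracts large, almost equal numbers; in that situation `cdist` is the safer choice.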

In [None]:
# Optional: Numba acceleration demo (works if numba installed)
try:
    import numba as nb
    import numpy as np

    @nb.njit
    def roll_sum(x, w):
        n = x.shape[0]
        out = np.empty(n - w + 1, dtype=x.dtype)
        s = 0.0
        for i in range(n):
            s += x[i]
            if i >= w:
                s -= x[i-w]
            if i >= w-1:
                out[i-w+1] = s
        return out

    x = np.random.default_rng(0).random(10_000)
    out = roll_sum(x, 32)
    print("Numba roll_sum:", out.shape)
except Exception as e:
    print("Numba not available or JIT failed:", e)

## 15. Common MATLAB → Python Cheat Sheet
**Use this table when porting.**

In [None]:
import pandas as pd

mapping = [
    ("size(A)", "A.shape"),
    ("length(A)", "max(A.shape)  (len(A) for 1-D arrays)"),
    ("numel(A)", "A.size"),
    ("ndims(A)", "A.ndim"),
    ("A(:)", "A.ravel(order='F')  (column-major, as in MATLAB)"),
    ("A.' / A'", "A.T / A.conj().T"),
    ("A*B (matrix)", "A @ B"),
    ("A.*B (elemwise)", "A * B"),
    ("A./B (elemwise)", "A / B"),
    ("A.^k", "A ** k"),
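    ("A\\b", "np.linalg.solve(A,b) (square A) or np.linalg.lstsq(A,b, rcond=None)[0] (least squares)"),
    ("inv(A)", "np.linalg.inv(A) (rarely needed)"),
    ("pinv(A)", "np.linalg.pinv(A)"),
    ("eig(A)", "np.linalg.eig(A)  (returns eigenvalues, then eigenvectors)"),
    ("svd(A)", "np.linalg.svd(A)  (returns U, s, V^H)"),
    ("qr(A)", "np.linalg.qr(A)"),
    ("chol(A)", "scipy.linalg.cholesky(A) (upper, like MATLAB); np.linalg.cholesky(A) returns lower"),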
    ("eye(n)", "np.eye(n)"),
    ("zeros(n,m)", "np.zeros((n,m))"),
    ("ones(n,m)", "np.ones((n,m))"),
    ("rand(n,m)", "np.random.default_rng().random((n,m))"),
    ("randn(n,m)", "np.random.default_rng().normal(size=(n,m))"),
    ("linspace(a,b,n)", "np.linspace(a,b,n)"),
    ("logspace(a,b,n)", "np.logspace(a,b,n)"),
    ("meshgrid(x,y)", "np.meshgrid(x,y,indexing='xy')"),
    ("ndgrid(x,y)", "np.meshgrid(x,y,indexing='ij')"),
    ("repmat(A,m,n)", "np.tile(A, (m,n))"),
    ("bsxfun(@op,A,B)", "broadcasting: op(A,B)"),
    ("find(idx)", "np.flatnonzero(idx) (0-based, row-major linear indices) or np.nonzero(idx) (per-axis indices)"),
    ("sum(A,dim)", "A.sum(axis=dim-1)  # MATLAB dim=1 (column sums) → axis=0"),
    ("mean(A,dim)", "A.mean(axis=dim-1)"),
    ("std(A,0,dim)", "A.std(axis=dim-1, ddof=1)"),
    ("diag(v) / diag(A)", "np.diag(v) / np.diag(A)"),
    ("cat(dim, ...)", "np.concatenate([...], axis=dim-1)"),
    ("cell array", "Python list of arrays/objects"),
    ("struct", "Python dict or simple class"),
    ("function handle @f", "callable `f` (def or lambda)"),
    ("table", "pandas DataFrame"),
    ("save('x.mat','A')", "scipy.io.savemat('x.mat', {'A':A})"),
    ("load('x.mat')", "scipy.io.loadmat('x.mat')"),
    ("plot(x,y)", "matplotlib.pyplot.plot(x,y)"),
    ("imshow(I)", "matplotlib.pyplot.imshow(I)"),
    ("subplot(m,n,i)", "matplotlib.pyplot.subplot(m,n,i) / `plt.subplots`"),
    ("hold on", "multiple `plot()` on same axes"),
    ("axis equal", "plt.axis('equal')"),
    ("xlim/ylim", "plt.xlim / plt.ylim"),
]
df_map = pd.DataFrame(mapping, columns=["MATLAB", "Python/NumPy/SciPy"])

# Display in the notebook using pandas' rich repr
df_map
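
Two rows of the table deserve a worked example because they combine the 0-based and row-major differences: `find` and the `dim` to `axis` conversion. The sketch below uses a small made-up matrix.

```python
import numpy as np

A = np.array([[0, 7, 0],
              [5, 0, 9]])

rows, cols = np.nonzero(A > 0)        # per-axis, 0-based indices
linear_c = np.flatnonzero(A > 0)      # linear indices in row-major order
# MATLAB-style linear indices: column-major order, then shift to 1-based.
linear_matlab = np.flatnonzero((A > 0).ravel(order="F")) + 1

col_sums = A.sum(axis=0)              # MATLAB sum(A, 1)
row_sums = A.sum(axis=1)              # MATLAB sum(A, 2)

print(rows, cols, linear_c, linear_matlab)
print(col_sums, row_sums)
```

Here `linear_matlab` reproduces what `find(A > 0)` would return in MATLAB for the same matrix, which is handy when comparing intermediate results during a port.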

## 16. Scripts, Modules, Packages
- MATLAB scripts/functions in `.m` → **modules** (`.py`) and **packages** (folders with `__init__.py`).
- Search path: instead of `addpath`, organize code as an installable package and run `pip install -e .` (an editable install) so it can be imported from anywhere.
- Docstrings (`"""Doc"""`) replace MATLAB help blocks.
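
As a sketch of what a ported function file might look like, here is a minimal module. The module name `signal_utils` and the function `rms` are hypothetical, chosen only for illustration.

```python
"""signal_utils: a hypothetical module, saved as signal_utils.py."""
import numpy as np


def rms(x):
    """Return the root-mean-square value of an array.

    The docstring plays the role of the comment block that MATLAB's
    `help` would print; in Python, use `help(rms)` or `rms?` in Jupyter.
    """
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


if __name__ == "__main__":
    # Runs only when executed as a script (python signal_utils.py),
    # not when imported with `from signal_utils import rms`.
    print(rms([3.0, 4.0]))
```

Unlike MATLAB, one `.py` file can hold any number of public functions, so related helpers usually live together in one module.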

## 17. Gotchas & Differences Checklist
- **Index base:** MATLAB 1-based vs Python 0-based.
- **Slice end:** MATLAB `1:3` includes 3; Python `0:3` excludes 3.
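- **Column-major vs Row-major:** MATLAB stores arrays in Fortran (column-major) order; NumPy defaults to C (row-major) order. This changes the element order produced by `reshape` and by flattening, so pass `order='F'` when you need to reproduce MATLAB's ordering.
- **Views:** NumPy basic slicing returns **views**, so writing to a slice also changes the original array; call `.copy()` when you need MATLAB-like value semantics.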
- **`A'` vs `A.'`:** In NumPy, use `A.conj().T` vs `A.T`.
- **Broadcasting:** Embrace it; drop `bsxfun`/`repmat` gymnastics.
- **NaN/Inf:** `np.nan`, `np.isfinite`, `np.nanmean`, etc.
- **Random:** Prefer `np.random.default_rng(seed)`.
- **Performance:** Vectorize → Numba/Cython if needed.
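
Two items from the checklist above, views and NaN handling, tend to surprise MATLAB users the most. The sketch below demonstrates both on small made-up arrays.

```python
import numpy as np

a = np.arange(6)            # [0 1 2 3 4 5]
view = a[1:4]               # basic slice: shares memory with a
view[0] = 99                # also changes a[1]

b = np.arange(6)
copy = b[1:4].copy()        # explicit copy: independent of b
copy[0] = 99                # b is untouched

fancy = b[[1, 2, 3]]        # integer-array ("fancy") indexing always copies
shares = (np.shares_memory(a, view), np.shares_memory(b, fancy))

data = np.array([1.0, np.nan, 3.0])
plain_mean = np.mean(data)          # nan, like MATLAB mean(data)
nan_mean = np.nanmean(data)         # ignores NaN, like MATLAB mean(data, 'omitnan')

print(a, b, shares, plain_mean, nan_mean)
```

If you are unsure whether two arrays alias each other, `np.shares_memory` is a quick way to find out.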

## 18. Final Word
If you can do it in MATLAB, you can very likely do it in Python with NumPy/SciPy/Matplotlib/pandas & friends, typically at **lower cost** and often just as fast.
Keep this notebook as your living companion during migration.