# Course Environment Setup (LING-ENV)

You’ll create a consistent Python environment so every notebook runs the same on everyone’s machine.  

## Conda path

### 1) Install
- **Windows & Intel Mac:** Install Miniconda or Anaconda.
- After installing, open a new Terminal or PowerShell window. If `conda` isn’t found, run `conda init` for your shell (e.g., `conda init zsh` on macOS or `conda init powershell` on Windows) and restart the terminal.
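
If it is unclear whether the new terminal can see `conda`, a rough check from any Python interpreter already on the machine is sketched below. The helper name `conda_status` is hypothetical, and the result only reflects the `PATH` seen by that Python process, which usually (but not always) mirrors what the shell sees.

```python
import shutil


def conda_status(which=shutil.which):
    """Return a one-line status describing whether `conda` is on PATH.

    `which` is injectable so the lookup can be exercised without conda installed.
    """
    path = which("conda")
    if path is None:
        return "conda not found on PATH: run `conda init` and restart the terminal"
    return f"conda found at {path}"


print(conda_status())
```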

### 2) Create and activate the environment
```bash
conda env create -f environment.yml
conda activate ling-env
```

### 3) Download language resources
```bash
python -m spacy download en_core_web_sm
python - <<'PY'
import nltk; nltk.download('punkt'); nltk.download('stopwords')
PY
```
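
The `<<'PY'` heredoc above is bash/zsh syntax and will not run as written in PowerShell. As a shell-independent alternative, the same two downloads can be driven from Python. This is a minimal sketch: the names `download_all` and `main` are hypothetical, and it assumes `nltk.download` returns a truthy value on success.

```python
def download_all(names, download):
    """Call download(name) for each name; return the names that did not succeed."""
    failed = []
    for name in names:
        try:
            ok = download(name)
        except Exception:
            # Treat any error (network, permissions) as a failed download
            ok = False
        if not ok:
            failed.append(name)
    return failed


def main():
    import nltk  # imported lazily so download_all can be used without NLTK

    failed = download_all(["punkt", "stopwords"], nltk.download)
    if failed:
        raise SystemExit(f"Could not download: {', '.join(failed)}")
    print("NLTK resources downloaded.")
```

Calling `main()` from a notebook cell, or at the end of a script run with `python` inside the activated environment, performs the downloads. Recent NLTK releases may also look for a `punkt_tab` resource when tokenizing; if a notebook raises a `LookupError` naming it, adding that name to the list should fetch it the same way.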

### 4) Register the Jupyter kernel
```bash
python -m ipykernel install --user --name ling-env --display-name "Python (LING-ENV)"
jupyter lab
```

In each notebook, pick Kernel → Select Kernel → Python (LING-ENV).
![Change kernel](img/change_kernal_1.png)
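
If the kernel appears in the list but notebooks still pick up the wrong packages, it can help to confirm which interpreter the registered kernel launches. `python -m ipykernel install` writes a `kernel.json` whose first `argv` entry is the interpreter path, and `jupyter kernelspec list` prints the directory that holds it. The sketch below reads that file; the helper names are hypothetical, and the directory is passed in rather than guessed because its location differs by platform.

```python
import json
from pathlib import Path, PurePosixPath


def kernel_interpreter(kernel_dir):
    """Return the interpreter path (argv[0]) recorded in <kernel_dir>/kernel.json."""
    text = (Path(kernel_dir) / "kernel.json").read_text(encoding="utf-8")
    return json.loads(text)["argv"][0]


def points_into_env(interpreter, env_name="ling-env"):
    """True if some path component of the interpreter equals env_name."""
    # Normalize Windows separators so the same check works on every platform
    parts = PurePosixPath(interpreter.replace("\\", "/")).parts
    return env_name in parts
```

For example, `points_into_env(kernel_interpreter(d))`, where `d` is the directory reported by `jupyter kernelspec list` for `ling-env`, should be true when the kernel was registered from inside the activated environment.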


### 5) Quick pre-flight cell
Run the following cell in a notebook that uses the Python (LING-ENV) kernel.

```python
import sys, os, importlib

# Report which interpreter and environment this kernel is running in
env = os.environ.get("CONDA_DEFAULT_ENV") or os.path.basename(sys.prefix)
print("Python:", sys.version.split()[0])
print("Env:", env)
print("Kernel:", sys.executable)

# Import each required package by name so a missing one gives a clear message
for name in ("numpy","pandas","sklearn","matplotlib","seaborn",
             "nltk","spacy","wordcloud","tqdm","regex","ipykernel","jupyterlab"):
    try:
        m = importlib.import_module(name)
        print(f"{name:12s}", getattr(m, "__version__", "?"))
    except Exception as e:
        raise SystemExit(f"❌ Missing or broken package '{name}': {e}")

# Both packages were verified by the loop above, so direct imports are safe here
import spacy, nltk

try:
    spacy.load("en_core_web_sm")
    print("spaCy model: en_core_web_sm ✅")
except Exception as e:
    raise SystemExit(f"❌ Missing spaCy model: {e}")

for pkg, where in [("punkt","tokenizers/punkt"), ("stopwords","corpora/stopwords")]:
    try:
        nltk.data.find(where)
        print(f"NLTK data: {pkg} ✅")
    except LookupError:
        raise SystemExit(f"❌ Missing NLTK data '{pkg}'. See setup instructions.")
print("✅ Environment check passed.")
```


Example output:

```text
Python: 3.11.13
Env: ling-env
Kernel: /opt/anaconda3/envs/ling-env/bin/python
numpy        1.26.4
pandas       2.3.3
sklearn      1.7.2
matplotlib   3.10.6
seaborn      0.13.2
nltk         3.9.2
spacy        3.8.7
wordcloud    1.9.4
tqdm         4.67.1
regex        2.5.162
ipykernel    6.30.1
jupyterlab   4.4.9
spaCy model: en_core_web_sm ✅
NLTK data: punkt ✅
NLTK data: stopwords ✅
✅ Environment check passed.
```


```python
# --- Compatibility check: spaCy ↔ NumPy ---
import importlib.metadata as im

import numpy as np
import spacy
from packaging.requirements import Requirement
from packaging.specifiers import SpecifierSet
from packaging.version import Version

np_ver = Version(np.__version__)
spacy_ver = Version(spacy.__version__)

# 1) Read spaCy's declared NumPy dependency from its installed package
#    metadata (the Requires-Dist entries).
#    Note: this takes the first numpy entry and does not evaluate
#    environment markers such as python_version.
numpy_spec = None
try:
    reqs = im.requires("spacy") or []
    for r in reqs:
        req = Requirement(r)
        if req.name.lower() == "numpy" and req.specifier:
            numpy_spec = str(req.specifier)  # e.g. '>=1.19.0,<3.0.0'
            break
except Exception:
    pass

print("\n[Compatibility]")
print("spaCy:", spacy_ver, "| NumPy:", np_ver)
if numpy_spec:
    print("spaCy requires NumPy", numpy_spec)
    # Check whether the installed NumPy satisfies that constraint
    ok = np_ver in SpecifierSet(numpy_spec)
    print("Version rule satisfied:", "✅" if ok else "❌")
else:
    print("Could not read spaCy's declared NumPy requirement (metadata missing).")

# 2) Runtime smoke test (NumPy <-> spaCy round trip)
try:
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("NumPy and spaCy should cooperate nicely.")
    # Vectorize the token lengths into a NumPy array, then apply a simple transform
    lens = np.array([len(t.text) for t in doc], dtype=np.int32)
    assert lens.ndim == 1 and lens.size == len(doc)
    # Build a dummy "feature matrix" and do some simple arithmetic
    # (a sanity check that the two libraries cooperate on types/ABI)
    feats = np.vstack([lens, lens**2]).T.astype(np.float64)
    _ = feats @ np.array([[0.1],[0.01]])  # linear combination
    print("Runtime smoke test:", "✅ passed")
except Exception as e:
    print("Runtime smoke test: ❌ failed ->", type(e).__name__, e)
```



Example output:

```text
[Compatibility]
spaCy: 3.8.7 | NumPy: 1.26.4
spaCy requires NumPy >=1.15.0
Version rule satisfied: ✅
Runtime smoke test: ✅ passed
```
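
The compatibility cell reports only the first `numpy` entry it finds. A package can declare several entries for the same dependency, each guarded by a different environment marker, so listing all of them can make the "requires NumPy" line easier to interpret. The sketch below uses only the standard library; the helper names are hypothetical, and the name matching is deliberately simple (it does not normalize `-`, `_`, and `.` in project names).

```python
import re
from importlib import metadata


def requirement_lines(requires, package):
    """Return every Requires-Dist string whose project name equals `package`."""
    wanted = package.lower()
    matches = []
    for line in requires or []:
        # The project name is the leading run of letters, digits, '.', '-', '_'
        m = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if m and m.group(1).lower() == wanted:
            matches.append(line)
    return matches


def declared_requirements(dist, package):
    """Look up `package` in the installed metadata of `dist` (empty list if absent)."""
    try:
        return requirement_lines(metadata.requires(dist), package)
    except metadata.PackageNotFoundError:
        return []
```

In the course environment, `declared_requirements("spacy", "numpy")` returns the raw entries, markers included, so each one can be compared against the Python version printed by the pre-flight cell.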
