# Init Colab

Initializes the Colab session: GPU check, dependencies, Hugging Face login, repo checkout, and Google Drive mount.

**Prerequisites:**
- GPU runtime: `Runtime > Change runtime type > T4 GPU`
- Colab secret: `HF_TOKEN` (key icon in the left sidebar)

**After Run All:** Open the other notebooks via `File > Open notebook > GitHub`.

In [None]:
# Cell 1: Check GPU
import torch

print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # total_memory is in bytes; dividing by 1e9 gives decimal GB
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"VRAM: {vram_gb:.1f} GB")
else:
    print("WARNING: No GPU! Runtime > Change runtime type > T4/A100")

!nvidia-smi
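
The VRAM figure printed by this cell divides by `1e9`, so it is in decimal gigabytes, whereas `nvidia-smi` reports memory in MiB. The two numbers therefore do not match exactly. A minimal sketch of the conversion, using a hypothetical helper `describe_vram` that is not part of the notebook or of `torch`:

```python
def describe_vram(total_bytes: int) -> str:
    """Format a byte count in decimal GB and binary GiB (hypothetical helper)."""
    gb = total_bytes / 1e9        # decimal gigabytes, as computed in Cell 1
    gib = total_bytes / 1024**3   # binary gibibytes, closer to what nvidia-smi shows
    return f"{gb:.1f} GB ({gib:.1f} GiB)"


# Illustrative input of exactly 16 GiB; this is not a measured GPU figure
print(describe_vram(16 * 1024**3))  # 17.2 GB (16.0 GiB)
```

In the notebook, the input would be `torch.cuda.get_device_properties(0).total_memory`.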

In [None]:
# Cell 2: Install dependencies
!pip install -q transformers datasets huggingface_hub scikit-learn matplotlib seaborn tqdm pandas

In [None]:
# Cell 3: Hugging Face login
from getpass import getpass

from huggingface_hub import login
from google.colab import userdata

try:
    # Preferred: read the token from Colab Secrets
    hf_token = userdata.get("HF_TOKEN")
    login(token=hf_token)
    print("Hugging Face authenticated via Colab Secrets.")
except Exception:
    # Fallback: prompt for the token (getpass keeps it out of the cell output)
    hf_token = getpass("Enter Hugging Face token: ")
    login(token=hf_token)
    print("Hugging Face authenticated via manual input.")
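
The login cell tries one token source and falls back to a second one. That pattern can be sketched as a small, ordered list of providers. The helper name `first_available_token` and the stand-in providers are hypothetical and only illustrate the fallback order; they are not part of `huggingface_hub` or `google.colab`.

```python
from typing import Callable, Iterable, Optional


def first_available_token(providers: Iterable[Callable[[], Optional[str]]]) -> str:
    """Return the first non-empty token from a sequence of provider callables.

    Hypothetical helper: a provider may raise or return None/"" to signal
    that it has no token, in which case the next provider is tried.
    """
    for provider in providers:
        try:
            token = provider()
        except Exception:
            continue  # this source is unavailable, try the next one
        if token:
            return token
    raise RuntimeError("No Hugging Face token found in any source.")


# Stand-in providers for illustration. In Colab they could be
# lambda: userdata.get("HF_TOKEN") followed by lambda: getpass("Token: ").
def missing_secret() -> str:
    raise KeyError("HF_TOKEN")


token = first_available_token([missing_secret, lambda: "hf_example_token"])
print(token)  # hf_example_token
```

Resolving the token first and calling `login(token=...)` once afterwards keeps a failed login (for example an invalid token) separate from a missing secret, so the fallback prompt only appears when no token was found.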

In [None]:
# Cell 4: Clone / update repo
import os

REPO_URL = "https://github.com/ZorbeyOezcan/news_articles_classification_thesis.git"
REPO_DIR = "/content/news_articles_classification_thesis"
PIPELINE_DIR = f"{REPO_DIR}/Python/classification_pipeline"

if os.path.exists(REPO_DIR):
    print("Repo already exists, updating...")
    !cd {REPO_DIR} && git pull
else:
    print("Cloning repo...")
    !git clone {REPO_URL} {REPO_DIR}

print("\nFiles in classification_pipeline:")
!ls {PIPELINE_DIR}/
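
The `!` shell lines in this cell do not raise a Python exception when `git` fails, so Run All continues with a missing or stale checkout. A minimal sketch of the same clone-or-pull decision using `subprocess`, where the function names are hypothetical. Unlike the cell above, it tests for a `.git` subfolder instead of only checking that the directory exists, which is an assumption about what counts as a valid checkout.

```python
import os
import subprocess
from typing import List


def git_sync_command(repo_url: str, repo_dir: str) -> List[str]:
    """Build the git command: pull if a checkout exists, otherwise clone."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        return ["git", "-C", repo_dir, "pull"]
    return ["git", "clone", repo_url, repo_dir]


def sync_repo(repo_url: str, repo_dir: str) -> None:
    """Run the command; check=True raises CalledProcessError if git fails."""
    subprocess.run(git_sync_command(repo_url, repo_dir), check=True)


# Placeholder URL and path for illustration only; nothing is executed here
print(git_sync_command("https://example.com/repo.git", "/tmp/does_not_exist_repo"))
```

In the notebook this would be called as `sync_repo(REPO_URL, REPO_DIR)`.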

In [None]:
# Cell 5: Mount Google Drive (for persistent reports)
from google.colab import drive

drive.mount("/content/drive")

# Create the reports folder on Drive
DRIVE_REPORTS = "/content/drive/MyDrive/thesis_reports/performance_reports"
os.makedirs(DRIVE_REPORTS, exist_ok=True)
print(f"Reports will be saved to: {DRIVE_REPORTS}")
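
`os.makedirs` also succeeds when Drive is not actually mounted (for example if the mount line was skipped). In that case the folder lands on the VM's local disk and the reports disappear with the session. A minimal sketch of a guard, assuming that `/content/drive` is reported as a mount point by `os.path.ismount` once `drive.mount` has succeeded; `require_mount` is a hypothetical name:

```python
import os


def require_mount(mount_point: str) -> None:
    """Raise if mount_point is not an active mount (hypothetical helper)."""
    if not os.path.ismount(mount_point):
        raise RuntimeError(
            f"{mount_point} is not mounted; reports would be lost with the session."
        )


# In Colab this would be require_mount("/content/drive") before os.makedirs.
# Here the filesystem root stands in as a path that is always a mount point.
require_mount(os.path.abspath(os.sep))
print("Mount check passed.")
```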

In [None]:
# Cell 6: Set the Python path (so that import pipeline_utils works)
import sys

if PIPELINE_DIR not in sys.path:
    sys.path.insert(0, PIPELINE_DIR)

# Test import
import pipeline_utils as pu
print("pipeline_utils loaded.")
print(f"Reports folder: {pu.REPORTS_DIR}")
print("\nInit complete! Now open and run init_data.ipynb.")
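
The membership check before `sys.path.insert` keeps repeated Run All executions from piling up duplicate entries, and inserting at index 0 gives the repo's modules priority over any installed package with the same name. A minimal sketch of that idiom as a reusable function; `prepend_to_sys_path` and the example directory are hypothetical:

```python
import sys


def prepend_to_sys_path(path: str) -> bool:
    """Put path at the front of sys.path unless it is already listed.

    Returns True if the path was added, False if it was already present.
    """
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True


# Placeholder directory for illustration; the notebook uses PIPELINE_DIR
example_dir = "/content/example_pipeline_dir"
first_call = prepend_to_sys_path(example_dir)
second_call = prepend_to_sys_path(example_dir)
print(first_call, second_call)
```

The second call leaves `sys.path` unchanged, so the directory appears exactly once however often the cell is run.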