# Medical Cross-Task Transfer - Kaggle (Simple Method)

**No virtual environment needed!**

This notebook works by force-installing pinned, mutually compatible versions of `datasets` and `pyarrow`.

---

## Cell 1: Clone Repository

In [None]:
!git clone https://github.com/bharathbolla/Crosstalk_Medical_LLM.git
%cd Crosstalk_Medical_LLM
!pwd

## Cell 2: Force Install Compatible Packages

This cell uninstalls the existing `datasets` and `pyarrow` packages, then installs pinned compatible versions without using the pip cache.

In [None]:
# Remove old conflicting packages
!pip uninstall -y datasets pyarrow -q

# Install compatible versions (no cache = fresh install)
print("Installing compatible packages...")
!pip install --no-cache-dir pyarrow==14.0.0 -q
!pip install --no-cache-dir datasets==2.20.0 -q
!pip install --no-cache-dir transformers torch accelerate scikit-learn pyyaml -q

# Verify versions
import datasets, pyarrow, transformers
print("\n✅ Installed:")
print(f"   datasets: {datasets.__version__}")
print(f"   pyarrow: {pyarrow.__version__}")
print(f"   transformers: {transformers.__version__}")
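
The printed versions can also be compared against the pins programmatically. The following is a minimal sketch, assuming the pins from the install commands above; `PINS` and `check_pins` are hypothetical names, not part of the repository. Because pip may replace a pinned package when a later install requires a different version, a check like this can catch silent drift.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands in this cell (assumed to be the intended versions).
PINS = {"pyarrow": "14.0.0", "datasets": "2.20.0"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("All pinned versions match.")
```

If any mismatch is reported, rerunning the install cell or restarting the kernel before importing the packages is a reasonable first step.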

## Cell 3: Verify Datasets Exist

In [None]:
from pathlib import Path

data_path = Path("data/raw")
datasets_list = ["bc2gm", "jnlpba", "chemprot", "ddi", "gad", "hoc", "pubmedqa", "biosses"]

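print("Checking datasets...\n")
# Collect missing datasets so the summary reflects the actual result
missing = [name for name in datasets_list if not (data_path / name).exists()]
for name in datasets_list:
    status = "✗" if name in missing else "✓"
    print(f"{status} {name}")

if missing:
    print(f"\n✗ Missing datasets: {', '.join(missing)}")
else:
    print("\n✅ All datasets are pre-included in the repository!")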

## Cell 4: Test Parsers

In [None]:
!python test_parsers.py
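
In IPython, a `!` shell command does not raise a Python exception when the command exits with a non-zero status, so a failing script will not stop a "Run All". The following is a minimal sketch of a stricter alternative using `subprocess`; the helper name `run_checked` is hypothetical, and it assumes `test_parsers.py` signals failure through its exit code, which the source does not confirm.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command, echo its output, and raise CalledProcessError on failure."""
    result = subprocess.run(args, capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:
        # Show the error output before raising so the cause is visible in the cell
        print(result.stderr, end="", file=sys.stderr)
        result.check_returncode()
    return result
```

Under these assumptions, `run_checked([sys.executable, "test_parsers.py"])` could stand in for the `!python` line above.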

## Cell 5: Quick Verification

Load one dataset to verify everything works.

In [None]:
import sys
sys.path.insert(0, 'src')

from data import BC2GMDataset
from pathlib import Path

print("Loading BC2GM dataset...")
dataset = BC2GMDataset(data_path=Path("data/raw"), split="train")

print(f"\n✅ Loaded {len(dataset)} samples!")
print("\nFirst sample:")
print(f"  Text: {dataset[0].input_text[:100]}...")
print(f"  Labels: {dataset[0].labels[:10]}...")

print("\n🎉 Everything works! Ready to train!")

## Cell 6: Run Baseline Training (Optional)

In [None]:
# Run baseline experiment
!python scripts/run_baseline.py \
    --model bert-base-uncased \
    --task bc2gm \
    --epochs 3 \
    --batch_size 16
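
To repeat the baseline across tasks, the command line can be generated per dataset. This is a sketch under assumptions: it reuses the flags shown above and the dataset names from Cell 3, and it assumes `scripts/run_baseline.py` accepts each of those names as a `--task` value, which the source does not confirm. `TASKS` and `baseline_command` are hypothetical names.

```python
import shlex

# Dataset names taken from Cell 3; whether each is a valid --task value is an assumption.
TASKS = ["bc2gm", "jnlpba", "chemprot", "ddi", "gad", "hoc", "pubmedqa", "biosses"]


def baseline_command(task, model="bert-base-uncased", epochs=3, batch_size=16):
    """Build the argument list for one baseline run, mirroring the flags above."""
    return [
        "python", "scripts/run_baseline.py",
        "--model", model,
        "--task", task,
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
    ]


# Print the commands for review instead of running them
for task in TASKS:
    print(shlex.join(baseline_command(task)))
```

Printing the commands first makes it easy to review them before launching any run.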

## Success! 🎉

If you got here:
- ✅ Compatible packages installed
- ✅ All 8 datasets available
- ✅ All parsers working
- ✅ Ready for experiments!

---

### Next Steps:

Run full experiments:
```python
!python scripts/run_experiment.py strategy=s1_single task=all
```