# setup.ipynb

This notebook initializes the project environment for the Small Data NER project.

It performs the following steps:
1. Mounts Google Drive.
2. Creates the project folder structure if missing:
   - raw/: original E3C data
   - conll/: train/dev/test and few-shot splits
   - utils/: helper scripts (conll_io.py, metrics.py)
   - results/: experiment outputs
   - notebooks for preprocessing, model training, and evaluation (kept in the project root)
3. Installs required Python packages.
4. Prints the project tree so you can confirm that all files and paths are accessible.

After running setup.ipynb once, all team members can open other notebooks directly (e.g., preprocessing.ipynb, prompting.ipynb).
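
Step 4 can also be turned into an explicit check rather than a visual one. The sketch below assumes that "accessible" means each expected subfolder exists and is writable; `verify_structure` is a hypothetical helper and is not part of this notebook.

```python
import os


def verify_structure(base_path, subfolders):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: the checks (directory exists, is writable) are
    an assumption about what "accessible" means for this project.
    """
    problems = []
    for sub in subfolders:
        path = os.path.join(base_path, sub)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"not writable: {path}")
    return problems
```

After Drive is mounted, calling `verify_structure("/content/drive/MyDrive/small_data_NER_project", ["raw", "conll", "utils", "results"])` and printing the returned list would show any folder that still needs attention.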

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content
!rm -rf .git

/content


In [None]:
%cd /content/drive/MyDrive/small_data_NER_project
!printf "%s\n" \
"__pycache__/" \
"*.ipynb_checkpoints/" \
"*.gdoc" "*.gsheet" "*.tmp" \
"wandb/" \
"*.bin" "*.pt" "*.safetensors" \
".DS_Store" \
".config/" "sample_data/" > .gitignore

/content/drive/MyDrive/small_data_NER


In [None]:
import os

base_path = "/content/drive/MyDrive/small_data_NER_project"

# Define project subfolders
subfolders = [
    "raw",
    "conll",
    "utils",
    "results"
]

# Create directories
for sub in subfolders:
    os.makedirs(os.path.join(base_path, sub), exist_ok=True)

# Install basic dependencies
!pip install transformers datasets seqeval peft accelerate -q

# Verify structure
print("Project structure:")
for root, dirs, files in os.walk(base_path):
    level = root.replace(base_path, '').count(os.sep)
    indent = ' ' * 4 * level
    print(f"{indent}{os.path.basename(root)}/")
    subindent = ' ' * 4 * (level + 1)
    for f in files:
        print(f"{subindent}{f}")

print("\nSetup complete.")

  Preparing metadata (setup.py) ... done
  Building wheel for seqeval (setup.py) ... done
Project structure:
small_data_NER/
    setup.ipynb
    .gitattributes
    preprocessing.ipynb
    .gitignore
    biobert_baseline.ipynb
    .git/
        description
        packed-refs
        index
        HEAD
        config
        hooks/
            applypatch-msg.sample
            prepare-commit-msg.sample
            pre-rebase.sample
            post-update.sample
            commit-msg.sample
            pre-applypatch.sample
            pre-receive.sample
            pre-push.sample
            fsmonitor-watchman.sample
            pre-merge-commit.sample
            pre-commit.sample
            sendemail-validate.sample
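
The listing above is dominated by the contents of `.git/`. A possible variation of the tree printer, sketched here under the assumption that hiding `.git` is acceptable for this check, prunes selected directories during the walk. `list_tree` and its `skip_dirs` parameter are hypothetical names, not part of the notebook.

```python
import os


def list_tree(base_path, skip_dirs=(".git",)):
    """Return the indented tree as a list of lines, skipping skip_dirs.

    Hypothetical helper; the default of skipping ".git" is an assumption.
    """
    base_path = os.path.normpath(base_path)
    lines = []
    for root, dirs, files in os.walk(base_path):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        rel = os.path.relpath(root, base_path)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        lines.append(f"{' ' * 4 * level}{os.path.basename(root)}/")
        for name in sorted(files):
            lines.append(f"{' ' * 4 * (level + 1)}{name}")
    return lines
```

Printing `"\n".join(list_tree(base_path))` gives the same layout as the original loop, with one entry per file and folder but without the Git internals.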
     