### Tutorial: Download Model Checkpoints

This notebook guides you through downloading the pre-trained weights required to run HME.

**HME consists of two parts:**
1. **Base LLM:** `Meta-Llama-3-8B-Instruct` (Requires Hugging Face authentication).
2. **HME Adapters:** Specialized checkpoints for different tasks (Captioning, QA, etc.).

You can choose to download all models or specific ones based on your needs.

In [3]:
import sys
import os
from pathlib import Path

# Add project root to path
sys.path.append('..')

# Import the core download function
from hme.download_models import download_model

# Define the root directory for saving checkpoints
CHECKPOINTS_DIR = Path('../checkpoints')
CHECKPOINTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"Models will be downloaded to: ../checkpoints/")

Models will be downloaded to: ../checkpoints/


### 1. Download Base Model (Llama-3)

**Important Requirement:**
To download `meta-llama/Meta-Llama-3-8B-Instruct`, you must:
1. Have a Hugging Face account.
2. Request access on the [model page](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
3. Log in locally by running `huggingface-cli login` in your terminal or using the cell below.

In [4]:
# Uncomment the following line if you haven't logged in yet
# !huggingface-cli login

repo_id = "meta-llama/Meta-Llama-3-8B-Instruct"
local_name = "Meta-Llama-3-8B-Instruct"
target_dir = CHECKPOINTS_DIR / local_name

print(f"Downloading Base Model: {repo_id} ...")
try:
    download_model(repo_id, target_dir)
    print("Base model ready.")
except Exception as e:
    print(f"\nDownload Failed: {e}")
    print("Please check your Hugging Face login status and access permissions.")

Downloading Base Model: meta-llama/Meta-Llama-3-8B-Instruct ...
Downloading 'meta-llama/Meta-Llama-3-8B-Instruct' to '../checkpoints/Meta-Llama-3-8B-Instruct'...


For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 14 files:   0%|          | 0/14 [00:00<?, ?it/s]

LICENSE:   0%|          | 0.00/7.80k [00:00<?, ?B/s]

Failed to download 'meta-llama/Meta-Llama-3-8B-Instruct'. Error: 403 Client Error. (Request ID: Root=1-693674d6-4d150027479460c4127f4d71;08e44720-b0c2-4ce1-83db-42ff420dcc38)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/8afb486c1db24fe5011ec46dfbe5b5dccdb575c2/config.json.
Your request to access model meta-llama/Meta-Llama-3-8B-Instruct has been rejected by the repo's authors.


SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


### 2. Download HME Task Adapters

Select the models you wish to use. You can uncomment specific lines in the list below.

In [5]:
# Dictionary of available HME models
hme_models = {
    "HME_comprehension-pretrain": "GreatCaptainNemo/HME_comprehension-pretrain",
    # "HME_comprehension-pretrain-s2": "GreatCaptainNemo/HME_comprehension-pretrain-s2",
    #"HME_captioning": "GreatCaptainNemo/HME_captioning",
    "HME_general-qa": "GreatCaptainNemo/HME_general-qa",
    # "HME_property-qa-1": "GreatCaptainNemo/HME_property-qa-1",
    # "HME_property-qa-2": "GreatCaptainNemo/HME_property-qa-2",
    # "HME_pocket-based-ligand-generation_pretrain": "GreatCaptainNemo/HME_pocket-based-ligand-generation_pretrain",
    # "HME_pocket-based-ligand-generation": "GreatCaptainNemo/HME_pocket-based-ligand-generation",
}

print(f"Starting download for {len(hme_models)} selected adapters...\n")

for local_name, repo_id in hme_models.items():
    target_dir = CHECKPOINTS_DIR / local_name
    print(f"Downloading [{local_name}] from {repo_id}...")
    
    try:
        download_model(repo_id, target_dir)
    except SystemExit:
        # Catch sys.exit from the helper function to prevent notebook from stopping
        print(f"Failed to download {local_name}")
        
print("\nAll selected tasks processed.")

Starting download for 2 selected adapters...

Downloading [HME_comprehension-pretrain] from GreatCaptainNemo/HME_comprehension-pretrain...
Downloading 'GreatCaptainNemo/HME_comprehension-pretrain' to '../checkpoints/HME_comprehension-pretrain'...


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/2.66G [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/838 [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

Successfully downloaded 'GreatCaptainNemo/HME_comprehension-pretrain'.
Downloading [HME_general-qa] from GreatCaptainNemo/HME_general-qa...
Downloading 'GreatCaptainNemo/HME_general-qa' to '../checkpoints/HME_general-qa'...


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/612M [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/854 [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

Successfully downloaded 'GreatCaptainNemo/HME_general-qa'.

All selected tasks processed.


### 3. Verification

Check if the directories were created successfully.

In [6]:
print("Contents of checkpoints directory:")
for item in CHECKPOINTS_DIR.iterdir():
    if item.is_dir():
        print(f"  ðŸ“‚ {item.name}")

Contents of checkpoints directory:
  ðŸ“‚ Meta-Llama-3-8B-Instruct
  ðŸ“‚ HME_comprehension-pretrain
  ðŸ“‚ HME_general-qa
