### Use the following environment to run this notebook
* Instance Type: c5.2xlarge
* Kernel: PyTorch 2.0.0 Python 3.10 CPU

Requirements:
* Hugging Face read token: https://huggingface.co/docs/hub/en/security-tokens
* Request model access for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
* 140 GB+ of storage for the LLM
* Ensure you have permission to upload data to the bucket. You can find the bucket in S3; its name follows the pattern [aws-account-num]-[region]-[environment (default is `dev`)]-model-weights-[random string]. The following permissions are needed:
    * `s3:PutObject`: Required for uploading files
    * `s3:GetObject`: Required for reading/verifying files (optional but recommended)
    * `s3:ListBucket`: Required for listing bucket contents
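
The permissions listed above could be expressed as an IAM policy document along the following lines. This is only a sketch: the function name `build_upload_policy`, the `Sid` values, and the example bucket name are hypothetical, and the `arn:aws` partition is an assumption that does not hold in GovCloud or China regions. Note that `s3:ListBucket` applies to the bucket itself, while `s3:PutObject` and `s3:GetObject` apply to the objects inside it.

```python
import json


def build_upload_policy(bucket_name: str) -> dict:
    """Return a minimal IAM policy covering the three permissions listed above."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"  # assumes the standard "aws" partition
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectReadWrite",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"{bucket_arn}/*",  # object-level actions target the keys
            },
            {
                "Sid": "ListBucket",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,  # bucket-level action targets the bucket
            },
        ],
    }


# "example-bucket" is a placeholder, not a real bucket name
print(json.dumps(build_upload_policy("example-bucket"), indent=2))
```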

### Install huggingface_hub

In [None]:
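!pip install huggingface_hub hf_transfer --quiet

### Imports

In [None]:
import os
from pathlib import Path

# Set this before importing huggingface_hub: the library reads the flag at import time.
# The accelerated download path relies on the hf_transfer package installed above.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import login, snapshot_download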

### Enter Bucket

In [None]:
bucket = "<ENTER BUCKET WHERE LLM WILL BE UPLOADED>"
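
Before uploading, it may help to confirm that the placeholder was replaced with a name matching the bucket pattern described in the requirements. The sketch below is an assumption-laden check: `parse_bucket_name` and `BUCKET_PATTERN` are hypothetical names, the 12-digit account number reflects the AWS account ID format, and the lowercase alphanumeric environment and random suffix are guesses about how the bucket was provisioned.

```python
import re

# [aws-account-num]-[region]-[environment]-model-weights-[random string]
BUCKET_PATTERN = re.compile(
    r"^(?P<account>\d{12})-"                  # AWS account IDs are 12 digits
    r"(?P<region>[a-z]{2}(?:-[a-z]+)+-\d)-"   # e.g. us-east-1, ap-southeast-2
    r"(?P<environment>[a-z0-9]+)-model-weights-"
    r"(?P<suffix>[a-z0-9]+)$"                 # assumed lowercase alphanumeric suffix
)


def parse_bucket_name(name: str) -> dict:
    """Split a bucket name into its parts, or raise if it does not fit the pattern."""
    match = BUCKET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Bucket name does not match the expected pattern: {name!r}")
    return match.groupdict()
```

Calling `parse_bucket_name(bucket)` right after setting `bucket` would fail fast if the placeholder text is still in place.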

### Sign in to huggingface hub

In [None]:
hf_token = "<ENTER YOUR HF TOKEN>" 
login(token=hf_token, add_to_git_credential=True)
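
As an alternative to pasting the token into the notebook, the token could be read from the `HF_TOKEN` environment variable (which `huggingface_hub` also recognizes) with an interactive prompt as a fallback. This is a minimal sketch; the helper name `resolve_hf_token` and its parameters are hypothetical.

```python
import os
from getpass import getpass


def resolve_hf_token(env=os.environ, prompt=getpass) -> str:
    """Return the token from HF_TOKEN if set, otherwise ask for it without echoing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided")
    return token
```

The result can then be passed to the existing call, for example `login(token=resolve_hf_token(), add_to_git_credential=True)`, so the token never appears in the saved notebook.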

### NER - BertNER

#### Select model ID

In [None]:
HF_MODEL_ID = "dslim/bert-large-NER"
tar_filename = "bert-large-NER.tar.gz"

#### Create model dir

In [None]:
model_dir = Path(HF_MODEL_ID.split("/")[-1])
model_dir.mkdir(exist_ok=True)

#### Download Model

In [None]:
# Download model from Hugging Face into model_dir
snapshot_download(
    HF_MODEL_ID,
    local_dir=str(model_dir), # download to model dir
    revision="main", # use a specific revision, e.g. refs/pr/21
    local_dir_use_symlinks=False, # use no symlinks to save disk space
    ignore_patterns=["*.msgpack*", "*.h5*", "*.bin*"], # skip other weight formats so only safetensors weights are downloaded
)

# check that safetensors weights were downloaded and are available
assert len(list(model_dir.glob("*.safetensors"))) > 0, "Model download failed"

#### Compress model weights

In [None]:
# Each `!` command runs in its own subshell, so the `cd` does not persist and no `cd ..` is needed
!cd $model_dir && tar czvf ../$tar_filename *
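
The archive step can also be done from Python with the standard-library `tarfile` module, which avoids relying on shell state. The sketch below is one possible equivalent, not the notebook's own method: `make_model_archive` is a hypothetical helper, and it skips dot-prefixed entries to mirror how the shell `*` glob ignores hidden files.

```python
import tarfile
from pathlib import Path


def make_model_archive(model_dir: Path, tar_path: Path) -> list[str]:
    """Write the top-level contents of model_dir into a gzip-compressed tar file."""
    members = []
    with tarfile.open(tar_path, "w:gz") as tar:
        for entry in sorted(model_dir.iterdir()):
            if entry.name.startswith("."):
                continue  # the shell `*` glob skips hidden entries as well
            # arcname keeps files at the archive root, as `cd model_dir && tar ... *` does
            tar.add(entry, arcname=entry.name)
            members.append(entry.name)
    return members
```

With the notebook's variables this would be called as `make_model_archive(model_dir, Path(tar_filename))`.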

#### Upload the archive to S3

In [None]:
!aws s3 cp $tar_filename s3://$bucket/ner-model.tar.gz

### LLM - Mixtral
Note: this step may take several hours because the model is more than 140 GB in size.
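
Given the size involved, a quick free-space check before starting the download may save a failed multi-hour run. The following is a minimal sketch using `shutil.disk_usage`; `has_free_space` is a hypothetical helper, and the 140 GiB figure is taken from the storage requirement above. Since the compressed archive is written next to the downloaded weights, actual peak usage is likely higher than the weights alone.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem containing `path` has at least required_gib free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


# Check the current working directory, where the model directory is created
if not has_free_space(".", 140):
    print("Warning: less than 140 GiB free; the Mixtral download may not fit.")
```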

#### Select model ID

In [None]:
HF_MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tar_filename = "Mixtral-8x7B-Instruct-v0.1.tar.gz"

#### Create model dir

In [None]:
model_dir = Path(HF_MODEL_ID.split("/")[-1])
model_dir.mkdir(exist_ok=True)

#### Download Model

In [None]:
# Download model from Hugging Face into model_dir
snapshot_download(
    HF_MODEL_ID,
    local_dir=str(model_dir), # download to model dir
    revision="main", # use a specific revision, e.g. refs/pr/21
    local_dir_use_symlinks=False, # use no symlinks to save disk space
    ignore_patterns=["*.msgpack*", "*.h5*", "*.bin*"], # skip other weight formats so only safetensors weights are downloaded
)

# check that safetensors weights were downloaded and are available
assert len(list(model_dir.glob("*.safetensors"))) > 0, "Model download failed"

#### Compress model weights

In [None]:
# Each `!` command runs in its own subshell, so the `cd` does not persist and no `cd ..` is needed
!cd $model_dir && tar czvf ../$tar_filename *

#### Upload the archive to S3

In [None]:
!aws s3 cp $tar_filename s3://$bucket/llm-model.tar.gz # for the mixtral llm model