## Libraries

In [2]:
#from accelerate import PartialState
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
import torch

## Creating 4-bit Quantization Config

In [4]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
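With `bnb_4bit_quant_type="nf4"`, bitsandbytes stores each weight as a 4-bit index into a 16-entry table of values, and each small block of weights keeps one scale factor (its absolute maximum). `bnb_4bit_compute_dtype` is the dtype the weights are dequantized to for the actual matrix multiplications. The sketch below illustrates only the blockwise absmax idea, in plain Python. Assumptions: `quantize_block`, `dequantize_block` and `UNIFORM_4BIT` are hypothetical names, and the table is an evenly spaced stand-in, not the real NF4 levels (which are derived from quantiles of a normal distribution). The real library works on packed tensors inside CUDA kernels.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in table: 16 evenly spaced levels in [-1, 1].
# NF4 uses 16 different levels (normal-distribution quantiles); this is NOT that table.
UNIFORM_4BIT: List[float] = [-1.0 + 2.0 * k / 15 for k in range(16)]


def quantize_block(values: Sequence[float], codebook: Sequence[float]) -> Tuple[List[int], float]:
    """Scale a block by its absolute maximum, then store each value as the index of the nearest level."""
    absmax = max(abs(v) for v in values) or 1.0  # an all-zero block would otherwise divide by zero
    indices = []
    for v in values:
        normalized = v / absmax  # now in [-1, 1]
        nearest = min(range(len(codebook)), key=lambda i: abs(codebook[i] - normalized))
        indices.append(nearest)  # with a 16-entry codebook every index fits in 4 bits
    return indices, absmax


def dequantize_block(indices: Sequence[int], absmax: float, codebook: Sequence[float]) -> List[float]:
    """Recover approximate values: look up each level and undo the per-block scaling."""
    return [codebook[i] * absmax for i in indices]
```

For example, `quantize_block([0.5, -1.0, 0.25], UNIFORM_4BIT)` keeps one float (`1.0`) for the whole block plus three small indices. The round-trip error per value is at most half the spacing between levels, times the block's absmax.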

## DeepSeek-Coder-V2-Lite-Instruct

In [3]:
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct")

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

A new version of the following files was downloaded from https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct:
- configuration_deepseek.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct:
- modeling_deepseek.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|██████████| 4/4 [45:36<00:00, 684.14s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [03:06<00:00, 46.57s/it]
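`del` only removes a name. The tensors are freed once nothing else references them, and PyTorch's CUDA caching allocator then keeps the released blocks reserved for reuse rather than returning them to the driver. When several large checkpoints are loaded one after another in a single kernel, a common follow-up is a garbage-collection pass plus `torch.cuda.empty_cache()`. A minimal sketch; `free_memory` is a hypothetical helper, and it falls back to a plain `gc.collect()` when PyTorch or a GPU is not available.

```python
import gc


def free_memory() -> int:
    """Collect unreachable objects and, if possible, release cached CUDA blocks.

    Returns the number of objects found unreachable by the garbage collector.
    """
    collected = gc.collect()  # break reference cycles that may still hold model tensors
    try:
        import torch
    except ImportError:
        return collected  # no PyTorch installed: nothing else to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unused cached blocks back to the CUDA driver
    return collected
```

Call `free_memory()` right after `del model` and `del tokenizer`; comparing `torch.cuda.memory_allocated()` before and after shows whether the weights were actually released.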


## CodeLlama-13b-Instruct-hf

In [None]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True, #Automatic with quantized models
)

del model
del tokenizer

Downloading shards: 100%|██████████| 3/3 [37:42<00:00, 754.05s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [01:05<00:00, 21.68s/it]
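Every cell in this notebook repeats the same two `from_pretrained` calls and spells out the checkpoint name in each, so it is easy for the tokenizer and the model to end up pointing at different checkpoints. A minimal sketch of a helper that takes the repo id once. Assumptions: `load_quantized` is a hypothetical name; the model and tokenizer classes are passed in so seq2seq checkpoints can use `AutoModelForSeq2SeqLM`; and `trust_remote_code` defaults to `False`, so remote code is enabled only for checkpoints that need it.

```python
from typing import Any, Tuple


def load_quantized(
    repo_id: str,
    model_cls: Any,
    tokenizer_cls: Any,
    quantization_config: Any,
    trust_remote_code: bool = False,
) -> Tuple[Any, Any]:
    """Load a tokenizer and a quantized model from the same Hub repo id."""
    tokenizer = tokenizer_cls.from_pretrained(repo_id, trust_remote_code=trust_remote_code)
    model = model_cls.from_pretrained(
        repo_id,  # same id as the tokenizer, so the two cannot drift apart
        quantization_config=quantization_config,
        trust_remote_code=trust_remote_code,
        low_cpu_mem_usage=True,
    )
    return tokenizer, model
```

Usage, reusing `bnb_config` from above: `tokenizer, model = load_quantized("codellama/CodeLlama-13b-Instruct-hf", AutoModelForCausalLM, AutoTokenizer, bnb_config)`.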


## SantaCoder

In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
del model
del tokenizer

A new version of the following files was downloaded from https://huggingface.co/bigcode/santacoder:
- configuration_gpt2_mq.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/bigcode/santacoder:
- modeling_gpt2_mq.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
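These warnings come from `trust_remote_code=True`: the checkpoint's own Python files are downloaded and executed, and a later push to the repo can change them. As the message suggests, `from_pretrained` accepts a `revision` argument (a branch name, tag, or commit id), and a full commit hash is the only form that cannot move. A minimal sketch of a guard that enables remote code only together with a commit hash; `pinned_remote_code_kwargs` is a hypothetical helper, and the hash itself has to be taken from the repo's commit history on the Hub.

```python
import re

# A full Git commit id: 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_remote_code_kwargs(revision: str) -> dict:
    """Return from_pretrained kwargs that enable remote code only at a fixed commit."""
    if not _COMMIT_SHA.fullmatch(revision):
        # Branch names such as "main" can move, so code loaded from them can change.
        raise ValueError(f"expected a 40-character commit hash, got {revision!r}")
    return {"trust_remote_code": True, "revision": revision}
```

Usage: `AutoModelForCausalLM.from_pretrained("bigcode/santacoder", quantization_config=bnb_config, **pinned_remote_code_kwargs(sha))`, where `sha` is a commit hash copied from the repo; pass the same kwargs to `AutoTokenizer.from_pretrained` so both load from that commit.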


## Instruct CodeT5P - 16b params

In [7]:
from transformers import AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/instructcodet5p-16b")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Salesforce/instructcodet5p-16b",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

A new version of the following files was downloaded from https://huggingface.co/Salesforce/instructcodet5p-16b:
- modeling_codet5p.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading shards: 100%|██████████| 5/5 [48:11<00:00, 578.21s/it]
CodeT5pForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Loadi
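InstructCodeT5p-16b is one of the largest checkpoints in this notebook, which is where 4-bit loading matters most. A back-of-the-envelope estimate of weight memory is parameters × bits per parameter / 8 bytes. A minimal sketch; `weight_memory_gb` is a hypothetical helper. The figure ignores the per-block scale factors, any layers left unquantized, activations, the KV cache during generation, and CUDA context overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weight-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9
```

For a nominal 16B-parameter model, `weight_memory_gb(16e9, 16)` gives 32.0 (bf16) while `weight_memory_gb(16e9, 4)` gives 8.0 (4-bit), a 4× reduction in weight storage before overheads.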

## [CodeT5p-770m](https://huggingface.co/Salesforce/codet5p-770m) - 770M params (Alternative to InstructCodeT5p-16b)

In [9]:
from transformers import AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5p-770m")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Salesforce/codet5p-770m",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

## CodeT5-Base - 220M params (Alternative to CodeT5p)

In [None]:
from transformers import AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Salesforce/codet5-base",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

## CodeGen2.5-7b-Instruct_P

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-instruct_P", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen25-7b-instruct_P",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

## CodeGen2 - 1B params (Alternative to CodeGen2.5)

In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B_P", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen2-1B_P",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

A new version of the following files was downloaded from https://huggingface.co/Salesforce/codegen2-1B_P:
- configuration_codegen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Salesforce/codegen2-1B_P:
- modeling_codegen.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


## WizardCoder-15B-V1.0

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WizardLMTeam/WizardCoder-15B-V1.0")
model = AutoModelForCausalLM.from_pretrained(
    "WizardLMTeam/WizardCoder-15B-V1.0",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

## StarCoder2-3b

In [8]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

del model
del tokenizer

## Freeing Up Cache Space

In [None]:
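#Delete models from cache
import os
import shutil

# subprocess.run without a shell never expands "~", so the original rm call deleted nothing.
# Expand the home directory explicitly (assumes HF_HOME / HF_HUB_CACHE are not overridden).
shutil.rmtree(os.path.expanduser("~/.cache/huggingface/hub"), ignore_errors=True)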

In [None]:
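#Delete datasets from cache (Shouldn't be needed with this data)
import os
import shutil

# Expand "~" explicitly; it is not expanded when no shell is involved.
shutil.rmtree(os.path.expanduser("~/.cache/huggingface/datasets"), ignore_errors=True)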
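Before deleting the cache it can be worth checking how much space it actually uses. A minimal sketch assuming the default cache location; `dir_size_bytes` is a hypothetical helper. The hub cache stores each file once under `blobs/` and points to it from `snapshots/` through symlinks, so the sketch skips symlinks to avoid counting the same file twice.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks (0 if path does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(os.path.expanduser(path)):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # snapshot symlinks point back into blobs/
                total += os.path.getsize(file_path)
    return total
```

For example, `dir_size_bytes("~/.cache/huggingface/hub") / 1e9` gives the size in GB. The `huggingface_hub` package also ships `huggingface-cli scan-cache`, which reports usage per repo, and `huggingface-cli delete-cache`, which removes individual repos instead of the whole directory.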