
Commit 652c745

Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
Update README HF examples to use OLMo-1.7-7B
2015aroras committed May 21, 2024
2 parents 8ec2809 + 755d208 commit 652c745
Showing 1 changed file with 5 additions and 5 deletions.

README.md
@@ -62,13 +62,13 @@ Details about the other types of OLMo checkpoints (including OLMo HF Transformer
 
 ## Inference
 
-You can utilize our Hugging Face integration to run inference on the olmo checkpoints:
+You can utilize our Hugging Face integration to run inference on the OLMo Transformers checkpoints:
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
 
 message = ["Language modeling is "]
 inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
@@ -80,7 +80,7 @@ Alternatively, with the Hugging Face pipeline abstraction:
 
 ```python
 from transformers import pipeline
-olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-hf")
+olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
 print(olmo_pipe("Language modeling is"))
 ```
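(Not part of the diff: the first example above is cut off at the hunk boundary before the generation step. A minimal, self-contained sketch of how generation typically proceeds from `inputs`; the sampling parameters shown are illustrative choices, not taken from this commit:)

```python
# Minimal sketch (not from this commit): generate a completion from the
# tokenized inputs. max_new_tokens, do_sample, and top_k are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

The pipeline form in the second example forwards the same generation keyword arguments (e.g. `max_new_tokens`) if finer control is needed.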

@@ -95,7 +95,7 @@ python scripts/convert_olmo_to_hf_new.py --input_dir /path/to/olmo/checkpoint --
 ### Quantization
 
 ```python
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
 ```
 
 The quantized model is more sensitive to data types and CUDA operations, so it is recommended to pass the inputs as inputs.input_ids.to('cuda') to avoid potential issues.
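(Not part of the diff: a minimal, self-contained sketch of the recommended input handling for the 8-bit model, assuming a CUDA device is available; the prompt and generation arguments are illustrative:)

```python
# Minimal sketch (not from this commit): load the 8-bit model and move only
# the input ids to the GPU, as the README recommends. Requires bitsandbytes
# and a CUDA device; the prompt and max_new_tokens are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(input_ids=inputs.input_ids.to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```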