# Using Cohere's Command R+ with the Hugging Face Serverless Inference API

In [None]:
!pip install --upgrade transformers huggingface_hub

In [3]:
from transformers import AutoTokenizer
from huggingface_hub import InferenceClient, interpreter_login, get_token

# Uncomment to log in with a Hugging Face access token if you have not already authenticated:
# interpreter_login()

## Instantiate an `InferenceClient`

See the [`InferenceClient` reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client) for details.


In [4]:
# Optionally, pass a model ID or an Inference Endpoint URL here, or specify it later when calling the model.
client = InferenceClient()

## Generate Text Using a Chat Template

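We first render the conversation and the grounding documents into a single prompt string with the model's grounded-generation chat template, then send that prompt to the Serverless Inference API.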
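To see what "applying a chat template" means, here is a deliberately simplified sketch that wraps each message in the turn tokens visible in the printed prompt below (`<BOS_TOKEN>`, `<|START_OF_TURN_TOKEN|>`, `<|SYSTEM_TOKEN|>`) plus the user, chatbot and end-of-turn tokens shown on the Command R+ model card. The `render_turns` function is hypothetical: it omits the safety and system preambles and the document section that `apply_grounded_generation_template` adds, which is why the notebook relies on the tokenizer rather than hand-built strings.

```python
# Turn tokens assumed from the Command R+ model card and the printed prompt;
# the tokenizer's own template remains the source of truth.
BOS = "<BOS_TOKEN>"
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",
}


def render_turns(conversation, add_generation_prompt=True):
    """Hypothetical, simplified renderer: wrap each message in turn tokens."""
    parts = [BOS]
    for message in conversation:
        role_token = ROLE_TOKENS[message["role"]]
        parts.append(f"{START}{role_token}{message['content']}{END}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues generating from here.
        parts.append(f"{START}{ROLE_TOKENS['assistant']}")
    return "".join(parts)


prompt = render_turns([{"role": "user", "content": "What's the biggest penguin in the world?"}])
print(prompt)
```

Setting `add_generation_prompt=True` in the real template call plays the same role as the final open assistant turn here: it tells the model that the next tokens belong to its reply.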

In [7]:
model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [{"role": "user", "content": "What's the biggest penguin in the world?"}]
# define documents to ground on:
documents = [
    {
        "title": "Tall penguins",
        "text": "Emperor penguins are the tallest growing up to 122 cm in height.",
    },
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."},
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",  # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)


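The `documents` argument is a list of dicts with `title` and `text` keys. When snippets come from another source, such as a search step, a small helper can assemble that list and reject empty entries before the template is rendered. `make_documents` below is a hypothetical helper that assumes the `title`/`text` keys used in this notebook. Judging by the response at the end of the notebook, a document's position in the list is the index the model cites.

```python
from __future__ import annotations

from typing import Iterable


def make_documents(snippets: Iterable[tuple[str, str]]) -> list[dict[str, str]]:
    """Hypothetical helper: build the `documents` list from (title, text) pairs.

    The position of each document in the returned list is assumed to be the
    index the model uses in its citations.
    """
    documents = []
    for title, text in snippets:
        title, text = title.strip(), text.strip()
        if not text:
            # An empty snippet gives the model nothing to ground on.
            raise ValueError(f"document {title!r} has no text")
        documents.append({"title": title, "text": text})
    return documents


documents = make_documents([
    ("Tall penguins", "Emperor penguins are the tallest growing up to 122 cm in height."),
    ("Penguin habitats", "Emperor penguins only live in Antarctica."),
])
for index, doc in enumerate(documents):
    print(index, doc["title"])
```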

In [8]:
print(grounded_generation_prompt)

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or simil

In [9]:
response = client.text_generation(
    grounded_generation_prompt, model=model_id, max_new_tokens=512
)
print(response)

Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor penguin is the tallest penguin species, growing up to 122 cm in height. They are native to Antarctica.
Grounded answer: The <co: 0,1>Emperor penguin</co: 0,1> is the tallest penguin species, growing up to <co: 0>122 cm in height.</co: 0> They are <co: 1>native to Antarctica.</co: 1>
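The response is plain text, so citations have to be parsed out of it. The sketch below assumes the layout shown above: `Key: value` lines, and citation spans wrapped in `<co: IDS>...</co: IDS>` where `IDS` are comma-separated indices into `documents`. The output supports that reading (document 0 backs the 122 cm claim, document 1 the Antarctica claim). Other citation modes may format the response differently, so treat the helpers, all hypothetical, as a starting point.

```python
from __future__ import annotations

import re

# Assumed from the response above: "<co: 0,1>cited text</co: 0,1>".
# The backreference \1 requires the closing tag to repeat the opening IDs.
CITATION = re.compile(r"<co: ([\d,\s]+)>(.*?)</co: \1>")


def parse_fields(response: str) -> dict[str, str]:
    """Split 'Key: value' lines (e.g. 'Cited Documents: 0,1') into a dict."""
    fields = {}
    for line in response.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_ids(ids: str) -> list[int]:
    """Turn '0,1' into [0, 1]."""
    return [int(i) for i in ids.split(",") if i.strip()]


def extract_citations(grounded: str) -> list[tuple[str, list[int]]]:
    """Return (cited text, document indices) pairs in order of appearance."""
    return [(m.group(2), parse_ids(m.group(1))) for m in CITATION.finditer(grounded)]


def strip_citations(grounded: str) -> str:
    """Remove the citation tags but keep the cited text."""
    return CITATION.sub(lambda m: m.group(2), grounded)


documents = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."},
]
sample_response = """Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor penguin is the tallest penguin species, growing up to 122 cm in height. They are native to Antarctica.
Grounded answer: The <co: 0,1>Emperor penguin</co: 0,1> is the tallest penguin species, growing up to <co: 0>122 cm in height.</co: 0> They are <co: 1>native to Antarctica.</co: 1>"""

fields = parse_fields(sample_response)
citations = extract_citations(fields["Grounded answer"])
for text, ids in citations:
    titles = [documents[i]["title"] for i in ids]
    print(f"{text!r} <- {titles}")
```

Stripping the tags from the grounded answer reproduces the plain `Answer` line, which is a cheap sanity check that the parser and the response format agree.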
