##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Guide to using Gemma 2 with Langfun, PyGlove and llama-cpp-python

[Langfun](https://github.com/google/langfun) is a versatile library designed to bridge the gap between natural language processing (NLP) and structured data manipulation. It allows you to effortlessly parse, interpret, and convert unstructured text into well-defined Python objects. This capability is invaluable for applications that require seamless integration between language models and data-driven systems.

[PyGlove](https://github.com/google/pyglove) is a powerful library for flexible and scalable configuration management and automated machine learning (AutoML). It provides tools for defining complex schemas and object representations, enabling you to create highly customizable and maintainable code structures. PyGlove's compatibility with Langfun ensures that the objects you generate from natural language prompts are both robust and easy to use.

[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) provides Python bindings for the [llama.cpp](https://github.com/ggerganov/llama.cpp) C++ library. This allows you to enjoy the performance optimizations of `llama.cpp` while benefiting from the simplicity and flexibility of Python. With llama-cpp-python, you get a convenient API for loading models, generating text, and customizing inference parameters.

[Gemma](https://ai.google.dev/gemma) is a family of lightweight, state-of-the-art open models from Google. Built from the same research and technology used to create the Gemini models, Gemma models are text-to-text, decoder-only large language models (LLMs), available in English, with open weights in both pre-trained and instruction-tuned variants.
Gemma models are well-suited for various text-generation tasks, including question-answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop, or your cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

By combining `Langfun`, `PyGlove`, `llama.cpp` and `Gemma`, you can create applications that not only understand and process natural language but also convert that understanding into structured, actionable data. This integration allows you to build sophisticated systems that can interact with external APIs, databases, and other services using well-defined objects, streamlining workflows and enhancing functionality.

In this guide, you'll learn how to set up your environment, configure the Gemma 2 model, and implement various real-world use cases.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/Gemma_with_Langfun_and_LlamaCpp_Python_Bindings.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>


## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.
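After switching the runtime, it can help to confirm that a GPU is actually visible before downloading a multi-gigabyte model. The following is a minimal sketch, assuming the standard `nvidia-smi` tool is on the `PATH` of a GPU runtime; `gpu_summary` is a hypothetical helper name, not part of Colab or any library used in this guide.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the `nvidia-smi -L` listing, or None when no NVIDIA tooling is found."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    # A non-zero exit code usually means the driver could not reach a GPU.
    return proc.stdout.strip() if proc.returncode == 0 else None


print(gpu_summary() or "No GPU detected; check the runtime type.")
```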

### Gemma setup

**Before you dive into the tutorial, let's get you set up with Gemma:**

1. **Hugging Face Account:**  If you don't already have one, you can create a free Hugging Face account by clicking [here](https://huggingface.co/join).
2. **Gemma Model Access:** Head over to the [Gemma model page](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) and accept the usage conditions.
3. **Colab runtime:** Make sure your Colab session uses a runtime with enough resources to run the Gemma 2 9B model, such as a T4 GPU.
4. **Hugging Face Token:** Generate a Hugging Face access token by clicking [here](https://huggingface.co/settings/tokens). A token with `read` permission is enough to download models. You'll need this token later in the tutorial.

**Once you've completed these steps, you're ready to move on to the next section where you'll set up environment variables in your Colab environment.**


### Configure your HF token

Add your Hugging Face token to the Colab Secrets manager to securely store it.

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
2. Create a new secret with the name `HF_TOKEN`.
3. Copy/paste your token key into the Value input box of `HF_TOKEN`.
4. Toggle the button on the left to allow notebook access to the secret.


In [None]:
import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
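Outside Colab, `userdata.get` is not available, so the token has to come from the environment instead. The sketch below shows one way to read and validate it; `read_hf_token` is a hypothetical helper, and the token value in the usage note is a placeholder.

```python
import os


def read_hf_token(env=None):
    """Return the Hugging Face token from an environment mapping, or raise."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running the notebook."
        )
    return token
```

For example, `read_hf_token({"HF_TOKEN": "hf_example"})` returns `"hf_example"`, while an empty mapping raises a `RuntimeError` with a readable message instead of failing later during login.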

### Install dependencies

You'll need to install Langfun and PyGlove, the `huggingface_hub` client for downloading the model, and `llama-cpp-python` for running it. Prebuilt `llama-cpp-python` wheels for CUDA 12.2 are listed [here](https://abetlen.github.io/llama-cpp-python/whl/cu122/llama-cpp-python/).

Run the following cell to install or upgrade them:

In [None]:
# The Langfun and PyGlove libraries
!pip install -q langfun pyglove

# The huggingface_hub library allows us to download models and other files from Hugging Face.
!pip install --upgrade -q huggingface_hub

# The llama-cpp-python[server] library allows us to leverage GPUs
!pip install llama-cpp-python[server]==0.2.90 \
  -q -U --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122


### Logging into Hugging Face Hub

Next, log in to the Hugging Face Hub using your access token. This allows you to download the Gemma model.

In [None]:
from huggingface_hub import login

login(os.environ["HF_TOKEN"])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


### Downloading the Gemma 2 Model
Once you're logged in, you can download the Gemma 2 model files from Hugging Face. This guide uses a [quantized build of the instruction-tuned Gemma 2 9B model](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF) in **GGUF** format, the model file format used by `llama.cpp` and compatible tools like Llamafile.

In [None]:
from huggingface_hub import hf_hub_download

# Specify the repository and filename
repo_id = 'bartowski/gemma-2-9b-it-GGUF'  # Repository containing the GGUF model
filename = 'gemma-2-9b-it-Q6_K.gguf'  # The GGUF model file

# Download the model file to the current directory
hf_hub_download(repo_id=repo_id, filename=filename, local_dir='.')

gemma-2-9b-it-Q6_K.gguf:   0%|          | 0.00/7.59G [00:00<?, ?B/s]

'gemma-2-9b-it-Q6_K.gguf'
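A download of this size can be interrupted, so a quick sanity check on the file is useful before starting the server. GGUF files begin with the four ASCII bytes `GGUF`; the sketch below only checks that header and does not validate the rest of the file. `looks_like_gguf` is a hypothetical helper name.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # The first four bytes of every GGUF file.


def looks_like_gguf(path):
    """Return True if `path` exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(4) == GGUF_MAGIC
```

Calling `looks_like_gguf('gemma-2-9b-it-Q6_K.gguf')` after the download should return `True`; `False` suggests a missing file or a partial or wrong download.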

### Running the Gemma 2 model using `llama-cpp-python`

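Start the OpenAI-compatible server that ships with `llama-cpp-python` as a background process, loading the Gemma 2 GGUF file you just downloaded. The `sleep` gives the server time to load the model before the following cells send requests.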

In [None]:
!python -m llama_cpp.server \
  --model gemma-2-9b-it-Q6_K.gguf \
  --n_gpu_layers -1 \
  --host 0.0.0.0 \
  --port 8000 \
  --chat_format gemma > server.log 2>&1 &
!sleep 60

Hint: You can use `tail -f server.log` to preview the command's output if necessary.
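A fixed `sleep 60` may be too short on a slow disk and is wasted time on a fast one. As an alternative, you could poll the server until it answers. This is a minimal sketch using only the standard library; `http_probe` and `wait_until_ready` are hypothetical helpers, and the `/v1/models` URL in the usage note is an assumption about the server's OpenAI-style routes.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe, attempts=60, delay=2.0, sleep=time.sleep):
    """Call `probe()` up to `attempts` times, pausing `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False
```

In the notebook, this could replace the sleep with `wait_until_ready(lambda: http_probe("http://localhost:8000/v1/models"))`, which returns `False` if the server never comes up so that you know to inspect `server.log`.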

In [None]:
from google.colab.output import eval_js

# (Optional) View the API docs for the llama.cpp python server here
# Uncomment the following line if you need to access the API docs
# print(eval_js("google.colab.kernel.proxyPort(8000)") + "docs")

### Import Libraries

In [None]:
import langfun as lf
import pyglove as pg

### Configuring the Gemma 2 Model

To use the Gemma 2 model with Langfun, you need to create a custom class that talks to the `llama-cpp-python` server, since Langfun may not have built-in support for it yet.

In [None]:
from typing import Any
from langfun.core.llms import rest

# Mostly similar to LLaMA C++ Langfun module from here:
# https://github.com/google/langfun/blob/main/langfun/core/llms/llama_cpp.py
# But the following class is repurposed to work with the llama-cpp-python library


class LlamaCppPythonRemote(rest.REST):
  """The remote llama.cpp Python model."""

  @pg.explicit_method_override
  def __init__(self, url: str, model: str | None = None, **kwargs):
    # Use llama-cpp-python's completion URL scheme
    super().__init__(api_endpoint=f'{url}/completions', model=model, **kwargs)

  @property
  def model_id(self) -> str:
    """Returns a string to identify the model."""
    return f'LLaMAC++({self.model or ""})'

  def request(
      self, prompt: lf.core.Message, sampling_options: lf.core.LMSamplingOptions
  ) -> dict[str, Any]:
    """Returns the JSON input for a message."""
    request = dict()
    request.update(self._request_args(sampling_options))
    request['prompt'] = prompt.text
    return request

  def _request_args(self, options: lf.core.LMSamplingOptions) -> dict[str, Any]:
    """Returns a dict as request arguments."""
    args = dict(
        max_tokens=options.max_tokens or 1024,
        top_k=options.top_k or 50,
        top_p=options.top_p or 0.95,
    )
    if options.temperature is not None:
      args['temperature'] = options.temperature
    return args

  def result(self, json: dict[str, Any]) -> lf.core.LMSamplingResult:
    # Extract the generated text from each llama-cpp-python completion response
    return lf.core.LMSamplingResult(
        [lf.core.LMSample(item['choices'][0]['text'], score=0.0) for item in json['items']]
    )

  def _sample_single(self, prompt: lf.Message) -> lf.core.LMSamplingResult:
    request = self.request(prompt, self.sampling_options)

    def _sample_one_example(request):
      response = self._session.post(
          self.api_endpoint,
          json=request,
          timeout=self.timeout,
      )
      if response.status_code == 200:
        return response.json()
      else:
        error_cls = self._error_cls_from_status(response.status_code)
        raise error_cls(f'{response.status_code}: {response.content}')

    items = self._parallel_execute_with_currency_control(
        _sample_one_example,
        [request] * (self.sampling_options.n or 1),
    )
    return self.result(dict(items=items))
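To make the data flow of the class above easier to follow, the sketch below reproduces its request and response handling with plain dictionaries, without Langfun or a running server. The reply shown is a hypothetical example shaped like an OpenAI-style text completion, which is what `item['choices'][0]['text']` in `result` assumes; `build_request` and `extract_texts` are illustrative names only.

```python
def build_request(prompt_text, max_tokens=None, top_k=None, top_p=None,
                  temperature=None):
    """Mirror `request` and `_request_args`: apply defaults, then add the prompt."""
    payload = {
        "max_tokens": max_tokens or 1024,
        "top_k": top_k or 50,
        "top_p": top_p or 0.95,
        "prompt": prompt_text,
    }
    # Temperature is only sent when explicitly set, as in the class above.
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


def extract_texts(items):
    """Mirror `result`: take the first choice's text from each response."""
    return [item["choices"][0]["text"] for item in items]


# Hypothetical server reply for illustration only.
fake_reply = {
    "choices": [{"index": 0, "text": "Answer(result=3)", "finish_reason": "stop"}]
}

print(build_request("1 + 2 = ?", temperature=0.0))
print(extract_texts([fake_reply]))
```

Note that `_sample_single` sends one request per sample (`n`), so `items` holds one such reply per sample.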

To programmatically add control tokens to any prompt for use with an instruction-tuned model, you can create a function that wraps the prompt with the necessary control tokens. Here’s an example Python function that can be reused to apply this formatting:

In [None]:
def format_gemma_prompt(user_input):
    """
    Formats a prompt for the Gemma instruction-tuned model by adding control tokens.
    Here you'll be instructing the model to adhere to Langfun and PyGlove.

    Parameters:
    user_input (str): The input from the user to be formatted.

    Returns:
    str: The formatted prompt with Gemma control tokens.
    """
    return f"""\
    You are an AI assistant designed to work seamlessly with Langfun and PyGlove. Your task is to parse user inputs into complex object representations compatible with these libraries. Always output as an object and nothing else. Do not include any additional text, explanations, or comments.

    Guidelines:
    Output Format: Always provide your output as an object representation suitable for Langfun and PyGlove.
    Consistency: Do not deviate from producing the object. Ensure your output is valid and can be used directly in code.
    No Extra Text: Exclude any additional explanations or comments outside of the object.

    Technical Details:
    Your goal is to help parse user inputs into object representations that can be directly utilized within Langfun and PyGlove code.

    <start_of_turn>user
    {user_input}
    <end_of_turn>
    <start_of_turn>model
    """

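The control tokens themselves follow a simple pattern: each turn is wrapped in `<start_of_turn>` and `<end_of_turn>`, and the prompt ends with an open `model` turn for the model to complete. The sketch below shows that pattern on its own for a multi-turn conversation, without the Langfun instructions used in `format_gemma_prompt`; `gemma_turns` is a hypothetical helper.

```python
def gemma_turns(messages):
    """Format (role, text) pairs with Gemma turn markers.

    Roles must be 'user' or 'model'. The result ends with an open model turn.
    """
    parts = []
    for role, text in messages:
        if role not in ("user", "model"):
            raise ValueError(f"Unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(gemma_turns([("user", "What is one plus two?")]))
```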
### Using Langfun with Gemma 2 from Google AI

Langfun allows you to define a clear schema and effortlessly convert natural language inputs into structured objects. These objects can then be directly plugged into external systems, databases, or external APIs, facilitating seamless data flow and integration without the need for manual parsing or data cleaning.


First, try a simple use case: ask Langfun to represent the final answer of a simple mathematical statement using a Python class `Answer`.

In [None]:
class Answer(pg.Object):
  result: int

r = lf.query(
    prompt=format_gemma_prompt('The result of one plus two is three'),
    schema=Answer,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(r)

Answer(
  result = 3
)


Next, you'll take a look at real-world use cases demonstrating how you can leverage Langfun and Gemma 2 to build robust applications.

### Parsing a Job Description into a Structured Object


Automating the recruitment process can significantly enhance efficiency and accuracy in HR departments. By parsing job descriptions into structured objects, you can seamlessly integrate this data into your HR systems, enabling functionalities like resume matching, job postings, and analytics.

Start by defining a `Job` schema using PyGlove, then provide a job description as input. Langfun, powered by the Gemma 2 model, processes the text and extracts the relevant information to populate the `Job` object.


In [None]:
# Define the Job schema
class Job(pg.Object):
    title: str
    company: str
    location: str
    responsibilities: list[str]
    requirements: list[str]

# Job description text
job_description = """
We are looking for a Software Engineer to join our team at TechCorp in San Francisco.
Responsibilities include developing software solutions, collaborating with cross-functional teams, and participating in code reviews.
Requirements: Bachelor's degree in Computer Science, 3+ years of experience in software development, proficiency in Python and JavaScript.
"""

# Parse the job description
result = lf.query(
    prompt=format_gemma_prompt(job_description),
    schema=Job,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(result)

Job(
  title = 'Software Engineer',
  company = 'TechCorp',
  location = 'San Francisco',
  responsibilities = [
    0 : 'developing software solutions',
    1 : 'collaborating with cross-functional teams',
    2 : 'participating in code reviews'
  ],
  requirements = [
    0 : "Bachelor's degree in Computer Science",
    1 : '3+ years of experience in software development',
    2 : 'proficiency in Python and JavaScript'
  ]
)


This structured data can then be used directly in your HR systems or external APIs for various purposes such as job matching and integration with Applicant Tracking Systems (ATS) for streamlined recruitment workflows.
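As an illustration of that hand-off, the sketch below serializes a parsed job into JSON for a downstream system. To stay runnable without a model, it uses a plain dataclass, `JobRecord`, as a hypothetical stand-in for the `Job` object, filled with the values from the output above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class JobRecord:
    """Stand-in for the parsed `Job` object (not a PyGlove class)."""
    title: str
    company: str
    location: str
    responsibilities: list[str] = field(default_factory=list)
    requirements: list[str] = field(default_factory=list)


job = JobRecord(
    title="Software Engineer",
    company="TechCorp",
    location="San Francisco",
    responsibilities=["developing software solutions"],
    requirements=["proficiency in Python and JavaScript"],
)

# JSON payload that an HR system or ATS endpoint could accept.
payload = json.dumps(asdict(job), indent=2)
print(payload)
```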

### Enabling Automated Handling of Support Tickets

In this second example, you'll create a support ticket by incorporating variables into your prompt. In customer service applications, generating support tickets efficiently is crucial for timely issue resolution. By creating structured support tickets from user inputs, you can integrate seamlessly with ticketing systems, prioritize issues, and assign them to the appropriate teams.

Define a `SupportTicket` schema and use variables like `customer_name` and `issue` in your prompt. Langfun processes these variables within the context, generating a structured `SupportTicket` object that can be integrated into your customer service systems or external APIs. This lets you automatically create tickets in a system, easily track ticket statuses and generate reports.

In [None]:
customer_name = 'John Doe'
issue = 'I am unable to log into my account.'

In [None]:
# Define the SupportTicket schema
class SupportTicket(pg.Object):
    customer_name: str
    issue: str
    steps_taken: list[str]
    resolution: str | None

# Prompt with variables
prompt = 'Create a support ticket for {{customer_name}} who reported: "{{issue}}" and propose steps to be taken along with a final resolution'

# Generate the support ticket
result = lf.query(
    prompt=format_gemma_prompt(prompt),
    customer_name=customer_name,
    issue=issue,
    schema=SupportTicket,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(result)

SupportTicket(
  customer_name = 'John Doe',
  issue = 'I am unable to log into my account.',
  steps_taken = [
    0 : 'Verify username and password accuracy.',
    1 : 'Attempt password reset.',
    2 : 'Check for browser cache or cookie issues.'
  ],
  resolution = None
)


In a scenario where your customer service team receives numerous inquiries daily, by automating the creation of support tickets, you can ensure that each issue is documented consistently and efficiently. The structured `SupportTicket` objects can be directly fed into your CRM or ticketing software, facilitating faster response times and better customer satisfaction.
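One detail in the cell above is easy to miss: the `{{customer_name}}` placeholders survive the f-string in `format_gemma_prompt`, because an f-string only interprets braces in its own literal text, not in the values it interpolates. The sketch below demonstrates this with a simplified wrapper and a tiny regex-based `render` function. Both are hypothetical stand-ins for illustration, not Langfun's actual template engine.

```python
import re


def wrap(user_input):
    # Braces inside `user_input` are left untouched by the f-string.
    return f"<start_of_turn>user\n{user_input}<end_of_turn>\n<start_of_turn>model\n"


def render(template_text, **values):
    """Replace each {{name}} placeholder with the matching keyword argument."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(values[match.group(1)]),
        template_text,
    )


template = 'Create a support ticket for {{customer_name}} who reported: "{{issue}}"'
wrapped = wrap(template)
print(render(wrapped, customer_name="John Doe",
             issue="I am unable to log into my account."))
```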


### Parsing Customer Feedback into Structured Data

Understanding customer feedback is essential for improving products and services. Start by defining a `Feedback` schema and providing the customer feedback text. Langfun processes the feedback, determines its sentiment, identifies key topics, and outputs a structured `Feedback` object. Parsing feedback into structured data in this way supports sentiment analysis, report generation, and topic modeling.

In [None]:
# Define the Feedback schema
from typing import Literal

class Feedback(pg.Object):
    customer_id: int
    feedback_text: str
    sentiment: Literal['positive', 'negative', 'mixed']
    topics: list[str]

# Customer feedback text
feedback_text = 'I love the new features in the app, but it crashes frequently when I try to upload photos.'

# Context for the model
context = f'Parse the following customer (id: 1) feedback into an object representation (Feedback): {feedback_text}'

# Parse the feedback
result = lf.query(
    prompt=format_gemma_prompt(context),
    schema=Feedback,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(result)

Feedback(
  customer_id = 1,
  feedback_text = 'I love the new features in the app, but it crashes frequently when I try to upload photos.',
  sentiment = 'mixed',
  topics = [
    0 : 'new features',
    1 : 'app crashes',
    2 : 'photo uploads'
  ]
)


This example shows how the model can extract valuable insights from customer feedback, which is crucial for product improvement. Imagine you manage a mobile app with a large user base. By systematically parsing feedback, you can identify which features are well-received and which areas require attention. For instance, detecting frequent crashes related to photo uploads allows your development team to prioritize bug fixes, enhancing the overall user experience.
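Once feedback is structured, spotting recurring problems becomes a counting exercise. The following is a minimal sketch that tallies topics across non-positive feedback. `FeedbackRecord` is a hypothetical dataclass standing in for parsed `Feedback` objects, and the second and third records are made-up sample data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    """Stand-in for a parsed `Feedback` object (not a PyGlove class)."""
    customer_id: int
    sentiment: str
    topics: list[str]


def complaint_topics(records):
    """Count topics mentioned in feedback that is not purely positive."""
    counts = Counter()
    for record in records:
        if record.sentiment != "positive":
            counts.update(record.topics)
    return counts


records = [
    FeedbackRecord(1, "mixed", ["new features", "app crashes", "photo uploads"]),
    FeedbackRecord(2, "negative", ["app crashes", "login"]),  # made-up sample
    FeedbackRecord(3, "positive", ["new features"]),  # made-up sample
]
print(complaint_topics(records).most_common(1))
```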


### Classifying Support Tickets Using `Literal` Types

Efficiently categorizing and prioritizing support tickets can significantly enhance your ticket triaging system. Proper classification ensures that issues are addressed by the right teams promptly, improving resolution times and customer satisfaction.

This example performs classification using `Literal` types from the `typing` module, which restrict a field to a fixed set of values. Define a `Ticket` schema whose `priority` and `category` fields are `Literal` types, provide a support ticket description, and Langfun classifies the ticket accordingly.
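To see what a `Literal` type carries at runtime, the sketch below uses `typing.get_args` to list the allowed values and check a candidate against them. This is plain standard-library Python for illustration; `Priority` and `is_valid_priority` are hypothetical names, and the snippet does not show how Langfun or PyGlove enforce the constraint internally.

```python
from typing import Literal, get_args

Priority = Literal['low', 'medium', 'high']


def is_valid_priority(value):
    """Return True if `value` is one of the values allowed by `Priority`."""
    return value in get_args(Priority)


print(get_args(Priority))
print(is_valid_priority('urgent'))
```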

In [None]:
# Define the Ticket schema
class Ticket(pg.Object):
    id: int
    issue: str
    priority: Literal['low', 'medium', 'high']
    category: Literal['billing', 'technical', 'account', 'other']
    assigned_to: str | None

# Support ticket description
ticket_description = 'Ticket ID: 12345. Customer reports that they are unable to update their billing information on the website.'

# Context for classification
context = f'Classify the following support ticket into the Ticket object: {ticket_description}'

# Classify the ticket
result = lf.query(
    prompt=format_gemma_prompt(context),
    schema=Ticket,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(result)

Ticket(
  id = 12345,
  issue = 'Customer reports that they are unable to update their billing information on the website.',
  priority = 'medium',
  category = 'billing',
  assigned_to = None
)


In a customer service center, tickets often vary widely in nature and urgency. By automatically classifying tickets, you ensure that billing issues are handled by a finance team, technical problems by the IT department, and so forth. Prioritizing tickets based on their severity helps in managing workloads effectively and ensuring that high-priority issues are resolved swiftly. This demonstrates the model's ability to categorize and prioritize support tickets automatically.
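The routing described above can be sketched in a few lines once tickets carry a `priority` and a `category`. The team names in `TEAM_BY_CATEGORY` are hypothetical, and the tickets are plain dictionaries standing in for parsed `Ticket` objects; only ticket 12345 comes from the example above.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

# Hypothetical mapping from ticket category to the owning team.
TEAM_BY_CATEGORY = {
    "billing": "finance",
    "technical": "it",
    "account": "accounts",
    "other": "general",
}


def triage(tickets):
    """Return (ticket id, team) pairs, highest priority first."""
    ordered = sorted(tickets, key=lambda t: PRIORITY_ORDER[t["priority"]])
    return [(t["id"], TEAM_BY_CATEGORY.get(t["category"], "general"))
            for t in ordered]


tickets = [
    {"id": 1, "priority": "low", "category": "technical"},
    {"id": 12345, "priority": "medium", "category": "billing"},
    {"id": 7, "priority": "high", "category": "account"},
]
print(triage(tickets))
```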


### Chain-of-Thought Reasoning

Chain-of-thought (CoT) reasoning involves breaking down complex problems into a series of logical steps. This approach enhances the model's ability to solve intricate tasks by systematically reasoning through each component.

In this example, you'll demonstrate the model's capability in performing CoT reasoning for a simple puzzle.


In [None]:

class Step(pg.Object):
  description: str
  step_output: float

class Solution(pg.Object):
  question: str
  steps: list[Step]
  final_answer: int


# Puzzle question
question = (
  'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. '
  'She sells the remainder at the farmers\' market daily for $2 per fresh duck egg. '
  'How much in dollars does she make every day at the farmers\' market? '
  'Solve the puzzle and use Solution to represent it'
)

# Query the Gemma 2 9B model served by llama-cpp-python
result = lf.query(
    prompt=format_gemma_prompt(question),
    schema=Solution,
    lm=LlamaCppPythonRemote("http://0.0.0.0:8000/v1")
)
print(result)

Solution(
  question = "How much in dollars does she make every day at the farmers' market?",
  steps = [
    0 : Step(
      description = 'Eggs laid daily',
      step_output = 16.0
    ),
    1 : Step(
      description = 'Eggs eaten for breakfast',
      step_output = -3.0
    ),
    2 : Step(
      description = 'Eggs used for baking',
      step_output = -4.0
    ),
    3 : Step(
      description = 'Total eggs remaining',
      step_output = 9.0
    ),
    4 : Step(
      description = 'Earnings from selling eggs',
      step_output = 18.0
    )
  ],
  final_answer = 18
)


This demonstrates the model's capability in performing chain-of-thought reasoning.
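Because model output can contain arithmetic slips, it is worth checking the steps independently. The sketch below recomputes the puzzle directly and compares it with the `step_output` values shown above; the variable names are illustrative.

```python
eggs_laid = 16
eggs_eaten = 3
eggs_baked = 4
price_per_egg = 2

remaining = eggs_laid - eggs_eaten - eggs_baked
earnings = remaining * price_per_egg

# Step outputs as reported in the Solution object above.
step_outputs = [16.0, -3.0, -4.0, 9.0, 18.0]

# The first three steps should sum to the eggs remaining, and the last step
# should equal the remaining eggs times the price per egg.
steps_consistent = (
    sum(step_outputs[:3]) == step_outputs[3] == remaining
    and step_outputs[4] == earnings
)
print(remaining, earnings, steps_consistent)
```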


In this notebook, you've explored how to leverage **Langfun**, **PyGlove**, **llama.cpp** and the **Gemma 2** model from **Hugging Face** to address a variety of real-world scenarios. Through practical use cases—including parsing job descriptions, creating support tickets, analyzing customer feedback, classifying support tickets, and performing chain-of-thought reasoning—you've seen how Langfun effortlessly converts unstructured text into structured, actionable objects. This showcases the model's effectiveness in real-world scenarios, enhancing the capabilities of your applications and seamlessly integrating with external systems.

Feel free to extend these examples to fit your specific needs and explore the full potential of Langfun and Gemma in your projects!
