# **Building a Coding Assistant Using Snowflake Notebooks and Gemma 2B**





In this tutorial, we'll walk through fine-tuning the Gemma 2B model for coding tasks using Snowflake Notebooks. Snowflake provides a scalable environment for machine learning, and its integrated notebooks let you customize models to suit your needs. Whether you're adapting the model to specific programming languages or improving its understanding of code structure, this step-by-step guide shows how to fine-tune Gemma 2B for coding-related tasks. Let's start by setting up the environment.

The `pip install` command in the cell below installs several essential Python libraries used for fine-tuning language models such as Gemma 2B. Here's what each package does:

1. **`datasets`**:
   - This is the Hugging Face `datasets` library, which provides access to a wide variety of datasets in an easy-to-use format. It is commonly used for NLP tasks and comes with built-in functions for loading, preprocessing, and working with large datasets.
   
2. **`torch`**:
   - This installs **PyTorch**, a popular open-source machine learning framework widely used for building and training neural networks. You'll need it to fine-tune Gemma 2B, because the Hugging Face implementation of the model used here runs on PyTorch.
   
3. **`peft`**:
   - This stands for **Parameter-Efficient Fine-Tuning**, a library for fine-tuning large models without extensive computational resources. It trains only a small number of parameters (for example, LoRA adapter weights) while keeping the rest of the model frozen, which makes it practical to adapt models like Gemma 2B on a single GPU.

4. **`accelerate`**:
   - This is another Hugging Face library that simplifies the process of distributing models across different devices (such as CPUs and GPUs) for faster training. It's particularly helpful when working with large models and datasets, ensuring efficient use of computational resources.
   
5. **`bitsandbytes`**:
   - This library provides 8-bit optimizers as well as 8-bit and 4-bit quantization of model weights, which reduce memory usage when working with large models. This tutorial uses its 4-bit quantization to load Gemma 2B, which significantly lowers the GPU memory needed for fine-tuning.
   
6. **`trl`**:
   - This stands for **Transformer Reinforcement Learning**, a Hugging Face library for post-training transformer models. It supports supervised fine-tuning as well as reinforcement learning with reward signals. This tutorial uses its `SFTTrainer` for supervised fine-tuning on instruction and answer pairs.

In summary, this command sets up the environment by installing the necessary packages for accessing datasets, training models with PyTorch, fine-tuning models efficiently, and managing memory usage effectively. These tools will work together to help you fine-tune the Gemma 2B model in Snowflake Notebooks.

In [None]:
!pip install datasets torch peft accelerate bitsandbytes trl
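
After the installation finishes, it can be useful to confirm which versions were actually installed, since fine-tuning libraries change quickly. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of the packages above.

```python
from importlib import metadata

# Packages installed by the pip command above.
PACKAGES = ["datasets", "torch", "peft", "accelerate", "bitsandbytes", "trl"]

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

Recording these versions alongside the notebook makes it easier to reproduce the training run later.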

In [None]:
from datasets import Dataset, ClassLabel
from transformers import AutoTokenizer, AutoModelForCausalLM
import sys
#from utils import Concatenator
import pandas as pd
pd.set_option('display.max_colwidth', None)

import torch
import sentencepiece
import os
import json
from transformers import TrainerCallback
from contextlib import nullcontext
from transformers import default_data_collator, Trainer, TrainingArguments

from snowflake.snowpark.session import Session
from snowflake.snowpark import VERSION
import snowflake.snowpark.functions as F
from snowflake.ml.registry import model_registry
from snowflake.ml.model import deploy_platforms
from snowflake.ml.model.models import llm

import logging
logger = logging.getLogger("snowflake.snowpark.session")
logger.setLevel(logging.ERROR)
logger = logging.getLogger("snowflake.ml")
logger.setLevel(logging.ERROR)

This block of code imports various libraries and sets up the environment for fine-tuning the Gemma 2B model on Snowflake Notebooks. Here's a breakdown of what each part does:

1. **`from datasets import Dataset, ClassLabel`**:
   - `Dataset`: This is used to create or manipulate datasets from the Hugging Face `datasets` library.
   - `ClassLabel`: A special feature type that handles classification labels in a dataset, converting them between integers and human-readable strings.

2. **`from transformers import AutoTokenizer, AutoModelForCausalLM`**:
   - `AutoTokenizer`: Automatically loads the appropriate tokenizer for a model, which processes text inputs for the model.
   - `AutoModelForCausalLM`: Loads a pre-trained model for causal language modeling tasks, which is suitable for code generation and other text-generation tasks.

3. **`import sys`**:
   - Imports the system module, allowing you to interact with the Python interpreter and manipulate input/output or modify paths.

4. **`#from utils import Concatenator`**:
   - This import is commented out, so it has no effect. It refers to a custom `Concatenator` utility (presumably for concatenating text or data) that this notebook does not use.

5. **`import pandas as pd`**:
   - Imports `pandas`, a powerful data manipulation library commonly used for handling tabular data.
   
   - `pd.set_option('display.max_colwidth', None)`: This setting ensures that the entire content of any cell in a pandas DataFrame will be displayed without truncation, useful for viewing long text entries, like code.

6. **`import torch`**:
   - Imports the PyTorch framework, which is needed for running and training the neural network models.

7. **`import sentencepiece`**:
   - Imports the `sentencepiece` library, which is used for tokenizing text into subwords. This is commonly used with models trained on large corpora to handle rare words more efficiently.

8. **`import os`**:
   - Imports the operating system module, allowing interaction with the file system (such as reading/writing files or environment variables).

9. **`import json`**:
   - Imports the `json` module, which is useful for working with JSON data (like reading/writing model configurations or datasets in JSON format).

10. **`from transformers import TrainerCallback`**:
    - This imports a class from the `transformers` library that allows you to create custom callbacks for the model training process. You can use callbacks to trigger actions at certain points during training.

11. **`from contextlib import nullcontext`**:
    - `nullcontext`: This is used to indicate a "no-op" context manager, essentially allowing you to ignore context management in certain cases where it is not needed.

12. **`from transformers import default_data_collator, Trainer, TrainingArguments`**:
    - `default_data_collator`: This is used for batching and preparing data before feeding it into the model during training.
    - `Trainer`: A high-level API provided by the `transformers` library that handles the training loop, including model optimization, evaluation, and saving.
    - `TrainingArguments`: A configuration class for specifying various training options such as learning rate, batch size, number of epochs, etc.

13. **`from snowflake.snowpark.session import Session`**:
    - `Session`: This is used to establish a connection to Snowflake via Snowpark. Snowpark is a set of libraries and runtimes that lets you build and run data pipelines directly in Snowflake using Python.

14. **`from snowflake.snowpark import VERSION`**:
    - `VERSION`: Retrieves the version of Snowpark that you are using, useful for compatibility checks or logging.

15. **`import snowflake.snowpark.functions as F`**:
    - This imports functions from Snowpark that allow you to perform SQL-like operations within Snowflake from your Python code. It typically includes functions for data manipulation and querying.

16. **`from snowflake.ml.registry import model_registry`**:
    - `model_registry`: Used to register and track machine learning models within Snowflake. It enables version control and tracking of models for deployment and reproducibility.

17. **`from snowflake.ml.model import deploy_platforms`**:
    - `deploy_platforms`: This allows for deploying machine learning models to different platforms within Snowflake for use in production environments.

18. **`from snowflake.ml.model.models import llm`**:
    - `llm`: Refers to large language models (LLMs) in Snowflake’s ML platform. This is likely related to deploying or managing such models within Snowflake.

19. **`import logging`**:
    - Imports Python’s logging module, which is used to configure and manage log messages for better tracking and debugging.

20. **`logger = logging.getLogger("snowflake.snowpark.session")` & `logger.setLevel(logging.ERROR)`**:
    - These lines create a logger specifically for Snowflake's Snowpark session and set its log level to `ERROR`, meaning it will only log error messages (and not warnings or info-level messages).

21. **`logger = logging.getLogger("snowflake.ml")` & `logger.setLevel(logging.ERROR)`**:
    - Similarly, this creates a logger for Snowflake’s machine learning components and sets it to only log errors.

In summary, this script sets up various libraries for handling data, training machine learning models, interacting with Snowflake, and managing logging. The focus is on preparing for the fine-tuning of the Gemma 2B model within the Snowflake environment.

In [None]:
!huggingface-cli login --token ""

In [None]:
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)


This code snippet involves configuring the model to use 4-bit quantization for efficient memory usage and computation during fine-tuning. Here’s a detailed explanation:

### **`from transformers import BitsAndBytesConfig`**
- This imports the `BitsAndBytesConfig` class from the `transformers` library. This class is used to configure how the model loads using reduced precision, such as 4-bit quantization, which helps save memory and computational resources.

### **`bnb_config = BitsAndBytesConfig(...)`**
- This creates a `BitsAndBytesConfig` object called `bnb_config`. The object is configured with specific parameters for loading the model in 4-bit precision.

### Configuration options:

1. **`load_in_4bit=True`**:
   - This tells the model to load using 4-bit precision instead of the standard 16-bit or 32-bit precision. By reducing the precision, it saves a significant amount of memory, making it possible to fine-tune larger models on hardware with limited memory.

2. **`bnb_4bit_quant_type="nf4"`**:
   - This sets the quantization type to **NF4** (4-bit NormalFloat), a data type introduced in the QLoRA paper. Its 16 quantization levels are spaced to match normally distributed values, which is how neural network weights are typically distributed, so it tends to lose less precision than uniformly spaced 4-bit quantization.

3. **`bnb_4bit_compute_dtype=torch.bfloat16`**:
   - This specifies the data type used for computations as **bfloat16** (Brain Floating Point 16). Bfloat16 is a precision format commonly used in machine learning because it strikes a balance between speed and accuracy, especially when running computations on GPUs.
   
4. **`bnb_4bit_use_double_quant=True`**:
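   - **Double quantization** further compresses the model by quantizing the quantization constants themselves. Block-wise quantization stores one scaling constant per block of weights; double quantization stores those constants in a lower-precision format as well. According to the QLoRA paper, this saves roughly 0.4 bits per parameter, at the cost of a small amount of extra computation when weights are dequantized.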
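
To make the idea of block-wise quantization and its scaling constant concrete, here is a minimal sketch in plain Python. It uses simple uniform absmax quantization, not NF4, and the function names and sample weights are hypothetical; `bitsandbytes` implements this very differently, on the GPU.

```python
def quantize_block(values, bits=4):
    """Uniform absmax quantization of one block: returns (integer codes, scale)."""
    levels = 2 ** (bits - 1) - 1              # 7 for signed 4-bit codes in [-7, 7]
    absmax = max(abs(v) for v in values)
    scale = absmax / levels if absmax > 0 else 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate weights from the integer codes and the block's scale."""
    return [c * scale for c in codes]

# Hypothetical block of weights, for illustration only.
weights = [0.12, -0.40, 0.03, 0.70, -0.55, 0.21, -0.08, 0.33]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale, "max error:", max_error)
```

Each block needs its own `scale`, and those per-block constants are what double quantization compresses further.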

### Purpose:
- The goal of this configuration is to reduce the memory footprint and optimize the model for faster inference and training, which is particularly useful when working with large language models like Gemma 2B.
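
As a rough illustration of the savings, the sketch below estimates the memory needed for the weights alone under different precisions. The parameter count is a round number chosen for illustration, not Gemma 2B's exact size, and the block sizes are those described in the QLoRA paper; the library's internals may differ.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: a round parameter count for illustration, not Gemma 2B's exact size.
NUM_PARAMS = 2_500_000_000

# Assumption: block sizes as described in the QLoRA paper.
BLOCK_SIZE = 64            # weights that share one quantization constant
SECOND_LEVEL_BLOCK = 256   # constants that share one second-level constant

# Overhead of the quantization constants, in bits per weight.
overhead_plain = 32 / BLOCK_SIZE                                            # 0.5
overhead_double = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_LEVEL_BLOCK)   # about 0.127

formats = [
    ("float32", 32),
    ("bfloat16", 16),
    ("4-bit", 4 + overhead_plain),
    ("4-bit + double quant", 4 + overhead_double),
]
for label, bits in formats:
    print(f"{label:<22} {bits:6.3f} bits/weight  {weight_memory_gib(NUM_PARAMS, bits):5.2f} GiB")
```

These figures cover only the stored weights; activations, LoRA gradients, and optimizer state add to the total during training.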

In [None]:
from transformers import GemmaTokenizer, AutoModelForCausalLM, AutoTokenizer

# Load the entire model on the GPU 0
device_map = {"": 0}
model_id = "google/gemma-2b"
print('loading tokenizer')
tokenizer_id = "philschmid/gemma-tokenizer-chatml"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
print('loading model')
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map=device_map)
model.config.use_cache = False
model.config.pretraining_tp = 1

Here’s a breakdown of what each part of the code does:

1. **`from transformers import GemmaTokenizer, AutoModelForCausalLM, AutoTokenizer`**:
   - Imports the necessary components for tokenizing and loading the model.

2. **`device_map = {"": 0}`**:
   - Specifies that the entire model will be loaded onto GPU 0 for efficient processing.

3. **`model_id = "google/gemma-2b"`**:
   - Sets the model identifier to load the pre-trained **Gemma 2B** model from Google.

4. **`tokenizer_id = "philschmid/gemma-tokenizer-chatml"`**:
   - Sets the tokenizer identifier to a Gemma tokenizer that includes a ChatML chat template, so prompts can be formatted with `<|im_start|>` and `<|im_end|>` markers.

5. **`tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)`**:
   - Loads the tokenizer from the `philschmid/gemma-tokenizer-chatml` repository on the Hugging Face Hub.

6. **`model = AutoModelForCausalLM.from_pretrained(...)`**:
   - Loads the **Gemma 2B** model for causal language modeling, with 4-bit quantization (`bnb_config`) and the model being loaded onto GPU 0.

7. **`model.config.use_cache = False`**:
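   - Disables the key/value cache. The cache only speeds up text generation; it is not needed during training and is incompatible with gradient checkpointing, which is enabled later in the training arguments.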

8. **`model.config.pretraining_tp = 1`**:
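   - Sets the `pretraining_tp` configuration attribute to 1. This setting comes from Llama-style configurations, where it records the tensor parallelism degree used during pre-training; a value of 1 selects the standard linear-layer computation. It appears to be carried over from Llama fine-tuning scripts and likely has no effect on Gemma.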

In [None]:
from datasets import load_dataset

dataset_name = "lucasmccabe-lmi/CodeAlpaca-20k"
dataset = load_dataset(dataset_name, split="train")

In [None]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
import torch
from transformers import Conv1D
import bitsandbytes as bnb

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            # model-specific
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)

target = find_all_linear_names(model)
print(target)

Here’s an explanation of what this code does:

### **`import torch`** and **`from transformers import Conv1D`**:
- These import PyTorch and the 1D convolutional layer (`Conv1D`) from the `transformers` library, though they are not directly used in this snippet.

### **`import bitsandbytes as bnb`**:
- This imports the `bitsandbytes` library (as `bnb`), which is used for handling 4-bit quantization and memory-efficient operations.

### **`find_all_linear_names(model)` function**:
This function finds the names of all layers in the model that are of type `bnb.nn.Linear4bit` (quantized linear layers).

- **`lora_module_names = set()`**:
   - Initializes an empty set to store the names of layers.

- **`for name, module in model.named_modules()`**:
   - Iterates over all the modules (layers) in the model. For each module, it checks if it is an instance of `bnb.nn.Linear4bit`.

- **`if isinstance(module, bnb.nn.Linear4bit)`**:
   - If the current module is a 4-bit quantized linear layer (`Linear4bit`), the name of the module is processed.

- **`names = name.split(".")`**:
   - Splits the module name by dots (`.`) to handle hierarchical naming (e.g., `layer.0.linear`).

- **`lora_module_names.add(names[0] if len(names) == 1 else names[-1])`**:
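   - Adds the last component of the split name (for example, `linear` in `layer.0.linear`) to the `lora_module_names` set. When the name has a single component, the first and last components are the same, so both branches of the conditional store the final component.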

- **`if "lm_head" in lora_module_names:`**:
   - If `"lm_head"` (the output layer of the model) is in the set, it is removed because it's not needed for the fine-tuning process in some contexts (like 16-bit training).

- **`return list(lora_module_names)`**:
   - Returns the names of the layers that are quantized 4-bit linear layers as a list.

### **`target = find_all_linear_names(model)`**:
- Calls the function on the `model` (which was defined earlier) to find all relevant linear layer names.

### **`print(target)`**:
- Prints the list of identified module names that are 4-bit linear layers.

This function helps identify specific layers in the model that use 4-bit quantization, which could be useful for further customization, such as applying parameter-efficient fine-tuning techniques (like LoRA).
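
The name handling in `find_all_linear_names` can be tried out without loading a model. The sketch below applies the same logic to a list of module names; the names are hypothetical examples in the style of `model.named_modules()`, and `lora_target_names` is an illustrative helper, not part of `peft` or `bitsandbytes`.

```python
def lora_target_names(module_names, excluded=("lm_head",)):
    """Collect the last path component of each module name, skipping excluded ones."""
    targets = {name.split(".")[-1] for name in module_names}
    return sorted(targets - set(excluded))

# Hypothetical names of quantized linear modules, for illustration only.
quantized_linear_modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.1.self_attn.q_proj",
    "lm_head",
]

print(lora_target_names(quantized_linear_modules))
# → ['down_proj', 'k_proj', 'q_proj']
```

Because only the last component is kept, the same projection in every layer collapses to a single entry, which is the form `target_modules` expects.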

In [None]:
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load LoRA configuration
peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=64,
        bias="none",
        target_modules=target,
        task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="gemma-2b-coder", # directory to save and repository id
    num_train_epochs=1,                     # number of training epochs
    per_device_train_batch_size=1,          # batch size per device during training
    gradient_accumulation_steps=1,          # number of batches to accumulate gradients over before each optimizer update
    gradient_checkpointing=True,            # use gradient checkpointing to save memory
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=100,                      # log every 100 steps
    save_strategy="epoch",                  # save checkpoint every epoch
    bf16=True,                              # use bfloat16 precision
    tf32=True,                              # use tf32 precision
    learning_rate=2e-4,                     # learning rate, based on QLoRA paper
    max_grad_norm=0.3,                      # max gradient norm based on QLoRA paper
    warmup_ratio=0.03,                      # warmup ratio based on QLoRA paper
    lr_scheduler_type="constant",           # use constant learning rate scheduler
    push_to_hub=False,                      # do not push model to hub
    report_to="tensorboard",                # report metrics to tensorboard
)

Here’s a breakdown of what this code does:

### **`from peft import LoraConfig, PeftModel`**
- Imports the **LoRA (Low-Rank Adaptation)** configuration class `LoraConfig` and `PeftModel` for parameter-efficient fine-tuning, allowing selective fine-tuning of large models.

### **`from trl import SFTTrainer`**
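- Imports `SFTTrainer`, which is used for **supervised fine-tuning (SFT)** of transformer models. SFT trains directly on example prompts and responses; it is often the first stage before reinforcement learning, but no reinforcement learning is used in this tutorial.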

### **Loading LoRA Configuration (`LoraConfig`)**:
The LoRA configuration specifies how fine-tuning will be applied to the model using low-rank adaptation.

- **`lora_alpha=16`**: This is a scaling factor for the LoRA layers. The low-rank update is multiplied by `lora_alpha / r` before being added to the frozen weights, so with `r=64` the update is scaled by 0.25.
  
- **`lora_dropout=0.05`**: Specifies a dropout rate of 5%, helping to regularize the training and prevent overfitting.

- **`r=64`**: Defines the rank of the low-rank matrices used in LoRA. A higher rank means more trainable parameters but requires more resources.

- **`bias="none"`**: Indicates that no additional bias parameters will be updated during fine-tuning.

- **`target_modules=target`**: Sets the specific modules that will be targeted for LoRA-based fine-tuning, which were identified earlier with `find_all_linear_names()`.

- **`task_type="CAUSAL_LM"`**: Specifies that the task is causal language modeling, which is suitable for models like Gemma 2B, which generate text based on prior context.
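
To see why LoRA is parameter-efficient, the following sketch counts the trainable parameters that rank-`r` adapters add to a single linear layer and computes the scaling factor. The layer dimensions are hypothetical and chosen only for illustration; they are not taken from the Gemma 2B configuration.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / r

# Hypothetical layer shape, for illustration only.
d_in, d_out = 2048, 2048
r, lora_alpha = 64, 16

full = d_in * d_out
added = lora_param_count(d_in, d_out, r)

print(f"frozen weights : {full:,}")
print(f"LoRA weights   : {added:,} ({added / full:.2%} of the layer)")
print(f"scaling factor : {lora_scaling(lora_alpha, r)}")
```

Lowering `r` shrinks the adapter proportionally, while `lora_alpha` changes only how strongly the update is applied.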

### **Training Arguments (`TrainingArguments`)**:
The `TrainingArguments` object configures the model training process with various parameters.

- **`output_dir="gemma-2b-coder"`**: Specifies the directory where the model checkpoints and other outputs will be saved.

- **`num_train_epochs=1`**: Sets the number of training epochs to 1. The model will go through the entire dataset once.

- **`per_device_train_batch_size=1`**: Specifies that each training batch will contain just 1 sample per device (such as a GPU).

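- **`gradient_accumulation_steps=1`**: Gradients are accumulated over this many batches before each optimizer update. A value of 1 means the optimizer updates after every batch, so no accumulation takes place; larger values simulate a larger batch size without using more memory.

- **`gradient_checkpointing=True`**: Saves memory during training by storing only a subset of the intermediate activations and recomputing the rest during the backward pass, at the cost of extra computation.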

- **`optim="adamw_torch_fused"`**: Uses the AdamW optimizer with fused kernels for faster training on modern GPUs.

- **`logging_steps=100`**: Logs training progress every 100 steps, providing insights into the training metrics.

- **`save_strategy="epoch"`**: Saves model checkpoints at the end of every epoch.

- **`bf16=True`**: Enables bfloat16 (Brain Floating Point 16) precision for faster computation and reduced memory usage on compatible hardware.

- **`tf32=True`**: Enables TensorFloat32 precision, which improves speed on certain operations while maintaining accuracy on modern NVIDIA GPUs.

- **`learning_rate=2e-4`**: Sets the learning rate for the optimizer to `2e-4`, based on recommendations from the QLoRA paper.

- **`max_grad_norm=0.3`**: Clips the gradients to a maximum value of 0.3, also based on QLoRA’s fine-tuning practices.

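- **`warmup_ratio=0.03`**: Requests that 3% of the training steps be used for learning rate warmup, gradually increasing the learning rate at the start of training. Whether warmup is applied depends on the scheduler.

- **`lr_scheduler_type="constant"`**: Uses a constant learning rate throughout training. In the `transformers` library, the plain `"constant"` schedule does not include a warmup phase, so `warmup_ratio` likely has no effect here; `"constant_with_warmup"` is the variant that combines the two.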

- **`push_to_hub=False`**: This indicates that the model won’t be pushed to Hugging Face’s Model Hub.

- **`report_to="tensorboard"`**: Reports training metrics to TensorBoard for real-time visualization and tracking.

### Purpose:
This configuration prepares the model for fine-tuning using LoRA (for parameter-efficient adaptation) with careful management of memory and computation. The training will be logged to TensorBoard, using both bfloat16 and tf32 precision for optimal performance on GPUs.
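
The batch-related settings determine how many optimizer updates one epoch takes. The sketch below works through that arithmetic; the dataset size is a round number used for illustration rather than the exact size of the dataset, and the helper is hypothetical and only approximates how the trainer counts steps.

```python
import math

def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              grad_accum_steps, num_devices=1):
    """Return (approximate optimizer steps per epoch, effective batch size)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return math.ceil(num_examples / effective_batch), effective_batch

# Assumption: a round dataset size for illustration.
NUM_EXAMPLES = 20_000

steps, effective = optimizer_steps_per_epoch(NUM_EXAMPLES, 1, 1)
print(f"effective batch size: {effective}, steps per epoch: {steps}")
print(f"log entries with logging_steps=100: {steps // 100}")

# A larger effective batch through accumulation means fewer, less noisy updates.
steps_acc, effective_acc = optimizer_steps_per_epoch(NUM_EXAMPLES, 4, 8)
print(f"effective batch size: {effective_acc}, steps per epoch: {steps_acc}")
```

With a batch size of 1 and no accumulation, every example triggers its own update, which keeps memory use low but makes the gradient estimates noisy.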

In [None]:
from trl import DataCollatorForCompletionOnlyLM

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts


response_template = "\n ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)


In [None]:
from trl import SFTTrainer

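max_seq_length = 1512 # max sequence length in tokens; longer examples are truncated

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    max_seq_length=max_seq_length,  # apply the truncation limit defined above
    tokenizer=tokenizer,            # use the same tokenizer as the data collator
    args=args,
    packing=False,                  # must stay off with the completion-only collator
)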


Here’s an explanation of what this code does:

### **`from trl import SFTTrainer`**
- This imports the `SFTTrainer` class, which is used for supervised fine-tuning of transformer models. This class handles the training loop, optimization, and logging.

### **`max_seq_length = 1512`**
- Sets the maximum sequence length to **1512 tokens**. When this value is passed to the trainer, input sequences longer than it are truncated. Shorter sequences are not padded to the full length; the data collator pads each batch only to its longest sequence. The limit bounds the memory used per example.

### **Creating the Trainer (`SFTTrainer`)**:
The `trainer` object is instantiated with various parameters needed for training.

- **`model=model`**:
  - Passes the model (previously loaded and configured) that will be fine-tuned.

- **`train_dataset=dataset`**:
  - Specifies the dataset to be used for training. The `dataset` variable should contain the processed data suitable for the model.

- **`peft_config=peft_config`**:
  - Passes the LoRA configuration (`peft_config`) defined earlier, which tells the trainer how to apply low-rank adaptation during training.

- **`formatting_func=formatting_prompts_func`**:
  - Specifies the formatting function (`formatting_prompts_func`) defined in the previous cell. It turns each batch of examples into strings of the form `### Question: ...` followed by `### Answer: ...`, which is the text the model is trained on.

- **`data_collator=collator`**:
  - Passes the data collator (`collator`), which combines individual samples into batches. Because it is a `DataCollatorForCompletionOnlyLM`, it also masks every token up to and including the `### Answer:` template, so the loss is computed only on the answer.

- **`args=args`**:
  - Provides the training arguments (`args`), which were defined earlier, to configure various aspects of the training process (e.g., batch size, learning rate, logging frequency).

- **`packing=False`**:
  - Indicates that data packing is turned off. Packing concatenates several short examples into one sequence of up to the maximum sequence length to reduce padding. It stays off here because the completion-only data collator is not compatible with packing: it needs each example in its own sequence to locate the response template.

### Purpose:
This code sets up the **SFTTrainer** with the specified model, dataset, and configurations, preparing it for the supervised fine-tuning process. The `max_seq_length` helps ensure that the input data fits within the model's capabilities, while the other parameters customize the training process according to the needs of the task and the architecture being used.
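
The interplay between the formatting function and the completion-only collator can be sketched without any libraries. The example below formats one instruction and answer pair and then masks everything up to the response template. It uses a toy whitespace "tokenizer" and hypothetical helper names; the real collator works on token IDs and is more involved.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def format_batch(batch):
    """Same string layout as formatting_prompts_func, for a batch of columns."""
    return [
        f"### Question: {question}\n ### Answer: {answer}"
        for question, answer in zip(batch["instruction"], batch["output"])
    ]

def completion_only_labels(tokens, template):
    """Replace every token up to and including the response template with IGNORE_INDEX."""
    n = len(template)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == template:
            answer_start = start + n
            return [IGNORE_INDEX] * answer_start + tokens[answer_start:]
    # Template not found: in this sketch the whole example is masked.
    return [IGNORE_INDEX] * len(tokens)

batch = {"instruction": ["Reverse a list in Python."], "output": ["xs[::-1]"]}
text = format_batch(batch)[0]
tokens = text.split()  # toy whitespace tokenizer, for illustration only
labels = completion_only_labels(tokens, ["###", "Answer:"])

print(text)
print(labels)
```

Only the answer tokens keep their labels, so the model is not trained to reproduce the question text.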

In [None]:
with torch.no_grad():
    torch.cuda.empty_cache()

# start training; checkpoints are saved to the output directory (nothing is pushed to the hub because push_to_hub=False)
trainer.train()

# save model
trainer.save_model()

Here’s an explanation of what this code does:

### **`with torch.no_grad():`**
- This statement creates a context manager that disables gradient calculation, which normally matters during inference or evaluation because it reduces memory consumption and speeds up computation. Here it only wraps `torch.cuda.empty_cache()`, which performs no tensor operations, so the context manager has no practical effect and the call could stand on its own.

### **`torch.cuda.empty_cache()`**
- This command releases unused memory held by PyTorch's CUDA caching allocator. PyTorch caches GPU memory to speed up allocations, which can lead to fragmentation over time. Calling `empty_cache()` does not free memory occupied by live tensors, but it returns cached, unused blocks to the GPU, which can help before a memory-intensive step such as training.

### **`trainer.train()`**
- This line starts the training process using the previously configured `trainer` object. During this step, the model will:
  - Process the training dataset.
  - Apply the defined fine-tuning techniques (including LoRA).
  - Log metrics according to the specified logging strategy.
  - Automatically save model checkpoints to the output directory specified in the `TrainingArguments`.

### **`trainer.save_model()`**
- After training is complete, this line saves the final model weights and configuration. The model will be saved in the specified output directory (`"gemma-2b-coder"`), allowing for later use or deployment. Depending on the setup, this may also push the model to a model hub if configured to do so.

### Purpose:
Overall, this code snippet efficiently manages GPU memory during training and initiates the training process with the specified parameters. After training is complete, it ensures that the trained model is saved for future use.

In [None]:
# Save trained model
new_model = "FinetunedModel"
trainer.model.save_pretrained(new_model)

In [None]:
!ls

In [None]:
SNOWFLAKE_DATABASE  = os.getenv("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA    = os.getenv("SNOWFLAKE_SCHEMA")
SNOWFLAKE_WAREHOUSE = 'CONTAINER_RUNTIME_WH'
MODEL_NAME = "FineTunedGemma"
MODEL_VERSION = "FineTunedV1"
DEPLOYMENT_NAME = "FINETUNED_Gemma"

In [None]:
from snowflake.snowpark.session import Session

In [None]:
# Read the login token supplied automatically by Snowflake. These tokens are short lived and should always be read right before creating any new connection.
def get_login_token():
  with open("/snowflake/session/token", "r") as f:
    return f.read()

# Construct Snowflake connection params from environment variables.
def get_connection_params():
  return {
    "account": os.getenv("SNOWFLAKE_ACCOUNT"),
    "host": os.getenv("SNOWFLAKE_HOST"),
    "warehouse": SNOWFLAKE_WAREHOUSE,
    "database": SNOWFLAKE_DATABASE,
    "schema": SNOWFLAKE_SCHEMA,
    "authenticator": "oauth",
    "token": get_login_token()
  }

# Create Snowflake Session object
session = Session.builder.configs(get_connection_params()).create()
session.sql_simplifier_enabled = True
snowpark_version = VERSION

# Current Environment Details
print('Role                        : {}'.format(session.get_current_role()))
print('Database                    : {}'.format(session.get_current_database()))
print('Schema                      : {}'.format(session.get_current_schema()))
print('Warehouse                   : {}'.format(session.get_current_warehouse()))
print('Snowpark for Python version : {}.{}.{}'.format(snowpark_version[0],snowpark_version[1],snowpark_version[2]))

In [None]:
SNOWFLAKE_SCHEMA

In [None]:
from transformers import pipeline

pipe = pipeline(task="text-generation", model="FinetunedModel", tokenizer=tokenizer, max_length=200)

In [None]:
eos_token = tokenizer("<|im_end|>",add_special_tokens=False)["input_ids"][0]

def test_inference(prompt):
    prompt = pipe.tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=False, add_generation_prompt=True)
    outputs = pipe(prompt, max_new_tokens=100, do_sample=True, temperature=0.1, top_k=50, top_p=0.95, eos_token_id=eos_token)
    return outputs[0]['generated_text'][len(prompt):].strip()
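
The prompt handling in `test_inference` can be illustrated without loading the model. The sketch below assumes the tokenizer's chat template follows the usual ChatML layout; the exact template shipped with the tokenizer may differ, and the generated text is a made-up stand-in for real pipeline output.

```python
def chatml_prompt(user_message):
    """Build a single-turn ChatML prompt that ends where the assistant's reply begins."""
    return f"<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant\n"

def extract_reply(generated_text, prompt):
    """Drop the echoed prompt and surrounding whitespace from the generated text."""
    return generated_text[len(prompt):].strip()

prompt = chatml_prompt("Write a Python function that adds two numbers.")

# Hypothetical model output: the pipeline returns the prompt followed by the completion.
generated = prompt + "def add(a, b):\n    return a + b\n"

print(extract_reply(generated, prompt))
```

Slicing by `len(prompt)` works because the text-generation pipeline returns the prompt and the completion together by default.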

In [None]:
import streamlit as st

st.title("CodingAssistant")

question = st.text_input("Enter Question", label_visibility="collapsed")


if question:
    st.markdown(test_inference(question))