# Google Cloud Code ChatBot with Grounding

## Overview
In this tutorial, we'll build a chatbot that uses a data store of Snakemake-related websites as its source of information. Grounding in this context means ensuring that the model's responses are based strictly on the information available on those websites. The data store indexes publicly available websites with a web crawler; it lets you specify which domains to include and configure search or recommendation features over the collected content. Grounding is a component of retrieval-augmented generation (RAG), but in this example it is applied more strictly: the model treats its own prior knowledge as possibly inaccurate and relies on the data store for its responses. This is how RAG improves the quality and accuracy of generated text: by incorporating relevant information from an external knowledge source. For additional details on Agent Builder grounding, please refer to the __[GCP_Grounding](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/notebooks/GenAI/GCP_Grounding.ipynb)__ tutorial.
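The grounding idea can be illustrated with a toy example: first look up passages in a document collection, then answer only from what was found, and decline when nothing relevant is retrieved. The sketch below is a hypothetical, simplified illustration based on naive keyword overlap; it is not how Vertex AI retrieves documents or grounds responses, and the page URLs and text in `PAGES` are made up.

```python
import re

# Made-up stand-in for a website data store: page URL -> page text.
PAGES = {
    "example.org/snakemake-rules": "A Snakemake rule declares input, output and shell directives.",
    "example.org/rna-seq": "STAR aligns RNA-seq reads inside a Snakemake workflow.",
}

# Very common words that should not count as evidence of relevance.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "what", "does", "do", "in"}


def tokenize(text: str) -> set[str]:
    """Return lower-case word tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def grounded_answer(question: str, pages: dict[str, str]) -> str:
    """Answer only from the best-matching page, or decline if none match."""
    query = tokenize(question)
    # Score every page by the number of question words it shares.
    scored = [(len(query & tokenize(text)), url, text) for url, text in pages.items()]
    hits = [hit for hit in scored if hit[0] > 0]
    if not hits:
        return "I could not find this in the data store."
    _, url, text = max(hits)
    # Return the retrieved text together with its source, as a citation.
    return f"{text} [source: {url}]"


print(grounded_answer("What directives does a Snakemake rule have?", PAGES))
```

The key design point is the refusal branch: a grounded system prefers "not found" over answering from general knowledge when the data source has nothing relevant.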

## Learning objectives
- Learn to create a Website data store.
- Learn to use Vertex AI grounding.
                       
## Prerequisites
You must have enabled the Vertex AI, Compute Engine, and Agent Builder APIs.

## Pricing
Estimated with the __[Google Cloud Pricing Calculator](https://cloud.google.com/products/calculator?hl=en)__: about $0.80 per month, based on:
- Search Enterprise Edition: 100 requests per month
- Search LLM Add-On: 100 requests per month
- Data Index: 5 GiB indexed per month

## Get started


The first step is to create the data store. Click __'CREATE DATA STORE'__ to proceed.

Since we will use Snakemake URLs to extract information for our application, select __'Website Content'__.

![website](../../images/7_chatbot_grounding.jpeg)

Selecting __'Website Content'__ prompts you to enter a list of URLs to include and any sites to exclude. Entering the 'https://' prefix is not needed. To include every page under a path, append '__/*__' to the end of the URL. In our case, we include the following pages:
```
evodify.com/rna-seq-star-snakemake/*
github.com/twbattaglia/RNAseq-workflow/*
snakemake.readthedocs.io/en/stable/*
snakemake.readthedocs.io/en/v4.5.0/*
www.bioinformatics.babraham.ac.uk/training/Advanced_Python_Manual.docx
www.cd-genomics.com/genomics.html/*
www.cd-genomics.com/rna-seq-transcriptome.html/*
```
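To reason about which pages a pattern list like the one above covers, the matching can be approximated locally with Python's `fnmatch` module, where `*` matches any run of characters. This is only a rough model we are assuming for illustration: the data store's crawler applies its own matching rules, which may differ (for example, in how it handles case or query strings), and `is_included` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Include patterns from the list above (no scheme, '*' as the wildcard).
INCLUDE_PATTERNS = (
    "evodify.com/rna-seq-star-snakemake/*",
    "github.com/twbattaglia/RNAseq-workflow/*",
    "snakemake.readthedocs.io/en/stable/*",
    "snakemake.readthedocs.io/en/v4.5.0/*",
    "www.bioinformatics.babraham.ac.uk/training/Advanced_Python_Manual.docx",
    "www.cd-genomics.com/genomics.html/*",
    "www.cd-genomics.com/rna-seq-transcriptome.html/*",
)


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so URLs look like the patterns."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def is_included(url: str, patterns: tuple[str, ...] = INCLUDE_PATTERNS) -> bool:
    """Approximate check: '*' matches any run of characters, including '/'."""
    target = strip_scheme(url)
    return any(fnmatchcase(target, pattern) for pattern in patterns)
```

For example, under this approximation a page under `snakemake.readthedocs.io/en/stable/` is included, while one under `/en/latest/` is not, and the `.docx` entry matches only that single file because it has no trailing `/*`.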
__Note:__ Make sure that 'Advanced website indexing' is __unchecked__. This option is intended for webpages owned by the user; if it is enabled for public webpages, you may encounter issues with grounding later on.

![website_settings](../../images/8_chatbot_grounding.png)

As the final step, you need to assign a name to the data store and click __"CREATE"__.

![datastore_name](../../images/9_chatbot_grounding.jpeg)

Be sure to copy and save the __Data Store ID__, as it will be needed later to configure the grounding tool.

![datastore_id](../../images/10_chatbot_grounding.jpeg)

Once the data store is created, the next step is to set up the chatbot. Click the three-line "hamburger" icon in the upper-left corner of the window to view all available GCP products, select __'Vertex AI'__, and then, under Vertex AI Studio, select __'Create Prompt'__.

![chat](../../images/12_chatbot_grounding.jpeg)

Vertex AI Studio's Chat shows a preview of a few prompt templates; you can view the rest by browsing the prompt gallery. You can also edit the chat's system instructions, which let you guide the model's behavior according to your specific needs and use cases. A system instruction gives the model extra context, enabling it to better understand the task, deliver more tailored responses, and follow specific guidelines throughout the entire interaction. To edit it, click __Edit__ in the top-right corner.

![conversation](../../images/13_chatbot_grounding.jpeg)

Next, select the parameters you want the model to use: turn on the __'Ground model responses'__ toggle and then click __'Customize'__. To learn more about grounding, you can review the __[GCP_Grounding](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/notebooks/GenAI/GCP_Grounding.ipynb)__ tutorial.

![parameters](../../images/14_chatbot_grounding.jpeg)

Select __'Vertex AI search'__ as the grounding source. In the __'Vertex AI datastore path'__ field, enter the project ID, location, collection, and data store ID in this format: __'projects/{PROJECT_ID}/locations/global/collections/default_collection/dataStores/{DATA_STORE_ID}'__. Once you've entered the required information, click __Save__.
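If you prefer to assemble the path programmatically rather than by hand, a small helper such as the hypothetical `datastore_path` below fills in the placeholders of the format above. The defaults `global` and `default_collection` come from that format; the IDs in the usage line are placeholders for your own project ID and the Data Store ID you saved earlier.

```python
def datastore_path(project_id: str, data_store_id: str,
                   location: str = "global",
                   collection: str = "default_collection") -> str:
    """Build a Vertex AI datastore path from its individual parts."""
    parts = {"project_id": project_id, "location": location,
             "collection": collection, "data_store_id": data_store_id}
    # Each part becomes one path segment, so it must be non-empty and contain no '/'.
    for name, value in parts.items():
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty string without '/': {value!r}")
    return (f"projects/{project_id}/locations/{location}"
            f"/collections/{collection}/dataStores/{data_store_id}")


# Placeholder IDs; replace them with your own project ID and Data Store ID.
print(datastore_path("my-project-id", "my-data-store-id"))
```

Rejecting empty values and embedded slashes early catches the most common copy-and-paste slips before the path reaches the console.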

If you submit a prompt to the chatbot and receive an error, the Vertex AI datastore path is most likely incorrect. Once the path is correct and the chatbot is connected to the data store, you can prompt it with questions about the domain information stored in the data store.
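Because a wrong path is the most likely cause of such errors, it can help to check the string before pasting it. The sketch below, using the hypothetical name `check_datastore_path`, compares a path against the format used in this tutorial and lists what looks wrong. It only checks the shape of the string; it cannot tell whether the project or data store actually exists.

```python
import re

# Shape of the path used in this tutorial; each named group is one segment.
PATH_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/collections/(?P<collection>[^/]+)"
    r"/dataStores/(?P<data_store>[^/]+)"
)


def check_datastore_path(path: str) -> list[str]:
    """Return a list of likely problems with a datastore path (empty if none)."""
    problems = []
    if path != path.strip():
        problems.append("path has leading or trailing whitespace")
    match = PATH_RE.fullmatch(path.strip())
    if match is None:
        problems.append("path does not follow projects/{PROJECT_ID}/locations/"
                        "{LOCATION}/collections/{COLLECTION}/dataStores/{DATA_STORE_ID}")
        return problems
    # This tutorial's data store uses the global location and default collection.
    if match["location"] != "global":
        problems.append(f"location is {match['location']!r}, expected 'global'")
    if match["collection"] != "default_collection":
        problems.append(f"collection is {match['collection']!r}, expected 'default_collection'")
    return problems
```

Typical slips this flags include a leading `/`, a lower-case `datastores` segment, and stray whitespace copied along with the ID.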

In our first example, we asked the chatbot to convert the following short Python script into Snakemake code:

```python
from collections import Counter

def count_nucleotides(dna_sequence):
    """Count the frequency of each nucleotide in a DNA sequence."""
    return Counter(dna_sequence)

dna = "ATGCATGCATGCATGC"

counts = count_nucleotides(dna)
print(counts)
```


![prompt](../../images/16_chatbot_grounding.png)
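Snakemake rules connect input files to output files, so a script like this one is usually easier to convert when it reads its data from a file and writes its result to one, rather than hard-coding the sequence and printing. A minimal sketch of that refactoring follows; the plain-text input, the JSON output, and the `count_file` name are our own assumptions, not part of the original script or of the chatbot's answer.

```python
import json
from collections import Counter
from pathlib import Path


def count_nucleotides(dna_sequence: str) -> Counter:
    """Count the frequency of each nucleotide in a DNA sequence."""
    return Counter(dna_sequence)


def count_file(input_path: str, output_path: str) -> dict[str, int]:
    """Read a sequence from input_path and write its counts as JSON to output_path."""
    # Drop spaces and line breaks so only sequence characters are counted.
    sequence = "".join(Path(input_path).read_text().split())
    counts = dict(sorted(count_nucleotides(sequence).items()))
    Path(output_path).write_text(json.dumps(counts))
    return counts
```

Inside a Snakemake rule, a function like `count_file` could be called from a `run:` block with the rule's `input` and `output` files, or the same logic could live in a script referenced by the `script:` directive.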

In the second example, we asked the chatbot to convert a simple one-line Bash script to Snakemake.

Prompt: Would you convert the following Bash script to Snakemake?

```bash
#!/usr/bin/env bash

fastqc -t 23 *.fastq.gz -o /path/to/output/dir && multiqc /path/to/output/dir -o /path/to/output/dir
```

![prompt02](../../images/17_chatbot_grounding.png)
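One reason this conversion is not a line-by-line translation is that Snakemake decides what to run from the output files each rule declares, whereas the Bash version simply expands `*.fastq.gz` at run time. A converted workflow therefore typically lists the FastQC reports it expects up front, so that the MultiQC step can depend on those files. The hypothetical helper below sketches that step in plain Python, assuming FastQC's usual naming convention (`sample.fastq.gz` produces `sample_fastqc.html` and `sample_fastqc.zip`).

```python
from pathlib import Path


def fastqc_outputs(fastq_dir: str, out_dir: str) -> list[str]:
    """List the reports FastQC is expected to write for each *.fastq.gz file.

    Assumes FastQC's usual naming: sample.fastq.gz -> sample_fastqc.html and
    sample_fastqc.zip, written to out_dir.
    """
    outputs = []
    for fastq in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        # Strip the double extension to recover the sample name.
        sample = fastq.name[: -len(".fastq.gz")]
        for ext in ("html", "zip"):
            outputs.append(str(Path(out_dir) / f"{sample}_fastqc.{ext}"))
    return outputs
```

In a Snakefile, the same list is normally built with Snakemake's own `glob_wildcards` and `expand` helpers rather than a custom function.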

## Conclusion

You have learned how to create a Website data store in Agent Builder (AI Applications) and how to ground model responses on it in Vertex AI Studio. You now also understand the key parameters needed to configure grounding so that the model draws accurate information from the Website data store.

## Clean Up

Please remember to delete your data store, and any search app you created, in Agent Builder to prevent incurring charges. If you used a Jupyter notebook, stop or delete it as well, and delete any other services you created, such as storage buckets.