<a href="https://colab.research.google.com/drive/1Sw9Hr-x9P8gy505TQ8gV2WND2fs8PD_4?resourcekey=0-xk7yw3YuoU02NhvJhXxClw">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Ask My Document
A Q&A service powered by AI (Vertex AI PaLM 2 model, vector DB, Embeddings API, LangChain, and Streamlit), hosted on Cloud Run.

[See it in action](https://askmydoc.app/)

## Setup

In [None]:
from IPython.display import clear_output

!pip -q install google-cloud-aiplatform==1.33.1
!pip -q install langchain==0.0.300 chromadb==0.4.12 watchdog==3.0.0
!pip -q install streamlit==1.27.0

clear_output()

# !pip show <packagename> to get information about the package

Restart the runtime: select Runtime → Restart runtime from the menu, or uncomment and run the two lines below.

In [None]:
# import os
# os.kill(os.getpid(), 9)

🔐 Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()

## Building Blocks

### Document Loaders

[WebBaseLoader](https://python.langchain.com/docs/integrations/document_loaders/web_base)

In [None]:
from langchain.document_loaders import WebBaseLoader

url = "https://cloud.google.com/blog/products/ai-machine-learning/the-rise-of-geneng-how-ai-changes-the-developer-role"
loader = WebBaseLoader(url)

data = loader.load()

data

### Text Splitters

Sample text to use

In [None]:
text="""Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.

Last year COVID-19 kept us apart. This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.

He met the Ukrainian people.

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.

Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.

Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.

They keep moving.

And the costs and the threats to America and the world keep rising.

That’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2.

The United States is a member along with 29 other nations.

It matters. American diplomacy matters. American resolve matters.

Putin’s latest attack on Ukraine was premeditated and unprovoked.


And fourth, let’s end cancer as we know it.

This is personal to me and Jill, to Kamala, and to so many of you.

Cancer is the #2 cause of death in America–second only to heart disease.

Last month, I announced our plan to supercharge
the Cancer Moonshot that President Obama asked me to lead six years ago.

Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.

More support for patients and families.

To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.

It’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.

ARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more.

A unity agenda for the nation.

We can do this.

My fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy.

In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things.

We have fought for freedom, expanded liberty, defeated totalitarianism and terror.

And built the strongest, freest, and most prosperous nation the world has ever known.

Now is the hour.

Our moment of responsibility.

Our test of resolve and conscience, of history itself.

It is in this moment that our character is formed. Our purpose is found. Our future is forged.

Well I know this nation.

We will meet the test.

To protect freedom and liberty, to expand fairness and opportunity.

We will save democracy.

As hard as these times have been, I am more optimistic about America today than I have been my whole life.

Because I see the future that is within our grasp.

Because I know there is simply nothing beyond our capacity.

We are the only nation on Earth that has always turned every crisis we have faced into an opportunity.

The only nation that can be defined by a single word: possibilities.

So on this night, in our 245th year as a nation, I have come to report on the State of the Union.

And my report is this: the State of the Union is strong—because you, the American people, are strong.

We are stronger today than we were a year ago.

And we will be stronger a year from now than we are today.

Now is our moment to meet and overcome the challenges of our time.

And we will, as one people.

One America.

The United States of America.

May God bless you all. May God protect our troops."""



Split the text and show the chunks

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

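splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],  # try paragraph breaks first, then lines, words, characters
    chunk_size=400,       # maximum characters per chunk
    chunk_overlap=100,    # maximum characters shared between neighboring chunks
    length_function=len,  # measure chunk size in characters
)
splits = splitter.split_text(text)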

for split in splits:
  print(split)
  print("=" * 200)
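To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window in plain Python. It is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on the listed separators, which this sketch ignores. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# Prints: abcdefghij, ghijklmnop, mnopqrstuv, stuvwxyz (one per line)
```

Each chunk repeats the last four characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.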

### Embeddings

Create an embedding for each chunk and show the first two.

In [None]:
from langchain.embeddings import VertexAIEmbeddings

# Input variables
project_id = "landing-zone-demo-341118" # @param {type:"string"}

embeddings_model = VertexAIEmbeddings(project=project_id)

embeddings = embeddings_model.embed_documents(splits)
# print(splits[0])
# print("=" * 100)
# print(splits[1])

from tabulate import tabulate
table_list = [
    ['Split', 'Embedding'],
    [splits[0], embeddings[0]],
    [splits[1], embeddings[1]]
]

print(tabulate(table_list[1:],headers=table_list[0], tablefmt="grid"))

# len(embeddings), len(embeddings[0])
# embeddings[0]
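Embeddings are useful because texts with similar meaning map to nearby vectors. The sketch below shows one common way to compare two vectors, cosine similarity, using only the standard library. The three-dimensional vectors are made-up toy values, not output from the Vertex AI model, and the helper name `cosine_similarity` is an assumption for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
vec_cat = [0.9, 0.1, 0.0]
vec_kitten = [0.8, 0.2, 0.1]
vec_invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(vec_cat, vec_kitten))   # close to 1: similar direction
print(cosine_similarity(vec_cat, vec_invoice))  # close to 0: unrelated direction
```

With the real data from the cell above, `cosine_similarity(embeddings[0], embeddings[1])` would compare the first two chunks.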

### Vector DB

#### Chroma

Load the embeddings into the vector DB and show the first record.

In [None]:
from langchain.vectorstores import Chroma

url_text="https://cloud.google.com/blog/products/ai-machine-learning/the-rise-of-geneng-how-ai-changes-the-developer-role"

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = WebBaseLoader(url_text).load()
split_texts = text_splitter.split_documents(docs)

store = Chroma.from_documents(split_texts, VertexAIEmbeddings(), collection_name="hello")

# store.get(include=['embeddings','documents', 'metadatas'], limit=1)

# Fetch the first record once, including both its embedding and its text.
record = store.get(include=['embeddings', 'documents'], limit=1)

from tabulate import tabulate
table_list = [
    ['embeddings', 'documents'],
    [record['embeddings'][0], record['documents'][0]],
]

print(tabulate(table_list[1:],headers=table_list[0], tablefmt="grid"))
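Conceptually, a vector DB stores (embedding, document) pairs and returns the documents whose embeddings are closest to a query embedding. The sketch below is a hypothetical in-memory stand-in, not the Chroma API; it assumes squared Euclidean distance as the metric and uses made-up two-dimensional vectors.

```python
from dataclasses import dataclass, field


@dataclass
class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector DB (not the Chroma API)."""
    records: list = field(default_factory=list)  # (embedding, document) pairs

    def add(self, embedding, document):
        self.records.append((list(embedding), document))

    def query(self, embedding, k=1):
        """Return the k documents whose embeddings are nearest to `embedding`."""
        def squared_l2(other):
            return sum((x - y) ** 2 for x, y in zip(embedding, other))

        ranked = sorted(self.records, key=lambda record: squared_l2(record[0]))
        return [document for _, document in ranked[:k]]


demo_store = TinyVectorStore()
demo_store.add([0.9, 0.1], "chunk about developers")
demo_store.add([0.1, 0.9], "chunk about billing")
demo_store.add([0.7, 0.3], "chunk about code review")

nearest = demo_store.query([0.85, 0.15], k=2)
print(nearest)  # ['chunk about developers', 'chunk about code review']
```

A real vector DB adds persistence and approximate-nearest-neighbor indexes so that the lookup stays fast for millions of records; the brute-force sort here only illustrates the idea.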



#### Cloud SQL

##### Read Me

*   Since this Colab runs outside of GCP, it can only connect to a Cloud SQL instance that has public IP access enabled. (Not recommended for production.)
*   You will have to add the external IP of this Colab to the *Authorized networks* of your Cloud SQL instance.
  *   Run *!curl ipecho.net/plain* to get the external IP.






In [None]:
!pip install -q google-cloud-secret-manager==2.16.2 pgvector==0.1.8

In [None]:
from google.cloud import secretmanager

def get_from_secrets_manager(secret_name):
    client = secretmanager.SecretManagerServiceClient()

    # Reads version 1 of the secret; project_id comes from the input variables set earlier.
    name = f"projects/{project_id}/secrets/{secret_name}/versions/1"

    # Access the secret version.
    response = client.access_secret_version(request={"name": name})

    # Extract the payload.
    payload = response.payload.data.decode("UTF-8")

    return payload

In [None]:
# Get the external IP of the Colab
!curl ipecho.net/plain


In [None]:
from langchain.vectorstores.pgvector import PGVector
import uuid

CONNECTION_STRING = get_from_secrets_manager("pgvector")

url_text="https://cloud.google.com/blog/products/ai-machine-learning/the-rise-of-geneng-how-ai-changes-the-developer-role"

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
docs = WebBaseLoader(url_text).load()
split_texts = text_splitter.split_documents(docs)

embeddings = VertexAIEmbeddings()

# Create a vectorstore from documents
store = PGVector(
    collection_name="demo-" + str(uuid.uuid4()),
    connection_string=CONNECTION_STRING,
    embedding_function=embeddings,
)

# Add documents to the vectorstore
store.add_documents(split_texts)
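The `pgvector` secret read above has to contain a connection string that `PGVector` can hand to SQLAlchemy. The notebook does not show the secret's contents, so the following is only a sketch of what such a string might look like, assuming the `psycopg2` driver and the default PostgreSQL port. The helper name, user, password, host, and database are all hypothetical placeholders.

```python
from urllib.parse import quote_plus


def build_pg_connection_string(user, password, host, database,
                               port=5432, driver="psycopg2"):
    """Build a SQLAlchemy-style PostgreSQL URL, escaping special characters."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values only; 203.0.113.10 is a documentation-range IP address.
example = build_pg_connection_string(
    user="demo_user",
    password="p@ss/word",
    host="203.0.113.10",
    database="vectors",
)
print(example)
# postgresql+psycopg2://demo_user:p%40ss%2Fword@203.0.113.10:5432/vectors
```

Escaping matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.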

## Run on Streamlit

Clone the [GitHub repo](https://github.com/UriKatsirPrivate/askmydoc-colab)

In [None]:
!git clone https://github.com/UriKatsirPrivate/askmydoc-colab.git

Input variables

In [None]:
# Input variables
project_id = "landing-zone-demo-341118" # @param {type:"string"}
region = "us-central1" # @param {type:"string"}
model_name = "text-bison" # @param ["text-bison", "text-bison-32k", "code-bison", "code-bison-32k"]
max_output_tokens = 1024 # @param {type:"number"}
temperature = 0.1 # @param {type:"slider", min:0, max:1, step:0.1}
top_p = 0.8 # @param {type:"slider", min:0, max:1, step:0.1}
top_k = 40 # @param {type:"slider", min:1, max:40, step:1}

Install localtunnel

In [None]:
!npm install -q localtunnel

Run the Service

In [None]:
!streamlit run /content/askmydoc-colab/app.py {project_id}  \
                                                {region}  \
                                                {model_name}  \
                                                {max_output_tokens} \
                                                {temperature} \
                                                {top_p} \
                                                {top_k} \
                                               &>/content/logs.txt & curl ipv4.icanhazip.com \
                                               & npx localtunnel --port 8501


# !streamlit run /content/askmydoc-colab/app.py "landing-zone-demo-341118"  \
#                                                 {region}  \
#                                                 "text-bison"  \
#                                                 1024 \
#                                                 0.1 \
#                                                 0.8 \
#                                                 40 \
#                                                &>/content/logs.txt & curl ipv4.icanhazip.com \
#                                                & npx localtunnel --port 8501
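The command above passes the seven input variables to the script as positional arguments, which a Python script can read from `sys.argv`. How `app.py` in the repo actually reads them is not shown here, so the following is only a sketch of one way a script could convert them into typed settings. The names `AppSettings` and `parse_settings` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppSettings:
    project_id: str
    region: str
    model_name: str
    max_output_tokens: int
    temperature: float
    top_p: float
    top_k: int


def parse_settings(argv):
    """Convert seven positional string arguments into typed settings."""
    if len(argv) != 7:
        raise ValueError(f"expected 7 arguments, got {len(argv)}")
    project_id, region, model_name, max_tokens, temperature, top_p, top_k = argv
    return AppSettings(
        project_id=project_id,
        region=region,
        model_name=model_name,
        max_output_tokens=int(max_tokens),
        temperature=float(temperature),
        top_p=float(top_p),
        top_k=int(top_k),
    )


# Same order as the streamlit command; "my-project" is a placeholder.
settings = parse_settings(
    ["my-project", "us-central1", "text-bison", "1024", "0.1", "0.8", "40"]
)
print(settings)
```

Inside a script, the call would be `parse_settings(sys.argv[1:])` after `import sys`, since `sys.argv[0]` holds the script path.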

## Deploy to Cloud Run

### Prerequisites

#### GCP
1.   [IAM service account](https://cloud.google.com/iam/docs/service-accounts-create#creating) with the Cloud Run Invoker and Vertex AI User roles

2.  [Artifact Registry Docker repo](https://cloud.google.com/artifact-registry/docs/repositories/create-repos#create-console) (Standard)




#### For this Colab

1.   Grant the *Editor* role to the user you authenticated as in the Setup section above.
2.   Review and modify deploy.sh
  * Replace SERVICE_ACCOUNT_EMAIL with your own service account.
    * The service account should have the _Cloud Run Invoker_ and _Vertex AI User_ roles.
  * Replace ARTIFACT_REGISTRY_NAME with your own.
  * Replace GOOGLE_CLOUD_PROJECT with your own.
  * Replace SERVICE_NAME with your own.
3.   Review and modify initialization.py
  * Replace *project* with your own.

### Deploy the Service

In [None]:
%%shell

cd askmydoc-colab
chmod +x deploy.sh
sh deploy.sh

## Utilities

### Create a python file

In [None]:
# %%writefile app_sample.py

# import streamlit as st

# st.write('Hello, *World1* :sunglasses:')



### Shell Files

See these [examples](https://colab.sandbox.google.com/drive/1N7p0B-7QWEQ9TIWRgYLueW03uJgJLmka#scrollTo=i7cDqnvavT9i) and this [bash scripting lesson](https://colab.sandbox.google.com/github/PlantsAndPython/PlantsAndPython/blob/master/M_7_SCRIPTING_WITH_BASH/0_Lessons/7.1_Scripting_with_bash.ipynb)

In [None]:
# %%shell

# export GOOGLE_CLOUD_PROJECT=landing-zone-demo-341118

### Zip Folder

In [None]:
# Zip the folder to assist with folder download

# !zip -r /content/file.zip /content/askmydoc-workshop

### Print Tables

[Print Simple Tables](https://colab.sandbox.google.com/github/darrida/darrida-fastpages/blob/master/_notebooks/2020-11-23-Print%20Simple%20Tables.ipynb)