# The Covid Helpline

This notebook implements a retrieval-augmented generation (RAG) model based on a Llama dataset with information about Covid-19. The dataset from Llama is converted to embeddings using SBERT, and the RAG model is built on the "mistralai/Mistral-7B-Instruct-v0.1" model from HuggingFace.
The notebook is divided into 7 overall parts:

1) **Install and load libraries**

2) **Load dataset from Llama and create pdf**: This section loads the dataset from Llama, creates one long text with all the data, and exports the dataset to a pdf.

3) **Load pdf and split into chunks**: The pdf created in 2) is loaded into the notebook from GitHub. Furthermore, the dataset is split into chunks such that no chunk contains more tokens than allowed in the SBERT model.

4) **Create embeddings and save in ChromaDB**: All chunks created in 3) are converted to embeddings with 768 dimensions. These embeddings are stored in ChromaDB.

5) **Build the model**: The RAG is built by loading the "mistralai/Mistral-7B-Instruct-v0.1" model from HuggingFace.

6) **Prompt tuning**: Different prompts are tested and evaluated to determine which prompt results in the best response from the model.

7) **Gradio interface**: The RAG model is implemented in a Gradio interface to provide a better user experience.

## Install and load libraries

In [None]:
!pip install accelerate --q

In [None]:
%%time
!pip install pypdf --q
!pip install -qqq chromadb==0.4.10 --progress-bar off
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.299 --progress-bar off
!pip install -qqq xformers==0.0.21 --progress-bar off
!pip install -qqq sentence_transformers==2.2.2 --progress-bar off
!pip install -qqq tokenizers==0.14.0 --progress-bar off
!pip install -qqq optimum==1.13.1 --progress-bar off
!pip install -qqq auto-gptq==0.4.2 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ --progress-bar off
!pip install -qqq unstructured==0.10.16 --progress-bar off
!pip install llama-index
!pip install reportlab


In [None]:
# langchain: library for building language processing pipelines
from langchain.document_loaders import UnstructuredMarkdownLoader # loading unstructured text data in markdown format
from langchain.document_loaders import PyPDFLoader # loading text data from pdf files
from langchain.llms import HuggingFaceHub # for accessing models and datasets from HuggingFace
from langchain.text_splitter import RecursiveCharacterTextSplitter # splitting text data into chunks
from langchain.embeddings import HuggingFaceEmbeddings # working with embeddings using HuggingFace models
from langchain.vectorstores import Chroma # for managing vector stores and performing similarity search
from langchain import HuggingFacePipeline # for creating pipelines with HF models
from langchain.chains import RetrievalQA # for building the RAG system
from langchain import PromptTemplate # for creating prompt templates

from getpass import getpass # securely getting password inputs
import os # for file management
from textwrap import fill # for wrapping text to a specified width
import torch # tensor operations
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline # for loading the pretrained LLM and building the text generation pipeline

# For downloading llama datasets
from llama_index.core.llama_dataset import download_llama_dataset

# For creating pdf
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet

## Load dataset from Llama and create pdf

First, the dataset is downloaded from Llama's collection of open-source datasets. The chosen dataset contains questions and answers regarding Covid-19.

In [None]:
# Download dataset from llama
rag_dataset, documents = download_llama_dataset(
    "OriginOfCovid19Dataset", "./data"
)

In [None]:
df = rag_dataset.to_pandas()
df.head()

| | query | reference_contexts | reference_answer | reference_answer_by | query_by |
|---|---|---|---|---|---|
| 0 | What is the main focus of the article "The Ori... | [Am. J. Trop. Med. Hyg. , 103(3), 2020, pp. 95... | The main focus of the article "The Origin of C... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 1 | According to the article, what actions should ... | [Am. J. Trop. Med. Hyg. , 103(3), 2020, pp. 95... | According to the article, vigorous scientific,... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 2 | According to the context information, what war... | [In 2007, scientists studying coronaviruses wa... | Scientists studying coronaviruses warned in 20... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 3 | How are viruses different from living organism... | [In 2007, scientists studying coronaviruses wa... | Viruses are different from living organisms be... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 4 | What are some examples of animal viruses that ... | [Studying animal viruses that have previously ... | Some examples of animal viruses that have prev... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |


### Convert questions and answers into one long text

In [None]:
# Remove irrelevant columns
df = df[['query', 'reference_answer']]
df = df.rename(columns={'query': 'question',
                        'reference_answer': 'answer'})

# Create column with both question and answer
df['q_a'] = df['question'] + " " + df['answer']
df.head()

| | question | answer | q_a |
|---|---|---|---|
| 0 | What is the main focus of the article "The Ori... | The main focus of the article "The Origin of C... | What is the main focus of the article "The Ori... |
| 1 | According to the article, what actions should ... | According to the article, vigorous scientific,... | According to the article, what actions should ... |
| 2 | According to the context information, what war... | Scientists studying coronaviruses warned in 20... | According to the context information, what war... |
| 3 | How are viruses different from living organism... | Viruses are different from living organisms be... | How are viruses different from living organism... |
| 4 | What are some examples of animal viruses that ... | Some examples of animal viruses that have prev... | What are some examples of animal viruses that ... |


In [None]:
# Create one long string with all questions and answers
# Each question/answer pair is separated by [SEP] to indicate to the model that the pairs are distinct
all_text = df['q_a'].str.cat(sep=' [SEP] ')
all_text
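
The two steps above (combining each question with its answer, then joining all pairs with a separator) can be illustrated without pandas. The following is a minimal sketch on made-up question/answer pairs, not the notebook's actual data; the variable names `pairs`, `q_a`, and `joined` are only used for this illustration.

```python
# Toy question/answer pairs standing in for the real dataset (invented for illustration).
pairs = [
    ("What causes Covid-19?", "The SARS-CoV-2 virus."),
    ("Where was it first identified?", "In Wuhan, China."),
]

# Same idea as df['question'] + " " + df['answer'].
q_a = [f"{question} {answer}" for question, answer in pairs]

# Same idea as df['q_a'].str.cat(sep=' [SEP] ').
joined = " [SEP] ".join(q_a)
print(joined)
```

With two pairs there is exactly one `[SEP]` marker, since the separator is only placed between entries and not at the ends.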



### Export as pdf

The dataset from Llama is converted into a pdf and exported, so that it can later be loaded in the format expected by the embedding step.

In [None]:
def create_pdf(text, filename):
    # Create a new PDF file
    doc = SimpleDocTemplate(filename, pagesize=letter)
    # Create a style sheet
    styles = getSampleStyleSheet()
    # Create a text object
    text_obj = []
    # Split the text into paragraphs
    paragraphs = text.split('\n')
    # Add each paragraph to the text object
    for para in paragraphs:
        text_obj.append(Paragraph(para, styles["Normal"]))
    # Add the text object to the PDF
    doc.build(text_obj)

pdf_filename = "covid.pdf"
create_pdf(all_text, pdf_filename)

## Load pdf and split into chunks

The dataset is loaded from GitHub as a pdf.

In [None]:
loader = PyPDFLoader("https://github.com/AlexanderB111/Deep-Learning/raw/main/Final%20assignment/covid.pdf")
docs = loader.load()
len(docs) # number of pages in the pdf

6

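Due to constraints on the input length of the SBERT model, the pdf is split into chunks with a maximum size of 768. Note that `RecursiveCharacterTextSplitter` measures `chunk_size` in characters by default, not tokens, so each chunk holds at most 768 characters, which should keep it within the number of tokens the model accepts. Consecutive chunks overlap by 64 characters to avoid losing context at the chunk boundaries.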

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=768, chunk_overlap=64)
texts = text_splitter.split_documents(docs)
len(texts)

39
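
To make the roles of `chunk_size` and `chunk_overlap` concrete, the following is a simplified sketch of overlapping chunking, assuming lengths are counted in characters (the splitter's default). It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break at separators such as paragraph breaks and spaces and therefore tends to produce more and shorter chunks. The helper `split_with_overlap` is hypothetical and only serves as illustration.

```python
def split_with_overlap(text, chunk_size=768, chunk_overlap=64):
    """Cut text into windows of at most chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# A 2000-character dummy text.
sample = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # [768, 768, 592]
print(chunks[0][-64:] == chunks[1][:64])  # True: neighbouring chunks share 64 characters
```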

## Create embeddings and save in ChromaDB

An embedding is created for each chunk using a general-purpose SBERT model (`all-mpnet-base-v2`). Each embedding has 768 dimensions.

In [None]:
embeddings = HuggingFaceEmbeddings(
    model_name="all-mpnet-base-v2", # SBERT model
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)

query_result = embeddings.embed_query(texts[0].page_content)
print(len(query_result))


768
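
The option `normalize_embeddings=True` scales every embedding to unit length. One consequence is that the dot product of two normalized embeddings equals their cosine similarity. The following is a small sketch on 3-dimensional toy vectors instead of real 768-dimensional embeddings; the helper names are only used for this illustration.

```python
import math

def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vector):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]

def cosine(a, b):
    """Cosine similarity computed from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print(math.isclose(dot(a_unit, a_unit), 1.0))  # True: unit length
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))  # True: dot product equals cosine similarity
```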


The embeddings are saved in ChromaDB, which serves as the data layer for our Gradio interface.

In [None]:
db = Chroma.from_documents(texts, embeddings, persist_directory="db")

# Find the two chunks most similar to the sentence "prevent the spreading of covid-19"
results = db.similarity_search("prevent the spreading of covid-19", k=2)
print(results[0].page_content) # check that the result is reasonable

understanding how COVID-19 emerged? Some potential consequences of not understanding how
COVID-19 emerged include the possibility of additional coronavirus pandemics and the global spread
unable to effectively prevent future pandemics from occurring. [SEP] How can we prevent future
pandemics and the global spread of infectious agents? To prevent future pandemics and the global
spread of infectious agents, it is important to understand how they emerge and take necessary
measures. Some ways to prevent these emergencies include:
1. Early detection and surveillance: Implementing robust systems for early detection and surveillance of


## Build the model

This section will build the model. The model used is a "mistralai/Mistral-7B-Instruct-v0.1" model.

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)

# Create a configuration for text generation based on the specified model name
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

# Set the maximum number of new tokens in the generated text to 1024.
# This limits the length of the generated output to 1024 tokens.
generation_config.max_new_tokens = 1024

# Set the temperature for text generation. Lower values (e.g., 0.0001) make the output more deterministic, following the most likely predictions.
# Higher values make the output more random.
generation_config.temperature = 0.0001 # decoding hyperparameter that can be tuned

# Set the top-p sampling value. A value of 0.95 restricts sampling to the most likely words that together make up 95% of the probability mass.
generation_config.top_p = 0.95 # decoding hyperparameter that can be tuned

# Enable text sampling. When set to True, the model samples words according to their probabilities, introducing randomness.
generation_config.do_sample = True

# Set the repetition penalty. A value of 1.15 discourages the model from repeating the same words or phrases too frequently in the output.
generation_config.repetition_penalty = 1.15 # decoding hyperparameter that can be tuned


# Create a text generation pipeline using the initialized model, tokenizer, and generation configuration
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Create a LangChain pipeline that wraps the text generation pipeline and set a specific temperature for generation
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})
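
The effect of `temperature` and `top_p` can be illustrated on a toy distribution. The following is a minimal sketch of temperature-scaled softmax followed by top-p (nucleus) filtering, assuming four candidate tokens with invented logits. It is not the transformers implementation, and the function names are only used for this illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized probabilities

logits = [2.0, 1.0, 0.5, -1.0]
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.0001)

print(sorted(top_p_filter(warm, 0.95)))  # [0, 1, 2]: the least likely token is dropped
print(sorted(top_p_filter(cold, 0.95)))  # [0]: only the most likely token remains
```

In this sketch, a temperature as low as 0.0001 puts practically all probability on the most likely token, so the choice is close to greedy even though sampling is enabled.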



Now that the model has been built, some prompt engineering is performed to make the model fit our data better.

## Prompt tuning

This section uses prompt engineering to make the RAG model better at answering questions on the topic covered by the dataset. Four different prompt templates are tried.

### Prompt1: Act as a Virologists. Use the following information to answer the question at the end

In [None]:
template = """
<s>[INST] <<SYS>>
Act as a Virologists.
Use the following information to answer the question at the end.<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
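
With `chain_type="stuff"`, the retrieved chunks are placed together into the `{context}` slot of the prompt, and the user's question fills `{question}`. The following is a plain-Python sketch of that idea, assuming the chunks are joined by blank lines. The helper `stuff_prompt` and the example chunks are hypothetical and are not LangChain code.

```python
# Prompt1 from this notebook, copied verbatim.
template = """
<s>[INST] <<SYS>>
Act as a Virologists.
Use the following information to answer the question at the end.<</SYS>>

{context}

{question} [/INST]
"""

def stuff_prompt(template, retrieved_chunks, question):
    """Fill the template with all retrieved chunks and the question."""
    context = "\n\n".join(retrieved_chunks)  # assumption: chunks separated by blank lines
    return template.format(context=context, question=question)

# Invented stand-ins for the k=2 chunks returned by the retriever.
retrieved = ["Chunk one about prevention.", "Chunk two about surveillance."]
filled = stuff_prompt(template, retrieved, "Is Covid dangerous?")
print(filled)
```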

In [None]:
# Test the model
result = qa_chain(
    "Is Covid dangerous? Explain it to me as if i had no prior information about Covid")
print(fill(result["result"].strip(), width=80))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Covid-19, also known as the Coronavirus disease 2019, is an infectious
respiratory illness caused by the SARS-CoV-2 virus. It was first identified in
Wuhan, China in December 2019 and has since become a global pandemic. The virus
primarily spreads through respiratory droplets when an infected person talks,
coughs or sneezes, but it can also be contracted by touching surfaces
contaminated with the virus.  The symptoms of Covid-19 can range from mild to
severe and can include fever, dry cough, fatigue, shortness of breath, sore
throat, headache, muscle pain, loss of taste or smell, nausea or vomiting,
diarrhea, abdominal pain, and difficulty sleeping. In some cases, Covid-19 can
lead to serious complications such as pneumonia, acute respiratory distress
syndrome (ARDS), sepsis, and organ failure.  It's important to note that
Covid-19 can be particularly dangerous for certain populations, including older
adults, people with underlying medical conditions, pregnant women, and those who
have

### Prompt2: Act as a virologists. Use the following information to try and educate the questioner about Covid

In [None]:
template = """
<s>[INST] <<SYS>>
Act as a virologists.
Use the following information to try and educate the questioner about Covid
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

In [None]:
# Test the model
result = qa_chain(
    "Is Covid dangerous? Explain it to me as if i had no prior information about Covid")
print(fill(result["result"].strip(), width=80))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Covid-19, also known as the Coronavirus disease 2019, is an infectious
respiratory illness caused by the SARS-CoV-2 virus. It was first identified in
Wuhan, China in December 2019 and has since become a global pandemic. The virus
primarily spreads through respiratory droplets when an infected person talks,
coughs or sneezes, but it can also be contracted by touching surfaces
contaminated with the virus.  The symptoms of Covid-19 can range from mild to
severe and can include fever, dry cough, fatigue, shortness of breath, sore
throat, headache, new loss of taste or smell, muscle pain or body aches, chills,
nausea, vomiting, diarrhea, abdominal pain, skin rash, hair loss, and
conjunctival congestion (red or stuffy eyes). In some cases, Covid-19 can lead
to serious complications such as pneumonia, acute respiratory distress syndrome
(ARDS), multi-organ failure, and death.  It's important to note that while most
people recover from Covid-19 without any long-term effects, there are still ma

### Prompt3: Act as a virologists. Use the following information to try and explain the dangers of covid


In [None]:
template = """
<s>[INST] <<SYS>>
Act as a virologists.
Use the following information to try and explain the dangers of covid
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

In [None]:
# Test the model
result = qa_chain(
    "Is Covid dangerous? Explain it to me as if i had no prior information about Covid")
print(fill(result["result"].strip(), width=80))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Covid-19, also known as the Coronavirus disease 2019, is a highly contagious
virus that was first identified in Wuhan, China in December 2019. It belongs to
the same family of viruses that causes the common cold and SARS (severe acute
respiratory syndrome). The virus primarily spreads through respiratory droplets
when an infected person talks, coughs or sneezes, but it can also be contracted
by touching surfaces contaminated with the virus.  The symptoms of Covid-19 are
similar to those of the flu, including fever, dry cough, fatigue, body aches,
headache, new loss of taste or smell, sore throat, congestion or runny nose,
nausea or vomiting, and diarrhea. However, many people with Covid-19 do not
experience any symptoms at all, making it difficult to detect and control its
spread.  One of the main concerns about Covid-19 is its high mortality rate,
which has been estimated to be around 2% globally. This percentage is higher
among older adults and individuals with underlying health cond

### Prompt4: Act as a virologists. Use the following information to try and educate the questioner about Covid. Write it in 10 bullet points


In [None]:
template = """
<s>[INST] <<SYS>>
Act as a virologists.
Use the following information to try and educate the questioner about Covid. Write it in 10 bullet points
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

In [None]:
# Test the model
result = qa_chain(
    "Is Covid dangerous? Explain it to me as if i had no prior information about Covid")
print(fill(result["result"].strip(), width=80))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


* COVID-19 (Coronavirus disease 2019) is an infectious respiratory illness
caused by the SARS-CoV-2 virus. It was first identified in Wuhan, China in
December 2019 and has since become a global pandemic. * The virus primarily
spreads through respiratory droplets when an infected person talks, coughs or
sneezes. It can also be contracted by touching surfaces contaminated with the
virus. * COVID-19 can cause a wide range of symptoms, ranging from mild to
severe, including fever, cough, difficulty breathing, fatigue, body aches,
headache, new loss of taste or smell, sore throat, congestion or runny nose,
nausea or vomiting, diarrhea, abdominal pain, and difficulty sleeping. * In some
cases, COVID-19 can lead to serious complications such as pneumonia, acute
respiratory distress syndrome (ARDS), multi-organ failure, and death. * The risk
of developing severe illness from COVID-19 increases with age, and certain
underlying health conditions such as diabetes, heart disease, cancer, and
weake

### Final remarks

As Prompt2 seems to give the most detailed answer, we continue with it. To ensure that it is the prompt template in use, the cell defining it is rerun below. In addition, a length restriction is added to the prompt because of the output restrictions set by Gradio.

In [None]:
template = """
<s>[INST] <<SYS>>
Act as a virologists.
Use the following information to try and educate the questioner about Covid.
Do not reply with more than 4 sentences!
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

## Gradio interface

A Gradio interface is implemented to enhance the user experience of the RAG model.

In [None]:
# Override the preferred encoding to avoid issues stemming from the underlying locale settings in Google Colab
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!pip install gradio==3.50.2 -qqq
!pip install "pydantic>=1.9,<2.0" -qqq

In [None]:
import gradio as gr

In [None]:
# Define the function used for Gradio interface
def rag_model(query):
  result = qa_chain(query)
  result = fill(result["result"].strip(), width=80)
  return result

In [None]:
# Confirm that it works
result = rag_model("How can I avoid Covid-19?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [None]:
print(result)

There are several steps you can take to avoid getting COVID-19: 1. Wear a mask
in public settings: Wearing a mask helps to prevent respiratory droplets from
spreading when an infected person talks, sneezes, or coughs. Make sure your mask
covers both your nose and mouth and fits securely against the sides of your
face. 2. Practice physical distancing: Keep at least 6 feet between yourself and
others outside of your household. Avoid large gatherings and crowded spaces. 3.
Wash your hands frequently: Use soap and water and wash for at least 20 seconds,
especially after being in a public place or after blowing your nose, coughing or
sneezing. If soap and water are not available, use hand sanitizer that contains
at least 60% alcohol. 4. Clean and disinfect regularly touched surfaces: Clean
and disinfect frequently touched objects and surfaces daily. This includes
doorknobs, light switches, phones, keyboards, and countertops. 5. Stay home if
you're feeling sick: If you develop symptoms of CO
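
The answer above runs past four sentences, which suggests that the length instruction in the prompt acts as a soft constraint. If a hard limit is needed, the answer could be shortened after generation. The following is a minimal sketch, assuming a naive rule that a sentence ends at `.`, `!` or `?` followed by whitespace. Numbered lists and abbreviations will confuse this rule. The helper `limit_sentences` is hypothetical and not part of the notebook.

```python
import re

def limit_sentences(text, max_sentences=4):
    """Keep only the first max_sentences sentences of a text (naive splitting)."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

# Invented example answer with five sentences.
answer = "Wear a mask. Keep your distance. Wash your hands. Clean surfaces. Stay home if sick."
shortened = limit_sentences(answer)
print(shortened)  # Wear a mask. Keep your distance. Wash your hands. Clean surfaces.
```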

In [None]:
# Build the Gradio interface
demo = gr.Interface(fn=rag_model,
                    inputs=[gr.Textbox(label = "Please ask me a question!")],
                    outputs=[gr.Text(label='Answer')],
                    title="The Covid Helpline",
                    description="""
                      Welcome to The Covid Helpline: Your Virtual Virologist.
                      In the face of the ongoing global pandemic, staying informed and receiving reliable guidance is crucial. The Covid Helpline is your trusted resource, powered by a Language Model (LLM) acting as your virtual virologist, dedicated to providing essential information, support, and assistance during these challenging times.

                      Our virtual virologist, equipped with the latest advancements in language understanding and reasoning, is here to answer your questions based on a comprehensive dataset of questions and answers regarding COVID-19. Whether you're seeking guidance on preventive measures, symptoms, testing locations, vaccination resources, or simply need someone to talk to during these uncertain times, The Covid Helpline's virtual virologist is here to assist you.

                      Our mission is simple: to empower individuals and communities with accurate, up-to-date information about COVID-19, and to offer guidance on navigating its impact on health, safety, and well-being.

                      At The Covid Helpline, we believe that together, with the help of our virtual virologist, we can overcome the challenges posed by COVID-19. We're here to support you every step of the way.

                      Stay informed, stay safe, and remember, our virtual virologist is at your service.

                      Welcome to The Covid Helpline: Your Virtual Virologist.
                    """
                    )
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://3d8e0548779e79275e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




## Export components

Finally, the relevant model components are exported in preparation for a Streamlit application. For further information regarding the Streamlit application, see the README.md file in the GitHub repository at the following link:

https://github.com/AlexanderB111/Deep-Learning/tree/main/Final%20assignment

In [None]:
import pickle # To export model components

# Export prompt (the with-block closes the file after writing)
with open('prompt.pkl', 'wb') as f:
    pickle.dump(prompt, f)

# Export texts
with open('texts.pkl', 'wb') as f:
    pickle.dump(texts, f)
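
On the receiving side, the exported files can be read back with `pickle.load`. The following is a round-trip sketch, assuming a plain list of strings as a stand-in for the notebook's `texts` object and a temporary folder instead of the working directory. Loading the real files would additionally require the classes of the pickled objects (here the LangChain ones) to be installed in the loading environment.

```python
import os
import pickle
import tempfile

# Stand-in for the notebook's `texts` (invented for illustration).
texts_stub = ["first chunk", "second chunk"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "texts.pkl")

    # Export, as in the cell above.
    with open(path, "wb") as handle:
        pickle.dump(texts_stub, handle)

    # Import, as the consuming application might do.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == texts_stub)  # True
```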