<a href="https://colab.research.google.com/github/Laiba-Abid-Dev/Llama2_ChatBot/blob/main/ProjectTask1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Details**
  **Pinecone API Key:** The Pinecone vector database stores vector embeddings of documents or conversation history, so the chatbot can retrieve the passages most relevant to the user's input.
  
  **Replicate API Token:** Replicate hosts the Llama 2 model that generates the chatbot's answers.

  **LangChain:** Used to build the chains and the retrieval-augmented generation (RAG) pipeline; HuggingFace supplies the embedding model.

In [None]:
# pinecone.init(), used below, is only available in pinecone-client 2.x
!pip install "pinecone-client<3" langchain
!pip install pypdf
!pip install sentence-transformers
!pip install replicate

In [18]:
import os
import sys
import pinecone
from langchain.llms import Replicate
from langchain.vectorstores import Pinecone
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain

In [4]:
# Replicate API token (placeholder: never commit a real token to a shared notebook)
os.environ['REPLICATE_API_TOKEN'] = "<YOUR_REPLICATE_API_TOKEN>"

# initialize Pinecone (placeholder key: substitute your own key and environment)
pinecone.init(api_key="<YOUR_PINECONE_API_KEY>", environment="gcp-starter")
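
One way to keep API keys out of the notebook itself is to read them from environment variables and fall back to a hidden prompt. The following is a minimal sketch, not part of the original project: `get_secret` is a hypothetical helper, and `PINECONE_API_KEY` in the usage note is an assumed variable name.

```python
import os
from getpass import getpass


def get_secret(name: str, allow_prompt: bool = True) -> str:
    """Return the secret stored in the environment variable `name`.

    If the variable is unset and `allow_prompt` is True, ask for the value
    with a hidden prompt and cache it in os.environ for this session.
    """
    value = os.environ.get(name, "")
    if not value and allow_prompt:
        value = getpass(f"Enter {name}: ")
    if not value:
        raise RuntimeError(f"{name} is not set")
    os.environ[name] = value
    return value
```

In this notebook the helper could feed the existing call, for example `pinecone.init(api_key=get_secret("PINECONE_API_KEY"), environment="gcp-starter")`, while `get_secret("REPLICATE_API_TOKEN")` would leave the token in the environment where the Replicate client looks for it.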

In [5]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"

Mounted at /content/gdrive


In [14]:
# load and preprocess the PDF docs
import glob

pdf_dir = '/content/gdrive/MyDrive/Dataset'
pdf_files = glob.glob("%s/*.pdf" % pdf_dir)

# split the docs into smaller chunks for processing
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# accumulate the chunks from every PDF, not just the last one
texts = []
for file in pdf_files:
  loader = PyPDFLoader(file)
  documents = loader.load()
  texts.extend(text_splitter.split_documents(documents))
# print(texts)
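
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is an illustration only: LangChain's `CharacterTextSplitter` first splits on a separator and then merges the pieces, so its actual chunk boundaries will differ. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters, where each window
    repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once a window has reached the end of the text.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.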



In [26]:
# using huggingFace embeddings for transforming text into numerical vectors
embeddings = HuggingFaceEmbeddings()

In [29]:
# setup pinecone vector db
# index = pinecone.Index("my-index")
# vectorDB = Pinecone.from_existing_index("my-index", embeddings)

# pinecone.list_indexes()

index_name = "demo-index"

# First, check if our index already exists. If it doesn't, we create it.
if index_name not in pinecone.list_indexes():
    # The default HuggingFaceEmbeddings model (sentence-transformers/all-mpnet-base-v2)
    # produces 768-dimensional vectors, so the index dimension must be 768.
    # (For comparison, OpenAI's `text-embedding-ada-002` uses 1536 dimensions.)
    pinecone.create_index(name=index_name, metric="cosine", dimension=768)
vectorDB = Pinecone.from_documents(texts, embeddings, index_name=index_name)
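
Because the index dimension has to match the embedding model's output size, a quick check before uploading can catch a mismatch early. This is a minimal sketch; `check_embedding_dimension` is a hypothetical helper, not a Pinecone or LangChain function.

```python
def check_embedding_dimension(sample_vector, index_dimension):
    """Raise ValueError if the embedding length differs from the index dimension."""
    actual = len(sample_vector)
    if actual != index_dimension:
        raise ValueError(
            f"embedding has {actual} dimensions but the index expects {index_dimension}"
        )
    return actual
```

In this notebook, `sample_vector` could come from `embeddings.embed_query("dimension check")` and `index_dimension` would be the 768 passed to `pinecone.create_index`.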

In [30]:
# initialize replicate llama2 model

llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5",
    input={"temperature": 0.75, "max_length": 3000}
)



In [31]:
# setup conversational retrieval chain

qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    vectorDB.as_retriever(search_kwargs={'k':2}),
    return_source_documents=True
)
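
With `k` set to 2, only the two chunks most similar to the question are handed to the model. The toy sketch below shows that ranking under cosine similarity, using made-up 3-dimensional vectors. Pinecone performs the real search on its side; `cosine_similarity` and `top_k` are hypothetical names used only for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunks, k=2):
    """Return the texts of the k chunks most similar to query_vector.

    `chunks` is a list of (text, vector) pairs.
    """
    ranked = heapq.nlargest(
        k, chunks, key=lambda chunk: cosine_similarity(query_vector, chunk[1])
    )
    return [text for text, _ in ranked]


# Made-up vectors, purely for illustration.
chunks = [
    ("AI in CRM", [0.9, 0.1, 0.0]),
    ("Warfare ethics", [0.0, 0.2, 0.9]),
    ("Salesforce Einstein", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))
# ['AI in CRM', 'Salesforce Einstein']
```

A small `k` keeps the prompt short, but the answer can then draw on only those few chunks.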

In [37]:
chat_history = []
while True:
  query = input('Prompt: ')
  if query.lower() in ['q', 'quit', 'exit']:
    print('Bye....')
    break  # leave the loop; sys.exit() would raise SystemExit inside a notebook
  result = qa_chain({'question': query, 'chat_history': chat_history})
  print('Answer:  ' + result['answer'] + '\n')
  chat_history.append((query, result['answer']))

Prompt: what is future of AI in salesforce
Answer:  Based on the information presented, it appears that AI is poised to play a significant role in the future of business, particularly in the areas of data analysis, decision-making, and autonomous labor. However, there are also potential risks and challenges associated with the adoption of AI, such as unwanted costs, infrastructure limitations, and the need for expertise in the field. Additionally, the use of AI in warfare and other ethical considerations raises concerns about the future of human dignity. Overall, it is important for businesses to carefully consider the benefits and risks of AI and to approach its adoption with caution and strategic planning.

Prompt: exit
Bye....


# **Sample Questions**

1.   What is the main goal of the thesis discussed in the provided text?
2.   Can you please clarify what you mean by "the analytic side of AI" and "Salesforce's Einstein"?
3.   What is the future role of AI in CRM?
4.   What are the uses of AI in CRM?
5.   How is AI integrated for routine tasks?
6.   What is the future of AI in Salesforce?

