<a href="https://colab.research.google.com/github/InduwaraGayashan001/Generative-AI/blob/main/Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
!pip -q install langchain langchain_community
!pip -q install pypdf
!pip -q install sentence_transformers
!pip install openai
!pip install tiktoken
!pip install tokenizers
!pip install faiss-cpu
!pip -q install unstructured

In [None]:
!pip install numpy==1.24.4
!pip install nltk==3.9.1

In [2]:
!pip install langchain_openai

Collecting langchain_openai
  Downloading langchain_openai-0.3.22-py3-none-any.whl.metadata (2.3 kB)
Downloading langchain_openai-0.3.22-py3-none-any.whl (65 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.3/65.3 kB 3.2 MB/s eta 0:00:00
Installing collected packages: langchain_openai
Successfully installed langchain_openai-0.3.22


In [3]:
import sys
import os

# Loaders and vector stores live in langchain_community;
# the old langchain.document_loaders / langchain.vectorstores paths are deprecated.
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

In [5]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [15]:
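from google.colab import userdata

# The GitHub token stored in Colab secrets is used as the API key
# for the OpenAI-compatible endpoints configured below.
os.environ['OPENAI_API_KEY'] = userdata.get('GITHUB_TOKEN')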

# Data Extraction

In [6]:
URLs = [
    'https://medium.com/@srikar.appal/paper-review-2-multi-digit-number-recognition-from-street-view-imagery-using-deep-convolutional-c915f2bdde67',
    'https://lmsys.org/blog/2023-03-30-vicuna/',
    'https://stability.ai/research/facecraft4d-animated-3d-facial-avatar-generation-from-a-single-image',
]

In [7]:
# One loader fetches every URL and returns one Document per page.
loader = UnstructuredURLLoader(urls=URLs)
data = loader.load()

In [8]:
data

[Document(metadata={'source': 'https://medium.com/@srikar.appal/paper-review-2-multi-digit-number-recognition-from-street-view-imagery-using-deep-convolutional-c915f2bdde67'}, page_content='Sitemap\n\nSign in\n\nWrite\n\nSign in\n\nPaper Review 2 — Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks\n\nSrikar Appalaraju\n\nSrikar Appalaraju\n\n3 min read\n\nApr 13, 2018\n\n--\n\nThis paper published in ICLR 2014 is again a rather old paper but following up on the last reviewed paper this paper follows up on it. The authors show that using a CNN they were able to automate street number extraction from street images.\n\nTraditional OCR techniques to extract text from images is performed in three stages — localization (figure out where in the image the text is), segmentation (segment out the individual characters from the localized part of the image), recognition (recognize the segmented characters). Traditionally all these three stages were do

In [9]:
len(data)

3

# Chunking

In [10]:
# Split on newlines into chunks of up to 1000 characters, with up to 200 characters of overlap.
text_splitter = CharacterTextSplitter(separator='\n', chunk_size=1000, chunk_overlap=200)

In [11]:
text_chunks = text_splitter.split_documents(data)

In [12]:
len(text_chunks)

25
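
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a small pure-Python sketch. This is a simplified approximation written for this notebook, not LangChain's actual implementation; the function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters, carrying trailing pieces (at most chunk_overlap
    characters) over into the next chunk. Simplified illustration only."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def length(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Drop leading pieces until the carried-over tail fits the overlap
            # budget and still leaves room for the incoming piece.
            while current and (length(current) > chunk_overlap
                               or length(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Hypothetical sample: ten 7-character lines, tiny limits so the overlap is visible.
sample = "\n".join(f"line {i:02d}" for i in range(10))
chunks = split_with_overlap(sample, chunk_size=30, chunk_overlap=10)
for chunk in chunks:
    print(repr(chunk))
```

With these toy limits, each chunk repeats the last line of the previous one, which is the same idea as the 200-character overlap used above: context at a chunk boundary is not lost.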

# Embeddings

In [16]:
# OpenAI-compatible endpoint used for the embedding model.
endpoint = "https://models.inference.ai.azure.com"
model_name = "text-embedding-3-small"

embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_base=endpoint,
)

In [17]:
query_result = embeddings.embed_query("Hello world")
len(query_result)

1536
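
Each text is mapped to a vector of 1536 floats, and texts with similar meaning should end up with nearby vectors. Cosine similarity is one common way to compare two such vectors; the sketch below uses made-up 3-dimensional vectors in place of real embeddings, and `cosine_similarity` is a helper defined here, not a LangChain function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (real ones here have 1536 dimensions).
vec_query = [0.8, 0.2, 0.1]
vec_close = [0.9, 0.1, 0.0]
vec_far = [0.1, 0.1, 0.9]

print(cosine_similarity(vec_query, vec_close))  # closer to 1: similar direction
print(cosine_similarity(vec_query, vec_far))    # closer to 0: dissimilar
```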

# Vector Database

In [18]:
vectorstore = FAISS.from_documents(text_chunks, embeddings)
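
Conceptually, the vector store keeps one embedding per chunk and, given a query embedding, returns the chunks whose vectors are nearest to it. The following is a minimal brute-force sketch of that idea using Euclidean distance. It is an illustration only, not how FAISS is implemented, and the texts, vectors, and `top_k` helper are invented for the example.

```python
import math


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries whose vectors are closest to query_vec.

    indexed is a list of (text, vector) pairs; distance is Euclidean (L2).
    """
    scored = [(math.dist(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:k]]


# Hypothetical 3-dimensional "embeddings" for three chunks.
toy_index = [
    ("chunk about Vicuna", [0.9, 0.1, 0.0]),
    ("chunk about street numbers", [0.0, 0.8, 0.2]),
    ("chunk about avatars", [0.1, 0.1, 0.9]),
]

print(top_k([0.8, 0.2, 0.1], toy_index, k=2))
```

A real index avoids rescanning every vector in Python, but the input and output have the same shape: a query vector in, the closest chunks out.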

# Integrate with LLM

In [19]:
llm = ChatOpenAI(
             model = "openai/gpt-4o-mini",
             api_key=userdata.get('GITHUB_TOKEN'),
             base_url="https://models.github.ai/inference"
)

In [20]:
llm.invoke("What is the capital of France?").content


'The capital of France is Paris.'

In [21]:
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorstore.as_retriever())
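
At a high level, the chain retrieves the chunks most relevant to a question and hands them to the LLM together with the question and each chunk's source. The sketch below shows only the prompt-assembly idea. The prompt wording and the `build_prompt` helper are hypothetical and do not reproduce LangChain's internal templates or its document-combining strategy.

```python
def build_prompt(question, docs):
    """Format retrieved (source, text) pairs and the question into one prompt string."""
    context = "\n\n".join(
        f"Content: {text}\nSource: {source}" for source, text in docs
    )
    return (
        "Answer the question using only the extracts below, "
        "and list the sources you used.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunk; in the notebook these would come from the retriever.
retrieved = [
    ("https://lmsys.org/blog/2023-03-30-vicuna/",
     "Vicuna is an open-source chatbot fine-tuned from LLaMA."),
]
print(build_prompt("What is Vicuna?", retrieved))
```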

In [22]:
result = chain.invoke({"question": "How good is Vicuna?"}, return_only_outputs=True)


In [23]:
result['answer']

"Vicuna is capable of generating detailed and well-structured answers, achieving more than 90% quality of outputs compared to OpenAI's ChatGPT and Google Bard. It has been evaluated favorably against other models, outperforming them in over 90% of cases. However, like other large language models, Vicuna has limitations, particularly with reasoning, mathematics, and ensuring the factual accuracy of its outputs.\n\n"

# Chatbot

In [24]:
while True:
    query = input("Prompt: ")
    if query == "exit":
        print("Exiting")
        break  # leave the loop without raising SystemExit inside the notebook
    if query == "":
        continue
    result = chain.invoke({"question": query}, return_only_outputs=True)
    print(f"Answer: {result['answer']}")


Prompt: What is Vicuna?
Answer: Vicuna is an open-source chatbot that has been fine-tuned from the LLaMA base model using user-shared conversations collected from ShareGPT. It is designed to generate detailed and well-structured answers, achieving over 90% quality compared to OpenAI's ChatGPT and Google Bard, while outperforming other models in most cases. The training of Vicuna-13B cost around $300, and its code and weights are publicly available for non-commercial use.


Prompt: How can we generate a 3D avatar model by single image?
Answer: A novel framework for generating high-quality, animatable 4D avatars from a single image has been proposed. This system utilizes shape, image, and video priors to create full-view, animatable avatars. Initially, a coarse shape is obtained through 3D-GAN inversion, followed by enhancing multiview textures for consistency using depth-guided warping signals. The framework also incorporates a video prior for expression animation, ensuring synchronizat

