<a href="https://colab.research.google.com/github/Seenuprime/Summarizer-URL-/blob/main/PDFQUERY_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q cassio datasets langchain openai tiktoken

In [2]:
!pip install langchain langchain_groq langchain-community



In [3]:
# Vector store and embedding integrations live in langchain_community
from langchain_community.vectorstores.cassandra import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_groq import ChatGroq
from langchain_community.embeddings import HuggingFaceBgeEmbeddings




In [4]:
# cassio is the engine behind LangChain's Astra DB integration;
# it is also used to initialize the database connection.
import cassio

In [5]:
!pip install PyPDF2



In [6]:
from PyPDF2 import PdfReader

In [7]:
ASTRADB_APP_TOKEN = 'Your access token'
ASTRADB_ID = 'Your AstraID'

In [8]:
hugging_face = "your huggingface token"
groq_api = 'your groq api token'
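
Hardcoding tokens in a notebook makes them easy to leak when the notebook is shared. A minimal sketch, assuming the tokens are exported as environment variables; the helper `read_secret` and the variable names in the commented usage are hypothetical choices, not something the notebook or the libraries prescribe:

```python
import os

def read_secret(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or empty.

    Hypothetical helper: raises a clear error when no value is available.
    """
    value = os.environ.get(name) or default
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell")
    return value

# Illustrative usage (the environment variable names are assumptions):
# ASTRADB_APP_TOKEN = read_secret("ASTRA_DB_APPLICATION_TOKEN")
# ASTRADB_ID = read_secret("ASTRA_DB_ID")
# groq_api = read_secret("GROQ_API_KEY")
```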

In [9]:
pdf_loader = PdfReader('ROBOTICS.pdf')

In [10]:
# Concatenate the extracted text of every page; skip pages with no extractable text
raw_text = ''
for page in pdf_loader.pages:
  content = page.extract_text()
  if content:
    raw_text += content

### Initialize the connection to the database

In [11]:
cassio.init(token=ASTRADB_APP_TOKEN, database_id=ASTRADB_ID)

### Create the LangChain embedding and LLM objects for later use

In [12]:
!pip install sentence_transformers



In [31]:
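llm = ChatGroq(model="llama3-groq-70b-8192-tool-use-preview", api_key=groq_api, max_tokens=300)
# Note: all-MiniLM-L6-v2 is a sentence-transformers model, not a BGE model.
# HuggingFaceBgeEmbeddings still loads it, but the generic HuggingFaceEmbeddings class is the more usual match.
embedding = HuggingFaceBgeEmbeddings(model_name='all-MiniLM-L6-v2')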



### Create a LangChain vector store backed by Astra DB

In [14]:
astra_vector_store = Cassandra(
    embedding=embedding,
    table_name='GenAI_Udemy',
    session=None,   # None: use the session set up by cassio.init()
    keyspace=None   # None: use the default keyspace from cassio.init()
)

In [15]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator='\n',
    chunk_size=800,
    chunk_overlap=200,
    length_function=len
)

texts = text_splitter.split_text(raw_text)
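
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified pure-Python sketch of the idea: split on the separator, pack pieces greedily into chunks of at most `chunk_size` characters, and carry the tail of each chunk (up to `chunk_overlap` characters) into the next one. The function `split_with_overlap` is a hypothetical illustration, not LangChain's actual implementation, which may differ in edge cases.

```python
def split_with_overlap(text, separator="\n", chunk_size=800, chunk_overlap=200):
    """Greedy splitter: pack separator-delimited pieces into chunks of at most
    chunk_size characters, repeating trailing pieces (up to chunk_overlap
    characters) at the start of the next chunk. Simplified illustration only."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def length(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until the carried-over tail fits
            # within chunk_overlap and leaves room for the new piece.
            while current and (length(current) > chunk_overlap
                               or length(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks

demo = split_with_overlap("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4)
print(demo)
```

With a 9-character limit and a 4-character overlap, each chunk holds two lines and repeats the last line of the previous chunk, which is how overlap keeps context from being cut at chunk boundaries.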

In [16]:
print(texts[0])

ENGINEERING ROBOTICS -
PROFESSIONAL ELECTIVEThe field ofrobotics has itsorigins inscience fiction .The term robot was derived
from theEnglish translation ofafantasy play written inCzechoslovakia around 1920 .
Ittook another 40years before themodern technology ofindustrial robotics began .
Today, Robots are highly automated mechanical manipulators controlled by
computers .
Robotics :-
Robotics isanapplied engineering science that has been referred toasa
combination ofmachine tool technology and computer science .Itincludes machine
design, production theory, micro electronics, computer programming &artificial
intelligence .
OR
"Robotics" isdefined asthe science ofdesigning and building Robots which are
suitable for real life application inautomated manufacturing and other non-


### Load the text chunks into the vector store

In [17]:
# Only the first 50 chunks are embedded and added to the vector store
astra_vector_store.add_texts(texts[:50])

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

In [18]:
astra_vector_index

VectorStoreIndexWrapper(vectorstore=<langchain_community.vectorstores.cassandra.Cassandra object at 0x781bd34d4df0>)

### Run the Q&A cycle

In [35]:
query = "what are the specifications of robotics?"
answer = astra_vector_index.query(question=query, llm=llm).strip()
print(answer)



The specifications of robotics include:
1. Axis of motion
2. Work stations
3. Speed
4. Acceleration
5. Payload capacity
6. Accuracy
7. Repeatability
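
`VectorStoreIndexWrapper.query` hides a retrieve-then-generate step: the question is embedded, the most similar chunks are fetched from the vector store, and those chunks are passed to the LLM as context. The sketch below imitates only the retrieval half, using word-count vectors and cosine similarity in plain Python. The functions `embed`, `cosine`, and `top_k` and the sample strings are hypothetical stand-ins, not the Astra DB or LangChain implementation.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Made-up sample chunks for illustration only
chunks = [
    "Robot specifications include payload capacity, accuracy and repeatability.",
    "The term robot comes from a play written around 1920.",
    "Industrial robots are mechanical manipulators controlled by computers.",
]
context = top_k("payload capacity and accuracy specifications", chunks, k=1)
print(context)
```

In the real pipeline the retrieved chunks would then be inserted into the prompt sent to `llm`; only the ranking step is shown here.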
