In [45]:
# !pip install weaviate-client langchain tiktoken pypdf rapidocr-onnxruntime


# Weaviate
- Weaviate is a vector database, comparable to FAISS and Pinecone.

It provides built-in data-management features, such as data ingestion, indexing, and query processing, all exposed through API endpoints.



Weaviate can store data locally or in the cloud, depending on how it is deployed and configured.


`1. Local Storage:`

- Local Deployment: When you run Weaviate on your local machine or on-premises servers, it stores data on the local filesystem. This is typically done using local disk storage, and the data resides on the hardware where Weaviate is installed.
- File System and Disk Storage: Weaviate uses local disk space to store both vector embeddings and any associated metadata. This can include storing vector indexes, schema definitions, and other data objects.
- Use Case: Local storage is suitable for development, testing, or scenarios where you want full control over data management and storage without relying on external cloud services.


`2. Cloud Storage:`

- Cloud Deployment: Weaviate is designed to be cloud-native, which means it can be deployed on cloud infrastructure such as AWS, Google Cloud Platform (GCP), Microsoft Azure, and other cloud providers.
- Containerization and Kubernetes: Weaviate can be deployed using Docker containers or orchestrated with Kubernetes. In a cloud environment, these containers can utilize cloud storage solutions like AWS EBS (Elastic Block Store), Google Cloud Persistent Disks, or Azure Managed Disks for data persistence.
- Object Storage Integration: Weaviate can integrate with cloud object storage services (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage) to handle backups, snapshots, or large data sets that are stored externally and accessed on demand.
- Managed Cloud Services: There are also managed Weaviate services available that handle cloud storage and scalability for you, abstracting away the need to configure storage manually.



`3. Hybrid Storage Configurations:`

- Hybrid Deployment: You can configure Weaviate to use both local storage and cloud storage simultaneously. For example, vector embeddings might be stored locally for fast access, while backup data or less frequently accessed information is stored in cloud storage.
- Data Replication and Backup: Weaviate can use cloud storage for data replication and backup strategies, ensuring that data is safely stored off-site and can be restored if needed.

In [47]:
# from langchain.vectorstores import Weaviate
# !pip install -U langchain-community

In [48]:
# Newer LangChain releases also expose this class as langchain_community.vectorstores.Weaviate.
from langchain.vectorstores import Weaviate
import weaviate



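# WEAVIATE_URL and WEAVIATE_API_KEY must be defined before this cell runs;
# they are not defined in the cells shown here.
# weaviate.Client is the v3 client API, which the warning below flags as deprecated.
client = weaviate.Client(
    url=WEAVIATE_URL,
    auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY),
)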
    


            your code to use Python client v4 `weaviate.WeaviateClient` connections and methods.

            For Python Client v4 usage, see: https://weaviate.io/developers/weaviate/client-libraries/python
            For code migration, see: https://weaviate.io/developers/weaviate/client-libraries/python/v3_v4_migration
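
The client cell above expects `WEAVIATE_URL` and `WEAVIATE_API_KEY` to exist already. One way to supply them without hard-coding is to read them from environment variables. The following is a minimal sketch using only the standard library; the helper name `load_weaviate_settings` is hypothetical, and treating the API key as optional (for a local instance assumed to run without authentication) is an assumption, not something the notebook states.

```python
import os


def load_weaviate_settings(env=None):
    """Return (url, api_key) read from an environment-style mapping.

    WEAVIATE_URL is required. WEAVIATE_API_KEY is treated as optional here,
    on the assumption that a local instance may run without authentication.
    """
    env = os.environ if env is None else env
    url = env.get("WEAVIATE_URL", "").strip()
    if not url:
        raise ValueError("WEAVIATE_URL is not set")
    api_key = env.get("WEAVIATE_API_KEY", "").strip() or None
    return url, api_key


# Example with an explicit mapping (placeholder values, not real credentials).
example_url, example_key = load_weaviate_settings(
    {"WEAVIATE_URL": "http://localhost:8080"}
)
```

Calling `load_weaviate_settings()` with no argument reads from `os.environ`, so the returned values could then be passed to `weaviate.Client` in place of the module-level constants.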
            


In [49]:
# !pip install sentence-transformers
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
# Only takes effect if the model_kwargs line below is uncommented (requires a GPU).
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_name,
    # model_kwargs=model_kwargs,
)
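
The cell above defines `model_kwargs` with `"cuda"` but leaves it commented out, presumably because a GPU is not always available. A small sketch of choosing the device at run time is shown below; the helper name `pick_device` is hypothetical, and it falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to query the GPU, so use the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# This dict has the same shape as the notebook's model_kwargs.
model_kwargs = {"device": pick_device()}
```

With this in place, the `model_kwargs=model_kwargs` argument could be uncommented without failing on machines that lack a GPU.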




In [50]:
# LangChain can load PDFs in several ways; see the document-loader documentation:

'https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/'

'https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/'

In [1]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader(r"D:\study\RAG_PIPLELINE_WITH_LLAMA_INDEX_LANGCHAIN_GPT_GEMINI\RAG_APPLICATION_USING_MISTRAL_WEVIATE\sensors.pdf", extract_images=True)



In [2]:
pages = loader.load()

In [3]:
pages

[Document(metadata={'source': 'D:\\study\\RAG_PIPLELINE_WITH_LLAMA_INDEX_LANGCHAIN_GPT_GEMINI\\RAG_APPLICATION_USING_MISTRAL_WEVIATE\\sensors.pdf', 'page': 0}, page_content='Citation: Takenaka, K.; Kondo, K.;\nHasegawa, T. Segment-Based\nUnsupervised Learning Method in\nSensor-Based Human Activity\nRecognition. Sensors 2023 ,23, 8449.\nhttps://doi.org/10.3390/s23208449\nAcademic Editor: Eui Chul Lee\nReceived: 24 August 2023\nRevised: 22 September 2023\nAccepted: 11 October 2023\nPublished: 13 October 2023\nCopyright: © 2023 by the authors.\nLicensee MDPI, Basel, Switzerland.\nThis article is an open access article\ndistributed under the terms and\nconditions of the Creative Commons\nAttribution (CC BY) license (https://\ncreativecommons.org/licenses/by/\n4.0/).\nsensors\nArticle\nSegment-Based Unsupervised Learning Method in Sensor-Based\nHuman Activity Recognition\nKoki Takenaka *, Kei Kondo†and Tatsuhito Hasegawa\nGraduate School of Engineering, University of Fukui, Fukui 910-8507, 

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(pages)
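
To make `chunk_size=1000` and `chunk_overlap=20` concrete, here is a simplified fixed-width splitter in plain Python. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph and line breaks, so its real chunks are usually shorter and do not fall on exact character offsets. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 2500 characters of dummy text, split with the notebook's settings.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
```

The last 20 characters of each chunk reappear at the start of the next one, which helps a sentence that straddles a boundary remain retrievable from either side.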


In [55]:


vector_db = Weaviate.from_documents(
    docs, embeddings, client=client, by_text=False
)

In [56]:

print(
    vector_db.similarity_search(
        "what is HAR?", k=3)[1].page_content
    )

activities wherein the data are measured by sensors such as accelerometers and gyroscopes.
For example, human activities such as “walking” and “running” were predicted from
the measured accelerometer data. This technology is used in various applications from
analyzing sports movements [ 1] to healthcare such as health awareness maintenance and
the detection of risky movements by patients [ 2–4]. HAR is an essential technology because
the predicted results have some inﬂuence on decision-making.
Recently, neural networks (NNs) have been used for HAR [ 5–8]. HAR is typically
implemented using machine learning methods [ 9,10] such as support vector machines
(SVMs) and hidden Markov models (HMMs). Machine learning models require a training
dataset consisting of handcrafted features extracted from sensor data and corresponding
activity labels. In contrast to traditional machine learning, deep learning methods such
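
The search above embeds the query and returns the `k` stored chunks whose vectors are closest to it; the notebook then prints the second hit (`[1]`). The ranking idea can be sketched with a brute-force cosine-similarity scan in plain Python. This is an illustration only: the vectors and chunk labels are made up, and a vector database such as Weaviate relies on an approximate index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=3):
    """Return the texts of the k items most similar to query_vec.

    items is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
toy_store = [
    ("chunk about HAR", [1.0, 0.0, 0.0]),
    ("chunk about sensors", [0.8, 0.6, 0.0]),
    ("chunk about licensing", [0.0, 0.0, 1.0]),
    ("chunk about SVMs", [0.6, 0.8, 0.0]),
]
results = top_k([1.0, 0.1, 0.0], toy_store, k=3)
```

Here `results[0]` is the best match and `results[1]` the runner-up, mirroring how the notebook indexes into the list returned by `similarity_search`.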


In [5]:
from langchain.prompts import ChatPromptTemplate

template="""You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""
     

prompt=ChatPromptTemplate.from_template(template)
     

In [58]:
prompt


ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, just say that you don't know.\nUse ten sentences maximum and keep the answer concise.\nQuestion: {question}\nContext: {context}\nAnswer:\n"))])
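
The `input_variables=['context', 'question']` in the output above are the two placeholders that get filled in when the chain runs. The substitution can be previewed with plain `str.format` on the same template text; this is a sketch of the idea, not a call into LangChain, and the question and context strings are illustrative.

```python
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""

# Fill both placeholders with illustrative values.
filled = template.format(
    question="what is HAR?",
    context="HAR stands for human activity recognition.",
)
print(filled)
```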

In [67]:
# Next we need an LLM from Hugging Face, accessed either through a local Hugging Face pipeline or through the hosted API with an API key.
huggingkey = "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"

In [68]:
# !pip install -U langchain-huggingface
import os
os.environ['huggingkey'] = huggingkey
os.environ['huggingkey']

'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'

In [69]:
os.getenv('huggingkey')

'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'
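
The cells above keep the token in the notebook source and echo it in the output, which exposes it to anyone who can read the file. A common alternative is to read the token from the environment and show only a masked form when checking that it loaded. The sketch below uses only the standard library; the helper names are hypothetical, the variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption about how the environment is configured, and the token value is a placeholder.

```python
import os


def mask_secret(value, visible=4):
    """Return a display-safe version of a secret, keeping only its last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def get_token(env=None, name="HUGGINGFACEHUB_API_TOKEN"):
    """Read the token from an environment-style mapping, failing loudly if it is missing."""
    env = os.environ if env is None else env
    token = env.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set")
    return token


# Demonstration with a placeholder value instead of a real token.
demo_env = {"HUGGINGFACEHUB_API_TOKEN": "hf_example_placeholder"}
print(mask_secret(get_token(demo_env)))
```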

In [70]:
from langchain import HuggingFaceHub

model = HuggingFaceHub(
    huggingfacehub_api_token = huggingkey,
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    model_kwargs={'temperature':1,"max_length":180}
)

In [71]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

In [72]:
output_parse = StrOutputParser() # create the object of the parser class


In [73]:
retrivers = vector_db.as_retriever()



In [74]:
rag_chain = (
    {"context": retrivers, "question": RunnablePassthrough()}
    | prompt
    | model
    | output_parse
)
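
The `|` pipeline above passes the question through four stages: the dict step runs the retriever and the passthrough on the same input, and the result flows into the prompt, the model, and finally the parser. The same data flow can be written as ordinary function calls. This is a sketch with stand-ins: `retrieve` and `fake_model` are hypothetical stubs, not the real retriever or Mistral model.

```python
def retrieve(question):
    """Stand-in for the retriever: returns a fixed list of context strings."""
    return ["HAR predicts human activities from sensor data."]


def build_prompt(inputs):
    """Fill a shortened version of the notebook's template."""
    return "Question: {question}\nContext: {context}\nAnswer:".format(**inputs)


def fake_model(prompt_text):
    """Stand-in for the LLM: echoes the prompt and appends a canned answer."""
    return prompt_text + " stub answer"


def parse_output(raw):
    """Stand-in for the string output parser: ensure the result is a str."""
    return str(raw)


def run_chain(question):
    # The dict step: the same question feeds both the retriever and the passthrough.
    inputs = {"context": retrieve(question), "question": question}
    return parse_output(fake_model(build_prompt(inputs)))


result = run_chain("what is HAR?")
```

Note that the context is inserted into the prompt as the string form of whatever the retriever returns, which is why the stub's list appears in `result` with brackets and quotes.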

In [79]:
text = rag_chain.invoke("what is human activity prediction")

In [82]:
text.split("\n")[-1]

'Human activity prediction is a technology that uses sensors to measure human activities such as walking and running. It is used in various applications such as sports movements analysis and healthcare. The predicted results have some influence on decision-making. Neural networks (NNs) have been used for human activity recognition (HAR) recently. HAR is typically implemented using machine learning methods such as support vector machines (SVMs) and hidden Markov models (HMMs). Machine learning models require a training'
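
Taking `text.split("\n")[-1]` keeps only the final line, which suggests the returned text also contains the prompt before the answer. Under that assumption, a slightly more robust sketch is to cut at the last `Answer:` marker, so that a multi-line answer is not truncated to its last line. The helper name `extract_answer` and the sample string are hypothetical.

```python
def extract_answer(generated, marker="Answer:"):
    """Return the text after the last occurrence of marker, or the whole text if it is absent."""
    _, found, after = generated.rpartition(marker)
    return after.strip() if found else generated.strip()


# Illustrative model output: echoed prompt followed by a two-line answer.
sample_output = "Question: q\nContext: c\nAnswer:\nFirst line.\nSecond line."
answer = extract_answer(sample_output)
```

Because `rpartition` splits at the last marker, an answer that itself contains the text `Answer:` would be cut short; the marker is assumed to appear only in the prompt.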