![image.png](attachment:image.png)

https://www.advancinganalytics.co.uk/blog/2023/11/7/10-reasons-why-you-need-to-implement-rag-a-game-changer-in-ai

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# What is the use of tiktoken?
![image.png](attachment:image.png)

`RapidOCR` is an open-source optical character recognition (OCR) toolkit for recognizing text in images. It is built on deep learning models and is known for its speed and accuracy, which makes it practical for real-time text recognition. Typical uses include:

- **Text extraction from images:** pulling text out of scanned documents, photos, and screenshots.
- **Real-time OCR:** reading text from a camera feed or processing video frames on the fly.
- **Multilingual support:** recognizing text in multiple languages, which is useful for applications that serve a global audience.
- **Text detection and recognition:** locating the regions of an image that contain text as well as recognizing the text itself, which helps with document analysis and automated data entry.
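A minimal sketch of how RapidOCR is typically called, assuming the `rapidocr-onnxruntime` package from the install cell. The helper names `keep_confident_lines` and `ocr_image` are hypothetical, and the `[box, text, score]` layout of each result entry is an assumption based on the package's documented usage, so check it against the installed version.

```python
def keep_confident_lines(result, min_score=0.5):
    """Return recognized text lines whose confidence is at least min_score.

    Assumes each entry looks like [box, text, score]. `result` may be None
    when no text is detected.
    """
    if not result:
        return []
    return [text for _box, text, score in result if float(score) >= min_score]


def ocr_image(image_path, min_score=0.5):
    """Run RapidOCR on one image and return the confident text lines."""
    # Imported lazily so the helper above works without the OCR package.
    from rapidocr_onnxruntime import RapidOCR

    engine = RapidOCR()
    result, _elapsed = engine(image_path)
    return keep_confident_lines(result, min_score)


# Hypothetical result, only to show the assumed shape.
sample = [
    [[[0, 0], [90, 0], [90, 20], [0, 20]], "WeSTOCK", 0.98],
    [[[0, 30], [90, 30], [90, 50], [0, 50]], "sm1dge", 0.31],
]
print(keep_confident_lines(sample))
```

With a real image, `ocr_image("some_scan.png")` would return the recognized lines; the file name here is a placeholder.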

In [56]:

# !pip install langchain tiktoken rapidocr-onnxruntime

# Data Ingestion


In [58]:
# download the raw HTML of the home page

import requests

url = 'https://www.brainwired.in/'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the request failed

response.text

'<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><meta property="og:type" content="website"/><meta property="og:url" content="https://www.brainwired.in"/><meta property="og:title" content="BrainWired | WeStock ( Enhancing Livestock productivity )"/><meta property="og:description" content="Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK, that uses an IoT ear tag and a unique ML algorithm to identify sick and pregnant livestock and alert farmers accordingly. "/><meta property="og:image" content=""/><meta property="twitter:card" content="summary_large_image"/><meta property="twitter:url" content="https://www.brainwired.in"/><meta property="twitter:title" content="BrainWired | WeStock ( Enhancing Livestock productivity )"/><meta property="twitter:description" content="Brainwired is an agritech startup based out of India, which has develo

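The raw `response.text` is HTML markup rather than readable text, which is why the following cells switch to a document loader. As a rough illustration, the sketch below pulls `<meta property=...>` values out of such markup using only the standard library. `MetaCollector` is a hypothetical helper, and the sample string is an abbreviated copy of the output above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta property="..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "property" in attributes and "content" in attributes:
            self.meta[attributes["property"]] = attributes["content"]


# Abbreviated sample taken from the response shown above.
sample_html = (
    '<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/>'
    '<meta property="og:type" content="website"/>'
    '<meta property="og:title" content="BrainWired | WeStock '
    '( Enhancing Livestock productivity )"/>'
    '</head></html>'
)

collector = MetaCollector()
collector.feed(sample_html)
print(collector.meta["og:title"])
```

In the notebook, `collector.feed(response.text)` would apply the same parser to the downloaded page.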
In [59]:
# !pip install -qU langchain_community

In [1]:
from langchain_community.document_loaders import WebBaseLoader

# pages of the Brainwired site to ingest
urls = [
    "https://www.brainwired.in/about-us",
    "https://www.brainwired.in/",
    "https://www.brainwired.in/blog",
    "https://www.brainwired.in/career",
    "https://www.brainwired.in/our-team",
]
loader = WebBaseLoader(urls)
loader


USER_AGENT environment variable not set, consider setting it to identify your requests.


<langchain_community.document_loaders.web_base.WebBaseLoader at 0x2131b90f370>

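The `USER_AGENT` warning shown above can be avoided by setting the environment variable before `langchain_community` is imported. A minimal sketch; the agent string is a made-up placeholder, so substitute something that identifies your own project.

```python
import os

# Hypothetical identifier. Set it before importing WebBaseLoader,
# because the variable is read when the loader module is imported.
os.environ.setdefault("USER_AGENT", "brainwired-rag-notebook/0.1")

print(os.environ["USER_AGENT"])
```

`setdefault` leaves an existing value untouched, so a user agent configured outside the notebook still takes precedence.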
In [3]:
loader.requests_kwargs

{}

In [4]:
docs = loader.load()

In [5]:
docs[0]

Document(metadata={'source': 'https://www.brainwired.in/about-us', 'title': 'About Us | Brainwired', 'description': ' Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK', 'language': 'en'}, page_content='About Us | BrainwiredAbout Us “Providing affordable and efficient livestock health monitoring and tracking system.“Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK, that uses an IoT ear tag and a unique ML algorithm to identify sick and pregnant livestock and alert farmers accordingly.“Your Livestock, Our Care“Our team has a passion for making things with real value. This has led us to assemble a multi-talented group that can do just about anything. We believe in creating a brand through strategy, story-telling, products, and integrated experiences on web, mobile, and in the world. So that we become a household

In [6]:
docs[1]

Document(metadata={'source': 'https://www.brainwired.in/', 'title': 'BrainWired | WeStock ( Enhancing Livestock productivity )', 'description': 'Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK, that uses an IoT ear tag and a unique ML algorithm to identify sick and pregnant livestock and alert farmers accordingly.', 'language': 'en'}, page_content="BrainWired | WeStock ( Enhancing Livestock productivity )Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK. Our motive is to empower the lifestyle of all livestock farmers with high end technology at an affordable rate. Our team has a passion for making things with real value. This has led us to assemble a multi-talented group that can do just about anything. We believe in creating a brand through strategy, story-telling, products, and integrated experiences on web

In [7]:
docs[2]

Document(metadata={'source': 'https://www.brainwired.in/blog', 'title': 'Blog| Brainwired', 'description': 'Our Achievement, Accolades and Dreams', 'language': 'en'}, page_content="Blog| BrainwiredOur BlogsMy leadership learnings and regrets: Shark Tank India tested every ounce of my inner strength  This is the third article in a series published every week, where Namita Thapar, Executive Director, Emcure Pharmaceuticals relives her experiences as a Shark on Shark Tank India..... Read more ... Namita Thapar/8 Feb, 2022From Get-A-Whey To Skippi, Here's How These Startups' Sales Changed After Coming To Shark Tank IndiaNoShark Tank India season 1 was a refreshing change for prime time Indian television. It was the show that brought the entrepreneurial conversation to Indian  .... Read more ...CAREER/May 14, 2022 Agri India Hackathon Winners   WESTOCK - IoT/ML based ear tag for livestock health monitoring and tracking system while observing livestock behavioural parameters...... Read more 

In [8]:
docs[3]

Document(metadata={'source': 'https://www.brainwired.in/career', 'title': 'Career| Brainwired', 'description': 'Come, Be a part of Brainwired', 'language': 'en'}, page_content="Career| BrainwiredJoin the team !  We're looking for passionate and dedicated people who share our vision to join our growing team.Our Cultures Brainwired works to create a company culture that sustains a creative workforce and encourages a healthy work-life balance.Our employees have opportunities to attend events,share their work, and take time off to volunteer or learn new skills. And We're looking for passionate and dedicated people who share our vision to join our growing team. As part of our commitment to diversity and inclusion, we're cultivating an equitable and empathetic workplace, a listening and learning culture, and an empowered and inspired community.Our Core ValuesLead with passionWe love what we do and we do what we love. We are driven, take ownership for our work, and hold each other responsible

In [9]:
docs[4]

Document(metadata={'source': 'https://www.brainwired.in/our-team', 'title': 'Team| Brainwired', 'description': 'Brilliant minds behind our Company', 'language': 'en'}, page_content="Team| BrainwiredOur LeadersSreeshankar S Nair Co-founder & CEO   An Innovation isn't worthful unless it is meaningful for the less fortunate, Brainwired is a dream to bring the smiles back to our farmersRomeo P Jerard Co-founder & COO     There is more happiness in giving than in receiving, Brainwired tries to bring happiness to our farmersPraveen Ramachandran CRO    The path to the goal becomes clear when we know exactly where we want to go. My dream is to become an humble industrialist, a person who can help the generations to come for pursuing to build an ecologically sustainable solutions to face the global problemsSajil V S CTO  I see a world where machines would work with humans and I work on my version of JARVIS to fulfill this dream, I believe in people, and people who share the same vision and work

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [11]:
# chunks of at most 500 characters, with up to 50 characters shared between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

In [13]:
text_splitter

<langchain_text_splitters.character.RecursiveCharacterTextSplitter at 0x2131ba58940>

In [14]:
text_chunks = text_splitter.split_documents(docs)


In [15]:
text_chunks

[Document(metadata={'source': 'https://www.brainwired.in/about-us', 'title': 'About Us | Brainwired', 'description': ' Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK', 'language': 'en'}, page_content='About Us | BrainwiredAbout Us “Providing affordable and efficient livestock health monitoring and tracking system.“Brainwired is an agritech startup based out of India, which has developed a livestock health monitoring, and tracking system named WeSTOCK, that uses an IoT ear tag and a unique ML algorithm to identify sick and pregnant livestock and alert farmers accordingly.“Your Livestock, Our Care“Our team has a passion for making things with real value. This has led us to assemble a'),
 Document(metadata={'source': 'https://www.brainwired.in/about-us', 'title': 'About Us | Brainwired', 'description': ' Brainwired is an agritech startup based out of India, which has developed a livestock health mo

In [16]:
text_chunks[3].page_content

'the importance of modern-day agricultural practices in the livestock sector. WeSTOCK monitors the day-to-day activity of particular livestock and alerts farmers in case of any emergency. The whole system is highly customizable giving farmers the ability to choose from features allowing affordability and still ensuring quality. The technology being built in-house is created to support the farmers and livestock in the country.Our team has a passion for making things with real value. This has led'
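To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and spaces); `sliding_window_chunks` is a hypothetical helper that only illustrates how consecutive chunks share text.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 1200-character text, only for demonstration.
text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(text)

print([len(chunk) for chunk in chunks])
print(chunks[0][-50:] == chunks[1][:50])  # the tail of one chunk starts the next
```

The overlap is what lets a sentence cut at a chunk boundary still appear whole in one of the two neighbouring chunks, as with the repeated "Our team has a passion" text in the output above.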

In [17]:
from langchain_community.embeddings import OpenAIEmbeddings  # wrapper around OpenAI's embedding models
from langchain_community.vectorstores import FAISS  # vector store that keeps its index in local memory

In [31]:
# convert the document chunks into numerical vectors (embeddings)

# embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
# vectors = FAISS.from_documents(text_chunks, embeddings)
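Conceptually, a vector store such as FAISS keeps one vector per chunk and returns the chunks whose vectors lie closest to the query vector. The sketch below shows that idea with a brute-force cosine-similarity search in plain Python. It is not FAISS's actual algorithm, and `cosine_similarity`, `top_k`, and the three-dimensional toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, made-up embedding).
index = [
    ("livestock health monitoring", [0.9, 0.1, 0.0]),
    ("careers and company culture", [0.1, 0.9, 0.1]),
    ("leadership team", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stands in for the embedding of a user question

results = top_k(query, index)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Real embeddings have hundreds of dimensions, which is where an optimized index pays off over this linear scan.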

In [32]:
# !pip install faiss-cpu

In [33]:
# !pip install sentence_transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')



  from tqdm.autonotebook import tqdm, trange
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [35]:
word_embedding = model.encode(['how are you'])
word_embedding

array([[-3.13689522e-02,  3.78304906e-02,  7.63081610e-02,
         4.56995815e-02, -1.20470289e-03, -7.47690275e-02,
         8.15784112e-02,  1.02093192e-02, -1.12205490e-01,
         4.07343172e-02, -4.47057299e-02, -9.02858656e-03,
        -2.29762625e-02, -9.18258820e-03,  7.13361055e-03,
        -3.52973081e-02,  7.89650157e-02, -9.91560221e-02,
        -1.21038355e-01,  3.25737856e-02, -9.91876945e-02,
         3.19695435e-02,  1.73943152e-03,  8.81588385e-02,
        -2.66701225e-02,  1.70916747e-02, -4.13507894e-02,
        -3.91818732e-02,  3.50091718e-02, -7.60820881e-02,
        -6.47241548e-02,  2.26762388e-02, -4.95691597e-02,
        -2.75941417e-02, -5.72695732e-02, -4.05301377e-02,
         1.65401939e-02, -1.00596644e-01, -4.87324372e-02,
        -2.27846205e-02,  2.08549947e-02, -6.30239621e-02,
        -1.94441956e-02, -2.73096170e-02,  7.85230994e-02,
        -4.11422066e-02,  1.53613957e-02,  3.34872752e-02,
         9.57964063e-02,  5.92160821e-02, -9.70731378e-0