# **DATA INGESTION PROCESS**

In [15]:
from langchain_community.document_loaders import TextLoader

loader=TextLoader("talk.txt")

text_from_txtfile=loader.load()

text_from_txtfile

[Document(metadata={'source': 'talk.txt'}, page_content='I am happy to join with you today in what will go down in history as the greatest demonstration for\nfreedom in the history of our nation.\nFive score years ago a great American in whose symbolic shadow we stand today signed the\nEmancipation Proclamation. This momentous decree is a great beacon light of hope to millions of Negro\nslaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the\nlong night of their captivity. But 100 years later the Negro still is not free. One hundred years later the\nlife of the Negro is still badly crippled by the manacles of segregation and the chains of discrimination.\nOne hundred years later the Negro lives on a lonely island of poverty in the midst of a vast ocean of\nmaterial prosperity. One hundred years later the Negro is still languished in the corners of American\nsociety and finds himself in exile in his own land. So we’ve come here today to 

In [17]:
from langchain_community.document_loaders import WebBaseLoader
import bs4


loader2 = WebBaseLoader(
    web_path=("https://en.wikipedia.org/wiki/Martin_Luther_King_Jr.",),
    bs_kwargs={
        'parse_only': bs4.SoupStrainer(
            class_="mw-page-container"  # keep only the main page container
        )
    }
)

txt_from_url=loader2.load()

In [20]:
from langchain_community.document_loaders import PyPDFLoader

loader3=PyPDFLoader("paper2.pdf")

text_from_pdf=loader3.load()
text_from_pdf

[Document(metadata={'source': 'paper2.pdf', 'page': 0}, page_content='American Sign Language Alphabet Recognition  using \nDeep Learning  \nNikhil Kasukurthi1 Brij Rokad2 Shiv Bidani3 Aju D ennisan4 \n{ 1nikhil.kasukurthi, 2brij.rokad, 3shivbidani  }@gmail.com    |   4daju@vit.ac.in           \nVIT University  \n \nAbstract - Tremendous headway has been made in the field of 3D hand pose estimation but the 3D depth cameras are \nusually inaccessible. We propose a model to recognize  American Sign Language alphabet from RGB images. Images \nfor the training were resized and pre -processed before training the Deep Neural Network. The model was trained on \na squeezenet architecture to make it capable of running on mobile devices with an acc uracy of 83.29%.   \n \nKeywords: Sign Language detection; Squeezenet; Deep Neural Network; Stochastic Gradient Descent  \n \n1. Introduction  \nSign language conversion has  been a long standing computer vision problem[1]. Several solutions have come 

# **DATA TRANSFORMATION**

In [25]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into smaller chunks of up to 1000 characters, with a 200-character overlap between neighbours
txt_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

documents=txt_splitter.split_documents(text_from_pdf)

documents[0:5]


[Document(metadata={'source': 'paper2.pdf', 'page': 0}, page_content='American Sign Language Alphabet Recognition  using \nDeep Learning  \nNikhil Kasukurthi1 Brij Rokad2 Shiv Bidani3 Aju D ennisan4 \n{ 1nikhil.kasukurthi, 2brij.rokad, 3shivbidani  }@gmail.com    |   4daju@vit.ac.in           \nVIT University  \n \nAbstract - Tremendous headway has been made in the field of 3D hand pose estimation but the 3D depth cameras are \nusually inaccessible. We propose a model to recognize  American Sign Language alphabet from RGB images. Images \nfor the training were resized and pre -processed before training the Deep Neural Network. The model was trained on \na squeezenet architecture to make it capable of running on mobile devices with an acc uracy of 83.29%.   \n \nKeywords: Sign Language detection; Squeezenet; Deep Neural Network; Stochastic Gradient Descent  \n \n1. Introduction  \nSign language conversion has  been a long standing computer vision problem[1]. Several solutions have come 
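
To build intuition for what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch of a fixed-width sliding window in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and newlines); the function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper for illustration only; it ignores separators,
    unlike RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Small numbers so the overlap is easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one. The overlap serves the same purpose in the cell above: a sentence that straddles a chunk boundary still appears whole in at least one chunk.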

In [31]:
# Next, convert the chunks to vector embeddings and keep them in a vector store

from langchain_community.embeddings import OllamaEmbeddings

# Chroma is used as the vector store for the embeddings

from langchain_community.vectorstores import Chroma

db=Chroma.from_documents(documents,OllamaEmbeddings(model="llama3"))

In [44]:
query="who are the authors of American Sign Language Alphabet Recognition using Deep Learning"

result=db.similarity_search(query)
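
Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to the query vector. The sketch below shows that idea in plain Python using cosine similarity over hand-written 3-dimensional vectors. Everything in it is an assumption for illustration: the toy vectors, the labels, and the helper names `cosine_similarity` and `top_k` are invented, and the distance metric and index that Chroma actually uses are not shown in this notebook.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (score, text) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": (chunk label, made-up embedding)
toy_store = [
    ("authors and affiliations", [0.9, 0.1, 0.0]),
    ("squeezenet architecture", [0.1, 0.9, 0.2]),
    ("testing procedure", [0.0, 0.3, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # made-up embedding of the query

ranked = top_k(toy_query, toy_store, k=2)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

The ranking is only as good as the embeddings. In this notebook the first hit comes from page 3 of the paper rather than the title page that lists the authors, so it may be worth comparing embedding models or inspecting more of the returned chunks before trusting the top result.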

In [46]:
result

[Document(metadata={'page': 3, 'source': 'paper2.pdf'}, page_content='parameters, and in turn the siz e of the network.  \n \n  \n \n \n \n \nThe concatenated layer is fed onto the expand layer and hence the number of interconnections between the squeeze layer and \nthe expand layer are minimal. This ensures that the size of the network is low. The expand layer comprises of 3x3 filters \nalong with more 1x1 filters. These are concatenated in order to attain the result.  \n \n3.4 Testing  \nThe weights of the trained model are updated through training it over multiple epochs. The weights and biases of the \nnetwork are trained over the epochs to determine a final network. This net is used to evaluate the input image. The \nevaluation is a low computation process and can be carried out on a handheld mobile device.  \n   \n \nFig. 3 Squeezenet Architecture'),
 Document(metadata={'page': 1, 'source': 'paper2.pdf'}, page_content='[8] has utilized a number of parameters including posture of 