In [1]:
from langchain.document_loaders import TextLoader

In [2]:
# Writing text to local file
text = """
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.
"""

with open("my_file.txt", "w") as f:
    f.write(text)

In [4]:
# Using text loader to load text from local file
loader = TextLoader('my_file.txt')
docs_from_file = loader.load()
print(len(docs_from_file))

1


# CharacterTextSplitter


In [5]:
from langchain.text_splitter import CharacterTextSplitter

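# CharacterTextSplitter splits on "\n\n" by default, so a paragraph longer
# than chunk_size is kept whole (hence the warnings in the output below)
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20)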

docs = text_splitter.split_documents(docs_from_file)

print(len(docs))

Created a chunk of size 326, which is longer than the specified 200
Created a chunk of size 340, which is longer than the specified 200
Created a chunk of size 338, which is longer than the specified 200
Created a chunk of size 792, which is longer than the specified 200
Created a chunk of size 564, which is longer than the specified 200


6
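
The warnings above appear because the splitter cuts only at its separator (a blank line by default) and then merges neighbouring pieces up to `chunk_size`; it never cuts inside a piece. The following is a simplified pure-Python sketch of that idea, not LangChain's actual implementation. The function name `split_on_separator` is hypothetical, and chunk overlap is ignored to keep the sketch short.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces; never cut inside a piece."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would exceed the limit: close the current chunk
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs fit together; the 50-character one exceeds the limit
sample = "short one\n\nshort two\n\n" + "x" * 50
chunks = split_on_separator(sample, chunk_size=25)
print([len(c) for c in chunks])
```

The last chunk comes out at 50 characters even though `chunk_size` is 25, which mirrors the oversized chunks reported in the output above. A splitter that falls back to finer separators would be needed to enforce the limit strictly.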


# Using Embeddings

In [7]:
from langchain.embeddings import GooglePalmEmbeddings

embedding = GooglePalmEmbeddings()



## Creating a Deep Lake store

In [8]:
from langchain.vectorstores import DeepLake

my_activeloop_org_id = "samman"
my_activeloop_dataset_name = "langchain_course_indexers_retrievers"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
# Pass the embeddings object via `embedding`; `embedding_function` is deprecated
db = DeepLake(dataset_path=dataset_path, embedding=embedding)

# Adding documents to the Deep Lake dataset
db.add_documents(docs)



Your Deep Lake dataset has been successfully created!


Creating 6 embeddings in 1 batches of size 6:: 100%|██████████| 1/1 [00:24<00:00, 24.76s/it]

Dataset(path='hub://samman/langchain_course_indexers_retrievers', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype     shape     dtype  compression
  -------    -------   -------   -------  ------- 
   text       text      (6, 1)     str     None   
 metadata     json      (6, 1)     str     None   
 embedding  embedding  (6, 768)  float32   None   
    id        text      (6, 1)     str     None   





['d23477dd-ac9f-11ee-b60b-60189524c791',
 'd23477de-ac9f-11ee-98a7-60189524c791',
 'd23477df-ac9f-11ee-b082-60189524c791',
 'd23477e0-ac9f-11ee-b8f9-60189524c791',
 'd23477e1-ac9f-11ee-9aea-60189524c791',
 'd23477e2-ac9f-11ee-8297-60189524c791']

# Retriever

In [9]:
# Creating retriever from db
retriever = db.as_retriever()
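
Conceptually, the retriever embeds the query and returns the stored chunks whose embeddings lie closest to it. The following is a minimal sketch of that nearest-neighbour step using toy 2-dimensional vectors. Cosine similarity is chosen here purely for illustration (the metric the store actually uses depends on its configuration), and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy embeddings standing in for the 768-dimensional vectors in the dataset
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query_vec = [0.9, 0.1]
nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

A real vector store performs the same ranking over the stored embeddings, typically with an optimized index rather than a full scan.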

# Question Answering

In [10]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI

In [13]:
# creating a retrieval chain
chain = RetrievalQA.from_chain_type(
    llm=ChatGoogleGenerativeAI(model='gemini-pro', temperature=0, convert_system_message_to_human=True),
    chain_type='stuff',
    retriever=retriever,
)

In [14]:
query = "What is generative AI and its importance?"
response = chain.run(query)
print(response)

Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio, and synthetic data. It is important because it has the potential to fundamentally change enterprise technology and how businesses operate. It could help write code, design new drugs, develop products, redesign business processes, and transform supply chains.
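
The `'stuff'` chain type places all retrieved chunks into a single prompt and sends that prompt to the model in one call. A rough sketch of the idea follows. The prompt wording and the name `build_stuff_prompt` are assumptions made for illustration and are not LangChain's actual template.

```python
def build_stuff_prompt(question, retrieved_texts):
    """Concatenate every retrieved chunk into one context block."""
    context = "\n\n".join(retrieved_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is generative AI?",
    ["Generative AI produces content.", "GANs arrived in 2014."],
)
print(prompt)
```

Because everything goes into one prompt, this approach is simple but only works while the combined chunks fit within the model's context window.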


# Document Compressor

In [15]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

llm = ChatGoogleGenerativeAI(model='gemini-pro', temperature=0)

# creating compressor for retrieval
compressor = LLMChainExtractor.from_llm(llm=llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

In [17]:
# retrieving compressed documents
retrieved_docs = compression_retriever.get_relevant_documents(
    "When was generative ai introduced?"
)
print(retrieved_docs)



[Document(page_content='The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.', metadata={'source': 'my_file.txt'})]


In [18]:
print(retrieved_docs[0].page_content)

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.
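
Contextual compression wraps the base retriever: each retrieved document is passed through an extractor that keeps only the parts relevant to the query, and documents with nothing relevant are dropped. That is consistent with a single document coming back above. The sketch below imitates this flow with a keyword-overlap filter standing in for the LLM extractor. Both `extract_relevant` and `compress` are hypothetical names, and a real extractor judges relevance with a model rather than by shared words.

```python
import re


def extract_relevant(text, query):
    """Stand-in for the LLM extractor: keep sentences sharing a word with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if query_words & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)


def compress(docs, query):
    """Apply the extractor to each document and drop empty results."""
    compressed = []
    for doc in docs:
        extract = extract_relevant(doc, query)
        if extract:
            compressed.append(extract)
    return compressed


toy_docs = [
    "Chatbots appeared in the 1960s. Weather was mild.",
    "Supply chains are complex.",
]
result = compress(toy_docs, "chatbots 1960s")
print(result)
```

Only the first sentence of the first toy document survives; the second document is discarded entirely because nothing in it matches the query.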
