## Import necessary packages

In [141]:
import pathlib
import textwrap
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from IPython.display import display
from IPython.display import Markdown
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

## Load Model

In [142]:
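# Import the Python SDK
import os
import google.generativeai as genai

# Read the key from the environment (e.g. a .env file) instead of hardcoding it.
# Assumes the key is stored as GOOGLE_API_KEY, the variable langchain_google_genai also reads.
load_dotenv()
my_key = os.environ['GOOGLE_API_KEY']
genai.configure(api_key=my_key)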

model = genai.GenerativeModel('gemini-pro')

In [143]:
load_dotenv()
llm = ChatGoogleGenerativeAI(model="gemini-pro")

In [144]:
loader = PyPDFLoader(r"E:\Courses\projects\Google-AI-Hackathon\ch1.pdf")
pages = loader.load_and_split()
doc = '\n'.join(str(p.page_content) for p in pages[:3])

In [156]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    length_function = len,
)
texts = text_splitter.split_text(doc)
len(texts)

4
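
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (that class prefers to break on separators such as paragraph and line boundaries); the helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.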

In [157]:
embeddings = GoogleGenerativeAIEmbeddings(model='models/embedding-001')

In [158]:
docsearch = FAISS.from_texts(texts, embeddings)

In [148]:
query = 'examples of ML Applications'
print(docsearch.similarity_search(query)[0].page_content)

Chapter 1. The Machine Learning
Landscape
Not so long ago, if you had picked up your phone and asked it the way
home, it would have ignored you—and people would have questioned your
sanity. But machine learning is no longer science fiction: billions of people
use it every day. And the truth is it has actually been around for decades in
some specialized applications, such as optical character recognition (OCR).
The first ML application that really became mainstream, improving the
lives of hundreds of millions of people, took over the world back in the
1990s: the spam filter. It’s not exactly a self-aware robot, but it does
technically qualify as machine learning: it has actually learned so well that
you seldom need to flag an email as spam anymore. It was followed by
hundreds of ML applications that now quietly power hundreds of products
and features that you use regularly: voice prompts, automatic translation,
image search, product recommendations, and many more.
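
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows only that ranking step, using hand-made 3-dimensional vectors and Euclidean distance. The vectors, the `toy_index` dictionary, and the `nearest` helper are illustrative assumptions, not real embeddings or FAISS code.

```python
import math

# Hypothetical, hand-made "embeddings"; real ones have hundreds of dimensions
toy_index = {
    "spam filters are a classic ML application": [0.9, 0.1, 0.0],
    "grab a coffee and let's get started": [0.1, 0.8, 0.3],
    "machine learning definitions": [0.6, 0.2, 0.7],
}


def nearest(query_vector, index, k=2):
    """Return the k texts whose vectors are closest to query_vector (Euclidean distance)."""
    ranked = sorted(index, key=lambda text: math.dist(query_vector, index[text]))
    return ranked[:k]


query_vector = [0.8, 0.2, 0.1]  # stands in for the embedded query
top = nearest(query_vector, toy_index)
print(top)
```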




## Generate Summary

### Prompt to get the most important sentences

In [149]:
def create_context(idx):
  # The first chunk has no preceding text, so there is no context to add
  if idx == 0:
    return ''
  # Concatenate all chunks before idx and summarize them in a single model call
  past = ''.join(texts[:idx])
  prompt = "Please provide a detailed summary of the following text:\n\n\n" + past
  response = model.generate_content(prompt)
  context = """This is a summary of the previous text. Use it to help you understand the current text better in order to give better results.\n\n\n""" + response.text
  return context

In [150]:
idx = len(texts)

In [151]:
results = []
for i in range(idx):
    prompt = create_context(i) + """\n\n Prompt: {Can you tell me exactly which sentences in this text are important?
    Please don't write anything except the important sentences.} \n\n\n""" + texts[i]
    response = model.generate_content(prompt)
    results.append(response.text)


In [152]:
# Put the instruction first, then append the extracted sentences
# (calling .join() on the instruction would insert it between the results instead)
prompt_html = """Can you turn the following text into HTML code? Use styles and change fonts, but never change or add any sentence; use only sentences from the text: \n\n""" + '\n'.join(results)

In [153]:
response = model.generate_content(prompt_html)

In [154]:
# Specify the file path where you want to save the text file
file_path = "page"+str(idx)+".html"

# Open the file in write mode and write the content of the string to it
with open(file_path, "w", encoding="utf-8") as file:
    file.write(response.text)

print("String content saved to:", file_path)


String content saved to: page4.html
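
Because the prompt asks the model not to alter any sentence, it may be worth checking the generated HTML against the extracted sentences before relying on it. Here is a minimal sketch using only the standard library. The names `TextCollector`, `visible_text`, and `missing_sentences` are hypothetical, and the sample strings stand in for `response.text` and `results`.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html_source):
    """Return the text a reader would see, with whitespace collapsed."""
    collector = TextCollector()
    collector.feed(html_source)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def missing_sentences(sentences, html_source):
    """Return the sentences that do not appear verbatim in the visible text."""
    text = visible_text(html_source)
    return [s for s in sentences if " ".join(s.split()) not in text]


sample_html = (
    "<html><head><style>p {color: navy;}</style></head><body>"
    "<p>Machine learning is great.</p><p>Spam filters\n learn.</p></body></html>"
)
sample_sentences = ["Machine learning is great.", "Spam filters learn.", "This one was dropped."]
print(missing_sentences(sample_sentences, sample_html))
```

An empty list would suggest every sentence survived the conversion. The check is verbatim, so it would also flag sentences the model split across inline tags.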


In [159]:
# similarity_search expects a single string, so join the extracted sentences
query = '\n'.join(results)
print(docsearch.similarity_search(query)[2].page_content)

(it’s the only chapter without much code), all rather simple, but my goal is
to ensure everything is crystal clear to you before we continue on to the rest
of the book. So grab a coffee and let’s get started!
TIP
If you are already familiar with machine learning basics, you may want to skip directly to Chapter 2.
If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.
What Is Machine Learning?
Machine learning is the science (and art) of programming computers so
they can learn from data.
Here is a slightly more general definition:
[Machine learning is the] field of study that gives computers the
ability to learn without being explicitly programmed.
—Arthur Samuel, 1959
And a more engineering-oriented one:
A computer program is said to learn from experience E with respect
to some task T and some performance measure P, if its performance
on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997


### Next Steps:
1. Write a prompt template that generates a summary of the document.
2. Pass the summary as the query to `docsearch`.
3. Use the similarity search results to decide which passages to highlight.
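
One possible shape for these steps, sketched with the standard library only: a prompt template for the summary, and a helper that wraps retrieved passages in `<mark>` tags so they can be highlighted in the source text. The template wording and the names `SUMMARY_TEMPLATE`, `build_summary_prompt`, and `highlight` are assumptions; in the notebook, the passages would come from `docsearch.similarity_search(summary)`.

```python
import html
from string import Template

# Hypothetical template; the wording is an assumption, not a tested prompt
SUMMARY_TEMPLATE = Template(
    "Summarize the following document in a few sentences.\n\n"
    "Document:\n$document"
)


def build_summary_prompt(document):
    """Step 1: fill the template with the document text."""
    return SUMMARY_TEMPLATE.substitute(document=document)


def highlight(document, passages):
    """Step 3: wrap every retrieved passage found in the document in <mark> tags."""
    escaped = html.escape(document)
    for passage in passages:
        target = html.escape(passage)
        escaped = escaped.replace(target, f"<mark>{target}</mark>")
    return escaped


document = "Spam filters were the first mainstream ML application. Grab a coffee."
# Step 2 would supply these, e.g. from docsearch.similarity_search(summary)
passages = ["Spam filters were the first mainstream ML application."]
print(build_summary_prompt(document))
print(highlight(document, passages))
```

Escaping both the document and the passages before matching keeps the output valid HTML even when the source text contains characters such as `<` or `&`.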