# Vector database creation

In [1]:

!pip install -qU pinecone-client langchain_community cohere PyMuPDF PyPDF2


In [2]:
import cohere
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from pinecone import Pinecone, ServerlessSpec
import os
import getpass
from langchain.schema import Document
from langchain.embeddings import CohereEmbeddings
from langchain_community.vectorstores import Pinecone as Pinecone_Langchain

In [3]:
index = 'mega-hackathon-2024-app'
os.environ['PINECONE_API_KEY'] = pinecone_secret_key = getpass.getpass('Enter Pinecone secret key:')
cohere_secret_key = getpass.getpass('Enter Cohere secret key:')

Enter Pinecone secret key:··········
Enter Cohere secret key:··········


In [4]:
pc = Pinecone(api_key=pinecone_secret_key)
if index not in pc.list_indexes().names():
  pc.create_index(
    name = index,
    dimension = 4096,
    metric = 'cosine',
    spec=ServerlessSpec(
      cloud="aws",
      region="us-east-1"
      )
  )
pc.describe_index(index)

{'deletion_protection': 'disabled',
 'dimension': 4096,
 'host': 'mega-hackathon-2024-app-dl1pia4.svc.aped-4627-b74a.pinecone.io',
 'metric': 'cosine',
 'name': 'mega-hackathon-2024-app',
 'spec': {'serverless': {'cloud': 'aws', 'region': 'us-east-1'}},
 'status': {'ready': True, 'state': 'Ready'}}
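
The index dimension has to match the length of the vectors produced by the embedding model, otherwise Pinecone rejects the upsert. A minimal sketch of a guard follows. The helper name `assert_dimension` is hypothetical, and the assumption that the default `CohereEmbeddings` model returns 4096-dimensional vectors should be verified for the model version in use.

```python
def assert_dimension(vector, expected_dim):
    """Raise ValueError if an embedding does not match the index dimension."""
    actual = len(vector)
    if actual != expected_dim:
        raise ValueError(
            f"Embedding has {actual} dimensions, index expects {expected_dim}"
        )
    return actual


# Hypothetical usage once `embeddings` has been created later in the notebook:
# assert_dimension(embeddings.embed_query("test"), 4096)
```

Running the check once before the first upload surfaces a mismatch early instead of partway through a batch.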

In [5]:

# Each chunk targets at most 700 characters, with up to 200 characters shared between neighbouring chunks
character_text_splitter = CharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap  = 200,
    length_function = len,
    is_separator_regex=False,
)
recursive_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap  = 200,
    length_function = len,
    is_separator_regex=False,
)
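
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window illustration. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first breaks on separators such as paragraph and line breaks, then merges pieces up to the size limit); the function name `sliding_chunks` is hypothetical and only shows how a 200-character overlap repeats the tail of one chunk at the head of the next.

```python
def sliding_chunks(text, chunk_size=700, chunk_overlap=200):
    """Cut text into fixed windows whose neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "x" * 1200
pieces = sliding_chunks(sample)
print(len(pieces), [len(p) for p in pieces])
```

The overlap is why the retrieval outputs later in the notebook often show the same sentences at the end of one result and the start of the next.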

# Books

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [7]:

import fitz

## Biology Content

### Cell document

In [8]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Cell .pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

The Cell
The cell is the basic unit of life. Some organisms are made up of a single cell, like bacteria,
while others are made up of trillions of cells. Human beings are made up of cells, too.
Different Types of Cells
There are lots of different types of cells. Each type of cell is different and performs a different
function. In the human body, we have nerve cells which can be as long as from our feet to our
spinal cord. Nerve cells help to transport messages around the body. We also have billions of
tiny little brain cells which help us think and muscle cells which help us move around. There
are many more cells in our body that help us to function and stay alive.
Although there are lots of different kinds of cells, they are often divided into two main
categories: prokaryotic and eukaryotic.
Prokaryotic Cells - The prokaryotic cell is a simple, small cell with no nucleus. Organisms made
from prokaryotic cells are very small, such as bacteria. There are three main regions of the
prokary
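
The page-by-page extraction is repeated for every PDF in this notebook, so it could be wrapped in a small helper. This is a sketch under the assumption that each page object exposes `get_text()`, as PyMuPDF pages do; the name `extract_text` is hypothetical.

```python
def extract_text(pages):
    """Concatenate the text of every page; each page must provide get_text()."""
    return "".join(page.get_text() for page in pages)


# Hypothetical usage with PyMuPDF (a fitz document is iterable over its pages):
# import fitz
# with fitz.open(pdf_path) as doc:
#     text = extract_text(doc)
```

Using `"".join(...)` avoids rebuilding the string on every page, which matters little here but keeps the helper linear in the total text length.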

In [78]:

chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 54


In [10]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "Cell" ,"Source": "https://www.ducksters.com/science/biology/"},
  ))
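
The same `Document` construction recurs for every topic with only the metadata changing. A possible refactoring is sketched below; `build_records` is a hypothetical helper that returns plain dictionaries so it stays independent of the LangChain version, and each dictionary can be unpacked into `Document(**record)`.

```python
def build_records(chunks, doc_type, topic, source):
    """Pair each chunk with its own copy of the Type/Topic/Source metadata."""
    return [
        {
            "page_content": chunk,
            "metadata": {"Type": doc_type, "Topic": topic, "Source": source},
        }
        for chunk in chunks
    ]


records = build_records(
    ["first chunk", "second chunk"],
    "Biology",
    "Cell",
    "https://www.ducksters.com/science/biology/",
)
print(records[0]["metadata"])

# Hypothetical usage in the notebook:
# documents = [Document(**record) for record in records]
```

Keeping the metadata keys in one place also makes spelling mistakes in topic names easier to spot.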

In [11]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)
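
Re-running an upload cell adds the same chunks to the index again unless the vectors carry stable IDs. One way to get stable IDs is to derive them from the chunk content, sketched below. The helper `chunk_ids` is hypothetical, and passing its result as `ids=` assumes that `from_documents` forwards an `ids` keyword to `from_texts` in the installed `langchain_community` version, which should be checked before relying on it.

```python
import hashlib


def chunk_ids(chunks, topic):
    """Build deterministic IDs from the topic, the chunk position, and a content hash."""
    prefix = topic.lower().replace(" ", "-")
    ids = []
    for position, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:16]
        ids.append(f"{prefix}-{position}-{digest}")
    return ids


# Hypothetical usage (assumes the `ids` keyword is accepted):
# vector_store = Pinecone_Langchain.from_documents(
#     documents, embeddings, index_name=index, ids=chunk_ids(chunks, "Cell")
# )
```

With identical IDs, a second upload overwrites the existing vectors rather than duplicating them.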



In [12]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Can you explain the Cell Cycle for Mitosis? ')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)



### Disease document

In [13]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Disease.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Infectious Disease
What is an infectious disease?
An infectious disease is any disease caused by a pathogen (germ) such as a virus,
bacteria, parasite, or fungus. Although we will mostly discuss infectious diseases in
people on this page, other living organisms such as animals, plants, and
microorganisms can all be made ill by an infectious disease.
Pathogens
"Pathogen" is the scientific name for "germ." Infectious diseases are caused by
pathogens. When your mom says to wash your hands because of germs, she wants
you to get all the pathogens off your hands so they won't go into your mouth and body.
Maybe after reading this, you will wash your hands a bit more!
Pathogens are tiny organisms (called microorganisms) that invade the body and make it
sick. Examples of pathogens are viruses, bacteria, parasites, and fungi. Click on the
word to learn more.
Examples
Different kinds of pathogens cause different kinds of diseases. Here are some example
diseases caused by each type of pathogen:
●


In [14]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 66


In [15]:

documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic":"Disease", "Source": "https://www.ducksters.com/science/biology/"},
  ))

In [16]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [17]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('What are the two main categories of legal drugs?')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

In the United States, the government has regulations in order to insure that drugs are as
safe as possible. Companies must run certain tests and pass requirements before they
can sell a drug. The agency that watches over drugs is called the Food and Drug
Administration, which is often shortened to FDA.
Types of Drugs
Most legal drugs can be divided into two categories:
●
Prescription - Prescription drugs require approval from a doctor to purchase.
They are handed out by pharmacists who insure that the correct doses are given
and that people understand the proper way to take the drug.
●
Over the Counter - Over the counter drugs can be purchased without a
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'Disease', 'Type': 'Biology'}

## Document 1

and that people understand the proper way to take the drug.
●
Over the Counter - Over the counter drugs can be purchased without a
prescription. Examples of these include pain relievers such as Advil and aspiri

### Genetics document

In [18]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Genetics.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Genetics
What is genetics?
Genetics is the study of genes and heredity. It studies how living organisms, including
people, inherit traits from their parents. Genetics is generally considered part of the
science of biology. Scientists who study genetics are called geneticists.
What are genes?
Genes are the basic units of heredity. They consist of DNA and are part of a larger
structure called the chromosome. Genes carry information that determine what
characteristics are inherited from an organism's parents. They determine traits such as
the color of your hair, how tall you are, and the color of your eyes.
What are chromosomes?
Chromosomes are tiny structures inside cells made from DNA and protein. The
information inside chromosomes acts like a recipe that tells cells how to function.
Humans have 23 pairs of chromosomes for a total of 46 chromosomes in each cell.
Other plants and animals have different numbers of chromosomes. For example, a
garden pea has 14 chromosomes and an elephant h

In [19]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 41


In [20]:

documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "Genetics","Source": "https://www.ducksters.com/science/biology/"},
  ))

In [21]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [22]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('What are the four different types of nucleotides that make up DNA?')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

deoxyribonucleic acid. The molecules of DNA are organized into special structures called
chromosomes. Sections of DNA are called genes which hold hereditary information such as
eye color and height. You can go here to learn more about DNA and chromosomes.
Other Functions
●
RNA - In addition to DNA the nucleus holds another type of nucleic acid called RNA
(ribonucleic acid). RNA plays an important role in making proteins called protein
synthesis or translation.
●
DNA replication - The nucleus can make exact copies of its DNA.
●
Transcription - The nucleus makes RNA which can be used to carry messages and
copies of DNA instructions.
●
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'Cell', 'Type': 'Biology'}

## Document 1

●
DNA replication - The nucleus can make exact copies of its DNA.
●
Transcription - The nucleus makes RNA which can be used to carry messages and
copies of DNA instructions.
●
Translation - The RNA is used to configure amino acids int
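
The top result above carries `'Topic': 'Cell'` even though the question was asked right after loading the Genetics document. All topics share one index, and the retriever searches all of it. If results should be limited to one topic, a metadata filter can be passed through `search_kwargs`. The sketch below builds such a dictionary; `topic_search_kwargs` is a hypothetical helper, and `k=4` is assumed here as the number of results to return.

```python
def topic_search_kwargs(topic, k=4):
    """Build retriever search arguments that restrict matches to one Topic value."""
    return {"k": k, "filter": {"Topic": {"$eq": topic}}}


print(topic_search_kwargs("Genetics"))

# Hypothetical usage in the notebook:
# retriever = vector_store.as_retriever(
#     search_kwargs=topic_search_kwargs("Genetics")
# )
```

Whether to filter is a design choice: an unfiltered search can surface useful cross-topic passages, as it did here with the nucleus material from the Cell document.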

### Living Organisms document

In [23]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Living Organisms.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Streaming output truncated to the last 5000 lines.
name. From the most threatened animal to least threatened these names are:
1) critically endangered
2) endangered
3) vulnerable
There are also some animals that only exist in captivity (for example in a zoo). These
animals are called "extinct in the wild".
What are some of the most endangered animals?
These are animals that are categorized as Critically Endangered. Here is just a
sampling from the list:
Black Rhinoceros - There are only a few black rhinos left. They mostly live in Western
Africa. They are mostly threatened due to hunters killing them for their horns.
Red Wolf - The red wolf originally lived in the Southeastern United States. There are
only a few hundred left, most of them living in captivity.
Others include the Siberian Tiger, Florida Panther, Mountain Gorilla, California Condor,
and the Giant Ibis.
Some "endangered" animals include the Sea Otter, Loggerhead Sea Turtle, Giant
Panda, Blue Whale, Albatross,

In [24]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 236


In [25]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "Living Organisms","Source": "https://www.ducksters.com/science/biology/"},
  ))

In [26]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [27]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('what is the Scientific Classification ?')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Scientific Classification
Biological Classification is the way scientists use to categorize and organize all of life. It
can help to distinguish how similar or different living organisms are to each other.
An example of Classification
Biological classification works a bit like the library
does. Inside the library, books are divided up into
certain areas: the kids books in one section, the
adult books in another, and the teen books in
another section. Within each of those sections,
there will be more divisions like fiction,
non-fiction. Within those sections there will be
even more divisions such as mystery, science
fiction, and romance novels in the fiction section.
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'Living Organisms', 'Type': 'Biology'}

## Document 1

divisions which would be like fiction, non-fiction,
mystery, etc. Finally, you get to the species,
which is sort of like getting to the book in the
library.
7 Major Levels of Classificatio

### Nutrition document

In [28]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Nutrition.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Nutrition
Nutrition is how we get the food we need to grow healthy and strong. Vitamins and
minerals help our bodies to function and grow.
Why is nutrition important for kids?
Eating good foods is especially important for kids because they are still growing. Kids'
bodies need nutrition to grow strong healthy bones and muscles. If you don't get all the
vitamins and minerals you need while you are growing, you won't grow as tall and as
strong as you could be.
Food Groups
There are five main food groups that you should eat every day. By eating a variety of
foods in each of these food groups, you will get the nutrition you need to grow and be
healthy.
●
Grains - breads, cereal, pasta, rice
●
Dairy - milk, cheese, yogurt
●
Fruits - apples, oranges, berries, grapes, bananas
●
Vegetables - broccoli, beans, spinach, carrots, peas
●
Protein - beef, chicken, pork, eggs, nuts, fish
My Plate
The United States Department of Agriculture (USDA) has come up with a picture of a
plate to help us to make

In [29]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 28


In [30]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "Nutrition","Source": "https://www.ducksters.com/science/biology/"},
  ))

In [31]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [32]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('What do Vitamins mean ? ')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Vitamins and
Minerals
We learned on the nutrition page that proper nutrition means eating the right foods so
our body will get the vitamins and minerals it needs. Our bodies need vitamins and
minerals to function. Different parts of our bodies like our eyes, brains, muscles, and
bones need different nutrients to grow and be healthy.
Here is a list of vitamins and minerals that our bodies need:
Vitamins
Vitamin A
●
Eyes, immune system, skin
●
Milk, eggs, orange and green vegetables
Vitamin C
●
Bones, blood vessels, teeth, gums, healing, brain
●
Berries, bell peppers, oranges, spinach, tomatoes
Vitamin D
●
Bones
●
Sunlight, milk, fish oil, eggs
Vitamin E
●
Blood, cells
●
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'Nutrition', 'Type': 'Biology'}

## Document 1

●
The average American will eat around 12 pounds of chocolate in a year.
●
People will try all sorts of crazy diets, but the best way to lose weight is to eat
healthy and exercise.
●
Empty cal

### Plants document

In [33]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/Plants.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Photosynthesis
What is photosynthesis?
Have you ever noticed that plants need sunlight to live? It seems sort of strange doesn't
it? How can sunlight be a type of food? Well, sunlight is energy and photosynthesis is
the process plants use to take the energy from sunlight and use it to convert carbon
dioxide and water into food.
Three things plants need to live
Plants need three basic things to live: water, sunlight, and carbon dioxide. Plants
breathe carbon dioxide just like we breathe oxygen. When plants breathe carbon
dioxide in, they breathe out oxygen. Plants are the major source of oxygen on planet
Earth and help keep us alive.
We know now that plants use sunlight as energy, they get water from rain, and they get
carbon dioxide from breathing. The process of taking these three key ingredients and
making them into food is called photosynthesis.
How do plants capture sunlight?
Plants capture sunlight using a compound called chlorophyll. Chlorophyll is green,
which is why so many pla

In [34]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 37


In [35]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "Plants","Source": "https://www.ducksters.com/science/biology/"},
  ))

In [36]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [37]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Photosynthesis')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

●
Other - Chloroplasts have their own DNA and ribosomes for making proteins from
RNA.
Photosynthesis
Chloroplasts use photosynthesis to turn sunlight into food. The chlorophyll captures energy
from light and stores it in a special molecule called ATP (which stands for adenosine
triphosphate). Later, the ATP is combined with carbon dioxide and water to make sugars such
as glucose that the plant can use as food.
Other Functions
Other functions of chloroplasts include fighting off diseases as part of the cell's immune
system, storing energy for the cell, and making amino acids for the cell.
Interesting Facts about Chloroplasts
●
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'Cell', 'Type': 'Biology'}

## Document 1

throughout all of human history. Trees have also been a great source of fuel as fires for
keeping warm and cooking food. We also gather a lot of our food from trees such as
fruit and nuts. However, trees are also important to our environment

### The Human Body document

In [38]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/biology/The Human Body.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Human Body
The human body is a complex biological system involving cells, tissues, organs, and
systems all working together to make up a human being.
Main Structures
From the outside, the human body can be divided into several main structures. The
head houses the brain which controls the body. The neck and trunk house many of the
important systems that keep the body alive and healthy. The limbs (arms and legs) help
the body to move about and function in the world.
Senses
The human body has five main senses that it uses to convey information about the
outside world to the brain. These senses include sight (eyes), hearing (ears)Hearing
and the Ear, smell (nose), taste (tongue), and touch (skin).
Organ Systems
The human body consists of several organ systems. Each system is made up of organs
and other body structures that work together to perform a specific function. Most
scientists divide the body into 11 systems.
1. Skeletal System - The skeletal system is made up of bones, ligaments, a

In [39]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 92


In [40]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Biology","Topic": "The Human Body","Source": "https://www.ducksters.com/science/biology/"},
  ))

In [41]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [42]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('The Human Brain')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

The Human
Brain
The brain is where we do our thinking. All our senses are tied into our brain allowing us
to experience the outside world. We remember, have emotions, solve problems, worry
about stuff, dream about the future, and control our bodies in our brain.
For such an awesome organ, the brain doesn't look like much. It's a ball of gray looking
wrinkled tissue about the size of two of your fists put together. The brain sits in our hard,
thick skull with membranes and fluid around it to protect it.
How the Brain Communicates
The brain is part of the nervous system. Together with the spinal cord, it makes up the
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'The Human Body', 'Type': 'Biology'}

## Document 1

blood. There are lots of blood vessels and blood flowing through the brain at all times.
The brain actually uses around twenty percent of the body's energy.
The Brain Has Two Halves
The brain is divided into two halves. Since the nerves cross

## Physics Content
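
The Physics files go through the same load, split, wrap, embed, and query steps as the Biology ones, so the per-file cells could be driven from a small manifest instead of being copied. The sketch below only builds that manifest from the paths and metadata used in this notebook; the names `DATA_ROOT`, `PHYSICS_FILES`, and `physics_manifest` are hypothetical.

```python
from pathlib import PurePosixPath

# Folder that holds the source PDFs on the mounted drive
DATA_ROOT = PurePosixPath("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data")

# Topic name -> file name, as they appear in the cells of this notebook
PHYSICS_FILES = {
    "Motion": "_Motion.pdf",
    "Astronomy": "Astronomy.pdf",
    "Electricity": "Electricity.pdf",
    "Light and Optics": "Light and Optics.pdf",
}


def physics_manifest():
    """Return one entry per Physics PDF with its path and metadata."""
    return [
        {
            "path": str(DATA_ROOT / "physics" / filename),
            "Type": "Physics",
            "Topic": topic,
            "Source": "https://www.ducksters.com/science/physics/",
        }
        for topic, filename in PHYSICS_FILES.items()
    ]


for entry in physics_manifest():
    print(entry["Topic"], "->", entry["path"])
```

A loop over these entries could then call the existing splitter and upload code once per file, keeping paths and topic names in a single table.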

### Motion document

In [43]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/_Motion.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Scalars and Vectors
There are a lot of different mathematical quantities used in physics. Examples of these
include acceleration, velocity, speed, force, work, and power. These different quantities
are often described as being either "scalar" or "vector" quantities. Below we will discuss
what these words mean as well as introduce some basic vector math.
What is a scalar?
A scalar is a quantity that is fully described by a magnitude only. It is described by just a
single number. Some examples of scalar quantities include speed, volume, mass,
temperature, power, energy, and time.
What is a vector?
A vector is a quantity that has both a magnitude and a direction. Vector quantities are
important in the study of motion. Some examples of vector quantities include force,
velocity, acceleration, displacement, and momentum.
What is the difference between a scalar and vector?
A vector quantity has a direction and a magnitude, while a scalar has only a magnitude.
You can tell if a quantity is a v

In [44]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 65


In [45]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Physics","Topic": "Motion","Source": "https://www.ducksters.com/science/physics/"},
  ))

In [46]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [47]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Mass and Weight')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Example:
What is the weight of a 50 kg mass object?
weight = 50 kg * 9.8 m/s2
weight = 490 N
Is mass the same as size?
No, mass is different than size or volume. This is because the type of atoms or
molecules as well as their density helps to determine the mass. For example, a balloon
filled with helium will have much less mass than a similar sized item made of solid gold.
The Law of Conservation of Mass
The law of conservation of mass states that the mass of a closed system must remain
constant over time. This means that although changes are being made to the objects in
a system, the overall mass of the system must remain the same.
Interesting Facts about Mass and Weight
●
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Motion', 'Type': 'Physics'}

## Document 1

an object exerts on other objects. It can also be the measurement of how much
gravity an object experiences from another object.
When scientists want to express mass in terms of atoms and mo

### Astronomy document

In [48]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Astronomy.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Astronomy for Kids
What is Astronomy?
Astronomy is the branch of science that studies outer space focusing on celestial bodies
such as stars, comets, planets, and galaxies.
History of Astronomy
Perhaps one of the oldest sciences, we have record of people studying astronomy as
far back as Ancient Mesopotamia. Later civilizations such as the Greeks, Romans, and
Mayans also studied astronomy. However, all of these early scientists had to observe
space with just their eyes. There was only so much they could see. With the invention of
the telescope in the early 1600s, scientists were able to see much further objects as
well as get a better view of closer objects like the moon and the planets.
Major Discoveries and Scientists
Galileo Galilei made major improvements to the telescope allowing close observations
of the planets. He made many discoveries including the 4 major satellites of Jupiter (the
Galilean moons) and sunspots.
Johannes Kepler was a famous astronomer and mathematician who cam

In [49]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 88


In [50]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Physics","Topic": "Astronomy","Source": "https://www.ducksters.com/science/physics/"},
  ))

In [51]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [52]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('The Solar System')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

final state including interesting objects like red giants, black holes, supernovas,
and neutron stars.
The Solar System
The center of the Solar System is the Sun. The Solar System is made up of the Sun and
all the planets, asteroids, and other objects that orbit the Sun.
The Planets
There are eight planets in our Solar System. Starting with the closest to the sun they are
Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. The closest four
planets (Mercury, Venus, Earth, and Mars) are termed terrestrial planets, meaning they
have a hard rocky surface. The furthest four planets (Jupiter, Saturn, Uranus, and
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Astronomy', 'Type': 'Physics'}

## Document 1

●
Scientists estimate there are around 200 billion stars in the Milky Way galaxy.
●
Pluto was once considered a full planet, but was redefined as a dwarf planet in
2006.
●
About 99.85% of the mass of the Solar System is the Sun. All the other

### Electricity document

In [53]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Electricity.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Intro
What is Electricity?
In order to understand the basics of electricity, it helps to first understand about atoms.
Atoms are small particles that make up all matter. They are so small that it takes billions
and billions of them just to make something useful like a pencil. Inside the atom are
even smaller objects called electrons, protons, and neutrons. Electrons have a negative
charge (-) and the protons have a positive charge (+). The protons and neutrons stick to
together in the center of the atom, called the nucleus. The electrons spin fast around the
outside. The positive charge of the protons keeps the electrons from flying off and
leaving the atom.
The electrons in the atom are where electricity gets its name. In some elements, there
are electrons on the outside of the atom that, when a force is applied, can come loose
and move to another atom. When a bunch of atoms are together and electrons are
moving from one atom to the other in the same direction, this is called electric

In [54]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 95


In [55]:
documents = []
for chunk in chunks:
  documents.append(Document(
        page_content=chunk,
        metadata={"Type": "Physics","Topic": "Electricity","Source": "https://www.ducksters.com/science/physics/"},
  ))

In [56]:
embeddings = CohereEmbeddings(cohere_api_key= cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [57]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Electrical Conductors and Insulators')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Basic Concepts
On this page we will describe some of the basic concepts of electricity. Knowing what
these terms mean will help you to better understand the rest of our pages on electricity.
What are some important things to know about electricity?
●
Conductors and insulators - Conductors are materials that allow electricity to
flow easily. Most types of metal are good conductors, which is why we use metal
for electrical wire. Copper is a good conductor and isn't too expensive, so it's
used a lot for the wiring in homes today.
Insulators are the opposite of conductors. An insulator is a material that doesn't
carry electricity. Insulators are important because they can protect us from
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Electricity', 'Type': 'Physics'}

## Document 1

to your computer or television is covered with a rubber-like insulator that protects you
from getting electrocuted. Good insulators include glass, the air, and paper.
Semicond

### Light and Optics document

In [58]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Light and Optics.pdf")
for page in doc:
  text = text + page.get_text()
# Print once, after every page has been read, instead of re-printing the growing text per page
print(text)

Science of Light
What is light made of?
This is not an easy question. Light has no mass and is not really considered matter. So
does it even exist? Of course it does! We couldn't live without light. Today scientists say
light is a form of energy made of photons. Light is unique in that it behaves like both a
particle and a wave.
Why does light go through some things and not others?
Depending on the type of matter it comes into contact with, light will behave differently.
Sometimes light will pass directly through the matter, like with air or water. This type of
matter is called transparent. Other objects completely reflect light, like an animal or a
book. These objects are called opaque. A third type of object does some of both and
tends to scatter the light. These objects are called translucent objects.
Light helps us to survive
Without sunlight our world would be a dead dark place. Sunlight does more than just
help us see (which is pretty great, too). Sunlight keeps the Earth warm, s
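
The extraction cell above builds the text by repeated string concatenation. A minimal sketch of an equivalent approach joins the page texts in one step. `FakePage` and `extract_text` are hypothetical names introduced here so the example runs without a PDF; only the `get_text()` call is taken from the notebook.

```python
class FakePage:
    """Hypothetical stand-in for a PyMuPDF page; exposes only get_text()."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


def extract_text(pages):
    """Join the text of every page, in order, into a single string."""
    return "".join(page.get_text() for page in pages)


# Two fake pages stand in for the iterable returned by fitz.open(...)
doc = [FakePage("Science of Light\n"), FakePage("What is light made of?\n")]
text = extract_text(doc)
print(text)
```

With a real document, the object returned by `fitz.open(...)` could presumably be passed to `extract_text` directly, since the notebook already iterates over it page by page.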

In [59]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 43
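
The chunk count depends on how `chunk_size` and `chunk_overlap` interact. The sketch below is a deliberately simplified fixed-window splitter, not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses. `sliding_window_chunks` is a hypothetical helper that only illustrates how a 200-character overlap repeats the tail of one chunk at the head of the next.

```python
def sliding_window_chunks(text, chunk_size=700, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Hypothetical illustration only: the real splitter prefers to break on
    separators such as paragraph and line breaks, so its chunks vary in size.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A 1500-character sample made of repeating digits
sample = "".join(str(i % 10) for i in range(1500))
windows = sliding_window_chunks(sample)
print([len(w) for w in windows])
```

Because each window advances by `chunk_size - chunk_overlap` characters, a larger overlap produces more chunks for the same text.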


In [60]:
documents = []
for chunk in chunks:
  documents.append(Document(
      page_content=chunk,
      metadata={"Type": "Physics", "Topic": "Light and Optics", "Source": "https://www.ducksters.com/science/physics/"},
  ))

In [61]:
# Embed the chunks with Cohere and add them to the Pinecone index
embeddings = CohereEmbeddings(cohere_api_key=cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [62]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Science of Light')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Sight and the Eye
Sight is one of the five senses that help us to get information about what is going on in
the world around us. We see through our eyes, which are organs that take in light and
images and turn them into electrical impulses that our brain can understand.
How do we see?
When we see something, what we are seeing is actually reflected light. Light rays
bounce off of objects and into our eyes.
Our Amazing Eyeballs
Pupil and Iris:
Eyes are amazing and complex organs. In order for us to see, light enters our eyes
through the black spot in the middle which is really a hole in the eye called the pupil.
{'Source': 'https://www.ducksters.com/science/biology/', 'Topic': 'The Human Body', 'Type': 'Biology'}

## Document 1

the eye. If it's dark, the iris will open the pupil up so more light can get into the eye.
Retina:
Once the light is in our eye it passes through fluids and lands on the retina at the back
of the eye. The retina turns the light rays into signals t
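
In the output above, the top hit for 'Science of Light' is tagged with the biology topic `The Human Body`, because every topic is stored in the same index. One way to keep results on topic is to filter the retrieved documents by their metadata. The sketch below is an assumption-laden illustration: `filter_by_topic` is a hypothetical helper, and plain dictionaries stand in for the retrieved `Document` objects.

```python
def filter_by_topic(docs, topic):
    """Keep only the retrieved documents whose metadata 'Topic' equals topic."""
    return [d for d in docs if d["metadata"].get("Topic") == topic]


# Hypothetical stand-ins for retrieved documents (plain dicts, not Document objects)
retrieved = [
    {"page_content": "Sight and the Eye ...",
     "metadata": {"Topic": "The Human Body", "Type": "Biology"}},
    {"page_content": "Science of Light ...",
     "metadata": {"Topic": "Light and Optics", "Type": "Physics"}},
]

on_topic = filter_by_topic(retrieved, "Light and Optics")
print([d["metadata"]["Topic"] for d in on_topic])
```

With real `Document` objects the lookup would be `d.metadata.get("Topic")`. Pinecone also supports metadata filters on the server side; how the retriever exposes them depends on the installed LangChain version, so that is not shown here.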

### Nuclear Physics and Relativity document

In [63]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Nuclear Physics and Relativity.pdf")
# Concatenate the text of every page, then print the full text once
for page in doc:
  text = text + page.get_text()
print(text)

The Atom
Science >> Chemistry for Kids
The atom is the basic building block for all matter in the universe. Atoms are extremely
small and are made up of a few even smaller particles. The basic particles that make up
an atom are electrons, protons, and neutrons. Atoms fit together with other atoms to
make up matter. It takes a lot of atoms to make up anything. There are so many atoms
in a single human body we won't even try to write the number here. Suffice it to say that
the number is trillions and trillions (and then some more).
There are different kinds of atoms based on the number of electrons, protons, and
neutrons each atom contains. Each different kind of atom makes up an element. There
are 92 natural elements and up to 118 when you count in man-made elements.
Atoms last a long time, in most cases forever. They can change and undergo chemical
reactions, sharing electrons with other atoms. But the nucleus is very hard to split,
meaning most atoms are around for a long time.
Struct

In [64]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 50


In [65]:
documents = []
for chunk in chunks:
  documents.append(Document(
      page_content=chunk,
      metadata={"Type": "Physics", "Topic": "Nuclear Physics and Relativity", "Source": "https://www.ducksters.com/science/physics/"},
  ))

In [66]:
# Embed the chunks with Cohere and add them to the Pinecone index
embeddings = CohereEmbeddings(cohere_api_key=cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)
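
Re-running an upload cell sends the same chunks to the index again, and when no IDs are supplied a vector store typically generates fresh ones, so duplicates can accumulate. A minimal sketch of deriving a deterministic ID from the chunk content follows. `stable_chunk_id` is a hypothetical helper, and truncating the digest to 32 characters is an arbitrary choice made for readability.

```python
import hashlib


def stable_chunk_id(topic, chunk):
    """Derive a deterministic ID from the topic and the chunk text."""
    digest = hashlib.sha256(f"{topic}\n{chunk}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; an arbitrary choice


# The same chunk always maps to the same ID; different chunks get different IDs
sample_chunks = ["The Atom", "The Atom", "Structure"]
ids = [stable_chunk_id("Nuclear Physics and Relativity", c) for c in sample_chunks]
print(ids[0] == ids[1], ids[0] == ids[2])
```

If the installed vector store accepts an `ids` argument when adding documents (an assumption to check against its documentation), passing these IDs would let a re-run overwrite the existing vectors instead of adding new ones, since Pinecone upserts replace a vector that has the same ID.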

In [67]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('The Atom')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

The Atom
Science >> Chemistry for Kids
The atom is the basic building block for all matter in the universe. Atoms are extremely
small and are made up of a few even smaller particles. The basic particles that make up
an atom are electrons, protons, and neutrons. Atoms fit together with other atoms to
make up matter. It takes a lot of atoms to make up anything. There are so many atoms
in a single human body we won't even try to write the number here. Suffice it to say that
the number is trillions and trillions (and then some more).
There are different kinds of atoms based on the number of electrons, protons, and
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Nuclear Physics and Relativity', 'Type': 'Physics'}

## Document 1

Intro
What is Electricity?
In order to understand the basics of electricity, it helps to first understand about atoms.
Atoms are small particles that make up all matter. They are so small that it takes billions
and billions of them

### Waves and Sound document

In [68]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Waves and Sound.pdf")
# Concatenate the text of every page, then print the full text once
for page in doc:
  text = text + page.get_text()
print(text)

Waves
What is a wave?
When we think of the word "wave" we usually picture someone moving their hand back
and forth to say hello or maybe we think of a curling wall of water moving in from the
ocean to crash on the beach.
In physics, a wave is a disturbance that travels through space and matter transferring
energy from one place to another. When studying waves it's important to remember that
they transfer energy, not matter.
Waves in Everyday Life
There are lots of waves all around us in
everyday life. Sound is a type of wave that
moves through matter and then vibrates our
eardrums so we can hear. Light is a special kind
of wave that is made up of photons. You can
drop a rock into a pond and see waves form in
the water. We even use waves (microwaves) to
cook our food really fast.
Types of Waves
Waves can be divided into various categories depending on their characteristics. Below
we describe some of the different terms that scientists use to describe waves.
Mechanical Waves and Electrom

In [69]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 52


In [70]:
documents = []
for chunk in chunks:
  documents.append(Document(
      page_content=chunk,
      metadata={"Type": "Physics", "Topic": "Waves and Sound", "Source": "https://www.ducksters.com/science/physics/"},
  ))
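
The physics topics in this section repeat the same load, split, and tag steps, with only the topic name changing. A minimal sketch of deriving the PDF path and the metadata from the topic name follows. `PHYSICS_DIR`, `PHYSICS_SOURCE`, and `topic_inputs` are hypothetical names, and the path pattern assumes each PDF is named after its topic, as the ones opened in this section are.

```python
import posixpath

# Directory and source URL as they appear in the cells of this section
PHYSICS_DIR = "/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics"
PHYSICS_SOURCE = "https://www.ducksters.com/science/physics/"


def topic_inputs(topic):
    """Return the PDF path and the metadata dict for one physics topic."""
    path = posixpath.join(PHYSICS_DIR, f"{topic}.pdf")
    metadata = {"Type": "Physics", "Topic": topic, "Source": PHYSICS_SOURCE}
    return path, metadata


for topic in ["Light and Optics", "Waves and Sound"]:
    path, metadata = topic_inputs(topic)
    print(path, metadata["Topic"])
```

A loop like this could feed each path to `fitz.open` and attach the matching metadata to every chunk, replacing the copied cells with a single one.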

In [71]:
# Embed the chunks with Cohere and add them to the Pinecone index
embeddings = CohereEmbeddings(cohere_api_key=cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [72]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Waves')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

Waves
What is a wave?
When we think of the word "wave" we usually picture someone moving their hand back
and forth to say hello or maybe we think of a curling wall of water moving in from the
ocean to crash on the beach.
In physics, a wave is a disturbance that travels through space and matter transferring
energy from one place to another. When studying waves it's important to remember that
they transfer energy, not matter.
Waves in Everyday Life
There are lots of waves all around us in
everyday life. Sound is a type of wave that
moves through matter and then vibrates our
eardrums so we can hear. Light is a special kind
of wave that is made up of photons. You can
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Waves and Sound', 'Type': 'Physics'}

## Document 1

same direction as the sound is moving.
Interesting Facts about Waves
●
Waves in the ocean are mostly generated by the wind moving across the ocean
surface.
●
The "medium" is the substance or m

### Work and Energy document

In [73]:
text = ""
doc = fitz.open("/content/drive/MyDrive/MEGA_HACKATHON_2024/Data/physics/Work and Energy.pdf")
# Concatenate the text of every page, then print the full text once
for page in doc:
  text = text + page.get_text()
print(text)

Energy
What is Energy?
The simplest definition of energy is "the ability to do work". Energy is how things change
and move. It's everywhere around us and takes all sorts of forms. It takes energy to
cook food, to drive to school, and to jump in the air.
Different forms of Energy
Energy can take a number of different forms. Here are some examples:
●
Chemical - Chemical energy comes from atoms and molecules and how they
interact.
●
Electrical - Electrical energy is generated by the movement of electrons.
●
Gravitational - Large objects such as the Earth and the Sun create gravity and
gravitational energy.
●
Heat - Heat energy is also called thermal energy. It comes from molecules of
different temperatures interacting.
●
Light - Light is called radiant energy. The Earth gets a lot of its energy from the
light of the Sun.
●
Motion - Anything that is moving has energy. This is also called kinetic energy.
●
Nuclear - Huge amounts of nuclear energy can be generated by splitting atoms.
●
Poten

In [74]:
chunks = recursive_text_splitter.split_text(text)
print(f"Number of chunks is {len(chunks)}")

Number of chunks is 54


In [75]:
documents = []
for chunk in chunks:
  documents.append(Document(
      page_content=chunk,
      metadata={"Type": "Physics", "Topic": "Work and Energy", "Source": "https://www.ducksters.com/science/physics/"},
  ))

In [76]:
# Embed the chunks with Cohere and add them to the Pinecone index
embeddings = CohereEmbeddings(cohere_api_key=cohere_secret_key, user_agent='mega-hackathon-2024')
vector_store = Pinecone_Langchain.from_documents(documents, embeddings, index_name=index)

In [77]:
retriever = vector_store.as_retriever()
matched_docs = retriever.get_relevant_documents('Power')
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
    print(d.metadata)


## Document 0

your computer at home than you could hope to read in a lifetime.
Fun facts about Electronic Communications
●
There are around 250 billion emails sent every day. Around 80% of these are
spam.
●
Around 20 hours of video are uploaded to YouTube every minute.
●
Fiber optics are good because they use less energy and are better for the
environment than electrical wires. They are also very resistant to weather.
●
The first telephone pole was built in 1876.
●
There are over 4 billion cell phones in the world. Over 100 million cell phones are
thrown away every year.
●
The first cell phone was invented by the company Motorola.
Uses and
Applications
{'Source': 'https://www.ducksters.com/science/physics/', 'Topic': 'Electricity', 'Type': 'Physics'}

## Document 1

and Fukushima Daiichi (Japan).
●
The first nuclear powered submarine was the U.S.S. Nautilus which put out to
sea in 1954.
●
One uranium pellet can generate the same amount of energy as around 1,000
kilograms of coal.
●
T
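
The first hit for the query 'Power' above is tagged `Electricity` rather than `Work and Energy`, which suggests that a one-word query is too broad for a shared index. A rough way to quantify this is the fraction of retrieved documents that carry the expected topic. The sketch below assumes plain dictionaries in place of `Document` objects, uses made-up sample metadata, and `topic_hit_rate` is a hypothetical helper.

```python
def topic_hit_rate(docs, expected_topic):
    """Return the fraction of retrieved documents tagged with expected_topic."""
    if not docs:
        return 0.0
    hits = sum(1 for d in docs if d["metadata"].get("Topic") == expected_topic)
    return hits / len(docs)


# Made-up metadata for four retrieved documents (illustrative, not real output)
retrieved = [
    {"metadata": {"Topic": "Electricity"}},
    {"metadata": {"Topic": "Work and Energy"}},
    {"metadata": {"Topic": "Work and Energy"}},
    {"metadata": {"Topic": "Electricity"}},
]

rate = topic_hit_rate(retrieved, "Work and Energy")
print(rate)
```

A low rate for a query that clearly belongs to one topic is a hint to phrase the query more specifically or to restrict the search by metadata.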