In [1]:
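# Install everything in one command so pip resolves a single, consistent set of
# llama-index-core requirements. The notebook uses the Ollama LLM integration.
!pip install -U llama-index llama-index-llms-ollama llama-index-embeddings-huggingface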

In [2]:
import os
import logging
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceWindowNodeParser, SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.llms.ollama import Ollama

# Configure logging
logging.basicConfig(level=logging.ERROR)

# Configure the locally served Ollama LLM and a HuggingFace embedding model
llm = Ollama(
    model="llama3.2:latest",
    base_url="http://localhost:11434",
    temperature=0.1,
)

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)

# Set global settings
Settings.llm = llm
Settings.embed_model = embed_model


In [3]:
# Create Sentence Window Node Parser
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text"
)

# Set a base text splitter for default indexing
text_splitter = SentenceSplitter()
Settings.text_splitter = text_splitter
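With `window_size=3`, each node holds a single sentence, and its `window` metadata holds that sentence plus up to three neighbouring sentences on each side. The plain-Python sketch below illustrates the idea. `build_sentence_windows` is a hypothetical helper, and its naive regex splitter is an assumption standing in for the sentence tokenizer the real parser uses. As far as I understand the library, the `window` and `original_text` keys are excluded from the text that gets embedded, so only the single sentence is embedded.

```python
import re


def build_sentence_windows(text: str, window_size: int = 3) -> list[dict]:
    """Split text into sentences and attach a window of neighbours to each.

    Hypothetical helper: a naive regex splitter (split on whitespace after
    '.', '!' or '?') stands in for a real sentence tokenizer.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    nodes = []
    for i, sentence in enumerate(sentences):
        # Up to `window_size` sentences before and after the current one
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,                           # the part that is embedded
            "original_text": sentence,                  # kept for later inspection
            "window": " ".join(sentences[start:end]),   # context handed to the LLM
        })
    return nodes


toy_text = "A one. B two. C three. D four. E five."
for node in build_sentence_windows(toy_text, window_size=1):
    print(repr(node["text"]), "->", repr(node["window"]))
```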


In [None]:
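# Download the sample document (IPCC AR6 WGII, Chapter 3).
# -L follows redirects; --create-dirs creates ../data_ipcc if it does not exist.
!curl -L --create-dirs -o ../data_ipcc/IPCC_AR6_WGII_Chapter03.pdf https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf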

In [4]:
from llama_index.core import SimpleDirectoryReader

# Load the document

documents = SimpleDirectoryReader(input_files=["../data_ipcc/IPCC_AR6_WGII_Chapter03.pdf"]).load_data()

In [5]:
len(documents)

172

In [6]:
# Extract nodes using Sentence Window Parser
nodes = node_parser.get_nodes_from_documents(documents)

# Extract base nodes with default text splitting
base_nodes = text_splitter.get_nodes_from_documents(documents)


In [7]:
print(len(nodes))
print(len(base_nodes))

11276
463
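The two counts differ by roughly 24 times. The sentence-window parser emits one node per sentence, whereas `SentenceSplitter` packs sentences into much larger chunks (recent releases default to `chunk_size=1024` tokens with `chunk_overlap=200`; check the installed version). The sketch below shows overlapping fixed-size chunking in simplified form. It is not `SentenceSplitter`'s algorithm: `chunk_words` is a hypothetical helper that counts whitespace-separated words instead of tokens and ignores sentence boundaries.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words sharing `overlap` words.

    Simplified, hypothetical stand-in for a token-based splitter.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so no tail-only chunk is emitted
        if start + chunk_size >= len(words):
            break
    return chunks


toy_text = " ".join(f"w{i}" for i in range(20))
for chunk in chunk_words(toy_text, chunk_size=8, overlap=2):
    print(chunk)
```

Fewer, larger chunks mean far fewer nodes to embed, but each embedding then has to summarize much more text.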


In [8]:
from llama_index.core import VectorStoreIndex

# Create indexes
sentence_index = VectorStoreIndex(nodes)
base_index = VectorStoreIndex(base_nodes)
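Both indexes embed every node with `Settings.embed_model`. At query time they return the `similarity_top_k` nodes whose embeddings are closest to the query embedding. The toy sketch below shows top-k retrieval, assuming cosine similarity as the scoring function (to my knowledge, the default for the in-memory vector store). `cosine`, `top_k`, and the 3-dimensional vectors are illustrative only; they are not LlamaIndex APIs or real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], node_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (node index, score) pairs for the k most similar node vectors."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(node_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors, not real model output
toy_vectors = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.1, 0.0]
print(top_k(toy_query, toy_vectors, k=2))
```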


In [9]:
# Query Engine with Metadata Replacement PostProcessor
query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ]
)

# Execute query
response = query_engine.query("What are the concerns surrounding the AMOC?")
print(response)

# Extract context window and original sentence
window = response.source_nodes[0].node.metadata["window"]
original_sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {original_sentence}")


The Atlantic Meridional Overturning Circulation (AMOC) is a subject of concern due to low confidence in reconstructed and modeled changes for the 20th century. There is also uncertainty regarding the quantification of AMOC changes, mainly due to disagreements among quantitative reconstructed and simulated trends. Additionally, direct observational records since the mid-2000s are considered too short to determine the relative contributions of internal variability, natural forcing, and anthropogenic forcing to AMOC change.
Window: 4.3.2.2, 9.6.3 (Fox-Kemper 
et al., 2021; Lee et al., 
2021)
Extreme sea levels
Relative sea level rise is driving a global increase 
in the frequency of extreme sea levels (high 
confidence).
 9.6.4 (Fox-Kemper et al., 
2021)
Rising mean relative sea level will continue to 
drive an increase in the frequency of extreme sea 
levels (high confidence).
 9.6.4 (Fox-Kemper et al., 
2021)
Ocean circulation
Ocean stratification
‘The upper ocean has become more stably
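`MetadataReplacementPostProcessor(target_metadata_key="window")` runs after retrieval. Each retrieved node was matched on its single sentence, and the postprocessor swaps that sentence for the surrounding window before the text reaches the LLM. This is why the answer above can draw on several sentences even though only single sentences were embedded. Below is a plain-Python sketch of the idea. `replace_with_metadata` and the dict-based nodes are hypothetical stand-ins, not the library's implementation.

```python
def replace_with_metadata(retrieved: list[dict], target_key: str = "window") -> list[dict]:
    """Return copies of the nodes with text replaced by metadata[target_key].

    Nodes without that metadata key keep their original text.
    """
    replaced = []
    for node in retrieved:
        new_text = node["metadata"].get(target_key, node["text"])
        replaced.append({**node, "text": new_text})  # shallow copy; input is not mutated
    return replaced


# Hypothetical retrieved nodes (placeholders, not real index output)
retrieved = [
    {"text": "B two.", "metadata": {"window": "A one. B two. C three."}, "score": 0.91},
    {"text": "A plain chunk.", "metadata": {}, "score": 0.47},
]
for node in replace_with_metadata(retrieved):
    print(node["score"], node["text"])
```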

In [10]:
# Default query engine
base_query_engine = base_index.as_query_engine(similarity_top_k=2)

# Execute query
base_response = base_query_engine.query("What are the concerns surrounding the AMOC?")
print(base_response)


The Atlantic Meridional Overturning Circulation (AMOC) is a critical component of the Earth's climate system. However, its stability is under threat due to climate change.

Research suggests that human activities, particularly greenhouse gas emissions, may be weakening the AMOC. This could have severe consequences for global ocean circulation and regional climates, potentially leading to more frequent and intense heatwaves, droughts, and storms.

The concerns surrounding the AMOC are multifaceted:

1. **Weakening of the circulation**: Studies indicate that the AMOC is slowing down, which could lead to a reduction in heat transport from the equator towards the poles.
2. **Impacts on regional climates**: A weakening AMOC would likely have significant effects on regional climates, including increased temperatures and altered precipitation patterns.
3. **Consequences for marine ecosystems**: Changes in ocean circulation and temperature could have devastating impacts on marine ecosystems, l

In [11]:
# Compare retrieved nodes for both methods
print("Sentence Window Method:")
for source_node in response.source_nodes:
    print(source_node.node.metadata["original_text"])
    print("--------")

print("Base Index Method:")
for source_node in base_response.source_nodes:
    print("AMOC mentioned?", "AMOC" in source_node.node.text)
    print("--------")


Sentence Window Method:
2.3.3.4, 9.2.3 (Fox-Kemper 
et al., 2021; Gulev et al., 
2021)
The AMOC will decline over the 21st century 
(high confidence, but low confidence for 
quantitative projections).

--------
Over the 21st century, AMOC will very likely decline for all SSP 
scenarios but will not involve an abrupt collapse before 2100 (WGI 
AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021).

--------
Base Index Method:
AMOC mentioned? False
--------
AMOC mentioned? False
--------
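The comparison above checks for "AMOC" by hand. A small helper makes the same check reusable for other queries and keywords. `keyword_hits` is a hypothetical function; the first list of sample strings is abridged from the sentence-window output above, and the second list is placeholder text.

```python
def keyword_hits(texts: list[str], keyword: str, case_sensitive: bool = True) -> int:
    """Count how many of the given texts contain `keyword`."""
    if not case_sensitive:
        keyword = keyword.lower()
        texts = [t.lower() for t in texts]
    return sum(keyword in t for t in texts)


# In the notebook the inputs could come from the responses above, e.g.
#   [sn.node.get_content() for sn in base_response.source_nodes]
# Abridged / placeholder strings keep this example self-contained.
sentence_window_texts = [
    "The AMOC will decline over the 21st century.",
    "Over the 21st century, AMOC will very likely decline for all SSP scenarios.",
]
base_texts = ["A chunk about sea level rise.", "A chunk about ocean stratification."]

print("Sentence window hits:", keyword_hits(sentence_window_texts, "AMOC"))
print("Base index hits:", keyword_hits(base_texts, "AMOC"))
```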
