# Testing Pipelines

### Setting up observability with Phoenix

In [1]:
# %pip install arize-phoenix
# %pip install llama-index-callbacks-arize-phoenix
# observability
import phoenix as px
px.launch_app()

import llama_index.core
llama_index.core.set_global_handler("arize_phoenix", endpoint="http://localhost:6006/v1/traces")

🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


### Save default config

In [4]:
import pathlib
import yaml

from evidence_seeker.confirmation_analysis import ConfirmationAnalyzerConfig

configfile = pathlib.Path("../configs/confirmation_analysis_config_default.yaml")

default_config = ConfirmationAnalyzerConfig()
default_config_dict = default_config.model_dump()

configfile.write_text(yaml.dump(default_config_dict))


3489

## Confirmation Analysis Pipeline

### Visualizing Workflow

In [2]:

import os
from llama_index.utils.workflow import draw_all_possible_flows
from evidence_seeker.confirmation_analysis import (
    SimpleConfirmationAnalysisWorkflow
)

# create dir ../TMP if it does not exist
os.makedirs("../TMP", exist_ok=True)

draw_all_possible_flows(
    SimpleConfirmationAnalysisWorkflow, filename="../TMP/SimpleConfirmationAnalysisWorkflow.html"
)


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


<class 'NoneType'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class 'evidence_seeker.confirmation_analysis.MultipleChoiceConfirmationAnalysisEvent'>
<class 'evidence_seeker.confirmation_analysis.CollectAnalysesEvent'>
<class 'evidence_seeker.confirmation_analysis.FreetextConfirmationAnalysisEvent'>
../TMP/SimpleConfirmationAnalysisWorkflow.html


I0000 00:00:1733908524.128929  100746 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
I0000 00:00:1733908524.164964  100746 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers


### Example claim and evidence documents

In [1]:
from evidence_seeker.models import (
    CheckedClaim,
    Document
)

docs = [Document(text='There is high confidence that oxygen levels have \ndropped in many regions since the mid 20th century and \nthat the geographic range of many marine organisms has \nchanged over the last two decades. \n The amount of ocean warming observed since 1971 \nwill likely at least double by 2100 under a low warming \nscenario (SSP1-2.6) and will increase by 4–8 times under \na high warming scenario (SSP5-8.5).  Stratification (virtually \ncertain), acidification ( virtually certain ), deoxygenation \n(high confidence ) and marine heatwave frequency ( high \nconfidence) will continue to increase in the 21st century. \n While there is low confidence in 20th century AMOC change, \nit is very likely that AMOC will decline over the 21st century \n(Figure TS.11).  {2.3, 3.5, 3.6, 4.3.2, 5.3, 7.2, 9.2, Box\xa09.2, 12.4}\nIt is virtually certain that the global ocean has warmed since at least \n1971, representing about 90% of the increase in the global energy \ninventory (Section TS.3.1).  The ocean is currently warming faster than \nat any other time since at least the last deglacial transition (medium \nconfidence), with warming extending to depths well below 2000 m \n(very high confidence ).  It is extremely likely that human influence \nwas the main driver of this recent ocean warming. ', uid='1f47ce98-4105-4ddc-98a9-c4956dab2000', metadata={'page_label': '74', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'While there is low confidence in 20th century AMOC change, \nit is very likely that AMOC will decline over the 21st century \n(Figure TS.11). '}),
 Document(text='Based on recent refined \nanalyses of the available observations, there is high confidence  \nthat it increased by 4.9 ± 1.5% from 1970–2018, which is about \ntwice as much as assessed in SROCC, and will continue to increase \nthroughout the 21st century at a rate depending on the emissions \nscenario (virtually certain).  {2.3.3, 9.2.1}\nIt is virtually certain that since 1950 near-surface high-salinity \nregions have become more saline, while low-salinity regions have \nbecome fresher, with medium confidence  that this is linked to an \nintensification of the hydrological cycle (Box TS.6).  It is extremely \nlikely that human influence has contributed to this salinity change \nand that the large-scale pattern will grow in amplitude over the 21st \ncentury (medium confidence).  {2.3.3, 3.5.2, 9.2.2, 12.4.8}\nThe AMOC was relatively stable during the past 8000 years (medium \nconfidence).  There is low confidence in the quantification of AMOC \nchanges in the 20th century because of low agreement in quantitative \nreconstructed and simulated trends, missing key processes in both \nmodels and measurements used for formulating proxies, and new \nmodel evaluations.  Direct observational records since the mid-2000s \nare too short to determine the relative contributions of internal \nvariability, natural forcing and anthropogenic forcing to AMOC \nchange (high confidence).  An AMOC decline over the 21st century \nis very likely for all SSP scenarios (Figure TS.11b); a possible abrupt \ndecline is assessed further in Box TS.3. ', uid='6fcd6c0f-99a1-48e7-881f-f79758c54769', metadata={'page_label': '74', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': '{2.3.3, 3.5.2, 9.2.2, 12.4.8}\nThe AMOC was relatively stable during the past 8000 years (medium \nconfidence). '}),
 Document(text='{2.3.3, 3.5.2, 9.2.2, 12.4.8}\nThe AMOC was relatively stable during the past 8000 years (medium \nconfidence).  There is low confidence in the quantification of AMOC \nchanges in the 20th century because of low agreement in quantitative \nreconstructed and simulated trends, missing key processes in both \nmodels and measurements used for formulating proxies, and new \nmodel evaluations.  Direct observational records since the mid-2000s \nare too short to determine the relative contributions of internal \nvariability, natural forcing and anthropogenic forcing to AMOC \nchange (high confidence).  An AMOC decline over the 21st century \nis very likely for all SSP scenarios (Figure TS.11b); a possible abrupt \ndecline is assessed further in Box TS.3.  {2.3.3, 3.5.4, 4.3.2, 8.6.1, \n9.2.3, Cross-Chapter Box\xa012.3}\nThere is high confidence that many ocean currents will change in \nthe 21st century in response to changes in wind stress.   There is low \nconfidence in 21st century change of Southern Ocean circulation, \ndespite high confidence  that it is sensitive to changes in wind \npatterns and increased ice-shelf melt.  Western boundary currents \nand subtropical gyres have shifted poleward since 1993 ( medium \nconfidence). ', uid='f52c120f-ff9c-4893-822e-bfca72eaa9c6', metadata={'page_label': '74', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'An AMOC decline over the 21st century \nis very likely for all SSP scenarios (Figure TS.11b); a possible abrupt \ndecline is assessed further in Box TS.3. '}),
 Document(text='73\nTechnical Summary\nTS\nBox TS.3 (continued)\nWhile there is medium confidence  that the projected decline in the AMOC (Section TS.2.4) will not involve an abrupt collapse \nbefore\xa02100, such a collapse might be triggered by an unexpected meltwater influx from the Greenland Ice Sheet.  If an AMOC collapse \nwere to occur, it would very likely cause abrupt shifts in the regional weather patterns and water cycle, such as a southward shift in the \ntropical rain belt, and could result in weakening of the African and Asian monsoons, strengthening of Southern Hemisphere monsoons, \nand drying in Europe.  (See also Boxes TS.9 and TS.13).  {4.7.2, 8.6.1, 9.2.3}\nVery rare extremes and compound or concurrent events, such as the 2018 concurrent heatwaves across the Northern Hemisphere, are \noften associated with large impacts.  The changing climate state is already altering the likelihood of extreme events, such as decadal \ndroughts and extreme sea levels, and will continue to do so under future warming. ', uid='1f8242fe-50a2-45e2-bfda-986466f966d4', metadata={'page_label': '73', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'If an AMOC collapse \nwere to occur, it would very likely cause abrupt shifts in the regional weather patterns and water cycle, such as a southward shift in the \ntropical rain belt, and could result in weakening of the African and Asian monsoons, strengthening of Southern Hemisphere monsoons, \nand drying in Europe. '}),
 Document(text='Some processes suspected of having tipping points, such as the Atlantic Meridional Overturning \nCirculation (AMOC), have been found to often undergo recovery after temperature stabilization with a time delay ( low confidence). \n However, substantial irreversibility is further substantiated for some cryosphere changes, ocean warming, sea level rise, and ocean \nacidification.  {4.7.2, 5.3.3, 5.4.9, 9.2.2, 9.2.4, 9.4.1, 9.4.2, 9.6.3}\nSome climate system components are slow to respond, such as the deep ocean overturning circulation and the ice sheets.  It is likely that \nunder stabilization of global warming at 1.5°C, 2.0°C or 3.0°C relative to 1850–1900, the AMOC will continue to weaken for several \ndecades by about 15%, 20% and 30% of its strength and then recover to pre-decline values over several centuries (medium confidence). \n At sustained warming levels between 2°C and 3°C, there is limited evidence that the Greenland and West Antarctic ice sheets will be lost \nalmost completely and irreversibly over multiple millennia; both the probability of their complete loss and the rate of mass loss increases \nwith higher surface temperatures ( high confidence).  At sustained warming levels between 3°C and 5°C, near-complete loss of the \nGreenland Ice Sheet and complete loss of the West Antarctic Ice Sheet is projected to occur irreversibly over multiple millennia (medium \nconfidence); with substantial parts or all of Wilkes Subglacial Basin in East Antarctica lost over multiple millennia (low confidence).  Early-\nwarning signals of accelerated sea level rise from Antarctica could possibly be observed within the next few decades. ', uid='87697b86-aa91-4bdb-b02c-7dbd2af4dd9c', metadata={'page_label': '106', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'It is likely that \nunder stabilization of global warming at 1.5°C, 2.0°C or 3.0°C relative to 1850–1900, the AMOC will continue to weaken for several \ndecades by about 15%, 20% and 30% of its strength and then recover to pre-decline values over several centuries (medium confidence). \n'}),
 Document(text='{2.3.3, 9.2.1}\nIt is virtually certain that since 1950 near-surface high-salinity \nregions have become more saline, while low-salinity regions have \nbecome fresher, with medium confidence  that this is linked to an \nintensification of the hydrological cycle (Box TS.6).  It is extremely \nlikely that human influence has contributed to this salinity change \nand that the large-scale pattern will grow in amplitude over the 21st \ncentury (medium confidence).  {2.3.3, 3.5.2, 9.2.2, 12.4.8}\nThe AMOC was relatively stable during the past 8000 years (medium \nconfidence).  There is low confidence in the quantification of AMOC \nchanges in the 20th century because of low agreement in quantitative \nreconstructed and simulated trends, missing key processes in both \nmodels and measurements used for formulating proxies, and new \nmodel evaluations.  Direct observational records since the mid-2000s \nare too short to determine the relative contributions of internal \nvariability, natural forcing and anthropogenic forcing to AMOC \nchange (high confidence).  An AMOC decline over the 21st century \nis very likely for all SSP scenarios (Figure TS.11b); a possible abrupt \ndecline is assessed further in Box TS.3.  {2.3.3, 3.5.4, 4.3.2, 8.6.1, \n9.2.3, Cross-Chapter Box\xa012.3}\nThere is high confidence that many ocean currents will change in \nthe 21st century in response to changes in wind stress.  ', uid='abaed8de-7f35-40d8-bc9d-2a6a8e543586', metadata={'page_label': '74', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'There is low confidence in the quantification of AMOC \nchanges in the 20th century because of low agreement in quantitative \nreconstructed and simulated trends, missing key processes in both \nmodels and measurements used for formulating proxies, and new \nmodel evaluations. '}),
 Document(text='Models that exhibit such tipping points are characterized by abrupt changes once the threshold is crossed, and even \na return to pre-threshold surface temperatures or to atmospheric carbon dioxide concentrations does not guarantee \nthat the tipping elements return to their pre-threshold state.  Monitoring and early warning systems are being put into \nplace to observe tipping elements in the climate system.  {1.3, 1.4.4, 1.5, 4.3.2, Table\xa04.10, 5.3.4, 5.4.9, 7.5.3, 9.2.2, \n9.2.4, 9.4.1, 9.4.2, 9.6.3, Cross-chapter Box\xa012.1}\nUnderstanding of multi-decadal reversibility (i.e., the system returns to the previous climate state within multiple decades after \nthe radiative forcing is removed) has improved since AR5 for many atmospheric, land surface and sea ice climate metrics following \nsea surface temperature recovery.  Some processes suspected of having tipping points, such as the Atlantic Meridional Overturning \nCirculation (AMOC), have been found to often undergo recovery after temperature stabilization with a time delay ( low confidence). \n However, substantial irreversibility is further substantiated for some cryosphere changes, ocean warming, sea level rise, and ocean \nacidification.  {4.7.2, 5.3.3, 5.4.9, 9.2.2, 9.2.4, 9.4.1, 9.4.2, 9.6.3}\nSome climate system components are slow to respond, such as the deep ocean overturning circulation and the ice sheets.  It is likely that \nunder stabilization of global warming at 1.5°C, 2.0°C or 3.0°C relative to 1850–1900, the AMOC will continue to weaken for several \ndecades by about 15%, 20% and 30% of its strength and then recover to pre-decline values over several centuries (medium confidence). \n', uid='64dce431-7e8d-46cd-9dd8-dc7e2ac18443', metadata={'page_label': '106', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'Some processes suspected of having tipping points, such as the Atlantic Meridional Overturning \nCirculation (AMOC), have been found to often undergo recovery after temperature stabilization with a time delay ( low confidence). \n'}),
 Document(text='72\nTechnical Summary\nTS\nBox TS.3 | Low-likelihood, High-warming Storylines\nFuture global warming exceeding the assessed very likely range cannot be ruled out and is potentially associated \nwith the highest risks for society and ecosystems.  Such low-likelihood, high-warming storylines tend to exhibit \nsubstantially greater changes in the intensity of regional drying and wetting than the multi-model mean.  Even at \nlevels of warming within the very likely range, global and regional low-likelihood outcomes might occur, such as large \nprecipitation changes, additional sea level rise associated with collapsing ice sheets (see Box TS.4), or abrupt ocean \ncirculation changes.  While there is medium confidence that the Atlantic Meridional Overturning Circulation (AMOC) \nwill not experience an abrupt collapse before 2100, if it were to occur, it would very likely cause abrupt shifts in \nregional weather patterns and water cycle.  The probability of these low-likelihood outcomes increases with higher \nglobal warming levels.  If the real-world climate sensitivity lies at the high end of the assessed range, then global \nand regional changes substantially outside the very likely range projections occur for a given emissions scenario. \n With increasing global warming, some very rare extremes and some compound events (multivariate or concurrent \nextremes) with low likelihood in past and current climate will become more frequent, and there is a\xa0higher chance \nthat events unprecedented in the observational record occur ( high confidence). ', uid='d7162beb-25d2-4653-aecd-2734bfd39693', metadata={'page_label': '72', 'file_name': 'IPCC_AR6_WGI_TS.pdf', 'author': 'IPCC Working Group I', 'original_text': 'While there is medium confidence that the Atlantic Meridional Overturning Circulation (AMOC) \nwill not experience an abrupt collapse before 2100, if it were to occur, it would very likely cause abrupt shifts in \nregional weather patterns and water cycle. '})]

claim = CheckedClaim(
    text="The AMOC is slowing down",
    negation="The AMOC is not changing",
    uid="123",
    documents=docs[0:2]
)



None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


### Running `ConfirmationAnalyzer`

In [3]:
from evidence_seeker.confirmation_analysis import (
    ConfirmationAnalyzer, ConfirmationAnalyzerConfig
)

#config_file = "../configs/simple_confirmation_analysis_config.yaml" 
config = ConfirmationAnalyzerConfig()

confirmation_analyzer = ConfirmationAnalyzer(config=config)
checked_claim = await confirmation_analyzer(claim=claim) 
#print(checked_claim.confirmation_by_document)
print(checked_claim)

2024-12-12 09:13:08.329 | DEBUG    | evidence_seeker.backend:get_openai_llm:117 - Instantiating OpenAILike model (model: meta-llama/Llama-3.1-70B-Instruct,base_url: https://huggingface.co/api/integrations/dgx/v1).
2024-12-12 09:13:08.332 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:freetext_analysis:72 - Confirmation analysis.
2024-12-12 09:13:08.378 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:freetext_analysis:72 - Confirmation analysis.


Fetching api key via env var: HF_TOKEN_EVIDENCE_SEEKER


2024-12-12 09:13:12.467 | DEBUG    | evidence_seeker.backend:get_openai_llm:117 - Instantiating OpenAILike model (model: meta-llama/Llama-3.2-3B-Instruct,base_url: https://dchi8b9swca6gxbe.eu-west-1.aws.endpoints.huggingface.cloud/v1/).
2024-12-12 09:13:12.468 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:102 - Used regex in MultipleChoiceConfirmationAnalysis: [ABC]
2024-12-12 09:13:12.517 | DEBUG    | evidence_seeker.backend:get_openai_llm:117 - Instantiating OpenAILike model (model: meta-llama/Llama-3.2-3B-Instruct,base_url: https://dchi8b9swca6gxbe.eu-west-1.aws.endpoints.huggingface.cloud/v1/).
2024-12-12 09:13:12.518 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:102 - Used regex i

Fetching api key via env var: HF_TOKEN_EVIDENCE_SEEKER
Fetching api key via env var: HF_TOKEN_EVIDENCE_SEEKER


2024-12-12 09:13:14.989 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:116 - Returned probabilities: {'A': np.float64(0.9999999999999059), 'C': np.float64(9.357622968839294e-14), 'B': np.float64(6.711761854005965e-16)}
2024-12-12 09:13:14.992 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:116 - Returned probabilities: {'B': np.float64(1.0), 'C': np.float64(1.4828213355760042e-17), 'A': np.float64(4.821267946339375e-21)}
2024-12-12 09:13:14.995 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:freetext_analysis:72 - Confirmation analysis.
2024-12-12 09:13:15.002 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:freetext_analysis:72 - Confirmation analysis.

Fetching api key via env var: HF_TOKEN_EVIDENCE_SEEKER
Fetching api key via env var: HF_TOKEN_EVIDENCE_SEEKER


2024-12-12 09:13:22.974 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:116 - Returned probabilities: {'C': np.float64(0.9999999999999856), 'A': np.float64(1.435037960133805e-14), 'B': np.float64(5.525960833850168e-23)}
2024-12-12 09:13:22.978 | DEBUG    | evidence_seeker.confirmation_analysis.workflows:multiple_choice:116 - Returned probabilities: {'B': np.float64(0.999999999999263), 'C': np.float64(7.360340600573476e-13), 'A': np.float64(8.095930175206634e-16)}


text='The AMOC is slowing down' negation='The AMOC is not changing' uid='123' n_evidence=None average_confirmation=None evidential_uncertainty=None verbalized_confirmation=None documents=[Document(text='There is high confidence that oxygen levels have \ndropped in many regions since the mid 20th century and \nthat the geographic range of many marine organisms has \nchanged over the last two decades. \n The amount of ocean warming observed since 1971 \nwill likely at least double by 2100 under a low warming \nscenario (SSP1-2.6) and will increase by 4–8 times under \na high warming scenario (SSP5-8.5).  Stratification (virtually \ncertain), acidification ( virtually certain ), deoxygenation \n(high confidence ) and marine heatwave frequency ( high \nconfidence) will continue to increase in the 21st century. \n While there is low confidence in 20th century AMOC change, \nit is very likely that AMOC will decline over the 21st century \n(Figure TS.11).  {2.3, 3.5, 3.6, 4.3.2, 5.3, 7.2, 9.2
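
The debug log for this cell reports, for each multiple-choice call, one probability per answer label (for example `{'A': 0.99…, 'C': 9.4e-14, 'B': 6.7e-16}`), with the labels restricted by the regex `[ABC]`. The sketch below shows one way such a distribution could be derived from raw token log-probabilities and then read off. It is an illustration under assumptions, not the package's implementation: the function `normalize_option_probs`, its input format (a dict mapping token to log-probability), and the example numbers are all hypothetical.

```python
import math
import re


def normalize_option_probs(token_logprobs: dict[str, float], pattern: str) -> dict[str, float]:
    """Keep tokens that fully match `pattern` and renormalize their probabilities."""
    option_re = re.compile(pattern)
    kept = {tok: lp for tok, lp in token_logprobs.items() if option_re.fullmatch(tok)}
    if not kept:
        raise ValueError("no token matches the option pattern")
    # Softmax over the kept tokens; subtracting the max keeps exp() stable.
    max_lp = max(kept.values())
    weights = {tok: math.exp(lp - max_lp) for tok, lp in kept.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Hypothetical log-probabilities for the first generated token.
example_logprobs = {"A": -0.05, "B": -4.0, "C": -3.5, "The": -6.0}

probs = normalize_option_probs(example_logprobs, r"[ABC]")
best_option = max(probs, key=probs.get)
print(best_option, probs)
```

Tokens outside the pattern (here `"The"`) are dropped before normalizing, so the remaining probabilities sum to one over the allowed labels only.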

### Running the workflow directly

In [4]:
from pprint import pprint

from evidence_seeker.confirmation_analysis import (
    SimpleConfirmationAnalysisWorkflow
)

config_file = "../configs/simple_confirmation_analysis_config.yaml" 
pw = SimpleConfirmationAnalysisWorkflow(config_file=config_file)

evidence_item = docs[0].text

result = await pw.run(
    clarified_claim=claim,
    evidence_item=evidence_item
)
pprint(result)
print(result['confirmation'])

Loading config from ../configs/simple_confirmation_analysis_config.yaml
Used api key name: kideku_toxicity_app_nim
Instantiating OpenAILike model (model: meta-llama/Llama-3.1-70B-Instruct,base_url: https://huggingface.co/api/integrations/dgx/v1).
Confirmation analysis.
Using workflow model: model_1 for freetext_confirmation_analysis_event
Confirmation analysis.
Using workflow model: model_1 for freetext_confirmation_analysis_event
Used regex in multiple_choice_confirmation_analysis_event: [AB]
Using event specific model: model_3 for multiple_choice_confirmation_analysis_event
Used api key name: token_debatelab_hf_endpoints
Instantiating OpenAILike model (model: meta-llama/Llama-3.2-3B-Instruct,base_url: https://dchi8b9swca6gxbe.eu-west-1.aws.endpoints.huggingface.cloud/v1/).
Using event specific model: model_3 for multiple_choice_confirmation_analysis_event
Used regex in multiple_choice_confirmation_analysis_event: [AB]
Using event specific model: model_3 for multiple_choice_confirmati
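
The cells in this notebook call `await` at the top level, which Jupyter/IPython supports. In a plain Python script the same coroutine has to be driven by an event loop, for example with `asyncio.run`. A minimal sketch, assuming a hypothetical stand-in coroutine `run_workflow` in place of `pw.run(...)` so that it runs without the `evidence_seeker` package:

```python
import asyncio


async def run_workflow(clarified_claim: str, evidence_item: str) -> dict:
    """Hypothetical stand-in for `pw.run(...)`; returns a dummy result."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return {"claim": clarified_claim, "evidence_chars": len(evidence_item)}


async def main() -> dict:
    # In a real script this would await the workflow or analyzer instead.
    return await run_workflow(
        clarified_claim="The AMOC is slowing down",
        evidence_item="It is very likely that AMOC will decline over the 21st century.",
    )


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```

`asyncio.run` cannot be called from inside an already running event loop, which is why the notebook cells use a bare `await` instead.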