# Option 1


 - SentenceTransformers Splitter
 - SentenceTransformerEmbeddings
 - Annoy
 - MultiQueryRetriever
 - RetrievalQA with "Reduce"
 - RetrievalQA with "Re-rank"
 - RetrievalQA with "Stuff" and memory
 - Memory

In [14]:
import pandas as pd
import os
import openai
import logging
import sys, pathlib, fitz
from typing import List
from langchain import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import RetrievalQA
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.document_loaders import PyMuPDFLoader
from langchain.vectorstores import Annoy
from langchain.chat_models import ChatOpenAI

In [15]:
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']

In [3]:
# os.environ["OPENAI_API_KEY"] = ""


In [16]:
# Helper function for printing docs

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

## Extract text from a PDF with PyMuPDF

According to [LangChain](https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf#using-pymupdf), PyMuPDFLoader is the fastest of the PDF parsing options. It returns one document per page and attaches detailed metadata about the PDF and its pages.

In [17]:
# Point the loader at the PDF
loader = PyMuPDFLoader("hms/fhl_2014_Charifson_34622 (1).pdf")

# Load the PDF (one Document per page); the pages are passed to a text splitter below
pages = loader.load()

In [18]:
len(pages)

19

## Text Splitters

The concept of text splitters revolves around the need to break down long pieces of text into smaller, meaningful chunks. 

The goal is to split the text in a way that keeps semantically related pieces together, with the definition of "semantically related" depending on the specific type of text being processed.

Language models have a token limit, which you should not exceed. It is therefore a good idea to count the number of tokens when you split text into chunks. There are many tokenizers, so count tokens with the same tokenizer that the language model uses.

Text splitters allow customization:

1. The `chunk_size` parameter sets the maximum size of each chunk, measured by the splitter's length function (characters or tokens, depending on the splitter). It controls the granularity of the chunks and how much text is processed together at once.

2. The `chunk_overlap` parameter sets the maximum overlap between consecutive chunks. Overlap maintains continuity and context between chunks, i.e. it preserves the flow of information and avoids abrupt transitions between chunks.
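
The interaction between the two parameters can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `split_with_overlap` is a hypothetical helper, and whitespace splitting stands in for a real tokenizer.

```python
from typing import Callable, List


def split_with_overlap(
    text: str,
    chunk_size: int,
    chunk_overlap: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most chunk_size tokens.

    Consecutive chunks share chunk_overlap tokens.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = tokenize(text)
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = split_with_overlap(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With `chunk_size=4` and `chunk_overlap=1`, each chunk starts with the last token of the previous one, which is the continuity the overlap is meant to provide.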

### SentenceTransformers

The SentenceTransformersTokenTextSplitter is a specialized text splitter for use with the sentence-transformer models. The default behaviour is to split the text into chunks that fit the token window of the sentence transformer model that you would like to use.

In [7]:
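# Note: this splitter appears to size chunks via tokens_per_chunk (default: the model's token window), not via chunk_size or length_function
text_splitter = SentenceTransformersTokenTextSplitter(chunk_size=1000, chunk_overlap=20, length_function=len)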

In [8]:
docs = text_splitter.split_documents(pages)

In [9]:
len(docs)

28

In [10]:
docs[0].metadata

{'source': 'hms/fhl_2014_Charifson_34622 (1).pdf',
 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf',
 'page': 0,
 'total_pages': 19,
 'format': 'PDF 1.5',
 'title': '',
 'author': 'David',
 'subject': '',
 'keywords': '',
 'creator': 'Microsoft® Office Word 2007',
 'producer': 'Microsoft® Office Word 2007',
 'creationDate': "D:20140723120649-04'00'",
 'modDate': "D:20140723120649-04'00'",
 'trapped': ''}

## Text embedding

Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
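
As a minimal illustration of semantic search in vector space, the sketch below ranks a few items by cosine similarity to a query vector. The three-dimensional vectors are made up by hand and are not real embeddings; `cosine_similarity`, `top_k`, and `toy_index` are hypothetical names used only for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k entries most similar to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-made toy vectors standing in for text embeddings
toy_index = {
    "crab habitat": [0.9, 0.1, 0.0],
    "claw measurements": [0.1, 0.9, 0.1],
    "reference list": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

results = top_k(query_vec, toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A real embedding model produces vectors with hundreds of dimensions, but the ranking step works the same way.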


SentenceTransformers embeddings are called through the HuggingFaceEmbeddings integration. LangChain also provides SentenceTransformerEmbeddings as an alias for users who are more familiar with using that package directly.

SentenceTransformers is a Python package that can generate text and image embeddings, originating from Sentence-BERT.

In [11]:
embeddings = SentenceTransformerEmbeddings()

### Annoy

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.


NOTE: Annoy is read-only: once the index is built, you cannot add any more embeddings!
If you want to progressively add new entries to your vector store, choose an alternative.
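
Annoy reports distances rather than similarities. Assuming the index uses Annoy's angular metric (which appears to be the default in LangChain's wrapper, but is worth checking against the installed version), the distance between two vectors is sqrt(2 * (1 - cos)), where cos is their cosine similarity. The sketch below converts between the two; both function names are hypothetical helpers, not part of Annoy or LangChain.

```python
import math


def angular_distance_to_cosine(distance: float) -> float:
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1.0 - (distance ** 2) / 2.0


def cosine_to_angular_distance(cosine: float) -> float:
    """Angular distance as defined by Annoy: sqrt(2 * (1 - cos))."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine)))


# Distance 0 means identical direction; distance 2 means opposite direction
for d in (0.0, 1.0, 2.0):
    print(f"distance {d:.1f} -> cosine {angular_distance_to_cosine(d):+.2f}")
```

Under this assumption, a distance near 1.0 corresponds to a cosine similarity of about 0.5, and smaller distances mean closer matches.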

In [12]:
vs2 = Annoy.from_documents(docs, embeddings)

In [13]:
query = "what is the paper about?"

In [14]:
#the score is a distance metric, so lower is better

vs2.similarity_search_with_score(query, k=3)

[(Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.3306306600570679),
 (Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 *

In [15]:
query2 = "What are the scientific names of the species mentioned in this paper?"

In [16]:
vs2.similarity_search_with_score(query2, k=3)

[(Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.0272287130355835),
 (Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (

In [17]:
query3 = "Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?"

In [18]:
vs2.similarity_search_with_score(query3, k=3)

[(Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 18, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.056341290473938),
 (Document(page_content='#

In [19]:
query4 = "Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations? "

In [20]:
vs2.similarity_search_with_score(query4, k=4)

[(Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.2470710277557373),
 (Document(page_content='vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential for overlap in the winter when la. vi

In [21]:
query5 = "Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?"

vs2.similarity_search_with_score(query5, k=4)

[(Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 18, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.056341290473938),
 (Document(page_content='#

In [22]:
query6 = "Does the paper mention where the species were observed or collected, and if so, what locations are given?"

vs2.similarity_search_with_score(query6, k=4)

[(Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.018962025642395),
 (Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_201

In [23]:
query7 = "In what habitat were the species found?"

vs2.similarity_search_with_score(query7, k=4)

[(Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  0.9531382322311401),
 (Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_20

In [24]:
query8 = "Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?"

vs2.similarity_search_with_score(query8, k=4)

[(Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.0603522062301636),
 (Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2

In [37]:
query9 = "Can you give a more specific location?"

vs2.similarity_search_with_score(query9, k=4)

[(Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.2532789707183838),
 (Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 

In [38]:
query10 = "In what habitat were the species found?"

vs2.similarity_search_with_score(query10, k=4)

[(Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  0.9531382322311401),
 (Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_20

In [39]:
query11 = "Are any coordinate locations given in latitude / longitude, and if so, what are they?"

vs2.similarity_search_with_score(query11, k=4)

[(Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.2676454782485962),
 (Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 14, 'total_pages': 19, 'format': 'PDF 1.5', 'tit

In [40]:
query12 = "Are there any maps, figures, tables or diagrams in the paper??"

vs2.similarity_search_with_score(query12, k=4)

[(Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
  1.3229398727416992),
 (Document(page_content='charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 15, 'total_pages': 19, 'format': 'PDF 1.5', 'titl

## MultiQueryRetriever

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.


Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on "distance". However, retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes done to address these problems manually, but it can be tedious.
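
The "unique union" step can be sketched without an LLM or a vector store. In this minimal illustration, `multi_query_retrieve` is a hypothetical helper, and a dictionary of canned results stands in for both the generated query variants and the retriever.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(
    queries: List[str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run every query variant and return the de-duplicated union of results.

    Documents keep the order in which they were first seen.
    """
    seen = set()
    union = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


# Canned results standing in for a vector store lookup (made-up data)
fake_results: Dict[str, List[str]] = {
    "What is the paper about?": ["doc_a", "doc_b"],
    "Summarize the paper.": ["doc_b", "doc_c"],
    "What topics does the paper cover?": ["doc_a", "doc_d"],
}

union = multi_query_retrieve(list(fake_results), lambda q: fake_results[q])
print(union)
```

Each variant contributes documents the others missed, while documents returned by several variants appear only once.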

### Supplying own Prompt

In [41]:
# Set up logging for the generated queries

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [42]:
# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()
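
The `parse` method above splits on newlines only, so blank lines and list numbering such as "1. " would be kept in each query. A hypothetical variant that drops both could look like the sketch below; `parse_query_lines` is an illustrative standalone function, not part of LangChain.

```python
import re
from typing import List


def parse_query_lines(text: str) -> List[str]:
    """Split LLM output into one query per line.

    Blank lines are dropped and leading numbering ("1. " or "2) ") is removed.
    """
    queries = []
    for line in text.strip().split("\n"):
        cleaned = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


raw = "1. What is the paper about?\n\n2. Which species are studied?\n3) Where were they collected?"
print(parse_query_lines(raw))
```

Whether stripping the numbering changes retrieval quality is an open question for this notebook; the sketch only shows the parsing step.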

In [43]:
llm = ChatOpenAI(temperature=0)

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vs2.as_retriever(), llm=llm
)

In [44]:
retriever_from_llm.get_relevant_documents(query=query)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide a summary of the content in the paper?', '2. Could you give me an overview of the main topics covered in the paper?', '3. What are the key themes or subjects discussed in the paper?']


[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female

In [45]:
retriever_from_llm.get_relevant_documents(query=query2)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide me with the scientific names of the species discussed in this paper?', "2. I'm interested in knowing the scientific names of the species mentioned in this paper. Could you help me with that?", '3. Could you please list the scientific names of the species that are referenced in this paper?']


[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hm

In [46]:
retriever_from_llm.get_relevant_documents(query=query3)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Is there any research in this paper that involves observations or experiments conducted in the natural environment or with organisms collected from nature?', '2. Are there any sections in this paper that discuss observational or experimental research carried out in the natural environment or with organisms collected in nature?', '3. Does this paper include any information about research conducted in the natural environment or with organisms collected from nature, either through observations or experiments?']


[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p =

In [47]:
retriever_from_llm.get_relevant_documents(query=query4)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Are there any mentions of observations in the paper, using terms such as "in the field," "this study," "observed," "taken," "collected," "sampled," "collection," "seen," "harvested," "found," etc.?', '2. Does the paper discuss any instances of observations, using terms like "in the field," "this study," "observed," "taken," "collected," "sampled," "collection," "seen," "harvested," "found," etc.?', '3. Are there any references to observations in the paper, using phrases such as "in the field," "this study," "observed," "taken," "collected," "sampled," "collection," "seen," "harvested," "found," etc.?']


[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keyword

In [48]:
retriever_from_llm.get_relevant_documents(query=query5)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Is there any research in this paper that involves observations or experiments conducted in the natural environment or with organisms collected from nature?', '2. Are there any sections in this paper that discuss observational or experimental research carried out in the natural environment or with organisms collected in nature?', '3. Does this paper include any information about research conducted in the natural environment or with organisms collected from nature, either through observations or experiments?']


[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p =

In [49]:
retriever_from_llm.get_relevant_documents(query=query6)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Are there any references in the paper to the locations where the species were observed or collected?', '2. Does the paper provide any information about the specific locations where the species mentioned were observed or collected?', '3. Can I find any details in the paper about the places where the species mentioned were observed or collected?']


[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pd

In [50]:
retriever_from_llm.get_relevant_documents(query=query7)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the natural environment where the species were discovered?', '2. Can you provide information about the habitat where the species were located?', '3. Where were the species typically found in terms of their habitat?']


[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).p

In [51]:
retriever_from_llm.get_relevant_documents(query=query8)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What information does the paper provide about the collection or observation of species, including any mention of a year, date, or time?', '2. Are there any references in the paper to the year, date, or time of species collection or observation?', '3. Can you find any details in the paper regarding the specific year, date, or time when the species were collected or observed?']


[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 17 figure 3 : differences in propal height : carapace width ratio between sex and species. the sex factor was statistically significant ( f = 125. 6. p < 0. 001 ), while the species factor was insignificant ( f > 0. 01, p = 0. 983 ). there was a significant interaction ( f = 4. 39, p = 0. 042 ). error

In [52]:
retriever_from_llm.get_relevant_documents(query=query9)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Could you please provide a more precise location?', '2. Can you be more specific about the exact location you are referring to?', '3. Is it possible to give a more detailed description of the specific location you are interested in?']


[Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 14, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David

In [53]:
retriever_from_llm.get_relevant_documents(query=query10)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the natural environment where the species were discovered?', '2. Can you provide information about the habitat where the species were located?', '3. Where were the species typically found in terms of their habitat?']


[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).p

In [54]:
retriever_from_llm.get_relevant_documents(query=query11)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide the latitude and longitude coordinates for any given locations?', '2. Are there any specific latitude and longitude coordinates available for the mentioned locations?', '3. What are the latitude and longitude coordinates associated with the given locations, if any?']


[Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 14, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source'

In [55]:
retriever_from_llm.get_relevant_documents(query=query12)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Does the paper contain any visual aids such as maps, figures, tables, or diagrams?', '2. Are there any graphical representations like maps, figures, tables, or diagrams included in the paper?', '3. Can I find any maps, figures, tables, or diagrams within the paper?']


[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 15, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 

In [56]:
template = """
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}
Answer:
"""
prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=template,
)

# Retrieval QA


## Refine Document Chain

The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.

Since the Refine chain only passes a single document to the LLM at a time, it is well-suited for tasks that require analyzing more documents than can fit in the model's context. The obvious tradeoff is that this chain will make far more LLM calls than, for example, the Stuff documents chain. There are also certain tasks which are difficult to accomplish iteratively. For example, the Refine chain can perform poorly when documents frequently cross-reference one another or when a task requires detailed information from many documents.
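
The loop described above can be sketched in plain Python. This is a minimal illustration of the refine idea, not LangChain's implementation: `refine_answer`, `RecordingLLM`, and the prompt wording are hypothetical names introduced here, and the "LLM" is a stub that only records the prompts it receives.

```python
from typing import Callable, List


def refine_answer(question: str, docs: List[str], llm: Callable[[str], str]) -> str:
    """Build an answer by visiting the documents one at a time."""
    if not docs:
        return ""
    # First call: answer from the first document alone.
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}\nAnswer:")
    # Each later call sees only the next document plus the current answer.
    for doc in docs[1:]:
        answer = llm(
            f"Question: {question}\n"
            f"Existing answer: {answer}\n"
            f"Additional context:\n{doc}\n"
            "Refine the existing answer if the context helps; otherwise repeat it."
        )
    return answer


class RecordingLLM:
    """Hypothetical stand-in for a chat model; it only records its prompts."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"draft {len(self.prompts)}"


stub = RecordingLLM()
final = refine_answer(
    "What is this paper about?", ["chunk-A", "chunk-B", "chunk-C"], stub
)
print(final, len(stub.prompts))
```

The stub is called once per document, which is the cost the paragraph above warns about. Because each call sees only one chunk and the previous draft, the final answer can depend on the order in which the retriever returns the chunks.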

In [57]:
llm = ChatOpenAI()

In [58]:
qa_refine = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=vs2.as_retriever(),
    return_source_documents=True,
)

In [59]:
query = "what is this paper about?"

# With return_source_documents=True the chain has two outputs ("result" and
# "source_documents"), so call it with a dict rather than .run().
result = qa_refine({"query": query})



In [60]:
result["result"]

'Based on the additional context provided, it appears that the paper focuses on analyzing the consumption rates of individual H. nudus (Hermit crabs) and comparing the mean consumption rates between female and male crabs. The study includes data from eight trials, involving three female crabs (fe1 to fe3) and three male crabs (ma1 to ma3). The paper investigates whether there are differences in consumption rates among individuals that consumed snails and those that did not. The authors report the mean consumption rates along with the standard error of the mean and conduct statistical analysis, including a comparison of consumption rates between snail-consuming individuals.'

In [61]:
result["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keyword

In [62]:
query2 = "Write a one sentence summary of the purpose of the paper"

result2 = qa_refine({"query": query2})

In [63]:
result2["result"]

'The purpose of the paper is to analyze the differences in propal width: carapace width ratio between sexes and species of crabs, determining the statistical significance of these factors and the interaction between them using a two-way ANOVA.'

In [64]:
result2["source_documents"]

[Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female h. oregonensis 9 y = 0. 351 * x - 0. 833 0. 894 male h. oregonensis 14 y = 0. 39 * x - 1. 149 0. 693 sma regression b n carapace width vs propal width r2 female h. nudus 13 y = 0. 157 * x - 0. 386 0. 927 male h. nudus 13 y = 0. 209 * x - 1. 288 0. 859 female h. oregonensis 9 y = 0. 175 * x + 0. 037 0. 724 male h. oregonensis 14 y = 0. 244 * x - 0. 688 0. 534', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 12, 'total_pages': 19, 'format': 'PDF 1.5'

In [65]:
query4 = "Summarize the paper concisely with reference to materials and methods."

result4 = qa_refine({"query": query4})

In [66]:
result4["result"]

"Based on the additional context provided, the paper utilized SMA regressions to examine the correlation between carapace width and propal height in different groups (female and male) of two species (H. nudus and H. oregonensis). The relationship between carapace width and propal height is illustrated in Figure 1 of Charifson's study, which includes a line of best fit derived from the SMA regression. Descriptive statistics for each group can be found in Table 1a. However, no specific information about the materials and methods used in the study is provided in the given context."

In [67]:
result4["source_documents"]

[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female

In [68]:
query5 = "Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations?"
result5 = qa_refine({"query": query5})

In [69]:
result5["result"]

'The additional context provided does not directly relate to the presence of observations in the paper. Therefore, the original answer remains applicable. The paper includes observations related to the consumption rates of H. nudus crabs.'

In [70]:
result5["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential for overlap in the winter when la. vincta migrates up shore. a

In [71]:
query6 = "Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?"

In [72]:
result6 = qa_refine({"query": query6})

In [73]:
result6["result"]

'Based on the additional context provided, it is clear that the paper contains experimental research conducted in the natural environment or with organisms collected in nature. The author mentions logistical issues with the experimental design, the need for more replication, and the inclusion of more crabs. The acknowledgments also mention the permission to collect organisms, indicating that the study involved collecting organisms from the natural environment.'

In [74]:
result6["source_documents"]

[Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 18, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry

In [75]:
query7 = "What are the scientific names of the species mentioned in this paper?"

In [76]:
result7 = qa_refine({"query": query7})

In [77]:
result7["result"]

'Based on the additional context provided, the paper mentions two species:\n\n1. H. nudus (scientific name: Hemigrapsus nudus)\n   - Female H. nudus\n   - Male H. nudus\n\n2. H. oregonensis (scientific name: Hemigrapsus oregonensis)\n   - Female H. oregonensis\n   - Male H. oregonensis'

In [78]:
result7["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hm

In [79]:
# qa_refine.run(query7)

In [80]:
query8 = "Does the paper mention where the species were observed or collected, and if so, what locations are given?"

In [81]:
result8 = qa_refine({"query": query8})

In [82]:
result8["result"]

'The new context provided does not mention the locations where the species were observed or collected. Therefore, the original answer remains the same. The paper does not mention the specific locations where the species were observed or collected.'

In [83]:
result8["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pd

In [84]:
query8b = "Can you give a more specific location of where this study took place?"

In [85]:
result8b = qa_refine({"query": query8b})

In [86]:
result8b["result"]

'Thank you for providing the additional context. However, the given information still does not specify the specific location where the study on regression analyses of carapace width and propus measures for different species of crabs (H. nudus and H. oregonensis) took place.'

In [87]:
result8b["source_documents"]

[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean

In [88]:
# qa_refine.run(query8)

In [89]:
query9 = "Are any coordinate locations given in latitude / longitude, and if so, what are they?"

In [90]:
result9 = qa_refine({"query": query9})

In [91]:
result9["result"]

'Apologies for the confusion, but even with the additional context, there are no specific coordinate locations mentioned in the provided text. The text seems to revolve around analyzing differences in cw:ph and cw:pw using a two-way ANOVA. Therefore, there are still no coordinate locations given in latitude/longitude.'

In [92]:
result9["source_documents"]

[Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 14, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David

In [93]:
# qa_refine.run(query9)

In [94]:
query10 = "In what habitat were the species found?"

In [95]:
result10 = qa_refine({"query": query10})

In [96]:
result10["result"]

'The new context provided does not mention any specific habitat information for the species. Therefore, the original answer still stands as the most accurate response. Based on the information provided in the original question and the absence of additional details in the new context, it is not possible to determine the specific habitat where the mentioned species are found.'

In [97]:
result10["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pd

In [98]:
# qa_refine.run(query10)

In [99]:
query11 = "Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?"

In [100]:
result11 = qa_refine({"query": query11})

In [101]:
result11["result"]

'Based on the new context provided, the paper does mention the propal width: carapace width ratio between sexes and species. However, there is still no mention of a specific year, date, or time when the species were collected or observed.'

In [102]:
result11["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).p

In [103]:
# qa_refine.run(query11)

In [104]:
query12 = "Are there any maps, figures, tables or diagrams in the paper?"

In [105]:
result12 = qa_refine({"query": query12})

In [106]:
result12["result"]

'Based on the new context provided, there is no mention of any additional maps, tables, or diagrams in the paper. Therefore, the original answer remains unchanged.'

In [107]:
result12["source_documents"]

[Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 15, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David'

In [108]:
# qa_refine.run(query12)

## Map Reduce Document Chain

The map reduce documents chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure that they fit in the combine documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.
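
The Map, collapse, and Reduce steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not LangChain's code: `map_reduce_answer`, `CountingLLM`, the prompt wording, and the `max_chars` limit are hypothetical, and the limit counts characters where a real chain would count tokens.

```python
from typing import Callable, List


def map_reduce_answer(
    question: str,
    docs: List[str],
    llm: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Answer per document (map), shrink if needed (collapse), then combine (reduce)."""
    # Map step: one independent LLM call per document.
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    # Collapse step: merge neighbouring partial answers until they fit the limit.
    # Every pass roughly halves the list, so the loop always terminates.
    while len(partials) > 1 and len("\n".join(partials)) > max_chars:
        partials = [
            llm("Condense these notes:\n" + "\n".join(partials[i:i + 2]))
            for i in range(0, len(partials), 2)
        ]
    # Reduce step: a single call over all remaining partial answers.
    notes = "\n".join(partials)
    return llm(f"Notes:\n{notes}\n\nQuestion: {question}\nFinal answer:")


class CountingLLM:
    """Hypothetical stand-in for a chat model; it numbers its replies."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"out{len(self.prompts)}"


stub = CountingLLM()
answer = map_reduce_answer("What is this paper about?", ["doc-A", "doc-B", "doc-C"], stub)
print(answer, len(stub.prompts))
```

With three documents and no collapsing, the stub is called four times: three map calls and one reduce call. Unlike the refine loop, the map calls do not depend on one another, so a real implementation could run them concurrently.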

In [109]:
qa_reduce = RetrievalQA.from_chain_type(llm=llm, chain_type="map_reduce", retriever=vs2.as_retriever())

In [110]:
#What is this paper about?
qa_reduce.run(query)

'Based on the provided text, it is not possible to determine what the paper is about. The text only presents regression equations for the relationship between carapace width and propal height/width for different species and genders of organisms. More context or additional information is needed to determine the overall topic or purpose of the paper.'

In [111]:
# Write a one sentence summary of the purpose of the paper.

qa_reduce.run(query2)

'The purpose of the paper is to analyze the relationship between carapace width and propal height/width in different species of crabs, investigating the differences in ratios between sexes and species.'

In [112]:
# Summarize the paper concisely with reference to materials and methods.
qa_reduce.run(query4)

'The paper examines the relationship between carapace width and propal height and width in two species of crabs, H. nudus and H. oregonensis. The authors collected data on female and male crabs and used simple moving average (SMA) regression analysis to analyze the data. They found that carapace width is positively correlated with propal height and width in both species, with varying strength of correlation. Descriptive statistics can be found in Table 1a for carapace width and propal height, and in Table 1b for carapace width and propal width.'

In [113]:
#Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations? 

qa_reduce.run(query5)

'Based on the provided portion of the document, it does not appear to include any specific observations.'

In [114]:
#Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?

qa_reduce.run(query6)

'There is no information provided in the given portion of the document to determine whether the research conducted was observational or experimental, or if it was conducted in the natural environment or with organisms collected in nature.'

In [115]:
#What are the scientific names of the species mentioned in this paper?
qa_reduce.run(query7)

'The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.'

In [116]:
# Does the paper mention where the species were observed or collected, and if so, what locations are given?
qa_reduce.run(query8)

'The provided portion of the document does not mention the locations where the species were observed or collected.'

In [117]:
#Are any coordinate locations given in latitude / longitude, and if so, what are they?

qa_reduce.run(query9)

'No, there are no coordinate locations given in latitude/longitude in the provided text.'

In [118]:
#In what habitat were the species found?
qa_reduce.run(query10)

'The given portion of the document does not provide any information about the habitat in which the species were found. Therefore, it is unknown in what habitat the species were found.'

In [119]:
#Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?

qa_reduce.run(query11)

'The provided portion of the document does not mention anything about the year, date, or time of species collection or observation.'

In [120]:
#Are there any maps, figures, tables or diagrams in the paper?

qa_reduce.run(query12)

'There is no information provided in the given portion of the document about the presence of maps, figures, tables, or diagrams.'

## Map Re-rank Document Chain

The map re-rank documents chain runs an initial prompt on each document that not only tries to complete the task but also gives a score for how certain it is of its answer. The highest-scoring response is returned.
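
A minimal sketch of the score-and-pick idea follows. It is an illustration only, not LangChain's implementation: `map_rerank_answer`, `fake_llm`, the prompt wording, and the `Score: <n>` reply format are assumptions made for this example.

```python
import re
from typing import Callable, List, Optional


def map_rerank_answer(
    question: str, docs: List[str], llm: Callable[[str], str]
) -> Optional[str]:
    """Ask the question of each document separately and keep the best-scored answer."""
    best_answer: Optional[str] = None
    best_score = -1
    for doc in docs:
        raw = llm(
            f"Context:\n{doc}\n\nQuestion: {question}\n"
            "Reply with the answer, then a line 'Score: <0-100>'."
        )
        # Split the reply into the answer text and its self-reported score.
        match = re.search(r"(.*?)\s*Score:\s*(\d+)", raw, re.DOTALL)
        if match is None:
            continue  # skip replies that do not follow the expected format
        answer, score = match.group(1).strip(), int(match.group(2))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model with two canned replies."""
    if "intertidal" in prompt:
        return "The intertidal zone.\nScore: 90"
    return "This document does not answer the question.\nScore: 0"


chunks = [
    "biometry. 4th ed. w. h. freeman, new york",
    "h. nudus occupies the upper intertidal",
]
best = map_rerank_answer("In what habitat were the species found?", chunks, fake_llm)
print(best)
```

Only one document's answer survives, so nothing is combined across chunks. That makes this strategy a better fit for questions a single passage can answer than for summaries of a whole paper.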

In [121]:
qa_rank = RetrievalQA.from_chain_type(llm=llm, chain_type="map_rerank", retriever=vs2.as_retriever())

In [122]:
#What is this paper about?

qa_rank.run(query)



'This document is about biometry and ecological character displacement.'

In [123]:
#Write a one sentence summary of the purpose of the paper
qa_rank.run(query2)

'This document does not provide enough information to determine the purpose of the paper.'

In [124]:
#Summarize the paper concisely with reference to materials and methods.
qa_rank.run(query4)

'This document provides a summary of the relationship between carapace width and propal width in different species of Hemigrapsus crabs. It includes data for female and male H. nudus and H. oregonensis, along with a line of best fit obtained from a simple linear regression analysis. Table 1b contains descriptive statistics for the data. '

In [125]:
#Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations? 
qa_rank.run(query5)

'This document does not answer the question'

In [126]:
#Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?

qa_rank.run(query6)

'Yes, this paper contains experimental research conducted in the natural environment with organisms collected in nature.'

In [127]:
#What are the scientific names of the species mentioned in this paper?
qa_rank.run(query7)

'h. nudus'

In [128]:
#Does the paper mention where the species were observed or collected, and if so, what locations are given?

qa_rank.run(query8)

'This document does not mention where the species were observed or collected.'

In [129]:
#Are any coordinate locations given in latitude / longitude, and if so, what are they?

qa_rank.run(query9)

'No, there are no coordinate locations given in latitude/longitude.'

In [130]:
# In what habitat were the species found?
qa_rank.run(query10)

'The species were found in the intertidal habitat.'

In [131]:
#Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?

qa_rank.run(query11)

'This document does not mention a year, date, or time that species were collected or observed.'

In [132]:
#Are there any maps, figures, tables or diagrams in the paper?

qa_rank.run(query12)

'Yes, there are figures in the paper. Figure 1 shows the relationship of carapace width and propal height in hemigrapsus. It includes four subfigures: a) female H. nudus, b) male H. nudus, c) female H. oregonensis, and d) male H. oregonensis. Table 1a also provides descriptive statistics. '

# Stuff Document Chain w/ ConversationBufferMemory

The stuff documents chain ("stuff" as in "to stuff" or "to fill") is the most straightforward of the document chains. It takes a list of documents, inserts them all into a prompt and passes that prompt to an LLM.

This chain is well-suited for applications where documents are small and only a few are passed in for most calls.
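As a rough illustration of the "stuffing" step, the sketch below joins retrieved chunks into a single context block and fills a template. `STUFF_TEMPLATE` and `build_stuff_prompt` are hypothetical names for this example, and the blank-line separator is an assumption chosen to match the formatted prompts printed later in this notebook.

```python
from typing import List

# Simplified template for illustration; the notebook's real prompt also has a history block.
STUFF_TEMPLATE = (
    "Use the following context to answer the question:\n"
    "<ctx>\n{context}\n</ctx>\n"
    "{question}\nAnswer:"
)


def build_stuff_prompt(docs: List[str], question: str, separator: str = "\n\n") -> str:
    """Join every retrieved chunk into one context block and fill the template."""
    context = separator.join(docs)
    return STUFF_TEMPLATE.format(context=context, question=question)


retrieved = [
    "all claw measurements were made on the left cheliped.",
    "differences in cw : ph and cw : pw were analyzed using a two - way anova",
]
stuff_prompt = build_stuff_prompt(retrieved, "Which cheliped was measured?")
print(stuff_prompt)
```

Since every chunk lands in one prompt, the total length grows with the number and size of retrieved chunks, which is why this approach suits small documents.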

In [133]:
template = """
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}
Answer:
"""
prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=template,
)

In [134]:
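# "stuff" chain with the custom prompt and a buffer memory that fills {history}
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vs2.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        "verbose": True,
        "prompt": prompt,
        # memory_key must match the {history} placeholder in the template;
        # input_key names the prompt variable that holds the user's question
        "memory": ConversationBufferMemory(memory_key="history", input_key="question"),
    },
    return_source_documents=True,  # also return the retrieved chunks
)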
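To make the role of the `{history}` placeholder concrete, here is a simplified stand-in for a conversation buffer. `SimpleBufferMemory` is a hypothetical class written for this sketch, not the LangChain one, and the `Human:`/`AI:` labels are an assumption about how turns are rendered.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Simplified stand-in for a conversation buffer: stores turns verbatim."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        """Record one question/answer exchange."""
        self.turns.append((question, answer))

    def history(self) -> str:
        """Render all stored turns as the text that would fill {history}."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


memory = SimpleBufferMemory()
first_history = memory.history()  # nothing stored before the first question
memory.save("What is this paper about?", "Carapace width and propus measures in crabs.")
second_history = memory.history()
print(repr(first_history))
print(second_history)
```

Note that in this setup the history only reaches the answering prompt. The retriever still searches with the new question alone, so a follow-up such as "And who wrote the paper?" is matched against the chunks without the earlier turns.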

In [135]:
# qa_stuff = RetrievalQA.from_chain_type(llm =llm, chain_type = "stuff", retriever = vs2.as_retriever())

In [136]:
#What is this paper about?
stuff_result1 = qa_stuff({"query": query})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova

charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 1

In [137]:
stuff_result1["result"]

'The paper is about the analysis of carapace width and propus measures in different species of crabs, specifically H. nudus and H. oregonensis. It also discusses the consumption rates of individual H. nudus crabs.'

In [138]:
stuff_result1["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keyword
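Each returned document carries a `page` entry in its metadata, so a small helper can show which pages a given answer drew on. This is a sketch under assumptions: `source_pages` is a hypothetical helper, and the `SimpleNamespace` objects below merely imitate the two fields of the documents printed above.

```python
from types import SimpleNamespace
from typing import Iterable, List


def source_pages(docs: Iterable) -> List[int]:
    """Return the sorted, de-duplicated page numbers found in each document's metadata."""
    return sorted({d.metadata["page"] for d in docs if "page" in d.metadata})


# Stand-ins for the Document objects shown above (only the fields used here).
example_docs = [
    SimpleNamespace(page_content="##f, f. j. 2011. biometry.", metadata={"page": 10}),
    SimpleNamespace(page_content="and rohlf 2011 ).", metadata={"page": 4}),
]
pages_hit = source_pages(example_docs)
print(pages_hit)
```

In the notebook this could be called as `source_pages(stuff_result1["source_documents"])`. The metadata elsewhere in this output shows `'page': 18` alongside `'total_pages': 19`, which suggests page numbers are zero-based.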

In [139]:
#Write a one sentence summary of the purpose of the paper

stuff_result2 = qa_stuff({"query": query2})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female h. oregonensis 9 y = 0. 351 * x - 0. 833 0. 894 male h. oregonensis 14 y = 0. 39 * x - 1. 149 0. 693 sma regression b n carapace width vs propal width r2 female h. nudus 13 y = 0. 157 * x - 0. 386 0. 927 male h. nudus 13 y = 0. 209 

In [140]:
stuff_result2["result"]

'The purpose of the paper is to analyze the relationship between carapace width and propus measures in different species of crabs and investigate the differences in propal height and propal width between sexes and species.'

In [141]:
query3 = "And who wrote the paper?"

In [142]:
stuff_result3 = qa_stuff({"query": query3})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova

charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus

##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. 

In [143]:
stuff_result3["result"]

'The authors of the paper are Charifson and Rohlf (2011).'

In [144]:
stuff_result4 = qa_stuff({"query": query4})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova

charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female h. oregonensis 9 y = 0. 351 * x - 0. 833 0. 894 male h. oregonensis 14 y = 0. 39 * x - 1. 149 0. 693 sma regression b n carapace width vs pr

In [145]:
stuff_result4["result"]

'The paper analyzed the relationship between carapace width and propus measures in H. nudus and H. oregonensis crabs using a two-way ANOVA. The authors used SMA regression to determine the relationship between carapace width and propal height/propal width in different sexes and species.'

In [146]:
stuff_result4["source_documents"]

[Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female

In [147]:
#Summarize the paper concisely with reference to materials and methods.

# qa_stuff.run(query4)

In [148]:
stuff_result5 = qa_stuff({"query": query5})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential for overlap in the winter when la. vincta migrates up shore. although no differences in consumption rates between male and female h. nudus were found, this might be

charifson 19 figure 5 : consumption ra

In [149]:
stuff_result5["result"]

'Based on the given context, it is not clear whether the paper includes one or more observations. The context does not provide specific information or details about any observations made in the study.'

In [150]:
stuff_result5["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential for overlap in the winter when la. vincta migrates up shore. a

In [151]:
#Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations? 
# qa_stuff.run(query5)

In [152]:
stuff_result6 = qa_stuff({"query": query6})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus

##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or 

In [153]:
stuff_result6["result"]

'Based on the given context, it is not clear whether the paper contains observational or experimental research conducted in the natural environment or with organisms collected in nature. The context does not provide specific information or details about the research methods used in the study.'

In [154]:
stuff_result6["source_documents"]

[Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 18, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry

In [155]:
#Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?
# qa_stuff.run(query6)

In [156]:
stuff_result7 = qa_stuff({"query": query7})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have

charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.

##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 40

In [157]:
stuff_result7["result"]

'The scientific names of the species mentioned in this paper are H. nudus and H. oregonensis.'

In [158]:
stuff_result7["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hm

In [159]:
#What are the scientific names of the species mentioned in this paper?
# qa_stuff.run(query7)

In [160]:
stuff_result8 = qa_stuff({"query": query8})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have

##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential f

In [161]:
stuff_result8["result"]

'Based on the given context, it is not mentioned where the species were observed or collected. No specific locations are given in the context.'

In [162]:
stuff_result8["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pd

In [163]:
#Does the paper mention where the species were observed or collected, and if so, what locations are given?
# qa_stuff.run(query8)

In [164]:
stuff_result9 = qa_stuff({"query": query9})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
to right sides of the propus. all claw measurements were made on the left cheliped.

charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.

charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.

and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a t

In [165]:
stuff_result9["result"]

'No, there are no coordinate locations given in latitude/longitude in the provided context.'

In [166]:
stuff_result9["source_documents"]

[Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 14, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David

In [167]:
#Are any coordinate locations given in latitude / longitude, and if so, what are they?
# qa_stuff.run(query9)

In [168]:
stuff_result10 = qa_stuff({"query": query10})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have

##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential f

In [169]:
stuff_result10["result"]

'Based on the given context, it is mentioned that H. nudus and la. vincta usually occupy different portions of the intertidal habitat.'

In [170]:
stuff_result10["source_documents"]

[Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pd

In [171]:
#In what habitat were the species found?
# qa_stuff.run(query10)

In [172]:
stuff_result11 = qa_stuff({"query": query11})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have

charifson 17 figure 3 : differences in propal height : carapace width ratio between sex and species. the sex factor was statistically significant ( f = 125. 6. p < 0. 001 ), while the species factor was insignificant ( f > 0. 01, p = 0. 983 ). there w

In [173]:
stuff_result11["result"]

'No, the paper does not mention a year, date, or time that the species were collected or observed.'

In [174]:
stuff_result11["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).p

In [175]:
#Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?
# qa_stuff.run(query11)

In [176]:
stuff_result12 = qa_stuff({"query": query12})

> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
to right sides of the propus. all claw measurements were made on the left cheliped.

charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.

charifson 15 figure 1 : relationship of carapace width and propal height in hemigrapsus. line of best fit from sma regression. see table 1a for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.

and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a t

In [177]:
stuff_result12["result"]

'Based on the given context, it is mentioned that there are figures in the paper. Specifically, Figure 1 and Figure 2 are referenced, which show the relationship of carapace width and propal height/width in different species and sexes of crabs. The tables mentioned are Table 1a and Table 1b, which provide descriptive statistics related to the figures. It is not mentioned whether there are any maps, diagrams, or additional tables in the paper.'

In [178]:
stuff_result12["source_documents"]

[Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='charifson 16 figure 2 : relationship of carapace width and propal width in hemigrapsus. line of best fit from sma regression. see table 1b for descriptive statistics. a ) female h. nudus. b ) male h. nudus. c ) female h. oregonensis. d ) male h. oregonensis.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 15, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David'

In [179]:
#Are there any maps, figures, tables or diagrams in the paper?
# qa_stuff.run(query12)

# Load_QA_Chain & Prompting

In [180]:
new_template = """
You are an expert on species occurrences. Your task is to generate information based on the 5 major sections: abstract, introduction, materials & methods, results, and discussion.
Please do not pull information from the literature citations.

Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}

Answer:
"""
new_prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=new_template,
)

In [181]:
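from langchain.chains.question_answering import load_qa_chain

# Note: new_prompt (defined above) is not passed to load_qa_chain here,
# so this chain answers with LangChain's default "stuff" prompt.
qa_chain = load_qa_chain(ChatOpenAI(temperature=0), chain_type="stuff")

qa_stuff_combine = RetrievalQA(
    combine_documents_chain=qa_chain,
    retriever=vs2.as_retriever(),
    return_source_documents=True,
)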

In [182]:
combine_result = qa_stuff_combine({"query": query})

In [183]:
combine_result['result']

'Based on the provided context, the paper appears to be about various analyses and measurements related to carapace width and propus measures in different species of crabs (H. nudus and H. oregonensis). It includes information on the relationship between carapace width and propal height/width, as well as consumption rates of snails by individual H. nudus crabs.'

In [184]:
combine_result["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keyword

In [185]:
combine_result2 = qa_stuff_combine({"query": query2})

In [186]:
combine_result2["result"]

'The purpose of the paper is to analyze the relationship between carapace width and propus measures in two species of crabs, specifically focusing on propal height and propal width.'

In [187]:
combine_result3 = qa_stuff_combine({"query": query3})

In [188]:
combine_result3["result"]

'The author of the paper is not mentioned in the given context.'

In [189]:
combine_result4 = qa_stuff_combine({"query": query4})

In [190]:
combine_result4["result"]

'The paper analyzed the relationship between carapace width and propal height and width in different species of Hemigrapsus crabs. The authors used a two-way ANOVA to compare the differences in carapace width to propal height and width. They performed SMA regressions to determine the relationship between carapace width and propal height and width in female and male H. nudus and H. oregonensis crabs. The results were presented in tables and figures, showing the line of best fit for each species and gender.'

In [191]:
combine_result5 = qa_stuff_combine({"query": query5})

In [192]:
combine_result5["result"]

'Based on the provided context, it is not clear whether the paper includes one or more specific observations.'

In [193]:
combine_result6 = qa_stuff_combine({"query": query6})

In [194]:
combine_result6["result"]

'Based on the given context, it appears that the research described in the paper is experimental research conducted in the natural environment with organisms collected in nature. The acknowledgements section mentions the permission to collect organisms and the use of facilities at Friday Harbor Laboratories.'

In [195]:
combine_result7 = qa_stuff_combine({"query": query7})

In [196]:
combine_result7["result"]

'The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.'

In [197]:
combine_result8 = qa_stuff_combine({"query": query8})

In [198]:
combine_result8["result"]

'No, the paper does not mention specific locations where the species were observed or collected.'

In [199]:
combine_result9 = qa_stuff_combine({"query": query9})

In [200]:
combine_result9["result"]

'No, there are no coordinate locations given in latitude/longitude in the provided context.'

In [201]:
combine_result10 = qa_stuff_combine({"query": query10})

In [202]:
combine_result10["result"]

'The species were found in water and in finer sediment.'

In [203]:
combine_result11 = qa_stuff_combine({"query": query11})

In [204]:
combine_result11["result"]

'No, the paper does not mention a year, date, or time that species were collected or observed.'

In [205]:
combine_result12 = qa_stuff_combine({"query": query12})

In [206]:
combine_result12["result"]

'Yes, there are figures in the paper. Figure 1 shows the relationship between carapace width and propal height in different species and genders of Hemigrapsus crabs. Figure 2 shows the relationship between carapace width and propal width in the same species and genders. There are also tables mentioned in the text, such as Table 1a and Table 1b, which provide descriptive statistics for the figures.'

# ConversationalRetrievalQA 

The ConversationalRetrievalQA chain (`ConversationalRetrievalChain` in LangChain's Python API) builds on `RetrievalQA` by adding a chat-history component.

It first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question answering chain to return a response.
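
The three steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: `condense`, `retrieve`, and `answer` are hypothetical stand-ins for the question-rewriting LLM call, the retriever, and the question-answering chain, and the toy functions exist only so the sketch runs without an LLM or a vector store.

```python
from typing import Callable, List, Tuple

ChatHistory = List[Tuple[str, str]]  # (human question, assistant answer) pairs


def conversational_retrieval_step(
    question: str,
    chat_history: ChatHistory,
    condense: Callable[[str, ChatHistory], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> dict:
    """Run one turn: condense -> retrieve -> answer (all callables are stand-ins)."""
    # Step 1: with no history the question is already standalone, so skip rewriting.
    standalone = condense(question, chat_history) if chat_history else question
    # Step 2: look up documents relevant to the standalone question.
    docs = retrieve(standalone)
    # Step 3: answer the standalone question from the retrieved documents.
    return {
        "question": question,
        "standalone_question": standalone,
        "source_documents": docs,
        "answer": answer(standalone, docs),
    }


def toy_condense(question: str, history: ChatHistory) -> str:
    # Pretend rewrite: attach the previous human question as context.
    return f"{question} [follow-up to: {history[-1][0]}]"


def toy_retrieve(query: str) -> List[str]:
    # Pretend retriever: keep texts that share at least one word with the query.
    corpus = ["carapace width and propal height", "snail consumption rates"]
    words = query.lower().split()
    return [text for text in corpus if any(w in text.split() for w in words)]


def toy_answer(query: str, docs: List[str]) -> str:
    return f"{len(docs)} document(s) matched"


first = conversational_retrieval_step(
    "carapace width?", [], toy_condense, toy_retrieve, toy_answer
)
second = conversational_retrieval_step(
    "snail rates", [("carapace width?", first["answer"])],
    toy_condense, toy_retrieve, toy_answer,
)
print(first["standalone_question"])
print(second["standalone_question"])
```

The first turn has no history, so its question is used unchanged; only the second turn goes through the rewriting step.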

In [207]:
# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. 
Avoid pulling context from the literature cited section starting on page 10
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


In [208]:
memory = ConversationBufferMemory(memory_key="chat_history", input_key = "question", output_key = "answer", return_messages=True)

In [209]:
qa_conversational = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0),
                                                          vs2.as_retriever(),
                                                          verbose = True,
                                                          memory=memory,
                                                          combine_docs_chain_kwargs={"prompt": QA_CHAIN_PROMPT})


In [210]:
conversational_result = qa_conversational({"question": query})



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible.
Avoid pulling context from the literature cited section starting on page 10
##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.

and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova

charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and pro

In [211]:
conversational_result["answer"]

'The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.'

In [212]:
conversational_result["chat_history"]

[HumanMessage(content='what is this paper about?', additional_kwargs={}, example=False),
 AIMessage(content='The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.', additional_kwargs={}, example=False)]

In [213]:
query1a = "Did the paper mention which species of crab had the highest carapace width and propus measures?"

In [214]:
conversational_result1a = qa_conversational({"question": query1a})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Follow Up Input: Did the paper mention which species of crab had the highest carapace width and propus measures?
Standalone question:[0m

[1m> Finished chain.[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mUse the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. 
Avoid pulling con

In [215]:
conversational_result1a["answer"]

'The paper does not provide information on which species of crab had the highest carapace width and propus measures.'

In [216]:
conversational_result2 = qa_conversational({"question": query2})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Follow Up Input: Write a one sentence summary of the purpose of the paper
Standalone question:[0m

[1m> Finished chain.[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mUse the following pieces of context to answer the question 

In [217]:
conversational_result2["answer"]

'The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.'

In [218]:
conversational_result3 = qa_conversational({"question": query3})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Follow Up Input: And who wrote the paper?
Standalone question:[0m

[1m> Finished chain.[

In [219]:
conversational_result3["answer"]

'The author of the paper is F. J. Stuart.'

In [222]:
conversational_result4 = qa_conversational({"question": query4})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Follow 

In [223]:
conversational_result4["answer"]

'The paper analyzed the differences in carapace width and propus measures using a two-way ANOVA. The relationships between carapace width and propal height/width were examined using SMA regression. The paper also included descriptive statistics and a line of best fit for each species.'

In [224]:
conversational_result5 = qa_conversational({"question": query5})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [225]:
conversational_result5["answer"]

'No, the paper does not include any observations.'

In [226]:
conversational_result6 = qa_conversational({"question": query6})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [227]:
conversational_result6["answer"]

'This paper contains experimental research conducted in the natural environment with organisms collected in nature.'

In [228]:
conversational_result7 = qa_conversational({"question": query7})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [229]:
conversational_result7["answer"]

'The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.'

In [230]:
conversational_result8 = qa_conversational({"question": query8})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [231]:
conversational_result8["answer"]

'The paper does not mention where the species were observed or collected.'

In [232]:
conversational_result9 = qa_conversational({"question": query9})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [233]:
conversational_result9["answer"]

'The paper does not provide any coordinate locations in latitude/longitude.'

In [234]:
conversational_result10 = qa_conversational({"question": query10})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [235]:
conversational_result10["answer"]

'The species mentioned in the paper were found in the intertidal habitat.'

In [236]:
conversational_result11 = qa_conversational({"question": query11})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [237]:
conversational_result11["answer"]

'No, the paper does not mention a year, date, or time that the species were collected or observed.'

In [238]:
conversational_result12 = qa_conversational({"question": query12})



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what is this paper about?
Assistant: The paper is about the analysis of carapace width and propus measures in two species of crabs, H. nudus and H. oregonensis, and their consumption rates of snails.
Human: Did the paper mention which species of crab had the highest carapace width and propus measures?
Assistant: The paper does not provide information on which species of crab had the highest carapace width and propus measures.
Human: Write a one sentence summary of the purpose of the paper
Assistant: The purpose of the paper is to investigate the differences in propus size between males and females of the two crab species, H. nudus and H. oregonensis.
Human: And who wrote the paper?
Assistant: The author of the paper is F. J. Stuart.
Human: 

In [239]:
conversational_result12["answer"]

'Yes, the paper contains figures (Figure 2, Figure 1, Figure 3, Figure 4) and tables (Table 1b, Table 1a).'

## Pass in Chat History

In [261]:
qa_conversational2 = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), vs2.as_retriever())


In [262]:
chat_history = []
result = qa_conversational2({"question": query, "chat_history": chat_history})

In [263]:
result["answer"]

'Based on the provided context, the paper appears to be about various analyses and measurements related to carapace width and propus measures in different species of crabs (H. nudus and H. oregonensis). It also includes information on consumption rates of snails by individual H. nudus crabs.'

In [264]:
result["chat_history"]

[]

In [265]:
chat_history = [(query, result["answer"])]

result_conversational2 = qa_conversational2({"question": query, "chat_history": chat_history})

In [266]:
result_conversational2["answer"]

'The main topic of this paper is the relationship between carapace width and propus measures in H. nudus and H. oregonensis crabs.'

In [267]:
result_conversational2["chat_history"]

[('what is this paper about?',
  'Based on the provided context, the paper appears to be about various analyses and measurements related to carapace width and propus measures in different species of crabs (H. nudus and H. oregonensis). It also includes information on consumption rates of snails by individual H. nudus crabs.')]
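
When the history is passed in explicitly, the caller is responsible for growing it between turns. The helper below is a small sketch of that bookkeeping. `ask` is a hypothetical name, and `chain` can be any callable that accepts a dict with `question` and `chat_history` keys and returns a dict with an `answer` key, which is how `qa_conversational2` is called above. `fake_chain` is a stand-in so the example runs without an LLM.

```python
from typing import Callable, Dict, List, Tuple

ChatHistory = List[Tuple[str, str]]  # (question, answer) pairs, oldest first


def ask(chain: Callable[[Dict], Dict], question: str, chat_history: ChatHistory) -> str:
    """Call the chain, then record the (question, answer) pair for the next turn."""
    # Pass a copy so the chain sees only the turns that came before this one.
    result = chain({"question": question, "chat_history": list(chat_history)})
    chat_history.append((question, result["answer"]))
    return result["answer"]


def fake_chain(inputs: Dict) -> Dict:
    # Stand-in for a real chain: reports how many earlier turns it was given.
    return {"answer": f"answer #{len(inputs['chat_history']) + 1}"}


history: ChatHistory = []
ask(fake_chain, "what is this paper about?", history)
ask(fake_chain, "And who wrote the paper?", history)
print(history)
```

With a real chain in place of `fake_chain`, each follow-up question would be rewritten against the turns accumulated so far.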

## Return Source Documents

In [273]:
qa_conversational3 = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), vs2.as_retriever(), return_source_documents=True)




In [274]:
chat_history = []
result = qa_conversational3({"question": query, "chat_history": chat_history})

In [275]:
result["answer"]

'Based on the provided context, the paper appears to be about various analyses and measurements related to carapace width and propus measures in different species of crabs (H. nudus and H. oregonensis). It also includes information on consumption rates of snails by individual H. nudus crabs.'

In [276]:
result["chat_history"]

[]

In [277]:
result['source_documents'][0]

Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})
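
The top source document here is a chunk of the reference list, which the earlier prompt tried to steer away from. One way to inspect or drop such chunks is to filter on the `page` metadata before relying on an answer. The sketch below uses plain dicts as stand-ins for `Document` objects; `split_by_page` is a hypothetical helper, and the set of reference pages (`{10}`) is an assumption that would need to be checked against the actual PDF.

```python
from typing import Dict, List, Set, Tuple


def split_by_page(docs: List[Dict], reference_pages: Set[int]) -> Tuple[List[Dict], List[Dict]]:
    """Separate chunks on reference-list pages from chunks on all other pages."""
    body: List[Dict] = []
    references: List[Dict] = []
    for doc in docs:
        page = doc["metadata"]["page"]
        (references if page in reference_pages else body).append(doc)
    return body, references


# Hypothetical stand-ins that mirror `page` values seen in the outputs in this notebook.
docs = [
    {"page_content": "reference list chunk", "metadata": {"page": 10}},
    {"page_content": "methods chunk", "metadata": {"page": 4}},
    {"page_content": "table chunk", "metadata": {"page": 12}},
]
body, references = split_by_page(docs, reference_pages={10})
print([d["metadata"]["page"] for d in body])
print([d["metadata"]["page"] for d in references])
```

A set of pages is used rather than a single cutoff because, in this PDF, table and figure chunks appear on later pages than the reference-list chunk, and a simple "page 10 and after" rule would discard them as well.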

In [278]:
result2 = qa_conversational3({"question": query2, "chat_history": chat_history})

In [279]:
result2["answer"]

'The purpose of the paper is to analyze the relationship between carapace width and propal height/width in different species and sexes of crabs.'

In [302]:
result2["chat_history"]

[]

In [303]:
result2["source_documents"][0]

Document(page_content='charifson 13 table 1 : sma regressions of carapace width and propus measures. a ) the relationship between carapace width and propal height. x is carapace width and y is propal height. b ) the relationship between carapace width and propal height. x is carapace width and y is propal width. sma regression a n carapace width vs propal height r2 female h. nudus 13 y = 0. 273 * x - 0. 678 0. 976 male h. nudus 13 y = 0. 311 * x - 1. 385 0. 868 female h. oregonensis 9 y = 0. 351 * x - 0. 833 0. 894 male h. oregonensis 14 y = 0. 39 * x - 1. 149 0. 693 sma regression b n carapace width vs propal width r2 female h. nudus 13 y = 0. 157 * x - 0. 386 0. 927 male h. nudus 13 y = 0. 209 * x - 1. 288 0. 859 female h. oregonensis 9 y = 0. 175 * x + 0. 037 0. 724 male h. oregonensis 14 y = 0. 244 * x - 0. 688 0. 534', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 12, 'total_pages': 19, 'format': 'PDF 1.5',

In [304]:
result3 = qa_conversational3({"question": query3, "chat_history": chat_history})

In [305]:
result3["answer"]

'The author of the paper is not mentioned in the given context.'

In [306]:
result4 = qa_conversational3({"question": query4, "chat_history": chat_history})

In [307]:
result4["answer"]

'The paper analyzed the relationship between carapace width and propal height and width in different species of Hemigrapsus crabs. The authors used a two-way ANOVA to compare the differences in carapace width to propal height and width. They performed SMA regressions to determine the relationship between carapace width and propal height and width in female and male H. nudus and H. oregonensis crabs. The results were presented in tables and figures, showing the line of best fit and descriptive statistics for each species and gender.'

In [308]:
result4["source_documents"][0]

Document(page_content='and rohlf 2011 ). differences in cw : ph and cw : pw were analyzed using a two - way anova', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 4, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [309]:
result5 = qa_conversational3({"question": query5, "chat_history": chat_history})

In [310]:
result5["answer"]

'Based on the provided context, it is not clear whether the paper includes one or more specific observations.'

In [288]:
result5["source_documents"]

[Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''}),
 Document(page_content='vincta in the field. it should be noted that h. nudus and la. vincta usually occupy different portions of the intertidal and may have little contact with each other, unlike the relationship between h. nudus and li. scutulata. there is some potential for overlap in the winter when la. vincta migrates up shore. a

In [289]:
result6 = qa_conversational3({"question": query6, "chat_history": chat_history})

In [290]:
result6["answer"]

'Based on the given context, it appears that the research described in the paper is experimental research conducted in the natural environment with organisms collected in nature. The acknowledgements section mentions the permission to collect organisms and the use of facilities at Friday Harbor Laboratories.'

In [291]:
result6["chat_history"]

[]

In [299]:
result6["source_documents"][0]

Document(page_content='charifson 19 figure 5 : consumption rates by individual h. nudus. mean consumption rates ( n = 8 trials ) of 3 female ( fe1 to fe3 ) and 3 male ( ma1 to ma3 ) h. nudus. crabs fe1, fe3, and ma3 did not consume snails. the individuals that eat snails did not differ in their consumption rates ( f2, 21 = 2. 52, p = 0. 104 ). error bars represent standard error of the mean. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0. 8 0. 9 1 fe1 fe2 fe3 ma1 ma2 ma3 mean consumption rate ( snails consumed / hour ) individual h. nudus', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 18, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [292]:
result7 = qa_conversational3({"question": query7, "chat_history": chat_history})

In [300]:
result7["answer"]

'The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.'

In [311]:
result7["source_documents"][0]

Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [293]:
result8 = qa_conversational3({"question": query8, "chat_history": chat_history})

In [312]:
result8["answer"]

'No, the paper does not mention specific locations where the species were observed or collected.'

In [313]:
result8["source_documents"][0]

Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [294]:
result9 = qa_conversational3({"question": query9, "chat_history": chat_history})

In [314]:
result9["answer"]

'No, there are no coordinate locations given in latitude/longitude in the provided context.'

In [315]:
result9["source_documents"][0]

Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [295]:
result10 = qa_conversational3({"question": query10, "chat_history": chat_history})

In [316]:
result10["answer"]

'The species were found in water and in finer sediment.'

In [317]:
result10["source_documents"][0]

Document(page_content='waterand in finer sediment than the more desiccation - tolerant h. nudus ( sliger 1982 ). there is still considerable habitat overlap between these two species ; the underside of a single rock may have', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 1, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [296]:
result11 = qa_conversational3({"question": query11, "chat_history": chat_history})

In [318]:
result11["answer"]

'No, the paper does not mention a year, date, or time that species were collected or observed.'

In [319]:
result11["source_documents"][0]

Document(page_content='##f, f. j. 2011. biometry. 4th ed. w. h. freeman, new york, new york, usa. stuart, y. e. and losos, j. b. 2013. ecological character displacement : glass half full or half empty? trends in ecology and evolution 28 : 402 - 408.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 10, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})

In [297]:
result12 = qa_conversational3({"question": query12, "chat_history": chat_history})

In [320]:
result12["answer"]

'Yes, there are figures in the paper. Figure 1 shows the relationship between carapace width and propal height in different species and genders of Hemigrapsus crabs. Figure 2 shows the relationship between carapace width and propal width in the same species and genders. There are also tables mentioned in the text, such as Table 1a and Table 1b, which provide descriptive statistics for the figures.'

In [321]:
result12["source_documents"][0]

Document(page_content='to right sides of the propus. all claw measurements were made on the left cheliped.', metadata={'source': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'file_path': 'hms/fhl_2014_Charifson_34622 (1).pdf', 'page': 3, 'total_pages': 19, 'format': 'PDF 1.5', 'title': '', 'author': 'David', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Office Word 2007', 'producer': 'Microsoft® Office Word 2007', 'creationDate': "D:20140723120649-04'00'", 'modDate': "D:20140723120649-04'00'", 'trapped': ''})
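
Several of the results above share the same first source document: the page 1 chunk for `result7`, `result8`, and `result10`, and the page 3 chunk for `result9` and `result12`. A minimal sketch for spotting this kind of repetition follows. `top_source_counts` is a hypothetical helper that assumes each result is a dict with a `source_documents` list. The stand-in objects below only mimic the shape of the real results.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict, Mapping


def top_source_counts(results: Mapping[str, Dict]) -> Counter:
    """Count how often each chunk appears as the first source document."""
    counts: Counter = Counter()
    for result in results.values():
        docs = result.get("source_documents") or []
        if not docs:
            continue
        top = docs[0]
        # Identify a chunk by its page value plus the start of its text
        key = (top.metadata.get("page"), top.page_content[:40])
        counts[key] += 1
    return counts


# Stand-ins (assumption); with the real results use {"query7": result7, ...}
chunk_a = SimpleNamespace(page_content="waterand in finer sediment", metadata={"page": 1})
chunk_b = SimpleNamespace(page_content="to right sides of the propus.", metadata={"page": 3})
example_results = {
    "query7": {"source_documents": [chunk_a]},
    "query8": {"source_documents": [chunk_a]},
    "query9": {"source_documents": [chunk_b]},
}
for (page, snippet), n in top_source_counts(example_results).most_common():
    print(f"page {page} ({snippet!r}): top source for {n} queries")
```

A chunk that ranks first for unrelated questions may indicate that the chunks are too short or that the retriever is not separating the queries well. That is only a heuristic, not a diagnosis.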