Load document store

In [1]:
from qdrant_haystack import QdrantDocumentStore

document_store = QdrantDocumentStore(
    path="qdrant",
    index="Document",
    embedding_dim=768,
    recreate_index=False,
)

  from .autonotebook import tqdm as notebook_tqdm


Load initial places

In [2]:
import pickle
with open('initial_places.pkl', 'rb') as f:
    initial_places = pickle.load(f)

## Tools:
Load and initialise tools that are going to be used:

In [3]:
from src.search_scrape_PIPE import search_scrape_pipeline
search_scrape_pipe = search_scrape_pipeline(document_store)

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [4]:
from duckduckgo_search import DDGS

def internet_search_tool(search: str, n_results: int = 3) -> str:
    '''
    Run the search-and-scrape pipeline for a query and return the summaries
    of the first matching pages as one string, ready to be shown to the LLM.

    :param search: query string to search for
    :param n_results: number of page summaries to include in the result
    '''

    # first perform the search; the pipeline returns the links of the result pages
    result_links = search_scrape_pipe.run(search, topk=20)

    results = f"Search: {search}\nResults:\n"
    # now look up the summaries in the doc_store using the returned links
    cnt = 0
    for doc in document_store.get_all_documents_generator():
        if doc.meta["url"] in result_links:
            results += f"({cnt+1})\n " + doc.meta["summary"] + "\n"
            cnt += 1

            # drop the link from result_links: a page chopped into chunks yields
            # several documents that all carry the same summary
            result_links.remove(doc.meta["url"])

        if cnt == n_results:
            break

    return results
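
The link-matching loop in `internet_search_tool` can be tried without the document store. This is only a minimal sketch: it assumes each document is a plain dict with `url` and `summary` keys (a stand-in for `doc.meta`), and `collect_summaries` and the sample data are hypothetical names made up for illustration.

```python
def collect_summaries(search, docs, result_links, n_results=3):
    """Build the results string from docs whose URL is among result_links.

    Each URL contributes at most one summary, even when the page was split
    into several chunks that all carry the same summary.
    """
    remaining = set(result_links)  # copy, so the caller's list is left untouched
    results = f"Search: {search}\nResults:\n"
    cnt = 0
    for meta in docs:
        if cnt == n_results:
            break
        if meta["url"] in remaining:
            cnt += 1
            results += f"({cnt})\n " + meta["summary"] + "\n"
            remaining.discard(meta["url"])
    return results


# made-up sample data: page A was chunked into two documents
docs = [
    {"url": "https://example.com/a", "summary": "Page A"},
    {"url": "https://example.com/a", "summary": "Page A"},
    {"url": "https://example.com/b", "summary": "Page B"},
    {"url": "https://example.com/c", "summary": "Page C"},
]
out = collect_summaries("demo", docs, ["https://example.com/a", "https://example.com/c"])
print(out)
```

Working on a copy of the links avoids mutating the list returned by the pipeline, which may matter if the caller wants to reuse it.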

In [5]:
from haystack.nodes import EmbeddingRetriever
import torch

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    use_gpu=True,
    devices=[torch.device("mps")],
)

  return self.fget.__get__(instance, owner)()
You seem to be using sentence-transformers/multi-qa-mpnet-base-dot-v1 model with the cosine function instead of the recommended dot_product. This can be set when initializing the DocumentStore
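
The warning above matters because cosine similarity ignores vector length while the dot product does not, so the two can rank the same documents differently. A small sketch with made-up two-dimensional vectors (real embeddings from this model have 768 dimensions):

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product normalised by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 1.0]
doc_short = [1.0, 1.0]  # same direction as the query, small norm
doc_long = [4.0, 1.0]   # different direction, larger norm

candidates = [doc_short, doc_long]
best_by_cosine = max(candidates, key=lambda d: cosine(query, d))
best_by_dot = max(candidates, key=lambda d: dot(query, d))

print("cosine picks:", best_by_cosine)
print("dot product picks:", best_by_dot)
```

As the warning says, the similarity function is chosen when the DocumentStore is initialised. In Haystack document stores this is typically a `similarity` argument, but that is an assumption to check against the installed `qdrant_haystack` version, and an existing index may need to be recreated for the change to take effect.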


In [6]:
from langchain import LLMChain, PromptTemplate
from langchain.agents import AgentType, initialize_agent
from langchain.agents.tools import Tool
from langchain.llms import OpenAI
from langchain.tools import StructuredTool

In [7]:
#llm initialisation
llm = OpenAI(temperature=0,openai_api_key="")

In [8]:
#wikipedia search tool
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

In [9]:
from src.coordinates_search import coordinates_search
cordi_searchy_tool = coordinates_search("NO.txt")

  self.df = pd.read_csv('NO.txt', sep="\t", header=None,names=features.keys())


In [10]:
# Question answerer: combines a QA pipeline with an entailment checker and T5
# to generate candidate answers to a question and check their entailment
# against the knowledge base.
from src.question_search_answer_entail_TOOL import question_search_answer_entail
QSAE_tool = question_search_answer_entail(document_store,retriever=retriever)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [11]:
from src.simple_search_TOOL import simple_search
SS = simple_search(document_store,retriever)

In [12]:
from haystack.document_stores import FAISSDocumentStore

places_doc_store = FAISSDocumentStore(faiss_index_factory_str="Flat")

In [13]:
from haystack.schema import Document

In [14]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
candidate_labels = ["nature","culinary", "city", "cultural","other"]
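
The classifier is used later to tag each place with its top label. For a single input text, the zero-shot pipeline returns a dict whose `labels` and `scores` lists are sorted from most to least likely, so `output["labels"][0]` is the best guess. The sketch below works on a hand-written dict of that shape (the text and scores are invented, not model output) and adds a hypothetical `min_score` fallback that the notebook itself does not use.

```python
# Hypothetical output, shaped like the dict the pipeline returns for one text.
# The scores are invented for illustration.
output = {
    "sequence": "A fjord village surrounded by mountains.",
    "labels": ["nature", "cultural", "city", "other", "culinary"],
    "scores": [0.81, 0.09, 0.05, 0.03, 0.02],
}


def top_tag(output, min_score=0.5, fallback="other"):
    """Return the best label, or the fallback when the top score is low."""
    label, score = output["labels"][0], output["scores"][0]
    return label if score >= min_score else fallback


tag = top_tag(output)
print(tag)
```

Falling back to "other" on a low top score is a design choice, not something the source does; without it, every place gets the highest-scoring label regardless of how confident the model is.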



In [15]:
print(SS.run("weather","Bergen"))

  incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
Batches: 100%|██████████| 1/1 [00:00<00:00,  2.40it/s]

Query: weather
(1)
A lover of cold, wild, mountainous places, Lonely Planet writer Kerry Walker headed to Norway's southern fjords to find the true meaning of "friluftsliv" (outdoor living). Here she shares a snapshot of her adventures.

Ever since I first set foot in Norway a decade ago, I've dreamed of visiting the fjords proper. Summer's crowds and cruise boats would not be for me, so instead, I decided to piece together my own itinerary from Stavanger to Bergen in late spring. And I found a region of rare beauty, with thundering rivers, flint-blue fjords and glacier-encrusted peaks - all ripe for adventure.

Rock up in May or early June, and you'll have much of the region to yourself. The fjords are at their photogenic best at this time of year, with the landscape bursting into blossom, waterfalls raging away as the ice melts, snow still on the summits and light nights allowing you to eke out every precious moment.

Immerse yourself in the best experiences the world has to offer wi




# Pipeline run

In [16]:
tools_step2 = [
    Tool(
        name = "Broad knowledge search",
        func = internet_search_tool,
        description = "Returns three search results based on your input text, good for broad information."
    ),

    Tool(
        name =  "Question answerer",
        func = QSAE_tool.run,
        description= "Useful to find new information. Further evaluation of the answers and fact checking has to be performed using other tools. This tool should be the last resort."
    ),

    StructuredTool.from_function(SS.run,
    name="Semantic search",
    description="Useful to search for concrete information. The filter argument should always be the place being searched for (properly capitalize the name)."
    )
]

In [17]:
for place in initial_places[:2]:
    print(f"starting to analyze {place}")
    #step 1: filter out words that refer to general concepts rather than places
    document_dict = {"Name":place}
    prompt_template = "Does the word {place} refer to a general concept? Answer only Yes or No"

    llm_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template(prompt_template)
    )
    input_list = [
        {"place": place},
    ]
    result = llm_chain.apply(input_list)[0]["text"]

    #skip the word unless the model answered "No" (i.e. it names a specific place);
    #otherwise cords would be undefined or left over from the previous place
    if "no" not in result.lower():
        continue

    cords = cordi_searchy_tool.run(place,"NO")
    if cords is None:
        print("couldn't find coordinates...")
        continue

    document_dict["coordinates"] = cords

    #step2 create description
    agent = initialize_agent(tools_step2, llm, agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
    prompt_step2 = f"""
    Write a condensed tourist guide about {place}, clearly divided into sections: main attractions, tourist landmarks (if the place isn't one itself) and things to do as a tourist on vacation there.
    Do not talk about other places; search for and add as much information as possible.
    """
    description = agent.run(prompt_step2)

    summary = description + "\n\nFrequently asked questions:\n"

    #step3 FAQ
    faqs = [
    f"What is the weather like in {place}?",
    f"What is the best time of the year to visit {place}?",
    f"How good is public transport in {place}?",
    f"What are some off-the-beaten-path attractions to explore in {place}?",
    f"Is it safe to travel to {place}? Any safety concerns/precautions?",
    ]

    for question in faqs:
        #having some error:
        #ValidationError: 2 validation errors for Semantic searchSchemaSchema
        #query
        #field required (type=value_error.missing)
        #filter
        #field required (type=value_error.missing)

        try:
            summary += question + " " + agent.run(f"Answer the following question: {question} The answer should be very concise.") + "\n\n"
        except Exception:
            #skip questions the agent fails on (see the ValidationError noted above)
            continue
    document_dict["content"] = summary

    #use zero shot classification to get tag
    output = classifier(summary, candidate_labels, multi_label=False)
    document_dict["tag"] = output["labels"][0]

    #save to doc_store
    places_doc_store.write_documents([Document.from_dict(document_dict)])

starting to analyze Øye


> Entering new AgentExecutor chain...
Action:
```
{
  "action": "Broad knowledge search",
  "action_input": "Øye tourist guide"
}
```

 20%|██        | 4/20 [00:03<00:15,  1.04it/s]

Unable to extract text from https://www.heartmybackpack.com/norway/hotel-union-oye-review/


 40%|████      | 8/20 [00:07<00:12,  1.08s/it]

Unable to extract text from https://thehiddennorth.com/recreating-the-past-at-oye-stave-church/


100%|██████████| 20/20 [00:24<00:00,  1.25s/it]
Preprocessing: 100%|██████████| 18/18 [00:00<00:00, 2531.69docs/s]
Extracting entities: 100%|██████████| 2/2 [00:03<00:00,  1.50s/it]
10000it [00:00, 169078.80it/s]        



Observation: [36;1m[1;3mSearch: Øye tourist guide
Results:
(1)
 Dette er en liste over øyer i Estland .
Det finnes 1521 øyer.
Dette er ikke en komplett liste.
Største øyer [ rediger | rediger kilde ]1) Kassari blir noen ganger regnet som en del av Hiiumaa.
Abruka – Adralaid – Aegna (Wulf) – Ahelaid – Aherahu (Atla) – Aherahu – Ahessäär – Ahtra – Aksi – Allirahu (Kõiguste) – Allirahu (Pihtla) (Väike-Prangli) – Allirahu – Allu – Alumine Vaika – Anekäbrud – Ankrurahu – Annilaid (Anõlaid) – Antsulaiud – Anulaid – AuklaiustEerikukivi – Eerikulaid – Elmrahu – EsirahuGretagrundHanemaa – Hanerahu – Hanikatsi laid – Hara – Harilaid, utenfor Vormsi – Harilaid, utenfor Saaremaa – Härjakare – Härjamaa – Heinlaid (Kõiguste laht) – Heinlaid (Väinameri) – Hellamaa rahu – Hiiumaa (Dagö) – Hobulaid – Hõralaid – Hülgelaid – HülgerahuImutilaid – InnarahuJuksirahuKadakalaid – Kaevatsi laid – Kahtla laid – Kajakarahu – Käkimaa – Käkirahu – Kakralaid – Kakrarahu – Karirahu – Kassari – Kasselaid – Keri (K

Batches: 100%|██████████| 1/1 [00:00<00:00,  6.25it/s]


Observation: [38;5;200m[1;3mQuery: weather
(1)
Now, and then the highlight of the day is the 3-course dinner, composed by our chef with the best local ingredients of the season. Served in the dining room and followed by excellent tales of days long gone told in the Sun Lounge while you savour your coffee. Enjoy an evening of refreshments as you gaze up at cascading waterfalls.

You will be pleased to know that daylight hours in the summer at Øye lasts until midnight, giving you extra time to explore the beautiful mountains nearby. Take a walk down to the waterside to admire the majestic mountains reflected in the fjords.

Hotel Union Øye is a 62°Nord hotel. 62°Nord offers tailor made activities in Fjord Norway.

unionoye.no

#hotelunionoye
Thought: I have the answer
Action:
```
{
  "action": "Final Answer",
  "action_input": "The weather in Øye is usually mild with long daylight hours in the summer and cascading waterfalls."
}
```

> Finished chain.


> Entering new AgentExecutor chain...
Action:
```
{
  "action": "Semantic search",
  "action_input": {
    "query": "best time of the year to visit Øye",
    "filter": "Øye",
    "topk": 1
  }
}
```


Batches: 100%|██████████| 1/1 [00:00<00:00,  9.65it/s]


Observation: [38;5;200m[1;3mQuery: best time of the year to visit Øye
(1)
Now, and then the highlight of the day is the 3-course dinner, composed by our chef with the best local ingredients of the season. Served in the dining room and followed by excellent tales of days long gone told in the Sun Lounge while you savour your coffee. Enjoy an evening of refreshments as you gaze up at cascading waterfalls.

You will be pleased to know that daylight hours in the summer at Øye lasts until midnight, giving you extra time to explore the beautiful mountains nearby. Take a walk down to the waterside to admire the majestic mountains reflected in the fjords.

Hotel Union Øye is a 62°Nord hotel. 62°Nord offers tailor made activities in Fjord Norway.

unionoye.no

#hotelunionoye
Thought: I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "The best time of the year to visit Øye is during the summer months when daylight hours last until midnight, giving you extra time to explore the beautiful mountains nearby."
}
```

> Finished chain.


> Entering new AgentExecutor chain...
Action:
```
{
  "action": "Semantic search",
  "action_input": {
    "query": "public transport in Øye",
    "filter": "Øye",
    "topk": 5
  }
}
```


Batches: 100%|██████████| 1/1 [00:00<00:00,  9.05it/s]


Observation: [38;5;200m[1;3mQuery: public transport in Øye
(1)
Distance and total height gain: From Øye: 3 km. 4,921 ft. From Patchellhytta: 3 km. 2,460 ft. From Urke: 3,937 ft. 8 km.

Access Bus/car to Øye or Stranda/Engeset.

Worth noting You are strongly advised not to use the snow fields on the way down unless you have an ice axe and know how to brake with it. This practice leads to several serious accidents every year.
(2)
Now, and then the highlight of the day is the 3-course dinner, composed by our chef with the best local ingredients of the season. Served in the dining room and followed by excellent tales of days long gone told in the Sun Lounge while you savour your coffee. Enjoy an evening of refreshments as you gaze up at cascading waterfalls.

You will be pleased to know that daylight hours in the summer at Øye lasts until midnight, giving you extra time to explore the beautiful mountains nearby. Take a walk down to the waterside to admire the majestic mountains reflecte




I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Public transport in Øye is accessible by bus or car and is a popular destination for day trips. There are several routes to Patchellhytta, and onwards to Slogen, either from Øye, Engeset or Urke. However, it is important to note that you should not use the snow fields on the way down unless you have an ice axe and know how to brake with it, as this practice leads to several serious accidents every year."
}
```


> Finished chain.


> Entering new AgentExecutor chain...
Action:
```
{
  "action": "Semantic search",
  "action_input": {
    "query": "attractions",
    "filter": "Øye",
    "topk": 3
  }
}
```


Batches: 100%|██████████| 1/1 [00:00<00:00,  9.22it/s]


Observation: [38;5;200m[1;3mQuery: attractions
(1)
Now, and then the highlight of the day is the 3-course dinner, composed by our chef with the best local ingredients of the season. Served in the dining room and followed by excellent tales of days long gone told in the Sun Lounge while you savour your coffee. Enjoy an evening of refreshments as you gaze up at cascading waterfalls.

You will be pleased to know that daylight hours in the summer at Øye lasts until midnight, giving you extra time to explore the beautiful mountains nearby. Take a walk down to the waterside to admire the majestic mountains reflected in the fjords.

Hotel Union Øye is a 62°Nord hotel. 62°Nord offers tailor made activities in Fjord Norway.

unionoye.no

#hotelunionoye
(2)
In the village of Øye by the Norangsfjord you will find one of the most distinct hotels in all of Europe.

In the village of Øye by the Norangsfjord you will find one of the most distinct hotels in all of Europe.

It has been a place where




I have the information I need to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Some off-the-beaten-path attractions to explore in Øye include Hotel Union Øye, Slogen summit, and Patchellhytta cabin. Be sure to bring an ice axe if you plan to explore the snow fields."
}
```

> Finished chain.


> Entering new AgentExecutor chain...
Action:
```
{
  "action": "Semantic search",
  "action_input": {
    "query": "Is it safe to travel to Øye?",
    "filter": "Øye",
    "topk": 5
  }
}
```


Batches: 100%|██████████| 1/1 [00:00<00:00, 15.10it/s]


Observation: [38;5;200m[1;3mQuery: Is it safe to travel to Øye?
(1)
Distance and total height gain: From Øye: 3 km. 4,921 ft. From Patchellhytta: 3 km. 2,460 ft. From Urke: 3,937 ft. 8 km.

Access Bus/car to Øye or Stranda/Engeset.

Worth noting You are strongly advised not to use the snow fields on the way down unless you have an ice axe and know how to brake with it. This practice leads to several serious accidents every year.
(2)
In the village of Øye by the Norangsfjord you will find one of the most distinct hotels in all of Europe.

In the village of Øye by the Norangsfjord you will find one of the most distinct hotels in all of Europe.

It has been a place where visitors come to savour the good life and the tranquillity in magnificent surroundings since 1891. It has been a favourite venue of royalties, writers and lovers for generations.

The 27 rooms, all of which are individually furnished with carefully selected antiques, are named after notables who have stayed here: Kaise




I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Yes, it is safe to travel to Øye. However, you should take precautions when travelling in the area, such as bringing an ice axe and knowing how to use it to brake on snow fields. You can access Øye by bus or car from Stranda/Engeset."
}
```

> Finished chain.


Writing Documents: 10000it [00:00, 668905.33it/s]       


starting to analyze Norangsfjord


Looking at alternatenames: 100%|██████████| 607428/607428 [00:10<00:00, 57211.60it/s]

couldn't find coordinates...
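
The step 1 filter in the loop decides on a place by looking for the substring `no` anywhere in the model's reply, which would also match replies such as "Norway" or "I do not know". A stricter parser could look only at the first word. The sketch below is one possible approach; `is_general_concept` is a hypothetical helper and not part of the notebook.

```python
import re


def is_general_concept(answer: str):
    """Map an LLM reply to True (Yes), False (No) or None (unclear).

    Only the first word of the reply is considered, ignoring leading
    whitespace and letter case.
    """
    match = re.match(r"\s*(yes|no)\b", answer, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


for reply in ["\n\nNo", "Yes.", "Norway is a country", "I do not know"]:
    print(repr(reply), "->", is_general_concept(reply))
```

Returning `None` for unclear replies lets the caller decide whether to skip the place or ask again, rather than silently treating the reply as a "No".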





In [21]:
places_doc_store.update_embeddings(retriever,batch_size=100)

Batches: 100%|██████████| 1/1 [00:00<00:00,  1.21it/s]
Documents Processed: 100 docs [00:00, 119.29 docs/s]         


In [22]:
places_doc_store.save(index_path="places_doc_store.faiss", config_path="places_doc_store_config.json")