# Filecoin-Spec Question Answering System with OpenAI API
The idea comes from https://simonwillison.net/2023/Jan/13/semantic-search-answers/ <br>
OpenAI notebook that implements the idea: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb  <br>
See also the deepset Haystack tutorial: https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline#preparing-documents



In [1]:
# Only the imports used by the cells below are kept
import requests
from bs4 import BeautifulSoup

## Scrape and Assemble Desired Content

In [2]:
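URL = "https://spec.filecoin.io/"
r = requests.get(URL, timeout=30)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page

# Name the parser explicitly; otherwise bs4 picks whichever one is installed and warns
soup = BeautifulSoup(r.content, "html.parser")
type(soup)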

bs4.BeautifulSoup

In [3]:
# Get the title
title = soup.title
print(title)

<title>Home | Filecoin Spec</title>


In [4]:
# Print a slice of the page text, skipping most of the leading table of contents
text = soup.get_text()
print(text[12000:13000])

locks6.2.1.1.1.1 Block Single6.2.1.1.1.2 Block Short6.2.1.1.1.3 Block Long6.2.1.1.1.4 Bit Numbering6.2.2 HAMT6.2.3 Other Considerations6.3 Filecoin Parameters6.3.1 Orient parameters6.4 Audit Reports6.4.1 Lotus6.4.1.1 2020-10-20 Lotus Mainnet Ready Security Audit6.4.2 Actors6.4.2.1 2020-10-19 Actors Mainnet Ready Security Audit6.4.3 Proofs6.4.3.1 2020-10-20 Filecoin Bellman and BLS Signatures6.4.3.2 2020-07-28 Filecoin Proving Subsystem6.4.3.3 2020-07-28 zk-SNARK proofs6.4.4 GossipSub6.4.4.1 2020-06-03 GossipSub Design and Implementation6.4.4.2 2020-04-18 GossipSub Evaluation6.4.5 Drand6.4.5.1 2020-08-09 drand reference implementation Security Audit7 Filecoin Implementations7.1 Lotus7.2 Venus7.3 Forest7.4 Fuhon (cpp-filecoin)8 Releases8.1 v2.1.18.1.1 Bug Fixes8.2 v2.1.08.2.1 Bug Fixes8.2.2 Features8.3 v2.0.08.3.1 Bug Fixes8.3.2 FeaturesIntroductionState reliableTheory Audit n/aEdit this sectionsection-introFilecoin is a distributed storage network based on a blockchain mechanism.
Fileco

In [5]:
len(soup.text)

951469

## Divide into Searchable Index

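There are much better ways to do this, such as indexing the text into Elasticsearch or deepset Haystack. All that is needed is a traditional search engine that returns the top N results for a query; here, fixed-size character chunks and a keyword match stand in for it.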
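As a rough illustration of the "top N results" step, the sketch below ranks chunks by how often they contain the query's terms. It is a stand-in for a real search engine, not what this notebook runs: `tokenize`, `top_n_chunks`, the small `STOPWORDS` set, and the toy chunks are all hypothetical names and data introduced for the example.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real engine would use a fuller one)
STOPWORDS = {"is", "than", "the", "a", "or", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def top_n_chunks(query, chunks, n=3):
    """Return up to n chunks, ranked by total occurrences of the query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # drop chunks that share no term with the query
            scored.append((score, chunk))
    # sort() is stable, so chunks with equal scores keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:n]]

# Toy data, loosely paraphrased for the example
toy_chunks = [
    "Initial Pledge: collateral supplied when a Miner ProveCommits a Sector.",
    "GossipSub is the pubsub protocol used for message propagation.",
    "PreCommit Deposits act as collateral until the Sector is ProveCommitted.",
]
results = top_n_chunks("Is Initial Pledge higher than PreCommit deposit?", toy_chunks, n=2)
print(results)
```

Unlike a single hand-picked keyword, this scores every query term, so the query itself drives which chunks are returned.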

In [6]:
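text_chunk_chars = 400
page_text = soup.text  # compute once; every .text access re-walks the whole tree
# Start at offset 12000 to skip most of the leading table of contents
list_of_texts = [page_text[i:i + text_chunk_chars]
                 for i in range(12000, len(page_text), text_chunk_chars)]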

In [7]:
print(list_of_texts[0:3])

['locks6.2.1.1.1.1 Block Single6.2.1.1.1.2 Block Short6.2.1.1.1.3 Block Long6.2.1.1.1.4 Bit Numbering6.2.2 HAMT6.2.3 Other Considerations6.3 Filecoin Parameters6.3.1 Orient parameters6.4 Audit Reports6.4.1 Lotus6.4.1.1 2020-10-20 Lotus Mainnet Ready Security Audit6.4.2 Actors6.4.2.1 2020-10-19 Actors Mainnet Ready Security Audit6.4.3 Proofs6.4.3.1 2020-10-20 Filecoin Bellman and BLS Signatures6.4.3.', '2 2020-07-28 Filecoin Proving Subsystem6.4.3.3 2020-07-28 zk-SNARK proofs6.4.4 GossipSub6.4.4.1 2020-06-03 GossipSub Design and Implementation6.4.4.2 2020-04-18 GossipSub Evaluation6.4.5 Drand6.4.5.1 2020-08-09 drand reference implementation Security Audit7 Filecoin Implementations7.1 Lotus7.2 Venus7.3 Forest7.4 Fuhon (cpp-filecoin)8 Releases8.1 v2.1.18.1.1 Bug Fixes8.2 v2.1.08.2.1 Bug Fixes8.2.2 ', 'Features8.3 v2.0.08.3.1 Bug Fixes8.3.2 FeaturesIntroductionState reliableTheory Audit n/aEdit this sectionsection-introFilecoin is a distributed storage network based on a blockchain mechani
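The output above shows chunks being cut mid-word at the 400-character boundary ("6.4.3." / "2 2020-07-28"), which can split a phrase such as "Initial Pledge" across two chunks so that neither matches. One common mitigation is to let consecutive chunks overlap. The sketch below assumes a hypothetical helper `chunk_text` and an arbitrary 100-character overlap; neither is part of the original notebook.

```python
def chunk_text(text, chunk_chars=400, overlap_chars=100, start=0):
    """Split text into fixed-size chunks where each chunk repeats the
    last overlap_chars characters of the previous one."""
    if not 0 <= overlap_chars < chunk_chars:
        raise ValueError("overlap_chars must be >= 0 and smaller than chunk_chars")
    step = chunk_chars - overlap_chars  # how far the window advances each time
    return [text[i:i + chunk_chars] for i in range(start, len(text), step)]

# Synthetic 1000-character sample so the example runs without network access
sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample, chunk_chars=400, overlap_chars=100)
print(len(chunks), [len(c) for c in chunks])
```

The trade-off is a larger index: with a 100-character overlap on 400-character chunks, the window advances 300 characters at a time, so roughly a third more chunks are produced.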

In [30]:
query = "Is Initial Pledge higher than or lower than PreCommitDeposit?"
# Note: the query can be written in other languages

# With a good search engine step here, the keyword would not need to be picked by hand
query_keyword = "Initial Pledge"

# Keep only the chunks that contain the keyword (exact, case-sensitive match)
prompt_blob = [chunk for chunk in list_of_texts if query_keyword in chunk]
print(len(prompt_blob))
print(prompt_blob)

8
['balance MUST cover ALL of the following:PreCommit Deposits: When a Miner PreCommits a Sector, they must supply a “precommit deposit” for the Sector, which acts as collateral. If the Sector is not ProveCommitted on time, this deposit is removed and burned.Initial Pledge: When a Miner ProveCommits a Sector, they must supply an “initial pledge” for the Sector, which acts as collateral. If the Sector ', 'd is a mechanism to reduce the initial token commitment by vesting block rewards over time. The third aligns incentives between miner and client, and can allow miners to differentiate themselves in the market. The remainder of this subsection describes each in more detail.Initial Pledge CollateralState reliableTheory Audit n/aEdit this sectionsection-systems.filecoin_mining.miner_collaterals.initi', 'ay creates the necessary sub-linearity. This sub-linearity has been introduced by the Initial Pledge.In general, fault fees are slashed first from the soonest-to-vest unvested block reward

## Dynamically Create OpenAI Prompt Including Most Relevant Content

In [31]:
# The closing quote now ends the "I don't know" phrase, and the query already ends in "?"
prompt = """Answer the question as truthfully as possible using the provided text,
and if the answer is not contained within the text below, say "I don't know",
and finally, provide the answer translated into Spanish and Chinese Mandarin.

Context:""" + str(prompt_blob) + """

Q: """ + query + """

A:"""
print(prompt)

Answer the question as truthfully as possible using the provided text,
and if the answer is not contained within the text below, say "I don't know",
and finally, provide the answer translated into Spanish and Chinese Mandarin.

Context:['balance MUST cover ALL of the following:PreCommit Deposits: When a Miner PreCommits a Sector, they must supply a “precommit deposit” for the Sector, which acts as collateral. If the Sector is not ProveCommitted on time, this deposit is removed and burned.Initial Pledge: When a Miner ProveCommits a Sector, they must supply an “initial pledge” for the Sector, which acts as collateral. If the Sector ', 'd is a mechanism to reduce the initial token commitment by vesting block rewards over time. The third aligns incentives between miner and client, and can allow miners to differentiate themselves in the market. The remainder of this subsection describes each in more detail.Initial Pledge CollateralState reliableTheory Audit n/aEdit this sectionsection-sys
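The prompt above pastes in every matching chunk, so its length grows with the number of matches, while the model's context window is finite. A minimal sketch of bounding the context is shown below. The helper `build_prompt`, the `max_context_chars` budget of 2000, and the `---` separator are assumptions made for illustration, and a character budget is only a crude proxy for a token limit.

```python
DEFAULT_INSTRUCTIONS = (
    "Answer the question as truthfully as possible using the provided text, "
    "and if the answer is not contained within the text below, say \"I don't know\"."
)

def build_prompt(question, context_chunks, max_context_chars=2000,
                 instructions=DEFAULT_INSTRUCTIONS):
    """Assemble a prompt from chunks, in order, until the character budget is reached."""
    selected = []
    used = 0
    for chunk in context_chunks:
        # Stop at the first chunk that would exceed the budget, so the
        # caller's ranking order is preserved and nothing is reordered.
        if used + len(chunk) > max_context_chars:
            break
        selected.append(chunk.strip())
        used += len(chunk)
    context = "\n---\n".join(selected)
    return f"{instructions}\n\nContext:\n{context}\n\nQ: {question}\n\nA:"

# Toy chunks: the second one does not fit in the 2000-character budget
demo_prompt = build_prompt("What is X?", ["a" * 1500, "b" * 1000, "c" * 10])
print(len(demo_prompt))
```

Joining the chunks with a separator, rather than calling `str()` on the list, also keeps Python list punctuation out of the text the model sees.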

In [32]:
import openai
import os
from dotenv import load_dotenv, find_dotenv
from pathlib import Path

COMPLETIONS_MODEL = "text-davinci-003"
EMBEDDING_MODEL = "text-embedding-ada-002"
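`EMBEDDING_MODEL` is defined here but not used by the cells shown; the keyword filter above does the retrieval instead. In the embeddings approach described by the linked articles, each chunk and the query are turned into vectors and chunks are ranked by cosine similarity. The sketch below shows only that ranking step, on made-up three-dimensional vectors; it makes no API call, and `cosine_similarity`, `rank_by_similarity`, and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, chunk_vectors):
    """Return chunk indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), idx)
              for idx, vec in enumerate(chunk_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]

# Made-up vectors; real embeddings have far more dimensions
toy_query_vector = [1.0, 0.0, 1.0]
toy_chunk_vectors = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
order = rank_by_similarity(toy_query_vector, toy_chunk_vectors)
print(order)
```

With real embeddings, the top few indices would select the entries of `list_of_texts` to place in the prompt, replacing the hand-picked `query_keyword`.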

In [33]:
# Load OPENAI_API_KEY from a .env file in the current working directory
load_dotenv(Path.cwd() / ".env")
openai.api_key = os.getenv("OPENAI_API_KEY")

In [34]:
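# Legacy Completions call: openai.Completion exists only in openai-python releases before 1.0
openai.Completion.create(
    prompt=prompt,
    temperature=0,  # keep the answer as deterministic as possible
    max_tokens=600,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    model=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")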

'Initial Pledge is usually higher than PreCommitDeposit. \nSpanish: El compromiso inicial suele ser mayor que el depósito de precompromiso.\nChinese Mandarin: 初始承诺通常高于预提交存款。'
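The response arrives as a single string with the translations on labelled lines. A small sketch of splitting it into one entry per language follows. It assumes the `Spanish:` and `Chinese Mandarin:` labels seen in the output above, which the model happened to produce and is not guaranteed to repeat; `split_translations` is a hypothetical helper.

```python
def split_translations(answer, labels=("Spanish", "Chinese Mandarin")):
    """Split an answer into a dict keyed by language, based on 'Label:' line prefixes.
    Lines without a known label are treated as the English answer."""
    parts = {}
    english_lines = []
    for line in answer.split("\n"):
        line = line.strip()
        if not line:
            continue
        for label in labels:
            prefix = label + ":"
            if line.startswith(prefix):
                parts[label] = line[len(prefix):].strip()
                break
        else:  # no label matched this line
            english_lines.append(line)
    parts["English"] = " ".join(english_lines)
    return parts

# The response string recorded in the cell output above
raw = ('Initial Pledge is usually higher than PreCommitDeposit. \n'
       'Spanish: El compromiso inicial suele ser mayor que el depósito de precompromiso.\n'
       'Chinese Mandarin: 初始承诺通常高于预提交存款。')
parsed = split_translations(raw)
print(parsed["English"])
```

If a label is missing from a response, its key is simply absent from the result, so callers should use `parsed.get(...)` rather than index directly.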