<a href="https://colab.research.google.com/github/beve0x/theGraphLLM/blob/main/TheGraph_LLM_PoC_k.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **TheGraph-LLM Demo**

This notebook demonstrates a proof-of-concept pipeline that extracts blockchain data through TheGraph and uses GPT to provide a chatbot-like experience for blockchain data analytics.
To run the demo, you need a TheGraph account (with a wallet) and an OpenAI account.

---
Run the cells below in order to reproduce the demo.


In [None]:
import requests
import os

### TheGraph Gateway URL
Add the Gateway URL here. This requires a TheGraph API key. For example, for the Substreams Uniswap v3 Ethereum subgraph (https://thegraph.com/explorer/subgraphs/HUZDsRpEVP2AvzDCyzDHtdc64dyDxx8FQjzsmqSg4H3B?view=Overview&chain=arbitrum-one), the URL looks like the one shown in the cell below. Replace [api-key] with your TheGraph API key.

In [None]:
url = "https://gateway-arbitrum.network.thegraph.com/api/[api-key]/subgraphs/id/HUZDsRpEVP2AvzDCyzDHtdc64dyDxx8FQjzsmqSg4H3B"
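To keep the key out of the notebook itself, the URL can be assembled from an environment variable. The following is a minimal sketch: the `GRAPH_API_KEY` variable name and the `build_gateway_url` helper are assumptions made for illustration, and the URL pattern is simply the one from the cell above.

```python
import os

# Subgraph ID and URL pattern copied from the cell above.
SUBGRAPH_ID = "HUZDsRpEVP2AvzDCyzDHtdc64dyDxx8FQjzsmqSg4H3B"
GATEWAY_TEMPLATE = (
    "https://gateway-arbitrum.network.thegraph.com/api/{api_key}/subgraphs/id/{subgraph_id}"
)


def build_gateway_url(api_key: str, subgraph_id: str = SUBGRAPH_ID) -> str:
    """Hypothetical helper: fill the gateway URL pattern with an API key."""
    if not api_key or api_key == "[api-key]":
        raise ValueError("Set a real TheGraph API key before building the URL")
    return GATEWAY_TEMPLATE.format(api_key=api_key, subgraph_id=subgraph_id)


# GRAPH_API_KEY is an assumed variable name, not something TheGraph defines.
api_key = os.environ.get("GRAPH_API_KEY", "")
if api_key:
    url = build_gateway_url(api_key)
```

Failing early on the untouched `[api-key]` placeholder avoids sending a request that the gateway would reject anyway.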

### GraphQL Query

You can modify the GraphQL query below. To see which attributes are available, refer to the subgraph's playground in the explorer (https://thegraph.com/explorer/subgraphs/HUZDsRpEVP2AvzDCyzDHtdc64dyDxx8FQjzsmqSg4H3B?view=Playground&chain=arbitrum-one). The playground includes a schema explorer that lists the entities and fields a GraphQL query can request.

In [None]:
query = """
{
  factories(first: 5) {
    id
    poolCount
    txCount
    totalVolumeUSD
  }
  bundles(first: 5) {
    id
    ethPriceUSD
  }
  tokens(first: 20) {
    id
    name
  }
}
"""
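Instead of editing numbers inside the query string, the page size can be passed as a GraphQL variable. The sketch below covers only the `tokens` part of the query above; the `build_payload` helper and the `Tokens` operation name are assumptions for illustration, and it assumes the subgraph accepts the usual `first` and `skip` pagination arguments.

```python
import json

# Same token fields as the query above, with pagination passed as variables.
TOKENS_QUERY = """
query Tokens($first: Int!, $skip: Int!) {
  tokens(first: $first, skip: $skip) {
    id
    name
  }
}
"""


def build_payload(first: int, skip: int = 0) -> dict:
    """Hypothetical helper: build the JSON body for a GraphQL POST request."""
    return {"query": TOKENS_QUERY, "variables": {"first": first, "skip": skip}}


payload = build_payload(first=20)
print(json.dumps(payload["variables"]))
```

The resulting dictionary could be passed as the `json=` argument of `requests.post` in place of `data`.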

The cells below send the query to the GraphQL API. You can run them as is.

In [None]:
headers = {
    "Content-Type": "application/json",
}

data = {
    "query": query
}

In [None]:
response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(result)
else:
    print(f"Error: {response.status_code}, {response.text}")

{'data': {'factories': [{'id': '0x1F98431c8aD98523631AE4a59f267346ea31F984', 'poolCount': '18667', 'txCount': '44442309', 'totalVolumeUSD': '1238052823980.30305322304874176553'}], 'bundles': [{'id': '1', 'ethPriceUSD': '2413.689436074894154128756508889091'}], 'tokens': [{'id': '0x00000000000045166c45af0fc6e4cf31d9e14b9a', 'name': 'TopBidder'}, {'id': '0x0000000000004946c0e9f43f4dee607b0ef1fa1c', 'name': 'Chi Gastoken by 1inch'}, {'id': '0x0000000000071566c1cf5db929f8e2e2f5d57da8', 'name': 'UOU'}, {'id': '0x0000000000071821e8033345a7be174647be0706', 'name': 'Scry Protocol'}, {'id': '0x0000000000085d4780b73119b644ae5ecd22b376', 'name': 'TrueUSD'}, {'id': '0x0000000000095413afc295d19edeb1ad7b71c952', 'name': 'Tokenlon'}, {'id': '0x000000000075f13bcf2e6652e84821e8b544f6f9', 'name': 'Signet'}, {'id': '0x000000000ca5171087c18fb271ca844a2370fc0a', 'name': 'Merkle Network Token'}, {'id': '0x00000000441378008ea67f4284a57932b1c000a5', 'name': 'TrueGBP'}, {'id': '0x000000007a58f5f58e697e51ab0357b
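The status check above only catches HTTP-level failures. GraphQL servers commonly return status 200 and report query problems in an `errors` array in the body, so it can help to check for that as well. A minimal sketch, in which the `extract_data` helper and the sample payloads are made up for illustration:

```python
def extract_data(payload: dict) -> dict:
    """Hypothetical helper: return the `data` part of a GraphQL response or raise."""
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e) for e in errors
        )
        raise RuntimeError(f"GraphQL errors: {messages}")
    if payload.get("data") is None:
        raise RuntimeError("GraphQL response has no 'data' field")
    return payload["data"]


# Made-up payloads; a real call would pass response.json().
ok = extract_data({"data": {"bundles": [{"id": "1"}]}})

try:
    extract_data({"errors": [{"message": "example error"}]})
except RuntimeError as exc:
    print(exc)
```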

You can inspect the results returned by the query below. The data is then saved to a file in the notebook's temporary storage.

In [None]:
result['data']

{'factories': [{'id': '0x1F98431c8aD98523631AE4a59f267346ea31F984',
   'poolCount': '18667',
   'txCount': '44442309',
   'totalVolumeUSD': '1238052823980.30305322304874176553'}],
 'bundles': [{'id': '1',
   'ethPriceUSD': '2413.689436074894154128756508889091'}],
 'tokens': [{'id': '0x00000000000045166c45af0fc6e4cf31d9e14b9a',
   'name': 'TopBidder'},
  {'id': '0x0000000000004946c0e9f43f4dee607b0ef1fa1c',
   'name': 'Chi Gastoken by 1inch'},
  {'id': '0x0000000000071566c1cf5db929f8e2e2f5d57da8', 'name': 'UOU'},
  {'id': '0x0000000000071821e8033345a7be174647be0706',
   'name': 'Scry Protocol'},
  {'id': '0x0000000000085d4780b73119b644ae5ecd22b376', 'name': 'TrueUSD'},
  {'id': '0x0000000000095413afc295d19edeb1ad7b71c952', 'name': 'Tokenlon'},
  {'id': '0x000000000075f13bcf2e6652e84821e8b544f6f9', 'name': 'Signet'},
  {'id': '0x000000000ca5171087c18fb271ca844a2370fc0a',
   'name': 'Merkle Network Token'},
  {'id': '0x00000000441378008ea67f4284a57932b1c000a5', 'name': 'TrueGBP'},
  {'id':
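Note that the numeric fields in this output are strings. If you want to compute with them before handing the data to GPT, one option is to parse them with `int` and `decimal.Decimal`, which keeps the long fractional parts intact. A small sketch using values copied from the output above:

```python
from decimal import Decimal

# Sample values copied from the output above.
sample = {
    "factories": [{
        "id": "0x1F98431c8aD98523631AE4a59f267346ea31F984",
        "poolCount": "18667",
        "txCount": "44442309",
        "totalVolumeUSD": "1238052823980.30305322304874176553",
    }],
    "bundles": [{"id": "1", "ethPriceUSD": "2413.689436074894154128756508889091"}],
}

factory = sample["factories"][0]
pool_count = int(factory["poolCount"])
total_volume_usd = Decimal(factory["totalVolumeUSD"])
eth_price_usd = Decimal(sample["bundles"][0]["ethPriceUSD"])

# Express the total volume in ETH at the quoted price (rough, for illustration only).
volume_in_eth = total_volume_usd / eth_price_usd
print(pool_count, f"{total_volume_usd:,.2f}", f"{volume_in_eth:,.0f}")
```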

In [None]:
os.makedirs('data', exist_ok=True)  # does not fail if the directory already exists
file_path = "data/result.txt"

In [None]:
import json

# Write the response to a text file as JSON, matching how the prompt later describes the data
with open(file_path, 'w') as file:
    json.dump(result['data'], file, indent=2)

Install the llama-index library. LlamaIndex indexes the saved data and retrieves only the parts relevant to each question, so query results that would exceed the context length of the GPT APIs can still be analyzed.

In [None]:
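# The imports used later rely on the pre-0.10 package layout, so pin the version shown in the install log.
!pip install llama-index==0.9.22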

Collecting llama-index
  Downloading llama_index-0.9.22-py3-none-any.whl (15.7 MB)
Collecting beautifulsoup4<5.0.0,>=4.12.2 (from llama-index)
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting dataclasses-json (from llama-index)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)
Collecting deprecated>=1.2.9.3 (from llama-index)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting httpx (from llama-index)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting openai>=1.1.0 (from llama-index)
  Downloading openai-1.6.1-py3-none-any.whl (225 kB)

### OpenAI API

The cells below call the OpenAI API through LlamaIndex. You will need an OpenAI API key; replace <your-openai-key> in the cell below with your key.

In [None]:
os.environ['OPENAI_API_KEY']='<your-openai-key>'
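It is easy to run the rest of the notebook with the placeholder still in place. A small guard, sketched here with a hypothetical `require_api_key` helper, fails early with a clear message instead of a later authentication error:

```python
import os

PLACEHOLDER = "<your-openai-key>"


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: fail early if the key is unset or still the placeholder."""
    value = os.environ.get(name, "")
    if not value or value == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set to a real key")
    return value
```

In a shared notebook, reading the key with `getpass.getpass()` rather than typing it into a cell keeps it out of the saved file.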

In [None]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
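Roughly speaking, the index built above splits the documents into chunks and, at query time, retrieves the chunks most similar to the question so that only those are sent to GPT. The toy sketch below illustrates that idea in plain Python. It is not LlamaIndex's implementation: it uses fixed-size character chunks and word overlap as a stand-in for embedding similarity, and the function names and sample text are made up.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def top_chunks(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Rank chunks by how many question words they contain (a stand-in for embeddings)."""
    words = {w.strip("?.,").lower() for w in question.split()} - {""}
    ranked = sorted(chunks, key=lambda c: sum(w in c.lower() for w in words), reverse=True)
    return ranked[:k]


# Made-up sample text shaped like the saved query result.
text = "{'tokens': [{'id': '0x01', 'name': 'TrueUSD'}, {'id': '0x02', 'name': 'Tokenlon'}]} " * 5
chunks = chunk_text(text, chunk_size=80, overlap=20)
best = top_chunks(chunks, "Which token is named Tokenlon?", k=1)
print(best[0])
```

Only the selected chunks would be placed into the prompt, which is what keeps the request within the model's context length.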

The cell below customizes the question-answering prompt to add context about the data and help GPT produce a better analysis. You can edit it as needed. This template, with the retrieved context and your question filled in, is the prompt that is sent to GPT.

In [None]:
from llama_index.prompts import PromptTemplate
qa_prompt_tmpl_str = (
    "Context information is below. It is a json formatted data coming from a blockchain, queried from the uniswap substream.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and no prior knowledge, "
    "answer the query about the blockchain data.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
query_engine = index.as_query_engine(response_mode="compact")
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)
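To see what the model would receive, the template string can be filled in by hand with Python's `str.format`. This sketch repeats the template text from the cell above and uses a made-up context and question; in the real pipeline, the context is whatever LlamaIndex retrieves from the index.

```python
# Same template text as in the cell above.
qa_prompt_tmpl_str = (
    "Context information is below. It is a json formatted data coming from a blockchain, queried from the uniswap substream.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and no prior knowledge, "
    "answer the query about the blockchain data.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Made-up context and question, for illustration only.
filled = qa_prompt_tmpl_str.format(
    context_str="{'tokens': [{'id': '0x01', 'name': 'TrueUSD'}]}",
    query_str="How many tokens are there?",
)
print(filled)
```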

### Results

Set your question in the question variable, then run the cell below to get the answer.

In [None]:
#query_engine = index.as_query_engine()
question = "How many tokens are there?"
response = query_engine.query(question)
print(response)

There are 20 tokens in the blockchain data.
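Because the query requested `tokens(first: 20)`, this answer describes the fetched sample rather than every token in the subgraph. Counting answers like this one can be cross-checked directly against the data without involving GPT. A minimal sketch, using a small stand-in for `result['data']` and a hypothetical `count_entities` helper:

```python
# Small stand-in for result['data']; ids and names are copied from the output earlier in the notebook.
data = {
    "tokens": [
        {"id": "0x00000000000045166c45af0fc6e4cf31d9e14b9a", "name": "TopBidder"},
        {"id": "0x0000000000004946c0e9f43f4dee607b0ef1fa1c", "name": "Chi Gastoken by 1inch"},
        {"id": "0x0000000000085d4780b73119b644ae5ecd22b376", "name": "TrueUSD"},
    ],
    "bundles": [{"id": "1", "ethPriceUSD": "2413.689436074894154128756508889091"}],
}


def count_entities(data: dict) -> dict:
    """Hypothetical helper: number of returned items per top-level entity list."""
    return {key: len(items) for key, items in data.items() if isinstance(items, list)}


counts = count_entities(data)
print(counts)
```

With the full query result, `count_entities(result['data'])["tokens"]` gives the number to compare against the model's answer.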
