In [6]:
import json

# Create a simple graph with a few nodes and edges
graph = {
    "nodes": [
        {"id": "1", "label": "Person", "properties": {"name": "Alice", "age": 30}},
        {"id": "2", "label": "Person", "properties": {"name": "Bob", "age": 25}},
        {"id": "3", "label": "Company", "properties": {"name": "Acme Corp", "industry": "Tech"}}
    ],
    "edges": [
        {"start": "1", "end": "2", "label": "KNOWS", "properties": {"since": 2010}},
        {"start": "1", "end": "3", "label": "WORKS_AT", "properties": {"role": "Engineer"}},
        {"start": "2", "end": "3", "label": "WORKS_AT", "properties": {"role": "Designer"}}
    ]
}

# Save the graph to a JSON file
with open('simple_graph.json', 'w') as f:
    json.dump(graph, f, indent=4)

print("Created a simple graph and saved it to 'simple_graph.json'.")


Created a simple graph and saved it to 'simple_graph.json'.
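
Before encoding, it can help to confirm that the saved file loads back unchanged and that every edge points at a node that exists. A minimal sketch, assuming the same `nodes`/`edges` schema as above; the helper name `find_dangling_edges` and the sample graph are hypothetical and not part of the notebook:

```python
import json
import os
import tempfile

def find_dangling_edges(graph):
    """Return edges whose 'start' or 'end' id does not match any node id."""
    node_ids = {node["id"] for node in graph["nodes"]}
    return [
        edge for edge in graph["edges"]
        if edge["start"] not in node_ids or edge["end"] not in node_ids
    ]

# Small stand-in graph using the same schema as simple_graph.json
sample_graph = {
    "nodes": [
        {"id": "1", "label": "Person", "properties": {"name": "Alice"}},
        {"id": "2", "label": "Person", "properties": {"name": "Bob"}},
    ],
    "edges": [
        {"start": "1", "end": "2", "label": "KNOWS", "properties": {"since": 2010}},
        {"start": "1", "end": "9", "label": "KNOWS", "properties": {}},  # deliberately broken
    ],
}

# Round-trip through a temporary JSON file, then validate what was loaded
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_graph.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample_graph, f, indent=4)
    with open(path, encoding="utf-8") as f:
        loaded_graph = json.load(f)

dangling = find_dangling_edges(loaded_graph)
print(dangling)
```

Running the same check on `graph` should return an empty list, since all three edges reference ids `1`, `2`, or `3`.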


### Graph encoder

In [7]:
def create_node_string(nodes):
    node_strings = []
    for node in nodes:
        properties = ", ".join(f"{k}: {v}" for k, v in node["properties"].items())
        node_strings.append(f"{node['label']} {node['id']} ({properties})")
    return "; ".join(node_strings)

def encode_graph(graph):
    nodes_string = create_node_string(graph["nodes"])
    output = f"G describes a graph among nodes: {nodes_string}.\n"
    
    if graph["edges"]:
        output += "In this graph:\n"
    
    for edge in graph["edges"]:
        start_node = next(node for node in graph["nodes"] if node["id"] == edge["start"])
        end_node = next(node for node in graph["nodes"] if node["id"] == edge["end"])
        start_node_str = f"{start_node['label']} {start_node['id']}"
        end_node_str = f"{end_node['label']} {end_node['id']}"
        properties = ", ".join(f"{k}: {v}" for k, v in edge["properties"].items())
        
        output += f"Node {start_node_str} is connected to node {end_node_str} with edge {edge['label']} ({properties}).\n"
    
    return output

# Encode the graph
encoded_graph = encode_graph(graph)
print("Encoded graph:\n", encoded_graph)


Encoded graph:
 G describes a graph among nodes: Person 1 (name: Alice, age: 30); Person 2 (name: Bob, age: 25); Company 3 (name: Acme Corp, industry: Tech).
In this graph:
Node Person 1 is connected to node Person 2 with edge KNOWS (since: 2010).
Node Person 1 is connected to node Company 3 with edge WORKS_AT (role: Engineer).
Node Person 2 is connected to node Company 3 with edge WORKS_AT (role: Designer).
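
`encode_graph` scans the whole node list for each edge endpoint, which is fine for three nodes but grows with nodes × edges. A sketch of the same text encoding with a dictionary lookup instead; `encode_graph_indexed` is a hypothetical name, and the output format is assumed to match the cell above:

```python
def encode_graph_indexed(graph):
    """Same text encoding as encode_graph, but with an id -> node index."""
    nodes_by_id = {node["id"]: node for node in graph["nodes"]}

    def describe(node):
        return f"{node['label']} {node['id']}"

    node_parts = []
    for node in graph["nodes"]:
        props = ", ".join(f"{k}: {v}" for k, v in node["properties"].items())
        node_parts.append(f"{describe(node)} ({props})")
    lines = [f"G describes a graph among nodes: {'; '.join(node_parts)}."]

    if graph["edges"]:
        lines.append("In this graph:")
    for edge in graph["edges"]:
        start = nodes_by_id[edge["start"]]  # raises KeyError on a dangling edge
        end = nodes_by_id[edge["end"]]
        props = ", ".join(f"{k}: {v}" for k, v in edge["properties"].items())
        lines.append(
            f"Node {describe(start)} is connected to node {describe(end)} "
            f"with edge {edge['label']} ({props})."
        )
    return "\n".join(lines) + "\n"

# Two-node stand-in graph so the block runs on its own
demo_graph = {
    "nodes": [
        {"id": "1", "label": "Person", "properties": {"name": "Alice", "age": 30}},
        {"id": "3", "label": "Company", "properties": {"name": "Acme Corp"}},
    ],
    "edges": [
        {"start": "1", "end": "3", "label": "WORKS_AT", "properties": {"role": "Engineer"}},
    ],
}
encoded_demo = encode_graph_indexed(demo_graph)
print(encoded_demo)
```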



In [30]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, pipeline

# Use the Hugging Face pipeline for text completion with the GPT-2 XL model
generator = pipeline('text-generation', model='gpt2-xl')

# Initialize the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')

# Maximum input length in tokens
MAX_INPUT_LENGTH = 1024

# Truncate the input if it exceeds the maximum length
def split_input(input_text, max_length=MAX_INPUT_LENGTH):
    tokens = tokenizer.encode(input_text)
    if len(tokens) > max_length:
        return tokenizer.decode(tokens[:max_length])
    return input_text

# Run a query against the language model
def query_huggingface(query, context):
    input_text = f"{context}\n\nQuery: {query}\nAnswer:"
    input_text = split_input(input_text, MAX_INPUT_LENGTH)
    max_length = 500  # Output length limit (max_length counts the prompt tokens too)
    input_length = len(tokenizer.encode(input_text))
    max_new_tokens = max_length - input_length

    if max_new_tokens > 0:
        results = generator(input_text, max_length=max_length, num_return_sequences=1)
        response = results[0]['generated_text']
        return response.split("Answer:")[1].strip()
    else:
        return "Query is too long to process."

# Example query
print(" \n-------------QUERY 1 -----------\n")
query = "Find all nodes with labels 'Person'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

# Another query
print("\n\n----------QUERY 2--------------\n\n")
query = "Find all nodes whose 'name' attribute is 'Alice'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

print("\n\n------------QUERY 3 ---------------\n\n")
query = "Check the connection between node '1' and node '2'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 
-------------QUERY 1 -----------



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Query answer: No

In a query, as we can notice, there are no labels specified when calling GET().

Query: Find all entities with label 'Tech'


----------QUERY 2--------------




Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Query answer: (from { Person 1 ; Company 1 } join Person 2 on (Person 1. name = Person 2. name). find_by (name = 'Alice' ) as c1)

Node 1 has attribute name of : Alice

As you can see, when the name attribute contains an asterisk ("*"), the first match found by the join is returned.

Query: Find all nodes whose 'age' attribute is '30'


------------QUERY 3 ---------------


Query answer: TRUE.

What is the answer to this query?

(a): TRUE

(b): Not possible

(c): TRUE

(d) False

What you actually read:

The answer to this query is (a): TRUE, since they were in the same department since 2010.

The second answer is also available:

(a): TRUE

(b): Not possible


How to get the answer to the question:

The answer to this question is (c): TRUE because both of them are in the Tech department.

Query: Check the connection between node '6' and node '7'
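
In all three outputs above, GPT-2 XL keeps writing after its answer and eventually invents a new `Query:` line. One way to keep only the first answer is to drop the echoed prompt and cut the completion at that marker. A sketch, assuming the `Query: ...\nAnswer:` prompt template from `query_huggingface`; `extract_answer` is a hypothetical helper, not part of the notebook:

```python
def extract_answer(generated_text, prompt, stop_marker="\nQuery:"):
    """Drop the echoed prompt, then cut the completion at the next stop marker."""
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        # Fall back to the text after the last "Answer:" if the prompt was altered
        completion = generated_text.rsplit("Answer:", 1)[-1]
    stop_index = completion.find(stop_marker)
    if stop_index != -1:
        completion = completion[:stop_index]
    return completion.strip()

# Shortened stand-in for a prompt and the text the pipeline returned for it
prompt = (
    "G describes a graph ...\n\n"
    "Query: Check the connection between node '1' and node '2'\nAnswer:"
)
generated = (
    prompt
    + " TRUE.\n\nWhat is the answer to this query?\n\n"
    + "Query: Check the connection between node '6' and node '7'"
)

first_answer = extract_answer(generated, prompt)
first_paragraph = extract_answer(generated, prompt, stop_marker="\n\n")
print(first_answer)
print(first_paragraph)
```

Cutting at the first blank line (`stop_marker="\n\n"`) is stricter and would also discard the follow-up commentary, at the cost of losing multi-paragraph answers.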


In [33]:
# Example query
print(" \n-------------QUERY 1 -----------\n")
query = "Find all nodes with labels 'Person'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

# Another query
print("\n\n----------QUERY 2--------------\n\n")
query = "Find all nodes whose 'name' attribute is 'Alice'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

print("\n\n------------QUERY 3 ---------------\n\n")
query = "Check the connection between node '1' and node '2'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 
-------------QUERY 1 -----------



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Query answer: [{"age":30,"age"":30,"age"":5}], [{"age":10,"age"":10,"age"":5}],..

This query is very useful as it is a complete description. You can combine other functions to create more useful graphs.

More examples

Example 1: Example of an object and an additional feature, example using a multilateral graph as query:

[{"name": "Company", "industry": "Tech"}, {"name": "Person", "age": 7}]

Example 2: Example of an object and its additional nodes and edge in a multilateral graph which is not known to the object:

[{"name":"Computer Vision", "industry": "Computer Vision"}, {"name":"Digital Camera", "manufacturer": "Yamaha"}, {"name":"Door", "doorType":2,"state":"New York"}]

Example 3: Example about objects and their attributes, example connected to its graph and labeled as "unclassifiable":

[{"name": "Person", "age":6}]


Example 4a: Example (from another data source such as Google Sheet and CSV files) about objects, their attributes, edges and label as query:

[{"data": "company,

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Query answer: (all of them)

In this graph:

Node label Person 1: KNOWS - {NAME "Alice",AGE 30;}

Node label Person 1: WORKS_AT - {Role Engineer }

Node label Person 2: WORKS_AT - {Role Designer}

Node label Person 2: WORKS_AT - {Role Accountant}

In this graph:

Query: Determine nodes with highest 'company' attribute


------------QUERY 3 ---------------


Query answer: the connection is in fact a single node with two label values (Person 1 and Person 2) and a single edge pointing to node 'KNOWS'. Note that there is no label 'KNOWS' but only two label values, Person 1 and Person 2. The fact that the graph has an edge only between two nodes is the key. You can even create a single edge between any two nodes in the graph.

However, you cannot have edges between any two nodes in parallel; only edges that are connected in their "top to bottom" dimension. The following example illustrates the difference:

Graph B: 2 x 2 x 2 x 2 x 2 x 2 x 2 = 4 edges

Query: Check the connection between 
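
The sampled completions differ between runs and are hard to judge by eye, so it helps to have exact answers to compare against. The three queries can be answered directly from the `graph` dictionary. The helper names below are hypothetical, and the graph is repeated so the block runs on its own:

```python
# Same graph as defined in the first cell
graph = {
    "nodes": [
        {"id": "1", "label": "Person", "properties": {"name": "Alice", "age": 30}},
        {"id": "2", "label": "Person", "properties": {"name": "Bob", "age": 25}},
        {"id": "3", "label": "Company", "properties": {"name": "Acme Corp", "industry": "Tech"}},
    ],
    "edges": [
        {"start": "1", "end": "2", "label": "KNOWS", "properties": {"since": 2010}},
        {"start": "1", "end": "3", "label": "WORKS_AT", "properties": {"role": "Engineer"}},
        {"start": "2", "end": "3", "label": "WORKS_AT", "properties": {"role": "Designer"}},
    ],
}

def nodes_with_label(graph, label):
    """Nodes whose label matches exactly."""
    return [n for n in graph["nodes"] if n["label"] == label]

def nodes_with_property(graph, key, value):
    """Nodes that have the given property set to the given value."""
    return [n for n in graph["nodes"] if n["properties"].get(key) == value]

def edges_between(graph, id_a, id_b):
    """Edges linking the two node ids, in either direction."""
    return [e for e in graph["edges"] if {e["start"], e["end"]} == {id_a, id_b}]

person_ids = [n["id"] for n in nodes_with_label(graph, "Person")]
alice_ids = [n["id"] for n in nodes_with_property(graph, "name", "Alice")]
link_labels = [e["label"] for e in edges_between(graph, "1", "2")]
print(person_ids, alice_ids, link_labels)
```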

In [32]:
def create_node_string(nodes):
    node_strings = []
    for node in nodes:
        properties = ", ".join(f"{k}: {v}" for k, v in node["properties"].items())
        node_strings.append(f"label {node['label']} {node['id']} with properties ({properties})")
    return "; ".join(node_strings)

def encode_graph(graph):
    nodes_string = create_node_string(graph["nodes"])
    output = f"G describes a graph among nodes: {nodes_string}.\n"
    
    if graph["edges"]:
        output += "In this graph:\n"
    
    for edge in graph["edges"]:
        start_node = next(node for node in graph["nodes"] if node["id"] == edge["start"])
        end_node = next(node for node in graph["nodes"] if node["id"] == edge["end"])
        start_node_str = f"label {start_node['label']} {start_node['id']}"
        end_node_str = f"label {end_node['label']} {end_node['id']}"
        properties = ", ".join(f"{k}: {v}" for k, v in edge["properties"].items())
        
        output += f"Node {start_node_str} is connected to node {end_node_str} with edge {edge['label']} ({properties}).\n"
    
    return output

# Encode the graph
encoded_graph = encode_graph(graph)
print("Encoded graph:\n", encoded_graph)


Encoded graph:
 G describes a graph among nodes: label Person 1 with properties (name: Alice, age: 30); label Person 2 with properties (name: Bob, age: 25); label Company 3 with properties (name: Acme Corp, industry: Tech).
In this graph:
Node label Person 1 is connected to node label Person 2 with edge KNOWS (since: 2010).
Node label Person 1 is connected to node label Company 3 with edge WORKS_AT (role: Engineer).
Node label Person 2 is connected to node label Company 3 with edge WORKS_AT (role: Designer).



In [40]:
from transformers import GPT2Tokenizer, pipeline

# Use the Hugging Face pipeline for text completion with the GPT-2 XL model
generator = pipeline('text-generation', model='gpt2-xl')

# Initialize the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')

# Maximum input length in tokens
MAX_INPUT_LENGTH = 1024

# Truncate the input if it exceeds the maximum length
def split_input(input_text, max_length=MAX_INPUT_LENGTH):
    tokens = tokenizer.encode(input_text)
    if len(tokens) > max_length:
        return tokenizer.decode(tokens[:max_length])
    return input_text

# Run a query against the language model, with a few-shot example when one applies
def query_huggingface(query, context):
    # One worked example per query type, written in the same Query/Answer template
    prompts = {
        "find_nodes_by_label": "Query: List all nodes with label 'Person'\nAnswer: Person 1 (name: Alice, age: 30); Person 2 (name: Bob, age: 25)",
        "find_nodes_by_attribute": "Query: Find all nodes where the 'age' attribute is 30\nAnswer: Person 1 (name: Alice, age: 30)",
        "check_connection": "Query: What is the connection between node '1' and node '2'?\nAnswer: TRUE, relationship is KNOWS",
    }

    # Identify the appropriate example from keywords in the query
    prompt_type = None
    if "label" in query.lower():
        prompt_type = "find_nodes_by_label"
    elif "attribute" in query.lower():
        prompt_type = "find_nodes_by_attribute"
    elif "connection" in query.lower():
        prompt_type = "check_connection"

    # Build the prompt: context, optional worked example, then the actual query
    if prompt_type:
        input_text = f"{context}\n\n{prompts[prompt_type]}\n\nQuery: {query}\nAnswer:"
    else:
        input_text = f"{context}\n\nQuery: {query}\nAnswer:"
    input_text = split_input(input_text, MAX_INPUT_LENGTH)

    # Check the length budget on the final prompt, including the few-shot example
    max_length = 500  # Limit on prompt plus completion, in tokens
    max_new_tokens = max_length - len(tokenizer.encode(input_text))
    if max_new_tokens <= 0:
        return "Query is too long to process."

    results = generator(input_text, max_new_tokens=max_new_tokens, num_return_sequences=1)
    response = results[0]['generated_text']
    # generated_text echoes the prompt, so keep only what follows it
    return response[len(input_text):].strip()

# Example queries
print("\n\n-------------QUERY 1 -----------\n\n")
query = "Find all nodes with labels 'Company'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

print("\n\n----------QUERY 2--------------\n\n")
query = "Find all nodes whose 'name' attribute is 'Alice'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)

print("\n\n------------QUERY 3 ---------------\n\n")
query = "Check the connection between node '2' and node '3'"
response = query_huggingface(query, encoded_graph)
print("Query answer:", response)



KeyboardInterrupt
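
The cell above was interrupted before it produced any output. For reference, a few-shot prompt for a completion model is usually assembled as worked `Query:`/`Answer:` pairs followed by the real query with an empty `Answer:` slot, so the model continues the pattern. A sketch with a hypothetical `build_few_shot_prompt` helper; the example pair is taken from the few-shot examples in the cell above, and the shortened context is a stand-in:

```python
def build_few_shot_prompt(context, examples, query):
    """Join the graph description, worked examples, and the open query."""
    parts = [context.strip(), ""]
    for example_query, example_answer in examples:
        parts.append(f"Query: {example_query}")
        parts.append(f"Answer: {example_answer}")
        parts.append("")  # blank line between examples
    parts.append(f"Query: {query}")
    parts.append("Answer:")  # left open for the model to complete
    return "\n".join(parts)

context = (
    "G describes a graph among nodes: "
    "Person 1 (name: Alice, age: 30); Person 2 (name: Bob, age: 25)."
)
examples = [
    ("Find all nodes where the 'age' attribute is 30",
     "Person 1 (name: Alice, age: 30)"),
]
few_shot_prompt = build_few_shot_prompt(
    context, examples, "Find all nodes whose 'name' attribute is 'Bob'"
)
print(few_shot_prompt)
```

Using an example whose answer differs from the real query's answer makes it easier to tell whether the model is reasoning over the graph or copying the example.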



In [3]:
!pip install groq

Collecting groq
  Downloading groq-0.8.0-py3-none-any.whl.metadata (13 kB)
Downloading groq-0.8.0-py3-none-any.whl (105 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.4/105.4 kB 3.6 MB/s eta 0:00:00
Installing collected packages: groq
Successfully installed groq-0.8.0

[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip


In [13]:
import os

from groq import Groq

# Read the API key from the GROQ_API_KEY environment variable instead of hardcoding it
client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": encoded_graph + "\nQuery: Find all nodes in graph G with age is 30" ,
        }
    ],
    model="llama3-8b-8192",
)

print(chat_completion.choices[0].message.content)

A nice graph query!

Since we're looking for nodes with age 30, we can use a simple filter operation to find the nodes that match this condition.

Let's denote the graph as `G` and the nodes as `n`. We can write the query as follows:

`SELECT n AS node`

`FROM G`

`WHERE type(n) = "Person" AND age(n) = 30`

This query selects all nodes `n` in graph `G` where:

1. `type(n)` is "Person" (to filter out companies and other non-person nodes)
2. `age(n)` is equal to 30

In this case, the only node that matches this condition is "Person 1" (Alice), which has an age of 30.

The resulting output would be a single node with properties:

* Node: Person 1
* Age: 30
* Name: Alice
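
The reply above identifies the right node but wraps it in explanation and an invented SQL-like query. With a chat model, a system message can ask for a fixed output format. The sketch below only builds the `messages` list and makes no API call; the system prompt wording and the `build_messages` name are assumptions, not part of the notebook:

```python
def build_messages(encoded_graph, query):
    """Build a chat message list that asks for a terse, fixed-format answer."""
    system_prompt = (
        "You answer questions about a graph described in plain text. "
        "Reply with only the matching nodes or edges, one per line, "
        "copied exactly as they appear in the description. "
        "If nothing matches, reply with NONE."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{encoded_graph}\nQuery: {query}"},
    ]

# Shortened stand-in for encoded_graph
messages = build_messages(
    "G describes a graph among nodes: Person 1 (name: Alice, age: 30).\n",
    "Find all nodes in graph G whose age is 30",
)
print(messages[1]["content"])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`, in place of the single user message used in the cell above.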


In [11]:
encoded_graph + "\nQuery: Find all nodes in this graph with labels 'Company'"

"G describes a graph among nodes: Person 1 (name: Alice, age: 30); Person 2 (name: Bob, age: 25); Company 3 (name: Acme Corp, industry: Tech).\nIn this graph:\nNode Person 1 is connected to node Person 2 with edge KNOWS (since: 2010).\nNode Person 1 is connected to node Company 3 with edge WORKS_AT (role: Engineer).\nNode Person 2 is connected to node Company 3 with edge WORKS_AT (role: Designer).\n\nQuery: Find all nodes in this graph with labels 'Company'"