<a href="https://colab.research.google.com/github/LxYuan0420/nlp/blob/main/notebooks/Inference_AeolusBlend_KnowledgeGraph.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Overview

This notebook is a short, practical guide to Knowledge Graphs. It shows how to extract relationships from unstructured text, such as emails and documents, and how to map those connections in an interactive Knowledge Graph. The point is not only to make the data look good: the graph exposes the ties between entities like organizations, people, and events, and the insights those ties hold.

The notebook proceeds step by step: it loads a 4-bit quantized language model, generates a sample article, prompts the model to extract relationship triplets from that article, and renders the triplets with PyVis. Whether you're a complete beginner or brushing up on your skills, each step is small enough to follow and adapt. By the end, you'll be able to turn raw text into an interactive diagram that makes complex information easier to grasp. Let's get started!

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import transformers
import torch

In [None]:
#!pip install bitsandbytes accelerate

In [2]:
model_id = "lxyuan/AeolusBlend-7B-slerp"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
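
As a rough sanity check on why 4-bit loading helps, weight memory scales with the number of bits stored per parameter. The sketch below assumes roughly 7 billion parameters (inferred from the "7B" in the model name, not measured) and ignores quantization constants, activations, and the KV cache, so treat the numbers as a lower bound. `estimate_weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumption: ~7e9 parameters, taken from the model name only.
ASSUMED_PARAMS = 7e9

bf16_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 4)    # 4-bit weights, overhead ignored

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:    ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the bfloat16 footprint, which is what makes the model practical on a single consumer or Colab GPU.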

In [3]:
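# The model was already placed on the available devices by device_map="auto"
# at load time, so the pipeline only needs the model and tokenizer objects.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)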

In [17]:
messages = [{"role": "user", "content": "Tell me more about Bitcoin"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)

In [18]:
outputs = pipeline(prompt, max_new_tokens=512, do_sample=False)
article_content = outputs[0]["generated_text"].replace(prompt, "").strip()

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
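
The cell above removes the prompt with `str.replace`, which deletes every occurrence of the prompt text and silently does nothing if the echoed prompt differs from the original by even one character. A slightly more defensive sketch is to strip the prompt only when it appears as a prefix. `strip_prompt` is a hypothetical helper introduced here for illustration.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion, assuming the pipeline echoed the prompt as a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Prompt not found at the start: keep the text intact rather than
    # deleting matches that happen to occur in the middle of it.
    return generated_text.strip()


# Toy strings standing in for a real prompt and pipeline output.
print(strip_prompt("Q: What is Bitcoin? A: A digital currency.", "Q: What is Bitcoin?"))
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the newly generated text and avoids the post-processing step altogether.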


In [19]:
article_content

'[STUDENT] Bitcoin is a digital currency that was created in 2009 by an unknown person or group of people using the pseudonym Satoshi Nakamoto. It is a decentralized currency, meaning it is not controlled by any government or financial institution. Bitcoin transactions are recorded on a public ledger called the blockchain, which is secured by a network of computers.\n\nOne of the main advantages of Bitcoin is that it is not subject to inflation like traditional currencies. The total number of Bitcoins that can be created is limited to 21 million, which means that as demand for the currency increases, the value of each Bitcoin also increases.\n\nBitcoin can be used to make purchases online or in-person at businesses that accept it as a form of payment. It can also be traded on exchanges like stocks, with the value of Bitcoin fluctuating based on supply and demand.\n\nHowever, there are also risks associated with using Bitcoin. The value of Bitcoin can be volatile, and there have been in

In [37]:
def format_kg_message(article_content):
    prompt = f"""\
Your objective as an AI expert specializing in knowledge graph creation is to analyze provided texts, such as paragraphs, emails, and text files, to extract relationship triplets that capture the connections between organizations, people, and events. These triplets will be used to build a knowledge graph based on the input.

Guidelines:

1. **Input Analysis:** Carefully read the input to identify entities (people, organizations, events) and their relationships. Avoid concepts or products as nodes, focusing only on the specified entities.
2. **Extraction of Triplets:** Extract relationships as a list of dictionaries, where each dictionary represents a unique relationship triplet, structured as [{{'head': entity1, 'type': relationship, 'tail': entity2}}]. The 'head' is the subject entity, the 'tail' is the object entity, and 'type' describes their relationship.
3. **Node and Relationship Criteria:**
   - Only include nodes (entities) that have a direct relationship with at least one other node.
   - Ensure the node type (people, org, event) is correctly identified, especially when the entity is part of a relationship (to_type, for_type).
4. **Clarity and Accuracy:** Ensure clarity in identifying the 'head' and 'tail'. Avoid redundancy and consolidate similar relationships into a single, well-defined triplet.
5. **Contextual Understanding:** Pay attention to context to resolve any ambiguities. Use specific and recognizable names for entities.
6. **Direct Relationships:** Prioritize direct and significant relationships that contribute to understanding the main themes and facts presented.

This task combines detailed analytical skills with technical knowledge graph creation techniques to synthesize and represent information in a structured form.
The emphasis is on capturing meaningful relationships between organizations, people, and events, facilitating a deeper understanding of the interconnectedness within the given input.

text: {article_content}

Format your answer as a list of dictionaries and return only that list.
Example:
[{{'head': 'Bitcoin', 'type': 'is_a', 'tail': 'digital currency'}}]
"""

    messages = [{"role": "user", "content": prompt}]

    return messages
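
The prompt asks for dictionaries with `head`, `type`, and `tail` keys, but a language model is not guaranteed to follow that schema. The sketch below shows one way to filter and de-duplicate the extracted triplets before they reach the graph. `REQUIRED_KEYS`, `is_valid_triplet`, and `clean_triplets` are hypothetical names, and the rule that all three fields must be non-empty strings is an assumption rather than something the notebook enforces.

```python
REQUIRED_KEYS = ("head", "type", "tail")


def is_valid_triplet(item) -> bool:
    """True if item is a dict whose head, type, and tail are non-empty strings."""
    if not isinstance(item, dict):
        return False
    return all(
        isinstance(item.get(key), str) and item[key].strip() != ""
        for key in REQUIRED_KEYS
    )


def clean_triplets(items: list) -> list:
    """Drop malformed entries and exact duplicates, preserving the original order."""
    seen = set()
    cleaned = []
    for item in items:
        if not is_valid_triplet(item):
            continue
        triplet = tuple(item[key].strip() for key in REQUIRED_KEYS)
        if triplet in seen:
            continue
        seen.add(triplet)
        cleaned.append(dict(zip(REQUIRED_KEYS, triplet)))
    return cleaned


sample = [
    {"head": "Bitcoin", "type": "created_by", "tail": "Satoshi Nakamoto"},
    {"head": "Bitcoin", "type": "created_by", "tail": "Satoshi Nakamoto "},  # duplicate after stripping
    {"head": "Bitcoin", "type": "is_a"},                                     # missing tail
    "not a dictionary",
]
print(clean_triplets(sample))
```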

In [34]:
messages = format_kg_message(article_content)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)

outputs = pipeline(prompt, max_new_tokens=2048, do_sample=False)
relations = outputs[0]["generated_text"].replace(prompt, "").strip()

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [35]:
from ast import literal_eval

relations = literal_eval(relations.replace("[STUDENT]", "").strip())
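
Calling `literal_eval` directly on model output raises an exception as soon as the model adds any text around the list. A more forgiving sketch is to locate the first bracketed list of dictionaries with a regular expression and fall back to an empty list when parsing fails. `parse_relations` is a hypothetical helper, and returning `[]` on failure is a design assumption; raising an error may be preferable in a pipeline where silent failures are costly.

```python
import re
from ast import literal_eval


def parse_relations(raw_output: str) -> list:
    """Extract a list of dicts from raw model output, or return [] if none can be parsed."""
    # Look for "[{ ... }]" so that role tags such as "[STUDENT]" are skipped.
    match = re.search(r"\[\s*\{.*\}\s*\]", raw_output, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, dict)]


raw = "[STUDENT] [{'head': 'Bitcoin', 'type': 'created_by', 'tail': 'Satoshi Nakamoto'}]"
print(parse_relations(raw))
```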

In [None]:
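import locale
# Workaround commonly used in Colab: report UTF-8 as the preferred encoding
# so that shell commands such as `!pip` do not fail with a locale error.
locale.getpreferredencoding = lambda: "UTF-8"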

!pip install pyvis

In [54]:
from pyvis.network import Network

def generate_graph(relationship_data: list, node_color: str = "#1458B3", background_color: str = "#222222", text_color: str = "white") -> Network:
    """
    Generates a PyVis Network graph from relationship triplets.

    Args:
        relationship_data (list): A list of dictionaries representing relationships with 'head', 'type', and 'tail'.
        node_color (str, optional): Color for the nodes. Defaults to "#1458B3".
        background_color (str, optional): Background color for the graph. Defaults to "#222222".
        text_color (str, optional): Color of the text. Defaults to "white".

    Returns:
        Network: A PyVis network object configured with nodes and edges based on the input relationship data.
    """
    graph = Network(directed=True, bgcolor=background_color, font_color=text_color)
    for relationship in relationship_data:
        head, relation, tail = relationship["head"], relationship["type"], relationship.get("tail")
        graph.add_node(head, title=head, color=node_color)
        if tail:  # Only add tail and edge if tail exists
            graph.add_node(tail, title=tail, color=node_color)
            graph.add_edge(head, tail, title=relation, label=relation)

    graph.set_options("""
    {
      "physics": {
        "barnesHut": {
          "gravitationalConstant": -80000,
          "centralGravity": 0.3,
          "springLength": 100,
          "springStrength": 0.01,
          "damping": 0.09
        }
      }
    }
    """)
    return graph

def embed_graph_html(graph: Network, iframe_height: str = "500px") -> str:
    """
    Embeds a PyVis network graph in an HTML iframe for display in Jupyter Notebooks or web pages.

    Args:
        graph (Network): The PyVis network graph to embed.
        iframe_height (str, optional): Height of the iframe. Defaults to "500px".

    Returns:
        str: An HTML string for embedding the graph.
    """
    html = graph.generate_html()
    iframe_html = f"""<iframe style="width: 100%; height: {iframe_height}; border: 0;" srcdoc="{html.replace('"', '&quot;')}"></iframe>"""
    return iframe_html

graph = generate_graph(relations)
html_output = embed_graph_html(graph)
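
`embed_graph_html` escapes only double quotes before placing the page in `srcdoc`. Because the browser decodes character references inside attribute values, any `&` in the generated page is worth escaping as well. The sketch below uses the standard library's `html.escape` for that. `build_srcdoc_iframe` is a hypothetical variant that takes a plain HTML string instead of a PyVis graph, so it can be tried without PyVis installed.

```python
from html import escape


def build_srcdoc_iframe(document_html: str, iframe_height: str = "500px") -> str:
    """Wrap an HTML document in an iframe, escaping it for use as a srcdoc attribute value."""
    # escape() converts &, <, > and, with quote=True, both quote characters.
    escaped = escape(document_html, quote=True)
    return (
        f'<iframe style="width: 100%; height: {iframe_height}; border: 0;" '
        f'srcdoc="{escaped}"></iframe>'
    )


page = '<p title="a&b">x</p>'
print(build_srcdoc_iframe(page))
```

With a PyVis graph, the string returned by `graph.generate_html()` would be passed as `document_html`.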

In [55]:
from IPython.display import HTML

HTML(html_output)