### Creating Knowledge Graphs from Textual Data: Finding Hidden Connections

Knowledge graphs are a powerful way to visualize and understand relationships between pieces of information: they turn unstructured text into a structured network of entities and the relationships between them. This guide walks through a simple workflow for building a knowledge graph from textual data, so that complex information becomes easier to explore.

Here’s what we are going to do in this project:

![image](./knowledge-graph-data-pipeline.jpg)

*Our pipeline for building a knowledge graph from textual data.*

Before creating a knowledge graph, it is essential to understand the difference between **knowledge graphs** and **knowledge bases**, as the two terms are often mistakenly used interchangeably.

A **knowledge base (KB)** is a collection of structured information about a specific domain. A **knowledge graph** is a form of knowledge base organized as a graph. In a knowledge graph, `nodes` represent *entities*, and `edges` represent the *relationships between these entities*. For instance, from the sentence “Fabio lives in Italy,” we can derive the relationship triplet `<Fabio, lives in, Italy>`, where “Fabio” and “Italy” are the entities, and “lives in” is the relationship connecting them.
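To make the node and edge idea concrete, here is a minimal sketch in plain Python that turns triplets into a set of nodes and an adjacency map. The `build_graph` helper and the second triplet (`Fabio works at Acme`) are illustrative assumptions, not part of the original example.

```python
from collections import defaultdict

# Each triplet is (head entity, relation, tail entity).
triplets = [
    ("Fabio", "lives in", "Italy"),
    ("Fabio", "works at", "Acme"),  # hypothetical extra fact, for illustration only
]

def build_graph(triplets):
    """Return (nodes, edges): a set of entities and a head -> [(relation, tail)] map."""
    nodes = set()
    edges = defaultdict(list)
    for head, relation, tail in triplets:
        nodes.update((head, tail))            # both ends of a triplet become nodes
        edges[head].append((relation, tail))  # the relation becomes a labeled edge
    return nodes, dict(edges)

nodes, edges = build_graph(triplets)
```

Here the entities end up in `nodes`, while `edges["Fabio"]` holds the two labeled connections that start at "Fabio".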

In other words, every **knowledge graph** is a kind of **knowledge base**, but not every knowledge base is organized as a graph.

Building a knowledge graph generally involves two main steps:

1.  **Named Entity Recognition (NER):**  This step focuses on identifying and extracting entities from the text, which will serve as the nodes in the knowledge graph.
2.  **Relation Classification (RC):**  This step focuses on identifying and classifying the relationships between the extracted entities, forming the edges of the knowledge graph.

The **knowledge graph** is often visualized using tools like **pyvis**.
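As a rough sketch of what that visualization step could look like, the snippet below feeds triplets into pyvis. It assumes pyvis is installed; the helper names `to_elements` and `render_graph` and the output file name are hypothetical choices for this example.

```python
def to_elements(triplets):
    """Split triplets into a sorted node list and a list of (head, tail, relation) edges."""
    nodes = sorted({entity for head, _, tail in triplets for entity in (head, tail)})
    edges = [(head, tail, relation) for head, relation, tail in triplets]
    return nodes, edges

def render_graph(triplets, path="knowledge_graph.html"):
    """Write an interactive HTML view of the graph and return its path."""
    # Imported here so to_elements() stays usable without pyvis installed.
    from pyvis.network import Network

    net = Network(directed=True)  # relations have a direction: head -> tail
    nodes, edges = to_elements(triplets)
    for name in nodes:
        net.add_node(name, label=name)
    for head, tail, relation in edges:
        net.add_edge(head, tail, label=relation)  # relation shown as the edge label
    net.save_graph(path)
    return path
```

Calling `render_graph([("Fabio", "lives in", "Italy")])` should write an HTML file that can be opened in a browser.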

To enhance the process of creating a **knowledge graph** from text, additional steps can be integrated, such as:

-   **Entity Linking:**  This step helps to normalize different mentions of the same entity. For example, “Napoleon” and “Napoleon Bonaparte” would be linked to a common reference, such as their Wikipedia page.
-   **Source Tracking:**  This involves recording the origin of each piece of information, like the URL of the article or the specific text fragment it came from. Tracking sources helps assess the information’s credibility (for example, a relationship is considered more reliable if multiple reputable sources verify it).
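The two optional steps above can be sketched together as follows. This is a toy illustration under stated assumptions: the `ALIASES` table stands in for a real entity linker, the `example.org` URLs are placeholders, and counting distinct sources is only one possible way to estimate reliability.

```python
from collections import defaultdict

# Hypothetical alias table; a real system would query an entity linker instead.
ALIASES = {
    "napoleon": "Napoleon Bonaparte",
    "napoleon bonaparte": "Napoleon Bonaparte",
}

def link_entity(mention):
    """Map a mention to its canonical name, or keep it unchanged if unknown."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

def merge_facts(extracted):
    """Group (head, relation, tail, source) records by their normalized triplet."""
    sources = defaultdict(set)
    for head, relation, tail, source in extracted:
        key = (link_entity(head), relation, link_entity(tail))
        sources[key].add(source)  # remember where each fact came from
    return dict(sources)

facts = merge_facts([
    ("Napoleon", "born in", "Corsica", "https://example.org/a"),            # placeholder URL
    ("Napoleon Bonaparte", "born in", "Corsica", "https://example.org/b"),  # placeholder URL
])
```

Because both mentions resolve to the same canonical name, `facts` contains a single triplet backed by two sources, which could then be treated as better supported than a fact seen only once.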

In this project, we will perform **Named Entity Recognition** and **Relation Classification** in a single step, using one carefully designed prompt. This combined approach is often referred to as **Relation Extraction (RE)**.
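One way such a prompt-based step might be wired up is sketched below. The prompt wording, the JSON output format, and the `build_prompt` and `parse_triplets` helpers are assumptions made for illustration, not the prompt used in this project; the model call itself is left out, and `sample_response` is a hand-written stand-in for a model reply.

```python
import json

# Hypothetical instructions; the actual prompt used in the project may differ.
INSTRUCTIONS = (
    "Extract every relationship from the text below. "
    "Answer only with a JSON list of objects that have the keys "
    '"head", "relation" and "tail".'
)

def build_prompt(text):
    """Combine the fixed instructions with the text to analyze."""
    return INSTRUCTIONS + "\n\nText:\n" + text

def parse_triplets(response):
    """Parse a JSON reply into (head, relation, tail) tuples, skipping malformed items."""
    triplets = []
    for item in json.loads(response):
        if not isinstance(item, dict):
            continue
        values = [item.get(key) for key in ("head", "relation", "tail")]
        if all(isinstance(v, str) and v.strip() for v in values):
            triplets.append(tuple(v.strip() for v in values))
    return triplets

prompt = build_prompt("Fabio lives in Italy.")
# Hand-written stand-in for a model reply, including one malformed item.
sample_response = (
    '[{"head": "Fabio", "relation": "lives in", "tail": "Italy"},'
    ' {"head": "", "relation": "unknown", "tail": "Italy"}]'
)
triplets = parse_triplets(sample_response)
```

Validating the reply before using it matters here because model output is not guaranteed to follow the requested format; items with missing or empty fields are dropped rather than added to the graph.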