# KG CREATION PIPELINE

This tool builds a robust, ontology-based knowledge graph from unstructured input and stores it in Neo4j.

## FOR REFERENCE:

TKGCon: 
https://medium.com/@researchgraph/automated-knowledge-graph-construction-with-large-language-models-part-2-b107ca8ec5ea

Entity Resolution: 
https://towardsdatascience.com/entity-resolution-identifying-real-world-entities-in-noisy-data-3e8c59f4f41c

### GOALS
1. Multimodal input
2. Self-augmentation from user queries and automated web search
3. Agents and tools

### PROCESS

![TKGCon](Markdown_Images/TKGCon.png)

1. PDF-to-image extraction and generation of an unstructured text description of each image
2. Text splitting and storage in a local vector database
3. Creation of unrefined triples
4. Determination of themes
5. Ontology creation based on themes (human-in-the-loop)
6. Knowledge graph construction
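
As a rough illustration of the text-splitting, triple-refinement, and graph-construction steps listed above, here is a minimal, dependency-free sketch. It is not the notebook's implementation: `Triple`, `split_text`, `refine`, `to_cypher`, and the `name` node property are all hypothetical names chosen for this example, and the ontology is assumed to be a plain set of allowed `(subject_type, relation, object_type)` tuples.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) statement with entity types attached."""
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str


def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character chunks of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def refine(triples: list[Triple], ontology: set[tuple[str, str, str]]) -> list[Triple]:
    """Keep only triples whose type signature is allowed by the ontology."""
    return [
        t for t in triples
        if (t.subject_type, t.relation, t.object_type) in ontology
    ]


# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being interpolated into the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_cypher(t: Triple) -> tuple[str, dict[str, str]]:
    """Build a MERGE query and its parameters for one refined triple."""
    for ident in (t.subject_type, t.relation, t.object_type):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe label or relationship type: {ident!r}")
    query = (
        f"MERGE (s:{t.subject_type} {{name: $subject}}) "
        f"MERGE (o:{t.object_type} {{name: $object}}) "
        f"MERGE (s)-[:{t.relation}]->(o)"
    )
    return query, {"subject": t.subject, "object": t.object}


# Example data (illustrative only).
ontology = {("Person", "BORN_IN", "City")}
unrefined = [
    Triple("Marie Curie", "Person", "BORN_IN", "Warsaw", "City"),
    Triple("Marie Curie", "Person", "LIKES", "Radium", "Element"),
]
refined = refine(unrefined, ontology)
```

Each `(query, params)` pair returned by `to_cypher` could then be passed to a Neo4j session. Using `MERGE` rather than `CREATE` keeps repeated runs from duplicating nodes and relationships.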

## INSTALLING DEPENDENCIES

In [1]:
%pip install pdf2image==1.17.0 "openai>=1.32.0,<2.0.0" tiktoken==0.7.0 python-dotenv==1.0.1 pandas==2.2.2 langchain==0.2.11 langchain-community==0.2.10 langchain-openai==0.1.17 langchain-experimental==0.0.63 neo4j==5.22.0 langchain-chroma==0.1.2

Note: you may need to restart the kernel to use updated packages.


## SETUP

In [3]:
import tempfile
import os
import json
import base64
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import display, Math, Markdown
from pdf2image import convert_from_path, convert_from_bytes
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

from langchain_core.runnables import (
    RunnableBranch,
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import Tuple, List, Optional
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_community.graphs import Neo4jGraph
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from neo4j import GraphDatabase
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars
from langchain_core.runnables import ConfigurableField
import tiktoken

import pandas as pd
import numpy as np
from langchain_community.document_loaders import PyPDFLoader, UnstructuredPDFLoader, PyPDFium2Loader
from langchain_community.document_loaders import PyPDFDirectoryLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pathlib import Path
import random

load_dotenv('KG.env')
api_key = os.getenv('OPENAI_API_KEY')
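
`os.getenv` returns `None` when the variable is missing, so a misnamed or absent `KG.env` would only surface later as an authentication error. A small fail-fast check is one way to catch this early. The helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to KG.env or the environment")
    return value
```

After `load_dotenv('KG.env')`, the key could then be read with `api_key = require_env('OPENAI_API_KEY')`.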