Fact Finder



Getting Started

Set up a PrimeKG Neo4j instance (see here).

Install Dependencies:

pip install -e .

Some features of Fact Finder are based on external APIs. An OpenAI API key is required to run Fact Finder; Semantic Scholar as well as Bayer's linnaeusannotate entity detection are optional. Set the following environment variables:

export LLM="gpt-4o"                 # "gpt-4-turbo" as an alternative
export SEMANTIC_SCHOLAR_KEY=""      # API key for Semantic Scholar (optional)
export OPENAI_API_KEY=""            # OpenAI API key (required)
export SYNONYM_API_KEY=""           # Bayer internal linnaeusannotate synonym API key
export SYNONYM_API_URL=""           # Bayer internal linnaeusannotate synonym API URL
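
On the Python side, these variables can be read roughly as in the sketch below. The variable names come from the list above and the split into required and optional keys follows the paragraph above; the exact fallback behaviour is an illustrative assumption, not necessarily how Fact Finder handles missing keys.

import os

# Required: the system cannot run without an OpenAI key.
openai_api_key = os.environ["OPENAI_API_KEY"]
llm_name = os.getenv("LLM", "gpt-4o")

# Optional: features depending on these keys can be disabled when they are unset.
semantic_scholar_key = os.getenv("SEMANTIC_SCHOLAR_KEY") or None
synonym_api_key = os.getenv("SYNONYM_API_KEY") or None
synonym_api_url = os.getenv("SYNONYM_API_URL") or None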

Run UI:

streamlit run src/fact_finder/app.py --browser.serverAddress localhost

Run with additional arguments (e.g. to activate synonym replacement based on the normalized graph):

streamlit run src/fact_finder/app.py --browser.serverAddress localhost -- [args]
streamlit run src/fact_finder/app.py --browser.serverAddress localhost -- --normalized_graph --use_entity_detection_preprocessing

The following flags are available:

--normalized_graph  =  Apply synonym replacement based on the normalized graph to the Cypher queries before running them against the graph.
--use_entity_detection_preprocessing  =  Apply entity detection to the user question before generating the Cypher query. The detected entities are replaced by their preferred terms, and a string describing their category (e.g. "Psoriasis is a disease.") is appended to the question. This requires the corresponding API key ($SYNONYM_API_KEY) to be set; the normalized graph should also be used.
--use_subgraph_expansion  =  Expand the evidence graph with the surrounding neighborhoods of its nodes.
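
For orientation, the sketch below shows how these flags could be parsed inside app.py with argparse. The flag names are taken from this README; the parsing code itself is an illustrative assumption, not necessarily the repository's implementation. Streamlit forwards everything after the "--" separator to the script, so the flags arrive in sys.argv as usual.

import argparse

def parse_app_args() -> argparse.Namespace:
    # Hypothetical parser; only the flag names are taken from this README.
    parser = argparse.ArgumentParser(description="Fact Finder UI options")
    parser.add_argument("--normalized_graph", action="store_true",
                        help="Replace synonyms in Cypher queries based on the normalized graph.")
    parser.add_argument("--use_entity_detection_preprocessing", action="store_true",
                        help="Detect entities in the question and insert their preferred terms.")
    parser.add_argument("--use_subgraph_expansion", action="store_true",
                        help="Expand the evidence subgraph with surrounding neighborhoods.")
    return parser.parse_args()

args = parse_app_args()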

Process description

The following steps lead from the user question to the natural language answer and the provided evidence; schematic sketches of the query preprocessing and of the overall flow follow the list:

  1. In the first step, a language model call generates a Cypher query against the knowledge graph. To achieve this, the prompt template contains the schema of the graph, i.e. information about all nodes and their properties. Additionally, the prompt template can be enriched with natural language descriptions for (some of) the relations in the graph, giving the language model a better understanding of their meaning. If the model decides that the user question cannot be answered by a graph with the given schema, it is instructed to return an error message starting with the marker string "SCHEMA_ERROR". This marker is detected and the error message is forwarded directly to the user.

  2. In the second step, the generated Cypher query is preprocessed using regular expressions (a minimal sketch follows this list).

    • First, the query is reformatted to make the subsequent regular expressions easier to design. This includes, for example, removing unnecessary whitespace and using double quotes for all strings.
    • Next, all property values are converted to lower case. This assumes that the same preprocessing has been applied to the property values in the graph, and it makes the query robust against capitalization mismatches.
    • Finally, for some node types, names used in the query are replaced with a synonym that actually occurs in the graph. This is done, for example, by looking up synonyms for the name and checking which of them exists in the graph.
  3. In the third step, the graph is queried with the final result of the Cypher preprocessing. The graph answer, together with the Cypher query, is part of the evidence presented in the interface, providing transparency for the user.

  4. With another language model call, the final natural language answer is generated from the result of querying the graph.

  5. Additionally, a subgraph is generated from the graph query and its result. This serves as visual evidence for the user. The subgraph can be generated either via a rule-based approach or with the help of the language model.
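
The Cypher preprocessing of step 2 can be illustrated with plain regular expressions, as in the following self-contained sketch. The synonym lookup is passed in as a simple dictionary; the actual rules in the repository may differ.

import re

def preprocess_cypher(query: str, synonyms_in_graph: dict[str, str]) -> str:
    # Normalize formatting: collapse whitespace and convert single-quoted
    # strings into double-quoted ones so later patterns handle a single form.
    query = re.sub(r"\s+", " ", query).strip()
    query = re.sub(r"'([^']*)'", r'"\1"', query)

    # Lower-case all property values, assuming the graph stores them lower-cased.
    query = re.sub(r'"([^"]*)"', lambda m: '"' + m.group(1).lower() + '"', query)

    # Replace quoted names with a synonym that actually exists in the graph.
    def replace_synonym(match: re.Match) -> str:
        value = match.group(1)
        return '"' + synonyms_in_graph.get(value, value) + '"'

    return re.sub(r'"([^"]*)"', replace_synonym, query)

# Example: map a user-facing drug name to the name actually stored in the graph.
print(preprocess_cypher(
    "MATCH (d:drug {name: 'Acetylsalicylic Acid'}) RETURN d.name",
    {"acetylsalicylic acid": "aspirin"},
))
# Prints: MATCH (d:drug {name: "aspirin"}) RETURN d.name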
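
The overall flow of steps 1 to 5 can be summarized in a small skeleton like the one below. The three callables are hypothetical stand-ins for the two language model calls and the Neo4j query; only the ordering of the steps and the "SCHEMA_ERROR" handling follow the description above, and preprocess_cypher refers to the preprocessing sketch.

from typing import Callable

def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],              # LLM call with the graph schema in the prompt
    run_cypher: Callable[[str], list[dict]],            # query against the Neo4j instance
    generate_answer: Callable[[str, list[dict]], str],  # second LLM call
) -> dict:
    # Step 1: generate the Cypher query; the model marks unanswerable questions.
    cypher = generate_cypher(question)
    if cypher.startswith("SCHEMA_ERROR"):
        return {"answer": cypher.removeprefix("SCHEMA_ERROR").strip(": "), "evidence": None}

    # Step 2: regex-based preprocessing (see the preprocessing sketch).
    cypher = preprocess_cypher(cypher, synonyms_in_graph={})

    # Step 3: query the graph; query and result become part of the evidence.
    graph_result = run_cypher(cypher)

    # Step 4: turn the graph result into a natural language answer.
    answer = generate_answer(question, graph_result)

    # Step 5 (omitted here): build the evidence subgraph from the query and
    # result, either rule-based or with help of the language model.
    return {"answer": answer, "evidence": {"cypher": cypher, "graph_result": graph_result}}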

User Interface

The following image shows the user interface of the application for the question "Which drugs are used to treat ocular hypertension?". The answer of the standalone LLM and that of our graph-based hybrid system are shown side by side. In addition, the relevant subgraph is displayed as evidence, together with the generated Cypher query, the answer from the graph, and the prompts that were used.


