# Diffbot

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)

>[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.
>
>[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.


## Use case

Text data often contains rich relationships and insights that can be used for analytics, recommendation engines, or knowledge management applications.

`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.

By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.

This combination allows for use cases such as:

* Building knowledge graphs from textual documents, websites, or social media feeds.
* Generating recommendations based on semantic relationships in the data.
* Creating advanced search features that understand the relationships between entities.
* Building analytics dashboards that allow users to explore the hidden relationships in data.

## Overview

LangChain provides tools to interact with Graph Databases:

1. `Construct knowledge graphs from text` using graph transformer and store integrations 
2. `Query a graph database` using chains for query creation and execution
3. `Interact with a graph database` using agents for robust and flexible querying 

## Setting up

First, get required packages and set environment variables:

In [None]:
%pip install --upgrade --quiet  langchain langchain-experimental langchain-openai neo4j wikipedia
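The install command above does not itself set any environment variables. A minimal sketch of one way to read credentials from the environment instead of pasting them into cells: `OPENAI_API_KEY` is the variable `langchain-openai` reads by default, while `DIFFBOT_API_KEY` and the `read_secret` helper are hypothetical names chosen for this illustration.

```python
import os
from typing import Mapping


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name` from `env`, or raise a descriptive error.

    `read_secret` is a hypothetical helper for this notebook, not a LangChain API.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, a later cell could use `diffbot_api_key = read_secret("DIFFBOT_API_KEY")`, assuming that variable was exported before the notebook kernel started.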

In [24]:
from langchain_experimental.graph_transformers import DiffbotGraphTransformer
from langchain_core.documents import Document

diffbot_api_key = "fbade13b714446eba358131c7a83357a"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

document = Document(page_content="Mike Tunge is the CEO of Diffbot.")
graph_documents = diffbot_nlp.convert_to_graph_documents([document])
graph_documents

[GraphDocument(nodes=[], relationships=[], source=Document(page_content='Mike Tunge is the CEO of Diffbot.'))]

### Diffbot NLP Service

`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.
This extracted information can be used to construct a knowledge graph.
To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/).

In [1]:
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

diffbot_api_key = "fbade13b714446eba358131c7a83357a"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

This code loads a PDF document with `PyPDFLoader` and then uses `DiffbotGraphTransformer` to extract entities and relationships.
The `DiffbotGraphTransformer` outputs structured `GraphDocument` objects, which can be used to populate a graph database.
Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api).

In [22]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("../VTKG/Identification of non-linear chemical systems with neural networks.pdf")
raw_documents = loader.load()
raw_documents


[Document(page_content='Identification of non -linear chemical systems with neural \nnetworks . \nReynold Oramas  Rodríguez [0000 -0002 -5468 -9929]  Ana Isabel González Santos [0000 -0002 -7969 -\n4070]  Laura García González[0000 -0002 -3815 -158x ] \nTechnological University of Havana , 114 # 11901 / Ciclovía and Rotonda CP 19390 Marianao, \nHavana, Cuba  \nreynoldalejandro46@gmail.com  \nAbstract.  This study proposes the use of neural networks, specifically NARX \nnetworks, in the modeling of no n-linear chemical systems with the use of the \ncontrol field systems identification methodology. The chemical reactor of the \nTennessee Eastman, responsible for the greater non -linearities of the plant, is \nstudied. First, a simple decentralized control schem e is proposed for the        \nstabilization of the plant, an identification experiment is designed, and two            \nsub-models are trained for the level and pressure of the reactor, obtaining      \nsatisfactory results.  \n

In [12]:
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
diffbot_nlp

In [21]:
graph_documents

[GraphDocument(nodes=[], relationships=[], source=Document(page_content='Identification of non -linear chemical systems with neural \nnetworks . \nReynold Oramas  Rodríguez [0000 -0002 -5468 -9929]  Ana Isabel González Santos [0000 -0002 -7969 -\n4070]  Laura García González[0000 -0002 -3815 -158x ] \nTechnological University of Havana , 114 # 11901 / Ciclovía and Rotonda CP 19390 Marianao, \nHavana, Cuba  \nreynoldalejandro46@gmail.com  \nAbstract.  This study proposes the use of neural networks, specifically NARX \nnetworks, in the modeling of no n-linear chemical systems with the use of the \ncontrol field systems identification methodology. The chemical reactor of the \nTennessee Eastman, responsible for the greater non -linearities of the plant, is \nstudied. First, a simple decentralized control schem e is proposed for the        \nstabilization of the plant, an identification experiment is designed, and two            \nsub-models are trained for the level and pressure of the re
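Both outputs shown so far have `nodes=[]` and `relationships=[]`, meaning nothing was extracted. A minimal sketch of a check that could run before loading anything into the database; `non_empty_graph_documents` is a hypothetical helper, and the stand-in objects only mimic the `nodes` and `relationships` attributes visible in the `GraphDocument` output above.

```python
from types import SimpleNamespace
from typing import Iterable, List


def non_empty_graph_documents(graph_documents: Iterable) -> List:
    """Keep only documents with at least one extracted node or relationship."""
    return [
        doc
        for doc in graph_documents
        if len(doc.nodes) > 0 or len(doc.relationships) > 0
    ]


# Stand-in objects with the same attribute names as GraphDocument (illustration only).
docs = [
    SimpleNamespace(nodes=[], relationships=[]),
    SimpleNamespace(nodes=["Diffbot"], relationships=[]),
]
kept = non_empty_graph_documents(docs)
print(f"{len(kept)} of {len(docs)} documents contain extracted data")
```

If the filtered list is empty, it is probably worth checking the API key and the input text before continuing, since an empty graph gives the query chain nothing to work with.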

## Loading the data into a knowledge graph

You will need a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or by running a Docker container. To start a local Docker container, execute the following script:
```
docker run \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -d \
    -e NEO4J_AUTH=neo4j/pleaseletmein \
    -e NEO4J_PLUGINS=\[\"apoc\"\]  \
    neo4j:latest
```    
If you are using the Docker container, you need to wait a couple of seconds for the database to start.
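Rather than waiting a fixed time, the startup delay can be handled by polling. A minimal sketch, assuming the container exposes Bolt on port 7687 as in the script above; `wait_for_port` is a hypothetical helper, and an open port only suggests, but does not guarantee, that the database is ready to accept queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

For the local container, `wait_for_port("localhost", 7687)` could be called before constructing the graph connection.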

In [19]:
from langchain_community.graphs import Neo4jGraph

url = "neo4j+s://f93b4063.databases.neo4j.io"
username = "neo4j"
password = "9t7zbHCj3q85w8WW-0BLZUj__vLj9xPjCPiZJSpneeE"

graph = Neo4jGraph(url=url, username=username, password=password)

The `GraphDocuments` can be loaded into a knowledge graph using the `add_graph_documents` method.

In [20]:
graph.add_graph_documents(graph_documents)

## Refresh graph schema information
If the schema of the database changes, you can refresh the schema information needed to generate Cypher statements.

In [15]:
graph.refresh_schema()

## Querying the graph
We can now use the graph Cypher QA chain to ask questions of the graph. It is advisable to use **gpt-4** to construct Cypher queries to get the best experience.

In [16]:
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)

In [17]:
chain.run("What neural networks are used?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
As the schema is not provided, it's impossible to generate a specific Cypher statement. However, a general Cypher statement to query a graph database for neural networks might look like this:

MATCH (n:NeuralNetwork) RETURN n.name


ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'As': expected
  "ALTER"
  "CALL"
  "CREATE"
  "DEALLOCATE"
  "DELETE"
  "DENY"
  "DETACH"
  "DROP"
  "DRYRUN"
  "ENABLE"
  "FINISH"
  "FOREACH"
  "GRANT"
  "INSERT"
  "LOAD"
  "MATCH"
  "MERGE"
  "NODETACH"
  "OPTIONAL"
  "REALLOCATE"
  "REMOVE"
  "RENAME"
  "RETURN"
  "REVOKE"
  "SET"
  "SHOW"
  "START"
  "STOP"
  "TERMINATE"
  "UNWIND"
  "USE"
  "USING"
  "WITH" (line 1, column 1 (offset: 0))
"As the schema is not provided, it's impossible to generate a specific Cypher statement. However, a general Cypher statement to query a graph database for neural networks might look like this:"
 ^}
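The error above arises because the model returned explanatory prose rather than a Cypher statement, and Neo4j rejected the first word. A minimal sketch of a pre-check that could catch this before the query is sent: the keyword set is copied from the "expected" list in the error message, and `looks_like_cypher` is a hypothetical helper, not part of LangChain or the Neo4j driver. It is a heuristic on the first word only, not a Cypher parser.

```python
import re

# Keywords a statement may start with, taken from the error message above.
CYPHER_START_KEYWORDS = {
    "ALTER", "CALL", "CREATE", "DEALLOCATE", "DELETE", "DENY", "DETACH",
    "DROP", "DRYRUN", "ENABLE", "FINISH", "FOREACH", "GRANT", "INSERT",
    "LOAD", "MATCH", "MERGE", "NODETACH", "OPTIONAL", "REALLOCATE",
    "REMOVE", "RENAME", "RETURN", "REVOKE", "SET", "SHOW", "START",
    "STOP", "TERMINATE", "UNWIND", "USE", "USING", "WITH",
}


def looks_like_cypher(text: str) -> bool:
    """Return True if the first word of `text` is a valid Cypher start keyword."""
    match = re.match(r"\s*([A-Za-z]+)", text)
    return match is not None and match.group(1).upper() in CYPHER_START_KEYWORDS
```

A check like this would flag the prose response shown above, while letting a statement such as `MATCH (n:NeuralNetwork) RETURN n.name` through.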

In [9]:
chain.run("Who is or was working at Berkshire Hathaway?")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]

> Finished chain.


'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'