<a href="https://colab.research.google.com/github/bhattacharya5/NLP/blob/main/Assignment2_NLU_KnowledgeNet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Build a relation classifier that can detect a predefined class of relations, as specified in the dataset. 
# 2. Create a subset of the KnowledgeNet data using sentences which contain any of the following relations: (make a subset of train.json with these relations only)

- DATE_OF_BIRTH (PER–DATE)
- RESIDENCE (PER–LOC)
- BIRTHPLACE (PER–LOC)
- NATIONALITY (PER–LOC)
- EMPLOYEE_OF (PER–ORG)
- EDUCATED_AT (PER–ORG)

<br>

Here are the steps involved in building a relation classifier and creating a subset of the KnowledgeNet data:

**Load the KnowledgeNet dataset**. KnowledgeNet is a collection of sentences annotated with facts. Each fact connects two entities mentioned in a sentence through a relation such as BIRTHPLACE, and the entities are linked to Wikidata where possible.

**Preprocess the data**. The data needs to be preprocessed before it can be used to train a model. This includes cleaning the data, removing stop words, and tokenizing the sentences.
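The preprocessing step can be sketched as follows. This is a minimal illustration, not the notebook's actual pipeline: the `preprocess` helper and the small `STOP_WORDS` set are assumptions made for the example. Note that words such as "in" or "at" may carry signal for relations like BIRTHPLACE or EDUCATED_AT, so an aggressive stop-word list may hurt rather than help.

```python
import re

# Illustrative stop-word list (an assumption, not taken from the assignment).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "was", "is"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

tokens = preprocess("Marie Curie was born in Warsaw, Poland.")
print(tokens)
```

The resulting token list can be joined back into a string before it is passed to a vectorizer.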

**Create a training set and a test set**. The data needs to be split into a training set and a test set. The training set will be used to train the model, and the test set will be used to evaluate the performance of the model.
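A minimal sketch of the split, assuming the examples are held in a Python list. The helper name `split_examples`, the 80/20 ratio, and the seed are choices made for this illustration. Shuffling before splitting matters when the file is ordered (for instance by document), because slicing off the first 80% could then leave some relations out of the test set.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (text, relation) pairs standing in for the real data.
examples = [("sentence %d" % i, "RESIDENCE") for i in range(10)]
train, test = split_examples(examples)
print(len(train), len(test))
```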

**Choose a machine learning algorithm.** There are a variety of machine learning algorithms that can be used to build a relation classifier. Some of the most common algorithms include support vector machines, naive Bayes, and logistic regression.

**Train the model.** The model needs to be trained on the training set. This involves feeding the model the training sentences together with their relation labels, so that it learns which words and entity contexts signal each relation.

**Evaluate the model.** The model needs to be evaluated on the test set to determine its performance. This involves feeding the model the test data and measuring its accuracy.
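Accuracy alone can hide weak performance on rare relations, so it may help to report precision and recall per relation as well. The sketch below assumes the gold and predicted labels are two equally long lists; the `evaluate` helper and the sample labels are hypothetical.

```python
from collections import Counter

def evaluate(gold, predicted):
    """Return overall accuracy and a dict of label -> (precision, recall)."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Count the correct predictions for each label.
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    accuracy = sum(correct.values()) / len(gold)
    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        precision = correct[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = correct[label] / gold_counts[label] if gold_counts[label] else 0.0
        per_label[label] = (precision, recall)
    return accuracy, per_label

gold = ["BIRTHPLACE", "RESIDENCE", "BIRTHPLACE", "EMPLOYEE_OF"]
predicted = ["BIRTHPLACE", "BIRTHPLACE", "BIRTHPLACE", "EMPLOYEE_OF"]
accuracy, per_label = evaluate(gold, predicted)
print(accuracy)
print(per_label)
```

In this sample, RESIDENCE is never predicted, which the per-label scores expose even though three of the four predictions are correct.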

**Use the model to extract relations from the KnowledgeNet data.** Once the model has been trained and evaluated, it can be used to extract relations from the KnowledgeNet data. This involves feeding the model the KnowledgeNet data and extracting the relations that the model has learned to recognize.

Here are the steps involved in creating a subset of the KnowledgeNet data:

1. **Load the KnowledgeNet dataset.**
2. **Filter the data to include only sentences that contain the desired relations.**
3. **Write the filtered data to a new file.**

Here is an example of how to create a subset of the KnowledgeNet data that keeps only sentences with at least one of the relations DATE_OF_BIRTH, RESIDENCE, BIRTHPLACE, NATIONALITY, EMPLOYEE_OF, and EDUCATED_AT. The example assumes that train.json holds a single JSON list of sentences, each with a `relations` list:

    import json

**Load the KnowledgeNet dataset.**

    with open('train.json', 'r') as f:
        knowledge_net = json.load(f)

**Filter the data to include only sentences that contain the desired relations.**

    relations = ['DATE_OF_BIRTH', 'RESIDENCE', 'BIRTHPLACE', 'NATIONALITY', 'EMPLOYEE_OF', 'EDUCATED_AT']
    filtered_knowledge_net = []
    for sentence in knowledge_net:
        # any() adds each sentence once, even if it contains several of the relations.
        if any(relation in sentence['relations'] for relation in relations):
            filtered_knowledge_net.append(sentence)

**Write the filtered data to a new file.**

    with open('subset.json', 'w') as f:
        json.dump(filtered_knowledge_net, f, indent=4)

This code writes the filtered sentences to a new file called subset.json.


In [None]:
import json
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

RELATIONS = {'DATE_OF_BIRTH', 'RESIDENCE', 'BIRTHPLACE', 'NATIONALITY', 'EMPLOYEE_OF', 'EDUCATED_AT'}

# Load the KnowledgeNet dataset.
# This cell assumes train.json is a single JSON list of sentences, each with a
# 'text' string and a 'relations' list; adapt the field names to the actual file.
with open('train.json', 'r') as f:
    knowledge_net = json.load(f)

# Build one (text, relation) example per desired relation in each sentence,
# so that the texts and the labels stay aligned.
texts, labels = [], []
for sentence in knowledge_net:
    for relation in sentence['relations']:
        if relation in RELATIONS:
            texts.append(sentence['text'])
            labels.append(relation)

# Create a training set and a test set (80/20 split).
split = int(len(texts) * 0.8)
train_texts, test_texts = texts[:split], texts[split:]
train_labels, test_labels = labels[:split], labels[split:]

# Create a TF-IDF vectorizer.
vectorizer = TfidfVectorizer()

# Create a logistic regression model.
model = LogisticRegression()

# Train the model.
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Evaluate the model.
predictions = model.predict(vectorizer.transform(test_texts))
accuracy = np.mean(predictions == np.array(test_labels))

print('Accuracy:', accuracy)


In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Read the JSON file into a Pandas DataFrame.
# pd.read_json raises a ValueError ("Trailing data") on a file that stores one
# JSON document per line; pass lines=True in that case.
df = pd.read_json('https://raw.githubusercontent.com/diffbot/knowledge-net/master/dataset/train.json')

# Extract the entities and relations from the DataFrame.
# Assumes list-valued 'entities' and 'relations' columns; adjust the names to the actual file.
df = df[df['relations'].str.len() > 0]
entities = df['entities'].apply(lambda items: ' '.join(map(str, items)))
relations = df['relations'].str[0]  # use the first relation of each row as its label

# Create a training and testing set.
X_train, X_test, y_train, y_test = train_test_split(entities, relations, test_size=0.2, random_state=42)

# Create a TfidfVectorizer.
vectorizer = TfidfVectorizer()

# Transform the training and testing sets.
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Create a LogisticRegression classifier.
clf = LogisticRegression()

# Train the classifier.
clf.fit(X_train_tfidf, y_train)

# Evaluate the classifier on the testing set.
y_pred = clf.predict(X_test_tfidf)

# Print the accuracy of the classifier.
print('Accuracy:', np.mean(y_pred == y_test))


ValueError: ignored

# 3. Create a Knowledge Graph that can store the information contained in these sentences. You can use any open-source graph database for this purpose.

Here are the steps involved in creating a Knowledge Graph that can store the information contained in the sentences:

1. **Choose a graph database.** There are a variety of open-source graph databases available, such as Neo4j, ArangoDB, and OrientDB. Choose one that meets your requirements.
2. **Create a schema for the Knowledge Graph.** The schema defines the structure of the Knowledge Graph: it specifies the types of nodes and edges that are allowed.
3. **Load the data into the Knowledge Graph.** The data can be loaded in several ways, such as individual Cypher statements or the `LOAD CSV` statement in Neo4j, or a database's bulk import tool.
4. **Query the Knowledge Graph.** The graph is queried in the language the chosen database supports, such as Cypher for Neo4j or AQL for ArangoDB.
5. **Visualize the Knowledge Graph.** The graph can be explored with tools such as Neo4j Bloom or the web interfaces that ship with ArangoDB and OrientDB.

Here is an example of how to create a Knowledge Graph using Neo4j:

In [None]:
import json
from py2neo import Graph

# Create a connection to the Neo4j database (URL and credentials are placeholders).
graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))

# Create a schema for the Knowledge Graph.
# Neo4j does not require node or relationship types to be declared in advance,
# so the schema is kept here as a mapping from each relation to
# (relationship type, label of the object node). 'BornOn' and 'NationalOf'
# are names chosen for this example.
SCHEMA = {
    'DATE_OF_BIRTH': ('BornOn', 'Date'),
    'RESIDENCE': ('ResidesIn', 'Location'),
    'BIRTHPLACE': ('BornIn', 'Location'),
    'NATIONALITY': ('NationalOf', 'Location'),
    'EMPLOYEE_OF': ('WorksFor', 'Organization'),
    'EDUCATED_AT': ('StudiedAt', 'Organization'),
}

# Load the data into the Knowledge Graph.
# Assumes each sentence has 'relations', 'subject' and 'object' fields.
with open('train.json', 'r') as f:
    for sentence in json.load(f):
        for relation in sentence['relations']:
            if relation not in SCHEMA:
                continue
            rel_type, object_label = SCHEMA[relation]
            # MERGE reuses existing nodes and edges instead of duplicating them.
            # Labels and relationship types cannot be query parameters, so they
            # are interpolated from the fixed SCHEMA above.
            graph.run(
                'MERGE (p:Person {name: $subject}) '
                'MERGE (o:%s {name: $object}) '
                'MERGE (p)-[:%s]->(o)' % (object_label, rel_type),
                subject=sentence['subject'], object=sentence['object'])

# Query the Knowledge Graph.
# Find all people who were born in the United States.
query = '''
MATCH (p:Person)-[:BornIn]->(:Location {name: $place})
RETURN p.name
'''
for record in graph.run(query, place='United States'):
    print(record['p.name'])

# Visualize the Knowledge Graph.
# Neo4j Bloom and Neo4j Browser are separate applications, not Python packages.
# In Neo4j Browser (http://localhost:7474 by default), this query draws part of the graph:
#   MATCH (a)-[r]->(b) RETURN a, r, b LIMIT 100
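The schema step can also be made concrete without a running database. The sketch below encodes the allowed subject and object types from the relation list at the top of this notebook and rejects facts that do not fit. The `is_valid_fact` helper and the sample facts are hypothetical, and the sketch assumes entity types are available for each fact.

```python
# Allowed (subject type, object type) for each relation, taken from the
# relation list at the top of this notebook.
SCHEMA = {
    "DATE_OF_BIRTH": ("PER", "DATE"),
    "RESIDENCE": ("PER", "LOC"),
    "BIRTHPLACE": ("PER", "LOC"),
    "NATIONALITY": ("PER", "LOC"),
    "EMPLOYEE_OF": ("PER", "ORG"),
    "EDUCATED_AT": ("PER", "ORG"),
}

def is_valid_fact(subject_type, relation, object_type):
    """Return True if the typed triple is allowed by SCHEMA."""
    return SCHEMA.get(relation) == (subject_type, object_type)

# Hypothetical facts: (subject, subject type, relation, object, object type).
facts = [
    ("Marie Curie", "PER", "BIRTHPLACE", "Warsaw", "LOC"),
    ("Marie Curie", "PER", "EMPLOYEE_OF", "1867", "DATE"),
]
valid_facts = [f for f in facts if is_valid_fact(f[1], f[2], f[4])]
print(valid_facts)
```

Running such a check before loading keeps mistyped edges, like a person employed by a date, out of the graph.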


Here are the steps involved in connecting a Knowledge Graph to a front end that can take in Natural Language Queries and give the answers back:

**Choose a chatbot framework.** A variety of chatbot frameworks are available, such as the open-source Rasa and the hosted services Dialogflow and Amazon Lex. Choose a framework that meets your needs and requirements.

**Create a chatbot model.** The chatbot model is a machine learning model that can understand and respond to Natural Language Queries. The model can be trained on a variety of data, such as text corpora, dialogue transcripts, and question-answering datasets.

**Connect the chatbot model to the Knowledge Graph.** The chatbot model can be connected to the Knowledge Graph using a variety of methods, such as REST APIs, GraphQL APIs, and WebSockets.

**Deploy the chatbot.** The chatbot can be deployed to a variety of platforms, such as a web server, a mobile app, or a voice assistant.
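Whatever framework is chosen, the connection step comes down to turning a recognized question into a graph query. The following is a minimal rule-based sketch of that idea, not a trained chatbot model. The question patterns, the `question_to_cypher` helper, and the `name` property are assumptions for illustration; the labels and relationship types follow the schema used in the Neo4j example.

```python
import re

# Hypothetical question patterns, each paired with a Cypher template.
PATTERNS = [
    (re.compile(r"^where was (?P<name>.+?) born\??$", re.IGNORECASE),
     "MATCH (p:Person {name: $name})-[:BornIn]->(l:Location) RETURN l.name"),
    (re.compile(r"^where does (?P<name>.+?) work\??$", re.IGNORECASE),
     "MATCH (p:Person {name: $name})-[:WorksFor]->(o:Organization) RETURN o.name"),
]

def question_to_cypher(question):
    """Return (query, parameters) for a recognized question, else None."""
    for pattern, query in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            # The name is passed as a query parameter, never pasted into the query text.
            return query, {"name": match.group("name")}
    return None

result = question_to_cypher("Where was Marie Curie born?")
print(result)
```

The returned query and parameters can then be handed to a database driver. A question that matches no pattern returns `None`, which the front end can turn into a fallback reply.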

Here is an example of how to connect a Knowledge Graph to a chatbot using Rasa:

**1. Create a Rasa project.** The Rasa project is a collection of files that define the chatbot model and the chatbot configuration.

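**2. Add the Knowledge Graph to the Rasa project.** Rasa has no built-in setting for a graph database. The usual approach is a custom action that opens the Neo4j connection (for example `bolt://localhost:7687` with username `neo4j` and password `password`) and runs the query. The action is registered in the domain.yml file, shown here with the hypothetical name `action_query_knowledge_graph`:


Code snippet :

```
actions:
  - action_query_knowledge_graph
```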

**3. Train the chatbot model.** The chatbot model can be trained on a dataset of example Natural Language Queries and the responses or actions they should trigger.

**4. Deploy the chatbot.** The chatbot can be deployed to a variety of platforms, such as a web server, a mobile app, or a voice assistant.

Once the chatbot is deployed, it can take in Natural Language Queries and return answers from the Knowledge Graph. Because the framework tracks the dialogue, the chatbot can also carry on a multi-turn conversation rather than only answering isolated questions.

In [None]:
import requests

# Rasa projects are created, trained and served from the command line rather
# than through a Python project object:
#   rasa init               # create a Rasa project
#   rasa train              # train the chatbot model
#   rasa run --enable-api   # serve the trained model (port 5005 by default)
#   rasa run actions        # serve the custom actions that query the Knowledge Graph
# The REST channel must be enabled in credentials.yml (a line reading "rest:").

# Endpoint of the REST channel on a locally running Rasa server.
RASA_URL = 'http://localhost:5005/webhooks/rest/webhook'

# Send a message to the chatbot.
# Reusing the same sender ID keeps successive messages in one conversation.
message = 'Where was Albert Einstein born?'
response = requests.post(RASA_URL, json={'sender': 'default', 'message': message}, timeout=10)
response.raise_for_status()

# Print the chatbot's responses (the server returns a list of messages).
for reply in response.json():
    print(reply.get('text'))
