# NLP and knowledge graphs

**Knowledge graphs** represent information in a structured way, capturing the relationships between entities. **NLP**, on the other hand, focuses on understanding and processing human language. So why not combine them into an efficient way of extracting and analysing information?


## Your challenge
You have been tasked with creating a knowledge graph that stores and infers information from a large document.

#### Requirements
In your group, you should:
- read and process the text in the provided document;
- extract entities and relationships present within it;
- store this information in a knowledge graph;
- and visualize the graph. 

You can utilise the knowledge and skills acquired from other modules to accomplish this task effectively. 

_Your graph will be huge and very complex, so you might want to plot only a 'piece of knowledge', for example the part related to a specific node or relationship._
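
One possible way to isolate such a 'piece of knowledge' is sketched below. It assumes a NetworkX directed graph whose edges carry a `relation` attribute (an attribute name chosen here for illustration); the three toy edges are made up and stand in for the graph you will build.

```python
import networkx as nx

# Toy graph standing in for the real one (illustrative data only).
graph = nx.DiGraph()
graph.add_edge("Paris", "France", relation="is_capital")
graph.add_edge("Rome", "Italy", relation="is_capital")
graph.add_edge("France", "Europe", relation="is_in")

# Piece 1: everything within one hop of a chosen node,
# ignoring edge direction when measuring the distance.
around_paris = nx.ego_graph(graph, "Paris", radius=1, undirected=True)

# Piece 2: only the edges that carry a chosen relation.
capital_edges = [
    (head, tail)
    for head, tail, rel in graph.edges(data="relation")
    if rel == "is_capital"
]
capitals_only = graph.edge_subgraph(capital_edges)

print(sorted(around_paris.nodes))
print(sorted(capitals_only.nodes))
```

Either subgraph can then be passed to a drawing function such as `nx.draw`, which additionally requires matplotlib.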

#### Data
The dataset, named `ai_wiki_page.txt`, is a text file containing the text extracted from the [Wikipedia page dedicated to Artificial Intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence). Your task is to transform the information contained in this document into a knowledge graph.

The directed graph you will construct is made of triplets: **(head, relation, tail)**. In text mining, these triplets are usually of the form **(subject, verb, object)**. For example, 'Paris is the capital of France' can be translated into (Paris, is_capital, France), where Paris and France are _nodes_ and is_capital is the connecting _edge_.
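
As a minimal sketch, the triplet above could be stored as one directed edge in NetworkX. The edge attribute name `relation` is an arbitrary choice made for this illustration.

```python
import networkx as nx

# One triplet: (head, relation, tail)
head, relation, tail = ("Paris", "is_capital", "France")

graph = nx.DiGraph()
# The head and tail become nodes; the relation is stored on the edge.
graph.add_edge(head, tail, relation=relation)

print(list(graph.edges(data=True)))
```

Because the graph is directed, the edge runs from Paris to France and not the other way round.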

#### Libraries
You can use whatever libraries help you achieve your task. **pandas** (for general data mining), **spaCy** (for NLP-related work) and **NetworkX** (for graph visualisation) are recommended.

You are now ready to develop your solution. Below are some hints that you can use at any point if you get stuck or want to check your process.

**Good luck!**

### Hints

#### Hint 1A - text processing
<details>
  <summary>Click here to show the hint</summary>
  
  To create your knowledge graph you will need to extract sentences from the text document.
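
A minimal sketch of sentence splitting with spaCy is shown below. It uses a blank English pipeline with the rule-based `sentencizer`, so no trained model is needed; a trained pipeline would also provide sentence boundaries. The sample text is made up, and in the exercise it would be the content of `ai_wiki_page.txt`.

```python
import spacy

# A blank pipeline only tokenizes; the sentencizer adds
# punctuation-based sentence boundaries.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Illustrative text; replace it with the file content, e.g.
# Path("ai_wiki_page.txt").read_text(encoding="utf-8")
text = "Paris is the capital of France. Rome is the capital of Italy."

doc = nlp(text)
sentences = [sent.text.strip() for sent in doc.sents]
print(sentences)
```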
</details>

#### Hint 1B - text processing
<details>
  <summary>Click here to show the hint</summary>
  
  Your document contains lots of sentences. Try to reduce them by keeping only the sentences with exactly 1 subject and 1 object.
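
One way to apply this filter is to count dependency labels, as sketched below. The helper name `has_one_subject_and_object` is hypothetical, and treating every label containing `subj` or `obj` as a subject or object is a simplifying assumption. The two `Doc` objects are annotated by hand so that the snippet runs without a trained model; in practice the labels would come from a spaCy pipeline with a parser.

```python
import spacy
from spacy.tokens import Doc


def has_one_subject_and_object(sentence):
    """Return True if the parsed sentence has exactly 1 subject and 1 object."""
    n_subjects = sum("subj" in token.dep_ for token in sentence)
    n_objects = sum("obj" in token.dep_ for token in sentence)
    return n_subjects == 1 and n_objects == 1


vocab = spacy.blank("en").vocab

# Hand-annotated parse: heads are absolute token indices.
kept = Doc(
    vocab,
    words=["Paris", "is", "the", "capital", "of", "France"],
    heads=[1, 1, 3, 1, 3, 4],
    deps=["nsubj", "ROOT", "det", "attr", "prep", "pobj"],
)
# This one has two objects (dobj and pobj), so it is filtered out.
dropped = Doc(
    vocab,
    words=["Paris", "hosts", "museums", "in", "France"],
    heads=[1, 1, 1, 1, 3],
    deps=["nsubj", "ROOT", "dobj", "prep", "pobj"],
)

print(has_one_subject_and_object(kept))
print(has_one_subject_and_object(dropped))
```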
</details>

#### Hint 2 - entities extraction
<details>
  <summary>Click here to show the hint</summary>
  
  The main idea is to extract the subject and the object from each sentence. However, an entity (subject/object) can span multiple words, e.g., 'artificial intelligence'. You can easily extract a single-word entity using part-of-speech (POS) tags, but these are not sufficient when an entity is made of multiple words. In the case of 'artificial intelligence', only 'intelligence' would count as a noun. Hence, it might be helpful to create a function that uses both POS and dependency (DEP) tags to find multi-word entities.
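
A possible shape for such a function is sketched below. The helper names are hypothetical, and attaching only `compound` and `amod` modifiers to the left of the noun is an assumption that will miss some entities. The sentence is annotated by hand so that the snippet runs without a trained model.

```python
import spacy
from spacy.tokens import Doc

MODIFIER_DEPS = {"compound", "amod"}


def multiword_entity(token):
    """Join a noun with the modifiers directly attached on its left."""
    modifiers = [t for t in token.lefts if t.dep_ in MODIFIER_DEPS]
    return " ".join(t.text for t in modifiers + [token])


def subject_and_object(doc):
    """Return (subject, object) as strings, or None where not found."""
    subject = obj = None
    for token in doc:
        if token.pos_ not in {"NOUN", "PROPN"}:  # POS check
            continue
        if "subj" in token.dep_:  # DEP check
            subject = multiword_entity(token)
        elif "obj" in token.dep_:
            obj = multiword_entity(token)
    return subject, obj


# Hand-annotated parse: heads are absolute token indices.
doc = Doc(
    spacy.blank("en").vocab,
    words=["Artificial", "intelligence", "transforms", "modern", "healthcare"],
    pos=["ADJ", "NOUN", "VERB", "ADJ", "NOUN"],
    heads=[1, 2, 2, 4, 2],
    deps=["amod", "nsubj", "ROOT", "amod", "dobj"],
)
print(subject_and_object(doc))
```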
</details>

#### Hint 3 - relation extraction
<details>
  <summary>Click here to show the hint</summary>
  
  To extract relations/verbs, you can use spaCy’s rule-based matching: https://spacy.io/usage/rule-based-matching/
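
As a rough sketch, a `Matcher` pattern can capture the root verb together with an optional preposition that follows it. The pattern and the rule name `RELATION` are illustrative choices, not the only option, and the sentence is annotated by hand so that the snippet runs without a trained model.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-annotated parse: heads are absolute token indices.
doc = Doc(
    vocab,
    words=["Researchers", "rely", "on", "statistics"],
    heads=[1, 1, 1, 2],
    deps=["nsubj", "ROOT", "prep", "pobj"],
)

matcher = Matcher(vocab)
# The root of the sentence, optionally followed by a preposition.
pattern = [{"DEP": "ROOT"}, {"DEP": "prep", "OP": "?"}]
# greedy="LONGEST" keeps only the longest of overlapping matches.
matcher.add("RELATION", [pattern], greedy="LONGEST")

_, start, end = matcher(doc)[0]
relation = doc[start:end].text
print(relation)
```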
</details>

#### Hint 4 - KG
<details>
  <summary>Click here to show the hint</summary>
  
  Once you have your triplets (subject/source, object/target, verb/relationship), you can construct your directed graph with NetworkX. Try storing your triplets in a DataFrame and constructing your graph using `nx.from_pandas_edgelist`.
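
A minimal sketch of this step is shown below. The column names `source`, `target` and `edge` and the two rows are illustrative; your DataFrame would hold the triplets extracted from the document.

```python
import networkx as nx
import pandas as pd

# Illustrative triplets, one per row.
triplets = pd.DataFrame(
    {
        "source": ["Paris", "Rome"],
        "target": ["France", "Italy"],
        "edge": ["is_capital", "is_capital"],
    }
)

# create_using=nx.DiGraph() makes the graph directed;
# edge_attr copies the "edge" column onto each edge.
graph = nx.from_pandas_edgelist(
    triplets,
    source="source",
    target="target",
    edge_attr="edge",
    create_using=nx.DiGraph(),
)
print(list(graph.edges(data=True)))
```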
</details>

## Your solution

In [1]:
# Write your code here