Your manager at a consultancy agency has been overrun by an influx of projects that involve large volumes of financial news. He hands you piles and piles of documents and asks you to read each one and provide a summary of:
Who is implicated in this document, and what are their relationships?
How hard could that be? Not that hard, right? True, if you know natural language processing, that is. You realize that what your manager really wants is a knowledge graph.
- Be able to preprocess data obtained from textual sources
- Be able to employ named entity recognition and relationship extraction using spaCy
- Be able to visualize results
- Be able to present insights and findings to the client
- Be able to store data using the graph database Neo4j
- Be able to write clean and documented code
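To make the preprocessing objective concrete, here is a minimal sketch of a cleaning step for scraped news text. The function name `clean_text` and the exact cleaning rules are assumptions chosen for illustration; they are not taken from the repository's `text_preprocessing` code.

```python
import html
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize one raw article string before entity extraction (illustrative only)."""
    text = re.sub(r"<[^>]+>", " ", raw)         # replace leftover HTML tags with a space
    text = html.unescape(text)                  # turn entities such as "&amp;" into "&"
    text = unicodedata.normalize("NFKC", text)  # unify compatibility characters
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace and newlines
    return text.strip()


# Hypothetical input, standing in for a scraped financial news snippet.
sample = "<p>Acme &amp; Co.\n\n rose  3%</p>"
print(clean_text(sample))
```

Keeping the cleaning in one small function makes it easy to apply the same rules to every article before it reaches the NER step.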
- README.md -> Explains the project and gives a report
- requirements.txt -> Lists the libraries and Python version to install
- main.py -> Runs the Streamlit application
- text_preprocessing -> Contains the code for the preprocessing stage of the project
- entity_relation -> Contains the code for entity/relation extraction
- graph -> Contains the code for graph creation
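As a rough illustration of what an entity/relation step can look like, the sketch below reads entities per sentence with spaCy and then links entities that appear in the same sentence. Sentence-level co-occurrence is a deliberately simple stand-in and is not necessarily the method used in `entity_relation`; the function names and the `en_core_web_sm` model choice are assumptions.

```python
from itertools import combinations


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (sentence_index, entity_text, entity_label) tuples using spaCy."""
    import spacy  # imported lazily so the pairing logic below runs without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [
        (i, ent.text, ent.label_)
        for i, sent in enumerate(doc.sents)
        for ent in sent.ents
    ]


def cooccurrence_pairs(entities):
    """Pair up distinct entities that are mentioned in the same sentence."""
    by_sentence = {}
    for sent_idx, text, label in entities:
        by_sentence.setdefault(sent_idx, []).append((text, label))
    pairs = []
    for sent_idx in sorted(by_sentence):
        unique = list(dict.fromkeys(by_sentence[sent_idx]))  # dedupe, keep order
        pairs.extend(combinations(unique, 2))
    return pairs


# Hand-written entities standing in for the output of extract_entities().
example = [
    (0, "Acme Corp", "ORG"),
    (0, "Jane Doe", "PERSON"),
    (1, "Globex", "ORG"),
]
print(cooccurrence_pairs(example))
```

To run `extract_entities` on real text, the model has to be installed first, for example with `python -m spacy download en_core_web_sm`.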
- Clone this repository into your local environment with the command below:
git clone https://github.com/ujjwalk00/Entity_Recognition.git
- Create and activate a Python virtual environment.
- Install all the required libraries with the command below:
pip install -r requirements.txt
To run the application, use the command below:
streamlit run main.py
The application will open in your browser automatically. If it does not, copy the URL that Streamlit prints in the terminal and open it in your browser.
The application produces two kinds of knowledge graph. The first shows how the entities in a single article are related to each other.
The second shows how entities from different articles are related to a certain category.
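One way to combine relations from several articles into a single graph and prepare them for Neo4j is sketched below. The triples, the `Entity` label, the `RELATED_TO` relationship type, and the property names are all hypothetical choices made for this example; the repository's `graph` code may model the data differently.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object, article_id) triples; real ones
# would come from the entity/relation extraction step.
triples = [
    ("Acme Corp", "acquired", "Globex", "article-1"),
    ("Globex", "based_in", "Berlin", "article-1"),
    ("Acme Corp", "acquired", "Globex", "article-2"),
]


def merge_triples(triples):
    """Collapse duplicate edges and record which articles support each one."""
    edges = defaultdict(set)
    for subj, rel, obj, article in triples:
        edges[(subj, rel, obj)].add(article)
    return edges


# Parameterized Cypher: MERGE creates nodes and edges only if they are missing.
MERGE_QUERY = (
    "MERGE (a:Entity {name: $source}) "
    "MERGE (b:Entity {name: $target}) "
    "MERGE (a)-[r:RELATED_TO {type: $relation}]->(b) "
    "SET r.articles = $articles"
)


def to_cypher_params(edges):
    """Build one parameter dict per unique edge for MERGE_QUERY."""
    return [
        {"source": s, "relation": r, "target": o, "articles": sorted(arts)}
        for (s, r, o), arts in edges.items()
    ]


params = to_cypher_params(merge_triples(triples))
for p in params:
    print(p)
```

With the official Neo4j Python driver, each parameter dict could then be sent with `session.run(MERGE_QUERY, **p)`. Using `MERGE` rather than `CREATE` keeps repeated runs from duplicating nodes and relationships.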
The design and construction phase of the project was carried out by 3 collaborators: Ujjwal Kandel, Reena Koshta, and Maryam El B.
January 2022
- Duration: 2 weeks
- Deadline: 20/01/2022 4:30 PM