TL;DR

An end-to-end event extraction and summarization system.

The project, titled "TL;DR", is a large-scale automated system for extracting violent incidents related to protests/riots and violence against civilians from news articles. Its architecture identifies, categorizes, and summarizes these events and performs entity slot filling for the target event types. The system also attempts to relate the recorded events to politics and elections in India, Indonesia, and Thailand, to give a better understanding of the factors driving these events.
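
To make "entity slot filling" concrete, here is a hypothetical record for one extracted event. Its slots mirror the fields shown in the Example section: a date, ranked PERSON and ORG entities, and locations. The class name, the field names, and the event_type slot are illustrative assumptions, not the repository's data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class EventRecord:
    """Hypothetical container for one extracted event (not the repository's model)."""
    text: str
    event_type: Optional[str] = None  # hypothetical slot: label assigned by the classifier
    event_date: Optional[date] = None
    persons: dict[str, float] = field(default_factory=dict)  # entity -> ranking weight
    orgs: dict[str, float] = field(default_factory=dict)     # entity -> ranking weight
    locations: list[str] = field(default_factory=list)

    def top(self, slot: str, k: int = 1) -> list[str]:
        """Return the k highest-weighted entities in a ranked slot ('persons' or 'orgs')."""
        ranked = getattr(self, slot)
        # sorted() is stable, so entities with equal weights keep their insertion order.
        return sorted(ranked, key=ranked.get, reverse=True)[:k]
```

For the example article, persons={"Kumar": 1.0, "Patna Bihar": 0.625, "Nikhil": 1.0} gives top("persons", 2) == ["Kumar", "Nikhil"].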

Getting Started

Step 1: Crawling news articles

News Please:

Install news-please.
Use config.cfg_mass and sitelist.hjson_mass to collect the data for building the classifier model.
Use config_lib.cfg_live and sitelist_COUNTRYNAME.hjson (replace COUNTRYNAME with the corresponding country) to crawl and clean live news articles.
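
When news-please stores articles as JSON files, a cleaning pass can flatten them into rows for the later steps. The following is a minimal sketch that assumes news-please's JSON output fields such as title, maintext, date_publish, url and source_domain. Field names can differ between news-please versions, so check yours. The function names and the selected fields are hypothetical and are not the repository's own cleaning code.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed news-please output fields; verify them against your news-please version.
FIELDS = ("title", "maintext", "date_publish", "url", "source_domain")


def clean_article(record: dict) -> Optional[dict]:
    """Keep only the assumed fields and drop articles without body text."""
    text = (record.get("maintext") or "").strip()
    if not text:
        return None
    row = {name: record.get(name) for name in FIELDS}
    row["maintext"] = " ".join(text.split())  # collapse runs of whitespace and newlines
    return row


def load_articles(directory: str) -> list[dict]:
    """Read every *.json file under `directory`, cleaning and skipping repeated URLs."""
    seen_urls: set[str] = set()
    rows: list[dict] = []
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            row = clean_article(json.load(fh))
        if row is None:
            continue
        url = row["url"]
        if url and url in seen_urls:  # keep only the first copy of each URL
            continue
        if url:
            seen_urls.add(url)
        rows.append(row)
    return rows
```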

Step 2: Classification of the articles

Run the following notebooks in the order listed:

Classifier_Builder.ipynb - Builds a classifier that categorizes the events described in the news articles.
Doc2Vec_Classifier.ipynb - Builds the classifier using Doc2Vec document vectors.
TextRank.ipynb - Applies TextRank and coreference resolution to the text to extract keywords.
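
The settings used in TextRank.ipynb are not given here. As a rough illustration of the keyword step, the following is a minimal TextRank ranker. Words become graph nodes, words that co-occur within a small window are linked, and a PageRank-style iteration with damping factor 0.85 scores them. The stop-word list, window size, and fixed iteration count are illustrative assumptions. The original TextRank filters candidates by part of speech, which this sketch replaces with stop-word removal, and coreference resolution is not covered.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and", "or",
              "was", "were", "is", "for", "from", "with", "by", "when", "also"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by TextRank score over an undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    graph = defaultdict(set)
    # Link each word to the words that follow it within the co-occurrence window.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # S(v) = (1 - d) + d * sum(S(u) / deg(u)) over neighbours u of v.
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```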

Step 3: NER Tagging and Summarization System

sample_input.csv - Input file for the system.
./output - Directory where the output files are stored.

topic_modelling.ipynb - Topic modelling to extract topics from the news articles. Input - sample_input.csv. Output - ./output/topic_modelling.csv
extractor.ipynb - Extracts named entities from the news articles. Input - sample_input.csv. Output - ./output/final_data_with_lat_and_long.csv
evaluation.ipynb - Evaluates the system. Input - sample_input.csv and ./output/extracted_data.csv. Output - ./output/ACLED_rouge_scores.txt (the scores for each category are also printed to the console).
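
evaluation.ipynb writes ROUGE scores to ACLED_rouge_scores.txt. The README does not specify which ROUGE variant, tokenization, or stemming it uses. For reference, here is a minimal ROUGE-N sketch that lowercases, tokenizes on whitespace, applies no stemming, and computes clipped n-gram overlap between a system text and a reference text.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N precision, recall and F1 between a system text and a reference text."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram, i.e. matches are clipped.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, the candidate "the cat sat" against the reference "the cat sat on the mat" has ROUGE-1 precision 1.0 and recall 0.5.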

Step 4: User Interface Setup

Install Elasticsearch and Kibana.
Create the index CSE_635 in Elasticsearch as described in ElasticSearch_Index.txt.
Import the Kibana dashboard and visualizations from KIbana_VIsualizations.json via the Kibana admin console.
Run Elasticsearch_Indexer.ipynb, the Jupyter notebook that indexes the articles into Elasticsearch. Input - ./output/final_data_with_lat_and_long.csv
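
The internals of Elasticsearch_Indexer.ipynb are not described here. As a hedged sketch, CSV rows can be converted into a request body for the Elasticsearch _bulk API. That API expects newline-delimited JSON: an action line followed by the document line, and every line must end with a newline. The latitude and longitude column names are assumptions, and so is the location field, which would need a geo_point mapping for Kibana map visualizations. The index name is written in lowercase because Elasticsearch index names must be lowercase. Use whatever name ElasticSearch_Index.txt actually defines.

```python
import csv
import io
import json


def bulk_body(csv_text: str, index: str = "cse_635") -> str:
    """Build an NDJSON body for Elasticsearch's _bulk API from CSV text.

    Hypothetical columns: 'latitude' and 'longitude' are merged into a
    'location' object ({"lat": ..., "lon": ...}), one of the formats a
    geo_point field accepts. All other columns are indexed as strings.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = row.pop("latitude", None), row.pop("longitude", None)
        if lat and lon:
            row["location"] = {"lat": float(lat), "lon": float(lon)}
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))                            # document line
    # The _bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting body is POSTed to the /_bulk endpoint with the application/x-ndjson content type.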

Example:

Summary of a news article:

Patna Bihar India, Apr 4 ANI Congress workers created ruckus at the party office here on Thursday in protest against the denial of ticket to former party MP Nikhil Kumar from Aurangabad parliamentary constituency. The workers also shouted the slogan 'Nikhil Kumar Zinadabad.' Kumar was also present in the office when the ruckus took place. Kumar had successfully contested from the seat in 2004 when the Congress fought in alliance with the RJD and the LJP. Kumar, a former Delhi Police Commissioner, unsuccessfully contested from Aurangabad in 2014 against BJP s Sushil Kumar Singh.

Date: 4/4/2019
ranked_list_PERSON: [{'Kumar': 1.0}, {'Patna Bihar': 0.625}, {'Nikhil': 1.0}]
ranked_list_ORG: [{'Congress': 1.0}, {'Kumar': 1.0}, {'ANI': 1.0}]
Location: ['India', 'Aurangabad', 'Patna', 'Bihar', 'India', 'Lok', 'Sabha']

This example shows that, for the ‘PERSON’ entity, ‘Nikhil’ and ‘Kumar’ receive a higher weight (1.0) than ‘Patna Bihar’ (0.625). The date tagged by the system is accurate because ‘Apr 4’ in the dateline and ‘Thursday’ in the text refer to the same day (4 April 2019 was a Thursday). For ‘ORG’, ‘Congress’ is ranked first, although it shares the top weight of 1.0 with ‘Kumar’ and ‘ANI’.
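
The date check above can be reproduced with a simple heuristic that resolves a bare weekday mention against the article's dateline. This sketch assumes that such a mention refers to the most recent day with that name on or before the dateline date. It illustrates the reasoning only and is not necessarily how extractor.ipynb tags dates. The function name is hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_weekday(mention: str, anchor: date) -> date:
    """Map a weekday name to the latest date on or before `anchor` falling on that weekday.

    Assumption: a bare weekday in a news report means a recent past day, so
    'Thursday' in an article datelined on a Thursday is the dateline day itself.
    """
    target = WEEKDAYS.index(mention.strip().lower())  # raises ValueError if not a weekday name
    days_back = (anchor.weekday() - target) % 7       # date.weekday(): Monday == 0
    return anchor - timedelta(days=days_back)
```

With the dateline date(2019, 4, 4), resolve_weekday("Thursday", ...) returns the same day, which matches the tagged date 4/4/2019.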

akshayakp97/TL-DR
