alg0s/enews

Enews

Enews is software that processes news articles to extract entities and connect them, providing insights into individuals, organizations, and timelines.

Features & Requirements

  • Ability to extract entities from an article
  • An event-based system with tasks:
    • Extract entities
    • Save unique entities
    • Connect articles and entities
  • Support for scheduled extraction jobs
  • Flexible input and output definitions:
    • Input: articles; users can define the designated content columns
    • Output: a pre-existing database if one is provided; otherwise tables are generated in the default database
  • A metadata system that records all activities, i.e. a log system
  • Packaged like Django, with editable settings
  • Support for both SQLite3 and PostgreSQL
  • An interface agreement as part of the settings
  • Extensible with future modules for different languages, such as English and French
  • Written primarily in Go, but the schema could be ported to Python or other languages
  • Data schemas are language-agnostic, meaning they should be stored in YAML or XML
  • Uses an ORM to communicate with databases
  • Uses a queue with multiple workers

Development

Enews is developed as modules that play different roles:

  1. Pipeline

    Pipeline takes in articles and processes them in batches, following the ETL pattern. Its outcome is that entities are extracted from articles and stored in the corresponding database tables.

  2. Graphs

    Graphs takes the information generated by the pipeline and constructs graphs that connect the dots: entities, articles and timelines.

  3. Applications

    Applications take graphs, process them, and produce designed outputs that provide insights into individuals, organizations or timelines.

  4. Metadata

    Metadata stores enews' operational data. Every event that occurs within the system is stored in Metadata's tables in the database for monitoring and quality assurance. Metadata is an independent program within enews.

  5. NLP

    To process articles, enews leverages open-source NLP libraries to label and extract entities from articles in different languages, for instance VnCoreNLP for Vietnamese. Each NLP service runs as a server that is started and coordinated by enews' main program.
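
As a rough illustration of the Pipeline module, the sketch below walks a batch of articles through the three tasks listed under Features: extract entities, save unique entities, and connect articles and entities. All names here (`Article`, `Entity`, `Link`, `Extractor`, `RunPipeline`) are hypothetical and not taken from the enews code base. The `capitalizedWords` extractor is a toy stand-in for a real NLP service such as VnCoreNLP, and in-memory slices stand in for database tables.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Article, Entity and Link are hypothetical types; the real enews schema may differ.
type Article struct {
	ID   int
	Text string
}

type Entity struct {
	Name string
}

// Link connects one article to one entity found in it.
type Link struct {
	ArticleID int
	Entity    string
}

// Extractor is an assumed interface that an NLP service wrapper could satisfy.
type Extractor interface {
	Extract(text string) []Entity
}

// capitalizedWords is a toy extractor: every word starting with an
// uppercase letter is treated as an entity. It is not a real NLP model.
type capitalizedWords struct{}

func (capitalizedWords) Extract(text string) []Entity {
	var out []Entity
	for _, w := range strings.Fields(text) {
		w = strings.Trim(w, ".,;:!?\"'()")
		r, _ := utf8.DecodeRuneInString(w)
		if unicode.IsUpper(r) {
			out = append(out, Entity{Name: w})
		}
	}
	return out
}

// RunPipeline extracts entities from each article, keeps each entity once,
// and records which article mentions which entity.
func RunPipeline(articles []Article, ex Extractor) ([]Entity, []Link) {
	seen := make(map[string]bool)
	var unique []Entity
	var links []Link
	for _, a := range articles {
		linked := make(map[string]bool) // avoid duplicate links within one article
		for _, e := range ex.Extract(a.Text) {
			if !seen[e.Name] {
				seen[e.Name] = true
				unique = append(unique, e)
			}
			if !linked[e.Name] {
				linked[e.Name] = true
				links = append(links, Link{ArticleID: a.ID, Entity: e.Name})
			}
		}
	}
	return unique, links
}

func main() {
	articles := []Article{
		{ID: 1, Text: "Alice met Bob in Hanoi."},
		{ID: 2, Text: "Bob joined Acme."},
	}
	unique, links := RunPipeline(articles, capitalizedWords{})
	fmt.Println("unique entities:", unique)
	fmt.Println("links:", links)
}
```

Keeping the extractor behind an interface is one way to let a module for another language be swapped in without touching the rest of the pipeline.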

Customers

  1. Business Intelligence
  2. Security Intelligence
  3. Investment Institutions
  4. Internet Consumers

Installation

go mod init

Dependencies

  1. Java
  2. sqlc

Usage

  1. Install enews

  2. Configure settings

    • Input database and source table
    • Output database
  3. Run: Enews automatically runs the following procedure:

    • Verify the connection to the Input database and verify the input table
    • Verify the connection to the Output database and verify the output tables
    • Establish the TaskQueue
    • The Controller starts picking up articles from the input table and dropping them into the TaskQueue
    • The Executor picks up tasks from the TaskQueue and distributes them to available workers to process in parallel. The number of workers can be set in Settings
  4. Audit: At the end of the job, Enews audits its work by running the Auditor. The Auditor goes through the log table as well as the output tables to double-check the results. The audit outcome is saved in the table audit_instances
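
The Controller, TaskQueue and worker steps above could be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not the actual enews implementation: `Task`, `Result` and `RunJob` are hypothetical names, a buffered channel plays the role of the TaskQueue, and the workers read from the channel directly rather than going through a separate Executor component. The "processing" is a placeholder that only records which worker handled which article.

```go
package main

import (
	"fmt"
	"sync"
)

// Task and Result are hypothetical types used only for this sketch.
type Task struct {
	ArticleID int
}

type Result struct {
	ArticleID int
	Worker    int
}

// RunJob enqueues one task per article and processes them with numWorkers
// goroutines. It returns one Result per processed task.
func RunJob(articleIDs []int, numWorkers int) []Result {
	queue := make(chan Task, len(articleIDs))     // stands in for the TaskQueue
	results := make(chan Result, len(articleIDs)) // buffered so workers never block

	// Controller: pick up articles and drop them into the queue.
	go func() {
		for _, id := range articleIDs {
			queue <- Task{ArticleID: id}
		}
		close(queue)
	}()

	// Workers: each one pulls tasks from the queue until it is closed.
	var wg sync.WaitGroup
	for w := 1; w <= numWorkers; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for t := range queue {
				// Placeholder for entity extraction on the article.
				results <- Result{ArticleID: t.ArticleID, Worker: worker}
			}
		}(w)
	}

	wg.Wait()
	close(results)

	var out []Result
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range RunJob([]int{1, 2, 3, 4, 5}, 3) {
		fmt.Printf("article %d processed by worker %d\n", r.ArticleID, r.Worker)
	}
}
```

The order of the results and the worker assigned to each article vary between runs, because the Go scheduler decides which worker receives each task.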

Scheduled Job

Enews provides a scheduling feature to automatically run extraction jobs on an hourly or daily basis.
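
One possible shape for such a scheduler in Go is a `time.Ticker` loop, sketched below. `RunScheduled` and `demoRuns` are hypothetical names, and the README does not specify how enews implements scheduling. The demo uses a millisecond interval so that it finishes quickly; an hourly or daily job would pass `time.Hour` or `24 * time.Hour` instead.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// RunScheduled calls job once per interval until ctx is cancelled.
func RunScheduled(ctx context.Context, interval time.Duration, job func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			job()
		}
	}
}

// demoRuns runs a placeholder job on a short interval for a fixed time
// and returns how many times the job was triggered.
func demoRuns() int {
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	runs := 0
	RunScheduled(ctx, 20*time.Millisecond, func() { runs++ })
	return runs
}

func main() {
	fmt.Println("extraction job ran", demoRuns(), "times")
}
```

A ticker fires relative to when it was started, so a job that must run at fixed wall-clock times (for example, at the top of every hour) would need additional logic to compute the first delay.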

Automatic Retries

TBA

Configuring Logging

TBA

Test

References
