azotdata/azot-event-extractor

AZOT

Motivation

AZOT grew out of a shared idea among W3A's founders, who want to build something innovative that improves people's everyday lives. Its main purpose is to make data easier to access and to bring out its value through analysis.

Synopsis

This tool automatically classifies online newspaper articles and extracts events from them, which are then organized in chronological order. Events are created in three steps:

  • 1 - Collect articles in real time
  • 2 - Classify them according to their main topic
  • 3 - Shape the events: name each cluster and detect its date and location
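
The three steps above can be illustrated with a toy sketch. It is not the repository's implementation: the sample articles, the tiny stopword list, the `tokens`, `jaccard`, `cluster_articles` and `shape_events` helpers, the 0.1 similarity threshold, and the greedy grouping strategy are all assumptions made for illustration. Step 1 (collection) is replaced by a hard-coded list, and location detection is left out.

```python
from __future__ import division  # keeps the similarity ratio a float on Python 2.7

from collections import Counter
from datetime import date

# Hypothetical stand-in for step 1: articles that were already collected.
ARTICLES = [
    {"title": "Cyclone warning issued for the island", "date": date(2017, 1, 12)},
    {"title": "Cyclone damage closes coastal roads", "date": date(2017, 1, 14)},
    {"title": "Local football team wins cup final", "date": date(2017, 1, 10)},
]

STOPWORDS = {"for", "the", "of", "a", "in"}  # assumed, deliberately tiny


def tokens(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_articles(articles, threshold=0.1):
    """Step 2: greedily add each article to the first similar-enough cluster."""
    clusters = []
    for art in articles:
        toks = tokens(art["title"])
        for cl in clusters:
            if jaccard(toks, cl["tokens"]) >= threshold:
                cl["articles"].append(art)
                cl["tokens"] |= toks
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append({"tokens": set(toks), "articles": [art]})
    return clusters


def shape_events(clusters):
    """Step 3: name and date each cluster, then sort events chronologically."""
    events = []
    for cl in clusters:
        counts = Counter(w for a in cl["articles"] for w in tokens(a["title"]))
        # Most frequent word wins; ties are broken alphabetically.
        name = min(counts, key=lambda w: (-counts[w], w))
        events.append({
            "name": name,
            "date": min(a["date"] for a in cl["articles"]),
            "size": len(cl["articles"]),
        })
    return sorted(events, key=lambda e: e["date"])


events = shape_events(cluster_articles(ARTICLES))
```

A real classifier would likely rely on a richer text representation than title-word overlap; the sketch only shows how collected articles become named, dated events in chronological order.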

Installation

Prerequisites (these instructions are for Ubuntu):

  • First, install newspaper (see the newspaper documentation)

  • nltk and its corpora

  • For the database, there are two possible choices:
    - mongodb, with the mongoengine plugin and Robomongo for viewing the data.
    - couchdb. You must create the 4 views defined in the file couchdb.views.json in the couchdb database. See the couchdb documentation on creating views in Futon.

  • These three packages must be installed to work around SSL issues with Python 2.7 while exploring the sites:
    - pyOpenSSL
    - ndg-httpsclient
    - pyasn1
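
As a quick sanity check after installing the prerequisites, a small script along these lines could report which Python modules are still missing. The import names in `REQUIRED_MODULES` are assumptions (the usual module names for these packages, not something this README states), and `missing_modules` is a hypothetical helper; the database driver for your chosen server would need to be added to the list.

```python
import importlib

# Assumed import names: pyOpenSSL is imported as "OpenSSL" and
# ndg-httpsclient as "ndg.httpsclient".
REQUIRED_MODULES = ["newspaper", "nltk", "OpenSSL", "ndg.httpsclient", "pyasn1"]


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED_MODULES):
        print("missing: " + name)
```

The script prints nothing when every module imports cleanly.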

Running the code

  • Configure the system in config.ini:
    The database server: choose between couchdb and mongodb (comment out the unused one)
    The database name: "azotdb" (or any name you prefer)
    The path to the stopwords files: data by default
    The language of the website to be explored: "fr" by default
    The path to the log directory

  • Run the script that collects data from a newspaper site with the following command. It requires the news source (for example, https://www.clicanoo.re) as a parameter:

    $ python collect_newspaper_article.py https://www.clicanoo.re

  • To automatically generate the event clusters, run the classification script as follows:

    $ python clustering_articles.py
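
To make the config.ini settings listed above more concrete, here is a minimal sketch of how such a file might be laid out and read. The section names, key names, and the `logs` path are invented for illustration and may not match the repository's actual file; `load_settings` is a hypothetical helper. The code targets Python 3's `configparser` (the module is named `ConfigParser` on Python 2.7 and lacks `read_string` there).

```python
import configparser

# Hypothetical config.ini contents: section and key names are assumptions.
EXAMPLE_CONFIG = """
[database]
# server = mongodb
server = couchdb
name = azotdb

[paths]
stopwords = data
logs = logs

[source]
language = fr
"""

SUPPORTED_SERVERS = ("couchdb", "mongodb")


def load_settings(text):
    """Parse the ini text and return the settings as a flat dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    server = parser.get("database", "server")
    if server not in SUPPORTED_SERVERS:
        raise ValueError("unsupported database server: " + server)
    return {
        "server": server,
        "db_name": parser.get("database", "name"),
        "stopwords_path": parser.get("paths", "stopwords"),
        "log_dir": parser.get("paths", "logs"),
        "language": parser.get("source", "language"),
    }


settings = load_settings(EXAMPLE_CONFIG)
```

Here the unused database server is disabled by commenting out its line, so exactly one `server` entry is active.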

About

Azot is a web platform that will provide better access to big data. Let's work together on the project!
