AZOT is the brainchild of W3A's founders, who set out to build something innovative that improves people's everyday lives. Its main purpose is to make data easier to access and to bring out its value through analysis.
This tool automatically classifies online newspaper articles and extracts events from them, which are then organized in chronological order. Events are created in three steps:
- 1 - Collect articles in real time
- 2 - Classify them according to their main topic
- 3 - Shape the events: name the cluster, detect the date and location
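
The classification and shaping steps can be pictured with a small toy example. This is only a sketch and is not the code in clustering_articles.py: it assumes articles are already collected as dictionaries with a title and a date, swaps in a simple greedy grouping based on Jaccard similarity of title terms, and names each cluster after its most frequent terms. Every name in it (STOPWORDS, tokenize, cluster_articles, name_cluster, shape_events) is hypothetical, and location detection is left out.

```python
# Illustrative sketch only: NOT the algorithm used by clustering_articles.py.
# All function names, the stopword list and the threshold are hypothetical.
import re
from collections import Counter

STOPWORDS = {"the", "a", "in", "of", "on", "and", "to"}  # toy stopword list


def tokenize(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[\w-]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two term sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / float(len(a | b))


def cluster_articles(articles, threshold=0.3):
    """Greedily add each article to the first cluster whose first article is similar enough."""
    clusters = []
    for article in articles:
        terms = tokenize(article["title"])
        for cluster in clusters:
            if jaccard(terms, tokenize(cluster[0]["title"])) >= threshold:
                cluster.append(article)
                break
        else:
            # No existing cluster matched: start a new one.
            clusters.append([article])
    return clusters


def name_cluster(cluster, size=2):
    """Name a cluster after its most frequent terms (ties broken alphabetically)."""
    counts = Counter()
    for article in cluster:
        counts.update(tokenize(article["title"]))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:size])


def shape_events(articles):
    """Turn clusters into events and return them in chronological order."""
    events = []
    for cluster in cluster_articles(articles):
        events.append({
            "name": name_cluster(cluster),
            "date": min(a["date"] for a in cluster),  # earliest ISO date in the cluster
            "articles": len(cluster),
        })
    return sorted(events, key=lambda e: e["date"])


# Made-up sample data, for illustration only.
SAMPLE = [
    {"title": "Cyclone warning issued in Saint-Denis", "date": "2018-01-17"},
    {"title": "Cyclone warning lifted in Saint-Denis", "date": "2018-01-19"},
    {"title": "Football cup final postponed", "date": "2018-01-18"},
]

if __name__ == "__main__":
    for event in shape_events(SAMPLE):
        print(event["date"], event["name"], event["articles"])
```

With the sample data, the two cyclone articles end up in one event dated 2018-01-17 and the football article forms a second event. A real pipeline would work on full article text, with the stopword files and language configured in config.ini.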
For the database, there are two options:
- MongoDB, with the mongoengine package and Robomongo for viewing the data.
- CouchDB. The 4 views defined in the file couchdb.views.json must be created in the CouchDB database. Views can be created in Futon, CouchDB's web administration interface.
The following three packages must be installed to avoid SSL issues with Python 2.7 while exploring the sites:
- pyOpenSSL
- ndg-httpsclient
- pyasn1
Configure the system in config.ini:
- The database server: choose between couchdb and mongodb (comment out the unused one)
- The database name: "azotdb" (or whatever you want)
- The path of the stopwords files: data by default
- The language of the source website to be explored: set to "fr" by default
- The path of the log directory
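
As an illustration of how such settings could be read and checked, here is a minimal sketch using Python's standard configuration parser. The section and option names (database.server, data.stopwords_path, and so on) are hypothetical and may not match the project's actual config.ini. The defaults for the database server and the log directory are also assumed for the example; only "azotdb", data and "fr" come from the description above.

```python
# Illustrative sketch only: section and option names are hypothetical and
# may differ from the ones actually used in the project's config.ini.
try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2.7

# (section, option) -> default value
DEFAULTS = {
    ("database", "server"): "mongodb",    # assumed default, for illustration
    ("database", "name"): "azotdb",
    ("data", "stopwords_path"): "data",
    ("source", "language"): "fr",
    ("log", "path"): "log",               # assumed default, for illustration
}


def load_settings(path):
    """Read config.ini and return a flat dict, falling back to DEFAULTS."""
    parser = RawConfigParser()
    parser.read(path)  # a missing file is ignored, so the defaults apply
    settings = {}
    for (section, option), default in DEFAULTS.items():
        key = section + "." + option
        if parser.has_option(section, option):
            settings[key] = parser.get(section, option)
        else:
            settings[key] = default
    if settings["database.server"] not in ("couchdb", "mongodb"):
        raise ValueError("database.server must be 'couchdb' or 'mongodb'")
    return settings


if __name__ == "__main__":
    print(load_settings("config.ini"))
```

In a file laid out this way, the unused database server would be commented out with a leading `;` or `#`, which the parser skips.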
Run the script that collects data from a newspaper site with the following command. It requires the news source (for example https://www.clicanoo.re) as a parameter:
$ python collect_newspaper_article.py https://www.clicanoo.re
To automatically generate the event clusters, run the classification script as follows:
$ python clustering_articles.py