Search engine news crawler for the whowrotethis project.
Latest commit 1fcf819 May 22, 2019
Type Name Latest commit message Commit time
rss_examples Inital draft. May 21, 2019
LICENSE.md
README.md Update README.md May 22, 2019
create_table.sql Inital draft. May 21, 2019
model.py Inital draft. May 21, 2019
news_crawler.py Inital draft. May 21, 2019
persist.py Inital draft. May 21, 2019
persist_test.py Inital draft. May 21, 2019
requirements.txt Inital draft. May 21, 2019
sources.py Inital draft. May 21, 2019
sources_test.py Inital draft. May 21, 2019
strategies.py Inital draft. May 21, 2019
strategies_test.py Inital draft. May 21, 2019
template_method.py Inital draft. May 21, 2019
util.py Inital draft. May 21, 2019

README.md

Who Wrote This News Crawler

Web crawler for news articles from a subset of sources to power an open source news search engine.


Purpose

Used in "Machine Learning Techniques for Detecting Identifying Linguistic Patterns in the News Media" by A Samuel Pottinger, this web crawler parses RSS feeds from a list of news agencies and saves the articles it finds to a SQLite database.
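As a rough illustration of that parse-and-save flow, the sketch below reads an RSS 2.0 document with the standard library and writes each item to SQLite. It is not the project's code: the function names, the `articles_sketch` table, its columns, and the sample feed are all assumptions made for this example. The project's real schema is defined in create_table.sql.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would download this from a news agency.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Agency</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <description>Summary of the first article.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <description>Summary of the second article.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text, source_name):
    """Return one dict per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter('item'):
        articles.append({
            'source': source_name,
            # findtext returns None for a missing child, so fall back to ''.
            'title': (item.findtext('title') or '').strip(),
            'link': (item.findtext('link') or '').strip(),
            'description': (item.findtext('description') or '').strip(),
        })
    return articles


def save_articles(connection, articles):
    """Insert articles into an assumed table, skipping links already stored."""
    # Assumed schema for this sketch only; not taken from create_table.sql.
    connection.execute(
        'CREATE TABLE IF NOT EXISTS articles_sketch ('
        'source TEXT, title TEXT, link TEXT PRIMARY KEY, description TEXT)'
    )
    connection.executemany(
        'INSERT OR IGNORE INTO articles_sketch '
        'VALUES (:source, :title, :link, :description)',
        articles
    )
    connection.commit()
```

Calling `save_articles(sqlite3.connect(':memory:'), parse_rss(SAMPLE_RSS, 'Example Agency'))` exercises both steps without touching disk. Using the link as the primary key is one simple way to avoid storing the same article twice when a feed is crawled repeatedly.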


Environment Setup

This requires Python 3 and pip to be installed for your platform. Once both are available, install the dependencies with $ pip install -r requirements.txt.


Usage

This set of scripts can be run from the command line with $ python news_crawler.py. The crawler writes to articles.db, a SQLite database in the same directory, and expects the table to have already been created using create_table.sql.
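The README does not say how to apply create_table.sql. One possible approach, sketched here with Python's built-in sqlite3 module, is to run the script against articles.db before the first crawl. The `create_tables` helper is a hypothetical name introduced for this example and is not part of the repository.

```python
import sqlite3


def create_tables(db_path, schema_path):
    """Run every statement in a SQL script against a SQLite database file."""
    with open(schema_path) as schema_file:
        schema_sql = schema_file.read()
    # sqlite3.connect creates the database file if it does not exist yet.
    connection = sqlite3.connect(db_path)
    try:
        # executescript accepts multiple semicolon-separated statements.
        connection.executescript(schema_sql)
        connection.commit()
    finally:
        connection.close()


if __name__ == '__main__':
    # File names as given in the README; run from the repository directory.
    create_tables('articles.db', 'create_table.sql')
```

If the script uses plain CREATE TABLE statements, running it twice against the same database would raise an error because the table already exists, so this step is meant to be done once.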


Testing

Some automated tests are available and can be run with $ nosetests.


Development Standards

Please unit test and follow the Google Python Style Guide where possible.


Related Projects

Note that this is in a series of related projects as linked:


Open Source

This application's source is released under the MIT License. The following open source libraries are used internally:
