PANFeed is a service which creates personalised news feeds by combining keyword identification in posts with a hotness metric, so that users receive news which is both current and relevant to their listed interests. Keywords are extracted with TF-IDF, which weighs how often a word appears in an individual blog post against how common it is across the whole corpus. Hotness is a modifier based on the age of the post.
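The ranking described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than PANFeed's own code: `tfidf_keywords` uses a common smoothed TF-IDF variant, and because the exact hotness formula is not specified here, `hotness` assumes exponential decay with an illustrative 24-hour half-life. Combining the two by multiplying interest matches by hotness (`score`) is likewise an assumption.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta, timezone


def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(post, corpus, top_n=5):
    """Return the top_n words of `post`, ranked by TF-IDF against `corpus`."""
    words = tokenize(post)
    counts = Counter(words)
    docs = [set(tokenize(doc)) for doc in corpus]
    scores = {}
    for word, count in counts.items():
        tf = count / len(words)                         # frequency within this post
        df = sum(1 for doc in docs if word in doc)      # posts containing the word
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF
        scores[word] = tf * idf
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def hotness(published, now, half_life_hours=24.0):
    """Hypothetical decay: hotness halves every `half_life_hours`."""
    age_hours = (now - published).total_seconds() / 3600
    return 0.5 ** (age_hours / half_life_hours)


def score(post, published, interests, corpus, now):
    """Hypothetical ranking: interest matches among keywords, scaled by hotness."""
    keywords = set(tfidf_keywords(post, corpus))
    return len(keywords & set(interests)) * hotness(published, now)


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog ran", "the solar solar panel"]
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(tfidf_keywords(corpus[2], corpus, top_n=2))  # ['solar', 'panel']
    print(score(corpus[2], now - timedelta(hours=24), ["solar"], corpus, now))  # 0.5
```

"the" appears in every post, so its IDF is low and it ranks below "solar" and "panel" even though it occurs in the post. A day-old post that matches one interest keeps half of its score under the assumed half-life.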
PANFeed has a simple web interface for submitting feeds and also ships with a web spider that crawls domains in search of RSS and Atom feeds to process. There is also a bin script which should be run frequently to pull new posts in from existing feeds.
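One common way for a spider to spot feeds on a page is RSS/Atom autodiscovery: sites advertise their feeds with `<link rel="alternate" type="application/rss+xml">` (or `application/atom+xml`) tags. The sketch below, using only the standard library's `html.parser`, shows that per-page step under the assumption that PANFeed's spider works this way; the names `FeedLinkParser` and `find_feeds` are hypothetical, and fetching pages and following links across the domain are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised with <link rel="alternate" type="...">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        feed_type = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and feed_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the absolute URLs of all feeds advertised in `html`."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds


if __name__ == "__main__":
    page = '<head><link rel="alternate" type="application/rss+xml" href="/rss.xml"></head>'
    print(find_feeds(page, "https://example.org/blog/"))  # ['https://example.org/rss.xml']
```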
For installation instructions, see the INSTALL file.
For changes and version history, see the CHANGELOG file.
PANFeed is made available under the GNU GPL and is copyright of the University of Southampton. For more information see the LICENCE file.
Features:
- Django-based website for creating customised feeds and submitting feeds to be crawled
- Python-based web crawler for crawling a domain to find all of its RSS and Atom feeds
- Python script to crawl the captured feeds and index them into the corpus
- generates custom feeds suitable for blog readers and personal magazines
Planned features:
- create a statistics dashboard for each feed
- create a statistics dashboard for each domain
- create a statistics dashboard for the most popular keywords used to generate feeds
- add a way to request domain spidering through the web interface
- add an image to every post to make it more engaging in Flipboard and Pulse
- allow users to create domain-specific feeds
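The indexing script listed above has to pull individual posts out of both RSS and Atom documents before their text can be added to the corpus. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming well-formed RSS 2.0 or Atom 1.0 input; `parse_feed` and its (title, link, text) tuples are hypothetical, not PANFeed's actual code. Real-world feeds are often malformed, so a production crawler would more likely rely on a tolerant dedicated library such as feedparser.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # ElementTree's namespaced-tag prefix


def parse_feed(xml_text):
    """Return (title, link, text) tuples for each post in an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    posts = []
    if root.tag == "rss":
        for item in root.iter("item"):
            posts.append((
                item.findtext("title", default=""),
                item.findtext("link", default=""),
                item.findtext("description", default=""),
            ))
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            # Atom entries may carry several links; prefer rel="alternate",
            # which is also the default when rel is omitted.
            href = ""
            for link in entry.findall(ATOM + "link"):
                if link.get("rel", "alternate") == "alternate":
                    href = link.get("href", "")
                    break
            text = (entry.findtext(ATOM + "summary", default="")
                    or entry.findtext(ATOM + "content", default=""))
            posts.append((entry.findtext(ATOM + "title", default=""), href, text))
    return posts  # empty for unrecognised document types


if __name__ == "__main__":
    rss = ("<rss version='2.0'><channel><item><title>A</title>"
           "<link>http://example.org/a</link><description>Solar panels</description>"
           "</item></channel></rss>")
    print(parse_feed(rss))  # [('A', 'http://example.org/a', 'Solar panels')]
```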