
newslynx/lauteur


lauteur

Tools for ascribing authorship - to the chagrin of Barthes

NOTE: All functionality of this library has been improved upon and ported to Newslynx V2, specifically this module.

Install

pip install lauteur

Test

The test suite requires nose (pip install nose). Run it with:

nosetests

Usage

lauteur exposes two functions: from_string and from_html.

from_string is mostly used when parsing RSS feeds, where authors are often embedded in a byline string:

```python
import lauteur

# A messy byline: mixed separators, an honorific, initials and a suffix.
byline = 'By: Brian Abelson ,and Michael H. Keller & Dr. Stijn Debrouwere IV'
authors = lauteur.from_string(byline)
print(authors)
# ['Brian Abelson', 'Michael H Keller', 'DR Stijn Debrouwere IV']
```
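The splitting step can be sketched roughly as follows. This is not lauteur's implementation: split_byline is a hypothetical helper, and the separator set (commas, ampersands and the standalone word "and") plus the "By"/"By:" prefix handling are assumptions chosen to cover the example byline above. It deliberately leaves name normalization (stripping periods, fixing capitalization) to a separate step. Written for Python 3.

```python
import re

# Hypothetical helper, NOT lauteur's actual code.
# Assumed prefix: a leading "By" or "By:" (the \b keeps names like "Byron" intact).
_PREFIX = re.compile(r'^\s*by\b\s*:?\s*', re.IGNORECASE)
# Assumed separators: commas, ampersands, and "and" as a whole word
# (so names such as "Anderson" or "Sandra" are not split).
_SEPARATORS = re.compile(r'\s*(?:,|&|\band\b)\s*', re.IGNORECASE)


def split_byline(byline):
    """Split a byline string into a list of raw author-name strings."""
    text = _PREFIX.sub('', byline)
    parts = _SEPARATORS.split(text)
    # Adjacent separators (e.g. ",and") leave empty pieces; drop them and
    # collapse any internal runs of whitespace.
    return [' '.join(p.split()) for p in parts if p.strip()]


if __name__ == '__main__':
    print(split_byline('By: Brian Abelson ,and Michael H. Keller & Dr. Stijn Debrouwere IV'))
```

On the example byline this prints ['Brian Abelson', 'Michael H. Keller', 'Dr. Stijn Debrouwere IV']: the names are separated correctly but keep their original punctuation, unlike lauteur's formatted output.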

from_html searches an HTML document's common meta tags for author names:

```python
import lauteur
import requests

r = requests.get('http://www.nytimes.com/2013/12/20/books/michiko-kakutanis-10-favorite-books-of-2013.html')

# Pass the raw page body; lauteur looks for authors in its meta tags.
authors = lauteur.from_html(r.content)
print(authors)
# ['Michiko Kakutani']
```
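A meta-tag search of this kind might look roughly like the sketch below, using only Python 3's standard html.parser. The tag list in AUTHOR_META_KEYS (name="author", name="byl", property="article:author", name="dc.creator") is an illustrative assumption, not the set lauteur actually checks, and author_meta_candidates is a hypothetical helper that returns raw strings without any name formatting.

```python
from html.parser import HTMLParser

# Assumed (attribute, value) pairs that commonly carry author names.
# Illustrative only; NOT the list lauteur uses.
AUTHOR_META_KEYS = {
    ('name', 'author'),
    ('name', 'byl'),
    ('property', 'article:author'),
    ('name', 'dc.creator'),
}


class AuthorMetaParser(HTMLParser):
    """Collect the content of <meta> tags whose name/property is in AUTHOR_META_KEYS."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != 'meta':
            return
        attrs = dict(attrs)
        content = (attrs.get('content') or '').strip()
        if not content:
            return
        for attr in ('name', 'property'):
            value = (attrs.get(attr) or '').lower()
            if (attr, value) in AUTHOR_META_KEYS:
                self.candidates.append(content)
                break


def author_meta_candidates(html):
    """Return raw author strings found in meta tags, in document order."""
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

HTMLParser.feed expects text, so with requests you would pass r.text rather than the bytes in r.content. Self-closing tags such as <meta ... /> are handled too, because the parser's default handle_startendtag calls handle_starttag.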

Todo:

  • Find more meta tags.
  • Improve name formatting algorithm.
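As one possible direction for the name-formatting item, the sketch below normalizes a single raw name: it drops a leading "By", removes periods, title-cases all-caps words and upper-cases a few Roman-numeral suffixes. format_name and the ROMAN_SUFFIXES set are hypothetical choices for illustration, and the rules differ from lauteur's current output (for example, "Dr." becomes "Dr" here rather than "DR").

```python
# Hypothetical name normalizer, NOT lauteur's algorithm.
# Assumed suffix list; kept short to avoid clashing with real names.
ROMAN_SUFFIXES = {'II', 'III', 'IV'}


def format_name(raw):
    """Normalize one raw author name, e.g. 'By MICHIKO KAKUTANI' -> 'Michiko Kakutani'."""
    # Treat periods as spaces ("H." -> "H") and collapse whitespace.
    tokens = raw.replace('.', ' ').split()
    # Drop a leading "By" left over from a byline or meta tag.
    if tokens and tokens[0].lower() == 'by':
        tokens = tokens[1:]
    out = []
    for tok in tokens:
        if tok.upper() in ROMAN_SUFFIXES:
            out.append(tok.upper())
        elif tok.isupper() and len(tok) > 1:
            # All-caps words become title case; single-letter initials are kept.
            out.append(tok.capitalize())
        else:
            out.append(tok)
    return ' '.join(out)
```

A known limitation of this simple rule: str.capitalize lowercases everything after the first character, so "O'BRIEN" becomes "O'brien".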
