Python wrapper for churnalism.sunlightfoundation.com

PLEASE TAKE NOTE:

Sunlight is a very smart, technically savvy organization and designed this site to NOT be accessed as an API, most likely because of the computational cost of providing such a service. Please use this script lightly: no multi-threading, and no back-to-back requests without a brief nap in between.

What does this do?

This Python function submits URLs or text to churnalism.sunlightfoundation.com, determines whether they have been "churned", and returns the URL(s) of the original content, the number of characters the documents share, and a link to compare the documents side by side on the Sunlight Foundation's website.

Requirements

This function is built on top of three Python libraries: selenium, pyvirtualdisplay, and BeautifulSoup. If you have pip installed on your computer, you can install all three by typing the following into your terminal:

pip install selenium pyvirtualdisplay BeautifulSoup

Then download this repo to a directory on your PYTHONPATH.

Usage:

with a url:

from churnalism import churnalism
churnalism(url="http://www.cbsnews.com/8301-204_162-57526084/moms-bpa-levels-linked-to-sons-thyroid-problems/")

with a text blob:

from churnalism import churnalism
article_text = "Researchers found that every doubling of BPA levels in pregnant moms was tied to a decrease of 0.13 micrograms per deciliter of total thyroxine (T4), meaning their thyroids were less active. Boys whose mother's had doubled their BPA had a 9.9 percent decrease in thyroid stimulating hormone (TSH), meaning their thyroid was overactive."
churnalism(text=article_text)

from the command line (URLs only):

python churnalism.py "http://www.cbsnews.com/8301-204_162-57526084/moms-bpa-levels-linked-to-sons-thyroid-problems/" > output.json
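
To stay within the "use lightly" request above when checking more than one URL, the calls can be made one at a time with a nap between them. This is only a minimal sketch: check_politely, pause_seconds, and the five-second default are hypothetical names and values chosen for illustration, not part of this repo.

```python
import time


def check_politely(urls, check, pause_seconds=5.0, sleep=time.sleep):
    """Call check(url=...) for each URL in turn, napping between requests.

    `check` is expected to be the churnalism function from this repo.
    `pause_seconds` is an arbitrary illustrative default, not a limit
    published by Sunlight.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Nap before every request except the first one.
            sleep(pause_seconds)
        results.append(check(url=url))
    return results
```

With the repo on your PYTHONPATH, this could be called as check_politely(list_of_urls, churnalism) after "from churnalism import churnalism". Passing the function in as an argument keeps the loop strictly serial and makes it possible to try the loop without touching the site.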

Returns:

compare_url: A unique URL on Sunlight's site for viewing this churn request
input_url: The submitted URL (only returned if a URL is submitted)
input_text: The submitted text (only returned if text is submitted)
matched: True or False - whether or not churnalism found a match
matched_chars: Number of matched characters (only returned if "matched" is True)
matched_urls: A list of matched URLs (only returned if "matched" is True)
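
As a rough sketch of how these fields might be consumed, the helper below builds a one-line summary from a result. It assumes the result is a dict keyed by the names listed above (the command-line usage writes JSON, which suggests this, but it is an assumption); summarize is a hypothetical name and not part of this repo.

```python
def summarize(result):
    """Build a one-line summary from a churnalism result.

    Assumes `result` is a dict with the keys listed above. Keys that are
    only returned in some cases are read with .get() so a missing key
    does not raise an error.
    """
    source = result.get("input_url") or "submitted text"
    if not result.get("matched"):
        return "%s: no match found" % source
    urls = result.get("matched_urls", [])
    return "%s: %s characters shared with %s (compare at %s)" % (
        source,
        result.get("matched_chars"),
        ", ".join(urls),
        result.get("compare_url"),
    )
```

For example, print(summarize(churnalism(url=some_url))) would print either a "no match found" line or the shared character count, the matched URLs, and the comparison link.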