


Scrape news articles, output extracted article text and normalized/processed ngrams

## Requirements


  • Any environment that runs Java 7, e.g. Mac OS X, Linux, or Windows.
  • ~300MB of free disk space (the dependencies are large).
  • phantomjs, installed and on your system path. You should be able to run phantomjs from any directory; if you cannot, handytrowel will complain that it cannot find phantomjs.
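
To check the phantomjs requirement before running handytrowel, a small Java program can scan the directories on the path for the executable. This is only a sketch: the class name `PhantomJsCheck` and the method `isOnPath` are hypothetical and not part of handytrowel, and handytrowel's own lookup may work differently.

```java
import java.io.File;

public class PhantomJsCheck {

    /**
     * Returns true if any directory listed in pathValue contains an
     * executable file with one of the given names.
     * (Hypothetical helper, not part of handytrowel.)
     */
    static boolean isOnPath(String pathValue, String... names) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : names) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Check both the Unix-style and the Windows-style file name.
        boolean found = isOnPath(System.getenv("PATH"), "phantomjs", "phantomjs.exe");
        System.out.println(found
                ? "phantomjs found on PATH"
                : "phantomjs NOT found on PATH");
    }
}
```

If this reports that phantomjs is not found, add the directory containing the phantomjs binary to your path and try again.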


# Clone the repository
git clone
cd handytrowel

# Build the executable artifacts
./gradlew installApp

# Run it, point it at a news article URL
build/install/handytrowel/bin/handytrowel \

# Output from the above command
# - Selenium currently writes noisy logs to stderr, and I'm still working
#   out how to suppress them. If you tee the output to a file, or just
#   look at the end of the output, you'll see only the JSON.
  "tokens" : [ "polio", "erad", "prompt", "global", "warn", "insid", "photo", "continu", "main", "alarm", "spread", "polio", "fragil", "countri", "organ", "declar", "global", "emerg", "monday", "regul", "permit", "adopt", "NUMBER", "NUMBER-year", "campaign", "vaccin", "billion", "paralyz", "virus", "erad", "offici", "goal", "evapor", "swift", "action", "pakistan", "syria", "cameroon", "recent", "virus", "spread", "afghanistan", "iraq", "equatori", "guinea", "respect", "extraordinari", "measur", "organ", "track", "terribl", "happen", "gregori", "hartl", "w.h.o.", "spokesman", "re", "pakistani", "syrian", "cameroonian", "ve", "act", "declar", "effect", "impos", "travel", "restrict", "countri", "repres", "newli", "aggress", "stanc", "organ", "pressur", "member", "state", "demand", "consequ", "epidem", "rage", "insid", "border", "slip", "photo", "sakhina", "NUMBER-year-old", "kabul", "contract", "polio", "confirm", "capit", "NUMBER", "previous", "pakistan", "taxi", "driver", "travel", "frequent", "tribal", "area", "credit", "diego", "ibarra", "sanchez", "york", "fundament", "shift", "program", "dr.", "bruce", "aylward", "organ", "chief", "polio", "erad", "countri", "signal", "toler", "spread", "virus", "countri", "emerg", "declar", "total", "case", "year", "relat", "NUMBER", "april", "NUMBER", "compar", "NUMBER", "date", "year", "alarm", "expert", "mr.", "hartl", "virus", "normal", "transmiss", "season", "januari", "april", "case", "move", "central", "african", "republ", "south", "sudan", "ukrain", "rebecca", "martin", "director", "global", "immun", "center", "diseas", "control", "prevent", "provid", "expertis", "erad", "campaign", "NUMBER", "continu", "main", "fight", "virus", "normal", "includ", "round", "vaccin", "target", "unusu", "agenc", "resid", "pakistan", "syria", "cameroon", "age", "vaccin", "travel", "abroad", "restrict", "retain", "year", "export", "photo", "patient", "receiv", "therapi", "kabul", "credit", "diego", "ibarra", "sanchez", "york", "countri", 
"encourag", "would-b", "travel", "vaccin", "afghanistan", "equatori", "guinea", "ethiopia", "iraq", "israel", "nigeria", "somalia", "israel", "confirm", "case", "diseas", "pakistan", "strain", "virus", "detect", "sewag", "tel", "aviv", "elsewher", "w.h.o.", "enforc", "regul", "NUMBER", "global", "treati", "parti", "ensur", "recommend", "appli", "pakistan", "syria", "cameroon", "encourag", "countri", "document", "refus", "admit", "migrant", "visitor", "travel", "lack", "vaccin", "card", "polio", "poliomyleti", "high", "contagi", "virus", "spread", "fece", "NUMBER", "caus", "symptom", "hardest-hit", "victim", "paralyz", "kill", "carrier", "confirm", "outbreak", "cure", "photo", "continu", "main", "advertis", "unlik", "influenza", "winter", "virus", "polio", "thrive", "case", "start", "summer", "explod", "monsoon", "rain", "summer", "heat", "flood", "sewage-chok", "gutter", "bath", "romp", "virus", "pick", "touch", "ball", "finger", "diseas", "primarili", "strike", "evid", "mount", "cross", "border", "adult", "carrier", "trader", "smuggler", "migrant", "worker", "NUMBER", "year", "NUMBER", "infect", "pakistan", "riskiest", "dr.", "aylward", "polio", "elimin", "taliban", "faction", "forbidden", "vaccin", "north", "waziristan", "elsewher", "murder", "vaccin", "team", "syria", "confirm", "polio", "year", "NUMBER", "case", "octob", "NUMBER", "upris", "NUMBER", "syria", "NUMBER", "percent", "vaccin", "rate", "rapid", "war-torn", "area", "NUMBER,NUMBER", "area", "block", "govern", "danger", "reach", "unit", "nation", "fund", "gabon", "organ", "syrian", "case", "year", "pakistan", "strain", "egypt", "year", "israel", "larg", "bedouin", "desert", "elsewher", "syria", "unclear", "april", "syrian", "refuge", "camp", "iraq", "despit", "extens", "vaccin", "campaign", "camp", "lebanon", "jordan", "turkey", "elsewher", "fortun", "refuge", "camp", "mr.", "hartl", "syrian", "flee", "massacr", "bomb", "absurd", "produc", "vaccin", "card", "critic", "cameroon", "outbreak", "strain", 
"nigeria", "previous", "case", "year", "pakistan", "islam", "terrorist", "group", "nigeria", "kill", "vaccin", "nonetheless", "multipl", "vaccin", "round", "reduc", "problem", "cameroon", "equatori", "guinea", "african", "countri", "vulner", "routin", "immun", "rate", "equatori", "guinea", "NUMBER", "percent", "protect", "dr.", "martin", "photo", "anni", "gul", "contract", "polio", "march", "credit", "diego", "ibarra", "sanchez", "york", "unclear", "travel", "restrict", "economi", "affect", "countri", "pakistan", "vaccin", "booth", "highway", "enter", "afghanistan", "china", "iran", "pakistan", "minist", "saira", "afzal", "tarar", "offic", "recommend", "vaccin", "travel", "intern", "airport", "board", "w.h.o.", "call", "vaccin", "travel", "emerg", "express", "disappoint", "restrict", "due", "tribal", "region", "face", "extraordinari", "challeng", "NUMBER", "enorm", "progress", "toward", "elimin", "polio", "india", "million", "case", "monday", "emerg", "declar", "alert", "donor", "pressur", "affect", "countri", "organ", "vaccin", "drive", "mr.", "hartl", "recruit", "train", "hundr", "thousand", "vaccin", "send", "million", "dose", "vaccin", "usual", "pack", "ice", "foam", "plastic", "vaccin", "carri", "strap", "huge", "logist", "undertak", "vaccin", "villag", "citi", "approach", "passeng", "railway", "station", "buse", "car", "toll", "plaza", "traffic", "circl", "ideal", "vaccin", "month", "entail", "conflict", "local", "opposit", "struggl", "issu", "includ", "get", "vaccin", "job", "usual", "NUMBER", "NUMBER", "control", "gas", "minibus", "team", "villag", "report", "contribut", "ann", "barnard", "dan", "bilefski", "rick", "gladston", "salman", "masood", "version", "articl", "appear", "print", "NUMBER", "NUMBER", "page", "aNUMBER", "york", "edit", "headlin", "polio", "erad", "prompt", "global", "warn", "reprint", "today", "subscrib" ],
  "links" : [ "", "", "#story-continues-2", "", "#story-continues-4", " Donald G. McNeil Jr.", "", "" ],
  "extractedBody" : "Polio’s Return After Near Eradication Prompts a Global Health Warning\nInside\nPhoto\nContinue reading the main story\nAlarmed by the spread of polio to several fragile countries, the World Health Organization declared a global health emergency on Monday for only the second time since regulations permitting it to do so were adopted in 2007.\nJust two years ago — after a 25-year campaign that vaccinated billions of children — the paralyzing virus was near eradication; now health officials say that goal could evaporate if swift action is not taken.\nPakistan, Syria and Cameroon have recently allowed the virus to spread — to Afghanistan, Iraq and Equatorial Guinea, respectively — and should take extraordinary measures to stop it, the health organization said.\n“Things are going in the wrong direction and have to get back on track before something terrible happens,” said Gregory Hartl, a W.H.O. spokesman. “So we’re saying to the Pakistanis, the Syrians and the Cameroonians, ‘You’ve really got to get your acts together.’ ”\nThe declaration, which effectively imposes travel restrictions on the three countries, represented a newly aggressive stance by the health organization. In the past, it has often bent to pressure from member states demanding no consequences even as epidemics raged inside their borders and sometimes slipped over them.\nPhoto\nSakhina, a 3-year-old girl from Kabul, has contracted polio, the first confirmed case in the capital in 12 years. Her family previously lived in Pakistan and her father is a taxi driver who travels frequently to the tribal areas. Credit Diego Ibarra Sanchez for The New York Times\n“This is a fundamental shift in the program,” said Dr. Bruce Aylward, the organization’s chief of polio eradication. 
“This is the countries of the world signaling that they will no longer tolerate the spread of the virus from the countries that aren’t finished.”\nThe emergency was declared though the total number of known cases this year is still relatively small: 68 as of April 30, compared with 24 by that date last year.\nWhat most alarmed experts, Mr. Hartl said, was that the virus was on the move during what is normally the low transmission season from January to April.\n“What we don’t want is cases moving into places like the Central African Republic, South Sudan or the Ukraine,” said Rebecca M. Martin, director of global immunization for the Centers for Disease Control and Prevention , which has provided money and expertise to the eradication campaign since it began in 1988.\nContinue reading the main story\nFighting the virus normally includes several rounds of vaccination of all young children in a target country. But, in an unusual step, the agency also said that all residents of Pakistan, Syria and Cameroon, of all ages, should be vaccinated before traveling abroad, and that this restriction should be retained until one year after the last “exported case.”\nPhoto\nA patient receiving therapy in Kabul. Credit Diego Ibarra Sanchez for The New York Times\nIt also said another seven countries should “encourage” all their would-be travelers to get vaccinated. Those are Afghanistan, Equatorial Guinea, Ethiopia, Iraq, Israel, Nigeria and Somalia.\nIsrael has had no confirmed human cases of the disease, but a Pakistan strain of the virus has been detected in sewage in Tel Aviv and elsewhere.\nWhile the W.H.O. has no enforcement power, the regulations are part of a 2007 global health treaty saying all parties “should ensure” that steps it recommends are taken. That applies to Pakistan, Syria and Cameroon. 
The other seven only need to “encourage” those steps.\nBut countries could use the document to refuse to admit migrants, visitors or even business travelers who lack vaccination cards.\nPolio, short for poliomyletis, is a highly contagious virus spread in feces; although only one case in 200 causes symptoms, the hardest-hit victims can be paralyzed or killed. With so many silent carriers, even one confirmed case is considered a serious outbreak. There is no cure.\nPhoto\nContinue reading the main story\nAdvertisement\nUnlike influenza or other winter viruses, polio thrives in hot weather. Cases start rising in the summer and often explode when the monsoon rains break the summer heat, flooding sewage-choked gutters and bathing the feet of romping children with virus, which they pick up by touching their feet or a ball and then putting a finger in a mouth.\nThough the disease primarily strikes children, evidence has mounted that it also crosses borders in adult carriers, such as traders, smugglers and migrant workers.\nWith 54 of this year’s 68 new infections, Pakistan is by far the riskiest country, Dr. Aylward said. Polio has never been eliminated there, Taliban factions have forbidden vaccinations in North Waziristan for years, and those elsewhere have murdered vaccine teams.\nSyria has had only one confirmed case of polio this year, but it had 13 cases last October, the first in the country since 1999.\nBefore the uprising began in 2011, Syria had a 90 percent vaccination rate, but it fell rapidly in war-torn areas. About 300,000 children are in areas blocked off by the government or too dangerous to reach, according to the United Nations Children’s Fund .\nGABON\nWorld Health Organization\nThe Syrian cases from last year were of the Pakistan strain, which was found in Egypt last year, then moved into Israel, first in a largely Bedouin desert town, then elsewhere. 
How it reached Syria is unclear, but in April it was found in a Syrian refugee camp in Iraq, despite extensive vaccination campaigns in camps in Lebanon, Jordan, Turkey and elsewhere.\n“Fortunately, it’s pretty easy to do in refugee camps,” Mr. Hartl said.\nWith Syrians fleeing massacres and bombings, it seems absurd to make them stop and produce vaccination cards, critics said.\nCameroon’s outbreak is of a strain from Nigeria, which previously had more cases than any country in the world but which has had only two so far this year. As in Pakistan, Islamic terrorist groups in Nigeria have killed vaccinators. Nonetheless, multiple vaccination rounds have reduced the problem.\nCameroon, Equatorial Guinea and other African countries are all vulnerable because their routine immunization rates are so low; in Equatorial Guinea, only 26 percent of all children are protected, Dr. Martin said\nPhoto\nAnnis Gul contracted polio in March. Credit Diego Ibarra Sanchez for The New York Times\nIt is unclear whether the new travel restrictions will hurt the economies of the affected countries. Pakistan already has vaccination booths where its highways enter Afghanistan, China and Iran.\nPakistan’s health minister, Saira Afzal Tarar, said her office had recommended vaccinating travelers at the country’s five international airports before they board. (The W.H.O. calls for vaccination at least four weeks before traveling, except in emergencies.)\nShe expressed her disappointment at the restrictions, saying, “We have been doing whatever we can, but due to the law and order situation in our country, especially in the two tribal regions, we are facing extraordinary challenges.”\nUntil 2012, the world was making enormous progress toward eliminating polio. India, which once had millions of cases, had its last three years ago. Monday’s emergency was declared both to alert donors and to pressure the affected countries to organize vaccination drives, Mr. 
Hartl said.\nThat means recruiting and training hundreds of thousands of vaccinators, and sending them into the field with millions of doses of vaccine, which must be kept cold, usually by packing them on ice in a foam plastic box each vaccinator carries on a shoulder strap.\nIt is a huge logistical undertaking. Vaccinators go door to door in villages and cities, approach passengers at railway stations and on buses, and walk up to cars at toll plazas and in traffic circles. The ideal is to vaccinate every child in the country several times, with a month or so between each round.\nIt also entails many conflicts. Even when there is no local opposition, there are struggles over issues including who gets the vaccinator jobs, which usually pay $2 to $5 a day, and who controls the gas money for minibuses taking teams to villages.\nReporting was contributed by Anne Barnard, Dan Bilefsky, Rick Gladstone and Salman Masood.\nA version of this article appears in print on May 6, 2014, on page A8 of the New York edition with the headline: Polio’s Return After Near Eradication Prompts a Global Health Warning. Order Reprints | Today's Paper | Subscribe\n"
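
In the sample output above, digits in the article text show up in the tokens as the placeholder `NUMBER`: "25-year" becomes "NUMBER-year", "300,000" becomes "NUMBER,NUMBER", and "A8" becomes "aNUMBER". The sketch below reproduces only that one step, under the assumption that tokens are lower-cased first and each run of digits is then replaced. It is inferred from the sample output, not taken from handytrowel's source; the class `NumberNormalizer` is hypothetical, and the stemming and stop-word removal that are also visible in the output are not covered.

```java
import java.util.Locale;

public class NumberNormalizer {

    /**
     * Lower-cases a token and replaces every run of ASCII digits with the
     * placeholder "NUMBER". (Hypothetical helper; an assumption inferred
     * from the sample output, not handytrowel's actual code.)
     */
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[0-9]+", "NUMBER");
    }

    public static void main(String[] args) {
        String[] rawTokens = {"25-year", "300,000", "A8", "Polio"};
        for (String raw : rawTokens) {
            System.out.println(raw + " -> " + normalize(raw));
        }
    }
}
```

Collapsing numbers to a single placeholder keeps unrelated figures such as dates, counts, and page numbers from each becoming a distinct ngram.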

## License

handytrowel is licensed under the Affero General Public License v3.0. Please see the LICENSE file for more details.