Cleaning Corpus Text: Endangered Species Act

In [1]:
# Load libraries
library(readr)
library(dplyr)
library(tidytext)
library(tm)
library(textmineR)
library(SnowballC)
library(textstem)
library(textclean)
library(stringr)
library(qdapDictionaries)
library(lexicon)
library(here)
library(text2vec)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: NLP

Loading required package: Matrix


Attaching package: ‘textmineR’


The following object is masked from ‘package:Matrix’:

    update


The following object is masked from ‘package:stats’:

    update


Loading required package: koRpus.lang.en

Loading required package: koRpus

Loading required package: sylly

For information on available language packages for 'koRpus', run

  available.koRpus.lang()

and see ?install.koRpus.lang()



Attaching package: ‘koRpus’


The following object is masked from ‘package:tm’:

    readTagged


The following object is masked from ‘package:readr’:

    tokenize


here() starts at /home/ec2-user/SageMaker/esa_analysis



In [2]:
i_am("here.txt")
here()

here() starts at /home/ec2-user/SageMaker/esa_analysis



In [4]:
# Read in the original CSV file (replace the file name below with your own)
original_corpus <- read.csv(
    here("output_file", "esa_10_22_2025.csv"))

str(original_corpus)

'data.frame':	5241 obs. of  5 variables:
 $ X    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ GOID : num  4.22e+08 4.09e+08 4.29e+08 4.19e+08 2.07e+09 ...
 $ Title: chr  "Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this "| __truncated__ "TODAY IN CONGRESS" "Plan Offered to Aid Northwest Salmon and Trout" "SAVING WILDLIFE" ...
 $ Date : chr  "2000-09-26" "1999-07-20" "1994-03-27" "1999-07-05" ...
 $ Text : chr  "Armed with a new study, developers are demanding that federal officials postpone a decision due later this week"| __truncated__ "SENATE Meets at 9:30 a.m. Committees: Armed Services -- 9:30 a.m. U.S. policy & military operations regarding K"| __truncated__ "AP The Clinton Administration has proposed protection zones along rivers and streams to help threatened fish sp"| __truncated__ "In the Tribune's June 18 news item on the recovery and proposed removal of the bald eagle from the endangered s"| __truncated__ ...
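
The `str()` output above shows `GOID` parsed as numeric and printed in scientific notation. If the IDs are later used as keys, reading that column as character is one option. The sketch below runs on an inline toy CSV: the `csv_text` string and `toy` data frame are made up for illustration, and only the column names come from the output above.

```r
# Hypothetical two-row CSV that reuses the notebook's column names
csv_text <- "X,GOID,Title\n0,421721236,Example title\n1,2073930752,Another title"

# A named colClasses entry overrides type guessing for that column only
toy <- read.csv(text = csv_text, colClasses = c(GOID = "character"))

str(toy)
```

The same `colClasses` argument could be passed to the `read.csv()` call in the cell above, assuming the IDs should be treated as labels rather than numbers.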


In [5]:
# Remove rows where Text is empty or contains only whitespace using dplyr
original_corpus <- original_corpus %>%
  filter(nchar(trimws(Text)) > 0)

str(original_corpus)

'data.frame':	5239 obs. of  5 variables:
 $ X    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ GOID : num  4.22e+08 4.09e+08 4.29e+08 4.19e+08 2.07e+09 ...
 $ Title: chr  "Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this "| __truncated__ "TODAY IN CONGRESS" "Plan Offered to Aid Northwest Salmon and Trout" "SAVING WILDLIFE" ...
 $ Date : chr  "2000-09-26" "1999-07-20" "1994-03-27" "1999-07-05" ...
 $ Text : chr  "Armed with a new study, developers are demanding that federal officials postpone a decision due later this week"| __truncated__ "SENATE Meets at 9:30 a.m. Committees: Armed Services -- 9:30 a.m. U.S. policy & military operations regarding K"| __truncated__ "AP The Clinton Administration has proposed protection zones along rivers and streams to help threatened fish sp"| __truncated__ "In the Tribune's June 18 news item on the recovery and proposed removal of the bald eagle from the endangered s"| __truncated__ ...
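
The filter above keeps rows whose `Text` has at least one non-whitespace character. Here is a minimal base R sketch of the same test on a toy data frame (`toy`, `n_chars` and `keep` are illustrative names, not part of the notebook). Missing values are handled explicitly because base subsetting, unlike `dplyr::filter()`, does not drop `NA` conditions on its own.

```r
# Toy data: one real document, one empty string, one whitespace-only, one NA
toy <- data.frame(
  Text = c("Armed with a new study", "", "   ", NA),
  stringsAsFactors = FALSE
)

# Character count after trimming leading and trailing whitespace
n_chars <- nchar(trimws(toy$Text))

# Keep rows with a known, positive character count
keep <- !is.na(n_chars) & n_chars > 0
toy_filtered <- toy[keep, , drop = FALSE]

nrow(toy_filtered)
```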


In [6]:
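# Save the filtered corpus as a CSV file; row.names = FALSE avoids writing
# an extra index column alongside the existing X column
write.csv(
    original_corpus,
    here("data", "original", "original_corpus_filtered_empty_docs.csv"),
    row.names = FALSE
)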

In [7]:
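# Function for preserving abbreviations such as "U.S." or "a.m."
preserve_abbreviations <- function(text) {
  # Match single letters separated by periods, with an optional trailing period
  matches <- gregexpr("\\b([a-z](?:\\.[a-z])+)\\.?", text, ignore.case = TRUE)
  # Swap each period for a placeholder so later punctuation stripping keeps it
  regmatches(text, matches) <- lapply(regmatches(text, matches), function(abbrev) {
    gsub("\\.", "DOTDOTDOT", abbrev)
  })

  return(text)
}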
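
To make the placeholder step concrete, the sketch below repeats the function from the cell above so it runs on its own and applies it to a short made-up sentence (`sample_text` is an illustrative input, not a corpus row).

```r
# Same definition as in the cell above, repeated so this block is self-contained
preserve_abbreviations <- function(text) {
  matches <- gregexpr("\\b([a-z](?:\\.[a-z])+)\\.?", text, ignore.case = TRUE)
  regmatches(text, matches) <- lapply(regmatches(text, matches), function(abbrev) {
    gsub("\\.", "DOTDOTDOT", abbrev)
  })
  return(text)
}

sample_text <- "Meets at 9:30 a.m. in the U.S. Senate"
protected <- preserve_abbreviations(sample_text)
protected
# "Meets at 9:30 aDOTDOTDOTmDOTDOTDOT in the UDOTDOTDOTSDOTDOTDOT Senate"
```

Only dotted single-letter sequences are protected. Abbreviations such as "Assn." or "Corp." do not match the pattern and keep their periods until punctuation is stripped.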

In [8]:
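# Function to convert possessive forms (e.g. "Tribune's" becomes "Tribune ")
convert_possessives <- function(text) {
  # Singular possessives: drop the trailing 's
  text <- gsub("\\b(\\w+)'s\\b", "\\1 ", text, ignore.case = TRUE)
  # Plural possessives: drop a trailing apostrophe. The closing \\b requires a
  # word character right after the apostrophe, so forms such as "birds' " are
  # left unchanged here; strip() removes those apostrophes later.
  text <- gsub("\\b(\\w+)'\\b", "\\1 ", text, ignore.case = TRUE)

  return(text)
}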
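
The sketch below shows what the two patterns in `convert_possessives()` do on a made-up sentence, and compares the second one with a possible alternative. The alternative uses a PCRE lookahead (`perl = TRUE`); it is an assumption about the intended behavior, not something the notebook uses.

```r
x <- "the Tribune's item and the birds' coloring"

# The two substitutions from the function above
step1 <- gsub("\\b(\\w+)'s\\b", "\\1 ", x, ignore.case = TRUE)
step2 <- gsub("\\b(\\w+)'\\b", "\\1 ", step1, ignore.case = TRUE)
step2
# "the Tribune  item and the birds' coloring"  (plural possessive untouched)

# Hypothetical alternative: match an apostrophe NOT followed by a word character
alt <- gsub("\\b(\\w+)'(?!\\w)", "\\1 ", step1, perl = TRUE)
alt
# "the Tribune  item and the birds  coloring"
```

The difference has no effect on the final corpus as long as apostrophes are removed in a later step, but it does matter if the intermediate text is inspected or reused.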

In [9]:
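# First cleaning pass: protect abbreviations, then remove noise tokens and stopwords
corpus_clean <- original_corpus %>%
  mutate(
      Text = as.character(Text),
      # Protect dotted abbreviations (U.S., a.m.) with a placeholder
      Text = preserve_abbreviations(Text),
      # Drop tokens that look like URLs, e-mail addresses, or contain digits
      Text = gsub("\\b\\S*\\.(com|org|gov|edu|htm|net)\\S*\\b", " ", Text, ignore.case = TRUE),
      Text = gsub("\\S*@\\S*", " ", Text, ignore.case = TRUE),
      Text = gsub("\\S*\\d+\\S*", " ", Text, ignore.case = TRUE),
      # Expand contractions before stopword removal
      Text = replace_contraction(Text),
      # Remove stopwords: the tm English list, then the Buckley-Salton list
      Text = gsub(paste0("\\b(", paste(stopwords("en"), collapse = "|"), ")\\b"), " ", Text, ignore.case = TRUE),
      Text = gsub(paste0("\\b(", paste(BuckleySaltonSWL, collapse = "|"), ")\\b"), " ", Text, ignore.case = TRUE),
      Text = convert_possessives(Text),
      # Digit-bearing tokens were already dropped above, so these two calls
      # should find little left to remove
      Text = replace_ordinal(Text, num.paste = TRUE, remove = TRUE),
      Text = replace_number(Text, num.paste = TRUE, remove = TRUE),
      # Ensure a space follows each comma
      Text = add_comma_space(Text)
  )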
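
The regex steps in the pipeline above can be tried in isolation on a single made-up sentence. In this sketch `toy_stopwords` is a hypothetical stand-in for `stopwords("en")`, chosen so the block runs without `tm`; the URL and digit patterns are copied from the cell above.

```r
# Hypothetical mini stopword list standing in for tm::stopwords("en")
toy_stopwords <- c("the", "a", "of", "is", "at", "on")
stop_pattern <- paste0("\\b(", paste(toy_stopwords, collapse = "|"), ")\\b")

x <- "The fate of a species on 800,000 acres is decided at 9:30 per www.example.gov/esa"

# 1. Drop URL-like tokens
x <- gsub("\\b\\S*\\.(com|org|gov|edu|htm|net)\\S*\\b", " ", x, ignore.case = TRUE)
# 2. Drop any token containing a digit
x <- gsub("\\S*\\d+\\S*", " ", x)
# 3. Replace stopwords with a space (whole-word match, case-insensitive)
x <- gsub(stop_pattern, " ", x, ignore.case = TRUE)
# 4. Collapse the leftover runs of spaces for display
x <- gsub("\\s+", " ", trimws(x))
x
# "fate species acres decided per"
```

Each substitution inserts a space rather than an empty string, which is why the intermediate text in `corpus_clean` contains long runs of blanks until whitespace is collapsed.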

In [10]:
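# Build a lemma dictionary from the tokens in the cleaned corpus
lemma_dict <- make_lemma_dictionary(
    corpus_clean$Text,
    engine = "lexicon"
)

# Keep only entries whose token is longer than one character
final_lemma_dict <- lemma_dict %>% 
    filter(nchar(trimws(token)) > 1)

str(final_lemma_dict)
final_lemma_dict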

“`tbl_df()` was deprecated in dplyr 1.0.0.
ℹ Please use `tibble::as_tibble()` instead.
ℹ The deprecated feature was likely used in the textstem package.
  Please report the issue to the authors.”


'data.frame':	15907 obs. of  2 variables:
 $ token: chr  "abalones" "abandoned" "abandoning" "abandons" ...
 $ lemma: chr  "abalone" "abandon" "abandon" "abandon" ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "token"


token,lemma
<chr>,<chr>
abalones,abalone
abandoned,abandon
abandoning,abandon
abandons,abandon
abashed,abash
abated,abate
abating,abate
abbreviated,abbreviate
abdicated,abdicate
abdicates,abdicate
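
Conceptually, a token/lemma table like the one above acts as a lookup: each word found in the `token` column is replaced by its `lemma`, and everything else is left alone. The sketch below illustrates that idea in base R with a three-row toy dictionary. `toy_dict` and `lemmatize_toy()` are hypothetical names, and this is not how `textstem::lemmatize_strings()` is implemented internally.

```r
# Toy dictionary with the same two columns as final_lemma_dict
toy_dict <- data.frame(
  token = c("abalones", "abandoned", "abandons"),
  lemma = c("abalone", "abandon", "abandon"),
  stringsAsFactors = FALSE
)

# Named vector: names are tokens, values are lemmas
lookup <- setNames(toy_dict$lemma, toy_dict$token)

# Replace every space-separated word that appears in the dictionary
lemmatize_toy <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  hit <- words %in% names(lookup)
  words[hit] <- unname(lookup[words[hit]])
  paste(words, collapse = " ")
}

lemmatized <- lemmatize_toy("poachers abandoned the abalones")
lemmatized
# "poachers abandon the abalone"
```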


In [11]:
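# Second cleaning pass: lemmatize, strip punctuation, restore abbreviations
corpus_more_clean <- corpus_clean %>%
  mutate(
      # Lemmatize with the corpus-specific dictionary
      Text = lemmatize_strings(Text, final_lemma_dict),
      # Strip punctuation (including apostrophes) and other unwanted characters
      Text = strip(Text, char.keep = NULL, apostrophe.remove = TRUE),
      # Restore the periods in protected abbreviations
      Text = gsub("DOTDOTDOT", ".", Text, ignore.case = TRUE),
      # Collapse runs of whitespace into single spaces
      Text = stripWhitespace(Text)
  )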
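
The placeholder is restored with `ignore.case = TRUE` because the text may have been lowercased by the time the restore step runs. The sketch below imitates that sequence in base R. The `tolower()`/`[[:punct:]]` line is only a rough stand-in for `strip()`, assumed here to remove punctuation and lowercase the text, and `protected` is a made-up input.

```r
# Text as it might look after preserve_abbreviations()
protected <- "UDOTDOTDOTSDOTDOTDOT Fish, Wildlife!"

# Rough stand-in for strip(): remove punctuation and lowercase
stripped <- tolower(gsub("[[:punct:]]", "", protected))

# Restore periods; the case-insensitive match also catches "dotdotdot"
restored <- gsub("DOTDOTDOT", ".", stripped, ignore.case = TRUE)
restored
# "u.s. fish wildlife"
```

Without `ignore.case = TRUE`, the lowercased placeholder would not match and `udotdotdotsdotdotdot` would remain in the corpus as a token.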

In [12]:
str(original_corpus)
head(original_corpus)

'data.frame':	5239 obs. of  5 variables:
 $ X    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ GOID : num  4.22e+08 4.09e+08 4.29e+08 4.19e+08 2.07e+09 ...
 $ Title: chr  "Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this "| __truncated__ "TODAY IN CONGRESS" "Plan Offered to Aid Northwest Salmon and Trout" "SAVING WILDLIFE" ...
 $ Date : chr  "2000-09-26" "1999-07-20" "1994-03-27" "1999-07-05" ...
 $ Text : chr  "Armed with a new study, developers are demanding that federal officials postpone a decision due later this week"| __truncated__ "SENATE Meets at 9:30 a.m. Committees: Armed Services -- 9:30 a.m. U.S. policy & military operations regarding K"| __truncated__ "AP The Clinton Administration has proposed protection zones along rivers and streams to help threatened fish sp"| __truncated__ "In the Tribune's June 18 news item on the recovery and proposed removal of the bald eagle from the endangered s"| __truncated__ ...


,X,GOID,Title,Date,Text
,<int>,<dbl>,<chr>,<chr>,<chr>
1,0,421721236,"Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this week could protect almost 800,000 acres.",2000-09-26,"Armed with a new study, developers are demanding that federal officials postpone a decision due later this week on providing nearly 800,000 acres of ""critical habitat"" for the tiny gnatcatcher in Southern California. The study, co-written by Jonathon L. Atwood, a biologist whose earlier research concluded that the birds were nearing extinction, is being used by developers as proof that the California gnatcatcher is genetically so close to a Mexican songbird that the species is not in danger. Environmentalists and other scientists disagree, saying the California birds even look different from the Mexican ones. There are only a few thousand pairs of gnatcatchers in California, but there are millions in Mexico. The study was funded by the Building Industry Assn. of Southern California, the Transportation Corridor Agencies, the U.S. Navy and others. Written by Robert M. Zink, George F. Barrowclough, Rachelle C. Blackwell-Rago and Atwood, the study compares DNA from gnatcatchers in Mexico with DNA obtained from feathers of nestlings in the United States. ""Put simply, based on [DNA] data, northern populations do not appear to constitute a unique component of gnatcatcher biodiversity,"" the study concludes. The scientists cautioned that they were able to test just a small number of birds, and that they might be seeing evidence of genetic mutations or cross-breeding between the Mexican and American birds. But in a letter mailed Monday, Irvine attorney Rob Thornton urged Interior Secretary Bruce Babbitt to postpone a decision on critical habitat for the bird. A judge has ordered federal officials to designate critical habitat for the bird by Saturday. U.S. Fish and Wildlife officials would have to seek an emergency extension from the court to bypass that order. 
Atwood's 1990 report that California gnatcatchers were distinct from their Mexican cousins was the primary scientific evidence that the government relied on in deciding to list the gnatcatcher as a threatened species. Thornton, who represents the Transportation Corridor Agencies, Pulte Homes Corp. and Forest Lawn Memorial-Park Assn., said it was ""impossible to determine that [the gnatcatcher] needs 800,000 acres as essential, given the fact that it doesn't appear to be threatened."" But environmental groups say that even if there are genetic similarities with the Mexican birds, there are also obvious differences, such as the birds' coloring. They say the final habitat decision is vital because the gnatcatcher and the sage scrub it nests in are under siege from development. ""The Endangered Species Act doesn't have a narrow, inflexible genetic definition of what is and what isn't a species,"" said Andrew Wetzler, an attorney with the Natural Resources Defense Council. Environmentalists also note that healthy populations in a neighboring country do not mean a species is safe in the United States. Kimball Garrett, ornithology collections manager of the Natural History Museum of Los Angeles County, cited the birds' appearances as proof of differences. The California birds are darker, and the Mexican birds have more white on their breasts and tails, he said. Caption: PHOTO: A male California gnatcatcher.; PHOTOGRAPHER: AL SCHABEN / Los Angeles Times; PHOTO: Some slopes along the Eastern Transportation Corridor toll road near Orange contain coastal sage scrub, habitat of the gnatcatcher.; PHOTOGRAPHER: DON KELSEN / Los Angeles Times Credit: TIMES STAFF WRITER"
2,1,408507519,TODAY IN CONGRESS,1999-07-20,"SENATE Meets at 9:30 a.m. Committees: Armed Services -- 9:30 a.m. U.S. policy & military operations regarding Kosovo. Defense Sec. William Cohen & Joint Chiefs Chairman Gen. Henry Shelton. 216 Hart Office Bldg. Armed Services -- 9:30 a.m. Nominations of F. Whitten Peters to be secretary of Air Force & Arthur Money to be assistant secretary of defense for command, control, communications & intelligence. 222 Russell Office Bldg. Budget -- 10 a.m. Mid-session review of president's FY 2000 budget. 608 Dirksen Office Bldg. Energy & Natural Resources -- 2:30 p.m. Forests & public land management subc. National Monument Public Participation Act. 366 DOB. Environment & Public Works -- 9:30 a.m. Fisheries, wildlife & drinking water subc. Habitat conservation plans under Endangered Species Act. 406 DOB. Foreign Relations -- 11 a.m. Nominations for ambassadors to the Philippines, Indonesia, Palau, Fiji, Nauru, Tonga, Tuvalu & Brunei Darussalam. 419 DOB. Foreign Relations -- 2 p.m. International operations subc. Closed. U.N. international criminal court. 419 DOB. Governmental Affairs -- 9:30 a.m. Permanent investigations subc. Deceptive mailing practices including sweepstakes, skill contests or look-alike mailings & need for legislation to control such practices. 342 DOB. Health, Education, Labor & Pensions -- 9:30 a.m. Elementary & Secondary Education Act Reauth. 430 DOB. Special Aging -- 2:30 p.m. Impact of drug switching on older Americans. 106 DOB. HOUSE Meets at 9 a.m. Committees: Agriculture -- 10 a.m. General farm commodities subc. Small Watershed Rehabilitation Amendments of 1999. 1300 Longworth House Office Bldg. Appropriations -- 9:30 a.m. Mark up D.C., energy & water, & foreign operations approps. for FY 2000. 2359 Rayburn House Office Bldg. Banking & Financial Services -- 10 a.m. Financial institutions & consumer credit subc. Financial privacy issues. 2128 RHOB. Commerce -- 10 a.m. Oversight & investigations subc. 
Security inspections at DOE's Lawrence Livermore National Laboratory. May close. 2322 RHOB. Commerce -- 10 a.m. Telecommunications, trade & consumer protection subc. Corporation for Public Broadcasting Authorization Act. 2123 RHOB. Judiciary -- 10 a.m. Mark up pending legislation. 2141 RHOB. Judiciary -- 2 p.m. Mark up Pain Relief Promotion Act. 2237 RHOB. Resources -- 10 a.m. National parks, forests & lands subc. Pending legislation. 1324 LHOB. Ways & Means -- 10 a.m. Human resources subc. Adoption & other permanent placements. B-318 RHOB. Ways & Means -- 2 p.m. Health subc. 1100 LHOB. Credit: Reuters"
3,2,429461998,Plan Offered to Aid Northwest Salmon and Trout,1994-03-27,"AP The Clinton Administration has proposed protection zones along rivers and streams to help threatened fish species on Federal lands in eastern Oregon, eastern Washington, Northern California and Idaho. The proposal, given the name Pacfish, would result in less logging, livestock grazing and recreation along thousands of miles of streams in an effort to save several salmon and trout species from extinction. It features protective buffer strips similar to those planned for national forests containing northern spotted owls in western Oregon and Washington. ""It is the goal of Pacfish to reverse the degradation of anadromous fish habitat on Forest Service and administered lands in order to avoid the need for more listings under the Endangered Species Act,"" the Interior and Agriculture Departments said in a joint statement on Friday as they released the plan. Anadromous fish like salmon swim upstream to spawn. 'Significant Declines' Are Cited ""Recent reports have found about half of the 400 stocks of native Pacific anadromous fish are showing significant declines in numbers and 106 already are extinct,"" they said. Some of the protection zones would be as wide as 300 feet, or the length of a football field, while others would allow activity within 50 feet of the water. The 300-foot buffers would be on either side of fish-bearing streams. Fifty-foot buffers are planned for ""non-key"" watersheds -- permanent streams that do not contain fish and around ponds, reservoirs and wetlands larger than one acre. For intermittent streams, those that dry up for part of the year, protection would be limited to 100 feet on either side. The proposal, which affects only lands east of the Cascade Range, was unveiled as part of an environmental assessment issued to comply with the National Environmental Policy Act. 
The Administration estimates it would cost $20 million to carry out the proposal, which is the preferred option mentioned in the assessment. Other more extensive options being considered would cost as much as $54 million, the document said. 45 Days for Comment The Administration will accept public comment on the plan for 45 days before moving to make it formal Government policy on the lands. The plan would remain in place until Government scientists complete a longer-term analysis through a more extensive environmental impact statement, said Jim Sanders, a Forest Service spokesman. The Administration estimates that the new restrictions would cause a loss in logging of about 58 million board feet a year. Ranchers would be allowed to graze 42,000 fewer ""animal units"" per month; an animal unit is equivalent to a cow and her calf, or five sheep. The plan is designed to help preserve stocks of chinook, coho, chum, sockeye and pink salmon as well as steelhead and sea-run cutthroat trout. Studies have shown that logging and grazing along stream banks accelerate erosion, which fills streams with silt, and also eliminates shade needed to keep the waters cool enough for some fish species. The strategy focuses on ""habitat features needed to support healthy aquatic ecosystems, such as appropriate pool frequency and width-to-depth ratios, cool water temperatures, woody debris in streams, stream-bank stability and lower bank angles,"" the Government statement said. It does not apply to private lands, nor would it add new conservation measures to the area within the range of the northern spotted owl. ""This is the first time good fisheries science has prevailed over the timber harvest program,"" said Glen Spain, Northwest regional director of Pacific Coast Federation of Fishermen's Associations. ""It represents a fight in D.C. in the Forest Service between the timber supply program and the wildlife and fish protection biologists."" Mr. 
Spain represents commercial fishermen, who are going to be shut off from salmon fishing this year off the coast of Washington and northern Oregon because of declining runs."
4,3,418798522,SAVING WILDLIFE,1999-07-05,"In the Tribune's June 18 news item on the recovery and proposed removal of the bald eagle from the endangered species list, it mentioned that the eagle was near extinction. This is misleading. The bald eagle was near extinction in the lower 48 states. There have always been healthy populations in Alaska and Canada. This distinction needs to be made for several reasons. First, it's true. Second, it helps shows the flexibility of the Endangered Species Act (ESA), giving the government latitude to protect distinct populations of a species instead of its entire range. Finally, it shows the need for local conservation efforts to protect biological diversity. After all, the reason the federal government is involved is because the states failed to conserve their own native plants and animals. Illinois and Indiana each have more than 200 species of concern that could require federal action in the future if the states don't get their act together and take measures to protect them. In the past the states simply let this responsibility fall on the feds. It was also the ESA that gave the government the authority to ban the use of DDT in the U.S., leading to the recovery of the bald eagle, peregrine falcon, brown pelican and others. It should be noted that DDT is still manufactured here and is exported to other countries to perform its magic there. As an advocate for the conservation of the nation's biological resources, I strongly support the ESA, but both advocates and the media need to conduct the debate in total honesty. If not, the campaign loses credibility."
5,4,2073930752,Wildlife Threatened Again,2018-07-24,"The 1973 Endangered Species Act is at once the noblest and most contentious of the landmark environmental statutes enacted during the Nixon presidency. For 45 years it has been celebrated by conservationists for protecting, in Richard Nixon's words, ""an irreplaceable part of our natural heritage, threatened wildlife."" In equal measure, it has been reviled by developers, ranchers, loggers and oil and gas interests for elevating the needs of plants and animals and the habitats necessary for their survival over the demands of commerce. Approved by huge margins in both chambers (the House vote was an astounding 355 to 4), the act would stand zero chance of passage in today's Congress and political climate.The act's three main purposes are simply stated: identifying species that need to be listed as endangered (headed toward extinction) or threatened (likely to become endangered); designating habitat necessary for the species' survival; and nurturing the process until the species have not just survived but recovered in sustainable numbers.The act has been around long enough to have accumulated plenty of enemies, and now, emboldened by a determined anti-regulatory president, its critics are again on the march. 
A suite of measures in the House and others in development in the Senate would, in aggregate, weaken the role that scientists play in deciding which species need help, while increasing the influence of state governments -- many of which, particularly in the West, depend on revenues from royalties and jobs provided by extractive industries like mining, oil and gas, and care little for the species that occupy potentially productive lands.Last week came the administration's own unsettling proposals, announced by David Bernhardt, the deputy secretary of the Interior Department and one of several spear carriers for the oil and gas industry who have risen to commanding policymaking roles under Interior's boss, Ryan Zinke. Mr. Bernhardt said the changes would streamline and clarify the regulatory process, and some of the 118 pages of daunting bureaucratic prose seem, innocently enough, to attempt to do just that. But several proposals bode ill for animals and plants and well for President Trump's overarching ambition to reduce costs and other burdens for business, particularly the energy business. Here are three.One would introduce cost considerations that do not now exist. As written, the statute and its implementing regulations require listing decisions to be made ""solely on the basis of the best scientific and commercial data available"" and ""without reference to possible economic or other impacts of such determination."" The proposal would eliminate the latter phrase, thereby opening a listing decision to cost-benefit analysis. 
Tom Carper of Delaware, the top Democrat on the Senate Environment and Public Works Committee, fears that this could undermine science and cause federal officials to think twice about protecting a species -- hardly an unfounded fear in this administration.A second proposal would weaken safeguards for threatened species, which now enjoy the same blanket protections against harm (hunting, shooting, trapping and so on) that apply to endangered species. Threatened species would now be judged on a case-by-case basis.A third proposal could make it harder for some species to gain a foothold on the threatened list to begin with. The statute defines a threatened species as one ""that is likely to become an endangered species within the foreseeable future throughout all or a significant portion of its range."" The Obama administration defined ""foreseeable future"" liberally -- for instance, listing the Arctic bearded seal as threatened because the ice sheets the seal relies on would almost certainly disappear by the end of the century because of global warming. That's too speculative for the Trump people, whose scientists and policymakers would henceforth be required to ""avoid speculating as to what is hypothetically possible."" To Mr. Carper, that's a clear invitation to limit protections for species threatened by climate change, of which there are many.As is often the case nowadays, casuistry abounds. Republicans in Congress, for instance, love to argue that only 3 percent or so of the 1,600-plus listed species have recovered to the point where they can be removed from the list -- including, notably, the bald eagle, the peregrine falcon, the American alligator and the gray wolf. That is a perverse way of measuring progress; species once hurtling toward extinction can hardly be expected to build sustainable populations overnight. It's taken the grizzly bear more than 40 years. 
A far better measure is that an even smaller percentage have actually gone to their doom.Individual species aside, the act's habitat requirements have also produced great gains for ecosystems as a whole. A succession of inconspicuous birds listed as endangered or threatened -- the spotted owl, the marbled murrelet, the coastal California gnatcatcher -- has saved millions of acres of old growth forest and open space along the Pacific Coast from logging and commercial development. Efforts to save the wood stork and Florida panther have helped nourish the Everglades.If Mr. Zinke wanted real reform, he would take a leaf from the Clinton and Obama playbooks and, through economic incentives or negotiations or both, try to persuade states, landowners and industry to collaborate on a grand scale to save a species before it winds up on the endangered list. A spectacular example of this approach was the Obama administration's decision to work with states and private parties to protect millions of acres of habitat across 10 Western states occupied by the greater sage grouse so as to make a listing unnecessary.Fat chance. Not only has Mr. Zinke shown no enthusiasm for such a strategy, but responding to bleats from some oil and gas interests, he's actually seeking to repudiate much of the Obama plan. So much for collaboration. So much for the sage grouse.Follow The New York Times Opinion section on Facebook and Twitter (@NYTOpinion) , and sign up for the Opinion Today newsletter .Credit: THE EDITORIAL BOARDDRAWING (DRAWING BY LUCY JONES)"
6,5,419462981,County puts out rabid bat warning,2001-07-24,"If a bat can't fly or gets trapped inside a building, it probably has rabies, health officials warned Monday after confirming that two bats with the disease were found in Lake County. One of three brown bats captured in the hallway of a Zion apartment building on July 17 tested positive for rabies as did one trapped in Wildwood nearly a week earlier, said Leslie Piotrowski, spokeswoman for the county health department. Five residents of the apartments in the 3000 block of Edina Boulevard may have had contact with the bats and were advised by health officials to get a series of anti-viral and rabies vaccination shots. The Wildwood homeowner, whose address was not released, didn't need the shots because there was no contact with the bat or its saliva. ""To be exposed, you have to be bitten or receive skin damage by a bat, or bat saliva has to come into contact with your eyes, mouth, nose or other mucous membranes,"" said Victor Plotkin of the health department. Bats are more prevalent in the summer when mosquitoes and other insects--their favorite food--are readily available. Uninfected bats usually avoid humans, preferring bluffs, caves and other hidden dwellings. Because they play an important ecological role by eating insects, bats shouldn't be harmed or trapped, Plotkin said. Several types of bats are protected under the Endangered Species Act. Bats that are unable to fly, get into homes or drop on lawns are probably infected with rabies and should be avoided. Lake County residents who see such a bat should call the health department at 847- 360-6423. Last month, the Illinois Public Health Department urged residents to be cautious around bats--the state's most commonly identified rabid animal--after it received reports that two people were bitten and another was exposed. All three people received anti-rabies shots. 
Of the nearly 4,000 animals examined for rabies last year by state public health officials, 22 tested positive, all of them bats. In the 1980s and early 1990s, skunks were the most commonly identified animal with rabies in Illinois."


In [14]:
str(corpus_clean)
head(corpus_clean)

'data.frame':	5239 obs. of  5 variables:
 $ X    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ GOID : num  4.22e+08 4.09e+08 4.29e+08 4.19e+08 2.07e+09 ...
 $ Title: chr  "Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this "| __truncated__ "TODAY IN CONGRESS" "Plan Offered to Aid Northwest Salmon and Trout" "SAVING WILDLIFE" ...
 $ Date : chr  "2000-09-26" "1999-07-20" "1994-03-27" "1999-07-05" ...
 $ Text : chr  "Armed       study, developers   demanding   federal officials postpone   decision due     week   providing     "| __truncated__ "SENATE Meets     aDOTDOTDOTmDOTDOTDOT Committees: Armed Services --   aDOTDOTDOTmDOTDOTDOT UDOTDOTDOTSDOTDOTDOT"| __truncated__ "AP   Clinton Administration   proposed protection zones   rivers   streams     threatened fish species   Federa"| __truncated__ "Tribune  June   news item     recovery   proposed removal     bald eagle     endangered species list,   mention"| __truncated__ ...


,X,GOID,Title,Date,Text
,<int>,<dbl>,<chr>,<chr>,<chr>
1,0,421721236,"Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this week could protect almost 800,000 acres.",2000-09-26,"Armed study, developers demanding federal officials postpone decision due week providing acres ""critical habitat"" tiny gnatcatcher Southern California. study, -written Jonathon L. Atwood, biologist earlier research concluded birds nearing extinction, developers proof California gnatcatcher genetically close Mexican songbird species danger. Environmentalists scientists disagree, California birds Mexican . thousand pairs gnatcatchers California, millions Mexico. study funded Building Industry Assn. Southern California, Transportation Corridor Agencies, UDOTDOTDOTSDOTDOTDOT Navy . Written Robert M. Zink, George F. Barrowclough, Rachelle C. Blackwell-Rago Atwood, study compares DNA gnatcatchers Mexico DNA obtained feathers nestlings United States. ""Put simply, based [DNA] data, northern populations constitute unique component gnatcatcher biodiversity, "" study concludes. scientists cautioned test small number birds, evidence genetic mutations cross-breeding Mexican American birds. letter mailed Monday, Irvine attorney Rob Thornton urged Interior Secretary Bruce Babbitt postpone decision critical habitat bird. judge ordered federal officials designate critical habitat bird Saturday. UDOTDOTDOTSDOTDOTDOT Fish Wildlife officials seek emergency extension court bypass order. Atwood report California gnatcatchers distinct Mexican cousins primary scientific evidence government relied deciding list gnatcatcher threatened species. Thornton, represents Transportation Corridor Agencies, Pulte Homes Corp. Forest Lawn Memorial-Park Assn., ""impossible determine [ gnatcatcher] acres essential, fact threatened."" environmental groups genetic similarities Mexican birds, obvious differences, birds' coloring. final habitat decision vital gnatcatcher sage scrub nests siege development. 
"" Endangered Species Act narrow, inflexible genetic definition species, "" Andrew Wetzler, attorney Natural Resources Defense Council. Environmentalists note healthy populations neighboring country species safe United States. Kimball Garrett, ornithology collections manager Natural History Museum Los Angeles County, cited birds' appearances proof differences. California birds darker, Mexican birds white breasts tails, . Caption: PHOTO: male California gnatcatcher.; PHOTOGRAPHER: AL SCHABEN / Los Angeles Times; PHOTO: slopes Eastern Transportation Corridor toll road Orange coastal sage scrub, habitat gnatcatcher.; PHOTOGRAPHER: DON KELSEN / Los Angeles Times Credit: TIMES STAFF WRITER"
2,1,408507519,TODAY IN CONGRESS,1999-07-20,"SENATE Meets aDOTDOTDOTmDOTDOTDOT Committees: Armed Services -- aDOTDOTDOTmDOTDOTDOT UDOTDOTDOTSDOTDOTDOT policy & military operations Kosovo. Defense Sec. William Cohen & Joint Chiefs Chairman Gen. Henry Shelton. Hart Office Bldg. Armed Services -- aDOTDOTDOTmDOTDOTDOT Nominations F. Whitten Peters secretary Air Force & Arthur Money assistant secretary defense command, control, communications & intelligence. Russell Office Bldg. Budget -- aDOTDOTDOTmDOTDOTDOT Mid-session review president FY budget. Dirksen Office Bldg. Energy & Natural Resources -- pDOTDOTDOTmDOTDOTDOT Forests & public land management subc. National Monument Public Participation Act. DOB. Environment & Public Works -- aDOTDOTDOTmDOTDOTDOT Fisheries, wildlife & drinking water subc. Habitat conservation plans Endangered Species Act. DOB. Foreign Relations -- aDOTDOTDOTmDOTDOTDOT Nominations ambassadors Philippines, Indonesia, Palau, Fiji, Nauru, Tonga, Tuvalu & Brunei Darussalam. DOB. Foreign Relations -- pDOTDOTDOTmDOTDOTDOT International operations subc. Closed. UDOTDOTDOTNDOTDOTDOT international criminal court. DOB. Governmental Affairs -- aDOTDOTDOTmDOTDOTDOT Permanent investigations subc. Deceptive mailing practices including sweepstakes, skill contests -alike mailings & legislation control practices. DOB. Health, Education, Labor & Pensions -- aDOTDOTDOTmDOTDOTDOT Elementary & Secondary Education Act Reauth. DOB. Special Aging -- pDOTDOTDOTmDOTDOTDOT Impact drug switching older Americans. DOB. HOUSE Meets aDOTDOTDOTmDOTDOTDOT Committees: Agriculture -- aDOTDOTDOTmDOTDOTDOT General farm commodities subc. Small Watershed Rehabilitation Amendments Longworth House Office Bldg. Appropriations -- aDOTDOTDOTmDOTDOTDOT Mark DDOTDOTDOTCDOTDOTDOT, energy & water, & foreign operations approps. FY Rayburn House Office Bldg. Banking & Financial Services -- aDOTDOTDOTmDOTDOTDOT Financial institutions & consumer credit subc. Financial privacy issues. RHOB. 
Commerce -- aDOTDOTDOTmDOTDOTDOT Oversight & investigations subc. Security inspections DOE Lawrence Livermore National Laboratory. close. RHOB. Commerce -- aDOTDOTDOTmDOTDOTDOT Telecommunications, trade & consumer protection subc. Corporation Public Broadcasting Authorization Act. RHOB. Judiciary -- aDOTDOTDOTmDOTDOTDOT Mark pending legislation. RHOB. Judiciary -- pDOTDOTDOTmDOTDOTDOT Mark Pain Relief Promotion Act. RHOB. Resources -- aDOTDOTDOTmDOTDOTDOT National parks, forests & lands subc. Pending legislation. LHOB. Ways & Means -- aDOTDOTDOTmDOTDOTDOT Human resources subc. Adoption & permanent placements. RHOB. Ways & Means -- pDOTDOTDOTmDOTDOTDOT Health subc. LHOB. Credit: Reuters"
3,2,429461998,Plan Offered to Aid Northwest Salmon and Trout,1994-03-27,"AP Clinton Administration proposed protection zones rivers streams threatened fish species Federal lands eastern Oregon, eastern Washington, Northern California Idaho. proposal, Pacfish, result logging, livestock grazing recreation thousands miles streams effort save salmon trout species extinction. features protective buffer strips similar planned national forests northern spotted owls western Oregon Washington. "" goal Pacfish reverse degradation anadromous fish habitat Forest Service administered lands order avoid listings Endangered Species Act, "" Interior Agriculture Departments joint statement Friday released plan. Anadromous fish salmon swim upstream spawn. 'Significant Declines' Cited ""Recent reports found half stocks native Pacific anadromous fish showing significant declines numbers extinct, "" . protection zones wide feet, length football field, activity feet water. buffers side fish-bearing streams. Fifty-foot buffers planned "" -key"" watersheds -- permanent streams fish ponds, reservoirs wetlands larger acre. intermittent streams, dry part year, protection limited feet side. proposal, affects lands east Cascade Range, unveiled part environmental assessment issued comply National Environmental Policy Act. Administration estimates cost million carry proposal, preferred option mentioned assessment. extensive options considered cost million, document . Days Comment Administration accept public comment plan days moving make formal Government policy lands. plan remain place Government scientists complete longer-term analysis extensive environmental impact statement, Jim Sanders, Forest Service spokesman. Administration estimates restrictions loss logging million board feet year. Ranchers allowed graze fewer ""animal units"" month; animal unit equivalent cow calf, sheep. plan designed preserve stocks chinook, coho, chum, sockeye pink salmon steelhead sea-run cutthroat trout. 
Studies shown logging grazing stream banks accelerate erosion, fills streams silt, eliminates shade needed waters cool fish species. strategy focuses ""habitat features needed support healthy aquatic ecosystems, pool frequency width- -depth ratios, cool water temperatures, woody debris streams, stream-bank stability lower bank angles, "" Government statement . apply private lands, add conservation measures area range northern spotted owl. "" time good fisheries science prevailed timber harvest program, "" Glen Spain, Northwest regional director Pacific Coast Federation Fishermen Associations. "" represents fight DDOTDOTDOTCDOTDOTDOT Forest Service timber supply program wildlife fish protection biologists."" Mr. Spain represents commercial fishermen, shut salmon fishing year coast Washington northern Oregon declining runs."
4,3,418798522,SAVING WILDLIFE,1999-07-05,"Tribune June news item recovery proposed removal bald eagle endangered species list, mentioned eagle extinction. misleading. bald eagle extinction lower states. healthy populations Alaska Canada. distinction made reasons. , true. , helps shows flexibility Endangered Species Act (ESA), giving government latitude protect distinct populations species entire range. Finally, shows local conservation efforts protect biological diversity. , reason federal government involved states failed conserve native plants animals. Illinois Indiana species concern require federal action future states act measures protect . past states simply responsibility fall feds. ESA gave government authority ban DDT UDOTDOTDOTSDOTDOTDOT, leading recovery bald eagle, peregrine falcon, brown pelican . noted DDT manufactured exported countries perform magic . advocate conservation nation biological resources, strongly support ESA, advocates media conduct debate total honesty. , campaign loses credibility."
5,4,2073930752,Wildlife Threatened Again,2018-07-24,"Endangered Species Act noblest contentious landmark environmental statutes enacted Nixon presidency. years celebrated conservationists protecting, Richard Nixon words, "" irreplaceable part natural heritage, threatened wildlife."" equal measure, reviled developers, ranchers, loggers oil gas interests elevating plants animals habitats survival demands commerce. Approved huge margins chambers ( House vote astounding act stand chance passage today Congress political climate. act main purposes simply stated: identifying species listed endangered (headed extinction) threatened ( endangered); designating habitat species' survival; nurturing process species survived recovered sustainable numbers. act long accumulated plenty enemies, , emboldened determined anti-regulatory president, critics march. suite measures House development Senate , aggregate, weaken role scientists play deciding species , increasing influence state governments -- , West, depend revenues royalties jobs provided extractive industries mining, oil gas, care species occupy potentially productive lands. week administration unsettling proposals, announced David Bernhardt, deputy secretary Interior Department spear carriers oil gas industry risen commanding policymaking roles Interior boss, Ryan Zinke. Mr. Bernhardt streamline clarify regulatory process, pages daunting bureaucratic prose , innocently , attempt . proposals bode animals plants President Trump overarching ambition reduce costs burdens business, energy business. . introduce cost considerations exist. written, statute implementing regulations require listing decisions made ""solely basis scientific commercial data "" "" reference economic impacts determination."" proposal eliminate phrase, opening listing decision cost-benefit analysis. 
Tom Carper Delaware, top Democrat Senate Environment Public Works Committee, fears undermine science federal officials protecting species -- unfounded fear administration. proposal weaken safeguards threatened species, enjoy blanket protections harm (hunting, shooting, trapping ) apply endangered species. Threatened species judged case- -case basis. proposal make harder species gain foothold threatened list begin . statute defines threatened species "" endangered species foreseeable future significant portion range."" Obama administration defined ""foreseeable future"" liberally -- instance, listing Arctic bearded seal threatened ice sheets seal relies disappear end century global warming. speculative Trump people, scientists policymakers henceforth required ""avoid speculating hypothetically ."" Mr. Carper, clear invitation limit protections species threatened climate change, . case nowadays, casuistry abounds. Republicans Congress, instance, love argue percent listed species recovered point removed list -- including, notably, bald eagle, peregrine falcon, American alligator gray wolf. perverse measuring progress; species hurtling extinction expected build sustainable populations overnight. grizzly bear years. measure smaller percentage doom.Individual species , act habitat requirements produced great gains ecosystems . succession inconspicuous birds listed endangered threatened -- spotted owl, marbled murrelet, coastal California gnatcatcher -- saved millions acres growth forest open space Pacific Coast logging commercial development. Efforts save wood stork Florida panther helped nourish Everglades. Mr. Zinke wanted real reform, leaf Clinton Obama playbooks , economic incentives negotiations , persuade states, landowners industry collaborate grand scale save species winds endangered list. 
spectacular approach Obama administration decision work states private parties protect millions acres habitat Western states occupied greater sage grouse make listing unnecessary.Fat chance. Mr. Zinke shown enthusiasm strategy, responding bleats oil gas interests, seeking repudiate Obama plan. collaboration. sage grouse.Follow York Times Opinion section Facebook Twitter , sign Opinion Today newsletter .Credit: EDITORIAL BOARDDRAWING (DRAWING LUCY JONES)"
6,5,419462981,County puts out rabid bat warning,2001-07-24,"bat fly trapped inside building, rabies, health officials warned Monday confirming bats disease found Lake County. brown bats captured hallway Zion apartment building July tested positive rabies trapped Wildwood week earlier, Leslie Piotrowski, spokeswoman county health department. residents apartments block Edina Boulevard contact bats advised health officials series anti-viral rabies vaccination shots. Wildwood homeowner, address released, shots contact bat saliva. "" exposed, bitten receive skin damage bat, bat saliva contact eyes, mouth, nose mucous membranes, "" Victor Plotkin health department. Bats prevalent summer mosquitoes insects-- favorite food-- readily . Uninfected bats avoid humans, preferring bluffs, caves hidden dwellings. play important ecological role eating insects, bats harmed trapped, Plotkin . types bats protected Endangered Species Act. Bats unable fly, homes drop lawns infected rabies avoided. Lake County residents bat call health department month, Illinois Public Health Department urged residents cautious bats-- state commonly identified rabid animal-- received reports people bitten exposed. people received anti-rabies shots. animals examined rabies year state public health officials, tested positive, bats. early skunks commonly identified animal rabies Illinois."


In [13]:
str(corpus_more_clean)
head(corpus_more_clean)

'data.frame':	5239 obs. of  5 variables:
 $ X    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ GOID : num  4.22e+08 4.09e+08 4.29e+08 4.19e+08 2.07e+09 ...
 $ Title: chr  "Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this "| __truncated__ "TODAY IN CONGRESS" "Plan Offered to Aid Northwest Salmon and Trout" "SAVING WILDLIFE" ...
 $ Date : chr  "2000-09-26" "1999-07-20" "1994-03-27" "1999-07-05" ...
 $ Text : chr  "arm study developer demand federal official postpone decision due week provide acre critical habitat tiny gnatc"| __truncated__ "senate meet a.m. committee arm service a.m. u.s. policy military operation kosovo defense sec william cohen joi"| __truncated__ "ap clinton administration propose protection zone river stream threaten fish species federal land eastern orego"| __truncated__ "tribune june news item recovery propose removal bald eagle endanger species list mention eagle extinction misle"| __truncated__ ...


,X,GOID,Title,Date,Text
,<int>,<dbl>,<chr>,<chr>,<chr>
1,0,421721236,"Developers Urge Delay of Gnatcatcher Ruling; Birds: Species isn't in danger, they claim. Federal decision this week could protect almost 800,000 acres.",2000-09-26,arm study developer demand federal official postpone decision due week provide acre critical habitat tiny gnatcatcher southern california study write jonathon l atwood biologist early research conclude bird near extinction developer proof california gnatcatcher genetically close mexican songbird species danger environmentalist scientist disagree california bird mexican thousand pair gnatcatchers california million mexico study fund build industry assn southern california transportation corridor agency u.s. navy write robert m zink george f barrowclough rachelle c blackwell rago atwood study compare dna gnatcatchers mexico dna obtain feather nestling unite state put simply base dna datum northern population constitute unique component gnatcatcher biodiversity study conclude scientist caution test small numb bird evidence genetic mutation cross breed mexican american bird letter mail monday irvine attorney rob thornton urge interior secretary bruce babbitt postpone decision critical habitat bird judge order federal official designate critical habitat bird saturday u.s. 
fish wildlife official seek emergency extension court bypass order atwood report california gnatcatchers distinct mexican cousin primary scientific evidence government rely decide list gnatcatcher threaten species thornton represent transportation corridor agency pulte home corp forest lawn memorial park assn impossible determine gnatcatcher acre essential fact threaten environmental group genetic similarity mexican bird obvious difference birds color final habitat decision vital gnatcatcher sage scrub nest siege development endanger species act narrow inflexible genetic definition species andrew wetzler attorney natural resource defense council environmentalist note healthy population neighbor country species safe unite state kimball garrett ornithology collection manager natural history museum los angeles county cite birds appearance proof difference california bird dark mexican bird white breast tail caption photo male california gnatcatcher photographer al schaben los angeles time photo slope eastern transportation corridor toll road orange coastal sage scrub habitat gnatcatcher photographer don kelsen los angeles time credit time staff writer
2,1,408507519,TODAY IN CONGRESS,1999-07-20,senate meet a.m. committee arm service a.m. u.s. policy military operation kosovo defense sec william cohen joint chief chairman gen henry shelton hart office bldg arm service a.m. nomination f whitten peter secretary air force arthur money assistant secretary defense command control communication intelligence russell office bldg budget a.m. mid session review president fy budget dirksen office bldg energy natural resource p.m. forest public land management subc national monument public participation act dob environment public work a.m. fishery wildlife drink water subc habitat conservation plan endanger species act dob foreign relation a.m. nomination ambassador philippines indonesia palau fiji nauru tonga tuvalu brunei darussalam dob foreign relation p.m. international operation subc close u.n. international criminal court dob governmental affair a.m. permanent investigation subc deceptive mail practice include sweepstake skill contest alike mailing legislation control practice dob health education labor pension a.m. elementary secondary education act reauth dob special age p.m. impact drug switch old american dob house meet a.m. committee agriculture a.m. general farm commodity subc small watershed rehabilitation amendment longworth house office bldg appropriation a.m. mark d.c. energy water foreign operation approps fy rayburn house office bldg bank financial service a.m. financial institution consumer credit subc financial privacy issue rhob commerce a.m. oversight investigation subc security inspection doe lawrence livermore national laboratory close rhob commerce a.m. telecommunication trade consumer protection subc corporation public broadcast authorization act rhob judiciary a.m. mark pend legislation rhob judiciary p.m. mark pain relief promotion act rhob resource a.m. national park forest land subc pend legislation lhob way mean a.m. human resource subc adoption permanent placement rhob way mean p.m. 
health subc lhob credit reuters
3,2,429461998,Plan Offered to Aid Northwest Salmon and Trout,1994-03-27,ap clinton administration propose protection zone river stream threaten fish species federal land eastern oregon eastern washington northern california idaho proposal pacfish result log livestock graze recreation thousand mile stream effort save salmon trout species extinction feature protective buff strip similar plan national forest northern spot owl western oregon washington goal pacfish reverse degradation anadromous fish habitat forest service administer land order avoid listing endanger species act interior agriculture department joint statement friday release plan anadromous fish salmon swim upstream spawn significant decline cite recent report find half stock native pacific anadromous fish show significant decline number extinct protection zone wide foot length football field activity foot water buffer side fish bear stream fifty foot buffer plan key watershed permanent stream fish pond reservoir wetland large acre intermittent stream spindry part year protection limit foot side proposal affect land east cascade range unveil part environmental assessment issue comply national environmental policy act administration estimate cost million carry proposal prefer option mention assessment extensive option consider cost million document day comment administration accept public comment plan day move make formal government policy land plan remain place government scientist complete long term analysis extensive environmental impact statement jim sander forest service spokesman administration estimate restriction loss log million board foot year rancher allow graze few animal unit month animal unit equivalent cow calf sheep plan design preserve stock chinook coho chum sockeye pink salmon steelhead sea run cutthroat trout study show log graze stream bank accelerate erosion fill stream silt eliminate shade need water cool fish species strategy focus habitat feature need support healthy aquatic 
ecosystem pool frequency width depth ratio cool water temperature woody debris stream stream bank stability low bank angle government statement apply private land add conservation measure area range northern spot owl time good fishery science prevail timber harvest program glen spain northwest regional director pacific coast federation fisherman association represent fight d.c. forest service timber supply program wildlife fish protection biologist mr spain represent commercial fisherman shut salmon fish year coast washington northern oregon decline run
4,3,418798522,SAVING WILDLIFE,1999-07-05,tribune june news item recovery propose removal bald eagle endanger species list mention eagle extinction mislead bald eagle extinction low state healthy population alaska canada distinction make reason true help show flexibility endanger species act esa give government latitude protect distinct population species entire range finally show local conservation effort protect biological diversity reason federal government involve state fail conserve native plant animal illinois indiana species concern require federal action future state act measure protect past state simply responsibility fall feds esa give government authority ban ddt u.s. lead recovery bald eagle peregrine falcon brown pelican note ddt manufacture export country perform magic advocate conservation nation biological resource strongly support esa advocate medium conduct debate total honesty campaign lose credibility
5,4,2073930752,Wildlife Threatened Again,2018-07-24,endanger species act noble contentious landmark environmental statute enact nixon presidency year celebrate conservationist protect richard nixon word irreplaceable part natural heritage threaten wildlife equal measure revile developer rancher logger oil gas interest elevate plant animal habitat survival demand commerce approve huge margin chamber house vote astound act stand chance passage today congress political climate act main purpose simply state identify species list endanger head extinction threaten endanger designate habitat species survival nurture process species survive recover sustainable number act long accumulate plenty enemy embolden determine anti regulatory president critic march suite measure house development senate aggregate weaken role scientist play decide species increase influence state government west depend revenue royalty job provide extractive industry mine oil gas care species occupy potentially productive land week administration unsettle proposal announce david bernhardt deputy secretary interior department spear carrier oil gas industry rise command policymaking role interior boss ryan zinke mr bernhardt streamline clarify regulatory process page daunt bureaucratic prose innocently attempt proposal bide animal plant president trump overarch ambition reduce cost burden business energy business introduce cost consideration exist write statute implement regulation require list decision make solely basis scientific commercial datum reference economic impact determination proposal eliminate phrase open list decision cost benefit analysis tom carper delaware top democrat senate environment public work committee fear undermine science federal official protect species unfounded fear administration proposal weaken safeguard threaten species enjoy blanket protection harm hunt shoot trap apply endanger species threaten species judge case case basis proposal make hard species gain foothold 
threaten list begin statute define threaten species endanger species foreseeable future significant portion range obama administration define foreseeable future liberally instance list arctic beard seal threaten ice sheet seal rely disappear end century global warm speculative trump people scientist policymaker henceforth require avoid speculate hypothetically mr carper clear invitation limit protection species threaten climate change case nowadays casuistry abound republican congress instance love argue percent list species recover point remove list include notably bald eagle peregrine falcon american alligator gray wolf perverse measure progress species hurtle extinction expect build sustainable population overnight grizzly bear year measure small percentage doom individual species act habitat requirement produce great gain ecosystem succession inconspicuous bird list endanger threaten spot owl marble murrelet coastal california gnatcatcher save million acre growth forest open space pacific coast log commercial development effort save wood stork florida panther help nourish everglades mr zinke want real reform leaf clinton obama playbooks economic incentive negotiation persuade state landowner industry collaborate grand scale save species wind endanger list spectacular approach obama administration decision work state private party protect million acre habitat western state occupy great sage grouse make list unnecessary fat chance mr zinke show enthusiasm strategy respond bleat oil gas interest seek repudiate obama plan collaboration sage grouse follow york time opinion section facebook twitter sign opinion today newsletter credit editorial boarddrawing draw lucy jones
6,5,419462981,County puts out rabid bat warning,2001-07-24,bat fly trap inside build rabies health official warn monday confirm bat disease find lake county brown bat capture hallway zion apartment build july test positive rabies trap wildwood week early leslie piotrowski spokeswoman county health department resident apartment block edina boulevard contact bat advise health official series anti viral rabies vaccination shot wildwood homeowner address release shot contact bat saliva expose bite receive skin damage bat bat saliva contact eye mouth nose mucous membrane victor plotkin health department bat prevalent summer mosquito insect favorite food readily uninfected bat avoid human prefer bluff cave hide dwelling play important ecological role eat insect bat harm trap plotkin type bat protect endanger species act bat unable fly home drop lawn infect rabies avoid lake county resident bat call health department month illinois public health department urge resident cautious bat state commonly identify rabid animal receive report people bite expose people receive anti rabies shot animal examine rabies year state public health official test positive bat early skunk commonly identify animal rabies illinois
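
The `str()` output above reports 5,239 rows, two fewer than the 5,241 rows read in at the start of the notebook. One way to see which articles were dropped is to compare identifiers before and after cleaning. The following is a minimal sketch, assuming `GOID` uniquely identifies an article; `before` and `after` are hypothetical stand-ins for `original_corpus` and `corpus_more_clean`, so the block runs on its own.

```r
# Hypothetical stand-ins for original_corpus and corpus_more_clean
before <- data.frame(
    GOID = c(11, 22, 33, 44),
    Text = c("a b", "", "c d", "e"),
    stringsAsFactors = FALSE
)
after <- data.frame(
    GOID = c(11, 33),
    Text = c("a b", "c d"),
    stringsAsFactors = FALSE
)

# Rows whose GOID no longer appears after cleaning
dropped <- before[!(before$GOID %in% after$GOID), ]

# The number of dropped rows should account for the whole difference
stopifnot(nrow(dropped) == nrow(before) - nrow(after))
dropped$GOID
```

With the real data frames substituted in, `dropped` would hold the two removed articles for inspection.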


In [15]:
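# Save the cleaned corpus as a CSV file.
# row.names = FALSE keeps write.csv from adding a second index column next to X.
write.csv(
    corpus_more_clean,
    here("data", "processed", "clean_text_corpus.csv"),
    row.names = FALSE
)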
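
`write.csv()` fails if the target directory does not exist, and it is worth confirming that the saved file reads back as expected. The following is a minimal sketch of both checks, not the notebook's own code: `demo_corpus` and `demo_path` are hypothetical, and the file goes to a temporary directory rather than the project's `data/processed` folder. `row.names = FALSE` is used here so the round trip returns the same columns that were written.

```r
# Hypothetical stand-in for corpus_more_clean
demo_corpus <- data.frame(
    GOID = c(421721236, 408507519),
    Text = c("arm study developer demand", "senate meet a.m. committee"),
    stringsAsFactors = FALSE
)

# Hypothetical output path inside a temporary directory
demo_path <- file.path(tempdir(), "processed", "clean_text_corpus.csv")

# Create the target directory if it is missing
dir.create(dirname(demo_path), recursive = TRUE, showWarnings = FALSE)

write.csv(demo_corpus, demo_path, row.names = FALSE)

# Read the file back and compare it with what was written
round_trip <- read.csv(demo_path, stringsAsFactors = FALSE)
stopifnot(
    nrow(round_trip) == nrow(demo_corpus),
    identical(names(round_trip), names(demo_corpus)),
    identical(round_trip$Text, demo_corpus$Text)
)
```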

In [None]:
# Testing: small sample corpus for trying out the cleaning steps (uncomment to use)
#Text <- c(
#    "TestIng/gr.com/ tESt/.Net/org",
#    ".Net/org Test .com can't   S. Lily's",
#    "   S. Lily's first PlAce 1st, tests patches",
#    "howtotestthis.com/",
#    "U.S.A. u.s.a lily356@edu.net/",
#    "lily356@edu.net/ gr8t. --g-5_ GrizZly bear.!",
#    "howtotestthis.net hottohelp.com grizzlies-",
#    "testhelp.com/help.org testhelp.com/help.org/",
#    ",GRIZZLY BEAR:  GRIZZLY-BEAR, grizzlies testing",
#    " midflorida.com.GRIZZLY BEAR: ",
#    "grizzlybearsarecool coolgrizzliesbear"
#)
#GOID <- c(11345, 324, 876, 335, 987, 574, 9, 87, 768, 24, 5)

#sample_corpus <- data.frame(GOID = GOID, Text = Text)
#sample_corpus
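
A sample corpus like the one above can be used to check a cleaning step against strings whose expected result is easy to work out by hand. The following is a minimal sketch using two of the test strings and their matching `GOID` values. `toy_clean` is a hypothetical base-R stand-in, not the cleaning pipeline used earlier in this notebook: it only lower-cases, replaces non-letters with spaces, and collapses whitespace.

```r
# Two of the test strings from the commented-out cell, with their GOIDs
sample_corpus <- data.frame(
    GOID = c(768, 24),
    Text = c(
        ",GRIZZLY BEAR:  GRIZZLY-BEAR, grizzlies testing",
        " midflorida.com.GRIZZLY BEAR: "
    ),
    stringsAsFactors = FALSE
)

# Hypothetical toy cleaner (assumption: illustration only)
toy_clean <- function(x) {
    x <- tolower(x)                 # lower-case everything
    x <- gsub("[^a-z ]+", " ", x)   # replace anything other than a-z or space
    x <- gsub(" +", " ", x)         # collapse runs of spaces
    trimws(x)                       # drop leading and trailing spaces
}

sample_corpus$Clean <- toy_clean(sample_corpus$Text)
sample_corpus
```

Swapping `toy_clean` for the notebook's actual cleaning functions would let the same small data frame serve as a quick regression check.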