# Imports and packages

In [1]:
## helpful packages

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import random
import re
import string

## nltk imports
import nltk
### uncomment and run these lines if you haven't downloaded relevant nltk add-ons yet
#nltk.download('averaged_perceptron_tagger')
#nltk.download('stopwords')
from nltk import pos_tag
from nltk.tokenize import word_tokenize, wordpunct_tokenize
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

## spacy imports
import spacy
### uncomment and run the below line if you haven't loaded the en_core_web_sm library yet
#! python3 -m spacy download en_core_web_sm
import en_core_web_sm
nlp = en_core_web_sm.load()

## vectorizer
from sklearn.feature_extraction.text import CountVectorizer

## sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## lda
from gensim import corpora
import gensim

## repeated printouts and wide-format text
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_colwidth', None)

# Text analysis of Department of Justice (DOJ) press releases (50 points total)

- For background:

    - DOJ is the federal law enforcement agency responsible for federal prosecutions; this contrasts with the local prosecutions in the Cook County dataset we analyzed earlier. Here's a short explainer on which crimes get prosecuted federally versus locally: https://www.criminaldefenselawyer.com/resources/criminal-defense/federal-crime/state-vs-federal-crimes.htm#:~:text=Federal%20criminal%20prosecutions%20are%20handled,of%20state%20and%20local%20law. 
    - Here's the Kaggle that contains the data: https://www.kaggle.com/jbencina/department-of-justice-20092018-press-releases 
    - Here's the code the dataset creator used to scrape those press releases here if you're interested: https://github.com/jbencina/dojreleases
    
- See here for a codebook: https://docs.google.com/spreadsheets/d/1UopmSvFGrwJvz_c3Plh32Yxkqwff64oS_CcpfATOV8k/edit?usp=sharing

In [2]:
## first, unzip the combined.json.zip file
## then, run this code to load the unzipped json file and convert to a dataframe
## and convert some of the attributes from lists to values
## make sure to change the pathname if you need to
doj = pd.read_json("combined.json", lines = True)

## due to json, topics are in a list so remove them and concatenate with ;
doj['topics_clean'] = ["; ".join(topic) 
                      if len(topic) > 0 else "No topic" 
                      for topic in doj.topics]

## similarly with components
doj['components_clean'] = ["; ".join(comp) 
                           if len(comp) > 0 else "No component" 
                           for comp in doj.components]

## drop older columns from data
doj = doj[['id', 'title', 'contents', 'date', 'topics_clean', 
           'components_clean']].copy()

doj.head()


Unnamed: 0,id,title,contents,date,topics_clean,components_clean
0,,Convicted Bomb Plotter Sentenced to 30 Years,"PORTLAND, Oregon. – Mohamed Osman Mohamud, 23, who was convicted in 2013 of attempting to use a weapon of mass destruction (explosives) in connection with a plot to detonate a vehicle bomb at an annual Christmas tree lighting ceremony in Portland, was sentenced today to serve 30 years in prison, followed by a lifetime term of supervised release. Mohamud, a naturalized U.S. citizen from Somalia and former resident of Corvallis, Oregon, was arrested on Nov. 26, 2010, after he attempted to detonate what he believed to be an explosives-laden van that was parked near the tree lighting ceremony in Portland. The arrest was the culmination of a long-term undercover operation, during which Mohamud was monitored closely for months as his bomb plot developed. The device was in fact inert, and the public was never in danger from the device. At sentencing, United States District Court Judge Garr M. King, who presided over Mohamed’s 14-day trial, said “the intended crime was horrific,” and that the defendant, even though he was presented with options by undercover FBI employees, “never once expressed a change of heart.” King further noted that the Christmas tree ceremony was attended by up to 10,000 people, and that the defendant “wanted everyone to leave either dead or injured.” King said his sentence was necessary in view of the seriousness of the crime and to serve as deterrence to others who might consider similar acts. “With today’s sentencing, Mohamed Osman Mohamud is being held accountable for his attempted use of what he believed to be a massive bomb to attack innocent civilians attending a public Christmas tree lighting ceremony in Portland,” said John P. Carlin, Assistant Attorney General for National Security. “The evidence clearly indicated that Mohamud was intent on killing as many people as possible with his attack. Fortunately, law enforcement was able to identify him as a threat, insert themselves in the place of a terrorist that Mohamud was trying to contact, and thwart Mohamud’s efforts to conduct an attack on our soil. This case highlights how the use of undercover operations against would-be terrorists allows us to engage and disrupt those who wish to commit horrific acts of violence against the innocent public. The many agents, analysts, and prosecutors who have worked on this case deserve great credit for their roles in protecting Portland from the threat posed by this defendant and ensuring that he was brought to justice.” “This trial provided a rare glimpse into the techniques Al Qaeda employs to radicalize home-grown extremists,” said Amanda Marshall, U.S. Attorney for the District of Oregon. “With the sentencing today, the court has held this defendant accountable. I thank the dedicated professionals in the law enforcement and intelligence communities who were responsible for this successful outcome. I look forward to our continued work with Muslim communities in Oregon who are committed to ensuring that all young people are safe from extremists who seek to radicalize others to engage in violence.” According to the trial evidence, in February 2009, Mohamud began communicating via e-mail with Samir Khan, a now-deceased al Qaeda terrorist who published Jihad Recollections, an online magazine that advocated violent jihad, and who also published Inspire, the official magazine of al-Qaeda in the Arabian Peninsula. Between February and August 2009, Mohamed exchanged approximately 150 emails with Khan. Mohamud wrote several articles for Jihad Recollections that were published under assumed names. In August 2009, Mohamud was in email contact with Amro Al-Ali, a Saudi national who was in Yemen at the time and is today in custody in Saudi Arabia for terrorism offenses. Al-Ali sent Mohamud detailed e-mails designed to facilitate Mohamud’s travel to Yemen to train for violent jihad. In December 2009, while Al-Ali was in the northwest frontier province of Pakistan, Mohamud and Al-Ali discussed the possibility of Mohamud traveling to Pakistan to join Al-Ali in terrorist activities. Mohamud responded to Al-Ali in an e-mail: “yes, that would be wonderful, just tell me what I need to do.” Al-Ali referred Mohamud to a second associate overseas and provided Mohamud with a name and email address to facilitate the process. In the following months, Mohamud made several unsuccessful attempts to contact Al-Ali’s associate. Ultimately, an FBI undercover operative contacted Mohamud via email under the guise of being an associate of Al-Ali’s. Mohamud and the FBI undercover operative agreed to meet in Portland in July 2010. At the meeting, Mohamud told the FBI undercover operative he had written articles that were published in Jihad Recollections. Mohamud also said that he wanted to become “operational.” Asked what he meant by “operational,” Mohamud said he wanted to put an explosion together, but needed help. According to evidence presented at trial, at a meeting in August 2010, Mohamud told undercover FBI operatives he had been thinking of committing violent jihad since the age of 15. Mohamud then told the undercover FBI operatives that he had identified a potential target for a bomb: the annual Christmas tree lighting ceremony in Portland’s Pioneer Courthouse Square on Nov. 26, 2010. The undercover FBI operatives cautioned Mohamud several times about the seriousness of this plan, noting there would be many people at the event, including children, and emphasized that Mohamud could abandon his attack plans at any time with no shame. Mohamud indicated the deaths would be justified and that he would not mind carrying out a suicide attack on the crowd. According to evidence presented at trial, in the ensuing months Mohamud continued to express his interest in carrying out the attack and worked on logistics. On Nov. 4, 2010, Mohamud and the undercover FBI operatives traveled to a remote location in Lincoln County, Oregon, where they detonated a bomb concealed in a backpack as a trial run for the upcoming attack. During the drive back to Corvallis, Mohamud was asked if was capable looking at all the bodies of those who would be killed during the explosion. In response, Mohamud noted, “I want whoever is attending that event to be, to leave either dead or injured.” Mohamud later recorded a video of himself, with the assistance of the undercover FBI operatives, in which he read a statement that offered his rationale for his bomb attack. On Nov. 18, 2010, undercover FBI operatives picked up Mohamud to travel to Portland to finalize the details of the attack. On Nov. 26, 2010, just hours before the planned attack, Mohamud examined the 1,800 pound bomb in the van and remarked that it was “beautiful.” Later that day, Mohamud was arrested after he attempted to remotely detonate the inert vehicle bomb rked near the Christmas tree lighting ceremony This case was investigated by the FBI, with assistance from the Oregon State Police, the Corvallis Police Department, the Lincoln County Sheriff’s Office and the Portland Police Bureau. The prosecution was handled by Assistant U.S. Attorneys Ethan D. Knight and Pamala Holsinger from the U.S. Attorney’s Office for the District of Oregon. Trial Attorney Jolie F. Zimmerman, from the Counterterrorism Section of the Justice Department’s National Security Division, assisted. # # # 14-1077",2014-10-01T00:00:00-04:00,No topic,National Security Division (NSD)
1,12-919,$1 Million in Restitution Payments Announced to Preserve North Carolina Wetlands,"WASHINGTON – North Carolina’s Waccamaw River watershed will benefit from a $1 million restitution order from a federal court, funding environmental projects to acquire and preserve wetlands in an area damaged by illegal releases of wastewater from a corporate hog farm, announced Ignacia S. Moreno, Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division; U.S. Attorney for the Eastern District of North Carolina Thomas G. Walker; Director Greg McLeod from the North Carolina State Bureau of Investigation; and Camilla M. Herlevich, Executive Director of the North Carolina Coastal Land Trust. Freedman Farms Inc. was sentenced in February 2012 to five years of probation and ordered to pay $1.5 million in fines, restitution and community service payments for violating the Clean Water Act when it discharged hog waste into a stream that leads to the Waccamaw River. William B. Freedman, president of Freedman Farms, was sentenced to six months in prison to be followed by six months of home confinement. Freedman Farms also is required to implement a comprehensive environmental compliance program and institute an annual training program. In an order issued on April 19, 2012, the court ordered that the defendants would be responsible for restitution of $1 million in the form of five annual payments starting in January 2013, which the court will direct to the North Carolina Coastal Land Trust (NCCLT). The NCCLT plans to use the money to acquire and conserve land along streams in the Waccamaw watershed. The court also directed a $75,000 community service payment to the Southern Environmental Enforcement Network, an organization dedicated to environmental law enforcement training and information sharing in the region. “The resolution of the case against Freedman Farms demonstrates the commitment of the Department of Justice to enforcing the Clean Water Act to ensure the protection of human health and the environment,” said Assistant Attorney General Moreno. “The court-ordered restitution in this case will conserve wetlands for the benefit of the people of North Carolina. By enforcing the nation’s environmental laws, we will continue to ensure that concentrated animal feeding operations (CAFOs) operate without threatening our drinking water, the health of our communities and the environment.” “This office is committed to doing our part to hold accountable those who commit crimes against our environment, which can cause serious health problems to residents and damage the environment that makes North Carolina such a beautiful place to live and visit,” said U.S. Attorney Walker. “This case shows what we can accomplish when our SBI agents work closely with their local, state and federal partners to investigate environmental crimes and hold the polluters accountable,” said Director McLeod. “We’ll continue our efforts to fight illegal pollution that damages our water and puts the public’s health at risk.” “The Waccamaw is unique and wild,” said Director Herlevich of the North Carolina Coastal Land Trust. “Its watershed includes some of the most extensive cypress gum swamps in the state, and its headwaters at Lake Waccamaw contain fish that are found nowhere else on Earth. We appreciate the trust of the court and the U. S. Attorney, and we look forward to using these funds for conservation projects in a river system that is one of our top conservation priorities.” According to evidence presented in court, in December 2007 Freedman Farms discharged hog waste into Browder’s Branch, a tributary to the Waccamaw River that flows through the White Marsh, a large wetlands complex. Freedman Farms, located in Columbus County, N.C., is in the business of raising hogs for market, and this particular farm had some 4,800 hogs. The hog waste was supposed to be directed to two lagoons for treatment and disposal. Instead, hog waste was discharged from Freedman Farms directly into Browder’s Branch. The Clean Water Act is a federal law that makes it illegal to knowingly or negligently discharge a pollutant into a water of the United States. The Freedman case was investigated by the U.S. Environmental Protection Agency (EPA) Criminal Investigation Division, the U.S. Army Corps of Engineers and the North Carolina State Bureau of Investigation, with assistance from the EPA Science and Ecosystem Support Division. The case was prosecuted by Assistant U.S. Attorney J. Gaston B. Williams of the Eastern District of North Carolina and Trial Attorney Mary Dee Carraway of the Environmental Crimes Section of the Justice Department’s Environment and Natural Resources Division. The North Carolina Coastal Land Trust is celebrating its 20th anniversary of saving special lands in eastern North Carolina. The organization has protected nearly 50,000 acres of lands with scenic, recreational, historic and ecological values. North Carolina Coastal Land Trust has saved streams and wetlands that provide clean water, forests that are havens for wildlife, working farms that provide local food and nature parks that everyone can enjoy. More information about the Coastal Land Trust is available at www.coastallandtrust.org.",2012-07-25T00:00:00-04:00,No topic,Environment and Natural Resources Division
2,11-1002,$1 Million Settlement Reached for Natural Resource Damages at Superfund Site in Massachusetts,"BOSTON– A $1-million settlement has been reached for natural resource damages (NRD) at the Blackburn & Union Privileges Superfund Site in Walpole, Mass., the Departments of Justice and Interior (DOI), and the Office of the Massachusetts Attorney General announced today. The Blackburn & Union Privileges Superfund Site includes 22 acres of contaminated land and water in Walpole. The contamination resulted from the operations of various industrial facilities dating back to the 19th century that exposed the site to asbestos, arsenic, lead and other hazardous substances. The private parties involved in the settlement include two former owners and operators of the site, W.R. Grace & Co.– Conn. and Tyco Healthcare Group LP, as well as the current owners, BIM Investment Corp. and Shaffer Realty Nominee Trust. From about 1915 to 1936, a predecessor of W.R. Grace manufactured asbestos brake linings and clutch linings on a large portion of the property. From 1946 to about 1983, a predecessor of Tyco Healthcare operated a cotton fabric manufacturing business, which used caustic solutions, on a portion of the property. In a 2010 settlement with U.S. Environmental Protection Agency (EPA), the four private parties agreed to perform a remedial action to clean up the site at an estimated cost of $13 million. The consent decree lodged today resolves both state and federal NRD liability claims; it requires the parties to pay $1,094,169.56 to the state and federal natural resource trustees, the Massachusetts Executive Office of Energy and Environmental Affairs (EEA) and DOI, for injuries to ecological resources including groundwater and wetlands, which provide habitat for waterfowl and wading birds, including black ducks and great blue herons. The trustees will use the settlement funds for natural resource restoration projects in the area. “This settlement demonstrates our commitment to recovering damages from the parties responsible for injury to natural resources, in partnership with state trustees,” said Bruce Gelber, Acting Deputy Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division. “The citizens of Walpole have had to live with the environmental impact of this contamination for many years,” Attorney General Martha Coakley said. “We are pleased that today’s agreement will not only require the responsible parties to reimburse taxpayer dollars, but will also provide funding to begin restoring or replacing the wetland and other natural resources.” The consent decree was lodged in the U.S. District Court for Massachusetts. A portion of the funds, $300,000, will be distributed to the EEA-sponsored groundwater restoration projects; $575,000 will be used for ecological restoration projects jointly sponsored by EEA and the U.S. Fish and Wildlife Service (FWS). In addition, $125,000 will go for projects jointly sponsored by EEA and FWS that achieve both ecological and groundwater restoration; $57,491.34 will be allocated for reimbursement for the FWS’s assessment costs; and $36,678.22 will be distributed as reimbursement for the commonwealth’s assessment costs. “This settlement provides the means for a range of projects designed to compensate the public for decades of groundwater and other ecological damage at this site. I encourage local citizens and organizations to become engaged in the public process that will take place as we solicit, take comment on, and choose these projects in the months ahead,” said Energy and Environmental Affairs Secretary Richard K. Sullivan Jr., who serves as the Commonwealth’s Natural Resources Damages trustee. “This settlement will help restore habitat for fish and wildlife in the Neponset River watershed,” said Tom Chapman of the FWS New England Field Office. “We look forward to working with the commonwealth and local stakeholders to implement restoration.” “More than 100 years-worth of industrial activities at this site caused major environmental contamination to the Neponset River, nearby wetlands and to groundwater below the site,” said Commissioner Kenneth Kimmell of the Massachusetts Department of Environmental Protection (MassDEP), which will staff the Trustee Council for the Commonwealth. “We will ensure that the community and the public will be active participants in the process to use these NRD funds to restore the injured natural resources.” Under the federal Comprehensive Environmental Response, Compensation and Liability Act, EEA and DOI, acting through the FWS, are the designated state and federal natural resource Trustees for the site. The site has been listed on the EPA’s National Priorities List since 1994. The consent decree is subject to a public comment period and court approval. A copy of the consent decree and instructions about how to submit comments is available on www.usdoj.gov/enrd/Consent_Decrees.html . After the consent decree is approved, EEA and FWS will develop proposed restoration plans to use the settlement funds for restoration projects. The proposed restoration plans will also be made available to the public for review and comment. Assistant Attorney General Matthew Brock of Massachusetts Attorney General Coakley's Environmental Protection Division handled this matter. Attorney Jennifer Davis of MassDEP, Attorney Anna Blumkin of EEA and MassDEP’s NRD Coordinator Karen Pelto also worked on this settlement.",2011-08-03T00:00:00-04:00,No topic,Environment and Natural Resources Division
3,10-015,10 Las Vegas Men Indicted \r\nfor Falsifying Vehicle Emissions Tests,"WASHINGTON—A federal grand jury in Las Vegas today returned indictments against 10 Nevada-certified emissions testers for falsifying vehicle emissions test reports, the Justice Department announced. Each defendant faces one felony Clean Air Act count for falsifying reports between November 2007 and May 2009. The number of falsifications varied by defendant, with some defendants having falsified approximately 250 records, while others falsified more than double that figure. One defendant is alleged to have falsified over 700 reports. The individuals indicted include: Escudero resides in Pahrump, Nev. All other individuals are from Clark County, Nev. The 10 defendants are alleged to have engaged in a practice known as ""clean scanning"" vehicles. The scheme involved entering the Vehicle Identification Number (VIN) for a vehicle that would not pass the emissions test into the computerized system, then connecting a different vehicle the testers knew would pass the test. These falsifications were allegedly performed for anywhere from $10 to $100 over and above the usual emissions testing fee. The U.S. Environmental Protection Agency (EPA), under the Clean Air Act, requires the state of Nevada to conduct vehicle emissions testing in certain areas because the areas exceed national standards for carbon monoxide and ozone. Las Vegas is currently required to perform emissions testing. To obtain a registration renewal, vehicle owners bring the vehicles to a licensed inspection station for testing. The emissions inspector logs into a computer to activate the system by using a unique password issued to the emissions inspector. The emissions inspector manually inputs the vehicle’s VIN to identify the tested vehicle, then connects the vehicle for model year 1996 and later to an onboard diagnostics port connected to an analyzer. The analyzer downloads data from the vehicle’s computer, analyzes the data and provides a ""pass"" or ""fail"" result. The pass or fail result and vehicle identification data are reported on the Vehicle Inspection Report. It is a crime to knowingly alter or conceal any record or other document required to be maintained by the Clean Air Act. ""Falsifications of vehicle emissions testing, such as those alleged in the indictments unsealed today, are serious matters and we intend to use all of our enforcement tools to stop this harmful practice. These actions undermine a system that is designed to reduce air pollutants including smog and provide better air quality for the citizens of Nevada,"" said Ignacia S. Moreno, Assistant Attorney General for the Justice Department’s Environment and Natural Resources Division. ""The residents of Nevada deserve to know that the vast majority of licensed vehicle emission inspectors are not corrupt and are not circumventing emission testing procedures,"" said U.S. Attorney Bogden. ""These indictments should serve as a clear warning to offenders that the Department of Justice will prosecute you if you make fraudulent statements and reports concerning compliance with the federal Clean Air Act."" ""Lying about car emissions means dirtier air, which is especially of concern in areas like Las Vegas that are already experiencing air quality problems,"" said Cynthia Giles, Assistant Administrator for Enforcement and Compliance Assurance at EPA. ""We will take aggressive action to ensure communities have clean air."" The maximum penalty for the felony violations contained in the indictments includes up to two years in prison and a fine of up to $250,000. An indictment is merely an accusation, and a defendant is presumed innocent unless and until proven guilty in a court of law. The case was investigated by the EPA, Criminal Investigation Division; and the Nevada Department of Motor Vehicles Compliance Enforcement Division. The case is being prosecuted by the U.S. Attorney’s Office for the District of Nevada and the Justice Department’s Environmental Crimes Section.",2010-01-08T00:00:00-05:00,No topic,Environment and Natural Resources Division
4,18-898,"$100 Million Settlement Will Speed Cleanup Work at Centredale Manor Superfund Site in North Providence, R.I.","The U.S. Department of Justice, the U.S. Environmental Protection Agency (EPA), and the Rhode Island Department of Environmental Management (RIDEM) announced today that two subsidiaries of Stanley Black & Decker Inc.—Emhart Industries Inc. and Black & Decker Inc.—have agreed to clean up dioxin contaminated sediment and soil at the Centredale Manor Restoration Project Superfund Site in North Providence and Johnston, Rhode Island. “We are pleased to reach a resolution through collaborative work with the responsible parties, EPA, and other stakeholders,” said Acting Assistant Attorney General Jeffrey H. Wood for the Justice Department's Environment and Natural Resources Division . “Today’s settlement ends protracted litigation and allows for important work to get underway to restore a healthy environment for citizens living in and around the Centredale Manor Site and the Woonasquatucket River.” “This settlement demonstrates the tremendous progress we are achieving working with responsible parties, states, and our federal partners to expedite sites through the entire Superfund remediation process,” said EPA Acting Administrator Andrew Wheeler. “The Centredale Manor Site has been on the National Priorities List for 18 years; we are taking charge and ensuring the Agency makes good on its promise to clean it up for the betterment of the environment and those communities affected.” “Successfully concluding this settlement paves the way for EPA to make good on our commitment to aggressively pursue cleaning up the Centredale Manor Superfund Site,” said EPA New England Regional Administrator Alexandra Dunn. “We are excited to get to work on the cleanup at this site, and get it closer to the goal of being fully utilized by the North Providence and Johnston communities.” “We are pleased that the collective efforts of the State of Rhode Island, EPA, and DOJ in these negotiations have concluded in this major milestone toward the cleanup of the Centredale Manor Restoration Superfund site and are consistent with our long-standing efforts to make the polluter pay,” said RIDEM Director Janet Coit. “The settlement will speed up a remedy that protects public health and the river environment, and moves us closer to the day that we can reclaim recreational uses of this beautiful river resource.” The settlement, which includes cleanup work in the Woonasquatucket River (River) and bordering residential and commercial properties along the River, requires the companies to perform the remedy selected by EPA for the Site in 2012, which is estimated to cost approximately $100 million, and resolves longstanding litigation. The cleanup remedy includes excavation of contaminated sediment and floodplain soil from the Woonasquatucket River, including from adjacent residential properties. Once the cleanup remedy is completed, full access to the Woonasquatucket River should be restored for local citizens. The cleanup will be a step toward the State’s goal of a fishable and swimmable river. The work will also include upgrading caps over contaminated soil in the peninsula area of the Site that currently house two high-rise apartment buildings. The settlement also ensures that the long-term monitoring and maintenance of the site, as directed in the remedy, will be implemented to ensure that public health is protected. Under the settlement, Emhart and Black & Decker will reimburse EPA for approximately $42 million in past costs incurred at the Site. The companies will also reimburse EPA and the State of Rhode Island for future costs incurred by those agencies in overseeing the work required by the settlement. The settlement will also include payments on behalf of two federal agencies to resolve claims against those agencies. These payments, along with prior settlements related to the Site, will result in a 100 percent recovery for the United States of its past and future response costs related to the Site. Litigation related to the Site has been ongoing for nearly eight years. While the Federal District Court found Black & Decker and Emhart to be liable for their hazardous waste and responsible to conduct the cleanup of the Site, it had also ruled that EPA needed to reconsider certain aspects of that cleanup. EPA appealed the decision requiring it to reconsider aspects of the cleanup. This settlement, once entered by the District Court, will resolve the litigation between the United States, Rhode Island, and Emhart and Black and Decker, allowing the cleanup of the Site to begin. The Site spans a one and a half mile stretch of the Woonasquatucket River and encompasses a nine-acre peninsula, two ponds and a significant forested wetland. From the 1940s to the early 1970s, Emhart’s predecessor operated a chemical manufacturing facility on the peninsula and used a raw material that was contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin, a toxic form of dioxin. The Site property was also previously used by a barrel refurbisher. Elevated levels of dioxins and other contaminants have been detected in soil, groundwater, sediment, surface water and fish. The Site was added to the National Priorities List (NPL) in 2000, and in December 2017, EPA included the Centredale Manor Restoration Project Superfund Site on a list of Superfund sites targeted for immediate and intense attention. Several short-term actions were previously performed at the Site to address immediate threats to the residents and minimize potential erosion and downstream transport of contaminated soil and sediment. This settlement is the latest agreement EPA has reached since the Site was listed on the NPL. Prior agreements addressed the performance and recovery of costs for the past environmental investigations and interim cleanup actions from Emhart, the barrel reconditioning company, the current owners of the peninsula portion of the Site, and other potentially responsible parties. The Consent Decree, lodged in the U.S. District Court of Rhode Island, will be posted in the Federal Register and available for public comment for a period of 30 days. The Consent Decree can be viewed on the Justice Department website: www.justice.gov/enrd/Consent_Decrees.html. EPA information on the Centredale Manor Superfund Site: www.epa.gov/superfund/centredale.",2018-07-09T00:00:00-04:00,Environment,Environment and Natural Resources Division


## 1. Tagging and sentiment scoring (16 points)

Focus on the following press release: `id` == "17-1204" about this pharmaceutical kickback prosecution: https://www.forbes.com/sites/michelatindera/2017/11/16/fentanyl-billionaire-john-kapoor-to-plead-not-guilty-in-opioid-kickback-case/?sh=21b8574d6c6c 

The `contents` column is the one we're treating as a document. You may need to to convert it from a pandas series to a single string.

We'll call the raw string of this press release `pharma`

In [3]:
contents = doj[doj['id'] == '17-1204']['contents']
pharma = contents.to_string(index=False)
type(pharma)
len(pharma)

str

9252

### 1.1 part of speech tagging (3 points)

A. Preprocess the `pharma` press release to remove all punctuation / digits (so can use `.isalpha()` to subset)

B. With the preprocessed press release from part A, use the part of speech tagger within nltk to tag all the words in that one press release with their part of speech. 

C. Using the output from B, extract the adjectives and sort those adjectives from most occurrences to fewest occurrences. Print a dataframe with the 5 most frequent adjectives and their counts in the `pharma` release. See here for a list of the names of adjectives within nltk: https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/

**Resources**:

- Documentation for .isalpha(): https://www.w3schools.com/python/ref_string_isalpha.asp
- `process_step1` function here has an example of tokenizing and filtering to words where .isalpha() is true: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/09_textasdata_partII_topicmodeling_solution.ipynb 
- Part of speech tagging section of this code: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/08_textasdata_partI_textmining_solutions.ipynb



In [4]:
tokens = [word for word in word_tokenize(pharma) if word.isalpha()]
# tokens

In [5]:
tokens_pos = pos_tag(tokens) # assign parts of speech for each token
# tokens_pos

In [6]:
from collections import Counter

# [element for element in list_of_elements]

# output = []
# for element in list_of_elements:
#   if element > 5:
#     if element < 99:
#        output.append(element)

tokens_adj = [pos for pos in tokens_pos if pos[1] in set(["JJ", "JJR", "JJS"]) and len(pos[0]) > 1]
adj_count = Counter(tokens_adj)
adj_count.most_common(5)

[(('former', 'JJ'), 8),
 (('opioid', 'JJ'), 5),
 (('nationwide', 'JJ'), 4),
 (('other', 'JJ'), 3),
 (('addictive', 'JJ'), 3)]

## 1.2 named entity recognition (3 points)



A. Using the original `pharma` press release (so the one before stripping punctuation/digits), use spaCy to extract all named entities from the press release.

B. Print the unique named entities with the tag: `LAW`. Here's some background on what RICO means: https://www.justia.com/criminal/docs/rico/ 

**Resources**:
- For parts A and B: named entity recognition part of this code: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/08_textasdata_partI_textmining_solutions.ipynb

In [7]:
spacy_doc = nlp(pharma)
spacy_ents = spacy_doc.ents
LAW = set([ent.text for ent in spacy_ents if ent.label_ == "LAW"])
LAW

{'RICO', 'the Controlled Substances Act'}

C. You want to extract the possible sentence lengths the CEO is facing; pull out the named entities with (1) the label `DATE` and (2) that contain the word "year" or "years". Print these named entities.

**Hint:**  
You may want to use the `re` module for the second part.

In [9]:
# %pip install session_info

In [10]:
# %pip install spacy==3.4.2

In [11]:
# import session_info
# session_info.show()

In [8]:
CEO = set([ent.text for ent in spacy_ents if ent.label_ == "DATE" and "year" in ent.text])
CEO

{'20 years', 'last year', 'three years'}

D. Parse the pharma string at the sentence level. Note that this involves more than just splitting on each `.`; for full credit, add at least one additional delimiter that marks the end of the sentence.

Then, using those sentences, pull and print the original sentences from the press releases where those year lengths are mentioned. Describe in your own words (1 sentence) what length of sentence (prison) and probation (supervised release) the CEO may be facing if convicted after this indictment (if there are multiple lengths mentioned describe the maximum). 

**Hint:**  
You may want to use re.search or re.findall 

**Resources**:

- re.search and re.findall examples here for filtering to ones containing year (multiple approaches; some need not involve `re`): 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/07_regex_solutions.ipynb


In [9]:
# This is how we do it in practice
LL = [[sent for sent in spacy_doc.sents if str(e) in str(sent)] for e in CEO]
{s for l in LL for s in l}

# To show we can do it the way we've been told
pharma_clean=pharma.replace(u'\xa0',u'')
re.findall(r'([^.]+?%s+[^.]*)'%'year',pharma_clean)

{ "More than 20,000 Americans died of synthetic opioid overdoses last year, and millions are addicted to opioids.,
 The charges of conspiracy to commit RICO and conspiracy to commit mail and wire fraud each provide for a sentence of no greater than 20 years in prison, three years of supervised release and a fine of $250,000, or twice the amount of pecuniary gain or loss.,
   The charges of conspiracy to violate the Anti-Kickback Law provide for a sentence of no greater than five years in prison, three years of supervised release and a $25,000 fine.}

['"More than 20,000 Americans died of synthetic opioid overdoses last year, and millions are addicted to opioids',
 'The charges of conspiracy to commit RICO and conspiracy to commit mail and wire fraud each provide for a sentence of no greater than 20 years in prison, three years of supervised release and a fine of $250,000, or twice the amount of pecuniary gain or loss',
 ' The charges of conspiracy to violate the Anti-Kickback Law provide for a sentence of no greater than five years in prison, three years of supervised release and a $25,000 fine']

A RICO conviction could bring 20 years prison time and three years probation; but an Anti-Kickback conviction could bring five years prison time and three years probation.

## 1.3 sentiment analysis  (10 points)

- Sentiment analysis section of this script: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/08_textasdata_partI_textmining_solutions.ipynb


A. Subset the press releases to those labeled with one of three topics via `topics_clean`: Civil Rights, Hate Crimes, and Project Safe Childhood. We'll call this `doj_subset` going forward and it should have 717 rows.



In [10]:
doj_subset=doj.loc[(doj.topics_clean=='Civil Rights')|(doj.topics_clean=='Hate Crimes')|(doj.topics_clean=='Project Safe Childhood')].reset_index()
doj_subset.shape
doj_subset.loc[doj_subset.id=='14-248']
doj_subset.loc[doj_subset.id=='16-718']

(717, 7)

Unnamed: 0,index,id,title,contents,date,topics_clean,components_clean
13,329,14-248,Albuquerque Man Charged with Federal Hate Crime Related to Anti-Semitic Threats Against Businesswoman,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",2014-03-10T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section


Unnamed: 0,index,id,title,contents,date,topics_clean,components_clean
632,11593,16-718,Three Mississippi Correctional Officers Indicted for Inmate Assault and Cover-Up,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",2016-06-21T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Criminal Section; USAO - Mississippi, Northern"


B. Write a function that takes one press release string as an input and:

- Removes named entities from each press release string (**Hint:** you may want to use `re.sub` with an or condition)
- Scores the sentiment of the entire press release using the `SentimentIntensityAnalyzer` and `polarity_scores`
- Returns the length-four (negative, positive, neutral, compound) sentiment dictionary (any order is fine)

Apply that function to each of the press releases in `doj_subset`. 

**Hints**: 

- I used a function + list comprehension to execute and it takes about 30 seconds on my local machine; if it's taking a very long time, you may want to check your code for inefficiencies. If you can't fix those, for partial credit on this part/full credit on remainder, you can take a small random sample of the 717


In [11]:
def sentiment_analysis(press_release_string):
    spacy_press_release_string = nlp(press_release_string)
    all_entities = [one_element.text for one_element in spacy_press_release_string.ents]
    long_statement = "|".join(all_entities).replace('(','').replace(')','')
    new_str = re.sub(long_statement, '', press_release_string)
    sent_obj = SentimentIntensityAnalyzer()
    sentiment = sent_obj.polarity_scores(new_str)
    return sentiment

sentiment_analysis(doj_subset.contents.iloc[11])

{'neg': 0.105, 'neu': 0.832, 'pos': 0.063, 'compound': -0.9153}

In [12]:
all_sentiment=[sentiment_analysis(press_release_string) for press_release_string in doj_subset.contents]
all_sentiment

[{'neg': 0.201, 'neu': 0.75, 'pos': 0.05, 'compound': -0.9931},
 {'neg': 0.136, 'neu': 0.795, 'pos': 0.07, 'compound': -0.9325},
 {'neg': 0.094, 'neu': 0.828, 'pos': 0.078, 'compound': -0.7579},
 {'neg': 0.126, 'neu': 0.79, 'pos': 0.084, 'compound': -0.9037},
 {'neg': 0.178, 'neu': 0.777, 'pos': 0.044, 'compound': -0.9864},
 {'neg': 0.147, 'neu': 0.8, 'pos': 0.052, 'compound': -0.987},
 {'neg': 0.161, 'neu': 0.76, 'pos': 0.078, 'compound': -0.9595},
 {'neg': 0.088, 'neu': 0.842, 'pos': 0.071, 'compound': -0.6597},
 {'neg': 0.105, 'neu': 0.834, 'pos': 0.061, 'compound': -0.9136},
 {'neg': 0.168, 'neu': 0.776, 'pos': 0.057, 'compound': -0.9801},
 {'neg': 0.216, 'neu': 0.741, 'pos': 0.043, 'compound': -0.9972},
 {'neg': 0.105, 'neu': 0.832, 'pos': 0.063, 'compound': -0.9153},
 {'neg': 0.085, 'neu': 0.848, 'pos': 0.067, 'compound': -0.6486},
 {'neg': 0.315, 'neu': 0.654, 'pos': 0.031, 'compound': -0.9951},
 {'neg': 0.175, 'neu': 0.758, 'pos': 0.067, 'compound': -0.9889},
 {'neg': 0.126, 'n

C. Add the four sentiment scores to the `doj_subset` dataframe to create a dataframe: `doj_subset_wscore`. Sort from highest to lowest neg score (so most negative to least negative) and print the `id`, `contents`, and `neg` columns of the two most negative press releases. 

Notes:

- Don't worry if your sentiment score differs slightly from our output on GitHub; differences in preprocessing can lead to diff scores

In [13]:
df_all_sentiment = pd.DataFrame.from_dict(all_sentiment)

doj_subset_wscore = pd.concat([doj_subset, df_all_sentiment], axis=1).sort_values(by='neg', ascending=False)
doj_subset_wscore.loc[:, ['id', 'contents', 'neg']].head(n=2)

Unnamed: 0,id,contents,neg
13,14-248,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",0.315
632,16-718,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",0.296


D. With the dataframe from part C, find the mean compound sentiment score for each of the three topics in `topics_clean` using group_by and agg.

E. Add a 1 sentence interpretation of why we might see the variation in scores (remember that compound is a standardized summary where -1 is most negative; +1 is most positive)


In [14]:
doj_subset_wscore.groupby('topics_clean').agg({'compound':np.mean})


Unnamed: 0_level_0,compound
topics_clean,Unnamed: 1_level_1
Civil Rights,-0.10069
Hate Crimes,-0.935284
Project Safe Childhood,-0.661069


This shows that the words describing Civil Rights abuses are only slightly less negative compared to those used with more severe crimes like child abuse or hate crimes.

# 2. Topic modeling (25 points)

For this question, use the `doj_subset_wscores` data that is restricted to civil rights, hate crimes, and project safe childhood and with the sentiment scores added


## 2.1 Preprocess the data by removing stopwords, punctuation, and non-alpha words (5 points)

A. Write a function that:

- Takes in a single raw string in the `contents` column from that dataframe
- Does the following preprocessing steps:

    - Converts the words to lowercase
    - Removes stopwords, adding the custom stopwords in the code cell below to the default stopwords list
    - Only retains alpha words (so removes digits and punctuation)
    - Only retains words 4 characters or longer
    - Uses the snowball stemmer from nltk to stem
    - Returns a joined preprocessed string (so if press release is something like "The CEO was indicted" it might return "ceo indict" 
    
B. Use `apply` or list comprehension to execute that function and create a new column in the data called `processed_text`. Note: there will be a deduction if your code uses a non-list comprehension for loop that uses append.
    
C. Print the `id`, `contents`, and `processed_text` columns for the following press releases:

id = 16-718 (this case: https://www.seattletimes.com/nation-world/doj-miami-police-reach-settlement-in-civil-rights-case/)

id = 16-217 (this case: https://www.wlbt.com/story/32275512/three-mississippi-correctional-officers-indicted-for-inmate-assault-and-cover-up/)
    
**Resources**:

- Here's code examples for the snowball stemmer: https://www.geeksforgeeks.org/snowball-stemmer-nlp/
- Here's code with topic modeling steps: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/09_textasdata_partII_topicmodeling_solution.ipynb

In [15]:
custom_doj_stopwords = ["civil", "rights", "division", "department", "justice",
                        "office", "attorney", "district", "case", "investigation", "assistant",
                       "trial", "assistance", "assist"]

In [16]:
def text_process(raw_string):
    try:
        # Load stopwords list and augment with our own custom ones
        list_stopwords = stopwords.words("english")
        list_stopwords_new = list_stopwords + custom_doj_stopwords
        
        ## convert to lowercase
        lower = raw_string.lower()
        
        # initialize stemmer
        porter = PorterStemmer()

        nostop_listing = [word for word in wordpunct_tokenize(lower)
                          if word not in list_stopwords_new]
        clean_listing = [porter.stem(word) for word in nostop_listing
                        if word.isalpha() 
                        and len(word) > 3]
        clean_listing_str = " ".join(clean_listing)
        return(clean_listing_str)
    except:
        return("")

In [17]:
cleaned_listings = [text_process(raw_string) for raw_string in 
                   doj_subset_wscore.contents]
doj_subset_wscore['processed_text'] = cleaned_listings

In [18]:
doj_subset_wscore.loc[(doj_subset_wscore.id=='16-718')|(doj_subset_wscore.id=='16-217')][['id','contents','processed_text']]

Unnamed: 0,id,contents,processed_text
632,16-718,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",nine count indict unseal today mississippi correct offic charg beat inmat third charg help cover indict charg lawardrick marsher robert sturdiv offic mississippi state penitentiari parchman mississippi beat includ kick punch throw victim ground marsher sturdiv charg violat right convict prison free cruel unusu punish sturdiv also charg fail interven marsher punch beat indict alleg action involv danger weapon result bodili injuri victim third offic deont pate charg along marsher sturdiv conspir cover beat indict alleg three offic submit fals report three lie convict marsher sturdiv face maximum sentenc year prison excess forc charg three offic face five year prison conspiraci fals statement charg year prison fals report charg indict mere accus defend presum innoc unless proven guilti investig jackson cooper mississippi correct prosecut robert coleman northern mississippi dana mulhaus crimin section marsher indict
313,16-217,"The Justice Department has reached a comprehensive settlement agreement with the city of Miami and the Miami Police Department (MPD) resolving the Justice Department’s investigation of officer-involved shootings by MPD officers, announced Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division and U.S. Attorney Wifredo A. Ferrer of the Southern District of Florida. The settlement, which was approved by Miami’s city commission today and will go into effect when the agreement is signed by all parties, resolves claims stemming from the Justice Department’s investigation into officer-involved shootings by MPD officers, which was conducted under the Violent Crime Control and Law Enforcement Act of 1994. The investigation’s findings, issued in July 2013, identified a pattern or practice of excessive use of force through officer-involved shootings in violation of the Fourth Amendment of the Constitution. The city’s compliance with the settlement will be monitored by an independent reviewer, former Tampa, Florida, Police Chief Jane Castor. Under the settlement agreement, the city will implement comprehensive reforms to ensure constitutional policing and support public trust. The settlement agreement is designed to minimize officer-involved shootings and to more effectively and quickly investigate officer-involved shootings that do occur, through measures that include: “This settlement represents a renewed commitment by the city of Miami and Chief Rodolfo Llanes to provide constitutional policing for Miami residents and to protect public safety through sustainable reform,” said Principal Deputy Assistant Attorney General Gupta. “The agreement will help to strengthen the relationship between the MPD and the communities they serve by improving accountability for officers who fire their weapons unlawfully, and provides for community participation in the enforcement of this agreement.” “Today's agreement is the result of a joint effort between the Department of Justice and the City of Miami to ensure that the Miami Police Department continues its efforts to make our community safe while protecting the sacred Constitutional rights of all of our citizens,” said U.S. Attorney Ferrer. “Through oversight and communication, the agreement seeks to make permanent the positive changes that former Chief Orosa and Chief Llanes have made, and we applaud the City Commission’s vote.” The settlement agreement builds upon important reforms implemented by the city since the Justice Department issued its findings, including: The investigation was conducted by attorneys and staff from the Civil Rights Division’s Special Litigation Section and the Civil Division of the U. S. Attorney’s Office of the Southern District of Florida.",reach comprehens settlement agreement citi miami miami polic resolv offic involv shoot offic announc princip deputi gener vanita gupta head wifredo ferrer southern florida settlement approv miami citi commiss today effect agreement sign parti resolv claim stem offic involv shoot offic conduct violent crime control enforc find issu juli identifi pattern practic excess forc offic involv shoot violat fourth amend constitut citi complianc settlement monitor independ review former tampa florida polic chief jane castor settlement agreement citi implement comprehens reform ensur constitut polic support public trust settlement agreement design minim offic involv shoot effect quickli investig offic involv shoot occur measur includ settlement repres renew commit citi miami chief rodolfo llane provid constitut polic miami resid protect public safeti sustain reform said princip deputi gener gupta agreement help strengthen relationship commun serv improv account offic fire weapon unlaw provid commun particip enforc agreement today agreement result joint effort citi miami ensur miami polic continu effort make commun safe protect sacr constitut citizen said ferrer oversight commun agreement seek make perman posit chang former chief orosa chief llane made applaud citi commiss vote settlement agreement build upon import reform implement citi sinc issu find includ conduct attorney staff special litig section southern florida


## 2.2 Create a document-term matrix from the preprocessed press releases and to explore top words (5 points)

A. Use the `create_dtm` function I provide (alternately, feel free to write your own!) and create a document-term matrix using the preprocessed press releases; make sure metadata contains the following columns: `id`, `compound` sentiment column you added, and the `topics_clean` column

B. Print the top 10 words for press releases with compound sentiment in the top 5% (so the most positive sentiment)

C. Print the top 10 words for press releases with compound sentiment in the bottom 5% (so the most negative sentiment)

**Hint**: for these, remember the pandas quantile function from pset two.  

D. Print the top 10 words for press releases in each of the three `topics_clean`

For steps B - D, to receive full credit, write a function `get_topwords` that helps you avoid duplicated code when you find top words for the different subsets of the data. There are different ways to structure it but one way is to feed it subsetted data (so data subsetted to one topic etc.) and for it to get the top words for that subset.

**Resources**:

- Here contains an example of applying the create_dtm function: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/09_textasdata_partII_topicmodeling_solution.ipynb


In [19]:
def create_dtm(list_of_strings, metadata):
    vectorizer = CountVectorizer(lowercase = True)
    dtm_sparse = vectorizer.fit_transform(list_of_strings)
    dtm_dense_named = pd.DataFrame(dtm_sparse.todense(), 
        columns=vectorizer.get_feature_names())
    metadata.columns = ["metadata_" + col for col in metadata.columns]
    dtm_dense_named_withid = pd.concat([metadata.reset_index(), dtm_dense_named], axis = 1)
    return(dtm_dense_named_withid)

In [20]:
doj_subset_wscore_small = doj_subset_wscore.loc[~doj_subset_wscore.processed_text.isnull(),
           ['id', 'compound', 'processed_text','topics_clean']].copy().sample(n=700, random_state = 9899)

dtm_nopre = create_dtm(list_of_strings = doj_subset_wscore_small.processed_text,
                metadata = doj_subset_wscore_small[['id', 'compound', 'topics_clean']])

dtm_nopre.head()

Unnamed: 0,index,metadata_id,metadata_compound,metadata_topics_clean,aaron,abandon,abbat,abbi,abbott,abdomen,...,zane,zealand,zealou,zeeman,zero,zionism,zobel,zone,zunggeemog,zwengel
0,311,16-529,0.9727,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,135,15-1290,-0.9524,Project Safe Childhood,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,37,17-1277,-0.9775,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,146,17-181,-0.9781,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,120,17-160,-0.9744,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
## A.
def get_topwords(subsetted_list_of_string):
    term=subsetted_list_of_string[[col for col in subsetted_list_of_string.columns
                      if "metadata" not in col and
                      col != "index"]].sum(axis = 0)
    topwords=term.sort_values(ascending = False).head(n=10)
    return(topwords)

In [22]:
## B.Print the top 10 words for press releases with compound sentiment in the top 5%
dtm_nopre_top = dtm_nopre.loc[dtm_nopre.metadata_compound >= dtm_nopre.metadata_compound.quantile(0.95)]
get_topwords(dtm_nopre_top)

## C.Print the top 10 words for press releases with compound sentiment in the bottom 5%
dtm_nopre_bottom = dtm_nopre.loc[dtm_nopre.metadata_compound <= dtm_nopre.metadata_compound.quantile(0.05)]
get_topwords(dtm_nopre_bottom)

agreement     171
enforc        129
commun        114
ensur         108
state         108
disabl        107
court          89
student        87
servic         86
settlement     86
dtype: int64

assault     188
crime       170
victim      165
offic       155
hate        129
defend      126
conspir     115
american    103
sentenc     100
anderson     94
dtype: int64

In [23]:
# D. Print the top 10 words for press releases in each of the three 'topics_clean'
Civil_Rights=dtm_nopre.loc[dtm_nopre.metadata_topics_clean=='Civil Rights']
get_topwords(Civil_Rights)
Project_Safe_Childhood=dtm_nopre.loc[dtm_nopre.metadata_topics_clean=='Project Safe Childhood']
get_topwords(Project_Safe_Childhood)
Hate_Crimes=dtm_nopre.loc[dtm_nopre.metadata_topics_clean=='Hate Crimes']
get_topwords(Hate_Crimes)

offic        624
discrimin    602
hous         596
enforc       532
disabl       522
said         484
feder        468
violat       464
state        444
agreement    408
dtype: int64

child          993
exploit        689
sexual         560
safe           466
childhood      461
project        460
pornographi    433
children       417
crimin         396
prosecut       365
dtype: int64

victim      587
crime       555
hate        522
defend      479
prosecut    471
charg       458
sentenc     453
american    445
feder       425
guilti      424
dtype: int64

## 2.3 Estimate a topic model using those preprocessed words (5 points)

A. Going back to the preprocessed words from part 2.1, estimate a topic model with 3 topics, since you want to see if the unsupervised topic models recover different themes for each of the three manually-labeled topics (civil rights; hate crimes; project safe childhood). You have free rein over the other topic model parameters beyond the number of topics.

B. After estimating the topic model, print the top 15 words in each topic.

**Hints and Resources**:

- Same topic modeling resources linked to above
- Make sure to use the `random_state` argument within the model so that the numbering of topics does not move around between runs of your code

In [24]:
## A.
doj_subset_wscore = doj_subset_wscore[doj_subset_wscore.processed_text != ""].copy()

# Tokenize the processed_text;
tokenized_processed_text = [wordpunct_tokenize(one_text) 
                for one_text in 
                doj_subset_wscore.processed_text]

### create dictionary
text_proc_dict = corpora.Dictionary(tokenized_processed_text)

### filter dictionary- using 2% as bounds
text_proc_dict.filter_extremes(no_below = round(doj_subset_wscore.shape[0]*0.02),
                             no_above = round(doj_subset_wscore.shape[0]*0.98))

### create corpus from dictionary
corpus_fromdict_proc = [text_proc_dict.doc2bow(one_text) 
                       for one_text in tokenized_processed_text]

### estimate model
n_topics = 3
ldamod_proc = gensim.models.ldamodel.LdaModel(corpus_fromdict_proc, 
                                         num_topics = n_topics, id2word=text_proc_dict, 
                                         passes=6, alpha = 'auto',
                                         per_word_topics = True, random_state = 91988)

In [25]:
import pprint
pp = pprint.PrettyPrinter(indent=4)

topics = ldamod_proc.print_topics(num_words = 15)
for topic in topics:
    pp.pprint(topic[1])
    print("")

('0.017*"hous" + 0.016*"discrimin" + 0.015*"disabl" + 0.010*"agreement" + '
 '0.009*"state" + 0.008*"said" + 0.008*"enforc" + 0.008*"fair" + '
 '0.008*"settlement" + 0.008*"court" + 0.008*"requir" + 0.007*"individu" + '
 '0.007*"alleg" + 0.007*"employ" + 0.007*"inform"')

('0.013*"victim" + 0.013*"child" + 0.012*"sentenc" + 0.012*"prosecut" + '
 '0.011*"charg" + 0.010*"guilti" + 0.010*"feder" + 0.009*"year" + '
 '0.009*"investig" + 0.009*"exploit" + 0.008*"defend" + 0.008*"prison" + '
 '0.008*"sexual" + 0.008*"crime" + 0.008*"crimin"')

('0.012*"offic" + 0.011*"enforc" + 0.011*"religi" + 0.010*"commun" + '
 '0.009*"said" + 0.008*"gener" + 0.008*"feder" + 0.007*"counti" + '
 '0.007*"protect" + 0.007*"violat" + 0.007*"state" + 0.007*"court" + '
 '0.006*"school" + 0.006*"today" + 0.006*"polic"')



## 2.4 Add topics back to main data and explore correlation between manual labels and our estimated topics (10 points)

A. Extract the document-level topic probabilities. Within `get_document_topics`, use the argument `minimum_probability` = 0 to make sure all 3 topic probabilities are returned. Write an assert statement to make sure the length of the list is equal to the number of rows in the `doj_subset_wscores` dataframe

B. Add the topic probabilities to the `doj_subset_wscores` dataframe as columns and create a column, `top_topic`, that reflects each document to its highest-probability topic (eg topic 1, 2, or 3)

C. For each of the manual labels in `topics_clean` (Hate Crime, Civil Rights, Project Safe Childhood), print the breakdown of the % of documents with each top topic (so, for instance, Hate Crime has 246 documents-- if 123 of those documents are coded to topic_1, that would be 50%; and so on). 
**Hint**:    
pd.crosstab and normalize may be helpful: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.crosstab.html

D. Using a couple press releases as examples, write a 1-2 sentence interpretation of why some of the manual topics map on more cleanly to an estimated topic than other manual topic(s)

**Resources**:

- End of this code (`Additional summaries of topics and documents`) contains example of how to use `get_document_topics` and other steps to add topic probabilities back to data: 
https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/09_textasdata_partII_topicmodeling_solution.ipynb
- If you're getting errors, use shape, len, and other commands to check the dimensionality of things at different steps since documents may be dropped if they contain no words post-processing 

In [26]:
topic_probs_bydoc = [ldamod_proc.get_document_topics(item,minimum_probability=0) for item in corpus_fromdict_proc]
one_list_tup = topic_probs_bydoc[0]
assert_state = np.where(len(topic_probs_bydoc) == len(doj_subset_wscore), True, False)
assert_state

array(True)

In [27]:
## B.Add those topic probabilities to the dataframe
## create a long form dataframe by flattening the list
topic_probs_bydoc_long = pd.DataFrame([t for lst in topic_probs_bydoc for t in lst],
                                     columns = ['topic', 'probability'])

## add id var- we're repeating each id in the original data k times
## for the number of topics
topic_probs_bydoc_long['doc_id'] = list(np.concatenate([[one_id] * 
                                    n_topics for one_id in doj_subset_wscore.id]).flat)

## pivot to wide format
topic_probs_bydoc_wide = pd.pivot_table(topic_probs_bydoc_long, index = ['doc_id'],
                        columns = ['topic']).reset_index().reset_index(drop = True)
topic_probs_bydoc_wide.columns = ['doc_id'] + ["topic_" + str(i+1) for i in np.arange(0, n_topics)]

## merge with original data using doc id
topic_wmeta = pd.merge(topic_probs_bydoc_wide,
                      doj_subset_wscore,
                      left_on = 'doc_id',
                      right_on = 'id')

## create indicator for listing's top topic
topic_wmeta['top_topic'] = topic_wmeta[[col for col in topic_wmeta.columns if 
                                    "topic_" in col]].idxmax(axis=1)
topic_wmeta.sample(n = 5, random_state = 555)

Unnamed: 0,doc_id,topic_1,topic_2,topic_3,index,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,compound,processed_text,top_topic
259,15-1263,0.000359,0.874648,0.124993,4215,15-1263,"Former Mamou, Louisiana, Police Chief Sentenced, Second Former Police Chief Pleads Guilty to Firing Taser at Non-Combative Prisoners","The Justice Department announced today that the former Mamou, Louisiana, Police Chief Gregory W. Dupuis was sentenced to one year and one day in prison, and former Mamou Police Officer and Chief Robert McGee pleaded guilty to one count of the deprivation of rights under color of law, both for their roles in a series of incidents in which they deployed TASERs on non-resistant inmates at the Mamou Jail. Dupuis’s and McGee’s guilty pleas are the result of a federal investigation into the illegal use of excessive force upon inmates at the Mamou jail. Dupuis, 57, of Mamou, pleaded guilty to one count of violation of an individual’s civil rights on Apr. 13, 2015, and was sentenced today by U.S. District Judge Richard T. Haik of the Western District of Louisiana. According to evidence presented at Dupuis’s plea hearing, Dupuis served as police chief from 1994 to 1997 and from 2004 to 2014. During his tenure as chief, officers, including McGee, repeatedly administered TASER shocks as a form of punishment on inmates who were being disruptive, even if the inmates’ disruption was purely verbal, and on inmates who were calm and compliant when the officer deployed the TASER. On Apr. 25, 2010, Dupuis went to the department’s jail to deal with a verbally disruptive detainee. Dupuis ordered the detainee to get down from his bunk and put his hands on the far wall. The detainee complied. Dupuis then entered the cell and deployed the TASER on the detainee’s back, causing the detainee to fall to the ground, suffer pain and injure his knee. At his plea hearing, Dupuis admitted that he knew at the time that his actions were unlawful. McGee, 44, of Mamou, pleaded guilty today to one count of violation of an individual’s civil rights committed as an officer in 2010, prior to his 2014 election as chief of the Mamou Police Department. According to McGee’s guilty plea, McGee was called to the Mamou Police Department on multiple occasions in 2010 and 2011 to deal with disruptive inmates. On Aug. 6, 2010, McGee and an inmate were engaged in a conversation. Although the inmate posed no threat to himself or the officers, McGee fired the TASER at the inmate, causing the inmate to fall and experience pain. McGee, who was elected Mamou police chief after this incident, resigned his position as chief on Oct. 8, 2015, as a result of the federal investigation. McGee faces up to 10 years in prison, three years supervised release and a $250,000 fine. A sentencing date was not set. “The defendants abused the trust given to them as law enforcement officers when they engaged in a pattern of repeatedly tasing compliant detainees,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the the Civil Rights Division. “The Justice Department will vigorously prosecute those who violate the civil rights laws to ensure that the rights of all individuals, including those in custody, are protected.” “Law enforcement officers have a duty to ensure that detainees are treated fairly and humanely when taken into custody,” said U.S. Attorney Stephanie A. Finley of the Western District of Louisiana. “Mr. Dupuis and Mr. McGee breached that trust and violated their oaths by using excessive force on incarcerated individuals.” The FBI and the Louisiana State Police conducted the investigation. Trial Attorneys Stephen Curran and Sanjay Patel of the Civil Rights Division, and Assistant U.S. Attorneys Myers P. Namie and Robert Abendroth of the Western District of Louisiana are prosecuting the case.",2015-10-13T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Criminal Section; USAO - Louisiana, Western",0.149,0.797,0.054,-0.9938,announc today former mamou louisiana polic chief gregori dupui sentenc year prison former mamou polic offic chief robert mcgee plead guilti count depriv color role seri incid deploy taser resist inmat mamou jail dupui mcgee guilti plea result feder illeg excess forc upon inmat mamou jail dupui mamou plead guilti count violat individu sentenc today judg richard haik western louisiana accord evid present dupui plea hear dupui serv polic chief tenur chief offic includ mcgee repeatedli administ taser shock form punish inmat disrupt even inmat disrupt pure verbal inmat calm compliant offic deploy taser dupui went jail deal verbal disrupt detaine dupui order detaine bunk hand wall detaine compli dupui enter cell deploy taser detaine back caus detaine fall ground suffer pain injur knee plea hear dupui admit knew time action unlaw mcgee mamou plead guilti today count violat individu commit offic prior elect chief mamou polic accord mcgee guilti plea mcgee call mamou polic multipl occas deal disrupt inmat mcgee inmat engag convers although inmat pose threat offic mcgee fire taser inmat caus inmat fall experi pain mcgee elect mamou polic chief incid resign posit chief result feder mcgee face year prison three year supervis releas fine sentenc date defend abus trust given enforc offic engag pattern repeatedli tase compliant detaine said princip deputi gener vanita gupta head vigor prosecut violat law ensur individu includ custodi protect enforc offic duti ensur detaine treat fairli human taken custodi said stephani finley western louisiana dupui mcgee breach trust violat oath use excess forc incarcer individu louisiana state polic conduct attorney stephen curran sanjay patel attorney myer nami robert abendroth western louisiana prosecut,topic_2
429,16-252,0.000611,0.563148,0.436242,3581,16-252,Former Alcorn State University Police Officer Pleads Guilty to Assaulting Former Student,"The Justice Department announced that Berthurm Allen, 42, a former police officer with the Alcorn State University (ASU) Police Department in Lorman, Mississippi, pleaded guilty today in federal court to violating the civil rights of an arrestee. During his guilty plea before Senior U.S. District Judge David Bramlette III of the Southern District of Mississippi, Allen admitted that while acting under his authority as an ASU police officer, he elbowed the victim in the face and threw the victim to the ground without legal justification. He also admitted that he misrepresented the circumstances surrounding the incident in his official police report to minimize his exposure to allegations of misconduct. Allen’s actions caused injuries to the victim’s nose and face. According to information presented in court, the incident occurred at the Claiborne County Jail in Port Gibson, Mississippi, and was recorded by the jail’s surveillance cameras. “When police officers violate the laws they swear to uphold, it threatens the credibility of our criminal justice system,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “The Justice Department will continue to vigorously prosecute and hold accountable those officers who violate the constitutional rights of people in their custody.” “The use of excessive force by law enforcement officers is a violation of the officer’s oath to protect the constitutional rights of all persons, even those in custody,” said U. S. Attorney Gregory K. Davis of the Southern District of Mississippi. “Ensuring that law enforcement officers do not victimize the citizens they are sworn to serve and protect is a top priority of this office.” This case was investigated by the FBI’s Jackson Division, and is being prosecuted by Trial Attorneys Julia Gegenheimer and Sheldon L. Beer of the Civil Rights Division’s Criminal Section and Assistant U.S. Attorney Christopher Wansley of the Southern District of Mississippi.",2016-03-03T00:00:00-05:00,Civil Rights,"Civil Rights Division; Civil Rights - Criminal Section; USAO - Mississippi, Southern",0.138,0.797,0.065,-0.9695,announc berthurm allen former polic offic alcorn state univers polic lorman mississippi plead guilti today feder court violat arreste guilti plea senior judg david bramlett southern mississippi allen admit act author polic offic elbow victim face threw victim ground without legal justif also admit misrepres circumst surround incid offici polic report minim exposur alleg misconduct allen action caus injuri victim nose face accord inform present court incid occur claiborn counti jail port gibson mississippi record jail surveil camera polic offic violat law swear uphold threaten credibl crimin system said princip deputi gener vanita gupta head continu vigor prosecut hold account offic violat constitut peopl custodi excess forc enforc offic violat offic oath protect constitut person even custodi said gregori davi southern mississippi ensur enforc offic victim citizen sworn serv protect prioriti investig jackson prosecut attorney julia gegenheim sheldon beer crimin section christoph wansley southern mississippi,topic_2
374,16-1229,0.98645,0.012408,0.001142,7125,16-1229,"Justice Department Seeks to Intervene in Lawsuit Alleging Race Discrimination and Retaliation by Pocomoke City, Maryland, the Worcester County Sheriff and the State of Maryland","The Justice Department announced today that it has moved to intervene in Savage et al. v. Pocomoke City et al., a private lawsuit alleging race discrimination and retaliation under Title VII of the Civil Rights Act of 1964 by Pocomoke City, Maryland, the Worcester County Sheriff and the state of Maryland. Title VII is a federal statute that prohibits employment discrimination on the basis of the basis of sex, race, color, national origin and religion. The United States’ complaint in intervention alleges that the Worcester County Sheriff and the state of Maryland subjected former Pocomoke City Police Officer Franklin Savage to a racially-hostile work environment while he was assigned to a joint task force operated by the sheriff’s office. Specifically, Officer Savage was repeatedly subjected to racial epithets as well as other racially-charged acts of harassment, humiliation and intimidation by his co-workers and supervisors. Officer Savage’s complaints about racial harassment allegedly resulted in a series of retaliatory actions against him by the Worcester County Sheriff’s Office and Pocomoke City, concluding with the termination of his employment. The complaint further alleges that Pocomoke City similarly retaliated against two other officers – former Pocomoke City Police Chief Kelvin Sewell and former Pocomoke City Police Lieutenant Lynell Green – for supporting Officer Savage in the course of his complaints. Chief Sewell was eventually terminated as well. The complaint seeks a court order that requires the defendants to implement policies and procedures that will ensure a workplace environment free of discrimination and retaliatory conduct. The United States also seeks relief, including monetary relief for the three charging parties as compensation for damages caused by the alleged discrimination. “Federal law protects against discrimination and retaliation in the workplace,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division. “In police departments, that protection is vital not only for individual officials, but also for the communities they serve. The Justice Department is firmly committed to ensuring that our nation’s state and local law enforcement agencies comply with Title VII’s promise of a workplace free from racial discrimination and retaliation.” Officer Savage, Chief Sewell and Lieutenant Green each filed charges of discrimination with the Equal Employment Opportunity Commission (EEOC). The EEOC’s Baltimore Office investigated the charges and made reasonable cause findings. After unsuccessful conciliation efforts, the EEOC referred the charges to the Justice Department. “No one should have to face harassment and retaliation while at work,” said EEOC Chair Jenny R. Yang. “When public employees face discrimination, it undermines the trust and credibility in our public institutions. This case represents the latest partnership between EEOC and the Department of Justice to advance our shared Title VII enforcement responsibilities.” This lawsuit was brought as a result of a joint collaborative effort by the EEOC and the Civil Rights Division to vigorously enforce Title VII. “EEOC is committed to ensuring the employees who serve the public in critical law enforcement positions are protected by the laws forbidding unlawful harassment and retaliation in the workplace,” said Director Spencer H. Lewis Jr. of EEOC’s Philadelphia District Office, which includes the Baltimore Field Office. “I am pleased that EEOC and the Department of Justice have established a collaborative relationship that will impact public employers and work together to redress violations of the law when they occur.” Enforcement of federal employment discrimination laws remains a top priority of the Justice Department. More information about Title VII and other federal employment laws is available on the Civil Rights Division’s website at www.justice.gov/crt. The EEOC enforces federal laws prohibiting employment discrimination. Further information about the EEOC is available on its website at www.eeoc.gov. Pocomoke City Motion to Intervene",2016-10-19T00:00:00-04:00,Civil Rights,Civil Rights Division; Civil Rights - Employment Litigation Section,0.123,0.782,0.095,-0.9426,announc today move interven savag pocomok citi privat lawsuit alleg race discrimin retali titl pocomok citi maryland worcest counti sheriff state maryland titl feder statut prohibit employ discrimin basi basi race color nation origin religion unit state complaint intervent alleg worcest counti sheriff state maryland subject former pocomok citi polic offic franklin savag racial hostil work environ assign joint task forc oper sheriff specif offic savag repeatedli subject racial epithet well racial charg act harass humili intimid worker supervisor offic savag complaint racial harass allegedli result seri retaliatori action worcest counti sheriff pocomok citi conclud termin employ complaint alleg pocomok citi similarli retali offic former pocomok citi polic chief kelvin sewel former pocomok citi polic lieuten lynel green support offic savag cours complaint chief sewel eventu termin well complaint seek court order requir defend implement polici procedur ensur workplac environ free discrimin retaliatori conduct unit state also seek relief includ monetari relief three charg parti compens damag caus alleg discrimin feder protect discrimin retali workplac said princip deputi gener vanita gupta head polic depart protect vital individu offici also commun serv firmli commit ensur nation state local enforc agenc compli titl promis workplac free racial discrimin retali offic savag chief sewel lieuten green file charg discrimin equal employ opportun commiss eeoc eeoc baltimor investig charg made reason caus find unsuccess concili effort eeoc refer charg face harass retali work said eeoc chair jenni yang public employe face discrimin undermin trust credibl public institut repres latest partnership eeoc advanc share titl enforc respons lawsuit brought result joint collabor effort eeoc vigor enforc titl eeoc commit ensur employe serv public critic enforc posit protect law forbid unlaw harass retali workplac said director spencer lewi eeoc philadelphia includ baltimor field pleas eeoc establish collabor relationship impact public employ work togeth redress violat occur enforc feder employ discrimin law remain prioriti inform titl feder employ law avail websit eeoc enforc feder law prohibit employ discrimin inform eeoc avail websit eeoc pocomok citi motion interven,topic_1
82,11-1328,0.001033,0.997941,0.001026,8491,11-1328,Member of United Aryan Brotherhood Pleads Guilty to a Hate Motivated Assault of Jewish Inmate in Texas,"WASHINGTON – The Justice Department announced today that Timothy Lee York, a 35 year-old, self-proclaimed member of the United Aryan Brotherhood, pleaded guilty to violently assaulting a Jewish inmate at a federal correctional facility in Texas. York, of Fountain Valley, Calif., pleaded guilty to assault with a dangerous weapon before U.S. Magistrate Judge Irma C. Ramirez in federal court in Dallas, Texas. York admitted in court that on Dec. 28, 2007, he attacked Stuart Rosoff, his Jewish cellmate, while Rosoff was sleeping. York admitted that he used a dangerous weapon, a ligature that he placed around Rosoff’s neck, to forcibly pull Rosoff to the floor where he lost consciousness. Once Rosoff was on the floor, York repeatedly kicked Rosoff in the head and punched him in the head and body. York acknowledged that he attacked Rosoff because he was Jewish. “The Department of Justice will not hesitate to prosecute those who assault others because of their race or religion,” said Thomas E. Perez, Assistant Attorney General for Civil Rights Division. Sentencing is scheduled for Feb. 6, 2012, before Judge Sam A. Lindsay. The case was investigated by the Dallas Division of the FBI, and is being prosecuted by Trial Attorneys Jared Fishman and Ryan Murguía of the Department of Justice’s Civil Rights Division.",2011-10-06T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.232,0.732,0.037,-0.9887,washington announc today timothi york year self proclaim member unit aryan brotherhood plead guilti violent assault jewish inmat feder correct facil texa york fountain valley calif plead guilti assault danger weapon magistr judg irma ramirez feder court dalla texa york admit court attack stuart rosoff jewish cellmat rosoff sleep york admit use danger weapon ligatur place around rosoff neck forcibl pull rosoff floor lost conscious rosoff floor york repeatedli kick rosoff head punch head bodi york acknowledg attack rosoff jewish hesit prosecut assault other race religion said thoma perez gener sentenc schedul judg lindsay investig dalla prosecut attorney jare fishman ryan murguía,topic_2
239,15-015,0.000138,0.999725,0.000137,11763,15-015,"Two Brandon, Mississippi, Men Plead Guilty for Committing Hate Crimes Against African Americans in Jackson, Mississippi","Acting Assistant Attorney General Vanita Gupta for the Justice Department’s Civil Rights Division and U.S. Attorney Gregory K. Davis for the Southern District of Mississippi announced that John Louis Blalack, 20, and Robert Henry Rice, 24, both from Brandon, Mississippi, pleaded guilty today in U.S. District Court in Jackson to federal hate crime charges in connection with their roles in a series of assaults on African Americans in Jackson, Mississippi. Blalack and Rice are the ninth and 10th individuals associated with a group of people who conspired to target and assault African Americans based on their race in the spring of 2011. “Justice has been served,” said Attorney General Eric Holder. “The hate crimes to which these defendants have pleaded guilty were as shocking as they were reprehensible—targeting innocent people for racially-motivated acts of violence that inflicted grievous harm and even claimed a life. The Justice Department will never rest in our pursuit of those who victimize their fellow citizens. This landmark case should send a clear message: that anyone who commits an act of bias-motivated violence, or who violates the civil rights to which all Americans are entitled, will be held accountable to the fullest extent of the law.” Prior to today's guilty pleas, Deryl Paul Dedmon, 22; John Aaron Rice, 21; Dylan Wade Butler, 23; Jonathan Kyle Gaskamp, 22; and Joseph Paul Dominick, 23, all from Brandon, Mississippi, and William Kirk Montgomery, 25, from Puckett, Mississippi, Shelbie Brooke Richards, 21, from Pearl, Mississippi, and Sarah Adelia Graves, 21, from Crystal Springs, Mississippi, pleaded guilty in connection with their roles in these offenses. The conspiracy culminated in the death of James Craig Anderson, who was assaulted and killed on June 26, 2011. Blalack pleaded guilty to two counts of violating the Matthew Shepard – James Byrd Jr. Hate Crimes Prevention Act. Rice pleaded guilty to one count of violating the same act. The statutory maximum sentence for these violations is 10 years in prison and a $250,000 fine. Sentencing for Blalack is set for April 23, 2015, and sentencing for Rice is set for April 30, 2015. The federal investigation revealed that beginning in the spring of 2011, Blalack, Robert Rice and others conspired with one another to harass and assault African-American people in and around Jackson. On numerous occasions the co-conspirators used dangerous weapons, including beer bottles, sling shots and motor vehicles, to cause, and attempt to cause, bodily injury to African-American people. They would specifically target African Americans they believed to be homeless or under the influence of alcohol because they believed that such individuals would be less likely to report an assault. The co-conspirators would often boast about these racially motivated assaults. On June 25, 2011, Blalack and others attended a birthday party/bonfire for a mutual friend in Puckett, Mississippi. During the party, Blalack and others talked about going to Jackson to harass and assault African-American people. By the early morning hours of June 26, 2011, Blalack, Montgomery, Dedmon, John Aaron Rice, Butler, Richards and Graves agreed to carry out their plan to find, harass and assault African-American people. Robert Rice did not go to Jackson on June 26, 2011. At around 4:15 a.m., Blalack, Montgomery, John Aaron Rice, and Butler drove to Jackson in Montgomery’s white Jeep with the understanding that Dedmon, Richards and Graves would join them a short time later. Blalack and the other three occupants of the Jeep then drove around Jackson and threw beer bottles from the moving vehicle at African-American pedestrians they encountered. At approximately 5:00 a.m., Blalack and the other three occupants of the Jeep spotted Anderson in a motel parking lot off of Ellis Avenue. The occupants of the Jeep decided that Anderson would be a good target for an assault because he was African-American and appeared to be visibly intoxicated. Blalack and John Aaron Rice decided to get out of the Jeep to distract Anderson while they waited for Dedmon, Richards and Graves to arrive. After Dedmon Richards, and Graves arrived in Dedmon’s Ford F250 truck, Dedmon and John Aaron Rice physically assaulted Anderson. Rice first punched Anderson in the face with sufficient force to knock Anderson to the ground, and then Dedmon punched Anderson in the face multiple times while he was on the ground. After the assault, Blalack, Montgomery, Rice and Butler left the motel parking lot in the Jeep. As they left, one of the occupants of the Jeep yelled, “White Power!” Prior to getting back into his truck, Dedmon responded by also yelling “White Power!” Once back in his Ford F250 truck, Dedmon deliberately used his vehicle to run over Anderson, causing injuries which resulted in his death. Blalack’s guilty plea includes his role in this offense. On a previous occasion, Blalack, Montgomery, Butler and Dominick drove around west Jackson to find and assault African Americans. Blalack and the other occupants of the vehicle purchased bottles of beer to drink and then threw the beer bottles at African Americans. The occupants of the vehicle also purchased a sling-shot. Some of the occupants of the vehicle, including Blalack, threw beer bottles and shot metal ball bearings out of the moving vehicle at African American pedestrians. Blalack pleaded guilty for his role in this offense. Another previous occasion involved a racially motivated assault at or near a golf course in Jackson. On this particular evening, Robert Rice, Blalack, Montgomery, Gaskamp, Dedmon and John Aaron Rice were in a vehicle, searching for, and eventually finding, a vulnerable African-American man to assault. The vehicle was stopped so Dedmon, John Aaron Rice and Gaskamp could chase the victim down. The three men beat the man to the point that he begged for his life. Robert Rice’s guilty plea includes his role in this offense. “Today’s guilty pleas are the culmination of an extensive federal investigation into this violent hate crime conspiracy,” said Acting Assistant Attorney General Gupta. “Ten defendants have now pleaded guilty to crimes associated with this conspiracy. We hope that today’s guilty pleas provide closure to the victim’s family and to the community that has mourned Mr. Anderson’s tragic death and been shocked by the scope of the conspiracy to commit racially motivated assaults in Jackson by a group of ten co-conspirators.” “There can be no tolerance for acts of gratuitous violence targeting innocent persons simply because of their race,” said U.S. Attorney Davis. “This case is a testament to the United States Attorney’s Office’s dedication to vigorously investigate and prosecute violations of federal hate crime laws. I commend not only Mr. Anderson’s family for their continued cooperation throughout this investigation, but our law enforcement partners, including the FBI and Jackson Police Department, who worked tirelessly in this case to ensure our hate crime laws are strictly enforced.” “With today's guilty pleas, the FBI and its law enforcement partners have identified and brought to justice all those individuals who conspired to deprive Mr. Anderson and other citizens of their civil rights simply because of the color of their skin,” said Special Agent in Charge Donald Alway for the FBI in Mississippi. “The FBI remains dedicated to protecting the cherished freedoms of all Americans, including aggressively investigating allegations of hate crimes and working to prevent them.” These guilty pleas were the result of a cooperative effort between the Civil Rights Division, the U.S. Attorney’s Office for the Southern District of Mississippi and the Hinds County District Attorney’s office. This case was investigated by the FBI’s Jackson Division and the Jackson Police Department. It is being prosecuted by Trial Attorney Sheldon L. Beer and Deputy Chief Paige M. Fitzgerald of the Civil Rights Division, and Assistant U.S. Attorney Glenda R. Haynes of the U.S. Attorney’s Office for the Southern District of Mississippi.",2015-01-07T00:00:00-05:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section; Civil Rights - Housing and Civil Enforcement Section,0.162,0.79,0.048,-0.9988,act gener vanita gupta gregori davi southern mississippi announc john loui blalack robert henri rice brandon mississippi plead guilti today court jackson feder hate crime charg connect role seri assault african american jackson mississippi blalack rice ninth individu associ group peopl conspir target assault african american base race spring serv said gener eric holder hate crime defend plead guilti shock reprehens target innoc peopl racial motiv act violenc inflict grievou harm even claim life never rest pursuit victim fellow citizen landmark send clear messag anyon commit bia motiv violenc violat american entitl held account fullest extent prior today guilti plea deryl paul dedmon john aaron rice dylan wade butler jonathan kyle gaskamp joseph paul dominick brandon mississippi william kirk montgomeri puckett mississippi shelbi brook richard pearl mississippi sarah adelia grave crystal spring mississippi plead guilti connect role offens conspiraci culmin death jame craig anderson assault kill june blalack plead guilti count violat matthew shepard jame byrd hate crime prevent rice plead guilti count violat statutori maximum sentenc violat year prison fine sentenc blalack april sentenc rice april feder reveal begin spring blalack robert rice other conspir anoth harass assault african american peopl around jackson numer occas conspir use danger weapon includ beer bottl sling shot motor vehicl caus attempt caus bodili injuri african american peopl would specif target african american believ homeless influenc alcohol believ individu would less like report assault conspir would often boast racial motiv assault june blalack other attend birthday parti bonfir mutual friend puckett mississippi parti blalack other talk go jackson harass assault african american peopl earli morn hour june blalack montgomeri dedmon john aaron rice butler richard grave agre carri plan find harass assault african american peopl robert rice jackson june around blalack montgomeri john aaron rice butler drove jackson montgomeri white jeep understand dedmon richard grave would join short time later blalack three occup jeep drove around jackson threw beer bottl move vehicl african american pedestrian encount approxim blalack three occup jeep spot anderson motel park elli avenu occup jeep decid anderson would good target assault african american appear visibl intox blalack john aaron rice decid jeep distract anderson wait dedmon richard grave arriv dedmon richard grave arriv dedmon ford truck dedmon john aaron rice physic assault anderson rice first punch anderson face suffici forc knock anderson ground dedmon punch anderson face multipl time ground assault blalack montgomeri rice butler left motel park jeep left occup jeep yell white power prior get back truck dedmon respond also yell white power back ford truck dedmon deliber use vehicl anderson caus injuri result death blalack guilti plea includ role offens previou occas blalack montgomeri butler dominick drove around west jackson find assault african american blalack occup vehicl purchas bottl beer drink threw beer bottl african american occup vehicl also purchas sling shot occup vehicl includ blalack threw beer bottl shot metal ball bear move vehicl african american pedestrian blalack plead guilti role offens anoth previou occas involv racial motiv assault near golf cours jackson particular even robert rice blalack montgomeri gaskamp dedmon john aaron rice vehicl search eventu find vulner african american assault vehicl stop dedmon john aaron rice gaskamp could chase victim three beat point beg life robert rice guilti plea includ role offens today guilti plea culmin extens feder violent hate crime conspiraci said act gener gupta defend plead guilti crime associ conspiraci hope today guilti plea provid closur victim famili commun mourn anderson tragic death shock scope conspiraci commit racial motiv assault jackson group conspir toler act gratuit violenc target innoc person simpli race said davi testament unit state dedic vigor investig prosecut violat feder hate crime law commend anderson famili continu cooper throughout enforc partner includ jackson polic work tirelessli ensur hate crime law strictli enforc today guilti plea enforc partner identifi brought individu conspir depriv anderson citizen simpli color skin said special agent charg donald alway mississippi remain dedic protect cherish freedom american includ aggress investig alleg hate crime work prevent guilti plea result cooper effort southern mississippi hind counti investig jackson jackson polic prosecut sheldon beer deputi chief paig fitzgerald glenda hayn southern mississippi,topic_2


In [28]:
## C.Summarize the topic proportions for each of the topics_clean 
## group by topic
#topic_wmeta.groupby('topics_clean').agg({'top_topic': pd.crosstab})
pd.crosstab(topic_wmeta.topics_clean, topic_wmeta.top_topic, rownames=['topics_clean'], colnames=['top_topic'], normalize=True)

top_topic,topic_1,topic_2,topic_3
topics_clean,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Civil Rights,0.212164,0.118812,0.094767
Hate Crimes,0.0,0.335219,0.01273
Project Safe Childhood,0.0,0.223479,0.002829


This seems to indicate that Civil Rights is mostly topic_1 with a bit of bleed over as topic_2. But topic_2 seems to be a diverse blend of all three main subjects (but it's mainly comprised of Civil Rights, Project Safe Childhood, and then Civil Rights and in that order).



# 3. Extend the analysis from unigrams to bigrams (9 points)

In the previous question, you found top words via a unigram representation of the text. Now, we want to see how those top words change with bigrams (pairs of words)

A. Using the `doj_subset_wscore` data and the `processed_text` column (so the words after stemming/other preprocessing), create a column in the data called `processed_text_bigrams` that combines each consecutive pairs of word into a bigram separated by an underscore. Eg:

"depart reach settlem" would become "depart_reach reach_settlem"

Do this by writing a function `create_bigram_onedoc` that takes in a single `processed_text` string and returns a string with its bigrams structured similarly to above example
 
**Hint**: there are many ways to solve but `zip` may be helpful: https://stackoverflow.com/questions/21303224/iterate-over-all-pairs-of-consecutive-items-in-a-list

B. Print the `id`, `processed_text`, and `processed_text_bigram` columns for press release with id = 16-217

In [29]:
## A. Write a function to return bigrams structured
processed_text_list = doj_subset_wscore.processed_text.to_list()

# Write the function here
def create_bigram_onedoc(single_string):
    
    dictionary = {}
    i = 0
    # Split all the words
    single_string_split = single_string.split(' ')
    
    # Make all words in pair
    for first,second in zip(single_string_split,single_string_split[1:]):
        string_pair = first + '_' + second
        dictionary[i] = string_pair
        i += 1
    
    # make dictionary to list
    dic_list = list(dictionary.values())
    
    # connect the list
    s = ','
    list_connect = s.join(dic_list)
    list_space = list_connect.replace(',',' ')
    return(list_space)

doj_subset_wscore['processed_text_bigram'] = [create_bigram_onedoc(text) for text in doj_subset_wscore.processed_text]
doj_subset_wscore

Unnamed: 0,index,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,compound,processed_text,processed_text_bigram
13,329,14-248,Albuquerque Man Charged with Federal Hate Crime Related to Anti-Semitic Threats Against Businesswoman,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",2014-03-10T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.315,0.654,0.031,-0.9951,announc morn john albuquerqu made initi appear feder court crimin complaint charg hate crime offens charg relat anti semit threat made jewish woman own oper nosh jewish delicatessen bakeri albuquerqu arrest march base crimin complaint alleg interf victim feder protect threaten interf busi religion accord crimin complaint allegedli post threaten anti semit note vicin victim busi crimin complaint mere establish probabl caus presum innoc unless proven guilti convict offens charg crimin complaint face maximum statutori penalti year prison matter investig albuquerqu prosecut mark baker mexico aejean,announc_morn morn_john john_albuquerqu albuquerqu_made made_initi initi_appear appear_feder feder_court court_crimin crimin_complaint complaint_charg charg_hate hate_crime crime_offens offens_charg charg_relat relat_anti anti_semit semit_threat threat_made made_jewish jewish_woman woman_own own_oper oper_nosh nosh_jewish jewish_delicatessen delicatessen_bakeri bakeri_albuquerqu albuquerqu_arrest arrest_march march_base base_crimin crimin_complaint complaint_alleg alleg_interf interf_victim victim_feder feder_protect protect_threaten threaten_interf interf_busi busi_religion religion_accord accord_crimin crimin_complaint complaint_allegedli allegedli_post post_threaten threaten_anti anti_semit semit_note note_vicin vicin_victim victim_busi busi_crimin crimin_complaint complaint_mere mere_establish establish_probabl probabl_caus caus_presum presum_innoc innoc_unless unless_proven proven_guilti guilti_convict convict_offens offens_charg charg_crimin crimin_complaint complaint_face face_maximum maximum_statutori statutori_penalti penalti_year year_prison prison_matter matter_investig investig_albuquerqu albuquerqu_prosecut prosecut_mark mark_baker baker_mexico mexico_aejean
632,11593,16-718,Three Mississippi Correctional Officers Indicted for Inmate Assault and Cover-Up,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",2016-06-21T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Criminal Section; USAO - Mississippi, Northern",0.296,0.671,0.033,-0.9964,nine count indict unseal today mississippi correct offic charg beat inmat third charg help cover indict charg lawardrick marsher robert sturdiv offic mississippi state penitentiari parchman mississippi beat includ kick punch throw victim ground marsher sturdiv charg violat right convict prison free cruel unusu punish sturdiv also charg fail interven marsher punch beat indict alleg action involv danger weapon result bodili injuri victim third offic deont pate charg along marsher sturdiv conspir cover beat indict alleg three offic submit fals report three lie convict marsher sturdiv face maximum sentenc year prison excess forc charg three offic face five year prison conspiraci fals statement charg year prison fals report charg indict mere accus defend presum innoc unless proven guilti investig jackson cooper mississippi correct prosecut robert coleman northern mississippi dana mulhaus crimin section marsher indict,nine_count count_indict indict_unseal unseal_today today_mississippi mississippi_correct correct_offic offic_charg charg_beat beat_inmat inmat_third third_charg charg_help help_cover cover_indict indict_charg charg_lawardrick lawardrick_marsher marsher_robert robert_sturdiv sturdiv_offic offic_mississippi mississippi_state state_penitentiari penitentiari_parchman parchman_mississippi mississippi_beat beat_includ includ_kick kick_punch punch_throw throw_victim victim_ground ground_marsher marsher_sturdiv sturdiv_charg charg_violat violat_right right_convict convict_prison prison_free free_cruel cruel_unusu unusu_punish punish_sturdiv sturdiv_also also_charg charg_fail fail_interven interven_marsher marsher_punch punch_beat beat_indict indict_alleg alleg_action action_involv involv_danger danger_weapon weapon_result result_bodili bodili_injuri injuri_victim victim_third third_offic offic_deont deont_pate pate_charg charg_along along_marsher marsher_sturdiv sturdiv_conspir conspir_cover cover_beat beat_indict indict_alleg alleg_three three_offic offic_submit submit_fals fals_report report_three three_lie lie_convict convict_marsher marsher_sturdiv sturdiv_face face_maximum maximum_sentenc sentenc_year year_prison prison_excess excess_forc forc_charg charg_three three_offic offic_face face_five five_year year_prison prison_conspiraci conspiraci_fals fals_statement statement_charg charg_year year_prison prison_fals fals_report report_charg charg_indict indict_mere mere_accus accus_defend defend_presum presum_innoc innoc_unless unless_proven proven_guilti guilti_investig investig_jackson jackson_cooper cooper_mississippi mississippi_correct correct_prosecut prosecut_robert robert_coleman coleman_northern northern_mississippi mississippi_dana dana_mulhaus mulhaus_crimin crimin_section section_marsher marsher_indict
22,501,11-626,Arkansas Man Pleads Guilty to Federal Hate Crime Related to the Assault of Five Hispanic Men,"WASHINGTON – The Justice Department announced today that Sean Popejoy, 19, of Green Forest, Ark., pleaded guilty in federal court to one count of committing a federal hate crime and one count of conspiring to commit a federal hate crime. This is the first conviction for a violation of the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, which was enacted in October 2009. Information presented during the plea hearing established that in the early morning hours of June 20, 2010, Popejoy admitted that he was part of a conspiracy to threaten and injure five Hispanic men who had pulled into a gas station parking lot. The co-conspirators pursued the victims in a truck. When the co-conspirators caught up to the victims, Popejoy leaned outside of the front passenger window and waived a tire wrench at the victims and continued to threaten and hurl racial epithets at the victims. The co-conspirator rammed into the victims' car, which caused the victims’ car to cross the opposite lane of traffic, go off the road, crash into a tree and ignite. As a result of the co-conspirators’ actions, the victims suffered bodily injury, including one victim who sustained life-threatening injuries. “James Byrd, Jr. and Matthew Shepard were brutally murdered more than a decade ago, and today the first defendant is convicted for a hate crime under the critical new law enacted in their names,” said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. “It is unacceptable that violent acts of hate committed because of someone’s race continue to occur in 2011, and the department will continue to use every available tool to identify and prosecute hate crimes whenever and wherever they occur. “It is terrible and disturbing that violence motivated by hatred of another’s race continues to occur,” said Conner Eldridge, U.S. Attorney for the Western District of Arkansas. “We are committed to prosecuting such crimes in the Western District of Arkansas.” If convicted, the defendant faces a maximum punishment of 15 years in prison. This case is being investigated by the FBI’s Fayetteville Division in cooperation with the Arkansas State Police Department and the Carroll County Sheriff’s Office. The case is being prosecuted by Trial Attorney Edward Chung of the Department of Justice’s Civil Rights Division and Assistant U.S. Attorney Kyra Jenner for the Western District of Arkansas.",2011-05-16T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.287,0.682,0.031,-0.9985,washington announc today sean popejoy green forest plead guilti feder court count commit feder hate crime count conspir commit feder hate crime first convict violat matthew shepard jame byrd hate crime prevent enact octob inform present plea hear establish earli morn hour june popejoy admit part conspiraci threaten injur five hispan pull station park conspir pursu victim truck conspir caught victim popejoy lean outsid front passeng window waiv tire wrench victim continu threaten hurl racial epithet victim conspir ram victim caus victim cross opposit lane traffic road crash tree ignit result conspir action victim suffer bodili injuri includ victim sustain life threaten injuri jame byrd matthew shepard brutal murder decad today first defend convict hate crime critic enact name said thoma perez gener unaccept violent act hate commit someon race continu occur continu everi avail tool identifi prosecut hate crime whenev wherev occur terribl disturb violenc motiv hatr anoth race continu occur said conner eldridg western arkansa commit prosecut crime western arkansa convict defend face maximum punish year prison investig fayettevil cooper arkansa state polic carrol counti sheriff prosecut edward chung kyra jenner western arkansa,washington_announc announc_today today_sean sean_popejoy popejoy_green green_forest forest_plead plead_guilti guilti_feder feder_court court_count count_commit commit_feder feder_hate hate_crime crime_count count_conspir conspir_commit commit_feder feder_hate hate_crime crime_first first_convict convict_violat violat_matthew matthew_shepard shepard_jame jame_byrd byrd_hate hate_crime crime_prevent prevent_enact enact_octob octob_inform inform_present present_plea plea_hear hear_establish establish_earli earli_morn morn_hour hour_june june_popejoy popejoy_admit admit_part part_conspiraci conspiraci_threaten threaten_injur injur_five five_hispan hispan_pull pull_station station_park park_conspir conspir_pursu pursu_victim victim_truck truck_conspir conspir_caught caught_victim victim_popejoy popejoy_lean lean_outsid outsid_front front_passeng passeng_window window_waiv waiv_tire tire_wrench wrench_victim victim_continu continu_threaten threaten_hurl hurl_racial racial_epithet epithet_victim victim_conspir conspir_ram ram_victim victim_caus caus_victim victim_cross cross_opposit opposit_lane lane_traffic traffic_road road_crash crash_tree tree_ignit ignit_result result_conspir conspir_action action_victim victim_suffer suffer_bodili bodili_injuri injuri_includ includ_victim victim_sustain sustain_life life_threaten threaten_injuri injuri_jame jame_byrd byrd_matthew matthew_shepard shepard_brutal brutal_murder murder_decad decad_today today_first first_defend defend_convict convict_hate hate_crime crime_critic critic_enact enact_name name_said said_thoma thoma_perez perez_gener gener_unaccept unaccept_violent violent_act act_hate hate_commit commit_someon someon_race race_continu continu_occur occur_continu continu_everi everi_avail avail_tool tool_identifi identifi_prosecut prosecut_hate hate_crime crime_whenev whenev_wherev wherev_occur occur_terribl terribl_disturb disturb_violenc violenc_motiv motiv_hatr hatr_anoth anoth_race race_continu continu_occur occur_said said_conner conner_eldridg eldridg_western western_arkansa arkansa_commit commit_prosecut prosecut_crime crime_western western_arkansa arkansa_convict convict_defend defend_face face_maximum maximum_punish punish_year year_prison prison_investig investig_fayettevil fayettevil_cooper cooper_arkansa arkansa_state state_polic polic_carrol carrol_counti counti_sheriff sheriff_prosecut prosecut_edward edward_chung chung_kyra kyra_jenner jenner_western western_arkansa
594,11248,10-1194,Tennessee Man Sentenced for Conspiring to Commit Murders of African-Americans,"WASHINGTON - The Justice Department announced that Daniel Cowart was sentenced today to 14 years in prison and three years of supervised release for his role in a conspiracy to murder dozens of African-Americans, including then-Senator and presidential candidate Barack Obama, because of their race. On March 29, 2010, Cowart pleaded guilty to conspiracy, threatening to kill and inflict bodily harm upon a major candidate for the office of President of the United States, interstate transportation of a short-barreled shotgun, interstate transportation of a firearm for the purpose of committing a felony, unlicensed transportation of an unauthorized short-barreled shotgun, possession of a short-barreled shotgun, intentional damage to religious real property and discharge of a firearm during and in relation to a crime of violence. Cowart, 22, of Bells, Tenn., admitted to conspiring with Paul Schlesselman of West Helena, Ark., to engage in a killing spree specifically targeting African-Americans. He further acknowledged that he intended to culminate these attacks by assassinating President Obama, a U.S. Senator and presidential candidate at the time of the conspiracy. Cowart admitted that he and Schlesselman also conspired to burglarize a federally-licensed firearms dealer to obtain additional weapons for their scheme. He also admitted to transporting a sawed-off shotgun from Arkansas to Tennessee for the purpose of committing felonies. Cowart additionally admitted to shooting the window of the Allen Baptist Church in Brownsville, Tenn. Under the plea agreement, Cowart agreed that an appropriate sentence would be between twelve and eighteen years. The charges to which he pleaded guilty carried a minimum sentence of 10 years and a maximum sentence of 75 years in prison. ""Threats of violence fueled by bigotry and hate have no place in the United States of America, and they will not be tolerated,"" said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. ""Although the heroic intervention of law enforcement spared us from a tragedy, this conspiracy and its associated crimes demanded a severe sentence. The sentence imposed constitutes serious punishment for a serious crime."" ""Thankfully, the defendants were not able to execute their violent scheme. Nevertheless, this is a grave matter and Judge Breen’s sentence reflects that crimes of this magnitude demand stiff penalties,"" said Edward L. Stanton III, U.S. Attorney for the Western District of Tennessee. ""I would like to recognize the extraordinary diligence of the Crockett County Sheriff’s Department, the Bureau of Alcohol, Tobacco and Firearms, the U.S Secret Service, and the FBI."" Cowart’s co-defendant, Paul Schlesselman, pleaded guilty on Jan. 14, 2010, to one count of conspiracy, one count of threatening to kill and inflict bodily harm upon a presidential candidate, and one count of possessing a firearm in furtherance of a crime of violence. Schlesselman was sentenced to 10 years in prison on April 15, 2010. This case was investigated by the Bureau of Alcohol, Tobacco, Firearms and Explosives; the U.S. Secret Service; the FBI; and the Crockett County Sheriff’s Office. The case was prosecuted by Assistant U.S. Attorneys Larry Laurenzi and James Powell and Civil Rights Division Trial Attorney Jonathan Skrmetti.",2010-10-22T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.282,0.653,0.065,-0.9990,washington announc daniel cowart sentenc today year prison three year supervis releas role conspiraci murder dozen african american includ senat presidenti candid barack obama race march cowart plead guilti conspiraci threaten kill inflict bodili harm upon major candid presid unit state interst transport short barrel shotgun interst transport firearm purpos commit feloni unlicens transport unauthor short barrel shotgun possess short barrel shotgun intent damag religi real properti discharg firearm relat crime violenc cowart bell tenn admit conspir paul schlesselman west helena engag kill spree specif target african american acknowledg intend culmin attack assassin presid obama senat presidenti candid time conspiraci cowart admit schlesselman also conspir burglar feder licens firearm dealer obtain addit weapon scheme also admit transport saw shotgun arkansa tennesse purpos commit feloni cowart addit admit shoot window allen baptist church brownsvil tenn plea agreement cowart agre appropri sentenc would twelv eighteen year charg plead guilti carri minimum sentenc year maximum sentenc year prison threat violenc fuel bigotri hate place unit state america toler said thoma perez gener although heroic intervent enforc spare tragedi conspiraci associ crime demand sever sentenc sentenc impos constitut seriou punish seriou crime thank defend abl execut violent scheme nevertheless grave matter judg breen sentenc reflect crime magnitud demand stiff penalti said edward stanton western tennesse would like recogn extraordinari dilig crockett counti sheriff bureau alcohol tobacco firearm secret servic cowart defend paul schlesselman plead guilti count conspiraci count threaten kill inflict bodili harm upon presidenti candid count possess firearm further crime violenc schlesselman sentenc year prison april investig bureau alcohol tobacco firearm explos secret servic crockett counti sheriff prosecut attorney larri laurenzi jame powel jonathan skrmetti,washington_announc announc_daniel daniel_cowart cowart_sentenc sentenc_today today_year year_prison prison_three three_year year_supervis supervis_releas releas_role role_conspiraci conspiraci_murder murder_dozen dozen_african african_american american_includ includ_senat senat_presidenti presidenti_candid candid_barack barack_obama obama_race race_march march_cowart cowart_plead plead_guilti guilti_conspiraci conspiraci_threaten threaten_kill kill_inflict inflict_bodili bodili_harm harm_upon upon_major major_candid candid_presid presid_unit unit_state state_interst interst_transport transport_short short_barrel barrel_shotgun shotgun_interst interst_transport transport_firearm firearm_purpos purpos_commit commit_feloni feloni_unlicens unlicens_transport transport_unauthor unauthor_short short_barrel barrel_shotgun shotgun_possess possess_short short_barrel barrel_shotgun shotgun_intent intent_damag damag_religi religi_real real_properti properti_discharg discharg_firearm firearm_relat relat_crime crime_violenc violenc_cowart cowart_bell bell_tenn tenn_admit admit_conspir conspir_paul paul_schlesselman schlesselman_west west_helena helena_engag engag_kill kill_spree spree_specif specif_target target_african african_american american_acknowledg acknowledg_intend intend_culmin culmin_attack attack_assassin assassin_presid presid_obama obama_senat senat_presidenti presidenti_candid candid_time time_conspiraci conspiraci_cowart cowart_admit admit_schlesselman schlesselman_also also_conspir conspir_burglar burglar_feder feder_licens licens_firearm firearm_dealer dealer_obtain obtain_addit addit_weapon weapon_scheme scheme_also also_admit admit_transport transport_saw saw_shotgun shotgun_arkansa arkansa_tennesse tennesse_purpos purpos_commit commit_feloni feloni_cowart cowart_addit addit_admit admit_shoot shoot_window window_allen allen_baptist baptist_church church_brownsvil brownsvil_tenn tenn_plea plea_agreement agreement_cowart cowart_agre agre_appropri appropri_sentenc sentenc_would would_twelv twelv_eighteen eighteen_year year_charg charg_plead plead_guilti guilti_carri carri_minimum minimum_sentenc sentenc_year year_maximum maximum_sentenc sentenc_year year_prison prison_threat threat_violenc violenc_fuel fuel_bigotri bigotri_hate hate_place place_unit unit_state state_america america_toler toler_said said_thoma thoma_perez perez_gener gener_although although_heroic heroic_intervent intervent_enforc enforc_spare spare_tragedi tragedi_conspiraci conspiraci_associ associ_crime crime_demand demand_sever sever_sentenc sentenc_sentenc sentenc_impos impos_constitut constitut_seriou seriou_punish punish_seriou seriou_crime crime_thank thank_defend defend_abl abl_execut execut_violent violent_scheme scheme_nevertheless nevertheless_grave grave_matter matter_judg judg_breen breen_sentenc sentenc_reflect reflect_crime crime_magnitud magnitud_demand demand_stiff stiff_penalti penalti_said said_edward edward_stanton stanton_western western_tennesse tennesse_would would_like like_recogn recogn_extraordinari extraordinari_dilig dilig_crockett crockett_counti counti_sheriff sheriff_bureau bureau_alcohol alcohol_tobacco tobacco_firearm firearm_secret secret_servic servic_cowart cowart_defend defend_paul paul_schlesselman schlesselman_plead plead_guilti guilti_count count_conspiraci conspiraci_count count_threaten threaten_kill kill_inflict inflict_bodili bodili_harm harm_upon upon_presidenti presidenti_candid candid_count count_possess possess_firearm firearm_further further_crime crime_violenc violenc_schlesselman schlesselman_sentenc sentenc_year year_prison prison_april april_investig investig_bureau bureau_alcohol alcohol_tobacco tobacco_firearm firearm_explos explos_secret secret_servic servic_crockett crockett_counti counti_sheriff sheriff_prosecut prosecut_attorney attorney_larri larri_laurenzi laurenzi_jame jame_powel powel_jonathan jonathan_skrmetti
564,10635,11-1531,Seven Ohio Men Arrested for Hate Crime Attacks Against Amish Men,"CLEVELAND – Seven Ohio men were arrested today on charges that they committed and conspired to commit religiously-motivated physical assaults in violation of the Matthew Shepard-James Byrd Hate Crimes Prevention Act. The arrests were announced today by Thomas E. Perez, Assistant Attorney General for the Civil Rights Division and Steven M. Dettelbach, U.S. Attorney for the Northern District of Ohio. The criminal complaint, filed in Cleveland, charges Samuel Mullet Sr., Johnny S. Mullet, Daniel S. Mullet, Levi F. Miller, Eli M. Miller and Emanuel Schrock, all of Bergholz, Ohio; and Lester S. Mullet, of Hammondsville, Ohio, with willfully causing bodily injury to any person, or attempting to do so by use of a dangerous weapon, because of the actual or perceived religion of that person. The maximum potential penalty for these violations is life in prison. According to the affidavit filed in support of the arrest warrants, the defendants conspired to carry out a series of assaults against fellow Amish individuals with whom they were having a religiously-based dispute. In doing so, the defendants forcibly restrained multiple Amish men and cut off their beards and head hair with scissors and battery-powered clippers, causing bodily injury to these men while also injuring others who attempted to stop the attacks. In the Amish religion, a man’s beard and head hair are sacred. This case is being investigated by the Cleveland Division of the FBI and is being prosecuted by Assistant U.S. Attorney Bridget M. Brennan of the U.S. Attorney’s Office for the Northern District of Ohio and Deputy Chief Kristy Parker of the Civil Rights Division’s Criminal Section. A criminal complaint is merely an accusation. All defendants are presumed innocent of the charges until proven guilty beyond a reasonable doubt in court.",2011-11-23T00:00:00-05:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.282,0.686,0.032,-0.9968,cleveland seven ohio arrest today charg commit conspir commit religi motiv physic assault violat matthew shepard jame byrd hate crime prevent arrest announc today thoma perez gener steven dettelbach northern ohio crimin complaint file cleveland charg samuel mullet johnni mullet daniel mullet levi miller miller emanuel schrock bergholz ohio lester mullet hammondsvil ohio will caus bodili injuri person attempt danger weapon actual perceiv religion person maximum potenti penalti violat life prison accord affidavit file support arrest warrant defend conspir carri seri assault fellow amish individu religi base disput defend forcibl restrain multipl amish beard head hair scissor batteri power clipper caus bodili injuri also injur other attempt stop attack amish religion beard head hair sacr investig cleveland prosecut bridget brennan northern ohio deputi chief kristi parker crimin section crimin complaint mere accus defend presum innoc charg proven guilti beyond reason doubt court,cleveland_seven seven_ohio ohio_arrest arrest_today today_charg charg_commit commit_conspir conspir_commit commit_religi religi_motiv motiv_physic physic_assault assault_violat violat_matthew matthew_shepard shepard_jame jame_byrd byrd_hate hate_crime crime_prevent prevent_arrest arrest_announc announc_today today_thoma thoma_perez perez_gener gener_steven steven_dettelbach dettelbach_northern northern_ohio ohio_crimin crimin_complaint complaint_file file_cleveland cleveland_charg charg_samuel samuel_mullet mullet_johnni johnni_mullet mullet_daniel daniel_mullet mullet_levi levi_miller miller_miller miller_emanuel emanuel_schrock schrock_bergholz bergholz_ohio ohio_lester lester_mullet mullet_hammondsvil hammondsvil_ohio ohio_will will_caus caus_bodili bodili_injuri injuri_person person_attempt attempt_danger danger_weapon weapon_actual actual_perceiv perceiv_religion religion_person person_maximum maximum_potenti potenti_penalti penalti_violat violat_life life_prison prison_accord accord_affidavit affidavit_file file_support support_arrest arrest_warrant warrant_defend defend_conspir conspir_carri carri_seri seri_assault assault_fellow fellow_amish amish_individu individu_religi religi_base base_disput disput_defend defend_forcibl forcibl_restrain restrain_multipl multipl_amish amish_beard beard_head head_hair hair_scissor scissor_batteri batteri_power power_clipper clipper_caus caus_bodili bodili_injuri injuri_also also_injur injur_other other_attempt attempt_stop stop_attack attack_amish amish_religion religion_beard beard_head head_hair hair_sacr sacr_investig investig_cleveland cleveland_prosecut prosecut_bridget bridget_brennan brennan_northern northern_ohio ohio_deputi deputi_chief chief_kristi kristi_parker parker_crimin crimin_section section_crimin crimin_complaint complaint_mere mere_accus accus_defend defend_presum presum_innoc innoc_charg charg_proven proven_guilti guilti_beyond beyond_reason reason_doubt doubt_court
...,...,...,...,...,...,...,...,...,...,...,...,...,...
392,7594,16-539,"Justice Department Statements Regarding Court Approval of the Agreement with Newark, New Jersey, to Reform Unconstitutional Policing Practices","Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division, and U.S. Attorney Paul J. Fishman of the District of New Jersey released the following statements regarding the U.S. District Court for the District of New Jersey’s approval of the department’s agreement with the city of Newark, New Jersey, to reform the police department’s unconstitutional practices: “We appreciate the court’s swift approval of the Justice Department’s consent decree with the city of Newark,” said Principal Deputy Assistant Attorney General Gupta. “This agreement will help the Newark Police Department reform policies, improve systems and rebuild trust between officers and the community they serve. As Newark implements this agreement, we will continue to work closely with city officials, law enforcement and community members to put in place the necessary changes that can make Newark a national model for constitutional, effective and accountable policing. Once fully implemented, these reforms will make all of those in Newark – officers and civilians alike – safer. And these reforms will ensure that law enforcement in Newark complies with the Constitution and safeguards the civil rights of every Newark resident.” “This consent decree, now approved by the court, provides a roadmap for reform in Newark and a model for best practices for police departments across the country,” said U.S. Attorney Fishman. “Implementing the systemic changes outlined in the consent decree will take time, but this is what the city of Newark and the men and women who serve in the Police department want and need, and it is what the people of Newark deserve: a first-class police department that keeps them safe and respects their constitutional rights.”",2016-05-06T00:00:00-04:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section; USAO - New Jersey,0.000,0.825,0.175,0.9854,princip deputi gener vanita gupta head paul fishman jersey releas follow statement regard court jersey approv agreement citi newark jersey reform polic unconstitut practic appreci court swift approv consent decre citi newark said princip deputi gener gupta agreement help newark polic reform polici improv system rebuild trust offic commun serv newark implement agreement continu work close citi offici enforc commun member place necessari chang make newark nation model constitut effect account polic fulli implement reform make newark offic civilian alik safer reform ensur enforc newark compli constitut safeguard everi newark resid consent decre approv court provid roadmap reform newark model best practic polic depart across countri said fishman implement system chang outlin consent decre take time citi newark women serv polic want need peopl newark deserv first class polic keep safe respect constitut,princip_deputi deputi_gener gener_vanita vanita_gupta gupta_head head_paul paul_fishman fishman_jersey jersey_releas releas_follow follow_statement statement_regard regard_court court_jersey jersey_approv approv_agreement agreement_citi citi_newark newark_jersey jersey_reform reform_polic polic_unconstitut unconstitut_practic practic_appreci appreci_court court_swift swift_approv approv_consent consent_decre decre_citi citi_newark newark_said said_princip princip_deputi deputi_gener gener_gupta gupta_agreement agreement_help help_newark newark_polic polic_reform reform_polici polici_improv improv_system system_rebuild rebuild_trust trust_offic offic_commun commun_serv serv_newark newark_implement implement_agreement agreement_continu continu_work work_close close_citi citi_offici offici_enforc enforc_commun commun_member member_place place_necessari necessari_chang chang_make make_newark newark_nation nation_model model_constitut constitut_effect effect_account account_polic polic_fulli fulli_implement implement_reform reform_make make_newark newark_offic offic_civilian civilian_alik alik_safer safer_reform reform_ensur ensur_enforc enforc_newark newark_compli compli_constitut constitut_safeguard safeguard_everi everi_newark newark_resid resid_consent consent_decre decre_approv approv_court court_provid provid_roadmap roadmap_reform reform_newark newark_model model_best best_practic practic_polic polic_depart depart_across across_countri countri_said said_fishman fishman_implement implement_system system_chang chang_outlin outlin_consent consent_decre decre_take take_time time_citi citi_newark newark_women women_serv serv_polic polic_want want_need need_peopl peopl_newark newark_deserv deserv_first first_class class_polic polic_keep keep_safe safe_respect respect_constitut
72,1857,17-271,"Court Approves Desegregation Plan for Cleveland, Mississippi, Schools","Cleveland School District to Open Consolidated Middle and High Schools by August 2017 U.S. District Court Judge Debra M. Brown of the Northern District of Mississippi today approved a joint settlement agreement filed on Feb. 8 by the Justice Department, private plaintiffs, and the Cleveland School District. The agreement will lead to the effective desegregation of Cleveland’s middle and high schools by the start of the next school year. Under the terms approved today, the school district agrees to comply with a May 13, 2016 court ruling mandating consolidation of Cleveland middle and high schools to remedy decades-long segregation in the school district. The consolidated high school, to be named Cleveland Central High School, will open by August at the current Margaret Green/Cleveland High campus. Also by August, the district will open the consolidated middle school (seventh and eighth grades), Cleveland Central Middle School, at the current East Side High facility. Under the agreement, sixth grade students will attend district elementary schools rather than the consolidated middle school. As part of the agreement, the district and plaintiffs have withdrawn all alternative desegregation proposals from consideration by the Court. The district has also withdrawn its pending appeal before the U.S. Court of Appeals for the Fifth Circuit. “The Department is pleased to have reached agreement with the Cleveland School District and private plaintiffs to settle this decades-long litigation,” said Acting Assistant Attorney General Tom Wheeler of the Justice Department’s Civil Rights Division. “The plan approved today allows the community to move forward together. It reflects the parties’ shared commitment to high quality equal educational opportunities for all Cleveland students.” Additional information is available on the Justice Department’s website at: www.justice.gov/opa/pr/federal-court-orders-justice-department-desegregation-plan-cleveland-mississippi-schools. Promoting school desegregation and enforcing Title IV of the Civil Rights Act of 1964 is a top priority of the Justice Department’s Civil Rights Division. Additional information about the Civil Rights Division is available on its website at www.justice.gov/crt.",2017-03-13T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Educational Opportunities Section; USAO - Louisiana, Eastern; USAO - Mississippi, Northern",0.000,0.835,0.165,0.9909,cleveland school open consolid middl high school august court judg debra brown northern mississippi today approv joint settlement agreement file privat plaintiff cleveland school agreement lead effect desegreg cleveland middl high school start next school year term approv today school agre compli court rule mandat consolid cleveland middl high school remedi decad long segreg school consolid high school name cleveland central high school open august current margaret green cleveland high campu also august open consolid middl school seventh eighth grade cleveland central middl school current east side high facil agreement sixth grade student attend elementari school rather consolid middl school part agreement plaintiff withdrawn altern desegreg propos consider court also withdrawn pend appeal court appeal fifth circuit pleas reach agreement cleveland school privat plaintiff settl decad long litig said act gener wheeler plan approv today allow commun move forward togeth reflect parti share commit high qualiti equal educ opportun cleveland student addit inform avail websit feder court order desegreg plan cleveland mississippi school promot school desegreg enforc titl prioriti addit inform avail websit,cleveland_school school_open open_consolid consolid_middl middl_high high_school school_august august_court court_judg judg_debra debra_brown brown_northern northern_mississippi mississippi_today today_approv approv_joint joint_settlement settlement_agreement agreement_file file_privat privat_plaintiff plaintiff_cleveland cleveland_school school_agreement agreement_lead lead_effect effect_desegreg desegreg_cleveland cleveland_middl middl_high high_school school_start start_next next_school school_year year_term term_approv approv_today today_school school_agre agre_compli compli_court court_rule rule_mandat mandat_consolid consolid_cleveland cleveland_middl middl_high high_school school_remedi remedi_decad decad_long long_segreg segreg_school school_consolid consolid_high high_school school_name name_cleveland cleveland_central central_high high_school school_open open_august august_current current_margaret margaret_green green_cleveland cleveland_high high_campu campu_also also_august august_open open_consolid consolid_middl middl_school school_seventh seventh_eighth eighth_grade grade_cleveland cleveland_central central_middl middl_school school_current current_east east_side side_high high_facil facil_agreement agreement_sixth sixth_grade grade_student student_attend attend_elementari elementari_school school_rather rather_consolid consolid_middl middl_school school_part part_agreement agreement_plaintiff plaintiff_withdrawn withdrawn_altern altern_desegreg desegreg_propos propos_consider consider_court court_also also_withdrawn withdrawn_pend pend_appeal appeal_court court_appeal appeal_fifth fifth_circuit circuit_pleas pleas_reach reach_agreement agreement_cleveland cleveland_school school_privat privat_plaintiff plaintiff_settl settl_decad decad_long long_litig litig_said said_act act_gener gener_wheeler wheeler_plan plan_approv approv_today today_allow allow_commun commun_move move_forward forward_togeth togeth_reflect reflect_parti parti_share share_commit commit_high high_qualiti qualiti_equal equal_educ educ_opportun opportun_cleveland cleveland_student student_addit addit_inform inform_avail avail_websit websit_feder feder_court court_order order_desegreg desegreg_plan plan_cleveland cleveland_mississippi mississippi_school school_promot promot_school school_desegreg desegreg_enforc enforc_titl titl_prioriti prioriti_addit addit_inform inform_avail avail_websit
324,6787,17-132,Justice Department Reaches Agreement with St. James Parish Louisiana School District to Desegregate Schools,"The Department of Justice has reached an agreement with the St. James Parish School District in Louisiana that upon completion will end court supervision of the district’s schools. The consent order, approved yesterday by the U.S. District Court for the Eastern District of Louisiana, addresses all remaining issues in the school desegregation case, and when fully implemented will lead to the closing of that case. The consent order, negotiated with the school district and private plaintiffs, represented by the NAACP Legal Defense and Educational Fund, puts the district on a path to full unitary status within three years provided it: The consent order declares that the district has already met its desegregation obligations in the area of transportation. The court will retain jurisdiction over the consent order during its implementation, and the Justice Department will monitor the district’s compliance. “We are pleased to have worked hand-in-hand with the schools to ensure equal and fair treatment for the students of the St. James Parish School District,” said Acting Assistant Attorney General Tom Wheeler of the Civil Rights Division. “We look forward to working with the district and private plaintiffs to implement the consent order and bring this case to a successful close.” Promoting school desegregation and enforcing Title IV of the Civil Rights Act of 1964 is a top priority of the Justice Department’s Civil Rights Division. Additional information about the Civil Rights Division is available on its website at www.justice.gov/crt. St. James Parish Consent Order",2017-01-31T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Educational Opportunities Section,0.000,0.838,0.162,0.9812,reach agreement jame parish school louisiana upon complet court supervis school consent order approv yesterday court eastern louisiana address remain issu school desegreg fulli implement lead close consent order negoti school privat plaintiff repres naacp legal defens educ fund put path full unitari statu within three year provid consent order declar alreadi desegreg oblig area transport court retain jurisdict consent order implement monitor complianc pleas work hand hand school ensur equal fair treatment student jame parish school said act gener wheeler look forward work privat plaintiff implement consent order bring success close promot school desegreg enforc titl prioriti addit inform avail websit jame parish consent order,reach_agreement agreement_jame jame_parish parish_school school_louisiana louisiana_upon upon_complet complet_court court_supervis supervis_school school_consent consent_order order_approv approv_yesterday yesterday_court court_eastern eastern_louisiana louisiana_address address_remain remain_issu issu_school school_desegreg desegreg_fulli fulli_implement implement_lead lead_close close_consent consent_order order_negoti negoti_school school_privat privat_plaintiff plaintiff_repres repres_naacp naacp_legal legal_defens defens_educ educ_fund fund_put put_path path_full full_unitari unitari_statu statu_within within_three three_year year_provid provid_consent consent_order order_declar declar_alreadi alreadi_desegreg desegreg_oblig oblig_area area_transport transport_court court_retain retain_jurisdict jurisdict_consent consent_order order_implement implement_monitor monitor_complianc complianc_pleas pleas_work work_hand hand_hand hand_school school_ensur ensur_equal equal_fair fair_treatment treatment_student student_jame jame_parish parish_school school_said said_act act_gener gener_wheeler wheeler_look look_forward forward_work work_privat privat_plaintiff plaintiff_implement implement_consent consent_order order_bring bring_success success_close close_promot promot_school school_desegreg desegreg_enforc enforc_titl titl_prioriti prioriti_addit addit_inform inform_avail avail_websit websit_jame jame_parish parish_consent consent_order
346,6981,17-003,Justice Department Releases Report on Civil Rights Division’s Pattern and Practice Police Reform Work,"The Justice Department released a comprehensive report today that provides an overview of the Civil Rights Division’s police reform work under Section 14141 of the Violent Crime Control and Law Enforcement Act of 1994. The report, “The Civil Rights Division’s Pattern and Practice Police Reform Work: 1994-Present,” is designed to serve as a resource for local law enforcement agencies and communities by making the division’s police reform work more accessible and transparent. It examines a range of topics, including the history and purpose of Section 14141, initiation and methodology of pattern-or-practice investigations, negotiation of reform agreements, the current reform model and its rationale, conclusion of agreements and the impact of pattern-or-practice enforcement on police reform and community-police trust. To supplement the report, the division also published an interactive Police Reform Finder, which allows users to search how reform agreements have addressed specific kinds of policing issues. “Over the years, countless law enforcement officials and community members have requested additional information about the Civil Rights Division’s policing work,” said Principal Deputy Assistant Attorney General Vanita Gupta, head of the Civil Rights Division. “We hope stakeholders find our report and interactive tool useful in our collective efforts to advance constitutional policing, strengthen police-community trust and promote officer and public safety.” Since 2009, the Civil Rights Division has opened 25 investigations into law enforcement agencies and is currently enforcing 19 agreements, including 14 consent decrees and one post-judgment order. Police Reform Report Police Reform Finder Police Reform Accomplishments",2017-01-04T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section,0.000,0.861,0.139,0.9766,releas comprehens report today provid overview polic reform work section violent crime control enforc report pattern practic polic reform work present design serv resourc local enforc agenc commun make polic reform work access transpar examin rang topic includ histori purpos section initi methodolog pattern practic investig negoti reform agreement current reform model rational conclus agreement impact pattern practic enforc polic reform commun polic trust supplement report also publish interact polic reform finder allow user search reform agreement address specif kind polic issu year countless enforc offici commun member request addit inform polic work said princip deputi gener vanita gupta head hope stakehold find report interact tool use collect effort advanc constitut polic strengthen polic commun trust promot offic public safeti sinc open investig enforc agenc current enforc agreement includ consent decre post judgment order polic reform report polic reform finder polic reform accomplish,releas_comprehens comprehens_report report_today today_provid provid_overview overview_polic polic_reform reform_work work_section section_violent violent_crime crime_control control_enforc enforc_report report_pattern pattern_practic practic_polic polic_reform reform_work work_present present_design design_serv serv_resourc resourc_local local_enforc enforc_agenc agenc_commun commun_make make_polic polic_reform reform_work work_access access_transpar transpar_examin examin_rang rang_topic topic_includ includ_histori histori_purpos purpos_section section_initi initi_methodolog methodolog_pattern pattern_practic practic_investig investig_negoti negoti_reform reform_agreement agreement_current current_reform reform_model model_rational rational_conclus conclus_agreement agreement_impact impact_pattern pattern_practic practic_enforc enforc_polic polic_reform reform_commun commun_polic polic_trust trust_supplement supplement_report report_also also_publish publish_interact interact_polic polic_reform reform_finder finder_allow allow_user user_search search_reform reform_agreement agreement_address address_specif specif_kind kind_polic polic_issu issu_year year_countless countless_enforc enforc_offici offici_commun commun_member member_request request_addit addit_inform inform_polic polic_work work_said said_princip princip_deputi deputi_gener gener_vanita vanita_gupta gupta_head head_hope hope_stakehold stakehold_find find_report report_interact interact_tool tool_use use_collect collect_effort effort_advanc advanc_constitut constitut_polic polic_strengthen strengthen_polic polic_commun commun_trust trust_promot promot_offic offic_public public_safeti safeti_sinc sinc_open open_investig investig_enforc enforc_agenc agenc_current current_enforc enforc_agreement agreement_includ includ_consent consent_decre decre_post post_judgment judgment_order order_polic polic_reform reform_report report_polic polic_reform reform_finder finder_polic polic_reform reform_accomplish


In [30]:
doj_subset_wscore.loc[doj_subset_wscore.id == '16-217'][['id','processed_text','processed_text_bigram']]

Unnamed: 0,id,processed_text,processed_text_bigram
313,16-217,reach comprehens settlement agreement citi miami miami polic resolv offic involv shoot offic announc princip deputi gener vanita gupta head wifredo ferrer southern florida settlement approv miami citi commiss today effect agreement sign parti resolv claim stem offic involv shoot offic conduct violent crime control enforc find issu juli identifi pattern practic excess forc offic involv shoot violat fourth amend constitut citi complianc settlement monitor independ review former tampa florida polic chief jane castor settlement agreement citi implement comprehens reform ensur constitut polic support public trust settlement agreement design minim offic involv shoot effect quickli investig offic involv shoot occur measur includ settlement repres renew commit citi miami chief rodolfo llane provid constitut polic miami resid protect public safeti sustain reform said princip deputi gener gupta agreement help strengthen relationship commun serv improv account offic fire weapon unlaw provid commun particip enforc agreement today agreement result joint effort citi miami ensur miami polic continu effort make commun safe protect sacr constitut citizen said ferrer oversight commun agreement seek make perman posit chang former chief orosa chief llane made applaud citi commiss vote settlement agreement build upon import reform implement citi sinc issu find includ conduct attorney staff special litig section southern florida,reach_comprehens comprehens_settlement settlement_agreement agreement_citi citi_miami miami_miami miami_polic polic_resolv resolv_offic offic_involv involv_shoot shoot_offic offic_announc announc_princip princip_deputi deputi_gener gener_vanita vanita_gupta gupta_head head_wifredo wifredo_ferrer ferrer_southern southern_florida florida_settlement settlement_approv approv_miami miami_citi citi_commiss commiss_today today_effect effect_agreement agreement_sign sign_parti parti_resolv resolv_claim claim_stem stem_offic offic_involv involv_shoot shoot_offic offic_conduct conduct_violent violent_crime crime_control control_enforc enforc_find find_issu issu_juli juli_identifi identifi_pattern pattern_practic practic_excess excess_forc forc_offic offic_involv involv_shoot shoot_violat violat_fourth fourth_amend amend_constitut constitut_citi citi_complianc complianc_settlement settlement_monitor monitor_independ independ_review review_former former_tampa tampa_florida florida_polic polic_chief chief_jane jane_castor castor_settlement settlement_agreement agreement_citi citi_implement implement_comprehens comprehens_reform reform_ensur ensur_constitut constitut_polic polic_support support_public public_trust trust_settlement settlement_agreement agreement_design design_minim minim_offic offic_involv involv_shoot shoot_effect effect_quickli quickli_investig investig_offic offic_involv involv_shoot shoot_occur occur_measur measur_includ includ_settlement settlement_repres repres_renew renew_commit commit_citi citi_miami miami_chief chief_rodolfo rodolfo_llane llane_provid provid_constitut constitut_polic polic_miami miami_resid resid_protect protect_public public_safeti safeti_sustain sustain_reform reform_said said_princip princip_deputi deputi_gener gener_gupta gupta_agreement agreement_help help_strengthen strengthen_relationship relationship_commun commun_serv serv_improv improv_account account_offic offic_fire fire_weapon weapon_unlaw unlaw_provid provid_commun commun_particip particip_enforc enforc_agreement agreement_today today_agreement agreement_result result_joint joint_effort effort_citi citi_miami miami_ensur ensur_miami miami_polic polic_continu continu_effort effort_make make_commun commun_safe safe_protect protect_sacr sacr_constitut constitut_citizen citizen_said said_ferrer ferrer_oversight oversight_commun commun_agreement agreement_seek seek_make make_perman perman_posit posit_chang chang_former former_chief chief_orosa orosa_chief chief_llane llane_made made_applaud applaud_citi citi_commiss commiss_vote vote_settlement settlement_agreement agreement_build build_upon upon_import import_reform reform_implement implement_citi citi_sinc sinc_issu issu_find find_includ includ_conduct conduct_attorney attorney_staff staff_special special_litig litig_section section_southern southern_florida


C. Use the create_dtm function and the `processed_text_bigrams` column to create a document-term matrix (`dtm_bigram`) with these bigrams. Keep the following three columns in the data: `id`, `topics_clean`, and `compound` 

D. Print the 
 (1) dimensions of the `dtm` matrix from question 2.2  and 
 (2) the dimensions of the `dtm_bigram` matrix. Comment on why the bigram matrix has more dimensions than the unigram matrix 

In [31]:
## C. 
doj_subset_wscore_bigram = doj_subset_wscore.loc[~doj_subset_wscore.processed_text_bigram.isnull(),
           ['id', 'compound', 'processed_text_bigram','topics_clean']].copy().sample(n =700, random_state = 9899)

 
dtm_bigram = create_dtm(list_of_strings= doj_subset_wscore_bigram.processed_text_bigram,
                metadata = 
                doj_subset_wscore_bigram[['id', 'compound', 'topics_clean']])

dtm_bigram.head()

Unnamed: 0,index,metadata_id,metadata_compound,metadata_topics_clean,aaron_ford,aaron_latham,aaron_mcgrath,aaron_parrish,aaron_polster,aaron_rice,...,zone_ordin,zone_practic,zone_religi,zone_restrict,zone_student,zone_varianc,zunggeemog_noel,zunggeemog_prompt,zunggeemog_write,zwengel_princeton
0,311,16-529,0.9727,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,135,15-1290,-0.9524,Project Safe Childhood,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,37,17-1277,-0.9775,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,146,17-181,-0.9781,Civil Rights,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,120,17-160,-0.9744,Hate Crimes,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
## D. Print the dimension

# Dimension of dtm_nopre in 2.2
dtm_nopre.shape
# Dimension of dtm_bigram
dtm_bigram.shape

(700, 6881)

(700, 71717)

It makes sense that there would be more dimensions because pairs of words will be higher than the raw number of words.

E. Find and print the 10 most prevelant bigrams for each of the three topics_clean using the `get_topwords` function from 2.2


In [33]:
## E. Print the 10 most prevelant bigrams for each of the three topics_clean
Civil_Rights_bigram = dtm_bigram.loc[dtm_bigram.metadata_topics_clean == 'Civil Rights']
get_topwords(Civil_Rights_bigram)
Project_Safe_Childhood_bigram = dtm_bigram.loc[dtm_bigram.metadata_topics_clean == 'Project Safe Childhood']
get_topwords(Project_Safe_Childhood_bigram)
Hate_Crimes_bigram = dtm_bigram.loc[dtm_bigram.metadata_topics_clean == 'Hate Crimes']
get_topwords(Hate_Crimes_bigram)

fair_hous         219
deputi_gener      214
princip_deputi    214
vanita_gupta      195
gupta_head        193
gener_vanita      192
said_princip      180
unit_state        153
nation_origin     141
consent_decre     126
dtype: int64

safe_childhood       461
project_safe         460
child_pornographi    431
child_exploit        277
sexual_exploit       219
exploit_children     196
plead_guilti         191
exploit_obscen       172
obscen_section       171
child_sexual         170
dtype: int64

hate_crime          378
african_american    361
plead_guilti        271
year_prison         158
special_agent       124
racial_motiv        114
thoma_perez         109
grand_juri          100
perez_gener          93
said_thoma           89
dtype: int64

# 4. Optional extra credit 1 (1 point)

You notice that the pharmaceutical kickbacks press release we analyzed in question 1 was for an indictment, and that in the original data, there's not a clear label for whether a press release outlines an indictment (charging someone with a crime), a conviction (convicting them after that charge either via a settlement or trial), or a sentencing (how many years of prison or supervised release a defendant is sentenced to after their conviction).

You want to see if you can identify pairs of press releases where one press release is from one stage (e.g., indictment) and another is from a different stage (e.g., a sentencing).

You decide that one way to approach is to find the pairwise string similarity between each of the processed press releases in `doj_subset`. There are many ways to do this, so Google for some approaches, focusing on ones that work well for entire documents rather than small strings.

Find the top two pairs (so four press releases total)-- do they seem like different stages of the same crime or just press releases covering similar crimes?

In [None]:
## your code here

# 5. Optional extra credit 2 (3 points)

Review the scraping code here: https://github.com/jbencina/dojreleases/blob/master/scraper.py
    
Write code to scrape press releases from more recent years from the DOJ website than the years available in the combined.json and produce a visualization of how top words or themes in the press releases changed across the Trump administration (2016-December 2020) versus Biden administration (January 2021 onwards)- you can do this in a .py file that you submit separately and just read in the data produced by that scraping

In [None]:
## your code here