author: Ben P. Meredith, Ed.D., February 2019

# *Pulling and Processing PDFs *

This is a KEY piece of code. This code 
1. pulls a .pdf from a directory
1. rejects the .pdf if it has "._"_ in the title
1. pulls the text from the .pdf
1. cleans the text
1. tokenizes the text by sentence
1. searches each sentence for our designated list of keywords
1. saves the sentences that contain one of our keywords in a dataframe
    - saves the name of the file for reference
    - saves the keyword in the same row for reference
    - saves the sentence in the same row



# Load Libraries

`__nlp_init_nsv__.py` is a local library that I produced. It is necessary to run this program. It will take a moment to run as it is rather long. It is loading a lot of things. 

In [4]:
%run lib/__nlp_init_nsv__.py


 Welcome to our Natural Language Processing Library. I am loading at the moment and will be done in a nanosecond... 

******************************************************************************** 

nltk imported
nltk SnowballStemmer imported
nltk WordNetLemmatizer imported
spacy imported
Pattern parse imported
Pattern imported
Pandas imported as pd
Numpy imported as np
re imported
BeautifulSoup imported
CONTRACTION_MAP imported
Unicodedata imported.
PyPDF2 imported.
Gensim's summarize imported.
Textract imported.
Gensim's keywords imported
Pyphen imported.
Textblob imported.
Logging imported logging for summarizing and reading text from PDFs
sys imported.
Counter imported.
NLTK's sent_tokenize imported.
NLTK's word_tokenize imported.

 __________________________________________________ 

	Our libraries are loaded correctly.
 __________________________________________________
DONE! Phew, I am good! Record Time!


I also loaded the following functions:

	pull_all_text(df)
	add_to_lib

# The Real Work of the Program

In [5]:
# Pulling only those files with a .pdf extention

import os

df = pd.DataFrame(columns=['1'])
i = 0

# directory is where you point Python to where the .pdfs are stored in a blob
directory = 'test_pdfs/'

#keywords are those words or parts of words for which we are searching
keywords = ['correlat', 
     'male', 
     'race', 
     'ethn', 
     'will be used',
     'female',
     'youth',
     'weapon',
     'adolescent',
     'gun',
     'knife',
     ' caus', 
    ]

for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".pdf"):
#             print(os.path.join(root, file)) ## This line is only used to check our work
            if '._' in file:
                # This line tells us which files do not make the criteria for being processed
                # This printed line is NOT saved. 
                print('\n\n',(os.path.join(root, file)),'ignored.', '\n\n')
                pass
            else:
                #if there is a unicodedecode error because of the .pdf, this will skip the bad file. 
                try:
                    s = str(directory+file)
                    # This line tells us which files are being processed. It is not saved. 
                    print('\n\n',s, 'being processed.','\n\n')
                    
                    # import the text from the .pdfs, convert it to a string, and clean it
                    text = import_pdf(s)
                    text = str(text)
                    text = clean_the_text(text) #uses my internal library from __nlp_init_nsv__.py
                    
                    #Where the real work is done
                    for term in keywords:
                        for sentence in sent_tokenize(text):
                            if term in sentence:
                                sentence = clean_the_text(sentence)
                                print(term, '\n', sentence)
                                df.loc[i, 'Article'] = s
                                df.loc[i, 'Keyword'] = term
                                df.loc[i, 'Sentence'] = sentence
                                i += 1
                except UnicodeDecodeError:
                    print('\n\n',(os.path.join(root, file)),'ignored.', '\n\n')
                    continue
                            
                            
df



 test_pdfs/The Revenge Of The Lost Boys.pdf being processed. 



 CLEANING THE TEXT

 CLEANING THE TEXT
male 
 Although mass killers understandably seize our imaginations and dominate the media, and not all dysfunctional young males are violent and not all of them gain the publicity they crave.

 CLEANING THE TEXT
male 
 What they all have in common is their gender (male), their race (most are white), and their youth (almost all under 30 at their peak destructiveness).

 CLEANING THE TEXT
male 
 Some of them kill, but others lash out in other, more creative http://thefederalist.com/2015/07/09/the-revenge-of-the-lost-boys/ 1/9 3/24/2018 The Revenge Of The Lost Boys ways: whether its Edward Snowden deciding only he could save America from the scourge of surveillance, or Bowe Bergdahl walking away from his post to personally solve the war in Afghanistan, the combination of immaturity and grandiosity among these young males is jaw-dropping in its scale even when it is not expressed throu


 CLEANING THE TEXT

 CLEANING THE TEXT
correlat 
 xeex80x82ize of the dotxeex80x84 correlatexeex80x84 to the numxeex80x83er of victimxeex80x84.

 CLEANING THE TEXT
gun 
 We do not track incidentxeex80x84 in which the onlxeex80x86 xeex80x84hotxeex80x84 fired were from a perxeex80x84on authorized to carrxeex80x86 a gun and who did xeex80x84o in their official capacitxeex80x86.


 test_pdfs/._The School Shootings of 2018_ What's Behind the Numbers - Education Week.pdf ignored. 




 test_pdfs/WhatisSchoolViolenceHenryPre-publicationdraftANNALSv5672000pp.16-29.pdf being processed. 



 CLEANING THE TEXT

 CLEANING THE TEXT
correlat 
 However, the correlation between these three dimensions certainly is not perfect, and .

 CLEANING THE TEXT
correlat 
 Indeed, it is just such harm perpetrated by the structurally powerful in schools that is correlated with high levels of incidence of violence by students.

 CLEANING THE TEXT
male 
 For example, gender discrimination has been shown to create 


 CLEANING THE TEXT
adolescent 
 Indeed, as Garbarino (1999) says when the perpetrators of lethal school violence are middle-class, white teenagers from small towns or suburbs, their crimes make national and international news, but, Most of us never heard about the adolescents who shot and killed other kids in the inner city neighborhoods of Houston, Chicago, New York, Los Angeles and Detroit.

 CLEANING THE TEXT
gun 
 Yet, like airline crashes, any analysis of school violence that simply looks at one factor, such as human fallibility, gun availability or cultural toxicity, is in grave danger of missing the point, and risks failing to prevent future disasters.

 CLEANING THE TEXT
gun 
 For instance, student reports of drug availability, street gang presence, and gun presence at school were all related to student reports of having experienced violent victimization at school (Bureau of Justice 1998b, 1).

 CLEANING THE TEXT
gun 
 Level 4 Violence: State and national educational policy on


 CLEANING THE TEXT


 test_pdfs/._Educational_Data_Mining_Application_for_Estimating.pdf ignored. 




 test_pdfs/EU SCHOOL VIOLENCER.pdf being processed. 



 CLEANING THE TEXT

 CLEANING THE TEXT
race 
 In other countries, vulnerable students have been issued with alarm bracelets, which can be activated if threatened or attacked (e.g.

 CLEANING THE TEXT
ethn 
 population, major regional areas, major languages, major ethnic groups and minorities); school system (i.e.

 CLEANING THE TEXT
ethn 
 student-student, student-teacher, teacherstudent), age trends and gender differences; effects of factors such as ethnicity, socioeconomic status and special needs; and information relating to variations by school type and school ethos.

 CLEANING THE TEXT
ethn 
 Many reports discussed the inxefxacx82uence on school violence of factors such as the region of the country and socio-economic circumstances; the type of school; and student characteristics such as age, sex, ethnicity, social class, fa


 CLEANING THE TEXT
race 
 Hierarchical Linear and Nonlinear Modeling (HLM) will be utilized to analyze the multilevel relationships between school social bonds, race, ethnicity, gender, immigration, violence, and disorder.

 CLEANING THE TEXT
race 
 To employ and evaluate the EBH-CRP intervention across the continuum of grade levels, it was determined that "feeder patterns" within BCPS that are matched on key demographic variables, including race/ethnicity, poverty, and current EBH supports, will be randomized to receive the EBH-CRP intervention or participate in the control condition.

 CLEANING THE TEXT
race 
 We will conduct formative research with 36 students, especially from vulnerable groups defined by race/ethnicity, disability status, and sexual orientation, 36 school personnel, and 36 parents and community stakeholders.

 CLEANING THE TEXT
race 
 The sample is relatively homogeneous with respect to race and ethnicity but generally consistent with the makeup of the two-county 


 CLEANING THE TEXT
ethn 
 The PHDCN includes three waves of child, parent, and community data for multiple cohorts (ages 3, 6, 9, 12, and 15 to be used here; n = 16% White, 35% African American, 45% Latino/a, 4% other race/ethnicity).

 CLEANING THE TEXT
ethn 
 This project also builds on a years worth of fieldwork and relationships with gang members to collect ethnographic evidence of the scope of human COMPENDIUM OF RESEARCH ON CHILDREN EXPOSED TO VIOLENCE: 2010-2015 75 trafficking from the facilitators of human trafficking themselves.

 CLEANING THE TEXT
ethn 
 Sixth and 7th grade students came from four public middle schools in one school district in Central Illinois, and the sample is both ethnically and economically diverse (31.5% White, 60.4% Black, 2.6% Asian, 5.1% Hispanic, .4% Native American; 69.3% considered low-income).

 CLEANING THE TEXT
ethn 
 Other than differences by age and gender, ARA rates were consistent by race/ethnicity, geographic region, urbanicity, and house

will be used 
 Data from these interviews will be used to inform the current profile of the scope of human trafficking in San Diego County as well as to determine avenues for future research.

 CLEANING THE TEXT
will be used 
 Multilevel regression analyses will be used to assess intervention effects on specified individual-level, clinic-level, and school-level outcomes.

 CLEANING THE TEXT
will be used 
 NORCs Computer Assisted-Telephone Interviewing (CATI) and related systems for screening and randomly selecting cases into the sample will be used, conducting a 15-minute Parent/Caregiver Survey and a 60-minute Adolescent Survey, overseeing interviewer performance, and monitoring sampling goals.

 CLEANING THE TEXT
will be used 
 Data collected at previous waves (e.g., parental risk factors, the parent-child relationship, family context, childhood selfregulation and social competence, and adolescent risky behaviors) will be used to test a dynamic cascade model of development for TDV, u


 CLEANING THE TEXT
youth 
 We will also explore whether the program benefits some youth (such as those who are highly engaged with MLMC services) more than for others.

 CLEANING THE TEXT
youth 
 The relevance of the proposed research to human trafficking policy, practice and theory is that it will provide new evidence about whether it is possible to prevent at-risk youth from being trafficked domestically within a year of prevention group services, and whether the theoretical basis of MLMC (i.e., resilience theory, mentorship) produces successful results for minors who have been trafficked.

 CLEANING THE TEXT
youth 
 We will also compare outcomes for youth who are in the survivor advocacy services group and for youth who are in the at-risk prevention group.

 CLEANING THE TEXT
youth 
 We will follow all youth from baseline to 12 months with a six month interim assessment.

 CLEANING THE TEXT
youth 
 The research team is ideally prepared and highly qualified to carry out the proposed


 CLEANING THE TEXT
weapon 
 These assaults occur in a context where the landscape that students navigate each day often includes bullying, substance use, and weapon carrying.

 CLEANING THE TEXT
weapon 
 At the same time, each subject will be asked to describe his or her activities sequentially during that period, including companions and weapon carrying, and site-line features of each location (prospect, refuge, and escape) that indicate the subjects ability to see their surroundings clearly or the potential for someone to be concealed and hiding nearby.

 CLEANING THE TEXT
weapon 
 This technique represents a top-down approach that will use data on total cash spending derived from federal data bases to first estimate the total unlawful economy and then determine the portion of the unlawful economy due to unlawful commercial sex, illegal weapons, illegal drugs, or other means (including both legal uses and other illegal uses such as under-the-table employment).

 CLEANING THE TEXT
we

adolescent 
 The final intervention, to be used by parents and adolescents together, is based on the empirical literature linking emotion regulation deficits to violent behavior as well as studies showing that parental involvement is crucial to offset dating violence risk.

 CLEANING THE TEXT
adolescent 
 Research has also shown that game playing is the most popular internet activity for early adolescent boys; thus interactive, web-based games and videos are ideal to engage young males in dating violence programming.

 CLEANING THE TEXT
adolescent 
 Analysis: We will test whether the proposed program promotes reductions in boys attitudes supporting dating violence and frequency of aggressive acts (DV perpetration and victimization) as well as increases in emotion regulation skills and parent-adolescent communication.

 CLEANING THE TEXT
adolescent 
 Products, Reports, and Data Archiving: Findings will have important implications for developing and disseminating dating violence programm

 CLEANING THE TEXT
 caus 
 As a cross-sectional survey, NSCH is not appropriate to use to draw causal inferences of this sort.

 CLEANING THE TEXT
 caus 
 Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults.


 test_pdfs/._Adverse Family Experiences.pdf ignored. 




 test_pdfs/Father's involvement.pdf being processed. 



 CLEANING THE TEXT

 CLEANING THE TEXT
correlat 
 Additional research with NSFG could focus on some of the measures of father involvement not included in this report (see Introduction), including involvement in the last 12 months, and correlates of involvement such as work hours and schedules and payment of child support.

 CLEANING THE TEXT
male 
 Nurturing fatherhood: Improving data and research on male fertility, family formation, and fatherhood.

 CLEANING THE TEXT
race 
 Differences in fathers involvement with their children were also found by the fathers age, marital or cohabiting status, education, and His


 CLEANING THE TEXT
race 
 Black or African American, single race .

 CLEANING THE TEXT
race 
 Does not live with one or more of his children Hispanic origin and race Hispanic or Latino .

 CLEANING THE TEXT
race 
 Not Hispanic or Latino White, single race .

 CLEANING THE TEXT
race 
 Black or African American, single race .

 CLEANING THE TEXT
race 
 1 Includes fathers of other or multiple-race and origin groups, not shown separately.

 CLEANING THE TEXT
race 
 4,278 2,002 100.0 100.0 54.8 (3.05) 47.7 (4.86) 30.3 (2.63) 34.1 (4.57) 12.3 (1.86) 14.5 (3.35) 2.6 (0.98) 3.7 (1.62) 1,769 100.0 71.8 (3.72) 15.0 (2.76) 11.3 (3.13) 1.9 (0.70) 2,702 1,336 100.0 100.0 43.3 (4.71) 45.9 (3.43) 38.0 (4.50) 40.7 (3.17) 14.5 (2.34) 11.3 (2.06) 4.2 (1.84) 2.1 (0.98) Hispanic origin and race Hispanic or Latino .

 CLEANING THE TEXT
race 
 Not Hispanic or Latino White, single race .

 CLEANING THE TEXT
race 
 Black or African American, single race .

 CLEANING THE TEXT
race 
 Does not live with one or 

male 
 (residual) All races and origins, female, 35-44 years ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
male 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 All races and origins, female, 45-54 years ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
male 
 (residual) 100.0 31.7 15.1 315.0 100.0 47.4 7,135 10.3 32.6 3,109 2,407 2,341 4.5 3.5 3.4 14.2 11.0 10.7 2,340 2,328 1,217 881 14,929 3.4 3.4 1.8 1.3 21.7 10.7 10.6 5.6 4.0 68.2 140,159 51,942 23,764 8,452 100.0 37.1 17.0 6.0 662.3 245.4 112.3 39.9 6,055 5,480 5,146 4.3 3.9 3.7 28.6 25.9 24.3 4,202 2,671 3.0 1.9 19.9 12.6 2,269 1.6 10.7 2,043 28,135 1.5 20.1 9.7 132.9 All races and origins, female, 65 years and over ... 1 2 3 4 5 6 7 8 9 10 ... All causes .



male 
 Number2,3 Percent of total Death deaths rate2,3 Hispanic, male, all ages4 Hispanic, both sexes, 75-84 years ... 1 2 3 4 5 6 7 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age All causes .

 CLEANING THE TEXT
male 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Hispanic, male, 10-14 years ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
male 
 (residual) 5,268 100.0 111.8 2,129 40.4 45.2 674 12.8 14.3 612 361 318 11.6 6.9 6.0 13.0 7.7 6.8 126 84 2.4 1.6 2.7 1.8 78 69 38 38 741 1.5 1.3 0.7 0.7 14.1 1.7 1.5 0.8 0.8 15.7 6,385 100.0 152.8 1,729 810 712 27.1 12.7 11.2 41.4 19.4 17.0 473 7.4 11.3 434 6.8 10.4 413 203 189 6.5 3.2 3.0 9.9 4.9 4.5 164 66 1,192 2.6 1.0 18.7 3.9 1.6 28.5 11,230 2,139 2,088 10


 CLEANING THE TEXT
race 
 bNational Vital Statistics Reports Volume 66, Number 5 November 27, 2017 Deaths: Leading Causes for 2015 by Melonie Heron, Ph.D., Division of Vital Statistics Abstract Introduction Objectives-This report presents final 2015 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin.

 CLEANING THE TEXT
race 
 Differences in the rankings are evident by age, sex, race, and Hispanic origin.

 CLEANING THE TEXT
race 
 This report presents final 2015 data on leading causes of death in the United States by age, sex, race, and Hispanic origin.

 CLEANING THE TEXT
race 
 Data by race and Hispanic origin This report was redesigned and shows different race and ethnicity categories than those shown in previous reports.

 CLEANING THE TEXT
race 
 Specifically, this report presents combined race and Hispanicorigin categories as follows: non-Hispanic white, non-Hispanic black, non-Hispanic American Indian or Alaska Native (AIAN), non

 CLEANING THE TEXT
race 
 Data for Hispanic persons are not tabulated by race; data for non-Hispanic persons are tabulated by race.

 CLEANING THE TEXT
race 
 Data for racial and ethnic groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of misreporting of Hispanic origin and race on the death certificate.

 CLEANING THE TEXT
race 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Non-Hispanic white, both sexes, all ages4 ... 1 2 3 4 5 6 7 8 9 10 ... 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
race 
 (residual) 534,263 Non-Hispanic white, both sexes, 1-4 years ... 1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age 1.8 1.7 1.4 1.0 24.0 0.2 0.2 * * 2.7 5 6 ... 


 CLEANING THE TEXT
race 
 Persons of Hispanic origin may be of any race.

 CLEANING THE TEXT
race 
 Data for Hispanic persons are not tabulated by race; data for non-Hispanic persons are tabulated by race.

 CLEANING THE TEXT
race 
 Data for racial and ethnic groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of misreporting of Hispanic origin and race on the death certificate.

 CLEANING THE TEXT
race 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Non-Hispanic black, female, 85 years and over ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
race 
 (residual) Non-Hispanic American Indian or Alaska Native, both sexes, 10-14 years Non-Hispanic American Indian or Alaska Native, 

race 
 Persons of Hispanic origin may be of any race.

 CLEANING THE TEXT
race 
 Data for Hispanic persons are not tabulated by race; data for non-Hispanic persons are tabulated by race.

 CLEANING THE TEXT
race 
 Data for racial and ethnic groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of misreporting of Hispanic origin and race on the death certificate.

 CLEANING THE TEXT
race 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Non-Hispanic Asian or Pacific Islander, female, 35-44 years ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
race 
 Number2,3 Percent of total Death deaths rate2,3 Non-Hispanic Asian or Pacific Islander, female, 65 years and over Non-Hispanic Asian or


 CLEANING THE TEXT
race 
 Infant, neonatal, and postneonatal deaths, percentage of total deaths, and mortality rates for the 10 leading causes of infant death, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
race 
 Persons of Hispanic origin may be of any race.

 CLEANING THE TEXT
race 
 Data for Hispanic persons are not tabulated separately by race; data for non-Hispanic persons are tabulated by race.

 CLEANING THE TEXT
race 
 Data for racial and ethnic groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of inconsistencies between reporting Hispanic origin and race on birth and death certificates.

 CLEANING THE TEXT
race 
 Deaths are based on race and Hispanic origin of decedent; live births are based on race and Hispanic origin of mother.

 CLEANING THE TEXT
race 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Disease


 CLEANING THE TEXT
ethn 
 racial and ethnic differences .

 CLEANING THE TEXT
ethn 
 Data by race and Hispanic origin This report was redesigned and shows different race and ethnicity categories than those shown in previous reports.

 CLEANING THE TEXT
ethn 
 However, racial or ethnic misclassification should not have a major impact on the cause-of-death rankings for the race and Hispanic-origin groups, or prevent comparisons of relative mortality burden across groups, because there is no reason to expect that racial or ethnic misclassification varies by cause of death.

 CLEANING THE TEXT
ethn 
 Data for racial and ethnic groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of misreporting of Hispanic origin and race on the death certificate.

 CLEANING THE TEXT
ethn 
 The Hispanic mortality advantage and ethnic misclassification on U.S. death certificates.

 CLEANING THE TEXT
ethn 
 Data for racial and ethnic groups other than non-Hi


 CLEANING THE TEXT
female 
 (N40) Inflammatory diseases of female pelvic organs .

 CLEANING THE TEXT
female 
 Differences by sex group, but it was the sixth leading cause for females, accounting for 4.0% of deaths (Figure 1 and Table 1).

 CLEANING THE TEXT
female 
 CLRD ranked fourth for males, accounting for 5.3% of deaths, but it ranked third for females, accounting for 6.2% of deaths.

 CLEANING THE TEXT
female 
 Stroke ranked fifth for males but fourth for females.

 CLEANING THE TEXT
female 
 Females had a higher relative burden of mortality from stroke, which accounted for 6.1% of all deaths to females but 4.2% of all deaths to males.

 CLEANING THE TEXT
female 
 Diabetes ranked sixth for males (3.1% of deaths) but seventh for females (2.7% of deaths).

 CLEANING THE TEXT
female 
 Alzheimers disease ranked eighth for males, accounting for 2.5% of deaths, but it ranked fifth for females, accounting for 5.7% of deaths.

 CLEANING THE TEXT
female 
 Influenza and pneumonia ranked 

female 
 (residual) Neonates, all races and origins, female ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
female 
 (residual) 4,477 100.0 760.0 1,464 32.7 248.5 613 13.7 104.1 455 10.2 77.2 266 202 162 113 96 5.9 4.5 3.6 2.5 2.1 45.2 34.3 27.5 19.2 16.3 90 78 938 2.0 1.7 21.0 15.3 13.2 159.2 2,442 100.0 815.9 817 33.5 273.0 320 13.1 106.9 266 10.9 88.9 156 102 93 58 54 44 6.4 4.2 3.8 2.4 2.2 1.8 52.1 34.1 31.1 19.4 18.0 14.7 42 490 1.7 20.1 14.0 163.7 2,035 100.0 702.4 647 31.8 223.3 293 14.4 101.1 189 9.3 65.2 110 100 69 55 5.4 4.9 3.4 2.7 38.0 34.5 23.8 19.0 48 42 34 448 2.4 2.1 1.7 22.0 16.6 14.5 11.7 154.6 Neonates, non-Hispanic black, male ... 1 2 3 4 5 6 7 8 9 10 ... Neonates, non-Hispanic white, female ... 1 Percent of total Mortality Number2 deaths rate2 Neonates, non-Hispanic black, both sexes 6,722 Neonates, non-Hispanic white, male ... 1 Cause of death (based on ICD-10), age, race and Hispanic origin, and sex All causes .

 CLEANING THE TEXT
female 
 (residua

 caus 
 Additionally, some of the 10 leading causes of death were unique to either population in 2015.

 CLEANING THE TEXT
 caus 
 Suicide ranked seventh for males (2.5% of deaths), but was not ranked among the 10 leading causes for females.

 CLEANING THE TEXT
 caus 
 Chronic liver disease and cirrhosis was the 10th leading cause for males (1.9% of deaths), but it was not in the top 10 for females.

 CLEANING THE TEXT
 caus 
 Kidney disease ranked 9th (1.8% of deaths) and Septicemia ranked 10th (1.6% of deaths) for females, but neither was among the 10 leading causes of death for males.

 CLEANING THE TEXT
 caus 
 The rank order of the 10 leading causes of death for males and females remained unchanged from 2014 to 2015 (12).

 CLEANING THE TEXT
 caus 
 Differences by age In 2015, the leading cause of death varied by age group (Figure 2).

 CLEANING THE TEXT
 caus 
 The leading cause of death for the population aged 1-44 was unintentional injuries.

 CLEANING THE TEXT
 caus 
 The rela


 CLEANING THE TEXT
 caus 
 Analytical potential for multiple cause-of-death data.

 CLEANING THE TEXT
 caus 
 Mortality multiple causeof-death public use record, 2003 [provisional] .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015 .

 CLEANING THE TEXT
 caus 
 Infant, neonatal, and postneonatal deaths, percentage of total deaths, and mortality rates for the 10 leading causes of infant death, by race and Hispanic origin and sex: United States, 2015 .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and sex: United States, 2015 I-2.

 CLEANING THE TEXT
 caus 
 Infant, neonatal, and postneonatal deaths, percentage of total deaths, and mortality rates for the 10 leading causes of infant death, by race and sex: United States, 2015 17 62 17 National

 caus 
 (A40-A41) 21,388 All other causes .

 CLEANING THE TEXT
 caus 
 All causes .

 CLEANING THE TEXT
 caus 
 (G20-G21) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) All races and origins, female, all ages4 All races and origins, male, 75-84 years ... 1 2 3 4 5 6 7 Number2,3 Percent of total Death deaths rate2,3 All races and origins, male, 85 years and over All races and origins, male, 65-74 years ... 1 2 3 4 5 6 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age All causes .

 CLEANING THE TEXT
 caus 
 (P00-P96) All other causes .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
 caus 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on IC


 CLEANING THE TEXT
 caus 
 (N00-N07,N17-N19,N25-N27) All other causes .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
 caus 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Non-Hispanic white, male, 65 years and over ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (G20-G21) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (A40-A41) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 763,873 204,890 180,159 53,339 36,742 28,269 22,312 20,953 17,993 15,660 14,393 169,163 100.0 4,573

 caus 
 (D50-D64) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 100.0 24.9 86 22.3 5.5 60 34 33 15.5 8.8 8.5 3.9 2.2 2.1 33 17 8.5 4.4 2.1 * 5 6 15 8 6 5 89 3.9 2.1 1.6 1.3 23.1 * * * * 5.7 7 8 9 10 ... 1,795 100.0 110.1 901 50.2 55.3 ... 1 2 ... 1 406 22.6 24.9 3 151 55 50 8.4 3.1 2.8 9.3 3.4 3.1 4 5 23 18 13 9 8 161 1.3 1.0 0.7 0.5 0.4 9.0 1.4 * * * * 9.9 2 3 4 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (Y35,Y89.0) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 6 7 8 9 10 ... 3,691 100.0 203.3 1,843 49.9 101.5 7,229 100.0 243.1 2,592 35.9 87.2 1,602 596 22.2 8.2 53.9 20.0 491 268 6.8 3.7 16.5 9.0 220 131 76 68 67 1,118 3.0 1.8 1.1 0.9 0.9 15.5 7.4 4.4 2.6 2.3 2.3 37.6 820 22.2 45.2 301 139 88 8.2 3.8 2.4 16.6 7.7 4.8 43 37 27 26 23 344 1.2 1.0 0.7 0.7 0.6 9.3 2.4 2.0 1.5 1.4 1.3 18.9 All causes .

 CLEANING THE TEXT
 caus 
 (I10,I12,I15) All other causes .

 CLEANING THE TEXT
 caus 
 All causes .

 CLEANING THE TEXT
 caus 
 (J09-J18) All ot


 CLEANING THE TEXT
 caus 
 (residual) 2,344 579 304 160 151 129 86 78 100.0 9,792.4 24.7 2,418.8 13.0 1,270.0 6.8 668.4 6.4 630.8 5.5 538.9 3.7 359.3 3.3 325.9 76 3.2 317.5 58 2.5 242.3 50 673 2.1 208.9 28.7 2,811.5 Non-Hispanic American Indian or Alaska Native, male, all ages4 3,498 927 708 299 259 148 100.0 2,000.9 26.5 530.3 20.2 405.0 8.5 171.0 7.4 148.2 4.2 84.7 113 3.2 64.6 112 3.2 64.1 88 76 75 693 2.5 2.2 2.1 19.8 50.3 43.5 42.9 396.4 Non-Hispanic American Indian or Alaska Native, both sexes, 75-84 years ... 1 2 3 4 5 6 7 Number2,3 Percent of total Death deaths rate2,3 Non-Hispanic American Indian or Alaska Native, both sexes, 85 years and over Non-Hispanic American Indian or Alaska Native, both sexes, 65-74 years ... 1 2 3 4 5 6 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age 3,224 748 705 237 206 161 120 85 82 77 58 745 100.0 4,531.4 23.2 1,051.3 21.9 990.9 7.4 333.1 6.4 289.5 5.0 226.3 3.7 168.7 2.6 119.5 2.5 115.3 2.4 108.2 1.8 81.5 23.1 1,047.1 ..


 CLEANING THE TEXT
 caus 
 (J09-J18) All other causes .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
 caus 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Non-Hispanic Asian or Pacific Islander, both sexes, 65 years and over ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (I10,I12,I15) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (A40-A41) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 47,558 11,296 11,145 3,880 2,159 2,048 1,959 1,706 100.0 2,241.2 23.8 532.3 23.4 52


 CLEANING THE TEXT
 caus 
 (J09-J18) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) Hispanic, both sexes, 10-14 years 100.0 21.1 20.3 Hispanic, both sexes, 1-4 years ... 1 Number2,3 Percent of total Death deaths rate2,3 Hispanic, both sexes, 5-9 years Hispanic, both sexes, all ages4 ... 1 2 3 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age 3 4 5 6 7 8 9 9 ... All causes .

 CLEANING THE TEXT
 caus 
 (E10-E14) All other causes .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
 caus 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not included in the International Classification of Diseases, Tenth Revision (ICD-10)] Rank1 Cause of death (based on ICD-10), race and Hispanic origin, sex, and age Number2,3 Percent of total Death deaths rate2,3 Rank1 Hispani


 CLEANING THE TEXT
 caus 
 (residual) 56,403 13,538 10,574 4,358 4,350 2,835 2,165 1,420 100.0 2,632.5 24.0 631.9 18.7 493.5 7.7 203.4 7.7 203.0 5.0 132.3 3.8 101.0 2.5 66.3 1,365 2.4 63.7 1,109 2.0 51.8 1,070 13,619 1.9 24.1 49.9 635.6 Hispanic, female, 65-74 years ... 1 2 3 4 5 6 7 8 9 10 ... All causes .

 CLEANING THE TEXT
 caus 
 (A40-A41) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 13,022 4,105 2,441 936 671 100.0 1,048.7 31.5 330.6 18.7 196.6 7.2 75.4 5.2 54.0 448 403 3.4 3.1 36.1 32.5 386 3.0 31.1 283 250 244 2,855 2.2 1.9 1.9 21.9 22.8 20.1 19.6 229.9 Hispanic, female, 75-84 years ... All causes .

 CLEANING THE TEXT
 caus 
 (K70,K73-K74) All other causes .

 CLEANING THE TEXT
 caus 
 Deaths, percentage of total deaths, and death rates for the 10 leading causes of death in selected age groups, by race and Hispanic origin and sex: United States, 2015-Con.

 CLEANING THE TEXT
 caus 
 An asterisk (*) preceding a cause-of-death code indicates that the code is not in


 CLEANING THE TEXT
 caus 
 (P20-P21) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 359 100.0 262.0 84 23.4 61.3 2 73 20.3 53.3 3 36 10.0 26.3 4 20 17 13 5.6 4.7 3.6 14.6 * * 8 8 8 2.2 2.2 2.2 * * * 5 6 7 8 9 10 4 1.1 * 4 4 1.1 1.1 * * ... 1 4 76 1.1 21.2 * 55.5 2 ... All causes .

 CLEANING THE TEXT
 caus 
 (P20-P21) All other causes .

 CLEANING THE TEXT
 caus 
 (residual) 3,443 100.0 372.6 904 26.3 97.8 811 23.6 87.8 338 9.8 36.6 190 123 97 86 74 58 5.5 3.6 2.8 2.5 2.1 1.7 20.6 13.3 10.5 9.3 8.0 6.3 55 707 1.6 20.5 6.0 76.5 1,893 100.0 401.5 479 25.3 101.6 455 24.0 96.5 207 10.9 43.9 102 68 53 50 41 33 5.4 3.6 2.8 2.6 2.2 1.7 21.6 14.4 11.2 10.6 8.7 7.0 24 381 1.3 20.1 5.1 80.8 1,550 100.0 342.5 425 27.4 93.9 356 23.0 78.7 131 8.5 28.9 88 55 44 36 33 5.7 3.5 2.8 2.3 2.1 19.4 12.2 9.7 8.0 7.3 31 2.0 6.8 27 324 1.7 20.9 6.0 71.6 Neonates, Hispanic, male 3 4 276 100.0 214.2 63 22.8 48.9 55 19.9 42.7 46 16.7 35.7 19 14 9 6 6.9 5.1 3.3 2.2 * * * * 6 5 2.2 1.8 * * 4 1.4 * 3 4 


 CLEANING THE TEXT

 CLEANING THE TEXT
male 
 Life expectancy decreased from 2014 to 2015 for non-Hispanic white males (0.2 year), non-Hispanic white females (0.1), non-Hispanic black males (0.4), non-Hispanic black females (0.1), Hispanic males (0.1), and Hispanic females (0.2).

 CLEANING THE TEXT
male 
 Life expectancy for females was 4.9 years higher than for males.

 CLEANING THE TEXT
male 
 In 2015 compared with 2014, life expectancy decreased for non-Hispanic white males (0.2 year), non-Hispanic white females (0.1), non-Hispanic black males (0.4), nonHispanic black females (0.1), Hispanic males (0.1), and Hispanic females (0.2).

 CLEANING THE TEXT
male 
 From 2014 to 2015, age-adjusted death rates increased for non-Hispanic white males (1.0%), non-Hispanic white females (1.6%), and non-Hispanic black males (0.9%) (Tables A and 1).

 CLEANING THE TEXT
male 
 Observed changes in age-adjusted rates for non-Hispanic black female, Hispanic male, and Hispanic female populations were


 CLEANING THE TEXT
male 
 The rate for alcohol-induced causes increased 5.4% for males and 8.7% for females in 2015 from 2014.

 CLEANING THE TEXT
male 
 The age-adjusted death rate for non-Hispanic white males was 34.0% higher than for non-Hispanic black males and 18.3% lower than for Hispanic males.

 CLEANING THE TEXT
male 
 The rate for non-Hispanic white females was 43.6% higher than for non-Hispanic black females and 60.0% higher than for Hispanic females.

 CLEANING THE TEXT
male 
 Among the major race-ethnicity-sex groups, the ageadjusted rate for alcohol-induced death increased significantly in 2015 from 2014 for non-Hispanic white males (6.2%), nonHispanic white females (9.8%), and non-Hispanic black females (14.7%).

 CLEANING THE TEXT
male 
 The rates for non-Hispanic black males, Hispanic males, and Hispanic females did not change significantly.

 CLEANING THE TEXT
male 
 For males in 2015, the age-adjusted death rate for injury by firearms was 6.1 times the rate for fema


 CLEANING THE TEXT
male 
 That is, taking into account random variability, non-Hispanic API females aged 1-4 have a death rate significantly lower than that for non-Hispanic AIAN females of the same age.

 CLEANING THE TEXT
race 
 National Vital Statistics Reports Volume 66, Number 6 November 27, 2017 Deaths: Final Data for 2015 by Sherry L. Murphy, B.S., Jiaquan Xu, M.D., Kenneth D. Kochanek, M.A., Sally C. Curtin, M.A., and Elizabeth Arias, Ph.D., Division of Vital Statistics Abstract Highlights Objectives-This report presents final 2015 data on U.S. deaths, death rates, life expectancy, infant mortality, and trends, by selected characteristics such as age, sex, Hispanic origin and race, state of residence, and cause of death.

 CLEANING THE TEXT
race 
 These data provide information on mortality patterns among residents of the United States by such variables as age, sex, Hispanic origin and race, state of residence, and cause of death.

 CLEANING THE TEXT
race 
 Differences in deat


 CLEANING THE TEXT
race 
 19 ages 15 and over, by marital status and sex: United States, 2015 Number of deaths, death rates, and age-adjusted death rates for ages 25-64, by educational attainment and sex: Total of 46 reporting states and District of Columbia using 2003 version of U.S. Standard Certificate of Death and total of 2 reporting states using 1989 version of U.S. Standard Certificate of Death, 2015 Percent distribution of deaths by educational attainment: Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, South Carolina, Tennessee, Texas, Utah, Vermont, Virginia, Washington, Wisconsin, and Wyoming, 2002 and 2015 Number of deaths, death rates, and age-

race 
 1 Includes races and origins not shown separately.

 CLEANING THE TEXT
race 
 2 Multiple-race data reported according to 1997 OMB standards were bridged to the single-race categories of 1977 OMB standards.

 CLEANING THE TEXT
race 
 For more information on areas reporting multiple race, see Technical Notes.

 CLEANING THE TEXT
race 
 5 Life expectancies by Hispanic origin were revised using updated adjustment factors to correct for race and Hispanic-origin misclassification.

 CLEANING THE TEXT
race 
 Number of deaths from selected causes, by race and Hispanic origin, and sex: United States, 2015 [Includes selected causes of deaths; therefore, subcategories may not add to totals; see Technical Notes.

 CLEANING THE TEXT
race 
 Data for specified race or Hispanic-origin groups other than non-Hispanic white and non-Hispanic black should be interpreted with caution because of inconsistencies in reporting these items on death certificates and surveys, although misclassification is v


 CLEANING THE TEXT
race 
 In contrast, race and Hispanic origin in the census or the U.S. Census Bureaus American Community Survey (ACS) is obtained while the person is alive; in these cases, race and ethnicity is self-reported or reported by another member of the household familiar with the person and, therefore, may be considered more valid.

 CLEANING THE TEXT
race 
 A high level of agreement between the death certificate and the census or survey report is essential to assure unbiased death rates by race and ethnicity.

 CLEANING THE TEXT
race 
 Year that state started reporting multiple race, and year that state began using revised standard certificate of death: Each state, 2003-2015 Area Year1 state began reporting multiple race Alabama .

 CLEANING THE TEXT
race 
 Year1 state began reporting multiple race 2003 2005 2008 9 2004 2004 2006 2003 2014 2008 2007 2004 2006 2012 2006 2005 2004 2012 2006 2005 4 2008 11 2014 2004 ... 2003 2004 Year state began using the 2003 standard cert


 CLEANING THE TEXT
ethn 
 Differences in death rates among various demographic subpopulations, including race and ethnicity groups, may reflect subpopulation differences in factors such as socioeconomic status, access to medical care, and the prevalence of specific risk factors in a particular subpopulation.

 CLEANING THE TEXT
ethn 
 Death rates by race and Hispanic origin In 2015, age-adjusted death rates for the major race and ethnicity groups (Table 1) were: .

 CLEANING THE TEXT
ethn 
 Race and ethnicity-For the total non-Hispanic white population in 2015 compared with 2014, age-specific death rates increased significantly for age groups 5-14, 15-24, 25-34, 35-44, 55-64, 65-74, and 85 and over (Tables A and 2).

 CLEANING THE TEXT
ethn 
 Age-adjusted death rates, by race and Hispanic origin: United States, 2000-2015 Other observed changes from 2014 to 2015 in age-specific rates by race and ethnicity and sex were not statistically significant.

 CLEANING THE TEXT
ethn 
 Death rate


 CLEANING THE TEXT
gun 
 1 Indicates year in which National Center for Health Statistics first received multiple race data from each state, although the state may have begun collecting such data at an earlier date.

 CLEANING THE TEXT
 caus 
 National Vital Statistics Reports Volume 66, Number 6 November 27, 2017 Deaths: Final Data for 2015 by Sherry L. Murphy, B.S., Jiaquan Xu, M.D., Kenneth D. Kochanek, M.A., Sally C. Curtin, M.A., and Elizabeth Arias, Ph.D., Division of Vital Statistics Abstract Highlights Objectives-This report presents final 2015 data on U.S. deaths, death rates, life expectancy, infant mortality, and trends, by selected characteristics such as age, sex, Hispanic origin and race, state of residence, and cause of death.

 CLEANING THE TEXT
 caus 
 The 15 leading causes of death in 2015 remained the same as in 2014.

 CLEANING THE TEXT
 caus 
 The 15 leading causes of death in 2015 were: 1.

 CLEANING THE TEXT
 caus 
 The 10 leading causes of infant death were: 1.


 caus 
 Comparability of cause-of-death between ICD revisions.

 CLEANING THE TEXT
 caus 
 Comparability of cause of death between ICD-9 and ICD-10: Preliminary estimates.

 CLEANING THE TEXT
 caus 
 Instructions for classifying the underlying cause of death.

 CLEANING THE TEXT
 caus 
 Instructions for classifying the multiple causes of death.

 CLEANING THE TEXT
 caus 
 ICD-10 ACME decision tables for classifying underlying causes of death.

 CLEANING THE TEXT
 caus 
 TRANSAX: The NCHS system for producing multiple cause-of-death statistics, 1968-78.

 CLEANING THE TEXT
 caus 
 Analytical potential for multiple cause-of-death data.

 CLEANING THE TEXT
 caus 
 ICD-10 cause-of-death lists for tabulating mortality statistics (updated March 2011 to include WHO updates to ICD-10 for data year 2011).

 CLEANING THE TEXT
 caus 
 ICD-10 cause-of-death querying, 2013.

 CLEANING THE TEXT
 caus 
 Trends in maternal mortality by sociodemographic characteristics and cause of death in 27 states a


 CLEANING THE TEXT
 caus 
 Codes in parentheses after causes of death are categories of the International Classification of Diseases, Tenth Revision (ICD-10).

 CLEANING THE TEXT
 caus 
 The asterisks (*) preceding cause-of-death codes indicate they are not part of ICD-10; see Technical Notes] Accidental poisoning and exposure to noxious substances (X40-X49) Motor vehicle accidents3 Area Number Rate 328 12 9 2 1 9.4 * * * * Puerto Rico .

 CLEANING THE TEXT
 caus 
 Number of infant deaths and infant mortality rates for selected causes, by race and Hispanic origin: United States, 2015 [Rates are infant deaths (under 1 year) per 100,000 live births in specified group.

 CLEANING THE TEXT
 caus 
 Race and Hispanic-origin categories are consistent with 1977 Office of Management and Budget (OMB) standards] Number1 Rate Total2 NonHispanic white3 NonHispanic black3 Hispanic Total2 NonHispanic white3 All causes .

 CLEANING THE TEXT
 caus 
 Number of infant deaths and infant mortality rates f

Unnamed: 0,1,Article,Keyword,Sentence
0,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,Although mass killers understandably seize our...
1,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,What they all have in common is their gender (...
2,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"Some of them kill, but others lash out in othe..."
3,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,They turn into what German writer Hans Enzensb...
4,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"Jihadis, of course, are the object lesson in t..."
5,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"It narcissism among seems unarguable, however,..."
6,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"Like their white brethren, dangerous black mal..."
7,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"Likewise, the media and the public, for a vari..."
8,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,"Very few Americans serve in the military, yet ..."
9,,test_pdfs/The Revenge Of The Lost Boys.pdf,male,A Failure to Mature Out of Social Confusion In...


# Save to File

I included a copy of the output to the file just to give you an idea of what it should come out looking like. 

In [6]:
# Save the file

df.to_csv('KeywordExtract.csv')