# Automated News Article Production
###### Sanchari Chowdhuri (sanchari@umd.edu)

## Overview of the code
The goal of this code is to build a template that automatically produces stories that could contribute to ongoing coverage of fatal shootings by police officers in the US.

### Working of the algorithm
The algorithm takes data from the Washington Post Fatal Police Shooting database (Dataset2.csv), performs entry-specific analysis on it, and outputs a written story containing an overview, the specifics of a single row of data, and additional context.
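The flow described above can be sketched end to end on a tiny, made-up sample. This is a minimal illustration rather than the notebook's actual code: the `sample` rows, the `build_story` helper, and the template wording are all hypothetical, and only the column names are taken from the dataset.

```python
import pandas as pd
import jinja2 as jj

# Hypothetical three-row sample using a few of the dataset's column names.
sample = pd.DataFrame({
    "name": ["Person A", "Person B", "Person C"],
    "age": [30, 40, 50],
    "city": ["Austin", "Dallas", "Salem"],
    "state": ["Texas", "Texas", "Oregon"],
})

def build_story(df, row_num):
    """Render one row plus state-level context as a short story."""
    # Specifics of the selected row.
    row = df.iloc[row_num].to_dict()
    # Context derived from every row that shares the same state.
    same_state = df[df["state"] == row["state"]]
    context = {
        "state_total": len(same_state),
        "state_mean_age": "%.1f" % same_state["age"].mean(),
    }
    template = jj.Template(
        "{{ row.name }}, {{ row.age }}, was shot in {{ row.city }}, {{ row.state }}. "
        "{{ ctx.state_total }} such incidents are recorded in {{ row.state }}, "
        "with an average victim age of {{ ctx.state_mean_age }}."
    )
    return template.render(row=row, ctx=context)

story = build_story(sample, 0)
print(story)
```

Passing the row and the derived context as separate template variables keeps row-level facts apart from dataset-level aggregates.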

### Features of the algorithm

- The story produced by this algorithm varies with the input row of data and with the context of the entire dataset. Conditional logic alters the story text based on data values.

- Each row of data produces a variant of the story.

- The story includes an overview of the dataset as well as specifics about the selected row of data.

- Derived or aggregated values from the dataset (e.g. averages, trends, counts) enhance the story.

- Synonym sets (synsets) add variability to the writing.
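Two of the features above, conditional logic and synonym sets, can be illustrated with a short Jinja2 sketch. The `VERBS` list, the `TEMPLATE` wording, and the `describe` helper are assumptions made for this example and do not reproduce the notebook's templates; the `random` filter and the `if`/`elif`/`else` tags are standard Jinja2 features.

```python
import jinja2 as jj

# Hypothetical synonym set and template, not the notebook's exact wording.
VERBS = ["took place", "happened", "was reported"]

TEMPLATE = jj.Template(
    "The incident {{ verbs|random }} in {{ row.city }}. "
    "{% if row.armed == 'unarmed' %}"
    "{{ row.name }} was unarmed."
    "{% elif row.armed in ['undetermined', ''] %}"
    "It is not confirmed whether {{ row.name }} was armed."
    "{% else %}"
    "{{ row.name }} was armed with {{ row.armed }}."
    "{% endif %}"
)

def describe(row):
    """Render one row; the verb phrase is picked at random on each call."""
    return TEMPLATE.render(row=row, verbs=VERBS)

print(describe({"name": "Person A", "city": "Austin", "armed": "unarmed"}))
print(describe({"name": "Person B", "city": "Dallas", "armed": "a knife"}))
```

Because the verb phrase is chosen at render time, two runs on the same row may produce different sentences, which is the intended source of variety.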


### Import pandas and Jinja2 libraries

In [2]:
import pandas as pd
import jinja2 as jj

### Dataset familiarity

In [3]:
df1=pd.read_csv("Dataset2.csv")
df1.head(5)

| | id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | Friday, January 2, 2015 | shot | gun | 53.0 | Male | Asian | Shelton | Washington | True | attack | Not fleeing | False |
| 1 | 4 | Lewis Lee Lembke | Friday, January 2, 2015 | shot | gun | 47.0 | Male | White | Aloha | Oregon | False | attack | Not fleeing | False |
| 2 | 5 | John Paul Quintero | Saturday, January 3, 2015 | shot and Tasered | unarmed | 23.0 | Male | Hispanic | Wichita | Kansas | False | other | Not fleeing | False |
| 3 | 8 | Matthew Hoffman | Sunday, January 4, 2015 | shot | toy weapon | 32.0 | Male | White | San Francisco | California | True | attack | Not fleeing | False |
| 4 | 9 | Michael Rodriguez | Sunday, January 4, 2015 | shot | nail gun | 39.0 | Male | Hispanic | Evans | Colorado | False | attack | Not fleeing | False |
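Beyond looking at the first rows, it can help to list the distinct values of each categorical column, since the templates branch on those values. The sketch below is an assumption about how one might do this, not part of the original notebook; `sample` is a hypothetical stand-in for a few rows of Dataset2.csv and `category_counts` is an illustrative helper.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of Dataset2.csv.
sample = pd.DataFrame({
    "armed": ["gun", "gun", "unarmed", "toy weapon", "nail gun"],
    "flee": ["Not fleeing", "Not fleeing", "Car", "Foot", None],
})

def category_counts(df, column):
    """Count each distinct value, keeping missing values visible."""
    return df[column].value_counts(dropna=False).to_dict()

armed_counts = category_counts(sample, "armed")
flee_counts = category_counts(sample, "flee")
print(armed_counts)
print(flee_counts)
```

Any value that shows up here without a matching branch in a template is a case where the generated sentence could come out incomplete.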


### Main code

In [15]:
#Basic Info
def row_story(row):
    synset1=["shootings","acts of violence","incidents"]
    synset2=["incident took place","incident was reported","shooting happened","incident happened","shooting was reported","shooting took place"]
    template = jj.Template("On {{ variables.date }} a {{variables.age}} years old {{variables.gender}} named {{variables.name}} was fatally {{variables.manner_of_death}} .The {{variables_synset2|random}} in the city of {{variables.city}}  located at {{variables.state}}") # use ".column_name" to access the column of data
    return template.render(variables_synset1=synset1,variables_synset2=synset2 ,variables = row)

#determining race
def race(row):
    template_string =""" {%if variables.race=='Black' or variables.race=='Asian' or variables.race=='White' or variables.race=='Hispanic'or variables.race=='Native American' %} The victim's race was identified as {{variables.race}}{%else%}The victim's race still remains unidentified {%endif%}"""
    template = jj.Template(template_string)
    return template.render(variables=row)

#armed
def armed(row):
    synset3=["armed with","having","possessing"]
    template_string_armed = """ According to local police reports {% if variables.armed == 'undetermined'or variables.armed==''%} its yet not confirmed if {{variables.name}} was armed or not{% elif variables.armed =='unarmed'%} {{variables.name}} was unarmed {%elif variables.armed =='unknown weapon'%} {{variables.name}} was armed with an unknown weapon {%elif variables.armed=='ax' or variables.armed=='oar'%}{{variables.name}} was {{variables_synset3|random}} an {{variables.armed}} {%else%} {{variables.name}} was {{variables_synset3|random}} a {{variables.armed}}{%endif%}"""
    template = jj.Template(template_string_armed)
    return template.render(variables_synset3=synset3,variables=row)

#determines fleeing
def flee(row):
    template_string = """ The victim was reported to be {% if variables.flee == 'Car' or variables.flee=='Foot' %}fleeing {% elif variables.flee == 'Not fleeing' %} not fleeing {% endif %}"""
    template = jj.Template(template_string)
    return template.render(variables=row)

#determines threat
def threat(row):
    template_string = """{% if variables.threat_level == 'attack' or variables.threat_level=='other' %}There was a certain level of threat associated with the victim {% elif variables.threat_level == 'undetermined' %} The level of threat associated with the victim was undetermined {% endif %}"""
    template = jj.Template(template_string)
    return template.render(variables=row)

#determines range of shooting occurrences for the state (uses the global state_name)
def shooting(number_of_shootings):
    if number_of_shootings>=100:
        return"The shooting statistics in %s puts this state among very high range of shooting occurences"%(state_name)
    elif number_of_shootings >= 50:
        return"The number of shootings in %s puts this state among high range of shooting occurences"%(state_name)
    elif number_of_shootings >= 20:
        return"The number of shootings in %s puts this state among medium range of shooting occurences"%(state_name)
    else:
        return"The number of shootings in %s puts this state among low range of shooting occurences"%(state_name)

#information on shootings that occurred on the same day
def same_day_shooting(day_df):
    temp=[]
    for n in range(len(day_df)):
        #look up values by column name rather than by column position
        name=day_df['name'].iloc[n]
        state=day_df['state'].iloc[n]
        query= name+" was shot in " +state
        temp.append(query)
    return(' and '.join(temp))


#percentage of victims based on race in specific state
#(uses the globals state_race_df, number_of_shootings and state_name)
def race_percent(race_name):
    #sentence for each race; %(pct)s and %(state)s are filled in below
    sentences={
        "Black":"Blacks consists of %(pct)s percentage of total shootings in %(state)s",
        "White":"The percentage of victims who are whites is %(pct)s percentage in %(state)s",
        "Asian":" Of all the shootings in %(state)s, the percentage of victims who belong to the asian race is %(pct)s percentage",
        "Hispanic":"The percentage of victims who are hispanics is %(pct)s percentage",
        "Native American":"The percentage of victims who are natives is %(pct)s percentage",
        "Not determined":"The percentage of victims whose race is not determined is %(pct)s percentage",
    }
    if race_name not in sentences:
        #no sentence is available for an unrecognised race value
        return ""
    #.loc replaces the deprecated .ix indexer
    number_of_race=state_race_df.loc[race_name,'name']
    percentage= 100*(float(number_of_race)/float(number_of_shootings))
    #rounding to two decimal places
    percentage=str(float("{0:.2f}".format(percentage)))
    return sentences[race_name]%{"pct":percentage,"state":state_name}



#Reading csv file
df = pd.read_csv("Dataset2.csv")
row_num=int(raw_input("Enter row number ="))
# change this value to test with a different row of data
row_as_dict = df.iloc[row_num].to_dict()
print row_as_dict, "\n"

##state specific calculations
#gives state name
state_name= str(df.loc[row_num,'state'])
#creating dataframe based on state
new_df=df.loc[df['state'] == state_name]
#mean of age of particular state.
state_age_mean=new_df["age"].mean()
state_age_mean="%.1f" %state_age_mean
#total number of shootings in the state
number_of_shootings=len(new_df)

#percentage of males among victims
state_gender_df=new_df.groupby(['gender']).count()
number_of_male=state_gender_df.loc['Male','name']
percentage_of_male= 100*(float(number_of_male)/float(number_of_shootings))
percentage_of_male=str("%.1f" %percentage_of_male)


#Race 
race_name= str(df.loc[row_num,'race'])
state_race_df=new_df.groupby(['race']).count()

#gives day name
day= str(df.loc[row_num,'date'])
#creating dataframe based on day
day_df=df.loc[df['date'] == day]
incidents_sameday=len(day_df)

#number of incidents in same city
import random
c= str(df.loc[row_num,'city'])
#creating dataframe based on city
c_df= df.loc[df['city'] == c]
incidents_samecity=len(c_df)
#randomising
city1 = ['Till now the city of ', 'The city of']
inci=['incidents','acts of violence','shootings']
see=['has seen','has witnessed','is tormented by','is shocked by']


#Bodycam calculation
bc= str(df.loc[row_num,'body_camera'])
#number of incidents in which no body camera was worn
false_bc=(df['body_camera']==False).sum()
total_incidents=str(len(df))
percent_bodycam=100*(float(false_bc)/len(df))
percent_bodycam= str("%.1f" %percent_bodycam)

#randomising
word1=['The absence of','Lack of']
word2=['ascertain','determine']
word3=['credibility','proof']
word4=['police shootings','killings','acts of violence']
#statements
age_statement="The average age of  these shooting victims in %s is around %s years"%(state_name,state_age_mean)
total_shooting_statement="Since 1st January 2015 a total of %s such incidents have taken place in %s"%(number_of_shootings,state_name)
gender_statement="The percentage of male victims among these %s shootings is %s percent in %s"%(number_of_shootings,percentage_of_male,state_name)
numberofincidents="%s such incidents have taken place on this date all over USA"%(incidents_sameday)
city_incidents= "%s %s %s %s such %s "%(random.choice(city1),c,random.choice(see), incidents_samecity,random.choice(inci))
body_cam="Among all the %s police shootings in which a police officer, in the line of duty, shot and killed a civilian in US %s of the police officers were not wearing a body camera. "%(total_incidents,percent_bodycam)
line="%s body camera makes it very difficult to %s the reason for the shooting and hence doesn't lend %s to the entire situation and adds it as random %s." %(random.choice(word1),random.choice(word2),random.choice(word3),random.choice(word4))




print  row_story(row_as_dict)+"."+ race(row_as_dict)+"."+armed(row_as_dict)+"."+flee(row_as_dict)+"."+threat(row_as_dict)+"."+age_statement+"."+ total_shooting_statement+"."+gender_statement+"."+race_percent(race_name)+"." +numberofincidents+"."+same_day_shooting(day_df)+"."+city_incidents+"."+"States like California, Texas and Florida have very high occurrence of such police shootings"+"."+shooting(number_of_shootings)+"."+body_cam+"."+line

Enter row number =41
{'flee': 'Not fleeing', 'city': 'Fort Worth', 'name': 'Daniel Brumley', 'gender': 'Male', 'age': 27.0, 'body_camera': False, 'manner_of_death': 'shot', 'state': 'Texas', 'race': 'Hispanic', 'signs_of_mental_illness': False, 'date': 'Saturday, January 17, 2015', 'threat_level': 'attack', 'armed': 'knife', 'id': 78} 

On Saturday, January 17, 2015 a 27.0 years old Male named Daniel Brumley was fatally shot .The incident was reported in the city of Fort Worth  located at Texas.  The victim's race was identified as Hispanic. According to local police reports  Daniel Brumley was possessing a knife. The victim was reported to be  not fleeing .There was a certain level of threat associated with the victim .The average age of  these shooting victims in Texas is around 35.5 years.Since 1st January 2015 a total of 180 such incidents have taken place in Texas.The percentage of male victims among these 180 shootings is 96.1 percent in Texas.The percentage of victims who are hi
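Joining the fragments with `+"."+` leaves irregular spacing around periods, such as ` .The` in the output above. One possible way to tidy this is sketched here; `join_sentences` is a hypothetical helper that is not part of the notebook, and it assumes every fragment is a single sentence.

```python
def join_sentences(fragments):
    """Join fragments into a paragraph with single spaces and one period each."""
    cleaned = []
    for fragment in fragments:
        # Collapse runs of whitespace and drop any trailing periods or spaces.
        text = " ".join(fragment.split()).rstrip(". ")
        # Skip fragments that are empty after cleaning.
        if text:
            cleaned.append(text + ".")
    return " ".join(cleaned)

paragraph = join_sentences([
    " The victim was reported to be  not fleeing ",
    "The average age of  these shooting victims in Texas is around 35.5 years",
    "",
])
print(paragraph)
```

Collecting the fragments in a list also makes it easier to reorder or drop sentences than one long concatenation does.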

**Examples of automated text generated for a news article**

**Automated text generated with input row 93**

On Sunday, February 8, 2015 a 33.0 years old Male named John Martin Whittaker was fatally shot .The shooting took place in the city of Anchorage  located at Alaska.  The victim's race was identified as White. According to local police reports  John Martin Whittaker was having a gun. The victim was reported to be fleeing .There was a certain level of threat associated with the victim .The average age of  these shooting victims in Alaska is around 34.6 years.Since 1st January 2015 a total of 11 such incidents have taken place in Alaska.The percentage of male victims among these 11 shootings is 90.9 percent in Alaska.The percentage of victims who are whites is 45.45 percentage in Alaska.5 such incidents have taken place on this date all over USA.John Martin Whittaker was shot in Alaska and Sawyer Flache was shot in Texas and Vincent Cordaro was shot in New York and Joseph Paffen was shot in Florida and Larry Hostetter was shot in Texas.Till now the city of  Anchorage has seen 3 such incidents .States like California, Texas and Florida have very high occurrence of such police shootings.The number of shootings in Alaska puts this state among low range of shooting occurences.Among all the 1954 police shootings in which a police officer, in the line of duty, shot and killed a civilian in US 89.2 of the police officers were not wearing a body camera. .Lack of body camera makes it very difficult to ascertain the reason for the shooting and hence doesn't lend proof to the entire situation and adds it as random killings.

**Automated text generated with input row 78**

On Monday, February 2, 2015 a 42.0 years old Male named Francis Murphy Rose III was fatally shot .The shooting took place in the city of Apple Valley  located at California.  The victim's race was identified as White. According to local police reports  Francis Murphy Rose III was possessing a gun. The victim was reported to be  not fleeing .There was a certain level of threat associated with the victim .The average age of  these shooting victims in California is around 34.8 years.Since 1st January 2015 a total of 327 such incidents have taken place in California.The percentage of male victims among these 327 shootings is 94.8 percent in California.The percentage of victims who are whites is 29.97 percentage in California.3 such incidents have taken place on this date all over USA.Jacob Haglund was shot in Michigan and David Kassick was shot in Pennsylvania and Francis Murphy Rose III was shot in California.Till now the city of  Apple Valley has seen 1 such shootings .States like California, Texas and Florida have very high occurrence of such police shootings.The shooting statistics in California puts this state among very high range of shooting occurences.Among all the 1954 police shootings in which a police officer, in the line of duty, shot and killed a civilian in US 89.2 of the police officers were not wearing a body camera. .Lack of body camera makes it very difficult to determine the reason for the shooting and hence doesn't lend credibility to the entire situation and adds it as random police shootings.

**Automated text generated with input row 41**

On Saturday, January 17, 2015 a 27.0 years old Male named Daniel Brumley was fatally shot .The incident was reported in the city of Fort Worth  located at Texas.  The victim's race was identified as Hispanic. According to local police reports  Daniel Brumley was possessing a knife. The victim was reported to be  not fleeing .There was a certain level of threat associated with the victim .The average age of  these shooting victims in Texas is around 35.5 years.Since 1st January 2015 a total of 180 such incidents have taken place in Texas.The percentage of male victims among these 180 shootings is 96.1 percent in Texas.The percentage of victims who are hispanics is 28.89 percentage.3 such incidents have taken place on this date all over USA.Terence Walker was shot in Oklahoma and Pablo Meza was shot in California and Daniel Brumley was shot in Texas.The city of Fort Worth has witnessed 7 such shootings .States like California, Texas and Florida have very high occurrence of such police shootings.The shooting statistics in Texas puts this state among very high range of shooting occurences.Among all the 1954 police shootings in which a police officer, in the line of duty, shot and killed a civilian in US 89.2 of the police officers were not wearing a body camera. .The absence of body camera makes it very difficult to determine the reason for the shooting and hence doesn't lend credibility to the entire situation and adds it as random acts of violence.









##### Please visit my GitHub repository for more of my projects in R and Python involving data analysis and machine learning: https://github.com/SanchariChowdhuri