# This notebook scrapes all the police deaths recorded in the USA from 1791 to the present, for both the Human Unit and the K9 Unit, as registered on https://www.odmp.org/
# The resulting data can then be used for data analytics.
# I chose Jupyter Notebook over VS Code because I can present the program more comfortably here.

# I have written all the necessary comments and tried to make this program as beginner friendly as possible.
# Feel free to go through the program; I hope you can understand it and use it for practice or as a reference.
# The dataset collected in this project is available on my Kaggle and GitHub profiles.
# Tip: to understand the program well, split it into sections, run each one individually, and debug it as you go.
# For any doubts or queries, connect with me on LinkedIn or email me.
# Email :- kolimayuresh450@gmail.com
# Linkedin :- https://www.linkedin.com/in/mayuresh45/
# Github :- https://github.com/MayureshKoli45
# Kaggle :- https://www.kaggle.com/mayureshkoli
# Thank you for your time and effort.

# Importing required libraries.

In [1]:
import requests # For web scraping (sending HTTP requests).
from bs4 import BeautifulSoup # To parse the scraped html so that it can be searched.
import pandas as pd # To make dataframes, manipulate data, and export csv files.
from dateutil import parser # To convert date strings into date objects.
import numpy as np # For some data manipulation (np.nan for missing values).

# HUMAN UNIT DATA ->
# I will gather 13 features (variables / columns) of data:
# 1) Rank -> Rank assigned to or achieved by the officer during their tenure.
# 2) Name -> The name of the person.
# 3) Age -> Age of the person.
# 4) End_Of_Watch -> The date on which the person was declared dead.
# 5) Day_Of_Week -> The day of the week [Sunday, Monday, etc.].
# 6) Cause -> The cause of death.
# 7) Department -> The name of the department where the person worked.
# 8) State -> The state where the department is situated.
# 9) Tour -> The duration of their tenure.
# 10) Badge -> Badge number of the person.
# 11) Weapon -> The weapon with which the officer was killed.
# 12) Offender -> What happened to the offender / killer after the incident [Arrested, Killed, etc.].
# 13) Summary -> Summary of the police officer and of the incident: what happened, how he/she died, etc.

# 

# K9 UNIT DATA ->
# I will gather 14 features (variables / columns) of data:
# 1) Rank -> Rank assigned to or achieved by the K9 during their tenure.
# 2) Name -> The name of the K9.
# 3) Breed -> Breed of the K9.
# 4) Gender -> Gender of the K9.
# 5) Age -> Age of the K9.
# 6) End_Of_Watch -> The date on which the K9 was declared dead.
# 7) Day_Of_Week -> The day of the week [Sunday, Monday, etc.].
# 8) Cause -> The cause of death.
# 9) Department -> The name of the department where the K9 was assigned.
# 10) State -> The state where the department is situated.
# 11) Tour -> The duration of their tenure.
# 12) Weapon -> The weapon with which the K9 was killed.
# 13) Offender -> What happened to the offender / killer after the incident [Arrested, Killed, etc.].
# 14) Summary -> Summary of the K9 and of the incident: what happened, how he/she died, etc.

# 

# Partial data extraction function that works on an individual officer's dedicated page.

In [2]:
def gathering_officer_details(url):
    '''
    Partial data extraction for one officer's dedicated page.
    How the url of every dedicated page is collected is explained in the main loop.

    Takes a url and returns two lists (keys and values) and one soup object.
    It is called "partial" because it extracts only the features in this list
    ['Rank', 'Name', 'End_Of_Watch', 'Day_Of_Week', 'Department', 'State', 'Summary']
    which are common to both the Human Unit and the K9 Unit.
    The remaining features differ between the two units, so they are collected later in the main loop.
    Refer to the Markdown cells above or visit the website to see the differences.

    The returned soup object is used to scrape the remaining data.
    The keys and values lists are merged with a second set of keys and values,
    which results in 13 keys and values for the Human Unit and 14 for the K9 Unit.
    '''
    
    # Making keys list 
    keys = ['Rank', 'Name', 'End_Of_Watch', 'Day_Of_Week', 'Department', 'State', 'Summary']
    
    # Making empty list of values in which we will append data as keys respectively.
    values = []
    
    # Sending a request to the website and fetching the page's html.
    page = requests.get(url)
    
    # Making the soup object, which parses the scraped html so it is ready for extraction.
    soup = BeautifulSoup(page.content, 'html.parser')
    
    # The required data is inside the "div" tag with the class name "officer-short-details".
    officer_short_details = soup.find("div", class_="officer-short-details")
    
    # The "strong" tag contains the Rank and Name.
    name_rank_element = officer_short_details.find("strong") # output -> "Police Officer Christopher Gibson".

    # The "p" tag with the class name "officer-agency" contains the department and state.
    dept_element = officer_short_details.find("p", class_="officer-agency")

    # The "p" tag with the class name "officer-eow" contains the death date.
    eow_element = officer_short_details.find("p", class_="officer-eow")
    
    
    # Cleaning the end of watch date.
    eow_date = eow_element.text # Text extraction. output -> "End of Watch Sunday, January 2, 2022".
    eow_date = eow_date.replace("End of Watch", "") # String replacement. output -> " Sunday, January 2, 2022".
    eow_date = eow_date.strip() # Stripping extra spaces. output -> "Sunday, January 2, 2022".
    eow_date = eow_date.split(",") # Splitting the text on "," which gives ["Sunday", " January 2", " 2022"].

    
    # Extracting day of week of incident.
    eow_day_of_week = eow_date[0] # output -> Sunday, Monday, etc.

    eow_date.pop(0) # After removing the first item the list looks like this: [" January 2", " 2022"].

    new_eow_date = "".join(eow_date) # Rejoining the list into a string. output -> " January 2 2022".
    new_eow_date = new_eow_date.strip() # Stripping extra spaces. output -> "January 2 2022".
    new_eow_date = new_eow_date.replace(" ","-") # Replacing spaces with "-". output -> "January-2-2022".
    
    
    # Conversion of the "January-2-2022" string into a date object.
    new_eow_date = parser.parse(new_eow_date).date()

    
    # Cleaning department and state data
    department = dept_element.text # Text extraction. output -> "Dallas Police Department, Texas".
    state = department.split(",") # Splitting the text into list on "," which gives [Dallas Police Department, Texas].
    state = state[-1].strip() # Extracting state in string format output -> "Texas".
    
    
    # The "div" tag with the class name "col-md-6" contains the summary of the incident and some info about the police officer.
    officer_incident_details = soup.find("div", class_="col-md-6")
    
    
    # The officer's name is currently in the format "Police Officer Christopher Gibson", stored in name_rank_element.
    # We want to separate Rank and Name, like Rank -> "Police Officer" and Name -> "Christopher Gibson".
    # In the officer_incident_details soup the "h3" tag contains only the name of the officer, "Christopher Gibson".
    name_element = officer_incident_details.find("h3") 
    name = name_element.text # Here we extract the name "Christopher Gibson" and store it in the name variable.

    
    # Now we are ready to remove the name from the name_rank_element text.
    rank = name_rank_element.text # output -> "Police Officer Christopher Gibson".
    rank = rank.replace(name, "") # output -> "Police Officer " because the name is replaced with an empty string.
    rank = rank.strip() # Stripping the extra space. output -> "Police Officer".

    
    # The "p" tag in the officer_incident_details soup contains the summary of the incident and some info about the police officer.
    # Here we simply extract the paragraph and do some cleaning.
    incident_summary = officer_incident_details.find("p")
    incident_summary = incident_summary.text # Text extraction.
    incident_summary = incident_summary.replace('\n',"") # Replacing "\n" with "" (an empty string).
    
    
    # Now we append the extracted data to the values list in the same order as the keys.
    values.append(rank)
    values.append(name)
    values.append(new_eow_date)
    values.append(eow_day_of_week)
    values.append(department)
    values.append(state)
    values.append(incident_summary)
    
    
    # Returning keys, values and soup object because we will need it further.
    return keys, values, soup

# 
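# A possible shortcut for the End of Watch cleaning inside gathering_officer_details: the sketch below parses the whole text in one step with the standard library. It assumes the text always has the form "End of Watch Sunday, January 2, 2022" shown in the comments above. parse_end_of_watch is a hypothetical helper name, not part of the original program.

```python
from datetime import date, datetime


def parse_end_of_watch(text):
    """Return (date, day_of_week) from text like 'End of Watch Sunday, January 2, 2022'.

    Assumption: the page always uses the '<Weekday>, <Month> <day>, <year>' layout.
    """
    cleaned = text.replace("End of Watch", "").strip()
    parsed = datetime.strptime(cleaned, "%A, %B %d, %Y")
    return parsed.date(), parsed.strftime("%A")


eow_date, eow_day = parse_end_of_watch("End of Watch Sunday, January 2, 2022")
print(eow_date, eow_day)
```

# strptime raises a ValueError when the text does not match the layout, so a page with an unexpected date format would be noticed instead of being parsed silently.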

# Making Two Dataframes for Human Unit and for K9 Unit.

In [3]:
# Column names Human Unit
human_feature_columns = ['Rank', 'Name', 'End_Of_Watch', 'Day_Of_Week', 'Department',
                         'State', 'Summary','Age', 'Tour', 'Badge', 'Cause', 'Weapon', 'Offender']

# Column names K9 Unit
k9_feature_columns = ['Rank', 'Name', 'End_Of_Watch', 'Day_Of_Week', 'Department',
                      'State', 'Summary', 'Breed', 'Gender', 'Age', 'Tour', 'Cause', 'Weapon', 'Offender']

# Making empty dataframes. On every year iteration we concat the new data to them to populate them.
police_deaths_df = pd.DataFrame(columns=human_feature_columns)

k9_deaths_df = pd.DataFrame(columns=k9_feature_columns)

# 

# Here I have made a list containing every year from 1791 to 2022. We iterate through each year to gather the data for that year.

In [4]:
# Running the loop over the full range takes a long time (approximately 3 to 4 hours).
# For testing, comment out the second years_list and uncomment the first one.

# years_list = [2021,2022] # This list will gather data for year 2021 and 2022.

years_list = list(range(1791, 2023)) # This list will gather all the data which is available on the website.

# 
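# Because the full run takes hours and any single request can fail, one option is to wrap each request in a small retry helper with a pause between attempts. The sketch below is an assumption about how that could look, not what the main loop does: fetch_with_retry, its parameters, and fake_fetch are hypothetical names, and fetch stands in for any callable such as requests.get.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay_seconds=1.0):
    """Call fetch(url), retrying after a pause when it raises an exception."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # In real use, catch requests.RequestException instead.
            last_error = error
            print(f"Attempt {attempt} failed for {url}: {error}")
            if attempt < retries:
                time.sleep(delay_seconds)
    raise last_error


# Stand-in for requests.get that fails once and then succeeds.
calls = {"count": 0}


def fake_fetch(url):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("temporary failure")
    return f"<html>page for {url}</html>"


page = fetch_with_retry(fake_fetch, "https://www.odmp.org/search/year?year=2022", delay_seconds=0)
print(page)
```

# A short pause between requests also keeps the load on the website low during a long run.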

# Our main data collection loop starts from here ->

In [5]:
# try and except blocks are used throughout the program for exception handling, because anything can go wrong.
# For example:
# a page is not found, no links are found, etc.
# Problems may appear, but the program should not stop running.
try:
    for year in years_list:
        # making url
        url = "https://www.odmp.org/search/year?year="+str(year) 
        
        # This statement is for tracking purposes; it tells you which year is being processed.
        print(f"Year: {year} -> url") # output Year: 2022 -> url
        
        
        # Strategy -> First we visit the page that lists the dead police officers and K9s for a year.
        #             It is on a url of this type, for eg: https://www.odmp.org/search/year/2022 .
        #             From this page we can fetch the urls of all the officers' dedicated pages.
        #             Then we extract the data of each officer one by one.
        #             To make sense of it, visit https://www.odmp.org/search/year/2022 .
        
        # Sending the request and scraping the data.
        page = requests.get(url)
        
        
        # Making the soup object so the data is ready for extraction.
        soup = BeautifulSoup(page.content, 'html.parser')

        # The "article" tags with the class name "officer-profile-condensed fixed-width-100" contain the links to all the officers' pages.
        links_soup = soup.find_all("article", class_="officer-profile-condensed fixed-width-100")

        # Making empty list to collect all urls.
        links_list = []

        
        # links_soup holds the urls, and we want to fetch every url available on the specific year page.
        # Suppose there are 10 links on a page.
        # This loop goes through the soup and finds the tags where the urls are set.
        for link_element in links_soup:
            links = link_element.find_all("a") # Each "a" tag holds one url.
        
            # Extracting all the urls from the "a" tags one by one.
            for url_tag in links:
                link_url = url_tag["href"] # The "href" attribute contains the url.
                links_list.append(link_url) # Here we are appending urls to the list one by one.
        
        
        # A single page contains both Human unit and K9 unit data, and we want to separate them.
        # So I have made two more lists, one for Human and one for K9.
        human_url_list = []
        k9_url_list = []

        
        # K9 urls contain the specific text "k9".
        # That text lets us separate them from the Human urls.
        # Here we go through every url that we found on a page.
        # If the text "k9" appears we append the url to k9_url_list, else to human_url_list.
        for url in links_list:
            if "k9" in url: 
                k9_url_list.append(url)
            else:
                human_url_list.append(url)
                
        
        # A page has two links to each officer's dedicated page, one on the image and one on the name, and our program collects both.
        # So in the end we have 2 links for each officer in the list.
        # That's why I use the "set" function to remove duplicate links.
        # First the "list" is converted into a "set" and then back into a "list".
        final_human_links = list(set(human_url_list)) 
        final_k9_links = list(set(k9_url_list))

        
        # This print shows how many urls we found on a page for both the Human unit and the K9 unit.
        # For eg:-
        # 192 - human police urls found in page 2022
        # 23 - k9 police dogs urls found in page 2022
        print(f"{len(final_human_links)} - human police urls found in page {year}")
        print(f"{len(final_k9_links)} - k9 police dogs urls found in page {year}")
        print("\n")
        
        
        # Now here first we will gather data for Human unit.
        # The plan is to convert our keys and values list into a dictionary.
        # And this "human_data" list will be the collection of dictionaries.
        human_data = []
        
        # If no links were found on the page, print the following statement; otherwise collect data from each url.
        if len(final_human_links) == 0:
            print(f"No human deaths are registered for this {year}")
        else:
            try:
                for url in final_human_links: 
                    # Here we use the function that we made.
                    # To know how it works, refer to the function above.
                    keys, values, soup = gathering_officer_details(url)
    
                    
                    # Now we have collected this data:
                    # ['Rank', 'Name', 'End_Of_Watch', 'Day_Of_Week', 'Department', 'State', 'Summary'].
                    # We still need the rest of the data.
                    # Why are we doing these things separately?
                    # Because, as mentioned above, some features differ between the Human unit and the K9 unit.
                    # For eg:-
                    # Humans don't have a breed, dogs do.
                    # Collecting bio and cause details from the officer bio section.
                    officer_bio = soup.find('section', class_="officer-bio")
    
    
                    # Collecting the span tags, because they contain all the other information that we need.
                    bio_elements = officer_bio.find_all('span')
    
    
                    # In this section we collect these features.
                    # For K9 there will be some differences.
                    age = []
                    tour = []
                    badge = []
                    cause = []
                    weapon = []
                    offender = []

                    
                    # All this info is in bio_elements; I am just extracting the text from each html tag,
                    # removing extra spaces and appending it to its dedicated list.
                    # The rest are normal IF ELSE statements.
                    # If you have understood the program so far, you should be able to follow this.
                    # Why is there a continue?
                    # Because some span tags carry labels that we do not collect,
                    # and including them might disrupt the dataframe structure.
                    # So any label other than the ones listed here is skipped.
                    for i in bio_elements:
                        if "officer-age-label" in str(i):
                            age.append(i.text.strip()) # output -> age = ['Age', '31'].
    
                        elif "officer-tour-label" in str(i):
                            tour.append(i.text.strip()) # output -> tour = ['Tour', '4 years'].
        
                        elif "officer-badge-label" in str(i):
                            badge.append(i.text.strip()) # output -> badge = ['Badge', '706'].
        
                        elif "officer-cause-label" in str(i):
                            cause.append(i.text.strip()) # output -> cause = ['Cause', 'Gunfire'].
        
                        elif "officer-weapon-label" in str(i):
                            weapon.append(i.text.strip()) # output -> weapon = ['Weapon', 'Gun; Unknown type'].
        
                        elif "officer-offender-label" in str(i):
                            offender.append(i.text.strip()) # output -> offender = ['Offender', 'Committed suicide'].   
        
                        else:
                            continue
             
            
                    # Now we make a second set of keys and values,
                    # which we will merge with the previous keys and values lists.
                    # It contains the rest of the info that we need.
                    keys_2 = ['Age', 'Tour', 'Badge', 'Cause', 'Weapon', 'Offender']
                    values_2 = []

                    # If a specific piece of info is not available, we fill it with np.nan.
                    # For eg:-
                    # If an officer died of a disease like COVID-19, a heart attack, etc.,
                    # then there will be no offender and weapon sections.
                    # So in that case I decided to append null values.
                    if age == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(age[-1])
    
                    if tour == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(tour[-1])
    
                    if badge == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(badge[-1])    

                    if cause == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(cause[-1])    
    
                    if weapon == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(weapon[-1])  
        
                    if offender == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(offender[-1])  
        
        
                    # Here we are replacing the "Not available" string with np.nan,
                    # because some data looks like this: ['Offender', 'Not available'].
                    for i in range(len(values_2)):
                        if values_2[i] == "Not available":
                            values_2[i] = np.nan
                        else:
                            continue  
            
            
                    # Merging the new keys and values with the previous keys and values.
                    for i in keys_2:
                        keys.append(i)
    
                    for i in values_2:
                        values.append(i)   
        
                    # Making a dictionary from those two lists.
                    main_dict = dict(zip(keys, values))
    
                    # Now I append this dictionary to the data list.
                    human_data.append(main_dict)
        
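            except Exception as error:
                # Any failure while scraping ends this year's human loop; report what went wrong.
                print(f"\nError while scraping human unit pages for {year}: {error}\n")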
             
            
        # Now we repeat for the K9 unit the same steps that we did for the Human unit.
        # Follow the comments above and you are good to go.
        k9_data = []
        
        if len(final_k9_links) == 0:
            print(f"No k9 deaths are registered for this {year}")
        else:
            try:
                for url in final_k9_links:
                    keys, values, soup = gathering_officer_details(url)
    
                    officer_bio = soup.find('section', class_="officer-bio")
    
                    bio_elements = officer_bio.find_all('span')
    
                    breed = []
                    gender = []
                    age = []
                    tour = []
                    cause = []
                    weapon = []
                    offender = []

                    for i in bio_elements:
                        if "officer-breed-label" in str(i):
                            breed.append(i.text.strip())
    
                        elif "officer-gender-label" in str(i):
                            gender.append(i.text.strip())
        
                        elif "officer-age-label" in str(i):
                            age.append(i.text.strip())
        
                        elif "officer-tour-label" in str(i):
                            tour.append(i.text.strip())
        
                        elif "officer-cause-label" in str(i):
                            cause.append(i.text.strip())    
        
                        elif "officer-weapon-label" in str(i):
                            weapon.append(i.text.strip())
        
                        elif "officer-offender-label" in str(i):
                            offender.append(i.text.strip())    
        
                        else:
                            continue
            
                    keys_2 = ['Breed', 'Gender', 'Age', 'Tour', 'Cause', 'Weapon', 'Offender']
                    values_2 = []

                    if breed == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(breed[-1])
    
                    if gender == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(gender[-1])    

                    if age == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(age[-1])
    
                    if tour == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(tour[-1])  

                    if cause == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(cause[-1])    
    
                    if weapon == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(weapon[-1])  
    
                    if offender == []:
                        values_2.append(np.nan)
                    else:
                        values_2.append(offender[-1]) 
        
                    for i in range(len(values_2)):
                        if values_2[i] == "Not available":
                            values_2[i] = np.nan
                        else:
                            continue 
            
                    for i in keys_2:
                        keys.append(i)
    
                    for i in values_2:
                        values.append(i)    
        
                    # Making a dictionary of those two lists. 
                    main_dict = dict(zip(keys, values))    
    
                    # Now i will append this dictionary into a data list.
                    k9_data.append(main_dict) 
        
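            except Exception as error:
                # Any failure while scraping ends this year's K9 loop; report what went wrong.
                print(f"\nError while scraping K9 unit pages for {year}: {error}\n")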
                
         
        
        # If data was found for this year, concat it with the matching dataframe.
        # The two checks are independent, so a year without human records still keeps its K9 records.
        if human_data:
            human_df = pd.DataFrame(human_data)
            police_deaths_df = pd.concat([police_deaths_df, human_df], ignore_index=True)

        if k9_data:
            k9_df = pd.DataFrame(k9_data)
            k9_deaths_df = pd.concat([k9_deaths_df, k9_df], ignore_index=True)

            
except Exception as error:
    print(f"Warning!!! Data gathering stopped early: {error}")

    
# This print statement runs once the data gathering loop has finished (also after a warning above).
print("Data gathering loop ran successfully\n")
    

# This step is needed because some departments have no state.
# They are directly connected to the US Government, are Tribal Police, or worked for multiple departments.
# That's why I needed to collect this state list.
states_list = ['New York', 'South Carolina', 'Pennsylvania', 'North Carolina', 'Kentucky', 'Maryland', 
               'Maine', 'Vermont', 'Tennessee', 'Ohio', 'Virginia', 'Indiana', 'Massachusetts', 'Alabama',
               'Rhode Island', 'Connecticut', 'Missouri', 'Texas', 'Georgia', 'Florida', 'Wisconsin',
               'Arkansas', 'California', 'Louisiana', 'Delaware', 'Utah', 'Illinois', 'New Jersey', 'Washington',
               'Michigan', 'Nevada', 'Colorado', 'Idaho', 'Oregon', 'Arizona', 'Iowa', 'Kansas', 'Nebraska',
               'West Virginia', 'New Mexico', 'District of Columbia', 'Minnesota', 'New Hampshire', 'Wyoming',
               'Montana', 'Mississippi', 'North Dakota', 'South Dakota', 'Oklahoma', 'Hawaii', 'Puerto Rico',
               'Alaska', 'Panama Canal Zone', 'Virgin Islands', 'Guam', 'American Samoa', 'Northern Mariana Islands']


# Now we go through the "State" column and fix those exceptions.
# Any value that is not in states_list is set to "United States"; change this as you want.
# .loc is used because chained assignment like df['State'][i] = ... may not update the dataframe.
# Same for K9.
police_deaths_df.loc[~police_deaths_df['State'].isin(states_list), 'State'] = "United States"

k9_deaths_df.loc[~k9_deaths_df['State'].isin(states_list), 'State'] = "United States"


# Here I am only rearranging the columns.
human_columns_order_list = ['Rank', 'Name', 'Age', 'End_Of_Watch', 'Day_Of_Week', 'Cause', 'Department', 'State', 
                            'Tour', 'Badge', 'Weapon', 'Offender', 'Summary']

k9_columns_order_list = ['Rank', 'Name', 'Breed', 'Gender', 'Age', 'End_Of_Watch', 'Day_Of_Week', 'Cause', 
                         'Department', 'State', 'Tour', 'Weapon', 'Offender', 'Summary']

police_deaths_df = police_deaths_df.reindex(columns=human_columns_order_list)
k9_deaths_df = k9_deaths_df.reindex(columns=k9_columns_order_list)


# Here I am exporting the dataframes to csv files.
police_deaths_df.to_csv('police_deaths_USA.csv',index=False)
k9_deaths_df.to_csv('k9_deaths_USA.csv',index=False)

# Final print statement.
print("All actions are performed successfully")

Year: 2021 -> url
650 - human police urls found in page 2021
21 - k9 police dogs urls found in page 2021


Year: 2022 -> url
193 - human police urls found in page 2022
23 - k9 police dogs urls found in page 2022


Data gathering loop ran successfully

All actions are performed successfully


# 
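# The repeated if/else blocks in the main loop all follow one pattern: keep the last text seen for a label, or use NaN when the label is missing or says "Not available". The sketch below shows that pattern as a single helper working on plain (html, text) pairs. collect_bio_fields and the sample data are hypothetical, and math.nan is used in place of np.nan only to keep the example free of extra imports.

```python
import math


def collect_bio_fields(spans, labels):
    """Map each column name to the last text whose HTML contains its label.

    spans  -> list of (html, text) pairs, e.g. (str(tag), tag.text) for each span tag.
    labels -> dict of column name -> class fragment such as 'officer-age-label'.
    """
    row = {}
    for column, label in labels.items():
        matches = [text.strip() for html, text in spans if label in html]
        value = matches[-1] if matches else math.nan
        row[column] = math.nan if value == "Not available" else value
    return row


# Made-up sample that mimics the span tags handled in the main loop.
sample_spans = [
    ('<span class="officer-age-label">Age</span>', "Age"),
    ('<span class="officer-age-label">31</span>', " 31 "),
    ('<span class="officer-offender-label">Not available</span>', "Not available"),
]
labels = {
    "Age": "officer-age-label",
    "Weapon": "officer-weapon-label",
    "Offender": "officer-offender-label",
}
row = collect_bio_fields(sample_spans, labels)
print(row)
```

# With a helper like this, the Human and K9 sections would differ only in the labels dictionary they pass in.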

# I ran this program for the years 2021 and 2022 only to show what the final dataframes look like.
# I have also run the loop for all the years; the full dataset is on my Kaggle profile under the name "Police deaths in USA from 1791 to 2022", which you can search for in Kaggle Datasets.

# 

# Tip: after scraping or collecting data, always check whether the data was added correctly.
# There are 3 data points in the K9 dataset that are humans. They were added to the K9 dataframe because their rank was K9 Officer, and the program treats any url containing "k9" as a K9 url. To make this correction I am providing another file in which I have handled that exception.

# 
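# One way to avoid this misclassification is to look at the first path segment of the url instead of searching for "k9" anywhere in it. The sketch below assumes K9 profile urls start with /k9/ and human profile urls start with /officer/. That layout and the example urls are assumptions, so check them against the real links before relying on this.

```python
from urllib.parse import urlparse


def is_k9_url(url):
    """Return True when the first path segment of the url is 'k9'."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return bool(segments) and segments[0].lower() == "k9"


# Hypothetical urls: the second one belongs to a human whose rank contains "K9".
urls = [
    "https://www.odmp.org/k9/0000-k9-example",
    "https://www.odmp.org/officer/0000-k9-officer-example-name",
]
k9_urls = [url for url in urls if is_k9_url(url)]
human_urls = [url for url in urls if not is_k9_url(url)]
print(len(k9_urls), len(human_urls))
```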

# Police deaths dataframe sample.

In [6]:
police_deaths_df.sample(10)

Unnamed: 0,Rank,Name,Age,End_Of_Watch,Day_Of_Week,Cause,Department,State,Tour,Badge,Weapon,Offender,Summary
60,Sergeant,Frank Rodriguez,39,2021-09-29,Wednesday,COVID19,"Midwest City Police Department, Oklahoma",Oklahoma,12 years,170.0,,,Sergeant Frank Rodriguez died from complicatio...
24,Lieutenant,Danny James Guynes,57,2021-09-13,Monday,COVID19,"Monroe County Sheriff's Office, Arkansas",Arkansas,26 years,9711.0,,,Lieutenant Danny Guynes died from complication...
825,Police Officer,Freddie Wilson,61,2022-03-10,Thursday,Heart attack,Detroit Public Schools Community District Poli...,Michigan,20 years,68.0,,,Police Officer Freddie Wilson died after colla...
271,Corporal,John Joseph Wojciechowski,57,2021-12-21,Tuesday,COVID19,"Wayne County Sheriff's Office, Michigan",Michigan,25 years,1287.0,,,Corporal John Wojciechowski died from complica...
494,Sergeant,Dominic Eugene Guida,43,2021-11-09,Tuesday,Heart attack,"Bunnell Police Department, Florida",Florida,19 years,5111.0,,,Sergeant Dominic Guida suffered a fatal heart ...
513,Police Officer,Robert Troy Joiner,62,2021-09-05,Sunday,COVID19,Ector County Independent School District Polic...,Texas,44 years,1234.0,,,Police Officer Robert Joiner died from complic...
121,Police Officer,Michelle Beth Gattey,44,2021-09-16,Thursday,COVID19,"Georgetown Police Department, Texas",Texas,9 months,,,,Police Officer Michelle Gattey died from compl...
16,Special Agent,Laura Ann Schwartzenberger,43,2021-02-02,Tuesday,Gunfire,United States Department of Justice - Federal ...,United States,15 years,,Rifle,Deceased,Special Agent Laura Schwartzenberger and Speci...
256,Senior Officer,David Bryant Saavedra,33,2021-09-02,Thursday,COVID19,United States Department of Homeland Security ...,United States,10 years,,,,Senior Officer David Saavedra died from compli...
126,Sergeant,John Richard Burright,61,2021-05-04,Tuesday,Struck by vehicle,"Oregon State Police, Oregon",Oregon,14 years,,,,Sergeant John Burright succumbed to injuries s...


# K9 deaths dataframe sample.

In [7]:
k9_deaths_df.sample(10)

Unnamed: 0,Rank,Name,Breed,Gender,Age,End_Of_Watch,Day_Of_Week,Cause,Department,State,Tour,Weapon,Offender,Summary
37,K9,Drago,Belgian Malinois,Male,5,2022-06-30,Thursday,Gunfire,"Floyd County Sheriff's Office, Kentucky",Kentucky,"4 years, 6 months",Gun; Unknown type,Charged with murder,K9 Drago was shot and killed in Allen by a sub...
1,K9,Max,German Shepherd,Male,3,2021-06-30,Wednesday,Gunfire,"St. Joseph Police Department, Missouri",Missouri,,Gun; Unknown type,Arrested,K9 Max was shot and killed by a suspect while ...
20,K9,Khan,Belgian Malinois,Male,4,2021-05-14,Friday,Heatstroke,"Monroe County Sheriff's Office, Georgia",Georgia,3 years,,,K9 Khan died of heatstroke when he was acciden...
11,K9,Duke,German Shepherd,Male,5,2021-08-30,Monday,Heatstroke,"Virginia State Police, Virginia",Virginia,5 years,,,K9 Duke died after suffering heatstroke while ...
4,K9,Joker,Belgian Malinois,Male,1,2021-04-21,Wednesday,Training accident,"Indian River County Sheriff's Office, Florida",Florida,3 months,,,K9 Joker died when he accidentally choked on a...
19,K9,Kitt,Belgian Malinois,Male,12,2021-06-04,Friday,Gunfire,"Braintree Police Department, Massachusetts",Massachusetts,12 years,Handgun,Shot and killed,K9 Kitt was shot and killed when a domestic vi...
5,K9,Byrd,Labrador Retriever,Male,6,2021-08-03,Tuesday,Heatstroke,Texas Department of Public Safety - Texas High...,Texas,2 years,,,K9 Byrd died of heatstroke when the cooling sy...
36,K9,Exo,Belgian Malinois,Male,2,2022-06-23,Thursday,Gunfire,"Pascagoula Police Department, Mississippi",Mississippi,,Handgun,Shot and wounded,K9 Exo was shot and killed in the Helena area ...
28,K9,Ciro,German Shepherd,Male,5,2022-03-03,Thursday,Fire,"Humphreys County Sheriff's Office, Tennessee",Tennessee,4 years,,,K9 Ciro died after his handler's patrol car ex...
14,K9,Jaeger,German Shepherd,Male,6,2021-05-14,Friday,Fall,"Stephens County Sheriff's Office, Oklahoma",Oklahoma,,,,K9 Jaeger suffered a serious spinal injury whe...


# Police deaths dataframe info.

In [8]:
police_deaths_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 843 entries, 0 to 842
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Rank          843 non-null    object
 1   Name          843 non-null    object
 2   Age           833 non-null    object
 3   End_Of_Watch  843 non-null    object
 4   Day_Of_Week   843 non-null    object
 5   Cause         843 non-null    object
 6   Department    843 non-null    object
 7   State         843 non-null    object
 8   Tour          818 non-null    object
 9   Badge         487 non-null    object
 10  Weapon        183 non-null    object
 11  Offender      174 non-null    object
 12  Summary       843 non-null    object
dtypes: object(13)
memory usage: 85.7+ KB


# K9 deaths dataframe info.

In [9]:
k9_deaths_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Rank          44 non-null     object
 1   Name          44 non-null     object
 2   Breed         44 non-null     object
 3   Gender        44 non-null     object
 4   Age           36 non-null     object
 5   End_Of_Watch  44 non-null     object
 6   Day_Of_Week   44 non-null     object
 7   Cause         44 non-null     object
 8   Department    44 non-null     object
 9   State         44 non-null     object
 10  Tour          33 non-null     object
 11  Weapon        19 non-null     object
 12  Offender      20 non-null     object
 13  Summary       44 non-null     object
dtypes: object(14)
memory usage: 4.9+ KB
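
# Every column above is stored as object, so values such as Tour need converting before any numeric analysis. The sketch below turns the Tour strings seen in the samples ("12 years", "9 months", "4 years, 6 months") into a number of months. tour_to_months is a hypothetical helper, and other formats on the website may need extra handling.

```python
import re


def tour_to_months(tour):
    """Convert text like '4 years, 6 months' to total months, or None if nothing matches."""
    if not isinstance(tour, str):
        return None  # Missing values (NaN) are not strings.
    years = re.search(r"(\d+)\s*year", tour)
    months = re.search(r"(\d+)\s*month", tour)
    if not years and not months:
        return None
    total = 0
    if years:
        total += int(years.group(1)) * 12
    if months:
        total += int(months.group(1))
    return total


for text in ["12 years", "9 months", "4 years, 6 months"]:
    print(text, "->", tour_to_months(text))
```

# With pandas this could be applied to a whole column, for example police_deaths_df['Tour'].map(tour_to_months).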


# If you have followed this program this far, I hope you have learned something.
# THANK YOU FOR YOUR TIME AND ALL THE BEST IN YOUR LIFE.