
## Pavel Makarov
## Data Wrangling Projects Part 3:
## Data Extraction from API source
### 2024-02-10

#### This API provides access to historical data from the FBI Crime Data Explorer (CDE) website, covering crime rates and crime types by location. The data can be formatted to rank each state; analyzing how that rank changes over the years then allows me to track changes in the crime ranking for Massachusetts. This data set is generated by obtaining information on property-related crimes from each city police department.


##### Link - https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/docApi

In [1]:
# Import all necessary libraries
import pandas as pd
import requests
import json

In [2]:
# Open the json file containing the API keys
with open('Config.json') as config_file:
    config = json.load(config_file)
    api_key = config['Key']
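
The cell above assumes a `Config.json` file with a `Key` entry sits next to the notebook. As a minimal sketch of a slightly more defensive loader, the helper below reads the same file and falls back to an environment variable when the file or the entry is missing. The function name `load_api_key` and the variable name `CDE_API_KEY` are hypothetical, chosen only for illustration.

```python
import json
import os


def load_api_key(path="Config.json", env_var="CDE_API_KEY"):
    """Return the API key from a JSON config file or, failing that, the environment.

    `env_var` is a hypothetical variable name, not something the API requires.
    """
    if os.path.exists(path):
        with open(path) as config_file:
            config = json.load(config_file)
        key = config.get("Key")
        if key:
            return key
    # Fall back to the environment when the file or the 'Key' entry is missing
    key = os.environ.get(env_var)
    if not key:
        raise KeyError(f"No API key found in {path} or ${env_var}")
    return key
```

Calling `load_api_key()` with no arguments reads `Config.json` the same way the cell above does. Keeping the key outside the notebook also makes it harder to publish it by accident.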
    


In [3]:
# Send a GET request using the URL and API key. Check the response status
api_endpoint = f'https://api.usa.gov/crime/fbi/cde/agency/byStateAbbr/MA?API_KEY={api_key}'
response_citites = requests.get(api_endpoint)
print(response_citites)

<Response [200]>


In [4]:
# Convert the response into json and then into data frame using Pandas
data_citites = response_citites.json()
data_citites = pd.DataFrame(data=data_citites)
data_citites.head()

Unnamed: 0,ori,agency_name,agency_id,state_name,state_abbr,division_name,region_name,region_desc,county_name,agency_type_name,nibrs,nibrs_start_date,latitude,longitude
0,MA0010100,Barnstable Police Department,7415,Massachusetts,MA,New England,Northeast,Region I,BARNSTABLE,City,True,2000-10-01T00:00:00.000Z,41.798819,-70.211083
1,MA0010200,Bourne Police Department,7416,Massachusetts,MA,New England,Northeast,Region I,BARNSTABLE,City,True,1995-01-01T00:00:00.000Z,41.798819,-70.211083
2,MA0010300,Brewster Police Department,7417,Massachusetts,MA,New England,Northeast,Region I,BARNSTABLE,City,True,1999-01-01T00:00:00.000Z,41.744106,-70.08096
3,MA0010400,Chatham Police Department,7418,Massachusetts,MA,New England,Northeast,Region I,BARNSTABLE,City,True,2003-01-01T00:00:00.000Z,41.685635,-69.96251
4,MA0010500,Dennis Police Department,7419,Massachusetts,MA,New England,Northeast,Region I,BARNSTABLE,City,True,1998-01-01T00:00:00.000Z,41.703323,-70.155136


### Step 1 - Drop unnecessary columns. Drop the columns that are not needed to generate the list of police departments and their codes

In [5]:
# Create a cleaned df by dropping unnecessary columns
data_citites_cleaned = data_citites.drop(columns=[
    'agency_id', 'state_name', 'state_abbr', 'division_name', 'region_name', 'region_desc',
    'county_name', 'agency_type_name', 'nibrs', 'nibrs_start_date', 'latitude', 'longitude',
])

In [6]:
# Check the results
data_citites_cleaned.head(10)

Unnamed: 0,ori,agency_name
0,MA0010100,Barnstable Police Department
1,MA0010200,Bourne Police Department
2,MA0010300,Brewster Police Department
3,MA0010400,Chatham Police Department
4,MA0010500,Dennis Police Department
5,MA0010600,Eastham Police Department
6,MA0010700,Falmouth Police Department
7,MA0010800,Harwich Police Department
8,MA0010900,Mashpee Police Department
9,MA0011000,Orleans Police Department


### Step 2 - Create a dictionary of ori codes and corresponding agency names. This dictionary will be used to map the code of each police department to its name for column naming

In [7]:
# Convert the df into a dictionary using ori codes as keys
names_dict = data_citites_cleaned.set_index('ori')['agency_name'].to_dict()
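
To make the shape of this dictionary concrete, here is a small sketch on two rows copied from the agency listing shown earlier. The uniqueness check is an addition for illustration, not part of the original workflow: if an ori code appeared twice, `to_dict()` would keep only the last name for it.

```python
import pandas as pd

# Two rows copied from the agency listing shown earlier
agencies = pd.DataFrame({
    "ori": ["MA0010100", "MA0010200"],
    "agency_name": ["Barnstable Police Department", "Bourne Police Department"],
})

# ori codes should be unique, otherwise later entries silently overwrite earlier ones
assert agencies["ori"].is_unique

# Index by ori code, then turn the agency_name column into a plain dict
names = agencies.set_index("ori")["agency_name"].to_dict()
print(names["MA0010200"])  # Bourne Police Department
```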


### Step 3 - Generate a list of ori codes to iterate through for API calls. The idea is to go through all MA police department codes, make an API call for each, and extract the data for each town

In [8]:
# Generate a list of codes to insert into the URL for iterative API calls
ori = list(data_citites_cleaned['ori'])

In [9]:
# Check a random item from the list
ori[31]

'MA0022300'

### Step 4 - This function contains multiple data frame transformations, including column renaming, column drops, and data frame renaming

In [10]:
'''Define a function that extracts the desired information from the data set by applying a series of modifications,
then returns the formatted df with a different name.
'''

def Crime_Exctractor_v1(df):
    # Map the ori code to the agency name using names_dict from Step 2
    df['Department'] = df['ori'].replace(names_dict)
    name = df['Department'][0]
    # Rename the 'actual' column to '<offense> <department>'
    df.rename(columns={'actual': df['offense'][0] + ' ' + name}, inplace=True)
    # Attach a descriptive name to the data frame
    df.name = f'df_{name}'
    # Drop the columns that are no longer needed
    df.drop(columns=['offense', 'ori', 'Department', 'cleared'], inplace=True)
    return df
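
The function above changes its argument in place and stores the name as an attribute on the data frame. Custom attributes like this may not survive later pandas operations such as copies or merges, so one possible alternative is to return the label next to a new frame. The sketch below is a hypothetical variant, not the notebook's method. It assumes the per-agency response has the columns the function already relies on (`ori`, `data_year`, `offense`, `actual`, `cleared`), and the `cleared` values in the toy input are placeholders.

```python
import pandas as pd


def extract_crime_series(df, names):
    """Return (label, frame); frame keeps only the year and a renamed count column.

    Hypothetical helper; it leaves the input frame untouched.
    """
    ori_code = df["ori"].iloc[0]
    department = names.get(ori_code, ori_code)  # fall back to the raw code
    offense = df["offense"].iloc[0]
    frame = df[["data_year", "actual"]].rename(
        columns={"actual": f"{offense} {department}"}
    )
    return f"df_{department}", frame


# Toy input shaped like one agency's response; the 'cleared' values are placeholders
raw = pd.DataFrame({
    "ori": ["MA0010100", "MA0010100"],
    "data_year": [2010, 2011],
    "offense": ["property-crime", "property-crime"],
    "actual": [1327, 1280],
    "cleared": [0, 0],
})
label, tidy = extract_crime_series(raw, {"MA0010100": "Barnstable Police Department"})
print(label)
print(list(tidy.columns))
```

Because the input is not modified, the same raw frame can be reused, for example to keep the `cleared` counts for a separate analysis.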
    

In [11]:
# Iterate through the list of agencies and apply the data extractor function from above to obtain all data frames for property crimes
# Populate the list of df names and a list of data frames
df_list_name = []
df_list = []
for department in ori:
    api_endpoint = f'https://api.usa.gov/crime/fbi/cde/summarized/agency/{department}/property-crime?from=2010&to=2019&API_KEY={api_key}'
    response = requests.get(api_endpoint)
    try:
        data = response.json()
        if data:  # Check if data is not empty
            data = pd.DataFrame(data=data)
            Crime_Exctractor_v1(data)
            df_list_name.append(data.name)
            df_list.append(data)
            
    except Exception as e:
        print(f"An error occurred for department {department}: {e}")
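
The loop above prints errors as they happen, which makes it hard to tell afterwards which departments were skipped. The following is a rough sketch of the same loop that records failures and builds the query string with `urllib.parse.urlencode`. The names `build_endpoint`, `collect`, `fetch`, and `pause` are hypothetical. `fetch` stands for any callable that takes a URL and returns the parsed JSON, so the sketch can be tried without network access.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://api.usa.gov/crime/fbi/cde/summarized/agency"


def build_endpoint(ori_code, api_key, offense="property-crime", start=2010, end=2019):
    """Build the per-agency URL used in the loop above."""
    query = urlencode({"from": start, "to": end, "API_KEY": api_key})
    return f"{BASE_URL}/{ori_code}/{offense}?{query}"


def collect(ori_codes, api_key, fetch, pause=0.0):
    """Call fetch(url) for every ori code and return (results, failures).

    results maps ori code -> non-empty payload; failures maps ori code -> error text.
    """
    results, failures = {}, {}
    for code in ori_codes:
        try:
            data = fetch(build_endpoint(code, api_key))
        except Exception as exc:
            failures[code] = str(exc)
        else:
            if data:  # skip agencies that returned no rows
                results[code] = data
        time.sleep(pause)  # optional delay between calls
    return results, failures
```

With `requests`, `fetch` could be a small wrapper that calls `requests.get(url, timeout=30)`, then `response.raise_for_status()`, and returns `response.json()`. The timeout value is an arbitrary choice. Responses with an error status would then show up in `failures` instead of passing silently.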

In [12]:
# Check a random data frame from the list

df_list[10]

Unnamed: 0,data_year,property-crime Provincetown Police Department
0,2010,215
1,2011,98
2,2012,117
3,2013,105
4,2014,93
5,2015,76
6,2016,105
7,2017,127
8,2018,96
9,2019,51


### Step 5 - Merge on a common column, in this case the 'data_year' column. Iterate through the list of data frames and merge them together.

In [15]:
# Combine all data frames on common column corresponding to year
combined_df = df_list[0]

# Iterate over the remaining DataFrames in the list and merge with the combined DataFrame
for df in df_list[1:]:
    # Merge on column 'data_year'
    combined_df = pd.merge(combined_df, df, on='data_year', how='outer')

combined_df.head()

Unnamed: 0,data_year,property-crime Barnstable Police Department,property-crime Bourne Police Department,property-crime Brewster Police Department,property-crime Chatham Police Department,property-crime Dennis Police Department,property-crime Eastham Police Department,property-crime Falmouth Police Department,property-crime Harwich Police Department,property-crime Mashpee Police Department,...,property-crime State Police: Franklin County,property-crime State Police: Hampden County,property-crime State Police: Hampshire County,property-crime State Police: Middlesex County,property-crime State Police: Nantucket County,property-crime State Police: Norfolk County,property-crime State Police: Plymouth County,property-crime State Police: Suffolk County,property-crime State Police: Worcester County,property-crime Wampanoag Tribe of Gay Head
0,2010,1327,661,224,173,637,73,1168,271,368,...,1,51,2,7.0,,5,4.0,,0,
1,2011,1280,537,223,126,539,156,1069,283,331,...,0,19,2,3.0,,3,2.0,,2,
2,2012,1269,542,168,170,527,127,1064,249,293,...,0,28,3,0.0,,1,1.0,,1,3.0
3,2013,1190,465,132,123,505,100,899,251,284,...,1,12,0,1.0,,5,,14.0,1,
4,2014,1083,309,115,120,361,88,760,205,237,...,0,5,5,,,0,,,2,
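
Merging the frames one at a time works, but with several hundred departments each `pd.merge` call builds a new, wider frame. As a sketch of one alternative, assuming every frame has a `data_year` column with unique years, the frames can be indexed by year and placed side by side with a single `pd.concat`. The toy frames reuse a few values from the output above.

```python
import pandas as pd

# Toy frames using a few values from the output above
frames = [
    pd.DataFrame({"data_year": [2010, 2011, 2012],
                  "property-crime Barnstable Police Department": [1327, 1280, 1269]}),
    pd.DataFrame({"data_year": [2010, 2011, 2012],
                  "property-crime Bourne Police Department": [661, 537, 542]}),
    pd.DataFrame({"data_year": [2012],
                  "property-crime Wampanoag Tribe of Gay Head": [3]}),
]

# Align every frame on the year, then place them side by side (outer join by default)
combined = (
    pd.concat([f.set_index("data_year") for f in frames], axis=1)
    .sort_index()
    .rename_axis("data_year")
    .reset_index()
)
print(combined.shape)  # (3, 4)
```

Years that a department did not report become `NaN`, which forces that column to a float type. That is also why some columns in the output above display values such as `7.0` rather than `7`.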


In [16]:
# Save the data frame into a csv file
combined_df.to_csv('cleaned_propert_crimes_2010_2019.csv')
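
By default `to_csv` also writes the row index as a first column without a header, so reading the file back yields an extra `Unnamed: 0` column. If that column is not wanted, passing `index=False` avoids it. The sketch below shows the difference on a two-row toy frame, using an in-memory buffer instead of a file.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "data_year": [2010, 2011],
    "property-crime Barnstable Police Department": [1327, 1280],
})

# Round-trip through CSV text, with and without the row index
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns)[0])     # Unnamed: 0
print(list(without_index.columns)[0])  # data_year
```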

### One of the most significant ethical concerns when working with crime report numbers is the potential for stigmatization and bias against the communities from which these reports originate. Publicly presenting property crime reports can damage the reputation of these towns, eroding community trust and potentially discouraging real estate investment in these areas. Moreover, the decision to exclude cleared (closed) cases from this study, made when the 'cleared' column was dropped in Step 4, raises the issue of selective reporting, which can distort the perception of law enforcement agencies. This practice not only risks damaging the reputation of towns with high crime rates but also undermines the recognition of effective crime detection efforts by police departments. Such selective reporting can erode public confidence in the fairness and transparency of crime data analysis and reporting, reinforcing existing biases and misconceptions about crime and safety in these communities. It is essential to address these ethical considerations with transparency, integrity, and a commitment to equitable representation and understanding of crime dynamics within communities.