# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife species are native, like eastern grey squirrels, while others are non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals, like pigeons and stray cats, are also not considered urban wildlife.

Using information about requests for animal assistance, relocation, and/or rescue completed by the Urban Park Rangers, we will attempt to get a better understanding of the different species of wildlife that call the Big Apple home.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

In [1]:
# First we'll start by importing packages we'll use

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 1000)

In [2]:
# Then import the data. For now, the csv file should be in the same directory as the notebook.
# Notice that we are importing the date and time info as type 'datetime'

data = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
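
To make the export note and the parse_dates option concrete, here is a minimal sketch on a tiny made-up dataframe. The file name example_export.csv, the temporary folder, and the sample values are assumptions for illustration only; they are not part of the ranger dataset.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the ranger data (values are made up for illustration)
df = pd.DataFrame({
    'Date and Time of initial call': pd.to_datetime(['2019-06-11 16:15', '2019-06-10 13:00']),
    'Borough': ['Bronx', 'Brooklyn'],
})

# Write to a temporary folder so the example does not clutter the notebook directory
filename = os.path.join(tempfile.mkdtemp(), 'example_export.csv')
df.to_csv(filename, index=False)  # index=False leaves the row numbers out of the file

# Without parse_dates the dates come back as plain text; with it they are datetimes
as_text = pd.read_csv(filename)
as_dates = pd.read_csv(filename, parse_dates=['Date and Time of initial call'])

print(as_text.dtypes)
print(as_dates.dtypes)
```

Comparing the two printed dtype listings shows why we pass parse_dates when loading the real file: only the second version lets us do date arithmetic later.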

In [18]:
data.head()

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
0,2019-06-12 09:20:00,2019-06-12 09:20:00,Manhattan,Washingtom Square Park,on Sidewalk accross from the park near 10 Wash...,Red-tailed Hawk,Other,Native,,0.5,...,,Advised/Educated others,1.0,False,False,,,False,False,
1,2019-06-11 16:15:00,2019-06-11 16:20:00,Bronx,Van Cortlandt Park,Adjacent to VC Golf House,Canada Goose,Public,Native,Injured,0.5,...,1-1-1733837211,Unfounded,1.0,False,False,,,False,False,
2,2019-06-10 13:00:00,2019-06-10 13:30:00,Brooklyn,Irving Square Park,Northwest corner of the park,Parrot,Public,Exotic,,1.5,...,,Unfounded,1.0,False,False,,,False,False,
3,2019-06-09 09:30:00,2019-06-09 10:00:00,Brooklyn,Parade Ground,Prospect Park Parade Grounds near Tennis Center,Chicken,Central,Domestic,Healthy,3.0,...,1-1-1730643971,ACC,1.0,False,False,,,False,False,65352
4,2019-06-09 12:50:00,2019-06-09 12:55:00,Staten Island,Silver Lake Park,Bridge,Red-Eared Slider,Employee,Invasive,Injured,2.0,...,1-1-1724490913,ACC,2.0,True,False,,,False,False,65379 65380


## Inspecting and cleaning the data

Now that the data is loaded, we want to get familiar with it. To learn more about what the data looks like we can try any of the following commands:
- data.head() - to look at the first 5 rows
- data.tail() - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info() - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique() - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
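
The commands in this list can be tried on a small made-up dataframe first, where the answers are easy to check by eye. This is only a sketch: the dataframe toy and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, so count() and nunique() have something to show
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Bronx', 'Queens'],
    'Species Status': ['Native', 'Native', 'Domestic', np.nan],
    '# of Animals': [1.0, 2.0, 1.0, 3.0],
})

n_rows, n_cols = toy.shape                      # number of rows and columns
non_null = toy.count()                          # non-null values in each column
n_unique = toy.nunique()                        # unique values in each column (nulls are skipped)
borough_counts = toy['Borough'].value_counts()  # how often each borough appears

print(n_rows, n_cols)
print(non_null)
print(n_unique)
print(borough_counts)
```

Note the difference between the last two: nunique() says how many distinct values a column has, while value_counts() says how many rows hold each of those values.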

In [93]:
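# Let's figure out how much data we have
# (a notebook cell only displays its last expression, so we print the shape)
print(data.shape)  # (number of rows, number of columns)
data.count()       # number of non-null values in each column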

In [13]:
# Let's figure out what kind of data we have
data.info()

In [94]:
# Let's figure out how many unique values each column has
data.nunique()

Date and Time of initial call       903
Date and time of Ranger response    943
Borough                               5
Property                            193
Location                            867
Species Description                 138
Call Source                           6
Species Status                        4
Animal Condition                      4
Duration of Response                 29
Age                                   7
Animal Class                         18
311SR Number                        533
Final Ranger Action                   7
# of Animals                         10
PEP Response                          2
Animal Monitored                      2
Rehabilitator                        10
Hours spent monitoring               14
Police Response                       2
ESU Response                          2
ACC Intake Number                   399
dtype: int64

In [21]:
# It seems like we have some null values, but do we also have repeated values?
# Yes. For example, data["Borough"].value_counts() shows that each borough name appears many times
data["Borough"].value_counts()

Manhattan        475
Brooklyn         218
Queens           137
Staten Island     84
Bronx             68
Name: Borough, dtype: int64

In [7]:
# It seems like all columns have repeated values, which makes sense given the column names.
# For example, there are 982 entries in the Borough column but only 5 unique values.
# This is consistent with our knowledge that there are only 5 boroughs in NYC:
# Bronx, Brooklyn, Manhattan, Queens, Staten Island
# Make sure our assumption is correct

By now, we also have a sense of which columns may have null values. It may or may not be OK for a column to have null values. One way to replace null values with some other value is data.fillna(x), where x is the value we want instead of the null. It doesn't seem like this is necessary in this case.
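
Even though we do not fill nulls in this notebook, a short sketch shows how it would work. The column values and the placeholder 'Unknown' are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up column with two missing values
toy = pd.DataFrame({'Animal Condition': ['Injured', np.nan, 'Healthy', np.nan]})

# Count the nulls in each column before deciding what to do with them
nulls_per_column = toy.isnull().sum()

# fillna returns a new column; 'Unknown' is a hypothetical placeholder value
filled = toy['Animal Condition'].fillna('Unknown')

print(nulls_per_column)
print(filled)
```

Because fillna returns a new object, the original toy dataframe still contains its nulls unless we assign the result back.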

In addition, the data may not be in 'standard' form. For example, the strings 'yes', 'YES', and 'Yes' may all appear as values in the same column. To check whether the data in a column is in 'standard' form, we can use data['column_name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column_name'].replace('yes', 'Yes'), which replaces all 'yes' values with 'Yes'. Note that replace returns a new column, so assign the result back to data['column_name'] to keep the change.
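
Here is a small sketch of the 'yes' / 'YES' / 'Yes' situation on a made-up column. It shows two possible approaches, an explicit replacement dictionary and pandas string methods; neither is claimed to be the right choice for every column.

```python
import pandas as pd

# Made-up answers with inconsistent capitalization and a trailing space
answers = pd.Series(['yes', 'YES', 'Yes', 'no', 'No '])
print(answers.unique())

# Option 1: replace specific values using a dictionary of {old: new}
fixed = answers.replace({'yes': 'Yes', 'YES': 'Yes', 'no': 'No', 'No ': 'No'})

# Option 2: strip surrounding spaces, then capitalize only the first letter
normalized = answers.str.strip().str.capitalize()

print(fixed.unique())
print(normalized.unique())
```

Option 2 is shorter, but it would also change multi-word names such as 'Red-tailed Hawk' in ways we may not want, so the explicit dictionary is often the safer choice for species names.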

In [95]:
# Let's find out what the unique values are for "Property". We will sort them first so we can id misspellings
# hint: use data['Property'].sort_values().unique()
data['Property'].sort_values().unique()

array(['5 East 102nd St', '851 Fairmont Pl', 'Abingdon Square',
       'Alley Pond Park', 'Alley Pond Park Tennis Courts Near Court 16',
       'Allison Pond Park', 'Altamont House', 'Annadale Green',
       'Astoria Park', 'Baisley Pond Park', 'Bartel-Pritchard Square',
       'Battery Park City', 'Bedford Playground', 'Bellevue South Park',
       'Blood Root Valley', 'Bloomingdale Park', 'Blue Heron Park',
       'Bowling Green', 'Bowne Park', 'Bradys Pond Park', 'Bronx Park',
       'Brooklyn Bridge Park', 'Brookville Park', 'Bryant Park',
       'Bushwick Inlet Park', 'Cadman Plaza Park', 'Calvert Vaux Park',
       'Cambria Playground', 'Canarsie Park', 'Captain Tilly Park',
       'Carl Schurz Park', 'Cenral Park', 'Centrail Park',
       'Central  Park', 'Central Park', 'Chelsea Park',
       'Churchill School', 'Clove Lakes Park', 'Coffey Park',
       'College Point Park', 'Colonel Charles Young Playground',
       'Concrete Plant Park', 'Coney Island Beach & Boardwalk',


In [28]:
# Seems like we need to make several corrections
# Here is the first one, you do the others
# (assigning the result back is more reliable than inplace=True on a single column)
data['Property'] = data['Property'].replace('Bushwick Inlet', 'Bushwick Inlet Park')

In [33]:
# check to see corrections took place, there should only be 188 unique values now
data['Property'].sort_values().unique()
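
As one possible way to handle the remaining corrections, the sketch below cleans up the 'Central Park' variants that appear in the sorted output. It works on a small standalone Series named props (a hypothetical stand-in for data['Property']) so it can be run without the csv file.

```python
import pandas as pd

# Variants copied from the sorted output, plus one correct name for comparison
props = pd.Series(['Cenral Park', 'Centrail Park', 'Central  Park',
                   'Central Park', 'Bryant Park'])

# Step 1: remove stray spaces (leading, trailing, and doubled)
cleaned = props.str.strip().str.replace(r'\s+', ' ', regex=True)

# Step 2: map the remaining misspellings to the correct name
corrections = {'Cenral Park': 'Central Park', 'Centrail Park': 'Central Park'}
cleaned = cleaned.replace(corrections)

print(cleaned.sort_values().unique())
```

Fixing whitespace first keeps the corrections dictionary short, because 'Central  Park' (two spaces) no longer needs its own entry.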

In [40]:
# Let's find out what the unique values are for "Species Description". We will sort them first so we can id misspellings
# hint: do something similar to what you did to find out the unique values for "Property"
data['Species Description'].sort_values().unique()

array(['2 black back gulls and 1 pigeon', 'Alligator snapping turtle',
       'American Goldfinch', 'American Oystercatcher', 'American Robin',
       'Banded Rock Pigeon', 'Bat', 'Bearded dragon', 'Big Brown Bat',
       'Bird', 'Black Racer Snake', 'Black Skimmer', 'Black backed gull',
       'Blue Jay', 'Boa Constrictor Snake', 'Brant', 'Brant Goose',
       'Canada Goose', 'Canada goose', 'Cat', 'Chicken',
       'Chinese Silky Chicken', 'Cockatiel', 'Common Snapping Turtle',
       "Cooper's Hawk", 'Coopers Hawk', 'Cormorant', 'Corn Snake',
       'Coyote', 'Deer', 'Dog', 'Dolphin', 'Domestic Dove',
       'Domestic Duck', 'Domestic Rabbit', 'Double crested cormorant',
       'Dove', 'Downy woodpecker', 'Duck', 'Duck (species unknown)',
       'Eastern Gray Squirrel', 'Egret', 'Falcon',
       'Fledgling (possibly Starling)', 'Freshwater Fish and Turtles',
       'Frog', 'Gerbil', 'Goose - White Pygmy', 'Gray squirrel',
       'Green Heron', 'Groundhog', 'Guinea Pigs', 'Guineafowl

In [58]:
# Seems like we need to make several corrections
# Here are the first few, you do the others
# A dictionary of {old: new} handles all replacements in one call
data['Species Description'] = data['Species Description'].replace({
    'Bird/Unspecified Species': 'Bird',
    'RACCOON': 'Raccoon',
    'Racoon': 'Raccoon',
    'raccoon': 'Raccoon',
    'squirrel': 'Squirrel',
    'Squirrells': 'Squirrel',
    'Turtle/ Unspecified species': 'Turtle',
    'Common Snapping Turtle': 'Turtle',
})



In [97]:
# check to see corrections took place, there should only be 110 unique values now
data['Species Description'].sort_values().unique()

array(['2 black back gulls and 1 pigeon', 'Alligator snapping turtle',
       'American Goldfinch', 'American Oystercatcher', 'American Robin',
       'Banded Rock Pigeon', 'Bat', 'Bearded dragon', 'Big Brown Bat',
       'Bird', 'Black Racer Snake', 'Black Skimmer', 'Black backed gull',
       'Blue Jay', 'Boa Constrictor Snake', 'Brant', 'Brant Goose',
       'Canada Goose', 'Canada goose', 'Cat', 'Chicken',
       'Chinese Silky Chicken', 'Cockatiel', "Cooper's Hawk",
       'Coopers Hawk', 'Cormorant', 'Corn Snake', 'Coyote', 'Deer', 'Dog',
       'Dolphin', 'Domestic Dove', 'Domestic Duck', 'Domestic Rabbit',
       'Double crested cormorant', 'Dove', 'Downy woodpecker', 'Duck',
       'Duck (species unknown)', 'Eastern Gray Squirrel', 'Egret',
       'Falcon', 'Fledgling (possibly Starling)',
       'Freshwater Fish and Turtles', 'Frog', 'Gerbil',
       'Goose - White Pygmy', 'Gray squirrel', 'Green Heron', 'Groundhog',
       'Guinea Pigs', 'Guineafowl', 'Gull', 'Harbor Porpois

In [99]:
# Finally, let's find out what the unique values are for "Species Status".
data["Species Status"].nunique()
data["Species Status"].unique()

array(['Native', 'Exotic', 'Domestic', 'Invasive', nan], dtype=object)

At this point we could find answers to some basic questions we may have such as:
- NYC is home to what species of animals?
- NYC is home to what types of animals (species status)?
- Where in NYC could we find animals?

## Exploring the data

But what if we wanted to answer more complex questions such as,
- are there more native or invasive animals living in Manhattan?
- do all types ('species status') of animals live in all 5 boroughs of NYC?
- do different species of animals live in different properties?

To answer more complex questions we will try filtering and grouping the data. The decisions that we make when doing this can be based on our knowledge of the topic, our curiosity to learn from the data, as well as informed by what we learn from the data (or all three!).

### Filtering data : are there more native or invasive animals living in Manhattan?

To filter data, the following commands are useful:

- data[col] - to work with only one column
- data[data.col == 'value'] - to extract rows that meet a particular criterion
- data[(data[col] > value1) & (data[col] < value2)] - to extract rows that meet more than one criterion
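
The filtering patterns in this list can be sketched on a small made-up dataframe. The values below are invented for illustration, and isin is shown as one possible alternative to chaining several == comparisons with |.

```python
import pandas as pd

# Made-up data using column names from the ranger dataset
toy = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Bronx', 'Manhattan'],
    'Species Status': ['Native', 'Invasive', 'Native', 'Domestic'],
    'Duration of Response': [0.5, 2.0, 1.0, 3.0],
})

# One column only
one_column = toy['Borough']

# Rows that meet a single criterion
manhattan_only = toy[toy['Borough'] == 'Manhattan']

# Rows that meet two criteria: & means "and", | means "or",
# and each condition needs its own parentheses
mid_length = toy[(toy['Duration of Response'] > 0.5) & (toy['Duration of Response'] < 3.0)]

# Rows whose value is any one of several options
native_or_invasive = toy[toy['Species Status'].isin(['Native', 'Invasive'])]

print(manhattan_only)
print(mid_length)
print(native_or_invasive)
```

Bracket notation such as toy['Duration of Response'] is needed here because column names with spaces cannot be written in the toy.col style.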

In [78]:
# We'll create a new dataframe with only the data for Manhattan
manhattan = data[data.Borough == 'Manhattan']
manhattan.shape
#manhattan.head()


(475, 22)

In [84]:
# What kinds of species are in Manhattan?
manhattan["Species Status"].unique()

array(['Native', 'Invasive', nan, 'Domestic', 'Exotic'], dtype=object)

In [17]:
# We only want to work with the species in Manhattan that are Native or Invasive
man_nat_inv = manhattan[(manhattan['Species Status'] == 'Native') | (manhattan['Species Status'] == 'Invasive')]
man_nat_inv.shape

(444, 22)

value_counts() shows us the frequency of each value in a column.

In [18]:
# To answer our question we want to know how many of these records are native and how many are invasive. What is the answer?
man_nat_inv['Species Status'].value_counts()

Native      441
Invasive      3
Name: Species Status, dtype: int64
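
Raw counts can be turned into proportions with the normalize option of value_counts. The sketch below rebuilds a column with the same counts as the output above (441 Native, 3 Invasive) so that it runs without the csv file; with the real data we would call the same method on man_nat_inv['Species Status'].

```python
import pandas as pd

# Rebuild a column with the counts shown in the output: 441 Native and 3 Invasive
status = pd.Series(['Native'] * 441 + ['Invasive'] * 3, name='Species Status')

counts = status.value_counts()                # number of rows per value
shares = status.value_counts(normalize=True)  # fraction of rows per value

print(counts)
print(shares)
```

Keep in mind that each row is a ranger response, so these shares describe responses involving native or invasive animals, not the number of distinct species.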

### Grouping data : do all types (species status) live in all 5 boroughs?

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - To group the data based on the values in one column
- data.groupby([col1,col2]) - To group the data based on the values in more than one column
- If we want to find out how big each group is, we can use .size() to count the number of rows in each group.
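
Here is a minimal sketch of the grouping commands in this list, run on a small made-up dataframe so the group sizes are easy to verify by hand. The values are assumptions for illustration only.

```python
import pandas as pd

# Made-up data using column names from the ranger dataset
toy = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Queens', 'Queens', 'Queens'],
    'Species Status': ['Native', 'Domestic', 'Native', 'Native', 'Exotic'],
})

# Work with only some columns (note the double brackets)
two_columns = toy[['Borough', 'Species Status']]

# Group by one column and count the rows in each group
per_borough = toy.groupby('Borough').size()

# Group by two columns and count the rows in each combination
per_pair = toy.groupby(['Borough', 'Species Status']).size()

print(per_borough)
print(per_pair)
```

Only the combinations that appear in the data show up in per_pair, so a missing combination means there were no rows for it.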

In [66]:
# We'll create a new dataframe with only the data for 'Borough' and 'Species Status'
species_status = data[['Borough','Species Status']]
species_status.shape
species_status.head()

Unnamed: 0,Borough,Species Status
0,Manhattan,Native
1,Bronx,Native
2,Brooklyn,Exotic
3,Brooklyn,Domestic
4,Staten Island,Invasive


In [68]:
q = species_status.groupby('Borough')['Species Status'].value_counts()
q

Borough        Species Status
Bronx          Native             51
               Domestic           10
               Invasive            3
               Exotic              2
Brooklyn       Native            191
               Domestic           15
               Invasive            7
               Exotic              2
Manhattan      Native            441
               Domestic           18
               Exotic              7
               Invasive            3
Queens         Native            111
               Domestic           23
               Exotic              2
               Invasive            1
Staten Island  Native             71
               Domestic            9
               Invasive            1
Name: Species Status, dtype: int64

In [70]:
# Try switching 'Borough' and 'Species Status' using the same command as above and see what happens.
# What answers can you infer from the analysis?
r = species_status.groupby('Species Status')['Borough'].value_counts()
r
# The code lets us see how often Domestic, Exotic, Invasive, and Native animals were recorded
# in each borough. For example, one can infer that responses for domestic species were most
# common in Queens, while responses for native species were most common in Manhattan.

Species Status  Borough      
Domestic        Queens            23
                Manhattan         18
                Brooklyn          15
                Bronx             10
                Staten Island      9
Exotic          Manhattan          7
                Bronx              2
                Brooklyn           2
                Queens             2
Invasive        Brooklyn           7
                Bronx              3
                Manhattan          3
                Queens             1
                Staten Island      1
Native          Manhattan        441
                Brooklyn         191
                Queens           111
                Staten Island     71
                Bronx             51
Name: Borough, dtype: int64
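
To answer the section's question directly, one option is to reshape the grouped counts into a borough-by-status table and look for zeros. The sketch below re-enters the counts from the first grouped output by hand so that it runs without the csv file; with the real data, q.unstack(fill_value=0) or pd.crosstab(data['Borough'], data['Species Status']) should give a comparable table.

```python
import pandas as pd

# Counts copied from the grouped output (Borough, Species Status, count)
rows = [
    ('Bronx', 'Native', 51), ('Bronx', 'Domestic', 10),
    ('Bronx', 'Invasive', 3), ('Bronx', 'Exotic', 2),
    ('Brooklyn', 'Native', 191), ('Brooklyn', 'Domestic', 15),
    ('Brooklyn', 'Invasive', 7), ('Brooklyn', 'Exotic', 2),
    ('Manhattan', 'Native', 441), ('Manhattan', 'Domestic', 18),
    ('Manhattan', 'Exotic', 7), ('Manhattan', 'Invasive', 3),
    ('Queens', 'Native', 111), ('Queens', 'Domestic', 23),
    ('Queens', 'Exotic', 2), ('Queens', 'Invasive', 1),
    ('Staten Island', 'Native', 71), ('Staten Island', 'Domestic', 9),
    ('Staten Island', 'Invasive', 1),
]
q = (pd.DataFrame(rows, columns=['Borough', 'Species Status', 'count'])
       .set_index(['Borough', 'Species Status'])['count'])

# One row per borough, one column per status; missing combinations become 0
table = q.unstack(fill_value=0)

# List every (borough, status) combination with no records
missing_pairs = [(b, s) for b in table.index for s in table.columns
                 if table.loc[b, s] == 0]

print(table)
print(missing_pairs)
```

In these counts, the only empty combination is Exotic in Staten Island, so not every species status was recorded in all 5 boroughs.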

### Grouping data : do different species live in different properties (boroughs)?

In [72]:
# We'll create a new dataframe with only the data for 'Borough', 'Property' and 'Species Description'
species_desc = data[['Borough','Property','Species Description']]
species_desc.nunique()

Borough                  5
Property               193
Species Description    138
dtype: int64

In [74]:
g = species_desc.groupby(['Borough','Species Description'])['Property'].value_counts()
#g

In [76]:
h = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()
#h

In [85]:
# Try switching 'Borough', 'Property' and 'Species Description' using the same command as above and see what happens.
# What answers can you infer from the analysis?
a = species_desc.groupby(['Property','Borough'])['Species Description'].value_counts()
b = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()
#c = species_desc.groupby(['Borough','Property'])['Species Description'].value_counts()

In [89]:
a # a = counts of each 'Species Description' value
  # within each ('Property', 'Borough') group

Property                                     Borough        Species Description            
5 East 102nd St                              Manhattan      woodcock                             1
851 Fairmont Pl                              Bronx          Rooster                              1
Abingdon Square                              Manhattan      Raccoon                              1
Alley Pond Park                              Queens         Raccoon                             12
                                                            Cat                                  1
                                                            Chicken                              1
                                                            Red-Eared Slider                     1
Alley Pond Park Tennis Courts Near Court 16  Queens         Raccoon                              1
Allison Pond Park                            Staten Island  Snapping turtle                      1
Altamont House   

In [90]:
b

Borough        Property                                     Species Description            
Bronx          851 Fairmont Pl                              Rooster                              1
               Bronx Park                                   Chicken                              1
               Concrete Plant Park                          Raccoon                              1
               Crotona Park                                 Deer                                 1
                                                            Green Heron                          1
                                                            Mallard Duck                         1
                                                            Red-Tailed Hawk                      1
                                                            Red-tailed Hawk                      1
               Ferry Point Park                             Kestrel                              1
               Gr