# Brown Scholars Internship 2019-2020 - Urban Wildlife in NYC

New York City is home to many diverse species of wildlife that arrived or existed long before humans settled here.
In October 2016, Mayor Bill de Blasio launched WildlifeNYC, a citywide education and awareness campaign teaching New Yorkers how to live safely and responsibly alongside wild animals including deer, raccoons, and coyotes.

Urban wildlife is any wild animal that lives in an urban environment, such as New York City. Urban wildlife includes birds, mammals, reptiles, fish, and amphibians. Some urban wildlife is native, like eastern grey squirrels, while some is non-native, like mute swans. Domesticated and companion animals, like dogs, exotic pets, and farm animals, are not considered urban wildlife. Domesticated but feral animals, like pigeons and stray cats, are also not considered urban wildlife.

Data source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2

First, we'll import the packages we'll use.

In [54]:
import pandas as pd

Then we import the data. For now, the CSV file should be in the same directory as the notebook. Notice that we are importing the date and time columns as type 'datetime'.

In [55]:
df = pd.read_csv('Urban_Park_Ranger_Animal_Condition_Response.csv',
                   parse_dates = ['Date and Time of initial call', 'Date and time of Ranger response'])

Note: if you want to export the data, use df.to_csv(filename), where df is the name of your dataframe and filename is the name of the file where you want to save the data. The csv file will get created in the same directory as the notebook.
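As a minimal sketch of the export step, the example below writes a small dataframe to a CSV file and reads it back. The `sample` dataframe, its values, and the file name `sample_export.csv` are made up for illustration and are not part of the ranger dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy dataframe, not the real ranger data
sample = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn"],
    "# of Animals": [1.0, 2.0],
})

# Write into a temporary directory so the sketch does not clutter the notebook folder
out_path = os.path.join(tempfile.mkdtemp(), "sample_export.csv")

# index=False leaves the row index out of the file, so reading the file
# back does not produce an extra "Unnamed: 0" column
sample.to_csv(out_path, index=False)

# Read the file back to confirm what was saved
reloaded = pd.read_csv(out_path)
```

In the notebook itself you would pass just a file name, and the file would land next to the notebook.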

#### Step 1: Viewing and inspecting the data

Now that the data is loaded, let's check it out. To learn more about what the data looks like, we can try the following commands (here data stands for the name of your dataframe, which is df in this notebook):
- data.head() - to look at the first 5 rows
- data.tail() - to look at the last 5 rows
- data.shape - to get the number of rows and columns
- data.info() - to get the names of the columns, how many non-null values are in each column, and the type of data in each column
- data.nunique() - to get how many unique values are in each column
- data.max() - to get the highest value in each column
- data.min() - to get the lowest value in each column
- data['col'].value_counts() - to get how many times each unique value appears in a particular column
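The inspection commands listed above can be tried on a tiny dataframe first, where the answers are easy to check by eye. This is only a sketch: the `toy` dataframe and its values are invented, although the column names are borrowed from the ranger dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the ranger dataset
toy = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn", "Bronx", "Queens"],
    "Species Status": ["Native", "Exotic", "Native", None],
    "Duration of Response": [0.5, 1.5, 3.0, 2.0],
})

first_rows = toy.head(2)        # first 2 rows (head() with no argument gives 5)
n_rows, n_cols = toy.shape      # shape is a (rows, columns) tuple
unique_per_col = toy.nunique()  # distinct non-null values in each column

# value_counts() counts how often each distinct value occurs
borough_counts = toy["Borough"].value_counts()

longest = toy["Duration of Response"].max()
shortest = toy["Duration of Response"].min()

# info() prints column names, non-null counts, and data types
toy.info()
```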

In [56]:
df.head()

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number
0,2019-06-12 09:20:00,2019-06-12 09:20:00,Manhattan,Washingtom Square Park,on Sidewalk accross from the park near 10 Wash...,Red-tailed Hawk,Other,Native,,0.5,...,,Advised/Educated others,1.0,False,False,,,False,False,
1,2019-06-11 16:15:00,2019-06-11 16:20:00,Bronx,Van Cortlandt Park,Adjacent to VC Golf House,Canada Goose,Public,Native,Injured,0.5,...,1-1-1733837211,Unfounded,1.0,False,False,,,False,False,
2,2019-06-10 13:00:00,2019-06-10 13:30:00,Brooklyn,Irving Square Park,Northwest corner of the park,Parrot,Public,Exotic,,1.5,...,,Unfounded,1.0,False,False,,,False,False,
3,2019-06-09 09:30:00,2019-06-09 10:00:00,Brooklyn,Parade Ground,Prospect Park Parade Grounds near Tennis Center,Chicken,Central,Domestic,Healthy,3.0,...,1-1-1730643971,ACC,1.0,False,False,,,False,False,65352
4,2019-06-09 12:50:00,2019-06-09 12:55:00,Staten Island,Silver Lake Park,Bridge,Red-Eared Slider,Employee,Invasive,Injured,2.0,...,1-1-1724490913,ACC,2.0,True,False,,,False,False,65379 65380


In [57]:
#df.tail()

In [58]:
#df.shape

In [93]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 982 entries, 0 to 981
Data columns (total 22 columns):
Date and Time of initial call       982 non-null datetime64[ns]
Date and time of Ranger response    982 non-null datetime64[ns]
Borough                             982 non-null object
Property                            982 non-null object
Location                            918 non-null object
Species Description                 969 non-null object
Call Source                         982 non-null object
Species Status                      968 non-null object
Animal Condition                    758 non-null object
Duration of Response                982 non-null float64
Age                                 982 non-null object
Animal Class                        982 non-null object
311SR Number                        573 non-null object
Final Ranger Action                 982 non-null object
# of Animals                        974 non-null float64
PEP Response                        9

In [60]:
#df.nunique()

In [61]:
#df.max()

In [62]:
#df.min()

In [63]:
#df["311SR Number"].value_counts()

#### Step 2: Cleaning the data

By now, we should have a sense of which columns may have null values. It may or may not be OK for a column to have null values. One way to replace null values with some other value is data.fillna(x), where x is the value we want in place of the null.

In addition, the data may not be in 'standard' form. For example, the strings 'yes', 'YES', and 'Yes' may all appear as values in the same column. To check whether the data in a column is in 'standard' form, we can use data['column_name'].unique(). For example, what happens when we try data['Species Description'].unique()? What happens when we try data['Species Status'].unique()? To replace values, we can use data['column_name'].replace('yes', 'Yes'), which returns a copy of the column with every 'yes' value replaced by 'Yes'; assign the result back to data['column_name'] to keep the change.
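A minimal sketch of these cleaning steps on a toy dataframe follows. The spelling variants 'Snapping turtle' and 'Rd-tailed Hawk' do appear in the real 'Species Description' column, but the `toy` dataframe itself and the 'Unknown' placeholder are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data with inconsistent spellings and missing values
toy = pd.DataFrame({
    "Species Description": [
        "Snapping turtle", "Snapping Turtle", "Rd-tailed Hawk", "Red-tailed Hawk",
    ],
    "Animal Condition": ["Injured", None, "Healthy", None],
})

# List the distinct spellings before cleaning
spellings = toy["Species Description"].unique()

# replace() returns a new Series, so assign it back to keep the change
toy["Species Description"] = toy["Species Description"].replace({
    "Snapping turtle": "Snapping Turtle",
    "Rd-tailed Hawk": "Red-tailed Hawk",
})

# fillna() also returns a new Series; "Unknown" is an arbitrary placeholder
toy["Animal Condition"] = toy["Animal Condition"].fillna("Unknown")
```

Whether a placeholder such as 'Unknown' is appropriate depends on the question being asked; sometimes leaving the nulls in place is the better choice.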

In [65]:
#meanie = df["Duration of Response"].mean()

In [69]:
#df["Duration of Response"].fillna(meanie)

In [73]:
df["Species Description"].unique()

array(['Red-tailed Hawk', 'Canada Goose', 'Parrot', 'Chicken',
       'Red-Eared Slider', 'Rd-tailed Hawk', 'Cormorant', 'Raccoon',
       'Rooster', 'Dove', 'Snapping turtle', 'Monk Parakeet',
       'Mallard Duck', 'Gull', 'Snake', 'Snapping Turtle', 'Opossum',
       'Red-eyed Vireo', 'turtle', 'Squirrel', 'American Robin',
       'Domestic Duck', 'Pigeon', 'Bat', 'sparrow', nan,
       'Domestic Rabbit', 'Fledgling (possibly Starling)', 'Guineafowl',
       'American Goldfinch', 'Bird/Unspecified Species',
       'Freshwater Fish and Turtles', 'Turtle/ Unspecified species',
       'Cat', 'silver-haired bat', 'Seal', 'Egret', 'Skunk',
       'Painted Turtle', 'Northern Gannet', 'Parakeet', 'Cockatiel',
       'Saw whet owl', 'Dog', 'Robin', 'Big Brown Bat', 'Corn Snake',
       'Mute Swan', 'Coopers Hawk', 'Harbor Porpoise',
       'Boa Constrictor Snake', 'Turkey', 'Deer', 'Dolphin', 'Coyote',
       'Swan', 'Harbor Seal', 'Frog', 'Woodcock', 'Brant Goose', 'Falcon',
       "Cooper

In [76]:
#df["Species Status"].unique()

#### Step 3: Exploring the data

Once our data is in the shape we need, we can start exploring it. To learn more about what the data can tell us, we'll try filtering and grouping it, computing some basic statistics, and making graphs. The decisions we make along the way can be based on our knowledge of the topic, our curiosity to learn from the data, and what we learn from the data itself (or all three!).

##### Filtering data

To filter data, the following commands are useful:

- data[col] - to work only with one column
- data[data[col] > 7] - to extract rows that meet a particular criterion
- data[(data[col] > 0.5) & (data[col] < 0.7)] - to extract rows that meet more than one criterion
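The filtering patterns above can be sketched on a toy dataframe as follows. The `toy` dataframe, its values, and the thresholds are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the ranger dataset
toy = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn", "Manhattan", "Queens"],
    "Duration of Response": [0.5, 1.5, 3.0, 2.0],
})

# One column on its own (a Series)
durations = toy["Duration of Response"]

# Rows that meet a single criterion
long_calls = toy[toy["Duration of Response"] > 1.0]

# Rows that meet two criteria at once; each comparison needs its own
# parentheses because & binds more tightly than the comparison operators
mid_calls = toy[
    (toy["Duration of Response"] >= 1.0) & (toy["Duration of Response"] <= 2.0)
]
```

A filter that matches no rows returns an empty dataframe with only the column headers, which is not an error.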

In [89]:
# Keep rows whose "Duration of Response" is strictly between 0.5 and 0.7
df[(df["Duration of Response"] > 0.5) & (df["Duration of Response"] < 0.7)]

Unnamed: 0,Date and Time of initial call,Date and time of Ranger response,Borough,Property,Location,Species Description,Call Source,Species Status,Animal Condition,Duration of Response,...,311SR Number,Final Ranger Action,# of Animals,PEP Response,Animal Monitored,Rehabilitator,Hours spent monitoring,Police Response,ESU Response,ACC Intake Number


##### Grouping data

To group data, the following commands are useful:
- data[[col1, col2]] - to work with only some columns
- data.groupby(col) - To group the data based on the values in one column
- data.groupby([col1,col2]) - To group the data based on the values in more than one column
- If we want to find out how big each group is, we can use .size() to count the number of rows in each group.
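The grouping commands above can be sketched on a toy dataframe where the group sizes are easy to verify by hand. The `toy` dataframe and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the ranger dataset
toy = pd.DataFrame({
    "Species Description": ["Bat", "Bat", "Blue Jay", "Blue Jay", "Blue Jay"],
    "Age": ["Adult", "Adult", "Adult", "Adult", "Juvenile"],
    "Borough": ["Bronx", "Queens", "Bronx", "Brooklyn", "Bronx"],
})

# Work with only some columns (note the double brackets)
subset = toy[["Species Description", "Age"]]

# Number of rows for each species
by_species = toy.groupby("Species Description").size()

# Number of rows for each (species, age) combination
by_species_age = toy.groupby(["Species Description", "Age"]).size()
```

Grouping by two columns gives a result indexed by pairs, so a single count can be looked up with by_species_age.loc[("Blue Jay", "Adult")].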

In [95]:
#df[["Duration of Response", "Date and time of Ranger response"]].head()

In [105]:
Testing = df.groupby(["Species Description", "Age"])

In [106]:
Testing.size()

Species Description              Age             
2 black back gulls and 1 pigeon  Adult                1
Alligator snapping turtle        Adult                1
American Goldfinch               Adult                1
American Oystercatcher           Adult                1
American Robin                   Adult                1
                                 Infant               1
Banded Rock Pigeon               Adult                1
Bat                              Adult                4
Bearded dragon                   Adult                1
Big Brown Bat                    Adult                1
Bird/Unspecified Species         Infant               1
Black Racer Snake                Adult                1
Black Skimmer                    Juvenile             1
Black backed gull                Adult                1
Blue Jay                         Adult                2
                                 Juvenile             1
Boa Constrictor Snake            Adult                

##### Basic statistics

To compute some basic statistics we can use:
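- data.describe() - summary statistics for numerical columns
- data.mean() - mean of each numerical column
- data.median() - median of each numerical column
- data.std() - standard deviation of each numerical column
- data.corr() - to get the correlation between numerical columns

In recent versions of pandas, mean, median, std, and corr raise an error on text columns unless you pass numeric_only=True or select the numerical columns first.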
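A minimal sketch of these statistics on a toy dataframe follows. The `toy` dataframe and its values are invented; selecting the numerical columns first with select_dtypes is one way, assumed here, to keep text columns out of the calculations.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the ranger dataset
toy = pd.DataFrame({
    "Borough": ["Bronx", "Brooklyn", "Manhattan", "Queens"],
    "Duration of Response": [0.5, 1.5, 3.0, 2.0],
    "# of Animals": [1.0, 1.0, 2.0, 1.0],
})

# Keep only the numerical columns so text columns do not get in the way
numeric = toy.select_dtypes(include="number")

summary = numeric.describe()   # count, mean, std, min, quartiles, max
means = numeric.mean()
medians = numeric.median()
stds = numeric.std()
correlations = numeric.corr()  # pairwise correlation between columns
```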

##### Making graphs

To visualize categorical data we can use:
- g = data['col'].value_counts()
- g.plot(x=g.index, y=g.values, kind='bar') or g.plot.pie(y='Borough')
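As a rough sketch of the bar chart, the example below counts calls per borough in a toy dataframe and plots the counts. The `toy` dataframe, its values, and the axis labels are invented for illustration, and the example assumes matplotlib is installed, since pandas uses it for plotting.

```python
import matplotlib

# Non-interactive backend so the sketch also runs outside a notebook;
# inside a notebook this line is not needed
matplotlib.use("Agg")

import pandas as pd

# Hypothetical toy data, one row per call
toy = pd.DataFrame({"Borough": ["Bronx", "Brooklyn", "Bronx", "Queens", "Bronx"]})

# Count how many rows fall in each borough
g = toy["Borough"].value_counts()

# For a Series, the index supplies the bar labels and the values the bar heights
ax = g.plot(kind="bar")
ax.set_xlabel("Borough")
ax.set_ylabel("Number of calls")
```

Swapping kind="bar" for kind="pie" draws the same counts as a pie chart.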