<img src="../../predictioNN_Logo_JPG(72).jpg" width=200>

---

### Amur Leopard Analysis

### Working with Structured Data
### Introduction to Data Science

### Last Updated: 12/10/2022

In [42]:
import pandas as pd

In [97]:
# set path to datafile
PATH_TO_DATA = '../datasets/amur_leopards.csv'
PATH_TO_OUTFILE = './amur_leopards_final.csv'

**Source of data**

Transboundary cooperation improves endangered species monitoring and conservation actions: A case study of the global population of Amur leopards. Vitkalova et al. Conservation Letters. 2018;11:e12574.

**Notes about data**

The data represent counts of Amur leopards, broken down by sex, captured by camera trap surveys in China and Russia. In each year, the sum of the separate China and Russia totals exceeds the China & Russia total because some leopards were observed in both countries. Understanding what each row means is essential before doing any arithmetic with the data.
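As a rough illustration of the note above, the overlap can be estimated by inclusion-exclusion. This sketch assumes the China & Russia row counts each animal only once, and it copies the totals by hand from the table loaded in the next section; the names `totals` and `seen_in_both` are illustrative only.

```python
import pandas as pd

# Totals copied by hand from the table in this notebook
df = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015, 2015],
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
    'total': [27, 51, 64, 22, 55, 67],
})

# One row per year, one column per location
totals = df.pivot(index='year', columns='location_captured', values='total')

# Assumption: leopards seen in both countries = China + Russia - combined
seen_in_both = totals['China'] + totals['Russia'] - totals['China & Russia']
print(seen_in_both)
```

Under that assumption, the result is 14 leopards for 2014 and 10 for 2015.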

### Load data from CSV

In [44]:
df = pd.read_csv(PATH_TO_DATA)
df

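|   | year | location_captured | females | males | cubs | unknown_sex | total |
|---|---|---|---|---|---|---|---|
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 |
| 1 | 2014 | Russia | 25 | 21 | 3 | 2 | 51 |
| 2 | 2014 | China & Russia | 33 | 24 | 5 | 2 | 64 |
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 |
| 4 | 2015 | Russia | 24 | 20 | 8 | 3 | 55 |
| 5 | 2015 | China & Russia | 31 | 25 | 8 | 3 | 67 |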


Extract the pieces of the DataFrame (its values, columns, and index) to illustrate how it is built.

In [58]:
df.values

array([[2014, 'China', 14, 11, 2, 0, 27],
       [2014, 'Russia', 25, 21, 3, 2, 51],
       [2014, 'China & Russia', 33, 24, 5, 2, 64],
       [2015, 'China', 10, 12, 0, 0, 22],
       [2015, 'Russia', 24, 20, 8, 3, 55],
       [2015, 'China & Russia', 31, 25, 8, 3, 67]], dtype=object)

In [59]:
type(df.values)

numpy.ndarray

In [60]:
df.columns

Index(['year', 'location_captured', 'females', 'males', 'cubs', 'unknown_sex',
       'total'],
      dtype='object')

In [61]:
type(df.columns)

pandas.core.indexes.base.Index

In [62]:
list(df.columns)

['year',
 'location_captured',
 'females',
 'males',
 'cubs',
 'unknown_sex',
 'total']

In [50]:
df.index

RangeIndex(start=0, stop=6, step=1)

In [51]:
type(df.index)

pandas.core.indexes.range.RangeIndex
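To show how the three pieces fit together, here is a minimal sketch that reassembles a DataFrame from `values`, `columns`, and `index`. It uses a small hand-typed subset of the data, and the names `rebuilt` and `restored` are illustrative. Because `df.values` is a single object array when the columns have mixed types, the rebuilt frame loses the numeric dtypes until `infer_objects()` is called.

```python
import pandas as pd

# Small hand-typed subset of the leopard data
df = pd.DataFrame({
    'year': [2014, 2014, 2014],
    'location_captured': ['China', 'Russia', 'China & Russia'],
    'total': [27, 51, 64],
})

# Reassemble a DataFrame from its three pieces
rebuilt = pd.DataFrame(df.values, columns=df.columns, index=df.index)

# The numeric columns come back as generic objects...
print(rebuilt.dtypes)

# ...and infer_objects() asks pandas to work out better dtypes again
restored = rebuilt.infer_objects()
print(restored.dtypes)
```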

#### Subsetting

In [63]:
df.loc[0:2]

|   | year | location_captured | females | males | cubs | unknown_sex | total |
|---|---|---|---|---|---|---|---|
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 |
| 1 | 2014 | Russia | 25 | 21 | 3 | 2 | 51 |
| 2 | 2014 | China & Russia | 33 | 24 | 5 | 2 | 64 |
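Note that `df.loc[0:2]` returned three rows. `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position, like ordinary Python slicing. A short sketch on a hand-typed copy of part of the data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015, 2015],
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
    'total': [27, 51, 64, 22, 55, 67],
})

by_label = df.loc[0:2]      # labels 0, 1, 2 (end label included)
by_position = df.iloc[0:2]  # positions 0, 1 (end position excluded)

print(len(by_label), len(by_position))
```

With the default `RangeIndex` the labels and positions coincide, so the only visible difference is whether the endpoint is included.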


In [64]:
df['year']

0    2014
1    2014
2    2014
3    2015
4    2015
5    2015
Name: year, dtype: int64

In [65]:
df[['year','location_captured']]

|   | year | location_captured |
|---|---|---|
| 0 | 2014 | China |
| 1 | 2014 | Russia |
| 2 | 2014 | China & Russia |
| 3 | 2015 | China |
| 4 | 2015 | Russia |
| 5 | 2015 | China & Russia |


In [66]:
df[df.location_captured == 'China']

|   | year | location_captured | females | males | cubs | unknown_sex | total |
|---|---|---|---|---|---|---|---|
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 |
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 |


In [67]:
df.location_captured == 'China'

0     True
1    False
2    False
3     True
4    False
5    False
Name: location_captured, dtype: bool

In [72]:
type(True)

bool
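Because a comparison returns a Series of booleans, the mask can be used for more than row selection. As a small sketch on hand-typed data (the names `is_china`, `n_china_rows`, and `not_china` are illustrative), summing the mask counts the matching rows, and `~` inverts it:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015, 2015],
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
})

is_china = df['location_captured'] == 'China'

# True counts as 1 and False as 0, so sum() counts the matching rows
n_china_rows = is_china.sum()
print(n_china_rows)

# ~ flips every True/False, selecting the rows that do NOT match
not_china = df[~is_china]
print(not_china)
```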

How can we determine which locations captured fewer than 15 females in 2015?
We can filter on multiple conditions, combined with `&`, to answer this question.

In [84]:
df[(df.females < 15) & (df.year == 2015)]

|   | year | location_captured | females | males | cubs | unknown_sex | total |
|---|---|---|---|---|---|---|---|
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 |
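Each condition above is wrapped in parentheses because `&` binds more tightly than `<` and `==` in Python. As an alternative sketch, `DataFrame.query` expresses the same filter as a string; the data here are hand-typed from the table, and the names `mask_result` and `query_result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015, 2015],
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
    'females': [14, 25, 33, 10, 24, 31],
})

# Boolean masks: parentheses around each condition are required
mask_result = df[(df['females'] < 15) & (df['year'] == 2015)]

# The same filter written as a query string
query_result = df.query('females < 15 and year == 2015')

print(mask_result.equals(query_result))
```

Which form to use is largely a matter of taste; masks are easier to build up programmatically, while query strings can be easier to read.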


Return only the column we want

In [88]:
df[(df.females < 15) & (df.year == 2015)]['location_captured']

3    China
Name: location_captured, dtype: object
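The chained form above filters the rows first and then picks the column. A commonly recommended alternative is to do both in a single `.loc` call, which also avoids surprises when assigning values later. A minimal sketch on hand-typed data (the names `mask` and `locations` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015, 2015],
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
    'females': [14, 25, 33, 10, 24, 31],
})

mask = (df['females'] < 15) & (df['year'] == 2015)

# Rows and column selected in one step: .loc[row_selector, column_selector]
locations = df.loc[mask, 'location_captured']

print(locations.tolist())
```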

Create a new column

In [73]:
df['females_and_males'] = df['females'] + df['males']
df

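|   | year | location_captured | females | males | cubs | unknown_sex | total | females_and_males |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 | 25 |
| 1 | 2014 | Russia | 25 | 21 | 3 | 2 | 51 | 46 |
| 2 | 2014 | China & Russia | 33 | 24 | 5 | 2 | 64 | 57 |
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 | 22 |
| 4 | 2015 | Russia | 24 | 20 | 8 | 3 | 55 | 44 |
| 5 | 2015 | China & Russia | 31 | 25 | 8 | 3 | 67 | 56 |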


In [91]:
df['females_and_males'] = df.females + df.males
df

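|   | year | location_captured | females | males | cubs | unknown_sex | total | females_and_males |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 | 25 |
| 1 | 2014 | Russia | 25 | 21 | 3 | 2 | 51 | 46 |
| 2 | 2014 | China & Russia | 33 | 24 | 5 | 2 | 64 | 57 |
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 | 22 |
| 4 | 2015 | Russia | 24 | 20 | 8 | 3 | 55 | 44 |
| 5 | 2015 | China & Russia | 31 | 25 | 8 | 3 | 67 | 56 |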


In [77]:
df.females_and_males = df.females + df.males

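Note: attribute-style assignment such as `df.females_and_males = ...` only updates a column that already exists. If the column does not exist yet, pandas issues a `UserWarning` and sets a plain attribute instead of creating a column, so bracket notation is the safer way to create columns.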
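The difference between attribute and bracket assignment can be checked with a small sketch. It uses a two-row hand-typed frame, and the names `caught` and `created_by_attribute` are illustrative. The warning is captured with the standard `warnings` module so it can be inspected rather than printed.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'females': [14, 25], 'males': [11, 21]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # The column does not exist yet, so this sets a plain Python attribute
    df.females_and_males = df.females + df.males

created_by_attribute = 'females_and_males' in df.columns
print(created_by_attribute)
print([w.category.__name__ for w in caught])

# Bracket notation creates the column as intended
df['females_and_males'] = df['females'] + df['males']
print(list(df.columns))
```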


#### Sorting

In [94]:
df.sort_values(['females_and_males'], ascending=False, inplace=True)
df

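|   | year | location_captured | females | males | cubs | unknown_sex | total | females_and_males |
|---|---|---|---|---|---|---|---|---|
| 2 | 2014 | China & Russia | 33 | 24 | 5 | 2 | 64 | 57 |
| 5 | 2015 | China & Russia | 31 | 25 | 8 | 3 | 67 | 56 |
| 1 | 2014 | Russia | 25 | 21 | 3 | 2 | 51 | 46 |
| 4 | 2015 | Russia | 24 | 20 | 8 | 3 | 55 | 44 |
| 0 | 2014 | China | 14 | 11 | 2 | 0 | 27 | 25 |
| 3 | 2015 | China | 10 | 12 | 0 | 0 | 22 | 22 |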
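After sorting, the index labels (2, 5, 1, ...) travel with their rows. As an alternative sketch to `inplace=True`, `sort_values` can return a new DataFrame and leave the original untouched, and `reset_index(drop=True)` renumbers the rows from 0. The data are hand-typed and the name `sorted_df` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'location_captured': ['China', 'Russia', 'China & Russia'] * 2,
    'females_and_males': [25, 46, 57, 22, 44, 56],
})

# Returns a sorted copy; df itself keeps its original order
sorted_df = (
    df.sort_values('females_and_males', ascending=False)
      .reset_index(drop=True)  # replace the shuffled labels with 0..n-1
)

print(sorted_df)
```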


### Output to CSV

In [98]:
df.to_csv(PATH_TO_OUTFILE)
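By default, `to_csv` also writes the index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. If the index carries no information, passing `index=False` leaves it out. The sketch below uses a tiny hand-typed frame and an in-memory buffer instead of a real file, so nothing is written to disk; the variable names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({'year': [2014, 2015], 'total': [64, 67]}, index=[2, 5])

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

The first list contains an extra `Unnamed: 0` column holding the old index labels; the second contains only `year` and `total`.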