# Biodiversity & Species Conservation in National Parks

The data analysed in this project come from the National Park Service and describe species recorded in several national parks, together with their conservation status. The objective is to analyse the conservation statuses of these species and investigate whether there are patterns in the types of species that become endangered.

The data is divided into two files, which contain the following information:

* **species_info.csv** - contains data about different species and their conservation status
    * **category** - the taxonomic category of each species (e.g. Mammal, Bird, Vascular Plant)
    * **scientific_name** - the scientific name of each species
    * **common_names** - the common names of each species
    * **conservation_status** - each species' current conservation status


* **observations.csv** - holds recorded sightings of different species at several national parks for the past 7 days
    * **scientific_name** - the scientific name of each species
    * **park_name** - the park where the species was observed
    * **observations** - the number of times each species was observed at the park

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Load files
species = pd.read_csv('species_info.csv')
obs = pd.read_csv('observations.csv')


In [21]:
# Print first rows of the 'species' dataframe
species.head(10)

| | category | scientific_name | common_names | conservation_status |
|---|---|---|---|---|
| 0 | Mammal | Clethrionomys gapperi gapperi | Gapper's Red-Backed Vole | NaN |
| 1 | Mammal | Bos bison | American Bison, Bison | NaN |
| 2 | Mammal | Bos taurus | Aurochs, Aurochs, Domestic Cattle (Feral), Dom... | NaN |
| 3 | Mammal | Ovis aries | Domestic Sheep, Mouflon, Red Sheep, Sheep (Feral) | NaN |
| 4 | Mammal | Cervus elaphus | Wapiti Or Elk | NaN |
| 5 | Mammal | Odocoileus virginianus | White-Tailed Deer | NaN |
| 6 | Mammal | Sus scrofa | Feral Hog, Wild Pig | NaN |
| 7 | Mammal | Canis latrans | Coyote | Species of Concern |
| 8 | Mammal | Canis lupus | Gray Wolf | Endangered |
| 9 | Mammal | Canis rufus | Red Wolf | Endangered |


In [7]:
# Print first rows of the 'obs' dataframe
obs.head()

| | scientific_name | park_name | observations |
|---|---|---|---|
| 0 | Vicia benghalensis | Great Smoky Mountains National Park | 68 |
| 1 | Neovison vison | Great Smoky Mountains National Park | 77 |
| 2 | Prunus subcordata | Yosemite National Park | 138 |
| 3 | Abutilon theophrasti | Bryce National Park | 84 |
| 4 | Githopsis specularioides | Great Smoky Mountains National Park | 85 |


In [13]:
# Explore the data types and check for missing values in the species df
species.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5824 entries, 0 to 5823
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   category             5824 non-null   object
 1   scientific_name      5824 non-null   object
 2   common_names         5824 non-null   object
 3   conservation_status  191 non-null    object
dtypes: object(4)
memory usage: 182.1+ KB
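
The `info()` output shows that `conservation_status` is populated in only 191 of 5,824 rows, so most species have no recorded status. A minimal sketch of one way to handle this follows. It uses four rows copied from the `head()` output rather than the full file, and it assumes that a missing status means the species is not under any protection, which the data itself does not confirm. The label `No Intervention` and the name `toy_species` are hypothetical choices for this example.

```python
import pandas as pd

# Four rows taken from the species.head(10) output shown earlier
toy_species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal", "Mammal"],
    "scientific_name": ["Bos bison", "Sus scrofa", "Canis latrans", "Canis lupus"],
    "conservation_status": [None, None, "Species of Concern", "Endangered"],
})

# Count missing values per column
missing_counts = toy_species.isna().sum()
print(missing_counts)

# Assumption: a missing status means "no protection needed".
# "No Intervention" is a placeholder label chosen for this sketch.
toy_species["conservation_status"] = (
    toy_species["conservation_status"].fillna("No Intervention")
)
print(toy_species["conservation_status"].value_counts())
```

Filling the gaps with an explicit label keeps those rows visible in later `value_counts()` and group-by summaries, where `NaN` values are dropped by default.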


In [15]:
# Number of rows per value of the 'category' variable in species df
species.category.value_counts()

category
Vascular Plant       4470
Bird                  521
Nonvascular Plant     333
Mammal                214
Fish                  127
Amphibian              80
Reptile                79
Name: count, dtype: int64

In [17]:
# Number of rows per value of the 'conservation_status' variable in species df
species.conservation_status.value_counts()

conservation_status
Species of Concern    161
Endangered             16
Threatened             10
In Recovery             4
Name: count, dtype: int64
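
To put these counts in proportion, the sketch below recomputes shares from the numbers printed above (the four status counts and the 5,824 rows reported by `info()`). It is an arithmetic check on the displayed output, not a step from the original notebook. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts() output above
status_counts = pd.Series({
    "Species of Concern": 161,
    "Endangered": 16,
    "Threatened": 10,
    "In Recovery": 4,
})
total_rows = 5824  # row count reported by species.info()

# Rows that carry any conservation status
with_status = status_counts.sum()

# Share of all rows that have a status, and each status as a share of those
share_with_status = with_status / total_rows
status_share = status_counts / with_status

print(f"Rows with a status: {with_status} ({share_with_status:.1%})")
print(status_share.round(3))
```

On the real dataframe, `species.conservation_status.value_counts(normalize=True)` should give the same per-status shares directly.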

In [25]:
# Summary statistics from the species df
species.describe(include='all')

| | category | scientific_name | common_names | conservation_status |
|---|---|---|---|---|
| count | 5824 | 5824 | 5824 | 191 |
| unique | 7 | 5541 | 5504 | 4 |
| top | Vascular Plant | Castor canadensis | Brachythecium Moss | Species of Concern |
| freq | 4470 | 3 | 7 | 161 |
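
The summary shows 5,824 rows but only 5,541 unique values of `scientific_name`, so some species appear more than once (`Castor canadensis` appears three times). The sketch below shows one way to inspect and collapse such repeats on a small toy frame. The beaver common names are made up for illustration, and keeping the first occurrence is an assumption; on the real data it would be worth checking first whether the repeated rows differ in `conservation_status`.

```python
import pandas as pd

# Toy frame with one repeated scientific name (common names for the
# beaver rows are hypothetical, not taken from species_info.csv)
toy_species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal", "Mammal"],
    "scientific_name": ["Castor canadensis", "Castor canadensis",
                        "Canis lupus", "Bos bison"],
    "common_names": ["American Beaver", "Beaver",
                     "Gray Wolf", "American Bison, Bison"],
})

# Compare the row count with the number of distinct scientific names
n_rows = len(toy_species)
n_unique = toy_species["scientific_name"].nunique()
print(f"{n_rows} rows, {n_unique} unique scientific names")

# Show every row whose scientific name occurs more than once
dup_rows = toy_species[toy_species.duplicated(subset="scientific_name", keep=False)]
print(dup_rows)

# Assumption: keep only the first row per scientific name
deduped = toy_species.drop_duplicates(subset="scientific_name", keep="first")
print(len(deduped))
```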
