In this project, we are going to analyze data on the conservation status of species and examine whether there are any patterns or themes in the types of species that become endangered.
In the context of this study, a species is one or more populations of organisms that are reproductively compatible and form a distinct kind of animal or plant. As data scientists, we have been asked to interpret data from the National Park Service about endangered species in different parks.
Over the course of this project, we will analyze, prepare, and plot the data to answer questions such as:
- What is the distribution of endangered species by category and by conservation status across the entire dataset?
- What proportion of the species in the data are protected, and what proportion are not?
- Which categories of species are the most endangered?
- Are the differences in conservation status between species categories statistically significant?
- Which species are the most endangered?
- Which park has the highest number of endangered species?
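The protection and significance questions can be sketched in plain Python. This is a minimal, hypothetical sketch rather than the project's actual code: it assumes each row of `species_info.csv` is a dict with `category` and `conservation_status` keys, and it treats a species as "protected" whenever `conservation_status` is non-empty (an assumption). The helper names `protection_table`, `protected_share`, and `chi_square_2x2` are illustrative. The test shown is Pearson's chi-square on a 2x2 table (two categories, protected vs. not protected) without a continuity correction; with one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)), so only the standard library is needed.

```python
import math
from collections import defaultdict


def protection_table(rows):
    """Map each category to [protected, not_protected] species counts.

    Assumption: a row counts as "protected" when its conservation_status
    field is non-empty; each row is treated as one species.
    """
    table = defaultdict(lambda: [0, 0])
    for row in rows:
        status = (row.get("conservation_status") or "").strip()
        # Index 0 holds protected counts, index 1 holds not-protected counts.
        table[row["category"]][0 if status else 1] += 1
    return dict(table)


def protected_share(counts):
    """Fraction of species in one category that are protected."""
    protected, not_protected = counts
    return protected / (protected + not_protected)


def chi_square_2x2(a, b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    a and b are [protected, not_protected] counts for two categories.
    All row and column totals must be non-zero.
    """
    observed = [a, b]
    total = sum(a) + sum(b)
    row_totals = [sum(a), sum(b)]
    col_totals = [a[0] + b[0], a[1] + b[1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # One degree of freedom: P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, assuming the data contains categories labeled `Mammal` and `Bird`, `chi_square_2x2(table["Mammal"], table["Bird"])` compares their protection rates; a p-value below a chosen threshold such as 0.05 suggests the difference is unlikely to be due to chance alone. SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its p-value will differ somewhat from this uncorrected version.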
The analysis follows these steps:
- Import the Python libraries
- Load the datasets
- Explore the datasets
- Visualize the data
- Analyze the graphs
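The first three steps above (import, load, explore) might look like the following. This is a hedged sketch that uses only the standard library `csv` module, not necessarily the libraries the project actually imports; the function names `rows_from_lines`, `load_rows`, and `value_counts` are hypothetical.

```python
import csv
from collections import Counter


def rows_from_lines(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts."""
    return list(csv.DictReader(lines))


def load_rows(path):
    """Load one of the project's CSV files from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return rows_from_lines(f)


def value_counts(rows, column):
    """Count each value of `column`; empty cells are reported as '(missing)'."""
    return Counter((row.get(column) or "").strip() or "(missing)" for row in rows)
```

Assuming `species_info.csv` sits in the working directory, `value_counts(load_rows("species_info.csv"), "conservation_status").most_common()` would list each status with its count, grouping species that have no status under `(missing)`.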
To develop this project, we were given two CSV files by Codecademy:
- The `species_info.csv` file has 5824 rows and 4 columns: `category`, `scientific_name`, `common_names`, and `conservation_status`.
- The `observations.csv` file has 23296 rows and 3 columns: `scientific_name`, `park_name`, and `observations`.
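Both files share the `scientific_name` column, so they can be joined on it. The sketch below, again standard-library only and with hypothetical helper names (`describe`, `endangered_species_per_park`), checks a file's row count and column names and then counts the distinct species with a given status recorded in each park. It assumes the status label is the exact string `Endangered`; the `status` argument can be changed if the data uses a different label.

```python
import csv
from collections import defaultdict


def describe(lines):
    """Return (row_count, column_names) for CSV text given as an iterable of lines."""
    reader = csv.DictReader(lines)
    row_count = sum(1 for _ in reader)
    return row_count, reader.fieldnames


def endangered_species_per_park(species_rows, observation_rows, status="Endangered"):
    """Count distinct species with `status` observed in each park.

    The two tables are joined on their shared scientific_name column.
    Assumption: "Endangered" is the exact label used in conservation_status.
    """
    flagged = {
        row["scientific_name"]
        for row in species_rows
        if (row.get("conservation_status") or "").strip() == status
    }
    # A set per park, so repeated observations of one species count once.
    per_park = defaultdict(set)
    for row in observation_rows:
        if row["scientific_name"] in flagged:
            per_park[row["park_name"]].add(row["scientific_name"])
    return {park: len(names) for park, names in per_park.items()}
```

Passing an open `species_info.csv` file (opened with `newline=""`) to `describe` should report 5824 rows and the four columns listed above, if the file matches this description. If `counts` is the dictionary returned by `endangered_species_per_park`, then `max(counts, key=counts.get)` names the park with the most endangered species.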