
![](https://i.imgur.com/0AUxkXt.png)

# **BIODIVERSITY IN THE U.S. NATIONAL PARKS**

<img src='https://images.unsplash.com/photo-1593069567131-53a0614dde1d?crop=entropy&cs=tinysrgb&fm=jpg&ixlib=rb-1.2.1&q=80&raw_url=true&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2832'>

In this lab session, we will learn to apply basic Pandas syntax to get an overview of, and explore, a dataset about biodiversity in the US national parks.

The notebook demonstrates the early stage of a data analysis project. In that stage, we read and observe the data, then drill into several interesting insights to see whether they are worth pursuing further. Specifically, this notebook gives an overview of the whole dataset and, from there, compares the **Hawaii Volcanoes National Park** with inland parks and with parks at a similar latitude.

**✳︎ DATASET INFORMATION**

The dataset includes two DataFrames.

1/ National Parks - "Parks" table, which includes:

- Park Code
- Park Name
- State
- Is Multi States
- Acres
- Latitude
- Longitude


2/ National Parks - "Species" lists provide information on the presence and status of species in our national parks. Each park species record includes:
- species ID,
- park name,
- taxonomic information: scientific name, one or more common names,
- record status,
- occurrence (verification of species presence in park),
- nativeness (species native or foreign to park),
- abundance (presence and visibility of species in park),
- seasonality (season and nature of presence in park),
- conservation status (species classification according to US Fish & Wildlife Service).


*Taxonomic classes have been translated from Latin to English for species categorization; order, family, and scientific name (genus, species, subspecies) are in Latin.*

*These species lists are works in progress and the absence of a species from a list does not necessarily mean the species is absent from a park.*

**Import libraries**

In [None]:
import pandas as pd

## 1. Overview 〉Parks

In [None]:
# Get overview data for the national parks in the USA
# Starting off small with this data. There is not much data, but enough interesting things to look at.

df_parks = pd.read_csv(
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vTpm0N8rPD7xELxLoMo7cM74-HwSHVc1g-GgeDS2DFesbvtZKXVnzeCKBqj7NaSW5AgR_1WFzTcPcJK/pub?output=csv"
)

### Task - Print the information of the parks DataFrame, `df_parks`.

In [None]:
# YOUR CODE HERE
df_parks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Park Code        56 non-null     object 
 1   Park Name        56 non-null     object 
 2   State            56 non-null     object 
 3   Is Multi States  56 non-null     bool   
 4   Acres            56 non-null     int64  
 5   Latitude         56 non-null     float64
 6   Longitude        56 non-null     float64
dtypes: bool(1), float64(2), int64(1), object(3)
memory usage: 2.8+ KB


### Question 1

What is the average size in acres of all national parks?

Round your answer to 6 decimal places.

In [None]:
#YOUR CODE HERE
round(df_parks["Acres"].mean(),6)

927929.142857

### Question 2

Select all the parks inside the state of Florida (state code FL) from the parks DataFrame.

In [None]:
#YOUR CODE HERE
df_parks[df_parks["State"] == "FL"]

| | Park Code | Park Name | State | Is Multi States | Acres | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 4 | BISC | Biscayne National Park | FL | False | 172924 | 25.65 | -80.08 |
| 16 | DRTO | Dry Tortugas National Park | FL | False | 64701 | 24.63 | -82.87 |
| 17 | EVER | Everglades National Park | FL | False | 1508538 | 25.32 | -80.93 |


### Question 3

What is the name of the national park with park code "OLYM"?

In [None]:
#YOUR CODE HERE
df_parks[df_parks["Park Code"] == "OLYM"]["Park Name"].values[0]

'Olympic National Park'

### Question 4

What is the state code with the highest number of parks?

Exclude parks that belong to multiple states.

In [None]:
#YOUR CODE HERE
# Exclude multi-state parks before counting parks per state
df_parks[~df_parks["Is Multi States"]]["State"].value_counts().index[0]

'AK'
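The filtering step can be tried in isolation on a tiny, made-up DataFrame. Only the column names match `df_parks`; the rows and the `"CA, NV"` state value are hypothetical. The boolean column is negated with `~` to drop multi-state parks. `value_counts().idxmax()` then returns the most frequent state code.

```python
import pandas as pd

# Hypothetical toy data reusing df_parks' column names (rows are made up)
toy_parks = pd.DataFrame({
    "Park Code": ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE"],
    "State": ["AK", "AK", "CA", "CA, NV", "CA, NV"],
    "Is Multi States": [False, False, False, True, True],
})

# Keep single-state parks only, then take the most frequent state code
single_state = toy_parks[~toy_parks["Is Multi States"]]
top_state = single_state["State"].value_counts().idxmax()
print(top_state)  # AK
```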

### Question 5

How many parks have an area of 1,000,000 acres or larger?

In [None]:
#YOUR CODE HERE
(df_parks["Acres"] >= 1_000_000).sum()

### Question 6

How many parks have an area of 10,000 acres or smaller?

In [None]:
#YOUR CODE HERE
(df_parks["Acres"] <= 10_000).sum()

1
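The size questions say "or larger" and "or smaller", so the comparisons need to be inclusive (`>=`, `<=`). Summing a boolean mask counts its `True` values directly. This is a sketch on hypothetical acreages; two of the values sit exactly on the thresholds, so the difference between inclusive and strict comparisons shows up.

```python
import pandas as pd

# Hypothetical acreages; 10,000 and 1,000,000 sit exactly on the thresholds
acres = pd.Series([5_550, 10_000, 250_000, 1_000_000, 3_000_000])

n_large = (acres >= 1_000_000).sum()  # "1,000,000 acres or larger"
n_small = (acres <= 10_000).sum()     # "10,000 acres or smaller"
print(n_large, n_small)  # 2 2
```

With strict comparisons (`>` and `<`) both counts would drop to 1 here, because the boundary values are excluded.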

### Question 7

How many states have only one park? Exclude parks that belong to multiple states.

In [None]:
#YOUR CODE HERE
result = df_parks[df_parks["Is Multi States"] == False]["State"].value_counts()
print(result)

State
AK    8
CA    7
UT    5
CO    4
FL    3
WA    3
AZ    3
HI    2
SD    2
TX    2
ME    1
ND    1
VA    1
KY    1
MI    1
AR    1
MT    1
WY    1
NV    1
OH    1
OR    1
SC    1
NM    1
MN    1
Name: count, dtype: int64


In [None]:
result[result == 1]

State
ME    1
ND    1
VA    1
KY    1
MI    1
AR    1
MT    1
WY    1
NV    1
OH    1
OR    1
SC    1
NM    1
MN    1
Name: count, dtype: int64


In [None]:
# Alternative: count the states that have exactly one single-state park
single_state_parks = df_parks[~df_parks["Is Multi States"]]
park_counts = single_state_parks["State"].value_counts()
print((park_counts == 1).sum())

14


### Question 8

The Mississippi River is a famous river in the USA that runs from north to south into the Gulf of Mexico. It has historically been considered the boundary between two parts of the country: the Eastern USA and the Western USA.

The Mississippi River's longitude is about -90. Any location west of the river has a more negative longitude (for example, -95), and any location east of the river has a less negative longitude (for example, -85).

Which region, East or West of the Mississippi, has a larger number of national parks, and by how many?

A. Western USA has more parks than Eastern USA by 3 parks

B. Western USA has fewer parks than Eastern USA by 3 parks

C. Western USA has more parks than Eastern USA by 16 parks

D. Western USA has fewer parks than Eastern USA by 16 parks

E. Western USA has more parks than Eastern USA by 36 parks

F. Western USA has fewer parks than Eastern USA by 36 parks

G. None of the above.

In [None]:
#YOUR CODE HERE
east = len(df_parks[df_parks["Longitude"] > -90])  # parks east of the Mississippi
west = len(df_parks[df_parks["Longitude"] < -90])  # parks west of the Mississippi
print(east, west)

10 46
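Another route is to label each park as East or West with `numpy.where` and then compare the counts. This sketch takes -90 as the river's longitude, as the question states; the longitudes are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical longitudes (negative = west of Greenwich)
lon = pd.Series([-68.2, -80.1, -104.4, -110.5, -155.2, -121.6])

# West of the Mississippi (about -90) means a more negative longitude
region = pd.Series(np.where(lon < -90, "West", "East"))
counts = region.value_counts()
print(counts["West"] - counts["East"])  # 2
```

Applied to the real counts printed above (10 east, 46 west), the difference is 46 - 10 = 36.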


In [None]:
# @title Run to reveal answer
print("Correct answer is: E")

Correct answer is: E


## 2. Overview 〉Species

In [None]:
# Get data for species in each national park
# This can take a little while to load. There is a huge amount of data after all.
import pandas as pd
df_species = pd.read_csv(
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vRufFu1z95VNMNzhupSRkRMYd-HM1XHo4VHw5gs67w7gKV31L3lR_-jlzwhOcQ1glYzfPv4rZoLLigb/pub?gid=649140552&single=true&output=csv"
)

### Question 9

How many unique categories of species are there?

In [None]:
#YOUR CODE HERE
# len(df_species['Category'].unique()) would also count NaN if present; .nunique() drops it
df_species['Category'].nunique()

14
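The two ways of counting categories differ when missing values are present. `Series.unique()` keeps `NaN`, while `Series.nunique()` drops it by default (`dropna=True`). A minimal illustration on made-up values:

```python
import numpy as np
import pandas as pd

categories = pd.Series(["Mammal", "Bird", np.nan, "Bird"])

print(len(categories.unique()))  # 3: "Mammal", "Bird" and NaN
print(categories.nunique())      # 2: NaN is not counted
```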

### Question 10

List all the parks and how many species there are in each one.

Sort by number of species from highest to lowest.

In [None]:
#YOUR CODE HERE
df_species.groupby('Park Name').size().sort_values(ascending= False)

Park Name
Great Smoky Mountains National Park    6623
Redwood National Park                  6310
Shenandoah National Park               4655
Death Valley National Park             4439
Yellowstone National Park              3966
Crater Lake National Park              3760
North Cascades National Park           3363
Hawaii Volcanoes National Park         3298
Rocky Mountain National Park           3152
Great Basin National Park              2653
...
dtype: int64
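`Series.value_counts()` is a shorter route to the same per-park counts, and it already sorts them from highest to lowest. This sketch compares both approaches on hypothetical rows. The results are compared as dictionaries because the two Series carry different names.

```python
import pandas as pd

# Hypothetical species rows: only the park name matters here
toy_species = pd.DataFrame({"Park Name": ["A", "B", "A", "C", "A", "B"]})

by_groupby = toy_species.groupby("Park Name").size().sort_values(ascending=False)
by_value_counts = toy_species["Park Name"].value_counts()

print(by_value_counts.to_dict())
```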


In [None]:
# @title Your answer should look like this:

Park Name
Great Smoky Mountains National Park    6623
Redwood National Park                  6310
Shenandoah National Park               4655
Death Valley National Park             4439
Yellowstone National Park              3966
dtype: int64

In [None]:
#YOUR CODE HERE

### Question 11

There is one species that is classified as "Extinct". What is its common name?

In [None]:
#YOUR CODE HERE
df_species[df_species['Conservation Status'] == 'Extinct']['Common Names'].values[0]

'Blue Pike'

### Question 12

List the top 10 parks with the highest number of species whose conservation status is "Species of Concern", "Endangered", or "Threatened".

Include each park's number of species belonging to these categories as well.

In [None]:
# @title Your answer should look like this:

Park Name
Death Valley National Park             217
Redwood National Park                  156
Channel Islands National Park          136
Big Bend National Park                 132
Grand Canyon National Park             129
Great Smoky Mountains National Park    123
Hawaii Volcanoes National Park         114
Joshua Tree National Park              106
Carlsbad Caverns National Park         100
Zion National Park                      97
dtype: int64

In [None]:
#YOUR CODE HERE
concern_statuses = ['Species of Concern', 'Endangered', 'Threatened']
df_concern = df_species[df_species['Conservation Status'].isin(concern_statuses)]
df_concern.groupby('Park Name').size().sort_values(ascending=False).head(10)

Park Name
Death Valley National Park             217
Redwood National Park                  156
Channel Islands National Park          136
Big Bend National Park                 132
Grand Canyon National Park             129
Great Smoky Mountains National Park    123
Hawaii Volcanoes National Park         114
Joshua Tree National Park              106
Carlsbad Caverns National Park         100
Zion National Park                      97
dtype: int64


### Question 13

In the "Nativeness" column, convert all entries different from "Native" and "Not Native" to "Unknown".

Afterwards, what percentage of the data is listed "Unknown"? Round your answer to the nearest integer.

- Step 1: in the `Nativeness` column, convert every value other than "Native" and "Not Native" to "Unknown".
- Step 2: compute the percentage of "Unknown" entries over all rows, rounded to the nearest integer.

In [None]:
#YOUR CODE HERE
df_species['Nativeness'].unique()

array(['Native', 'Not Native', 'Unknown', nan, 'Not Confirmed', 'Present'],
      dtype=object)

In [None]:
df_species_1= df_species.copy()

In [None]:
mask_other = ~df_species_1['Nativeness'].isin(['Native', 'Not Native'])  # also True for NaN rows
# df_species_1.loc[mask_other, 'Nativeness']  # shows only the Nativeness column
df_species_1[mask_other]  # shows the full rows

| | Species ID | Park Name | Category | Order | Family | Scientific Name | Common Names | Record Status | Occurrence | Nativeness | Abundance | Seasonality | Conservation Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | ACAD-1004 | Acadia National Park | Mammal | Carnivora | Canidae | Vulpes vulpes | Black Fox, Cross Fox, Eastern Red Fox, Fox, Re... | Approved | Present | Unknown | Common | Breeder | |
| 10 | ACAD-1010 | Acadia National Park | Mammal | Carnivora | Mustelidae | Mustela | Weasel | In Review | Not Confirmed | Unknown | | | |
| 63 | ACAD-1063 | Acadia National Park | Bird | Accipitriformes | Accipitridae | Buteo swainsoni | Swainson's Hawk | Approved | Not Confirmed | Unknown | | | |
| 75 | ACAD-1075 | Acadia National Park | Bird | Anseriformes | Anatidae | Anas platyrhynchos | Common Mallard, Mallard | Approved | Present | Unknown | Common | Breeder | |
| 78 | ACAD-1078 | Acadia National Park | Bird | Anseriformes | Anatidae | Anser albifrons | Common White-Fronted Goose, Greater White-Fron... | Approved | Present | Unknown | Occasional | Vagrant | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 117296 | YOSE-2932 | Yosemite National Park | Vascular Plant | Ranunculales | Ranunculaceae | Thalictrum fendleri | Fendler's Meadowrue, Fendler's Meadow-Rue | In Review | | | | | |
| 117342 | YOSE-2978 | Yosemite National Park | Vascular Plant | Rosales | Rosaceae | Potentilla glandulosa | Gland Cinquefoil, Sticky Cinquefoil | In Review | | | | | |
| 117347 | YOSE-2983 | Yosemite National Park | Vascular Plant | Rosales | Rosaceae | Potentilla gracilis | Graceful Cinquefoil, Northwest Cinquefoil, Sle... | In Review | | | | | |
| 117862 | ZION-1410 | Zion National Park | Reptile | Testudines | Testudinidae | Gopherus agassizii | Desert Tortoise | Approved | Present | Unknown | Rare | Breeder | Threatened |


In [None]:
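def chuyen_doi(x):
    # chuyen_doi ("convert"): keep "Native" / "Not Native", map everything else (including NaN) to "Unknown"
    if x not in ['Native', 'Not Native']:
        return 'Unknown'
    else:
        return x

df_species_1['Nativeness'] = df_species_1['Nativeness'].apply(chuyen_doi)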

In [None]:
df_species_1['Nativeness'].unique()

array(['Native', 'Not Native', 'Unknown'], dtype=object)

In [None]:
round(len(df_species_1[df_species_1['Nativeness']=='Unknown'])/len(df_species_1)*100)

27
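The same conversion can be done without a Python function by using `Series.where`, which keeps values where the condition holds and substitutes a replacement elsewhere. The mean of a boolean mask then gives the share of "Unknown" directly. This is a sketch on hypothetical values; it is an alternative to `chuyen_doi` + `apply`, not the lab's required method.

```python
import numpy as np
import pandas as pd

# Hypothetical Nativeness values, including the odd ones seen in the real column
nativeness = pd.Series(["Native", "Not Native", "Unknown", np.nan, "Not Confirmed", "Native"])

# Keep "Native"/"Not Native"; everything else (including NaN) becomes "Unknown"
cleaned = nativeness.where(nativeness.isin(["Native", "Not Native"]), "Unknown")

# The mean of a boolean mask is the fraction of True values
pct_unknown = round(cleaned.eq("Unknown").mean() * 100)
print(cleaned.tolist())
print(pct_unknown)  # 50
```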

In [None]:
df_species_1['Nativeness'].isin(['Native', 'Not Native'])  # shows True/False for each row

0          True
1          True
2          True
3          True
4         False
...
119243     True
119244     True
119245     True
119246     True
Name: Nativeness, dtype: bool


### Question 14 (EXPRESSION - 5 pts)

List the top 10 parks with the highest ratio of non-native to native species.

- Step 1: look at the table, then identify the columns needed.
- Step 2: compute the ratio, then take the top 10.

In [None]:
# @title Your answer should look like this:


Park Name
Hawaii Volcanoes National Park    0.767101
Haleakala National Park           0.726528
Dry Tortugas National Park        0.539945
Acadia National Park              0.407906
Everglades National Park          0.337884
Biscayne National Park            0.320905
Shenandoah National Park          0.255593
Redwood National Park             0.235445
Hot Springs National Park         0.227706
Cuyahoga Valley National Park     0.181477
dtype: float64

In [None]:
#YOUR CODE HERE
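# bang ("table"): one row per park, one column per Nativeness value, filled with record counts
bang = df_species.groupby(['Park Name', 'Nativeness']).size().unstack(fill_value=0)

# 'ti le' ("ratio"): non-native to native species per park
bang['ti le'] = bang['Not Native'] / bang['Native']

bang['ti le'].sort_values(ascending=False).head(10)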


Park Name
Hawaii Volcanoes National Park    0.767101
Haleakala National Park           0.726528
Dry Tortugas National Park        0.539945
Acadia National Park              0.407906
Everglades National Park          0.337884
Biscayne National Park            0.320905
Shenandoah National Park          0.255593
Redwood National Park             0.235445
Hot Springs National Park         0.227706
Cuyahoga Valley National Park     0.181477
Name: ti le, dtype: float64
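`pd.crosstab` builds the same park-by-nativeness count table in a single call. Selecting the "Not Native" and "Native" columns by name keeps the ratio independent of any other category (such as "Unknown"). This is a sketch on hypothetical rows, as an alternative to `groupby(...).unstack(...)`.

```python
import pandas as pd

# Hypothetical records: P1 has 2 native / 1 non-native, P2 has 3 native / 1 non-native / 1 unknown
toy = pd.DataFrame({
    "Park Name":  ["P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"],
    "Nativeness": ["Native", "Not Native", "Native",
                   "Native", "Native", "Native", "Not Native", "Unknown"],
})

table = pd.crosstab(toy["Park Name"], toy["Nativeness"])
ratio = (table["Not Native"] / table["Native"]).sort_values(ascending=False)
print(ratio.round(3))
```

If a park had no "Native" records, the division would yield `inf` (or `NaN` for 0/0). Such parks would need to be filtered out before ranking.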


## 3. Explore 〉The Hawaii Volcanoes National Park

As observed in Q12 and Q14, the Hawaii Volcanoes National Park is among the top 10 parks both by number of species of concern *(including those listed as "Species of Concern", "Endangered", and "Threatened")* and by ratio of non-native to native species. This could be an interesting lead for our storytelling.

Let us venture further.

<img src='https://travel.home.sndimg.com/content/dam/images/travel/fullset/2014/02/21/6d/horseback-riding_ss_005.rend.hgtvcom.616.347.suffix/1491584202364.jpeg' width=500>

*Hawaii Volcanoes National Park*

We need to compute two values:
- the size of the Hawaii Volcanoes National Park
- the average `Acres` value across all parks

### Question 15

How is the size of Hawaii Volcanoes National Park compared to the average size of all parks?

A. Smaller, only by one-third the average size.

B. Huge, by three times the average size.

C. Standard, about the average park size.

D. 0 because climate changes and the ocean will swallow Hawaii soon.

A

In [None]:
#YOUR CODE HERE
havo_acres = df_parks.loc[df_parks['Park Name'] == 'Hawaii Volcanoes National Park', 'Acres'].values[0]
havo_acres < df_parks['Acres'].mean()

True

In [None]:
round(df_parks['Acres'].mean())

927929

In [None]:
# @title Run to reveal answer
print("Correct answer is: A")
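The answer options are phrased as fractions or multiples of the average, so a ratio is more informative than the `True` printed above. This sketch plugs in two values taken from this notebook's outputs: Hawaii Volcanoes' 323,431 acres from the southern-parks table at the end, and the mean of 927,929.142857 acres from Question 1.

```python
havo_acres = 323_431          # Hawaii Volcanoes National Park (from the df_south output)
mean_acres = 927_929.142857   # average park size (from Question 1)

ratio = havo_acres / mean_acres
print(round(ratio, 2))  # 0.35, i.e. roughly one-third of the average
```

On the live data, the same ratio can be computed as `df_parks.loc[df_parks["Park Code"] == "HAVO", "Acres"].iloc[0] / df_parks["Acres"].mean()`.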

One thing to be aware of when working with data is that the *mean* can be distorted by *outliers* (extremely large or tiny values). Imagine a company where a few employees earn very high salaries: they push the average salary of the whole company far above what a typical employee earns. To avoid biased comparisons, the *median* is a better metric; we will learn more about it in the next chapter, *Clean Data*. The median is the middle value when the data is sorted, so it is largely unaffected by outliers and gives a fairer comparison.

You can try to compute the median by using `.median()`.
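The salary example can be checked with a few hypothetical numbers: five typical salaries and one outlier.

```python
import pandas as pd

# Hypothetical salaries: five typical employees and one outlier
salaries = pd.Series([40_000, 42_000, 45_000, 47_000, 50_000, 1_000_000])

print(salaries.mean())    # 204000.0, pulled up by the single outlier
print(salaries.median())  # 46000.0, the middle of the sorted values
```

On the parks data, `df_parks["Acres"].median()` gives the value in the `50%` row of `df_parks.describe()`.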

The Hawaii Volcanoes National Park is a mid-size park. Its area (323,431 acres) is above the median (238,764.5 acres), which means it is larger than at least half of the national parks in the dataset.

In [None]:
df_parks.describe()

| | Acres | Latitude | Longitude |
|---|---|---|---|
| count | 56.0 | 56.0 | 56.0 |
| mean | 927929.1 | 41.233929 | -113.234821 |
| std | 1709258.0 | 10.908831 | 22.440287 |
| min | 5550.0 | 19.38 | -159.28 |
| 25% | 69010.5 | 35.5275 | -121.57 |
| 50% | 238764.5 | 38.55 | -110.985 |
| 75% | 817360.2 | 46.88 | -103.4 |
| max | 8323148.0 | 67.78 | -68.21 |


### Question 16

Below is an overview statement about the Hawaii Volcanoes National Park:

Despite being a mid-size national park and accounting for only 2 percent of the data, the Hawaii Volcanoes National Park:

- has a large population of species of concern* with <1> species, higher than <2> of all parks in the dataset.
- has a significant ratio of non-native to native species (77%). That means among every 100 known species, there are <3> non-native ones.
Fill in the blanks <1> and <2> to finish an overview statement about the Hawaii Volcanoes National Park.

*(species of concern includes those listed as "Species-of-Concern", "Endangered", or "Threatened")*

A. <1>: 123 ⎯ <2>: 67%

B. <1>: 123 ⎯ <2>: 88%

C. <1>: 114 ⎯ <2>: 67%

D. <1>: 114 ⎯ <2>: 88%

In [None]:
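# Keep only the species of concern ("Species of Concern", "Endangered", "Threatened")
df_concern = df_species[df_species['Conservation Status'].isin(['Species of Concern', 'Endangered', 'Threatened'])]

# Method 1: count Species IDs per park, from highest to lowest
bang_Hawaii = df_concern.groupby('Park Name')['Species ID'].count().sort_values(ascending=False)
print(bang_Hawaii)  # Hawaii Volcanoes National Park has 114 species of concern

# Method 2 (top 10 only): df_concern.groupby("Park Name").count()["Species ID"].sort_values(ascending=False)[:10]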

Park Name
Death Valley National Park                        217
Redwood National Park                             156
Channel Islands National Park                     136
Big Bend National Park                            132
Grand Canyon National Park                        129
Great Smoky Mountains National Park               123
Hawaii Volcanoes National Park                    114
Joshua Tree National Park                         106
Carlsbad Caverns National Park                    100
Zion National Park                                 97
Great Basin National Park                          96
Everglades National Park                           95
Saguaro National Park                              94
Capitol Reef National Park                         91
Yosemite National Park                             90
Mesa Verde National Park                           88
Yellowstone National Park                          88
Guadalupe Mountains National Park                  87
...

In [None]:
bang_Hawaii.shape  # Hawaii ranks 7th of 56 parks => (56 - 7) / 56 * 100 = 87.5%, i.e. about 88%

(56,)
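Rather than reading off the rank by hand, the share of parks with fewer species of concern than Hawaii can be computed from the per-park counts. This is a sketch on hypothetical counts (the park names other than Hawaii and all numbers are made up). It uses a strict `<`, so parks tied with Hawaii are not counted as "lower".

```python
import pandas as pd

# Hypothetical species-of-concern counts per park
concern_counts = pd.Series({
    "P1": 50, "P2": 40, "Hawaii Volcanoes National Park": 30,
    "P3": 25, "P4": 20, "P5": 15, "P6": 10, "P7": 5,
})

havo = concern_counts["Hawaii Volcanoes National Park"]
share_lower = (concern_counts < havo).mean()  # fraction of parks with fewer species of concern
print(f"{share_lower:.1%}")  # 62.5%
```

Applied to `bang_Hawaii` above, `(bang_Hawaii < bang_Hawaii["Hawaii Volcanoes National Park"]).mean()` should match the hand computation of 49 / 56 = 87.5%, since the next park after Hawaii has 106.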

In [None]:
# @title Run to reveal answer
print("Correct answer is: D")

### Question 17

Below are two overview statements about the Hawaii Volcanoes National Park:

Despite being a mid-size national park and accounting for only 2 percent of the data, the Hawaii Volcanoes National Park:

- has a large population of species of concern* with <1> species, higher than <2> of all parks in the dataset.
- has a significant ratio of non-native to native species (77%). That means among every 100 known species, there are <3> non-native ones.
Fill in the blank <3> to finish the second overview statement about the Hawaii Volcanoes National Park.

 *(species of concern includes those listed as "Species-of-Concern", "Endangered", or "Threatened")*

A. 40

B. 43

C. 70

D. 77

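Let $x$ be the number of native species among every 100 known (native or non-native) species, and let $r \approx 0.767$ be the non-native to native ratio from Q14:

$$x + r\,x = 100 \;\Rightarrow\; x = \frac{100}{1 + r} \approx \frac{100}{1.767} \approx 56.6, \qquad 100 - x \approx 43.4 \approx 43$$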

In [None]:
#YOUR CODE HERE
is_concern = df_species['Conservation Status'].isin(['Species of Concern', 'Endangered', 'Threatened'])
is_hawaii = df_species['Park Name'] == 'Hawaii Volcanoes National Park'
df_species[is_concern & is_hawaii]

| | Species ID | Park Name | Category | Order | Family | Scientific Name | Common Names | Record Status | Occurrence | Nativeness | Abundance | Seasonality | Conservation Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 58536 | HAVO-1008 | Hawaii Volcanoes National Park | Mammal | Carnivora | Phocidae | Monachus schauinslandi | Hawaiian Monk Seal | Approved | Present | Native | Uncommon | | Endangered |
| 58537 | HAVO-1009 | Hawaii Volcanoes National Park | Mammal | Chiroptera | Vespertilionidae | Lasiurus cinereus semotus | Hawaiian Hoary Bat | Approved | Present | Native | Unknown | Breeder | Endangered |
| 58544 | HAVO-1016 | Hawaii Volcanoes National Park | Bird | Accipitriformes | Accipitridae | Buteo solitarius | ʻIo, Hawaiian Hawk | Approved | Present | Native | Uncommon | Breeder | Endangered |
| 58546 | HAVO-1018 | Hawaii Volcanoes National Park | Bird | Anseriformes | Anatidae | Branta sandvicensis | Nēnē, Hawaiian Goose | Approved | Present | Native | Uncommon | Breeder | Endangered |
| 58554 | HAVO-1026 | Hawaii Volcanoes National Park | Bird | Charadriiformes | Scolopacidae | Numenius tahitiensis | Kioea, Bristle-Thighed Curlew | Approved | Present | Native | Occasional | Migratory | Species of Concern |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61461 | HAVO-3933 | Hawaii Volcanoes National Park | Insect | Lepidoptera | Pyralidae | Omiodes iridias | | Approved | Present | Native | Unknown | Breeder | Species of Concern |
| 61463 | HAVO-3935 | Hawaii Volcanoes National Park | Insect | Lepidoptera | Pyralidae | Omiodes pritchardii | | Approved | Not Confirmed | Native | | | Species of Concern |
| 61553 | HAVO-4025 | Hawaii Volcanoes National Park | Insect | Odonata | Coenagrionidae | Megalagrion nesiotes | | Approved | Present | Native | Unknown | Breeder | Endangered |
| 61566 | HAVO-4038 | Hawaii Volcanoes National Park | Insect | Orthoptera | Oecanthidae | Thaumatogryllus cavicola | | Approved | Present | Native | Unknown | Breeder | Species of Concern |


In [None]:
df_species[df_species['Park Name']=='Hawaii Volcanoes National Park']['Conservation Status'].value_counts()

Conservation Status
Species of Concern     65
Endangered             44
Proposed Endangered    10
Threatened              5
Under Review            1
Name: count, dtype: int64


In [None]:
df_species.shape

(119248, 13)

In [None]:
# @title Run to reveal answer
print("Correct answer is: B")

### Task: For more concise code, select only the species of the Hawaii Volcanoes National Park and assign them to a new variable `df_hawaii`.

In [None]:
# YOUR CODE HERE
df_hawaii = df_species[df_species["Park Name"] == "Hawaii Volcanoes National Park"]
df_hawaii

### Question 18

The overview report continues:

The Hawaii Volcanoes National Park is rich in biodiversity. The park is home to many different categories of species. Among them, <4> are the majority.

Fill in the blank <4> to finish the statement.

A. Invertebrates, Fungi, Fish

B. Insects, Vascular Plants, Invertebrates

C. Vascular Plants, Birds, Insects

D. Slug/Snails, Reptiles, Spiders

In [None]:
#YOUR CODE HERE
df_hawaii["Category"].value_counts()
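Because `value_counts()` sorts from most to least frequent, the three majority categories are its first three index labels. This is a sketch on a hypothetical category column; the counts are chosen without ties so the order is unambiguous.

```python
import pandas as pd

# Hypothetical categories: Insect x4, Vascular Plant x3, Invertebrate x2, Bird x1
toy_hawaii = pd.DataFrame({"Category": (
    ["Insect"] * 4 + ["Vascular Plant"] * 3 + ["Invertebrate"] * 2 + ["Bird"]
)})

top3 = toy_hawaii["Category"].value_counts().head(3).index.tolist()
print(top3)  # ['Insect', 'Vascular Plant', 'Invertebrate']
```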

In [None]:
# @title Run to reveal answer
print("Correct answer is: B")

### Task ⎯ Understanding your data is crucial. What is an "Invertebrate"?

A. A type of chameleon that can "invert" the color of its surrounding and reflect it on their skin.

B. A type of crustacean that can convert its shell into multiple forms, like Doctor Strange in the Multiverse.

C. A large group of land plants that have lignified tissues for conducting water and minerals throughout the plant.

D. A cold-blooded animal with no backbone.

In [None]:
#YOUR CODE HERE

In [None]:
# @title Run to reveal answer
print("Answer: D")

### Question 19

The overview report continues:

The species of Hawaii reflect the park's southern environment. Hawaii Volcanoes and Yellowstone National Park (another volcanic park, but located inland and in a temperate zone) both have large populations of insects and vascular plants. However, the Hawaii Volcanoes National Park is home to many invertebrates, nonvascular plants, spiders, and scorpions, while Yellowstone National Park is the land of <5>.

Fill in the blank <5> to finish the statement.

A. Insect and Vascular Plant

B. Mammals and Fish

C. Reptiles and Fungi

D. Birds and Invertebrates

<img src='https://www.gannett-cdn.com/-mm-/b448c4bd0e6486684d5809dd0c1f1ad5460cad27/c=0-254-4928-3026/local/-/media/2017/05/18/USATODAY/usatsports/B9327681055Z.1_20170518174842_000_GCPI2D0G9.1-0.jpg?width=3200&height=1800&fit=crop&format=pjpg&auto=webp' width=500>

*Yellowstone National Park*

In [None]:
#YOUR CODE HERE
df_species[df_species["Park Name"] == "Yellowstone National Park"][
    "Category"
].value_counts()

In [None]:
# @title Run to reveal answer
print("Correct answer is: A")

### Question 20

The overview report continues:

As a home to spiders and scorpions, Hawaii Volcanoes National Park has a population of these eight-legged creatures close to <6>'s, and behind only <7>. All four national parks are located in the southern states of the US. Together they account for more than 64% of all spider and scorpion records in the dataset.

Fill in the blanks <6> and <7> to finish the statement.

A. <6>: North Cascades National Park ⎯ <7>: Death Valley National Park & Cuyahoga Valley National Park

B. <6>: Great Smoky Mountains National Park ⎯ <7>: Grand Canyon National Park & Joshua Tree National Park

C. <6>: Badlands National Park ⎯ <7>: Congaree National Park & Redwood National Park

D. <6>: North Cascades National Park ⎯ <7>: Grand Canyon National Park & Joshua Tree National Park

In [None]:
#YOUR CODE HERE
df_species[df_species["Category"] == "Spider/Scorpion"].groupby(
    "Park Name"
).size().sort_values(ascending=False)
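The "more than 64%" claim can be checked by dividing the sum of the four largest per-park counts by the total. This is a sketch on hypothetical counts. On the real data, the Series from the cell above would take the place of `spider_counts`.

```python
import pandas as pd

# Hypothetical spider/scorpion record counts per park
spider_counts = pd.Series({"P1": 60, "P2": 40, "P3": 30, "P4": 20, "P5": 10, "P6": 5, "P7": 5})

top4_share = spider_counts.nlargest(4).sum() / spider_counts.sum()
print(f"{top4_share:.0%}")  # 88%
```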

In [None]:
# @title Run to reveal answer
print("Correct answer is: B")

## ✳︎ ⎯ END

This is the end of the lab session. In this lab:
- We practiced using basic Pandas syntax to load the two DataFrames, `df_parks` and `df_species`, and get an overview of them.
- We also discovered some interesting findings about the Hawaii Volcanoes National Park.

This briefly demonstrates the early stage of making an analysis report. Please feel free to continue this notebook as you wish, to discover more insights about the National Parks in the United States. A recommended way is to continue looking into other parks that lie around the same latitude zone as Hawaii. That way, we can extend our comparison from tropical vs. temperate to tropical island vs. tropical inland.

In [None]:
# Hypothesis: Hawaii Volcanoes National Park is more diverse in terms of species
# than other parks at similar latitudes. Take the 10 southernmost parks, sorted by size.

df_south = df_parks.sort_values(by="Latitude")[:10].sort_values(
    by="Acres", ascending=False
)
df_south

| | Park Code | Park Name | State | Is Multi States | Acres | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 17 | EVER | Everglades National Park | FL | False | 1508538 | 25.32 | -80.93 |
| 3 | BIBE | Big Bend National Park | TX | False | 801163 | 29.25 | -103.25 |
| 28 | HAVO | Hawaii Volcanoes National Park | HI | False | 323431 | 19.38 | -155.2 |
| 4 | BISC | Biscayne National Park | FL | False | 172924 | 25.65 | -80.08 |
| 46 | SAGU | Saguaro National Park | AZ | False | 91440 | 32.25 | -110.5 |
| 26 | GUMO | Guadalupe Mountains National Park | TX | False | 86416 | 31.92 | -104.87 |
| 16 | DRTO | Dry Tortugas National Park | FL | False | 64701 | 24.63 | -82.87 |
| 9 | CAVE | Carlsbad Caverns National Park | NM | False | 46766 | 32.17 | -104.44 |
| 27 | HALE | Haleakala National Park | HI | False | 29094 | 20.72 | -156.17 |
| 11 | CONG | Congaree National Park | SC | False | 26546 | 33.78 | -80.78 |


In [None]:
df_south_sp = df_species[df_species["Park Name"].isin(df_south["Park Name"].values)]
df_south_sp.groupby("Park Name").size().sort_values(ascending=False)

Park Name
Hawaii Volcanoes National Park       3298
Haleakala National Park              2580
Congaree National Park               2321
Big Bend National Park               2269
Everglades National Park             2084
Saguaro National Park                1834
Guadalupe Mountains National Park    1746
Biscayne National Park               1726
Carlsbad Caverns National Park       1536
Dry Tortugas National Park            848
dtype: int64


<img src='https://upload.wikimedia.org/wikipedia/commons/f/f6/Canyon%2C_Rio_Grande%2C_Texas.jpeg' width=500>

*Big Bend National Park*

The world is yours now. Continue as you wish.

In [None]:
# YOUR CODE HERE