# Study Review

## Description

In 2021, landbirds and landbird habitat were surveyed on the island of Molokai in reserves managed by The Nature Conservancy and the Hawaii Department of Land and Natural Resources, as well as in Kalaupapa National Historical Park. The surveys provide information for monitoring long-term trends in forest bird distribution, density, and abundance. Based on the history of point-transect distance sampling on Molokai, a 3,527-ha core area was defined to assess long-term population trends since 1979, when the first surveys were conducted. Areas of Kalaupapa National Historical Park were excluded from the core area because they have been surveyed on an intermittent schedule. However, this dataset provides all of the 2021 survey data, covering areas both inside and outside the core area; membership in the core area is indicated by the “Core_Area” field. Summaries of the 2021 survey results for the core area and for Kalaupapa National Historical Park will be provided in separate products, such as an NPS Natural Resource Report and an article in a relevant scientific journal. The 2021 survey effort is expected to be repeated every 5 to 6 years in collaboration with the NPS Pacific Island Network (PACN), The Nature Conservancy, the Maui Forest Bird Recovery Project, and the Hawaii Department of Land and Natural Resources.
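Because the dataset mixes core-area and non-core records, analyses restricted to the core area need to filter on the “Core_Area” field. The sketch below is illustrative only: it assumes, hypothetically, that `Core_Area` is stored as a `"Yes"`/`"No"` text flag on a station-level table. The actual table that holds the field and its coded values should be checked against the dataset metadata.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical records; the real Core_Area coding and table may differ.
points = pd.DataFrame({
    "Station": ["A01", "A02", "K01", "K02"],
    "Core_Area": ["Yes", "Yes", "No", "No"],
})

def split_by_core_area(df: pd.DataFrame, flag_col: str = "Core_Area",
                       core_value: str = "Yes") -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (inside_core, outside_core) using an assumed flag value.

    Rows with a missing flag end up in outside_core, because NaN != core_value.
    """
    inside = df[df[flag_col] == core_value]
    outside = df[df[flag_col] != core_value]
    return inside, outside

inside, outside = split_by_core_area(points)
print(len(inside), len(outside))  # 2 2
```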

## Citation 
Judge S and Kozar K. 2023. Pacific Island Network Molokai Landbird Surveys Dataset 2021. National Park Service. Fort Collins CO https://doi.org/10.57830/2300147

## Website for additional information
https://irma.nps.gov/DataStore/Reference/Profile/2300147

## Info

This is a public dataset from the National Park Service (NPS) on Pacific Island landbirds. The collection consists of 14 CSV files. Additional data are available at https://irma.nps.gov/DataStore/Reference/Profile/2300107 for the years 2010, 2015/2016, and 2019/2020, with partial data for 2011, 2012, 2017, and 2018.
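Rather than listing each file by hand, every CSV in the downloaded folder could be loaded into a dictionary keyed by file name. This is a minimal sketch: `load_csv_folder` is a hypothetical helper, the folder name in the usage line is the one the cells below read from, and it assumes every file is plain comma-separated text that `pd.read_csv` can parse with default settings.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

def load_csv_folder(folder: str | Path) -> dict[str, pd.DataFrame]:
    """Read every *.csv file in `folder` into a dict keyed by the file stem."""
    folder = Path(folder)
    return {path.stem: pd.read_csv(path) for path in sorted(folder.glob("*.csv"))}

# Usage (folder name as in the notebook cells):
# tables = load_csv_folder("Pacific Island Network Landbird Monitoring Dataset")
# print(len(tables), sorted(tables))
```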

```python
# use pandas for cleaning and manipulation
import pandas as pd
```

```python
# create a new species table that combines the species and alternate-names tables
species = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tlu_Species.csv')
alt_species = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Species_Alternate_Names.csv')
# inner merge: keeps only Species_ID values that appear in both tables
full_species = pd.merge(species, alt_species, how="inner", on="Species_ID")
```
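An inner merge silently drops any species that has no row in the alternate-names table (and vice versa). One way to see what is lost is an outer merge with `indicator=True`, which adds a `_merge` column recording whether each row came from `"both"`, `"left_only"`, or `"right_only"`. The tables below are made-up stand-ins for `tlu_Species` and `tbl_Species_Alternate_Names`; the `Common_Name` and `Alt_Name` columns are illustrative.

```python
import pandas as pd

# Toy stand-ins for the two species tables (values are illustrative only)
species = pd.DataFrame({"Species_ID": [1, 2, 3],
                        "Common_Name": ["Species A", "Species B", "Species C"]})
alt_species = pd.DataFrame({"Species_ID": [1, 2],
                            "Alt_Name": ["alt A", "alt B"]})

# Outer merge keeps every key; _merge records which table(s) each row came from
check = pd.merge(species, alt_species, how="outer", on="Species_ID", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Species_ID", "_merge"]])

# The inner merge used in the notebook keeps only the "both" rows
inner = pd.merge(species, alt_species, how="inner", on="Species_ID")
assert len(inner) == (check["_merge"] == "both").sum()
```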

```python
# create a new events table that combines the events and event-details tables
events = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Events.csv')
events_details = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Event_Details.csv')
# inner merge: keeps only Event_ID values that appear in both tables
full_events = pd.merge(events, events_details, how="inner", on="Event_ID")
```
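If each event is expected to have at most one details record, `pd.merge` can enforce that with `validate="one_to_one"`, which raises `pandas.errors.MergeError` when a key is duplicated on either side. Whether the event tables really are one-to-one is an assumption to confirm against the ER diagram; the `Observer` column and all values below are illustrative, and `merge_events` is a hypothetical helper.

```python
import pandas as pd
from pandas.errors import MergeError

events = pd.DataFrame({"Event_ID": [10, 11], "Start_Date": ["2021-05-01", "2021-05-02"]})
# Event 11 appears twice here, which breaks the one-to-one assumption
events_details = pd.DataFrame({"Event_ID": [10, 11, 11], "Observer": ["a", "b", "c"]})

def merge_events(events: pd.DataFrame, details: pd.DataFrame) -> pd.DataFrame:
    """Inner-merge events with their details, failing loudly on duplicate keys."""
    return pd.merge(events, details, how="inner", on="Event_ID", validate="one_to_one")

try:
    merge_events(events, events_details)
    duplicate_detected = False
except MergeError:
    duplicate_detected = True

# After removing the duplicate key, the merge succeeds
ok = merge_events(events, events_details.drop_duplicates("Event_ID"))
```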

## Begin cleaning tables

- create new tables with reduced columns, as determined by the ER diagram ✅
- remove null records ✅
- reset indices ✅
- add the cleaned DataFrames to a dictionary ✅
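The four steps above repeat in nearly every cell, so they could be folded into one helper. `clean_table` is a hypothetical name, not part of the original notebook; this sketch drops the listed columns, drops rows missing any required field, resets the index, and stores the result in a dictionary. The example table is illustrative.

```python
import pandas as pd

def clean_table(df: pd.DataFrame, store: dict, name: str,
                drop_cols=(), required=()) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to one table and record it in `store`."""
    cleaned = df.drop(columns=list(drop_cols))            # reduce columns
    if required:
        cleaned = cleaned.dropna(subset=list(required))   # remove null records
    cleaned = cleaned.reset_index(drop=True)              # reset indices
    store[name] = cleaned                                 # add to dictionary
    return cleaned

# Usage with a small illustrative table
raw = pd.DataFrame({"Station": ["S1", None, "S3"],
                    "Geo_Datum": ["d1", "d1", "d1"]})
tables = {}
clean_table(raw, tables, "clean_stations", drop_cols=["Geo_Datum"], required=["Station"])
```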

```python
# collect each cleaned DataFrame under its output file name
df_dict = {}
```

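```python
# full_species
#   - drop Source, TE_Status, BNA_Account, Updated_Date, Update_By, Update_Notes
#   - drop records with a blank Scientific_Name
#   - drop records whose Family is missing or the text "None"
clean_species = full_species.drop(columns=['Source', 'TE_Status', 'BNA_Account',
                                           'Updated_Date', 'Update_By', 'Update_Notes'])
clean_species = clean_species.dropna(subset=['Scientific_Name'])
# pandas >= 2.0 reads the string "None" as NaN by default, so check for both
clean_species = clean_species[clean_species['Family'].notna() & (clean_species['Family'] != 'None')]
# reset_index returns a new DataFrame, so assign it; drop=True discards the old index
clean_species = clean_species.reset_index(drop=True)
df_dict['clean_species'] = clean_species
```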
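The `Family != 'None'` filter depends on how `read_csv` parses the text `None`. Since pandas 2.0, `None` is among the default missing-value markers, so it may arrive as NaN rather than as a string. Passing `keep_default_na=False` together with an explicit `na_values` list makes the parsing independent of the pandas version. The CSV snippet below is illustrative; which markers the real files use for missing values is an assumption to check.

```python
import io

import pandas as pd

csv_text = "Species_ID,Family\n1,Fringillidae\n2,None\n3,\n"

# Treat only empty fields as missing; keep the literal text "None" as a string
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

# Only rows with a real family name survive both checks
has_family = df["Family"].notna() & (df["Family"] != "None")
filtered = df[has_family].reset_index(drop=True)
```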

```python
# full_events
#   - drop P1 through P10, Protocol_Name, Repeat_Sample, Habitat_Date, Event_Notes,
#     Entered_By, Updated_By, Updated_Date, Verified, Verified_By, Verified_Date,
#     Certified, Certified_By, Certified_Date, QA_notes
#   - drop records with a blank Start_Date
p_columns = [f'P{i}' for i in range(1, 11)]  # 'P1' through 'P10'
clean_events = full_events.drop(columns=p_columns + [
    'Protocol_Name', 'Repeat_Sample', 'Habitat_Date', 'Event_Notes', 'Entered_By',
    'Updated_By', 'Updated_Date', 'Verified', 'Verified_By', 'Verified_Date',
    'Certified', 'Certified_By', 'Certified_Date', 'QA_notes'])
clean_events = clean_events.dropna(subset=['Start_Date'])
clean_events = clean_events.reset_index(drop=True)
df_dict['clean_events'] = clean_events
```
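`dropna(subset=['Start_Date'])` only removes blank dates; a malformed date string would survive. One option is to parse the column with `pd.to_datetime(..., errors="coerce")`, which turns unparseable values into `NaT`, and then drop those as well. The `%Y-%m-%d` format is an assumption for illustration; the real `Start_Date` values may use a different layout.

```python
import pandas as pd

events = pd.DataFrame({"Event_ID": [1, 2, 3, 4],
                       "Start_Date": ["2021-04-12", None, "not a date", "2021-05-03"]})

# Missing and unparseable values both become NaT instead of raising
events["Start_Date"] = pd.to_datetime(events["Start_Date"], format="%Y-%m-%d", errors="coerce")
clean = events.dropna(subset=["Start_Date"]).reset_index(drop=True)
```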

```python
# tbl_Detections
#   - drop records with a blank Detection
tbl_Detections = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Detections.csv')
clean_detections = tbl_Detections.dropna(subset=['Detection'])
clean_detections = clean_detections.reset_index(drop=True)
df_dict['clean_detections'] = clean_detections
```
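Before dropping rows, it can help to record how many are removed so that unexpected losses are noticed. A minimal sketch; `dropna_report` is a hypothetical helper, and the detection codes are illustrative.

```python
import pandas as pd

def dropna_report(df: pd.DataFrame, subset: list[str]) -> tuple[pd.DataFrame, int]:
    """Drop rows missing any `subset` column and return (result, rows_removed)."""
    kept = df.dropna(subset=subset)
    return kept, len(df) - len(kept)

detections = pd.DataFrame({"Detection": ["S", None, "V", None]})
kept, removed = dropna_report(detections, ["Detection"])
print(f"removed {removed} of {len(detections)} rows")
```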

```python
# tbl_Habitat
#   - drop Canopy_Cover, Canopy_Height, Canopy_Comp, Understory_Comp,
#     Noted_Canopy_Spp_Common, Noted_Canopy_Spp_Scientific
#   - drop records with a blank Event_ID
tbl_Habitat = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Habitat.csv')
clean_habitat = tbl_Habitat.drop(columns=['Canopy_Cover', 'Canopy_Height', 'Canopy_Comp', 'Understory_Comp',
                                          'Noted_Canopy_Spp_Common', 'Noted_Canopy_Spp_Scientific'])
clean_habitat = clean_habitat.dropna(subset=['Event_ID'])
clean_habitat = clean_habitat.reset_index(drop=True)
df_dict['clean_habitat'] = clean_habitat
```
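Because events with a blank `Start_Date` were dropped from `clean_events`, some habitat rows may now point at an `Event_ID` that no longer exists. `Series.isin` gives a quick check for such orphaned records. Whether orphans should be dropped or kept is a judgment call the notebook does not make; the `Habitat_Type` column and the data are illustrative.

```python
import pandas as pd

clean_events = pd.DataFrame({"Event_ID": [1, 2, 4]})
clean_habitat = pd.DataFrame({"Event_ID": [1, 2, 3, 4],
                              "Habitat_Type": ["wet forest", "wet forest", "shrub", "wet forest"]})

# True where the habitat record's event survived event cleaning
has_event = clean_habitat["Event_ID"].isin(clean_events["Event_ID"])
orphans = clean_habitat[~has_event]
print(f"{len(orphans)} habitat record(s) reference a dropped or unknown event")
```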

```python
# tbl_Locations: no columns or records are removed
clean_locations = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Locations.csv')
df_dict['clean_locations'] = clean_locations
```
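`tbl_Locations` is kept without changes. A quick way to confirm that no cleaning is needed is to count missing values per column; the `Unit_Code` column and values below are made-up stand-ins.

```python
import pandas as pd

locations = pd.DataFrame({"Location_ID": [1, 2, 3],
                          "Unit_Code": ["U1", None, "U1"]})

# Number of missing values in each column, largest first
null_counts = locations.isna().sum().sort_values(ascending=False)
print(null_counts)
```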

```python
# tbl_Observations
#   - drop records where Species_ID or Event_ID is blank
tbl_Observations = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Observations.csv')
clean_observations = tbl_Observations.dropna(subset=['Species_ID', 'Event_ID'])
clean_observations = clean_observations.reset_index(drop=True)
df_dict['clean_observations'] = clean_observations
```
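`dropna(subset=['Species_ID', 'Event_ID'])` uses the default `how="any"`, so a row is removed if either ID is missing, which matches the comment "Species_ID or Event_ID is blank". `how="all"` would remove a row only when both are missing. Illustrative data:

```python
import pandas as pd

obs = pd.DataFrame({"Species_ID": [1, None, 3, None],
                    "Event_ID":   [10, 11, None, None]})

# how="any" (the default): drop a row if either ID is missing
any_missing = obs.dropna(subset=["Species_ID", "Event_ID"])
# how="all": drop a row only if both IDs are missing
all_missing = obs.dropna(subset=["Species_ID", "Event_ID"], how="all")
```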

```python
# tbl_Sites: no columns or records are removed
clean_sites = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Sites.csv')
df_dict['clean_sites'] = clean_sites
```
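`tbl_Sites` is also kept as-is. If it serves as a lookup table for joins, its key column should be unique. The key name `Site_ID` is an assumption here (check the ER diagram), and the rows are illustrative.

```python
import pandas as pd

sites = pd.DataFrame({"Site_ID": ["S1", "S2", "S2"],
                      "Site_Name": ["a", "b", "b (dup)"]})

# keep=False flags every row that shares its key with another row
dupes = sites[sites["Site_ID"].duplicated(keep=False)]
is_unique_key = sites["Site_ID"].is_unique
```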

```python
# tbl_Stations
#   - drop Lat_Dir, Long_Dir, Geo_Datum, Updated_Date, Updated_by, Updated_notes
#   - drop records with a blank Station
tbl_Stations = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Stations.csv')
clean_stations = tbl_Stations.drop(columns=['Lat_Dir', 'Long_Dir', 'Geo_Datum',
                                            'Updated_Date', 'Updated_by', 'Updated_notes'])
clean_stations = clean_stations.dropna(subset=['Station'])
clean_stations = clean_stations.reset_index(drop=True)
df_dict['clean_stations'] = clean_stations
```
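The column names dropped across these tables use mixed capitalization (`Updated_By` in one table, `Updated_by` in another), and `DataFrame.drop` raises `KeyError` if any listed column is absent. Passing `errors="ignore"` skips missing names instead. The trade-off is that a misspelled name is then silently left in place, so this sketch also reports which names were not found. The data are illustrative.

```python
import pandas as pd

stations = pd.DataFrame({"Station": ["1", "2"],
                         "Geo_Datum": ["x", "x"],
                         "Updated_by": ["u", "u"]})
to_drop = ["Geo_Datum", "Updated_by", "Updated_By"]  # "Updated_By" is absent here

# Report names that do not exist before ignoring them
missing = [c for c in to_drop if c not in stations.columns]
reduced = stations.drop(columns=to_drop, errors="ignore")
```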

```python
# tbl_Transects
#   - drop Transect_Type, Updated_Date, Updated_By, Updated_Notes
#   - drop records where Location_ID or Transect is blank
tbl_Transects = pd.read_csv('Pacific Island Network Landbird Monitoring Dataset/tbl_Transects.csv')
clean_transects = tbl_Transects.drop(columns=['Transect_Type', 'Updated_Date', 'Updated_By', 'Updated_Notes'])
clean_transects = clean_transects.dropna(subset=['Location_ID', 'Transect'])
clean_transects = clean_transects.reset_index(drop=True)
df_dict['clean_transects'] = clean_transects
```
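`dropna` only catches true missing values. If a `Transect` or `Location_ID` cell contains only spaces, pandas reads it as a string and the row survives. Whether the source CSVs contain such cells is not known; this sketch shows one way to convert empty or whitespace-only strings to NaN before dropping.

```python
import numpy as np
import pandas as pd

transects = pd.DataFrame({"Location_ID": ["L1", "L2", "L3"],
                          "Transect": ["T1", "   ", None]})

# Replace empty or whitespace-only strings with NaN, then drop incomplete rows
normalized = transects.replace(r"^\s*$", np.nan, regex=True)
clean = normalized.dropna(subset=["Location_ID", "Transect"]).reset_index(drop=True)
```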

## Write out the cleaned CSVs for future use

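```python
# write each cleaned DataFrame to <name>.csv in the current working directory;
# index=False omits the row index, which carries no information after reset_index
for name, df in df_dict.items():
    df.to_csv(f'{name}.csv', index=False)
```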
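A variant that writes the files into a dedicated folder and reads each one back to confirm the row count survived the round trip. `export_tables` and the default `clean_csvs` folder name are hypothetical; writing without the index assumes the index carries no information after `reset_index(drop=True)`.

```python
from pathlib import Path

import pandas as pd

def export_tables(tables: dict, output_dir: str = "clean_csvs") -> list[Path]:
    """Write each DataFrame to <output_dir>/<name>.csv and verify row counts."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        path = out / f"{name}.csv"
        df.to_csv(path, index=False)
        # Re-read the file to make sure no rows were lost on write
        if len(pd.read_csv(path)) != len(df):
            raise ValueError(f"row count mismatch for {path}")
        written.append(path)
    return written

# Usage: export_tables(df_dict)
```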