# Clean species list
This notebook reads the original 2021 European Red List of Birds XLSX data and cleans it by dropping the columns and rows that are not needed. The cleaned data is then saved as a CSV file.

The original Excel file was downloaded from the [European Red List of Birds 2021 website](https://www.iucnredlist.org/resources/erlob).

## Setup

In [1]:
import pandas as pd

In [2]:
raw_species_list_path = (
    "/workspaces/chirpnet/resources/raw_species_list/ERLoB2021_categories.xlsx"
)
cleaned_species_list_path = (
    "/workspaces/chirpnet/resources/species_list/european_species_list.csv"
)

## Clean data

In [3]:
# In the raw Excel file, the species data is on the sheet "2021_European Red List".
# The first two rows of that sheet contain clarifications we do not need, so we skip
# them.
species_list = pd.read_excel(  # type: ignore
    raw_species_list_path,
    sheet_name="2021_European Red List",
    skiprows=2,
)

# We only need the common name and scientific name columns for further processing
species_list = species_list[["Common Name", "Scientific Name"]]

# Sort alphabetically by common name and renumber the index from 0
species_list = species_list.sort_values("Common Name", ignore_index=True)  # type: ignore
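
The column selection and sort above can be tried without the Excel file. The following is a minimal sketch on a hand-made frame: the three rows, the extra `Category` column, and the names `raw` and `demo` are assumptions for illustration only and are not taken from the real sheet, whose names and columns may differ.

```python
import pandas as pd

# Hypothetical stand-in for the real sheet. "Category" and its values are
# invented placeholders representing a column we do not need.
raw = pd.DataFrame(
    {
        "Common Name": ["Mute Swan", "Common Eider", "Barn Owl"],
        "Scientific Name": ["Cygnus olor", "Somateria mollissima", "Tyto alba"],
        "Category": ["x", "y", "z"],
    }
)

# Keep only the two name columns.
demo = raw[["Common Name", "Scientific Name"]]

# Sort by common name; ignore_index=True renumbers the rows 0..n-1 instead of
# keeping the original row labels.
demo = demo.sort_values("Common Name", ignore_index=True)

print(demo)
```

Without `ignore_index=True` the sorted frame would keep the original row labels, which is harmless here because the index is not written to the CSV, but a clean index is easier to read when inspecting the frame.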

## Store results

In [4]:
species_list.to_csv(cleaned_species_list_path, index=False)
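
A quick way to confirm what `index=False` does is to write a small frame and read it back. This is only a sketch under assumptions: it uses a temporary directory and a hypothetical file name (`demo_species_list.csv`) rather than the project paths, and two illustrative rows instead of the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative rows only, not taken from the real species list.
demo = pd.DataFrame(
    {
        "Common Name": ["Barn Owl", "Mute Swan"],
        "Scientific Name": ["Tyto alba", "Cygnus olor"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_species_list.csv"  # hypothetical file name
    # index=False leaves the row index out of the file.
    demo.to_csv(demo_path, index=False)
    reloaded = pd.read_csv(demo_path)

# The reloaded frame has only the two name columns and no extra index column.
print(list(reloaded.columns))
print(len(reloaded))
```

If the index were written, reading the file back with the default `read_csv` settings would produce an additional unnamed column, which downstream code would then have to drop.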