Import the libraries and set the output path for the cleaned CSV file.

In [9]:
import pandas as pd
import numpy as np

OUTPUT_PATH = "../data/cleaned scavenger data.csv"

Read the data, excluding the columns Cara and I used for bookkeeping. I will index by species names rather than by row numbers to make working with the DataFrame more intuitive.

In [10]:
columns = [
    "Species name",
    "Scientific name",
    "Diet",
    "Extent of occurrence",
    "BirdLife Extent of occurrence",
    "Body size",
]

scavengers = pd.read_csv(
    "../data/MacroEco scavenger data.csv", usecols=columns, index_col="Species name"
)
scavengers.head()

Species name,Scientific name,Diet,Extent of occurrence,BirdLife Extent of occurrence,Body size
Black vulture,Coragyps atratus,Obligate,42593865.0,44300000.0,2200
Turkey vulture,Cathartes aura,Obligate,52285085.0,47100000.0,2500
Lesser yellow headed vulture,Cathartes burrovianus,Obligate,19989926.0,19600000.0,950
Greater yellow headed vulture,Cathartes melambrotus,Obligate,7481821.0,7330000.0,1650
King vulture,Sarcoramphus papa,Obligate,17390218.0,22600000.0,3800
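
To illustrate how `usecols` and `index_col` work together, here is a minimal sketch that reads a tiny in-memory CSV instead of the real file. The two data rows are copied from the output above; the `Notes` column is a hypothetical stand-in for a bookkeeping column and is not taken from the actual dataset.

```python
import io

import pandas as pd

# Two rows from the output above plus a hypothetical "Notes" bookkeeping column.
raw_csv = io.StringIO(
    "Species name,Scientific name,Body size,Notes\n"
    "Black vulture,Coragyps atratus,2200,checked\n"
    "Turkey vulture,Cathartes aura,2500,checked\n"
)

# usecols keeps only the listed columns; index_col promotes one of them to the index.
sample = pd.read_csv(
    raw_csv,
    usecols=["Species name", "Scientific name", "Body size"],
    index_col="Species name",
)

# Label-based lookup: no need to know the row number.
black_vulture_size = sample.loc["Black vulture", "Body size"]
```

With the species name as the index, a single value can be fetched by name with `.loc`, which is the convenience the indexing choice above is after.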


Find and display any rows that have `NaN` (Not a Number) values.

In [11]:
scavengers.loc[scavengers.isnull().any(axis=1)]

Species name,Scientific name,Diet,Extent of occurrence,BirdLife Extent of occurrence,Body size
Hooded crow,Corvus cornix,Facultative,21136393.72,,650
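
A per-column count of missing values can complement the row view above. The sketch below uses a small stand-in DataFrame (`sample`, an illustrative name) built from three rows shown in this notebook, so it runs without the original CSV.

```python
import numpy as np
import pandas as pd

# Stand-in for the full dataset: three rows copied from the outputs above.
sample = pd.DataFrame(
    {
        "Scientific name": ["Coragyps atratus", "Cathartes aura", "Corvus cornix"],
        "Diet": ["Obligate", "Obligate", "Facultative"],
        "Extent of occurrence": [42593865.0, 52285085.0, 21136393.72],
        "BirdLife Extent of occurrence": [44300000.0, 47100000.0, np.nan],
        "Body size": [2200, 2500, 650],
    },
    index=pd.Index(
        ["Black vulture", "Turkey vulture", "Hooded crow"], name="Species name"
    ),
)

# Count missing values in each column.
missing_per_column = sample.isnull().sum()

# Names of the rows that contain at least one missing value.
rows_with_missing = sample.index[sample.isnull().any(axis=1)].tolist()
```

The column counts show where the gaps are, and the row list shows which species they affect.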


BirdLife currently lists no data for the extent of occurrence (EOO) for the hooded crow, so let's drop it from our dataset.

In [12]:
scavengers.drop(["Hooded crow"], inplace=True)
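
Dropping by name works here because only one species is affected. If more rows were missing BirdLife data, one possible alternative (not what this notebook does) is `dropna` with `subset`, sketched below on a two-row stand-in frame whose values are copied from the outputs above.

```python
import numpy as np
import pandas as pd

# Stand-in frame: one complete row and the hooded crow row with its missing value.
sample = pd.DataFrame(
    {
        "Extent of occurrence": [42593865.0, 21136393.72],
        "BirdLife Extent of occurrence": [44300000.0, np.nan],
    },
    index=pd.Index(["Black vulture", "Hooded crow"], name="Species name"),
)

# Drop every row that lacks a BirdLife EOO value, whatever the species.
complete = sample.dropna(subset=["BirdLife Extent of occurrence"])
```

This returns a new DataFrame rather than modifying `sample` in place, so the original rows remain available for inspection.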

Let's add new columns that hold the log-transformed values for EOO and body size. We will use base 10 because we are dealing with spatial ranges, and values are easier to interpret in $\log_{10}$, where each unit corresponds to one order of magnitude, than in the natural log, which uses base $e$.

In [13]:
scavengers["log Extent of occurrence"] = np.log10(scavengers["Extent of occurrence"])
scavengers["log Body size"] = np.log10(scavengers["Body size"])

scavengers.head()

Species name,Scientific name,Diet,Extent of occurrence,BirdLife Extent of occurrence,Body size,log Extent of occurrence,log Body size
Black vulture,Coragyps atratus,Obligate,42593865.0,44300000.0,2200,7.629347,3.342423
Turkey vulture,Cathartes aura,Obligate,52285085.0,47100000.0,2500,7.718378,3.39794
Lesser yellow headed vulture,Cathartes burrovianus,Obligate,19989926.0,19600000.0,950,7.300811,2.977724
Greater yellow headed vulture,Cathartes melambrotus,Obligate,7481821.0,7330000.0,1650,6.874007,3.217484
King vulture,Sarcoramphus papa,Obligate,17390218.0,22600000.0,3800,7.240305,3.579784
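
Since $\log_{10}$ is undefined for zero and negative inputs, a quick positivity check before the transform and a back-transform afterwards can serve as sanity checks. This is a minimal sketch on two rows copied from the output above; the names `sample`, `all_positive`, and `round_trip_ok` are illustrative.

```python
import numpy as np
import pandas as pd

# Two rows copied from the notebook output above.
sample = pd.DataFrame(
    {
        "Extent of occurrence": [42593865.0, 52285085.0],
        "Body size": [2200, 2500],
    },
    index=pd.Index(["Black vulture", "Turkey vulture"], name="Species name"),
)

raw_columns = ["Extent of occurrence", "Body size"]

# log10 is undefined for zero or negative inputs, so check before transforming.
all_positive = bool((sample[raw_columns] > 0).all().all())

# Add one log-transformed column per raw column.
for column in raw_columns:
    sample[f"log {column}"] = np.log10(sample[column])

# Back-transform: raising 10 to the logged value should recover the raw value.
recovered = 10 ** sample["log Body size"]
round_trip_ok = bool(np.allclose(recovered, sample["Body size"]))
```

The back-transform also shows how to read the logged columns: a log body size of about 3.34 corresponds to a raw value of 2200.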


Export to CSV.

In [15]:
scavengers.to_csv(OUTPUT_PATH)
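
To confirm that the exported file can be read back with the species names as the index, a round-trip check along these lines may help. The sketch writes to a temporary directory rather than to `OUTPUT_PATH`, and the two-row `sample` frame is a stand-in for the cleaned data.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the cleaned DataFrame, using two rows from the notebook.
sample = pd.DataFrame(
    {"Body size": [2200, 2500]},
    index=pd.Index(["Black vulture", "Turkey vulture"], name="Species name"),
)

# Write to a temporary location so the real output file is left untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned scavenger data.csv")
    sample.to_csv(path)  # the index is written as the first column by default
    restored = pd.read_csv(path, index_col="Species name")

restored_species = list(restored.index)
restored_sizes = restored["Body size"].tolist()
```

Because `to_csv` writes the index as the first column by default, passing `index_col="Species name"` on read restores the same layout.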