title: Data Management using National Ecological Observatory Network’s (NEON) Small Mammal Data with Accompanying Lesson on Mark Recapture Analysis
description: In this lesson and accompanying teaching module, students use small mammal trapping data from the National Ecological Observatory Network to understand necessary steps of data management from field collected data to data analysis. Students explore this in the context of estimating small mammal population size using the Lincoln-Peterson model.
authors: Jim McNeil, Megan A. Jones
packagesLibraries: ggplot2, plotly
topics: data-analysis, organisms, data-management

This teaching module is an online adaptation of a lesson that is currently in review for publication in Teaching Issues and Experiments in Ecology.

Lesson objectives, instructor notes, and downloadable versions of these materials can be found on the Overview page for this teaching module.

Background Materials

Review the following resources prior to class to prepare for the data management activity:

Data Management is Important!

Data are the backbone of scientific research and exploration, so learning how to collect and process data efficiently is a critical skill for scientific professionals. Most people are not born understanding how to collect, record, enter, and analyze data, but with guidance and practice you can learn how to handle information and create world-class datasets.

Scientific organizations, especially large ones, spend a lot of time and effort determining the best ways to process data. The National Ecological Observatory Network (NEON) is one such organization that has made efficient data collection and processing a priority. NEON was designed to collect long-term ecological data on a continental scale to help researchers address questions related to climate change, land use, and invasive species. Data are collected at field sites, which are grouped into regions called domains, using standardized protocols that allow for comparisons across large geographic ranges. Data on dozens of different variables and species will be collected every year for over 30 years, yielding a comprehensive look at ecological processes across the entire United States. Regardless of the variables being measured, the general flow for data in these projects progresses from data collection to data files and metadata files as shown in Figure 1.

Figure 1. A workflow from data collection (can be in the field, lab, or other venue) to data collection sheets (paper, app, or entry form) to data files and metadata files. Source: McNeil and Jones. In Prep.

Given the scale and scope of the NEON project, it will generate terabytes of data every year, so the information needs to be very well organized to be useful. In this activity, you’ll get practice translating field data into a usable format for long-term archiving and then explore how real NEON data can be used to detect ecological patterns, in this case the change in small mammal abundance over a year.

Field Collection & Data Sheets

Small mammals were chosen by NEON to be bioindicators because they are present across the country in a wide variety of habitats. Their small size and short lifespan make them sensitive to environmental changes, and they are responsible for spreading or maintaining a wide diversity of zoonotic diseases in an environment. They are also easy to safely collect as live specimens using arrays of traps like those described in the Abbreviated NEON Small Mammal Trapping Protocol. Live trapping has the advantage that the animal can be returned to its habitat without destructive sampling. As you learned in the readings and the YouTube videos, in just a few minutes you can collect a lot of information from an individual animal. Because researchers want to reduce the stress on the animal while it is captured, it’s important to have an efficient framework for recording that data in a timely manner.

**Data Activity:** Take a few minutes now and review the NEON Small Mammal sampling datasheet and Abbreviated Sampling Protocol. See if you can identify what variable is being recorded in each of the column categories. Make sure you know what codes refer to what type of animal being collected.

Now look at the example data sheet. On the data sheet, circle the column headings for the following variables:

  • Plot ID
  • Date of capture
  • Species
  • Individual identification
  • Sex
  • Weight
  • Whether the individual is a recapture

Data Sheets & Data Files

Processing raw data sheets into a data table is only easy if the data table is well designed.

Some best practices for data sheets include:

  • Descriptive file names (with no spaces)
  • Columns (variables) & rows (data)
    • First row of descriptive headers
      • Avoid spaces or starting headers with #s
    • Data disaggregation
      • One cell per variable (e.g., toe length & tail length in separate columns)
    • Each cell has one type of data
      • Each cell should contain only numbers or only letters, not a mix of both.
      • Not “3 eggs”; instead use Header: EggNumber, Data: 3
    • Plain text
  • Use standardized formats for date/time
    • Date: YYYY-MM-DD (Year-Month-Day)
    • Time: hh:mm:ss (use 24-hour time)
    • Date & Time: YYYY-MM-DDThh:mm:ss
  • Use full taxonomic names
    • Genus and Genus species
    • (Genus species names are italicized in writing but not in data tables in .csv format)
    • If standardized taxon abbreviations are used, make sure to include in the metadata
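
A few of the practices above can be checked automatically. The sketch below is a minimal illustration in Python and is not part of the original lesson. The function names (`check_header`, `is_iso_date`, `mixes_numbers_and_letters`) are hypothetical, and it covers only the header, date, and one-type-per-cell rules.

```python
import re
from datetime import datetime

def check_header(name):
    """Return a list of problems with a column header (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if name[:1].isdigit():
        problems.append("starts with a number")
    return problems

def is_iso_date(text):
    """True if text is a valid calendar date written as YYYY-MM-DD."""
    # The regex enforces zero-padded fields; strptime rejects impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

def mixes_numbers_and_letters(cell):
    """True if a cell mixes digits with letters, such as "3 eggs"."""
    return bool(re.search(r"\d", cell)) and bool(re.search(r"[A-Za-z]", cell))

print(check_header("EggNumber"))           # []
print(check_header("2nd weight"))          # both problems are reported
print(is_iso_date("2014-07-15"))           # True
print(is_iso_date("7/15/2014"))            # False
print(mixes_numbers_and_letters("3 eggs")) # True
```

The last check would also flag identifier codes that legitimately combine letters and digits, so it is best applied only to measurement columns.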
**Data Activity:** Thinking about the presentation and the principles described in Borer et al. 2010 and Sutter et al. 2015, work in pairs to create an Excel data file that displays the information from your example data sheet for the variables you identified above. Make sure that your data table adheres to the best practices for data file construction that we talked about.
**Discussion questions**:
  1. Do you think the NEON data sheets are well designed to transfer the information to a data file? What makes the process easier and what makes it challenging?
  2. Imagine you were responsible for entering data from hundreds of data sheets. How would you make sure you were not making mistakes? What types of checks could you do to make sure you were correctly transferring the data?

Other considerations for useful data files:

  • Retain raw data, separate “clean” files for analysis
  • Use easily transferable file formats & hardware
    • .csv format, not .xls
    • Internet/cloud storage & backup
    • Non-proprietary formats
  • Long-term data storage/archiving
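
One way to keep raw data intact while producing a separate file for analysis is sketched below. This is an illustration only: the file names and the `clean_copy` helper are hypothetical, the rows are made up, and the only cleaning step shown is trimming stray whitespace.

```python
import csv
import os
import tempfile

def clean_copy(raw_path, clean_path):
    """Write a whitespace-trimmed copy of raw_path to clean_path.

    The raw file is opened read-only, so it is never modified.
    """
    with open(raw_path, newline="", encoding="utf-8") as src, \
         open(clean_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([cell.strip() for cell in row])

# Hypothetical file names in a temporary working directory.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "mammals_raw.csv")
clean_path = os.path.join(workdir, "mammals_clean.csv")

# Made-up raw data with stray spaces around the values.
with open(raw_path, "w", newline="", encoding="utf-8") as f:
    f.write("plotID,weight\n plotA , 21.5\n")

clean_copy(raw_path, clean_path)

with open(clean_path, newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)
```

Both files are plain .csv, so they remain readable without proprietary software.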

Public Data & NEON

Another hallmark of NEON is that the data are all publicly available. NEON has created an online data portal that allows access to all of the NEON data from any Domain across the country. This portal will serve as the long-term repository and clearinghouse for all of the NEON data in perpetuity.

We will use a series of data files downloaded from this portal to estimate the abundance of several small mammal species in different seasons (spring, summer, and fall) at NEON’s Smithsonian Conservation Biology Institute (SCBI) field site during 2014 and 2015.


As we talked about during the presentation, metadata are another important component of collecting good data. A good metadata file can help someone unfamiliar with a data file interpret the codes and variables presented – and will help you remember what you did when you come back to the data later. It also provides an opportunity to discuss any irregularities in the dataset.

**Discussion questions**: Examine the metadata file for the NEON data file. Briefly discuss with your partner how this file could have helped you interpret the data sheet and create your own data file or perform data analysis. Be prepared to share your observations with the class.

Data Analysis

Once you have a well-designed data file, you can use that information to look for interesting patterns. One of the simplest uses of the NEON small mammal datasets is to calculate abundance estimates for individual species within the plots. There are many ways to estimate abundance; one of the simplest is the Lincoln-Peterson method. This calculation uses data about the recapture of marked individuals of a species to estimate how many individuals of that species are present in a particular habitat. Because NEON small mammal protocols include marking individual animals with unique numerical identifiers, we can easily use NEON datasets to calculate small mammal abundance using Lincoln-Peterson according to the following equation.


$$N = \frac{n_1 \times n_2}{m_2}$$

where:

  • N = total population size estimate
  • n1 = Number of individuals captured and marked in first sampling bout
  • n2 = Number of individuals captured in second sampling bout
  • m2 = Number of marked individuals in second sampling bout
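
As a sketch of how the estimate is computed, the Python function below assumes the standard form of the estimator, in which N is n1 times n2 divided by m2. The function name `lincoln_peterson` and the example counts are hypothetical and are not taken from the NEON data.

```python
def lincoln_peterson(n1, n2, m2):
    """Estimate population size N from two sampling bouts.

    n1: individuals captured and marked in the first bout
    n2: individuals captured in the second bout
    m2: marked individuals among those captured in the second bout
    """
    if m2 <= 0:
        raise ValueError("m2 must be positive: no marked animals were recaptured")
    if m2 > n1 or m2 > n2:
        raise ValueError("m2 cannot exceed n1 or n2")
    return n1 * n2 / m2

# Made-up counts for illustration only.
estimate = lincoln_peterson(n1=20, n2=15, m2=5)
print(estimate)  # 60.0
```

The check on m2 matters in practice: with no recaptures the formula divides by zero and no estimate can be made.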

It’s important to note that there are several assumptions that should be met for this calculation to generate an accurate estimate of population size:

  • Individuals are randomly distributed between captures
  • There is no change in the population (i.e. births, deaths, immigration, emigration) between sampling bouts
  • Marking individuals does not impact their likelihood of being captured again in the future
**Discussion question:** Lincoln-Peterson estimation depends on several assumptions about the population. Knowing what you do based on the sampling methods outlined in the Abbreviated NEON Small Mammal Trapping Protocols document, do you think any of those assumptions have been violated in this dataset? Why and what could be done to address those issues?

Working in pairs, use the workbook – NEONSmallMammalDataAbundanceWorkbook.xlsx – as a guide to calculate the Lincoln-Peterson estimation of population abundance using the following protocol:

  1. Open the data file (NEON.D02.SCBI.DP1.10072.001.mam_pertrapnight.072014to052015.csv) using Excel. You and your partner will perform the analysis for sampling bouts either in the spring (April to May 2015), the summer (July to August 2014), or the fall (September to October 2014) for samples collected in your assigned plot. Record your time frame and plotID below:

Time frame:____________________

Plot ID:____________________

  2. Sort the data by plotID and then by collectDate. Now you can see when trapping occurred at each plot. You will perform the Lincoln-Peterson calculation for white-footed mice (Peromyscus leucopus, abbreviated as PELE). Therefore, filter your data for the specified taxonID and plotID. Now you see only the data of interest for your Lincoln-Peterson calculation.

  3. Identify unique individuals collected throughout your time frame. Record the following:

Number of individuals captured during the first month of the time frame (n1): _________

Number of individuals captured during the second month of the time frame (n2): _________

Number of individuals from the first month recaptured during the second month (m2): _________

  4. Use these numbers to calculate the population abundance of PELE for your site and time frame. Share your results with the class.
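
The same sort, filter, and count steps can also be scripted instead of done by hand in Excel. The sketch below is a hedged illustration in Python, not the lesson's official method. It assumes the standard Lincoln-Peterson form N = n1 × n2 / m2, that collectDate is stored as YYYY-MM-DD text, and that each animal's unique identifier is in a column here called `tagID` (an assumed name). The rows, plot names, and the `OTHER` taxon code are made up.

```python
import csv
import io

# Made-up rows for illustration only; these are not NEON records.
# tagID, plotA/plotB, and OTHER are hypothetical names and values.
SAMPLE_CSV = """plotID,collectDate,taxonID,tagID
plotB,2014-08-12,PELE,T005
plotA,2014-08-12,PELE,T001
plotA,2014-07-15,PELE,T001
plotA,2014-07-15,OTHER,T009
plotA,2014-07-15,PELE,T002
plotA,2014-07-16,PELE,T001
plotA,2014-07-16,PELE,T003
plotA,2014-08-12,PELE,T004
plotA,2014-08-13,PELE,T003
"""

def sort_and_filter(rows, plot_id, taxon_id):
    """Sort by plotID then collectDate, and keep one plot and one taxon."""
    ordered = sorted(rows, key=lambda r: (r["plotID"], r["collectDate"]))
    return [r for r in ordered
            if r["plotID"] == plot_id and r["taxonID"] == taxon_id]

def count_captures(rows, first_month, second_month):
    """Return (n1, n2, m2) as counts of unique tagIDs.

    Months are given as "YYYY-MM" prefixes of collectDate. Sets are used
    so an animal caught several times in one month is counted once.
    """
    first = {r["tagID"] for r in rows
             if r["collectDate"].startswith(first_month)}
    second = {r["tagID"] for r in rows
              if r["collectDate"].startswith(second_month)}
    return len(first), len(second), len(first & second)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pele = sort_and_filter(rows, "plotA", "PELE")
n1, n2, m2 = count_captures(pele, "2014-07", "2014-08")
print(n1, n2, m2)  # 3 3 2

if m2 > 0:  # the estimate is undefined when nothing is recaptured
    print(n1 * n2 / m2)  # 4.5
```

Working through the same numbers by hand in the workbook is a useful check that the script and the spreadsheet agree.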

**Discussion questions:**

  1. Based on everyone’s data, how does the population abundance change for white-footed mice between plots? What are some hypotheses for why this pattern may exist?
  2. Based on everyone’s data, how does the population abundance change for white-footed mice over the year at this site? What are some hypotheses for why this pattern may exist?


Jim McNeil would like to thank Leah Card, Field Technician II - Mammalogist, Domain 2 Field Operations, for providing the original data to create this activity (this activity now uses data downloaded in September 2017). We also want to thank all members of the 2017 Spring Faculty Mentoring Network Dig Into Data for providing feedback on the concept and scope of the activity for dissemination.
