<a href="https://colab.research.google.com/github/KaraNuss/coding-intro/blob/main/Getting_Started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Part 1: First Steps

We will use Google Colab Notebooks to explore how we can use computer coding to analyze data. We will work with the programming language R, which is used extensively by biologists for statistics and data visualizations.

Any notebook I share will already be set up for R. If you create your own notebook, you must change this manually. To do so, go to the **Runtime** menu at the top of this screen, select **Change runtime type** and use the dropdown menu to switch from Python 3 to **R**.
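
If you are not sure whether a notebook is set up for R, one quick check (a small sketch, not a required step) is to run a cell containing only the built-in value `R.version.string`. In an R runtime it prints the R version; in a Python runtime it produces an error instead.

```r
# Print the version of R that this notebook is running.
# If this cell gives an error, the runtime is probably still set to Python 3.
R.version.string
```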

The cell below makes available all of the tools we need. Click anywhere in the cell (the light gray box) and **Run** the cell by clicking the **Play arrow** or by pressing **Control+Enter (PC)** or **Command+Enter (Mac)**.

You may get a warning that the Notebook was not authored by Google but by another user (someone with an neiu.edu email address). Click **Run anyway** to continue.

Once the cell runs, you will see some information that might not make much sense yet. That's okay! You may also see a section labeled Conflicts with some red x's. This looks scary but it is fine to ignore in this case.

In [3]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


One of the great things about using R (or any other programming language) is that we can easily save our code in a file to reuse in the future. Google Colab Notebooks are easy to work with because they save directly into a folder within your Google Drive.

To be able to share your completed files with me and receive points, please do the following:


1.   In a separate tab, open your Google Drive. You can do this by going to your NEIU email, clicking the grid of dots in the upper right, and selecting **Drive**.
2.   You should now see a folder in Drive called **Colab Notebooks**. Click on that.
3.   Near the top of your screen, you should see Colab Notebooks with a little triangle next to it. Click that triangle, select **Share** in the drop-down menu, and then click **Share** in the sub-menu.
4.   In the pop-up box, **enter your instructor's email address** (k-nuss@neiu.edu) and click **Done**.

Great, we are all set. Note that the Google Colab notebook saves automatically as you work. Occasionally, you may want to create a copy of an existing notebook so you can use it for a new purpose. To do that, go to the **File** menu in the top left and select the option to **Save a copy in Drive**.





Okay, let's get started with some coding. Soon, we will use R to do complex calculations for us. For now, **run** each cell below to look at some simple examples.

In [None]:
4 * (3 + 2)/2

In [None]:
3 * 2

In [None]:
3 ** 2

In [None]:
3 ** 3

### Question 1
Look at the cells above. What does R do when you use a single asterisk symbol? What happens when you use **?

*Replace this text with your answer.*

**A few other things to note about R:**


1.   It follows the standard order of operations (PEMDAS) when performing calculations.
2.   We can assign values to a name by using an arrow <- (a less-than sign followed by a dash). A name has to start with a letter and cannot have spaces.
3.   We can also use operators to compare values.

Run the cells below to explore.





In [None]:
a <- 5

In [None]:
b <- 10

In [None]:
b > a
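
The cells above assign two values and compare them with `>`. As a further sketch of the same ideas, the cell below shows how parentheses change the order of operations and tries a few other comparison operators. The names `without_parens`, `with_parens`, `x`, and `y` and their values are made up for illustration and are not part of the exercises.

```r
# Multiplication happens before addition unless parentheses say otherwise.
without_parens <- 2 + 3 * 4     # 3 * 4 is calculated first, giving 14
with_parens <- (2 + 3) * 4      # 2 + 3 is calculated first, giving 20
without_parens
with_parens

# Comparisons return TRUE or FALSE.
x <- 7     # hypothetical example values
y <- 12
x < y      # less than: TRUE
x == y     # equal to (two equals signs): FALSE
x != y     # not equal to: TRUE
x >= 7     # greater than or equal to: TRUE
```

Note that checking whether two values are equal uses two equals signs (`==`), which is different from assigning a value to a name.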

In [None]:
answer <- a + b

### Question 2: What should answer equal?

You can double-check yourself by creating a new code cell and typing answer so that R **returns** the value assigned to this name.

*Replace this text with the value of answer.*

## Part 2: Working with Data

We want to import some data that we can begin to analyze. The code below will access a data file I posted to a website. Soon, you will learn how to upload your own data files from Excel or Google Sheets. Run the cell below. Note that you won't see any **output** once the cell has run, but you will see a number in the brackets on the left side.

In [None]:
birds <- read.csv(url("https://raw.githubusercontent.com/KaraNuss/coding-intro/refs/heads/main/bird_observations.csv"))

In the cell above, we brought in our dataset and saved it under the name birds. To do that, we used the assignment arrow you saw above. Notice that we can assign individual values or whole datasets to a name. You can now **add** a new code cell by clicking on **+ Code** at the top left. Type birds (all lowercase, because R treats capital and lowercase letters as different) and **run** the cell to see our dataset.
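
To see the difference between assigning a dataset and displaying it, here is a minimal sketch that needs no internet connection. It builds a small stand-in dataset from text typed directly into the cell. The names `csv_text` and `birds_demo` are made up for this example, and the four rows are copied from the first observations shown later in this notebook.

```r
# A tiny stand-in dataset typed by hand (hypothetical name: csv_text).
csv_text <- "Day,Time,Woodpeckers,BlueJays
1,morning,6,1
1,afternoon,3,2
2,morning,8,4
2,afternoon,5,6"

# Assigning the result to a name produces no output...
birds_demo <- read.csv(text = csv_text)

# ...but typing the name on its own line displays the whole dataset.
birds_demo

# head() displays only the first rows (here, the first 2).
head(birds_demo, 2)
```

For a long dataset, `head()` can be handier than typing the name alone because it keeps the output short.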

In [9]:
birds2 <- read.csv(url("https://raw.githubusercontent.com/KaraNuss/coding-intro/refs/heads/main/BirdFeederData.csv"))

In [10]:
# Make sure to cut out this line in the student version.
birds2

Observation,Day,Time,Species,Count
<int>,<int>,<chr>,<chr>,<int>
1,1,am,house finch,11
1,1,am,chickadee,2
1,1,am,house sparrow,13
1,1,am,cardinal,0
2,1,pm,house finch,6
2,1,pm,chickadee,1
2,1,pm,house sparrow,8
2,1,pm,cardinal,0
3,2,am,house finch,18
3,2,am,chickadee,7


Wow, that's cool! This example dataset is pretty simple and small. Often we will have large and complex datasets where we can't see everything at once. Luckily, there are some great tools to give us an overview of our data.

### Question 3
Run the cell below and then describe what **glimpse** does. What information does it give us?

*Replace this text with your answer.*

In [None]:
glimpse(birds)

Rows: 10
Columns: 4
$ Day         <int> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
$ Time        <chr> "morning", "afternoon", "morning", "afternoon", "morning",…
$ Woodpeckers <int> 6, 3, 8, 5, 7, 0, 4, 6, 9, 5
$ BlueJays    <int> 1, 2, 4, 6, 3, 7, 2, 8, 0, 2
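
A few other built-in functions also give an overview of a dataset. The sketch below is optional and uses a small hand-typed dataset, `birds_demo` (a made-up name, with rows copied from the first four observations above), so that it runs on its own.

```r
# A small stand-in dataset typed by hand (hypothetical name: birds_demo).
birds_demo <- data.frame(
  Day = c(1, 1, 2, 2),
  Time = c("morning", "afternoon", "morning", "afternoon"),
  Woodpeckers = c(6, 3, 8, 5),
  BlueJays = c(1, 2, 4, 6)
)

dim(birds_demo)      # number of rows, then number of columns
names(birds_demo)    # the column names
summary(birds_demo)  # a short summary of each column
```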


## Descriptive Statistics
Possibly move this to a separate notebook.

Don't want to overload them by using piping yet, but I do like the mutate part.

In [None]:
birds <- mutate(birds, totalBirds = Woodpeckers + BlueJays)
# adds a column totalling the two groups of birds
# note that we assign the result back to the name birds to "save" this new version
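
To see why the result is assigned back to a name, here is a small sketch. It assumes the dplyr package (part of the tidyverse) is installed and uses a made-up three-row dataset called `birds_demo`.

```r
library(dplyr)  # part of the tidyverse; provides mutate()

# A small stand-in dataset typed by hand (hypothetical name: birds_demo).
birds_demo <- data.frame(
  Woodpeckers = c(6, 3, 8),
  BlueJays = c(1, 2, 4)
)

# Without assignment, mutate() only displays the result;
# birds_demo itself still has just two columns.
mutate(birds_demo, totalBirds = Woodpeckers + BlueJays)
names(birds_demo)

# Assigning the result back to the same name keeps the new column.
birds_demo <- mutate(birds_demo, totalBirds = Woodpeckers + BlueJays)
names(birds_demo)
```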

In [None]:
birds

Day,Time,Woodpeckers,BlueJays,totalBirds
<int>,<chr>,<int>,<int>,<int>
1,morning,6,1,7
1,afternoon,3,2,5
2,morning,8,4,12
2,afternoon,5,6,11
3,morning,7,3,10
3,afternoon,0,7,7
4,morning,4,2,6
4,afternoon,6,8,14
5,morning,9,0,9
5,afternoon,5,2,7
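
With the totals in hand, one possible next step for this section is to summarize them with a few descriptive statistics. This is only a sketch of what that could look like, not a step the notebook specifies. The vector `total_birds` is a made-up name holding the ten totalBirds values from the table above, typed by hand so the cell runs on its own.

```r
# The ten totalBirds values from the table above, typed by hand.
total_birds <- c(7, 5, 12, 11, 10, 7, 6, 14, 9, 7)

mean(total_birds)    # the average
median(total_birds)  # the middle value
range(total_birds)   # the smallest and largest values
sd(total_birds)      # the standard deviation
```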


## Part 3: Uploading Your Own Data


This notebook was created by Dr. Kara Nuss at Northeastern Illinois University with support from **Add Funding Source**. Many of the examples are based on material from the book *Getting Started with R: An Introduction for Biologists*, 2nd edition, by Beckerman, Childs, and Petchey.

Version 1 - Spring 2025