# The Mystery of Boreas the Penguin

Greetings, NSF SOARS adventurers! As you embark on your scientific journey with us this summer, we have an urgent mystery that only the sharpest minds can solve.

Boreas the Penguin, our fearless guide through the world of atmospheric science, has vanished! His last known whereabouts remain a mystery, and we need your expertise to track him down. But fear not—history, science, and the forces of nature have left behind a clue.

<center>
    <img src="img/boreas.png" alt="Boreas the Penguin" width="200"/>    
</center>

A powerful [derecho](https://en.wikipedia.org/wiki/Derecho) recently tore through Colorado, its winds howling across the Flatirons just as they have for millennia. In its wake, something peculiar was left at the doorstep of the NSF NCAR Mesa Lab—a place that has stood as a beacon of atmospheric discovery since its founding in 1960 by the NSF National Center for Atmospheric Research (NCAR). There, amidst the swirling dust, our team found a lanyard with a USB thumb drive, its presence as puzzling as a sudden mountain wave cloud over the Front Range.

Could this be the key to finding Boreas? The very same winds that helped NSF NCAR's earliest scientists unlock the mysteries of turbulence, climate modeling, and severe storms may have also carried a critical clue to Boreas' location.

Our diligent NSF [NCAR Research Information Technology (NRIT)](https://sundog.ucar.edu/page/9124?SearchId=1345861) team has ensured the drive is safe to examine, but the contents remain a mystery. What secrets does it hold? Is it a message? A puzzle? A map guiding us to Boreas?

Your mission, should you choose to accept it, is to analyze the data, decode its meaning, and follow the scientific trail—just as NSF NCAR's researchers have done for decades to advance our understanding of the atmosphere. From the pioneering work in numerical weather prediction to today's cutting-edge research in climate science, air quality, and extreme weather, you now stand in the footsteps of discovery.

Are you ready to take on the challenge? Boreas is counting on you!
Let the adventure begin!









## Python Notebooks

You are currently viewing a [Google Colab Notebook](https://colab.google/notebooks/). This is a version of a [Jupyter Notebook](https://jupyter.org/), but with access to some free CPU and GPU resources provided by Google.

Notebooks are a mix of narrative text and computations. This adventure is meant to explain how to use notebooks, introduce you to Python, and hopefully be a good bit of fun.

To start, let's introduce some basic [Python](https://www.python.org/) concepts!

Python is a high-level programming language whose popularity has exploded in recent years due to its versatility. Common uses include writing web servers, automation scripts, data science and analysis, and machine learning.

The simplest program you can write in Python is one line. Click on the cell below and run it by pressing the play button or SHIFT+ENTER.

In [None]:
print("Good morning, starshine. The earth says hello!")

Good morning, starshine. The earth says hello!


[`print`](https://docs.python.org/3/tutorial/inputoutput.html) calls are often used to report the status of a running program, and they are also a great tool to use when debugging!

Programs operate on data. Naturally, we can store the data we wish to process and give it a name. We call these named values variables. In Python, you create a variable by assigning it a value with the `=` sign.

In [None]:
my_first_variable = "None pizza"
my_second_variable = ", left beef"

After running the above cell, your two variables are defined and usable within the current [scope](https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces). The `print` function can print any valid [datatype](https://docs.python.org/3/library/datatypes.html). Complete the cell below by printing each variable on its own line.

In [None]:
print("Replace this string with a variable name")
print("Replace this string with a variable name")

Replace this string with a variable name
Replace this string with a variable name


The two variables you defined above are both strings (type `str`). In Python, you can concatenate two strings by adding them together with the `+` operator.

In [None]:
print(my_first_variable + my_second_variable)

None pizza, left beef
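
The `+` operator only joins a string with another string; adding a string and an integer directly raises a `TypeError`. The sketch below shows two common ways around that. The names `topping` and `slices` are hypothetical and are not defined elsewhere in this notebook.

```python
# Hypothetical values for illustration only.
topping = "None pizza"
slices = 8

# Option 1: convert the integer to a string before concatenating with +
message_concat = topping + " has " + str(slices) + " slices"

# Option 2: an f-string formats the value in place
message_fstring = f"{topping} has {slices} slices"

print(message_concat)
print(message_fstring)
```

Both lines print the same text. F-strings are usually easier to read once several values are involved.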


Python has many data types. The most common data types you will use are listed below along with a few examples. There are many operations you can do for each of these. The examples are not exhaustive, so please look up each datatype to learn more about how you can use them.

- [Integers](https://docs.python.org/3/library/functions.html#int)
- [Floating point numbers (real numbers)](https://docs.python.org/3/library/functions.html#float)
- [Booleans (logical `True`/`False`)](https://docs.python.org/3/library/functions.html#bool)
- [Lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
- [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)
- [Sets](https://docs.python.org/3/tutorial/datastructures.html#sets)

In [None]:
# Integers, floats, and some operations

# declare integers
a = 123
b = 321

# declare floating point numbers
pi = 3.14
# round pi to 0 decimal places
engineer_pi = round(pi, 0)

# add two integers, results in an int
c = a + b
print(c, type(c))

# add an int and a float, results in a float
d = c + pi
print(d, type(d))


444 <class 'int'>
447.14 <class 'float'>


In [None]:
engineer_pi

3.0
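
The cell above only covers integers and floats. As a rough sketch of the remaining types in the list, the example below touches booleans, lists, dictionaries, and sets. All variable names and values are made up for illustration.

```python
# Booleans: Python spells them True and False (capitalized)
is_windy = True
is_calm = not is_windy

# Lists: ordered sequences that can grow and change
wind_speeds = [12.5, 30.1, 8.0]
wind_speeds.append(45.2)        # add an item to the end
fastest = max(wind_speeds)

# Dictionaries: map keys to values
penguin = {"name": "Boreas", "role": "guide"}
penguin["status"] = "missing"   # add a new key/value pair

# Sets: unordered collections of unique items
clues = {"lanyard", "thumb drive", "lanyard"}  # the duplicate is dropped

print(is_calm, fastest, penguin["status"], len(clues))
```

Each type has many more operations than shown here; the links in the list above are the place to look them up.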

## Running Linux Commands

Notebooks like this one are hosted on computers, specifically Linux computers. This means we have access to a Linux [terminal](https://en.wikipedia.org/wiki/Linux_console). Google Colab includes a file explorer, but you may notice that our data file, `the_clue`, does **not** have a file extension.

Fortunately, there's a powerful Linux program called `file` which can help us identify this file.

To run Linux commands from a code cell, prefix the command with an [escape character](https://en.wikipedia.org/wiki/Escape_character) that tells the notebook to pass the line to the terminal instead of Python. In Jupyter notebooks, that character is `!`. The example below runs the program `ls`, which lists the contents of a directory.

In [None]:
!ls

sample_data


We can also pass arguments to Linux commands:

In [None]:
!ls -al sample_data

total 55512
drwxr-xr-x 1 root root     4096 Feb 18 14:20 .
drwxr-xr-x 1 root root     4096 Feb 18 14:20 ..
-rwxr-xr-x 1 root root     1697 Jan  1  2000 anscombe.json
-rw-r--r-- 1 root root   301141 Feb 18 14:20 california_housing_test.csv
-rw-r--r-- 1 root root  1706430 Feb 18 14:20 california_housing_train.csv
-rw-r--r-- 1 root root 18289443 Feb 18 14:20 mnist_test.csv
-rw-r--r-- 1 root root 36523880 Feb 18 14:20 mnist_train_small.csv
-rwxr-xr-x 1 root root      962 Jan  1  2000 README.md
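
The `!` prefix is a notebook feature and is not valid in a plain Python script. As a minimal sketch of one alternative, the standard library's `subprocess` module can run the same kind of command and capture its output. It assumes a Linux-like system where `ls` is available.

```python
import subprocess

# Run `ls -al` as a child process and capture what it prints.
# Assumes `ls` exists on the system, as it does on Linux.
result = subprocess.run(["ls", "-al"], capture_output=True, text=True)

print(result.returncode)  # 0 means the command succeeded
print(result.stdout)      # the directory listing as a single string
```

Passing the command as a list keeps each argument separate, so no shell quoting is needed.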


### Clue Number 1
> Use the `file` command to investigate what `the_clue` is

In [None]:
!file boreas_mystery.nc

boreas_mystery.nc: Hierarchical Data Format (version 5) data
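
The `file` command identifies a file by inspecting its first bytes rather than trusting its name. NetCDF-4 files are stored in the HDF5 format, which is why a `.nc` file can be reported as HDF5. The sketch below imitates a small part of that check in Python. `guess_format` is a hypothetical helper, it only looks at the start of the file, and the sample files are tiny stand-ins written for the demo, not real data.

```python
import os
import tempfile

# Leading bytes ("magic numbers") used by the HDF5 and classic NetCDF formats.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC = b"CDF\x01"
NETCDF_64BIT_OFFSET = b"CDF\x02"


def guess_format(path):
    """Guess a file's format from its first 8 bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5"
    if header.startswith(NETCDF_64BIT_OFFSET):
        return "NetCDF 64-bit offset"
    if header.startswith(NETCDF_CLASSIC):
        return "NetCDF classic"
    return "unknown"


# Stand-in files: only the leading bytes matter for this check.
samples = {
    "hdf5_like": HDF5_SIGNATURE + b"\x00" * 8,
    "netcdf_like": NETCDF_64BIT_OFFSET + b"\x00" * 8,
    "csv_like": b"latitude,longitude\n39.987,-105.264\n",
}

results = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, content in samples.items():
        path = os.path.join(tmp_dir, name)
        with open(path, "wb") as f:
            f.write(content)
        results[name] = guess_format(path)

for name, guess in results.items():
    print(name, "->", guess)
```

Text formats such as CSV have no signature, so this helper reports them as unknown; the real `file` program uses additional heuristics for text.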


In [None]:
!apt-get install -y netcdf-bin libnetcdf-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnetcdf-dev is already the newest version (1:4.8.1-1).
libnetcdf-dev set to manually installed.
The following NEW packages will be installed:
  netcdf-bin
0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.
Need to get 204 kB of archives.
After this operation, 557 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 netcdf-bin amd64 1:4.8.1-1 [204 kB]
Fetched 204 kB in 1s (264 kB/s)
Selecting previously unselected package netcdf-bin.
(Reading database ... 124947 files and directories currently installed.)
Preparing to unpack .../netcdf-bin_1%3a4.8.1-1_amd64.deb ...
Unpacking netcdf-bin (1:4.8.1-1) ...
Setting up netcdf-bin (1:4.8.1-1) ...
Processing triggers for man-db (2.10.2-1) ...


## Basic steps

It turns out that the thumb drive containing the file also contained a list of file types that seem to have some relation to `the_clue`.

- NetCDF Data Format data (64-bit offset)
- Hierarchical Data Format (version 5) data
- CSV text

You notice that the output


# Workspace

In [None]:
import xarray as xr
import numpy as np

In [None]:
ds = xr.Dataset()

In [None]:
ds.to_netcdf('the_clue')

In [None]:
location_data = {
    "NCAR Mesa Lab": (39.987, -105.264),
    "Flatirons": (39.977, -105.283),
    "Pearl Street Mall": (40.018, -105.278),
    "Chautauqua Park": (39.999, -105.281),
    "Boulder Reservoir": (40.070, -105.227),
    "Flagstaff Mountain": (40.003, -105.297),
}

locations = list(location_data.keys())
latitudes = np.array([loc[0] for loc in location_data.values()])
longitudes = np.array([loc[1] for loc in location_data.values()])

ds_main = xr.Dataset(
    {
        "latitude": (["location"], latitudes, {"units": "degrees_north"}),
        "longitude": (["location"], longitudes, {"units": "degrees_east"}),
        "location_name": (["location"], locations, {"comment": "One of these locations holds the key. But which one?"}),
    },
    coords={"location": locations},
    attrs={
        "title": "The Mystery of Boreas the Penguin",
        "institution": "NCAR - National Center for Atmospheric Research",
        "history": "Created for the NSF SOARS interns",
        "comment": "The wind whispers secrets. Seek Boreas where science meets the sky.",
    }
)

# Create a separate Dataset for the hidden "secrets" group
ds_secrets = xr.Dataset(
    {
        "hidden_message": ([], "Boreas followed the wind... but where did it lead?",
                           {"note": "Look deeper into the data. The answer is not always at the surface."})
    },
    attrs={"description": "A hidden archive of Boreas' last known research."}
)

filename = "boreas_mystery.nc"
ds_main.to_netcdf(filename, group="/")  # Save main dataset in root group
ds_secrets.to_netcdf(filename, mode="a", group="secrets")  # Append secrets dataset to "secrets" group


In [None]:
ds = xr.open_datatree(filename)

In [None]:
ds