## STEP 1: Import packages

### Python **packages** let you use code written by experts around the world

Because Python is open source, lots of different people and
organizations can contribute (including you!). Many contributions are in
the form of **packages** which do not come with a standard Python
download.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More: Packages need to be installed and imported.</div></div><div class="callout-body-container callout-body"><p>Learn more about using Python packages. How do you find and use
packages? What is the difference between installing and importing
packages? When do you need to do each one? <a
href="https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/">This
article on Python packages</a> will walk you through the basics.</p></div></div>
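
You can check the difference between installing and importing from Python itself. The sketch below uses the standard library's `importlib.util.find_spec` to ask whether a package is installed, without importing it. The helper `is_installed` and the package name `not_a_real_package_xyz` are made up for illustration; we assume no package by that name exists in your environment.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find the package in this environment."""
    return importlib.util.find_spec(package_name) is not None

# 'json' ships with Python, so it is always available to import
print(is_installed('json'))

# Hypothetical name; assumed not to be installed anywhere
print(is_installed('not_a_real_package_xyz'))
```

A package only needs to be installed once per environment, but it has to be imported in every new notebook or session that uses it.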

In the cell below, someone was trying to import the **pandas package**,
which helps us to work with [**tabular data** such as comma-separated
value (CSV)
files](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/).

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Import a package</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Correct the typo below to properly import the pandas package under
its <strong>alias</strong> pd.</li>
<li>Run the cell to import pandas</li>
</ol></div></div>

> **Warning**
>
> Make sure to run your code in the right **environment** to avoid
> import errors!
>
> We’ve created a coding **environment** for you to use that already has
> all the software and packages you will need! When you try to run some
> code, you may be prompted to select a **kernel**. The **kernel**
> refers to the version of Python you are using. You should use the
> **base** kernel, which should be the default option for you.
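
If you are not sure which environment your code is running in, you can ask Python directly. This is a minimal sketch using only the standard library; the paths and version numbers it prints will depend on your own setup.

```python
import sys

# Path to the Python program that is running this code
print(sys.executable)

# Version of Python provided by the current kernel
print(sys.version)
```

If the path or version is not what you expect, try selecting a different kernel before importing packages.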

```python
# Import libraries
import pandsa as pd
```

## STEP 2: Download Data

### Global Historical Climatology Network

One way scientists know that the climate is changing is by looking at
records from temperature sensors around the globe. Some of these sensors
have been recording data for over a century! For this activity, we’ll
get daily maximum temperature measurements from the [Global Historical
Climatology Network
daily](https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily)
(Menne et al. 2012), an openly available and extensively validated
global network of temperature sensors.

<figure>
<img
src="https://upload.wikimedia.org/wikipedia/commons/a/ab/GHCN_Temperature_Stations.png"
alt="The Global Historical Climatology Network Source: CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2084097" />
<figcaption aria-hidden="true">The Global Historical Climatology Network
Source: CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=2084097</figcaption>
</figure>

The GHCNd data are available through the National Oceanic and
Atmospheric Administration’s (NOAA) National Centers for Environmental
Information (NCEI) Climate Data Online [search
tool](https://www.ncdc.noaa.gov/cdo-web/search). We can also get
these data using code by contacting NCEI’s API.

> **What’s an API?**
>
> An API, or Application Programming Interface, is how computers talk to
> each other.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More</div></div><div class="callout-body-container callout-body"><p>Read more about <a
href="https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation">NCEI’s
API</a> and the <a href="https://www.ncei.noaa.gov/cdo-web/">Climate
Data Online</a> database.</p></div></div>

For this activity we have created URLs that contact the NCEI API for
two climate stations in the greater Chicago area. We will walk through
each line of the URL to explain what it is doing.

### O’Hare International Airport

Chicago O’Hare International Airport (ORD) is one of the busiest
airports in the world, serving as a major hub for both domestic and
international flights. Located about 14 miles northwest of downtown
Chicago, it offers flights to more than 200 destinations and handles
over 83 million passengers annually. It is home to Chicago’s official
meteorological station. The large amount of concrete and asphalt needed
to support the airport’s infrastructure creates an urban heat island.

**Station ID: USW00094846**

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Build your API URL</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Add the station ID for the O’Hare station (USW00094846) into the URL
below</li>
<li>Run the code to store the URL in Python</li>
</ol></div></div>

> **What if the API is down?**
>
> Getting data from APIs relies on internet services you don’t have
> control over. If you are getting a response like
> `503: Service Unavailable`, it may be that the API is down
> temporarily! If that happens during the workshop, we’ll have you use
> some data we’ve already downloaded. That’s what `ohare_path` is for –
> with any luck we won’t need it.
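
One way to handle an unavailable API in code is to try the URL first and fall back to the local file if that fails. The helper below is a sketch, not part of the workshop materials: the name `read_with_fallback` is hypothetical, and it assumes that failed requests surface as `OSError` subclasses (as `urllib` HTTP errors and missing files do).

```python
import pandas as pd

def read_with_fallback(url, fallback_path, **read_csv_kwargs):
    """Read a CSV from the API, or from a local file if the request fails."""
    try:
        return pd.read_csv(url, **read_csv_kwargs)
    except OSError as error:
        # Network errors, HTTP errors, and missing files are OSError subclasses
        print(f'Could not read {url} ({error}); using {fallback_path}')
        return pd.read_csv(fallback_path, **read_csv_kwargs)

# Example call, commented out so this sketch runs without internet access:
# ohare_df = read_with_fallback(ohare_url, ohare_path)
```

Any extra keyword arguments are passed straight through to `pd.read_csv`, so both attempts read the data the same way.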

```python
# Create a URL API call for the O'Hare climate station
ohare_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1?'
    'dataset=daily-summaries'
    '&dataTypes=TMAX'
    '&stations='
    '&startDate=2024-06-01'
    '&endDate=2024-06-30'
    '&units=standard')

# Path to local data file (Codespaces only)
ohare_path = 'ohare-temps.csv'

# Check the URL
ohare_url
```
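
To see what each line of the URL contributes, it can help to take a completed URL apart again. The sketch below uses the standard library's `urllib.parse` to split a filled-in copy of the O'Hare URL into the server, the service path, and the individual parameters. The variable name `example_url` is our own choice, used so that it does not overwrite `ohare_url`.

```python
from urllib.parse import urlsplit, parse_qs

# A filled-in copy of the O'Hare URL, stored under a separate name
example_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1?'
    'dataset=daily-summaries'
    '&dataTypes=TMAX'
    '&stations=USW00094846'
    '&startDate=2024-06-01'
    '&endDate=2024-06-30'
    '&units=standard')

parts = urlsplit(example_url)
params = parse_qs(parts.query)

print('Server:', parts.netloc)
print('Service:', parts.path)

# Everything after the ? is a list of name=value pairs separated by &
for name, values in params.items():
    print(f'{name} = {values[0]}')
```

Each line after the first in the URL is one parameter: which dataset to use, which measurement to return, which station, the date range, and the units.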

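<div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Load maximum temperature data for O’Hare</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Replace <code>url_or_path</code> with the variable name you used
above to store the O’Hare station API URL (or the O’Hare data path if
the API is down). Run the code to make sure you’ve got it right!</li>
<li>Uncomment lines 4 and 5. Then, replace
<code>date_column_name</code> with the actual name of the column that
has the date.</li>
<li>Run the code again. Check that the date column is the
<strong>index</strong> and that it is parsed into a
<code>DatetimeIndex</code> using the <code>.info()</code> method.</li>
</ol></div></div>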

```python
# Open data using pandas
ohare_df = pd.read_csv(
    url_or_path,
    #parse_dates=True,
    #index_col='date_column_name'
)

# Plot the data using pandas
ohare_df.TMAX.plot()

# Check the first 5 lines of data
ohare_df.head()
```
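
To see what `parse_dates` and `index_col` do without relying on the API, here is a small sketch that reads a made-up CSV from a string. The column names `station`, `day` and `high_temp` and all of the values are invented for illustration, so they will not match the column names in the real data.

```python
import io
import pandas as pd

# A tiny made-up CSV standing in for a downloaded file
sample_csv = io.StringIO(
    'station,day,high_temp\n'
    'EXAMPLE001,2024-06-01,75\n'
    'EXAMPLE001,2024-06-02,81\n'
    'EXAMPLE001,2024-06-03,78\n')

# index_col picks the index column; parse_dates=True parses that index as dates
sample_df = pd.read_csv(sample_csv, parse_dates=True, index_col='day')

# The index should now be a DatetimeIndex rather than plain text
print(type(sample_df.index).__name__)
print(sample_df.index.dtype)
```

With a date index, pandas can label the x-axis of a plot with dates instead of row numbers.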

### Northerly Island

Northerly Island is a 91-acre man-made peninsula located along the Lake
Michigan shoreline in Chicago. Originally part of Daniel Burnham’s 1909
Plan of Chicago, it was transformed into a nature-focused park featuring
walking trails, natural habitats, and scenic lakefront views. The site
also hosts the Huntington Bank Pavilion, a popular outdoor concert
venue.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Load data, part 2</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Repeat the above data loading process using the Northerly Island
site (<strong>Station ID: USC00111550</strong>)</li>
</ol>
<div data-__quarto_custom="true" data-__quarto_custom_type="Callout"
data-__quarto_custom_context="Block" data-__quarto_custom_id="8">
<div data-__quarto_custom_scaffold="true">
Make sure to give your new variables different names!
</div>
<div data-__quarto_custom_scaffold="true">
<p>e.g. <code>northerly_url</code> instead of <code>ohare_url</code>.
Otherwise, you will write over the data you just downloaded!</p>
</div>
</div></div></div>
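
Because the two API URLs differ only in the station ID, one optional way to avoid copy-and-paste mistakes is a small helper function that builds the URL for any station. This is a sketch rather than the workshop's method: `build_tmax_url`, `BASE_URL` and the placeholder `EXAMPLE_STATION_ID` are hypothetical names, and the parameters are copied from the O'Hare URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.ncei.noaa.gov/access/services/data/v1'

def build_tmax_url(station_id, start_date, end_date):
    """Build a daily maximum temperature request URL for one station."""
    params = {
        'dataset': 'daily-summaries',
        'dataTypes': 'TMAX',
        'stations': station_id,
        'startDate': start_date,
        'endDate': end_date,
        'units': 'standard',
    }
    # urlencode joins the pairs as name=value&name=value
    return f'{BASE_URL}?{urlencode(params)}'

# Placeholder station ID; swap in a real one when you use this
example_url = build_tmax_url('EXAMPLE_STATION_ID', '2024-06-01', '2024-06-30')
print(example_url)
```

Calling the function once per station and storing each result under its own name keeps the two URLs from overwriting each other.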

```python
# Create an API call for the Northerly climate station

# Path to local data file (Codespaces only)
northerly_path = 'northerly-temps.csv'
```

```python
# Open data

# Plot the data

# Check the first 5 lines of data
```

## STEP 3: Wrangle Data

### Select only the columns you want

Notice that your data came with a `STATION` column as well as the
maximum temperature `TMAX` column. The extra column can make your data a
bit unwieldy.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>To select only the <code>TMAX</code> column:</p>
<ol type="1">
<li>Replace <code>df</code> with the name of your
<code>DataFrame</code></li>
<li>Replace <code>column_name</code> with the name of the column you
want to select</li>
<li>Replace <code>tmax_df</code> in all locations with a descriptive
name for the new single-column <code>DataFrame</code></li>
</ol>
<div data-__quarto_custom="true" data-__quarto_custom_type="Callout"
data-__quarto_custom_context="Block" data-__quarto_custom_id="11">
<div data-__quarto_custom_scaffold="true">
What’s with those double square brackets? (<code>[[]]</code>)
</div>
<div data-__quarto_custom_scaffold="true">
<p>If you use single brackets, you will find that you get back something
called a <code>Series</code> rather than a <code>DataFrame</code>, which
will make things difficult down the road. A <code>Series</code> is a
single column of a <code>DataFrame</code>. It still has an index (in
this case our dates), but can’t do all the things a
<code>DataFrame</code> can do. It also displays as plain text instead of
a formatted table, so you can easily tell the difference.</p>
</div>
</div></div></div>
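
The difference between single and double brackets is easy to see on a toy example. The `DataFrame` below is made up for illustration (the names `toy_df`, `site` and `temp` are not from the real data).

```python
import pandas as pd

# A small made-up DataFrame with dates as the index
toy_df = pd.DataFrame(
    {'site': ['A', 'A', 'A'], 'temp': [70, 75, 72]},
    index=pd.to_datetime(['2024-06-01', '2024-06-02', '2024-06-03']))

single = toy_df['temp']    # single brackets
double = toy_df[['temp']]  # double brackets

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
```

The inner brackets make a list of column names, which is why you could also select several columns at once this way.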

```python
# Select only the TMAX column of the O'Hare data
tmax_df = df[['column_name']]
tmax_df.describe()
```

```python
# Select only the TMAX column of the Northerly data
tmax_df = df[['column_name']]
tmax_df.describe()
```

### Join data

Right now, we have data from two stations in two separate `DataFrame`s.
We could work with that, but to make things go more smoothly (and learn
how to work with `DataFrame`s) we can **join** them together.

> **What’s a join?**
>
> There are a few different ways to combine `DataFrame`s in Python. A
> **join** combines two `DataFrame`s by their **index** (the dates in
> our case), checking to make sure that every date matches. In our case,
> we could **concatenate** instead without checking the dates, because
> all the dates are the same for our two `DataFrame`s. That would
> probably be faster, but we think it is more error-prone. For
> example, it might not tell you that something was wrong if you
> accidentally downloaded data from two different years.
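
Here is a sketch of how a join can reveal mismatched dates, using two made-up `DataFrame`s. All names and values are invented for illustration; one date in the second table is deliberately from the wrong year.

```python
import pandas as pd

dates_a = pd.to_datetime(['2024-06-01', '2024-06-02', '2024-06-03'])
# The last date is deliberately from 2023 to simulate a download mistake
dates_b = pd.to_datetime(['2024-06-01', '2024-06-02', '2023-06-03'])

a_df = pd.DataFrame({'temp': [70, 75, 72]}, index=dates_a)
b_df = pd.DataFrame({'temp': [68, 71, 69]}, index=dates_b)

# Both tables have a 'temp' column, so the suffixes tell them apart
joined = a_df.join(b_df, lsuffix='_a', rsuffix='_b')
print(joined)

# Count the dates in the left table with no match in the right table
print(int(joined['temp_b'].isna().sum()))
```

The join keeps every date from the left table and fills in a missing value wherever the right table has no matching date, so the mistake shows up as a gap you can count.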

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Join two `DataFrame`s</div></div><div class="callout-body-container callout-body"><p>Starting with the sample code below:</p>
<ol type="1">
<li>Replace <code>left_df</code> with the name of the first
<code>DataFrame</code>. In this case, it doesn’t matter which one you
choose to be on the left, but you need to make sure that it matches the
<strong>left suffix</strong> label (<code>lsuffix</code>).</li>
<li>Replace <code>right_df</code> with the name of the second
<code>DataFrame</code>, making sure it matches
<code>rsuffix</code>.</li>
<li>Run the code and check that your join happened correctly.</li>
</ol></div></div>

```python
# Join the data
tmax_df = (
    left_df
    .join(
        right_df,
        lsuffix='_ohare',
        rsuffix='_northerly')
)
tmax_df.head()
```

```python
%store tmax_df
```

Menne, Matthew J., Imke Durre, Russell S. Vose, Byron E. Gleason, and
Tamara G. Houston. 2012. “An Overview of the Global Historical
Climatology Network-Daily Database.” *Journal of Atmospheric and Oceanic
Technology* 29 (7): 897–910.
<https://doi.org/10.1175/JTECH-D-11-00103.1>.