# The Midwest underwater

A look at 2019 floods in South Dakota, USA

Elsa Culler  
Nate Quarderer  
2025-05-19

## Set up

To get started on this notebook, you’ll need to restore any variables
from previous notebooks to your workspace.

```python
%store -r
```

```python
# Import libraries
import pandas as pd
```

## STEP 2: Data wrangling

### Load sample data

You should now have the sample data downloaded, but you still need to
load it into Python before you can use it. First, you’ll need the path
to your data file.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Replace <code>data_path</code> with a descriptive name</li>
<li>Check your data directory for the file name of the streamflow data,
and put it in place of <code>data-filename-here</code></li>
</ol></div></div>

```python
data_path = project.project_dir / 'data-filename-here.csv'
```
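`project.project_dir` comes from an earlier notebook, and since it supports the `/` operator it is presumably a `pathlib.Path`. The sketch below shows how `/` joins path pieces and how `glob()` could help you find the file name to use. The folder name `flood-project` and file name `streamflow.csv` are hypothetical placeholders, and the `demo_` variable names are chosen so they don't overwrite your real `data_path`.

```python
from pathlib import Path

# Hypothetical stand-in for project.project_dir (not the real project folder)
demo_project_dir = Path('flood-project')

# The / operator joins path pieces, as in the cell above
demo_path = demo_project_dir / 'streamflow.csv'  # hypothetical file name

print(demo_path.as_posix())  # flood-project/streamflow.csv
print(demo_path.name)        # streamflow.csv
print(demo_path.suffix)      # .csv
print(demo_path.exists())    # False unless this file really exists

# List the CSV files that actually exist in the folder, if the folder exists
if demo_project_dir.is_dir():
    for csv_path in sorted(demo_project_dir.glob('*.csv')):
        print(csv_path.name)
```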

Let’s take a look at the raw data (make sure to replace `data_path` with
the name of your variable!):

```python
!head -n 5 $data_path
```
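The `!` at the start of that cell hands the line to your computer’s shell, and `head` is a Unix command, so it may not be available if your notebook runs on Windows. As a hedged alternative, here is a minimal pure-Python sketch that returns the first few lines of a text file. The helper name `head_lines` is hypothetical, and the demo writes a small throwaway file with made-up contents so the example runs on its own.

```python
import tempfile
from itertools import islice
from pathlib import Path


def head_lines(path, n=5):
    """Return the first n lines of a text file, like the shell's `head -n`."""
    with open(path) as file:
        # islice stops after n lines without reading the rest of the file
        return [line.rstrip('\n') for line in islice(file, n)]


# Demo on a small throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / 'demo.csv'
    demo_file.write_text('a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n')
    first_lines = head_lines(demo_file, n=5)

for line in first_lines:
    print(line)
```

In your notebook, you would call `head_lines(data_path)` with your own path variable instead of the demo file.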

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>The cell below imports <code>CSV</code> data like the flood data into
Python. A useful method for looking at the <strong>datatypes</strong> in
your <code>pd.DataFrame</code> is the <code>pd.DataFrame.info()</code>
method.</p>
<ol type="1">
<li>Replace <code>dataframe</code> with a descriptive name for your
DataFrame variable</li>
<li>Run the cell to see the datatypes of each column.</li>
<li>Try <strong>uncommenting</strong> lines one by one by deleting the
<code>#</code> at the beginning and running the code again.</li>
</ol>
<p>What changes? Why do you think those lines are needed?</p></div></div>

> **Tip**
>
> In Python, you will see both **methods** and **functions** when you
> want to give the computer some instructions. This is an *important and
> tricky* distinction. For right now, functions have all of their
> arguments/parameters **inside** the parentheses, as in
> `dataretrieval.nwis.get_discharge_measurements()`. For **methods**,
> the first argument is always some kind of Python object that is placed
> **before** the method, separated by a dot. The cell below that loads
> your data ends with an example of the `pd.DataFrame.info()`
> **method**: `dataframe.info()`.
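Here’s a small sketch of the same idea using a plain Python string instead of a DataFrame; the variable `word` is only for illustration. The last line shows why the object before the dot counts as the first argument: calling the method through its type and passing the object explicitly gives the same result, just as `dataframe.info()` does the same thing as `pd.DataFrame.info(dataframe)`.

```python
word = 'discharge'  # a plain string, used only for illustration

# A function: the string goes inside the parentheses
print(len(word))        # 9

# A method: the string goes before the dot
print(word.upper())     # DISCHARGE

# The same method called through its type, with the object passed
# as the first argument inside the parentheses
print(str.upper(word))  # DISCHARGE
```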

```python
dataframe = pd.read_csv(
    data_path,
    #index_col='datetime',
    #parse_dates=True,
)
dataframe.info()
```
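If you’d like to check your answer without your own file, here’s a minimal sketch on a tiny made-up CSV. The column names `datetime` and `discharge_cfs` and all of the values are hypothetical, not real streamflow data. It compares the index and the datatypes with and without `index_col` and `parse_dates`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded streamflow file
csv_text = """datetime,discharge_cfs
2019-03-01,120.0
2019-03-02,135.5
2019-03-03,410.0
"""

# Without extra arguments: a default 0, 1, 2, ... index,
# and the dates are read in as plain text
plain_df = pd.read_csv(io.StringIO(csv_text))
print(plain_df.index)
print(plain_df.dtypes)

# With index_col and parse_dates: the dates become a DatetimeIndex
dated_df = pd.read_csv(
    io.StringIO(csv_text),
    index_col='datetime',
    parse_dates=True,
)
print(dated_df.index)
dated_df.info()
```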

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>What column do you think the streamflow, or discharge, measurements
are in?</p></div></div>

COLUMN NAME HERE

### Organize your data descriptively

It’s important to make sure that your code is easy to read. Even if you
don’t plan to share it, **you** will likely need to read code you’ve
written in the future!

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>Using the code below as a starting point, select the discharge column
and rename it to something descriptive:</p>
<ol type="1">
<li>Identify the discharge/streamflow column.</li>
<li>Replace <code>discharge_column_name</code> with the discharge column
name.</li>
<li>Replace <code>new_column_name</code> with a descriptive name. We
recommend including the <strong>units</strong> of the discharge values
in the column name as a way to keep track of them.</li>
</ol></div></div>

```python
discharge_df = (
    # Replace with the name you gave your DataFrame above
    dataframe
    # Select only the discharge column as a DataFrame
    [['discharge_column_name']]
    # Rename the discharge column
    .rename(columns={'discharge_column_name': 'new_column_name'})
)

discharge_df
```
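The double square brackets in the cell above matter. Here’s a minimal sketch on a small made-up DataFrame showing that single brackets give a `pd.Series`, double brackets give a one-column `pd.DataFrame`, and `.rename()` returns a new DataFrame rather than changing the original. The column names `site` and `flow` are hypothetical, and `discharge_cfs` assumes the values are in cubic feet per second.

```python
import pandas as pd

# Hypothetical stand-in for a streamflow DataFrame (made-up values)
flow_df = pd.DataFrame(
    {'site': ['A', 'A', 'A'], 'flow': [120.0, 135.5, 410.0]},
    index=pd.date_range('2019-03-01', periods=3, freq='D'),
)

# Single brackets (one name) return a Series
flow_series = flow_df['flow']
print(type(flow_series))

# Double brackets (a list of names) return a DataFrame with just that column
flow_only_df = flow_df[['flow']]
print(type(flow_only_df))

# rename() returns a new DataFrame; the original keeps its column names
renamed_df = flow_only_df.rename(columns={'flow': 'discharge_cfs'})
print(renamed_df.columns.tolist())    # ['discharge_cfs']
print(flow_only_df.columns.tolist())  # ['flow']
```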

> **Strings**
>
> How does a computer tell the difference between a **name** which is
> linked to a value, and a **string** of characters to be interpreted as
> text (like a column name)?
>
> In most programming languages, we have to put quotes around strings of
> characters that are meant to be interpreted **literally** as text
> rather than **symbolically** as a variable. In Python, you can use
> either single `'` or double `"` quotes around strings. If you forget
> to put quotes around your strings, Python will try to interpret them
> as variable **names** instead, and will probably give you a
> `NameError` when it can’t find the linked value.
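Here’s a short sketch of the difference between a name and a string. The column name `discharge_cfs` is hypothetical; the point is what happens with and without quotes.

```python
# A name linked to a value: the quoted text on the right is a string
column_name = 'discharge_cfs'  # hypothetical column name

print(column_name)    # discharge_cfs  (Python looks up the name)
print('column_name')  # column_name    (quoted, so printed literally)

# Single and double quotes make exactly the same string
print('flow' == "flow")  # True

# Forgetting the quotes makes Python look for a variable instead
error_message = None
try:
    print(discharge_cfs)
except NameError as err:
    error_message = str(err)
print(error_message)  # name 'discharge_cfs' is not defined
```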

## Wrap up

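Don’t forget to store your variables so you can use them in other
notebooks! List the variables you want to save after `%store`; running
`%store` on its own only shows the variables that are already stored.
Saving only the variables you need also keeps large objects that you
won’t need in the future out of storage.

```python
# Store the variables you'll need later (add any others you need)
%store discharge_df
```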

Finally, be sure to `Restart` and `Run all` to make sure your notebook
works all the way through!