---
title: Exploring water conflicts in the Colorado River Basin
subtitle: Week 2 - Discussion section 
week: 2
image: images/Tecopa_site2.JPG
sidebar: false
---

<!--SETUP: ask them to download the data and add data folder to gitignore -->


This discussion section will guide you through answering questions about water-related conflicts in the Colorado River Basin using data from the [U.S. Geological Survey (USGS)](https://www.usgs.gov). In this section, you will:

- Practice version control using git via the terminal
- Discuss advantages and disadvantages of different methods of data loading
- Use core `pandas.DataFrame` methods to answer questions
- Apply best practices for clean code

## Setup

:::{.callout-tip appearance="minimal"}
1. In the Taylor server, start a new JupyterLab session or access an active one.

2. In the terminal, use `cd` to navigate into the `eds-220-sections` directory. Use `pwd` to verify `eds-220-sections` is your current working directory.

3. Create a new Python Notebook inside your `eds-220-sections` directory and rename it to `section-2-co-basin-water-conflicts.ipynb`. 

4. Use the terminal to stage, commit, and push this file to the remote repository. Remember:
    1. `git status` : check git status
    2. `git add FILE-NAME` : stage updated file
    3. `git status` : check git status again to confirm
    4. `git commit -m "Commit message"` : commit with message
    5. `git pull` : bring local repo up to date with the remote (best practice)
    6. `git push` : push changes to upstream repository

<p style="text-align: center;">
**CHECK IN WITH YOUR TEAM** 
</p>
<p style="text-align: center;">
**MAKE SURE YOU'VE ALL SUCCESSFULLY SET UP YOUR NOTEBOOKS BEFORE CONTINUING**
</p>
:::

## General directions
:::{.callout-tip appearance="minimal"}
- Add comments in your code cells following [comments best practices](/book/appendices/comments-guidelines.qmd).
- On each exercise, include markdown cells in between your code cells to add titles and information.
- Indications about when to commit and push changes are included, but you are encouraged to commit and push more often. 
:::

## About the data
For these exercises we will use data about [Water Conflict and Crisis Events in the Colorado River Basin](https://www.sciencebase.gov/catalog/item/63acac09d34e92aad3ca1480) @holloman_coded_2023. This dataset is stored at [ScienceBase](https://www.sciencebase.gov/catalog/), a digital repository from the U.S. Geological Survey (USGS) created to share scientific data products and USGS resources.

The dataset is a CSV file containing information about conflict or crisis around water resource management in the Colorado River Basin. 
The Colorado River Basin, inhabited by several Native American tribes for centuries, is a crucial water source in the southwestern United States and northern Mexico, supporting over 40 million people, extensive agricultural lands, and diverse ecosystems. 
Its management is vital due to the region's arid climate and the competing demands for water, leading to significant challenges related to water allocation and conservation. 

![Colorado River Basin.  U.S. Bureau of Reclamation. ](/discussion-sections-upcoming/images/co-river-basin.png)

## 1. String accessor for `pandas.Series`

In this session we will work with `pandas.Series` whose values are strings. This is a common scenario and `pandas` has special [string methods](https://pandas.pydata.org/docs/user_guide/text.html#string-methods) for this kind of series. These methods are accessed via the **`str` accessor**. **Accessors** provide additional functionality for working with specific kinds of data (in this case, strings). 


1. Carefully read the code below. We'll use some of it in the next exercises.


In [81]:
import pandas as pd 
import numpy as np

# Example series
s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Utah; Colorado'])
print(s)

# str accessor (doesn't do anything by itself)
print(s.str)

# Use str accessor with additional methods to perform string operations
# .split splits strings by ';' and expands output into separate columns
s.str.split(';', expand=True)

0    California; Nevada
1               Arizona
2                   NaN
3        Utah; Colorado
dtype: object
<pandas.core.strings.accessor.StringMethods object at 0x166f07090>


|   | 0 | 1 |
|---|---|---|
| 0 | California | Nevada |
| 1 | Arizona |  |
| 2 |  |  |
| 3 | Utah | Colorado |
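
The `str` accessor exposes many other methods besides `.split`. As a small sketch on a made-up series (the values are for illustration only and are not taken from the dataset), `.str.contains` and `.str.len` both work element by element and leave missing values alone:

```python
import numpy as np
import pandas as pd

# Made-up series for illustration only
s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Utah; Colorado'])

# .str.contains returns a boolean mask; na=False counts missing values as "no match"
has_utah = s.str.contains('Utah', na=False)
print(has_utah)

# .str.len returns the number of characters; missing values stay missing
lengths = s.str.len()
print(lengths)
```

A boolean mask like `has_utah` can then be used to select elements, for example `s[has_utah]`.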




<!-- 10 minutes -->
## 2. Archive exploration
Take some time to look through the dataset's description in the ScienceBase repository. Discuss the following questions with your team:

a. Where was the data collected from?
<!-- 
articles from newspapers describing water-related events in geographic areas in the Basin
-->
b. During what time frame were the observations in the dataset collected?
<!--
2005-2021
-->
c. What was the author's perceived value of this dataset?
<!--
 examining crisis on a continual basis toward identification of hotspots from conflict, identifying primary stakeholders, and who experiences crises.
-->
d. Briefly discuss anything else that seems like relevant information.

In a markdown cell, use your answers to the previous questions to add a brief description of the dataset. Include a citation, date of access, and a link to the archive. 

<p style="text-align: center;">
**check git status -> stage changes -> check git status -> commit with message -> pull -> push  changes**
</p>

<!-- 3 minutes -->
## 3. Data loading

a. In class we have (so far) loaded data into our workspace either by downloading the file and storing a copy of it on our computer or by accessing the file directly through a URL. With your team, discuss the general advantages and disadvantages of these two methods of data access.

b. Import the `Colorado River Basin Water Conflict Table.csv` file [from the ScienceBase repository](https://www.sciencebase.gov/catalog/item/63acac09d34e92aad3ca1480) into your workspace. Name your data frame variable `df`.
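
To make the trade-off between the two access methods concrete, here is a minimal sketch of a helper that prefers a local copy and falls back to a URL. The function name `load_table`, the toy CSV, and the `example.invalid` URL are all hypothetical and only serve the illustration; they are not part of the course materials or the ScienceBase archive.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(local_path, url):
    """Read a CSV from a local copy if it exists, otherwise from a URL."""
    local_path = Path(local_path)
    if local_path.exists():
        # Local copy: works offline, but can fall out of date with the archive
        return pd.read_csv(local_path)
    # Direct access: reads whatever the archive currently serves, needs a connection
    return pd.read_csv(url)


# Demo with a toy CSV written to a temporary folder (stand-in for the real file)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / 'toy.csv'
    pd.DataFrame({'State': ['AZ', 'CO; UT']}).to_csv(toy_path, index=False)
    # Placeholder URL: never contacted here because the local file exists
    toy = load_table(toy_path, 'https://example.invalid/toy.csv')

print(toy)
```

Whichever method you choose, note it in your notebook so that others know where the file came from and when it was accessed.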

<p style="text-align: center;">
**CHECK IN WITH YOUR TEAM** 
</p>
<p style="text-align: center;">
**MAKE SURE YOU'VE ALL SUCCESSFULLY LOADED THE DATA BEFORE CONTINUING**
</p>

<p style="text-align: center;">
**check git status -> stage changes -> check git status -> commit with message -> pull -> push  changes**
</p>



In [1]:
import pandas as pd

df = pd.read_csv('data/Colorado River Basin Water Conflict Table.csv')
df.head(5)

|   | Event | Search Source | Newspaper | Article Title | Duplicate | Report Date | Report Year | Event Date | Event Day | Event Month | ... | Article Text Search - water rights | Article Text Search - intergovernmental | Article Text Search - water transfers | Article Text Search - navigation | Article Text Search - fish | Article Text Search - invasive | Article Text Search - diversion | Article Text Search - water diversion | Article Text Search - instream | Article Text Search - aquatic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | USGS1-50.docx | The Durango Herald (Colorado) | Tribes assert water rights on Colorado River B... | False | 7-Apr-22 | 2022.0 |  |  | 4.0 | ... | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | USGS1-50.docx | Journal, The (Cortez, Dolores, Mancos, CO) | Native American tribes assert water rights on ... | False | 7-Apr-22 | 2022.0 |  |  | 4.0 | ... | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | USGS1-50.docx | The Salt Lake Tribune | 'Very positive change.' New Utah law will be a... | False | 17-Mar-22 | 2022.0 |  |  | 3.0 | ... | 12 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 12 | 1 |
| 3 | 4 | USGS1-50.docx | Casa Grande Dispatch (AZ) | Legislation would let an Arizona tribe lease C... | False | 11-Dec-21 | 2021.0 |  |  | 12.0 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | USGS1-50.docx | The Aspen Times (Colorado) | Historically excluded from Colorado River poli... | False | 19-Dec-21 | 2021.0 |  |  | 11.0 | ... | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


## 4. Preliminary data exploration

Write a list with at least four ways in which you could gain preliminary information about this dataset and why these are relevant.

In [None]:
# df.head()
# df.shape
# df.columns
# df.Stakeholders.unique()
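
As one possible illustration of these checks, the sketch below runs a few common exploration calls on a tiny made-up data frame named `toy`. The column names mimic the real table, but the values are invented so the block runs without the CSV.

```python
import numpy as np
import pandas as pd

# Made-up data frame that mimics the layout of the real table
toy = pd.DataFrame({
    'Event': [1, 2, 3],
    'State': ['CO', np.nan, 'AZ; UT'],
    'Report Year': [2022.0, 2021.0, 2015.0],
})

print(toy.head())             # first rows: a quick look at the values
print(toy.shape)              # (rows, columns): size of the table
print(toy.columns)            # column names: which variables are available
print(toy.dtypes)             # data types: which columns hold text or numbers
print(toy.isna().sum())       # number of missing values in each column
print(toy['State'].unique())  # distinct values in a single column
```

Each call answers a different question, so it is worth noting in a markdown cell what you learned from it before moving on.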

In [2]:
(df['State'].dropna()
            .str.split(';', expand=True)
            .stack()
            .str.strip()
            .value_counts())

AZ    87
CO    45
UT    40
NV    19
CA    16
NM    13
WY     8
OH     1
TX     1
Name: count, dtype: int64

In [26]:
oh_row = df[df['State'].str.contains('OH', case=False, na=False)]
oh_row.iat[0,3]

'Environmentalists secure water rights for Great Salt Lake'

In [24]:
oh_row['State']

11    OH; UT
Name: State, dtype: object

In [27]:
df[df['State'].str.contains('TX', case=False, na=False)]

|   | Event | Search Source | Newspaper | Article Title | Duplicate | Report Date | Report Year | Event Date | Event Day | Event Month | ... | Article Text Search - water rights | Article Text Search - intergovernmental | Article Text Search - water transfers | Article Text Search - navigation | Article Text Search - fish | Article Text Search - invasive | Article Text Search - diversion | Article Text Search - water diversion | Article Text Search - instream | Article Text Search - aquatic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 220 | 221 | USGS251-300.docx | Associated Press State & Local | Austin water supply long-term plans appear to ... | False | 5-Mar-15 | 2015.0 | 3 |  | 3.0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


## Examine state codes

Our goal today is to find which states are reported in the dataset as having a water conflict.

1. Examine the unique values in the `State` column. What could be a challenge to writing code to find which states are listed (without repetition)? Remember to write longer answers in markdown cells, not as comments.

In [60]:
print(df['State'].unique())

['CO' 'UT' nan 'AZ' 'OH; UT' 'AZ; CO; NM; UT' 'CA' 'AZ; UT' 'AZ; NV'
 'CO; UT; WY; NM' 'AZ; CA' 'UT; AZ' 'CO; WY' 'NV; AZ' 'CO; AZ'
 'AZ; CA; CO; NV; NM; UT; WY' 'AZ; CA; NV' 'NV' 'NM' 'UT; CO; WY'
 'CA; NV; AZ' 'AZ; NM' 'WY; UT; CO' 'TX']


In [62]:
for x in df['State'].unique():
    print(x)

CO
UT
nan
AZ
OH; UT
AZ; CO; NM; UT
CA
AZ; UT
AZ; NV
CO; UT; WY; NM
AZ; CA
UT; AZ
CO; WY
NV; AZ
CO; AZ
AZ; CA; CO; NV; NM; UT; WY
AZ; CA; NV
NV
NM
UT; CO; WY
CA; NV; AZ
AZ; NM
WY; UT; CO
TX


## Brainstorm

1. First, individually, write step-by-step instructions for how you would create a list of the state codes in which a water conflict is reported (without repetition). It's ok if you don't know how to code each step; what matters is having an idea of what we'll do before starting to code.

2. Discuss your ideas with your team.

The next exercises will guide you through finding the state codes in the dataset. There are *many* ways of extracting this information from the dataset. The one presented here might not be the same way you thought about doing it - that's ok! This one was designed to practice using the **`.str` accessor** in a `pandas.Series`.

## Drop NAs

Use the `dropna()` method for `pandas.Series` on the `State` column to create a new `pandas.Series` `states` without NAs. Check there are no NAs in the new `states` series.

In [64]:
states = df['State'].dropna()
states

0          CO
1          CO
2          UT
5          AZ
11     OH; UT
        ...  
263        CO
264        CO
265    AZ; CA
266        AZ
267        AZ
Name: State, Length: 178, dtype: object

In [66]:
states.hasnans

False

## Split strings
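
One possible way to approach this step, sketched on a made-up series that stands in for `states` (the values only imitate the `'XX; YY'` format seen in the `State` column): `.str.split` with `expand=True` places each piece in its own column.

```python
import pandas as pd

# Made-up stand-in for `states`, using the same 'XX; YY' format
states = pd.Series(['CO', 'OH; UT', 'AZ; CO; NM; UT'], index=[0, 11, 12])

# Split each string at ';' and spread the pieces across columns
split_states = states.str.split(';', expand=True)
print(split_states)
```

Rows with fewer codes than the longest entry are padded with missing values, and every piece after the first keeps the space that followed the `;`.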

## Stack the data frame
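
As a rough sketch of what stacking does, the block below uses a made-up data frame shaped like the output of `.str.split(';', expand=True)`. The `.stack()` method moves the columns into an inner index level and returns a single long `pandas.Series`. Whether `.stack()` keeps or drops the empty cells may depend on the pandas version, so the sketch calls `.dropna()` afterwards to get the same result either way.

```python
import pandas as pd

# Made-up data frame shaped like the output of .str.split(';', expand=True)
split_states = pd.DataFrame({0: ['CO', 'OH', 'AZ'],
                             1: [None, ' UT', ' CO'],
                             2: [None, None, ' NM']})

# Stack the columns into one long Series, then remove any empty cells
stacked = split_states.stack().dropna()
print(stacked)
```

The result has a two-level index: the original row label and the column each value came from.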

## String accessor
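
Splitting `'XX; YY'` at `;` leaves a leading space on every code after the first. The sketch below, on a made-up series named `codes`, shows why that matters and how `.str.strip()` could clean it up.

```python
import pandas as pd

# Made-up series with the leading spaces that splitting at ';' leaves behind
codes = pd.Series(['CO', 'OH', ' UT', 'AZ', ' CO'])

# Without stripping, 'CO' and ' CO' are treated as different values
n_before = codes.nunique()

# .str.strip() removes leading and trailing whitespace from every element
clean_codes = codes.str.strip()
n_after = clean_codes.nunique()

print(n_before, n_after)
```

Comparing the number of unique values before and after stripping is a quick way to confirm that the whitespace was creating spurious duplicates.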

## Value counts
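
Once the codes are clean, `.value_counts()` tallies how often each one appears. A minimal sketch on a made-up series (the counts here are invented and do not reflect the dataset):

```python
import pandas as pd

# Made-up series of already-stripped state codes
clean_codes = pd.Series(['CO', 'OH', 'UT', 'AZ', 'CO', 'UT', 'CO'])

# Count occurrences of each code, most frequent first
counts = clean_codes.value_counts()
print(counts)

# The index of the result holds the distinct codes
state_codes = counts.index.tolist()
print(state_codes)
```

Codes that appear only once are worth a second look, since they may point to data entry issues or to events outside the study area.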

## Method chaining
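
Method chaining strings several calls together inside one pair of parentheses, with one step per line. The sketch below is an alternative to the split-and-stack route: it uses `.str.split(';')` without `expand=True`, followed by `.explode()`, which gives each list element its own row. The data frame `toy` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data frame with a 'State' column in the same format as the dataset
toy = pd.DataFrame({'State': ['CO', np.nan, 'OH; UT', 'AZ; CO; NM; UT']})

state_codes = (toy['State']
               .dropna()         # remove missing values
               .str.split(';')   # each element becomes a list of codes
               .explode()        # one row per code
               .str.strip()      # drop the space left after each ';'
               .unique())        # distinct codes, in order of first appearance

print(state_codes)
```

Putting a short comment next to each step keeps the chain readable and makes it easy to remove or reorder steps while debugging.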

In [None]:
# stakeholders = (df['Stakeholders'].dropna()
#                      .str.split(',', expand=True)
#                      .apply(lambda x: x.str.strip())
#                      .values
#                      .ravel())

# stakeholders = pd.unique(stakeholders)
# print(stakeholders)