# Water conflicts in the Colorado River Basin

## Week 2 - Discussion section

##### This discussion section guides you through exploring data about water-related conflicts in the Colorado River Basin, using data from the U.S. Geological Survey (USGS). In this discussion section, you will:

- Practice version control using git via the terminal
- Use the .str accessor to work with pandas.Series of strings
- Practice method chaining

In [1]:
# Import libraries

import pandas as pd
import numpy as np

In [2]:
# Read in the data from a local CSV file

col = pd.read_csv('data/colorado-river-basin-water-conflict-table.csv')
col

Unnamed: 0,Event,Search Source,Newspaper,Article Title,Duplicate,Report Date,Report Year,Event Date,Event Day,Event Month,...,Article Text Search - water rights,Article Text Search - intergovernmental,Article Text Search - water transfers,Article Text Search - navigation,Article Text Search - fish,Article Text Search - invasive,Article Text Search - diversion,Article Text Search - water diversion,Article Text Search - instream,Article Text Search - aquatic
0,1,USGS1-50.docx,The Durango Herald (Colorado),Tribes assert water rights on Colorado River B...,False,7-Apr-22,2022.0,,,4.0,...,17,0,0,0,0,0,0,0,0,0
1,2,USGS1-50.docx,"Journal, The (Cortez, Dolores, Mancos, CO)",Native American tribes assert water rights on ...,False,7-Apr-22,2022.0,,,4.0,...,17,0,0,0,0,0,0,0,0,0
2,3,USGS1-50.docx,The Salt Lake Tribune,'Very positive change.' New Utah law will be a...,False,17-Mar-22,2022.0,,,3.0,...,12,0,0,0,1,0,0,0,12,1
3,4,USGS1-50.docx,Casa Grande Dispatch (AZ),Legislation would let an Arizona tribe lease C...,False,11-Dec-21,2021.0,,,12.0,...,6,0,0,0,0,0,0,0,0,0
4,5,USGS1-50.docx,The Aspen Times (Colorado),Historically excluded from Colorado River poli...,False,19-Dec-21,2021.0,,,11.0,...,18,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
263,264,USGS301-350.docx,The Durango Herald (Colorado),Water officials consider action for worst case...,False,9-Jun-09,2019.0,6,,6.0,...,3,0,0,0,0,0,0,0,0,0
264,265,USGS301-350.docx,"Rio Blanco Herald Times (Meeker, Colorado)",Rangely hosts Colorado River District event,False,21-Apr-22,2022.0,4/13/2022,13.0,4.0,...,3,0,0,0,0,1,1,0,0,0
265,266,USGS301-350.docx,Casa Grande Dispatch (AZ),California water district lawsuit threatens dr...,False,18-Apr-19,2019.0,4,,4.0,...,2,0,0,0,0,0,0,0,0,0
266,267,USGS301-350.docx,The Salt Lake Tribune,Scientists want to flush water past Glen Canyo...,False,13-Dec-07,2007.0,12,,12.0,...,0,0,0,0,0,2,0,0,0,0


### 3. Preliminary data exploration

##### Set pandas to display all columns in the data frame.

##### Using pandas methods, obtain preliminary information and explore this data frame in at least four different ways.
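The cells that follow never change the display setting, and two of them reference methods without calling them. Below is a minimal sketch of both steps. It assumes a small stand-in data frame whose columns and values are hypothetical, not taken from the USGS table.

```python
import pandas as pd

# Show every column when a data frame is printed (None means "no limit")
pd.set_option('display.max_columns', None)

# Hypothetical stand-in for the conflict table, for illustration only
df = pd.DataFrame({
    'Event': [1, 2, 3],
    'State': ['CO', 'AZ; NV', None],
    'Conflict Present': ['Y', 'N', 'Y'],
})

print(df.head())        # first rows
print(df.dtypes)        # data type of each column (an attribute, not a method)
print(df.nunique())     # number of distinct non-null values per column
print(df.isna().sum())  # number of missing values per column
```

Note that nunique() and info() need parentheses. Without them, pandas returns the method object instead of its result.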

In [4]:
# Check the columns

col.columns

Index(['Event', 'Search Source', 'Newspaper', 'Article Title', 'Duplicate',
       'Report Date', 'Report Year', 'Event Date', 'Event Day', 'Event Month',
       'Event Year', 'Conflict Present', 'Crisis Present', 'Basin', 'HUC6',
       'HUC2', 'Place', 'County', 'County FIPS', 'State', 'State FIPS',
       'Urban or Rural', 'Issue Type', 'Event Summary', 'Stakeholders',
       'Intensity Value', 'Comments', 'Related Observation Themes',
       'Article Text Search - water quality',
       'Article Text Search - invasive species',
       'Article Text Search - conservation', 'Article Text Search - drought',
       'Article Text Search - flood',
       'Article Text Search - ground water depletion',
       'Article Text Search - depletion',
       'Article Text Search - infrastructure',
       'Article Text Search - fish passage',
       'Article Text Search - instream water rights',
       'Article Text Search - water rights',
       'Article Text Search - intergovernmental',
       '

In [6]:
# Check the shape

col.shape

(268, 48)

In [7]:
# Count the number of unique values in each column

col.nunique()

In [8]:
# Print a concise summary: column dtypes and non-null counts

col.info()

### 4. Location column descriptions

In these exercises we will work with the columns that describe where an event took place. Before continuing, read the following column descriptions from the .xml metadata file:

**PLACE** - Where the event actually occurred, but also where the event’s direct implications are felt most directly. When the researchers reviewed the articles, they were looking for mentions of specific places impacted by the events. Empty cell indicates a place was not coded for this event. NA indicates a place is not referenced in the event text.

**STATE** - State Name coded from Place field. Empty cell indicates a state was not coded for this event or that the article was not coded.
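The PLACE description distinguishes an empty cell from a literal NA. With its default settings, pd.read_csv converts both to NaN, so the plain read_csv call at the top of this notebook probably merges the two cases. The sketch below shows the difference on a hypothetical three-row extract. keep_default_na is a real read_csv parameter; the data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row extract: one place, one empty cell, one literal "NA"
raw = 'Event,Place\n1,"Durango, CO"\n2,\n3,NA\n'

# Default parsing: both the empty cell and "NA" become NaN
default = pd.read_csv(io.StringIO(raw))
print(default['Place'].isna().tolist())

# keep_default_na=False: no strings are converted to NaN,
# so an empty cell stays '' and "NA" stays 'NA'
literal = pd.read_csv(io.StringIO(raw), keep_default_na=False)
print(literal['Place'].tolist())
```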

In [10]:
col['Place'].unique()

array(['Durango, CO', 'Great Salt Lake',
       'Colorado River Indian Reservation',
       'Southern Ute Indian Reservation', 'State of Arizona',
       'Entire Colorado River Basin', 'All of Colorado River Basin',
       'Farmington, OH; Great Salt Lake', 'Navajo Nation; Window Rock AZ',
       'Bullhead City, AZ re: All of Colorado R. Basin', 'Phoenix, AZ',
       'Ute Tribe, Utah', 'Denver, CO; Shoshone power plant',
       'AZ, CO, NM, UT, Southern Ute Tribe, Ute Mountain tribe, Navajo Nation',
       'Colorado River Basin', 'Laughlin, AZ',
       'Colorado River Indian Reservation; Parker, AZ',
       'CA - Imperial Irrigation District', 'Flagstaff, AZ',
       'Glen Canyon Dam, Lake Powell, UT; AZ', 'Needles, CA',
       'AZ, NV, Mexico', 'CO, UT, WY, NM', 'Las Vegas, Southern CA; AZ',
       'Sacaton/Gila R. Indian Reservation', 'Boulder, CO', 'Lees Ferry',
       'AZ', 'Lake Powell (UT, AZ)', 'Glenwod Springs',
       'Multiple waterways in UT ', 'St. Michaels, AZ',
       'Th

In [11]:
col['State'].unique()

array(['CO', 'UT', nan, 'AZ', 'OH; UT', 'AZ; CO; NM; UT', 'CA', 'AZ; UT',
       'AZ; NV', 'CO; UT; WY; NM', 'AZ; CA', 'UT; AZ', 'CO; WY', 'NV; AZ',
       'CO; AZ', 'AZ; CA; CO; NV; NM; UT; WY', 'AZ; CA; NV', 'NV', 'NM',
       'UT; CO; WY', 'CA; NV; AZ', 'AZ; NM', 'WY; UT; CO', 'TX'],
      dtype=object)

### 5. String accessor for pandas.Series

In the following exercises we will work with pandas.Series whose values are strings. This is a common scenario, so pandas has special string methods for this kind of series. These methods are accessed via the str accessor. Accessors provide additional functionality for working with specific kinds of data (in this case, strings).
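As a quick illustration of the accessor, here is a sketch on hypothetical values modeled on the Place column, including a trailing space and a missing value. Each .str method is applied element by element to the whole series.

```python
import pandas as pd

# Hypothetical Place-style values (note the trailing space and the missing value)
places = pd.Series(['Durango, CO', 'Multiple waterways in UT ', None, 'Phoenix, AZ'])

print(places.str.strip())         # remove leading/trailing whitespace
print(places.str.contains('AZ'))  # True/False per value; how the missing entry is reported depends on the dtype
print(places.str.len())           # length of each string; the missing entry stays missing
```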

The code below gives a brief demonstration of using the str accessor to call the split() method on a pandas.Series. Carefully read the code and check in with your team to see if you have questions about it. We’ll use it in a moment.

### 6. Examine state codes
Our goal today is to find which states are reported in the dataset as having water conflicts.

What are the unique values in the State column after filtering the dataset to events where a conflict is present? What could be a challenge to writing code to find which states are listed (without repetition)? Remember to write longer answers in markdown cells, not as comments.
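One way to see the challenge is a small sketch on hypothetical State values written in the same style as the table, with several states joined by "; ".

```python
import pandas as pd

# Hypothetical State values in the style of the USGS table
states = pd.Series(['AZ; UT', 'UT; AZ', 'CO', None, 'CO; WY'])

# unique() compares whole strings: the same two states in a different order
# count as different values, and combined entries hide the individual states
print(states.unique())
```

Calling unique() directly therefore reports 'AZ; UT' and 'UT; AZ' separately and never lists 'WY' on its own. The strings have to be split into individual states first.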

In [12]:
s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Nevada; Utah'])
s

0    California; Nevada
1               Arizona
2                   NaN
3          Nevada; Utah
dtype: object
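The cell above only builds the example series. Below is a sketch of the split itself on the same s, which is redefined so the block runs on its own. It shows split() with and without expand=True and then stacks the pieces into one series.

```python
import numpy as np
import pandas as pd

s = pd.Series(['California; Nevada', 'Arizona', np.nan, 'Nevada; Utah'])

# Without expand: each value becomes a list of pieces (NaN stays NaN)
print(s.str.split(';'))

# With expand=True: one column per piece, padded with missing values
wide = s.str.split(';', expand=True)
print(wide)

# stack() folds the columns into one Series; dropna() removes the padding
# (older pandas dropped it inside stack(), newer versions keep it);
# strip() removes the space left after each ';'
states = wide.stack().dropna().str.strip()
print(states.unique())
```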

In [22]:
# split(';', expand=True) puts each listed state in its own column, and
# stack() folds those columns back into one Series indexed by
# (original row, position in the list). Older pandas versions drop missing
# values in stack() by default; dropna() keeps the result the same in newer ones.

col_split = (col['State']
             .loc[col['Conflict Present'] == 'Y']  # keep events with a conflict
             .str.split(';', expand=True)
             .stack()
             .dropna()
             .str.strip()  # remove spaces left over from the split
            )
col_split

0    0    CO
1    0    CO
5    0    AZ
11   0    OH
     1    UT
          ..
257  0    AZ
258  0    UT
262  0    AZ
265  0    AZ
     1    CA
Length: 132, dtype: object
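The expand=True plus stack() pattern is one way to get one state per row. Series.explode() is an alternative technique, not the one the exercise asks for. Here is a sketch on a hypothetical mini-table that reuses the two column names from the cell above.

```python
import pandas as pd

# Hypothetical mini-table with the two columns the cell above uses
df = pd.DataFrame({
    'State': ['CO', 'OH; UT', None, 'AZ; CA'],
    'Conflict Present': ['Y', 'Y', 'Y', 'N'],
})

# Split each value into a list, then explode the lists so each state gets
# its own row; the original row label is repeated instead of a MultiIndex
states = (df.loc[df['Conflict Present'] == 'Y', 'State']
            .str.split(';')
            .explode()
            .str.strip()
            .dropna()
         )
print(states)
```

Because explode() keeps the original row labels, rows from the same article share an index value. This can make it easier to join back to other columns.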


To find which states are listed, without repetition:

1. Select the State column from the col data frame.
2. Split the strings in the column by the delimiter ; into different columns.
3. Stack the resulting data frame into a single pandas.Series.
4. Find the unique string values in the resulting series.

Your final answer should use method chaining without creating new variables.
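The four steps can be written as a single chain. This sketch runs on a hypothetical stand-in table named toy, so it does not overwrite col. It keeps the conflict filter from the earlier cell in front. On the real data, use col in place of toy.

```python
import pandas as pd

# Hypothetical stand-in for col; the real table has many more rows and columns
toy = pd.DataFrame({
    'State': ['CO', 'AZ; UT', 'UT; AZ', None, 'NV'],
    'Conflict Present': ['Y', 'Y', 'Y', 'Y', 'N'],
})

unique_states = (toy['State']
                 .loc[toy['Conflict Present'] == 'Y']  # keep conflict events only
                 .str.split(';', expand=True)          # one column per listed state
                 .stack()                              # fold columns into one Series
                 .dropna()                             # drop padding (newer pandas keeps it)
                 .str.strip()                          # remove spaces left by the split
                 .unique()                             # each state once
                )
print(unique_states)
```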

### 9. Find unique state codes

Bonus: How many articles mention each state?

In [28]:
# Count how many conflict articles list each state; col_split is already
# a Series, so value_counts() can be called on it directly
state_counts = col_split.value_counts()
state_counts