# 1. Archive exploration
Look through the dataset’s description in the ScienceBase repository. Find the following information:

- Where was the data collected from?

    - Colorado River Basin (upper and lower basins)
- During what time frame were the observations in the dataset collected?
    - 2005 to 2021, according to the title of the data release
- What was the author’s perceived value of this dataset?
    - To analyze social vulnerability to water insecurity and identify hotspots of conflict over water management.

In a markdown cell, use your answers to the previous questions to add a brief description of the dataset. Briefly discuss anything else that seems relevant to you. Include a citation, date of access, and a link to the archive.

# Description of Data
The dataset focuses on understanding social vulnerability to water insecurity, resilience, and resource management.



# Citation
- Holloman, D.V., Hines, M.K., and Zoanni, D.K., 2023, Coded Water Conflict and Crisis Events in the Colorado River Basin, Derived from LexisNexis search 2005-2021: U.S. Geological Survey data release, https://doi.org/10.5066/P9X6WR7J.
# Date of Access
- 10/10/2025
# Link
- https://www.sciencebase.gov/catalog/item/63acac09d34e92aad3ca1480


Take a look at the data’s metadata by clicking on the “View” icon of the Coded Events Colorado River Basin Water Conflict Table Metadata.xml file.

# 2. Data Loading

In [4]:
#Loading libraries
import pandas as pd
import numpy as np

In [6]:
#Load data
basin = pd.read_csv('data/Colorado River Basin Water Conflict Table.csv')

# 3. Preliminary Data Exploration

In [24]:
# Setting pandas to display all columns
pd.set_option('display.max_columns', None)

In [8]:
basin.head()

Unnamed: 0,Event,Search Source,Newspaper,Article Title,Duplicate,Report Date,Report Year,Event Date,Event Day,Event Month,...,Article Text Search - water rights,Article Text Search - intergovernmental,Article Text Search - water transfers,Article Text Search - navigation,Article Text Search - fish,Article Text Search - invasive,Article Text Search - diversion,Article Text Search - water diversion,Article Text Search - instream,Article Text Search - aquatic
0,1,USGS1-50.docx,The Durango Herald (Colorado),Tribes assert water rights on Colorado River B...,False,7-Apr-22,2022.0,,,4.0,...,17,0,0,0,0,0,0,0,0,0
1,2,USGS1-50.docx,"Journal, The (Cortez, Dolores, Mancos, CO)",Native American tribes assert water rights on ...,False,7-Apr-22,2022.0,,,4.0,...,17,0,0,0,0,0,0,0,0,0
2,3,USGS1-50.docx,The Salt Lake Tribune,'Very positive change.' New Utah law will be a...,False,17-Mar-22,2022.0,,,3.0,...,12,0,0,0,1,0,0,0,12,1
3,4,USGS1-50.docx,Casa Grande Dispatch (AZ),Legislation would let an Arizona tribe lease C...,False,11-Dec-21,2021.0,,,12.0,...,6,0,0,0,0,0,0,0,0,0
4,5,USGS1-50.docx,The Aspen Times (Colorado),Historically excluded from Colorado River poli...,False,19-Dec-21,2021.0,,,11.0,...,18,0,0,0,0,0,0,0,0,0


In [9]:
basin.tail()

Unnamed: 0,Event,Search Source,Newspaper,Article Title,Duplicate,Report Date,Report Year,Event Date,Event Day,Event Month,...,Article Text Search - water rights,Article Text Search - intergovernmental,Article Text Search - water transfers,Article Text Search - navigation,Article Text Search - fish,Article Text Search - invasive,Article Text Search - diversion,Article Text Search - water diversion,Article Text Search - instream,Article Text Search - aquatic
263,264,USGS301-350.docx,The Durango Herald (Colorado),Water officials consider action for worst case...,False,9-Jun-09,2019.0,6,,6.0,...,3,0,0,0,0,0,0,0,0,0
264,265,USGS301-350.docx,"Rio Blanco Herald Times (Meeker, Colorado)",Rangely hosts Colorado River District event,False,21-Apr-22,2022.0,4/13/2022,13.0,4.0,...,3,0,0,0,0,1,1,0,0,0
265,266,USGS301-350.docx,Casa Grande Dispatch (AZ),California water district lawsuit threatens dr...,False,18-Apr-19,2019.0,4,,4.0,...,2,0,0,0,0,0,0,0,0,0
266,267,USGS301-350.docx,The Salt Lake Tribune,Scientists want to flush water past Glen Canyo...,False,13-Dec-07,2007.0,12,,12.0,...,0,0,0,0,0,2,0,0,0,0
267,268,USGS301-350.docx,Casa Grande Dispatch (AZ),Arizona plan could devastate Pinal farmers,,2-Feb-19,2019.0,1/31/2019,31.0,1.0,...,2,0,0,0,0,0,0,0,0,0


In [10]:
basin.shape

(268, 48)

In [11]:
basin.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 268 entries, 0 to 267
Data columns (total 48 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   Event                                         268 non-null    int64  
 1   Search Source                                 268 non-null    object 
 2   Newspaper                                     268 non-null    object 
 3   Article Title                                 268 non-null    object 
 4   Duplicate                                     267 non-null    object 
 5   Report Date                                   267 non-null    object 
 6   Report Year                                   265 non-null    float64
 7   Event Date                                    248 non-null    object 
 8   Event Day                                     18 non-null     float64
 9   Event Month                                   212 non-null    flo

In [12]:
basin.columns.tolist()

['Event',
 'Search Source',
 'Newspaper',
 'Article Title',
 'Duplicate',
 'Report Date',
 'Report Year',
 'Event Date',
 'Event Day',
 'Event Month',
 'Event Year',
 'Conflict Present',
 'Crisis Present',
 'Basin',
 'HUC6',
 'HUC2',
 'Place',
 'County',
 'County FIPS',
 'State',
 'State FIPS',
 'Urban or Rural',
 'Issue Type',
 'Event Summary',
 'Stakeholders',
 'Intensity Value',
 'Comments',
 'Related Observation Themes',
 'Article Text Search - water quality',
 'Article Text Search - invasive species',
 'Article Text Search - conservation',
 'Article Text Search - drought',
 'Article Text Search - flood',
 'Article Text Search - ground water depletion',
 'Article Text Search - depletion',
 'Article Text Search - infrastructure',
 'Article Text Search - fish passage',
 'Article Text Search - instream water rights',
 'Article Text Search - water rights',
 'Article Text Search - intergovernmental',
 'Article Text Search - water transfers',
 'Article Text Search - navigation',
 'Articl

In [14]:
basin.describe(include='object')

Unnamed: 0,Search Source,Newspaper,Article Title,Duplicate,Report Date,Event Date,Conflict Present,Crisis Present,Basin,HUC6,...,Place,County,State,State FIPS,Urban or Rural,Issue Type,Event Summary,Stakeholders,Comments,Related Observation Themes
count,268,268,268,267,267,248,252,254,250,110,...,254,8,178,178,254,255,256,255,190,86
unique,6,45,267,2,235,52,2,2,25,22,...,135,3,23,19,6,19,253,199,184,21
top,USGS1-50.docx,The Arizona Republic (Phoenix),40 Million People Rely on the Colorado River. ...,False,1-Feb-19,12,Y,Y,Entire Colorado River Basin,140801,...,Entire Colorado River Basin,Yuma,AZ,4,Both,Drought,Ute Mountain and Southern Ute representatives ...,All Water Users,Fact piece,Lack of tribal representation
freq,50,45,2,265,4,23,145,203,87,17,...,19,4,66,66,184,66,2,7,5,26


In [15]:
#Missing values
basin.isnull().sum()

Event                                             0
Search Source                                     0
Newspaper                                         0
Article Title                                     0
Duplicate                                         1
Report Date                                       1
Report Year                                       3
Event Date                                       20
Event Day                                       250
Event Month                                      56
Event Year                                       11
Conflict Present                                 16
Crisis Present                                   14
Basin                                            18
HUC6                                            158
HUC2                                             18
Place                                            14
County                                          260
County FIPS                                     260
State       



In [None]:
# Examine State Code

# 8)

In [25]:
# 3. Filter rows where a water conflict is reported ('Y')
water_conflict_df = basin[basin['Conflict Present'].str.upper() == 'Y']
water_conflict_df

Unnamed: 0,Event,Search Source,Newspaper,Article Title,Duplicate,Report Date,Report Year,Event Date,Event Day,Event Month,Event Year,Conflict Present,Crisis Present,Basin,HUC6,HUC2,Place,County,County FIPS,State,State FIPS,Urban or Rural,Issue Type,Event Summary,Stakeholders,Intensity Value,Comments,Related Observation Themes,Article Text Search - water quality,Article Text Search - invasive species,Article Text Search - conservation,Article Text Search - drought,Article Text Search - flood,Article Text Search - ground water depletion,Article Text Search - depletion,Article Text Search - infrastructure,Article Text Search - fish passage,Article Text Search - instream water rights,Article Text Search - water rights,Article Text Search - intergovernmental,Article Text Search - water transfers,Article Text Search - navigation,Article Text Search - fish,Article Text Search - invasive,Article Text Search - diversion,Article Text Search - water diversion,Article Text Search - instream,Article Text Search - aquatic
0,1,USGS1-50.docx,The Durango Herald (Colorado),Tribes assert water rights on Colorado River B...,False,7-Apr-22,2022.0,,,4.0,2022.0,Y,N,Upper San Juan,140801,14,"Durango, CO",La Plata,8067.0,CO,8,Both,Water rights more generally,Ute Mountain and Southern Ute representatives ...,"Tribal Nations, State Government, Federal Gove...",2.0,The article highlights calls for negotiation b...,Lack of tribal representation,0,0,3,7,0,0,0,1,0,0,17,0,0,0,0,0,0,0,0,0
1,2,USGS1-50.docx,"Journal, The (Cortez, Dolores, Mancos, CO)",Native American tribes assert water rights on ...,False,7-Apr-22,2022.0,,,4.0,2022.0,Y,N,Upper San Juan,140801,14,"Durango, CO",La Plata,8067.0,CO,8,Both,Water rights more generally,Ute Mountain and Southern Ute representatives ...,"Southern Ute Indian Tribe, Ute Mountain Tribe,...",2.0,The article highlights calls for negotiation b...,Lack of tribal representation,0,0,2,7,0,0,0,1,0,0,17,0,0,0,0,0,0,0,0,0
4,5,USGS1-50.docx,The Aspen Times (Colorado),Historically excluded from Colorado River poli...,False,19-Dec-21,2021.0,,,11.0,2021.0,Y,Y,Upper San Juan,140801,14,Southern Ute Indian Reservation,,,,,Rural,Intergovernmental issues,State and federal officials say that Tribal Na...,"Sothern Ute Indian Tribe, Ute Mountain Tribe, ...",-1.0,Interaction between tribal nations and state/f...,Lack of tribal representation,0,0,2,6,0,0,0,7,0,0,18,0,0,0,0,0,0,0,0,0
5,6,USGS1-50.docx,The Arizona Republic (Phoenix),Everyone loses if we cannot agree on how we us...,False,22-Apr-17,2017.0,,,4.0,2017.0,Y,Y,Entire Lower Colorado Basin,,15,State of Arizona,,,AZ,4,Both,Intergovernmental issues,Water management agencies are in a legal dispu...,"Water Managers, All Water Users, Conservation ...",2.0,Event is delays in negotiations between state/...,,0,0,10,11,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
7,8,USGS1-50.docx,"Navajo Times (Window Rock, Arizona)","Colorado River, stolen by law; Indigenous nati...",False,17-Mar-22,2022.0,,,3.0,2022.0,Y,Y,Entire Colorado River Basin,,"14, 15",Entire Colorado River Basin,,,,,Both,Intergovernmental issues,"Amidst ongoing drought, the Colorado River Ind...","Colorado River Indian Tribes, State Government...",2.0,Although the article focuses quite a bit on hi...,,0,0,0,4,0,0,0,6,0,0,15,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
257,258,USGS301-350.docx,The Arizona Republic (Phoenix),"At Canyon, water battle rages anew",False,22-Feb-09,2009.0,1,,1.0,2009.0,Y,Y,Upper Colorado-Dirty Devil,140700,14,"Glen Canyon Dam, AZ",,,AZ,4,Both,Instream water rights,"After formal protests were ignored, the Superi...","Federal Government, State Government, Environm...",-3.0,Event is submission of formal complaint memo f...,Shallow engagement strategies,0,0,5,0,10,0,0,0,0,0,2,0,0,0,4,0,0,0,0,0
258,259,USGS301-350.docx,The Salt Lake Tribune,Alder: A new set of negotiations need for The ...,False,29-Aug-08,2008.0,8,,8.0,2008.0,Y,Y,Entire Colorado River Basin,,"14, 15",Utah,,,UT,49,Both,Intergovernmental issues,An op ed written by the Associate Dean for Aca...,"State Government, Federal Government, Environm...",-1.0,Event is letter written by dean disputing comm...,,0,0,0,1,0,0,0,0,0,0,4,0,0,0,2,0,0,0,2,0
260,261,USGS301-350.docx,Farmington Daily Times (New Mexico),Experts predict low San Juan County river flow...,False,11-Apr-15,2015.0,4,,4.0,2015.0,Y,Y,Lower San Juan,140802,14,San Juan River,,,,,Both,Drought,The Colorado Basin River Forecast Center predi...,"Navajo Nation, State Government, Water Manager...",-1.0,Event is release of forecast which prompts wat...,Devaluing tribal land and resources,0,0,1,4,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0,0
262,263,USGS301-350.docx,Associated Press State & Local,Officials: Arizona will miss US deadline for k...,False,20-Feb-19,2019.0,2/19/2019,19.0,2.0,2019.0,Y,Y,Entire Lower Colorado Basin,,15,State of Arizona,,,AZ,4,Both,Conservation; Water rights more generally,The state of Arizona announced they won't have...,"State Government, Federal Government, All Wate...",-2.0,Scored a -2 based on the disagreements among t...,Inequitable government access/relationship,0,0,0,6,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0


In [29]:
# 4. Find unique states reporting water conflicts
states_with_conflict = water_conflict_df['State'].unique()
states_with_conflict

array(['CO', nan, 'AZ', 'OH; UT', 'UT', 'CA', 'AZ; NV', 'CO; UT; WY; NM',
       'AZ; CA', 'AZ; UT', 'NV; AZ', 'AZ; CA; CO; NV; NM; UT; WY', 'NV',
       'NM', 'UT; CO; WY', 'AZ; NM', 'WY; UT; CO', 'CO; AZ'], dtype=object)

# Problem

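1) A conflict may be reported in one place while the water issue affects multiple places (or states), so a single entry can list several states.
2) Spelling or character variation, such as the same states listed in a different order ('AZ; NV' vs. 'NV; AZ').
3) Missing values (NaN).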
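
To gauge how much each of these problems matters, a quick count can help. The sketch below is only an illustration: `states` is a small hypothetical sample standing in for `water_conflict_df['State']`, with values copied from the output above so that the cell runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for water_conflict_df['State'] (values taken from the output above)
states = pd.Series(['CO', np.nan, 'AZ', 'OH; UT', 'UT', 'AZ; NV', 'NV; AZ'])

# Problem 1: entries that list more than one state
n_multi_state = states.str.contains(';', na=False).sum()

# Problem 2: the same set of states written in a different order.
# Normalize each entry by splitting, trimming, and sorting its parts,
# then compare the number of unique values before and after.
present = states.dropna()
normalized = present.apply(
    lambda s: '; '.join(sorted(part.strip() for part in s.split(';')))
)
n_order_variants = present.nunique() - normalized.nunique()

# Problem 3: missing values
n_missing = states.isna().sum()

print(n_multi_state, n_order_variants, n_missing)
```

Swapping the sample for the real column should give the actual counts for the filtered data frame.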

In [30]:
# Looking at the list of unique states that have water conflicts
print("States reported as having water conflict:")
print(states_with_conflict, "")

States reported as having water conflict:
['CO' nan 'AZ' 'OH; UT' 'UT' 'CA' 'AZ; NV' 'CO; UT; WY; NM' 'AZ; CA'
 'AZ; UT' 'NV; AZ' 'AZ; CA; CO; NV; NM; UT; WY' 'NV' 'NM' 'UT; CO; WY'
 'AZ; NM' 'WY; UT; CO' 'CO; AZ'] 


### 1)

1) Select the column
2) Remove whitespace
3) Create a rule for special characters and spaces
4) Split the entries into two-letter state codes
5) Find the unique spellings
6) Convert to upper case


a) Perform the following wrangling:
 - select the State column from the df data frame
 - split the strings in the column by the delimiter `;` into different columns
 - stack the results of the resulting data frame into a single pandas.Series
 - find the unique string values in the resulting series

In [41]:
clean_df = basin['State'].str.split(';', expand=True).stack().unique()
clean_df

array(['CO', 'UT', 'AZ', 'OH', ' UT', ' CO', ' NM', 'CA', ' NV', ' WY',
       ' CA', ' AZ', 'NV', 'NM', 'WY', 'TX'], dtype=object)
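
The result above still contains near-duplicates such as `'UT'` and `' UT'`, because splitting on `;` keeps the space that follows each delimiter. One possible fix is to strip whitespace after stacking and before taking the unique values. This is a minimal sketch: `state_col` is a hypothetical sample standing in for `basin['State']`, and the upper-casing step is an extra precaution rather than something the data is known to need.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for basin['State']
state_col = pd.Series(['CO', np.nan, 'OH; UT', 'AZ; CA', 'UT; CO; WY', 'TX'])

unique_states = (
    state_col
    .str.split(';', expand=True)  # one column per listed state
    .stack()                      # collapse into a single Series
    .dropna()                     # drop missing entries and padding, if any remain
    .str.strip()                  # remove the space left after each ';'
    .str.upper()                  # guard against lower-case codes
    .unique()
)

# Sort for easier reading
unique_states = sorted(unique_states)
print(unique_states)
```

Applied to the real column, the same chain should collapse pairs like `'CO'` and `' CO'` into a single code each.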