## California Climate Investment Projects Crosswalk - Indicator & Climate Risk Mitigation Columns
This notebook analyses CCI-funded programs and projects by linking each CCI project to the indicators and climate risk mitigation categories outlined by ERA and CARB, using a keyword search function.

At present, the CCI data comprises 133,698 funded projects between 2015 and 2023.

## Step One: Indicator Columns
The detected indicators are:
* Vulnerable populations
* Social Services
* Economic Health
* Emergency Response
* Personal preparedness
* Community preparedness
* Natural resources conservation
* Ecosystem type, condition, conservation
* Agricultural productivity conservation
* Transportation infrastructure
* Communication infrastructure
* Utilities infrastructure
* Housing vacancy and quality
* Wildfire exposure
* Wildfire loss
* Inland flooding exposure
* Inland flooding loss
* Extreme heat exposure
* Extreme heat loss
* Drought exposure
* Drought loss
* Sea level rise exposure
* Sea level rise loss

Analysis Steps: \
CCI data is scanned for common metric keywords associated with the defined indicators. A dictionary maps each indicator to its keywords, and an indicator is assigned automatically whenever one of its keywords is found in any of the following columns from the CCI funded programs dataset:
* category
* sector
* project descriptions
* project type
* program description
* sub program name
* other project benefits description
* voucher description 

Counters are added to report the number of rows in which each indicator was detected, as well as the number of rows in which a keyword was found in each specific column.
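The scan described above can be sketched on a single toy record. This is a minimal illustration under stated assumptions, not the notebook's function: the two-entry `toy_keywords` dictionary, the `toy_row` record, and the helper `scan_row` are hypothetical names invented for this example.

```python
import re

# Hypothetical two-entry dictionary (a tiny stand-in for the full one)
toy_keywords = {
    'Transportation infrastructure': ['highway', 'transportation'],
    'Drought exposure': ['drought'],
}

# Hypothetical record: column name -> cell text
toy_row = {
    'SECTOR': 'Sustainable Transportation',
    'Project Description': 'Repave a highway segment and plant drought tolerant trees.',
}

def scan_row(row, keyword_dict, columns):
    """Return the (indicators, keywords, columns) detected in one record."""
    indicators, keywords, hit_columns = set(), set(), set()
    for column in columns:
        text = str(row.get(column, '')).lower()
        for indicator, words in keyword_dict.items():
            for word in words:
                # Whole-word, case-insensitive match
                if re.search(r'\b' + re.escape(word.lower()) + r'\b', text):
                    indicators.add(indicator)
                    keywords.add(word)
                    hit_columns.add(column)
    return indicators, keywords, hit_columns

indicators, keywords, hit_columns = scan_row(
    toy_row, toy_keywords, ['SECTOR', 'Project Description']
)
print(sorted(indicators))
print(sorted(keywords))
print(sorted(hit_columns))
```

Sets are used so that an indicator, keyword, or column is recorded once per record no matter how many times it matches.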

In [1]:
# Import useful libraries
import os
import boto3
import pandas as pd
import itertools
import re

### Pull the CCI data from Feb 14th, 2024

In [2]:
# Initialize the S3 client
s3_client = boto3.client('s3')

# Bucket name and file paths
bucket_name = 'ca-climate-index'
directory = '0_map_data/crosswalk_data/CCI_Projects_Project_Category_Update_02142024.xlsm'

print('Pulling file')
s3_client.download_file(bucket_name, directory, 'CCI_Projects_Project_Category_Update_02142024.xlsm')
print('File pulled')

Pulling file
File pulled


In [3]:
crosswalk_data = pd.read_excel('CCI_Projects_Project_Category_Update_02142024.xlsm')

#### Display all columns

In [4]:
print('Number of columns:', len(crosswalk_data.columns.tolist()))
display(crosswalk_data.columns.tolist())

Number of columns: 130


['Project IDNumber',
 'Reporting Cycle Name',
 'Agency Name',
 'Program Name',
 'Program Description',
 'Sub Program Name',
 'Record Type',
 'Project Name',
 'Project Type',
 'Project Description',
 'SECTOR',
 'CATEGORY',
 'ACTION',
 'Census Tract',
 'Address',
 'Lat Long',
 'Senate\nDistrict',
 'Assembly\nDistrict',
 'County',
 'Total Project Cost',
 'Total Program GGRFFunding',
 'Project Life Years',
 'Total Project GHGReductions',
 'Annual Project GHGReductions',
 'Project Count',
 'Fiscal Year Funding Project',
 'Is Benefit Disadvantaged Communities',
 'Disadvantaged Community Criteria',
 'Disadvantaged Community Need',
 'Disadvantaged Community Census Tracts',
 'Total GGRFDisadvantaged Community Funding',
 'Disadvantaged Community Benefits Description',
 'Funding Benefiting Disadvantaged Communities',
 'Estimated Num Vehicles In Service',
 'Funding Within Disadvantage Communities',
 'Other Project Benefits Description',
 'VMTReductions',
 'Number Of Housing Units',
 'Number Of Aff

#### Select the columns to be scanned by the function below

In [5]:
relevant_columns = [
    'CATEGORY',
    'SECTOR',
    'Project Description',
    'Project Type',
    'Program Description',
    'Sub Program Name',
    'Other Project Benefits Description',
    'Voucher Description'  
]

#### Create a metric-indicator dictionary to scan through data based on dictionary values
* first draft

In [6]:
metric_to_indicator_dict = {
    'Vulnerable populations': ['asthma', 'heart disease', 'myocardial infarction', 'low birth weight', 
                              'less than a high school education', 'linguistic isolation', 'poverty', 
                              'unemployment', 'housing burden', 'at-risk drinking water', 'homelessness', 
                              'without health insurance', 'no health insurance', 'ambulatory disability', 
                              'cognitive disability', 'disability', 'financial assistance', 'over 65', 'under 5', 
                              'violent crime', 'no ac', 'no air conditioning', 'lack air conditioning', 
                              'outdoor employment', 'low food accessibility', 'no food accessibility',
                              'vulnerable population', 'population', 'food desert', 'supermarket', 'grocery',
                              'native', 'food stamp', 'supplemental security income', 'snap', 'cash public assistance income',
                              'english speaking', 'language', 'federal poverty', 'unemployed', 'low income',
                              'housing', 'drinking water'],
    
    'Social Services': ['healthcare', 'mental healthcare', 'substance abuse', 'blood bank', 'organ bank', 
                        'hospital', 'personal care', 'construction', 'rebuild', 'rebuilding', 'home maintenance', 
                       'household', 'narcotic', 'mental health', 'social service'],
    
    'Economic Health': ['income', 'gini index', 'economic diversity', 'economy', 'economic health', 'hachman index'],
    
    'Emergency Response': ['emergency response', 'firefighter', 'fireman', 'nurse', 'nurses', 
                           'law enforcement', 'police', 'fire stations', 'emergency medical care', 
                           'emergency services', 'emergency', 'paramedic', 'emergency technician'],
    
    'Personal preparedness': ['emergency preparation', 'flood insurance', 'homeowners insurance', 'homeowner',
                              'preparation', 'preparedness'],
    
    'Community preparedness': ['disaster funding', 'disaster mitigation', 'mitigation funding', 'mitigation', 
                               'wildfire risk', 'flood risk', 'treatment', 'community', 'preparedness'],
    
    'Natural resources conservation': ['land management', 'watershed', 'water quality', 'natural resources',
                                      'protected area', 'timber management', 'watershed threat', 'contaminant',
                                      'fire prevention', 'forest'],
    
    'Ecosystem type condition conservation': ['ecosystem type', 'biodiversity', 'soil quality', 
                                                'soil cover', 'air quality', 'impervious', 
                                                'habitat conservation', 'habitat preservation', 
                                                'conservation', 'impervious', 'ecosystem',
                                                'natural land', 'fragile soil', 'vulnerable soil', 'healthy soil'],
    
    'Agricultural productivity conservation': ['crop conservation', 'crop condition', 'agricultural productivity', 
                                               'agricultural conservation', 'crop soil', 'crop soil moisture', 
                                              'soil moisture', 'evaporation stress', 'agriculture', 'productivity'],
    
    'Transportation infrastructure': ['highway', 'road', 'roads', 'highways', 'freeways', 'freeway', 
                                      'freight rail network', 'train', 'trains', 'bridge', 'bridges', 'freight', 
                                      'traffic', 'airport', 'airports', 'transportation', 'congestion'],
    
    'Communication infrastructure': ['communication', 'broadband internet', 'radio', 'cell service', 
                                     'cell phone service', 'microwave towers', 'paging', 'television', 
                                     'tv', 'land mobile', 'CB radio', 'broadcast', 'cell tower', 'AM', 'FM',
                                    'transmission tower', 'broadband', 'internet'],
    
    'Utilities infrastructure': ['utilities', 'energy transmission', 'power lines', 'power line', 
                                  'energy production', 'power plant', 'power plants', 'underground power line',
                                  'public safety power shutoff', 'psps', 'PSPS',
                                  'wastewater treatment', 'wastewater','treatment plant'],
    
    'Housing vacancy and quality': ['housing', 'housing vacancy', 'housing quality', 'housing age', 
                                    'housing structures', 'housing structure', 'home', 'house', 'shelter', 
                                   'mobile home', 'vacant home', 'no kitchen', 'no plumbing', 'no water'],
    
    'Wildfire exposure': ['red flag', 'wildfire exposure', 'vulnerable to wildfire', 'exposure to wildfire', 'fire weather'],

    'Wildfire loss' : ['wildfire fatalities', 'wildfire loss', 'wildfire damage', 'loss to wildfire', 'acres burned', 'burn area'],

    'Inland flooding exposure' : ['flood warning', 'floodplain area', 'inland flooding', 'extreme precipitation', 
                                  'surface runoff', 'floodplain', 'flash flood', 'flash warning'],

    'Inland flooding loss' : ['flood claim', 'flood cost', 'flood loss', 'flood cost', 'flood crop damage', 'flood damage', 'flood insurance'],

    'Extreme heat exposure' : ['heat warnings', 'extreme heat', 'warm nights', 'heat exposure'],

    'Extreme heat loss' : ['heat related illness', 'heat illness', 'crop loss from heat', 'chill hours', 'growing season'],

    'Drought exposure': ['drought exposure', 'historical drought', 'drought', 'water reduction', 'drought severity'],

    'Drought loss': ['drought loss', 'crop loss from drought', 'crop loss'],

    'Sea level rise exposure': ['vulnerable coastline', 'sea level rise exposure', 'sea level rise', 'slr', 'SLR', 'sea-level rise'],

    'Sea level rise loss': ['wetland change', 'loss to sea level rise', 'coastal development']
}

#### The metric indicator column function:
* scans each of the 'relevant_columns' for the keyword values in our metric_to_indicator_dict dictionary
    * columns are scanned in list order, so it searches through 'CATEGORY' first and finishes with 'Voucher Description'
    * it goes through every column, but a keyword or indicator already found in a row is recorded only once
    * multiple indicators can be found per row
* the function prints the length of the dataset used, how many rows had no indicator detected, and how many rows were flagged for each indicator
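The matching rule the function relies on (lowercasing both sides and wrapping the escaped keyword in `\b` word boundaries) can be tried in isolation. The helper name `contains_keyword` below is hypothetical and exists only for this illustration.

```python
import re

def contains_keyword(keyword, text):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r'\b' + re.escape(keyword.lower()) + r'\b'
    return re.search(pattern, str(text).lower()) is not None

# 'road' is not detected inside 'broadband' because of the \b anchors
in_broadband = contains_keyword('road', 'Expand broadband access')
# A standalone word is detected
in_road = contains_keyword('road', 'Repave the county road')
# Plural forms count as different words, so they need their own dictionary entry
in_roads = contains_keyword('road', 'Repave county roads')
# Missing values become the string 'nan' after str(), which does not contain 'road'
in_missing = contains_keyword('road', float('nan'))

print(in_broadband, in_road, in_roads, in_missing)
```

This is why the dictionary lists both singular and plural forms such as 'road' and 'roads'. It also means that very general keywords (for example 'community') will match any text that contains them as a standalone word.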

In [7]:
def metric_indicator_column(df, keyword_dict, relevant_columns, output_csv=None):
    # Initialize new columns to store the detected indicators, the matched keywords, and the columns they were found in
    df['Indicator'] = ''
    df['Detected_Metric_Keyword'] = ''
    df['Columns_Detected'] = ''  # New column to store the columns where the keyword was detected

    # Initialize a counter for each keyword
    keyword_counter = {keyword: 0 for keyword in keyword_dict}

    # Initialize a counter for detected columns
    detected_columns_counter = {column: 0 for column in relevant_columns}

    # Iterate through each row
    for index, row in df.iterrows():
        keywords_found = set()  # To store unique keywords found in each row
        detected_values = set()  # To store unique detected values for each row
        detected_columns = set()  # To store unique columns where the keyword was detected
        
        # Iterate through each relevant column
        for column in relevant_columns:
            if column in row:
                detected_keys = [key for key in keyword_dict.keys() if any(re.search(r'\b' + re.escape(val.lower()) + r'\b', str(row[column]).lower()) for val in keyword_dict[key])]
                for detected_key in detected_keys:
                    # Check if any value of the detected key is present in the column (case-insensitive)
                    detected_values.update([val for val in keyword_dict[detected_key] if re.search(r'\b' + re.escape(val.lower()) + r'\b', str(row[column]).lower())])
                    if detected_values:
                        keywords_found.add(detected_key)
                        detected_columns.add(column)

        # Update the 'Indicator' column with detected keywords
        df.at[index, 'Indicator'] = ', '.join(keywords_found)
        # Update the 'Detected_Metric_Keyword' column with detected values
        df.at[index, 'Detected_Metric_Keyword'] = ', '.join(detected_values)
        # Update the 'Columns_Detected' column with detected columns
        columns_detected_str = ', '.join(detected_columns)
        df.at[index, 'Columns_Detected'] = columns_detected_str

    number_without_indicator = df[df['Indicator'] == '']

    print(f'Length of dataset: {len(df)}')
    print('')
    print(f'Number of rows without an indicator entry: {len(number_without_indicator)}')
    print('')
    # Print detected column counts
    print("Detected Column Counts:")
    for index, row in df.iterrows():
        detected_columns = row['Columns_Detected'].split(', ')
        for column in detected_columns:
            if column:
                detected_columns_counter[column] += 1

    for column, count in detected_columns_counter.items():
        print(f"{column}: {count}")
    print('')

    # Count keywords from the 'Indicator' column after populating it
    for index, row in df.iterrows():
        indicators = row['Indicator'].split(', ')
        for indicator in indicators:
            if indicator:  # Check if indicator is not empty
                keyword_counter[indicator] += 1

    # Print keyword counts
    print("Keyword Counts:")
    for keyword, count in keyword_counter.items():
        print(f"{keyword}: {count}")
    print('')

    # Check length of 'Indicator' entries containing 'Transportation infrastructure'
    transportation_indicator_count = len(df[df['Indicator'].str.contains('Transportation infrastructure')])

    print(f"FOR TESTING/FACT CHECKING - Number of 'Indicator' entries containing 'Transportation infrastructure': {transportation_indicator_count}")
    
    # Save DataFrame as CSV if output_csv is provided
    if output_csv:
        df.to_csv(output_csv, index=False)
        print(f"DataFrame saved as {output_csv}")
        print('')

## Select 1000 random rows from the dataset to run the function on (if desired)

In [8]:
sample_data = crosswalk_data.sample(1000)

### Running the function on the whole dataset (pass sample_data instead for quicker testing)
* all relevant columns are added to the preview displayed afterwards for analysis
* the function includes a separate count of 'Transportation infrastructure' rows to fact check the counters
* there can be multiple indicators within the 'Indicator' column
* there can be multiple columns listed in the 'Columns_Detected' column
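Because one cell can hold several comma-separated labels, counting has to split each value before tallying. The sketch below shows that idea with the standard library only; the `indicator_values` list is hypothetical data standing in for the 'Indicator' column.

```python
from collections import Counter

# Hypothetical 'Indicator' values, comma-separated as produced by ', '.join(...)
indicator_values = [
    'Community preparedness, Transportation infrastructure',
    'Transportation infrastructure',
    '',
    'Natural resources conservation, Community preparedness',
]

# Split every cell into labels and tally them
counts = Counter(
    label
    for value in indicator_values
    for label in value.split(', ')
    if label  # skip the empty string left by rows with no match
)

# Independent cross-check using a plain substring test
substring_count = sum(
    'Transportation infrastructure' in value for value in indicator_values
)

print(counts['Transportation infrastructure'], substring_count)
```

Splitting on `', '` is only safe while no label itself contains a comma followed by a space. The two counts agree here, which is the same kind of fact check the function's final print statement performs.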

In [9]:
metric_indicator_column(crosswalk_data, metric_to_indicator_dict, relevant_columns)
pd.set_option('display.max_colwidth', None)
data_preview = crosswalk_data[['CATEGORY',
                            'SECTOR',
                            'Project Description',
                            'Project Type',
                            'Program Description',
                            'Sub Program Name',
                            'Other Project Benefits Description',
                            'Voucher Description',
                            'Detected_Metric_Keyword', 
                            'Columns_Detected', 
                            'Indicator', 
                            'Project Count']]

data_preview_filtered = data_preview[data_preview['Indicator'] != '']
data_preview_filtered.head(1)

Length of dataset: 133698

Number of rows without an indicator entry: 2243

Detected Column Counts:
CATEGORY: 6634
SECTOR: 8530
Project Description: 23967
Project Type: 4474
Program Description: 125754
Sub Program Name: 10623
Other Project Benefits Description: 25377
Voucher Description: 265

Keyword Counts:
Vulnerable populations: 8635
Social Services: 623
Economic Health: 21355
Emergency Response: 342
Personal preparedness: 74
Community preparedness: 125955
Natural resources conservation: 18417
Ecosystem type condition conservation: 9187
Agricultural productivity conservation: 1632
Transportation infrastructure: 116899
Communication infrastructure: 112
Utilities infrastructure: 74
Housing vacancy and quality: 7093
Wildfire exposure: 64
Wildfire loss: 33
Inland flooding exposure: 13
Inland flooding loss: 0
Extreme heat exposure: 7
Extreme heat loss: 1
Drought exposure: 1105
Drought loss: 0
Sea level rise exposure: 85
Sea level rise loss: 0

FOR TESTING/FACT CHECKING - Number of 'Indic

Unnamed: 0,CATEGORY,SECTOR,Project Description,Project Type,Program Description,Sub Program Name,Other Project Benefits Description,Voucher Description,Detected_Metric_Keyword,Columns_Detected,Indicator,Project Count
0,Light-Duty Vehicles,"Zero-Emission Vehicles, Equipment, and Infrastructure","CVRP promotes clean vehicle adoption in California by offering rebates from $1,000 to $7,500 for the purchase or lease of new, eligible zero-emission vehicles, including electric, plug-in hybrid electric and fuel cell vehicles.",,"Provides mobile source incentives to reduce GHG emissions, criteria pollutants, and air toxics through the development of advanced technology and clean transportation. The program is comprised of sub-programs that provide a variety of disadvantaged community benefits.\n\nCARB also provides incentives to help households replace an uncertified wood stove, wood insert, or fireplace used as a primary source of heat with a cleaner burning and more efficient device.",Clean Vehicle Rebate Project,"CVRP promotes clean vehicle adoption in California by offering rebates from $1,000 to $7,500 for the purchase or lease of new, eligible zero-emission vehicles, including electric, plug-in hybrid electric and fuel cell vehicles.",,"community, transportation",Program Description,"Community preparedness, Transportation infrastructure",7.0


## Step Two: Add the climate mitigation column to this dataset
For the purposes of this project, the term 'climate risk' includes the following: 
* Extreme heat
* Inland flooding
* Sea level rise
* Wildfire
* Drought

Analysis Steps: \
This process closely mirrors how we created the indicator column above. The CCI data is scanned for common keywords associated with the defined climate risks. A dictionary maps each climate risk to its keywords, and a climate risk is assigned automatically whenever one of its keywords is found in any of the same relevant columns used for the indicator columns:
* category
* sector
* project descriptions
* project type
* program description
* sub program name
* other project benefits description
* voucher description

Counters are included below as well.
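Since the same keywords are searched in every cell, one possible speed-up is to precompile a single pattern per climate risk. This is a sketch of an alternative, not what the notebook's functions do; `compile_patterns` and the `toy_risks` dictionary are hypothetical names used only here.

```python
import re

def compile_patterns(keyword_dict):
    """Build one case-insensitive pattern per category from its keyword list."""
    patterns = {}
    for category, words in keyword_dict.items():
        # Longest keywords first so 'inland flooding' is tried before 'flooding'
        ordered = sorted(set(w.lower() for w in words), key=len, reverse=True)
        alternation = '|'.join(re.escape(w) for w in ordered)
        patterns[category] = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    return patterns

# Hypothetical subset of a climate risk dictionary
toy_risks = {
    'drought mitigation': ['drought', 'irrigation', 'water storage'],
    'inland flooding mitigation': ['flooding', 'inland flooding', 'stormwater'],
}
patterns = compile_patterns(toy_risks)

text = 'Upgrade irrigation and capture stormwater to reduce inland flooding.'
found = {
    category: sorted(set(m.lower() for m in pattern.findall(text)))
    for category, pattern in patterns.items()
    if pattern.search(text)
}
print(found)
```

One behavioural difference to keep in mind: `findall` returns non-overlapping matches, so this sketch reports 'inland flooding' but not the shorter 'flooding' inside it, whereas a separate search per keyword records both.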

### Climate risk mitigation dictionary

In [10]:
climate_risk_dict = {
    'wildfire mitigation': ['wildfire', 'prescribed fire', 'fire prevention', 'controlled burn', 'controlled_burning', 
                            'prescribed burn', 'prescribed burning', 'firefighting', 'reforest', 'reforestation', 'vegetation management', 
                            'roadside brushing', 'fuel break', 'fuel reduction', 'ignition', 'crown', 'fuel load', 'Fire and Forest Management',
                            'tribal burning', 'fuel treatment', 'hardening', 'wood product', 'biomass facility', 'fire prevention'],
    
    'sea level rise mitigation': ['sea level rise', 'slr', 'seawall', 'seawalls', 'shoreline', 'wetland', 'mangrove', 'coastal','Restoration of riparian', 'sea-level rise'],
    
    'extreme heat mitigation': ['extreme heat', 'shade', 'shading', 'cooling center', 'cooling centers', 'heat-resistant', 
                                'heat resistant', 'heat reducing', 'heat-reducing', 'energy savings', 'urban forestry',
                                'urban greening', 'canopy', 'weatherization'],
    
    'drought mitigation': ['drought', 'irrigation', 'soil moisture', 'rainwater harvest', 'rainwater harvesting', 'water storage', 
                           'water allocation', 'water management', 'soil health', 'soil management', 'organic matter', 'water efficiency',
                           'water conservation', 'water use reduction', 'water savings'],
    
    'inland flooding mitigation': ['flooding', 'runoff', 'inland flood', 'inland flooding', 'floodplain', 'flood proof', 'floodproofing', 
                                   'elevated flood', 'flood barrier', 'flood barriers', 'drainage', 'riparian', 'stormwater',
                                   'delta', 'upland wetlands']
} 
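Hand-maintained keyword lists of this size are easy to get slightly wrong, so a quick audit can help before running the scan. The sketch below is one possible check, assuming matching is case-insensitive as in the functions here; `audit_keyword_dict` and `toy_dict` are hypothetical names for illustration.

```python
from collections import Counter

def audit_keyword_dict(keyword_dict):
    """Report repeated keywords within a category and keywords shared across categories."""
    repeated = {}
    owners = {}
    for category, words in keyword_dict.items():
        lowered = [w.lower() for w in words]
        # Keywords that appear more than once once case is ignored
        dupes = sorted(w for w, n in Counter(lowered).items() if n > 1)
        if dupes:
            repeated[category] = dupes
        for w in set(lowered):
            owners.setdefault(w, []).append(category)
    # Keywords that trigger more than one category
    shared = {w: cats for w, cats in owners.items() if len(cats) > 1}
    return repeated, shared

# Hypothetical toy dictionary for illustration only
toy_dict = {
    'risk a mitigation': ['Wetland', 'wetland', 'shoreline'],
    'risk b mitigation': ['riparian', 'wetland'],
}
repeated, shared = audit_keyword_dict(toy_dict)
print(repeated)
print(shared)
```

Repeated entries are harmless but redundant. Shared entries are worth reviewing because a single word will then assign more than one category to the same row.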

## Function to create the climate mitigation column

This function closely mirrors the indicator function.

* the DataFrame already processed by the metric_indicator_column function is passed into this function, so the final result is a CCI dataset with climate risk mitigation AND indicator columns

In [11]:
def climate_mitigation_column(df, keyword_dict, relevant_columns, output_csv=None):
    # Initialize new columns to store the detected climate risk mitigations, the matched keywords, and the columns they were found in
    df['Climate_Risk_Mitigation'] = ''
    df['Detected_Climate_Risk_Mitigation_Keyword'] = ''
    df['Columns_Detected_Climate_Risk'] = ''  # New column to store the columns where the keyword was detected

    # Initialize a counter for each keyword
    keyword_counter = {keyword: 0 for keyword in keyword_dict}

    # Initialize a counter for detected columns
    detected_columns_counter = {column: 0 for column in relevant_columns}

    # Iterate through each row
    for index, row in df.iterrows():
        keywords_found = set()  # To store unique keywords found in each row
        detected_values = set()  # To store unique detected values for each row
        detected_columns = set()  # To store unique columns where the keyword was detected
        
        # Iterate through each relevant column
        for column in relevant_columns:
            if column in row:
                detected_keys = [key for key in keyword_dict.keys() if any(re.search(r'\b' + re.escape(val.lower()) + r'\b', str(row[column]).lower()) for val in keyword_dict[key])]
                for detected_key in detected_keys:
                    # Check if any value of the detected key is present in the column (case-insensitive)
                    detected_values.update([val for val in keyword_dict[detected_key] if re.search(r'\b' + re.escape(val.lower()) + r'\b', str(row[column]).lower())])
                    if detected_values:
                        keywords_found.add(detected_key)
                        detected_columns.add(column)

        # Update the 'Climate_Risk_Mitigation' column with detected keywords
        df.at[index, 'Climate_Risk_Mitigation'] = ', '.join(keywords_found)
        # Update the 'Detected_Climate_Risk_Mitigation_Keyword' column with detected values
        df.at[index, 'Detected_Climate_Risk_Mitigation_Keyword'] = ', '.join(detected_values)
        # Update the 'Columns_Detected_Climate_Risk' column with detected columns
        columns_detected_str = ', '.join(detected_columns)
        df.at[index, 'Columns_Detected_Climate_Risk'] = columns_detected_str

    number_without_climate_risk = df[df['Climate_Risk_Mitigation'] == '']

    print(f'Length of dataset: {len(df)}')
    print('')
    print(f'Number of rows without an climate risk entry: {len(number_without_climate_risk)}')
    print('')
    # Print detected column counts
    print("Detected Column Counts:")
    for index, row in df.iterrows():
        detected_columns = row['Columns_Detected_Climate_Risk'].split(', ')
        for column in detected_columns:
            if column:
                detected_columns_counter[column] += 1

    for column, count in detected_columns_counter.items():
        print(f"{column}: {count}")
    print('')

    # Count keywords from the 'Climate_Risk_Mitigation' column after populating it
    for index, row in df.iterrows():
        climate_risk = row['Climate_Risk_Mitigation'].split(', ')
        for climate in climate_risk:
            if climate:  # Check if climate risk is not empty
                keyword_counter[climate] += 1

    # Print keyword counts
    print("Keyword Counts:")
    for keyword, count in keyword_counter.items():
        print(f"{keyword}: {count}")
    print('')

    # Check length of 'Climate_Risk_Mitigation' entries containing 'wildfire mitigation'
    wildfire_count = len(df[df['Climate_Risk_Mitigation'].str.contains('wildfire mitigation')])

    print(f"TESTING/FACT CHECKING: Number of 'Indicator' entries containing 'wildfire mitigation': {wildfire_count}")
    
    # Save DataFrame as CSV if output_csv is provided
    if output_csv:
        df.to_csv(output_csv, index=False)
        print(f"DataFrame saved as {output_csv}")
        print('')

### Calling the function, adding the relevant columns (including indicator columns)

* also includes a print statement to see how many wildfire mitigations are in the dataset to fact check the counter

In [12]:
climate_mitigation_column(crosswalk_data, climate_risk_dict, relevant_columns) #, 'cci_project_indicators.csv')
pd.set_option('display.max_colwidth', None)
data_preview = crosswalk_data[['CATEGORY',
                            'SECTOR',
                            'Project Description',
                            'Project Type',
                            'Program Description',
                            'Sub Program Name',
                            'Other Project Benefits Description',
                            'Voucher Description',
                            'Detected_Metric_Keyword', 
                            'Columns_Detected', 
                            'Indicator', 
                            'Climate_Risk_Mitigation',
                            'Detected_Climate_Risk_Mitigation_Keyword',
                            'Columns_Detected_Climate_Risk',
                            'Project Count']]

data_preview_filtered = data_preview[data_preview['Climate_Risk_Mitigation'] != '']
data_preview_filtered.head(10)

Length of dataset: 133698

Number of rows without an climate risk entry: 116876

Detected Column Counts:
CATEGORY: 6980
SECTOR: 2376
Project Description: 10496
Project Type: 1876
Program Description: 4701
Sub Program Name: 1517
Other Project Benefits Description: 9412
Voucher Description: 33

Keyword Counts:
wildfire mitigation: 3006
sea level rise mitigation: 139
extreme heat mitigation: 9842
drought mitigation: 7367
inland flooding mitigation: 717

TESTING/FACT CHECKING: Number of 'Indicator' entries containing 'wildfire mitigation': 3006


Unnamed: 0,CATEGORY,SECTOR,Project Description,Project Type,Program Description,Sub Program Name,Other Project Benefits Description,Voucher Description,Detected_Metric_Keyword,Columns_Detected,Indicator,Climate_Risk_Mitigation,Detected_Climate_Risk_Mitigation_Keyword,Columns_Detected_Climate_Risk,Project Count
15688,Inland and Mountain Meadow,Wetland Restoration and Management,Restore 90 acres of Osa Meadow using the pond and plug technique. The project is designed to enhance the meadows ability to sequester carbon and provide an array of co-benefits.,Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Restore habitat for mountain yellow-legged frog and Kern River rainbow trout., Raise local groundwater within the meadow, and improve water quality by reconnecting the stream to the floodplain of Osa Meadow.",,"watershed, water quality, floodplain","Sub Program Name, Other Project Benefits Description, Program Description","Natural resources conservation, Inland flooding exposure","sea level rise mitigation, inland flooding mitigation","wetland, floodplain","SECTOR, Other Project Benefits Description",1.0
15689,Inland and Mountain Meadow,Wetland Restoration and Management,"Restore 253 acres of degraded dry mountain meadow habitat (Greenville Creek [181 ac] and Upper Goodrich [72 ac] meadows), using the pond and plug technique and other actions to increase carbon sequestration and provide co-benefits.",Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Improve groundwater levels. Improve and create nesting, foraging, and resting habitat for waterfowl with 12.1 acres of ponded water habitat. Improved vegetative vigor with the riparian community.",,"community, watershed, water quality","Sub Program Name, Other Project Benefits Description, Program Description","Natural resources conservation, Community preparedness","sea level rise mitigation, inland flooding mitigation","wetland, riparian","SECTOR, Other Project Benefits Description",1.0
15690,Coastal and Delta,Wetland Restoration and Management,"Construction of a 700 ac Whale's Mouth Wetland and restoration of 1,000 acres of Belly Wetland. Permanent palustrine emergent wetlands will sequester GHG, provide co-benefits (subsidence reversal, improved levee stability, wildlife habitat).",Delta Wetland,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Improves water quality and wildlife habitat. Increase diversity and relative cover of native plant species and minimize the establishment and growth of non-native, invasive plant species.",,"construction, watershed, water quality, native","Sub Program Name, Other Project Benefits Description, Project Description, Program Description","Natural resources conservation, Vulnerable populations, Social Services","sea level rise mitigation, inland flooding mitigation","wetland, coastal, delta","SECTOR, Project Description, Project Type, CATEGORY",1.0
15691,Inland and Mountain Meadow,Wetland Restoration and Management,"Restore up to 37 acres of meadow and 2 acres of riparian habitat with a variety of measures, (e.g. modification of a culvert intake, construction of a new channel downstream of diversion, stabilizing confluence with Martis Creek, revegetation, etc).",Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,Increase water storage in a degraded meadow system. Reduced in-channel erosion. Improve fish passage/fish habitat in 1 mile of existing stream channel and restore associated 2 acres of riparian wetland.,,"construction, watershed, water quality","Sub Program Name, Project Description, Program Description","Natural resources conservation, Social Services","sea level rise mitigation, inland flooding mitigation, drought mitigation","water storage, wetland, riparian","SECTOR, Project Description, Other Project Benefits Description",1.0
15692,Inland and Mountain Meadow,Wetland Restoration and Management,Restore 80 acres of Childs Meadow using cost-effective Beaver Dam Analogues and riparian fencing. Restoration actions are designed to increase carbon sequestration and provide co-benefits.,Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,Increase groundwater levels. Increase habitat by 60% based on stream miles in the restored portion of the meadow for two sensitive meadow species known to occur in the unimpacted portion of the meadow: Cascades frog and willow flycatcher,,"watershed, water quality","Sub Program Name, Program Description",Natural resources conservation,"sea level rise mitigation, inland flooding mitigation","wetland, riparian","SECTOR, Project Description",1.0
15693,Coastal and Delta,Wetland Restoration and Management,"Restore 34 acres of diverse coastal wetlands and 20 acres of upland habitat, connected to Devereux Slough. The project is designed to sequester GHGs and provide co-benefits (habitat, reduce localized flooding, provide educational opportunities).",Coastal Wetland,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Provide habitat for existing population of endangered tidewater goby and recovery of threatened/endangered species e.g., California red-legged frog, Western snowy plover, California least tern, Ventura Marsh milk-vetch and Belding’s savannah sparrow.",,"watershed, population, water quality","Sub Program Name, Other Project Benefits Description, Program Description","Natural resources conservation, Vulnerable populations","sea level rise mitigation, inland flooding mitigation","wetland, flooding, coastal, delta","SECTOR, Project Description, Project Type, CATEGORY",1.0
15694,Coastal and Delta,Wetland Restoration and Management,Restore 61 acres of tidal salt marsh and 5 acres of a perennial grassland buffer in the southern area of Elkhorn Slough. The project is designed to restore coastal wetlands to reduce GHGs and improve important estuarine habitat.,Coastal Wetland,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Improves water quality and wildlife habitat, including habitat for Southern Sea Otter. Reduce tidal scour in Elkhorn Slough through adding sediment to historically diked and drained areas.",,"watershed, water quality","Sub Program Name, Other Project Benefits Description, Program Description",Natural resources conservation,"sea level rise mitigation, inland flooding mitigation","wetland, coastal, delta","SECTOR, Project Description, Project Type, CATEGORY",1.0
15695,Inland and Mountain Meadow,Wetland Restoration and Management,"Restore/enhance 39 acres of wet meadow using pond and plug restoration technique to increase the capacity of the meadow to sequester carbon and provide co-benefits (reduce downstream sedimentation, improve water quality, improve wildlife habitat).",Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Improve water quality. Improve habitat for native plants, fish, and wildlife. Habitat quantity and quality are increased for aquatic biota and migratory and special-status species birds, such as great grey owls and bald eagles.",,"watershed, water quality, native","Sub Program Name, Other Project Benefits Description, Project Description, Program Description","Natural resources conservation, Vulnerable populations",sea level rise mitigation,wetland,SECTOR,1.0
15696,Inland and Mountain Meadow,Wetland Restoration and Management,"Restore Loney Meadow 47.2 ac, Deer Meadow 46.1 ac, and Bear Trap Meadow 72.0 ac through stream channel and gully restoration and road drainage improvements; reclaiming old roads; restoring natural flow paths; and re-vegetation work.",Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,"Improve water quantity and quality. Decrease sedimentation downstream of mountain meadows. Restore and expand habitat for native plants, fish, and wildlife, including sensitive species.",,"road, roads, water quality, native, watershed","Sub Program Name, Other Project Benefits Description, Project Description, Program Description","Natural resources conservation, Vulnerable populations, Transportation infrastructure","sea level rise mitigation, inland flooding mitigation","wetland, drainage","SECTOR, Project Description",1.0
15697,Inland and Mountain Meadow,Wetland Restoration and Management,"Restore and enhance up to 37 acres of degraded wet meadow and 2 acres of riparian habitat (modification of a culvert intake, construction of a new channel, restoration of headcuts in existing channel, and revegetation).",Mountain Meadow,"The Wetlands Restoration for Greenhouse Gas Reduction Grant Program funds projects that reduce greenhouse gases and provide co-benefits such as enhancing fish and wildlife habitat, protecting and improving water quality and quantity and helping California adapt to climate change.",Wetlands and Watershed Restoration,Improved and reconnect hydrology of the meadow. Reduced erosion. Restored function in order to improve meadow condition and wildlife habitat.,,"construction, watershed, water quality","Sub Program Name, Project Description, Program Description","Natural resources conservation, Social Services","sea level rise mitigation, inland flooding mitigation","wetland, riparian","SECTOR, Project Description",1.0


#### Drop the columns used only for analysis so the output keeps just the added indicator and climate risk mitigation columns, then save as a CSV and upload to AWS

In [14]:
# Drop the helper columns that only record which keywords matched and where
final_crosswalk_data = crosswalk_data.drop(columns=[
    'Detected_Metric_Keyword',
    'Columns_Detected',
    'Columns_Detected_Climate_Risk',
    'Detected_Climate_Risk_Mitigation_Keyword',
])
output_csv = 'final_cci_project_indicators_and_climate_risk.csv'

final_crosswalk_data.to_csv(output_csv, index=False)
print(f'Dataframe saved as {output_csv}')
print('')
# Initialize the S3 client
s3_client = boto3.client('s3')

# Bucket name and destination S3 object key
bucket_name = 'ca-climate-index'
directory = f'0_map_data/crosswalk_data/{output_csv}'
# Upload the CSV file to S3
print(f'Uploading {output_csv} to AWS')
with open(output_csv, 'rb') as file:
    s3_client.upload_fileobj(file, bucket_name, directory)
    print(f'Upload complete! File is in {directory}')

Dataframe saved as final_cci_project_indicators_and_climate_risk.csv

Uploading final_cci_project_indicators_and_climate_risk.csv to AWS
Upload complete! File is in 0_map_data/crosswalk_data/final_cci_project_indicators_and_climate_risk.csv


### Also upload a CSV that keeps the contextual columns, in case they are needed

In [15]:
output_csv = 'final_cci_project_indicators_and_climate_risk_with_contextual_columns.csv'

crosswalk_data.to_csv(output_csv, index=False)
print(f'Dataframe saved as {output_csv}')
print('')
# Initialize the S3 client
s3_client = boto3.client('s3')

# Bucket name and destination S3 object key
bucket_name = 'ca-climate-index'
directory = f'0_map_data/crosswalk_data/{output_csv}'
# Upload the CSV file to S3
print(f'Uploading {output_csv} to AWS')
with open(output_csv, 'rb') as file:
    s3_client.upload_fileobj(file, bucket_name, directory)
    print(f'Upload complete! File is in {directory}')

Dataframe saved as final_cci_project_indicators_and_climate_risk_with_contextual_columns.csv

Uploading final_cci_project_indicators_and_climate_risk_with_contextual_columns.csv to AWS
Upload complete! File is in 0_map_data/crosswalk_data/final_cci_project_indicators_and_climate_risk_with_contextual_columns.csv
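
The two export cells above repeat the same save-and-upload steps. They could be factored into a small helper, sketched below. The function name `save_and_upload` and its parameters are hypothetical and not part of the original notebook. The sketch assumes `df` exposes a pandas-style `to_csv` method and that `s3_client` is a boto3 S3 client (it uses the client's `upload_file` method rather than `upload_fileobj`).

```python
import os


def save_and_upload(df, filename, s3_client, bucket, prefix):
    """Write df to a local CSV, upload it to S3, and return the object key.

    Hypothetical helper: df needs a pandas-style to_csv method, and
    s3_client needs a boto3-style upload_file(Filename, Bucket, Key) method.
    """
    # Save locally without the dataframe index, as in the cells above
    df.to_csv(filename, index=False)

    # Build the destination key from the prefix and the bare file name
    key = f"{prefix.rstrip('/')}/{os.path.basename(filename)}"

    # Upload the saved file to the bucket under that key
    s3_client.upload_file(filename, bucket, key)
    return key


# Possible usage with the notebook's own names (needs AWS credentials):
# s3_client = boto3.client('s3')
# save_and_upload(final_crosswalk_data,
#                 'final_cci_project_indicators_and_climate_risk.csv',
#                 s3_client, 'ca-climate-index', '0_map_data/crosswalk_data')
```

Passing the client in as an argument avoids re-creating it in every cell and lets the helper be exercised with a stand-in object when AWS access is not available.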
