<a href="https://colab.research.google.com/github/chchang47/bio108tutorial/blob/main/DataInteractionNotebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

This notebook shows how to import the data from [Chang *et al.* 2025](https://rdcu.be/d10Nh) and query it for particular natural climate solution (NCS) pathways. For more information, please refer to the paper and its Supplementary Information (SI) or the [code repository](https://github.com/lexunit-ai/ncs-evidence-map/).

In [16]:
### Imports
import pandas as pd
import numpy as np
import ast

In [2]:
### Ingesting the data
NCSdf = pd.read_csv("https://github.com/lexunit-ai/ncs-evidence-map/raw/c1b8ec85437e459df622eec0159163f956130390/data/ncs-evidence-map-data.tsv",sep="\t")


In [3]:
### Inspecting the data
NCSdf.head()

Unnamed: 0,ESind,abstract,addresses,articletitle,authors,biodiv_species_uid,contains_cost_layer1,cost_extraction_method,doi,dup_title,...,predicted_benefits,predicted_pathway_numbers,publicationdate,publicationtype,publicationyear,pubmedid,researchareas,"timescited,woscore",volume,webofscienceindex
0,8ad8dfd3d9571dd90d13816d5ae97af24133ac4d,"All over the world, there are terraced landsca...",Università di Trento,TERRACES: EXAMPLES OF CONSERVATION AND INNOVATION,Gatti M.P.,Not Found,False,[],Empty,2ac7897944de27a9b8a41673f1c1e54a1e893434,...,"['H', 'B', 'C', 'A', 'K']","['3', '1', '21', '10', '2']",Empty,Empty,2021.0,Empty,Empty,Empty,Empty,Empty
1,ccbd345f7e7e2185bb0e0a56651bcf0fe6fdc747,This contribution summarizes the state of the ...,Empty,"Biodiversité végétale, valeurs naturelles et s...",Lasen C.,Not Found,False,[],10.7320/FlMedit31SI.521,29aafb5a0c6050e0feb044217a00cadf104a7f59,...,"['H', 'K']","['22', '2']",Empty,Empty,2021.0,Empty,Empty,Empty,Empty,Empty
2,c071cfdaff3125cd53ebfea7696f69ee19cc71ee,Resource pulses are occasional events of ephem...,"University of California, Davis",Periodical cicadas as resource pulses in North...,Yang L.,Not Found,False,[],10.1126/science.1103114,eac42e95052a7c128c0a1a116241b0825c98d6a5,...,"['J', 'K', 'B', 'C']","['3', '4']",Empty,Empty,2004.0,Empty,Empty,Empty,Empty,Empty
3,29aaa5e7e671c6c6635c1c5d37576dca8afd78b3,Palms are a resource of great importance in th...,"Instituto de Ecología, A.C.",Palm use and social values in rural communitie...,"González-Marín R.M., Moreno-Casasola P., Orell...","[180698, 366250, 817259, 3948714, 6039154, 762...",False,[],10.1007/s10668-012-9343-y,3db91d9842c370ea3f67a778ae2d76d24dbc8f4c,...,"['J', 'B', 'C', 'I', 'E']","['3', '1', '2']",Empty,Empty,2012.0,Empty,Empty,Empty,Empty,Empty
4,3400ccf3a9ffc2df4b90c74909d5db77461d7bdc,The brown creeper (Certhia americana) is one o...,Université de Moncton,Effects of selection harvesting on bark invert...,"D'Astous E., Villard M.",[697024],False,[],10.2980/19-2-3472,afd0da8a7bf92295ed5af6c4b3fb1012f797c84d,...,"['K', 'J']","['3', '1', '2']",Empty,Empty,2012.0,Empty,Empty,Empty,Empty,Empty


In [4]:
### Inspecting the data - seeing the column names
NCSdf.columns.to_list()

# For any pathway and impact analyses, the most salient columns are:
# * predicted_pathway_numbers: predicted NCS pathways (for more information, see SI Extended Data Table 1)
# * predicted_benefits: predicted co-impacts to people, biodiversity, and the environment (for more information, see SI Extended Data Table 2)

['ESind',
 'abstract',
 'addresses',
 'articletitle',
 'authors',
 'biodiv_species_uid',
 'contains_cost_layer1',
 'cost_extraction_method',
 'doi',
 'dup_title',
 'efgs_filtered',
 'functional_biomes_filtered',
 'esj_s1',
 'esj_s2',
 'iplc_s1',
 'iplc_s2',
 'issue',
 'language',
 'geolocation_status',
 'location',
 'location_details',
 'predicted_benefits',
 'predicted_pathway_numbers',
 'publicationdate',
 'publicationtype',
 'publicationyear',
 'pubmedid',
 'researchareas',
 'timescited,woscore',
 'volume',
 'webofscienceindex']

### Example data inspection

Let's say that we are interested in Agroforestry (NCS Pathway 11, as per SI Extended Data Table 1). How would we identify all studies predicted to pertain to agroforestry and tabulate the different co-impacts associated with these agroforestry studies?

Note that studies could be, and often were (Figures 3 and 5), predicted to be associated with *multiple* NCS pathways and co-impacts. The pathway and co-impact categories are therefore stored as lists, which the TSV holds as strings (e.g. `"['10', '11']"`). This calls for a bit of pre-processing: either use `DataFrame.explode` to replicate each unique combination of study and pathway (and co-impact) **or** use the pandas string method `str.contains` to ascertain which studies contain pathway 11 (among other pathways).
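The two options described above can be sketched on a small toy table. This is an illustrative sketch only: the `toy` frame and its three rows are made up for the example, and only the column name `predicted_pathway_numbers` comes from the dataset.

```python
import ast
import pandas as pd

# Toy stand-in for NCSdf (hypothetical rows); lists are stored as strings, as in the TSV
toy = pd.DataFrame({
    "articletitle": ["study a", "study b", "study c"],
    "predicted_pathway_numbers": ["['11']", "['10', '11']", "['3', '1']"],
})

# Option 1: literal substring test on the raw string.
# Including the quotes in the pattern guards against partial matches on other numbers.
mask = toy["predicted_pathway_numbers"].str.contains("'11'", regex=False)
by_contains = toy[mask]

# Option 2: parse each string into a list, explode to one row per (study, pathway), then filter
parsed = toy.assign(pathway=toy["predicted_pathway_numbers"].apply(ast.literal_eval))
by_explode = parsed.explode("pathway")
by_explode = by_explode[by_explode["pathway"] == "11"]

print(by_contains["articletitle"].tolist())
print(by_explode["articletitle"].tolist())
```

Both options select the same two toy studies. The substring test is shorter, while the exploded form is convenient when a per-pathway tally is needed afterwards.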

In [11]:
### Pre-processing the data
agroforestryDF = NCSdf[['abstract','articletitle','predicted_benefits','predicted_pathway_numbers']] # subset to columns of interest
agroforestryDF = agroforestryDF[agroforestryDF['predicted_pathway_numbers'].str.contains('11')].copy() # subset to rows (studies) containing pathway 11; .copy() avoids a SettingWithCopyWarning when columns are added later

In [12]:
### Inspect the data
agroforestryDF.head()

Unnamed: 0,abstract,articletitle,predicted_benefits,predicted_pathway_numbers
16,Ethiopia is identified as a primary centre of ...,Evaluation of agronomic performance of coffee ...,"['J', 'B', 'C', 'A']",['11']
36,"In 1984, the Speciality Coffee Association of ...",Cupping and Grading-Discovering Character and ...,"['J', 'B', 'C', 'A']",['11']
46,Background: This paper devotes to determinants...,Determinants of crop-livestock diversification...,"['G', 'B', 'C', 'K']","['10', '11']"
64,"Results of a mixed plantation with poplar, wal...",Comparing growth rate in a mixed plantation (w...,['J'],"['8', '1', '11']"
70,This paper analyzed markets for sustainable co...,Sustainable coffee marketing: Challenges and t...,"['J', 'B', 'C', 'A']",['11']


In [15]:
### A dictionary to map the co-impact strings to more legible categories (EDT 2)
code_mapping = {
    'A': 'subjective well-being',
    'B': 'economic living standards',
    'C': 'material living standards',
    'D': 'health',
    'E': 'education',
    'F': 'social relations',
    'G': 'security and safety',
    'H': 'subjective well-being',  # same label as 'A'; verify against EDT 2, since the two codes are pooled in the tallies
    'I': 'culture and spirituality',
    'J': 'ecosystem services',
    'K': 'biodiversity conservation'
}

In [14]:
### Check the raw value of a predicted_benefits entry (a string, not a list)
agroforestryDF.predicted_benefits.iloc[0]

"['J', 'B', 'C', 'A']"

In [17]:
### Additional processing
# First, convert the predicted_benefits column from a string representation of a list to an actual list.
agroforestryDF['predicted_benefits_list'] = agroforestryDF['predicted_benefits'].apply(ast.literal_eval)

# Now, explode the dataframe so that each row holds a single predicted benefit
exploded_agroforestryDF = agroforestryDF.explode('predicted_benefits_list')

In [18]:
### Inspect the exploded data
# Note: going forward, we will use predicted_benefits_list for all analyses.
exploded_agroforestryDF.head()

Unnamed: 0,abstract,articletitle,predicted_benefits,predicted_pathway_numbers,predicted_benefits_list
16,Ethiopia is identified as a primary centre of ...,Evaluation of agronomic performance of coffee ...,"['J', 'B', 'C', 'A']",['11'],J
16,Ethiopia is identified as a primary centre of ...,Evaluation of agronomic performance of coffee ...,"['J', 'B', 'C', 'A']",['11'],B
16,Ethiopia is identified as a primary centre of ...,Evaluation of agronomic performance of coffee ...,"['J', 'B', 'C', 'A']",['11'],C
16,Ethiopia is identified as a primary centre of ...,Evaluation of agronomic performance of coffee ...,"['J', 'B', 'C', 'A']",['11'],A
36,"In 1984, the Speciality Coffee Association of ...",Cupping and Grading-Discovering Character and ...,"['J', 'B', 'C', 'A']",['11'],J


In [19]:
# Map the codes to the more legible categories using the code_mapping dictionary
exploded_agroforestryDF['mapped_benefits'] = exploded_agroforestryDF['predicted_benefits_list'].map(code_mapping)
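`Series.map` with a dictionary returns `NaN` for any code that is missing from the dictionary, so a quick check for unmapped codes can be worthwhile. A minimal sketch, assuming a shortened, hypothetical mapping and a made-up code `'Z'` that is not part of the dataset:

```python
import pandas as pd

# Hypothetical two-entry subset of the mapping, for illustration only
code_mapping_subset = {"J": "ecosystem services", "K": "biodiversity conservation"}

# 'Z' is an invented code used to trigger the unmapped case
codes = pd.Series(["J", "K", "Z"], name="predicted_benefits_list")

mapped = codes.map(code_mapping_subset)        # codes absent from the dict become NaN
unmapped = codes[mapped.isna()].unique().tolist()  # list the codes that failed to map

print(unmapped)
```

An empty list here would indicate that every code was translated to a label.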

In [20]:
# Display the first few rows to verify the result
exploded_agroforestryDF[['predicted_benefits', 'predicted_benefits_list', 'mapped_benefits']].head()

Unnamed: 0,predicted_benefits,predicted_benefits_list,mapped_benefits
16,"['J', 'B', 'C', 'A']",J,ecosystem services
16,"['J', 'B', 'C', 'A']",B,economic living standards
16,"['J', 'B', 'C', 'A']",C,material living standards
16,"['J', 'B', 'C', 'A']",A,subjective well-being
36,"['J', 'B', 'C', 'A']",J,ecosystem services


In [21]:
# Display the breakdown of co-impacts across all of the agroforestry studies
exploded_agroforestryDF.mapped_benefits.value_counts() # generate a count of each unique co-impact for all of the agroforestry studies

mapped_benefits,count
economic living standards,9994
material living standards,9600
biodiversity conservation,7893
ecosystem services,6715
security and safety,3893
subjective well-being,1339
culture and spirituality,600
health,308


In [24]:
# Normalize the breakdown of co-impacts across all of the agroforestry studies
# (the shares can sum to more than 1 because a study may carry several co-impacts)
(exploded_agroforestryDF.mapped_benefits.value_counts()/agroforestryDF.shape[0]).round(2) # divide the counts for each co-impact by the number of agroforestry studies

mapped_benefits,count
economic living standards,0.81
material living standards,0.77
biodiversity conservation,0.64
ecosystem services,0.54
security and safety,0.31
subjective well-being,0.11
culture and spirituality,0.05
health,0.02
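The normalization above divides by the number of studies rather than by the number of co-impact tags. The difference can be sketched on toy data; the `studies` frame and its values are invented for the example, and the index values merely mimic study row labels.

```python
import pandas as pd

# Four hypothetical studies, each with a list of co-impact codes
studies = pd.DataFrame(
    {"benefits": [["J", "B"], ["J"], ["B", "K"], ["J", "K"]]},
    index=[16, 36, 46, 64],
)

# explode repeats the original index, so unique index values still count studies
exploded = studies.explode("benefits")
n_studies = exploded.index.nunique()

# Share of studies tagged with each co-impact (what the notebook computes)
share = (exploded["benefits"].value_counts() / n_studies).round(2)

# Share of all tags, via normalize=True; this sums to 1 by construction
tag_share = exploded["benefits"].value_counts(normalize=True)

print(share)
print(tag_share.round(2))
```

Per-study shares answer "what fraction of studies mention this co-impact", whereas per-tag shares answer "what fraction of all co-impact mentions fall in this category".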


Thus, from this notebook, we have seen how to:

1. Import the replication dataset
2. Subset our data to columns of interest and a particular pathway
3. Parse and explode the list-valued co-impact column, then tally the co-impacts associated with the agroforestry studies
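The steps above could be wrapped into a single helper so that the same tally can be repeated for other pathways. The function name `tally_coimpacts`, the toy rows, and the shortened `labels` dictionary are all hypothetical; only the two column names come from the dataset. Unlike the substring test used earlier, this sketch checks exact membership in the parsed list.

```python
import ast
import pandas as pd

def tally_coimpacts(df, pathway, mapping):
    """Count mapped co-impacts among studies tagged with `pathway` (a string such as '11')."""
    # Parse the stringified pathway lists and keep rows whose list contains the pathway
    pathways = df["predicted_pathway_numbers"].apply(ast.literal_eval)
    subset = df[pathways.apply(lambda p: pathway in p)].copy()
    # Parse the stringified co-impact lists and expand to one row per (study, co-impact)
    subset["benefit"] = subset["predicted_benefits"].apply(ast.literal_eval)
    exploded = subset.explode("benefit")
    # Translate codes to labels and count them
    return exploded["benefit"].map(mapping).value_counts()

# Toy data (hypothetical) in the same string-encoded layout as the TSV
toy = pd.DataFrame({
    "predicted_pathway_numbers": ["['11']", "['10', '11']", "['3']"],
    "predicted_benefits": ["['J', 'B']", "['J']", "['K']"],
})
labels = {
    "J": "ecosystem services",
    "B": "economic living standards",
    "K": "biodiversity conservation",
}

counts = tally_coimpacts(toy, "11", labels)
print(counts)
```

With the real data, a call along the lines of `tally_coimpacts(NCSdf, '11', code_mapping)` would be the analogous usage, assuming neither column contains missing values.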