<img src="docs/logo.jpg" width="400" height="400" align="center"/> 

# <center> `seesus` for Text Analysis on Sustainability </center>

`seesus` is an open-source Python package that evaluates whether a textual expression aligns with the concept of sustainability as defined by the United Nations Sustainable Development Goals (SDGs). It currently has four main functions: 
1. [Evaluate whether a statement aligns with sustainability](#1)
2. [Identify SDGs and associated targets in a statement](#2)
3. [Classify a statement into social, environmental, and economic sustainability](#3)
4. [Examine and customize match syntax](#4)

`seesus` is based on regular expressions instead of language models. It attains an accuracy rate of 75.5%, as determined by alignment with manual coding.

For analysis in R, please check [`SDGdetector`](https://github.com/Yingjie4Science/SDGdetector).

For best results, analyze one sentence at a time rather than a lengthy paragraph. Paragraphs can be split into sentences with tools such as `nltk.tokenize` and `re.split`.
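As a rough illustration of the `re.split` option, the sketch below splits a paragraph after sentence-ending punctuation. The helper name `split_sentences` is hypothetical and not part of `seesus`; the rule is deliberately naive and will mis-split abbreviations such as "U.S.", which `nltk.tokenize.sent_tokenize` generally handles better.

```python
import re


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


paragraph = (
    "Hurricane Sandy was a stark reminder of these risks. "
    "The city is improving its resiliency."
)
sentences = split_sentences(paragraph)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence can then be passed to `SeeSus` on its own.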

A statement can match the concept of sustainability directly (e.g., "progress toward the Sustainable Development Goal 1") or indirectly (e.g., "mitigate climate change"). We provide three examples here: one with an indirect match, one with a direct match, and a paragraph.

In [1]:
# install the package
# !pip install seesus

In [2]:
from seesus import SeeSus
from nltk.tokenize import sent_tokenize # only needed for example 3

## Example 1: indirect match

In [3]:
text = "We aim to contribute to the mitigation of climate change by reducing carbon emissions in the city."

## Example 2: direct match

In [4]:
text = "Our ambition is to achieve the Sustainable Development Goal 1"

<a name="1"></a>
### To evaluate whether a statement aligns with sustainability

In [5]:
# use SeeSus to classify the text (at this point, `text` holds the Example 2 sentence)
result = SeeSus(text)

In [6]:
# print result on whether a statement aligns with sustainability, True or False
print(result.sus)

True


<a name="2"></a>
### To identify SDGs and associated targets in a statement

In [7]:
# print the names of identified SDGs and descriptions
print(result.sdg)
print(result.sdg_desc)

['SDG1']
['No Poverty']


In [8]:
# print the names of identified SDG targets and descriptions
print(result.target)
print(result.target_desc)

['SDG1_general']
['End poverty in all its forms everywhere']


In [9]:
# print the match type
print(result.match)

['direct']


<a name="3"></a>
### To classify a statement into social, environmental, and economic sustainability

In [10]:
# determine which dimensions of sustainability (social, environmental, or economic) a statement belongs to
print(result.see)

{'social_sustainability': True, 'environmental_sustainability': False, 'economic_sustainability': False}
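The dictionary printed above can be reduced to just the dimensions that apply. This is a small sketch that works on a copy of the printed output rather than calling `seesus`; the variable names are illustrative only.

```python
# Copy of the dictionary printed by result.see above
see = {
    "social_sustainability": True,
    "environmental_sustainability": False,
    "economic_sustainability": False,
}

# Keep only the dimensions flagged True
active_dimensions = [name for name, flag in see.items() if flag]
print(active_dimensions)
```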


<a name="4"></a>
### To examine and customize match syntax

In [11]:
# print match syntax
SeeSus.show_syntax("SDG1_general")

[{'SDG_id': 'SDG1_general', 'SDG_keywords': '(sdg|goal)[^0-9]{0,2}(?=1\\b)|No Poverty', 'match_type': 'direct'}]
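To see what this pattern accepts, it can be tried directly with Python's `re` module. The sketch below is independent of `seesus`. The `re.IGNORECASE` flag is an assumption made here so that "Goal 1" matches the lowercase `goal` in the pattern; the package's own matching options are not shown in this document. The lookahead `(?=1\b)` requires the digit `1` to end at a word boundary, so "SDG 11" and "Goal 10" are not matched.

```python
import re

# Pattern copied from the show_syntax output above.
# re.IGNORECASE is an assumption for this sketch, not confirmed seesus behavior.
pattern = re.compile(r"(sdg|goal)[^0-9]{0,2}(?=1\b)|No Poverty", re.IGNORECASE)


def matches(text):
    """Return True if the SDG1_general direct pattern is found in text."""
    return pattern.search(text) is not None


examples = [
    "Our ambition is to achieve the Sustainable Development Goal 1",
    "a world with no poverty",
    "progress toward SDG 11",
    "Goal 10 targets inequality",
]
for example in examples:
    print(matches(example), "|", example)
```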


In [12]:
# customize match syntax
SeeSus.edit_syntax(sdg_id="SDG1_general", new_syntax="my match terms", match_type='indirect')

The indirect match syntax of SDG1_general has been updated.


Note that if the given match type (i.e., "direct" or "indirect") does not yet exist for the specified SDG ID in the original database (i.e., `SDG_keys`), the new syntax is *added* to the database. If that match type already exists, the new syntax *replaces* the original syntax.
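The add-or-replace rule can be modeled with a plain list of dictionaries shaped like the `show_syntax` output. This is a toy sketch of the behavior described above, not the package's implementation; `edit_rule` and the sample keyword strings are hypothetical.

```python
def edit_rule(rules, sdg_id, new_syntax, match_type):
    """Replace the rule for (sdg_id, match_type) if present, otherwise append one."""
    for rule in rules:
        if rule["SDG_id"] == sdg_id and rule["match_type"] == match_type:
            rule["SDG_keywords"] = new_syntax
            return "replaced"
    rules.append(
        {"SDG_id": sdg_id, "SDG_keywords": new_syntax, "match_type": match_type}
    )
    return "added"


# Toy database with only a direct rule for SDG1_general
rules = [
    {"SDG_id": "SDG1_general", "SDG_keywords": "No Poverty", "match_type": "direct"}
]

# No indirect rule exists yet, so the first edit adds one
first = edit_rule(rules, "SDG1_general", "my match terms", "indirect")
# The indirect rule now exists, so the second edit replaces it
second = edit_rule(rules, "SDG1_general", "other terms", "indirect")
print(first, second, len(rules))
```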

In [13]:
# check the match syntax again after customization
SeeSus.show_syntax("SDG1_general")

[{'SDG_id': 'SDG1_general', 'SDG_keywords': '(sdg|goal)[^0-9]{0,2}(?=1\\b)|No Poverty', 'match_type': 'direct'}, {'SDG_id': 'SDG1_general', 'SDG_keywords': 'my match terms', 'match_type': 'indirect'}]


In [14]:
# rerun SeeSus
new_result = SeeSus("my match terms are in text")

In [15]:
# print results after customizing the match syntax
print(new_result.sus)
print(new_result.target)
print(new_result.match)

True
['SDG1_general']
['indirect']


## Example 3: a paragraph

In [16]:
text = "By working with communities in the floodplain and facilitating flood-resistant building design, DCP is reducing the city’s risks to sea level rise and coastal flooding. Hurricane Sandy was a stark reminder of these risks. The City, led by the Mayor’s Office of Recovery and Resiliency (ORR), has developed a multifaceted plan for recovering from Sandy and improving the city’s resiliency–the ability of its neighborhoods, buildings and infrastructure to withstand and recover quickly from flooding and climate events. As part of this effort, DCP has initiated a series of projects to identify and implement land use and zoning changes as well as other actions needed to support the short-term recovery and long-term vitality of communities affected by Hurricane Sandy and other areas at risk of coastal flooding."

Source: [NYC Planning](https://www.nyc.gov/site/planning/about/dcp-priorities/resiliency-sustainability.page)

In [17]:
for sent in sent_tokenize(text):
    result = SeeSus(sent)
    print(sent)
    print("Is the sentence related to achieving sustainability?", result.sus)
    print("Which SDGs?", result.sdg)
    print("Which SDG targets specifically?", result.target)
    print("which dimensions of sustainability?", result.see)
    print("----------------")

By working with communities in the floodplain and facilitating flood-resistant building design, DCP is reducing the city’s risks to sea level rise and coastal flooding.
Is the sentence related to achieving sustainability? True
Which SDGs? ['SDG13']
Which SDG targets specifically? ['SDG13_1', 'SDG13_general']
which dimensions of sustainability? {'social_sustainability': False, 'environmental_sustainability': True, 'economic_sustainability': False}
----------------
Hurricane Sandy was a stark reminder of these risks.
Is the sentence related to achieving sustainability? False
Which SDGs? []
Which SDG targets specifically? []
which dimensions of sustainability? {'social_sustainability': False, 'environmental_sustainability': False, 'economic_sustainability': False}
----------------
The City, led by the Mayor’s Office of Recovery and Resiliency (ORR), has developed a multifaceted plan for recovering from Sandy and improving the city’s resiliency–the ability of its neighborhoods, buildings a