added instructions in the paper for replicating the accuracy assessment
Yingjie4Science committed Mar 6, 2023
1 parent dd19550 commit 56cf11f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion paper/paper.md
@@ -55,7 +55,7 @@ The Sustainable Development Goals (SDGs) agenda, adopted by all United Nations M

The SDGdetector package was developed by (1) compiling six existing databases on SDG search queries [@un_global_2019; @duran-silva_controlled_2019; @jayabalasingham_identifying_2019; @vanderfeesten_search_2020; @schubert_scientific_2020; @bautista-puig_unveiling_2019; @wulff_text2sdg_2021]; (2) reviewing all SDG targets and indicators [@un_global_2019] to manually refine and update the search terms to create query dictionaries at the levels of the 17 SDGs and the 169 SDG targets (which correspond to the 231 SDG indicators); (3) manually assessing and improving the accuracy of these queries using thousands of randomly-selected statements from real-world corporate annual reports across multiple iterations; and (4) turning these queries into a lexical database for text mining across large bodies of text and tabulating the matched SDGs and SDG targets.

- SDGdetector is a unique tool because it is by far the only one available that is equipped with a database for detecting SDG-relevant statements at the target level. We are aware of another useful R package (*text2sdg*)[@wulff_text2sdg_2021], which mostly uses single words as search terms and was designed to only map text to SDGs at the goal level (coarser resolution). Our search queries in the comprehensive database further considered sentence structure to reduce noise hits, and can capture hits at both goal and target level. In combination with this database, the text mining approach, an artificial intelligence (AI) technology, enables us to use natural language processing to transform the unstructured text within documents into normalized and structured data suitable for analysis and visualization. After repeated validation and calibration, this package has achieved high accuracy in detecting SDG-related statements within textual data (> 75.5%, measured by the alignment between the R package results and four experts’ manually-coded results; see the "Accuracy Evaluation" section on [GitHub](https://github.com/Yingjie4Science/SDGdetector) for more information). Other similar tools, such as the *text2sdg*, however, did not report any accuracy evaluations.
+ SDGdetector is a unique tool because it is by far the only one available that is equipped with a database for detecting SDG-relevant statements at the target level. We are aware of another useful R package (*text2sdg*)[@wulff_text2sdg_2021], which mostly uses single words as search terms and was designed to only map text to SDGs at the goal level (coarser resolution). Our search queries in the comprehensive database further considered sentence structure to reduce noise hits, and can capture hits at both goal and target level. In combination with this database, the text mining approach, an artificial intelligence (AI) technology, enables us to use natural language processing to transform the unstructured text within documents into normalized and structured data suitable for analysis and visualization. After repeated validation and calibration, this package has achieved high accuracy in detecting SDG-related statements within textual data (> 75.5%, measured by the alignment between the R package results and four experts’ manually-coded results; see the "Accuracy Evaluation" section on [GitHub](https://github.com/Yingjie4Science/SDGdetector) for more information). Complete data and code necessary for reproducing this accuracy evaluation can be found on our GitHub repository under the [`./docs/accuracy_evaluation/`](https://github.com/Yingjie4Science/SDGdetector/tree/main/docs/accuracy_evaluation) folder. Other similar tools, such as the *text2sdg*, however, did not report any accuracy evaluations.

This lightweight package has great potential to be useful in many disciplines with objectives to identify which SDGs and to what extent an entity is putting effort into them. This package can be used in large-scale research projects in the field of corporate sustainability and urban science. It can also be used in systematic reviews and syntheses of published literature and patents. The associated lexical database embedded within this R package can be also used for developing similar applications in Python or other programming languages.
