[GitHub repository for the COLING/LREC 2024 tutorial on NLP for chemistry]
Chemistry was long a terra incognita for natural language processing (NLP). While a strong overlap with computational and statistical physics (e.g., in so-called computational chemistry) gave rise to the application of many statistical models, methods derived from NLP have only reached wide acceptance in the past twenty years.
The aim of this tutorial is to provide a basic introduction to this emerging field and to survey some of its latest advances. Given its breadth, we will focus on four fundamental use cases. The tutorial is organized as follows:
- Topic 1. Basic chemical notions and techniques.
- Topic 2. Text mining in the chemistry domain.
- Topic 3. Distributional models for (computational) chemistry.
- Topic 4. Large language models and applications.
This tutorial assumes no prior knowledge beyond some exposure to Python and natural language processing. Knowledge of chemistry is beneficial but not required. For an overview of the topics, please read the proposal below. Please cite as follows:
@inproceedings{thorne-akhondi-2024-nlp,
title = "{NLP} for Chemistry {--} Introduction and Recent Advances",
author = "Thorne, Camilo and Akhondi, Saber",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics,
Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries",
year = "2024",
address = "Torino, Italy",
url = "https://aclanthology.org/2024.lrec-tutorials.8",
pages = "45--49"
}
The PDF file is here.
Found under slides/.
Given GitHub storage limits, some data artefacts need to be downloaded separately from Google Drive:
- the datasets are available in this repository, but
- the models need to be downloaded to models/, and
- the tools need to be downloaded to tools/.
Navigate to the cloned directories and read the instructions.
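Before opening the notebooks, it can help to confirm that the separate downloads actually landed in the right place. The following is a minimal sketch, not part of the tutorial materials: the helper name `missing_artefacts` is hypothetical, and it assumes that README-style and hidden files inside models/ and tools/ are instructions rather than downloaded artefacts.

```python
from pathlib import Path

# Directory names taken from this README; everything else is illustrative.
EXPECTED_DIRS = ("models", "tools")


def is_artefact(entry):
    """Assumption: hidden files and README-like files are not downloaded artefacts."""
    name = entry.name.lower()
    return not (name.startswith(".") or name.startswith("readme"))


def missing_artefacts(repo_root="."):
    """Return the expected directories that are absent or hold no artefacts."""
    root = Path(repo_root)
    missing = []
    for name in EXPECTED_DIRS:
        folder = root / name
        # A directory counts as populated once it holds at least one artefact.
        if not folder.is_dir() or not any(is_artefact(e) for e in folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_artefacts()
    if missing:
        print("Still to download:", ", ".join(missing))
    else:
        print("models/ and tools/ are populated.")
```

Run from the root of the cloned repository, the script lists whichever of the two directories still needs its download. It does not verify file contents or completeness.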
Found under notebooks/.
Participants are encouraged to reach out both during and after COLING/LREC 2024 with questions regarding the materials, methods, concepts, notebooks, etc.
- Email: camilo.thorne@gmail.com