[GitHub repository for the COLING/LREC 2024 tutorial on NLP for chemistry]
Chemistry was long a terra incognita for natural language processing (NLP). While its strong overlap with computational and statistical physics (e.g., in so-called computational chemistry) gave rise to the application of many statistical models, methods derived from NLP have only reached wide acceptance in the past twenty years.
The aim of this tutorial is to provide a basic introduction to this emerging field and to survey some of its latest advances. Given its breadth, we will focus on four fundamental use cases. The tutorial is organized as follows:
- Topic 1. Basic chemical notions and techniques.
- Topic 2. Text mining in the chemistry domain.
- Topic 3. Distributional models for (computational) chemistry.
- Topic 4. Large language models and applications.
This tutorial assumes no prior knowledge beyond some exposure to Python and natural language processing. Knowledge of chemistry is beneficial but not required. For an overview of the topics, please read the proposal below. Please cite as follows:
@misc{thorne-lrec-2024,
author = {Thorne, Camilo and Saber Akhondi, Saber},
institution = {Elsevier},
title = {NLP for Chemistry - Introduction and Recent Advances},
year = {2024},
url = {https://github.com/camilothorne/nlp-4-chemistry-lrec-2024}
}
Proposal for COLING/LREC 2024 (PDF file)
Found under slides/.
Given GitHub storage limits, some data artefacts need to be downloaded separately from Google Drive:
- the datasets are available in this repository, but
- the models need to be downloaded to models/, and
- the tools need to be downloaded to tools/.
Navigate to the cloned directories and read the instructions.
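As a quick sanity check after downloading, the sketch below reports which of the two directories still look unpopulated. It is a minimal illustration, not part of the tutorial materials: the function name `missing_artefacts` is hypothetical, and it assumes that the instruction files inside models/ and tools/ have names starting with "README", so a directory holding only such files is treated as not yet populated.

```python
from pathlib import Path

# Directory names come from the list above; everything else is an assumption.
EXPECTED_DIRS = ("models", "tools")


def missing_artefacts(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or hold no downloads.

    Assumption: files whose names start with "README" are instruction files
    shipped with the repository, so they do not count as downloaded content.
    """
    root = Path(repo_root)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_dir():
            missing.append(name)
            continue
        # Keep only entries that are not instruction files.
        downloaded = [
            entry for entry in path.iterdir()
            if not entry.name.upper().startswith("README")
        ]
        if not downloaded:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for name in missing_artefacts("."):
        print(f"{name}/ looks empty: follow the download instructions inside it")
```

Running the script from the repository root prints one line per directory that still needs its Google Drive download, and nothing when both are populated.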
Found under notebooks/.
Participants are encouraged to reach out both during and after COLING/LREC 2024 for questions regarding the materials, methods, concepts, notebooks, etc.
- Slack channel (public - just search it on Slack and join!): #nlp-4-chemistry-lrec-coling-2024.
- Email: camilo.thorne@gmail.com and/or c.thorne.1@elsevier.com.