NLP for Chemistry - Introduction and Recent Advances

[GitHub repository for the COLING/LREC 2024 tutorial on NLP for chemistry]


Introduction

Chemistry was long a terra incognita for natural language processing (NLP). While its strong overlap with computational and statistical physics (e.g., in so-called computational chemistry) led to the application of many statistical models, methods derived from NLP have only gained wide acceptance in the past twenty years.

The aim of this tutorial is to provide a basic introduction to this emerging field and an overview of some of its latest advances. Given the field's breadth, we will focus on four fundamental use cases. The tutorial is organized as follows:

  • Topic 1. Basic chemical notions and techniques.
  • Topic 2. Text mining in the chemistry domain.
  • Topic 3. Distributional models for (computational) chemistry.
  • Topic 4. Large language models and applications.

This tutorial assumes no prior knowledge beyond some exposure to Python and natural language processing. Knowledge of chemistry is beneficial but not required. For an overview of the topics, please read the proposal below. Please cite as follows:

@misc{thorne-lrec-2024,
  author	= {Thorne, Camilo and Akhondi, Saber},
  institution	= {Elsevier},
  title 	= {NLP for Chemistry - Introduction and Recent Advances},
  year 		= {2024},
  url 		= {https://github.com/camilothorne/nlp-4-chemistry-lrec-2024}
}
Proposal for COLING/LREC 2024 (PDF file)


Slides

Found under slides/.

Datasets, Models and Tools

Given GitHub storage limits, some data artefacts need to be downloaded separately from Google Drive:

  • the datasets are available in this repository, but
  • the models need to be downloaded to models/, and
  • the tools need to be downloaded to tools/.

Navigate to each of these directories in your clone and follow the instructions provided there.

Notebooks

Found under notebooks/.

Communication channels

Participants are encouraged to reach out both during and after COLING/LREC 2024 for questions regarding the materials, methods, concepts, notebooks, etc.
