# A Data Science Framework for the Analysis of Ion Transport Mechanisms in Ionic Liquids

## Introduction

We analyze the underlying principles of pure ionic liquid conductivity and other properties using a dataset of 3D descriptors and a data science framework. We combine molecular connectivity graphs, RDKit single-ion simulations, PubChem3D data, and bulk properties to produce an easily accessible dataset for ionic liquids.
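
The README does not specify the statistics used for descriptor analysis. As one example of a first-pass screen, the sketch below ranks descriptor columns by the absolute value of their Pearson correlation with a target property such as the logarithm of conductivity. The function names `pearson_r` and `rank_descriptors`, the column names, and the toy values are hypothetical placeholders, not entries from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def rank_descriptors(table, target):
    """Return (name, r) pairs sorted by |r| against the target, strongest first."""
    scores = [(name, pearson_r(col, target)) for name, col in table.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy values for illustration only (hypothetical descriptors, not real data).
ln_sigma = [1.0, 2.0, 3.0, 4.0]
descriptors = {
    "descriptor_a": [2.0, 4.0, 6.0, 8.0],
    "descriptor_b": [4.0, 3.0, 1.0, 2.0],
    "descriptor_c": [1.0, 1.0, 1.0, 1.0],
}
ranking = rank_descriptors(descriptors, ln_sigma)
```

Correlation screening only captures linear, one-at-a-time relationships. It works as a quick filter, not as a full assessment of predictive capability.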

Here, we propose a **solvent GNN (SolvGNN)** architecture that explicitly captures molecular interactions by combining *atomic-level (local)* graph convolution and *molecular-level (global)* message passing through a molecular interaction network. SolvGNN uses a graph representation in which each node represents a molecule and each edge represents a potential intermolecular interaction (e.g., hydrogen bonding). We tested SolvGNN in a case study on **activity coefficient predictions for binary and ternary mixtures** using large data sets obtained from **COnductor-like Screening MOdel for Real Solvents (COSMO-RS)** calculations. We show that the proposed SolvGNN can predict composition-dependent activity coefficients with high accuracy. To interpret the trained model, we performed **counterfactual analysis** on SolvGNN to highlight how chemical structure and composition affect solvation behavior. Finally, we built a SolvGNN-based framework that takes a given mixture (binary or ternary) as input and generates the corresponding **phase diagrams (P-x-y)** using predicted activity coefficients coupled with phase equilibrium calculations.
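
The last step above couples predicted activity coefficients with phase equilibrium calculations. A common low-pressure choice, assumed here rather than taken from the SolvGNN code, is modified Raoult's law, $y_i P = x_i \gamma_i P_i^{sat}$, which gives the bubble pressure $P = \sum_i x_i \gamma_i P_i^{sat}$. In this binary sketch, a one-parameter Margules model stands in for the GNN's activity coefficient predictions, and the saturation pressures are placeholder values. The function names are hypothetical.

```python
import math


def margules_gammas(x1, A):
    """Stand-in activity model (one-parameter Margules); swap in model predictions."""
    x2 = 1.0 - x1
    return math.exp(A * x2 ** 2), math.exp(A * x1 ** 2)


def bubble_point(x1, gammas, psat1, psat2):
    """Bubble pressure P and vapor mole fraction y1 from modified Raoult's law."""
    g1, g2 = gammas
    p1 = x1 * g1 * psat1            # partial pressure of component 1
    p2 = (1.0 - x1) * g2 * psat2    # partial pressure of component 2
    P = p1 + p2
    return P, p1 / P


def pxy_curve(A, psat1, psat2, n_points=11):
    """Tabulate (x1, y1, P) across the composition range for a P-x-y diagram."""
    rows = []
    for k in range(n_points):
        x1 = k / (n_points - 1)
        P, y1 = bubble_point(x1, margules_gammas(x1, A), psat1, psat2)
        rows.append((x1, y1, P))
    return rows


# Placeholder inputs: Margules A = 0.5, saturation pressures in kPa.
curve = pxy_curve(A=0.5, psat1=100.0, psat2=50.0)
```

Plotting `P` against both `x1` (bubble curve) and `y1` (dew curve) gives the P-x-y diagram. With `A > 0`, the bubble pressure lies above the straight line of an ideal solution.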

SolvGNN is an open-source **computational framework and tool**. It can be used directly to predict activity coefficients and/or generate phase diagrams. Researchers and developers can also train it on their own data sets as a starting point for future research.
<br />

> Features

- `Data science framework` for analyzing ionic liquid properties
- `Bulk property predictions` for ionic liquids given their **SMILES strings** and **RDKit simulations**
- `Descriptor analysis` to evaluate predictive capabilities
- `Arrhenius model analysis` to model structural transport in ionic liquids
- `Data set availability`: data from IL Thermo, RDKit, and PubChem3D containing properties and molecular descriptors for **218** ionic liquids and **2,371** temperature-dependent data points
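
The README does not give the modified Arrhenius form used in the analysis. The sketch below therefore fits only the standard Arrhenius expression $\sigma(T) = \sigma_\infty \exp(-E_a / RT)$, using linear least squares on $\ln\sigma$ versus $1/T$. The function name `fit_arrhenius` and the synthetic data are hypothetical. The data are generated from known parameters so that the fit can be checked.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def fit_arrhenius(temps_K, sigma):
    """Fit ln(sigma) = ln(sigma_inf) - Ea/(R*T); return (Ea in J/mol, sigma_inf)."""
    xs = [1.0 / T for T in temps_K]
    ys = [math.log(s) for s in sigma]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    Ea = -slope * R                  # activation energy, J/mol
    sigma_inf = math.exp(intercept)  # pre-exponential factor, units of sigma
    return Ea, sigma_inf


# Synthetic check: data generated from Ea = 20 kJ/mol and sigma_inf = 50 S/m.
temps = [298.15, 313.15, 328.15, 343.15]
sigma = [50.0 * math.exp(-20000.0 / (R * T)) for T in temps]
Ea, sigma_inf = fit_arrhenius(temps, sigma)
```

Curvature in a plot of $\ln\sigma$ against $1/T$ signals that the plain Arrhenius form is insufficient. This is one reason non-Arrhenius forms such as the Vogel-Fulcher-Tammann equation are often used for ionic liquids.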

> Publication

- 👉 [Submitted Paper](https://doi.org/10.26434/chemrxiv-2022-3tq4c)

<br />

## Sample Output Showcase

<br />

> Classical modeling (Nernst-Einstein and modified Arrhenius)

<br />
<img src="./Figures/Readme_NE Parity.png" /> <img src="./Figures/Readme_Modified Arrhenius Fit.png" /> 
<br />
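
A Nernst-Einstein parity plot typically compares measured conductivity with the conductivity expected if cations and anions moved independently. For a pure ionic liquid of singly charged ions, a standard form is $\sigma_{NE} = \frac{n e^2}{k_B T}(D_+ + D_-)$, where the ion-pair number density is $n = \rho N_A / M$. The sketch below assumes that the cation and anion self-diffusion coefficients, the density, and the ion-pair molar mass are supplied as inputs. The function names and example values are hypothetical. The ratio of measured to Nernst-Einstein conductivity is often reported as the ionicity.

```python
E = 1.602176634e-19   # elementary charge, C
KB = 1.380649e-23     # Boltzmann constant, J/K
NA = 6.02214076e23    # Avogadro constant, 1/mol


def nernst_einstein_conductivity(density, molar_mass, d_cation, d_anion, T):
    """Nernst-Einstein conductivity in S/m for a pure 1:1 ionic liquid.

    density in kg/m^3, molar_mass (ion pair) in kg/mol,
    d_cation and d_anion in m^2/s, T in K.
    """
    n_pairs = density * NA / molar_mass  # ion pairs per m^3
    return n_pairs * E**2 * (d_cation + d_anion) / (KB * T)


def ionicity(sigma_measured, sigma_ne):
    """Ratio of measured conductivity to the Nernst-Einstein estimate."""
    return sigma_measured / sigma_ne


# Hypothetical inputs chosen only to show the order of magnitude.
sigma_ne = nernst_einstein_conductivity(1200.0, 0.200, 1.0e-11, 1.0e-11, 298.15)
```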


> t-SNE property mapping

<img src="./Figures/Readme_t-SNE Properties.png" />

<br />

## Implementation and Development

```bash
$ git clone -b SolvGNN --single-branch https://github.com/zavalab/ML.git
$ cd ML
$ conda env create -f environment.yml
```

## Tutorials

For detailed usage, navigate to the `notebook` directory.


## Links

- [Zavalab](https://zavalab.engr.wisc.edu/)
- [Gebbie Lab](https://interfaces.che.wisc.edu/)
- [RDKit](https://github.com/rdkit/rdkit)
- [PubChem 3D](https://pubchem.ncbi.nlm.nih.gov/docs/pubchem3d)
- [IL Thermo](https://ilthermo.boulder.nist.gov/)

<br />