# `pyEQL` Tutorial: Searching the Property Database


`pyEQL` is an open-source `python` library for solution chemistry calculations and ion properties developed by the [Kingsbury Lab](https://www.kingsburylab.org/) at Princeton University.

[Documentation](https://pyeql.readthedocs.io/en/latest/) | [How to Install](https://pyeql.readthedocs.io/en/latest/installation.html) | [GitHub](https://github.com/rkingsbury/pyEQL) 

## Installation

If you do not already have `pyEQL` installed, uncomment and run the code cell below.

In [1]:
# pip install pyEQL

## First, import the property database

`pyEQL`'s built-in property database contains physicochemical, transport, and model parameters for hundreds of solutes. This information is used behind the scenes when you interact with a `Solution` object, but it can also be accessed directly.

In [2]:
from pyEQL import IonDB

## How to Search the Database

### Query an example document

You can think of the database as a `list` of `dict`s that contain structured data. More specifically, the database is a list of [`Solute` objects](https://pyeql.readthedocs.io/en/latest/database.html#the-solute-class) that have been serialized to dictionaries. We refer to each of these `dict`s as a **"document"** (consistent with MongoDB terminology) or a "record".

To see what one document looks like, use `query_one()`, which retrieves a single record from the database. The record is a `dict`.

In [3]:
IonDB.query_one()

{'_id': ObjectId('654e5f131ed012c187817e6a'),
 'formula': 'Ac[+3]',
 'charge': 3,
 'molecular_weight': '227.0 g/mol',
 'elements': ['Ac'],
 'chemsys': 'Ac',
 'pmg_ion': {'Ac': 1,
  'charge': 3,
  '@module': 'pymatgen.core.ion',
  '@class': 'Ion',
  '@version': None},
 'formula_html': 'Ac<sup>+3</sup>',
 'formula_latex': 'Ac$^{+3}$',
 'formula_hill': 'Ac',
 'formula_pretty': 'Ac^+3',
 'oxi_state_guesses': {'Ac': 3},
 'n_atoms': 1,
 'n_elements': 1,
 'size': {'radius_ionic': {'value': '1.26 Å',
   'reference': 'pymatgen',
   'data_type': 'experimental'},
  'radius_hydrated': None,
  'radius_vdw': {'value': '2.47 Å',
   'reference': 'pymatgen',
   'data_type': 'experimental'},
  'molar_volume': None,
  'radius_ionic_marcus': {'value': '1.18 ± 0.02 Å',
   'reference': 'Marcus2015',
   'data_type': 'experimental'}},
 'thermo': {'ΔG_hydration': {'value': '-3086.0 ± 10 kJ/mol',
   'reference': '10.1021/acs.jpca.9b05140',
   'data_type': 'experimental'},
  'ΔG_formation': None},
 'transport': 
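Because a record is an ordinary `dict`, nested properties can be reached with normal key lookups. The sketch below uses a hand-copied fragment of the `Ac[+3]` record above rather than calling the database, and the helper name `property_value` is hypothetical (it is not part of `pyEQL`). Note that some properties are stored as `None`, so it is worth guarding against that before reading `value`.

```python
# A fragment of the Ac[+3] record shown above, copied by hand for illustration.
doc = {
    "formula": "Ac[+3]",
    "charge": 3,
    "size": {
        "radius_ionic": {"value": "1.26 Å", "reference": "pymatgen", "data_type": "experimental"},
        "radius_hydrated": None,
    },
}


def property_value(document, category, name):
    """Return the 'value' string of a property, or None if it is not populated.

    Hypothetical helper for illustration; not part of pyEQL.
    """
    entry = (document.get(category) or {}).get(name)
    return None if entry is None else entry["value"]


print(property_value(doc, "size", "radius_ionic"))     # populated property
print(property_value(doc, "size", "radius_hydrated"))  # stored as None in this record
```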

### Query a specific document

The `IonDB` is a [`maggma.Store`](https://materialsproject.github.io/maggma/getting_started/stores/) that can be queried using a MongoDB-like syntax. The basic syntax is

```
IonDB.query_one({field: value})
```

where `field` is a top-level key in the `Solute` `dict`, such as `formula`, `charge`, or `elements`. See [this page](https://riptutorial.com/mongodb/example/26813/pymongo-queries) and the `maggma` documentation (link WIP) for more detailed examples.

In [4]:
# a document with the formula "Na[+1]"
IonDB.query_one({"formula": "Na[+1]"})

{'_id': ObjectId('654e5f131ed012c187817f46'),
 'formula': 'Na[+1]',
 'charge': 1,
 'molecular_weight': '22.98976928 g/mol',
 'elements': ['Na'],
 'chemsys': 'Na',
 'pmg_ion': {'Na': 1,
  'charge': 1,
  '@module': 'pymatgen.core.ion',
  '@class': 'Ion',
  '@version': None},
 'formula_html': 'Na<sup>+1</sup>',
 'formula_latex': 'Na$^{+1}$',
 'formula_hill': 'Na',
 'formula_pretty': 'Na^+1',
 'oxi_state_guesses': {'Na': 1},
 'n_atoms': 1,
 'n_elements': 1,
 'size': {'radius_ionic': {'value': '1.16 Å',
   'reference': 'pymatgen',
   'data_type': 'experimental'},
  'radius_hydrated': {'value': '3.58 Å',
   'reference': 'Nightingale1959',
   'data_type': 'experimental'},
  'radius_vdw': {'value': '2.27 Å',
   'reference': 'pymatgen',
   'data_type': 'experimental'},
  'molar_volume': {'value': '-5.0 cm**3/mol',
   'reference': 'Calculation of the Partial Molal Volume of Organic Compounds and Polymers. Progress in Colloid & Polymer Science (94), 20-39.',
   'data_type': 'experimental'},
  'ra

### Only return a subset of the document

If you don't need to see the entire document, you can restrict the data returned by the query (in MongoDB, this is called "projection"). To use this feature, pass a second argument that is a `list` containing _only the fields that you want returned_. Note that there is a unique identifier (field name `_id`) that is always returned.

In [5]:
# a document with the formula "Na[+1]", where we only want the formula, charge, and molecular_weight
IonDB.query_one({"formula": "Na[+1]"}, ["formula", "charge", "molecular_weight"])

{'formula': 'Na[+1]',
 'charge': 1,
 'molecular_weight': '22.98976928 g/mol',
 '_id': ObjectId('654e5f131ed012c187817f46')}

In [6]:
# a document with the charge -1, where we only want the formula, charge, and molecular_weight
IonDB.query_one({"charge": -1}, ["formula", "charge", "molecular_weight"])

{'formula': 'Ag(CN)2[-1]',
 'charge': -1,
 'molecular_weight': '159.903 g/mol',
 '_id': ObjectId('654e5f131ed012c187817e6b')}
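Conceptually, projection copies the requested keys (plus `_id`) out of the matching record. The plain-Python sketch below mimics that behaviour on a hand-copied fragment of the `Na[+1]` record. `project` is a hypothetical helper for illustration only, not how `maggma` implements projection, and the `_id` is simplified to a plain string.

```python
# Fragment of the Na[+1] record from the tutorial output; "_id" is simplified to a str.
na_doc = {
    "_id": "654e5f131ed012c187817f46",
    "formula": "Na[+1]",
    "charge": 1,
    "molecular_weight": "22.98976928 g/mol",
    "elements": ["Na"],
    "chemsys": "Na",
}


def project(document, fields):
    """Keep only the requested top-level fields, plus the always-present '_id'."""
    kept = {key: document[key] for key in fields if key in document}
    kept["_id"] = document["_id"]
    return kept


subset = project(na_doc, ["formula", "charge", "molecular_weight"])
print(subset)
```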

**NOTE**: Be mindful of data types when querying. `charge` is an `int`. If we tried to query `charge` as if it were a `str`, we would get no results:

In [7]:
# querying the charge as a str ("-1") instead of an int (-1) returns no results
IonDB.query_one({"charge": "-1"}, ["formula", "charge", "molecular_weight"])
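The string query finds nothing because an equality criterion only matches when the value and its type agree, much as `-1 == "-1"` is `False` in Python. Here is a minimal sketch of that rule on two hand-copied records. The `find_one` helper is hypothetical and only imitates equality matching; it does not call the database.

```python
records = [
    {"formula": "Na[+1]", "charge": 1},
    {"formula": "Ag(CN)2[-1]", "charge": -1},
]


def find_one(docs, criteria):
    """Return the first record whose fields equal every value in criteria, else None."""
    for doc in docs:
        if all(doc.get(field) == value for field, value in criteria.items()):
            return doc
    return None


print(find_one(records, {"charge": -1}))    # int matches the stored int
print(find_one(records, {"charge": "-1"}))  # a str never equals an int, so no match

# If the charge arrives as text (e.g. from user input), convert it first.
user_input = "-1"
print(find_one(records, {"charge": int(user_input)}))
```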

### Query nested fields

If you want to query a field that is not a top-level key (such as `diffusion_coefficient` inside `transport`), place a `.` between the field names at each level. For example:

In [8]:
IonDB.query_one({"size.radius_vdw.value": "2.27 Å"}, ["formula", "size.radius_vdw.value"])

{'formula': 'Na2CO3(aq)',
 'size': {'radius_vdw': {'value': '2.27 Å'}},
 '_id': ObjectId('654e5f131ed012c187817f31')}
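A dotted field name is simply a path through the nested dictionaries. The sketch below resolves such a path in plain Python on a hand-copied fragment of the `Na[+1]` record. `get_nested` is a hypothetical helper, shown only to make the notation concrete.

```python
def get_nested(document, dotted_path):
    """Follow an 'a.b.c' path through nested dicts; return None if any level is missing."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Fragment of the Na[+1] record from the tutorial output.
na_doc = {
    "formula": "Na[+1]",
    "size": {
        "radius_vdw": {"value": "2.27 Å", "reference": "pymatgen", "data_type": "experimental"},
    },
}

print(get_nested(na_doc, "size.radius_vdw.value"))
print(get_nested(na_doc, "size.radius_hydrated.value"))  # path not present in this fragment
```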

**Note** that in the `Solute` documents, **most quantitative data are stored as `str` so that there is no ambiguity about their units**. In the example above, the value of the van der Waals radius is `"2.27 Å"` (a `str`, including a unit), NOT `2.27` (a `float`).

You can easily extract the value by turning the `str` into a `Quantity` (see [Converting Units](https://pyeql.readthedocs.io/en/latest/units.html)), or by using `python` string operations to split the value and the units, e.g.

In [9]:
# string operations
print(float("2.27 Å".split(" ")[0]))

2.27


In [10]:
# pint Quantity
from pyEQL import ureg

print(ureg.Quantity("2.27 Å").magnitude)

2.27
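Some values in the database also carry an uncertainty, such as `'1.18 ± 0.02 Å'` in the `Ac[+3]` record shown earlier. The following sketch separates the magnitude, uncertainty, and unit using string operations only. `parse_value` is a hypothetical helper, and it assumes the format `<number> [± <number>] <unit>` seen in the outputs in this tutorial; other formats may need different handling.

```python
def parse_value(text):
    """Split a string like '1.18 ± 0.02 Å' into (magnitude, uncertainty, unit).

    The uncertainty is None when the string has no '±' part.
    """
    if "±" in text:
        magnitude_part, rest = text.split("±")
        uncertainty_part, unit = rest.strip().split(" ", 1)
        return float(magnitude_part), float(uncertainty_part), unit
    magnitude_part, unit = text.split(" ", 1)
    return float(magnitude_part), None, unit


print(parse_value("2.27 Å"))
print(parse_value("1.18 ± 0.02 Å"))
print(parse_value("-3086.0 ± 10 kJ/mol"))
```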


### Query multiple documents

`query_one` only returns a single document (a single `dict`). You can instead use `query` with exactly the same syntax to return a [generator](https://realpython.com/introduction-to-python-generators/) of all documents that match your query.

In [11]:
# all documents with a charge of +2, returning only the formula and molecular_weight
IonDB.query({"charge": 2}, ["formula", "molecular_weight"])

<generator object MongoStore.query at 0x7f0e84427ed0>

A generator is not very useful for inspection until we turn it into a `list`. You can do this with `list()` or with a [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp):

In [12]:
# using list()
list(IonDB.query({"charge": 2}, ["formula", "molecular_weight"]))

[{'formula': 'Ag[+2]',
  'molecular_weight': '107.8682 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e6e')},
 {'formula': 'Au[+2]',
  'molecular_weight': '196.966569 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e76')},
 {'formula': 'Ba[+2]',
  'molecular_weight': '137.327 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e83')},
 {'formula': 'Be[+2]',
  'molecular_weight': '9.012182 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e85')},
 {'formula': 'Ca[+2]',
  'molecular_weight': '40.078 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e96')},
 {'formula': 'Cd[+2]',
  'molecular_weight': '112.411 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e9b')},
 {'formula': 'Co[+2]',
  'molecular_weight': '58.933195 g/mol',
  '_id': ObjectId('654e5f131ed012c187817ea9')},
 {'formula': 'Cr[+2]',
  'molecular_weight': '51.9961 g/mol',
  '_id': ObjectId('654e5f131ed012c187817eae')},
 {'formula': 'Cu[+2]',
  'molecular_weight': '63.546 g/mol',
  '_id': ObjectId('654e5f131ed012c187817ebe')},
 {'fo

In [13]:
# using a comprehension
[doc for doc in IonDB.query({"charge": 2}, ["formula", "molecular_weight"])]

[{'formula': 'Ag[+2]',
  'molecular_weight': '107.8682 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e6e')},
 {'formula': 'Au[+2]',
  'molecular_weight': '196.966569 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e76')},
 {'formula': 'Ba[+2]',
  'molecular_weight': '137.327 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e83')},
 {'formula': 'Be[+2]',
  'molecular_weight': '9.012182 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e85')},
 {'formula': 'Ca[+2]',
  'molecular_weight': '40.078 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e96')},
 {'formula': 'Cd[+2]',
  'molecular_weight': '112.411 g/mol',
  '_id': ObjectId('654e5f131ed012c187817e9b')},
 {'formula': 'Co[+2]',
  'molecular_weight': '58.933195 g/mol',
  '_id': ObjectId('654e5f131ed012c187817ea9')},
 {'formula': 'Cr[+2]',
  'molecular_weight': '51.9961 g/mol',
  '_id': ObjectId('654e5f131ed012c187817eae')},
 {'formula': 'Cu[+2]',
  'molecular_weight': '63.546 g/mol',
  '_id': ObjectId('654e5f131ed012c187817ebe')},
 {'fo
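One practical detail: a generator can be consumed only once, so store the resulting `list` if you need the results more than once. The sketch below uses a stand-in generator function (`fake_query`, hypothetical) instead of the database, and also shows `itertools.islice` as a way to peek at the first few results without materializing all of them.

```python
from itertools import islice


def fake_query():
    """Stand-in for a query result: yields a few records copied from the output above."""
    yield {"formula": "Ag[+2]", "molecular_weight": "107.8682 g/mol"}
    yield {"formula": "Au[+2]", "molecular_weight": "196.966569 g/mol"}
    yield {"formula": "Ba[+2]", "molecular_weight": "137.327 g/mol"}


results = fake_query()
first_pass = [doc["formula"] for doc in results]
second_pass = [doc["formula"] for doc in results]  # generator is already exhausted
print(first_pass)
print(second_pass)

# Take only the first two records from a fresh generator.
first_two = list(islice(fake_query(), 2))
print([doc["formula"] for doc in first_two])
```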

## Counting Documents

You can use `count()` to see how many documents the database contains.

In [14]:
IonDB.count()

346

Count works with queries, too.

In [15]:
# number of documents with a charge of -3
IonDB.count({"charge": -3})

7

## More Advanced Query Syntax

### Match multiple items with `$in`

If you want to query documents that match _any one of a set of values_, use `$in` with a `list` of possible values. Note that the `$in` operator and your `list` constitute their own dictionary, e.g. `{"$in":<list>}`. This entire dictionary is the "value" of your query for the associated field. For example:

In [16]:
# all alkali cations
IonDB.count({"formula": {"$in": ["Li[+1]", "Na[+1]", "K[+1]", "Rb[+1]", "Cs[+1]"]}})

5

### Greater than or less than - `$gt` / `$gte` / `$lt` / `$lte`

In a similar manner, you can query for values greater than (`$gt`), greater than or equal to (`$gte`), less than (`$lt`), or less than or equal to (`$lte`) some value.

In [17]:
# all solutes with a charge less than 0
IonDB.count({"charge": {"$lt": 0}})

76

In [18]:
# all solutes with a charge greater than or equal to 1
IonDB.count({"charge": {"$gte": 1}})

108
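To make the operator syntax concrete, here is a plain-Python sketch of how such criteria can be evaluated against a record. It covers only the operators named in this tutorial, the `matches` helper is hypothetical, and it is not how MongoDB or `maggma` implement matching. It also shows that several operators can share one dictionary to express a range, which is standard MongoDB query syntax.

```python
import operator

# Map each query operator used in this tutorial to a Python comparison.
OPERATORS = {
    "$in": lambda value, options: value in options,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(document, criteria):
    """Return True if the document satisfies every field condition in criteria."""
    for field, condition in criteria.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gte": 1, "$lte": 2}: all operators must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"charge": 2}: simple equality.
            return False
    return True


# Formulas and charges copied from outputs shown in this tutorial.
records = [
    {"formula": "Na[+1]", "charge": 1},
    {"formula": "Ca[+2]", "charge": 2},
    {"formula": "Ac[+3]", "charge": 3},
    {"formula": "Ag(CN)2[-1]", "charge": -1},
]

anions = [d["formula"] for d in records if matches(d, {"charge": {"$lt": 0}})]
mono_or_divalent = [d["formula"] for d in records if matches(d, {"charge": {"$gte": 1, "$lte": 2}})]
selected = [d["formula"] for d in records if matches(d, {"formula": {"$in": ["Na[+1]", "Ac[+3]"]}})]
print(anions, mono_or_divalent, selected)
```

With the real database, the same range criterion would be passed unchanged, for example as `{"charge": {"$gte": 1, "$lte": 2}}`.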

## Unique Values

It's often useful to know which unique values a field takes across the database. To find out, use `distinct()` with any field name.

In [19]:
# list of all unique `formula`
IonDB.distinct("formula")

['U(ClO5)2(aq)',
 'LiClO4(aq)',
 'Sb(OH)6[-1]',
 'Ba[+2]',
 'RbNO3(aq)',
 'KBrO3(aq)',
 'H3O[+1]',
 'CsNO2(aq)',
 'Re[+1]',
 'KHC2O.1H2O(aq)',
 'Ni[+3]',
 'H8S(NO2)2(aq)',
 'Sm[+2]',
 'B(OH)4[-1]',
 'CoI2(aq)',
 'ZnBr2(aq)',
 'Sn[+2]',
 'USO6(aq)',
 'Ir[+3]',
 'Ag(CN)2[-1]',
 'KNO3(aq)',
 'Ga[+3]',
 'Zn(NO3)2(aq)',
 'NaHC3.2H2O(aq)',
 'Ni(NO3)2(aq)',
 'S[-2]',
 'HS[-1]',
 'Eu[+2]',
 'ZnSO4(aq)',
 'BeSO4(aq)',
 'MnO4[-1]',
 'K2CO3(aq)',
 'Pa[+3]',
 'SrI2(aq)',
 'FeCl2(aq)',
 'Eu(NO3)3(aq)',
 'NaClO4(aq)',
 'Zn[+2]',
 'SeO4[-1]',
 'NaCrO4(aq)',
 'CsOH(aq)',
 'Na3PO4(aq)',
 'KCSN(aq)',
 'HSO4[-1]',
 'Mn[+3]',
 'H4NClO4(aq)',
 'NiSO4(aq)',
 'IO4[-1]',
 'Sr(ClO4)2(aq)',
 'SeO4[-2]',
 'Ag[+1]',
 'LiI(aq)',
 'SiF6[-2]',
 'HF2[-1]',
 'CoBr2(aq)',
 'Pr[+3]',
 'BaBr2(aq)',
 'ClO2[-1]',
 'MgBr2(aq)',
 'Ho[+3]',
 'Be[+2]',
 'H2O(aq)',
 'Po[+2]',
 'P2O7[-4]',
 'RbCl(aq)',
 'K[+1]',
 'ClO4[-1]',
 'Mg(ClO4)2(aq)',
 'NdCl3(aq)',
 'Au[+1]',
 'Rb2SO4(aq)',
 'Na2PHO4(aq)',
 'Th[+4]',
 'Fe[+3]',
 'Ra[+2]'
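In plain-Python terms, `distinct()` behaves roughly like collecting a field's values into a `set`. The sketch below illustrates this on a few hand-copied records. It does not call the database, and `distinct_values` is a hypothetical helper. Sorting the result gives a stable order, which can make long lists like the one above easier to scan.

```python
def distinct_values(docs, field):
    """Return the sorted unique values of a top-level field across records."""
    return sorted({doc[field] for doc in docs if field in doc})


# A few formulas from the list above, with charges read off the formula strings.
records = [
    {"formula": "Na[+1]", "charge": 1},
    {"formula": "K[+1]", "charge": 1},
    {"formula": "Ba[+2]", "charge": 2},
    {"formula": "ClO4[-1]", "charge": -1},
]

print(distinct_values(records, "charge"))
print(len(distinct_values(records, "formula")))
```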