Retrieving SciGlass data with GlassPy
=====================================



In [None]:
!pip install glasspy

## Introduction



GlassPy can load experimental data. Currently, GlassPy has the SciGlass database as an available data source. All data loading procedures are managed in the `glasspy.data` submodule.



## Basic usage



Below is a minimal example of loading SciGlass data into a pandas DataFrame. This is done by creating an instance of the `SciGlass` class and then accessing its `data` property. Without any arguments, the `SciGlass` class will load all available data and metadata. This process takes a while because it needs to load and parse the original SciGlass database.



In [None]:
from glasspy.data import SciGlass, sciglass_dbinfo

source = SciGlass()
df = source.data

When it is done, we can check what we have.



In [None]:
df

To avoid naming conflicts and to make it easier to navigate through the DataFrame, the data is structured in two levels. In the first level, we have information grouped by composition, property, or metadata.



In [None]:
print(df.columns.levels[0])

If you want to explore the elemental composition of the glasses, you can do so by filtering only the data in the `elements` group.



In [None]:
els = df["elements"]
els

Suppose you want to explore the glass transition temperature data. You can do this by first accessing the `property` group and then the `Tg` column.



In [None]:
Tg = df["property"]["Tg"]
Tg

As expected, not all entries have a glass transition temperature value.
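
To make the two-level layout concrete, here is a small hand-built DataFrame with the same kind of column structure. It is only a sketch: the numbers are invented, the `Density` column name is a placeholder chosen for illustration, and nothing here is read from SciGlass. Only standard pandas calls are used.

```python
import numpy as np
import pandas as pd

# Toy table with two column levels, mimicking the layout described above.
# All numbers are invented for illustration; they are not SciGlass data.
columns = pd.MultiIndex.from_tuples(
    [
        ("elements", "Si"),
        ("elements", "O"),
        ("property", "Tg"),
        ("property", "Density"),  # placeholder column name
    ]
)
toy = pd.DataFrame(
    [
        [0.33, 0.67, 1450.0, 2.20],
        [0.30, 0.70, np.nan, 2.35],
        [0.25, 0.75, 820.0, np.nan],
    ],
    columns=columns,
)

# Selecting a first-level group returns a DataFrame with single-level columns.
toy_elements = toy["elements"]

# Selecting a group and then a column returns a Series.
# dropna() keeps only the rows where a value was reported.
toy_tg = toy["property"]["Tg"].dropna()

print(list(toy_elements.columns))
print(len(toy_tg))
```

The same pattern (`df["property"]["Tg"].dropna()`) should apply to the real DataFrame if you want to keep only entries with a measured value.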

To list all available properties in GlassPy, call the `sciglass_dbinfo` function.



In [None]:
sciglass_dbinfo()

See the `pandas` [documentation](https://pandas.pydata.org/docs/) for more information on what you can do with DataFrames.



## Controlling initial data collection



It takes a while to load all SciGlass data. It may be better to load only what you really need. You can control what to load with configuration dictionaries.

For example, suppose you don't want glasses with silver or gold in their composition. You can remove glasses containing these elements using the `dropline` configuration.



In [None]:
config_el = {
    "dropline": ["Ag", "Au"],
}

source = SciGlass(
    elements_cfg=config_el,
)

df = source.data
df
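
The idea behind this kind of row filter can be sketched in plain Python. This is not GlassPy's implementation; it assumes, based on the description above, that an entry is dropped when any listed element has a nonzero amount. The helper `drop_rows_with` and the toy records are hypothetical.

```python
# Toy records: element symbol -> atomic fraction (invented values).
glasses = [
    {"Si": 0.33, "O": 0.67},
    {"Ag": 0.10, "P": 0.25, "O": 0.65},
    {"Au": 0.02, "Si": 0.32, "O": 0.66},
    {"Na": 0.20, "Si": 0.20, "O": 0.60},
]


def drop_rows_with(records, unwanted):
    """Keep only records where every unwanted element is absent or zero."""
    return [
        rec for rec in records
        if all(rec.get(el, 0) == 0 for el in unwanted)
    ]


kept = drop_rows_with(glasses, ["Ag", "Au"])
print(len(kept))
```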

Suppose you are only interested in glasses that have both a glass transition temperature and a refractive index value. You can use the `must_have_and` configuration to keep only the entries that report both properties.



In [None]:
config_prop = {
    "must_have_and": ["Tg", "RefractiveIndex"],
}

source = SciGlass(
    properties_cfg=config_prop,
)

df = source.data
df

Of course, you can mix two or more filters in the same query. Let's combine the two filters we used earlier.



In [None]:
config_el = {
    "dropline": ["Ag", "Au"],
}

config_prop = {
    "must_have_and": ["Tg", "RefractiveIndex"],
}

source = SciGlass(
    elements_cfg=config_el,
    properties_cfg=config_prop,
)

df = source.data
df

See the [documentation](https://glasspy.readthedocs.io/en/latest/modules/glasspy.data.html#glasspy.data.load.SciGlass) for the `SciGlass` class for more information on how to control your initial data collection.



## Some query examples



Below are some examples that show strategies for controlling the `SciGlass` query. You can combine more than one option when querying the data.



### Return the composition in wt%



In [None]:
config_el = {}  # do this if you don't want the elemental columns

config_prop = {
    "must_have_and": ["Tg", "RefractiveIndex"],
}

config_comp = {
    "return_weight": True,
}

source = SciGlass(
    elements_cfg=config_el,
    properties_cfg=config_prop,
    compounds_cfg=config_comp,
)

df = source.data
df
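
For intuition, the conversion from mole fractions to weight fractions can be sketched by hand. This is an illustration only, not how GlassPy computes it: the helper name, the toy glass, and the approximate molar masses are all assumptions typed in for this example.

```python
# Approximate molar masses in g/mol, typed in by hand for this sketch.
MOLAR_MASS = {"SiO2": 60.08, "Na2O": 61.98, "CaO": 56.08}


def mole_to_weight_fraction(mole_fractions):
    """Convert a {compound: mole fraction} mapping to weight fractions."""
    # Mass contributed by each compound per mole of glass.
    masses = {c: x * MOLAR_MASS[c] for c, x in mole_fractions.items()}
    total = sum(masses.values())
    # Normalize so the weight fractions sum to 1.
    return {c: m / total for c, m in masses.items()}


glass = {"SiO2": 0.75, "Na2O": 0.15, "CaO": 0.10}  # invented composition
weights = mole_to_weight_fraction(glass)
print({c: round(w, 4) for c, w in weights.items()})
```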

### Remove compounds with specific chemical elements



In [None]:
config_el = {}

config_comp = {
    "drop_compound_with_element": ["Ca", "Li", "K"],
}

source = SciGlass(
    elements_cfg=config_el,
    compounds_cfg=config_comp,
)

df = source.data
df

### Make the composition of a glass sum to 100% instead of 1



In [None]:
config_el = {}

config_comp = {
    "final_sum": 100,
}

source = SciGlass(
    elements_cfg=config_el,
    compounds_cfg=config_comp,
)

df = source.data
df
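
Rescaling a composition to a different total is a simple normalization. The sketch below shows the arithmetic on a toy composition; the `rescale` helper is hypothetical and only mirrors the `final_sum` name used above.

```python
def rescale(composition, final_sum=100):
    """Scale the amounts so that they add up to final_sum."""
    total = sum(composition.values())
    return {name: amount * final_sum / total for name, amount in composition.items()}


glass = {"SiO2": 0.75, "Na2O": 0.15, "CaO": 0.10}  # invented composition
percent = rescale(glass, final_sum=100)
print(percent)
```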

### Compounds that must be present (OR logic)



In [None]:
config_el = {}

config_comp = {
    "must_have_or": ["SiO2", "Na2O", "Al2O3"],
}

source = SciGlass(
    elements_cfg=config_el,
    compounds_cfg=config_comp,
)

df = source.data
df

### Compounds that must be present (AND logic)



In [None]:
config_el = {}

config_comp = {
    "must_have_and": ["SiO2", "Na2O", "Al2O3"],
}

source = SciGlass(
    elements_cfg=config_el,
    compounds_cfg=config_comp,
)

df = source.data
df
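
The difference between the OR and AND variants is easiest to see on a few toy compositions. The following is a plain Python sketch of the two kinds of logic, not GlassPy's code; the records and the `present` helper are made up for illustration.

```python
# Toy compositions: compound -> mole fraction (invented values).
glasses = [
    {"SiO2": 0.75, "Na2O": 0.25},
    {"SiO2": 0.60, "Na2O": 0.20, "Al2O3": 0.20},
    {"B2O3": 0.70, "Li2O": 0.30},
]
required = ["SiO2", "Na2O", "Al2O3"]


def present(glass, compound):
    """A compound counts as present when its amount is greater than zero."""
    return glass.get(compound, 0) > 0


# OR logic: at least one of the required compounds is present.
has_any = [g for g in glasses if any(present(g, c) for c in required)]

# AND logic: every required compound is present.
has_all = [g for g in glasses if all(present(g, c) for c in required)]

print(len(has_any), len(has_all))
```

The AND filter is the stricter of the two, so it can never return more entries than the OR filter for the same list.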

## Converting compounds to elements



If you have queried the SciGlass database for compounds only, you can convert this information to atomic fractions using the `elements_from_compounds` method.



In [None]:
source.elements_from_compounds(
    final_sum=1,
    compounds_in_weight=False,
)

df = source.data
df
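
The arithmetic behind such a conversion can be sketched in plain Python. This is only an illustration under simplifying assumptions: amounts are mole fractions, formulas are simple (no parentheses or fractional subscripts), and the helper names `parse_formula` and `elements_from_mole_fractions` are hypothetical, not part of GlassPy.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'Al2O3'.

    Parentheses and fractional subscripts are not handled.
    """
    counts = defaultdict(int)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


def elements_from_mole_fractions(compounds, final_sum=1):
    """Convert {compound: mole fraction} to atomic fractions."""
    atoms = defaultdict(float)
    for formula, amount in compounds.items():
        for symbol, count in parse_formula(formula).items():
            atoms[symbol] += amount * count
    total = sum(atoms.values())
    # Normalize so the atomic amounts add up to final_sum.
    return {symbol: n * final_sum / total for symbol, n in atoms.items()}


glass = {"SiO2": 0.5, "Na2O": 0.5}  # invented composition
atomic = elements_from_mole_fractions(glass)
print(atomic)
```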

## Remove duplicate entries



Entries with the same chemical composition can cause data leakage in machine learning pipelines. An easy way to fix this is to use the `remove_duplicate_composition` method.



In [None]:
source.remove_duplicate_composition(
    scope="elements",
    decimals=3,
    aggregator="median",
)

df = source.data
df
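
To see what this kind of deduplication does conceptually, here is a plain Python sketch. It assumes, based on the parameter names above, that compositions are rounded to a fixed number of decimals, grouped when the rounded values match, and that the property values in each group are combined with the median. The records and the `merge_duplicates` helper are invented for illustration and are not GlassPy's implementation.

```python
from collections import defaultdict
from statistics import median

# Toy records: (elemental composition, property value); all values invented.
records = [
    ({"Si": 0.3333, "O": 0.6667}, 1450.0),
    ({"Si": 0.3334, "O": 0.6666}, 1470.0),
    ({"Si": 0.3331, "O": 0.6669}, 1480.0),
    ({"Na": 0.2000, "Si": 0.2000, "O": 0.6000}, 750.0),
]


def merge_duplicates(records, decimals=3):
    """Group records by rounded composition and take the median value."""
    groups = defaultdict(list)
    for composition, value in records:
        # A sorted tuple of (element, rounded amount) is a hashable group key.
        key = tuple(
            sorted((el, round(x, decimals)) for el, x in composition.items())
        )
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


merged = merge_duplicates(records, decimals=3)
print(len(merged))
```

A smaller `decimals` value merges more aggressively, since more compositions become identical after rounding.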