# Isotope Masses
You are going to analyse the masses of the known isotopes.

#### Reading data
The data is read from a *parquet* file. This file format stores the data types (e.g. int or float) alongside the actual values.

In [None]:
import polars as pl

isotopes = pl.read_parquet('data/isotopes.parquet')

#### Browsing the data
Have a first look at the dataframe and find out about the different columns, the number of isotopes, etc.

In [None]:
display(isotopes.sample(10)) # display 10 random rows

In [None]:
rows, cols = isotopes.shape
print(f'{rows} rows and {cols} columns')

#### Some search tasks
Answer the following questions:
- Which isotope has the greatest atomic mass?
- How many oxygen isotopes are known?
- Which element has the greatest number of isotopes?
- Which lead (Pb) isotopes have a non-negligible abundance (column *Isotopic Composition*) and do their abundances add up to 100 %?

In [None]:
# find the isotope with the maximum atomic mass (Og stands for Oganesson)
isotopes.filter(pl.col('Relative Atomic Mass') == pl.max('Relative Atomic Mass'))

In [None]:
# filter oxygen (O) isotopes; there are 17 rows
isotopes.filter(pl.col('Atomic Symbol') == 'O')

In [None]:
# group by atomic number, keep the first symbol and count the isotopes (len), sort by number of isotopes (descending)
(isotopes
    .group_by('Atomic Number')
    .agg(
        pl.col('Atomic Symbol').first(),
        pl.len().alias('Number of Isotopes'),
    )
    .sort('Number of Isotopes', descending=True)
)

In [None]:
# filter for Pb isotopes with isotopic composition not null
pb_not_null = isotopes.filter((pl.col('Atomic Symbol') == 'Pb') & pl.col('Isotopic Composition').is_not_null())
display(pb_not_null)

display(pb_not_null.select(pl.sum('Isotopic Composition').alias('Sum of Abundances')))