# Read a SLOP file and create HKF and SUPCRT parameter databases
The SLOP file contains calibrated parameters and parameter estimates for minerals, gases, and aqueous species that are consistent with the revised HKF formulation and with the Helgeson, Delany, Nesbitt, and Bird mineral/gas thermodynamic database. References:  

Shock EL, Oelkers EH, Johnson JW, Sverjensky DA, Helgeson HC (1992) Calculation of the thermodynamic properties of aqueous species at high pressures and temperatures. Journal of the Chemical Society Faraday Transactions, 88(6), 803-826  

Helgeson HC, Delany JM, Nesbitt HW, Bird DK (1978) Summary and critique of the thermodynamic properties of rock-forming minerals. American Journal of Science, 278-A, 229pp  

Note that SLOP contains parameters for an expanded set of aqueous species, minerals, and gases.  

This notebook creates a pickled dictionary containing Pandas data frames that each hold parameter values. The pickled dictionary file has the same name as the SLOP file, with the extension replaced by *.dict*. The contents of the dictionary are described below.  

Enter the name of the SLOP file (assumed to be in the same directory as this notebook) below:

In [None]:
fname = 'slop16_v3_1.dat'
dname = fname.split(".")[0] + ".dict"
dname

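The expression above keeps only the text before the first dot, which is fine for `slop16_v3_1.dat` but would truncate a file name that contains additional dots. As a minimal sketch (the helper name `dict_name` and the file name `slop.v2.dat` are hypothetical, used only for illustration), `pathlib` can replace just the final extension:

```python
from pathlib import Path

def dict_name(slop_name):
    """Return slop_name with only its final extension replaced by .dict."""
    return str(Path(slop_name).with_suffix(".dict"))

print(dict_name("slop16_v3_1.dat"))  # same result as the split-based expression
print(dict_name("slop.v2.dat"))      # hypothetical name with an extra dot
```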
Read the SLOP file ...

In [None]:
# Use a context manager so the file handle is closed after reading
with open(fname) as f:
    lines = [line.rstrip('\n') for line in f]

Parse the file into documentation and phase type lists ...

In [None]:
docs = []
min_no_trans = []
min_one_trans = []
min_two_trans = []
min_three_trans = []
gasses = []
aqueous = []

no_transition    = False
one_transition   = False
two_transition   = False
three_transition = False
gas_entries      = False
aqueous_entries  = False
extra_entries    = False

for line in lines:
    if line.startswith('*'):
        docs.append(line)
    elif line.startswith('          minerals that do not undergo phase transitions'):
        no_transition = True
        print('          minerals that do not undergo phase transitions')
    elif line.startswith('          minerals that undergo one phase transition'):
        no_transition  = False
        one_transition = True
        print('          minerals that undergo one phase transition')
    elif line.startswith('          minerals that undergo two phase transitions'):
        no_transition  = False
        one_transition = False
        two_transition = True
        print('          minerals that undergo two phase transitions')
    elif line.startswith('          minerals that undergo three phase transitions'):
        no_transition    = False
        one_transition   = False
        two_transition   = False
        three_transition = True
        print('          minerals that undergo three phase transitions')
    elif line.startswith('          gases'):
        no_transition    = False
        one_transition   = False
        two_transition   = False
        three_transition = False
        gas_entries      = True
        print('          gases')
    elif line.startswith('          aqueous species'):
        no_transition    = False
        one_transition   = False
        two_transition   = False
        three_transition = False
        gas_entries      = False
        aqueous_entries  = True
        print('          aqueous species')
    elif len(line) == 1:
        no_transition    = False
        one_transition   = False
        two_transition   = False
        three_transition = False
        gas_entries      = False
        aqueous_entries  = False
        extra_entries    = True
        print('Found an empty line')
    elif no_transition:
        min_no_trans.append(line)
    elif one_transition:
        min_one_trans.append(line)
    elif two_transition:
        min_two_trans.append(line)
    elif three_transition:
        min_three_trans.append(line)
    elif gas_entries:
        gasses.append(line)
    elif aqueous_entries:
        aqueous.append(line)
    elif extra_entries:
        docs.append(line)

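The cell above tracks the current section with seven boolean flags. As a sketch of the same idea, assuming the sections appear in the file in the order listed, a single "current section" variable can stand in for the flags. The function name `split_sections` and the dictionary keys are hypothetical; the header prefixes are the ones matched above.

```python
# Header prefixes matched in the cell above, mapped to hypothetical list names.
SECTION_HEADERS = [
    ("          minerals that do not undergo phase transitions", "min_no_trans"),
    ("          minerals that undergo one phase transition", "min_one_trans"),
    ("          minerals that undergo two phase transitions", "min_two_trans"),
    ("          minerals that undergo three phase transitions", "min_three_trans"),
    ("          gases", "gasses"),
    ("          aqueous species", "aqueous"),
]

def split_sections(lines):
    """Sort lines into per-section lists using one state variable."""
    sections = {key: [] for _, key in SECTION_HEADERS}
    sections["docs"] = []
    current = None  # lines seen before any section header are skipped
    for line in lines:
        if line.startswith("*"):
            sections["docs"].append(line)  # comment lines are documentation
            continue
        header = next((key for prefix, key in SECTION_HEADERS
                       if line.startswith(prefix)), None)
        if header is not None:
            current = header               # a section header switches state
        elif len(line) == 1:
            current = "docs"               # same length-1 test as the cell above
        elif current is not None:
            sections[current].append(line)
    return sections
```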
### Generic species block for minerals and gases that do not undergo phase transitions
LINE 1:name                structural chemical formula                    
LINE 2:abbreviation        elemental chemical formula       
LINE 3:reference           date last revisited  
LINE 4:deltaG(cal/mol)     deltaH(cal/mol)     entropy(cal/mol/K)  
LINE 4:volume(cubic cm/mol)  
LINE 5:a(10^0)(cal/mol/K)     b(10^3)(cal/mol/K^2)    c(10^-5)(cal K/mol)  
LINE 6:Tmax (for a,b and c)(K)  

In [None]:
import pandas as pd
import numpy as np

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)', 
           'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)', 'Tmax (K)']
for i in range(0,len(min_no_trans),6):
    s = []
    for j in range(0,3):
        a = min_no_trans[i+j][ 0:20].strip().replace(" ", "_")
        b = min_no_trans[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,6):
        s.append(min_no_trans[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    if len(s) != 14:
        print(s)
    ss.append(s)
#ss
min_no_trans_df = pd.DataFrame(ss, columns=headers)

### *min_no_trans_df* is the Pandas dataframe that holds parameters for these phases

In [None]:
print("Number of entries ",len(min_no_trans_df.index))

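To make the parsing step concrete, the sketch below applies the same fixed-width split (first 20 characters, then the remainder) to one six-line block. The helper `parse_block` is hypothetical, and the sample block is a made-up placeholder, not an entry from any SLOP file. Unlike the join-and-split approach above, this version keeps an empty label as an empty string instead of dropping it.

```python
def parse_block(block):
    """Split one species block into a flat list of string fields."""
    fields = []
    for text in block[:3]:
        # Lines 1-3: a 20-character label column followed by free text
        fields.append(text[0:20].strip().replace(" ", "_"))
        fields.append(text[20:].strip().replace(" ", "_"))
    for text in block[3:]:
        # Remaining lines: whitespace-separated numeric values
        fields.extend(text.split())
    fields[0] = fields[0].lower().capitalize()
    return fields

# Placeholder block with invented names and values, for illustration only
sample = [
    "EXAMPLE PHASE".ljust(20) + "Ex(AB)O3",
    "Exm".ljust(20) + "Ex(1)A(1)B(1)O(3)",
    "ref:0".ljust(20) + "1.Jan.00",
    "  -100000.0  -110000.0     10.000     20.000",
    "     10.0000     1.0000    -2.0000",
    "   1000.00",
]
fields = parse_block(sample)
print(len(fields), fields[0])
```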
### Generic species block for minerals that undergo phase transitions
LINE 1:name                structural chemical formula                    
LINE 2:abbreviation        elemental chemical formula       
LINE 3:reference           date last revisited  
LINE 4:deltaG(cal/mol)     deltaH(cal/mol)     entropy(cal/mol/K)     
LINE 4:volume(cubic cm/mol)  
LINE 5:ai(10^0)(cal/mol/K)    bi(10^3)(cal/mol/K^2)     ci(10^-5)(cal K/mol)  
LINE 5:Tti(K)  deltaHti(cal/mol)   deltaVti(cubic cm/mol)    (dP/dT)ti(bar/K)  
LINE 6:an(10^0)(cal/mol/K)    bn(10^3)(cal/mol/K^2)     cn(10^-5)(cal K/mol)  
LINE 7:Tmaxn (for an,bn and cn)  
### One phase transition ...

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)', 
           'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)', 
           'Tt1 (K)', 'DeltaHt1 (cal/m)', 'deltaVt1 (cc/m)', 'dPdTt1 (bar/K)',
           'at1 (cal/K-m)', 'bt1 (10^3 cal/K^2-m)', 'ct1 (10^-5 cal-K/m)',
           'Tmax (K)']
for i in range(0,len(min_one_trans),7):
    s = []
    for j in range(0,3):
        a = min_one_trans[i+j][ 0:20].strip().replace(" ", "_")
        b = min_one_trans[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,7):
        s.append(min_one_trans[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    if len(s) != 21:
        print(s)
    for j in range(6,21):
        if float(s[j]) == 999999.0:
            s[j] = np.nan
    ss.append(s)
#ss
min_one_trans_df = pd.DataFrame(ss, columns=headers)

### *min_one_trans_df* is the Pandas dataframe that holds parameters for these phases

In [None]:
print("Number of entries ", len(min_one_trans_df.index))

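Note that the loop above replaces the 999999 sentinel with NaN but leaves every other value as a string. A possible follow-up, sketched here under the assumption that the first six columns are labels and the rest are numeric, converts the remaining columns to floats. The helper name `to_numeric_with_nan` is hypothetical, and `errors="coerce"` silently turns any unparseable entry into NaN, which may or may not be what is wanted.

```python
import numpy as np
import pandas as pd

def to_numeric_with_nan(df, first_numeric=6, sentinel=999999.0):
    """Return a copy with numeric columns as floats and the sentinel as NaN."""
    out = df.copy()
    for col in out.columns[first_numeric:]:
        values = pd.to_numeric(out[col], errors="coerce")  # strings -> floats
        out[col] = values.replace(sentinel, np.nan)        # sentinel -> NaN
    return out
```

For example, `to_numeric_with_nan(min_one_trans_df)` would return a frame whose parameter columns have a float dtype, leaving the original frame unchanged.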
### Two phase transitions ...

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)', 
           'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)', 
           'Tt1 (K)', 'DeltaHt1 (cal/m)', 'deltaVt1 (cc/m)', 'dPdTt1 (bar/K)',
           'at1 (cal/K-m)', 'bt1 (10^3 cal/K^2-m)', 'ct1 (10^-5 cal-K/m)',
           'Tt2 (K)', 'DeltaHt2 (cal/m)', 'deltaVt2 (cc/m)', 'dPdTt2 (bar/K)',
           'at2 (cal/K-m)', 'bt2 (10^3 cal/K^2-m)', 'ct2 (10^-5 cal-K/m)',
           'Tmax (K)']
for i in range(0,len(min_two_trans),8):
    s = []
    for j in range(0,3):
        a = min_two_trans[i+j][ 0:20].strip().replace(" ", "_")
        b = min_two_trans[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,8):
        s.append(min_two_trans[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    if len(s) != 28:
        print(s)
    for j in range(6,28):
        if float(s[j]) == 999999.0:
            s[j] = np.nan
    ss.append(s)
#ss
min_two_trans_df = pd.DataFrame(ss, columns=headers)

### *min_two_trans_df* is the Pandas dataframe that holds parameters for these phases

In [None]:
print("Number of entries ", len(min_two_trans_df.index))

### Three phase transitions ...

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)', 
           'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)', 
           'Tt1 (K)', 'DeltaHt1 (cal/m)', 'deltaVt1 (cc/m)', 'dPdTt1 (bar/K)',
           'at1 (cal/K-m)', 'bt1 (10^3 cal/K^2-m)', 'ct1 (10^-5 cal-K/m)',
           'Tt2 (K)', 'DeltaHt2 (cal/m)', 'deltaVt2 (cc/m)', 'dPdTt2 (bar/K)',
           'at2 (cal/K-m)', 'bt2 (10^3 cal/K^2-m)', 'ct2 (10^-5 cal-K/m)',
           'Tt3 (K)', 'DeltaHt3 (cal/m)', 'deltaVt3 (cc/m)', 'dPdTt3 (bar/K)',
           'at3 (cal/K-m)', 'bt3 (10^3 cal/K^2-m)', 'ct3 (10^-5 cal-K/m)',
           'Tmax (K)']
for i in range(0,len(min_three_trans),9):
    s = []
    for j in range(0,3):
        a = min_three_trans[i+j][ 0:20].strip().replace(" ", "_")
        b = min_three_trans[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,9):
        s.append(min_three_trans[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    if len(s) != 35:
        print(s)
    for j in range(6,35):
        if float(s[j]) == 999999.0:
            s[j] = np.nan
    ss.append(s)
#ss
min_three_trans_df = pd.DataFrame(ss, columns=headers)

### *min_three_trans_df* is the Pandas dataframe that holds parameters for these phases

In [None]:
print("Number of entries ",len(min_three_trans_df.index))

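The four mineral cells follow one pattern: each phase transition adds one line to the block and seven fields to the record, so a block with n transitions spans 6 + n lines and yields 14 + 7n fields (6/14, 7/21, 8/28, 9/35). The sketch below generates the header lists from that pattern; the function names `block_shape` and `transition_headers` are hypothetical, and the column labels are copied from the cells above.

```python
BASE_HEADERS = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered',
                'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)',
                'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)']

def block_shape(n_transitions):
    """Return (lines per block, fields per record) for n phase transitions."""
    return 6 + n_transitions, 14 + 7 * n_transitions

def transition_headers(n_transitions):
    """Build the column labels used above for n phase transitions."""
    headers = list(BASE_HEADERS)
    for k in range(1, n_transitions + 1):
        headers += [f'Tt{k} (K)', f'DeltaHt{k} (cal/m)', f'deltaVt{k} (cc/m)',
                    f'dPdTt{k} (bar/K)', f'at{k} (cal/K-m)',
                    f'bt{k} (10^3 cal/K^2-m)', f'ct{k} (10^-5 cal-K/m)']
    headers.append('Tmax (K)')
    return headers

for n in range(4):
    n_lines, n_fields = block_shape(n)
    # The header count should always equal the expected field count
    print(n, n_lines, n_fields, len(transition_headers(n)) == n_fields)
```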
### Gases ...

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 'V (cc/m)', 
           'a (cal/K-m)', 'b (10^3 cal/K^2-m)', 'c (10^-5 cal-K/m)', 'Tmax (K)']
for i in range(0,len(gasses),6):
    s = []
    for j in range(0,3):
        a = gasses[i+j][ 0:20].strip().replace(" ", "_")
        b = gasses[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,6):
        s.append(gasses[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    s[1] = s[1].lower().capitalize()
    if len(s) != 14:
        print(s)
    ss.append(s)
#ss
gasses_df = pd.DataFrame(ss, columns=headers)

### *gasses_df* is the Pandas dataframe that holds parameters for these phases

In [None]:
print("Number of entries ",len(gasses_df.index))

### Generic species block for aqueous species
LINE 1:name                structural chemical formula                    
LINE 2:abbreviation        elemental chemical formula       
LINE 3:reference           date last revisited  
LINE 4:deltaG(cal/mol)     deltaH(cal/mol)     entropy(cal/mol/K)    
LINE 5:a1(10^1)(cal/mol/bar)  a2(10^-2)(cal/mol)  a3(10^0)(cal K/mol/bar)  
LINE 5:a4(10^-4)(cal K/mol)  
LINE 6:c1(10^0)(cal/mol/K)    c2(10^-4)(cal K/mol)    omega(10^-4)(cal/mol)  
LINE 6:charge  

In [None]:
ss = []
headers = ['Name', 'Struct_formula', 'Abbrv', 'Formula', 'Reference', 'Date_entered', 
           'deltaG (cal/m)', 'deltaH (cal/m)', 'S (cal/K-m)', 
           'a1 (10 cal/bar-m)', 'a2 (10^-2 cal/m)', 'a3 (cal-K/bar-m)', 'a4 (10^-4 cal-K/m)',
           'c1 (cal/K-m)', 'c2 (10^-4 cal-K/m)', 'omega (10^-4 cal/m)',
           'charge']
for i in range(0,len(aqueous),6):
    s = []
    for j in range(0,3):
        a = aqueous[i+j][ 0:20].strip().replace(" ", "_")
        b = aqueous[i+j][20:].strip().replace(" ", "_")
        s.append(a+" ")
        s.append(b+" ")
    for j in range(3,6):
        s.append(aqueous[i+j])
    s = ' '.join(s).split()
    s[0] = s[0].lower().capitalize()
    s[1] = s[1].lower().capitalize()
    if len(s) != 17:
        print(s)
    ss.append(s)
#ss
aqueous_df = pd.DataFrame(ss, columns=headers)

### *aqueous_df* is the Pandas dataframe that holds parameters for these species

In [None]:
print("Number of entries ",len(aqueous_df.index))

## Construct a dictionary of Pandas dataframes with the structure
- key: *mineral_no_transitions*, value: min_no_trans_df (Pandas dataframe)
- key: *mineral_one_transition*, value: min_one_trans_df (Pandas dataframe)
- key: *mineral_two_transitions*, value: min_two_trans_df (Pandas dataframe)
- key: *mineral_three_transitions*, value: min_three_trans_df (Pandas dataframe)
- key: *gasses*, value: gasses_df (Pandas dataframe)
- key: *aqueous_species*, value: aqueous_df (Pandas dataframe)
- key: *documentation*, value: docs (list)
- key: *original_file*, value: lines (list)

In [None]:
slop_d = {'mineral_no_transitions':min_no_trans_df, 
          'mineral_one_transition':min_one_trans_df,
          'mineral_two_transitions':min_two_trans_df,
          'mineral_three_transitions':min_three_trans_df,
          'gasses':gasses_df,
          'aqueous_species':aqueous_df,
          'documentation':docs,
          'original_file':lines}

## Pickle the dictionary ...

In [None]:
import pickle

In [None]:
with open(dname, 'wb') as f:
    pickle.dump(slop_d, f, pickle.HIGHEST_PROTOCOL)
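
For completeness, here is a sketch of reading such a file back. The helper `load_slop_dict` is hypothetical, and the round trip uses a tiny stand-in dictionary with placeholder values instead of real SLOP data. Because unpickling can execute arbitrary code, only files from a trusted source should be loaded this way.

```python
import os
import pickle
import tempfile

import pandas as pd

def load_slop_dict(path):
    """Load a dictionary previously written with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in dictionary with placeholder content (not SLOP data)
demo = {"gasses": pd.DataFrame([["Examplegas", "1.0"]], columns=["Name", "V (cc/m)"]),
        "documentation": ["* placeholder comment"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dict")
    with open(path, "wb") as f:
        pickle.dump(demo, f, pickle.HIGHEST_PROTOCOL)
    restored = load_slop_dict(path)

print(sorted(restored))
```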