# Homework 9: Getting Familiar with NASA Polynomials
## Due Date:  Tuesday, November 7th at 11:59 PM

Read the raw NASA polynomial dataset, parse it, and store the data in an .xml file.

### Review of the NASA Polynomials
You can find the NASA Polynomial file in `thermo.txt`.

You can find some details on the NASA Polynomials [at this site](http://combustion.berkeley.edu/gri_mech/data/nasa_plnm.html) in addition to the Lecture 16 notes.


The NASA polynomials for species $i$ have the form:
$$
    \frac{C_{p,i}}{R}= a_{i1} + a_{i2} T + a_{i3} T^2 + a_{i4} T^3 + a_{i5} T^4
$$

$$
    \frac{H_{i}}{RT} = a_{i1} + a_{i2} \frac{T}{2} + a_{i3} \frac{T^2}{3} + a_{i4} \frac{T^3}{4} + a_{i5} \frac{T^4}{5} + \frac{a_{i6}}{T}
$$

$$
    \frac{S_{i}}{R}  = a_{i1} \ln(T) + a_{i2} T + a_{i3} \frac{T^2}{2} + a_{i4} \frac{T^3}{3} + a_{i5} \frac{T^4}{4} + a_{i7}
$$

where $a_{i1}$, $a_{i2}$, $a_{i3}$, $a_{i4}$, $a_{i5}$, $a_{i6}$, and $a_{i7}$ are the numerical coefficients supplied in NASA thermodynamic files. 
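As a quick check of the three expressions above, here is a minimal sketch that evaluates them for one coefficient set. The function names and the demo coefficients are assumptions made for illustration; the numbers are not taken from `thermo.txt`.

```python
import math

def cp_over_R(a, T):
    """C_p/R from the seven NASA coefficients a[0]..a[6] at temperature T (K)."""
    return a[0] + a[1]*T + a[2]*T**2 + a[3]*T**3 + a[4]*T**4

def h_over_RT(a, T):
    """H/(RT); a[5] is the coefficient written a_{i6} in the text."""
    return (a[0] + a[1]*T/2 + a[2]*T**2/3 + a[3]*T**3/4
            + a[4]*T**4/5 + a[5]/T)

def s_over_R(a, T):
    """S/R; a[6] is the coefficient written a_{i7} in the text."""
    return (a[0]*math.log(T) + a[1]*T + a[2]*T**2/2 + a[3]*T**3/3
            + a[4]*T**4/4 + a[6])

# Illustrative coefficients only (hypothetical, NOT from thermo.txt)
a_demo = [3.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 4.0]
T_demo = 500.0
print(cp_over_R(a_demo, T_demo), h_over_RT(a_demo, T_demo), s_over_R(a_demo, T_demo))
```

Note that the Python list is zero-indexed, so $a_{i1}$ corresponds to `a[0]` and $a_{i7}$ to `a[6]`.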

### Some Notes on `thermo.txt`
The first seven numbers of each species entry, starting on the second line (five on the second line and the first two on the third line), are the coefficients $a_{i1}$ through $a_{i7}$, in that order, for the high-temperature range (above 1000 K; the upper boundary is specified on the first line of the species entry).

The next seven numbers (the last three on the third line and four on the fourth line) are the coefficients $a_{i1}$ through $a_{i7}$, in that order, for the low-temperature range (below 1000 K; the lower boundary is specified on the first line of the species entry).
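The layout described above can be sketched as fixed-width slicing. This is only an illustration, assuming each coefficient occupies a 15-character field with five fields on a full line; the helper names and the synthetic demo lines are hypothetical and do not come from `thermo.txt`.

```python
WIDTH = 15     # assumed characters per coefficient field
PER_LINE = 5   # assumed coefficient fields on a full line

def split_fields(line, n=PER_LINE, width=WIDTH):
    """Slice the first n fixed-width fields of a line and convert them to floats."""
    return [float(line[i*width:(i+1)*width]) for i in range(n)]

def split_coeffs(line2, line3, line4):
    """Return (high, low) coefficient lists from lines 2-4 of one species entry."""
    values = split_fields(line2) + split_fields(line3) + split_fields(line4, n=4)
    return values[:7], values[7:]   # first seven: high range; next seven: low range

# Synthetic entry lines holding the values 1..14 (hypothetical data)
fields = ['% .8E' % v for v in range(1, 15)]          # each string is 15 characters
line2 = ''.join(fields[0:5]) + '    2'
line3 = ''.join(fields[5:10]) + '    3'
line4 = ''.join(fields[10:14]) + ' ' * WIDTH + '    4'

high, low = split_coeffs(line2, line3, line4)
print(high)
print(low)
```

Slicing by position rather than splitting on whitespace matters here because adjacent fields are not guaranteed to be separated by a space.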

### Additional Specifications
Your final .xml file should meet the following specifications:

1. A `speciesArray` field that contains a space-separated list of all of the species present in the file.
2. A `species` field for each species, with the species name as its `name` attribute. Each `species` field contains:

    1. One sub-field per temperature range, with the minimum and maximum temperatures as attributes.
    2. Within each temperature-range sub-field, a `floatArray` field that contains the seven coefficients as a comma-separated string.
    
See the `example_thermo.xml` file for an example of the expected .xml output.
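One way to test a generated file against the specifications above is a small checker like the sketch below. The function `check_thermo_xml` and the demo document are hypothetical; the tag and attribute names (`speciesArray`, `species`, `NASA`, `Tmin`, `Tmax`, `floatArray`) are the ones used in the solution cells of this notebook, and the placeholder coefficients are not real data.

```python
import xml.etree.ElementTree as ET

def check_thermo_xml(xml_text, n_coeffs=7):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    root = ET.fromstring(xml_text.strip())

    # Specification 1: space-separated species names in speciesArray
    species_array = root.find('.//speciesArray')
    has_text = species_array is not None and species_array.text
    listed = species_array.text.split() if has_text else []

    # Specification 2: one species element per name
    species_elems = root.findall('.//species')
    names = [sp.get('name') for sp in species_elems]
    if listed != names:
        problems.append('speciesArray does not match the species entries')

    for sp in species_elems:
        for rng in sp.findall('.//NASA'):
            if rng.get('Tmin') is None or rng.get('Tmax') is None:
                problems.append('%s: range without Tmin/Tmax' % sp.get('name'))
            arr = rng.find('floatArray')
            coeffs = arr.text.split(',') if arr is not None and arr.text else []
            if len(coeffs) != n_coeffs:
                problems.append('%s: expected %d coefficients, found %d'
                                % (sp.get('name'), n_coeffs, len(coeffs)))
    return problems

# Tiny hypothetical document with placeholder coefficients
demo_xml = """
<ctml>
  <phase id="demo"><speciesArray datasrc="#species_data">X</speciesArray></phase>
  <speciesData id="species_data">
    <species name="X"><thermo>
      <NASA Tmin="200.0" Tmax="1000.0">
        <floatArray name="coeffs" size="7">1, 2, 3, 4, 5, 6, 7</floatArray>
      </NASA>
      <NASA Tmin="1000.0" Tmax="3500.0">
        <floatArray name="coeffs" size="7">1, 2, 3, 4, 5, 6, 7</floatArray>
      </NASA>
    </thermo></species>
  </speciesData>
</ctml>
"""
print(check_thermo_xml(demo_xml))
```

To check a real output file, read its contents into a string and pass that string to the function.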

**Hint**: First parse the file into a Python dictionary. 

In [1]:
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

In [2]:
# input and output filenames
f_in = 'thermo.txt'
f_out = 'thermo.xml'

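# Parse the species entries in the .txt file into a list of dictionaries
n_words = 5                        # coefficient fields on a full line
word_len = 15                      # characters per coefficient field
coef_set_size = 7                  # coefficients per temperature range
leftover = coef_set_size-n_words   # high-range coefs that spill onto line 3
start_offset = 5                   # leading lines skipped before the first species entry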
species_dict_list = []

with open(f_in) as f:
    species_dict = {}
    high_coef = ''
    low_coef = ''
    lines = f.readlines()
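    for l in lines[start_offset:]:
        if l[:3] == 'END': # end of file
            break
        
        if not l.strip(): # skip blank lines instead of failing on them
            continue
        
        # The entry line index (1-4) is the last non-blank character of the line
        last_c = l.rstrip()[-1]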
        
        if last_c == '1': # extract name, Tmin, Tmid, Tmax
            species_dict['name'] = l[:word_len].strip()
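            # Split on any run of whitespace so the field padding does not matter;
            # the file lists the temperatures in the order low, high, mid
            T_thresholds = l[3*word_len:].split()[:3]
            species_dict['T_min'] = T_thresholds[0]
            species_dict['T_max'] = T_thresholds[1]
            species_dict['T_mid'] = T_thresholds[2]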
        
        elif last_c == '2': # extract the first 5 high-range NASA coefs 
            for i in range(n_words):
                high_coef += l[i*word_len:(i+1)*word_len]
                high_coef += ', '
        
        elif last_c == '3': 
            for i in range(n_words):
                if i < leftover: # extract the last 2 high-range NASA coefs
                    high_coef += l[i*word_len:(i+1)*word_len]
                    high_coef += ', '
                else: # extract the first 3 low-range NASA coefs
                    low_coef += l[i*word_len:(i+1)*word_len]
                    low_coef += ', '
        elif last_c == '4': # ignore any line that is not part of a species entry
            for i in range(n_words-1): # extract the last 4 low-range NASA coefs
                low_coef += l[i*word_len:(i+1)*word_len]
                low_coef += ', '
            
            # Store high & low temperature range NASA coefs and the species entry
            species_dict['high_coef'] = high_coef[:-2]
            species_dict['low_coef'] = low_coef[:-2]
            species_dict_list.append(species_dict)
            
            # Clear species_dict and string for coefs for the next species entry 
            species_dict = {}
            high_coef = ''
            low_coef = ''
            

In [3]:
# Build the XML Tree
root = ET.Element('ctml')

phase = ET.SubElement(root, 'phase', {'id': 'gri30'})
speciesArray = ET.SubElement(phase, 'speciesArray', {'datasrc': '#species_data'})

speciesData = ET.SubElement(root, 'speciesData', {'id': 'species_data'})

In [4]:
# Add one species tag per parsed entry as a child of speciesData
specie_arr_text = ''
for dic in species_dict_list:
    specie_arr_text += dic['name']+' '
    
    species = ET.SubElement(speciesData, 'species', {'name': dic['name']})
    thermo = ET.SubElement(species, 'thermo')
    
    low_NASA = ET.SubElement(thermo, 'NASA', {'Tmin':dic['T_min'], 'Tmax':dic['T_mid'], 'P0':'100000.0'})
    low_floatArray = ET.SubElement(low_NASA, 'floatArray', {'name':'coeffs', 'size':'7'})
    low_floatArray.text = dic['low_coef']
    
    high_NASA = ET.SubElement(thermo, 'NASA', {'Tmin':dic['T_mid'], 'Tmax':dic['T_max'], 'P0':'100000.0'})
    high_floatArray = ET.SubElement(high_NASA, 'floatArray', {'name':'coeffs', 'size':'7'})
    high_floatArray.text = dic['high_coef']

# Write the list of space separated species names in speciesArray.text
speciesArray.text = specie_arr_text.strip()

In [5]:
# Write the pretty-printed XML to the output file
pretty_xml_str = parseString(ET.tostring(root)).toprettyxml(indent='  ')
with open(f_out, 'w') as f:
    f.write(pretty_xml_str)