### Parsing SeaBASS data from URL
Amir asked me how to parse SeaBASS data. Below is one approach that I think is general enough.
I use **requests** to get the data from a URL.
I use **re** to build regular expressions that extract the field names (and, optionally, the units) and detect where data parsing should start.
The data are read line by line into a dictionary (hash table): each datum is appended to the list stored under its field name.
*Optionally*, the dictionary is loaded into a **pandas** DataFrame, which makes data manipulation much easier. I use Python 3.5, but I believe this should work fine in 2.7.

In [1]:
import re
import requests
import pandas as pd # optional

In [2]:
url='http://seabass.gsfc.nasa.gov/seabass_archive/BIGELOW/BALCH/FERRY_00/2K/s000914w/main-s000914w.txt'

In [3]:
resp = requests.get(url)

In [4]:
stuff = resp.text.splitlines()

In [5]:
resp.close()

In [6]:
# Pre-compute some regex
columns = re.compile('^/fields=(.+)') # to get field/column names
units = re.compile('^/units=(.+)') # to get units -- optional
endHeader = re.compile('^/end_header') # to know when to start storing data
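
To see what these patterns capture, here is a small sketch that runs them against a few header lines. The sample lines are made up in the SeaBASS style for illustration and are not taken from the file above.

```python
import re

# Same patterns as in the cell above
columns = re.compile('^/fields=(.+)')    # field/column names
units = re.compile('^/units=(.+)')       # units (optional)
endHeader = re.compile('^/end_header')   # marks the end of the header

# Made-up header lines (an assumption, for illustration only)
sample = [
    '/begin_header',
    '/fields=year,month,lat,lon',
    '/units=yyyy,mo,degrees,degrees',
    '/end_header',
]

for line in sample:
    # findall returns a list with the captured group, or an empty list
    print(line, columns.findall(line), units.findall(line),
          bool(endHeader.match(line)))
```

Each pattern is anchored with `^`, so it only fires on the header line that starts with its keyword; every other line gives an empty list (or `None` for `match`).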

In [7]:
noFields = True
noUnits = True
getData = False
for line in stuff:
    if noFields:
        fieldStr = columns.findall(line)
        if len(fieldStr)>0:
            noFields = False
            fieldList = fieldStr[0].split(',')
            dataDict = dict.fromkeys(fieldList)
            continue # nothing left to do with this line
    if noUnits: # search for units in current line
        unitsStr = units.findall(line)
        if len(unitsStr) > 0:
            noUnits = False
            unitList = unitsStr[0].split(',')
            continue # nothing left to do with this line
    if not getData:
        if endHeader.match(line):
            getData = True
    else:
        dataList = line.split() # split on any run of whitespace; assumes whitespace-delimited data
        for field,datum in zip(fieldList,dataList):
            if not dataDict[field]:
                dataDict[field] = []
            dataDict[field].append(datum)
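
The same logic can be wrapped in a function so it is easy to reuse and to try on a small input. This is only a sketch: `parse_seabass_lines` is a hypothetical helper name, the sample lines are made up, and the data are assumed to be whitespace-delimited.

```python
import re

FIELDS = re.compile(r'^/fields=(.+)')
UNITS = re.compile(r'^/units=(.+)')
END_HEADER = re.compile(r'^/end_header')

def parse_seabass_lines(lines):
    """Return (fields, units, data) from an iterable of SeaBASS-style lines.

    Hypothetical helper; assumes whitespace-delimited data rows.
    """
    fields, units, data = [], [], {}
    in_data = False
    for line in lines:
        if in_data:
            # Pair each value with its field name, in column order
            for field, datum in zip(fields, line.split()):
                data[field].append(datum)
            continue
        m = FIELDS.match(line)
        if m:
            fields = m.group(1).split(',')
            data = {field: [] for field in fields}  # one empty list per field
            continue
        m = UNITS.match(line)
        if m:
            units = m.group(1).split(',')
            continue
        if END_HEADER.match(line):
            in_data = True  # everything after this line is data
    return fields, units, data

# Made-up sample (an assumption, for illustration only)
sample = [
    '/begin_header',
    '/fields=year,lat,lon',
    '/units=yyyy,degrees,degrees',
    '/end_header',
    '2000 43.77 -66.3495',
    '2000 43.7681 -66.3727',
]
fields, unit_list, data = parse_seabass_lines(sample)
print(fields, unit_list, data)
```

Creating the lists up front removes the need to check for `None` on every datum. Note that every value is still stored as a string.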

In [9]:
df = pd.DataFrame(dataDict,columns=fieldList)

In [10]:
df.head()

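| | year | month | day | hour | minute | second | lat | lon | Wt | sal | ... | lt683.9 | lsky412.7 | lsky444.2 | lsky488.9 | lsky510.7 | lsky554.7 | lsky671.1 | lsky683.5 | senz | relaz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 9 | 14 | 14 | 6 | 58 | 43.77 | -66.3495 | 13.2 | 31.88 | ... | 0.072 | 5.56 | 4.82 | 3.57 | 2.91 | 2.21 | 0.92 | 0.762 | 140 | 119.27 |
| 1 | 2000 | 9 | 14 | 14 | 10 | 14 | 43.7681 | -66.3727 | 13.7 | 32.23 | ... | 0.0703 | 5.63 | 4.88 | 3.62 | 2.97 | 2.25 | 0.968 | 0.772 | 140 | 118.384 |
| 2 | 2000 | 9 | 14 | 14 | 13 | 20 | 43.7662 | -66.3938 | 13.65 | 32.2 | ... | 0.0715 | 5.67 | 4.92 | 3.65 | 3.03 | 2.28 | 1.02 | 0.778 | 140 | 116.894 |
| 3 | 2000 | 9 | 14 | 14 | 16 | 23 | 43.7641 | -66.4145 | 14.37 | 32.26 | ... | 0.0673 | 5.72 | 4.96 | 3.68 | 3.1 | 2.31 | 1.07 | 0.79 | 140 | 115.158 |
| 4 | 2000 | 9 | 14 | 14 | 19 | 20 | 43.7626 | -66.4351 | 14.69 | 32.31 | ... | 0.0645 | 5.71 | 4.94 | 3.66 | 3.06 | 2.29 | 1.02 | 0.783 | 140 | 116.57 |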


In [11]:
df.shape

(152, 64)
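
Because the parser stores each datum as text, the DataFrame columns hold strings. A minimal sketch of converting them to numbers and building timestamps follows. `df_demo` is a stand-in name (an assumption) holding a few values copied from the first two rows shown above, so the block runs without downloading the file.

```python
import pandas as pd

# Stand-in for the parsed table; values are strings, as the parser leaves them
df_demo = pd.DataFrame({
    'year': ['2000', '2000'], 'month': ['9', '9'], 'day': ['14', '14'],
    'hour': ['14', '14'], 'minute': ['6', '10'], 'second': ['58', '14'],
    'lat': ['43.77', '43.7681'], 'lon': ['-66.3495', '-66.3727'],
})

# Convert every column to a numeric dtype; unparseable entries become NaN
df_num = df_demo.apply(pd.to_numeric, errors='coerce')

# pd.to_datetime can assemble timestamps from year/month/day/... columns
time_cols = ['year', 'month', 'day', 'hour', 'minute', 'second']
timestamps = pd.to_datetime(df_num[time_cols])

print(df_num.dtypes)
print(timestamps)
```

With `errors='coerce'`, any non-numeric placeholder in the data turns into `NaN` instead of raising an error; whether that is the right choice depends on how missing values are encoded in a given file.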