# FOSDIC Deck 195 Decoding

- The digital collection of FOSDIC data can be obtained at [https://rda.ucar.edu/datasets/ds258.4/index.html#!sfol-wl-/data/ds258.4](https://rda.ucar.edu/datasets/ds258.4/index.html#!sfol-wl-/data/ds258.4)
- A PDF explaining the original Deck 195 format can be found at [https://icoads.noaa.gov/reclaim/pdf/](https://icoads.noaa.gov/reclaim/pdf/)

## Decoding Details

- Cassette/reel information is kept in header lines that begin with '00000195'. Other details about the FOSDIC conversion format that do not pertain to the original records are available at [https://rda.ucar.edu/datasets/ds258.4/docs/fosdic_description.txt](https://rda.ucar.edu/datasets/ds258.4/docs/fosdic_description.txt)

- This project focuses on providing ship type information and linking it to metadata for SST/seawater temperature bias (and, secondarily, air temperature bias). To that end, present weather and sea state will remain as strings, while ship number and temperatures will be converted per the deck description.
- The final entry will include the complete FOSDIC record for future reference.
- Location will be converted to geocoordinates, and date/time to a timestamp string.
- The only QC applied, other than conversion checks, is the removal of empty entries.

In [1]:
import pandas as pd
import numpy as np

import datetime

In [2]:
with open('../../FOSDIC_COPY/fosdic_cd195','r') as f:
    file = f.readlines()

In [105]:
#define some essential conversion functions from the Deck 195 Manual

def wind_dir(x):
    try:
        return int(x)*10
    except:
        return np.nan

def slp(x):
    try:
        return float(x)/100
    except:
        return np.nan
    
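def airtemp(x):
    if x[0] == '0':
        #leading '0' marks a positive temperature, e.g. '072' -> 72.0
        try:
            return float(x)
        except ValueError:
            return np.nan
    elif x[0] != ' ':
        #any other non-blank leading character is treated as a minus flag;
        #the remaining digits are negated
        try:
            return -1*float(x[1:])
        except ValueError:
            return np.nan
    else:
        #blank field: no observation
        return np.nan

def watertemp(x):
    #two-character field; anything non-numeric becomes NaN
    try:
        return float(x)
    except ValueError:
        return np.nan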

def inport(x): #make anything other than blank "TRUE"
    if x[0] != ' ':  
        return True
    else:
        return False

def geo_loc(quadrant,latitude,longitude):
    try:
        if quadrant == '0':
            return [int(latitude),-1*int(longitude)]
        elif quadrant == '1':
            return [int(latitude),int(longitude)]    
        elif quadrant == '2':
            return [-1*int(latitude),-1*int(longitude)]
        elif quadrant == '3':
            return [-1*int(latitude),int(longitude)]
        else:
            return [np.nan,np.nan]
    except:
            return [np.nan,np.nan]
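
The quadrant handling in `geo_loc` amounts to choosing a sign for latitude and longitude. As a minimal sketch, the same mapping can be written as a lookup table. The name `geo_loc_lookup` and the table `QUADRANT_SIGNS` are hypothetical and simply mirror the branches above; they are not taken from the Deck 195 manual.

```python
import math

# Sign of (latitude, longitude) per quadrant code, mirroring geo_loc above:
# '0' = N/W, '1' = N/E, '2' = S/W, '3' = S/E (positive north, positive east).
QUADRANT_SIGNS = {'0': (1, -1), '1': (1, 1), '2': (-1, -1), '3': (-1, 1)}

def geo_loc_lookup(quadrant, latitude, longitude):
    """Hypothetical table-driven equivalent of geo_loc; returns [lat, lon]."""
    try:
        lat_sign, lon_sign = QUADRANT_SIGNS[quadrant]
        return [lat_sign * int(latitude), lon_sign * int(longitude)]
    except (KeyError, ValueError):
        # unknown quadrant code or non-numeric coordinate field
        return [math.nan, math.nan]

print(geo_loc_lookup('0', '45', '120'))  # [45, -120]
print(geo_loc_lookup('3', '10', '005'))  # [-10, 5]
```

Keeping the signs in one table makes it easier to check the convention against the manual in a single place.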
        

In [25]:
counter = 0
data = {}
for count, row in enumerate(file):
    if (row[:8] == '00000195'):
        print(f'Headerlines: rowumber-{count}')
    elif row == '                                                                                \n': #blank line
        continue
    else:
        #create dictionary where key is row number                    
        data.update({counter:{'shipclass':row[0:2],
                    'shipno':row[2:5],
                    'year':row[5:7],
                    'month':row[7:9],
                    'day':row[9:11],
                    'hour':row[11:13],
                    'quadrant':row[13:14],
                    'lat_coded':row[14:16],
                    'lon_coded':row[16:19],
                    'ship_speed_kts':row[19:21], #knots
                    'ship_course_deg':row[21:23], #degrees
                    'wind_dir_deg':wind_dir(row[23:25]), #*10 as it is recorded to the nearest ten degrees
                    'wind_speed_kts':row[25:27], #knots
                    'sealevelpressure_inHg':slp(row[27:31]), #in
                    'drybulb_temperature_degF':airtemp(row[31:34]), #degF
                    'wetbulb_temperature_degF':airtemp(row[34:37]), #degF
                    'water_injection_temperature_degF':watertemp(row[37:39]), #degF
                    'present_wx':row[39:48], #see encoding but this involves multiple fields: wx, clouds, vis
                    'sea_surface_temperature_degF':watertemp(row[48:50]), #degF
                    'sea_and_swell':row[50:56], #see encoding but this involves multiple fields
                    'inport_obs_indicator':inport(row[56:57]), #X=True,blank=False
                    #58 on - not used
                    'latitude_DegN':geo_loc(row[13:14],row[14:16],row[16:19])[0], #+N,+E
                    'longitude_DegE':geo_loc(row[13:14],row[14:16],row[16:19])[1], #+N,+E
                    'datetime':'19'+row[5:7]+'-'+row[7:9]+'-'+row[9:11]+' '+row[11:13]+':00:00',
                    'rawentry_80char':row
                    }})
        counter +=1

        

Headerlines: rowumber-0
Headerlines: rowumber-11900
Headerlines: rowumber-24532
Headerlines: rowumber-36153
Headerlines: rowumber-48354
Headerlines: rowumber-60146
Headerlines: rowumber-72265
Headerlines: rowumber-84308
Headerlines: rowumber-96273
Headerlines: rowumber-108434
Headerlines: rowumber-120708
Headerlines: rowumber-133038
Headerlines: rowumber-145374
Headerlines: rowumber-157128
Headerlines: rowumber-169710
Headerlines: rowumber-181385
Headerlines: rowumber-193689
Headerlines: rowumber-205917
Headerlines: rowumber-217943
Headerlines: rowumber-229640
Headerlines: rowumber-241251
Headerlines: rowumber-252753
Headerlines: rowumber-264969
Headerlines: rowumber-277353
Headerlines: rowumber-289685
Headerlines: rowumber-301329
Headerlines: rowumber-313251
Headerlines: rowumber-325285
Headerlines: rowumber-337365
Headerlines: rowumber-348882
Headerlines: rowumber-360892
Headerlines: rowumber-372911
Headerlines: rowumber-385279
Headerlines: rowumber-397246
Headerlines: rowumber-40921

In [26]:
df = pd.DataFrame.from_dict(data,orient='index')

In [27]:
df.to_csv('../data/FOSDIC_cd195.csv',index=False)
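
The `datetime` column built above is a plain string, so malformed date or hour fields pass through unnoticed. A minimal sketch of one way to surface them, assuming the string layout used in the loop (`'19YY-MM-DD HH:00:00'`); the sample values and the `timestamp` column name are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for df: one valid entry, one impossible date,
# and one entry whose hour field was blank in the record.
sample = pd.DataFrame({'datetime': ['1952-03-14 06:00:00',
                                    '1952-13-40 06:00:00',
                                    '1952-03-14   :00:00']})

# errors='coerce' turns unparseable strings into NaT instead of raising
sample['timestamp'] = pd.to_datetime(sample['datetime'],
                                     format='%Y-%m-%d %H:%M:%S',
                                     errors='coerce')

print(sample['timestamp'].isna().tolist())  # [False, True, True]
```

Rows with `NaT` can then be counted or inspected through the `rawentry_80char` column rather than silently dropped.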

# Some Initial Stats

Number of unique Ship Classes is: 93 
Number of unique Ship Classes with >10 samples is: 76    
*This still includes some clearly problematic values, such as -1 and J1, but these classes represent 638678 of 638709 samples.*

All Ship Class Groups are shown below

In [82]:
df.groupby('shipclass').groups.keys()

dict_keys(['  ', ' 1', ' 2', ' 4', ' 5', ' 6', ' 8', ' 9', '&&', '-1', '0 ', '00', '01', '02', '03', '06', '07', '08', '09', '1 ', '1*', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '54', '55', '56', '58', '59', '60', '61', '62', '65', '68', '71', '76', '77', '78', '81', '82', '84', '85', '86', '88', '89', '9 ', '90', '91', '94', '95', '96', '97', '98', 'J1', 'Z5'])

And ship counts within each group

In [89]:
for i,g in df.groupby('shipclass'):
    print(i,g.shipno.unique())

   ['   ']
 1 ['111' '145' '265']
 2 ['221']
 4 [' 76' '394' '012']
 5 ['229']
 6 ['048' '012' '106']
 8 ['131']
 9 ['   ']
&& ['&&&' ')3&']
-1 ['318' '292' '015']
0  ['010']
00 ['007']
01 ['061' '041' '040' '034' '038' '045' '046' '042' '033' '544' '044' '060'
 '056' '058' '059' '036' '035' '063' '262']
02 ['130' '027' '039' '072' '070' '136' '068' '071' '073' '028' '024' '029'
 '038' '032' '035' '031' '069']
03 ['002' '001']
06 ['052' '042' '006' '012' '004' '013' '005' '086' '103' '040' '104' '007'
 '011' '010' '066' '060' '091' '046' '047' '065' '055' '048' '080' '087'
 '051' '105' '054' '008' '050' '062' '053' '056' '041' '082' '058' '095'
 '063' '085' '009' ')04' '043']
07 ['004' '031' '006' '013' '015' '025' '002' '003' '017' '024' '021' '016'
 '009' '014' '020' '011' '012']
08 ['029' '011' '026' '025' '034' '013' '020' '031' '016']
09 ['024' '026' '025' '029' '050']
1  ['129' '242']
1* ['184']
10 ['009' '069' '011' '013' '072' '001' '031' '029' '012' '020' '023' '063'
 '071' '0

Notice that some ship numbers, as well as some ship classes, are encoded incorrectly (non-numeric characters). We will mark all of these as "unidentifiable vessels", even though a few entries might be recoverable via collocation with other entries in the initial FOSDIC records.

# Prepare for Merging with Vessel Meta Archive

In [100]:
#declare non-numeric ship and class IDs as unrecoverable (cannot be matched to vessel metadata)

df['VesselMeta_Availability']=True

for i,g in df.iterrows():
    try:
        float(g.shipno)
    except:
        df.loc[i,'VesselMeta_Availability'] = False
    try:
        float(g.shipclass)
    except:
        df.loc[i,'VesselMeta_Availability'] = False        
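
Looping with `iterrows` over roughly 640k rows is slow. The same flag can be computed column-wise, sketched here under the assumption that "numeric" means exactly what the loop tests, namely that `float()` accepts the string. The helper `is_numeric_id` is hypothetical, and the toy rows use ID strings of the kinds seen above with made-up pairings.

```python
import pandas as pd

def is_numeric_id(value):
    """True when float() accepts the string, the same test the loop applies."""
    try:
        float(value)
        return True
    except ValueError:
        return False

# Toy rows standing in for df
sample = pd.DataFrame({'shipclass': [' 1', '&&', '06', 'J1', '-1'],
                       'shipno':    ['111', '&&&', ')04', '318', '015']})

# A row is matchable only if both IDs parse as numbers
sample['VesselMeta_Availability'] = (sample['shipno'].map(is_numeric_id)
                                     & sample['shipclass'].map(is_numeric_id))

print(sample['VesselMeta_Availability'].tolist())
# [True, False, False, False, True]
```

Note that `float()` accepts padded and signed strings such as `' 1'` and `'-1'`, which is why those classes survive this step.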
        

In [103]:
pdf = df[df['VesselMeta_Availability']]
for i,g in pdf.groupby('shipclass'):
    print(i,g.shipno.unique())

 1 ['111' '145' '265']
 2 ['221']
 4 [' 76' '394' '012']
 5 ['229']
 6 ['048' '012' '106']
 8 ['131']
-1 ['318' '292' '015']
0  ['010']
00 ['007']
01 ['061' '041' '040' '034' '038' '045' '046' '042' '033' '544' '044' '060'
 '056' '058' '059' '036' '035' '063' '262']
02 ['130' '027' '039' '072' '070' '136' '068' '071' '073' '028' '024' '029'
 '038' '032' '035' '031' '069']
03 ['002' '001']
06 ['052' '042' '006' '012' '004' '013' '005' '086' '103' '040' '104' '007'
 '011' '010' '066' '060' '091' '046' '047' '065' '055' '048' '080' '087'
 '051' '105' '054' '008' '050' '062' '053' '056' '041' '082' '058' '095'
 '063' '085' '009' '043']
07 ['004' '031' '006' '013' '015' '025' '002' '003' '017' '024' '021' '016'
 '009' '014' '020' '011' '012']
08 ['029' '011' '026' '025' '034' '013' '020' '031' '016']
09 ['024' '026' '025' '029' '050']
1  ['129' '242']
10 ['009' '069' '011' '013' '072' '001' '031' '029' '012' '020' '023' '063'
 '071' '018' '028' '096' '103' '100' '104' '068' '090' '097' '016

In [120]:
pdf = pdf.copy() #work on an explicit copy so the assignment below does not act on a slice of df
for i,g in pdf.groupby('shipclass'):
    if g.shipno.count() <=5:
        pdf.loc[g.index,'VesselMeta_Availability'] = False


This brings us to 637853 samples after dropping non-numeric entries, which is ~1k fewer than the entire dataset. We will try to match these with known vessels from the Deck 195 MetaData reconstruction project.

There may be other IDs that should be omitted (e.g. -1 or '0 '), and a few that need to be explored (e.g. are '1', '01', and '1 ' all the same class, or are they different?). Classes with very low sample counts will be dropped regardless.
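
One way to explore whether differently padded class IDs refer to the same class is to normalise them and compare the ship numbers that end up together. This is only a sketch: stripping blanks and zero-padding to two digits is an assumption about how the IDs were intended, not something the FOSDIC documentation states, and `class_norm` is a hypothetical column name.

```python
import pandas as pd

# Class/ship pairs taken from the groups printed earlier
sample = pd.DataFrame({'shipclass': [' 1', '01', '1 ', '01'],
                       'shipno':    ['111', '061', '129', '262']})

# Assumed normalisation: strip blanks, then left-pad with zeros to two digits
sample['class_norm'] = sample['shipclass'].str.strip().str.zfill(2)

# For each normalised class, list the raw spellings and ship numbers it absorbs
summary = sample.groupby('class_norm').agg(
    raw_classes=('shipclass', lambda s: ', '.join(sorted(s.unique()))),
    shipnos=('shipno', lambda s: ', '.join(sorted(s.unique()))),
)
print(summary)
```

If the ship numbers under one normalised class never overlap across raw spellings, that hints the spellings may be distinct classes or keying errors rather than the same class.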

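Vessel class IDs assumed to be in error or not worth exploring due to low counts:

| ClassID | Counts |
|---|---|
| `' 1'` | 3 |
| `' 2'` | 1 |
| `' 4'` | 3 |
| `' 5'` | 1 |
| `' 6'` | 3 |
| `' 8'` | 1 |
| `'0 '` | 1 |
| `'00'` | 1 |
| `'1 '` | 2 |
| `'71'` | 1 |
| `' 9'` | 1 |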
 
This brings us to 637834 entries to try to match to known vessels
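
The low-count filtering can also be expressed without an explicit loop over groups. A minimal sketch, assuming the same threshold of five samples used above; the toy frame and its ship numbers are illustrative only.

```python
import pandas as pd

# Toy frame standing in for pdf: one well-populated class and two sparse ones
sample = pd.DataFrame({'shipclass': ['06'] * 7 + [' 1'] * 3 + ['71'],
                       'shipno': ['052'] * 7 + ['111', '145', '265', '001']})
sample['VesselMeta_Availability'] = True

# transform('count') returns the group size aligned to every row
class_size = sample.groupby('shipclass')['shipno'].transform('count')
sample.loc[class_size <= 5, 'VesselMeta_Availability'] = False

# Per-class sample counts, comparable to the ClassID / Counts listing above
print(sample['shipclass'].value_counts().sort_index())
print(int(sample['VesselMeta_Availability'].sum()))  # 7
```

Because `class_size` shares the frame's index, the boolean mask can be passed straight to `.loc` without collecting group indices by hand.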

In [None]:
#Load excel metadata dictionary to merge with remaining sample data