BB3 Site Separator
Written by Sebastian DiGeronimo

**Important**
This script reads a BB3 file that contains many stations (sometimes from several cruises) and separates it into one file per station. Output files are named MBON_DD_MM_YYTHHMMSS_site_ID (e.g. MBON_07_01_20T183009_site_KW2), where the day comes before the month.

I've tried to work around download errors by catching lines that fail to parse and writing them to an error file. Some examples of such errors:
* 01/07/20	10:25:31	470	118	532	128	650	170	536
  01/07/20	10:25:32	470	115	532	147	65:34	470	115	532	136	650	201	536 <-- error

* 01/07/20	10:25:59	470	120	532	200	650	165	534 
  01/07/20	1:00	470	132	532	196	650	450	534 <-- error
  
* 01/07/20	10:26:52	470	113	532	127	650	159	532
  01/07/20	107/20	10:26:56	470	124	532	127	650	155	532 <-- error
  
* 01/07/20	10:27:19	470	107	532	145	650	236	532
  01/07/20	10:27:20	470	110	532	 <-- error
 
* 01/07/20	11:34:51	470	54	532	80	650	364	536
  01/07/20	1132	82	650	362	536 <-- error
  
* 01/07/20	11:36:12	470	92	532	102	650	118	533
  0	650	149	534 <-- error
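
A minimal sketch of how malformed lines like the ones above can be detected. It uses a regular expression from the standard library as a stand-in for the `parse`-based check in this notebook; the pattern and the name `is_well_formed` are illustrative assumptions, not part of the original script.

```python
import re

# Hypothetical stand-in for the parse-based check: a date, a time, then seven
# tab-separated integer columns, with nothing before or after.
LINE_RE = re.compile(
    r"\d{1,2}/\d{1,2}/\d{2}\t"   # date, e.g. 01/07/20
    r"\d{1,2}:\d{2}:\d{2}"       # time, e.g. 10:25:31
    r"(?:\t\d{1,4}){7}"          # seven integer columns
)

def is_well_formed(line):
    """Return True if the whole line (minus the newline) matches the layout."""
    return LINE_RE.fullmatch(line.rstrip("\r\n")) is not None

good = "01/07/20\t10:25:31\t470\t118\t532\t128\t650\t170\t536\n"
truncated = "01/07/20\t10:27:20\t470\t110\t532\t\n"
garbled = "01/07/20\t107/20\t10:26:56\t470\t124\t532\t127\t650\t155\t532\n"

print(is_well_formed(good))       # True
print(is_well_formed(truncated))  # False
print(is_well_formed(garbled))    # False
```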
  
One problem I don't know how to solve: if a corrupted time stamp jumps ahead by more than 5 min, a new station file is created, even though the next line belongs to the previous file:
01/07/20	11:36:36	470	91	532	92	650	113	533
01/07/20	11:46:37	470	91	532	98	650	110	533 <-- this will create a new file, even though it's just an error
01/07/20	11:36:38	470	92	532	94	650	108	532
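
One possible approach, sketched under assumptions: look one line ahead and treat a jump as spurious when the following time stamp falls back to within the gap of the previous one. The function `is_spurious_jump` is hypothetical and is not wired into the script; it only illustrates the comparison on the three time stamps above.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=5)  # same threshold the script uses for a new station

def is_spurious_jump(prev, current, following, gap=GAP):
    """True when `current` jumps ahead by at least `gap` but `following`
    falls back to within `gap` of `prev` (likely a corrupted time stamp)."""
    jumped = current - prev >= gap
    returned = timedelta(0) <= following - prev < gap
    return jumped and returned

fmt = "%m/%d/%y %H:%M:%S"
prev = datetime.strptime("01/07/20 11:36:36", fmt)
current = datetime.strptime("01/07/20 11:46:37", fmt)
following = datetime.strptime("01/07/20 11:36:38", fmt)

print(is_spurious_jump(prev, current, following))  # True
```

A real station change would not be flagged, because the line after the jump stays at the later time.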

In [None]:
from datetime import timedelta
from datetime import datetime
import os
from parse import parse

**Definition** of a helper that checks whether the directory for a cruise already exists and creates it if it does not

In [None]:
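def ensure_dir(file_path):
    # directory part of the path, e.g. 'BB3' for 'BB3/'
    directory = os.path.dirname(file_path)
    # skip when the path has no directory part; exist_ok avoids an error if it already exists
    if directory:
        os.makedirs(directory, exist_ok=True)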

You may customize the folder that the station files are written to; it should usually be a cruise ID (e.g. WS18256 for Walton Smith, 2018, Julian day 256)

In [None]:
folder_cust = input("Do you want a custom folder name (yes|y|YES)? (default is BB3/) ")
if folder_cust in ("yes", "y", "YES"):
    folder_name = input("What do you want the directory to be? ") + '/'
else:
    folder_name = 'BB3/'        # comment out if you want to customize the folder
ensure_dir(folder_name)

This sets the format of the datetime that is read in from the file, as well as other parameters used later

In [None]:
# sets datetime as month/day/year hour:minute:second (e.g. 7/25/18 13:35:23)
dt_fmt = "%m/%d/%y %H:%M:%S"

# placeholder for the previous datetime, used to check the current one against
dt_prev = ""

# placeholder for the station file name; stays empty until the first station is read
file_name = ''

# format used to read each line and check for a datetime, because downloads can corrupt lines
# a line that does not match this format is treated as an error and skipped
line_fmt = (
    '{:2d}/{:2d}/{:2d}	{:2d}:{:2d}:{:2d}	'
    '{:3d}	{:4d}	{:3d}	{:4d}	{:3d}	{:4d}	{:3d}'
)
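
As a small illustration of how a time stamp becomes a station file name, the sketch below applies the same `strptime` and `strftime` formats used in this notebook. The helper name `station_file_name` is hypothetical; note that the output puts the day before the month.

```python
from datetime import datetime

dt_fmt = "%m/%d/%y %H:%M:%S"  # input: month/day/two-digit year

def station_file_name(timestamp, site):
    """Build the output file name from a time stamp string and a site ID."""
    dt = datetime.strptime(timestamp, dt_fmt)
    stamp = dt.strftime("%d_%m_%yT%H%M%S")  # output: day_month_year
    return "MBON_{}_site_{}.txt".format(stamp, site)

print(station_file_name("1/7/20 18:30:09", "KW2"))
# MBON_07_01_20T183009_site_KW2.txt
```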

This allows you to choose a file to run. 
**This could be improved by having it search all files in a directory within the else statement.** Currently, you need to enter the file name *a priori*.
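
A rough sketch of the directory search mentioned above, assuming the downloads share the `.raw` extension of the default file. The function `find_raw_files` is a hypothetical helper; the script itself still processes a single file.

```python
from pathlib import Path

def find_raw_files(directory=".", pattern="*.raw"):
    """Return the files in `directory` that match `pattern`, sorted by name."""
    return sorted(Path(directory).glob(pattern))

# list the candidate files in the current directory
for path in find_raw_files("."):
    print(path.name)
```

Each returned path could then be opened in turn in place of `sample_file_name`.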

In [None]:
choose_file = input("Do you want to run a specific file (yes|y|YES)? ")
if choose_file in ("yes", "y", "YES"):
    sample_file_name = input("What's the file name? ")
else:
    sample_file_name = 'WS20006_full_download.raw'

# open sample file
f = open(sample_file_name,"r")

Read a file

In [None]:
# use readline() to read the first line
line_of_text = f.readline()

while line_of_text:
        parsed_line = parse(line_fmt, line_of_text)
        # print(parsed_line)
        # this checks if the date is in correct format, assumes data will be correct if date is
        try:
            time_of_sample = "{}/{}/{} {}:{}:{}".format(
                parsed_line[0], parsed_line[1], parsed_line[2],
                parsed_line[3], parsed_line[4], parsed_line[5]
            )
            dt = datetime.strptime(time_of_sample, dt_fmt)
        except Exception:  # the errors involves looping the error 46 times, then continuing
            #print(time_of_sample)
            # appends the malformed line to an error file, then moves on to the next line
            with open('files_with_errors.txt', "a") as error_file:
                error_file.write(line_of_text)
            line_of_text = f.readline()
            continue

        # if dt_prev is still empty (first valid line), take the current dt minus 1 sec to use as a comparison
        if dt_prev == "":
            dt_prev = dt - timedelta(seconds=1)
        dt_current = dt

        # checks that the time difference is not negative, which would mean an error
        # (e.g. subtracting two times gives a negative result, 1:00:00 - 2:00:00 = -1:00:00)
        
        # TODO: make a way to check the next two lines to see if the time difference goes back down.
        #  There is an error where the time jumps up by more than 10 minutes and then jumps back down,
        #  so a new file is started, but then the time is negative (IGNORE if doesn't make sense)
        if dt_current - dt_prev < timedelta(milliseconds=0):
            print(dt_current, dt_prev) # shows the time of error and will help with diagnosing later
            with open('files_with_errors.txt', "a") as error_file:
                error_file.write(line_of_text)
            # the next line is read by the readline() at the end of the loop;
            # reading here as well would silently skip a line
        
        # compares consecutive lines; if they are less than 5 mins apart they belong to the same site
        # (a gap of 5 mins or more is assumed to be a new site)
        elif dt_current - dt_prev < timedelta(minutes=5):
            # for the next iteration, sets current to prev
            dt_prev = dt_current
            
            # if no station file exists yet, create one and ask for the station name
            if file_name == "":
                i = dt_current.strftime("%d_%m_%yT%H%M%S")
                # printed so the timestamp can be looked up on the spreadsheet to label the site
                print('Old time: ' + str(dt_current))
                # edit in later, to name site during run
                site = input("What is the site ID?")
                file_name = 'MBON_{}_site_{}.txt'.format(i, site)
            # append the line to the current station file and close the file again
            with open(folder_name + file_name, "a") as newfile:
                newfile.write(line_of_text)
                
        # if >= 5 mins, start a new file for a new station
        else:
            dt_prev = dt_current
            i = dt_current.strftime("%d_%m_%yT%H%M%S")
            # printed so the timestamp can be looked up on the spreadsheet to label the site
            print('Old time: ' + str(dt_current))
            # edit in later, to name site during run
            site = input("What is the site ID?")
            #site = site + 1         #edit out later
            file_name = 'MBON_{}_site_{}.txt'.format(i,site)
            # append the line to the new station file and close the file again
            with open(folder_name + file_name, "a") as newfile:
                newfile.write(line_of_text)
        line_of_text = f.readline()
f.close()
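
The 5-minute rule in the loop above can also be looked at in isolation. The sketch below is a simplified illustration that works on a list of datetimes instead of a file; `split_by_gap` is a hypothetical name, and the sketch drops backward steps rather than logging them as the script does.

```python
from datetime import datetime, timedelta

def split_by_gap(timestamps, gap=timedelta(minutes=5)):
    """Group time stamps into stations: a step of `gap` or more starts a new group."""
    groups = []
    for ts in timestamps:
        if groups and ts < groups[-1][-1]:
            continue  # backward step: the script writes these lines to the error file
        if groups and ts - groups[-1][-1] < gap:
            groups[-1].append(ts)   # same station
        else:
            groups.append([ts])     # first line, or a gap of 5 min or more
    return groups

fmt = "%m/%d/%y %H:%M:%S"
times = [datetime.strptime(s, fmt) for s in (
    "01/07/20 10:25:31", "01/07/20 10:25:32",
    "01/07/20 11:34:51", "01/07/20 11:34:52",
)]

print([len(g) for g in split_by_gap(times)])  # [2, 2]
```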