**Author**: Jesse Woo

This notebook collects the records from all one-hot encoding output files into a single list and saves that list as a `pickle` buffer file.

In [1]:
import os
import glob
import ast
import pickle
import numpy as np
from Bio import SeqIO

In [2]:
output_folder_path = "/../team_neural_network/code/utility/Documentation_temp/1b_output" + "/"
path_to_buffer_file = "/../team_neural_network/code/utility/Documentation_temp/1b_buffer" + "/" + "1000_random_sequence_buffer.txt" 
# NOTE: the buffer file does not need to exist beforehand. Just give its path
#       and file name here; the file is created when the buffer is written.
#       The folder that contains it must already exist.

The following cell reads every one-hot encoding file in the output folder and concatenates their records into a single list, `seq_record_list`.

In [3]:
all_txts = glob.glob(output_folder_path + '*.txt')
seq_record_list = []
# Iterate through all one-hot encoding files, counting from 1
for i, txt_ in enumerate(all_txts, start=1):
    print("files processed: " + str(i))
    with open(txt_, encoding='utf-8') as f:
        # Parse the file as a Python list literal and append its records to seq_record_list
        seq_record_list.extend(ast.literal_eval(f.read()))
print("All Files have been processed! The number of distinct sequences are: " + str(len(seq_record_list)))

files processed: 1
files processed: 2
All Files have been processed! The number of distinct sequences are: 2
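
As a rough illustration of what the loop above relies on: each `.txt` file is assumed to hold the text of a Python list literal, which `ast.literal_eval` parses back into nested lists without executing arbitrary code. The file name and the shortened record below are made up for this sketch and are not part of the original data.

```python
import ast
import os
import tempfile

# Hypothetical, shortened record in the same shape as the notebook's output.
sample_records = [
    ["Region_ID_1|1|dkik|-|2537", "1", "dkik", [[0, 0, 1, 0], [0, 1, 0, 0]]],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Made-up file name, standing in for one file in the output folder.
    sample_path = os.path.join(tmp_dir, "sample_one_hot.txt")

    # Write the list as its literal text representation.
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(repr(sample_records))

    # Read it back the same way the notebook cell does.
    with open(sample_path, encoding="utf-8") as f:
        parsed = ast.literal_eval(f.read())

print(parsed == sample_records)
```

`ast.literal_eval` only accepts literals (lists, strings, numbers, and so on), so a malformed file raises an error instead of being silently misread.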


The following cell saves `seq_record_list` as a `pickle` buffer so that it can be reloaded much faster next time.

In [None]:
with open(path_to_buffer_file, "wb") as buff:
    pickle.dump(seq_record_list, buff)
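
A minimal sketch of the same save-and-reload round trip, run against a temporary folder so it does not touch the real buffer. The record and the file name `example_buffer.txt` are hypothetical. The `os.makedirs` call reflects the fact that `open(..., "wb")` creates the file but not a missing parent folder.

```python
import os
import pickle
import tempfile

# Hypothetical sample standing in for seq_record_list.
records = [["Region_ID_1|1|dkik|-|2537", "1", "dkik", [[0, 0, 1, 0]]]]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Made-up buffer location inside the temporary folder.
    buffer_path = os.path.join(tmp_dir, "1b_buffer", "example_buffer.txt")

    # Create the parent folder first; opening the file will not do it.
    os.makedirs(os.path.dirname(buffer_path), exist_ok=True)

    # Save the list in binary mode, as the notebook cell does.
    with open(buffer_path, "wb") as buff:
        pickle.dump(records, buff)

    # Reload it and confirm nothing changed in the round trip.
    with open(buffer_path, "rb") as buff:
        restored = pickle.load(buff)

print(restored == records)
```

Because `pickle` can execute code while loading, it is safest to load only buffer files that you created yourself.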

In [4]:
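# Load a buffer back and print it to check its contents.
# NOTE: this path points to 1b_buffer.txt, not to path_to_buffer_file defined above.
with open("/../team_neural_network/code/utility/Documentation_temp/1b_buffer/1b_buffer.txt", "rb") as buffer_file:
    print(pickle.load(buffer_file))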

[['Region_ID_1|1|dkik|-|2537', '1', 'dkik', [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]], ['Region_ID_2|1|dkik|-|2500', '1', 'dkik', [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]]]
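
The sketch below takes the first record printed above and inspects its layout. It assumes each record has four items (a header string, two short strings, and the one-hot matrix), which matches the output shown. The meaning of the individual fields and the mapping from one-hot column to nucleotide are not stated in this notebook, so the code reports only column indices.

```python
# First record copied from the output above.
record = ['Region_ID_1|1|dkik|-|2537', '1', 'dkik',
          [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0],
           [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1],
           [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1],
           [1, 0, 0, 0]]]

header, field_2, field_3, one_hot = record

# The header is pipe-separated; its second and third parts repeat the
# two short strings stored next to it.
header_parts = header.split("|")

# Every row should contain exactly one 1 across its four columns.
rows_are_one_hot = all(len(row) == 4 and sum(row) == 1 for row in one_hot)

# Position of the 1 in each row (column index only, no base letters assumed).
column_indices = [row.index(1) for row in one_hot]

print(len(one_hot), rows_are_one_hot)
print(column_indices)
```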
