<div align="center">

# 🚀 Spearecode Preprocessing 🚀

</div>

<br>

Welcome to the **Spearecode Preprocessing Notebook**! This notebook will guide you through the necessary preprocessing steps to prepare a toy dataset for Language Model training. We will focus on making the dataset more suitable for training by performing the following steps:

1. 📚 **Loading the dataset**: We'll start by importing the dataset from a file or external source.
2. 📦 **Chunking the text**: The dataset will be divided into smaller chunks or segments, making it easier to process during training.
3. 💬 **Tokenization**: Each chunk of text will be split into individual tokens (words or subwords), which are the basic units for language models.
4. 📊 **Basic Exploratory Data Analysis (EDA)**: We'll analyze the dataset's characteristics, such as token frequency, to gain insights and identify potential issues.

After completing the preprocessing and EDA, the toy dataset will be converted into `TFRecords` format. This efficient binary format is designed for use with TensorFlow and will enable seamless integration with your Language Model training pipeline.

Let's dive in and start preprocessing the dataset! 🎉


<br><br>

<div align="center">

# 🌟 Table of Contents 🌟

</div>

---

0. [**Setup**](#setup)
1. [**Loading the Dataset**](#loading-the-dataset)
2. [**Chunking the Text**](#chunking-the-text)
3. [**Tokenization**](#tokenization)
4. [**Basic Exploratory Data Analysis (EDA)**](#basic-eda)
5. [**Converting to TFRecords**](#converting-to-tfrecords)

---



<br>

<div align="center">

## 🛠️ Setup <a name="setup"></a>

</div>

<br>

In this section, we'll import the required libraries and the helper functions from our utilities modules. We'll also define the relevant paths and high-level information we may need later, and run a few basic TensorFlow setup steps to help keep runs fast and reproducible.

In [1]:
# !pip install --upgrade tokenizer-viz

# Regular imports (native python and pypi packages)
import os
import sys
import random
import numpy as np
import pandas as pd
from glob import glob
import tensorflow as tf
import sentencepiece as spm
from IPython.display import HTML, display
from tokenizer_viz import TokenVisualization
from tqdm.notebook import tqdm; tqdm.pandas()

# Add project root into path so imports work
PROJECT_DIR = os.path.dirname(os.getcwd())
sys.path.insert(0, PROJECT_DIR) 

# Our project imports
from spearecode.utils.preprocessing_utils import (
    load_from_txt_file, preprocess_shakespeare, save_to_txt_file, print_check_speare
)
from spearecode.utils.general_utils import (
    tf_xla_jit, tf_set_memory_growth, seed_it_all, flatten_l_o_l, print_ln
)
from spearecode.utils.filtering_utils import (
    save_ds_version, drop_str_from_col_names, pad_truncate_centered,
    get_metadata_df, check_chunks, tokenize, get_n_tokens,
    get_n_lines, get_n_chars
)
from spearecode.utils.tfrecord_utils import write_tfrecords, load_tfrecord_dataset


### DEFINE PATHS --- [PROJECT_DIR="/home/paperspace/home/spearecode"] --- ###
NBS_PATH = os.path.join(PROJECT_DIR, "nbs")
DATA_PATH = os.path.join(PROJECT_DIR, "data")
SS_TEXT_PATH = os.path.join(DATA_PATH, "t8.shakespeare.txt")
PREPROCESSED_FULL_TEXT_PATH = SS_TEXT_PATH.replace(".txt", "_preprocessed.txt")

<br>

<div align="center">

## 📚 Loading the Dataset <a name="loading-the-dataset"></a>

</div>

<br>

In this section, we'll import the dataset from a file or external source. The dataset will be read into memory, allowing us to manipulate and process the text as needed throughout the preprocessing steps.
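
As a rough illustration of what this loading step involves, here is a minimal sketch using only the standard library. The names `load_text` and `summarize_text` are hypothetical stand-ins, not the project's `load_from_txt_file` or `print_check_speare` helpers, whose implementations are not shown in this notebook.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Read a whole text file into a single string (hypothetical helper)."""
    return Path(path).read_text(encoding=encoding)


def summarize_text(text):
    """Return basic size statistics: number of characters and number of lines."""
    return {"n_chars": len(text), "n_lines": len(text.splitlines())}


# Tiny in-memory example instead of the real Shakespeare file
sample = "From fairest creatures\nwe desire increase,\n"
print(summarize_text(sample))
```

Reading the whole file into memory is reasonable here because the text is only a few megabytes; a much larger corpus would likely need to be streamed instead.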


In [2]:
raw_text = load_from_txt_file(SS_TEXT_PATH)
ss_text = preprocess_shakespeare(raw_text)
save_to_txt_file(ss_text, PREPROCESSED_FULL_TEXT_PATH)
print_check_speare(ss_text)


... DATASET INFO:
	NUMBER OF CHARS --> 5,419,872
	NUMBER OF LINES --> 120,696


... FIRST 1000 CHARACTERS:



1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the grave and thee.

2
  When forty winters shall besiege thy brow,
  And dig deep trenches in thy beauty's field,
  Thy youth's proud livery so gazed on now,
  Will be a tattered weed of small worth held:  
  Then being asked, where all thy beauty lies,
  Where al

<br>

<div align="center">

## 📦 Chunking the Text <a name="chunking-the-text"></a>

</div>

<br>

Once the dataset is loaded, we'll divide it into smaller chunks or segments. This step makes the dataset more similar to code files (the type of data we will be using in the other parallel streams).

We implement two simple methods:
1. A basic double-newline split **(`\n\n`)**, resulting in 6294 chunks
2. LangChain's `RecursiveCharacterTextSplitter`, which chunks to a particular text length
    * This allows us to specify our desired text length and even overlap the chunks.
        * Note that we allow a small amount of overlap, which may cause some leakage between neighbouring chunks; we accept that for this toy dataset.
    * **We will use this method for our purposes.**
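
To make the roles of chunk size and overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); `sliding_window_chunks` is a hypothetical helper that only illustrates the two parameters.

```python
def sliding_window_chunks(text, chunk_size=1024, chunk_overlap=128):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Neighbouring chunks share their boundary character when the overlap is 1
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The shared characters at chunk boundaries are the source of the leakage mentioned above: if neighbouring chunks land in different data splits, a model may see the same text in both.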
    


In [3]:
def do_rcts_chunking(text, chunk_size=1024, chunk_overlap=128, length_fn=len):
    """
    Perform Recursive Character Text Splitting (RCTS) chunking on the input text.
    
    Args:
        text (str): The input text to be chunked.
        chunk_size (int): The maximum size of each chunk.
        chunk_overlap (int): The number of overlapping characters between adjacent chunks.
        length_fn (callable, optional): Function to calculate the length of the text. Defaults to len.
    
    Returns:
        list: A list of chunked text segments.
    """
    # Import the RecursiveCharacterTextSplitter from langchain.text_splitter module
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    
    # Instantiate the text splitter with the specified parameters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=length_fn,
    )
    
    # Split the input text into chunks
    docs = text_splitter.create_documents([text])
    
    # Return the list of chunked text segments
    return [x.page_content for x in docs if len(x.page_content)>1]

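def do_basic_chunking(text, chunk_delimeter="\n\n", max_length=1800, min_length=300):
    """
    Perform basic chunking on the input text using the specified delimiter.
    
    Oversized chunks are re-split on single newlines, and undersized chunks are
    merged together until they exceed the minimum length.
    
    Args:
        text (str): The input text to be chunked.
        chunk_delimeter (str, optional): The delimiter used to split the text. Defaults to "\n\n".
        max_length (int, optional): Chunks longer than this many characters are re-split on newlines. Defaults to 1800.
        min_length (int, optional): Chunks shorter than this many characters are merged together. Defaults to 300.
    
    Returns:
        list: A list of chunked text segments, in reverse document order (see the note on pop() below).
    """
    # Split the input text based on the specified delimiter (ensure no empty chunks by stripping from ends)
    raw_docs = text.strip(chunk_delimeter).split(chunk_delimeter)
    tmp_docs = []
    docs = []
    
    while len(raw_docs)>0:
        # pop() takes from the end of the list, so the text is processed back to front:
        # chunks come out in reverse document order, and merged small chunks are joined in that same order
        doc = raw_docs.pop()
        
        if len(doc)>max_length and "\n" in doc:
            # Too long: split into lines and re-queue them
            # (the "\n" check stops an unsplittable long line from being re-queued forever)
            raw_docs+=doc.split("\n")
        elif len(doc)<min_length:
            # Too short: hold it back until enough small pieces have accumulated
            tmp_docs.append(doc)
        else:
            docs.append(doc)
            
        # Flush the accumulated small pieces once together they exceed the minimum length
        if len("\n".join(tmp_docs))>min_length:
            docs.append("\n".join(tmp_docs))
            tmp_docs = []
    # Keep any leftover small pieces as a final (possibly short) chunk
    if tmp_docs:
        docs.append("\n".join(tmp_docs))
    
    # Return the list of chunked text segments
    return docs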

In [4]:
# Feel free to pass non-default kwargs 
#    -- otherwise the rcts chunks will overlap by 128 and be at most 1024 characters long
CHUNK_STYLE = "basic" # one of ['basic' | 'rcts']
basic_chunks = do_basic_chunking(ss_text)
rcts_chunks = do_rcts_chunking(ss_text)

print("\n... FIRST BASIC CHUNK ...\n")
print(basic_chunks[0])

print("\n... FIRST RCTS CHUNK ...\n")
print(rcts_chunks[0])

print("\n... EXAMPLE RANDOM BASIC CHUNK ...\n")
print(random.sample(basic_chunks, 1)[0])

print("\n... EXAMPLE RANDOM RCTS CHUNK ...\n")
print(random.sample(rcts_chunks, 1)[0])

print("\n... LAST BASIC CHUNK ...\n")
print(basic_chunks[-1])

print("\n... LAST RCTS CHUNK ...\n")
print(rcts_chunks[-1])



... FIRST BASIC CHUNK ...

  'O, that infected moisture of his eye,
  O, that false fire which in his cheek so glowed,
  O, that forced thunder from his heart did fly,
  O, that sad breath his spongy lungs bestowed,
  O, all that borrowed motion, seeming owed,
  Would yet again betray the fore-betrayed,
  And new pervert a reconciled maid.'

... FIRST RCTS CHUNK ...

1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the g

<br>

<div align="center">

## 💬 Tokenization <a name="tokenization"></a>

</div>

<br>

In this section, we'll tokenize the text, which involves splitting the chunks into individual tokens (words or subwords). Tokenization is an essential step in preprocessing, as it helps the Language Model understand the basic units of the text and learn meaningful patterns.

* We train our tokenizers on the full, non-chunked dataset (after basic preprocessing).
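
To give some intuition for how subword vocabularies are built, here is a toy sketch of a single byte-pair-encoding (BPE) merge step in pure Python. It is only an illustration: SentencePiece's real BPE trainer is considerably more involved, and the unigram model is trained differently (roughly, by pruning a large candidate vocabulary). The helper names `most_frequent_pair` and `merge_pair` are hypothetical.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Count adjacent symbol pairs across all sequences and return the most common one."""
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))
    return pair_counts.most_common(1)[0][0] if pair_counts else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


# Toy corpus: every word starts out as a list of single characters
words = [list("thee"), list("thou"), list("the")]
pair = most_frequent_pair(words)
words = [merge_pair(word, pair) for word in words]
print(pair, words)
```

Repeating this merge step until the vocabulary reaches a target size is the basic idea behind BPE; the `vocab_size` setting in the training cell plays that role for the real tokenizers.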


In [5]:
# Setup model directory if not already setup
MODEL_DIR = os.path.join(os.path.dirname(DATA_PATH), "models")
if not os.path.isdir(MODEL_DIR): os.makedirs(MODEL_DIR, exist_ok=True)

# User defined parameters (matching alphafold and code tokenization standards)
MODEL_PATH = os.path.join(MODEL_DIR, 'spearecode')
USER_DEFINED_SYMBOLS = ["\n","\t","\r","\f","\v", "[MASK]"]
VOCAB_SIZE = 8_000
CHAR_COVERAGE = 1.0000
PREPEND_CLS = False
if PREPEND_CLS: USER_DEFINED_SYMBOLS.append("[CLS]")  # append the string itself (not a nested list), matching the "[CLS]" prefix


# Tokenizer parameters (and some defaults)
base_tokenizer_kwargs = dict(
    input = PREPROCESSED_FULL_TEXT_PATH,
    vocab_size=VOCAB_SIZE,
    character_coverage=CHAR_COVERAGE,
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
    remove_extra_whitespaces=False,
    allow_whitespace_only_pieces=True,
    add_dummy_prefix=False,
    user_defined_symbols=USER_DEFINED_SYMBOLS,
    normalization_rule_name="identity",
    num_threads=os.cpu_count(),
)

unigram_tokenizer_kwargs = base_tokenizer_kwargs.copy()
unigram_tokenizer_kwargs.update(dict(
    model_prefix=MODEL_PATH+"_uni",
    model_type="unigram",
))

bpe_tokenizer_kwargs = base_tokenizer_kwargs.copy()
bpe_tokenizer_kwargs.update(dict(
    model_prefix=MODEL_PATH+"_bpe",
    model_type="bpe",
))

# train_tokenizer(ALL_TXT_PATHS, MODEL_PATH, VOCAB_SIZE, TOKENIZER_STYLE)
spm.SentencePieceTrainer.Train(**unigram_tokenizer_kwargs)
spm.SentencePieceTrainer.Train(**bpe_tokenizer_kwargs)

sp_uni = spm.SentencePieceProcessor()
sp_uni.load(f'{unigram_tokenizer_kwargs["model_prefix"]}.model')
uni_encoder = lambda x: sp_uni.encode(x)
uni_decoder = lambda x: sp_uni.decode(x)

sp_bpe = spm.SentencePieceProcessor()
sp_bpe.load(f'{bpe_tokenizer_kwargs["model_prefix"]}.model')
bpe_encoder = lambda x: sp_bpe.encode(x)
bpe_decoder = lambda x: sp_bpe.decode(x)

sentencepiece_trainer.cc(77) LOG(INFO) Starts training with : 
trainer_spec {
  input: /home/paperspace/home/spearecode/data/t8.shakespeare_preprocessed.txt
  input_format: 
  model_prefix: /home/paperspace/home/spearecode/models/spearecode_uni
  model_type: UNIGRAM
  vocab_size: 8000
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 8
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 1
  user_defined_symbols: 

  user_defined_symbols: 	
  user_defined_symbols: 
  user_defined_symbols: 
  user_defined_symbols: 
  user_defined_symbols: [MASK]
  required_chars: 
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 1
  

In [6]:
print("\n... BPE TOKENIZATION:")
bpe_token_viz = TokenVisualization(
    encoder=bpe_encoder,
    decoder=bpe_decoder,
    background_color="#FBFBFB"
)
_ = bpe_token_viz.visualize(basic_chunks[0], display_inline=True)

print("\n... UNIGRAM TOKENIZATION:")
uni_token_viz = TokenVisualization(
    encoder=uni_encoder,
    decoder=uni_decoder,
    background_color="#FBFBFB"
)
_ = uni_token_viz.visualize(basic_chunks[0], display_inline=True)


### FOR FUN AND VIZ ###
print("\n... CHAR TOKENIZATION:")
dumb_char_map_s2i = {x:i for i, x in enumerate(set(basic_chunks[0]))}
dumb_char_map_i2s = {v:k for k,v in dumb_char_map_s2i.items()}
char_token_viz = TokenVisualization(
    encoder=lambda x: [dumb_char_map_s2i.get(_x) for _x in x] if type(x)==str else dumb_char_map_s2i.get(x),
    decoder=lambda x: "".join([dumb_char_map_i2s.get(_x) for _x in x]) if type(x)==list else dumb_char_map_i2s.get(x),
    background_color="#FBFBFB"
)
_ = char_token_viz.visualize(basic_chunks[0], display_inline=True)



... BPE TOKENIZATION:



... UNIGRAM TOKENIZATION:



... CHAR TOKENIZATION:


<br>

<div align="center">

## 📊 Basic Exploratory Data Analysis (EDA) <a name="basic-eda"></a>

</div>

<br>

Here, we'll perform a basic EDA on the dataset to gain insights and identify potential issues. This analysis may include examining token frequency, distribution of chunk lengths, and other relevant characteristics. This information can be helpful in understanding the dataset's structure and guiding further preprocessing decisions.
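
As a small sketch of the kind of statistics meant here, the snippet below computes token frequencies and a chunk-length percentile with the standard library only. The helper names and the toy token ids are made up for illustration; the notebook itself relies on the project's own utilities and on pandas.

```python
from collections import Counter


def token_frequencies(tokenized_chunks):
    """Count how often each token id appears across all chunks."""
    counts = Counter()
    for token_ids in tokenized_chunks:
        counts.update(token_ids)
    return counts


def length_percentile(tokenized_chunks, pct):
    """Nearest-rank percentile (integer pct in 1..100) of chunk lengths in tokens."""
    lengths = sorted(len(token_ids) for token_ids in tokenized_chunks)
    if not lengths:
        raise ValueError("need at least one chunk")
    rank = max(1, -(-pct * len(lengths) // 100))  # integer ceil(pct * n / 100)
    return lengths[rank - 1]


# Made-up token ids, purely for illustration
toy_chunks = [[5, 7, 7], [7, 2], [5]]
print(token_frequencies(toy_chunks).most_common(2))
print(length_percentile(toy_chunks, 90))
```

A high percentile of the chunk lengths is one reasonable way to pick a fixed sequence length, since it bounds how many chunks would need to be truncated.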

We will use the metadata columns created here to build several versions of the dataset:

<br>

<table>
  <thead>
    <tr>
      <th style="text-align: center; font-weight: bold; width: 15%;">Version</th>
      <th style="text-align: center; font-weight: bold; width: 85%;">Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center;"><strong>v1</strong></td>
        <td>No filtering, no chunks removed. We just generate metadata and optionally prepend a <b>[CLS]</b> token to every chunk.</td>
    </tr>
    <tr>
      <td style="text-align: center;"><strong>v2</strong></td>
      <td>Split into individual datasets for BPE and unigram within each chunking technique (4 datasets total)<br>Rename columns so they no longer name the tokenization method, which lets downstream code treat all four datasets the same way</td>
    </tr>
    <tr>
      <td style="text-align: center;"><strong>v3</strong></td>
      <td>Drop small chunks<br>Drop really big chunks</td>
    </tr>
    <tr>
      <td style="text-align: center;"><strong>v4</strong></td>
      <td>Evenly pad/truncate tokenized sequences to a reasonable fixed length (close to the max length, e.g. the 90th percentile?)</td>
    </tr>
  </tbody>
</table>
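
To illustrate the v4 step, here is a minimal sketch of one way to bring token sequences to a fixed length. It is an assumption about what "evenly" padding and truncating could mean, not the project's `pad_truncate_centered` implementation (which is not shown in this notebook): long sequences keep their middle portion and short ones are padded on the right. The name `pad_or_center_crop` is hypothetical, and `pad_id=0` mirrors the `pad_id` passed to the tokenizer trainer.

```python
def pad_or_center_crop(token_ids, target_len, pad_id=0):
    """Return exactly target_len ids: center-crop long sequences, right-pad short ones."""
    n = len(token_ids)
    if n >= target_len:
        # Drop (roughly) equal amounts from both ends so the middle is kept
        start = (n - target_len) // 2
        return list(token_ids[start:start + target_len])
    # Too short: append padding ids until the target length is reached
    return list(token_ids) + [pad_id] * (target_len - n)


print(pad_or_center_crop([1, 2, 3, 4, 5, 6], target_len=4))
print(pad_or_center_crop([1, 2], target_len=4))
```

Fixed-length sequences make batching straightforward, at the cost of wasted padding for short chunks and lost tokens for long ones.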

<br>



In [7]:
# 1. Setup - Create directories if necessary    
DATASET_DIR = os.path.join(DATA_PATH, "datasets")
if not os.path.isdir(DATASET_DIR): os.makedirs(DATASET_DIR, exist_ok=True)
META_DIR = os.path.join(DATASET_DIR, "meta")
if not os.path.isdir(META_DIR): os.makedirs(META_DIR, exist_ok=True)


# 2. Dataframe and metadata creation
#       - Instantiate
#       - Create metadata columns
#       - Create metadata dataframe (optional)
basic_chunk_df = pd.DataFrame({"content":basic_chunks})
rcts_chunk_df = pd.DataFrame({"content":rcts_chunks})

for _df in [basic_chunk_df, rcts_chunk_df]:
    if PREPEND_CLS: _df["content"] = "[CLS] "+_df["content"]
    _df["uni_token_content"] = _df["content"].progress_apply(lambda x: tokenize(x, uni_encoder))
    _df["bpe_token_content"] = _df["content"].progress_apply(lambda x: tokenize(x, bpe_encoder))
    _df["n_uni_tokens"] = _df["uni_token_content"].apply(get_n_tokens)
    _df["n_bpe_tokens"] = _df["bpe_token_content"].apply(get_n_tokens)
    _df["n_chars"] = _df["content"].apply(get_n_chars)
    _df["n_lines"] = _df["content"].apply(get_n_lines)
    _df["valid_uni_chunk"] = _df["n_uni_tokens"].apply(check_chunks)
    _df["valid_bpe_chunk"] = _df["n_bpe_tokens"].apply(check_chunks)

basic_chunk_df_meta = get_metadata_df(basic_chunk_df)
rcts_chunk_df_meta = get_metadata_df(rcts_chunk_df)


# 3. Versioning
    
######################################## v1 ########################################
# Save the previously created datasets along with the manually created metadata
####################################################################################
save_ds_version(rcts_chunk_df, "rcts", version_str="v1", meta_dir=META_DIR, ds_dir=DATASET_DIR, meta_df=rcts_chunk_df_meta)
save_ds_version(basic_chunk_df, "basic", version_str="v1", meta_dir=META_DIR, ds_dir=DATASET_DIR, meta_df=basic_chunk_df_meta)
####################################################################################

######################################## v2 ########################################
# Split bpe and unigram into their own dataframes (meta is generated automatically)
####################################################################################
rcts_uni_chunk_df = rcts_chunk_df.copy().drop(columns=[_c for _c in rcts_chunk_df.columns if "bpe" in _c])
basic_uni_chunk_df = basic_chunk_df.copy().drop(columns=[_c for _c in basic_chunk_df.columns if "bpe" in _c])
rcts_bpe_chunk_df = rcts_chunk_df.copy().drop(columns=[_c for _c in rcts_chunk_df.columns if "uni" in _c])
basic_bpe_chunk_df = basic_chunk_df.copy().drop(columns=[_c for _c in basic_chunk_df.columns if "uni" in _c])

# Rename columns
rcts_uni_chunk_df = drop_str_from_col_names(rcts_uni_chunk_df, "uni")
rcts_bpe_chunk_df = drop_str_from_col_names(rcts_bpe_chunk_df, "bpe")
basic_uni_chunk_df = drop_str_from_col_names(basic_uni_chunk_df, "uni")
basic_bpe_chunk_df = drop_str_from_col_names(basic_bpe_chunk_df, "bpe")

save_ds_version(rcts_uni_chunk_df, "rcts_uni", version_str="v2", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(rcts_bpe_chunk_df, "rcts_bpe", version_str="v2", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_uni_chunk_df, "basic_uni", version_str="v2", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_bpe_chunk_df, "basic_bpe", version_str="v2", meta_dir=META_DIR, ds_dir=DATASET_DIR)
####################################################################################

######################################## v3 ########################################
# Filtering (big and little get dropped)
####################################################################################

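# Filter and drop valid chunk col
# NOTE: rebinding the loop variable inside a for-loop would leave the original dataframes unfiltered,
#       so we build the filtered dataframes and reassign each one by name
rcts_uni_chunk_df, rcts_bpe_chunk_df, basic_uni_chunk_df, basic_bpe_chunk_df = [
    _df[_df.valid_chunk].drop(columns=["valid_chunk"]).reset_index(drop=True)
    for _df in [rcts_uni_chunk_df, rcts_bpe_chunk_df, basic_uni_chunk_df, basic_bpe_chunk_df]
]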

# Save
save_ds_version(rcts_uni_chunk_df, "rcts_uni", version_str="v3", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(rcts_bpe_chunk_df, "rcts_bpe", version_str="v3", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_uni_chunk_df, "basic_uni", version_str="v3", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_bpe_chunk_df, "basic_bpe", version_str="v3", meta_dir=META_DIR, ds_dir=DATASET_DIR)
####################################################################################

# ######################################## v4 ########################################
# # Padding and truncation --> basic upper limit of 384 (assuming context lengths of 64-128)
# ####################################################################################
FIXED_CHUNK_SIZE = 384

for _df in [rcts_uni_chunk_df, rcts_bpe_chunk_df, basic_uni_chunk_df, basic_bpe_chunk_df]:
    _df["token_content"] = _df["token_content"].apply(lambda x: pad_truncate_centered(x, FIXED_CHUNK_SIZE))

save_ds_version(rcts_uni_chunk_df, "rcts_uni", version_str="v4", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(rcts_bpe_chunk_df, "rcts_bpe", version_str="v4", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_uni_chunk_df, "basic_uni", version_str="v4", meta_dir=META_DIR, ds_dir=DATASET_DIR)
save_ds_version(basic_bpe_chunk_df, "basic_bpe", version_str="v4", meta_dir=META_DIR, ds_dir=DATASET_DIR)
# ####################################################################################

  0%|          | 0/13896 [00:00<?, ?it/s]

  0%|          | 0/13896 [00:00<?, ?it/s]

  0%|          | 0/7699 [00:00<?, ?it/s]

  0%|          | 0/7699 [00:00<?, ?it/s]

<br>

<div align="center">

## 💾 Converting to TFRecords <a name="converting-to-tfrecords"></a>

</div>

<br>

Finally, after completing the preprocessing steps and EDA, we'll convert the toy dataset into the `TFRecords` format. This efficient binary format is designed for use with TensorFlow and will enable seamless integration with your Language Model training pipeline.
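
Because the examples are spread over several TFRecord files, it helps to see how a dataset might be divided into shards. The sketch below is an assumption about the grouping logic only; it does not reproduce the project's `write_tfrecords` function or any TensorFlow serialization, and `shard_bounds` is a hypothetical helper.

```python
def shard_bounds(n_examples, n_per_shard):
    """Return (start, end) index pairs, one per shard, covering all examples in order."""
    if n_per_shard <= 0:
        raise ValueError("n_per_shard must be positive")
    return [
        (start, min(start + n_per_shard, n_examples))
        for start in range(0, n_examples, n_per_shard)
    ]


# 250 examples at 100 per shard give two full shards and one partial shard
print(shard_bounds(250, 100))
```

With 100 examples per file, a dataset of 7,699 chunks would be split into 77 files, the last of which is only partially filled.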



In [8]:
# Define tfrecord creation constants
TFRECORD_DIR = os.path.join(DATASET_DIR, "tfrecords")
N_PER = 100 # artificially low so this toy dataset yields a realistic number of TFRecord files
VERSION_TO_USE = "v4"

for _df, _suffix in zip([rcts_uni_chunk_df, rcts_bpe_chunk_df, basic_uni_chunk_df, basic_bpe_chunk_df], 
                        ["rcts_uni", "rcts_bpe", "basic_uni", "basic_bpe"]):
    # Create the respective tfrecords
    write_tfrecords(
        ds=_df["token_content"],  n_ex=len(_df),  
        output_suffix=_suffix,  version_str=VERSION_TO_USE, 
        n_ex_per_rec=N_PER, out_dir=TFRECORD_DIR, 
    )

Writing TFRecords:   0%|                                 | 0/77 [00:00<?, ?it/s]

... Writing TFRecord 1 of 77 (100 per TFRecord)...

100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14012.78it/s]



... Writing TFRecord 2 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13551.43it/s][A



... Writing TFRecord 3 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13672.47it/s][A



... Writing TFRecord 4 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13768.97it/s][A



... Writing TFRecord 5 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13846.70it/s][A



... Writing TFRecord 6 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13750.01it/s][A



... Writing TFRecord 7 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13438.97it/s][A



... Writing TFRecord 8 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13557.13it/s][A



... Writing TFRecord 9 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13628.49it/s][A



... Writing TFRecord 10 of 77 (100 per TFRecord)...




100%|███████████████████████████████████████| 100/100 [00:00<00:00, 9763.28it/s][A
Writing TFRecords:  13%|███                     | 10/77 [00:00<00:00, 77.23it/s]


... Writing TFRecord 11 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 12995.92it/s][A



... Writing TFRecord 12 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13694.79it/s][A



... Writing TFRecord 13 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13722.57it/s][A



... Writing TFRecord 14 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13315.25it/s][A



... Writing TFRecord 15 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13504.31it/s][A



... Writing TFRecord 16 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13696.13it/s][A



... Writing TFRecord 17 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13588.31it/s][A



... Writing TFRecord 18 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13825.25it/s][A



... Writing TFRecord 19 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 10151.77it/s][A



... Writing TFRecord 20 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13410.19it/s][A
Writing TFRecords:  26%|██████▏                 | 20/77 [00:00<00:00, 85.06it/s]


... Writing TFRecord 21 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13456.65it/s][A



... Writing TFRecord 22 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13900.39it/s][A



... Writing TFRecord 23 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13015.68it/s][A



... Writing TFRecord 24 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13726.61it/s][A



... Writing TFRecord 25 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13730.66it/s][A



... Writing TFRecord 26 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13595.36it/s][A



... Writing TFRecord 27 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13906.84it/s][A



... Writing TFRecord 28 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13757.68it/s][A



... Writing TFRecord 29 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13704.64it/s][A



... Writing TFRecord 30 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13448.89it/s][A
Writing TFRecords:  39%|█████████▎              | 30/77 [00:00<00:00, 88.79it/s]


... Writing TFRecord 31 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13308.91it/s][A



... Writing TFRecord 32 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14260.03it/s][A



... Writing TFRecord 33 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14116.53it/s][A



... Writing TFRecord 34 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13838.02it/s][A



... Writing TFRecord 35 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13910.99it/s][A



... Writing TFRecord 36 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13848.99it/s][A



... Writing TFRecord 37 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13501.27it/s][A



... Writing TFRecord 38 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13688.53it/s][A



... Writing TFRecord 39 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 11921.73it/s][A



... Writing TFRecord 40 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14030.59it/s][A
Writing TFRecords:  52%|████████████▍           | 40/77 [00:00<00:00, 92.21it/s]


... Writing TFRecord 41 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14064.93it/s][A



... Writing TFRecord 42 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14197.29it/s][A



... Writing TFRecord 43 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13752.72it/s][A



... Writing TFRecord 44 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14317.96it/s][A



... Writing TFRecord 45 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13759.94it/s][A



... Writing TFRecord 46 of 77 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13859.97it/s][A



Writing TFRecords:  66%|███████████████▉        | 51/77 [00:00<00:00, 96.10it/s]
Writing TFRecords:  79%|███████████████████     | 61/77 [00:00<00:00, 95.44it/s]
Writing TFRecords:  92%|██████████████████████▏ | 71/77 [00:00<00:00, 95.38it/s]

... Writing TFRecord 77 of 77 (100 per TFRecord)...

 99%|██████████████████████████████████████▌| 99/100 [00:00<00:00, 14090.61it/s]
Writing TFRecords: 100%|████████████████████████| 77/77 [00:00<00:00, 93.36it/s]
Writing TFRecords:   0%|                                 | 0/77 [00:00<?, ?it/s]
Writing TFRecords:  13%|███                     | 10/77 [00:00<00:00, 99.24it/s]
Writing TFRecords:  26%|██████▏                 | 20/77 [00:00<00:00, 94.62it/s]
Writing TFRecords:  39%|█████████▎              | 30/77 [00:00<00:00, 95.82it/s]
Writing TFRecords:  52%|████████████▍           | 40/77 [00:00<00:00, 94.75it/s]
Writing TFRecords:  65%|███████████████▌        | 50/77 [00:00<00:00, 96.41it/s]
Writing TFRecords:  78%|██████████████████▋     | 60/77 [00:00<00:00, 97.12it/s]
Writing TFRecords:  91%|█████████████████████▊  | 70/77 [00:00<00:00, 96.74it/s]

... Writing TFRecord 77 of 77 (100 per TFRecord)...

 99%|██████████████████████████████████████▌| 99/100 [00:00<00:00, 13742.71it/s]
Writing TFRecords: 100%|████████████████████████| 77/77 [00:00<00:00, 95.51it/s]
Writing TFRecords:   0%|                                | 0/139 [00:00<?, ?it/s]
Writing TFRecords:   6%|█▌                      | 9/139 [00:00<00:01, 79.17it/s]
Writing TFRecords:  14%|███▏                   | 19/139 [00:00<00:01, 88.84it/s]
Writing TFRecords:  21%|████▊                  | 29/139 [00:00<00:01, 93.05it/s]
Writing TFRecords:  28%|██████▍                | 39/139 [00:00<00:01, 92.58it/s]
Writing TFRecords:  35%|████████               | 49/139 [00:00<00:00, 90.55it/s]
Writing TFRecords:  42%|█████████▊             | 59/139 [00:00<00:00, 93.55it/s]
Writing TFRecords:  50%|███████████▌           | 70/139 [00:00<00:00, 96.42it/s]
Writing TFRecords:  58%|█████████████▏         | 80/139 [00:00<00:00, 97.22it/s]
Writing TFRecords:  65%|██████████████▉        | 90/139 [00:00<00:00, 92.35it/s]
Writing TFRecords:  72%|███████████████▊      | 100/139 [00:01<00:00, 92.60it/s]
Writing TFRecords:  80%|█████████████████▌    | 111/139 [00:01<00:00, 95.48it/s]
Writing TFRecords:  87%|███████████████████▏  | 121/139 [00:01<00:00, 94.45it/s]


... Writing TFRecord 122 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13884.75it/s][A



... Writing TFRecord 123 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14250.83it/s][A



... Writing TFRecord 124 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 12311.20it/s][A



... Writing TFRecord 125 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13908.23it/s][A



... Writing TFRecord 126 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13241.69it/s][A



... Writing TFRecord 127 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13103.11it/s][A



... Writing TFRecord 128 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13291.62it/s][A



... Writing TFRecord 129 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13686.75it/s][A



... Writing TFRecord 130 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14024.02it/s][A



... Writing TFRecord 131 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14027.77it/s][A



... Writing TFRecord 132 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13968.91it/s][A
Writing TFRecords:  95%|████████████████████▉ | 132/139 [00:01<00:00, 96.55it/s]


... Writing TFRecord 133 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13975.42it/s][A



... Writing TFRecord 134 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 14082.41it/s][A



... Writing TFRecord 135 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 13788.43it/s][A



... Writing TFRecord 136 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 12144.38it/s][A



... Writing TFRecord 137 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 10882.99it/s][A



... Writing TFRecord 138 of 139 (100 per TFRecord)...




100%|██████████████████████████████████████| 100/100 [00:00<00:00, 12800.00it/s][A



... Writing TFRecord 139 of 139 (100 per TFRecord)...




 96%|█████████████████████████████████████▍ | 96/100 [00:00<00:00, 13454.06it/s][A
Writing TFRecords: 100%|██████████████████████| 139/139 [00:01<00:00, 93.85it/s]
Writing TFRecords:   0%|                                | 0/139 [00:00<?, ?it/s]


... Writing TFRecord 139 of 139 (100 per TFRecord)...




 96%|█████████████████████████████████████▍ | 96/100 [00:00<00:00, 13901.85it/s][A
Writing TFRecords: 100%|██████████████████████| 139/139 [00:01<00:00, 93.35it/s]
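The log above reports 139 TFRecords at up to 100 examples each, with the last inner progress bar stopping at 96 of 100. That is consistent with fixed-size sharding of 13,896 examples (138 × 100 + 96). A minimal sketch of that arithmetic, assuming plain fixed-size sharding; `shard_sizes` is a hypothetical helper for illustration and not one of the notebook's utilities:

```python
import math


def shard_sizes(n_examples: int, per_shard: int) -> list[int]:
    """Return how many examples land in each shard under fixed-size sharding."""
    if n_examples < 0 or per_shard <= 0:
        raise ValueError("n_examples must be >= 0 and per_shard must be > 0")
    n_shards = math.ceil(n_examples / per_shard)
    # Every shard is full except possibly the last one, which holds the remainder.
    return [min(per_shard, n_examples - i * per_shard) for i in range(n_shards)]


# 13,896 is inferred from the log (138 full shards + 96), not read from the dataset.
sizes = shard_sizes(13_896, 100)
print(len(sizes), sizes[0], sizes[-1])  # 139 100 96
```

A quick check like this can help confirm that no examples were dropped when the number of written files is compared against the size of the source DataFrame.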


<br>

**Check dataset**

In [9]:
DEMO_DS_CHUNK_STYLE, DEMO_DS_TOK_STYLE, DEMO_DS_VERSION = "rcts", "bpe", "v4"
DEMO_TFREC_PATHS = sorted(glob(os.path.join(
    TFRECORD_DIR, f"{DEMO_DS_CHUNK_STYLE}_{DEMO_DS_TOK_STYLE}_{DEMO_DS_VERSION}", "*.tfrec"
)))

# Get respective tooling
demo_ds = load_tfrecord_dataset(DEMO_TFREC_PATHS)
viz_tool = bpe_token_viz if DEMO_DS_TOK_STYLE == "bpe" else uni_token_viz

# Check and compare: decode the first TFRecord example and render it
print("\n... FROM TFRECORD ...\n")
display(HTML(viz_tool.visualize(bpe_decoder(next(iter(demo_ds)).numpy().tolist()))))

# TODO: make modular (bpe_decoder and rcts_bpe_chunk_df are hard-coded to the rcts/bpe variant)... not important
print("\n... FROM PANDAS DATAFRAME ...\n")
display(HTML(viz_tool.visualize(bpe_decoder(rcts_bpe_chunk_df["token_content"][0]))))


... FROM TFRECORD ...




... FROM PANDAS DATAFRAME ...
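
The visual comparison in this cell could be complemented by an exact check on the token IDs themselves. The following is a minimal sketch, assuming both sources yield flat sequences of integer IDs; `first_mismatch` is a hypothetical helper, not part of the notebook's utilities, and the toy lists stand in for the real data:

```python
from typing import Optional, Sequence


def first_mismatch(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where a and b differ, or None if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # All shared positions match; a length difference counts as a mismatch
    # at the first index that exists in only one of the sequences.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Toy token IDs standing in for the TFRecord sample and the DataFrame row.
from_tfrecord = [12, 7, 301, 45]
from_dataframe = [12, 7, 301, 45]
print(first_mismatch(from_tfrecord, from_dataframe))  # None
print(first_mismatch([12, 7, 9], [12, 7, 301, 45]))   # 2
```

In the notebook, the two inputs would presumably be `next(iter(demo_ds)).numpy().tolist()` and `rcts_bpe_chunk_df["token_content"][0]`. If `load_tfrecord_dataset` shuffles or pads examples, a reported mismatch would not necessarily indicate a serialization bug.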



<b>WandB Notes</b>

* Code save
* Logging (HTML)
* WandB config (run config)
* WandB callback (super?)
* run.save (model summary)

