# Dataset Preparation for Phoneme Recognition on TIMIT

## Goals

- Explore the TIMIT dataset.
- Split the dataset into train and test subsets.
- Filter the data according to [[1]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6638947):
    - SA records are removed from both the train and test datasets.
    - **Only the core test subset**, with 24 speakers, is exported for testing.
    - Records missing a phonetic file (phoneme labels) are removed.

- Build phoneme records: (phoneme, path-to-audio, start-index, end-index).
- Remap phonemes from 61 to 39 classes.
- Export as a PyTorch serialized object using `torch.save()`.


**Naming Conventions**

- Lowercase variables are local to a section
- Capitalized variables are used across sections
- Functions are always assumed to be used across sections

# Environment Setup

In [1]:
import torch

import os
import pandas as pd
import librosa
import IPython.display as ipd
from tqdm.auto import tqdm


if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print('Using PyTorch version:', torch.__version__, ' Device:', device)

Using PyTorch version: 2.0.1  Device: cuda


# Loading Dataset

The DARPA TIMIT dataset can be downloaded from 
[here](https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech).
After downloading it, simply unzip it into `./dataset/TIMIT`.
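
Before running the cells that follow, it can help to confirm that the archive was unzipped where the notebook expects it. The snippet below is a minimal sketch, not part of the original workflow: `findMissingEntries` and `EXPECTED_ENTRIES` are hypothetical names, and the entry list is assumed from the paths this notebook reads (`train_data.csv`, `test_data.csv`, and the `data` folder).

```python
from pathlib import Path

# Entries this notebook reads from the unzipped archive (assumed from its paths).
EXPECTED_ENTRIES = ("train_data.csv", "test_data.csv", "data")


def findMissingEntries(timit_root):
    """Return the expected entries that are absent under timit_root."""
    root = Path(timit_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = findMissingEntries("./dataset/TIMIT")
print("Missing entries:", missing if missing else "none")
```

An empty result means the layout matches what the loading cell assumes.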

In [2]:
Timit_path = "./dataset/TIMIT/"
Data_path = "./dataset/TIMIT/data/"
DF_train = pd.read_csv(os.path.join(Timit_path, 'train_data.csv'))
DF_test = pd.read_csv(os.path.join(Timit_path, 'test_data.csv'))

In [3]:
# Remove empty rows
DF_train = DF_train.dropna(how='all')  # train_data.csv has some rows with all missing fields, that's why how='all'
DF_test = DF_test.dropna(how='any')    # test_data.csv has some weird rows with '\\\\\', that's why how='any'
print('DF_train:', DF_train.shape)
print('DF_test :', DF_test.shape)

DF_train: (23100, 12)
DF_test : (8400, 12)


In [4]:
# Remove is_converted_audio rows. These rows point to audio files that
# have gone through some post-processing after recording and contain the
# same sentence spoken by the same speaker. Thus, they are redundant.
DF_train = DF_train[DF_train['is_converted_audio'] == False]
DF_test = DF_test[DF_test['is_converted_audio'] == False]
print('DF_train:', DF_train.shape)
print('DF_test :', DF_test.shape)

DF_train: (18480, 12)
DF_test : (6720, 12)


## Merge Entries

Each entry in the dataframe represents one file. The audio, word, phonetic, and sentence data of a recording are stored in separate files, so we need to combine them into a single record.

In [5]:
def mergeEntries(df_dataset):
    merged_ds = {}  
    
    for idx, row in tqdm(df_dataset.iterrows(), total=len(df_dataset)):
        path = row['path_from_data_dir']
        entry_id = path.split('.')[0]     # strip the file extension; the remaining path is the unique ID
        
        # create the entry if it doesn't exist
        if entry_id not in merged_ds:
            merged_ds[entry_id] = {}
            
        # Add different file types under the same ID
        if row['is_audio'] is True:
            merged_ds[entry_id]['audio_file'] = os.path.join(Data_path, path)
        elif row['is_word_file'] is True:
            merged_ds[entry_id]['word_file'] = os.path.join(Data_path, path)
        elif row['is_phonetic_file'] is True:
            merged_ds[entry_id]['phonetic_file'] = os.path.join(Data_path, path)
        elif row['is_sentence_file'] is True:
            merged_ds[entry_id]['sentence_file'] = os.path.join(Data_path, path)
    return merged_ds


# Merge the dataset entries
Merged_train = mergeEntries(DF_train)
Merged_test = mergeEntries(DF_test)
print('Merged_train:', len(Merged_train))
print('Merged_test :', len(Merged_test))

Merged_train: 4620
Merged_test : 1680


In [6]:
# Check the structure of a record
record_keys = list(Merged_train.keys())
print('Record keys:', record_keys[:3], '...')

record = Merged_train[record_keys[0]]
fields = list(record.keys())
print('fields:', fields)

print('\nrecord values:')
for k,v in record.items():
    print(f'{k:>20}:  {v}')

Record keys: ['TRAIN/DR4/MMDM0/SI1311', 'TRAIN/DR4/MMDM0/SX321', 'TRAIN/DR4/MMDM0/SI681'] ...
fields: ['phonetic_file', 'word_file', 'sentence_file', 'audio_file']

record values:
       phonetic_file:  ./dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.PHN
           word_file:  ./dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WRD
       sentence_file:  ./dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.TXT
          audio_file:  ./dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV
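
Since each record is assembled from up to four separate files, it may be worth checking which records ended up incomplete. The following is a minimal sketch under that assumption; `findIncompleteRecords` and `REQUIRED_FIELDS` are hypothetical helpers, and the toy dictionary only imitates the structure printed above.

```python
REQUIRED_FIELDS = ("audio_file", "word_file", "phonetic_file", "sentence_file")


def findIncompleteRecords(dict_dataset, required=REQUIRED_FIELDS):
    """Map each incomplete record key to the list of fields it lacks."""
    incomplete = {}
    for key, record in dict_dataset.items():
        missing = [field for field in required if field not in record]
        if missing:
            incomplete[key] = missing
    return incomplete


# Toy data for illustration only (not real TIMIT paths)
toy_ds = {
    "TRAIN/DR1/SPK0/SX1": {"audio_file": "a.WAV", "word_file": "a.WRD",
                           "phonetic_file": "a.PHN", "sentence_file": "a.TXT"},
    "TRAIN/DR1/SPK0/SX2": {"audio_file": "b.WAV", "sentence_file": "b.TXT"},
}
print(findIncompleteRecords(toy_ds))
# {'TRAIN/DR1/SPK0/SX2': ['word_file', 'phonetic_file']}
```

Calling it on `Merged_train` or `Merged_test` would list the keys that a later filtering step has to deal with.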


# Filter Dataset

Some records are removed, following the paper [[1]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6638947),
to allow an apples-to-apples comparison.

In [7]:
# Given the key (file path without extension), returns the speaker-name (with gender prefix)
def getSpeakerName(dataset_key):
    # The parent directory of the record is the speaker name.
    # os.path handles '/' separators on every platform, unlike splitting on os.sep.
    return os.path.basename(os.path.dirname(dataset_key))


# Given a dataset dictionary, returns the set of speakers.
def getAllSpeakers(dict_dataset):
    speakers = set()
    for key in dict_dataset:
        name = getSpeakerName(key)
        speakers.add(name)         
    return speakers


# Test getAllSpeakers()
speakers = list(getAllSpeakers(Merged_train))
print('Speaker count:', len(speakers))
print('speakers[0]:', speakers[0])

Speaker count: 462
speakers[0]: FMEM0
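
The record keys carry more than the speaker name. As a sketch, assuming every key has the four-part form seen earlier (`TRAIN/DR4/MMDM0/SI1311`) and that the first letter of the speaker directory is the M/F gender prefix, a key can be split into named parts. `RecordKey` and `parseRecordKey` are hypothetical names introduced here for illustration.

```python
from collections import namedtuple

RecordKey = namedtuple("RecordKey", "subset dialect speaker gender sentence_id")


def parseRecordKey(dataset_key):
    """Split a key such as 'TRAIN/DR4/MMDM0/SI1311' into its components."""
    subset, dialect, speaker, sentence_id = dataset_key.split("/")
    # Assumption: the first letter of the speaker directory is the gender prefix
    return RecordKey(subset, dialect, speaker, speaker[0], sentence_id)


info = parseRecordKey("TRAIN/DR4/MMDM0/SI1311")
print(info.speaker, info.gender, info.dialect)   # MMDM0 M DR4
```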


## Remove the SA records from datasets

In [8]:
# extract the keys of SA records
def getSAKeys(dict_dataset):
    sa_keys = []
    for key in dict_dataset:
        basename = os.path.basename(key)   
        # check if it is an 'SA' record, if yes save it
        if basename[0:2] == 'SA':
            sa_keys.append(key)       
    return sa_keys


# Delete the SA records in train dataset
sa_keys = getSAKeys(Merged_train)
print(f'INFO: {len(sa_keys)} "SA" records found in Merged_train')

print('Merged_train count before delete:', len(Merged_train))
for k in sa_keys: del(Merged_train[k])
print('Merged_train count after delete :', len(Merged_train))

# Check the speaker count
speakers = list(getAllSpeakers(Merged_train))
print('Merged_train speaker count:', len(speakers))

INFO: 924 "SA" records found in Merged_train
Merged_train count before delete: 4620
Merged_train count after delete : 3696
Merged_train speaker count: 462


In [9]:
# Delete the SA records in test dataset
print('')
sa_keys = getSAKeys(Merged_test)
print(f'INFO: {len(sa_keys)} "SA" records found in Merged_test')

print('Merged_test count before delete:', len(Merged_test))
for k in sa_keys: del(Merged_test[k])
print('Merged_test count after delete :', len(Merged_test))

# Check the speaker count
speakers = list(getAllSpeakers(Merged_test))
print('Merged_test speaker count:', len(speakers))


INFO: 336 "SA" records found in Merged_test
Merged_test count before delete: 1680
Merged_test count after delete : 1344
Merged_test speaker count: 168
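
The counts printed above are consistent with two SA records per speaker and eight remaining records per speaker. The sketch below tallies records by the two-letter prefix of the file name and re-checks that arithmetic. `countSentenceTypes` is a hypothetical helper and the key list is illustrative only.

```python
from collections import Counter


def countSentenceTypes(keys):
    """Count records by the two-letter sentence-type prefix of the basename."""
    return Counter(key.rsplit("/", 1)[-1][:2] for key in keys)


# Illustrative keys in the same format as the dataset keys
toy_keys = ["TRAIN/DR4/MMDM0/SA1", "TRAIN/DR4/MMDM0/SA2",
            "TRAIN/DR4/MMDM0/SI1311", "TRAIN/DR4/MMDM0/SX321"]
type_counts = countSentenceTypes(toy_keys)
print(type_counts)

# Consistency check on the counts reported for train and test:
# (speakers, records before, SA records, records after)
for speakers, total, sa_count, remaining in [(462, 4620, 924, 3696),
                                             (168, 1680, 336, 1344)]:
    assert sa_count == 2 * speakers
    assert remaining == total - sa_count == 8 * speakers
```

Running `countSentenceTypes(Merged_train)` after the deletion should show no `SA` entry at all.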


## Keep the core test set only

In [10]:
# Given a list/set of speakers and a dataset dictionary, deletes the records of those speakers
def deleteSpeakers(speaker_names, dict_dataset):
    del_keys = []
    # Get the keys corresponding to given names
    for key in dict_dataset:
        name = getSpeakerName(key)
        if name in speaker_names:
            del_keys.append(key)
    # delete those keys
    print(f"INFO: Deleting {len(del_keys)} records")
    for key in del_keys: del(dict_dataset[key])
    return del_keys
    

In [11]:
# This list is taken from "TIMIT/TESTSET.DOC"
# M/F prefixes are added based on the gender, to make them correspond to the speaker-name directory in the dataset
core_test_speakers = {
    "MDAB0", "MWBT0",     "FELC0",
    "MTAS1", "MWEW0",     "FPAS0",
    "MJMP0", "MLNT0",     "FPKT0",
    "MLLL0", "MTLS0",     "FJLM0",
    "MBPM0", "MKLT0",     "FNLP0",
    "MCMJ0", "MJDH0",     "FMGD0",
    "MGRT0", "MNJM0",     "FDHC0",
    "MJLN0", "MPAM0",     "FMLD0",
}


# compute the set of speakers to delete
all_speakers = getAllSpeakers(Merged_test)
del_speakers = all_speakers - core_test_speakers  # delete everyone else but core-test-speakers
del_keys = deleteSpeakers(del_speakers, Merged_test)
assert len(del_keys) == len(del_speakers)*8, "8 records per speaker are supposed to be deleted!"

# show remaining
rem_speakers = getAllSpeakers(Merged_test)
print('Remaining test records:', len(Merged_test))
print('Remaining Speakers:', len(rem_speakers), '\n', rem_speakers)

INFO: Deleting 1152 records
Remaining test records: 192
Remaining Speakers: 24 
 {'MTAS1', 'MNJM0', 'FJLM0', 'MTLS0', 'FMLD0', 'FDHC0', 'MKLT0', 'MCMJ0', 'MJMP0', 'MDAB0', 'FPAS0', 'FMGD0', 'FNLP0', 'MJDH0', 'FPKT0', 'MLNT0', 'MWEW0', 'MJLN0', 'FELC0', 'MPAM0', 'MBPM0', 'MWBT0', 'MGRT0', 'MLLL0'}
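
The numbers above add up: 168 - 24 = 144 speakers are dropped, 144 × 8 = 1152 records are deleted, and 24 × 8 = 192 records remain. The deletion modifies `Merged_test` in place. A non-mutating alternative is sketched below for cases where the full test set should stay available. `keepSpeakers` is a hypothetical helper, and the keys and the speaker `OTHER0` are made up for the example.

```python
def keepSpeakers(dict_dataset, speaker_names):
    """Return a new dict with only the records whose speaker is in speaker_names."""
    return {
        key: record
        for key, record in dict_dataset.items()
        if key.split("/")[-2] in speaker_names   # parent directory = speaker
    }


# Toy keys for illustration; 'OTHER0' is not a real speaker
toy_ds = {
    "TEST/DR1/FELC0/SX_A": {},
    "TEST/DR1/OTHER0/SX_B": {},
}
core_only = keepSpeakers(toy_ds, {"FELC0"})
print(sorted(core_only))   # ['TEST/DR1/FELC0/SX_A']
print(len(toy_ds))         # 2, the input is left untouched
```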


## Remove records missing phonetic file

Training and testing are not possible without phoneme labels, so records without a phonetic file are dropped.

In [12]:
# Given a dataset dictionary, removes the records missing phonetic files
def delNoPhone(dict_dataset):
    del_keys = []
    for key, record in dict_dataset.items():
        if 'phonetic_file' not in record:
            del_keys.append(key)
    print(f'INFO: Deleting {len(del_keys)} records')
    for key in del_keys: del(dict_dataset[key])
    return del_keys


# Delete missing phonetic file records
delNoPhone(Merged_train)
delNoPhone(Merged_test)

print('Merged_train:', len(Merged_train))
print('Merged_test :', len(Merged_test))


INFO: Deleting 2352 records
INFO: Deleting 0 records
Merged_train: 1344
Merged_test : 192


## Dataset audio durations

In [13]:
# Given a dataset dictionary, returns the total audio length of the entire dataset in seconds
def getDatasetDuration(dict_dataset):
    total_duration = 0
    for entry in tqdm(dict_dataset.values()):
        audio_path = entry['audio_file']
        audio_data, rate = librosa.load(audio_path, sr=None)
        duration = len(audio_data) / rate
        total_duration += duration
    return int(total_duration)


# Show the dataset audio durations in minutes
print(f"Duration of Train: {getDatasetDuration(Merged_train) // 60} mins")
print(f"Duration of Test : {getDatasetDuration(Merged_test) // 60} mins")

Duration of Train: 68 mins


Duration of Test : 9 mins


# Build the Phoneme records

Each audio file contains one sentence, so a record in the dataset is a "sentence record." We want to build phoneme records, which can be used to look up the audio segment corresponding to each phoneme.  
The structure of a phoneme record is: (phoneme, path-to-audio, start-index, end-index)

In [14]:
# Extract a "sentence record" from the train_dataset
train_keys = list(Merged_train.keys())
sentence_record = Merged_train[train_keys[10]]
print('sentence_record:')
ipd.display(sentence_record)

with open(sentence_record['sentence_file']) as infile:
    line = infile.readline()
    print(line)
wave, rate = librosa.load(sentence_record['audio_file'], sr=None)
ipd.Audio(wave, rate=rate)

sentence_record:


{'phonetic_file': './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.PHN',
 'word_file': './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WRD',
 'sentence_file': './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.TXT',
 'audio_file': './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV'}

0 58164 The coyote, bobcat, and hyena are wild animals.
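
The line printed above appears to follow the pattern `<start-sample> <end-sample> <text>`. As a rough sketch, and assuming a 16 kHz sampling rate (the notebook itself reads the rate from the file), the sample range can be turned into a duration. `parseSentenceLine` and `TIMIT_SAMPLE_RATE` are hypothetical names used only here.

```python
TIMIT_SAMPLE_RATE = 16000   # Hz; assumed here, the notebook reads the rate from the file


def parseSentenceLine(line):
    """Split '<start> <end> <text>' into (start, end, text)."""
    start, end, text = line.strip().split(maxsplit=2)
    return int(start), int(end), text


start, end, text = parseSentenceLine(
    "0 58164 The coyote, bobcat, and hyena are wild animals.\n"
)
duration = (end - start) / TIMIT_SAMPLE_RATE
print(f"{duration:.3f} s: {text}")
```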



In [15]:
# Function to convert a sentence record into a list of phoneme records.
# Returns a list of lists: [phoneme, path-to-audio, start-index, end-index]
def makePhoneRecords(sentence_record):
    audio_path = sentence_record['audio_file']
    phone_path  = sentence_record['phonetic_file']
    phone_records = []
    with open(phone_path, 'r') as infile:
        for line in infile:
            start, end, phone = line.split()
            phone_records.append( [phone, audio_path, int(start), int(end)] )
    return phone_records


# Test above function
phone_records = makePhoneRecords(sentence_record)
print('Phoneme records:', len(phone_records))
ipd.display(phone_records[:10])

phone_rec = phone_records[4]
phone, audio_path, start, end = phone_rec
wave, rate = librosa.load(audio_path, sr=None)
wave = wave[start:end]   # slice the phoneme only
print(phone)      # phoneme label
ipd.Audio(wave, rate=rate)

Phoneme records: 38


[['h#', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 0, 1936],
 ['dh', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 1936, 2450],
 ['ix', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 2450, 3000],
 ['kcl', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 3000, 4370],
 ['k', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 4370, 5720],
 ['ay', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 5720, 8605],
 ['ow', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 8605, 10920],
 ['dx', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 10920, 11400],
 ['iy', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 11400, 13880],
 ['pau', './dataset/TIMIT/data/TRAIN/DR4/MCSS0/SX120.WAV', 13880, 18360]]

k
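
Because the start and end indices are sample positions, phoneme durations follow directly from them. The sketch below parses PHN-style text from a string rather than a file and prints each segment's length in milliseconds. It assumes a 16 kHz sampling rate, and `parsePhnText` is a hypothetical variant of `makePhoneRecords` written for illustration.

```python
import io

SAMPLE_RATE = 16000   # Hz; assumed


def parsePhnText(phn_text):
    """Parse PHN-style lines ('<start> <end> <phone>') into (phone, start, end) tuples."""
    segments = []
    for line in io.StringIO(phn_text):
        if not line.strip():
            continue               # tolerate blank lines
        start, end, phone = line.split()
        segments.append((phone, int(start), int(end)))
    return segments


# First five segments of SX120.PHN, taken from the records printed above
sample_phn = "0 1936 h#\n1936 2450 dh\n2450 3000 ix\n3000 4370 kcl\n4370 5720 k\n"
segments = parsePhnText(sample_phn)
for phone, start, end in segments:
    print(f"{phone:>4}: {(end - start) / SAMPLE_RATE * 1000:6.1f} ms")
```

Skipping blank lines is a defensive choice; `makePhoneRecords` as written would raise on an empty line.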


In [16]:
# given a sentence dictionary dataset, returns a list of phoneme records
def makePhoneDataset(dict_dataset):
    phone_ds = []
    for key, sent_rec in dict_dataset.items():
        phone_rec_list = makePhoneRecords(sent_rec)
        phone_ds.extend(phone_rec_list)
    return phone_ds


# Convert to phoneme datasets
Train_phone_ds = makePhoneDataset(Merged_train)
Test_phone_ds = makePhoneDataset(Merged_test)

print('Train_phone_ds:', len(Train_phone_ds))
print('Test_phone_ds :', len(Test_phone_ds))

ipd.display(Train_phone_ds[:3])
ipd.display(Test_phone_ds[:3])

Train_phone_ds: 51848
Test_phone_ds : 7333


[['h#', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 0, 2680],
 ['s', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 2680, 5640],
 ['ao', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 5640, 7853]]

[['h#', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 0, 2370],
 ['dh', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 2370, 2780],
 ['ih', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 2780, 3880]]
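
With the data flattened into phoneme records, a label histogram is a quick way to inspect class balance. This is a minimal sketch on toy records. `countLabels` is a hypothetical helper and the `dummy.WAV` paths are placeholders.

```python
from collections import Counter


def countLabels(list_phoneRecords):
    """Count how often each phoneme label occurs in a list of phone records."""
    return Counter(rec[0] for rec in list_phoneRecords)


# Toy records in the same [phoneme, path, start, end] layout
toy_records = [
    ['h#', 'dummy.WAV', 0, 2680],
    ['s',  'dummy.WAV', 2680, 5640],
    ['ao', 'dummy.WAV', 5640, 7853],
    ['s',  'dummy.WAV', 7853, 9000],
]
label_counts = countLabels(toy_records)
for label, count in label_counts.most_common():
    print(f'{label:>4}: {count}')
```

The same call on `Train_phone_ds` would show how often each label occurs in the training split.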

# Remap Phonemes 61 to 39 classes

As the output below shows, the dataset contains 61 phone labels. However, we don't need all of them. "tcl", for example, only marks the closure (a brief pause) before a "t". So let's merge such labels and simplify the label set.

**NOTE:** The 61 phones defined in the TIMIT dataset are remapped to 39 English phones. For details, check out the papers
[[1]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6638947)
[[2]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=46546)

In [17]:
# Given a list of phone-records, remaps the labels using the phone_map dictionary.
# list_phoneRecords[i]: (phoneme, path-to-audio, start-index, end-index)
# phone_map: {old_phone: new_phone}
def remapLabels(list_phoneRecords, phone_map):
    for rec in list_phoneRecords:
        rec[0] = phone_map[rec[0]]
    return list_phoneRecords


# Given a list of phone-records, returns the set of all labels
def getAllLabels(list_phoneRecords):
    labels = set()
    for rec in list_phoneRecords:
        labels.add(rec[0])
    return labels
    
    
# Test above functions
all_labels = getAllLabels(Test_phone_ds)
print('all_labels:', len(all_labels), all_labels)


test_reclist = [['a', 'dummy', 0, 1], 
                ['b', 'dummy', 2, 3]]
remap = {'a': 'x', 'b': 'y'}

print('Before remap:', test_reclist)
remapLabels(test_reclist, remap)
print('After remap:', test_reclist)

all_labels: 61 {'d', 'w', 'ax-h', 'uh', 's', 'epi', 'el', 'ix', 't', 'ng', 'aw', 'eh', 'dcl', 'k', 'oy', 'nx', 'p', 'ux', 'hh', 'sh', 'm', 'ay', 'er', 'b', 'dx', 'r', 'ey', 'ih', 'kcl', 'f', 'en', 'bcl', 'hv', 'h#', 'pcl', 'dh', 'ax', 'v', 'q', 'l', 'ae', 'uw', 'g', 'em', 'eng', 'jh', 'th', 'y', 'ow', 'gcl', 'zh', 'aa', 'n', 'pau', 'tcl', 'ah', 'ch', 'ao', 'iy', 'z', 'axr'}
Before remap: [['a', 'dummy', 0, 1], ['b', 'dummy', 2, 3]]
After remap: [['x', 'dummy', 0, 1], ['y', 'dummy', 2, 3]]
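
`remapLabels` changes the records in place and raises a bare `KeyError` on the first label that the map does not cover. A possible variant, sketched below, checks coverage up front and returns new records instead. `remapLabelsChecked` is a hypothetical name, and `toy_map` is a three-entry excerpt in the style of a 61-to-39 mapping.

```python
def remapLabelsChecked(list_phoneRecords, phone_map):
    """Return remapped copies of the records; fail early on unmapped labels."""
    unknown = {rec[0] for rec in list_phoneRecords} - set(phone_map)
    if unknown:
        raise KeyError(f"Labels missing from phone_map: {sorted(unknown)}")
    return [[phone_map[rec[0]], *rec[1:]] for rec in list_phoneRecords]


toy_map = {'kcl': 'h#', 'k': 'k', 'ix': 'ih'}      # small excerpt, for illustration
toy_records = [['kcl', 'dummy', 0, 10], ['k', 'dummy', 10, 20]]

remapped = remapLabelsChecked(toy_records, toy_map)
print(remapped)      # [['h#', 'dummy', 0, 10], ['k', 'dummy', 10, 20]]
print(toy_records)   # unchanged: [['kcl', 'dummy', 0, 10], ['k', 'dummy', 10, 20]]
```

Keeping the input untouched makes it safe to re-run the cell, whereas remapping in place a second time can fail once labels have already been replaced.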


In [18]:
# TimitBet 61 phoneme mapping to 39 phonemes
# by Lee, K.-F., & Hon, H.-W. (1989). Speaker-independent phone recognition using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(11), 1641–1648. doi:10.1109/29.46546 
# NOTE: Lee & Hon discard the glottal stop 'q'; this map folds it into silence ('h#') instead.
phon61_map39 = {
    'iy':'iy',  'ih':'ih',   'eh':'eh',  'ae':'ae',    'ix':'ih',   'ax':'ah',   'ah':'ah',   'uw':'uw',
    'ux':'uw',  'uh':'uh',   'ao':'aa',  'aa':'aa',    'ey':'ey',   'ay':'ay',   'oy':'oy',   'aw':'aw',
    'ow':'ow',  'l':'l',     'el':'l',   'r':'r',      'y':'y',     'w':'w',     'er':'er',   'axr':'er',
    'm':'m',    'em':'m',    'n':'n',    'nx':'n',     'en':'n',    'ng':'ng',   'eng':'ng',  'ch':'ch',
    'jh':'jh',  'dh':'dh',   'b':'b',    'd':'d',      'dx':'dx',   'g':'g',     'p':'p',     't':'t',
    'k':'k',    'z':'z',     'zh':'sh',  'v':'v',      'f':'f',     'th':'th',   's':'s',     'sh':'sh',
    'hh':'hh',  'hv':'hh',   'pcl':'h#', 'tcl':'h#',   'kcl':'h#',  'qcl':'h#',  'bcl':'h#',  'dcl':'h#',
    'gcl':'h#', 'h#':'h#',   '#h':'h#',  'pau':'h#',   'epi': 'h#', 'ax-h':'ah', 'q':'h#'   # duplicate 'nx' key removed
}


# Check the status before remap
all_train_labels = getAllLabels(Train_phone_ds)
all_test_labels  = getAllLabels(Test_phone_ds)
print('Before remap ----')
print('\nall_train_labels:', len(all_train_labels), '\n', all_train_labels)
print('\nall_test_labels:', len(all_test_labels), '\n', all_test_labels)

Before remap ----
all_train_labels: 61 
 {'d', 'w', 'ax-h', 'uh', 's', 'epi', 'el', 'ix', 'ng', 't', 'aw', 'eh', 'dcl', 'k', 'nx', 'oy', 'p', 'ux', 'hh', 'sh', 'ay', 'm', 'b', 'er', 'dx', 'r', 'ey', 'ih', 'kcl', 'hv', 'en', 'f', 'bcl', 'h#', 'pcl', 'dh', 'ae', 'v', 'q', 'l', 'ax', 'uw', 'g', 'em', 'eng', 'jh', 'th', 'y', 'gcl', 'ow', 'zh', 'aa', 'n', 'pau', 'tcl', 'ah', 'ch', 'ao', 'iy', 'z', 'axr'}

all_test_labels: 61 
 {'d', 'w', 'ax-h', 'uh', 's', 'epi', 'el', 'ix', 't', 'ng', 'aw', 'eh', 'dcl', 'k', 'oy', 'nx', 'p', 'ux', 'hh', 'sh', 'm', 'ay', 'er', 'b', 'dx', 'r', 'ey', 'ih', 'kcl', 'f', 'en', 'bcl', 'hv', 'h#', 'pcl', 'dh', 'ax', 'v', 'q', 'l', 'ae', 'uw', 'g', 'em', 'eng', 'jh', 'th', 'y', 'ow', 'gcl', 'zh', 'aa', 'n', 'pau', 'tcl', 'ah', 'ch', 'ao', 'iy', 'z', 'axr'}


In [19]:
# Remap and check
remapLabels(Train_phone_ds, phon61_map39)
remapLabels(Test_phone_ds, phon61_map39)

all_train_labels = getAllLabels(Train_phone_ds)
all_test_labels  = getAllLabels(Test_phone_ds)
print('\nAfter remap ----')
print('\nall_train_labels:', len(all_train_labels), '\n', all_train_labels)
print('\nall_test_labels:', len(all_test_labels), '\n', all_test_labels)

print('\nTrain:'); ipd.display(Train_phone_ds[:3])
print('Test:'); ipd.display(Test_phone_ds[:3])


After remap ----

all_train_labels: 39 
 {'d', 'w', 'uh', 's', 't', 'ng', 'aw', 'eh', 'k', 'oy', 'p', 'hh', 'sh', 'ay', 'm', 'b', 'er', 'dx', 'r', 'ey', 'ih', 'f', 'h#', 'ae', 'dh', 'v', 'l', 'uw', 'g', 'jh', 'th', 'y', 'ow', 'aa', 'n', 'ah', 'ch', 'iy', 'z'}

all_test_labels: 39 
 {'d', 'w', 'uh', 's', 't', 'ng', 'aw', 'eh', 'k', 'oy', 'p', 'hh', 'sh', 'ay', 'm', 'er', 'b', 'dx', 'r', 'ey', 'ih', 'f', 'h#', 'ae', 'dh', 'v', 'l', 'uw', 'g', 'jh', 'th', 'y', 'ow', 'aa', 'n', 'ah', 'ch', 'iy', 'z'}

Train:


[['h#', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 0, 2680],
 ['s', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 2680, 5640],
 ['aa', './dataset/TIMIT/data/TRAIN/DR4/MMDM0/SI1311.WAV', 5640, 7853]]

Test:


[['h#', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 0, 2370],
 ['dh', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 2370, 2780],
 ['ih', './dataset/TIMIT/data/TEST/DR4/MTLS0/SX290.WAV', 2780, 3880]]
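
The label sets printed above come out in a different order for train and test, because Python sets are unordered. If the labels are later turned into integer class ids, sorting them first keeps the ids stable between runs. The snippet is a sketch of that idea and not part of the original export. `buildLabelIndex` is a hypothetical helper.

```python
def buildLabelIndex(list_phoneRecords):
    """Map each distinct label to an integer id, in sorted order for reproducibility."""
    labels = sorted({rec[0] for rec in list_phoneRecords})
    return {label: idx for idx, label in enumerate(labels)}


# Toy records for illustration
toy_records = [['h#', 'dummy', 0, 1], ['s', 'dummy', 1, 2], ['aa', 'dummy', 2, 3]]
label_to_idx = buildLabelIndex(toy_records)
print(label_to_idx)   # {'aa': 0, 'h#': 1, 's': 2}
```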

# Export the datasets

In [21]:
# Export the dataset with necessary information for the next notebook
note = '''
Notes:
- Phoneme record structure: (phoneme, path-to-audio, start-index, end-index)
- The dataset is split into train and test subsets.
- Some data is filtered according to a reference paper
    - SA records are removed from both datasets.
    - Only the core test subset with 24 speakers is exported for testing.
    - Records missing a phonetic file (phoneme labels) are removed.
- Phonemes are remapped from 61 to 39 classes.
'''

export_dict = {
    'note'  : note,
    'train' : Train_phone_ds,
    'test'  : Test_phone_ds,
}


# Save into an external file
save_path = './session/curated-dataset.pt'
torch.save(export_dict, save_path)
!ls -ltrh './session/'

total 1.3M
-rw-rw-r-- 1 makabir makabir  310 Jun 24 16:57 README.md
-rw-rw-r-- 1 makabir makabir 1.3M Jun 26 12:52 curated-dataset.pt
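
Since the exported file is consumed by a later notebook, a quick structural check of the records before saving can catch malformed entries early. The sketch below assumes the record layout described in the note (label, audio path, start index, end index) and treats `0 <= start < end` as the validity rule. `findInvalidRecords` is a hypothetical helper.

```python
def findInvalidRecords(list_phoneRecords):
    """Return records that are not [str, str, int, int] with 0 <= start < end."""
    invalid = []
    for rec in list_phoneRecords:
        ok = (
            len(rec) == 4
            and isinstance(rec[0], str)
            and isinstance(rec[1], str)
            and isinstance(rec[2], int)
            and isinstance(rec[3], int)
            and 0 <= rec[2] < rec[3]
        )
        if not ok:
            invalid.append(rec)
    return invalid


# Toy records: the second one has end <= start, the third lacks an end index
toy_records = [
    ['h#', 'dummy.WAV', 0, 2680],
    ['s',  'dummy.WAV', 5640, 2680],
    ['aa', 'dummy.WAV', 5640],
]
bad = findInvalidRecords(toy_records)
print(f'{len(bad)} invalid record(s):', bad)
```

An empty list for both `Train_phone_ds` and `Test_phone_ds` would indicate that every record has the expected shape.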
