
Cannot train Sound Classifier: around 7000 examples Killed - turicreate.sound_classifier.create #3381

Open
Ignasinou opened this issue Jan 27, 2021 · 4 comments

Comments

@Ignasinou

Hi, I'm trying to train a sound classifier model on around 7000 examples belonging to two different classes. The SFrame is created from 2 different folders (the label is taken from the folder's name). Each example is between 1 and 5 seconds long, and the sample rate is 44100 Hz. The train/test split is computed correctly and I managed to save both SFrames, but the process is killed once training starts; with 7000 examples it takes around 2 minutes to be killed.

```python
model = tc.sound_classifier.create(data, target='label', feature='audio', validation_set=None, max_iterations=10)
```

As I have noticed in other reported issues, sometimes setting validation_set=None resolves the problem, but not in my case. I tried with 2000 examples and the model trained successfully, but when I increase the amount of data (e.g. 3000 examples) the process is killed after a while.
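To find where the limit is, I have been training on random subsets of increasing size, roughly like this (just a rough sketch; the subset sizes are arbitrary and `data` is the SFrame described above):

```python
import turicreate as tc

# Rough sketch: train on increasingly large random subsets of the SFrame
# to find the point at which the process gets killed.
for n in [1000, 2000, 3000, 5000, 7000]:
    subset = data.sample(min(1.0, float(n) / len(data)), seed=42)
    print('Training on %d examples' % len(subset))
    model = tc.sound_classifier.create(subset, target='label', feature='audio',
                                       validation_set=None, max_iterations=10)
```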

I am using the GPU: `tc.config.set_num_gpus(-1)`

Instance specifications:

GPUs = 1
vCPU = 4
Mem (GiB) = 16
GPU Memory (GiB) = 16

@TobyRoseman
Collaborator

Sorry you're having this issue. Please give us a bit more information.

Are you using macOS or Linux?

Also please copy and paste the complete output. I want to see where it is failing.

@Ignasinou
Author

Hi, thanks for the quick reply. I am using an Amazon EC2 instance running Linux. The only output I get is `Killed`, probably related to a memory issue.

This is the code I am using, almost the same as in the tutorial.

```python
import os
import turicreate as tc

def load_irmas_dataset(data_dir):
    # Load the input data from the directories; the label is the parent folder's name.
    data = tc.load_audio(data_dir, with_path=True, random_order=True)
    data['label'] = data['path'].apply(lambda p: os.path.basename(os.path.dirname(p)))
    print('data loaded')

    # Split into train/test sets and save both SFrames.
    train_set, test_set = data.random_split(0.8, seed=42)
    print('data split')
    test_set.save('m2_tc_test_20210128.sframe')
    print('test data saved')
    train_set.save('m2_tc_train_20210128.sframe')
    print('train data saved')

    # Train the sound classifier and export the model.
    model = tc.sound_classifier.create(train_set, target='label', feature='audio',
                                       validation_set=None, max_iterations=50)
    model.save('m2_tc_model_20210128.sframe.model')
    model.export_coreml('m2_tc_model_20210128.mlmodel')
```

The code runs until this line: `model = tc.sound_classifier.create(train_set, target='label', feature='audio', validation_set=None, max_iterations=50)`

If the number of files is even larger, the process is killed earlier, while saving the SFrames. Could it be related to some inconsistency in the data? That shouldn't be the case, since each example is between 1 and 5 seconds long at 44100 Hz, but I have not yet checked whether any examples contain empty audio data.
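If it helps, this is roughly the check I have in mind (a sketch, assuming the `audio` column contains the dicts returned by `tc.load_audio`, i.e. a `data` array plus a `sample_rate`):

```python
# Sketch: look for examples whose decoded audio is empty, then drop them
# before training. Assumes the 'audio' column holds dicts with a 'data'
# array, as produced by tc.load_audio.
lengths = data['audio'].apply(lambda a: len(a['data']))
print('Empty examples: %d' % (lengths == 0).sum())
print('Shortest clip (samples): %d' % lengths.min())

clean_data = data[lengths > 0]
```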

@TobyRoseman
Collaborator

Hi @Ignasinou - Please post the complete output of your call to `tc.sound_classifier.create`. The sound classifier gets created in three different stages. I want to see which stage is getting killed. There may be a workaround.
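If it turns out to be the deep feature extraction stage, one thing worth trying (a sketch, assuming `tc.sound_classifier.get_deep_features` is available in your Turi Create version; the file names are just illustrative) is to precompute the deep features once, save them, and train on the precomputed column:

```python
import turicreate as tc

# Possible workaround sketch: precompute the deep features (typically the
# most memory-hungry stage), persist them, and then train on that column.
train_set = tc.load_sframe('m2_tc_train_20210128.sframe')
train_set['deep_features'] = tc.sound_classifier.get_deep_features(train_set['audio'])
train_set.save('m2_tc_train_deep_features.sframe')

model = tc.sound_classifier.create(train_set, target='label', feature='deep_features',
                                   validation_set=None, max_iterations=50)
```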

@Ignasinou
Author

Sorry for the late reply. The only output shown is `Killed`. I guess it gets killed in the first stage, or even before the first stage (?). I will try again this week, paste the exact output, and try to be more specific, but it doesn't show any information about the stages, unlike other runs (with less data) where the prints from the different stages appear.
