Run very long on Training pipeline #12

Closed
truongnmt opened this issue Apr 30, 2018 · 4 comments
truongnmt commented Apr 30, 2018

I'm following this tutorial for detecting atrial fibrillation, but when I run the training pipeline it takes a very long time.

I'm using a Tesla K80 and left it running all night, more than 7 hours, but it's still running.
This block runs 1000 epochs:

template_train_ppl = (
    ds.Pipeline()
      .init_model("dynamic", DirichletModel, name="dirichlet", config=model_config)
      .init_variable("loss_history", init_on_each_run=list)
      .load(components=["signal", "meta"], fmt="wfdb")
      .load(components="target", fmt="csv", src=LABELS_PATH)
      .drop_labels(["~"])
      .rename_labels({"N": "NO", "O": "NO"})
      .flip_signals()
      .random_resample_signals("normal", loc=300, scale=10)
      .random_split_signals(2048, {"A": 9, "NO": 3})
      .binarize_labels()
      .train_model("dirichlet", make_data=concatenate_ecg_batch,
                   fetches="loss", save_to=V("loss_history"), mode="a")
      .run(batch_size=BATCH_SIZE, shuffle=True, drop_last=True, n_epochs=N_EPOCH, lazy=True)
)

train_ppl = (eds.train >> template_train_ppl).run()

Do you think something is not right here?
Or does the framework have a way to indicate that it's running, e.g. print the number of the current epoch?
Also, FYI, when I run it I see this in the terminal:

2018-04-30 16:19:57.630351: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
roman-kh (Member) commented May 1, 2018

You might add a line

.call(lambda _, v: print(v[-1]), v=V('loss_history'))

before run(batch_size=BATCH_SIZE,...).

It will print the loss function value at each iteration.
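
For reference, here is a sketch of where that line would slot into the template from the issue (last few actions only, same names as the original snippet):

template_train_ppl = (
    ds.Pipeline()
      ...
      .train_model("dirichlet", make_data=concatenate_ecg_batch,
                   fetches="loss", save_to=V("loss_history"), mode="a")
      # print the most recent loss value after every training iteration
      .call(lambda _, v: print(v[-1]), v=V('loss_history'))
      .run(batch_size=BATCH_SIZE, shuffle=True, drop_last=True, n_epochs=N_EPOCH, lazy=True)
)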

truongnmt (Author) commented
Thanks a lot, it worked!!!
And btw, it has just finished 1000 epochs 🔥 🔥 🔥

emadahmed97 commented
@truongnmt How long did it take you?

truongnmt (Author) commented May 14, 2018

@emadahmed97 It took me about 8 or 9 hours, dude.
