This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Killed upon reaching iteration limit #1075

Closed
hipwelljo opened this issue Sep 8, 2018 · 10 comments

Comments

@hipwelljo

hipwelljo commented Sep 8, 2018

Using Turi Create 5.0, I seem to be hitting the same problem as issue #361, which others saw with a previous release. Training finishes by reaching the iteration limit, and then the process logs Killed: 9. It never evaluates the model or logs the accuracy.

import turicreate as tc

# train_data and test_data are assumed to be SFrames loaded beforehand
model = tc.image_classifier.create(train_data, target='label', model='squeezenet_v1.1')

predictions = model.classify(test_data)

metrics = model.evaluate(test_data)
print("Accuracy: %s" % metrics['accuracy'])

The output is:

Logistic regression:
--------------------------------------------------------
Number of examples          : 49264
Number of classes           : 5742
Number of feature columns   : 1
Number of unpacked features : 1000
Number of coefficients      : 5746741
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 0         | 1        | NaN       | 1410.938340  | 0.000142          | 0.000410            |
| 1         | 4        | 0.000101  | 6993.121902  | 0.000284          | 0.000000            |
| 2         | 6        | 1.000000  | 11684.099158 | 0.008038          | 0.002048            |
| 3         | 7        | 1.000000  | 14630.014436 | 0.056796          | 0.013923            |
| 4         | 8        | 1.000000  | 17736.467060 | 0.026713          | 0.005733            |
| 5         | 9        | 1.000000  | 20273.712870 | 0.072791          | 0.022113            |
| 6         | 10       | 1.000000  | 22618.025692 | 0.112049          | 0.037674            |
| 7         | 11       | 1.000000  | 25096.515562 | 0.178528          | 0.066339            |
| 8         | 12       | 1.000000  | 27513.523471 | 0.241698          | 0.088043            |
| 9         | 13       | 1.000000  | 29956.004078 | 0.290983          | 0.110156            |
| 10        | 14       | 1.000000  | 32559.100921 | 0.343476          | 0.139640            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Completed (Iteration limit reached).
This model may not be optimal. To improve it, consider increasing `max_iterations`.
Killed: 9
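
The coefficient count in the log above is consistent with a multinomial logistic regression that fits one weight per unpacked feature plus an intercept for every class except one reference class. That parameterization is an assumption inferred from the logged numbers, not something confirmed from the Turi Create source, and the function name below is made up for illustration. The memory figure also assumes 8-byte doubles.

```python
def softmax_coefficient_count(num_classes, num_features):
    """Coefficients for a softmax model with a reference class and intercepts.

    Assumption: (num_classes - 1) weight vectors, each holding
    num_features weights plus one intercept.
    """
    return (num_classes - 1) * (num_features + 1)


# Values taken from the training log: 5742 classes, 1000 unpacked features.
count = softmax_coefficient_count(5742, 1000)
print(count)  # 5746741, matching "Number of coefficients" in the log

# Assumption: each coefficient is stored as an 8-byte double.
bytes_per_copy = count * 8
print(bytes_per_copy)  # 45973928, roughly 46 MB per copy of the vector
```

This only accounts for the coefficient vector itself. It says nothing about what the evaluation step allocates, which is where the process appears to be killed.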
@srikris
Contributor

srikris commented Sep 8, 2018

Can you share the data, by any chance? We can investigate what's going on.

@hipwelljo
Author

I can't post it publicly. Is there a way I can send it to you directly?

@srikris
Contributor

srikris commented Jan 10, 2019

@hipwelljo Can you confirm it's fixed in the latest release?

@hipwelljo
Author

It appears to be fixed. It has reached iteration 14 after 12 hours. I don't plan to let it finish. 😛

@TobyRoseman
Collaborator

@hipwelljo - thanks for the update. I'll close this issue but let us know if you have the problem again.

@TobyRoseman
Collaborator

Reopening since this issue only occurred at the end of training, probably when calculating validation set statistics.

TobyRoseman reopened this Jan 11, 2019
@hipwelljo
Author

hipwelljo commented Jan 13, 2019

OK, I set max_iterations to 10 and tried it again. It logged:

Logistic regression:
--------------------------------------------------------
Number of examples          : 49029
Number of classes           : 5742
Number of feature columns   : 1
Number of unpacked features : 1000
Number of coefficients      : 5746741
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training Accuracy | Validation Accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 0         | 1        | NaN       | 1190.438703  | 0.000245          | 0.000000            |
| 1         | 4        | 0.000102  | 6172.879564  | 0.000510          | 0.000000            |
| 2         | 6        | 1.000000  | 9557.887708  | 0.006935          | 0.000745            |
| 3         | 7        | 1.000000  | 11956.632976 | 0.054111          | 0.014898            |
| 4         | 8        | 1.000000  | 14345.693528 | 0.020682          | 0.006331            |
| 5         | 9        | 1.000000  | 16971.418269 | 0.054580          | 0.020484            |
| 6         | 10       | 1.000000  | 19478.623501 | 0.089784          | 0.031657            |
| 7         | 11       | 1.000000  | 22022.317719 | 0.151339          | 0.066294            |
| 8         | 12       | 1.000000  | 24646.645556 | 0.226193          | 0.096834            |
| 9         | 13       | 1.000000  | 27294.776593 | 0.264823          | 0.104283            |
| 10        | 14       | 1.000000  | 29925.663033 | 0.317302          | 0.128119            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
Completed (Iteration limit reached).
This model may not be optimal. To improve it, consider increasing `max_iterations`.

Then, after a long while, it logged Killed: 9 :(
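
For context, the 9 in that message is the signal number, and on POSIX systems signal 9 is SIGKILL, which a process cannot catch or handle. A minimal sketch of how a wrapper script could tell a signal death apart from a normal exit, assuming Python 3 and using a stand-in child command in place of the real training script (the helper name describe_exit is hypothetical):

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was ended by signal N.
        return "killed by %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Stand-in child process; in practice this would launch the training script.
result = subprocess.run([sys.executable, "-c", "print('training done')"])
print(describe_exit(result.returncode))
```

This does not explain why the process was killed. It only makes the signal visible to whatever launches the job.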

I verified I'm using v5.2.1.

As an aside, it would be good to log what it's working on, such as "Calculating validation set statistics" or similar, so that the "Completed" line doesn't give the impression that everything has finished. :)
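
The kind of phase logging suggested here could look roughly like the sketch below. It is generic Python, not Turi Create's API: log_phase and its emit parameter are hypothetical names chosen for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def log_phase(name, emit=print):
    """Log when a long-running phase starts and when it finishes."""
    emit("Starting: %s" % name)
    start = time.time()
    try:
        yield
    finally:
        # Not reached if the process is killed outright (e.g. by SIGKILL),
        # so the last line in the log names the phase that was running.
        emit("Finished: %s (%.1f s)" % (name, time.time() - start))


with log_phase("Calculating validation set statistics"):
    pass  # placeholder for the long-running step
```

With something like this in place, a log ending in a "Starting" line with no matching "Finished" line would point at the step that was in progress when the process died.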

nickjong added this to the 5.4 milestone Jan 17, 2019
@srikris
Contributor

srikris commented Jan 17, 2019

We've identified the root cause and we have a plan to fix this. Milestone is set for 5.4. Thanks everyone!

nickjong assigned hoytak and unassigned TobyRoseman Feb 7, 2019
@hoytak
Collaborator

hoytak commented Feb 21, 2019

We believe this was fixed with #1402.

@hoytak
Collaborator

hoytak commented Mar 14, 2019

In our internal tests, this has been resolved -- #1402 indeed seemed to fix the issue. Please reopen if there are more issues after 5.4 is released.

hoytak closed this as completed Mar 14, 2019