Can't run TensorFlow graphs without Estimator #243

Closed
grahammorehead opened this issue May 1, 2018 · 1 comment

Comments

@grahammorehead

I have a unique model paradigm that does not fit the structure of an Estimator, i.e., there is no model_fn() definition that gives me the correct inputs/outputs. In other words, even under the rubric of custom estimators, I cannot use a tf.Estimator, and have instead had to code everything using the low-level TF API.

Is there a low-level version of the Python SageMaker API such that I can define my graph and run it, using feed_dict as needed, and all the other low-level TF API features?

@djarpin
Contributor

djarpin commented May 2, 2018

Thanks, @grahammorehead.

Currently, SageMaker's pre-built TensorFlow container requires you to operate within the tf.Estimator paradigm. This is a constraint we're looking to remove in the future, but don't have an existing workaround for.

You could still use SageMaker by bringing your own TensorFlow container. We've actually open sourced our TensorFlow container here, which could give you a starting point to work from.

Best of luck, and thanks for using SageMaker!
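
For readers taking the bring-your-own-container route, here is a minimal sketch of what the training entry point inside such a container might look like. It assumes the standard SageMaker training layout under `/opt/ml` (hyperparameters in `input/config/hyperparameters.json`, model artifacts written to `model/`). The function names, the `epochs` hyperparameter, and the `result.json` file are hypothetical, and `run_training` is only a placeholder for your own low-level TensorFlow loop, not code from the open-sourced container.

```python
import json
import os


def load_hyperparameters(prefix="/opt/ml"):
    """Read the job's hyperparameters; SageMaker passes every value as a string."""
    path = os.path.join(prefix, "input", "config", "hyperparameters.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def run_training(hyperparameters):
    """Placeholder (hypothetical) for a hand-written low-level TensorFlow loop.

    A real script would build its graph here and drive it directly,
    e.g. with session.run(..., feed_dict=...), with no Estimator involved.
    """
    epochs = int(hyperparameters.get("epochs", "1"))  # "epochs" is an assumed name
    return {"epochs_run": epochs}


def main(prefix="/opt/ml"):
    """Load hyperparameters, train, and write the result into the model directory."""
    hyperparameters = load_hyperparameters(prefix)
    result = run_training(hyperparameters)

    # Everything written under <prefix>/model is packaged as the model artifact.
    model_dir = os.path.join(prefix, "model")
    os.makedirs(model_dir, exist_ok=True)
    out_path = os.path.join(model_dir, "result.json")
    with open(out_path, "w") as f:
        json.dump(result, f)
    return out_path
```

In a custom container, the `train` executable that SageMaker invokes would call `main()`. Passing a different `prefix` lets you exercise the same code locally against a temporary directory.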

@djarpin djarpin closed this as completed Jun 4, 2018
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this issue Aug 16, 2022