[Enhancement] Redesigning TensorFlow's input pipelines #7951
Comments
A must-have for one of our use cases is ad-hoc creation of data elements via a callback function (which creates tensors on the fly, e.g. using py_func() or through some other means). More specifically, we currently have a use case where we employ two queues: an outer one, using a string_input_producer (with shuffling), where each string denotes/points to a "data set", and an inner queue that is then populated by generating a variable number of samples from each "data set". Which and how many samples are generated differs per epoch (potentially conditional on past training behavior). Actually, we don't even use the nomenclature of an epoch anymore, since the same data is never seen twice, and the above-mentioned generation/sampling goes beyond the usual data augmentation. Long story short: with a slightly out-of-the-ordinary use case, we've been hit by pretty much all of the problems you have mentioned above, and our workarounds have not been pretty. We'd be extremely happy to see a very flexible mechanism where such cases are supported, and data generation doesn't have to be shoehorned into forced-upon concepts like epochs, finitely repeating queues, etc. (although they can be modeled by its primitives). Edit: Things we still need, of course, include multi-threaded data generation and multi-threaded random-shuffle producer-consumer queues. But without the bane of the GIL -- maybe via easy C++/Eigen hooks and thread control on the native side? Back and forth, via pybind? Edit2: The new input pipeline should also take support for variable-sized tensors (i.e. different per example) into account, for both training and inference, e.g. in a fully-convolutional setting. |
@kmhofmann We'll certainly support that kind of dynamic generation. We're planning to have a few nested iteration primitives, so that you can write a function mapping an element (e.g. a string representing your outer "data set") to a Dataset of samples derived from it. Let me know if any of this is unclear! |
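For concreteness, a hedged sketch of that nested-iteration idea, written against the tf.data API as it later stabilized (the filenames are hypothetical):

```python
import tensorflow as tf

# Each outer element names a "data set"; a function maps it to an inner
# Dataset with a variable number of samples.
outer = tf.data.Dataset.from_tensor_slices(["set_a.tfrecord", "set_b.tfrecord"])
outer = outer.shuffle(buffer_size=2)

def make_inner(filename):
    # One outer element -> a Dataset of that data set's samples.
    return tf.data.TFRecordDataset(filename)

samples = (outer.flat_map(make_inner)
                .shuffle(buffer_size=10000)  # mix samples across data sets
                .batch(32))
next_batch = samples.make_initializable_iterator().get_next()
```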
Oh, good timing! Now I can stop writing my own (horrible) dataset class. Many of the things said already resonate with my experience. To the extent possible, I would like to code dataset-independent TensorFlow computations. I don't want to have 3 different GAN classes, each with their own create-graph and fit methods, simply because one dataset doesn't fit in memory, another is an np.array, and another is generated on the fly. The use case that affects me the most is [do n times: train for k iter/epoch, validate model, repeat]. There are clear problems with queues like you said. A minor issue for which I offer no solution is that while the scheduling (how long to train before validating) is done perhaps by some method of a model class that I would like to be dataset-independent, whether it makes sense to talk in terms of iterations or epochs is determined by the dataset--ruining some of the independence. Some other ideas I jotted down while brainstorming my own class:
|
Good point. One thing is that currently the queue operations are "baked into" the computation graph, so it's hard to modify anything on the fly. A higher-level abstraction could make this much easier, without resorting to control flow or other hacks. |
For a lot of my use cases, my input data either 1. is not on the file system, or 2. requires complex preprocessing unavailable in TensorFlow. In both cases the existing input pipeline cannot help, so I use an input thread with enqueue/feed_dict plus a training thread with dequeue a lot. Let's assume that in most cases you don't need to use the model itself to produce data (though sometimes that's not true). Then a solution I'd really like to see is being able to receive (similar to dequeue) tensors from a different process. (Like #4836)
These are the features I really missed from a private system I've been using. |
@mrry One "data set" can be composed of anything between ~500-30,000 dynamically generated samples. At the moment, we don't perform specific operations at the end of each data set, i.e. everything gets put into the same (large) random shuffle queue, to mix samples between data sets. But I could also imagine cases where separation of sets might be helpful. |
Please support reading HDF5 files directly. |
Personally, I'm a very big fan of the |
I am glad to see this initiative. The input pipeline is definitely one of the pain points. I'd like:
Here is my attempt to translate some of the routines available in TF's input pipeline for large datasets to small in-memory datasets. Here are some examples of what I would like to see available in TensorFlow:
These routines demonstrate how far you can go with just simple iterators over lists of indices. |
+1 to something like feed_dict. That's the only way to learn by interacting with the external world (training robot arms, Atari games, Universe). It could be made more efficient by avoiding copies, like PyTorch, whose Tensors share memory buffers with the underlying numpy arrays. |
I don't know TF as well as others here, so please take my comments with some skepticism:
|
Right now there are two very divergent paths to getting data into TensorFlow: feed_dict and queues. Queues are wonderful until you don't have a way to manipulate your data natively -- for example, if you want to load a .wav file, chop it into parts, and convert it to a spectrogram. At that point, you have to write a C++ op (doable, but a context switch + it makes a very inflexible pipeline) or pop back into Python land (slower, but very easy and flexible). It seems like the best compromise between speed and flexibility is to create a TF queue and then make a bunch of Python threads that feed it with data. This allows you to do flexible data processing in Python (roughly parallelized on the CPU, apart from GIL issues) while maintaining some amount of speed benefit. What if you just formalized that? The interface would be: push_data, end_of_data (for signaling the end of an epoch), and a dequeue_batch function that feeds the model. Then your code could just load data in Python and stuff it onto the queue in parallel, while the model sits totally separate from all of that. |
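A minimal sketch of that pattern, assuming a hypothetical load_and_preprocess() doing the flexible Python work:

```python
import threading
import numpy as np
import tensorflow as tf

def load_and_preprocess():
    # Hypothetical stand-in for e.g. chopping a .wav into spectrogram frames.
    return np.random.rand(128).astype(np.float32)

queue = tf.FIFOQueue(capacity=256, dtypes=[tf.float32], shapes=[[128]])
example_ph = tf.placeholder(tf.float32, shape=[128])
enqueue_op = queue.enqueue([example_ph])
dequeue_batch = queue.dequeue_many(32)  # this feeds the model

def feeder(sess):
    while True:
        sess.run(enqueue_op, feed_dict={example_ph: load_and_preprocess()})

sess = tf.Session()
for _ in range(4):  # roughly parallel, modulo the GIL
    threading.Thread(target=feeder, args=(sess,), daemon=True).start()
```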
We should make feed_dict faster (likely by not copying the numpy.arrays like @yaroslavvb mentioned), but that's orthogonal to this change. No matter how much we optimize it, feed_dict will never be the fastest way to feed data into a training job. |
|
The fastest option would be to create a TensorFlow op that maintains state, takes actions as input, and generates the training data. Then add a placeholder to specify the action. My guess is that you're looking for something that can be done completely in Python, though. There may be some mid-point between the two. |
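A hedged sketch of such a mid-point, with a hypothetical Environment class: the state stays in Python behind tf.py_func, and a placeholder carries the action:

```python
import numpy as np
import tensorflow as tf

class Environment:
    # Hypothetical stateful environment kept entirely in Python.
    def step(self, action):
        # Apply the action, return the next observation.
        return np.random.rand(4).astype(np.float32)

env = Environment()
action = tf.placeholder(tf.int32, shape=[])
# The graph pulls fresh training data through the Python side effect.
observation = tf.py_func(env.step, [action], tf.float32)
```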
I am not sure if this concept has been brought up yet, but I will at least put the problem in my own terms. In dealing with RL problems and the training replay buffer, I couldn't find an easy way to use queues to speed up feeding samples through the feed_dict. Also, when randomly creating a sample set, the samples were consumed when I wanted them left in the buffer. What I was hoping to do is feed (possibly through feed_dict, or a file) a queue with a new sample, and once the size of the buffer is exceeded, the oldest sample is removed. So some concept of "sample age" would be nice. I am sure using a circular buffer will work to fix the number of samples, but "age" might be of interest as well, maybe passed as part of the tuple; in the RL case, simply the sequence in which samples were added might cover age (FIFO). Again, it may just not have been clear to me how to use the queues, but being able to randomly pull a mini-batch from this sample buffer without removing the samples, so a new set can be drawn later (possibly including previously sampled examples), would be nice. |
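One possible sketch of such a buffer, keeping state in the graph so minibatches can be drawn without consuming samples (sizes and dtypes are made up):

```python
import tensorflow as tf

capacity, sample_dim, batch_size = 10000, 4, 32
buf = tf.Variable(tf.zeros([capacity, sample_dim]), trainable=False)
cursor = tf.Variable(0, trainable=False)  # doubles as a sample "age" counter

new_sample = tf.placeholder(tf.float32, [sample_dim])
write = tf.scatter_update(buf, [cursor % capacity], [new_sample])
with tf.control_dependencies([write]):
    add_sample = tf.assign_add(cursor, 1)  # run this op to insert; oldest slot is overwritten (FIFO)

# Draw a minibatch without removing anything (guard cursor > 0 before use).
idx = tf.random_uniform([batch_size], maxval=tf.minimum(cursor, capacity),
                        dtype=tf.int32)
minibatch = tf.gather(buf, idx)
```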
I may not understand the distributed settings that the TF data input pipeline API is targeting. Is it possible to have a simple API design like PyTorch's: only three simple classes? I can pick up PyTorch's dataset API in 5 minutes and it's good enough for all the popular academic datasets: http://pytorch.org/docs/data.html It's great to see new efforts to solve the pain points in the TF dataset API. Looking forward to a simple/beautiful/flexible API with a minimum number of classes/concepts introduced. Thanks. |
I second @lming's sentiment above. Our biggest issue with the current data loading scheme is just that it's very complicated and involves a lot of new concepts. We don't find it spectacularly difficult to write a multithreaded data loader ourselves in Python, and generally we don't find it overly difficult to ensure that our data loading and preprocessing runs sufficiently quickly that it doesn't actually bottleneck training. Where we're stuck is that to optimally follow recommendations, we end up in an awkward situation, one of:
The Python threading API isn't perfect, but in general when we're doing mostly non-GIL-taking tasks in NumPy or whatever, the TF queue API seems more of a burden than a help. |
Just want to add something here: I implemented a multiprocess-based data feeding pipeline for multi-task learning. It achieves avg. GPU utilization >90% and quad-core CPU utilization >95%. It is less prone to memory leaks and particularly good for days-long training. Not saying it's perfect, but it at least works much better than the current TF queue API in my case. If anyone is interested: https://hanxiao.github.io/2017/07/07/Get-10x-Speedup-in-Tensorflow-Multi-Task-Learning-using-Python-Multiprocessing/ |
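A bare-bones sketch of the multiprocess-feeding idea (not the linked implementation; make_batch() is a hypothetical preprocessing function):

```python
import multiprocessing as mp
import numpy as np

def make_batch():
    # Hypothetical stand-in for real preprocessing work.
    return np.random.rand(32, 10).astype(np.float32)

def worker(out_q):
    # Worker processes sidestep the GIL by preparing batches in parallel.
    while True:
        out_q.put(make_batch())

if __name__ == "__main__":
    q = mp.Queue(maxsize=16)
    for _ in range(4):
        mp.Process(target=worker, args=(q,), daemon=True).start()
    # In the training loop: sess.run(train_op, feed_dict={x: q.get()})
```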
TensorPack by @ppwwyyxx has already done that for a while now. There you also get further speedup using ZMQ -- plus it has a nice interface using Python generators. For me, the way tensorpack handles input data is the most elegant approach. I hope to see something like this in a future TF. |
It would be great to have GPU-resident queues. |
@xieqihuiPG See StagingArea and MapStagingArea |
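A minimal sketch of GPU-resident prefetching with the contrib StagingArea (shapes are made up; the input tensor stands in for a real pipeline):

```python
import tensorflow as tf

images = tf.random_uniform([32, 224, 224, 3])  # stand-in for a real input pipeline
with tf.device('/gpu:0'):
    area = tf.contrib.staging.StagingArea(dtypes=[tf.float32],
                                          shapes=[[32, 224, 224, 3]])
    stage = area.put([images])     # uploads the next batch to the GPU
    gpu_images, = area.get()       # the model consumes this tensor

# Warm up once with sess.run(stage), then run the train op and `stage`
# together each step so batch N+1 uploads while batch N is being consumed.
```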
Would greatly appreciate:
|
#11591 We need efficient sampling/shuffling for large datasets |
What about supporting custom ops to create a Dataset? For example, let's say I have a Python function which returns a new batch on each call (a generator). I want to wrap this function so that it becomes a Dataset source. I currently use this method in my own pipeline. |
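A hedged sketch of what such a wrapper could look like; this is essentially the Dataset.from_generator() that lands later in this thread (the generator below is a made-up example):

```python
import numpy as np
import tensorflow as tf

def gen():
    # Hypothetical batch-producing generator.
    while True:
        yield np.random.rand(32, 10).astype(np.float32)

dataset = tf.data.Dataset.from_generator(gen,
                                         output_types=tf.float32,
                                         output_shapes=tf.TensorShape([32, 10]))
next_batch = dataset.make_one_shot_iterator().get_next()
```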
@mrry This is great work and definitely very useful for creating nice learning APIs on top of TensorFlow. However, I have a couple of main concerns:
Also, if my description is terribly unclear, please let me know and I'll try to clarify. |
The new input pipelines are great! But unfortunately, we are unable to use them for large-scale training because our data preprocessing is quite costly and needs to be distributed across multiple machines--or we just haven't figured out the right way to do it. We have thus been using the old queue-based approach, along these lines:

```python
# Set up queues
kwargs = {'capacity': ..., 'dtypes': ..., 'names': ..., 'shapes': ...}
train_queue = tf.FIFOQueue(**kwargs)
valid_queue = tf.FIFOQueue(**kwargs)
queue_index = tf.Variable(0, trainable=False)
queue = tf.QueueBase.from_list(queue_index, [train_queue, valid_queue])
batch_size = ...
batch = queue.dequeue_many(batch_size)

# Build model
output = build_model(batch['X'])
loss = evaluate_loss(output, batch['y'])

# Fill queues
train_data_stream = some_iterable_for_training_data()
validation_data_stream = some_iterable_for_validation_data()
start_filling_queues_in_background_thread(train_queue, train_data_stream)
start_filling_queues_in_background_thread(valid_queue, validation_data_stream)
```

Having two different queues joined via tf.QueueBase.from_list lets us switch between training and validation data without rebuilding the model.
None of the alternatives we have tried with the new API are satisfactory, and it would be great to see either the ability to construct a similarly switchable pipeline from Datasets, or guidance on the intended pattern. Looking forward to hearing whether we've just not been using the datasets API right. |
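For what it's worth, a minimal sketch of one Dataset-side analogue, using a reinitializable iterator shared by two structurally identical datasets (shapes and data below are made up):

```python
import tensorflow as tf

train_ds = tf.data.Dataset.from_tensor_slices(tf.random_uniform([1000, 10])).batch(64)
valid_ds = tf.data.Dataset.from_tensor_slices(tf.random_uniform([200, 10])).batch(64)

# One iterator, one model graph; switch data by running an init op.
iterator = tf.data.Iterator.from_structure(train_ds.output_types,
                                           train_ds.output_shapes)
batch = iterator.get_next()  # build the model on this tensor
train_init = iterator.make_initializer(train_ds)
valid_init = iterator.make_initializer(valid_ds)
# sess.run(train_init)  -> train for a while
# sess.run(valid_init)  -> then validate with the same model graph
```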
I think the queues are nice enough. I'd like to see two things improved though: an easier way of inputting data from native Python other than using placeholders, and managing threads. Maybe a class could wrap this up. Anyway, those are just my thoughts. It's probably a lot harder than this due to the static requirements of TensorFlow. Maybe you just have to provide the sizes when you create the queue. |
It is currently not well supported with MonitoredTrainingSessions. See especially: tensorflow/tensorflow#7951. With the previous approach, training "restarted" at step 0 with the test because the graph changed, which led to conflicting summaries.
I am using the new API. As you can see, the real-world problems are more than just feeding in a series of images or texts. So I would really appreciate it if you could let me feed the data freely in terms of when and how. I can imagine two options. One is efficient distributed reading through |
Use a placeholder as input to a queue so the model reads inputs from the queue, then use a session-run thread to feed inputs (maybe produced by Hadoop MapReduce) to the queue. With a staging area you can even hide all preprocessing and input time. |
With this change, it becomes possible to use a Python generator as the source dataset for a `tf.contrib.data` input pipeline. This enables easier integration with non-TensorFlow data sources. The generator can yield a nested structure of NumPy arrays, or values convertible to NumPy arrays. This addresses a concern raised in issue tensorflow#7951. PiperOrigin-RevId: 165663857
I'm trying to test the example in the doc, but it seems that this call is passing only 1 argument to the function:
Instead, the function is defined with two parameters. |
@eaplatanios one relevant PR for zip/unzip is #10837 |
@mrry Have you tested this section of the documentation with Python 3? |
@bhack I haven't been able to make it work with more than one parameter in return from the function given to py_func. I'm using python3 and didn't tried with python2. |
@AMairesse The first problem was solved with 2139e7d |
@bhack Thanks, will try that soon, I was using a workaround which I'm not proud of :-) |
@AMairesse I suggest you report this in #11786 |
We need more operators for image processing, like map_coordinates, so we can build an image augmentation pipeline using only TensorFlow. Also, Dataset does not reliably initialize variables defined in the map function; see #12648. |
I'd like to re-raise an earlier performance-related question by @kratzert that seems to have fallen out of focus. The performance gain of using the new Dataset API is negligible. @ppwwyyxx stated that queues and StagingArea can still be used with the Dataset API, but I still haven't seen a working example of this. Do we have one? What purpose does the new API serve if one must still include queues, data_flow_ops or StagingArea complexities? |
@vvekic, I experimented a bit with queues and the Dataset API after realising in horror that of the 0.8s/step in my inference loop, 0.2s is data fetching (with the GPU at 0% utilization), rising to almost 2 seconds if the HDD is being used by something else at the same time.
I still have to run this on a big dataset and check if there's any performance improvement, but at least it seems to execute correctly. The catch is, I couldn't find a way to iterate over the data more than once (which luckily enough is not my use-case), because the only iterator that won't raise an error when the |
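For reference, a minimal sketch of making multiple passes with an initializable iterator (the dataset contents are made up):

```python
import tensorflow as tf

# An initializable iterator can be rewound for multiple passes,
# unlike a one-shot iterator.
dataset = tf.data.Dataset.range(100).batch(10)
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    for _ in range(3):  # three full passes over the data
        sess.run(iterator.initializer)  # rewind
        while True:
            try:
                sess.run(next_batch)
            except tf.errors.OutOfRangeError:
                break
```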
I don't know if I'm right here, but I have a question about the dataset API. My dataset contains one column with sequences and one with sequence lengths, which I want to treat differently, because I want to pad the sequences. Is it possible to address a single column in the dataset so that it is treated differently from the other column? E.g.:

```python
two_column_dataset = ...  # Contains the sequence and sequence-length columns
first_column_dataset = two_column_dataset[0].padded_batch(64, ...)  # Pad only the first column
second_column_dataset = two_column_dataset[1].batch(64)  # Corresponding sequence lengths
two_column_dataset = Dataset.zip((first_column_dataset, second_column_dataset))
```

Edit: After writing this, I figured it out:

```python
def flat_map_func(sequence, sequence_length):
    first_column_dataset = Dataset.from_tensors(sequence).padded_batch(64, ...)
    second_column_dataset = Dataset.from_tensors(sequence_length).padded_batch(64)
    zipped_dataset = Dataset.zip((first_column_dataset, second_column_dataset))
    return zipped_dataset

two_column_dataset = two_column_dataset.flat_map(flat_map_func)
```
|
This issue thread is becoming a bit unwieldy and it's getting hard to keep track of the individual discussions, so I'm going to lock it after responding to a few of the recent comments. Please feel free to open a new issue about any specific topics or feature requests related to the new input pipeline API. In response to a few recent questions:
Thanks again to all of you for your continued interest in this part of TensorFlow! |
[TL;DR: We're designing a new input pipeline API for TensorFlow, and we'd like to collect your feature requests on this issue.]
We've noticed that one of the biggest challenges in getting started with TensorFlow is how to load your own data into your programs. While TensorFlow has several methods that can be used to build complex input pipelines (such as tf.train.string_input_producer(), tf.train.batch(), etc.), they were designed for a particular use case (processing a static set of files repeatedly), and the average user experience with these methods is not great. For example:

- Reaching the end of the input data is signaled by an exception (tf.errors.OutOfRangeError).
- The pipelines do nothing until you remember to call tf.train.start_queue_runners(sess): in fact, they hang indefinitely and deadlock the user program.

We've decided to start from a clean slate and redesign the input pipeline API. The existing methods will remain until TF 2.0 (at least), but we are planning to add a new set of methods for loading and manipulating datasets. We're still preparing a detailed design, which we plan to share soon, but we anticipate that there will be two new APIs:

- A Dataset represents a collection of data elements. Each element can be a tuple of one or more tensors (e.g. an image and its label). We will provide methods for creating datasets from tensors, and deriving them from another dataset (e.g. by slicing its elements, repeating its elements, shuffling its elements, batching its elements, mapping a function over its elements, etc.).
- An Iterator can be created from a Dataset. An iterator represents the current position within a dataset, and exposes an operation (like tf.QueueBase.dequeue()) that can be run to get the next element. There will be explicit operations for initializing an iterator, so that it can be reused after you have processed all of the elements in a dataset.

A similar pattern turns up in many different settings, including Java's Stream API, Scala's collections (and hence Spark's RDDs), and .NET's Language Integrated Query.
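As a rough illustration of how these two APIs might compose (the data is made up, and the code is written against the tf.data module that eventually shipped):

```python
import numpy as np
import tensorflow as tf

images = np.random.rand(1000, 28, 28).astype(np.float32)
labels = np.random.randint(10, size=1000)

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=1000)
           .batch(32))
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()  # analogous to tf.QueueBase.dequeue()

with tf.Session() as sess:
    sess.run(iterator.initializer)  # explicit init allows reuse after exhaustion
    batch_images, batch_labels = sess.run(next_element)
```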
We're announcing this plan early because we want to collect feedback on what features you—as TensorFlow users—would like to see in an input pipeline API. What other pain points have we missed? What features do you miss from other systems? What other suggestions do you have?
We look forward to hearing from you!