Current behavior

After `rebatch()`, the data iterator's `get_next()` produces an error:

```
F tensorflow/core/framework/tensor.cc:833] Check failed: limit <= dim0_size (8194 vs. 8193)
```
Expected behavior

No error.
Code to reproduce

Step 1: Generate a parquet file by running the following code:
```python
import random

import numpy as np
import pandas as pd

data_list = []
for i in range(1, 10000):
    int_feature = random.randint(1, 100)
    # float_feature = random.random()
    array_feature = [random.randint(1, 10) for x in range(0, 4)]
    data_list.append([int_feature, array_feature])
df = pd.DataFrame(data_list, columns=["int_feature", "array_feature"])
df.to_parquet("parquet_sample_file.parquet")
```
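One detail worth noting (my observation, not part of the original report): `range(1, 10000)` yields 9999 rows, which is not a multiple of either batch size used below, so every epoch ends with a short final batch. A plain-Python check:

```python
# range(1, 10000) produces 9999 rows, not 10000.
rows = len(range(1, 10000))

# Neither batch size used in the report divides 9999 evenly, so each
# epoch ends with a short final batch.
print(rows)         # 9999
print(rows % 8192)  # 1807
print(rows % 100)   # 99
```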
Step 2: Load the parquet file generated in Step 1 with HybridBackend:
```python
import tensorflow as tf
import hybridbackend.tensorflow as hb

filenames_ds = tf.data.Dataset.from_tensor_slices(
    ['file1.snappy.parquet', 'file2.snappy.parquet', ... 'fileN.snappy.parquet'])
hb_fields = []
hb_fields.append(hb.data.DataFrame.Field("feature1", tf.int64, ragged_rank=0))
hb_fields.append(hb.data.DataFrame.Field("feature2", tf.float32, ragged_rank=1))
hb_fields.append(hb.data.DataFrame.Field("feature3", tf.int64, ragged_rank=1))
ds = filenames_ds.apply(
    hb.data.read_parquet(8192, hb_fields,
                         num_parallel_reads=tf.data.experimental.AUTOTUNE))
iterator = ds.apply(hb.data.rebatch(8192, fields=hb_fields))
it = iterator.make_one_shot_iterator()
item = it.get_next()
batch_size_dict = {}
with tf.Session() as sess:
    print("====== start ======")
    total_batch_size = 0
    while True:
        try:
            batch = sess.run(item)
            batch_size = len(batch['mod_series'])
            batch_size_dict[batch_size] = batch_size_dict.get(batch_size, 0) + 1
        except tf.errors.OutOfRangeError:
            break
```
Running the above code in a python3 shell throws the error shown above.
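This is not HybridBackend's actual implementation, only a toy model in plain NumPy of the ragged-batch bookkeeping whose bounds check (`limit <= dim0_size`) fails in the report; in the crash, the computed slice limit overshoots the values buffer by one (8194 vs. 8193):

```python
import numpy as np

# Toy model of a ragged batch: a flat values buffer plus row_splits
# giving each row's boundary. Mirrors the repro: 9999 rows of 4 values
# each, rebatched 100 rows at a time.
values = np.arange(9999 * 4)
row_splits = np.arange(0, 9999 * 4 + 1, 4)

def rebatch(values, row_splits, batch_size):
    batches = []
    n_rows = len(row_splits) - 1
    for start in range(0, n_rows, batch_size):
        end = min(start + batch_size, n_rows)
        limit = row_splits[end]
        # The reported crash is TensorFlow's version of this bounds
        # check failing: the computed slice limit exceeded the buffer
        # size by one.
        assert limit <= len(values)
        batches.append(values[row_splits[start]:limit])
    return batches

batches = rebatch(values, row_splits, 100)
print(len(batches))  # 100 batches; the last one holds only 99 rows
```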
Willing to contribute

Yes
Thanks for reporting. Could you provide a sample file to reproduce this issue?
(1) Generate a parquet file by running the code in Step 1 above.
(2) Loading the generated parquet file with HybridBackend reproduces the issue:
```python
filenames_ds = tf.data.Dataset.from_tensor_slices(["parquet_sample_file.parquet"])
hb_fields = []
hb_fields.append(hb.data.DataFrame.Field("int_feature", tf.int64, ragged_rank=0))
# hb_fields.append(hb.data.DataFrame.Field("float_feature", tf.float32, ragged_rank=0))
hb_fields.append(hb.data.DataFrame.Field("array_feature", tf.int64, ragged_rank=1))
iterator = filenames_ds.apply(
    hb.data.read_parquet(100, hb_fields,
                         num_parallel_reads=tf.data.experimental.AUTOTUNE))
iterator = iterator.apply(hb.data.rebatch(100, fields=hb_fields)).repeat(30)
iterator = iterator.make_one_shot_iterator()
item = iterator.get_next()
with tf.Session() as sess:
    print("====== start ======")
    total_batch_size = 0
    while True:
        try:
            a = sess.run(item)
        except tf.errors.OutOfRangeError:
            break
```
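For reference, if `rebatch` emits the short final batch of each epoch (an assumption about its semantics, since `.repeat(30)` is applied after rebatching here), the loop above should finish after a fixed number of iterations rather than crash:

```python
# Expected batch accounting for the repro: 9999 rows, batch size 100,
# dataset repeated 30 times after rebatching.
rows, batch_size, repeats = 9999, 100, 30

batches_per_epoch = -(-rows // batch_size)  # ceiling division
total_batches = batches_per_epoch * repeats

print(batches_per_epoch, total_batches)  # 100 3000
```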