# Introduction #

[TensorFlow Datasets](https://www.tensorflow.org/datasets/overview) (TFDS) is a library implementing the [Extract, Transform, Load](https://en.wikipedia.org/wiki/Extract%2C_transform%2C_load) process for TensorFlow. It provides utilities for downloading and preparing data for use with [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) pipelines. Once a dataset has been prepared with TFDS, it is easy to use with TensorFlow models on Cloud TPUs.

In [1]:
!pip install --upgrade tensorflow-datasets # if it's not installed already
# decide where to create your data directory (`!cd` runs in a subshell and does not persist, so use %cd)
%cd /kaggle/working
!mkdir -p datasets # name doesn't matter
%cd datasets

Collecting tensorflow-datasets
  Downloading tensorflow_datasets-4.5.2-py3-none-any.whl (4.2 MB)
Collecting protobuf>=3.12.2
  Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting importlib-resources; python_version < "3.9"
  Downloading importlib_resources-5.4.0-py3-none-any.whl (28 kB)
Collecting tensorflow-metadata
  Downloading tensorflow_metadata-1.6.0-py3-none-any.whl (48 kB)
Collecting googleapis-common-protos<2,>=1.52.0
  Downloading googleapis_common_protos-1.54.0-py2.py3-none-any.whl (207 kB)
ERROR: google-cloud-pubsub 1.4.3 has requirement google-api-core[grpc]<1.17.0,>=1.14.0, but you'll have google-api-core 1.17.0 which is incompatible.
ERROR: allennlp 0.9

In [2]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [3]:
builder = tfds.builder('xnli', data_dir='/kaggle/working/datasets')
builder.download_and_prepare() # this can take a few minutes

Downloading and preparing dataset 17.04 MiB (download: 17.04 MiB, generated: 29.62 MiB, total: 46.65 MiB) to /kaggle/working/datasets/xnli/1.1.0...



Dataset xnli downloaded and prepared to /kaggle/working/datasets/xnli/1.1.0. Subsequent calls will reuse this data.
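
The final log line says that later calls reuse the prepared data. As a rough illustration, the sketch below checks whether that directory already exists before triggering a download. It assumes the `<data_dir>/<name>/<version>` layout visible in the log; `is_prepared` is a hypothetical helper written for this notebook, not part of the TFDS API, and TFDS does its own check when `download_and_prepare()` is called.

```python
from pathlib import Path


def is_prepared(data_dir, name, version):
    """Hypothetical helper: True if <data_dir>/<name>/<version> is a non-empty directory."""
    target = Path(data_dir) / name / version
    return target.is_dir() and any(target.iterdir())


# Path, dataset name, and version are taken from the log output above.
print(is_prepared('/kaggle/working/datasets', 'xnli', '1.1.0'))
```

A check like this only tells you that something is on disk; it does not verify that preparation finished successfully, so treat it as a convenience rather than a guarantee.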


In [4]:
ds = tfds.load('xnli', split='test', data_dir='/kaggle/working/datasets')
# shuffle with a 1024-example buffer, batch by 32, and prefetch in the background
ds = ds.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for example in ds.take(1): # inspect a single batch
  print(example['premise'], example['hypothesis'], example['label'])

{'ar': <tf.Tensor: shape=(32,), dtype=string, numpy=
array([b'\xd9\x8a\xd8\xa8\xd8\xaf\xd9\x88 \xd8\xa3\xd9\x86\xd9\x87\xd8\xa7 \xd8\xa7\xd9\x84\xd9\x85\xd8\xad\xd8\xb7\xd8\xa9 \xd8\xa8\xd9\x8a\xd9\x86 \xd8\xa7\xd9\x84\xd8\xb4\xd8\xb1\xd9\x83 \xd9\x88\xd8\xa7\xd9\x84\xd8\xaa\xd9\x88\xd8\xad\xd9\x8a\xd8\xaf\xd8\x8c \xd9\x88\xd9\x87\xd9\x88 \xd9\x85\xd9\x81\xd9\x87\xd9\x88\xd9\x85 \xd9\x85\xd9\x81\xd9\x8a\xd8\xaf \xd9\x8a\xd9\x82\xd8\xaf\xd9\x85 \xd8\xad\xd9\x84\xd9\x82\xd8\xa9 \xd8\xa7\xd9\x84\xd9\x88\xd8\xb5\xd9\x84 \xd8\xa7\xd9\x84\xd9\x85\xd9\x81\xd9\x82\xd9\x88\xd8\xaf\xd8\xa9 \xd9\x81\xd9\x8a \xd8\xa7\xd9\x84\xd8\xb9\xd9\x85\xd9\x84\xd9\x8a\xd8\xa9 \xd8\xa7\xd9\x84\xd8\xaa\xd8\xb7\xd9\x88\xd8\xb1\xd9\x8a\xd8\xa9.',
       b'\xd9\x88\xd9\x85\xd9\x86 \xd8\xa7\xd9\x84\xd9\x88\xd8\xa7\xd8\xb6\xd8\xad \xd8\xa7\xd9\x86 \xd8\xaa\xd8\xb9\xd8\xb2\xd9\x8a\xd8\xb2 \xd9\x82\xd9\x8a\xd9\x85\xd8\xa9 \xd8\xa7\xd9\x84\xd9\x85\xd8\xb9\xd9\x87\xd8\xaf \xd8\xa7\xd9\x84\xd8\xa7\xd9\x85\xd8\xb1\xd9\x8a