# TensorFlow Datasets

**TensorFlow Datasets** provides a collection of over 100 datasets, each of which can be downloaded with one line of code in this notebook. You also don't need to build your models in TensorFlow to use these datasets: they can be converted to NumPy arrays with one more line of code.

### Available Datasets & Documentation
**A description of each of the datasets and more information about download options is available in the [TensorFlow Datasets Library](https://www.tensorflow.org/datasets/catalog/overview)**

[**TensorFlow Datasets Documentation**](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb#scrollTo=6XvCUmCEd4Dm)

#### Installing TensorFlow 2.0 on a Mac
To install TensorFlow 2.0 into a new Anaconda virtual environment, run the following commands from the terminal. In this example, **tf** is the name of the virtual environment being created.
```
conda create -n tf tensorflow
conda activate tf
```
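
Before running the cells below, it can help to confirm that both packages are visible in the active environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution names `tensorflow` and `tensorflow-datasets` are assumptions based on the PyPI package names. The conda command above installs TensorFlow only, so `tensorflow-datasets` will likely need a separate install (for example `pip install tensorflow-datasets`).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your environment differs.
for package in ("tensorflow", "tensorflow-datasets"):
    version = installed_version(package)
    print(package, version if version is not None else "not installed")
```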

In [1]:
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



In [2]:
# Set the destination for dataset downloads.
DATASETS_DIRECTORY = '/Users/garyb/Develop/TF2/tensorflow_datasets'
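
The path above is specific to one machine. As a sketch of a more portable setup, the hypothetical helper `ensure_data_dir` below expands `~` to the current user's home directory and creates the folder if it does not exist yet. The folder name used in the usage note is an assumption, not something this notebook requires.

```python
from pathlib import Path


def ensure_data_dir(path):
    """Expand '~', create the directory if needed, and return it as a string."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory)
```

For example, `DATASETS_DIRECTORY = ensure_data_dir('~/tensorflow_datasets')` would point the downloads at a folder under your home directory.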

In [3]:
print('{} datasets are currently available'.format(len(tfds.list_builders())))

101 datasets are currently available


In [4]:
for dataset in tfds.list_builders():
    print(dataset)

abstract_reasoning
aflw2k3d
amazon_us_reviews
bair_robot_pushing_small
bigearthnet
binarized_mnist
binary_alpha_digits
caltech101
caltech_birds2010
caltech_birds2011
cats_vs_dogs
celeb_a
celeb_a_hq
chexpert
cifar10
cifar100
cifar10_corrupted
clevr
cnn_dailymail
coco
coco2014
coil100
colorectal_histology
colorectal_histology_large
curated_breast_imaging_ddsm
cycle_gan
deep_weeds
definite_pronoun_resolution
diabetic_retinopathy_detection
downsampled_imagenet
dsprites
dtd
dummy_dataset_shared_generator
dummy_mnist
emnist
eurosat
fashion_mnist
flores
food101
gap
glue
groove
higgs
horses_or_humans
image_label_folder
imagenet2012
imagenet2012_corrupted
imdb_reviews
iris
kitti
kmnist
lfw
lm1b
lsun
mnist
mnist_corrupted
moving_mnist
multi_nli
nsynth
omniglot
open_images_v4
oxford_flowers102
oxford_iiit_pet
para_crawl
patch_camelyon
pet_finder
quickdraw_bitmap
resisc45
rock_paper_scissors
rock_you
scene_parse150
shapes3d
smallnorb
snli
so2sat
squad
stanford_dogs
stanford_online_products
starcra
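
With this many builders, a keyword search is often easier than scanning the full listing. The sketch below uses a hypothetical helper, `find_datasets`, and a short sample list copied from the output above so that it runs without TensorFlow Datasets installed.

```python
def find_datasets(names, keyword):
    """Return the dataset names containing `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


# A few names copied from the listing above; in the notebook, pass
# tfds.list_builders() instead of this sample list.
sample_names = ["binarized_mnist", "cifar10", "fashion_mnist", "iris", "mnist"]
print(find_datasets(sample_names, "mnist"))
# prints ['binarized_mnist', 'fashion_mnist', 'mnist']
```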

### Construct a ```tf.data.Dataset```
Setting ```with_info``` to ```True``` makes ```tfds.load``` return a ```tfds.core.DatasetInfo``` object alongside the dataset. It also results in the creation of two files in the data directory:
- ```dataset_info.json```
- ```image.image.json```

In [5]:
# "titanic" is a relatively small dataset, useful for confirming that your setup is working.
ds, info = tfds.load(
    name='titanic',
    split=tfds.Split.TRAIN,
    with_info=True,
    data_dir=DATASETS_DIRECTORY,
)



The returned ```info``` object is a ```tfds.core.DatasetInfo```, which is described by the following fields:
- ```builder```: ```DatasetBuilder```, the dataset builder for this info.
- ```description```: ```str```, a description of this dataset.
- ```features```: ```tfds.features.FeaturesDict```, information on the feature dict of the ```tf.data.Dataset``` object returned by the ```builder.as_dataset()``` method.
- ```supervised_keys```: tuple of ```(input_key, target_key)```, specifies the input feature and the label for supervised learning, if applicable for the dataset. The keys correspond to the feature names to select in ```info.features```. When calling ```tfds.core.DatasetBuilder.as_dataset()``` with ```as_supervised=True```, the ```tf.data.Dataset``` object will yield the ```(input, target)``` pair defined here.
- ```urls```: ```list(str)```, optional, the homepage(s) for this dataset.
- ```citation```: ```str```, optional, the citation to use for this dataset.
- ```metadata```: ```tfds.core.Metadata```, an additional object that is stored and restored with the dataset. This allows for storing additional information with the dataset.
- ```redistribution_info```: ```dict```, optional, information needed for redistribution, as specified in ```dataset_info_pb2.RedistributionInfo```. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset.

Look at the dataset info:

In [6]:
print(info)

tfds.core.DatasetInfo(
    name='titanic',
    version=1.0.0,
    description='Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.',
    urls=['https://www.openml.org/d/40945'],
    features=FeaturesDict({
        'features': FeaturesDict({
            'age': Tensor(shape=(), dtype=tf.float32),
            'boat': Tensor(shape=(), dtype=tf.string),
            'body': Tensor(shape=(), dtype=tf.int32),
            'cabin': Tensor(shape=(), dtype=tf.string),
            'embarked': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
            'fare': Tensor(shape=(), dtype=tf.float32),
            'home.dest': Tensor(shape=(), dtype=tf.string),
            'name': Tensor(shape=(), dtype=tf.string),
            'parch': Tensor(shape=(), dtype=tf.int32),
            'pclass': ClassLabel(shape=(),

#### NumPy
Convert your ```tf.data.Dataset``` to an iterable of NumPy arrays, then convert that into a pandas DataFrame.

In [7]:
# g is a generator
g = tfds.as_numpy(ds)

In [8]:
df = pd.DataFrame(g)

In [9]:
df.head()

|   | features | survived |
|---|----------|----------|
| 0 | {'age': 35.0, 'boat': b'Unknown', 'body': -1, ... | 0 |
| 1 | {'age': 20.0, 'boat': b'C', 'body': -1, 'cabin... | 1 |
| 2 | {'age': -1.0, 'boat': b'Unknown', 'body': -1, ... | 0 |
| 3 | {'age': -1.0, 'boat': b'Unknown', 'body': -1, ... | 0 |
| 4 | {'age': -1.0, 'boat': b'Unknown', 'body': -1, ... | 0 |
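
In the output above, each row keeps all of its features in a single nested dict, and string values are still `bytes`. One way to get one column per feature is to flatten each example before building the DataFrame. The sketch below is written in plain Python so it runs without TensorFlow; the helper name `flatten_example` is hypothetical, and decoding strings as UTF-8 is an assumption about the data.

```python
def flatten_example(example):
    """Flatten one example dict: lift nested 'features' keys and decode bytes."""
    row = {}
    for key, value in example.items():
        if isinstance(value, dict):
            # Nested feature dict: merge its keys into the top-level row.
            row.update(flatten_example(value))
        elif isinstance(value, bytes):
            # String features arrive as bytes; UTF-8 is an assumption here.
            row[key] = value.decode("utf-8")
        else:
            row[key] = value
    return row


# Sample shaped like the first row above, limited to the fields visible there.
example = {"features": {"age": 35.0, "boat": b"Unknown", "body": -1}, "survived": 0}
print(flatten_example(example))
# prints {'age': 35.0, 'boat': 'Unknown', 'body': -1, 'survived': 0}
```

In the notebook, `pd.DataFrame([flatten_example(e) for e in tfds.as_numpy(ds)])` should then yield one column per feature. Call `tfds.as_numpy(ds)` again rather than reusing `g`, since a generator is exhausted once `pd.DataFrame(g)` has consumed it. Note that this sketch would silently overwrite a value if a nested key had the same name as a top-level key.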
