# Getting Started with TensorFlow 2.0 in 7 Days
## 1.4 Getting Data into TensorFlow

In [1]:
# install tensorflow
!pip install tf-nightly-2.0-preview

Collecting tf-nightly-2.0-preview
  Downloading https://files.pythonhosted.org/packages/80/92/7ae5e1499112fcaca72d8f6df47ce4206143c2dbd7c7cd1de29305ade060/tf_nightly_2.0_preview-2.0.0.dev20190405-cp36-cp36m-manylinux1_x86_64.whl (96.1MB)
Collecting tensorflow-estimator-2.0-preview (from tf-nightly-2.0-preview)
  Downloading https://files.pythonhosted.org/packages/b1/d8/563f4a419f9db1d7c5b947fbc22d5d51bc2d11a8a1e194a5355858fa8cbf/tensorflow_estimator_2.0_preview-1.14.0.dev2019040900-py2.py3-none-any.whl (356kB)
Collecting tb-nightly<1.15.0a0,>=1.14.0a0 (from tf-nightly-2.0-preview)
  Downloading https://files.pythonhosted.org/packages/5d/17/a3d05a0664c11703259aa79d2b58b871b3bb1fff24153f75db04540489db/tb_nightly-1.14.0a20190319-py3-none-any.whl (3.0MB)
Collecting google-pasta>=0.1.2 (

In [2]:
!pip install pandas==0.24

Collecting pandas==0.24
  Downloading https://files.pythonhosted.org/packages/f9/e1/4a63ed31e1b1362d40ce845a5735c717a959bda992669468dae3420af2cd/pandas-0.24.0-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
fastai 1.0.51 has requirement numpy>=1.15, but you'll have numpy 1.14.6 which is incompatible.
Installing collected packages: pandas
  Found existing installation: pandas 0.22.0
    Uninstalling pandas-0.22.0:
      Successfully uninstalled pandas-0.22.0
Successfully installed pandas-0.24.0


In [0]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

In [0]:
file_path = keras.utils.get_file("iris.data", "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")

In [0]:
# iris.data has no header row, so we supply the column names ourselves
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df = pd.read_csv(file_path, names=column_names)

In [0]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
class           150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


We are interested in the `class` column, which has the `object` dtype (here, string labels rather than numbers). Let's take a look at some rows.

In [0]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


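We can't feed the `class` column to a model as is, because it holds strings rather than numbers. Last time we simply dropped it, but discarding it throws away data. Instead, we apply __one-hot encoding__: each distinct value of `class` becomes its own column containing 0 or 1.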
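To make the idea concrete before handing it to pandas, here is a minimal hand-rolled sketch of one-hot encoding in plain Python. The names `labels`, `categories`, and `one_hot` are hypothetical and exist only for this illustration, and the four sample labels are made up rather than read from the dataset. The sketch assumes categories are ordered alphabetically.

```python
# Illustrative only: a tiny, made-up sample of class labels.
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

# Sorting the unique labels fixes the order of the encoded columns.
categories = sorted(set(labels))


def one_hot(label, categories, drop_first=False):
    """Return a list of 0/1 values with a 1 at the position of `label`."""
    row = [1 if label == category else 0 for category in categories]
    # With the first column dropped, a row of all zeros stands for
    # the first category, so no information is lost.
    return row[1:] if drop_first else row


encoded = [one_hot(label, categories) for label in labels]
encoded_reduced = [one_hot(label, categories, drop_first=True) for label in labels]

print(categories)
print(encoded)
print(encoded_reduced)
```

Each label maps to exactly one 1 in the full encoding. In the reduced variant the first category is represented implicitly by all zeros, which is the same idea behind the `drop_first` argument of `pd.get_dummies`.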

In [0]:
# Replace the `class` column with one indicator column per species
df_one_hot = pd.get_dummies(df, prefix=None, columns=['class'])
df_one_hot.head()

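|   | sepal_length | sepal_width | petal_length | petal_width | class_Iris-setosa | class_Iris-versicolor | class_Iris-virginica |
|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 1 | 0 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 1 | 0 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 1 | 0 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 1 | 0 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 1 | 0 | 0 |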


In [0]:
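# drop_first=True omits the first category (Iris-setosa); it is implied when both remaining columns are 0
df_one_hot = pd.get_dummies(df, prefix=None, columns=['class'], drop_first=True)
df_one_hot.head()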

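|   | sepal_length | sepal_width | petal_length | petal_width | class_Iris-versicolor | class_Iris-virginica |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | 0 |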


In [0]:
df_one_hot.tail()

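|   | sepal_length | sepal_width | petal_length | petal_width | class_Iris-versicolor | class_Iris-virginica |
|---|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 0 | 1 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 0 | 1 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 0 | 1 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 0 | 1 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 0 | 1 |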
