# Getting Started with TensorFlow 2.0 in 7 Days
## 1.4 Getting Data into TensorFlow

In [1]:
# install tensorflow
!pip install tensorflow==2.0.0-beta0



_Install the pandas library._

In [2]:
!pip install pandas==0.24

Collecting pandas==0.24
  Downloading https://files.pythonhosted.org/packages/f9/e1/4a63ed31e1b1362d40ce845a5735c717a959bda992669468dae3420af2cd/pandas-0.24.0-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
Installing collected packages: pandas
  Found existing installation: pandas 0.24.2
    Uninstalling pandas-0.24.2:
      Successfully uninstalled pandas-0.24.2
Successfully installed pandas-0.24.0


In [0]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

If the file we want to work with is hosted remotely, we can fetch it using `keras.utils.get_file`. The function downloads the file (unless a cached copy already exists) and returns the local path, which we can pass to subsequent functions.

In [4]:
file_path = keras.utils.get_file("iris.data", "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")

Downloading data from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data


The file we downloaded has no header row, so we need to supply the column labels ourselves. We declare a list of column names and pass it to the pandas function `read_csv` through the `names` argument.

In [0]:
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df = pd.read_csv(file_path, names=column_names)
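
To see what the `names` argument changes, the following sketch reads a few header-less rows from an in-memory string instead of the downloaded file. The three sample rows and the use of `io.StringIO` as a stand-in for the file path are assumptions made for illustration; they are not part of the original notebook.

```python
import io
import pandas as pd

# Three header-less rows in the same layout as iris.data (illustrative sample).
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Without `names`, read_csv treats the first row as the header,
# so one record is consumed as column labels.
inferred = pd.read_csv(io.StringIO(raw))

# With `names`, every row is kept as data and the columns get our labels.
labeled = pd.read_csv(io.StringIO(raw), names=column_names)

print(len(inferred), len(labeled))
print(list(labeled.columns))
```

Comparing the two lengths shows why the labels are declared before reading: omitting them would silently turn the first measurement into a header.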

The `info` method will give us a summary of what was read into our dataframe.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
class           150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


We can inspect the dataframe by displaying its first few records with `head`.

In [7]:
df.head(n=10)

   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
8           4.4          2.9           1.4          0.2  Iris-setosa
9           4.9          3.1           1.5          0.1  Iris-setosa


In [8]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


In [9]:
x = df[column_names[:-1]]
print(type(x))

<class 'pandas.core.frame.DataFrame'>


In [10]:
x.head()

   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2


We can convert the dataframe to a NumPy array with the `to_numpy` method. Here we convert only the first five rows.

In [11]:
np_array = x[:5].to_numpy()
print(np_array)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
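
One practical reason to select only the numeric columns before converting is that `to_numpy` has to pick a single dtype for the whole array. The sketch below illustrates this on a tiny hand-built frame; the frame, the `features` and `mixed` names, and the `label` column are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# A small all-numeric frame (values taken from the first two rows shown above).
features = pd.DataFrame({
    'sepal_length': [5.1, 4.9],
    'sepal_width': [3.5, 3.0],
})

# All columns are float64, so the resulting array is float64 as well.
arr = features.to_numpy()
print(type(arr), arr.shape, arr.dtype)

# Adding a string column forces the generic object dtype for the whole array.
mixed = features.assign(label=['Iris-setosa', 'Iris-setosa'])
print(mixed.to_numpy().dtype)
```

An object-dtype array is generally not what numeric code expects, which is why keeping the feature columns separate from the class labels is a reasonable first step.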


In [12]:
tensor = tf.multiply(np_array, 1.0)
print(tensor)

tf.Tensor(
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]], shape=(5, 4), dtype=float64)
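
The cell above obtains a tensor as a by-product of multiplying the array by 1.0. A more explicit route, offered here as an alternative sketch rather than as the notebook's own method, is `tf.convert_to_tensor`, optionally followed by `tf.cast`. The two-row array is an illustrative stand-in for `np_array`.

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-in for the array built from the dataframe.
np_array = np.array([[5.1, 3.5, 1.4, 0.2],
                     [4.9, 3.0, 1.4, 0.2]])

# Convert explicitly; the tensor dtype follows the NumPy array (float64 here).
tensor = tf.convert_to_tensor(np_array)
print(tensor.shape, tensor.dtype)

# Keras layers typically work in float32, so a cast is often convenient.
tensor32 = tf.cast(tensor, tf.float32)
print(tensor32.dtype)
```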


In [0]:
?pd.read_csv

pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=None, compact_ints=None, use_unsigned=None, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)
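
A few of the parameters in this signature are commonly useful when loading data. The sketch below tries `usecols`, `nrows`, and `dtype` on an in-memory sample; the sample rows and the `io.StringIO` stand-in for a file path are assumptions made for illustration.

```python
import io
import pandas as pd

# Three header-less rows in the same layout as iris.data (illustrative sample).
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

subset = pd.read_csv(
    io.StringIO(raw),
    names=column_names,
    usecols=['sepal_length', 'class'],   # keep only these columns
    nrows=2,                             # stop after two records
    dtype={'sepal_length': 'float32'},   # override the inferred float64
)

print(subset.shape)
print(subset.dtypes)
```

Restricting columns and rows at read time avoids loading data that would be discarded anyway, which matters more for large files than for a 150-row dataset like this one.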