# Chapter 3
# Understanding TensorFlow Basics

---

In [0]:
# The examples in this chapter use the TensorFlow 1.x graph API (tf.Session, tf.placeholder)
import tensorflow as tf
import numpy as np

## Computation Graphs

TensorFlow allows us to implement machine learning algorithms by creating and
computing operations that interact with one another. These interactions form what
we call a “computation graph,” with which we can intuitively represent complicated
functional architectures.

### What Is a Computation Graph?

A graph refers to a set of interconnected
entities, commonly called nodes or vertices. These nodes are connected to each
other via edges. In a dataflow graph, the edges allow data to “flow” from one node to
another in a directed manner.

### The Benefits of Graph Computations

TensorFlow optimizes its computations based on the graph’s connectivity. Each graph
has its own set of node dependencies. When the input of node y is affected by the
output of node x, we say that node y is dependent on node x. We call it a direct
dependency when the two are connected via an edge, and an indirect dependency
otherwise.
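A minimal plain-Python sketch (not TensorFlow code) makes the two kinds of dependency concrete. It encodes the six-node graph used in the first example of the next section (d = a * b, e = c + b, f = d - e) as a dictionary from each node to the nodes it reads from. The dictionary layout and the `all_dependencies()` helper are our own illustrative assumptions.

```python
# Each node maps to the nodes whose outputs it consumes (its direct dependencies).
graph = {
    "a": [], "b": [], "c": [],
    "d": ["a", "b"],   # d = a * b
    "e": ["c", "b"],   # e = c + b
    "f": ["d", "e"],   # f = d - e
}

def all_dependencies(node, graph):
    """Return every node that `node` depends on, directly or indirectly."""
    found = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])   # follow edges further back
    return found

direct = set(graph["f"])
indirect = all_dependencies("f", graph) - direct
print(sorted(direct))    # ['d', 'e']
print(sorted(indirect))  # ['a', 'b', 'c']
```

Here f depends directly on d and e (they share an edge) and indirectly on a, b, and c.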

We can always identify the full set of dependencies for each node in the graph. This is
a fundamental characteristic of the graph-based computation format. Being able to
locate dependencies between units of our model allows us to both distribute computations
across available resources and avoid performing redundant computations of
irrelevant subsets, resulting in a faster and more efficient way of computing things.

---

## Graphs, Sessions, and Fetches

Roughly speaking, working with TensorFlow involves two main phases: (1) constructing
a graph and (2) executing it.

### Creating a Graph

Right after we import TensorFlow (with import tensorflow as tf), a specific
empty default graph is formed. All the nodes we create are automatically associated
with that default graph.

Using the tf.\<operator\> methods, we will create six nodes assigned to arbitrarily
named variables. The contents of these variables should be regarded as the output of
the operations, and not the operations themselves. For now we refer to both the operations
and their outputs with the names of their corresponding variables.

In [0]:
a = tf.constant(9)
b = tf.constant(6)
c = tf.constant(3)

d = tf.multiply(a, b)
e = tf.add(c, b)
f = tf.subtract(d, e)

with tf.Session() as sess:
    print(sess.run(f))

45


First, we launch the graph in a tf.Session. A Session object is the part of the TensorFlow
API that communicates between Python objects and data on our end, and
the actual computational system where memory is allocated for the objects we define,
intermediate variables are stored, and finally results are fetched for us.

The execution itself is then done with the .run() method of the Session
object. When called, this method completes one set of computations in our graph in
the following manner: it starts at the requested output(s) and then works backward,
computing nodes that must be executed according to the set of dependencies. Therefore,
the part of the graph that will be computed depends on our output query.
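The backward walk that .run() performs can be imitated in plain Python. The sketch below is a simplification under our own assumptions (a dictionary of `(function, inputs)` pairs and a hypothetical `run()` helper that caches each result), not TensorFlow's implementation. It shows that fetching f evaluates every node once, while fetching d evaluates only a, b, and d.

```python
import operator

# Hypothetical mini graph: node name -> (function, list of input node names).
# Source nodes take no inputs and simply return a constant.
nodes = {
    "a": (lambda: 9, []),
    "b": (lambda: 6, []),
    "c": (lambda: 3, []),
    "d": (operator.mul, ["a", "b"]),
    "e": (operator.add, ["c", "b"]),
    "f": (operator.sub, ["d", "e"]),
}

def run(fetch, nodes, log):
    """Evaluate `fetch` by working backward through its dependencies.

    Each node is computed at most once per call; `log` records the order
    in which nodes were actually evaluated.
    """
    cache = {}

    def evaluate(name):
        if name not in cache:
            fn, inputs = nodes[name]
            args = [evaluate(i) for i in inputs]   # compute inputs first
            cache[name] = fn(*args)
            log.append(name)
        return cache[name]

    return evaluate(fetch)

log_f = []
print(run("f", nodes, log_f), log_f)  # 45 ['a', 'b', 'd', 'c', 'e', 'f']

log_d = []
print(run("d", nodes, log_d), log_d)  # 54 ['a', 'b', 'd']
```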

### Constructing and Managing Our Graph

As mentioned, as soon as we import TensorFlow, a default graph is automatically created
for us. We can create additional graphs and control their association with some
given operations. tf.Graph() creates a new graph, represented as a TensorFlow
object. In this example we create another graph and assign it to the variable g:

In [0]:
print(tf.get_default_graph())

g = tf.Graph()
print(g)

<tensorflow.python.framework.ops.Graph object at 0x7f800312e9b0>
<tensorflow.python.framework.ops.Graph object at 0x7f800318c550>


At this point we have two graphs: the default graph and the empty graph in g. Both
are revealed as TensorFlow objects when printed. Since g hasn’t been assigned as the
default graph, any operation we create will not be associated with it, but rather with
the default one.

In [0]:
g = tf.Graph()
a = tf.constant(3)

print(a.graph is g)
print(a.graph is tf.get_default_graph())

False
True


To make sure our constructed nodes are associated with the right graph, we can construct
them using a very useful Python construct: the with statement.

We use the with statement together with the as_default() method, which returns
a context manager that makes this graph the default one. This comes in handy when
working with multiple graphs:

In [0]:
g1 = tf.get_default_graph()
g2 = tf.Graph()

print(g1 is tf.get_default_graph())

with g2.as_default():
    print(g2 is tf.get_default_graph())

print(g2 is tf.get_default_graph())

True
True
False


### Fetches

In our initial graph example, we request one specific node (node f) by passing the
variable it was assigned to as an argument to the sess.run() method. This argument
is called fetches, corresponding to the elements of the graph we wish to compute.
We can also ask sess.run() for multiple nodes’ outputs simply by passing a
list of requested nodes:

In [0]:
with tf.Session() as sess:
    fetches = [a, b, c, d, e, f]
    output = sess.run(fetches)

print(output)

[3, 6, 3, 54, 9, 45]


Note that a prints as 3 rather than 9: a few cells earlier the name a was rebound to a new
tf.constant(3), while d still holds the product of the original constants 9 and 6.

We mentioned that TensorFlow computes only the essential nodes according to the
set of dependencies. This is also manifested in our example: when we ask for the output
of node d, only the outputs of nodes a and b are computed. This is a great advantage of TensorFlow:
it doesn’t matter how big and complicated our graph is as a whole, since we can run just a small portion
of it as needed.


---

## Flowing Tensors

### Nodes are Operations, Edges are Tensor Objects

TensorFlow is designed such that first a skeleton graph is created with all of its components.
At this point no actual data flows in it and no computations take place. It is
only upon execution, when we run the session, that data enters the graph and computations occur. This way, computations can be much more
efficient, taking the entire graph structure into consideration.

tf.constant() created a node with the corresponding
passed value. Printing the output of the constructor, we see that it’s actually
a Tensor object instance. These objects have methods and attributes that control their
behavior and that can be defined upon creation.

In [0]:
c = tf.constant(9.0)
print(c)

Tensor("Const:0", shape=(), dtype=float32)


Each Tensor object in TensorFlow has attributes such as name, shape, and dtype that
help identify and set the characteristics of that object. These attributes are optional when creating a node, and are set automatically by TensorFlow when missing. In the
next section we will take a look at these attributes. We will do so by looking at Tensor
objects created by ops known as source operations. Source operations are operations
that create data, usually without using any previously processed inputs. With these
operations we can create scalars, as we already encountered with the tf.constant()
method, as well as arrays and other types of data.

### Data Types

The basic units of data that pass through a graph are numerical, Boolean, or string
elements. When we print out the Tensor object c from our last code example, we see
that its data type is float32, a floating-point number. Since we didn’t specify the type of data,
TensorFlow inferred it automatically from the value: for example, 5 is regarded as an integer, while
anything with a decimal point, like 5.1, is regarded as a floating-point number.

In [0]:
# we can set data type by specifying attribute dtype
c = tf.constant(9, dtype=tf.float32)
print(c)
print(c.name)
print(c.shape)
print(c.dtype)

Tensor("Const_1:0", shape=(), dtype=float32)
Const_1:0
()
<dtype: 'float32'>


#### Casting

It is important to make sure our data types match throughout the graph—performing
an operation with two nonmatching data types will result in an exception. To change
the data type setting of a Tensor object, we can use the tf.cast() operation, passing
the relevant Tensor and the new data type of interest as the first and second arguments,
respectively:

In [0]:
x = tf.constant([1, 2, 3], name='x', dtype=tf.float32)
print(x.name, x.shape, x.dtype)
x = tf.cast(x, dtype=tf.int64)
print(x.name, x.shape, x.dtype)

x:0 (3,) <dtype: 'float32'>
Cast:0 (3,) <dtype: 'int64'>


### Tensor Arrays and Shapes

In [0]:
c = tf.constant(np.random.randint(0, 10, size=(2, 2)))
print(c.shape)
c = tf.constant(np.random.randint(0, 10, size=(3, 3, 3)))
print(c.get_shape())

(2, 2)
(3, 3, 3)


Random-number generators are especially important because they are often used
to create the initial values of TensorFlow Variables, which will be introduced
shortly. For example, we can generate random numbers from a normal distribution
using tf.random.normal(), passing the shape, mean, and standard deviation as the
first, second, and third arguments, respectively. Two other useful random
initializers are the truncated normal, which, as its name implies, discards and re-draws
any value more than two standard deviations away from the mean, and the uniform initializer,
which samples values uniformly within some interval [a,b).

Besides random values, we can create a sequence of evenly spaced numbers with
tf.linspace(start, stop, num), which returns num values from start to stop, with both
endpoints included:

In [0]:
seq = tf.linspace(0.0, 10.0, 10, name='seq')

with tf.Session() as sess:
    print(sess.run(seq))

[ 0.         1.1111112  2.2222223  3.3333335  4.4444447  5.555556
  6.666667   7.777778   8.888889  10.       ]
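The truncated-normal behavior described earlier can be sketched in plain Python with rejection sampling: draw from a normal distribution and re-draw any value more than two standard deviations from the mean. The `truncated_normal()` helper below is hypothetical and only illustrates the idea; it is not how tf.random.truncated_normal() is implemented.

```python
import random

def truncated_normal(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n samples from a normal distribution, re-drawing any sample
    that falls more than two standard deviations from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:   # keep only values inside the band
            samples.append(value)
    return samples

values = truncated_normal(10000, mean=0.0, stddev=1.0, seed=42)
print(min(values), max(values))  # both lie within [-2.0, 2.0]
```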


A convenient way to explore the data content of an object is
tf.InteractiveSession(). Using it together with the .eval() method, we can look at
the values without having to refer to the session object each time. tf.InteractiveSession()
can replace the usual tf.Session(), so that you don’t need a variable holding the session for
running ops. This is useful in interactive Python environments, such as IPython notebooks.

In [0]:
sess = tf.InteractiveSession()
seq = tf.random.normal(shape=(2, 2), seed=42)
print(seq.eval())
sess.close()

[[-0.28077507 -0.1377521 ]
 [-0.6763296   0.02458041]]




#### Matrix multiplication

In [0]:
A = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
print(A.get_shape())
x = tf.constant([1, 0, 1])
print(x.get_shape())

(2, 3)
(3,)


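We cannot apply tf.matmul() to A and x directly: tf.matmul() expects both arguments to be matrices (tensors of rank 2 or higher), while x is a 1D vector. Before multiplying, we have to turn x into a 2D single-column matrix with tf.expand_dims(), which inserts a new axis of size 1 at the given position: tf.expand_dims(x, 1) turns the shape (3,) into (3, 1), whereas tf.expand_dims(x, 0) gives a single-row matrix of shape (1, 3). First, let's look at the values of A and x: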

In [0]:
with tf.Session() as sess:
    print(sess.run(A))
    print(sess.run(x))

[[1 2 3]
 [4 5 6]]
[1 0 1]


In [0]:
x = tf.expand_dims(x, 1)

with tf.Session()as sess:
    print(sess.run(x))

[[1]
 [0]
 [1]]


In [0]:
x = tf.constant([1, 0, 1])
x = tf.expand_dims(x, 0)

with tf.Session() as sess:
    print(sess.run(x))

[[1 0 1]]
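The shape rules can be checked with a plain-Python matrix product. The `matmul()` helper below is a hypothetical sketch, not TensorFlow's: a (2, 3) matrix times a (3, 1) column gives a (2, 1) result, which is what tf.matmul(A, x) computes once x has been expanded to shape (3, 1), while a (1, 3) row does not fit because the inner dimensions differ.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match: {} vs {}".format(len(A[0]), len(B)))
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A_list = [[1, 2, 3],
          [4, 5, 6]]           # shape (2, 3)
x_col = [[1], [0], [1]]        # shape (3, 1), like tf.expand_dims(x, 1)
x_row = [[1, 0, 1]]            # shape (1, 3), like tf.expand_dims(x, 0)

print(matmul(A_list, x_col))   # [[4], [10]], shape (2, 1)

try:
    matmul(A_list, x_row)      # (2, 3) times (1, 3): inner dimensions 3 and 1 differ
except ValueError as err:
    print("error:", err)
```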


In [0]:
A = tf.random.truncated_normal(shape=(3, 3), seed=42)
x = tf.random.normal(shape=(3, 1), seed=42)

with tf.Session() as sess:
    print(sess.run(A))
    print(sess.run(x))
    print(sess.run(tf.matmul(A, x)))

[[-0.28077507 -0.1377521  -0.6763296 ]
 [ 0.02458041 -0.46845472 -0.00246632]
 [-0.9745911   0.6638492   0.4368011 ]]
[[-0.28077507]
 [-0.1377521 ]
 [-0.6763296 ]]
[[ 1.6274279 ]
 [-0.46013314]
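 [ 1.7178389 ]]

Note that the product printed last is not the product of the A and x printed above it. Random operations such as tf.random.truncated_normal() and tf.random.normal() draw new values every time they are evaluated, and each sess.run() call evaluates them again. To obtain a matching set of values, fetch all three in a single call, for example sess.run([A, x, tf.matmul(A, x)]).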

### Names

Each Tensor object also has an identifying name. This name is an intrinsic string
name, not to be confused with the name of the variable. As with dtype, we can use
the .name attribute to see the name of the object:

In [0]:
with tf.Graph().as_default():
    c1 = tf.constant(4, dtype=tf.float64, name='c')
    c2 = tf.constant(4, dtype=tf.int32, name='c')
print(c1.name)
print(c2.name)

c:0
c_1:0


In [0]:
c1 = tf.constant(4, dtype=tf.float64, name='c')
c2 = tf.constant(4, dtype=tf.int32, name='c')
print(c1.name)
print(c2.name)

c:0
c_1:0


The name of a Tensor object is simply the name of the operation that produced it (here, “c”),
followed by a colon and the index of the tensor among that operation’s outputs, since an
operation can have more than one output. Objects residing within the same graph cannot have
the same name; TensorFlow forbids it. As a consequence, it automatically adds an underscore
and a number to distinguish the two. Of course, both objects can have the same name when
they are associated with different graphs.
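The renaming rule can be imitated with a per-graph counter. The `NameRegistry` class below is a hypothetical, simplified plain-Python sketch (it ignores name scopes and names that already carry a numeric suffix), not TensorFlow's implementation. Recall that the tensor name then appends `:0` to the operation name.

```python
class NameRegistry:
    """Hypothetical helper imitating per-graph name de-duplication."""

    def __init__(self):
        self._counts = {}   # how many times each requested name has been used

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        # The first request keeps the name; later ones get _1, _2, ...
        return name if count == 0 else "{}_{}".format(name, count)

graph_1 = NameRegistry()
print(graph_1.unique_name("c"), graph_1.unique_name("c"), graph_1.unique_name("c"))
# c c_1 c_2

graph_2 = NameRegistry()          # a separate graph keeps its own names
print(graph_2.unique_name("c"))   # c
```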

#### Name scopes

Sometimes when dealing with a large, complicated graph, we would like to create
some node grouping to make it easier to follow and manage. For that we can hierarchically
group nodes together by name. We do so by using tf.name_scope("prefix") together
with the useful with clause again:

In [0]:
with tf.Graph().as_default():
    c1 = tf.constant(4, dtype=tf.float64, name='c')
    with tf.name_scope("prefix_name"):
        c2 = tf.constant(4, dtype=tf.int32, name='c')
        c3 = tf.constant(4, dtype=tf.float64, name='c')

print(c1.name)
print(c2.name)
print(c3.name)

c:0
prefix_name/c:0
prefix_name/c_1:0


In this example we’ve grouped objects contained in variables c2 and c3 under the
scope prefix_name, which shows up as a prefix in their names.
Prefixes are especially useful when we would like to divide a graph into subgraphs
with some semantic meaning. These parts can later be used, for instance, for visualization
of the graph structure.

In [0]:
g = tf.Graph()

with g.as_default():
    c1 = tf.constant(5, dtype=tf.float32, name='c')
    with tf.name_scope("layer_1"):
        c2 = tf.constant(9, dtype=tf.int32, name='c')
        c3 = tf.constant(7, dtype=tf.int32, name='c')
    with tf.name_scope("layer_2"):
        c4 = tf.constant(1, dtype=tf.int32, name='c')
        c5 = tf.constant(3, dtype=tf.int32, name='c')

print(c1.graph is g)
print()

print(c1.name)
print(c2.name)
print(c3.name)
print(c4.name)
print(c5.name)


True

c:0
layer_1/c:0
layer_1/c_1:0
layer_2/c:0
layer_2/c_1:0


---

## Variables, Placeholders, and Simple Optimization

### Variables

The optimization process serves to tune the parameters of some given model. For
that purpose, TensorFlow uses special objects called Variables. Unlike other Tensor objects that are “refilled” with data each time we run the session, Variables can maintain
a persistent state in the graph, which is kept across runs. This is important because their current state might
influence how they change in the following iteration. Like other Tensors, Variables
can be used as input for other operations in the graph.

Using Variables is done in two stages. First we call the tf.Variable() function in
order to create a Variable and define what value it will be initialized with. We then
have to explicitly perform an initialization operation by running the session with the
tf.global_variables_initializer() method, which allocates the memory for the
Variable and sets its initial values.

In [0]:
init_val = tf.random_normal((1, 5), 0.1)
var = tf.Variable(init_val, name='var')
print("pre run: \n{}".format(var))

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    post_var = sess.run(var)

print("\npost run: \n{}".format(post_var))

pre run: 
<tf.Variable 'var_3:0' shape=(1, 5) dtype=float32_ref>

post run: 
[[ 0.37189773 -0.44705704  2.4359581  -2.0406601  -0.2951817 ]]


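Note that each time we run this code, a new variable is created, as indicated by the
suffix that TensorFlow automatically appends to its name (for example, var_1). The name
var_3 in the output above means that variables named var, var_1, and var_2 already
existed in the graph.

This could be very inefficient when we want to reuse the model (complex models
could have many variables!); for example, when we wish to feed it with several different
inputs. To reuse the same variable, we can use the tf.get_variable() function
instead of tf.Variable(); inside a tf.variable_scope() with reuse enabled, it returns the
existing variable with the given name rather than creating a new one.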
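The idea behind name-based reuse is get-or-create: look the name up first, and only build a new object if it is missing. The `VariableStore` class and its `get_or_create()` method below are hypothetical plain-Python stand-ins used to illustrate that idea. They are not TensorFlow's API, and the real tf.get_variable() additionally depends on the reuse setting of the enclosing tf.variable_scope().

```python
class VariableStore:
    """Hypothetical store showing get-or-create semantics by name."""

    def __init__(self):
        self._variables = {}
        self.created = 0   # number of distinct variables actually built

    def get_or_create(self, name, initial_value):
        # Create the variable only the first time this name is requested;
        # later requests return the very same object.
        if name not in self._variables:
            self._variables[name] = {"name": name, "value": initial_value}
            self.created += 1
        return self._variables[name]

store = VariableStore()
w_first = store.get_or_create("weights", [0.0, 0.0, 0.0])
w_again = store.get_or_create("weights", [1.0, 1.0, 1.0])  # initial value ignored
print(w_first is w_again, store.created)  # True 1
```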

### Placeholders

So far we’ve used source operations to create our input data. TensorFlow, however,
has designated built-in structures for feeding input values. These structures are called
placeholders. Placeholders can be thought of as empty Variables that will be filled with data later on. We use them by first constructing our graph, and feeding them with the input data
only when the graph is executed.

Placeholders have an optional shape argument. If no shape is given, or it is passed as
None, then the placeholder can be fed with data of any size. It is common to use
None for the dimension of a matrix that corresponds to the number of samples (usually
rows), while keeping the length of the features (usually columns) fixed:

In [0]:
ph = tf.placeholder(tf.float32, shape=(None, 10))
# None in the 0-axis means that we do not specify
# the number of samples

Whenever we run an operation that depends on a placeholder, we must feed the placeholder
with some input values, or else an exception will be thrown. The input data is passed to the
session.run() method through its feed_dict argument as a dictionary, where each key is a
placeholder (the Python variable referring to it), and the matching values are the data values
given in the form of a list or a NumPy array:


In [0]:
# sess.run(s, feed_dict={x: x_data, w: w_data})

Let’s see how it looks with another graph example, this time with placeholders for two
inputs: a matrix x and a vector w. These inputs are matrix-multiplied to create a five-unit
vector xw, which is then added to a constant vector b filled with the value -1. Finally, the
variable s takes the maximum value of that vector by using the tf.reduce_max()
operation. The word reduce is used because we are reducing a five-unit vector to a
single scalar:

In [0]:
x_data = np.random.randn(5, 10)
w_data = np.random.randn(10, 1)

with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, shape=(5, 10))
    w = tf.placeholder(tf.float32, shape=(10, 1))
    b = tf.fill((5, 1), -1.)
    xw = tf.matmul(x, w)

    xwb = xw + b
    s = tf.reduce_max(xwb)
    with tf.Session() as sess:
        outs = sess.run(s, feed_dict={x: x_data, w: w_data})

print("Outs: {}".format(outs))

Outs: 7.539361000061035
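Stripped of TensorFlow, the graph above computes the largest of the five entries of xw + b. The plain-Python sketch below repeats the computation with small made-up inputs (x_small has shape (5, 2) instead of (5, 10), so the arithmetic is easy to check by hand); the data values and names are our own.

```python
x_small = [[1.0, 2.0],
           [0.5, -1.0],
           [3.0, 0.0],
           [-2.0, 1.0],
           [0.0, 0.0]]                    # shape (5, 2), made-up data
w_small = [[2.0], [1.0]]                 # shape (2, 1)
b_small = [[-1.0] for _ in range(5)]     # shape (5, 1), like tf.fill((5, 1), -1.)

# Matrix product: each row of x_small dotted with the single column of w_small
xw_small = [[sum(row[k] * w_small[k][0] for k in range(len(w_small)))] for row in x_small]
xwb_small = [[xw_small[i][0] + b_small[i][0]] for i in range(5)]
s_small = max(r[0] for r in xwb_small)   # "reduce" five values to one scalar

print(xwb_small)  # [[3.0], [-1.0], [5.0], [-4.0], [-1.0]]
print(s_small)    # 5.0
```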


### Optimization

#### Training to predict

We have some target variable y, which we want to explain using some feature vector
x. To do so, we first choose a model that relates the two. Our training data points will
be used for “tuning” the model so that it best captures the desired relation. In the following
chapters we focus on deep neural network models, but for now we will settle
for a simple regression problem.

Let’s start by describing our regression model:

$f(x_i) = w^Tx_i + b$

$y_i = f(x_i) + \epsilon_i$

$f(x_i)$ is assumed to be a linear combination of some input data $x_i$, with a set of
weights $w$ and an intercept $b$. Our target output $y_i$ is a noisy version of $f(x_i)$ after being
summed with Gaussian noise $\epsilon_i$ (where $i$ denotes a given sample).

In [0]:
# x = tf.placeholder(tf.float32, shape=[None, 3])
# y_true = tf.placeholder(tf.float32, shape=None)
# w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
# b = tf.Variable(0, dtype=tf.float32, name='bias')

Once the placeholders and Variables are defined, we can write down our model. In
this example, it’s simply a multivariate linear regression—our predicted output
y_pred is the result of a matrix multiplication of our input container x and our
weights w plus a bias term b:

In [0]:
# y_pred = tf.matmul(w, tf.transpose(x)) + b

#### Defining a loss function

Next, we need a good measure with which we can evaluate the model’s performance.
To capture the discrepancy between our model’s predictions and the observed targets,
we need a measure reflecting “distance.” This distance is often referred to as an
objective or a loss function, and we optimize the model by finding the set of parameters
(weights and bias in this case) that minimize it.


There is no ideal loss function, and choosing the most suitable one is often a blend of
art and science. The choice may depend on several factors, like the assumptions of
our model, how easy it is to minimize, and what types of mistakes we prefer to avoid.

#### MSE and cross entropy

$L(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

In our linear regression example, we take the difference between the vector y_true
(y), the true targets, and y_pred (ŷ), the model’s predictions, and use tf.square() to
compute the square of the difference vector. This operation is applied element-wise.
We then average the squared differences using the tf.reduce_mean() function:

In [0]:
# loss = tf.reduce_mean(tf.square(y_true - y_pred))
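As a quick worked example of the formula, here is the same mean squared error computed in plain Python on four made-up targets and predictions; the `mse()` helper and the data are illustrative assumptions.

```python
def mse(y_true, y_pred):
    """Mean of the element-wise squared differences, mirroring
    tf.reduce_mean(tf.square(y_true - y_pred)) for 1D inputs."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)

y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [1.5, 2.0, 2.0, 4.5]
print(mse(y_true_demo, y_pred_demo))  # (0.25 + 0 + 1 + 0.25) / 4 = 0.375
```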

Another very common loss, especially for categorical data, is the cross entropy, which
we used in the softmax classifier in the previous chapter. The cross entropy is given
by

$H(p, q) = -\sum_x p(x)\log q(x)$

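and for classification with a single correct label (as is the case in an overwhelming
majority of the cases) reduces to the negative log of the probability placed by the classifier
on the correct label. For binary labels, as in the logistic regression example later in this
chapter, we can use tf.nn.sigmoid_cross_entropy_with_logits(), which applies the sigmoid to
the model’s raw outputs (logits) and computes the cross entropy for each element; we then
average the result with tf.reduce_mean():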

In [0]:
# loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred)
# loss = tf.reduce_mean(loss)
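tf.nn.sigmoid_cross_entropy_with_logits() takes logits (pre-sigmoid scores) rather than probabilities. Its documentation gives the numerically stable form max(x, 0) - x * z + log(1 + exp(-abs(x))) for logits x and labels z. The plain-Python sketch below, with helper names of our own, checks that this matches the textbook definition -[z log σ(x) + (1 - z) log(1 - σ(x))] and that it survives a large logit where the naive form breaks.

```python
import math

def sigmoid_scalar(z):
    return 1.0 / (1.0 + math.exp(-z))

def naive_bce(label, logit):
    """Binary cross entropy written directly from the definition."""
    p = sigmoid_scalar(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def stable_bce(label, logit):
    """The rearranged form max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

for label, logit in [(1.0, 2.0), (0.0, 2.0), (1.0, -0.5)]:
    # both loss columns agree up to floating-point rounding
    print(label, logit, naive_bce(label, logit), stable_bce(label, logit))

# For a large logit, 1 - sigmoid(logit) rounds to 0.0 and math.log() would
# raise a ValueError in naive_bce(); the stable form still returns 1000.0.
print(stable_bce(0.0, 1000.0))
```

With label 1 the loss is -log σ(x), the negative log of the probability placed on the correct label, as described above.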

#### The gradient descent optimizer

The next thing we need to figure out is how to minimize the loss function. While in
some cases it is possible to find the global minimum analytically (when it exists), in
the great majority of cases we will have to use an optimization algorithm. Optimizers
update the set of weights iteratively in a way that decreases the loss over time.

The most commonly used approach is gradient descent, where we use the loss’s gradient
with respect to the set of weights. In slightly more technical terms, if our loss is
some multivariate function $F(\bar{w})$, then in the neighborhood of some point $\bar{w}_0$, the
“steepest” direction of decrease of $F(\bar{w})$ is obtained by moving from $\bar{w}_0$ in the direction
of the negative gradient of $F$ at $\bar{w}_0$.

So if $\bar{w}_1 = \bar{w}_0 - \gamma\nabla F(\bar{w}_0)$, where $\nabla F(\bar{w}_0)$ is the gradient of $F$ evaluated at $\bar{w}_0$, then for a
small enough $\gamma$:

$F(\bar{w}_0) \geq F(\bar{w}_1)$
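A one-dimensional worked example shows a single update decreasing the loss. With the toy loss $F(w) = (w - 3)^2$ (our own choice), the gradient is $2(w - 3)$. Starting at $w_0 = 0$ with $\gamma = 0.1$, one step gives $w_1 = 0 - 0.1 \cdot (-6) = 0.6$, and the loss drops from $F(w_0) = 9$ to $F(w_1) = 5.76$:

```python
def F(w):
    """A toy convex loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad_F(w):
    """Derivative of F."""
    return 2.0 * (w - 3.0)

w0 = 0.0
gamma = 0.1
w1 = w0 - gamma * grad_F(w0)   # one gradient descent step
print(w1, F(w0), F(w1))        # approximately 0.6, 9.0, 5.76
```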

Gradient descent algorithms work well on highly complicated network architectures
and are therefore suitable for a wide variety of problems. More specifically,
recent advances make it possible to compute these gradients by utilizing massively
parallel systems, so the approach scales well with dimensionality (though it can still
be painfully time-consuming for large real-world problems). While convergence to
the global minimum is guaranteed for convex functions (given a suitably small learning rate),
for nonconvex problems (which are essentially all problems in the world of deep learning)
gradient descent can get stuck in local minima. In practice, this is often good enough,
as evidenced by the huge success of the field of deep learning.

#### Sampling methods

The gradient of the objective is computed with respect to the model parameters and
evaluated using a given set of input samples, xs. How many of the samples should we
take for this calculation? Intuitively, it makes sense to calculate the gradient for the
entire set of samples in order to benefit from the maximum amount of available
information. This method, however, has some shortcomings. For example, it can be
very slow and is intractable when the dataset requires more memory than is available.

A more popular technique is the stochastic gradient descent (SGD), where instead of
feeding the entire dataset to the algorithm for the computation of each step, a subset
of the data is sampled sequentially. The number of samples ranges from one sample at
a time to a few hundred, but the most common sizes are between about 50 and
about 500 (usually referred to as mini-batches).

Smaller batches usually work faster: the smaller the batch, the less computation each
step requires. However, there is a trade-off: small batches lead to
lower hardware utilization and tend to have high variance, causing large fluctuations
in the objective function. Nevertheless, it turns out that some fluctuations are beneficial,
since they enable the set of parameters to jump to new and potentially better local
minima. Using a relatively small batch size is therefore effective in that regard, and
is currently the preferred approach overall.
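A minimal plain-Python sketch of mini-batch SGD, under our own assumptions: a one-feature linear model, synthetic data generated with w = 2 and b = -1, batches of 50 samples, a learning rate of 0.1, and 20 passes (epochs) over the shuffled data. The helper names are hypothetical.

```python
import random

def make_data(n, w_true=2.0, b_true=-1.0, noise=0.1, seed=0):
    """Synthetic data y = w_true * x + b_true + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + b_true + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def minibatch_sgd(xs, ys, batch_size=50, learning_rate=0.1, epochs=20, seed=0):
    """Fit y ~ w * x + b by minimizing the MSE on shuffled mini-batches."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)   # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the batch MSE with respect to w and b
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs, ys = make_data(2000)
w_hat, b_hat = minibatch_sgd(xs, ys)
print(w_hat, b_hat)   # should be close to the true values 2.0 and -1.0
```

Each parameter update here looks at only 50 samples, so one pass over the 2,000 samples yields 40 updates instead of one.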

#### Gradient descent in TensorFlow

TensorFlow makes it very easy and intuitive to use gradient descent algorithms. Optimizers
in TensorFlow compute the gradients simply by adding new operations to the
graph, and the gradients are calculated using automatic differentiation. This means,
in general terms, that TensorFlow automatically computes the gradients on its own,
“deriving” them from the operations and structure of the computation graph.

An important parameter to set is the algorithm’s learning rate, determining how
aggressive each update iteration will be (or in other words, how large the step will be
in the direction of the negative gradient). We want the decrease in the loss to be fast
enough on the one hand, but on the other hand the steps should not be so large that we overshoot
the target and end up at a point with a higher value of the loss function.
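The effect of the learning rate is easy to see on the toy loss $F(w) = (w - 3)^2$ (our own example): each gradient step multiplies the distance $w - 3$ by $1 - 2\eta$, where $\eta$ is the learning rate. With $\eta = 0.1$ the distance shrinks by a factor of 0.8 per step, while with $\eta = 1.1$ it is multiplied by $-1.2$, so every step overshoots the minimum and the loss grows. The `gradient_descent()` helper is a plain-Python sketch, not TensorFlow's optimizer.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimize F(w) = (w - 3)^2 and return the loss after each step."""
    losses = []
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w - 3.0)   # gradient of (w - 3)^2 is 2(w - 3)
        losses.append((w - 3.0) ** 2)
    return losses

small = gradient_descent(0.1)   # (w - 3) shrinks by a factor of 0.8 per step
large = gradient_descent(1.1)   # (w - 3) is multiplied by -1.2: overshoot
print(small[-1], large[-1])     # tiny loss vs. a loss in the thousands
```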

We first create an optimizer by instantiating the tf.train.GradientDescentOptimizer class
with the desired learning rate. We then create a train operation that updates our variables
by calling the optimizer’s minimize() method and passing in the loss as an
argument:

In [0]:
# optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# train = optimizer.minimize(loss)

#### Wrapping it up with examples

##### Example 1: Linear Regression

In [0]:
# === Create data and simulate results ===
x_data = np.random.randn(2000, 3)
w_real = [0.3, 0.5, 0.1]
b_real = -0.2

noise = np.random.randn(1, 2000) * 0.1
y_data = np.matmul(w_real, x_data.T) + b_real + noise

Next, we estimate our set of weights w and bias b by optimizing the model (i.e., finding
the best parameters) so that its predictions match the real targets as closely as
possible. Each iteration computes one update to the current parameters. In this example
we run 10 iterations, printing our estimated parameters every 5 iterations using
the sess.run() method.

Don’t forget to initialize the variables! In this example we initialize both the weights
and the bias with zeros; however, there are “smarter” initialization techniques to
choose, as we will see in the next chapters. We use name scopes to group together
parts that are related to inferring the output, defining the loss, and setting and creating
the train object:

In [0]:
NUM_STEPS = 10

g = tf.Graph() # building specific graph
wb_ = [] # weights
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3]) # input
    y_true = tf.placeholder(tf.float32, shape=None) # input

    with tf.name_scope('inference') as scope:
        w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
        b = tf.Variable(0, dtype=tf.float32, name='bias')
        y_pred = tf.matmul(w, tf.transpose(x)) + b

    with tf.name_scope('loss') as scope:
        loss = tf.reduce_mean(tf.square(y_true - y_pred))

    with tf.name_scope('train') as scope:
        learning_rate = 0.5
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        train = optimizer.minimize(loss)

    # before starting, initialize the variables
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for step in range(NUM_STEPS):
            sess.run(train, feed_dict={x: x_data, y_true: y_data})
            if (step%5 == 0):
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))

        print(10, sess.run([w, b]))

0 [array([[0.3082251 , 0.5072045 , 0.09721694]], dtype=float32), -0.1795402]
5 [array([[0.30226943, 0.498831  , 0.09799903]], dtype=float32), -0.1993168]
10 [array([[0.30226943, 0.498831  , 0.09799903]], dtype=float32), -0.19931678]


##### Example 2: Logistic Regression

In [0]:
N = 20000

def sigmoid(x):
    # logistic function: 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

# ===== Create data =====
x_data = np.random.randn(N, 3)
w_real = [0.3, 0.5, 0.1]
b_real = -0.2
wxb = np.matmul(w_real, x_data.T) + b_real

y_data_pre_noise = sigmoid(wxb)
# each label is 1 with probability sigmoid(wxb), and 0 otherwise
y_data = np.random.binomial(1, y_data_pre_noise)

In [0]:
NUM_STEPS = 50

g = tf.Graph() # building specific graph
wb_ = [] # weights
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3]) # input
    y_true = tf.placeholder(tf.float32, shape=None) # input

    with tf.name_scope('inference') as scope:
        w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
        b = tf.Variable(0, dtype=tf.float32, name='bias')
        y_pred = tf.matmul(w, tf.transpose(x)) + b

    with tf.name_scope('loss') as scope:
        loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred)
        loss = tf.reduce_mean(loss)

    with tf.name_scope('train') as scope:
        learning_rate = 0.5
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        train = optimizer.minimize(loss)

    # before starting, initialize the variables
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for step in range(NUM_STEPS):
            sess.run(train, feed_dict={x: x_data, y_true: y_data})
            if (step%5 == 0):
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))

        print(50, sess.run([w, b]))

Because x_data and y_data are generated randomly, the printed estimates differ from run to run; with 20,000 samples, w and b should end up close to w_real = [0.3, 0.5, 0.1] and b_real = -0.2.
