
# Lecture 2

## Today's Lecture

1. Data manipulation and pre-processing (TensorFlow)
2. Vectorization
3. Broadcasting
4. Python NumPy and pandas
5. Linear Regression
6. About Homework 1

# Data manipulation

Generally, there are two important things we need to do with data: 
(i) acquire them; and (ii) process them once they are inside the computer. 

[TensorFlow](https://www.tensorflow.org/) is an open-source end-to-end machine learning library for preprocessing data, modelling data and serving models (getting them into the hands of others).

## Introduction to Tensors

If you've ever used [NumPy](https://numpy.org/), tensors are kind of like NumPy arrays.

You can think of a tensor as a multi-dimensional numerical representation (also called n-dimensional, where n can be any non-negative integer) of something, and that something can be almost anything you can imagine:

1. It could be numbers themselves (using tensors to represent the price of houses).
2. It could be an image (using tensors to represent the pixels of an image).
3. It could be text (using tensors to represent words).

Or it could be some other form of information (or data) you want to represent with numbers.

A key difference between tensors and NumPy arrays (which are also n-dimensional arrays of numbers) is that operations on tensors can run on GPUs (graphics processing units) and TPUs (tensor processing units).

The benefit of running on GPUs and TPUs is faster computation: if we want to find patterns in the numerical representations of our data, we can generally find them faster on this hardware.

Let us get started with Tensors.
The first thing we'll do is import TensorFlow under the common alias tf.

In [1]:
# Import TensorFlow
import tensorflow as tf
print(tf.__version__) # find the version number (should be 2.x+)

2.6.0
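With TensorFlow imported, the comparison with NumPy arrays made earlier can be checked directly. The following is a minimal sketch, assuming NumPy is installed alongside TensorFlow; the array values and the names `array`, `tensor` and `round_trip` are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# A made-up NumPy array, converted to a tensor and back again
array = np.array([[1.0, 2.0], [3.0, 4.0]])
tensor = tf.constant(array)   # NumPy array -> tensor (the float64 dtype is kept)
round_trip = tensor.numpy()   # tensor -> NumPy array

print(type(tensor).__name__, tensor.dtype.name)
print(type(round_trip).__name__, round_trip.dtype)

# List any GPUs TensorFlow can see; an empty list means computation runs on the CPU
print(tf.config.list_physical_devices("GPU"))
```

On a machine without a GPU the last line prints an empty list, and everything else in this lecture still runs on the CPU.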


A tensor represents a (possibly multi-dimensional) array of numerical values. With one axis, a tensor corresponds (in math) to a vector. With two axes, a tensor corresponds to a matrix. Let us create one tensor and then update its shape:

In [2]:
x = tf.range(12)
x

<tf.Tensor: shape=(12,), dtype=int32, numpy=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)>

We can access a tensor’s shape (the length along each axis) by inspecting its shape property.

In [3]:
x.shape

TensorShape([12])
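To connect the shape with the idea of axes, the sketch below prints the number of axes and the shape for a few small tensors. The names `scalar`, `vector`, `matrix` and `cube` are illustrative only, and the sketch borrows tf.constant, tf.reshape and tf.zeros, which are covered later in this lecture.

```python
import tensorflow as tf

scalar = tf.constant(7)               # no axes
vector = tf.range(12)                 # one axis (a vector)
matrix = tf.reshape(vector, (3, 4))   # two axes (a matrix)
cube = tf.zeros((2, 3, 4))            # three axes

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # shape.rank is the number of axes; the shape lists the length along each one
    print(name, "axes:", t.shape.rank, "shape:", t.shape.as_list())
```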

If we just want to know the total number of elements in a tensor, i.e., the product of all of the shape elements, we can pass it to the tf.size function.

In [4]:
tf.size(x)

<tf.Tensor: shape=(), dtype=int32, numpy=12>
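The statement that the size equals the product of the shape elements can be checked on a tensor with more than one axis. This is a small sketch rather than part of the original notebook; it assumes Python 3.8 or later for `math.prod`, and it uses tf.reshape, which is covered next.

```python
import math
import tensorflow as tf

x = tf.range(12)
X = tf.reshape(x, (3, 4))   # same 12 elements, arranged as 3 rows and 4 columns

# tf.size returns a scalar tensor; .numpy() extracts the plain integer
n_elements = int(tf.size(X).numpy())
product_of_shape = math.prod(X.shape.as_list())   # 3 * 4

print(n_elements, product_of_shape)   # prints: 12 12
```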

To change the shape of a tensor without altering either the number of elements or their values, we can invoke the reshape function.

In [5]:
X = tf.reshape(x, (3, 4))
X

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]], dtype=int32)>

Specifying every dimension manually is unnecessary. If our target shape is a matrix with shape (height, width), then once we know the width, the height is given implicitly, and we can write -1 for the dimension we want inferred. Try calling tf.reshape(x, (-1, 4)) or tf.reshape(x, (3, -1)) for x above. Why do you think you get the result you are getting?
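One way to explore this question is sketched below. It assumes eager execution (the TensorFlow 2.x default); the names `a` and `b` are illustrative, and the last part catches whichever of two exception types is raised because the exact type may depend on the execution mode.

```python
import tensorflow as tf

x = tf.range(12)

# -1 asks TensorFlow to infer that axis from the total number of elements
a = tf.reshape(x, (-1, 4))   # 12 elements / 4 columns -> 3 rows
b = tf.reshape(x, (3, -1))   # 12 elements / 3 rows -> 4 columns

print(a.shape, b.shape)

# A shape that cannot hold exactly 12 elements is rejected
try:
    tf.reshape(x, (-1, 5))
except (tf.errors.InvalidArgumentError, ValueError) as err:
    print("reshape failed:", type(err).__name__)
```

Both calls should produce the same 3-by-4 matrix, because 12 elements leave only one possible value for the missing dimension.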

Typically, we will want our tensors initialized with zeros, ones, some other constant, or numbers randomly sampled from a specific distribution. The cell below creates a tensor with all elements set to 0 and a shape of (2, 3, 4), followed by a tensor of ones with shape (3, 3, 4). Only the value of the last expression in a cell is displayed, so the output shows the tensor of ones:

In [7]:
tf.zeros((2, 3, 4))
tf.ones((3,3,4))

<tf.Tensor: shape=(3, 3, 4), dtype=float32, numpy=
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float32)>
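For the "some other constant" case, a possible sketch uses tf.fill, with tf.ones_like shown as a convenient way to copy the shape and dtype of an existing tensor. The value 7 and the names `zeros`, `sevens` and `ones_like` are arbitrary choices for illustration.

```python
import tensorflow as tf

zeros = tf.zeros((2, 3, 4))       # all elements 0.0, dtype float32 by default
sevens = tf.fill((2, 3), 7)       # every element set to the constant 7
ones_like = tf.ones_like(zeros)   # ones with the same shape and dtype as `zeros`

print(zeros.shape.as_list())      # [2, 3, 4]
print(sevens.numpy().tolist())    # [[7, 7, 7], [7, 7, 7]]
print(ones_like.dtype.name)       # float32
```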

The following snippet creates a tensor with shape (3, 4). Each of its elements is randomly sampled from a standard Gaussian (normal) distribution with a mean of 0 and a standard deviation of 1.

In [8]:
tf.random.normal(shape=[3, 4])

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-1.769822  ,  0.5473651 , -1.2090687 ,  0.24317421],
       [-0.6128867 ,  0.70870316,  0.06551764,  0.87915313],
       [ 0.8630595 , -1.271956  ,  0.02810226, -1.2665564 ]],
      dtype=float32)>
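Because the values are random, the numbers above will differ from run to run. The sketch below draws a larger sample and checks that its mean and standard deviation land near 0 and 1; the seed value and the sample size of 10000 are arbitrary assumptions made for this illustration.

```python
import tensorflow as tf

tf.random.set_seed(0)   # arbitrary global seed, chosen only for this illustration

# mean and stddev default to 0 and 1; they are spelled out here for clarity
samples = tf.random.normal(shape=[10000], mean=0.0, stddev=1.0)

sample_mean = float(tf.reduce_mean(samples))
sample_std = float(tf.math.reduce_std(samples))

print("sample mean:", sample_mean)   # expected to be close to 0
print("sample std: ", sample_std)    # expected to be close to 1
```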

In [9]:
# Supply the exact values of a tensor as a (nested) Python list
tf.constant([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[2, 1, 4, 3],
       [1, 2, 3, 4],
       [4, 3, 2, 1]], dtype=int32)>
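The dtype in the output above is inferred from the Python values. As a brief sketch with made-up values, the following shows the inferred dtype for integers and floats, and how to set the dtype explicitly with the `dtype` argument of tf.constant.

```python
import tensorflow as tf

ints = tf.constant([[2, 1, 4, 3], [1, 2, 3, 4]])           # Python ints -> int32
floats = tf.constant([[2.0, 1.0], [4.0, 3.0]])             # Python floats -> float32
forced = tf.constant([[2, 1], [4, 3]], dtype=tf.float32)   # dtype set explicitly

print(ints.dtype.name, floats.dtype.name, forced.dtype.name)   # int32 float32 float32
```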