<a href="https://colab.research.google.com/github/LastCodeBender42/TensorFlow-Certification-Prep/blob/main/02_General_Topics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **01 Activation Functions in Neural Networks**

An activation function in a neural network defines the output of a neuron given an input or set of inputs. It introduces non-linearity into the model, enabling it to learn and represent complex patterns in data. Without activation functions, a neural network would collapse to a single linear transformation of its inputs, regardless of the number of layers.

### Common Activation Functions

1. **Sigmoid Function**:
   \[
   \sigma(x) = \frac{1}{1 + e^{-x}}
   \]
   - **Range**: (0, 1)
   - **Usage**: Often used in binary classification problems.
   - **Example**:
     ```python
     import tensorflow as tf
     x = tf.constant([-1.0, 0.0, 1.0])
     sigmoid = tf.nn.sigmoid(x)
     print(sigmoid.numpy())  # Output: [0.26894142, 0.5, 0.7310586]
     ```

2. **Tanh (Hyperbolic Tangent) Function**:
   \[
   \tanh(x) = \frac{2}{1 + e^{-2x}} - 1
   \]
   - **Range**: (-1, 1)
   - **Usage**: Commonly used in hidden layers of neural networks.
   - **Example**:
     ```python
     import tensorflow as tf
     x = tf.constant([-1.0, 0.0, 1.0])
     tanh = tf.nn.tanh(x)
     print(tanh.numpy())  # Output: [-0.7615942, 0.0, 0.7615942]
     ```

3. **ReLU (Rectified Linear Unit) Function**:
   \[
   \text{ReLU}(x) = \max(0, x)
   \]
   - **Range**: [0, ∞)
   - **Usage**: Widely used in hidden layers of neural networks due to its simplicity and effectiveness.
   - **Example**:
     ```python
     import tensorflow as tf
     x = tf.constant([-1.0, 0.0, 1.0])
     relu = tf.nn.relu(x)
     print(relu.numpy())  # Output: [0.0, 0.0, 1.0]
     ```

4. **Leaky ReLU Function**:
   \[
   \text{Leaky ReLU}(x) =
   \begin{cases}
   x & \text{if } x > 0 \\
   \alpha x & \text{if } x \leq 0
   \end{cases}
   \]
   - **Range**: (-∞, ∞)
   - **Usage**: Used to solve the "dying ReLU" problem by allowing a small, non-zero gradient when the unit is not active.
   - **Example**:
     ```python
     import tensorflow as tf
     x = tf.constant([-1.0, 0.0, 1.0])
     leaky_relu = tf.nn.leaky_relu(x, alpha=0.1)
     print(leaky_relu.numpy())  # Output: [-0.1, 0.0, 1.0]
     ```
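
The four formulas above can also be written out directly. The following is a minimal NumPy sketch, not TensorFlow's implementation; the helper names (`sigmoid`, `tanh_from_formula`, `relu`, `leaky_relu`) are chosen here for illustration, and `alpha=0.1` mirrors the value used in the Leaky ReLU example.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def tanh_from_formula(x):
    # tanh(x) = 2 / (1 + e^{-2x}) - 1, i.e. 2 * sigmoid(2x) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # x for positive inputs, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

x = np.array([-1.0, 0.0, 1.0])
print(sigmoid(x))
print(tanh_from_formula(x))
print(relu(x))
print(leaky_relu(x))
```

Writing `tanh` in terms of the sigmoid formula makes the relationship between the two explicit: tanh is a sigmoid rescaled to the range (-1, 1).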

### Importance of Activation Functions

Activation functions introduce non-linearity into the neural network, which allows it to learn and represent more complex patterns. Without non-linear activation functions, a neural network, regardless of its depth, would be equivalent to a single-layer perceptron, which can only model linear relationships.
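
The collapse of stacked linear layers can be checked numerically. This is a small NumPy sketch with hand-picked weights (`W1`, `W2` and the inputs `x` are illustrative values, and biases are left out for brevity):

```python
import numpy as np

x = np.array([[1.0, -2.0],
              [3.0,  0.5]])        # 2 samples, 2 features
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])       # weights of a first linear layer
W2 = np.array([[1.0],
               [1.0]])             # weights of a second linear layer

stacked = (x @ W1) @ W2            # two linear layers, no activation
single = x @ (W1 @ W2)             # one linear layer with merged weights
with_relu = np.maximum(x @ W1, 0.0) @ W2  # ReLU between the two layers

print(np.allclose(stacked, single))    # True: the two layers collapse into one
print(np.allclose(with_relu, single))  # False: ReLU breaks the equivalence
```

Because matrix multiplication is associative, the two weight matrices can always be merged into one; inserting a non-linear function between them is what prevents the merge.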

Each activation function has its own characteristics and is chosen based on the specific requirements of the model and the nature of the data. For instance, ReLU is popular for deep networks because it helps mitigate the vanishing gradient problem, while sigmoid and softmax are commonly used in the output layers of binary and multi-class classification models, respectively. Tanh is more often used in hidden layers.


Let's consider a scenario where you are working with gene expression data to build a neural network model for classifying types of cancer based on the expression levels of various genes.

### Example: Using Activation Functions with Gene Expression Data

#### 1. Preparing the Data
Suppose you have a dataset where each row represents a sample, each column represents a gene, and the target variable indicates the type of cancer. Here’s a simplified example:

- **Features**: Gene expression levels (e.g., `gene1`, `gene2`, ..., `geneN`)
- **Target**: Cancer type (e.g., 0 for type A, 1 for type B)

#### 2. Building the Neural Network
We'll build a neural network model using TensorFlow, where activation functions play a crucial role in transforming the input gene expression data through the network layers.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

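# Synthetic gene expression data (random values and random labels, for illustration only;
# expect test accuracy near chance level because the labels carry no real signal)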
import numpy as np
np.random.seed(42)
num_samples = 100
num_genes = 50
X = np.random.rand(num_samples, num_genes)  # Gene expression levels
y = np.random.randint(2, size=num_samples)  # Binary cancer type labels (0 or 1)

# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Building the neural network
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(num_genes,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # Using sigmoid for binary classification
])

# Compiling the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the model
model.fit(X_train, y_train, epochs=10, batch_size=16, validation_split=0.1)

# Evaluating the model
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Accuracy:", accuracy)
```

### Explanation of Activation Functions Used

1. **Input Layer**:
   - The input layer receives the gene expression levels of each sample. These are numerical values representing the expression levels of each gene.

2. **Hidden Layers**:
   - **Dense Layer with ReLU Activation**: The first hidden layer has 64 units with ReLU activation. ReLU (`tf.nn.relu`) transforms the input values by setting all negative values to 0, introducing non-linearity to help the model learn complex patterns.
   - **Dense Layer with ReLU Activation**: The second hidden layer has 32 units with ReLU activation, further transforming the data.

3. **Output Layer**:
   - **Dense Layer with Sigmoid Activation**: The output layer has 1 unit with sigmoid activation (`tf.nn.sigmoid`). The sigmoid function maps the output to a range between 0 and 1, which is suitable for binary classification tasks. It outputs the probability of the sample belonging to class 1 (e.g., cancer type B).
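
To make the flow of data through these layers concrete, here is a rough NumPy sketch of a forward pass with the same layer sizes (50 inputs, then 64, 32 and 1 units). The weights are random and untrained, and the `dense` helper is a hypothetical stand-in for what a dense layer computes, not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_genes = 100, 50
X = rng.random((num_samples, num_genes))   # synthetic expression levels

def dense(inputs, units, rng):
    # Affine map inputs @ W + b with small random weights and zero bias
    W = rng.normal(scale=0.1, size=(inputs.shape[1], units))
    b = np.zeros(units)
    return inputs @ W + b

h1 = np.maximum(dense(X, 64, rng), 0.0)    # first hidden layer, ReLU
h2 = np.maximum(dense(h1, 32, rng), 0.0)   # second hidden layer, ReLU
logits = dense(h2, 1, rng)                 # output layer before activation
probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid maps logits into (0, 1)

print(h1.shape, h2.shape, probs.shape)
```

Each ReLU output is non-negative, and every value in `probs` lies strictly between 0 and 1, which is what allows it to be read as a class-1 probability.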

### Importance in Gene Expression Data

- **Non-linearity**: Activation functions like ReLU introduce non-linearity, which allows the neural network to model complex relationships between gene expression levels and cancer types.
- **Probability Output**: The sigmoid activation in the output layer is crucial for binary classification tasks, providing a probability score that can be interpreted as the likelihood of a sample belonging to a specific cancer type.

### Summary

Activation functions play a vital role in transforming gene expression data through the layers of a neural network. ReLU is typically used in hidden layers to handle non-linearities, while sigmoid (or softmax for multi-class classification) is used in the output layer to produce probability scores for classification tasks. This example demonstrates how you can build and train a neural network to classify cancer types based on gene expression data, leveraging activation functions to enhance the model's learning capabilities.
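
For the multi-class case mentioned above, softmax plays the role that sigmoid plays for two classes. The sketch below is a plain NumPy version (the `softmax` helper and the logits are illustrative assumptions, not taken from the model above); it also checks that a two-class softmax with one logit fixed at 0 reproduces the sigmoid.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum first for numerical stability
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])   # hypothetical scores for 3 classes
probs = softmax(logits)
print(probs.sum(axis=-1))              # each row sums to 1

z = 0.7
two_class = softmax(np.array([0.0, z]))
print(np.isclose(two_class[1], sigmoid(z)))  # True: sigmoid is a 2-class softmax
```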

# **02 Tensor Gradients**

The gradient output `[<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>]` indicates the gradients of a loss function with respect to two variables (or tensors). Each element in the list is a tensor representing the gradient of the loss with respect to one of the variables.

### Understanding the Gradient Output

1. **Shape**: `shape=()` indicates that the gradient is a scalar (0-dimensional tensor). This is common when the variables being differentiated are scalars themselves.
2. **dtype**: `dtype=float32` indicates the data type of the gradient values, which in this case is a 32-bit floating-point number.
3. **numpy**: This provides the actual numeric value of the gradient. In this case, `numpy=6.0` and `numpy=4.0`.

### Context

To understand this better, let's look at an example where such a gradient output might be generated.

### Example: Computing Gradients

Let's say we have a simple function \( f(x, y) = 3x^2 + 4y \), and we want to compute the gradients with respect to \( x \) and \( y \).

```python
import tensorflow as tf

# Define the variables
x = tf.Variable(1.0)
y = tf.Variable(2.0)

# Define the function
with tf.GradientTape() as tape:
    f = 3 * x**2 + 4 * y

# Compute the gradients
gradients = tape.gradient(f, [x, y])

# Print the gradients
print(gradients)
```

### Output
```
[<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>]
```

### Explanation

- The function \( f(x, y) = 3x^2 + 4y \) is defined.
- Using `tf.GradientTape`, TensorFlow tracks the computation to compute the gradients.
- `tape.gradient(f, [x, y])` computes the gradients of \( f \) with respect to `x` and `y`.

The gradients are calculated as follows:
- The partial derivative of \( f \) with respect to \( x \): \( \frac{\partial f}{\partial x} = 6x \). For \( x = 1 \), this is \( 6 \times 1 = 6.0 \).
- The partial derivative of \( f \) with respect to \( y \): \( \frac{\partial f}{\partial y} = 4 \). This is a constant 4, so regardless of \( y \), it is \( 4.0 \).

Therefore, the gradient output `[6.0, 4.0]` matches the computed gradients:
- `6.0` is the gradient of \( f \) with respect to \( x \).
- `4.0` is the gradient of \( f \) with respect to \( y \).
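
The analytic derivatives can be cross-checked without TensorFlow by using central finite differences. This is a small pure-Python sketch; the helper name `central_difference` and the step size `h` are choices made here for illustration.

```python
def f(x, y):
    # Same function as above: f(x, y) = 3x^2 + 4y
    return 3 * x**2 + 4 * y

def central_difference(func, x, y, h=1e-5):
    # Approximate each partial derivative by a symmetric difference quotient
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy

df_dx, df_dy = central_difference(f, 1.0, 2.0)
print(round(df_dx, 4), round(df_dy, 4))  # 6.0 4.0
```

A numerical check of this kind is a common way to catch mistakes in a hand-derived gradient.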

### Summary
The gradient output `[<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>]` means that:
- The gradient of the loss (or function being differentiated) with respect to the first variable is `6.0`.
- The gradient of the loss with respect to the second variable is `4.0`.

These gradients indicate how much the loss changes for a small change in each variable, which is what guides the optimization process in machine learning models.

# **03 One-hot Encoding**

One-hot encoding is a technique used to convert categorical data into a numerical format that can be used by machine learning algorithms. It is particularly useful when you have categorical variables with no intrinsic order (i.e., nominal data). In one-hot encoding, each category is represented by a binary vector where only one element is `1` (hot) and all other elements are `0`.

### Example

Suppose you have a categorical variable with three categories: "red," "green," and "blue." Here’s how one-hot encoding would transform this data:

1. **Categories**: ["red", "green", "blue"]
2. **One-Hot Encoded Representation**:
   - "red" -> `[1, 0, 0]`
   - "green" -> `[0, 1, 0]`
   - "blue" -> `[0, 0, 1]`

### Why Use One-Hot Encoding?

- **Machine Learning Algorithms**: Many machine learning algorithms cannot work with categorical data directly and require numerical input.
- **No Ordinal Relationship**: One-hot encoding is appropriate when there is no ordinal relationship between categories (e.g., colors, countries).
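
One way to see why the missing ordinal relationship matters is to compare distances between categories under a plain integer encoding and under one-hot encoding. The sketch below uses NumPy, and the integer codes `0, 1, 2` are an arbitrary assignment made for illustration.

```python
import numpy as np

categories = ["red", "green", "blue"]
integer_codes = np.array([0.0, 1.0, 2.0])      # arbitrary integer encoding
one_hot_codes = np.eye(len(categories))        # one row per category

# Integer codes make "blue" look twice as far from "red" as "green" is
int_red_green = abs(integer_codes[0] - integer_codes[1])
int_red_blue = abs(integer_codes[0] - integer_codes[2])

# One-hot vectors keep every pair of categories equally far apart
oh_red_green = np.linalg.norm(one_hot_codes[0] - one_hot_codes[1])
oh_red_blue = np.linalg.norm(one_hot_codes[0] - one_hot_codes[2])

print(int_red_green, int_red_blue)             # 1.0 2.0
print(np.isclose(oh_red_green, oh_red_blue))   # True
```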

### How to Perform One-Hot Encoding in TensorFlow

Let’s go through an example using TensorFlow to perform one-hot encoding.

#### Example in TensorFlow

Suppose you have the following categories: ["cat", "dog", "bird"].

```python
import tensorflow as tf

# Define the categories
categories = ["cat", "dog", "bird"]

# Convert categories to indices (e.g., using a mapping)
category_to_index = {category: index for index, category in enumerate(categories)}

# List of categories to encode
category_list = ["cat", "dog", "bird", "cat"]

# Convert category list to indices
indices = [category_to_index[category] for category in category_list]

# Perform one-hot encoding
one_hot_encoded = tf.one_hot(indices, depth=len(categories))

# Print the result
print(one_hot_encoded)
```

### Output
```
tf.Tensor(
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]], shape=(4, 3), dtype=float32)
```

### Explanation

1. **Mapping Categories to Indices**:
   ```python
   category_to_index = {category: index for index, category in enumerate(categories)}
   ```
   This creates a dictionary that maps each category to a unique index: `{"cat": 0, "dog": 1, "bird": 2}`.

2. **Convert Category List to Indices**:
   ```python
   indices = [category_to_index[category] for category in category_list]
   ```
   This converts the list of categories to their corresponding indices: `[0, 1, 2, 0]`.

3. **One-Hot Encoding**:
   ```python
   one_hot_encoded = tf.one_hot(indices, depth=len(categories))
   ```
   This converts the indices to one-hot encoded vectors. The `depth` parameter specifies the number of categories.
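
The same three steps can be reproduced in NumPy, which also makes it easy to go back from one-hot vectors to category names. This is a sketch of an equivalent computation, not of how `tf.one_hot` is implemented; indexing an identity matrix with `np.eye(...)[indices]` is one common idiom for it.

```python
import numpy as np

categories = ["cat", "dog", "bird"]
category_to_index = {c: i for i, c in enumerate(categories)}
category_list = ["cat", "dog", "bird", "cat"]
indices = np.array([category_to_index[c] for c in category_list])

# Row i of the identity matrix is the one-hot vector for index i
one_hot = np.eye(len(categories), dtype=np.float32)[indices]

# argmax recovers the index of the single 1 in each row
decoded = [categories[i] for i in np.argmax(one_hot, axis=1)]

print(one_hot)
print(decoded)
```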

### Summary

- **One-Hot Encoding**: Converts categorical data into a binary vector where only one element is `1` and the rest are `0`.
- **Usage**: Useful for machine learning algorithms that require numerical input and when there is no ordinal relationship between categories.
- **TensorFlow**: The `tf.one_hot` function can be used to perform one-hot encoding efficiently.



# **04 Tensor Axes**

In TensorFlow, an axis is a dimension along which operations like reductions (sum, mean, etc.) and manipulations (slicing, reshaping) are performed. Understanding axes is crucial for working with tensors, especially when performing operations that aggregate or reshape data.

### Description of Axis

- **Axis**: An axis in a tensor refers to a specific dimension along which operations can be performed. For instance, in a 2D tensor (matrix), axis `0` indexes the rows (it runs down the matrix) and axis `1` indexes the columns (it runs across the matrix). In a 3D tensor, axis `0` could be the depth, axis `1` could be the rows, and axis `2` could be the columns.

- **Axes in Tensor Operations**: When performing operations such as summing, averaging, or applying functions, you often need to specify the axis along which the operation should be performed. For example, summing along axis `0` will reduce the tensor along the rows, collapsing them into a single value per column.

### Examples

#### 1. **Reducing a Tensor along an Axis**

Consider a 2D tensor (matrix) and let's perform a sum operation along a specific axis.

```python
import tensorflow as tf

# Create a 2D tensor (matrix) with shape (3, 4)
tensor = tf.constant([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12]])

# Sum along axis 0 (collapses the rows, giving one sum per column)
sum_axis_0 = tf.reduce_sum(tensor, axis=0)
print("Sum along axis 0:", sum_axis_0)
# Output: [15, 18, 21, 24]

# Sum along axis 1 (collapses the columns, giving one sum per row)
sum_axis_1 = tf.reduce_sum(tensor, axis=1)
print("Sum along axis 1:", sum_axis_1)
# Output: [10, 26, 42]
```

**Explanation**:
- **Axis 0**: Summing along axis `0` collapses rows, resulting in the sum of each column.
- **Axis 1**: Summing along axis `1` collapses columns, resulting in the sum of each row.

#### 2. **Slicing a Tensor along an Axis**

You can also slice a tensor along a particular axis.

```python
import tensorflow as tf

# Create a 3D tensor with shape (2, 3, 4)
tensor = tf.constant([[[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12]],

                      [[13, 14, 15, 16],
                       [17, 18, 19, 20],
                       [21, 22, 23, 24]]])

# Slice along axis 0
slice_axis_0 = tensor[1, :, :]
print("Slice along axis 0:", slice_axis_0)
# Output: [[13, 14, 15, 16],
#          [17, 18, 19, 20],
#          [21, 22, 23, 24]]

# Slice along axis 1
slice_axis_1 = tensor[:, 1, :]
print("Slice along axis 1:", slice_axis_1)
# Output: [[ 5,  6,  7,  8],
#          [17, 18, 19, 20]]

# Slice along axis 2
slice_axis_2 = tensor[:, :, 2]
print("Slice along axis 2:", slice_axis_2)
# Output: [[ 3,  7, 11],
#          [15, 19, 23]]
```

**Explanation**:
- **Axis 0**: Selecting a slice along axis `0` extracts one "layer" from the tensor.
- **Axis 1**: Selecting a slice along axis `1` extracts a slice of all layers but with the specified row.
- **Axis 2**: Selecting a slice along axis `2` extracts a slice of all layers and rows but with the specified column.

### Visualizing Axes

In general:
- For a tensor with shape `(D1, D2, ..., Dn)`, axis `0` is the first dimension, axis `1` is the second dimension, and so on up to axis `n-1` for the last dimension.
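
The axis numbering can be explored with a short NumPy sketch (NumPy is used here for brevity; TensorFlow reductions such as `tf.reduce_sum` follow the same axis convention, including negative indices that count from the last axis).

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)   # shape (D1, D2, D3) = (2, 3, 4)
n = tensor.ndim

# Reducing along an axis removes that dimension from the shape
shapes_after_sum = [tensor.sum(axis=axis).shape for axis in range(n)]
print(shapes_after_sum)                   # [(3, 4), (2, 4), (2, 3)]

# axis n-1 and axis -1 both refer to the last dimension
last_positive = tensor.sum(axis=n - 1)
last_negative = tensor.sum(axis=-1)
print(np.array_equal(last_positive, last_negative))  # True
```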

Understanding axes helps in effective tensor manipulation and operations, making it easier to perform complex data transformations and aggregations.

A tensor of shape `(2, 3, 4)` can also be interpreted in the context of gene expression data. One possible mapping is:

### Interpretation of Tensor Shape `(2, 3, 4)`

1. **Dimension 0 (Size 2)**: Represents the number of different gene expression profiles or conditions, here two.

2. **Dimension 1 (Size 3)**: Represents the number of replicates for each gene expression profile. So, you have three replicates for each profile.

3. **Dimension 2 (Size 4)**: Represents the time points at which the gene expression was measured. You have four time points for each replicate.

### Example Breakdown

- **Profile 1**:
  - **Replicate 1**: Measurements at time points 1, 2, 3, 4
  - **Replicate 2**: Measurements at time points 1, 2, 3, 4
  - **Replicate 3**: Measurements at time points 1, 2, 3, 4

- **Profile 2**:
  - **Replicate 1**: Measurements at time points 1, 2, 3, 4
  - **Replicate 2**: Measurements at time points 1, 2, 3, 4
  - **Replicate 3**: Measurements at time points 1, 2, 3, 4

Here’s a visual representation:

```
Gene Expression Profile 1:
[
 [ [Time1_Replicate1, Time2_Replicate1, Time3_Replicate1, Time4_Replicate1],
   [Time1_Replicate2, Time2_Replicate2, Time3_Replicate2, Time4_Replicate2],
   [Time1_Replicate3, Time2_Replicate3, Time3_Replicate3, Time4_Replicate3] ]
]

Gene Expression Profile 2:
[
 [ [Time1_Replicate1, Time2_Replicate1, Time3_Replicate1, Time4_Replicate1],
   [Time1_Replicate2, Time2_Replicate2, Time3_Replicate2, Time4_Replicate2],
   [Time1_Replicate3, Time2_Replicate3, Time3_Replicate3, Time4_Replicate3] ]
]
```

### Operations on This Tensor

- **Sum/Mean Across Time Points (Axis 2)**: To aggregate expression values across time points, you might perform operations along axis `2`, resulting in a tensor of shape `(2, 3)` where each element represents the sum or mean expression across the four time points.

- **Average Across Replicates (Axis 1)**: To get the average gene expression profile across replicates for each profile, you would perform operations along axis `1`, resulting in a tensor of shape `(2, 4)` where each element represents the average expression across replicates.

- **Profile Comparisons (Axis 0)**: To compare gene expression profiles, you might perform operations across axis `0`, aggregating or comparing across the two profiles.
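
These three operations can be sketched with NumPy on random stand-in data. The variable names and the values are illustrative only; just the shape `(2, 3, 4)` comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
expression = rng.random((2, 3, 4))   # (profiles, replicates, time points)

# Mean across time points (axis 2): one value per profile and replicate
mean_over_time = expression.mean(axis=2)

# Mean across replicates (axis 1): one time course per profile
mean_over_replicates = expression.mean(axis=1)

# Comparison across profiles (axis 0): profile 2 minus profile 1
profile_difference = expression[1] - expression[0]

print(mean_over_time.shape)          # (2, 3)
print(mean_over_replicates.shape)    # (2, 4)
print(profile_difference.shape)      # (3, 4)
```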

This dimensionality allows you to structure and analyze gene expression data effectively, considering both the biological conditions and the experimental setup.

# **05 Reshaping Tensors**

Using `-1` in the reshape function is a convenient way to specify that TensorFlow should automatically determine the size of that particular dimension based on the total number of elements and the sizes of other dimensions. Here are some reasons why using `-1` is useful:

1. **Flexibility and Convenience:**
   - You don't need to manually calculate the size of the dimension you want TensorFlow to infer.
   - This is particularly helpful when working with tensors whose shapes might change dynamically during the program execution.

2. **Avoiding Errors:**
   - It reduces the risk of making mistakes in manually computing the new shape, especially in complex transformations or when dealing with higher-dimensional tensors.
   - TensorFlow checks that the total number of elements stays the same and raises an error if the inferred dimension would not be a whole number. Only one dimension may be given as `-1`.

3. **Code Readability:**
   - Using `-1` can make the code more readable and maintainable, as it clearly indicates that TensorFlow should determine the appropriate size for that dimension.
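
The inference rule behind `-1` is simple arithmetic: the missing size is the total number of elements divided by the product of the sizes that were given. The sketch below uses NumPy, whose `reshape` applies the same rule; TensorFlow likewise rejects an impossible shape, although the exception type it raises differs.

```python
import numpy as np

values = np.arange(12)               # 12 elements in total

inferred = values.reshape(3, -1)     # -1 becomes 12 / 3 = 4
print(inferred.shape)                # (3, 4)

try:
    values.reshape(5, -1)            # 12 is not divisible by 5
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```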

### Example Scenarios:

#### 1. Flattening a Tensor:
```python
import tensorflow as tf
tensor = tf.reshape(tf.range(1, 17), (4, 4))  # 4x4 tensor holding 1..16
flattened_tensor = tf.reshape(tensor, [-1])   # Shape: (16,)
```
Here, `[-1]` specifies that the tensor should be reshaped into a one-dimensional tensor with all elements from the original tensor.

#### 2. Dynamic Shape Transformations:
Suppose you have a batch of images where the batch size can vary:
```python
import tensorflow as tf
images = tf.zeros([8, 28, 28, 3])  # Batch of 28x28 RGB images; the batch size 8 is arbitrary
batch_size = tf.shape(images)[0]   # TF2 has no tf.placeholder, so read the batch size at run time
flattened_images = tf.reshape(images, [batch_size, -1])  # Shape: (8, 2352)
```
In this example, `-1` allows TensorFlow to automatically compute the number of features per image (which is `28 * 28 * 3 = 2352`), regardless of the batch size.

#### 3. Reshaping for Model Layers:
```python
import tensorflow as tf
input_tensor = tf.zeros([8, 784])  # Batch of flattened 28x28 grayscale images; the batch size 8 is arbitrary
reshaped_tensor = tf.reshape(input_tensor, [-1, 28, 28, 1])  # Shape: (8, 28, 28, 1)
```
Here, `-1` allows TensorFlow to adjust the batch size dynamically while reshaping the flattened images back to their original 28x28 shape with 1 channel.

### Summary:
Using `-1` simplifies tensor reshaping by letting TensorFlow handle the size calculation for one dimension, ensuring the total number of elements is preserved. This leads to more flexible, error-resistant, and readable code.

# **06 Tensor Permutation and Transposition**

You can think of `tf.transpose` with the `perm` argument as similar to reshaping in that it changes the way data is organized. However, they are fundamentally different operations:

- **Reshaping**: Changes the shape of the tensor by regrouping the same elements into a new structure without changing the total number of elements. It can collapse or expand dimensions but preserves the row-major (flattened) order of the elements.

- **Transposing with Permutation**: Changes the order of the dimensions (axes) according to the permutation specified. The values are unchanged, but each element moves to a new index, so the flattened order generally changes.

### Example to Illustrate the Difference

#### Original Tensor
Consider a tensor of shape `(2, 3, 4)`:
```python
import tensorflow as tf

tensor = tf.constant([
    [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12]],
    
    [[13, 14, 15, 16],
     [17, 18, 19, 20],
     [21, 22, 23, 24]]
])

print("Original shape:", tensor.shape)
print(tensor)
```

### Reshaping
Reshaping changes the shape but keeps the element order:
```python
reshaped_tensor = tf.reshape(tensor, (4, 6))
print("Reshaped shape:", reshaped_tensor.shape)
print(reshaped_tensor)
```
Output:
```
Original shape: (2, 3, 4)
tf.Tensor(
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]], shape=(2, 3, 4), dtype=int32)
Reshaped shape: (4, 6)
tf.Tensor(
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]], shape=(4, 6), dtype=int32)
```

### Transposing with Permutation
Transposing with permutation changes the axes order:
```python
transposed_tensor = tf.transpose(tensor, perm=[1, 2, 0])
print("Transposed shape:", transposed_tensor.shape)
print(transposed_tensor)
```
Output:
```
Transposed shape: (3, 4, 2)
tf.Tensor(
[[[ 1 13]
  [ 2 14]
  [ 3 15]
  [ 4 16]]

 [[ 5 17]
  [ 6 18]
  [ 7 19]
  [ 8 20]]

 [[ 9 21]
  [10 22]
  [11 23]
  [12 24]]], shape=(3, 4, 2), dtype=int32)
```

### Key Differences

- **Reshaping**:
  - Changes the shape while preserving the element order.
  - The total number of elements remains the same.
  - Useful for preparing data for different layers in neural networks.

- **Transposing with Permutation**:
  - Changes the order of dimensions (axes).
  - Does not change the number of elements, but generally changes their flattened (row-major) order.
  - Useful for changing data layout, often required for operations like matrix multiplication or preparing data for specific layer types in neural networks.
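
The difference can be verified directly. The sketch below uses NumPy for brevity (`np.transpose` with an axes tuple plays the role of `tf.transpose` with `perm`), on the same `(2, 3, 4)` tensor of values 1 to 24.

```python
import numpy as np

tensor = np.arange(1, 25).reshape(2, 3, 4)
reshaped = tensor.reshape(4, 6)
transposed = np.transpose(tensor, (1, 2, 0))   # same role as perm=[1, 2, 0]

# Reshape keeps the flattened order; transpose does not
same_order_after_reshape = np.array_equal(reshaped.ravel(), tensor.ravel())
same_order_after_transpose = np.array_equal(transposed.ravel(), tensor.ravel())

# With perm=[1, 2, 0], output axes (i, j, k) read input axes (1, 2, 0),
# so transposed[i, j, k] equals tensor[k, i, j]
index_rule_holds = all(
    transposed[i, j, k] == tensor[k, i, j]
    for i in range(3) for j in range(4) for k in range(2)
)

print(same_order_after_reshape, same_order_after_transpose, index_rule_holds)
# True False True
```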

### Summary
While both reshaping and transposing with permutation change the structure of the tensor, reshaping regroups the elements into a new shape while keeping their flattened order, whereas transposing with permutation reorders the axes of the tensor and therefore changes where each element sits. Each operation is useful in different contexts depending on the desired data layout and operations to be performed.