# <-- Task 24 -->

# Data preprocessing

### Data Vectorization

All inputs and targets in a neural network must be tensors of floating-point data (or, in specific cases, tensors of integers). Whatever data you need to process—sound, images, text—you must first turn into tensors, a step called data vectorization.

To convert data into a tensor for a deep learning model, you typically use a library such as TensorFlow or PyTorch. With TensorFlow, you can convert NumPy arrays and other compatible formats directly into tensors with tf.convert_to_tensor() or tf.constant().

In [22]:
import numpy as np
import tensorflow as tf

In [23]:
data = np.array([[1, 2], [3, 4]])  # NumPy array
tensor = tf.convert_to_tensor(data)

In [24]:
print(data)
print(tensor)

[[1 2]
 [3 4]]
tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)


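We can also use a Flatten layer inside a deep learning model to reshape input data. A Flatten layer keeps the first (batch) dimension and collapses all remaining dimensions into one, so each sample becomes a 1-dimensional vector that can be passed to subsequent layers of the model. Calling the layer on a NumPy array also returns a tensor. Note that the 2×2 array used here is read as two samples with two features each, so its shape stays (2, 2).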

In [25]:
input_data = data

# Create a Flatten layer
flatten_layer = tf.keras.layers.Flatten()

# Apply the Flatten layer to the input data
flattened_data = flatten_layer(input_data)

# Print the shape of the flattened tensor
print(flattened_data.shape)


(2, 2)
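
The shape above is unchanged because the first axis is treated as the batch axis. The effect of flattening is easier to see on a batch of 2-D samples. The following is a minimal sketch that uses plain NumPy `reshape` as a stand-in for what a Flatten layer does with its default settings; the batch of three 2×2 samples is hypothetical and not part of the original notebook.

```python
import numpy as np

# Hypothetical batch: 3 samples, each a 2x2 grid of values.
batch = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

# Keep the first (batch) axis and collapse the remaining axes into one.
flattened = batch.reshape(batch.shape[0], -1)

print(batch.shape)      # (3, 2, 2)
print(flattened.shape)  # (3, 4)
```

Each row of `flattened` holds the four values of one sample, which is the form a dense layer expects.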


### Value Normalization

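It isn’t safe to feed a neural network data that takes relatively large values, or data that is heterogeneous (for example, data where one feature is in the range 0–1 and another is in the range 100–200). Doing so can trigger large gradient updates that prevent the network from converging.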

To make learning easier for the network, your data should have the following characteristics:

- Take small values: typically, most values should be in the 0–1 range.

- Be homogeneous: all features should take values in roughly the same range.

In [26]:
# Normalize pixel values to [0, 1]; useful for 8-bit grayscale images, whose values range from 0 to 255
data = data.astype(np.float32) / 255.0
data

array([[0.00392157, 0.00784314],
       [0.01176471, 0.01568628]], dtype=float32)
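
Dividing by 255 works when every value shares the same known range. For features with different ranges, such as the 0–1 and 100–200 case mentioned above, one common option is to rescale each feature separately. The following is a minimal sketch of per-feature min-max scaling with NumPy; the `features` array is hypothetical, and this technique is an illustration rather than something the original notebook uses.

```python
import numpy as np

# Hypothetical data: column 0 lies in 0-1, column 1 lies in 100-200.
features = np.array([[0.1, 100.0],
                     [0.5, 150.0],
                     [0.9, 200.0]], dtype=np.float32)

# Per-feature minimum and maximum (axis=0 runs over the samples).
col_min = features.min(axis=0)
col_max = features.max(axis=0)

# Rescale each column to the 0-1 range.
# Assumes no column is constant; otherwise col_max - col_min would be 0.
scaled = (features - col_min) / (col_max - col_min)

print(scaled)
```

In practice, the minimum and maximum would be computed on the training data only and then reused to scale the test data.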

### Handling Missing Values

In general, with neural networks, it’s safe to input missing values as 0, on the condition that 0 isn’t already a meaningful value. The network will learn from exposure to the data that the value 0 means missing data and will start ignoring the value.

But if you’re expecting missing values in the test data and the network was trained on data without any missing values, the network won’t have learned to ignore missing values. In this situation, you should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the features that you expect are likely to be missing in the test data.
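
The two ideas in this section can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the arrays `x_train` and `x_test`, the choice of feature 1 as the one likely to be missing, and the number of copies are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set with no missing values: 4 samples, 3 features.
# Values are chosen so that 0 is not already a meaningful value.
x_train = np.array([[0.2, 0.7, 0.1],
                    [0.9, 0.4, 0.3],
                    [0.5, 0.6, 0.8],
                    [0.3, 0.1, 0.9]], dtype=np.float32)

# Assumption for illustration: feature 1 is likely to be missing at test time.
likely_missing = [1]

# Copy a random subset of training samples and set the chosen feature to 0.
copy_idx = rng.choice(len(x_train), size=2, replace=False)
x_copy = x_train[copy_idx].copy()
x_copy[:, likely_missing] = 0.0

# Train on the original samples plus the copies with missing entries.
x_augmented = np.concatenate([x_train, x_copy], axis=0)

# At test time, encode real missing entries (NaN here) as 0 in the same way.
x_test = np.array([[0.4, np.nan, 0.6]], dtype=np.float32)
x_test_filled = np.where(np.isnan(x_test), 0.0, x_test).astype(np.float32)

print(x_augmented.shape)  # (6, 3)
print(x_test_filled)
```

The targets for the copied samples would be duplicated with the same `copy_idx` so that inputs and targets stay aligned.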

# Feature engineering

Feature engineering in deep learning means creating new features or transforming existing ones to improve the representation of the data. We use our own knowledge about the data to make the algorithm work better, applying hardcoded transformations to the data before it goes into the model.

Modern deep learning removes the need for most feature engineering, because neural networks are capable of automatically extracting useful features from raw data. 
It still matters for two reasons:

- Good features still allow you to solve problems more elegantly while using fewer resources.

- Good features let you solve a problem with far less data.
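
As an illustration of a hardcoded transformation applied before the model, here is a minimal sketch that encodes an hour-of-day value as a point on a circle, so that 23:00 and 00:00 end up close together. The example and the `hours` array are hypothetical and not taken from the original notebook; they only show the kind of domain knowledge feature engineering can inject.

```python
import numpy as np

# Hypothetical raw feature: hour of day for five samples.
hours = np.array([0, 6, 12, 18, 23], dtype=np.float32)

# Domain knowledge: hours are cyclic, so map them onto a circle.
angle = 2.0 * np.pi * hours / 24.0
hour_features = np.stack([np.sin(angle), np.cos(angle)], axis=1)

print(hour_features.shape)  # (5, 2)

# Distance between 23:00 and 00:00 is now smaller than between 12:00 and 00:00.
dist_23_to_0 = np.linalg.norm(hour_features[4] - hour_features[0])
dist_12_to_0 = np.linalg.norm(hour_features[2] - hour_features[0])
print(dist_23_to_0 < dist_12_to_0)  # True
```

A model given the raw hour would have to learn that 23 and 0 are neighbors; the engineered features state it directly, which is why such features can reduce the data and resources needed.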