# Ungraded lab: Manual Feature Engineering
------------------------
 
Welcome! In this ungraded lab you will perform feature engineering using TensorFlow and Keras. By developing a deeper understanding of the problem you are dealing with and proposing transformations to the raw features, you will see how the predictive power of your model increases. In particular, you will:


1. Define the model using feature columns.
2. Use Lambda layers to perform feature engineering on some of these features.
3. Compare the training history and predictions of the model before and after feature engineering.
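To make step 2 concrete before starting, here is a minimal sketch of how a Keras `Lambda` layer can turn raw inputs into engineered features inside the model. The two transformations (a log-scaled `DISTANCE` and a hypothetical `airflow_per_distance` ratio) are illustrative assumptions, not the features this lab necessarily engineers.

```python
import numpy as np
import tensorflow as tf

# Two raw numerical inputs, named after columns in the dataset
distance = tf.keras.Input(shape=(1,), name="DISTANCE")
airflow = tf.keras.Input(shape=(1,), name="AIRFLOW")

# Hypothetical engineered feature: log-scaled distance (log1p avoids log(0))
log_distance = tf.keras.layers.Lambda(
    lambda x: tf.math.log1p(x), name="log_distance")(distance)

# Hypothetical interaction feature: airflow per unit of distance
airflow_per_distance = tf.keras.layers.Lambda(
    lambda t: t[0] / t[1], name="airflow_per_distance")([airflow, distance])

# Stack the raw and engineered features side by side
features = tf.keras.layers.Concatenate()([distance, log_distance, airflow_per_distance])
model = tf.keras.Model(inputs={"DISTANCE": distance, "AIRFLOW": airflow}, outputs=features)

# One example row (values taken from the dataset preview)
out = model({
    "DISTANCE": np.array([[10.0]], dtype="float32"),
    "AIRFLOW": np.array([[2.6]], dtype="float32"),
}).numpy()
print(out)  # columns: DISTANCE, log1p(DISTANCE), AIRFLOW / DISTANCE
```

Because the transformations live inside the model as layers, the same feature engineering is applied automatically at training and at prediction time.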

In [1]:
# Import the packages

# Utilities
import os
import logging

# For visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

# For modelling
import tensorflow as tf
from tensorflow import feature_column as fc
from tensorflow.keras import layers, models

# Set TF logger to only print errors (dismiss warnings)
logging.getLogger("tensorflow").setLevel(logging.ERROR)

# Display the TensorFlow version
print('TensorFlow version: {}'.format(tf.__version__))

TensorFlow version: 2.10.1


### 1.1 - Define paths

You will define a few global variables to indicate paths in the local workspace.

In [6]:
# Declare the path to the data and create the directory if it does not exist
DATA_DIR = './data'
os.makedirs(DATA_DIR, exist_ok=True)

# path to the raw training data
TRAINING_DATA = f'{DATA_DIR}/A_E_Fire_Dataset.csv'

### 1.2 - Preview the dataset

In [7]:
# Load the dataset to a dataframe
df = pd.read_csv(TRAINING_DATA)

# Preview the dataset
df.head()

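| Unnamed: 0 | SIZE | FUEL | DISTANCE | DESIBEL | AIRFLOW | FREQUENCY | STATUS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | gasoline | 10 | 96 | 0.0 | 75 | 0 |
| 1 | 1 | gasoline | 10 | 96 | 0.0 | 72 | 1 |
| 2 | 1 | gasoline | 10 | 96 | 2.6 | 70 | 1 |
| 3 | 1 | gasoline | 10 | 96 | 3.2 | 68 | 1 |
| 4 | 1 | gasoline | 10 | 109 | 4.5 | 67 | 1 |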
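The `Unnamed: 0` column in the preview is most likely a row index that was saved into the CSV file along with the data. The sketch below, which assumes the file starts with an unnamed index column, rebuilds two of the preview rows in memory and shows how pandas names that column by default and how `index_col=0` turns it back into the DataFrame index.

```python
import io
import pandas as pd

# Two preview rows as an in-memory CSV whose first header field is empty,
# mimicking a file written together with its index (an assumption).
csv_text = (
    ",SIZE,FUEL,DISTANCE,DESIBEL,AIRFLOW,FREQUENCY,STATUS\n"
    "0,1,gasoline,10,96,0.0,75,0\n"
    "1,1,gasoline,10,96,0.0,72,1\n"
)

# Default read: the empty header becomes a column called 'Unnamed: 0'
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0 uses the first column as the index instead of a feature
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```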


## Create an input pipeline 

Next, you will create an input pipeline to preprocess the data.

To load the data for the model, you will use `tf.data.experimental.make_csv_dataset`, an experimental TensorFlow function that reads batches directly from a `csv` file.
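As a sketch of what this function returns, the example below writes four of the preview rows to a temporary file (leaving out the `Unnamed: 0` column for simplicity) and reads them back. Each batch is a dictionary mapping column names to tensors, with column types inferred from the data; `num_epochs=1` and `shuffle=False` are set here only so the output is finite and in file order.

```python
import os
import tempfile
import tensorflow as tf

# A few rows from the preview, without the index column
rows = [
    "SIZE,FUEL,DISTANCE,DESIBEL,AIRFLOW,FREQUENCY,STATUS",
    "1,gasoline,10,96,0.0,75,0",
    "1,gasoline,10,96,0.0,72,1",
    "1,gasoline,10,96,2.6,70,1",
    "1,gasoline,10,96,3.2,68,1",
]
csv_path = os.path.join(tempfile.mkdtemp(), "preview.csv")
with open(csv_path, "w") as f:
    f.write("\n".join(rows) + "\n")

# Single pass over the file, in file order, two rows per batch
dataset = tf.data.experimental.make_csv_dataset(
    csv_path, batch_size=2, num_epochs=1, shuffle=False)

batches = list(dataset)
first = batches[0]
print(sorted(first.keys()))
print(first["FUEL"].numpy())       # string column -> byte strings
print(first["FREQUENCY"].numpy())  # integer column
```

Note that with its default `num_epochs=None`, `make_csv_dataset` repeats the data indefinitely.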

In [None]:
# Specify the target data
LABEL_COLUMN = 'STATUS'

# Specify numerical columns
NUMERICAL_COLS = ['SIZE', 'DISTANCE', 'DESIBEL', 'AIRFLOW', 'FREQUENCY']

# Specify string columns
STRING_COLS = ['FUEL']

# A function to separate the features from the label
def features_and_labels(row_data):
    label = row_data.pop(LABEL_COLUMN)
    return row_data, label

# A utility method to create a tf.data dataset from a CSV file
def load_dataset(pattern, batch_size=1, mode='eval'):
    # num_epochs=1 reads the file once (the default, None, would repeat forever)
    dataset = tf.data.experimental.make_csv_dataset(pattern, batch_size, num_epochs=1)

    dataset = dataset.map(features_and_labels)  # (features, label) tuples
    if mode == 'train':
        # Shuffle, then repeat so the training dataset loops infinitely
        dataset = dataset.shuffle(1000).repeat()
        # Prefetch batches in the background; AUTOTUNE lets tf.data choose the buffer size
        dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset
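To see what `features_and_labels` does to each batch, the sketch below applies the same function to a tiny in-memory dataset built with `tf.data.Dataset.from_tensor_slices` (standing in for the CSV reader). The label is removed from the feature dictionary and returned separately, giving the `(features, label)` structure Keras expects. It also shows that after `repeat()` the dataset has infinite cardinality, so the number of training steps per epoch has to come from elsewhere, for example `steps_per_epoch` in `model.fit`.

```python
import tensorflow as tf

LABEL_COLUMN = "STATUS"

def features_and_labels(row_data):
    # Same logic as the lab: pop the label out of the feature dictionary
    label = row_data.pop(LABEL_COLUMN)
    return row_data, label

# A tiny in-memory stand-in for the CSV dataset (values from the preview)
raw = tf.data.Dataset.from_tensor_slices({
    "DISTANCE": [10, 10, 10],
    "FREQUENCY": [75, 72, 70],
    "STATUS": [0, 1, 1],
}).batch(2)

split = raw.map(features_and_labels)
features, labels = next(iter(split))
print(sorted(features.keys()))  # the label key is gone
print(labels.numpy())

# A repeated dataset never ends
print(int(split.cardinality()), int(split.repeat().cardinality()))
```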

## Create a DNN Model in Keras
Now you will build a simple neural network whose inputs are the numerical features, represented by a [`DenseFeatures`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/DenseFeatures) layer (which produces a dense Tensor based on the given features), followed by two dense layers with ReLU activation functions and an output layer. Because the label `STATUS` only takes the values 0 and 1, this is a binary classification problem rather than a regression problem, so a single output unit with a sigmoid activation is the natural choice.
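A minimal sketch of such a model is shown below, assuming one `tf.feature_column.numeric_column` per entry in `NUMERICAL_COLS`, one Keras `Input` per column, and a sigmoid output for the 0/1 `STATUS` label. The hidden layer sizes (32 and 8) are hypothetical choices for illustration.

```python
import numpy as np
import tensorflow as tf

NUMERICAL_COLS = ["SIZE", "DISTANCE", "DESIBEL", "AIRFLOW", "FREQUENCY"]

# One numeric feature column per numerical input
feature_columns = [tf.feature_column.numeric_column(c) for c in NUMERICAL_COLS]

# One scalar Keras Input per column; DenseFeatures looks them up by key
inputs = {c: tf.keras.Input(shape=(), name=c, dtype="float32") for c in NUMERICAL_COLS}

x = tf.keras.layers.DenseFeatures(feature_columns, name="features")(inputs)
x = tf.keras.layers.Dense(32, activation="relu")(x)  # hypothetical size
x = tf.keras.layers.Dense(8, activation="relu")(x)   # hypothetical size
# Assumption: STATUS is a binary label, so one sigmoid unit is used
output = tf.keras.layers.Dense(1, activation="sigmoid", name="status")(x)

model = tf.keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Predict on two rows taken from the dataset preview (untrained weights)
batch = {
    "SIZE": np.array([1, 1], dtype="float32"),
    "DISTANCE": np.array([10, 10], dtype="float32"),
    "DESIBEL": np.array([96, 96], dtype="float32"),
    "AIRFLOW": np.array([0.0, 0.0], dtype="float32"),
    "FREQUENCY": np.array([75, 72], dtype="float32"),
}
preds = model.predict(batch, verbose=0)
print(preds.shape)  # one probability per row
```

Keeping the inputs as a dictionary lets the model consume the `(features, label)` batches produced by the input pipeline directly, since Keras matches dictionary keys to the `Input` names.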