## Installing libraries

In [0]:
!pip install tensorflow-transform

Collecting tensorflow-transform
  Downloading https://files.pythonhosted.org/packages/2d/bd/8ba8c1310cd741e0b83d8a064645a55c557df5a2f6b4beb11cd3a37457ed/tensorflow-transform-0.21.2.tar.gz (241kB)

## Importing libraries

In [0]:
from __future__ import print_function

import tempfile
import pandas as pd
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam.impl as tft_beam

import apache_beam.io.iobase  # Newly added import

from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema, schema_utils  # schema_utils added

## Preprocessing

### Loading database

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
dataset = pd.read_csv("/content/drive/My Drive/Presentations/TensorFlow on Google Cloud/polution_small.csv")

In [0]:
dataset.head()

| | Date | pm10 | no2 | so2 | soot |
|---|---|---|---|---|---|
| 0 | 1/1/2009 | 98.67 | 14.1 | 44.38 | 34.81 |
| 1 | 1/2/2009 | 52.33 | 14.1 | 29.75 | 33.06 |
| 2 | 1/3/2009 | 74.67 | 20.5 | 36.25 | 39.25 |
| 3 | 1/4/2009 | 72.0 | 17.3 | 46.44 | 34.38 |
| 4 | 1/5/2009 | 81.0 | 25.64 | 56.56 | 45.59 |


### Dropping the date column

In [0]:
features = dataset.drop("Date", axis = 1)

In [0]:
features.head()

| | pm10 | no2 | so2 | soot |
|---|---|---|---|---|
| 0 | 98.67 | 14.1 | 44.38 | 34.81 |
| 1 | 52.33 | 14.1 | 29.75 | 33.06 |
| 2 | 74.67 | 20.5 | 36.25 | 39.25 |
| 3 | 72.0 | 17.3 | 46.44 | 34.38 |
| 4 | 81.0 | 25.64 | 56.56 | 45.59 |


### Converting to a list of dictionaries


In [0]:
dict_features = list(features.to_dict("index").values())

In [0]:
dict_features[0:2]

[{'no2': 14.1, 'pm10': 98.67, 'so2': 44.38, 'soot': 34.81},
 {'no2': 14.1, 'pm10': 52.33, 'so2': 29.75, 'soot': 33.06}]
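
To make the target shape concrete, here is a minimal sketch that builds the same list-of-dictionaries structure using only the standard library. It assumes a small inline CSV string (`SAMPLE_CSV`, a hypothetical stand-in for the first two rows shown above) instead of the real file on Drive, and `rows_to_records` is an illustrative helper, not part of pandas or TensorFlow Transform.

```python
import csv
import io

# Hypothetical inline sample: the first two rows shown above, without the Date column.
SAMPLE_CSV = """pm10,no2,so2,soot
98.67,14.1,44.38,34.81
52.33,14.1,29.75,33.06
"""


def rows_to_records(csv_text):
    """Return one {column: float} dictionary per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


records = rows_to_records(SAMPLE_CSV)
print(records[0])
```

Each record maps a column name to a single value, which is the row-oriented layout passed to the transform later in the notebook.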

### Defining metadata

In [0]:
# Every feature is a scalar float, so each one is a FixedLenFeature with an empty shape.
data_metadata = dataset_metadata.DatasetMetadata(dataset_schema.from_feature_spec({
    "no2": tf.io.FixedLenFeature([], tf.float32),
    "pm10": tf.io.FixedLenFeature([], tf.float32),
    "so2": tf.io.FixedLenFeature([], tf.float32),
    "soot": tf.io.FixedLenFeature([], tf.float32),
}))

W0519 18:41:16.057198 139799407392640 deprecation.py:323] From <ipython-input-11-e96c9b286142>:5: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec


In [0]:
data_metadata

{'_schema': feature {
  name: "no2"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "pm10"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "so2"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
feature {
  name: "soot"
  type: FLOAT
  presence {
    min_fraction: 1.0
  }
  shape {
  }
}
}

## Preprocessing function

In [0]:
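def preprocessing_fn(inputs):
  no2 = inputs["no2"]
  pm10 = inputs["pm10"]
  so2 = inputs["so2"]
  soot = inputs["soot"]
  
  # Center no2 and so2 on their dataset-wide means.
  no2_normalized = no2 - tft.mean(no2)
  so2_normalized = so2 - tft.mean(so2)
  
  # Rescale pm10 and soot to the [0, 1] range
  # (scale_by_min_max uses output_min=0 and output_max=1 by default).
  pm10_normalized = tft.scale_to_0_1(pm10)
  soot_normalized = tft.scale_by_min_max(soot)
  
  return {
      "no2_normalized": no2_normalized,
      "so2_normalized": so2_normalized,
      "pm10_normalized": pm10_normalized,
      # "sott" is a misspelling of "soot"; the key is kept to match the recorded output.
      "sott_normalized": soot_normalized
  }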
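
The two kinds of normalization used above can be sketched in plain Python to show the arithmetic involved. This is only an illustration under stated assumptions: `center_on_mean` and `scale_to_unit_range` are hypothetical helpers, not TensorFlow Transform functions, and they run on the five rows shown by `dataset.head()` rather than on the full dataset, so the numbers differ from the ones the real pipeline produces.

```python
def center_on_mean(values):
    """Subtract the mean of the whole column from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def scale_to_unit_range(values):
    """Map the column linearly so its minimum becomes 0 and its maximum becomes 1.

    Assumes the column is not constant (otherwise max - min would be zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# First five rows of the dataset, as shown earlier in the notebook.
no2 = [14.1, 14.1, 20.5, 17.3, 25.64]
pm10 = [98.67, 52.33, 74.67, 72.0, 81.0]

no2_centered = center_on_mean(no2)
pm10_scaled = scale_to_unit_range(pm10)
print(no2_centered)
print(pm10_scaled)
```

Both helpers need a statistic of the entire column (the mean, or the minimum and maximum) before any single value can be transformed, which is why the preprocessing function relies on analyzers such as `tft.mean` instead of per-row arithmetic.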

## Coding

TensorFlow Transform uses **Apache Beam** in the background to run its operations.

Function parameters:

    dict_features - our dataset converted to a list of dictionaries
    data_metadata - the metadata defined above
    preprocessing_fn - the preprocessing function


Apache Beam syntax:

```
result = data_to_pass | where_to_pass_the_data
```

Where:

**result**  -> `transformed_dataset, transform_fn`

**data_to_pass** -> `(dict_features, data_metadata)`

**where_to_pass_the_data** -> `tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)` 

```
transformed_dataset, transform_fn = ((dict_features, data_metadata) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

```
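
The `|` in this syntax is ordinary Python operator overloading. The sketch below assumes the mechanism is the reflected-operator hook `__ror__`, which Python calls on the right-hand operand when the left-hand one (here a plain list) does not handle `|` itself. `ToyTransform` is a hypothetical stand-in written for illustration; it is not an Apache Beam class and does not reproduce Beam's pipeline behavior.

```python
class ToyTransform:
    """Hypothetical stand-in for a Beam transform; not part of apache_beam."""

    def __init__(self, fn):
        self.fn = fn

    def expand(self, data):
        # Apply the wrapped function to every record.
        return [self.fn(record) for record in data]

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` (a list or tuple)
        # does not know how to handle `|` with this object.
        return self.expand(left)


doubled = [1, 2, 3] | ToyTransform(lambda x: x * 2)
print(doubled)  # [2, 4, 6]
```

Read this way, `(dict_features, data_metadata) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)` hands the tuple on the left to the transform on the right and returns whatever the transform produces.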

Learn more: 
https://beam.apache.org/documentation/programming-guide/#applying-transforms

https://beam.apache.org/ 

In [0]:
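def data_transform():
  # Beam needs a scratch directory for intermediate analysis results.
  with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    # Analyze the full dataset and apply preprocessing_fn in one step.
    transformed_dataset, transform_fn = (
        (dict_features, data_metadata)
        | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

  # The transformed dataset is a (records, metadata) pair.
  transformed_data, transformed_metadata = transformed_dataset

  # Print each original record next to its transformed counterpart.
  for initial, transformed in zip(dict_features, transformed_data):
    print("Initial: ", initial)
    print("Transformed: ", transformed)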

In [0]:
data_transform()

W0519 18:42:22.559719 139799407392640 impl.py:425] Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
W0519 18:42:22.574245 139799407392640 interactive_environment.py:112] Interactive Beam requires Python 3.5.3+.
W0519 18:42:22.575671 139799407392640 interactive_environment.py:125] Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
W0519 18:42:22.948338 139799407392640 impl.py:425] Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
W0519 18:42:24.590297 139799407392640 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201:

Initial:  {'so2': 44.38, 'no2': 14.1, 'pm10': 98.67, 'soot': 34.81}
Transformed:  {u'no2_normalized': -18.577978, u'pm10_normalized': 0.34071696, u'so2_normalized': 28.855408, u'sott_normalized': 0.2834235}
Initial:  {'so2': 29.75, 'no2': 14.1, 'pm10': 52.33, 'soot': 33.06}
Transformed:  {u'no2_normalized': -18.577978, u'pm10_normalized': 0.16963857, u'so2_normalized': 14.225407, u'sott_normalized': 0.26620758}
Initial:  {'so2': 36.25, 'no2': 20.5, 'pm10': 74.67, 'soot': 39.25}
Transformed:  {u'no2_normalized': -12.1779785, u'pm10_normalized': 0.25211355, u'so2_normalized': 20.725407, u'sott_normalized': 0.32710278}
Initial:  {'so2': 46.44, 'no2': 17.3, 'pm10': 72.0, 'soot': 34.38}
Transformed:  {u'no2_normalized': -15.377979, u'pm10_normalized': 0.24225645, u'so2_normalized': 30.915405, u'sott_normalized': 0.2791933}
Initial:  {'so2': 56.56, 'no2': 25.64, 'pm10': 81.0, 'soot': 45.59}
Transformed:  {u'no2_normalized': -7.037979, u'pm10_normalized': 0.2754827, u'so2_normalized': 41.0354
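
As a quick sanity check on the output above, the mean that was subtracted from `no2` can be recovered from the printed pairs, since `no2_normalized = no2 - mean`. The sketch below is only a check under that assumption; the value pairs are copied by hand from the printed records, and `pairs` and `implied_means` are illustrative names.

```python
# (original no2, no2_normalized) pairs copied from the output above.
pairs = [
    (14.1, -18.577978),
    (14.1, -18.577978),
    (20.5, -12.1779785),
    (17.3, -15.377979),
    (25.64, -7.037979),
]

# no2_normalized = no2 - mean  =>  mean = no2 - no2_normalized
implied_means = [original - normalized for original, normalized in pairs]
for mean in implied_means:
    print(round(mean, 4))
```

Every row implies the same mean of roughly 32.678, which is consistent with a single dataset-wide statistic being computed once and then applied to each record.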

https://www.tensorflow.org/tfx/transform/get_started
