# Pipeline for the 3W dataset

This notebook implements an ML pipeline for the problem posed by Petrobras' 3W dataset.

It uses the TensorFlow Extended (TFX) library.

Author: Marcus Carr

### Nomenclature

instance of an **event**: equivalent to one .csv file

**sample**: each timestep within a .csv file
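
To make the two terms concrete, the sketch below parses a tiny in-memory CSV with Python's standard `csv` module. The column names and values are invented for illustration and are not taken from the 3W files. The only point is that one file is one instance and each data row is one sample.

```python
import csv
import io

# Hypothetical contents of a single .csv file, i.e. one instance of an event.
# Column names and values are made up for illustration.
instance_csv = io.StringIO(
    "timestamp,pressure,label\n"
    "2017-03-20 02:10:00,101.3,0\n"
    "2017-03-20 02:10:01,101.9,0\n"
    "2017-03-20 02:10:02,140.2,1\n"
)

# Each row after the header is one sample (one timestep).
samples = list(csv.DictReader(instance_csv))
num_timesteps = len(samples)

print(f"1 instance with {num_timesteps} samples")
```

Under this reading, the `num_timesteps` column of the metadata table shown later in the notebook would be the number of samples in each instance.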

### Project structure

Since I do not yet know how everything will fit together once the TFX modules are implemented, I will keep a single main module for now.

Later, we can assess whether it would be better to split the code into separate modules.

In [1]:
# Verify that TFX is installed
import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))

2023-07-31 20:14:05.316157: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-31 20:14:05.790263: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-31 20:14:05.792007: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


TensorFlow version: 2.12.1
TFX version: 1.13.0


### Setting up variables

These constants define the pipeline: its name and where it stores artifacts, metadata, and exported models.

In [2]:
import os

PIPELINE_NAME = "pipeline-3w"

# Output directory to store artifacts generated from the pipeline.
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)
# Path to a SQLite DB file to use as an MLMD storage.
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')
# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('serving_model', PIPELINE_NAME)

# Set default logging level.
from absl import logging
logging.set_verbosity(logging.INFO)

In [3]:
import raw_data_acquisition as rda
import raw_data_inspector as rdi
from constants import utils, config
import models

Acquire the data!

In [4]:
rda.acquire_dataset_if_needed() # 17min48s

INFO:absl:Found existing converted data.
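
The log line above suggests that the call skips the work when converted data is already present. The implementation of `rda.acquire_dataset_if_needed` is not shown in this notebook, so the following is only a minimal sketch of that "acquire if needed" pattern. The names `acquire_if_needed` and `fake_acquire` are hypothetical, as is the assumption that a non-empty directory means the data is ready.

```python
import os
import tempfile

def acquire_if_needed(target_dir, acquire_fn):
    """Run acquire_fn(target_dir) only when target_dir is missing or empty.

    Returns True when the acquisition ran, False when it was skipped.
    """
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False  # existing data found, nothing to do
    os.makedirs(target_dir, exist_ok=True)
    acquire_fn(target_dir)
    return True

def fake_acquire(target_dir):
    # Stand-in for the slow download/conversion step (hypothetical).
    with open(os.path.join(target_dir, "event.csv"), "w") as f:
        f.write("timestamp,value\n")

with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "converted")
    first = acquire_if_needed(data_dir, fake_acquire)   # directory is missing
    second = acquire_if_needed(data_dir, fake_acquire)  # data already present
    print(first, second)
```

A guard like this keeps the cell cheap to re-run, which matters when the acquisition itself takes minutes.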


In [5]:
print(f"Size of directory with converted files is: {utils.get_directory_size(config.DIR_CONVERTED_DATASET)/(1024**3):.3f} GB")

Size of directory with converted files is: 1.263 GB
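
The helper `utils.get_directory_size` belongs to the project and its source is not shown here. One plausible way to compute such a figure with the standard library is sketched below, as an assumption rather than the project's actual code: walk the tree and sum the file sizes in bytes.

```python
import os
import tempfile

def get_directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"x" * 1024)
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"y" * 2048)
    size_bytes = get_directory_size(tmp)
    print(f"{size_bytes / 1024**3:.6f} GB")
```

Dividing by `1024**3`, as the cell above does, gives binary gigabytes (GiB), so the reported 1.263 is slightly smaller than the same size expressed in decimal GB.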


In [7]:
inspector = rdi.RawDataInspector(
    config.DIR_CONVERTED_DATASET,
    config.PATH_DATA_INSPECTOR_CACHE,
    True
)

inspector.get_metadata_table()

hash_id,class_type,source,well_id,path,timestamp,file_size,num_timesteps
12eb534,SPURIOUS_CLOSURE_DHSV,REAL,12.0,/home/aipas/Documents/coding/lemi_3w/data/data...,2017-03-20 02:10:00,124392,6939
5cc826a,SPURIOUS_CLOSURE_DHSV,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,945062,28799
c82853f,SPURIOUS_CLOSURE_DHSV,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,935871,28799
9f475ba,SPURIOUS_CLOSURE_DHSV,REAL,9.0,/home/aipas/Documents/coding/lemi_3w/data/data...,2017-03-13 16:08:04,237274,6738
e7cec40,SPURIOUS_CLOSURE_DHSV,REAL,11.0,/home/aipas/Documents/coding/lemi_3w/data/data...,2014-05-15 10:46:09,144840,22409
...,...,...,...,...,...,...,...
f0936fa,QUICK_RESTRICTION_PCK,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,645552,26999
3549e12,QUICK_RESTRICTION_PCK,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,770200,26999
3aeb749,QUICK_RESTRICTION_PCK,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,480640,26999
fbdf7d0,QUICK_RESTRICTION_PCK,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,539257,26999


In [17]:
inspector.get_metadata_table(class_types=[models.EventClassType(8).name])

hash_id,class_type,source,well_id,path,timestamp,file_size,num_timesteps
caf4d77,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,852114,26999
5b7fe50,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,842411,26999
4fda0d1,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,903977,26999
a59a858,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,947840,26999
5fa48a9,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,882326,26999
...,...,...,...,...,...,...,...
096aea5,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,870658,26999
5c33f90,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,872063,26999
237d812,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,890828,26999
ae34dbe,HYDRATE_IN_PRODUCTION_LINE,SIMULATED,,/home/aipas/Documents/coding/lemi_3w/data/data...,NaT,907551,26999
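
The filter in the cell above relies on looking up an enum member by its numeric value and then taking its name. The definition of `models.EventClassType` is not shown in the notebook, so the sketch below uses a partial, illustrative `Enum`. Only the mapping of 8 to `HYDRATE_IN_PRODUCTION_LINE` is confirmed by the output above; the other two values are assumptions.

```python
from enum import Enum

class EventClassType(Enum):
    # Partial, illustrative definition using names visible in the tables above.
    # The value 8 is confirmed by the notebook output; 2 and 6 are assumptions.
    SPURIOUS_CLOSURE_DHSV = 2
    QUICK_RESTRICTION_PCK = 6
    HYDRATE_IN_PRODUCTION_LINE = 8

# Calling the enum with a value returns the matching member;
# .name gives the string used in the class_type column.
name = EventClassType(8).name
print(name)
```

Passing the name rather than the raw integer keeps the filter readable and matches the strings stored in the `class_type` column.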
