# Notebook Info

From the available data tables, we try to identify the features that matter most for forecasting
failures.

For now, the features are pulled from the `xdiagresults` table and the failures from the `failure_info` table.

Database Details:
```
database = 'oasis-dev'
table1 = 'xspoc.xdiagresults'  # For features (schema: xspoc)
table2 = 'clean.failure_info'  # For failures (schema: clean)
```

# Imports

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
import warnings

import pandas as pd
from library import lib_aws

pd.set_option('display.max_rows', 500)  # show up to 500 rows when displaying a DataFrame
warnings.filterwarnings('ignore')  # silence all warnings in this notebook

# Data Import

- Features imported from `xspoc.xdiagresults`
- Failures imported from `clean.failure_info`


In [8]:
%%time
# Querying the features

query_features = """
SELECT 
    "NodeID",
    "Date",
    "PPRL",
    "MPRL",
    "FluidLoadonPump",
    "PumpIntakePressure"
FROM
    xspoc.xdiagresults
ORDER BY "NodeID", "Date";
"""

query_failures = """
SELECT 
    "NodeID",
    "Last Oil",
    "Start Date",
    "Finish Date",
    "Job Type",
    "Job Bucket",
    "Primary Symptom",
    "Secondary Symptom"
FROM
    clean.failure_info
ORDER BY "NodeID";
"""

with lib_aws.PostgresRDS(db='oasis-dev') as engine:
    features = pd.read_sql(query_features, engine, parse_dates=['Date'])
    failures = pd.read_sql(query_failures, engine, parse_dates=['Last Oil', 'Start Date', 'Finish Date'])

Connected to oasis-dev DataBase
Connection Closed
Wall time: 18.8 s
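
`lib_aws.PostgresRDS` is a project-local helper, so the cell above cannot be run outside this repository. As a rough, self-contained sketch of the same pattern (`pd.read_sql` with `parse_dates`), the example below uses an in-memory SQLite table as a stand-in for the real database. The table contents and well names are made up for illustration only.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the real database: an in-memory SQLite table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE xdiagresults ("NodeID" TEXT, "Date" TEXT, "PPRL" REAL)')
conn.executemany(
    "INSERT INTO xdiagresults VALUES (?, ?, ?)",
    [
        ("Well A", "2020-01-02 00:00:00", 21000.0),
        ("Well A", "2020-01-01 00:00:00", 20500.0),
    ],
)

# Same query shape as above: quoted column names, ordered by well and date.
query = 'SELECT "NodeID", "Date", "PPRL" FROM xdiagresults ORDER BY "NodeID", "Date";'

# parse_dates converts the listed columns to datetime values after loading.
toy_features = pd.read_sql(query, conn, parse_dates=["Date"])
conn.close()

print(toy_features.dtypes)
```

Parsing the date columns at load time means later steps can sort and compare dates directly, without a separate `pd.to_datetime` pass.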


In [14]:
# Cleaning NodeID in features
features['NodeID'] = (features['NodeID'].str.replace("#", "", regex=False)  # remove '#'
                                     .str.replace(r'\s+', ' ', regex=True)  # collapse runs of whitespace into one space
                                     .str.strip()  # remove leading and trailing whitespace
                                     .str.lower()  # lowercase all characters
                                     .str.title()  # uppercase the first letter of each word
                                     .map(lambda x: x[0:-2] + x[-2:].upper(),  # uppercase the last two characters
                                          na_action='ignore'))  # leave missing NodeIDs untouched
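
The cleaning steps above can be sketched as a small reusable function and checked on a few sample strings. The function name `clean_node_id` and the messy sample inputs are hypothetical, chosen only to illustrate the normalisation; they are not taken from the actual tables.

```python
import pandas as pd


def clean_node_id(node_ids: pd.Series) -> pd.Series:
    """Normalise well names with the same steps as the cell above."""
    return (
        node_ids.str.replace("#", "", regex=False)  # remove '#'
        .str.replace(r"\s+", " ", regex=True)  # collapse runs of whitespace
        .str.strip()  # remove leading and trailing whitespace
        .str.lower()  # lowercase everything first
        .str.title()  # uppercase the first letter of each word
        .map(lambda x: x[:-2] + x[-2:].upper(), na_action="ignore")  # uppercase last two characters
    )


# Hypothetical messy inputs, including a missing value.
raw = pd.Series(["  bonner   #9-12h ", "STENEHJEM 15x-9h", None])
cleaned = clean_node_id(raw)
print(cleaned.tolist())
```

Passing `na_action="ignore"` keeps missing values as they are instead of raising an error when the lambda tries to slice them.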

In [15]:
features.NodeID

0             Bonner 9-12H
1             Bonner 9-12H
2             Bonner 9-12H
3             Bonner 9-12H
4             Bonner 9-12H
                ...       
105248    Stenehjem 15X-9H
105249    Stenehjem 15X-9H
105250    Stenehjem 15X-9H
105251    Stenehjem 15X-9H
105252    Stenehjem 15X-9H
Name: NodeID, Length: 105253, dtype: object
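
If the cleaned `NodeID` is later used to match features against failures (an assumption, since the join itself is not shown here), a quick set comparison can reveal names that appear in only one table. The sketch below uses small toy frames; `features_toy`, `failures_toy`, and the well name `Example 1-1H` are hypothetical.

```python
import pandas as pd

# Toy frames standing in for the real `features` and `failures` DataFrames.
features_toy = pd.DataFrame({"NodeID": ["Bonner 9-12H", "Bonner 9-12H", "Stenehjem 15X-9H"]})
failures_toy = pd.DataFrame({"NodeID": ["Bonner 9-12H", "Example 1-1H"]})

# Unique, non-missing well names in each table.
feature_ids = set(features_toy["NodeID"].dropna())
failure_ids = set(failures_toy["NodeID"].dropna())

# Names present in one table but not the other.
only_in_features = sorted(feature_ids - failure_ids)
only_in_failures = sorted(failure_ids - feature_ids)

print("Only in features:", only_in_features)
print("Only in failures:", only_in_failures)
```

Names that show up in only one list are candidates for further cleaning, for example a spelling or spacing variant that the normalisation did not cover.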