This notebook walks through the available ESP data sources and loads them into the database.

# Imports

In [1]:
"""
Set up local imports in an IPython shell.
This is a workaround for IPython; plain Python scripts don't need it.
"""
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
import pandas as pd
import numpy as np

# Project-local modules (importable thanks to the sys.path workaround above)
from config import username, password, endpoint, data_path
from library import lib_aws

In [3]:
# HELPER FUNCTIONS

# Clean up strings
def node_clean(node_str):
    """
    Function that cleans up NodeID strings
    """
    node_str = " ".join(node_str.split())  # collapse runs of whitespace into single spaces
    node_str = node_str.replace('#', "").strip().lower().title()  # remove the '#' character, then title-case each word
    node_str = node_str[0:-2] + node_str[-2:].upper()  # the last 2 characters will always be upper case
    return node_str
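
As a quick sanity check, the sketch below runs the helper on a few made-up spellings of one well name (the raw strings are hypothetical, not taken from the data). The helper is repeated so the snippet runs on its own.

```python
def node_clean(node_str):
    """Normalize a NodeID string (same steps as the helper above)."""
    node_str = " ".join(node_str.split())
    node_str = node_str.replace('#', "").strip().lower().title()
    return node_str[0:-2] + node_str[-2:].upper()

# Hypothetical raw spellings of the same well name
raw_ids = ["acadia   31-25h", "ACADIA #31-25H ", "Acadia 31-25h"]
cleaned = [node_clean(s) for s in raw_ids]
print(cleaned)
```

All three inputs should collapse to the same NodeID, which is what makes the later de-duplication on `NodeID` meaningful.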

# Historical Data

Three sources of historical data are available:

- `tblDataHistory` from two SQL Server backups
- `ESPData.E2E.20200924.1601.csv`: the latest backup on the SFTP server before streaming starts

In [4]:
%%time
# latest backup data
file_name = 'ESPData.E2E.20200924.1601.csv'
full_path = os.path.join(data_path, file_name)

esp_hist = pd.read_csv(full_path, parse_dates=['Date'])
esp_hist.NodeID = esp_hist.NodeID.apply(node_clean)
esp_hist.drop_duplicates(subset=['NodeID', 'Date', 'Address'], inplace=True)
esp_hist.reset_index(inplace=True, drop=True)
esp_hist.head()

Wall time: 1min 25s


Unnamed: 0,NodeID,Date,Address,Value
0,Acadia 31-25H,2020-08-18,2002,38.346401
1,Acadia 31-25H,2020-08-18,2004,18.342501
2,Acadia 31-25H,2020-08-18,2005,24.0
3,Acadia 31-25H,2020-08-18,2006,28735.0
4,Acadia 31-25H,2020-08-18,2007,16373.0
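
The de-duplication step above keys on `NodeID`, `Date` and `Address` only, so two rows with the same key but different `Value` count as duplicates and the first one wins. A minimal sketch on made-up rows (the values are invented for illustration):

```python
import pandas as pd

# Made-up rows in the same shape as the CSV (NodeID, Date, Address, Value)
sample = pd.DataFrame({
    "NodeID": ["Acadia 31-25H", "Acadia 31-25H", "Acadia 31-25H"],
    "Date": pd.to_datetime(["2020-08-18", "2020-08-18", "2020-08-18"]),
    "Address": [2002, 2002, 2004],
    "Value": [38.3, 38.9, 18.3],
})

# Rows are duplicates when NodeID, Date and Address all match,
# even if Value differs; the first occurrence is kept by default.
deduped = sample.drop_duplicates(subset=["NodeID", "Date", "Address"])
deduped = deduped.reset_index(drop=True)  # renumber rows 0..n-1
print(deduped)
```

If the later reading should win instead, `keep="last"` is the relevant `drop_duplicates` option; which one is right depends on how the backups were produced.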


In [5]:
# Add data to the db
lib_aws.AddData.add_data(
    df=esp_hist,
    db='esp-data',
    table='data',
    merge_type='append',  # Only use replace if you know what you are doing
    index_col='NodeID',
)

Data replaceed on Table data in time 660.56s
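
`lib_aws.AddData.add_data` is project-specific and its internals are not shown here. As a rough illustration of why `replace` deserves the warning, the sketch below assumes `merge_type` behaves like the `if_exists` argument of pandas `DataFrame.to_sql`, and uses a throwaway in-memory SQLite database rather than the real `esp-data` database.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")  # throwaway database, not the real 'esp-data'
batch = pd.DataFrame({"NodeID": ["Acadia 31-25H"], "Address": [2002], "Value": [38.3]})

batch.to_sql("data", conn, if_exists="append", index=False)  # creates the table
batch.to_sql("data", conn, if_exists="append", index=False)  # adds a second copy
rows_after_append = pd.read_sql("SELECT COUNT(*) AS n FROM data", conn)["n"].iloc[0]

batch.to_sql("data", conn, if_exists="replace", index=False)  # drops and recreates the table
rows_after_replace = pd.read_sql("SELECT COUNT(*) AS n FROM data", conn)["n"].iloc[0]

print(rows_after_append, rows_after_replace)
conn.close()
```

Under that assumption, `append` keeps existing rows (and will happily insert the same rows twice), while `replace` discards everything already in the table.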


# Data Streaming

- New data streams into the SFTP server.
- This code should be combined with the `oasis-data-stream` application.
- Streaming has already been set up.

# Extra Tables

- The extra tables include the `espaddr` table, which maps the numerical addresses to the feature names we know and love :)
- The `espParameters.xlsx` file has the necessary information.

In [11]:
file_path = r'C:\Users\rai_v\OneDrive\Python Coursera\local-data\oasis\espParameters.xlsx'
espaddr = pd.read_excel(file_path)
espaddr.head()

Unnamed: 0,Address,Parameter,Pump
0,32176.0,Motor current,ESP Schlumberger Uniconn
1,32166.0,Frequency,ESP Schlumberger Uniconn
2,32141.0,Motor Temperature,ESP Schlumberger Uniconn
3,32145.0,Current Leakage,ESP Schlumberger Uniconn
4,32140.0,Pump Intake Temperature,ESP Schlumberger Uniconn
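
To show how this lookup could be used, here is a minimal sketch that labels readings with their parameter names via a left merge. The two lookup rows are copied from the preview above; the readings (including the `Example 1-1H` NodeID and the unmapped address 99999) are made up. It assumes each address appears only once in the lookup.

```python
import pandas as pd

# Two lookup rows copied from the espaddr preview; addresses load as floats
espaddr = pd.DataFrame({
    "Address": [32176.0, 32166.0],
    "Parameter": ["Motor current", "Frequency"],
    "Pump": ["ESP Schlumberger Uniconn"] * 2,
})

# Made-up readings; 99999 is deliberately absent from the lookup
readings = pd.DataFrame({
    "NodeID": ["Example 1-1H"] * 3,
    "Address": [32176, 32166, 99999],
    "Value": [41.5, 60.0, 1.0],
})

lookup = espaddr.astype({"Address": "int64"})  # align the key dtype with the readings
labelled = readings.merge(lookup, on="Address", how="left")
print(labelled[["Address", "Parameter", "Value"]])
```

A left merge keeps every reading, so addresses without a mapping show up with a missing `Parameter` instead of silently disappearing.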


In [10]:
print("Pump Value Counts")
display(espaddr.Pump.value_counts())

print("Parameter Value Counts")
display(espaddr.Parameter.value_counts())

Pump Value Counts


ESP Summit                  15
ESP Apergy AL SPOC          14
ESP Apergy Smarten IAM      13
ESP Schlumberger Uniconn    13
Name: Pump, dtype: int64

Parameter Value Counts


Pump Intake Temperature       4
Pump Intake Pressure          4
Frequency                     4
Casing Pressure               4
Current Leakage               4
Tubing Pressure               4
Motor current                 4
Pump Discharge Pressure       4
Output Amps                   4
Motor Temperature             4
Output volts                  4
Y Vibration                   3
X Vibration                   3
Drive Fequency Setpoint       1
PID Feedback Loop Setpoint    1
Vibration                     1
PID Setpoint                  1
PIDTargetSetPoint             1
Name: Parameter, dtype: int64
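
The counts above show that some parameters are not defined for every pump type. One possible way to see which pump lacks which parameter is a cross-tabulation; the sketch below uses a made-up miniature table with hypothetical pumps `A` and `B` rather than the real `espaddr` contents.

```python
import pandas as pd

# Made-up miniature of espaddr: pump "B" has no Frequency address
mini = pd.DataFrame({
    "Pump": ["A", "A", "B"],
    "Parameter": ["Motor current", "Frequency", "Motor current"],
})

coverage = pd.crosstab(mini["Parameter"], mini["Pump"])  # counts per (parameter, pump)
missing = coverage.eq(0)  # True where a pump lacks the parameter
print(coverage)
print(missing.loc["Frequency", "B"])
```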

In [12]:
# Adding it into the database
lib_aws.AddData.add_data(
    df=espaddr,
    db='esp-data',
    table='espaddr',
    merge_type='append',  # Only use replace if you know what you are doing
    index_col='Address',
)

Data appended on Table espaddr in time 16.10s
