<a name="page-top"></a>

In [2]:
import os, sys
from pprint import pprint
from IPython.display import display, HTML

%reload_ext autoreload
%autoreload 2

# Bearings Vibration Anomaly Detection

TOC
1. <a href="#dataset">The Dataset - Bearings Vibration Sensory Time Series</a><br />
2. <a href="#mongodb">Download Sensors Data to MongoDB</a><br />
3. <a href="#pyspark">PySpark Connection</a><br />
4. Keras LSTM Autoencoder
5. Abnormal Vibrations - Hard Failure Early Warning

<img width="33.3%" alt="Coming Soon !" src="./images/under-construction.png" align="center" />

<a name="dataset"></a>
<h2>1. The Dataset - Bearings Vibration Sensory Time Series <small><em><a href="#page-top">(go back to top &uarr;)</a></em></small></h2>

<div style='text-align: justify;'>
The "Bearing Data Set" is part of the records provided by NASA in its <b>Prognostics Data Repository</b> <a href='https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/' target='_blank'><img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a>. This NASA data repository gathers a collection of datasets that have been donated by various universities, agencies, or companies.
<br />
<br />
It was originally used for the Feb 2006 research paper by J. Lee, H. Qiu, G. Yu, J. Lin, and <em>Rexnord Technical Services</em>, for the <u>Intelligent Maintenance System center <a href='http://www.imscenter.net/' target='_blank'><img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a> - University of Cincinnati</u>, entitled <b>Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics</b> <a href='https://www.researchgate.net/publication/223556476_Wavelet_filter-based_weak_signature_detection_method_and_its_application_on_rolling_element_bearing_prognostics' target='_blank'><img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a>.
</div>

<div style="float: left; border-right: 5px solid transparent;">
<table border="0" width="350px;" style="background-color: #f5f5f5; float: left;">
    <tr>
        <td>
            <center><b>A Spherical Roller Bearing <em>(sectional view)</em></b></center>
        </td>
    </tr>
    <tr>
        <td>
            <img alt="spherical Roller Bearing" src="./images/spherical_roller_bearing.png?uncache=12364" />
        </td>
    </tr>
    <tr>
        <td>
            <center><b>Bearings test rig and sensor placement illustration</b></center>
        </td>
    </tr>
    <tr>
        <td>
            <a href="./images/bearings_test_rig.png" target="_blank" style="text-decoration: none; color: inherit;">
            <img alt="Bearings test rig" src="./images/bearings_test_rig.png?uncache=12364" />
            <br />
            <center><em>click to enlarge</em></center></a>
        </td>
    </tr>
</table></div>
<div style='text-align: justify;'>
The data comprises measurements from vibration sensors, each mounted on one of 4 distinct spherical roller bearings. These bearings were placed along a rotating shaft, as illustrated in the bottom-left figure.
<br />
The experiment was a 'test-to-failure' run and was repeated 3 times. All bearing failures occurred after the bearings had exceeded their design lifetime of more than 100 million revolutions.
<br />
<p style="text-align: justify; color: darkgray;">
<u>REMARK</u>&nbsp;: for 'test 1', 8 sensors were used (2 per bearing), whereas for both 'test 2' and 'test 3', only 4 sensors (1 per bearing) were deemed sufficient and mounted on the test bench.
</p>
</div>
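Since the number of sensors per bearing differs between tests, any code reading the data needs a channel-to-bearing lookup. Below is a minimal sketch. It assumes, and this should be checked against the dataset's README, that in 'test 1' the two sensors of each bearing occupy consecutive channels (bearing 1 on `sensor_1` and `sensor_2`, and so on), and that in 'test 2' and 'test 3' `sensor_n` sits on bearing n. `sensors_for_bearing` is a hypothetical helper name.

```python
def sensors_for_bearing(test_id, bearing):
    """Return the Mongo DB field names holding the readings of one bearing.

    Hypothetical helper. Assumed layout: 'test 1' has 2 sensors per bearing
    on consecutive channels; 'test 2' and 'test 3' have 1 sensor per bearing.
    """
    if bearing not in (1, 2, 3, 4):
        raise ValueError("bearing must be 1, 2, 3 or 4")
    if test_id == 1:
        first = 2 * bearing - 1  # bearing 1 -> sensors 1-2, ..., bearing 4 -> sensors 7-8
        return ["sensor_%d" % first, "sensor_%d" % (first + 1)]
    if test_id in (2, 3):
        return ["sensor_%d" % bearing]  # one sensor per bearing
    raise ValueError("test_id must be 1, 2 or 3")
```

For example, `sensors_for_bearing(1, 3)` returns `['sensor_5', 'sensor_6']`, while `sensors_for_bearing(3, 3)` returns `['sensor_3']`.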



<a name="mongodb"></a>
<h2>2. Download Sensors Data to MongoDB <small><em><a href="#page-top">(go back to top &uarr;)</a></em></small></h2>

<div style='text-align: justify;'>
<a href="https://www.mongodb.com/" target="_blank"><img alt="Mongo DB !" src="./images/mongodb.png?uncache=34364" align="right" style="height: 50px; margin: 0px 0px 0px 10px;" /></a>
In <a href="./measurements_to_mongo.html" target="_blank">this companion notebook <img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a>, we automate downloading the data from the NASA repository and loading it into a Mongo DB database. The source data is provided as a compressed archive of a large number of files, each holding a 1-second recording of the sensors sampled at 20 kHz (i.e., roughly 20,000 time-steps per file). We use <span style="background-color: #e0e0e0">&nbsp;multiprocessing&nbsp;</span> to speed up the data dumps to Mongo DB.
<br />
<br />
For reading from the newly created <b>nasa_ims_database</b> Mongo DB database, a user named 'readOnlyUser' is available to us&nbsp;:
</div>

In [4]:
from pymongo import MongoClient

username = 'readOnlyUser'
password = 'readOnlyUser_password'

client = MongoClient('mongodb://%s:%s@127.0.0.1' % (username, password)
                     , appname='Abnormal Vibration Watchdog')

For instance, we can look at 2 measurement time-step documents from test 2&nbsp;:

In [9]:
cursor = client.nasa_ims_database.measurements.find(
    {'test_id': 2}, limit=2)
for document in cursor:
    pprint(document)

{'_id': ObjectId('5f606c7f9c3887062e5f88bb'),
 'sensor_1': '-0.049',
 'sensor_2': '-0.071',
 'sensor_3': '-0.132',
 'sensor_4': '-0.010',
 'test_id': 2,
 'timestamp': datetime.datetime(2004, 2, 12, 10, 32, 39)}
{'_id': ObjectId('5f606c7f9c3887062e5f88bc'),
 'sensor_1': '-0.042',
 'sensor_2': '-0.073',
 'sensor_3': '-0.007',
 'sensor_4': '-0.105',
 'test_id': 2,
 'timestamp': datetime.datetime(2004, 2, 12, 10, 32, 39)}


<div style='text-align: justify;'>
<p style="text-align: justify; color: darkgray;">
<u>REMARK</u>&nbsp;: Once the data is downloaded, the README file that comes with it can be opened from this link&nbsp;: <a href="./data/Readme Document for IMS Bearing Data.pdf" target="_blank"><img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a>. This README provides a few more details on the experiment that produced the dataset than those introduced in <a href="#dataset" style="color: #819bc7;"><b>&sect;1.</b></a>.
</p>
</div>
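The loading code itself lives in the companion notebook and is not reproduced here. As an illustration only, the sketch below shows one way a single raw file could be turned into documents shaped like the ones printed above, and how a directory of files could be parsed in parallel with `multiprocessing.Pool`. It assumes, and this should be checked against the README, that each file is whitespace-separated ASCII text with one row per time-step and one column per sensor, and that its name encodes the recording time as `YYYY.MM.DD.hh.mm.ss`. `parse_ims_file` and `load_directory` are hypothetical names.

```python
import os
from datetime import datetime
from functools import partial
from multiprocessing import Pool


def parse_ims_file(path, test_id):
    """Turn one raw IMS snapshot file into a list of Mongo DB documents.

    Assumed file layout (hypothetical, check the README): whitespace-
    separated ASCII, one row per time-step, one column per sensor, and a
    file name encoding the recording time as 'YYYY.MM.DD.hh.mm.ss'.
    """
    timestamp = datetime.strptime(os.path.basename(path), "%Y.%m.%d.%H.%M.%S")
    documents = []
    with open(path) as f:
        for line in f:
            values = line.split()
            if not values:
                continue  # ignore blank lines
            document = {"test_id": test_id, "timestamp": timestamp}
            # readings are kept as strings, like in the stored documents
            for i, value in enumerate(values, start=1):
                document["sensor_%d" % i] = value
            documents.append(document)
    return documents


def load_directory(directory, test_id, collection, processes=4):
    """Parse all files of `directory` in worker processes and insert them.

    `collection` is assumed to be a pymongo Collection: parsing runs in the
    pool, while insert_many runs in the parent process.
    """
    paths = sorted(os.path.join(directory, name) for name in os.listdir(directory))
    parse = partial(parse_ims_file, test_id=test_id)
    with Pool(processes) as pool:
        for documents in pool.imap_unordered(parse, paths):
            if documents:
                collection.insert_many(documents)
```

On Windows, where worker processes are spawned rather than forked, functions handed to `Pool` must live in an importable `.py` module rather than in a notebook cell, and the call should sit under an `if __name__ == "__main__":` guard when run as a script.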

<a name="pyspark"></a>
<h2>3. PySpark Connection <small><em><a href="#page-top">(go back to top &uarr;)</a></em></small></h2>

<a href="https://spark.apache.org/docs/latest/api/python/index.html" target="_blank"><img style="height: 50px;" alt="PySpark" src="./images/pyspark.png?uncache=3443644" align="right" /></a>
We will interface with our <em>NASA IMS Measurements</em> Mongo DB collection through the <a href="https://docs.mongodb.com/spark-connector/master/python-api/" target="_blank"><b>MONGODB SPARK CONNECTOR</b> <img href='.' src='./images/target_blank.png' style='vertical-align: baseline; display: inline;' /></a>. This allows for distributed access to, and pre-processing of, our data.
<br />
<br />
Let's first establish a "Mongo DB connector aware" Spark Session for the Jupyter kernel running this notebook&nbsp;:

In [5]:
from pyspark import SparkContext
from pyspark.sql import SparkSession

# add the 'mongo-spark-connector' package to the spark-submit arguments
# PRIOR to the JVM start on this notebook kernel
spark_jars_directory = os.path.realpath("C:/Users/Organization/.ivy2/jars/")
# NB: sys.path is the Python import path only; the connector jar reaches
#     the JVM through the '--packages' argument below
sys.path.insert(0, spark_jars_directory)
SUBMIT_ARGS = '--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.2 pyspark-shell'
# - where "2.11" is our Scala version
#   (on which our Spark 2.4.4 is running)
# - where 2.4.2 is the mongodb-spark connector version
#   (compatible with our Spark 2.4.4 version)
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

## Start Spark
master = "local[*]"
app_name = "Abnormal_Vibration_Watchdog"
sc = SparkContext(master=master, appName=app_name)
# the builder reuses the SparkContext created above
spark = SparkSession.builder.getOrCreate()

display(HTML("<em>To monitor jobs &amp; stages, go to the " +
             "<a href='" + spark.sparkContext.uiWebUrl + "/jobs/'" +
             " target='_blank'>Spark UI</a> (on your Spark master host)</em>"))

We can now reference our Mongo DB collection through a Spark DataFrame&nbsp;:

In [6]:
df = (
    spark.read
        .format("com.mongodb.spark.sql.DefaultSource")
        .option("uri", "mongodb://" + username + ":" + password + "@127.0.0.1:27017")
        .option("database", "nasa_ims_database")
        .option("collection", "measurements")

        .option("pipeline", "[{ $limit: 10 }]") # temporary: only read 10 documents while developing

        .load()
)
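The `pipeline` option takes a Mongo DB aggregation pipeline as a JSON string, which the connector applies on the Mongo DB side before the documents reach Spark. Rather than hand-writing that string, it can be serialized with `json.dumps`. The sketch below uses a hypothetical helper, `read_pipeline`, that restricts the read to one test and optionally caps the number of documents, as the temporary `$limit` stage above does.

```python
import json


def read_pipeline(test_id=None, limit=None):
    """Build the JSON string passed to the connector's "pipeline" option.

    Hypothetical helper: a $match stage keeps a single test's documents,
    and a $limit stage caps the number of documents read.
    """
    stages = []
    if test_id is not None:
        stages.append({"$match": {"test_id": test_id}})
    if limit is not None:
        stages.append({"$limit": limit})
    return json.dumps(stages)
```

For example, `.option("pipeline", read_pipeline(test_id=2, limit=10))` would pass `[{"$match": {"test_id": 2}}, {"$limit": 10}]`. Putting `$match` first keeps the filtering early in the pipeline, where Mongo DB can use an index on `test_id` if one exists.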

In [7]:
df.printSchema()

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- sensor_1: string (nullable = true)
 |-- sensor_2: string (nullable = true)
 |-- sensor_3: string (nullable = true)
 |-- sensor_4: string (nullable = true)
 |-- sensor_5: string (nullable = true)
 |-- sensor_6: string (nullable = true)
 |-- sensor_7: string (nullable = true)
 |-- sensor_8: string (nullable = true)
 |-- test_id: integer (nullable = true)
 |-- timestamp: timestamp (nullable = true)
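Two points stand out in this schema. The sensor readings come back as strings, and, per the remark in §1, `sensor_5` to `sensor_8` only carry data for 'test 1', so they would be `null` for tests 2 and 3. Before the readings can feed a model, they need to become numbers. Below is a minimal sketch using a hypothetical helper, `numeric_projection`, which builds Spark SQL expressions for `DataFrame.selectExpr`. Every `sensor_*` column is cast to `DOUBLE`, and the other columns are kept unchanged.

```python
def numeric_projection(columns, prefix="sensor_"):
    """Return selectExpr() expressions casting the sensor columns to DOUBLE.

    Hypothetical helper: columns whose name starts with `prefix` are cast,
    every other column (e.g. test_id, timestamp) is passed through as-is.
    Back-ticks quote the column names for Spark SQL.
    """
    expressions = []
    for name in columns:
        if name.startswith(prefix):
            expressions.append("CAST(`%s` AS DOUBLE) AS `%s`" % (name, name))
        else:
            expressions.append("`%s`" % name)
    return expressions
```

Usage would then be `df_numeric = df.selectExpr(*numeric_projection(df.columns))`.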



We can also look at any two measurement time-step documents from our dataset&nbsp;:

In [8]:
from pprint import pprint

records = df.rdd.map(lambda row: row.asDict()).take(2)
for record in records:
    pprint(record)

{'_id': Row(oid='5f60cea78a22a457166fda0c'),
 'sensor_1': '-0.022',
 'sensor_2': '-0.039',
 'sensor_3': '-0.183',
 'sensor_4': '-0.054',
 'sensor_5': '-0.105',
 'sensor_6': '-0.134',
 'sensor_7': '-0.129',
 'sensor_8': '-0.142',
 'test_id': 1,
 'timestamp': datetime.datetime(2003, 10, 22, 14, 6, 24)}
{'_id': Row(oid='5f60cea78a22a457166fda0d'),
 'sensor_1': '-0.105',
 'sensor_2': '-0.017',
 'sensor_3': '-0.164',
 'sensor_4': '-0.183',
 'sensor_5': '-0.049',
 'sensor_6': '0.029',
 'sensor_7': '-0.115',
 'sensor_8': '-0.122',
 'test_id': 1,
 'timestamp': datetime.datetime(2003, 10, 22, 14, 6, 24)}


In [22]:
spark.stop()

We're now well-equipped to tackle our model training challenge: we are indeed in a position to develop a custom Keras fit-generator that will handle our training set like a champ&nbsp;!
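The following is a rough sketch of what such a generator could look like. It is not the notebook's eventual implementation. It assumes that the readings of one test have already been converted to a float array of shape `(n_samples, n_sensors)`. The generator cuts that array into fixed-length windows and yields each batch as both input and target, since an autoencoder learns to reconstruct its input. `window_batches` and its parameters are hypothetical names. Keras' `fit_generator` (or `fit` in recent TensorFlow versions) accepts such an endless generator together with `steps_per_epoch`.

```python
import numpy as np


def window_batches(readings, window, batch_size, stride=None, shuffle=True, seed=0):
    """Yield (x, x) batches of windows cut from `readings`, endlessly.

    readings: float array of shape (n_samples, n_sensors).
    Each batch has shape (batch_size, window, n_sensors); input and target
    are the same array, since an autoencoder reconstructs its input.
    An incomplete last batch of each pass is dropped.
    """
    readings = np.asarray(readings, dtype=np.float32)
    stride = stride or window  # non-overlapping windows by default
    starts = np.arange(0, len(readings) - window + 1, stride)
    if len(starts) < batch_size:
        raise ValueError("not enough windows for a single batch")
    rng = np.random.default_rng(seed)
    while True:  # Keras pulls steps_per_epoch batches per epoch
        order = rng.permutation(starts) if shuffle else starts
        for i in range(0, len(order) - batch_size + 1, batch_size):
            batch = np.stack([readings[s:s + window] for s in order[i:i + batch_size]])
            yield batch, batch
```

Feeding batches this way keeps only one batch of windows in memory at a time, which matters here given the size of the full dataset.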

<br />
<br />
<br />
<b><p style="font-size: 24pt">TO BE CONTINUED...</p></b>
<br />
<br />
<br />

<hr style="height:2px;border-width:0;color:gray;background-color:gray;width:80%" />

# EXTRA

<b>Export Notebook to HTML (with Markdown extension cells evaluated and hidden cell outputs omitted)&nbsp;:</b>

In [9]:
from my_TS_Anomaly_lib.jupyter_markdown_extension import md_extension_to_html

md_extension_to_html(os.path.join(os.path.realpath('.'), 'main.ipynb'))

'main.ipynb' ; 'D:\jupyter_notebooks\TimeSeries_Anomaly_Detection\main.html'
done.
