### LDA Core Minimal Implementation with Tf-idf and K-means

In addition to the current implementation of LDA Core with Word2Vec and SOM, this notebook provides an alternative approach to anomaly detection: it encodes a fixed log data file with Tf-idf and identifies anomalous logs with K-means, an unsupervised machine learning algorithm.

### Import packages

In [46]:
import os
import time
import nltk
import numpy as np
import logging
import pickle
import sompy
from multiprocessing import Pool
from itertools import product
import pandas as pd
import re
import gensim as gs
import matplotlib.pyplot as plt
from scipy.spatial.distance import cosine
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.cluster import KMeansClusterer

### Define our Functions

#### 1. Log Preprocessing

One assumption shared by all these functions is that we immediately convert our data into a pandas dataframe with a "messages" column containing the relevant information.

We then treat each individual log line as a "word", cleaning it by removing all non-alphabetic characters, including whitespace.

In [47]:
def _preprocess(data):
    for col in data.columns:
        if col == "messages":
            data[col] = data[col].apply(_clean_message)
        else:
            data[col] = data[col].apply(to_str)

    # Fill in place: callers ignore the return value, so rebinding a local name has no effect
    data.fillna("EMPTY", inplace=True)
    
def _clean_message(line):
    """Remove all non-alphabetical characters from message strings."""
    return "".join(
        re.findall("[a-zA-Z]+", line)
    )  # Keeping only letters (a-z, A-Z), as numbers add to anomalousness quite a bit

def to_str(x):
    """Convert all non-str lists to string lists for Word2Vec."""
    ret = " ".join([str(y) for y in x]) if isinstance(x, list) else str(x)
    return ret
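
As a quick illustration of the cleaning step, the sketch below applies the same regular expression to a single log line. The sample line is made up for illustration and is not taken from the dataset.

```python
import re


def clean_message(line):
    """Keep only ASCII letters, dropping digits, punctuation and whitespace."""
    return "".join(re.findall("[a-zA-Z]+", line))


# Hypothetical log line, used only to show the effect of the cleaning step
raw = "2015-10-17 15:37:57,634 INFO [main] Created MRAppMaster for application"
cleaned = clean_message(raw)
print(cleaned)  # INFOmainCreatedMRAppMasterforapplication
```

Because whitespace is removed as well, each cleaned log line becomes a single token, which is what lets the encoders below treat a whole line as one "word".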

The function below reshapes the original data into the dataframe layout needed for further analysis, based on the unit the user chooses ('log' or 'application').

In [48]:
def merge_logs_by_unit(data, unit = 'log', tfidf = False):
    messages = data[['messages']]
    n_row = data.shape[0]
    df = pd.DataFrame(columns=['messages'])
    result = []

    for i in range(n_row):
        if unit == 'application': 
            message = pd.DataFrame({'messages':[np.array_str(messages.iloc[i].values)]})  
        elif unit == 'log':   
            message = pd.DataFrame({'messages':messages.iloc[i].values})
            message = message.messages.apply(pd.Series)\
                        .merge(message, left_index=True, right_index=True)\
                        .drop(['messages'], axis=1)\
                        .melt(value_name='messages')\
                        .drop('variable', axis=1)\
                        .dropna()
            message['index']=[data.index[i]]*message.shape[0]
            message = message.set_index('index')
        
        _preprocess(message)
        if tfidf:
            temp = message.groupby(["index"])["messages"].agg(lambda x: ' '.join(x.astype(str))).to_frame()
            docs = temp.messages.tolist()
            result.append(docs[0])
        else:
            df = pd.concat([df, message])  # DataFrame.append was removed in pandas 2.0

    return result if tfidf else df
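
The two output shapes of this function can be sketched without pandas. The snippet below assumes that each application's "messages" entry is a list of already-cleaned log lines; the application ids and log lines are invented for illustration.

```python
def flatten_logs(app_logs):
    """Return one (app_id, log_line) pair per log line, preserving order."""
    return [(app_id, line) for app_id, lines in app_logs.items() for line in lines]


def join_logs(app_logs):
    """Return one space-joined document per application (the tfidf=True shape)."""
    return [" ".join(lines) for lines in app_logs.values()]


# Hypothetical input: application id -> list of cleaned log lines
app_logs = {
    "app_A": ["Startup", "Heartbeat", "Heartbeat"],
    "app_B": ["Startup", "DiskFull"],
}

rows = flatten_logs(app_logs)  # one row per log line, indexed by application
docs = join_logs(app_logs)     # one document per application
print(rows)
print(docs)
```

The first shape feeds Word2Vec at the 'log' level, and the second is what the Tf-idf encoder consumes.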

#### 2. Text Encoding  

Here we employ the gensim implementation of Word2Vec to encode our logs as fixed-length numerical vectors. Logs are notably not the natural use case for Word2Vec, but this approach attempts to leverage the fact that log lines themselves, like words, have a context, so encoding a log based on its co-occurrence with other logs does make some intuitive sense.

In [49]:
def create(words, vector_length, window_size):
    """Create new word2vec model."""
    w2vmodel = {}
    for col in words.columns:
        if col in words:
            w2vmodel[col] = gs.models.Word2Vec([list(words[col])], min_count=1, size=vector_length, 
                                     window=window_size, seed=42, workers=1, iter=550,sg=0)
        else:
            #_LOGGER.warning("Skipping key %s as it does not exist in 'words'" % col)
            pass
        
    transformed_data = one_vector(words, w2vmodel)
    transformed_data = transformed_data[:,1:]    
    return w2vmodel, transformed_data

def one_vector(new_D, w2vmodel):
    """Create a single vector from model."""
    transforms = {}
    for col in w2vmodel.keys():
        if col in new_D:
            transforms[col] = w2vmodel[col].wv[new_D[col]]

    new_data = []

    for i in range(len(transforms["messages"])):
        logc = np.array(0)
        for _, c in transforms.items():
            if c.item(i):
                logc = np.append(logc, c[i])
            else:
                logc = np.append(logc, [0, 0, 0, 0, 0])
        new_data.append(logc)

    return np.array(new_data, ndmin=2)



The function below retrains the Word2Vec model when a new log message is added, so that we can then make an inference about the new message.

In [648]:
def get_encode(w2vmodel, log, data):
    train_with_all = False  # Here we train the new log only
    if not train_with_all:
        w2vmodel['messages'].build_vocab([[log]], update=True)
    
    log = pd.DataFrame({"messages":log},index=[1])
    _preprocess(log)
    
    if log.messages.iloc[0] in list(w2vmodel['messages'].wv.vocab.keys()):
        vector = w2vmodel["messages"].wv[log.messages.iloc[0]]
    else:
        data_new = data.append(log, ignore_index=True)
        w2vmodel_new = create(data_new, 100,3)[0]                                
        vector = w2vmodel_new["messages"].wv[log.messages.iloc[0]]
    
    return vector

Function to employ the sklearn implementation of Tf-idf to encode our logs as numerical vectors.

In [1135]:
def tfidf_encoder(docs):
    
    tfidf_vectorizer=TfidfVectorizer(use_idf=True, lowercase=False)
    tfidf_fit=tfidf_vectorizer.fit(docs)
#     tfidf
    tfidf_transform=tfidf_fit.transform(docs)
    return tfidf_vectorizer, tfidf_transform
#     # idf
#     idf_transform = scipy.sparse.csr_matrix([tfidf_vectorizer.idf_.tolist()] * 56)
#     return tfidf_vectorizer, idf_transform
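
To make the encoding concrete, here is a minimal pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings (smoothed idf, raw term counts, L2-normalised rows). It is a simplification: tokens are split on whitespace rather than with sklearn's token pattern, and the three documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocab, idf, rows) using smoothed idf and L2-normalised rows."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, idf, rows


# Hypothetical documents: one per application, each token a cleaned log line
docs = ["Startup Heartbeat Heartbeat", "Startup Heartbeat", "Startup DiskFull"]
vocab, idf, rows = tfidf_rows(docs)
for tok in vocab:
    print(tok, round(idf[tok], 3))
```

A log line that appears in every application gets the minimum idf of 1, while a line seen in only one application gets the largest idf, which is why rare lines stand out after encoding.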

#### 3. Model Training

Here we employ the sklearn implementation of K-means to train our model. This function simply makes it a bit easier for the user to interact with the model. It returns the trained model, the cluster label assigned to each data point, the distances from each data point to every centroid, and the coordinates of the centroids.

In [652]:
def train_kmeans_model(n_clusters, transformed_data):
    # n_jobs and algorithm="auto" were the defaults and have since been removed from sklearn, so they are omitted
    kmeans = KMeans(n_clusters=n_clusters, init='k-means++', n_init=20, max_iter=300)
    fit = kmeans.fit(transformed_data)
    index = fit.predict(transformed_data)
    dist = fit.transform(transformed_data)
    clusters = fit.cluster_centers_
    return fit, index, dist, clusters
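
Later in this notebook the model is trained with `n_clusters = 1`. In that special case the K-means centroid is simply the mean of the data, and the distance table holds each point's Euclidean distance to that mean. The sketch below reproduces this by hand on made-up 2-D points; it is an illustration of the idea, not a replacement for the sklearn call.

```python
import math


def single_cluster_distances(points):
    """Return the mean point and each point's Euclidean distance to it."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    distances = [math.dist(p, centroid) for p in points]
    return centroid, distances


# Hypothetical encoded logs: four similar points and one far away
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [10.0, 10.0]]
centroid, distances = single_cluster_distances(points)
farthest = max(range(len(points)), key=distances.__getitem__)
print(centroid, farthest)
```

The point farthest from the centroid is the natural anomaly candidate, which is the quantity the scoring functions below threshold.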

#### 4. Generating Anomaly Scores

One of the key elements of this approach is quantifying the distance between our logs and the centroids of the K-means model. The function below computes the per-cluster distance statistics (min, max, mean, and standard deviation) needed for model inference / prediction.

In [653]:
def generate_stats_table(index, dist, clusters):
    n_clusters = clusters.shape[0]
    pd1 = pd.DataFrame(dist, columns = range(n_clusters))
    pd2 = pd.DataFrame(index, columns = ['label'])
    dist_df = pd.concat([pd1, pd2], axis=1)    

    cluster = [None] * n_clusters
    stats = [None] * n_clusters
    for i in range(n_clusters):
        cluster[i] = dist_df[dist_df['label']==i][i].tolist()
        stats[i] = [np.min(cluster[i]),np.max(cluster[i]), 
                    np.mean(cluster[i]), np.std(cluster[i])]

    stats = np.asarray(stats)
    pd_stats = pd.DataFrame(stats, columns = ['min','max','mean','sd'])
    return pd_stats

#### 5. Model Inference / Prediction

Here we make an inference about a single new log message or a list of them. This is done by scoring the incoming log and evaluating whether or not it passes a certain threshold value.

In [654]:
def get_anomaly_app_given_sd(multiplier, pd_stats, dist):
    threshold = (multiplier)*pd_stats['sd'].values[0] + pd_stats['mean'].values[0]   
    count = 0
    outliers  = []
    for i, j in enumerate(dist):
        if j > threshold:
            outliers.append(i)
            count += 1
    return outliers, count
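
The thresholding rule above (mean plus a multiple of the standard deviation) can be sketched on a plain list of distances. The distances here are made up; `statistics.pstdev` is used because it is the population standard deviation, which matches the default behaviour of `np.std` used in `generate_stats_table`.

```python
import statistics


def flag_outliers(distances, multiplier):
    """Return indices whose distance exceeds mean + multiplier * sd, plus the threshold."""
    mean = statistics.fmean(distances)
    sd = statistics.pstdev(distances)  # population SD, like np.std with default ddof=0
    threshold = mean + multiplier * sd
    outliers = [i for i, d in enumerate(distances) if d > threshold]
    return outliers, threshold


# Hypothetical distances to the centroid: four typical values and one large one
distances = [1.0, 1.2, 0.9, 1.1, 5.0]
loose, _ = flag_outliers(distances, 1)
strict, _ = flag_outliers(distances, 3)
print(loose, strict)
```

With so few points a single large distance inflates the standard deviation itself, so a multiplier of 3 flags nothing here while a multiplier of 1 flags the last point. This is one reason the demo below uses a small multiplier at the 'application' level, where there are only a few dozen rows.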

In [1262]:
def is_anomaly_w2v(sd_multiplier, log_vector, fit, pd_stats):
    
    clusters = fit.cluster_centers_   
    log_vector = log_vector.reshape(1,len(log_vector))
    index_new = fit.predict(log_vector)
    log_vector = log_vector.reshape(100,)
    dist_new = np.linalg.norm(clusters[index_new]-log_vector)
#     from scipy.stats import normaltest
#     normaltest(cluster[index_new])
#     plot(cluster[index_new])

    threshold = sd_multiplier * pd_stats['sd'].values[index_new[0]] + pd_stats['mean'].values[index_new[0]]   
    
    return dist_new > threshold
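
For more than one cluster, the same rule is applied against the statistics of the nearest centroid only. A minimal sketch, assuming hypothetical centroids and per-cluster (mean, sd) pairs rather than values from a fitted model:

```python
import math


def is_anomaly(vector, centroids, cluster_stats, sd_multiplier):
    """Compare the distance to the nearest centroid with that cluster's threshold."""
    distances = [math.dist(vector, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    mean, sd = cluster_stats[nearest]
    return distances[nearest] > mean + sd_multiplier * sd


# Hypothetical model summary: two centroids with (mean, sd) of member distances
centroids = [[0.0, 0.0], [10.0, 10.0]]
cluster_stats = [(1.0, 0.5), (2.0, 1.0)]

print(is_anomaly([0.5, 0.5], centroids, cluster_stats, 3))  # close to cluster 0
print(is_anomaly([4.0, 4.0], centroids, cluster_stats, 3))  # far from both clusters
```

Using per-cluster statistics means a tight cluster gets a tight threshold, so a point can be anomalous relative to its own cluster even if it would look ordinary next to a more spread-out one.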

In [1263]:
# tfidf
def is_anomaly_tfidf(transformed_data, tfidf_vectorizer, outlier_index, new_log):
    
    unique_log_list = tfidf_vectorizer.get_feature_names()
    if new_log not in unique_log_list: return True

    tfidf_score_table = pd.DataFrame(transformed_data.todense())\
                        .loc[outlier_index].max().reset_index()\
                        .rename({0:'tf-idf score'}, axis=1)

    tfidf_score_table.insert(2, 'log', unique_log_list)

    tfidf_score_table = tfidf_score_table\
                        .sort_values(by=['tf-idf score'], ascending=False)\
                        .drop('index', axis=1).reset_index()

    anomaly_log_list = tfidf_score_table.loc[1:10].log.values.tolist()
    
    return new_log in anomaly_log_list

In [1264]:
# idf
def is_anomaly_tfidf(transformed_data, tfidf_vectorizer, outlier_index, new_log):
    
    unique_log_list = tfidf_vectorizer.get_feature_names()
    if new_log not in unique_log_list: return True
    tfidf_score_table = pd.DataFrame(tfidf_vectorizer.idf_)\
                        .rename({0:'tf-idf score'}, axis=1)
    tfidf_score_table.insert(1, 'log', unique_log_list)
    tfidf_score_table = tfidf_score_table\
                         .sort_values(by=['tf-idf score'], ascending=False)\
                         .reset_index()
    anomaly_log_list = tfidf_score_table.loc[1:10].log.values.tolist()
    
    return new_log in anomaly_log_list

In [1265]:
def show_labels_given_anomaly_log(data, df, anomaly_log):
    anomaly_count = pd.DataFrame()
    for i in range(np.asarray(anomaly_log).shape[0]):
        anomaly_list = np.where(df.messages == anomaly_log[i])[0]
        temp = df.index[anomaly_list].value_counts().to_frame(name='freq')
        anomaly_count = pd.concat([anomaly_count,temp]).groupby(level=0).sum().sort_values('freq', ascending=False)

    label = []
    [label.append(data.loc[anomaly_count.index[i]].label) for i in range(anomaly_count.index.shape[0])]
    anomaly_count['label'] = label
    return anomaly_count

### All in One
The function below integrates all the steps above, so the user can train the model and get an inference with a single call. One thing to keep in mind is that the 'tfidf' encoding method should only be used at the 'log' level.

In [1310]:
def lad_with_kmeans(data, encoding_level = 'log', encoding_method = 'w2v', sd = 3, new_log = None):
    assert not(encoding_method == 'tfidf' and encoding_level != 'log'), 'Error, tf-idf can only be used on log level!'
    single = new_log is not None
    tfidf = (encoding_method == 'tfidf')
    use_log = (encoding_level == 'log')
    
    if tfidf:
        app_id = data.index.values
        docs = merge_logs_by_unit(data, 'log', tfidf = True)
        tfidf_vectorizer, transformed_data = tfidf_encoder(docs)
    else:
        if single:
            df = data[['messages']]
            df = merge_logs_by_unit(df)
            logs = df.messages.values

            w2vmodel, transformed_data = create(df,100,3)        
        else:
            df = merge_logs_by_unit(data, encoding_level)

            if use_log:
                app_id = df.index.values
                logs = df.messages.values
            else: 
                app_id = data.index.values
                
            transformed_data = create(df, 100,3)[1]  

    n_clusters = 1
    fit, index, dist, clusters = train_kmeans_model(n_clusters, transformed_data)
    pd_stats = generate_stats_table(index, dist, clusters)   
    
    outlier_index, count = get_anomaly_app_given_sd(sd, pd_stats, dist)
    
    if not single:
        if use_log and not tfidf:
            outliers_log = logs[outlier_index]
            outliers_log = pd.DataFrame(outliers_log, columns = ['log'])
            outliers_log = outliers_log['log'].value_counts().to_frame(name='Count')
            anomaly_log_list = outliers_log.index.tolist()

            labels_with_anomaly_freq = show_labels_given_anomaly_log(data, df, anomaly_log_list)
            anomaly_count = int(0.35*labels_with_anomaly_freq.index.shape[0])
            anomaly_table = labels_with_anomaly_freq[['label']].iloc[0:anomaly_count]
        else:
            outliers  = app_id[outlier_index]
            anomaly_count = count
            anomaly_table = data.loc[outliers.tolist(), ['label']]

        return anomaly_count, anomaly_table
    else:
        if tfidf:
            unique_log_list = np.asarray(tfidf_vectorizer.get_feature_names())[outlier_index]

            tfidf_score_table = pd.DataFrame(tfidf_vectorizer.idf_[outlier_index])\
                    .rename({0:'tf-idf score'}, axis=1)
            tfidf_score_table.insert(1, 'log', unique_log_list)
            tfidf_score_table = tfidf_score_table.sort_values(by=['tf-idf score'], ascending=False)\
                    .reset_index()
            anomaly_log_list = tfidf_score_table.loc[1:10].log.values.tolist()        
        else:
            outliers_log = logs[outlier_index]
            outliers_log = pd.DataFrame(outliers_log, columns = ['log'])
            outliers_log = outliers_log['log'].value_counts().to_frame(name='Count')
            anomaly_log_list = outliers_log.index.tolist()
        
        return new_log in anomaly_log_list
    

### Demo

#### 1. Read in dataset

In [1214]:
data_path = '~/Downloads/evaluation_set.dms'
#data_path = 'C:/Users/martu/Downloads/evaluation_set'
file = open(os.path.expanduser(data_path), 'rb')
data = pickle.load(file)
file.close()

In [145]:
write_path = '~/Desktop/Redhat_Proj/'
with open('unique_logs.pickle', 'wb') as filehandle:
    pickle.dump(unique_logs, filehandle)

#### 2. EDA

In [1215]:
x = data.label.value_counts()
for i in x.keys():
    print(i, x[i])

Machine_down 28
Normal 11
Disk_full 9
Network_disconnection 7


#### 3. Predicting with different data type, encoding scheme, and training methods
Example: lad_with_kmeans(data, encoding_level = 'application', encoding_method = 'w2v', sd = 0.5, new_log = 'sdfasf')

Predicting on the data file with Word2Vec at the 'application' level, with a threshold of 0.5 standard deviations above the mean distance

In [1314]:
count1, table1 = lad_with_kmeans(data, encoding_level = 'application', encoding_method = 'w2v', sd = 0.5)

collecting all words and their counts
PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
collected 55 word types from a corpus of 55 raw words and 1 sentences
Loading a fresh vocabulary
effective_min_count=1 retains 55 unique words (100% of original 55, drops 0)
effective_min_count=1 leaves 55 word corpus (100% of original 55, drops 0)
deleting the raw counts dictionary of 55 items
sample=0.001 downsamples 55 most-common words
downsampling leaves estimated 15 word corpus (29.0% of prior 55)
estimated required memory for 55 words and 100 dimensions: 71500 bytes
resetting layer weights
training model with 1 workers on 55 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=3
job loop exiting, total 1 jobs
worker exiting, processed 1 jobs
worker thread finished; awaiting finish of 0 more threads
EPOCH - 1 : training on 55 raw words (17 effective words) took 0.0s, 7839 effective words/s

In [1315]:
print('Total number of identified anomaly logs: '+'\033[1m'+str(count1)+'\033[0m')
display(table1)

Total number of identified anomaly logs: 12


| | label |
|---|---|
| application_1445087491445_0002 | Machine_down |
| application_1445144423722_0022 | Network_disconnection |
| application_1445182159119_0014 | Disk_full |
| application_1445062781478_0012 | Machine_down |
| application_1445087491445_0010 | Machine_down |
| application_1445076437777_0001 | Machine_down |
| application_1445094324383_0004 | Machine_down |
| application_1445182159119_0012 | Normal |
| application_1445062781478_0011 | Normal |
| application_1445087491445_0001 | Machine_down |


Predicting on the data file with Word2Vec at the 'log' level, with a threshold of 3 standard deviations above the mean distance

In [1316]:
count2, table2 = lad_with_kmeans(data, encoding_level = 'log', encoding_method = 'w2v', sd = 3)

collecting all words and their counts
PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
collected 1203 word types from a corpus of 394310 raw words and 1 sentences
Loading a fresh vocabulary
effective_min_count=1 retains 1203 unique words (100% of original 1203, drops 0)
effective_min_count=1 leaves 394310 word corpus (100% of original 394310, drops 0)
deleting the raw counts dictionary of 1203 items
sample=0.001 downsamples 46 most-common words
downsampling leaves estimated 169640 word corpus (43.0% of prior 394310)
estimated required memory for 1203 words and 100 dimensions: 1563900 bytes
resetting layer weights
training model with 1 workers on 1203 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=3
job loop exiting, total 2 jobs
worker exiting, processed 2 jobs
worker thread finished; awaiting finish of 0 more threads
EPOCH - 1 : training on 394310 raw words (10000 effective words) took 0.0s, 1021613 effective words/s
[... gensim per-epoch training messages repeat in the same form; output truncated in the source, last epoch shown is EPOCH - 549 ...]

In [1317]:
print('Total number of identified anomaly logs: '+'\033[1m'+str(count2)+'\033[0m')
display(table2)

Total number of identified anomaly logs: [1m19[0m


| Unnamed: 0 | label |
|---|---|
| application_1445144423722_0020 | Network_disconnection |
| application_1445087491445_0004 | Machine_down |
| application_1445087491445_0005 | Normal |
| application_1445062781478_0013 | Machine_down |
| application_1445087491445_0001 | Machine_down |
| application_1445087491445_0003 | Machine_down |
| application_1445087491445_0002 | Machine_down |
| application_1445094324383_0005 | Machine_down |
| application_1445062781478_0018 | Machine_down |
| application_1445182159119_0002 | Disk_full |
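
The Word2Vec run above prints a message for every training epoch, which buries the actual result. A minimal sketch for quieting that output is shown here, assuming gensim's messages go through Python's standard `logging` module under logger names that start with `gensim`. The helper name `quiet_gensim` is hypothetical and not part of this notebook.

```python
import logging


def quiet_gensim(level=logging.WARNING):
    """Set the 'gensim' logger to `level` and return its previous level."""
    logger = logging.getLogger("gensim")
    previous = logger.level
    logger.setLevel(level)  # child loggers such as gensim.models.word2vec inherit this
    return previous


# Silence INFO/DEBUG messages before training; keep the old level to restore later.
previous_level = quiet_gensim()
# ... call lad_with_kmeans(...) with encoding_method='w2v' here ...
```

Calling `logging.getLogger("gensim").setLevel(previous_level)` afterwards would restore the earlier verbosity.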


Predicting anomalies in the data file with Tf-idf encoding at the 'log' level and a standard deviation setting of 0.1 (sd = 0.1)

In [1318]:
count3, table3 = lad_with_kmeans(data, encoding_level='log', encoding_method='tfidf', sd=0.1)

In [1319]:
print('Total number of identified anomaly logs: '+'\033[1m'+str(count3)+'\033[0m')
display(table3)

Total number of identified anomaly logs: [1m15[0m


| Unnamed: 0 | label |
|---|---|
| application_1445087491445_0002 | Machine_down |
| application_1445062781478_0012 | Machine_down |
| application_1445062781478_0013 | Machine_down |
| application_1445144423722_0023 | Network_disconnection |
| application_1445144423722_0024 | Normal |
| application_1445182159119_0016 | Machine_down |
| application_1445062781478_0017 | Machine_down |
| application_1445175094696_0003 | Network_disconnection |
| application_1445087491445_0008 | Machine_down |
| application_1445175094696_0004 | Network_disconnection |
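
The runs above use sd = 3 for Word2Vec and sd = 0.1 for Tf-idf. This part of the notebook does not show how `lad_with_kmeans` applies `sd`, so the sketch below rests on an assumption: a point is flagged when its distance to its cluster centroid exceeds the mean distance plus `sd` standard deviations. The function name `threshold_mask` and the distance values are made up for illustration.

```python
import numpy as np


def threshold_mask(distances, sd):
    """Flag distances above mean + sd * std (assumed rule, not confirmed by the source)."""
    distances = np.asarray(distances, dtype=float)
    cutoff = distances.mean() + sd * distances.std()
    return distances > cutoff


# Hypothetical centroid distances: five typical points, one borderline, one extreme.
distances = np.array([0.10, 0.12, 0.11, 0.13, 0.09, 0.60, 2.00])

loose = threshold_mask(distances, sd=0.1)   # low cutoff: borderline point is flagged too
strict = threshold_mask(distances, sd=2)    # high cutoff: only the extreme point remains

print(int(loose.sum()), int(strict.sum()))
```

Under this rule a larger `sd` can only shrink the flagged set, so the value has to be tuned separately for each encoding because the two encodings produce distances on different scales.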


Predicting whether a single log line is anomalous with Word2Vec encoding at the 'log' level and a standard deviation setting of 3 (sd = 3)

In [1323]:
new_logline = 'CausedbyjavanetNoRouteToHostExceptionNoRoutetoHostfromMININTFNANLItomsrasafailedonsockettimeoutexceptionjavanetNoRouteToHostExceptionNoroutetohostnofurtherinformationFormoredetailsseehttpwikiapacheorghadoopNoRouteToHost'

In [1321]:
is_anomaly1 = lad_with_kmeans(data, encoding_level='log', encoding_method='w2v', sd=3, new_log=new_logline)

collecting all words and their counts
PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
collected 1203 word types from a corpus of 394310 raw words and 1 sentences
Loading a fresh vocabulary
effective_min_count=1 retains 1203 unique words (100% of original 1203, drops 0)
effective_min_count=1 leaves 394310 word corpus (100% of original 394310, drops 0)
deleting the raw counts dictionary of 1203 items
sample=0.001 downsamples 46 most-common words
downsampling leaves estimated 169640 word corpus (43.0% of prior 394310)
estimated required memory for 1203 words and 100 dimensions: 1563900 bytes
resetting layer weights
training model with 1 workers on 1203 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=3
job loop exiting, total 2 jobs
worker exiting, processed 2 jobs
worker thread finished; awaiting finish of 0 more threads
EPOCH - 1 : training on 394310 raw words (10000 effective words) took 0.0s, 671068 effective words/s
[... gensim per-epoch training messages repeat in the same form; output truncated in the source, last epoch shown is EPOCH - 549 ...]

53.06823992729187


In [1322]:
print('Log "'+'\033[1m'+new_logline+'\033[0m'+'" is anomaly: '+'\033[1m'+str(is_anomaly1)+'\033[0m')

Log "[1mINFOmainorgapachehadoopmapredYarnChildSleepingformsbeforeretryingagainGotnullnow[0m" is anomaly: [1mTrue[0m


Predicting whether a single log line is anomalous with Tf-idf encoding at the 'log' level and a standard deviation setting of 0.1 (sd = 0.1)

In [1324]:
is_anomaly2 = lad_with_kmeans(data, encoding_level='log', encoding_method='tfidf', sd=0.1, new_log=new_logline)

In [1325]:
print('Log "'+'\033[1m'+new_logline+'\033[0m'+'" is anomaly: '+'\033[1m'+str(is_anomaly2)+'\033[0m')

Log "[1mCausedbyjavanetNoRouteToHostExceptionNoRoutetoHostfromMININTFNANLItomsrasafailedonsockettimeoutexceptionjavanetNoRouteToHostExceptionNoroutetohostnofurtherinformationFormoredetailsseehttpwikiapacheorghadoopNoRouteToHost[0m" is anomaly: [1mTrue[0m
