embeddings.json #7

Open
stephen-hayne opened this issue May 12, 2022 · 14 comments
stephen-hayne commented May 12, 2022

I'm trying to reproduce your results (like another poster here)...

Perhaps a silly question, but after downloading the HDFS and BGL datasets and running them through Drain, I'm now getting the error below. Can you advise how/where to get your "embeddings.json" file?

python3 main_run.py --folder=hdfs/ --log_file=HDFS.log --dataset_name=hdfs --device=cpu --model_name=deeplog --window_type=session --sample=sliding_window --is_logkey --train_size=0.4 --train_ratio=1 --valid_ratio=0.1 --test_ratio=1 --max_epoch=100 --n_warm_up_epoch=0 --n_epochs_stop=10 --batch_size=1024 --num_candidates=70 --history_size=10 --lr=0.001 --accumulation_step=5 --session_level=hour --window_size=50 --step_size=50 --output_dir=experimental_results/deeplog/session/cd2 --is_process
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading ./dataset/hdfs/HDFS.log_structured.csv
575061it [00:00, 1983685.17it/s]
11175629it [00:19, 566251.66it/s]
Save options parameters
vocab size 20
save vocab in experimental_results/deeplog/session/cd2hdfs/deeplog_vocab.pkl
Loading vocab
20
Loading train dataset

Traceback (most recent call last):
  File "main_run.py", line 213, in <module>
    main()
  File "main_run.py", line 195, in main
    run_deeplog(options)
  File "/stephen/LogADEmpirical/logadempirical/deeplog.py", line 26, in run_deeplog
    Trainer(options).start_train()
  File "/stephen/LogADEmpirical/logadempirical/logdeep/tools/train.py", line 101, in __init__
    train_logs, train_labels = sliding_window(data,
  File "/stephen/LogADEmpirical/logadempirical/logdeep/dataset/sample.py", line 108, in sliding_window
    event2semantic_vec = read_json(os.path.join(data_dir, e_name))
  File "/stephen/LogADEmpirical/logadempirical/logdeep/dataset/sample.py", line 14, in read_json
    with open(filename, 'r') as load_f:
FileNotFoundError: [Errno 2] No such file or directory: './dataset/hdfs/embeddings.json'
@vanhoanglepsa (Collaborator)

Hi,
Currently, we adopt LogRobust to generate the embedding file. For now, it isn't included in this repository; we will try to start updating this part next week.
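
For context while that code is pending: LogRobust builds a semantic vector for each log template by tokenizing the template text, looking up pretrained FastText word vectors, and aggregating them with TF-IDF weights. A minimal sketch of that idea follows; every name below is illustrative, not this repository's API.

import math
import re
from collections import Counter

import numpy as np

def build_idf(templates):
    """idf(tok) = log(N / df(tok)), computed over the template corpus."""
    n = len(templates)
    df = Counter()
    for t in templates:
        df.update(set(re.findall(r'[a-zA-Z]+', t.lower())))
    return {tok: math.log(n / d) for tok, d in df.items()}

def template_embedding(template, word_vecs, idf, dim=300):
    """TF-IDF-weighted average of pretrained word vectors for one template.

    word_vecs: token -> np.ndarray, e.g. loaded from a FastText .vec file
    idf: token -> inverse document frequency from build_idf
    """
    tokens = re.findall(r'[a-zA-Z]+', template.lower())
    if not tokens:
        return np.zeros(dim)
    vec, weight_sum = np.zeros(dim), 0.0
    for tok, count in Counter(tokens).items():
        if tok in word_vecs:
            w = (count / len(tokens)) * idf.get(tok, 1.0)
            vec += w * word_vecs[tok]
            weight_sum += w
    return vec / weight_sum if weight_sum > 0 else vec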

@stephen-hayne (Author)

I have read "Robust Log-Based Anomaly Detection on Unstable Log Data" and "Log-based Anomaly Detection Without Log Parsing" with interest (as well as several of the others in the citations).

Will LogRobust be put on GitHub, or just the data you generated?

@vanhoanglepsa (Collaborator)

We will add the code to generate embeddings to this repository, not only the generated data.


X-zhihao commented Aug 13, 2022

How can we get this HDFS.log_structured.csv?
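
For anyone else stuck on this step: HDFS.log_structured.csv is what Drain produces from the raw HDFS.log. A minimal sketch using the logpai/logparser package; the log format and masking regexes follow logparser's HDFS demo, and the paths are assumptions matching this thread.

from logparser import Drain  # from the logpai/logparser project

input_dir = './dataset/hdfs/'   # directory containing the raw HDFS.log
output_dir = './dataset/hdfs/'  # where HDFS.log_structured.csv is written
log_file = 'HDFS.log'
# Header layout of an HDFS log line, as in logparser's HDFS benchmark
log_format = '<Date> <Time> <Pid> <Level> <Component>: <Content>'
# Regexes that mask variable fields (block IDs, IP:port) before clustering
regex = [r'blk_(|-)[0-9]+', r'(\d+\.){3}\d+(:\d+)?']

parser = Drain.LogParser(log_format, indir=input_dir, outdir=output_dir,
                         depth=4, st=0.5, rex=regex)
parser.parse(log_file)  # writes HDFS.log_structured.csv and *_templates.csv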


souravs17031999 commented Jan 22, 2023

@vanhoanglepsa is the code updated to generate embeddings for generic log data?
@stephen-hayne were you able to resolve this issue? I am having the same issue.

@stephen-hayne (Author)

@souravs17031999 No, this issue is not resolved.
@vanhoanglepsa Can you please help us to reproduce your work?


pupuu555 commented Apr 7, 2023

Hi, have you added the code to generate embeddings to this repository? I haven't found the file. Could you please tell me how to generate the embeddings, or share your embeddings.json? Thank you so much!!


stephen-hayne commented Apr 7, 2023 via email

Yes - I also couldn't find the file you mentioned... I have since generated embedding.json successfully with this code. Hope it helps!
https://github.com/xichie/LogADEmpirical/blob/master/generate_template_embedding.py


xichie commented Apr 8, 2023

Hi, the following is the code I used to generate embeddings.json. Hope it helps!

from logadempirical.PLELog.data.Embedding import *
from logadempirical.PLELog.data.DataLoader import *
import logging
import json
import os

import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """ Special json encoder for numpy types """
    def default(self, obj):
        if isinstance(obj, (np.int_, np.intc, np.intp, np.int8,
                            np.int16, np.int32, np.int64, np.uint8,
                            np.uint16, np.uint32, np.uint64)):
            return int(obj)
        elif isinstance(obj, (np.float_, np.float16, np.float32,
                              np.float64)):
            return float(obj)
        elif isinstance(obj, (np.ndarray,)):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

# Specify logger
logger = logging.getLogger('embedding')
logger.setLevel(logging.INFO)

dataset = 'bgl'
save_path = './dataset/bgl'
templatesDir = './dataset/bgl'
log_file = 'BGL_all.log'

# Map log IDs to templates, then build the template word embeddings
# (this step produces templates_BGL.vec under save_path)
logID2Temp, templates = load_templates_from_structured(templatesDir, logger, dataset,
                                                       log_file=log_file)
nlp_emb_mergeTemplateEmbeddings_BGL(save_path, templates, dataset, logger)

# Read the .vec file into memory once: a file handle is exhausted after
# one pass, so re-reading it inside a loop would yield nothing
with open(os.path.join(save_path, 'templates_BGL.vec'), 'r', encoding='utf-8') as reader:
    lines = reader.readlines()

# First line holds "<vocabSize> <embedSize>"; each following line is
# "<template_word> <v1> ... <v_embedSize>"
vocabSize, embedSize = [int(x) for x in lines[0].strip().split()]

# First pass: assign each log ID the embedding of its template
templateVocab = {}
for line in lines[1:]:
    items = line.strip().split()
    if len(items) != embedSize + 1:
        continue
    template_word = items[0]
    template_embedding = np.asarray(items[1:], dtype=np.float64)
    for logID, temp in logID2Temp.items():
        if temp == template_word:
            templateVocab[logID] = template_embedding

# Second pass for duplicated templates: retry any log ID that did not
# receive an embedding in the first pass
replica_logIDs = [logID for logID in logID2Temp if logID not in templateVocab]
for logID in replica_logIDs:
    temp = logID2Temp[logID]
    for line in lines[1:]:
        items = line.strip().split()
        if len(items) != embedSize + 1:
            continue
        template_word = items[0]
        template_embedding = np.asarray(items[1:], dtype=np.float64)
        if temp == template_word:
            templateVocab[logID] = template_embedding

with open(os.path.join(save_path, 'embeddings.json'), 'w') as writer:
    json.dump(templateVocab, writer, cls=NumpyEncoder)
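
Before rerunning main_run.py, a quick sanity check on the generated file can save a cycle. This assumes, as in the script above, that the JSON maps each log key to one fixed-length vector:

import json

# Load the generated embeddings and report basic shape information.
with open('./dataset/bgl/embeddings.json', 'r') as f:
    emb = json.load(f)

print(f'{len(emb)} log keys embedded')
dims = {len(vec) for vec in emb.values()}
print(f'embedding dimension(s): {dims}')  # expect one size, e.g. {300}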

@pupuu555

> Yes - I also couldn't find the file you mentioned... I have since generated embedding.json successfully with this code. Hope it helps!
> https://github.com/xichie/LogADEmpirical/blob/master/generate_template_embedding.py

Thank you so much!!!

@pupuu555

> Hi, the following is the code I used to generate embeddings.json. Hope it helps!
> […]

Thank you so much!!! Good people deserve a lifetime of peace!

@pupuu555

> Hi, the following is the code I used to generate embeddings.json. Hope it helps!
> […]

Hi, when I ran the file you gave me, I hit a new issue: FileNotFoundError: [Errno 2] No such file or directory: 'dataset/nlp-word.vec'. How can I get nlp-word.vec? I don't see a way to generate this file in the code.

@sailormoon-c

> Yes - I also couldn't find the file you mentioned... I have since generated embedding.json successfully with this code. Hope it helps!
> https://github.com/xichie/LogADEmpirical/blob/master/generate_template_embedding.py

Could I add you on WeChat? This project has been driving me up the wall lately. Please, please - my WeChat ID is RainyloveStatic.

xichie commented Apr 16, 2023

> […]

You can download nlp-word.vec from:
https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
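
A hedged sketch of fetching that archive and placing the file where the traceback above looks for it; the zip member name comes from FastText's published release, and 'dataset/nlp-word.vec' is simply the path the error reported:

import io
import os
import urllib.request
import zipfile

URL = ('https://dl.fbaipublicfiles.com/fasttext/vectors-english/'
       'wiki-news-300d-1M.vec.zip')
os.makedirs('dataset', exist_ok=True)

# Note: the zip is large (~680 MB) and is held in memory here for brevity.
print('downloading...')
data = urllib.request.urlopen(URL).read()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    zf.extract('wiki-news-300d-1M.vec', 'dataset')

# Rename to the file name the embedding code expects.
os.rename(os.path.join('dataset', 'wiki-news-300d-1M.vec'),
          os.path.join('dataset', 'nlp-word.vec'))
print('saved dataset/nlp-word.vec')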
