logs are printed without values #2914

Closed
amirdor opened this issue Aug 9, 2020 · 3 comments

amirdor commented Aug 9, 2020

Problem description

The log messages from build_vocab and train are printed with the raw format placeholders (%i, %s, ...) instead of the actual values.

Examples:
model.build_vocab(sentences=docs, progress_per=10000)

08/09/2020 08:25:52 AM [INFO] collecting all words and their counts
08/09/2020 08:25:52 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:52 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:52 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:52 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:52 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:53 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types
08/09/2020 08:25:53 AM [INFO] PROGRESS: at sentence #%i, processed %i words, keeping %i word types

model.train(sentences=docs, total_examples=len(docs), epochs=3)

08/09/2020 08:26:05 AM [INFO] training model with %i workers on %i vocabulary and %i features, using sg=%s hs=%s sample=%s negative=%s window=%s
08/09/2020 08:26:06 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:07 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:08 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:09 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:10 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:11 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:12 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/09/2020 08:26:13 AM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i

This happens with both FastText and Word2Vec models.

Versions

Linux-4.9.217-0.1.ac.205.84.332.metal1.x86_64-x86_64-with-redhat-5.3-Tikanga
Python 3.6.10 (default, Jul 27 2020, 00:14:34)
[GCC 4.9.4]
Bits 64
NumPy 1.19.1
SciPy 1.2.1
gensim 3.8.3
FAST_VERSION 1

Owner

piskvorky commented Aug 10, 2020

That's really weird. The logging code is here, I don't see how that can happen:
https://github.com/RaRe-Technologies/gensim/blob/e889fa3d45a406cabbc7e180fa9a8ee3f76ac6f0/gensim/models/base_any2vec.py#L1286

@amirdor can you try the following please:

import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)s - %(message)s')

logging.info(
    "EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i",
    1, 2, 3, 4, 5
)

import gensim
gensim.models.word2vec.logger.info(
    "EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i",
    1, 2, 3, 4, 5
)

Author

amirdor commented Aug 10, 2020

@piskvorky I'm getting:

08/10/2020 01:01:22 PM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i
08/10/2020 01:01:22 PM [INFO] EPOCH %i - PROGRESS: at %.2f%% examples, %.0f words/s, in_qsize %i, out_qsize %i

Owner

piskvorky commented Aug 10, 2020

The log format in your output is completely different from what was configured in basicConfig().
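
One way to see which configuration is actually in effect is to list the handlers attached to the root logger before calling basicConfig(). The sketch below is only a diagnostic aid: `describe_root_handlers` is a hypothetical helper name, and it reads the formatter's private `_fmt` attribute, which is an implementation detail of CPython's logging module. Note that basicConfig() does nothing when the root logger already has handlers, so a format string passed to it can be ignored without any warning.

```python
import logging


def describe_root_handlers():
    """Return (handler class name, format string) pairs for the root logger."""
    described = []
    for handler in logging.getLogger().handlers:
        # _fmt is a private attribute of logging.Formatter; read it defensively.
        fmt = getattr(handler.formatter, "_fmt", None)
        described.append((type(handler).__name__, fmt))
    return described


# An empty list means basicConfig() is free to install its own handler.
# A non-empty list means basicConfig() will leave the existing setup as it is.
print(describe_root_handlers())
```

If the list is not empty, some earlier code (or a modified module) configured logging first. On Python 3.8 and later, basicConfig(force=True) replaces the existing handlers; on older versions such as the 3.6.10 reported here, they would have to be removed by hand.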

The only explanation is your logging module has been modified. This is a built-in module in Python, so such signs of its modification (which you're not even aware of) imply serious risk. I'd recommend you check your Python installation and related attack vectors in general.
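
A first step in checking the installation could be to confirm where the imported logging package was loaded from. This is a minimal sketch, not a security audit: `logging_loaded_from_stdlib` is a hypothetical helper name, and a path inside the standard library directory does not prove the file itself is unmodified. It does catch the common case of a local file or package named logging shadowing the built-in one.

```python
import logging
import os
import sysconfig


def logging_loaded_from_stdlib():
    """Return (is_stdlib, path) for the logging package that was imported."""
    stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])
    module_path = os.path.realpath(logging.__file__)
    return module_path.startswith(stdlib_dir + os.sep), module_path


is_stdlib, path = logging_loaded_from_stdlib()
label = "standard library" if is_stdlib else "NOT in the standard library directory"
print(path, "->", label)
```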

Closing here – this is clearly not a Gensim issue, because even the first logging.info is wrong, before even importing Gensim.
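
For context on why the plain logging.info call is a useful test: the standard logging module stores the template and its arguments separately on the log record and merges them only when record.getMessage() is called, which the stock Formatter does. The sketch below reproduces the same symptom with the standard library alone. `RawMsgFormatter` is a hypothetical class written for illustration, and this is not a diagnosis of the reporter's environment.

```python
import io
import logging


class RawMsgFormatter(logging.Formatter):
    """Hypothetical formatter that skips argument interpolation."""

    def format(self, record):
        # record.msg is the raw template; record.getMessage() would merge
        # record.args into it. Using record.msg leaves the placeholders as-is.
        return "[%s] %s" % (record.levelname, record.msg)


def render(formatter):
    """Log one message through the given formatter and return the output."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger = logging.getLogger("placeholder_demo")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info("EPOCH %i - PROGRESS: at %.2f%% examples", 1, 2)
    return stream.getvalue().strip()


print(render(RawMsgFormatter()))
# [INFO] EPOCH %i - PROGRESS: at %.2f%% examples
print(render(logging.Formatter("[%(levelname)s] %(message)s")))
# [INFO] EPOCH 1 - PROGRESS: at 2.00% examples
```

Anything that emits record.msg directly, whether a custom formatter, a handler, or a patched module, would produce output like the first line.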
