Make test_CompareSparse eat less memory and faster. #3196

Merged: 1 commit, Aug 5, 2017

helinwang
Contributor

No description provided.

@@ -22,7 +22,7 @@
default_initial_std(0.1)
default_device(0)

-word_dim = 1451594
+word_dim = 145159
Contributor

You should update word_dim in sample_trainer_config_rnn.conf accordingly.

Contributor Author

Thanks! Done.
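
Since the same word_dim has to be kept in step across two config files, a small consistency check can catch a missed update. The following is only a sketch: `read_word_dim` is a hypothetical helper, and the two strings stand in for the real config files, whose full contents are not shown in this thread.

```python
import re

# Matches a top-level assignment such as "word_dim = 145159".
WORD_DIM_RE = re.compile(r"^\s*word_dim\s*=\s*(\d+)\s*$", re.MULTILINE)


def read_word_dim(config_text):
    """Return the integer assigned to word_dim, or None if it is absent."""
    match = WORD_DIM_RE.search(config_text)
    return int(match.group(1)) if match else None


# Stand-in config contents (assumption); the real files hold more settings.
config_a = "default_initial_std(0.1)\ndefault_device(0)\n\nword_dim = 145159\n"
config_b = "word_dim = 1451594\n"

dims = {read_word_dim(config_a), read_word_dim(config_b)}
print(len(dims) == 1)  # False: the two stand-in configs disagree
```

In practice the two strings would be read from the config files on disk before comparing.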

@@ -22,7 +22,7 @@
default_initial_std(0.1)
Contributor

Line 19: uniitest -> unittest. Please fix the same typo in sample_trainer_config_rnn.conf as well. Thanks.

Contributor Author

Thanks! Done.

@luotao1
Contributor

luotao1 commented Aug 4, 2017

We may not be able to change word_dim: the binary dataset (data_bin_part) contains ids larger than the new value of 145159, so the lookup check fails:

[09:13:23]	I0804 01:13:10.039898 25308 Trainer.cpp:114] ignore sparse_remote_update=true due to --local=true
[09:13:23]	I0804 01:13:10.039927 25308 Trainer.cpp:165] trainer mode: Normal
[09:13:23]	I0804 01:13:10.071002 25308 ProtoDataProvider.cpp:55] load data file trainer/tests/data_bin_part
[09:13:23]	I0804 01:13:10.076408 25308 ProtoDataProvider.cpp:70] read done, num of instance=1000
[09:13:23]	I0804 01:13:10.076447 25308 ProtoDataProvider.cpp:367] slot0:avgNNZ=6.678; slot1:avgNNZ=5.47; slot2:avgNNZ=15.924; slot3:avgNNZ=12.808; slot4:avgNNZ=6.713; slot5:avgNNZ=5.489; slot6:avgNNZ=16.915; slot7:avgNNZ=13.482;
[09:13:23]	I0804 01:13:10.076545 25308 GradientMachine.cpp:85] Initing parameters..
[09:13:23]	I0804 01:13:11.508193 25308 GradientMachine.cpp:92] Init parameters done.
[09:13:23]	F0804 01:13:11.508652 25308 Matrix.cpp:2540] Check failed: index[i] < (int)tableSize (1190443 vs. 145159)
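
The fatal line reports an id (1190443) that is not below the table size (145159). A minimal Python sketch of that condition, using the two numbers from the log; `find_out_of_range` is a hypothetical helper for illustration, not the check in Matrix.cpp itself, and the other sample ids are made up.

```python
def find_out_of_range(ids, table_size):
    """Return the ids that would fail an `index < table_size` check."""
    return [i for i in ids if not 0 <= i < table_size]


old_word_dim = 1451594  # value before this change
new_word_dim = 145159   # value after this change

# 1190443 comes from the log; 12 and 145158 are made-up in-range ids.
sample_ids = [12, 145158, 1190443]

print(find_out_of_range(sample_ids, old_word_dim))  # []
print(find_out_of_range(sample_ids, new_word_dim))  # [1190443]
```

Any id at or above the new word_dim has to be remapped or dropped before the smaller table can be used with this data file.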

@helinwang
Contributor Author

helinwang commented Aug 4, 2017

Wrote a Python program to rewrite the data file (inspired by the show_pb.py in our repo):

import os
import sys
from google.protobuf.internal.decoder import _DecodeVarint
from google.protobuf.internal.encoder import _EncodeVarint
import paddle.proto.DataFormat_pb2 as DataFormat

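def write_proto(file, message):
    """
    write a protobuffer struct to file: the length of the struct as a
    varint, then the serialized struct data.
    """
    data = message.SerializeToString()  # avoid shadowing the builtin str
    _EncodeVarint(file.write, len(data))
    file.write(data)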

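def read_proto(file, message):
    """
    read a protobuffer struct from file, the length of the struct is stored as
    a varint, then followed by the actual struct data.
    @return True on success, False at end of file
    """

    # Read the varint length prefix one byte at a time. Reading a fixed
    # 8 bytes up front would swallow part of the next struct whenever the
    # current one is shorter than the bytes read ahead.
    prefix = b""
    while True:
        byte = file.read(1)
        if not byte:
            return False
        prefix += byte
        if ord(byte) < 0x80:  # high bit clear: last byte of the varint
            break
    size, _ = _DecodeVarint(prefix, 0)
    message.ParseFromString(file.read(size))

    return True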


def usage():
    sys.stderr.write("Usage: python write_pb.py PROTO_DATA_FILE output_file\n")
    sys.exit(1)


if __name__ == '__main__':
    # Both the input path and the output path are required.
    if len(sys.argv) < 3:
        usage()

    dim = 999

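    # Open both files in binary mode; "wb" avoids appending to a stale output.
    w = open(sys.argv[2], "wb")
    f = open(sys.argv[1], "rb")
    header = DataFormat.DataHeader()
    read_proto(f, header)

    # Shrink the large slot dims before writing the header, so the header
    # in the output file matches the remapped ids.
    for d in header.slot_defs:
        if d.dim > 10: #hack
            d.dim = dim
    write_proto(w, header)
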
    #print header

    sample = DataFormat.DataSample()
    while read_proto(f, sample):
        for vs in sample.vector_slots:
            # Fold every id into [0, dim) so it fits the smaller table.
            for idx, word_id in enumerate(vs.ids):
                vs.ids[idx] = word_id % dim

        #print sample
        write_proto(w, sample)

    f.close()
    w.close()
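
The script relies on two ideas: records framed by a varint length prefix, and ids folded into a smaller range with a modulo. The sketch below shows both in pure Python, without protobuf or the Paddle proto module. All function names here are hypothetical, and the byte payloads are stand-ins for serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def write_record(stream, payload):
    """Write one record: varint length, then the payload bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_record(stream):
    """Read one length-prefixed record, or return None at end of stream."""
    size, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            return None
        size |= (ord(byte) & 0x7F) << shift
        if ord(byte) < 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    return stream.read(size)


def remap_ids(ids, dim):
    """Fold ids into [0, dim) with a modulo, as the script does."""
    return [i % dim for i in ids]


buf = io.BytesIO()
for payload in (b"hi", b"x" * 300):  # a 1-byte and a 2-byte length prefix
    write_record(buf, payload)

buf.seek(0)
records = []
while True:
    record = read_record(buf)
    if record is None:
        break
    records.append(record)

print([len(r) for r in records])         # [2, 300]
print(remap_ids([12, 1190443], 145159))  # [12, 29171]
```

The modulo maps distinct ids onto the same slot, so the rewritten file no longer carries the original vocabulary; that trade-off suits a test that only needs to exercise the sparse code path, but it would not suit real training data.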

@helinwang helinwang changed the title from "WIP: try to make unit test test_CompareSparse eat less memory" to "Make test_CompareSparse eat less memory and faster." on Aug 5, 2017
@wangkuiyi wangkuiyi merged commit dc21a58 into PaddlePaddle:develop Aug 5, 2017
heavengate pushed a commit to heavengate/Paddle that referenced this pull request Aug 16, 2021
3 participants