Make test_CompareSparse eat less memory and faster. #3196
@@ -22,7 +22,7 @@
 default_initial_std(0.1)
 default_device(0)

-word_dim = 1451594
+word_dim = 145159
You should update word_dim in sample_trainer_config_rnn.conf accordingly.
Thanks! Done.
@@ -22,7 +22,7 @@
 default_initial_std(0.1)
Line 19: uniitest -> unittest. Please fix the same typo in sample_trainer_config_rnn.conf accordingly. Thanks.
Thanks! Done.
Maybe we can't modify word_dim, since the word_dim of the binary dataset (data_bin_part) is 145159:
39da2d3 to c361867
Wrote a Python program to rewrite the data file (inspired by the ...):

import sys

from google.protobuf.internal.decoder import _DecodeVarint
from google.protobuf.internal.encoder import _EncodeVarint

import paddle.proto.DataFormat_pb2 as DataFormat


def write_proto(file, message):
    """Write a protobuf struct as a varint length followed by its bytes."""
    data = message.SerializeToString()
    _EncodeVarint(file.write, len(data))
    file.write(data)


def read_proto(file, message):
    """
    Read a protobuf struct from file. The length of the struct is stored as
    a varint, followed by the actual struct data.
    @return True on success, False at end of file
    """
    start = file.tell()
    buf = file.read(8)
    if not buf:
        return False
    size, pos = _DecodeVarint(buf, 0)
    # Seek back to just after the varint, so that a struct shorter than the
    # 8-byte peek does not swallow the start of the next one.
    file.seek(start + pos)
    message.ParseFromString(file.read(size))
    return True


def usage():
    sys.stderr.write("Usage: python write_pb.py PROTO_DATA_FILE output_file\n")
    sys.exit(1)


if __name__ == '__main__':
    if len(sys.argv) < 3:  # both the input and the output path are required
        usage()

    dim = 999
    f = open(sys.argv[1], "rb")
    w = open(sys.argv[2], "ab")

    header = DataFormat.DataHeader()
    read_proto(f, header)
    for d in header.slot_defs:
        if d.dim > 10:  # hack
            d.dim = dim
    # Write the header only after patching the dims, so that it matches the
    # remapped ids written below.
    write_proto(w, header)
    # print header

    sample = DataFormat.DataSample()
    while read_proto(f, sample):
        for vs in sample.vector_slots:
            for idx, id in enumerate(vs.ids):
                vs.ids[idx] = id % dim
        # print sample
        write_proto(w, sample)

    f.close()
    w.close()
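
For anyone who wants to check the file framing without installing protobuf or Paddle: as the read_proto docstring above says, each struct is stored as a varint length followed by the struct bytes. Below is a minimal sketch of that framing in plain Python 3, together with the `id % dim` remapping. The helper names (`encode_varint`, `write_record`, `read_record`, `remap_ids`) are hypothetical and are not part of protobuf or Paddle. The payloads are arbitrary bytes standing in for serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_record(stream, payload):
    """Write one record: varint length, then the payload bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_record(stream):
    """Read one record; return None at a clean end of stream."""
    size = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None  # end of stream before a new record starts
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        size |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a record")
    return payload


def remap_ids(ids, dim):
    """Fold every id into the range [0, dim), as the script does with id % dim."""
    return [i % dim for i in ids]


def roundtrip(payloads):
    """Write all payloads to an in-memory stream and read them back."""
    stream = io.BytesIO()
    for payload in payloads:
        write_record(stream, payload)
    stream.seek(0)
    result = []
    while True:
        record = read_record(stream)
        if record is None:
            return result
        result.append(record)
```

Reading the length one byte at a time avoids the over-read that a fixed-size peek can cause on very short records, at the cost of more read calls. Note that the modulo remapping makes distinct ids collide, which is acceptable for test data but would not be for real training data.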