Description
System Information
- Factorization Machine:
I am successfully using the function write_spmatrix_to_sparse_tensor to transform my data from a sparse matrix to the RecordIO format expected by SageMaker's factorization machine implementation.
Example:
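import io
import os
import boto3
import sagemaker.amazon.common as smac

def write_recordio(array, y, prefix, f):
    # Serialize the sparse features and labels as RecordIO-protobuf in memory
    buf = io.BytesIO()
    smac.write_spmatrix_to_sparse_tensor(array=array, file=buf, labels=y)
    buf.seek(0)  # rewind so the upload starts at the first byte
    fname = os.path.join(prefix, f)
    boto3.Session().resource('s3').Bucket('bucket').Object(fname).upload_fileobj(buf)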
An example of array, which holds the features:
(0, 990290) 1.0
(0, 1266265) 1.0
(1, 560338) 1.0
(1, 1266181) 1.0
(2, 182872) 1.0
(2, 1266205) 1.0
.................................
An example of y, which is my target:
[1. 2. 1. ... 3. 1. 5.]
write_spmatrix_to_sparse_tensor does the job. After training my model, I then use Batch Transform to receive a .out file with many outputs of type <class 'record_pb2.Record'>.
An example of one input and associated output record:
input:
features {
  key: "values"
  value {
    float32_tensor {
      values: 1.0
      values: 1.0
      keys: 990290
      keys: 1266265
      shape: 1266394
    }
  }
}
label {
  key: "values"
  value {
    float32_tensor {
      values: 1.0
    }
  }
}
output:
label {
  key: "score"
  value {
    float32_tensor {
      values: 1.5246734619140625
    }
  }
}
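For the output side, a minimal sketch of collecting the scores from such records might look like the following. It assumes the .out file has already been parsed into Record objects (the SDK's sagemaker.amazon.common module appears to offer read_records for this, but that is an assumption here). The helper name scores_from_records is hypothetical, and the demo uses plain stand-in objects that mimic the record layout shown above rather than real protobuf messages.

```python
from types import SimpleNamespace


def scores_from_records(records):
    """Collect the "score" value from each Batch Transform output record.

    Hypothetical helper: each record is assumed to expose
    record.label["score"].float32_tensor.values, as in the output above.
    """
    scores = []
    for record in records:
        tensor = record.label["score"].float32_tensor
        scores.append(tensor.values[0])
    return scores


def _fake_output(score):
    # Stand-in for a parsed output Record, for illustration only
    tensor = SimpleNamespace(values=[score])
    return SimpleNamespace(label={"score": SimpleNamespace(float32_tensor=tensor)})


demo_scores = scores_from_records([_fake_output(1.5246734619140625), _fake_output(2.0)])
```

The scores come back in record order, so they can be lined up row by row with the original input.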
So now I have two files: the RecordIO file I originally wrote with write_spmatrix_to_sparse_tensor, and the .out file produced by transformer.transform. I would like a function write_sparse_tensor_to_spmatrix to exist that works on both of them (my original RecordIO file used for training and the .out file).
My data pull pipeline is parquet -> pandas -> sparse_matrix -> recordio, and I need to reverse that process to get back to my original parquet format for evaluation and, eventually, deployment. Whatever the use case, it seems that users would frequently want to work back to their original format from both the model input and the Batch Transform output, and write_sparse_tensor_to_spmatrix would accomplish that.
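As a rough illustration of the requested reverse conversion, the sketch below turns already-parsed input records back into COO-style triplets. It is not an existing SDK function: records_to_coo is a hypothetical name, the records are assumed to follow the input layout shown earlier (features["values"].float32_tensor with keys, values and shape), and the demo uses stand-in objects instead of real protobuf messages.

```python
from types import SimpleNamespace


def records_to_coo(records):
    """Rebuild (rows, cols, vals, shape, labels) from sparse-tensor records.

    Hypothetical helper: one record is treated as one matrix row, and the
    tensor's keys are treated as column indices within that row.
    """
    rows, cols, vals, labels = [], [], [], []
    n_rows, n_cols = 0, 0
    for record in records:
        tensor = record.features["values"].float32_tensor
        rows.extend([n_rows] * len(tensor.keys))
        cols.extend(tensor.keys)
        vals.extend(tensor.values)
        if len(tensor.shape) > 0:
            n_cols = max(n_cols, tensor.shape[0])
        # Training records carry a label; skip it when absent
        if "values" in record.label:
            labels.append(record.label["values"].float32_tensor.values[0])
        n_rows += 1
    return rows, cols, vals, (n_rows, n_cols), labels


def _fake_input(keys, label):
    # Stand-in for a parsed input Record, for illustration only
    feat = SimpleNamespace(keys=keys, values=[1.0] * len(keys), shape=[1266394])
    lab = SimpleNamespace(values=[label])
    return SimpleNamespace(
        features={"values": SimpleNamespace(float32_tensor=feat)},
        label={"values": SimpleNamespace(float32_tensor=lab)},
    )


demo = records_to_coo([
    _fake_input([990290, 1266265], 1.0),
    _fake_input([560338, 1266181], 2.0),
])
```

With SciPy installed, the triplets could then be passed to scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape) to get a sparse matrix again, and from there back to pandas and parquet.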