
write_sparse_tensor_to_spmatrix #1023

@Jomonsugi

Description


System Information

  • Factorization Machine:

I am successfully using the function `write_spmatrix_to_sparse_tensor` to convert my data from a sparse matrix to the RecordIO-protobuf format expected by SageMaker's Factorization Machines implementation.

Example:

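```python
import io
import os
import boto3
import sagemaker.amazon.common as smac

def write_recordio(array, y, prefix, f):
    # Serialize the scipy sparse matrix and labels to RecordIO-protobuf in memory
    buf = io.BytesIO()
    smac.write_spmatrix_to_sparse_tensor(array=array, file=buf, labels=y)
    buf.seek(0)

    # Upload to s3://bucket/<prefix>/<f> ('bucket' is a placeholder name)
    fname = os.path.join(prefix, f)
    boto3.Session().resource('s3').Bucket('bucket').Object(fname).upload_fileobj(buf)
```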
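For context on what ends up in `buf`: as far as I can tell from the SageMaker Python SDK source (the `_write_recordio` / `read_recordio` helpers in `sagemaker.amazon.common`), each serialized `Record` protobuf is framed as a 4-byte magic number `0xCED7230A`, a 4-byte payload length, the payload itself, and zero padding up to a multiple of 4 bytes. The stdlib-only sketch below reproduces that framing under those assumptions, so the same reader could split either the training file or a `.out` file back into individual payloads. The helper names are hypothetical, not SDK functions, and little-endian byte order is my assumption (the SDK appears to pack with the platform's native order, which is little-endian on x86).

```python
import io
import struct

# Magic number assumed to precede every record (value as I read it in the SDK source).
_KMAGIC = 0xCED7230A


def write_recordio_frame(f, payload: bytes) -> None:
    """Append one payload: magic, length, payload, zero padding to a 4-byte boundary."""
    f.write(struct.pack("<I", _KMAGIC))
    f.write(struct.pack("<I", len(payload)))
    f.write(payload)
    f.write(b"\x00" * ((-len(payload)) % 4))


def read_recordio_frames(f):
    """Yield each payload from a RecordIO stream, discarding the padding."""
    while True:
        header = f.read(4)
        if len(header) < 4:
            return  # end of stream
        (magic,) = struct.unpack("<I", header)
        if magic != _KMAGIC:
            raise ValueError(f"unexpected RecordIO magic: {magic:#x}")
        (length,) = struct.unpack("<I", f.read(4))
        payload = f.read(length)
        f.read((-length) % 4)  # skip padding up to the next 4-byte boundary
        yield payload


# Round trip with dummy payloads (real payloads would be serialized Records).
demo_buf = io.BytesIO()
for demo_payload in (b"abc", b"defgh"):
    write_recordio_frame(demo_buf, demo_payload)
demo_buf.seek(0)
demo_payloads = list(read_recordio_frames(demo_buf))
```

Each real payload should then be decodable with `rec = record_pb2.Record()` followed by `rec.ParseFromString(payload)`, which I believe is what the SDK's own `read_records` helper does internally.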

An example of `array` (the features):

```
  (0, 990290)	1.0
  (0, 1266265)	1.0
  (1, 560338)	1.0
  (1, 1266181)	1.0
  (2, 182872)	1.0
  (2, 1266205)	1.0
  ...
```

An example of `y` (my target):

```
[1. 2. 1. ... 3. 1. 5.]
```

`write_spmatrix_to_sparse_tensor` does the job. After training my model, I use Batch Transform, which produces a `.out` file containing many records of type `<class 'record_pb2.Record'>`.

Here is one input record and its associated output record.

Input:

```
features {
  key: "values"
  value {
    float32_tensor {
      values: 1.0
      values: 1.0
      keys: 990290
      keys: 1266265
      shape: 1266394
    }
  }
}
label {
  key: "values"
  value {
    float32_tensor {
      values: 1.0
    }
  }
}
```

Output:

```
label {
  key: "score"
  value {
    float32_tensor {
      values: 1.5246734619140625
    }
  }
}
```
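Extracting the prediction from output records like the one above only needs the `score` entry of each record's `label` map. A minimal sketch, assuming the records have already been parsed (for example with `sagemaker.amazon.common.read_records`); `scores_from_records` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins rather than real protobuf objects so it runs without the SDK.

```python
from types import SimpleNamespace

import numpy as np


def scores_from_records(records, key="score"):
    """Return the first float32 value stored under label[key] of each record.

    `records` may be any iterable of objects shaped like record_pb2.Record
    (for example, the list returned by sagemaker.amazon.common.read_records).
    """
    return np.array(
        [rec.label[key].float32_tensor.values[0] for rec in records],
        dtype=np.float32,
    )


# Demo with stand-in objects that mimic the output record shown above.
def _fake_output(score):
    tensor = SimpleNamespace(values=[score])
    return SimpleNamespace(label={"score": SimpleNamespace(float32_tensor=tensor)})


demo_scores = scores_from_records(
    [_fake_output(1.5246734619140625), _fake_output(3.0)]
)
```

With the SDK installed, opening a local copy of the `.out` file in binary mode and passing `smac.read_records(f)` to this helper would give one score per record; lining those scores up with the original rows assumes the output records come back in the same order as the input rows.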

So I now have two files: the RecordIO file I originally wrote with `write_spmatrix_to_sparse_tensor` (used for training) and the `.out` file produced by `transformer.transform`. I would like a `write_sparse_tensor_to_spmatrix` function that works on both of them. I personally need to get back to my original Parquet format: my data pipeline is Parquet -> pandas -> sparse matrix -> RecordIO, and I need to reverse that process for evaluation and, eventually, deployment. Whatever the use case, though, users will frequently want to work back to their original format from both the model input and the Batch Transform output, and `write_sparse_tensor_to_spmatrix` would accomplish that.
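Until something like `write_sparse_tensor_to_spmatrix` exists in the SDK, the training-file direction can be sketched by walking the parsed records and rebuilding CSR arrays from each record's `keys`, `values`, and `shape`, the fields visible in the input record above. `sparse_tensor_records_to_spmatrix` is a hypothetical name; the sketch assumes every record stores its features under `features["values"]` as a `float32_tensor` with a single-element `shape`, and its label (if any) under `label["values"]`, as in that example. The demo uses stand-in objects instead of real protobufs.

```python
from types import SimpleNamespace

import numpy as np
import scipy.sparse as sp


def sparse_tensor_records_to_spmatrix(records, feature_key="values", label_key="values"):
    """Rebuild (CSR feature matrix, label vector or None) from Record-like objects.

    Hypothetical helper sketching the requested inverse of
    write_spmatrix_to_sparse_tensor; it is not part of the SageMaker SDK.
    """
    data, indices, indptr, labels = [], [], [0], []
    n_cols = None
    for rec in records:
        tensor = rec.features[feature_key].float32_tensor
        width = int(tensor.shape[0])  # total number of feature columns
        if n_cols is None:
            n_cols = width
        elif width != n_cols:
            raise ValueError("records disagree on the feature dimension")
        indices.extend(tensor.keys)  # column index of each non-zero
        data.extend(tensor.values)  # value of each non-zero
        indptr.append(len(indices))  # row boundary in CSR terms
        if label_key in rec.label:
            labels.append(rec.label[label_key].float32_tensor.values[0])

    n_rows = len(indptr) - 1
    if labels and len(labels) != n_rows:
        raise ValueError("only some records carry a label")
    matrix = sp.csr_matrix(
        (
            np.asarray(data, dtype=np.float32),
            np.asarray(indices, dtype=np.int64),
            np.asarray(indptr, dtype=np.int64),
        ),
        shape=(n_rows, n_cols or 0),
    )
    y = np.asarray(labels, dtype=np.float32) if labels else None
    return matrix, y


# Demo with stand-ins shaped like the input record shown above.
def _fake_input(keys, values, label, width=1266394):
    tensor = SimpleNamespace(keys=keys, values=values, shape=[width])
    target = SimpleNamespace(float32_tensor=SimpleNamespace(values=[label]))
    return SimpleNamespace(
        features={"values": SimpleNamespace(float32_tensor=tensor)},
        label={"values": target},
    )


X_demo, y_demo = sparse_tensor_records_to_spmatrix([
    _fake_input([990290, 1266265], [1.0, 1.0], 1.0),
    _fake_input([560338, 1266181], [1.0, 1.0], 2.0),
])
```

In practice the records would come from `smac.read_records(f)` on the training file. From the resulting CSR matrix, pandas' `pd.DataFrame.sparse.from_spmatrix` is one route back toward a DataFrame.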
