Speed up execution of py2mat #4

hcommin · 2023-09-14T10:34:48Z

Repeatedly calling py2mat is currently quite slow. For example, I run this simple test script:

x_mat = randn(32768, 1);
x_py = mat2py(x_mat);

tic;
for i = 1:1024
    y_mat = py2mat(x_py);
end
toc

And execution takes around 7 seconds:

Elapsed time is 7.054170 seconds.

Using MATLAB's "Run and Time" profiler, I found that almost all of that time is spent importing numpy:

  Im = @py.importlib.import_module;
  np = Im('numpy');

This import can be avoided by deleting the two lines above and using strings/chars (like 'float64') instead of Python objects (like np.float64):

        case "float16"
          % doesn't exist in matlab, upcast to float32
          x_mat = single(x_py.astype('float32'));
        % only reals and logicals can be cast to arrays so
        % have to cast the rest to either single or double
        case "uint8"
          x_mat = uint8(x_py.astype('float32'));
        case "int8"
          x_mat = int8(x_py.astype('float32'));
        case "uint16"
          x_mat = uint16(x_py.astype('float32'));
        case "int16"
          x_mat = int16(x_py.astype('float32'));
        case "uint32"
          x_mat = uint32(x_py.astype('float32'));
        case "int32"
          x_mat = int32(x_py.astype('float32'));
        case "uint64"
          x_mat = uint64(x_py.astype('float64'));
        case "int64"
          x_mat = int64(x_py.astype('float64'));

Now my simple test script executes almost 100x faster:

Elapsed time is 0.078395 seconds.

The text was updated successfully, but these errors were encountered:

hcommin · 2023-09-14T11:20:35Z

In fact, it seems like it may be sufficient to simply replace all instances of np with py.numpy (which would be more robust than using strings).

However, I'm confused by how they behave differently in the MATLAB console:

>> np = py.importlib.import_module('numpy');
>> np.float32

ans = 

  Python type with properties:

    as_integer_ratio: [1×1 py.method_descriptor]
          is_integer: [1×1 py.method_descriptor]

    <class 'numpy.float32'>

>> py.numpy.float32

ans = 

  Python float32:

     0

    Use details function to view the properties of the Python object.

    Use single function to convert to a MATLAB array.

It seems like np.float32 is a class, whereas py.numpy.float32 is an object with value 0.

- For details, see: - Apress/python-for-matlab-development#4 - Apress/python-for-matlab-development#5

hcommin · 2023-09-15T08:55:05Z

@AlDanial I have pasted my updated version of py2mat.m below for your reference.

In this version, I used strings as per my original post, above. I mentioned in my second post that there may be a better approach which avoids strings, but I had some doubts (and no quick way to test). Perhaps you could choose the better option.

% Convert a native Python variable to an equivalent native MATLAB variable.

% {{{ code/matlab_py/py2mat.m   v 1.2  2022-09-16
% This code accompanies the book _Python for MATLAB Development:
% Extend MATLAB with 300,000+ Modules from the Python Package Index_ 
% ISBN 978-1-4842-7222-0 | ISBN 978-1-4842-7223-7 (eBook)
% DOI 10.1007/978-1-4842-7223-7
% https://github.com/Apress/python-for-matlab-development
% 
% Copyright © 2022 Albert Danial
% 
% MIT License:
% Permission is hereby granted, free of charge, to any person obtaining a copy
% of this software and associated documentation files (the "Software"), to deal
% in the Software without restriction, including without limitation the rights
% to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
% copies of the Software, and to permit persons to whom the Software is
% furnished to do so, subject to the following conditions:
% 
% The above copyright notice and this permission notice shall be included in
% all copies or substantial portions of the Software.
% 
% THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
% IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
% FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
% THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
% LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
% FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
% DEALINGS IN THE SOFTWARE.
% }}}
function [x_mat] = py2mat(x_py)
  switch class(x_py)
      % Python dictionaries
      case 'py.dict'
      try
        % the obvious dict -> struct conversion only works
        % if the key string can be a MATLAB variable
        x_mat = struct(x_py);
      catch EO
        % A failure most likely means a Python key is not
        % a proper MATLAB variable name.  Go through them
        % individually
        key_index = int64(0);
        key_list = cell(py.list(x_py.keys()))
        for key = key_list
          key_index = key_index + 1;
          if class(key) == 'cell'
            % key could be a Python number
            v_name = string(py.str(key{1}));
          else
            v_name = string(key);
          end
          if isvarname(v_name)
            x_mat.(v_name) = x_py.get(key_list{key_index});
          else
            % this key can't be used; replace it
            fixed = sprintf('K%06d_', key_index) + ...
                    regexprep(v_name,'\W','_');
            x_mat.(fixed) = x_py.get(key_list{key_index});
          end
        end
      end
      fields = fieldnames(x_mat);
      for i = 1:length(fields)
        new_var = py2mat(x_mat.(fields{i}));
        x_mat = setfield(x_mat,fields{i}, new_var);
      end

    % NumPy arrays and typed scalars
    case 'py.numpy.ndarray'
      switch string(x_py.dtype.name)
        case "float64"
          x_mat = x_py.double;
        case "float32"
          x_mat = x_py.single;
        case "float16"
          % doesn't exist in matlab, upcast to float32
          x_mat = single(x_py.astype('float32'));
        % only reals and logicals can be cast to arrays so
        % have to cast the rest to either single or double
        case "uint8"
          x_mat = uint8(x_py.astype('float32'));
        case "int8"
          x_mat = int8(x_py.astype('float32'));
        case "uint16"
          x_mat = uint16(x_py.astype('float32'));
        case "int16"
          x_mat = int16(x_py.astype('float32'));
        case "uint32"
          x_mat = uint32(x_py.astype('float32'));
        case "int32"
          x_mat = int32(x_py.astype('float32'));
        case "uint64"
          x_mat = uint64(x_py.astype('float64'));
        case "int64"
          x_mat = int64(x_py.astype('float64'));
        case "bool"
          x_mat = logical(x_py);
        % Complex types require a math operation to coerce the
        % real and imaginary components to contiguous arrays.
        % Use "+0" for minimal performance impact.  Without this,
        % complex creation fails with
        %   Python Error: ValueError: ndarray is not contiguous
        case "complex64"
          x_mat = complex(single(x_py.real+0), single(x_py.imag+0));
        case "complex128"
          x_mat = complex(double(x_py.real+0), double(x_py.imag+0));
        case "complex256"
          fprintf('py2mat:  MATLAB does not support quad precision complex\n')
          x_mat = [];
          return
        otherwise
          % gets here with np.float16, custom dtypes
          fprintf('py2mat: %s not recognized\n', ...
                  string(x_py.dtype.name));
          x_mat = [];
          return
      end

    % Scipy sparse matrices
    case {'py.scipy.sparse.coo.coo_matrix', ...
          'py.scipy.sparse.csr.csr_matrix', ...
          'py.scipy.sparse.csc.csc_matrix', ...
          'py.scipy.sparse.dok.dok_matrix', ...
          'py.scipy.sparse.bsr.bsr_matrix', ...
          'py.scipy.sparse.dia.dia_matrix', ...
          'py.scipy.sparse.lil.lil_matrix', }
      ndims = x_py.get_shape();
      if length(ndims) ~= 2
          fprintf('py2mat:  can only convert 2D sparse matrices\n')
          x_mat = [];
          return
      end
      nR = int64(ndims{1});
      nC = int64(ndims{2});
      x_py = x_py.tocoo();
      values = py2mat(x_py.data);
      if isempty(values)
          % gets here if trying to convert a complex256 sparse matrix
          return
      end
      % add 1 to row & col indices to go from 0-based to 1-based
      x_mat = sparse(single(x_py.row)+1, single(x_py.col)+1, values, nR, nC);

    % Python sets, tuples, and lists
    case {'cell', 'py.tuple', 'py.list'}
      [nR, nC] = size(x_py);
      x_mat = cell(nR,nC);
      for r = 1:nR
        for c = 1:nC
          x_mat{r,c} = py2mat(x_py{r,c});
        end
      end

    % Python strings
    case 'py.str'
      x_mat = string(x_py);

    % Python integers
    case 'py.int'
      x_mat = x_py.int64;

    % Python floats (float64) -- same as matlab double
    case 'double'
      x_mat = x_py;

    case 'py.datetime.datetime'
      x_mat = datetime(int64(x_py.year),  ...
                       int64(x_py.month), ...
                       int64(x_py.day),   ...
                       int64(x_py.hour),  ...
                       int64(x_py.minute),...
                       int64(x_py.second),...
                       int64(x_py.microsecond));

    case 'logical'
      x_mat = logical(x_py);

    case 'py.bytes'
      x_mat = uint8(x_py);

    case 'py.NoneType'
      x_mat = '';

    case 'py.scipy.optimize.optimize.OptimizeResult'
      for k = cell(py.list(x_py.keys))
          x_mat.(string(k{1})) = py2mat(x_py.get(k{1}));
      end

    % punt
    otherwise
      % return the original item?  nothing?
      fprintf('py2mat: type "%s" not recognized\n', ...
              string(x_py.dtype.name));
      x_mat = [];
  end
end

support bool type in py2mat, #3 fix microsecond/millisecond conversion error in datetime add conversion and performance tests for mat2py and py2mat

hcommin mentioned this issue Sep 14, 2023

Speed up execution of mat2py #5

Closed

hcommin added a commit to enclustra/en_cl_fix that referenced this issue Sep 14, 2023

Speed up py2mat and mat2py by avoiding repeated Python package imports

2f53222

- For details, see: - Apress/python-for-matlab-development#4 - Apress/python-for-matlab-development#5

AlDanial added a commit that referenced this issue Sep 15, 2023

performance enhancements by https://github.com/hcommin, #4, #5

6bf5d44

support bool type in py2mat, #3 fix microsecond/millisecond conversion error in datetime add conversion and performance tests for mat2py and py2mat

AlDanial closed this as completed Sep 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up execution of py2mat #4

Speed up execution of py2mat #4

hcommin commented Sep 14, 2023

hcommin commented Sep 14, 2023

hcommin commented Sep 15, 2023

Speed up execution of py2mat #4

Speed up execution of py2mat #4

Comments

hcommin commented Sep 14, 2023

hcommin commented Sep 14, 2023

hcommin commented Sep 15, 2023