
Unicode/Bytes concatenation is inefficient #3453

@da-woods


CPython has a specific optimization for concatenating strings: it checks the reference count of the first operand and concatenates in place if possible. This is done in ceval: https://github.com/python/cpython/blob/309d7cc5df4e2bf3086c49eb2b1b56b929554500/Python/ceval.c#L5354. In some specific cases this can make a big performance difference: https://stackoverflow.com/questions/35787022/cython-string-concatenation-is-super-slow-what-else-does-it-do-poorly
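
For illustration, here is a rough pure-Python sketch of when the optimization can and cannot apply. The function names are made up for this example, and the actual timings will depend on the CPython version and the allocator, so treat it as a way to observe the effect rather than a benchmark.

```python
import timeit


def concat_inplace(n):
    # Only the local variable refers to the string, so CPython's
    # reference-count check can succeed and the string may be resized in place.
    s = ""
    for _ in range(n):
        s += "x"
    return s


def concat_with_alias(n):
    # `alias` holds an extra reference to the string, so the reference-count
    # check fails and every += has to build a new string by copying.
    s = ""
    for _ in range(n):
        alias = s
        s += "x"
    return s


if __name__ == "__main__":
    for fn in (concat_inplace, concat_with_alias):
        elapsed = timeit.timeit(lambda: fn(100_000), number=3)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return the same string; only the cost of building it differs. Code that always takes the copying path behaves like the second function.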

I had an initial go at it here: #3451. However, there are definite failure paths, since it can NULL out variables that Cython isn't expecting to be NULL.

A couple of possible options:

  1. It might be possible to create something that basically re-implements PyUnicode_Append but without clearing operand1 (i.e. remove this line: https://github.com/python/cpython/blob/b146568dfcbcd7409c724f8917e4f77433dd56e4/Objects/unicodeobject.c#L11517).

  2. (Probably easier) Ensure that operand1 is always set to something on exit, even if it's a dummy value like an empty string. This could mostly be based on the current PR, but it would occasionally lead to unexpected behaviour (mostly when exceptions are caught and handled):

      cdef unicode val = "X"
      try:
          val += "x"
      except:
          pass
      return val  # wouldn't crash, but could return an odd placeholder string
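
For comparison, this is a small sketch of what plain Python does when an in-place concatenation fails and the exception is handled: the variable keeps its original value. The `Boom` class and the function name are hypothetical, used only to force a failure; the failures in the real fast path would be things like memory errors, which are hard to trigger on demand.

```python
class Boom:
    """Hypothetical operand whose reflected addition always fails."""

    def __radd__(self, other):
        raise RuntimeError("concatenation failed")


def append_or_keep():
    val = "X"
    try:
        # str.__add__ returns NotImplemented for a non-str operand, so Python
        # falls back to Boom.__radd__, which raises.
        val += Boom()
    except RuntimeError:
        pass
    # The failed += never rebound `val`, so it still holds the original string.
    return val


if __name__ == "__main__":
    print(repr(append_or_keep()))
```

With option 2, the equivalent Cython function could return the placeholder instead of the original string, which is the odd behaviour described above.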
    
