Attaching files with spaces in the file name causes a ValueError when writing the AASX #236

WelliSolutions · 2024-02-17T15:51:14Z

I was adding a file to an AASX which contained spaces. When writing the AASX, I got the following traceback:

Traceback (most recent call last):
[...]
  File "[...]\venv\Lib\site-packages\basyx\aas\adapter\aasx.py", line 415, in write_aas
    self.write_all_aas_objects("/aasx/data.{}".format("json" if write_json else "xml"),
  File "[...]\venv\Lib\site-packages\basyx\aas\adapter\aasx.py", line 552, in write_all_aas_objects
    with self.writer.open_part(file_name, content_type) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]\venv\Lib\site-packages\pyecma376_2\package_model.py", line 366, in open_part
    check_part_name(name)
  File "[...]\venv\Lib\site-packages\pyecma376_2\package_model.py", line 658, in check_part_name
    raise ValueError("{} is not an URI path with multiple segments (each not empty and not starting with '.') "
ValueError: '/aasx/attachments/file with space.txt' is not an URI path with multiple segments (each not empty and not starting with '.') or not starting with '/' or ending wit '/'

Note that the error message is a bit misleading. It's not because of the / and not because of the .. I double checked and it's the space that causes the problem.

My expectation is that the Basyx library handles the file name at the level of DictSupplementaryFileContainer and the add_file() method. As written in the tutorial, the method shall return the actual file name. If spaces are not allowed, please return a file name without spaces.

The text was updated successfully, but these errors were encountered:

WelliSolutions · 2024-02-17T16:00:45Z

More details:
In package_model.py (of pyecma376_2), the check is made:

def check_part_name(part_name: str) -> None:
    """ Check if `part_name` is a valid OPC part name in URI (not IRI) representation.

    :raises ValueError: if it is not.
    """
    if not RE_PART_NAME.match(part_name):
        raise ValueError("{} is not an URI path with multiple segments (each not empty and not starting with '.') "
                         "or not starting with '/' or ending wit '/'".format(repr(part_name)))

And RE_PART_NAME does not allow spaces:

RE_PART_NAME = re.compile(r'^(/[A-Za-z0-9\-\._~%:@!$&\'()*+,;=]*[A-Za-z0-9\-_~%:@!$&\'()*+,;=])+$')

Let's first conclude that \. and \' are equivalent to . and ', because it's inside a bracket expression and those don't need to be escaped there.

The first bracket expression contains the ., the second doesn't. Also, the first bracket expression is optional, the second isn't. So, to me, the Regex is checking whether something does not end with a dot, rather than "not starting with '.'" as mentioned in the error message. So, at least this part may be a bug in the pyecma library (either the error message or the Regex).

Other than that, I don't have enough knowledge to judge whether ECMA 376 allows spaces in file names or not.

WelliSolutions · 2024-02-17T16:24:39Z

OPC Specification IMHO is here: https://standards.iso.org/ittf/PubliclyAvailableStandards/c077818_ISO_IEC_29500-2_2021(E).zip
(download link suggested by Wikipedia does not work).

The relevant chapter seems to be 6.2.2 Part Names. And 6.2.2.2 Syntax specifies:

No I18N segments shall end with a dot (“.”) character.

So this is really about ending with a dot.

Furthermore:

[...] where an I18N segment is a Unicode string that
matches the non-terminal isegment-nz and percent-encoding represents a character by the percent
character "%" followed by two hexadecimal digits, as specified in RFC 3986

So it seems to me that the Basyx LIbrar should percent-encode the space to %20.

jkhsjdhjs · 2024-02-17T16:25:11Z

I agree, perhaps we should already call check_part_name() within DictSupplementaryFileContainer.add_file(), and if that raises an exception, percent encode the part name? Or just always percent encode?

jkhsjdhjs · 2024-02-17T22:47:09Z

Alright, I did some digging:

When writing supplementary files, the AASXWriter does the following:

basyx-python-sdk/basyx/aas/adapter/aasx.py

Lines 555 to 559 in 0927305

    
           logger.debug("Writing supplementary file {} to AASX package ...".format(file_name)) 
        
           with self.writer.open_part(file_name, content_type) as p: 
        
               file_store.write_file(file_name, p) 
        
           supplementary_file_names.append(pyecma376_2.package_model.normalize_part_name(file_name)) 
        
           self._supplementary_part_names[file_name] = hash

For each supplementary file, it calls OPCPackageReader.open_part() with the filename supplied by the user, the way the file was added to the SupplementaryFileContainer. This determines the name the file will have in the AASX file.
Afterwards, it adds the normalized filename to the list supplementary_file_names, which will later be used to write the relationship files.

Normalization percent-encodes the string and converts it to lower case:
https://github.com/rwth-iat/PyECMA376-2/blob/e7c28ca9c9fbebfae3d3ab9a8abc104aa57f7913/pyecma376_2/package_model.py#L642-L645

Thus, it seems to me, that only the relationship files are expected to contain normalized, percent-encoded URIs, and the filenames are actually not expected to follow this format. If the filenames would be percent-encoded and converted to lower case, it would also be impossible to determine the original filename before packaging, and the assertion at the end of tutorial_aasx.py would never be true:

basyx-python-sdk/basyx/aas/examples/tutorial_aasx.py

Line 156 in 0927305

assert actual_file_name in new_file_store

Yet, OPCPackageWriter.open_part() and OPCPackageWriter.create_fragmented_part() both call check_part_name() on the passed filename, which validates the filename against the regular expression for part URIs.

So, I had a look at the specification you linked and it doesn't confirm my thoughts (well, partially):

7.3.4 Mapping logical item names to ZIP item names
For each logical item, the process of mapping of logical item names to ZIP item names shall involve the following steps, in order:

Remove the leading forward slash (“/”) from the logical item name or, in the case of interleaved parts, from each of the logical item names within the complete sequence.
Remove the leading forward slash (“/”) from the logical item name or, parts, from each of the logical item names within the complete sequence.

Percent-encode every item non-ASCII character.

Validating the filenames via check_part_name() is wrong, as spaces are ASCII characters and thus allowed in filenames.
However, non-ASCII characters still need to be percent-encoded. Thus, the correct way would be to replace the call to check_part_name() by a call to a function that percent-encodes the passed filename, or at least to change the current validation.

s-heppner · 2024-02-18T19:41:48Z

Did I understand correctly, that this is a bug in the check_part_name() function of the pyecma376_2 library?

jkhsjdhjs · 2024-02-18T22:04:06Z

I think the check_part_name() function is correct, what's incorrect is that this function is used to validate filenames for the zip package. IMO, a function to correctly percent-encode filenames for AASX packages should be implemented in pyecma376_2.

pyecma376_2 already has a function to percent-encode and convert to lower-case part names. However, the way I understand it is that this percent-encoding for part-names is different from the percent-encoding for filenames, as it

converts the entire string to lower-case and
also encodes characters that would be allowed in filenames, such as spaces.

s-heppner added the bug Something isn't working label Feb 18, 2024

s-heppner mentioned this issue Feb 19, 2024

Allow for spaces in file names rwth-iat/PyECMA376-2#2

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attaching files with spaces in the file name causes a ValueError when writing the AASX #236

Attaching files with spaces in the file name causes a ValueError when writing the AASX #236

WelliSolutions commented Feb 17, 2024

WelliSolutions commented Feb 17, 2024

WelliSolutions commented Feb 17, 2024

jkhsjdhjs commented Feb 17, 2024

jkhsjdhjs commented Feb 17, 2024

s-heppner commented Feb 18, 2024

jkhsjdhjs commented Feb 18, 2024 •

edited

Loading

Attaching files with spaces in the file name causes a ValueError when writing the AASX #236

Attaching files with spaces in the file name causes a ValueError when writing the AASX #236

Comments

WelliSolutions commented Feb 17, 2024

WelliSolutions commented Feb 17, 2024

WelliSolutions commented Feb 17, 2024

jkhsjdhjs commented Feb 17, 2024

jkhsjdhjs commented Feb 17, 2024

s-heppner commented Feb 18, 2024

jkhsjdhjs commented Feb 18, 2024 • edited Loading

jkhsjdhjs commented Feb 18, 2024 •

edited

Loading