File state incorrectly marked as "errored" if it contains chunks in "pending" state #182

@clehene

Description

In _update:

if cstates.contains_all('finished'):
...
elif cstates.contains_none('running'):
    self._fstates[parent] = 'errored'

If no chunk is running but some chunks are still in the "pending" state, this branch marks the parent file state as "errored". That in turn causes active (return not self._fstates.contains_none('pending', 'transferring', 'merging')) to return False, so monitor moves on while chunks are still in the pending / running states.

If the chunks of a file have finished by the time run checks them, execution continues normally; if some chunks are still running at that point, it fails with an error.
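
To make the condition concrete, here is a minimal standalone sketch of the decision described above. It does not use the library: the States class and the marks_errored_* helpers are hypothetical stand-ins, and the assumed semantics of contains_all / contains_none (every / no tracked object is in one of the given states) are inferred from the snippet, not confirmed. The "guarded" variant is only one possible way to avoid the misclassification, not a confirmed fix.

```python
class States:
    """Hypothetical stand-in for the state container in the snippet above."""

    def __init__(self, values):
        self._values = list(values)

    def contains_all(self, *states):
        # True when every tracked object is in one of the given states
        return all(v in states for v in self._values)

    def contains_none(self, *states):
        # True when no tracked object is in any of the given states
        return not any(v in states for v in self._values)


def marks_errored_current(cstates):
    # Mirrors the quoted branch: not all finished and nothing running
    # is treated as an error, even if chunks are merely pending.
    if cstates.contains_all('finished'):
        return False  # elided branch in the issue; not modelled here
    elif cstates.contains_none('running'):
        return True
    return False


def marks_errored_guarded(cstates):
    # Assumption: one possible guard that also excludes pending chunks.
    if cstates.contains_all('finished'):
        return False
    elif cstates.contains_none('running', 'pending'):
        return True
    return False


def active(fstates):
    # Same expression as quoted in the description
    return not fstates.contains_none('pending', 'transferring', 'merging')


# One chunk finished, the others not yet scheduled, none running right now
chunks = States(['finished', 'pending', 'pending'])
print(marks_errored_current(chunks))   # parent would be marked "errored"
print(marks_errored_guarded(chunks))   # parent would be left alone

# Once the only file is "errored", active reports that nothing is left to do
print(active(States(['errored'])))
```

With the current condition, the parent is marked as errored as soon as a chunk finishes while the remaining chunks are pending and none happens to be running, which matches the timing dependence described above.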


Reproduction Steps

There is no deterministic way to reproduce this, as it depends on parallel task execution, but either submitting many chunks or running the test repeatedly will trigger it (i.e. we need to 1) have chunks in the pending state and 2) have some in the running state when we check them).

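from azure.datalake.store.core import AzureDLPath
from azure.datalake.store.transfer import ADLTransferClient


# `azure` is assumed to be the filesystem fixture from the test suite.
def test_dummy(azure):
    # No-op transfer: report the whole chunk as transferred, with no error
    def transfer(adlfs, src, dst, offset, size, blocksize, buffersize, shutdown_event=None):
        return size, None

    client = ADLTransferClient(azure, transfer=transfer, chunksize=8,
                               chunked=True)

    # 32*32 bytes with chunksize=8 yields many chunks for a single file
    client.submit('foo', AzureDLPath('bar'), 32*32)
    client.run()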

Environment summary

SDK Version: What version of the SDK are you using? (pip show azure-datalake-store)
Answer here:

Python Version: What Python version are you using? Is it 64-bit or 32-bit?
Answer here:

OS Version: What OS and version are you using?
Answer here:

Shell Type: What shell are you using? (e.g. bash, cmd.exe, Bash on Windows)
Answer here:
