Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "Defer errors" option to applicable iterators #2442

Merged
merged 9 commits into from Jan 7, 2024
Merged

Conversation

joeyballentine
Copy link
Member

So it turns out the reason it wasn't working properly before was because of #2440... The models i was using just weren't being picked up by the node i was using to test. Therefore, I didn't actually need to do the manual iteration with the while True + next() nonsense.

This might not be the most glamorous solution, but it works and I'm happy enough with it.

@RunDevelopment
Copy link
Member

RunDevelopment commented Jan 7, 2024

Instead of dealing with deferred errors in one place (where you actually do the iterating), you now handle it separately in 4 places. Iterator progress is also messed up because errors silently caught by the iterator itself can't be accounted for (the issue is that expected_length isn't reliable at all anymore).

Yeah, I'm not a fan of this. Could we do the while True + next() nonsense instead?

@joeyballentine
Copy link
Member Author

sure

@joeyballentine
Copy link
Member Author

Doesn't work. I reverted the stuff in api.py back aside from setting defer_errors to true, and i replaced the for values in ... loop in process.py with the while. It doesn't work, it just errors at the first error and gives up.

Since I know you'll wanna test it yourself, I pushed my changes to a new branch called defer-errors-nonworking. Maybe I just did something wrong.

@RunDevelopment
Copy link
Member

Doesn't work.

That's my bad.

        def supplier():
            for i, x in enumerate(l):
                yield map_fn(x, i)

Obviously the generator will stop when map_fn throws. That's my mistake.

So I read through python's docs, and it just doesn't tell you what will happen when the __next__ method of a generator/iterator throws an exception beside StopIteration. Great: unspecified behavior. So I asked chatGPT, and it said that this is strongly discouraged. It told me that no one expects this and that a lot of code might just completely bug out.

So I guess the only solution compatible with python would be to roll our own Result<T, E> type. Rust had it all figured out after all.

So we need to change our API to iter_supplier: Callable[[], Iterable[Result[I]]] where Result[I] is either an item I or an Exception. We could assume that I will never be of type Exception and change the API to iter_supplier: Callable[[], Iterable[I | Exception]] instead. This would get rid of the Result type.


Also, first thing I noticed:

        iterable = iterator_output.iterator.iter_supplier()
        while True:
            try:
                values = next(iterable)  # type: ignore

This is wrong and the type error even told you that. next expects an iterator, not an iterable. An iterator is the thing that has the __next__ method, and an iterable is something that can give you can iterator via its __iter__ method. E.g. a list is the iterable that will give you an iterator over its elements. The list iterator is essentially just an index that will be incremented whenever __next__ is called.

So why did your previous code seem to work anyway? Luck. Under the hood, we use python generator functions (yield) and those return a Generator object. A Generator is both iterator and iterable, so it has both a __next__ and __iter__ method. The trick is that __iter__ just returns the generator itself.

Here's what it needs to be:

        iterator = iter(iterator_output.iterator.iter_supplier())
        while True:
            try:
                values = next(iterator)

@joeyballentine
Copy link
Member Author

joeyballentine commented Jan 7, 2024

Here's what it needs to be:

I did try that but I got rid of it because it wasn't doing anything (as in, it didn't fix my problem)

@joeyballentine
Copy link
Member Author

Also, I really don't want to implement a whole result system and make all those changes right now. That's something I actually started doing back when I first tried this and it was gonna be a lot of work.

Can we just use what this branch does for now? Even if it's not ideal?

@RunDevelopment
Copy link
Member

I really don't want to implement a whole result system and make all those changes right now.

Checkout the defer-errors-nonworking branch. I made it work with my iter_supplier: Callable[[], Iterable[I | Exception]] suggestion :) ba312c6

I did try that but I got rid of it because it wasn't doing anything (as in, it didn't fix my problem)

Sorry, I put this in a confusing way. Yes, this doesn't fix your problem. It's just also wrong.

@joeyballentine
Copy link
Member Author

Thanks :) that's a pretty elegant solution.

Just merged that into this branch

backend/src/process.py Outdated Show resolved Hide resolved
backend/src/process.py Outdated Show resolved Hide resolved
backend/src/process.py Outdated Show resolved Hide resolved
backend/src/process.py Outdated Show resolved Hide resolved
@joeyballentine joeyballentine merged commit 437c70d into main Jan 7, 2024
14 checks passed
@joeyballentine joeyballentine deleted the defer-errors branch January 7, 2024 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants