Can't use StripTransform in .yml file #391

veggiemike · 2024-05-19T08:06:22Z

I just realized I cannot use a StripTransform in v1.9.0. It looks like it's never imported by listen.py. A quick audit revealed the following lines appear to be missing from the top of listen.py... Any reason any of these were excluded (e.g., they don't actually work)?

from logger.transforms.delta_transform import DeltaTransform
from logger.transforms.derived_data_transform import DerivedDataTransform
from logger.transforms.format_transform import FormatTransform
from logger.transforms.interpolation_transform import InterpolationTransform
from logger.transforms.nmea_checksum_transform import NMEAChecksumTransform
from logger.transforms.regex_replace_transform import RegexReplaceTransform
from logger.transforms.select_fields_transform import SelectFieldsTransform
from logger.transforms.split_transform import SplitTransform
from logger.transforms.strip_transform import StripTransform
from logger.transforms.subsample_transform import SubsampleTransform

I've imported all of them on my server and things seem to work, although I've honestly only tested StripTransform.

webbpinner · 2024-05-19T11:39:09Z

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.

transforms:
    - class: SplitTransform
      module: logger.transforms.split_transform
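
For anyone wondering how a 'module' key like this can be resolved at runtime, here is a minimal sketch using importlib. It is not the actual OpenRVDAS code: resolve_class and the preloaded registry are hypothetical names, and a stdlib class stands in for a real transform so the snippet runs anywhere.

```python
import importlib

def resolve_class(spec, preloaded=None):
    """Return the class named by a config entry.

    spec is a dict such as {'class': 'SplitTransform',
    'module': 'logger.transforms.split_transform'}. If 'module' is absent,
    the class must already be present in the (hypothetical) preloaded registry.
    """
    preloaded = preloaded or {}
    class_name = spec["class"]
    module_name = spec.get("module")
    if module_name is None:
        if class_name not in preloaded:
            raise ValueError(
                f"{class_name} is not imported yet; add a 'module' key"
            )
        return preloaded[class_name]
    # Import the named module on demand and pull the class out of it.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Stand-in usage: a stdlib class plays the role of a transform.
cls = resolve_class({"class": "OrderedDict", "module": "collections"})
print(cls.__name__)
```

With a lookup like this, a class only has to be imported when a config actually names it, at the cost of every config author having to know the module path.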

davidpablocohn · 2024-05-19T13:14:42Z

Hi Mike! Good catch. This is basically me being inconsistent in design philosophy and driving down the middle of the road. When I originally wrote listen.py, I had it import every reader/writer/transform under the sun. Then, as transforms multiplied, I felt like it started to get cumbersome - should it really include everything, even though most of it won't ever be used? How much does that slow down startup (importing, reading, parsing all that extra code)? So figured that, beyond the original set, I'd just count on folks specifying where each component lived, using the 'module' specification, as Webb describes above. But I realize I've not made that explicit in the documentation. Would love your thoughts (and Webb's) on which way to go on this.

webbpinner · 2024-05-19T13:50:09Z

Would something like this work:
Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.

#logger.readers.__init__.py

from . import cached_data_reader
from . import logfile_reader
from . import network_reader
from . import tcp_reader
from . import udp_reader
from . import redis_reader
from . import serial_reader
from . import text_file_reader
from . import database_reader
from . import composed_reader

Then listen.py would just:

from logger import readers
from logger import transforms
from logger import writers

This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

veggiemike · 2024-05-19T20:32:36Z

> The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.
>
> transforms:
>     - class: SplitTransform
>       module: logger.transforms.split_transform

That hadn't occurred to me at first. In my mind that was just for local additions, like module: local.nautilus.UglyTransform.

veggiemike · 2024-05-19T20:48:09Z

> Would something like this work:
> Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.
>
> #logger.readers.__init__.py
>
> from . import cached_data_reader
> from . import logfile_reader
> from . import network_reader
> from . import tcp_reader
> from . import udp_reader
> from . import redis_reader
> from . import serial_reader
> from . import text_file_reader
> from . import database_reader
> from . import composed_reader
>
> Then listen.py would just:
>
> from logger import readers
> from logger import transforms
> from logger import writers
>
> This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.
>
> This doesn't take into account performance ramifications.

My kneejerk reaction is that I'd hate to have to specify the module all over the place in the .yml files for "built in" transforms, readers, or writers. (I haven't tested this, do I need to do it everywhere or just the first occurrence?) I feel like as a writer of a .yml file, I shouldn't have to know in advance which modules are imported for me already... I should be able to assume all the supplied modules are ready to use.

I like the idea of controlling it via the __init__ files instead of groups of imports in listen.py, if that can be done with minimal effort. I actually was surprised to see them all empty w/out any import or prep code in them.

Regarding performance, I'm unsure, but loading modules that don't have much/any init code in them shouldn't be that impactful. I haven't bumped into any dynamic code that runs at module load-time in any of the files I've poked at. Once everything gets loaded once, and .pyc files created, I'd view it as close to free as you can get with Python. Now, if we start trying to dynamically load things by globbing on the filesystem, that would get slow (although again, I don't know how slow).
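
If anyone wants numbers rather than a hunch, a rough sketch like the one below can time individual imports. The module names are stdlib stand-ins, not the OpenRVDAS transforms, and time_import is a hypothetical helper; a module that is already in sys.modules will come back almost instantly, so this only says something about first-time imports.

```python
import importlib
import sys
import time

def time_import(module_name):
    """Return (seconds, was_cached) for importing module_name in this process."""
    was_cached = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start, was_cached

# Stand-in modules; substitute the real transform modules to measure them.
for name in ("colorsys", "fractions", "statistics"):
    seconds, was_cached = time_import(name)
    note = " (already imported)" if was_cached else ""
    print(f"{name}: {seconds * 1000:.3f} ms{note}")
```

For a whole-program view, running the interpreter with -X importtime (Python 3.7 and later) prints a per-module breakdown of import time to stderr.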

So, I personally think we should either just add the missing imports to listen.py (with the intent to always add new imports there when we add new classes) or do it in a series of __init__ files, whichever is easier. I should mention, I did not double-check Readers/Writers when I came up with that list of missing Transforms.

davidpablocohn · 2024-05-21T21:28:17Z

Okay - I'm going to have a go at implementing the __init__.py solution. Will keep you posted.

veggiemike · 2024-05-21T21:56:04Z

> Okay - I'm going to have a go at implementing the __init__.py solution. Will keep you posted.

Cool! FYI, I haven't noticed any performance difference explicitly importing all the transform modules via listen.py

davidpablocohn · 2024-05-21T22:32:11Z

The one change I've had to make (branch issue_391) is that instead of Webb's

from . import cached_data_reader
from . import logfile_reader

I'm having to do

from .cached_data_reader import CachedDataReader
from .composed_reader import ComposedReader

And in listen.py I've gone with

from logger.readers import *

Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "*" imports. Do either of you have ideas on a cleaner way to do it, or shall I go with this?
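
To make the difference between the two __init__.py styles concrete, here is a small self-contained sketch. It builds two throwaway packages in a temp directory (readers_by_module and readers_by_class are made-up names, not part of OpenRVDAS) and lists which names a star import brings in for each.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def build_package(root, name, init_source):
    """Create a tiny package with one reader module and the given __init__.py."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "cached_data_reader.py").write_text(
        "class CachedDataReader:\n    pass\n"
    )
    (pkg / "__init__.py").write_text(init_source)

def star_import_names(package_name):
    """Return the public names that 'from <package> import *' binds."""
    namespace = {}
    exec(f"from {package_name} import *", namespace)
    return sorted(n for n in namespace if not n.startswith("__"))

with tempfile.TemporaryDirectory() as root:
    # Style 1: __init__.py imports the submodule itself.
    build_package(root, "readers_by_module",
                  "from . import cached_data_reader\n")
    # Style 2: __init__.py imports the class out of the submodule.
    build_package(root, "readers_by_class",
                  "from .cached_data_reader import CachedDataReader\n")
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        by_module = star_import_names("readers_by_module")
        by_class = star_import_names("readers_by_class")
    finally:
        sys.path.remove(root)

print("module-style exports:", by_module)
print("class-style exports: ", by_class)
```

Only the class-style package exposes CachedDataReader as a bare name; with the module-style package the class is reachable only as cached_data_reader.CachedDataReader.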

veggiemike · 2024-05-21T23:12:20Z

> The one change I've had to make (branch issue_391) is that instead of Webb's
>
> from . import cached_data_reader
> from . import logfile_reader
>
> I'm having to do
>
> from .cached_data_reader import CachedDataReader
> from .composed_reader import ComposedReader
>
> And in listen.py I've gone with
>
> from logger.readers import *
>
> Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "*" imports. Do either of you have ideas on a cleaner way to do it, or shall I go with this?

I was afraid of that. I did a quick little test myself the other day, couldn't get it right, and just figured my skills were rusty.

If we can't get "*" imports working, and we're left having to invoke as cached_data_reader.CachedDataReader, that's going to break every yaml config file in the universe and make them all that much harder to read... I'd rather live with having the list of import statements in the not-obvious listen.py.

davidpablocohn · 2024-05-22T00:08:03Z

It works fine with "from logger.readers import *" - it's just that importing "*" is considered poor form.

veggiemike · 2024-05-22T00:19:08Z

> It works fine with "from logger.readers import *" - it's just that importing "*" is considered poor form.

I misunderstood. This seems like a perfectly legit use case for import * to me. I'd just do it. :-)
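
One way to keep a star import tidy, if it ever becomes a concern, is to define __all__ in the package so only the intended names are exported. This is a sketch, not what the issue_391 branch does: demo_transforms is a made-up in-memory module and the two classes are empty stand-ins.

```python
import sys
import types

# Build a throwaway module in memory so the example needs no files on disk.
mod = types.ModuleType("demo_transforms")
exec(
    "import os\n"  # a helper import we do NOT want to re-export
    "class StripTransform:\n    pass\n"
    "class SplitTransform:\n    pass\n"
    "__all__ = ['StripTransform', 'SplitTransform']\n",
    mod.__dict__,
)
sys.modules["demo_transforms"] = mod

# Perform the star import into a scratch namespace and inspect the result.
namespace = {}
exec("from demo_transforms import *", namespace)
exported = sorted(n for n in namespace if not n.startswith("__"))
print(exported)  # ['SplitTransform', 'StripTransform']
```

Because __all__ lists only the two classes, the os import inside the module does not leak into whoever does the star import.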

davidpablocohn · 2024-05-22T00:20:54Z

Will do! Running tests now...

davidpablocohn · 2024-05-22T14:39:23Z

Pushed to master cbb5638..4b0b5cc

davidpablocohn added a commit that referenced this issue May 21, 2024

Include all readers/transforms/writers in listen.py. #391

a3ccb55

davidpablocohn closed this as completed May 22, 2024

KaarelRaeis-SOI pushed a commit to schmidtocean/openrvdas that referenced this issue Jul 26, 2024

Include all readers/transforms/writers in listen.py. OceanDataTools#391

b99f135

KaarelRaeis-SOI pushed a commit to schmidtocean/openrvdas that referenced this issue Aug 23, 2024

Include all readers/transforms/writers in listen.py. OceanDataTools#391

ebc26ba
