Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use StripTransform in .yml file #391

Closed
veggiemike opened this issue May 19, 2024 · 13 comments
Closed

Can't use StripTransform in .yml file #391

veggiemike opened this issue May 19, 2024 · 13 comments

Comments

@veggiemike
Copy link
Contributor

Ahoy @davidpablocohn!

I just realized I cannot use a StripTransform in v1.9.0. It looks like it's never imported by listen.py. A quick audit revealed the following lines appear to be missing from the top of listen.py... Any reason any of these were excluded (e.g., they don't actually work)?

from logger.transforms.delta_transform import DeltaTransform
from logger.transforms.derived_data_transform import DerivedDataTransform
from logger.transforms.format_transform import FormatTransform
from logger.transforms.interpolation_transform import InterpolationTransform
from logger.transforms.nmea_checksum_transform import NMEAChecksumTransform
from logger.transforms.regex_replace_transform import RegexReplaceTransform
from logger.transforms.select_fields_transform import SelectFieldsTransform
from logger.transforms.split_transform import SplitTransform
from logger.transforms.strip_transform import StripTransform
from logger.transforms.subsample_transform import SubsampleTransform

I've imported all of them on my server and things seem to work, although I've honestly only tested StripTransform.

@webbpinner
Copy link
Contributor

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.

transforms:
    - class: SplitTransform
      module: logger.transforms.split_transform

@davidpablocohn
Copy link
Collaborator

davidpablocohn commented May 19, 2024 via email

@webbpinner
Copy link
Contributor

webbpinner commented May 19, 2024

Would something like this work:
Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.

#logger.readers.__init__.py

from . import cached_data_reader
from . import logfile_reader
from . import network_reader
from . import tcp_reader
from . import udp_reader
from . import redis_reader
from . import serial_reader
from . import text_file_reader
from . import database_reader
from . import composed_reader

Then listen.py would just:

import readers from logger.readers
import transforms from logger.transforms
import writers from logger.writers

This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

@veggiemike
Copy link
Contributor Author

The way I've handled this in the past is to add the module arg when invoking a reader/transform/writer that's not already imported. i.e.

transforms:
    - class: SplitTransform
      module: logger.transforms.split_transform

That hadn't occurred to me at first. In my mind that was just for local additions, like module: local.nautilus.UglyTransform.

@veggiemike
Copy link
Contributor Author

Would something like this work: Use the reader/transforms/writers/ __init__.py files to import all the readers/transforms/writers. i.e.

#logger.readers.__init__.py

from . import cached_data_reader
from . import logfile_reader
from . import network_reader
from . import tcp_reader
from . import udp_reader
from . import redis_reader
from . import serial_reader
from . import text_file_reader
from . import database_reader
from . import composed_reader

Then listen.py would just:

import readers from logger.readers
import transforms from logger.transforms
import writers from logger.writers

This approach would require any new readers/transforms/writers to be added to their corresponding __init__.py files when they are added to the repository but other files wanting to import all available readers/transforms/writers wouldn't have to change.

This doesn't take into account performance ramifications.

My kneejerk reaction is that I'd hate to have to specify the module all over the place in the .yml files for "built in" transforms, readers, or writers. (I haven't tested this, do I need to do it everywhere or just the first occurnce?) I feel like as a writer of a .yml file, I shouldn't have to know in advance which modules are imported for me already... I should be able to assume all the supplied modules are ready to use.

I like the idea of controlling it via the __init__ files instead of groups of imports in listen.py, if that can be done with minimal effort. I actually was surprised to see them all empty w/out any import or prep code in them.

Regarding performance, I'm unsure, but loading modules that don't have much/any init code in them shouldn't be that impactful. I haven't bumped into any dynamic code that runs at module load-time in any of the files of poked at. Once everything gets loaded once, and .pyc files created, I'd view it as as close to free as you can get with Python. Now, if we start trying to dynamically load things by globbing on the filesystem, that would get slow (although again, I don't know how slow).

So, I personally think we should either just add the missing imports to listen.py (with the intent to always add new imports three when we add new classes) or do it in a series of __init__ files, whichever is easier. I should mention, I did not double-check Reader/Writers when I came up with that list of missing Transforms.

@davidpablocohn
Copy link
Collaborator

davidpablocohn commented May 21, 2024 via email

@veggiemike
Copy link
Contributor Author

Okay - I'm going to have a go at implementing the init.py solution. Will keep you posted.

Cool! FYI, I haven't noticed any performance difference explicitly importing all the transform modules via listen.py

@davidpablocohn
Copy link
Collaborator

davidpablocohn commented May 21, 2024 via email

@veggiemike
Copy link
Contributor Author

The one change I've had to make (branch issue_391) is that instead of Webb's from . import cached_data_reader from . import logfile_reader I'm having to do from .cached_data_reader import CachedDataReader from .composed_reader import ComposedReader And in listen.py I've gone with from logger.readers import * Otherwise, in the code (and configs) you have to invoke it as cached_data_reader.CachedDataReader. I don't see a clean way to do it without "*" imports. Do either of you have ideas on a cleaner way to do it, or shall I go with this?

I was afraid of that. I did a quick little test myself the other day, couldn't get it right, and just figured my skills were rusty.

If we can't get "*" imports working, and we're left having to invoke as cached_data_reader.CachedDataReader, that's going to break every yaml config file in the universe and make them all that much harder to read... I'd rather live with having the list of import statements in the not-obvious listen.py.

@davidpablocohn
Copy link
Collaborator

davidpablocohn commented May 22, 2024 via email

@veggiemike
Copy link
Contributor Author

It works fine with "from logger.readers import " - it's just that importing "" is considered poor form.

I misunderstood. This seems like a perfectly legit use case for import * to me. I'd just do it. :-)

@davidpablocohn
Copy link
Collaborator

davidpablocohn commented May 22, 2024 via email

@davidpablocohn
Copy link
Collaborator

Pushed to master cbb5638..4b0b5cc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants