Text to Columns: allow more than 1000 records as input #274

Closed
wvdvegte opened this issue Jan 18, 2024 · 5 comments · Fixed by #275
@wvdvegte

Prototypes version

0.21.1

Orange version

3.36.2

Expected behavior

Text to Columns works with an arbitrary number of rows

Actual behavior

Text to Columns crashes when it receives more than 1000 rows at its input

Steps to reproduce the behavior

Connect a suitable data file to the input of Text to Columns that has more than 1000 rows

Additional info (worksheets, data, screenshots, ...)
@janezd
Contributor

janezd commented Jan 22, 2024

It works for me - and I don't see a reason why it would fail.

But then, I only tried with a file with 1200 lines of a b c, which may be too simple and may mask something. Can you share your data?

@wvdvegte
Author

Sure. The dataset is included in the attached workflow as a URL on Google Drive. It's a list of 5499 delayed train rides, where each ride has a list of comma-separated station abbreviations that we want to run through Text to Columns. I have to correct myself though: the maximum number of rows that can be handled appears to be 5000, not 1000. In the attached workflow, if you set Data Sampler to sample 5001 or more rows, or if you bypass Data Sampler, the following error message pops up:

Error encountered in widget Text to Columns:

Traceback (most recent call last):
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecanvas/scheme/signalmanager.py", line 1180, in __process_next
    if self.__process_next_helper(use_max_active=True):
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecanvas/scheme/signalmanager.py", line 1218, in __process_next_helper
    self.process_node(selected_node)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecanvas/scheme/signalmanager.py", line 846, in process_node
    self.send_to_node(node, signals_in)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/workflow/widgetsscheme.py", line 806, in send_to_node
    self.process_signals_for_widget(node, widget, signals)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/workflow/widgetsscheme.py", line 820, in process_signals_for_widget
    process_signals_for_widget(widget, signals, workflow)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/functools.py", line 888, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/workflow/widgetsscheme.py", line 923, in process_signals_for_widget
    process_signal_input(input_meta, widget, signal, workflow)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/functools.py", line 888, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/workflow/widgetsscheme.py", line 886, in process_signal_input_default
    notify_input_helper(
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/functools.py", line 888, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/utils/signals.py", line 735, in set_input_helper
    handler(*args)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/utils/signals.py", line 208, in summarize_wrapper
    method(widget, value)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/prototypes/widgets/owtexttocolumns.py", line 144, in set_data
    self.apply.now()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/gui.py", line 2007, in do_commit
    commit.call()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangewidget/gui.py", line 1872, in call
    acting_func(instance)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/prototypes/widgets/owtexttocolumns.py", line 170, in apply
    extended_data = self.data.transform(new_domain)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 868, in transform
    return type(self).from_table(domain, self)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 832, in from_table
    table_conversion.convert(source, row_indices,
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 433, in convert
    out = array_conv.get_columns(source, source_indices,
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 313, in get_columns
    _compute_column(col, sourceri, shared_data=shared))
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 210, in _compute_column
    col = func(*args, **kwargs)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/util.py", line 89, in __call__
    return self.compute(data, shared_data)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/prototypes/widgets/owtexttocolumns.py", line 54, in compute
    col[indices] = 1
IndexError: arrays used as indices must be of integer (or boolean) type

railway rides.ows.zip

@janezd janezd transferred this issue from biolab/orange3-prototypes Feb 8, 2024
@janezd janezd transferred this issue from biolab/orange3 Feb 8, 2024
@janezd
Contributor

janezd commented Feb 8, 2024

Thank you, this one was great.

If you need a quick fix (God knows when we'll release the next version of Prototypes), find the file owtexttocolumns.py and change

return {v: np.array([i for i, xs in enumerate(values) if v in xs])

to

return {v: np.array([i for i, xs in enumerate(values) if v in xs], dtype=int)

That is, add , dtype=int.

It took me 10 minutes to fix this, and more than an hour to understand why this is sometimes needed. In short, the table is constructed in batches of 5000 rows (I didn't know this!) and the widget failed if some station code never appeared in some batch.
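
For anyone who wants to see the failure in isolation, here is a minimal sketch of what appears to be going on. It is not the widget's code: the station codes, the batch size of 2 (standing in for 5000), and the helper names row_indices and mark are made up for illustration. The relevant NumPy behavior is that np.array([]) built from an empty list defaults to float64, and a float array cannot be used as an index, even when it is empty.

```python
import numpy as np

# Hypothetical data: each row is the set of station codes for one ride.
rows = [{"Ut", "Asd"}, {"Asd"}, {"Asd"}, {"Asd"}]
BATCH = 2  # stands in for the 5000-row batches; chosen so "Ut" is missing from batch 2
batches = [rows[i:i + BATCH] for i in range(0, len(rows), BATCH)]


def row_indices(code, batch, **kwargs):
    """Indices of the rows in `batch` that contain `code` (illustrative helper)."""
    return np.array([i for i, xs in enumerate(batch) if code in xs], **kwargs)


def mark(code, batch, **kwargs):
    """Return a 0/1 column marking the rows of `batch` that contain `code`."""
    col = np.zeros(len(batch))
    col[row_indices(code, batch, **kwargs)] = 1
    return col


# Batch 1 contains "Ut", so the index array holds integers and indexing works.
print(mark("Ut", batches[0]))

# Batch 2 does not: the index array is empty and therefore float64.
print(row_indices("Ut", batches[1]).dtype)
try:
    mark("Ut", batches[1])
except IndexError as err:
    print("IndexError:", err)

# With dtype=int the empty array is a valid (no-op) index.
print(mark("Ut", batches[1], dtype=int))
```

This also explains why small inputs never triggered the error: with a single batch, every value found in the data necessarily occurs in at least one row of that batch, so the index array is never empty.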

Interestingly, just yesterday I encountered the very same problem in a completely different place.

@wvdvegte
Author

wvdvegte commented Feb 9, 2024

5000 seems a strange number in the context of computing - 4096 would have surprised me less ...

@janezd
Contributor

janezd commented Feb 9, 2024

True. :)

But in this particular context there is no advantage in using round numbers like 4096, so any arbitrary number, including 5000, is OK. :)
