Text to Columns: Fix dtype for empty arrays #275

Merged
merged 1 commit into from
Feb 8, 2024

Contributor

@janezd janezd commented Feb 8, 2024

Issue

Fixes #274.

The speed-up in a74ea06 fails if the value corresponding to some variable doesn't appear in any row. The array of row indices that contain this value is then empty, and NumPy arrays created from empty input default to dtype float.

In #274, this happened because tables are converted in batches of 5000 rows, and there were batches that didn't contain some values.

In another scenario, this would happen if the domain conversion were applied to data other than the data on which it was created.

Description of changes

Add dtype=int.
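
For illustration, a minimal sketch of the underlying NumPy behaviour, not the actual widget code. The names `rows_with_value` and `column` are hypothetical stand-ins for the list of row indices and the output column; the sketch assumes the indices are used to index an array.

```python
import numpy as np

# Hypothetical stand-in: indices of rows in which a value occurs.
# In the failing case the value occurs in no row, so the list is empty.
rows_with_value = []

# Without an explicit dtype, an array built from empty input is float.
as_default = np.array(rows_with_value)
# With dtype=int, the empty array can still be used as an index.
as_int = np.array(rows_with_value, dtype=int)

column = np.zeros(5)

# Indexing with a float array is rejected, even when the array is empty.
try:
    column[as_default] = 1
    float_index_ok = True
except IndexError:
    float_index_ok = False

# Indexing with the empty integer array is a harmless no-op.
column[as_int] = 1
```

With the explicit dtype, a batch in which a value never occurs leaves the corresponding column unchanged instead of raising an error.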

Includes
  • Code changes
  • Tests

@janezd janezd merged commit 8ec69b9 into biolab:master Feb 8, 2024
12 checks passed
Successfully merging this pull request may close these issues.

Text to Columns: allow more than 1000 records as input