Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split: Refactor for discrete values, add tests, rename to "Text to Columns" #253

Merged
merged 1 commit into from
Nov 22, 2023

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Jul 11, 2022

  • Added tests for computation
  • Changed treatment of nans for discrete values:
    • if value is nan, it doesn't mean it does not contain particular substrings, so newly created values must be nan.
    • This required too many changes wrt to string variables, so I created a separate class for one hot encoding of discrete values,
    • which are now computed faster (each distinct value is split only once and updated for all appearances),
    • so there is also need for shared computation because it only wastes memory.
  • Renamed widget and modules.
Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd force-pushed the split-polish branch 3 times, most recently from 63544d9 to 4a0756f Compare July 12, 2022 11:53
@ajdapretnar
Copy link
Contributor

Counting nans works for discrete but not for string.

Screenshot 2022-07-18 at 15 50 01

@janezd
Copy link
Contributor Author

janezd commented Jul 20, 2022

Counting nans works for discrete but not for string.

I don't get this one. I removed the column with counts of missing values.

Trying this I discovered and fixed another bug - if categorical variable was a meta, the widget crashed.

@janezd
Copy link
Contributor Author

janezd commented Oct 23, 2022

I don't get this one. I removed the column with counts of missing values.

Maybe I do - maybe I just fell into the same trap. Have you been testing Split or Text to Columns? I renamed the widget. :)

@janezd janezd force-pushed the split-polish branch 2 times, most recently from a88258a to 7484867 Compare October 23, 2022 14:30
@erikafuna
Copy link

@ajdapretnar and @janezd, we forgot about this one, didn't we?

@codecov-commenter
Copy link

Codecov Report

Attention: 149 lines in your changes are missing coverage. Please review.

Comparison is base (5ef0d45) 33.92% compared to head (a74ea06) 35.99%.
Report is 9 commits behind head on master.

Files Patch % Lines
orangecontrib/prototypes/widgets/owchatgptbase.py 0.00% 108 Missing ⚠️
orangecontrib/prototypes/widgets/owchatgpt.py 0.00% 20 Missing ⚠️
...contrib/prototypes/widgets/owchatgptconstructor.py 0.00% 18 Missing ⚠️
...rangecontrib/prototypes/widgets/owtexttocolumns.py 93.02% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #253      +/-   ##
==========================================
+ Coverage   33.92%   35.99%   +2.06%     
==========================================
  Files           9       10       +1     
  Lines        1515     1478      -37     
==========================================
+ Hits          514      532      +18     
+ Misses       1001      946      -55     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@janezd
Copy link
Contributor Author

janezd commented Nov 21, 2023

Not at all. I seldom go to sleep without thinking about it, and @ajdapretnar and I discuss it at least once a week, sometimes even daily.

Seriously, how did you stumble upon this? Yes, it could be merged, I think. I also updated it (now that biolab/orange3#6058 is merged and released). @ajdapretnar, check and merge at will. We can also move it from prototypes; as I recall, it is used for some fields from google forms.

@markotoplak
Copy link
Member

Failing tests are also not an issue here and we'll fix them later. @erikafuna found a question posted on FB regarding this widget and she asked me which widget that was. Then we searched and found this. Merging. If @ajdapretnar has any comments, let's fix them later.

@markotoplak markotoplak changed the title Split: Refactor for discrete values, add tests, rename Split: Refactor for discrete values, add tests, rename to "Text to Columns" Nov 22, 2023
@markotoplak markotoplak merged commit f08b42f into biolab:master Nov 22, 2023
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants