Normalization of NaN is not working as intended #23

mrckzgl · 2024-07-08T15:03:30Z

The data class tries to normalize na / nan values into empty strings.
This is done here:

Lines 126 to 130 in a668c98

    self.dataset_1 = self.dataset_1.astype(str)
    self.dataset_1.fillna("", inplace=True)
    if not self.is_dirty_er:
        self.dataset_2 = self.dataset_2.astype(str)
        self.dataset_2.fillna("", inplace=True)

However, this does not work as intended. Casting the DataFrame to str turns every NaN value into the string "nan", so the subsequent fillna call finds nothing left to fill.
See:

>>> pandas.DataFrame.isnull(pandas.DataFrame([numpy.nan]).astype(float))
      0
0  True
>>> pandas.DataFrame.isnull(pandas.DataFrame([numpy.nan]).astype(str))
       0
0  False
>>>

That said, I do not know the best way to handle the intended conversion. One option would be to swap the order: call fillna first and cast to string afterwards. But I don't know what happens if fillna('', inplace=True) is applied to columns whose dtype is not a string dtype or is incompatible with one.
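One way to sidestep the dtype question is sketched below. It assumes a hypothetical frame `df` with float, object and integer columns (none of these names come from the project), and it casts to `object` before filling so that the empty string is a valid value in every column. This is only an illustration of the fill-then-cast order, not the project's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing float, object and integer columns.
df = pd.DataFrame(
    {
        "price": [1.5, np.nan],
        "name": ["a", None],
        "count": [1, 2],
    }
)

# Cast to object first so "" can be stored in any column,
# then fill the missing values, and only then cast to str.
fill_first = df.astype(object).fillna("").astype(str)

print(fill_first)
```

With this order, missing entries in both the float and the object column should end up as empty strings, while the existing values are stringified as usual.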

best


mrckzgl · 2024-07-08T15:14:44Z

I also wonder if it is necessary and good practice to convert the dataframe to string, as then there is no distinction between na and empty string anymore ...
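For illustration, pandas' nullable `"string"` dtype is one option that would keep that distinction. The sketch below uses a hypothetical Series `s`; whether the rest of the library could cope with `pd.NA` values is an open question and not something this thread confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a genuine empty string and a missing value.
s = pd.Series(["a", "", np.nan], dtype=object)

# The nullable "string" dtype stores text but keeps missing entries
# as missing instead of converting them into text.
as_string = s.astype("string")

print(as_string.isna().tolist())  # [False, False, True]
```

Here the empty string and the missing value remain distinguishable after the conversion.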

Nikoletos-K · 2024-07-19T07:58:05Z

Hello, and I'm sorry for the late reply.

Yes, you're right on both remarks. The NaN handling indeed has no effect as written, so I think swapping the order of the two lines will do the trick.

        # Fill NaN values with empty string
        self.dataset_1.fillna("", inplace=True)
        self.dataset_1 = self.dataset_1.astype(str)
        if not self.is_dirty_er:
            self.dataset_2.fillna("", inplace=True)
            self.dataset_2 = self.dataset_2.astype(str)

As for the conversion to str, it is necessary to ensure that no other types have to be handled. Other types caused issues in many later steps, which is why we decided to handle it this way.

The above fix will be uploaded in the next release.

Thank you for sharing it with us!

Konstantinos

mrckzgl changed the title ~~Normalization of NaN~~ Normalization of NaN is not working as intended Jul 8, 2024

Nikoletos-K self-assigned this Jul 17, 2024

Nikoletos-K added a commit that referenced this issue Jul 19, 2024

Fixed issues #22 and #23;

b360040

Nikoletos-K closed this as completed Jul 19, 2024
