Fix - Frequencies should be Double#44
Conversation
| Nothing -> (T.pack . show) c' | ||
| initDf = empty & insertVector "Statistic" (V.fromList ["Count" :: T.Text, "Percentage (%)"]) | ||
| in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * 100 `div` total]) df) initDf counts | ||
| in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * (100 :: Double) / total]) df) initDf counts |
There was a problem hiding this comment.
I am still stuck on this issue, and would very much like to learn how to fix it.
I thought maybe doing this would help:
in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [toRowValue k, toRowValue $ k * 100.0 / total]) df) initDf counts
it did not and resulted compile error:
Could not deduce ‘Read DataFrame.Internal.Row.RowValue’
arising from a use of ‘insertVector’
from the context: Columnable a
bound by a pattern with constructor:
BoxedColumn :: forall a. Columnable a => V.Vector a -> Column,
in a case alternative
There was a problem hiding this comment.
I see. I don't quite like my approach here but for something to be Columnable it has to have a Read, Ord, Eq, and Show instance.
ghci> data Test = T Int deriving (Show, Eq, Read, Ord)
ghci> D.fromList [T 1, T 2]
[T 1,T 2]RowValue doesn't have a Read instance and you can't do deriving similar to what we did in the CLI cause it's a GADT.
So to unblock you can just make a stub Read instance for RowValue which throws UNIMPLEMENTED. I don't imagine someone will try to read row values in the near future. I don't even know where they are used.
Sorry this was so painful. It ended up suffering from silly design decisions.
There was a problem hiding this comment.
You can do a quick and dirty Read instance implementation:
instance Read RowValue where
readsPrec prec ('V': 'a': 'l': 'u': 'e': ' ' : a) = readsPrec prec a -- or perhaps Value $ readsPrec prec a ?At least, that's the legacy way of doing it. The newer better way is probably to do readPrec (instead of readsPrec) and use lexP or something
| Nothing -> throw $ ColumnNotFoundException name "frequencies" (map fst $ M.toList $ columnIndices df) | ||
| Just ((BoxedColumn (column :: V.Vector a))) -> let | ||
| counts = valueCounts @a name df | ||
| counts :: [(a, Double)] |
There was a problem hiding this comment.
Can we try leaving this as an int instead?
| counts = valueCounts @a name df | ||
| counts :: [(a, Double)] | ||
| counts = second fromIntegral <$> valueCounts @a name df | ||
| total :: Double |
| Nothing -> (T.pack . show) c' | ||
| initDf = empty & insertVector "Statistic" (V.fromList ["Count" :: T.Text, "Percentage (%)"]) | ||
| in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * 100 `div` total]) df) initDf counts | ||
| in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [toRowValue k, toRowValue $ k * 100.0 / total]) df) initDf counts |
There was a problem hiding this comment.
For percentage we can do the fromIntegral just before division rather than having the counts be doubles.
We can also format the string to round it for display:
import Text.Printf (printf)
printf "%.2f\n" (3.14159 :: Double)
There was a problem hiding this comment.
The printf will result percentage to print as such:
Percentage (%) | "44.26" | "31.74" | "0.02" | "11.09" | "12.88"
I could read them back as double priort to converting to RowValue which will result in this:
1 | Percentage (%) | 44.26 | 31.74 | 2.0e-2 | 11.09 | 12.88
I'll lookinto rounding the Doubles as an alterntive
There was a problem hiding this comment.
I used the round function that you already had there and here is the result:
--------------------------------------------------------------------------------
index | Statistic | <1H OCEAN | INLAND | ISLAND | NEAR BAY | NEAR OCEAN
------|----------------|-----------|----------|----------|----------|-----------
Int | Text | RowValue | RowValue | RowValue | RowValue | RowValue
------|----------------|-----------|----------|----------|----------|-----------
0 | Count | 9136 | 6551 | 5 | 2290 | 2658
1 | Percentage (%) | 44.26 | 31.74 | 2.0e-2 | 11.09 | 12.8
Frequencies should show doubles not Int, per Issue DataHaskell#42 Add `Read` instance to RowValue, to meet the `Columnable` constraint
a423fe8 to
d08d067
Compare
Round the decimals for the frequency function
2044c41 to
0667da0
Compare
Frequencies should show doubles, not Int, per #42