Consistency for function naming in Duration by giwa · Pull Request #814 · apache/spark

giwa · 2014-05-18T10:36:10Z

toFormattedString should return a formatted duration such as "10 ms", not just "10".
toString should return the plain string form of the duration, i.e. just the number of milliseconds.

Currently:

duration = Duration(10)
duration.toString()
>> "10 ms"
duration.toFormattedString()
>> "10"

It should be:

duration = Duration(10)
duration.toString()
>> "10"
duration.toFormattedString()
>> "10 ms"
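The swap can be modeled in a few lines. This is a hypothetical Python mirror, not the Scala source: the class names CurrentDuration and ProposedDuration and the snake_case method names are illustrative. It assumes Duration simply wraps a millisecond count, as the examples above imply.

```python
class CurrentDuration:
    """Hypothetical Python mirror of Duration's current behavior (not Spark code).

    Holds a length of time in milliseconds.
    """

    def __init__(self, millis: int) -> None:
        self.millis = millis

    def to_string(self) -> str:
        # Current behavior: value plus unit suffix, e.g. "10 ms"
        return f"{self.millis} ms"

    def to_formatted_string(self) -> str:
        # Current behavior: bare millisecond count, e.g. "10"
        return str(self.millis)


class ProposedDuration(CurrentDuration):
    """Same data, with the two methods swapped as this PR proposes."""

    def to_string(self) -> str:
        # Proposed: bare millisecond count
        return str(self.millis)

    def to_formatted_string(self) -> str:
        # Proposed: value plus unit suffix
        return f"{self.millis} ms"


for cls in (CurrentDuration, ProposedDuration):
    d = cls(10)
    print(cls.__name__, repr(d.to_string()), repr(d.to_formatted_string()))
```

Only the method bodies change; the stored value is identical, which is why any existing caller that parses either string would need updating.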

Could someone explain what "formatted" means here? Why does toFormattedString return only the bare millisecond count as a string?

AmplabJenkins · 2014-05-18T10:37:58Z

Can one of the admins verify this patch?

ash211 · 2014-05-18T17:59:46Z

Where else in the Spark codebase are these methods called? If we're switching their meaning we need to make sure that callers are updated to expect the new formats.

giwa · 2014-05-18T18:47:32Z

@ash211 Thank you for your comment. On second thought, this suggestion is not a good one: in the Java world, toString conventionally returns a human-readable representation.

The reason I raised this question is that I am writing a wrapper for Duration in Python. Since you are familiar with Python, could I ask a few questions?

Do we need dunder str and dunder repr in Duration? If so, what should each one return? For example:

str(Duration)
"10 ms"

repr(Duration)
"10"

BTW, toFormattedString is not used anywhere. As for toString, I don't think it is used anywhere in streaming either, though I need more time to check thoroughly.

ash211 · 2014-05-19T02:58:05Z

I'm familiar with the Python language, but much less so with conventions like str vs repr.

From this SO post, though, it looks like str should be "10 ms", while repr should be something like "Duration(10ms)":

http://stackoverflow.com/questions/1436703/difference-between-str-and-repr-in-python
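A minimal sketch of what the Python wrapper could look like under this advice. The class is hypothetical (it is not a PySpark API) and assumes the wrapper stores the duration as an integer number of milliseconds. str() gives the human-readable "10 ms", matching the current Scala toString shown above, and repr() gives an unambiguous form.

```python
class Duration:
    """Hypothetical Python wrapper for a streaming duration in milliseconds."""

    def __init__(self, millis: int) -> None:
        self.millis = millis

    def __str__(self) -> str:
        # Human-readable form, like the current Scala toString ("10 ms")
        return f"{self.millis} ms"

    def __repr__(self) -> str:
        # Unambiguous form that eval() can turn back into an equal Duration
        return f"Duration({self.millis!r})"

    def __eq__(self, other: object) -> bool:
        # Value equality, so a repr()/eval() round-trip can be compared
        if not isinstance(other, Duration):
            return NotImplemented
        return self.millis == other.millis

    def __hash__(self) -> int:
        # Keep hashing consistent with __eq__
        return hash(self.millis)


d = Duration(10)
print(str(d))              # 10 ms
print(repr(d))             # Duration(10)
print(eval(repr(d)) == d)  # True
```

One choice differs slightly from the suggestion: repr returns "Duration(10)" rather than "Duration(10ms)". The Python documentation for repr() notes that, for many types, it tries to return a string that would yield an equal object when passed to eval(), and "Duration(10ms)" is not valid Python.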

giwa closed this May 18, 2014

giwa wants to merge 1 commit into apache:branch-1.0 from giwa:patch-1

3 participants
