
Minor doc updates #9217

Merged
jon-wei merged 4 commits into apache:master from suneet-s:docs-part-2
Jan 20, 2020

@suneet-s
Contributor

Update Kafka ingestion specs in tutorial docs to use the new inputSpec instead of parseSpec
Update first/last aggregator docs to remove filterNullValues

- "maxStringBytes" : <integer> # (optional, defaults to 1024),
- "filterNullValues" : <boolean> # (optional, defaults to false)
+ "maxStringBytes" : <integer> # (optional, defaults to 1024)
  }
Contributor Author

FYI @maytasm3

Contributor

Thanks. Looks good
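
For reference, a stringFirst aggregator spec written against the updated docs might look like the sketch below. The keys `type`, `name`, `fieldName`, and `maxStringBytes` follow the aggregator shape shown in the diff; the helper `string_first_spec` and the column names are hypothetical and only serve the illustration.

```python
import json


def string_first_spec(name, field_name, max_string_bytes=1024):
    """Build a stringFirst aggregator spec without the removed filterNullValues key.

    string_first_spec is a hypothetical helper, not part of Druid.
    """
    if max_string_bytes <= 0:
        raise ValueError("max_string_bytes must be positive")
    return {
        "type": "stringFirst",
        "name": name,
        "fieldName": field_name,
        # Optional in the docs; 1024 is the documented default.
        "maxStringBytes": max_string_bytes,
    }


if __name__ == "__main__":
    # "first_page" and "page" are illustrative names, not from the PR.
    print(json.dumps(string_first_spec("first_page", "page"), indent=2))
```

The same shape should apply to stringLast by swapping the `type` value.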

}
]
},
"parser": {
Contributor

You can remove the parser here.

Contributor Author

🤦‍♂ done

Contributor

jihoonson left a comment

LGTM. Thanks!
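
The parser removal requested in the review above can be pictured as a small rewrite of the spec. The following is a rough sketch, not the change made in this PR: `promote_parse_spec` is a hypothetical helper, the sample spec is invented, and it assumes a simple parser whose parseSpec carries `format`, `timestampSpec`, and `dimensionsSpec` (formats that need extra settings, such as csv columns, are not handled).

```python
import copy


def promote_parse_spec(data_schema, io_config):
    """Return (data_schema, io_config) rewritten without the legacy parser.

    Hypothetical helper for illustration only.
    """
    schema = copy.deepcopy(data_schema)
    io = copy.deepcopy(io_config)

    parser = schema.pop("parser", None)
    if parser is None:
        return schema, io  # nothing to migrate

    parse_spec = parser["parseSpec"]
    # timestampSpec and dimensionsSpec move up into the dataSchema.
    for key in ("timestampSpec", "dimensionsSpec"):
        if key in parse_spec:
            schema[key] = parse_spec[key]
    # The data format is described in the ioConfig as inputFormat.
    io["inputFormat"] = {"type": parse_spec["format"]}
    return schema, io


if __name__ == "__main__":
    # Invented sample values; only the key layout matters here.
    old_schema = {
        "dataSource": "wikipedia",
        "parser": {
            "type": "string",
            "parseSpec": {
                "format": "json",
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["page"]},
            },
        },
    }
    old_io = {"topic": "wikipedia"}
    new_schema, new_io = promote_parse_spec(old_schema, old_io)
    print(new_schema)
    print(new_io)
```

The inputs are deep-copied so the original spec stays available for comparison.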

jon-wei added this to the 0.17.0 milestone Jan 19, 2020
jon-wei merged commit 180c622 into apache:master Jan 20, 2020
suneet-s added a commit to suneet-s/druid that referenced this pull request Jan 20, 2020
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
suneet-s deleted the docs-part-2 branch January 21, 2020 04:36
fjy pushed a commit that referenced this pull request Jan 21, 2020
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
jon-wei added a commit to implydata/druid-public that referenced this pull request Jan 23, 2020
* add middle manager and indexer worker category to tier column of services view (apache#9158) (apache#9167)

* Graduation update for ASF release process guide and download links (apache#9126) (apache#9160)

* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml

* Add numeric nulls to sample data, fix some numeric null handling issues (apache#9154) (apache#9175)

* Fix LongSumAggregator comparator null handling

* Remove unneeded GroupBy test change

* Checkstyle

* Update other processing tests for new sample data

* Remove unused code

* Fix SearchQueryRunner column selectors

* Fix DimensionIndexer null handling and ScanQueryRunnerTest

* Fix TeamCity errors

* Add jackson-mapper-asl for hdfs-storage extension (apache#9178) (apache#9185)

Previously jackson-mapper-asl was excluded to remove a security
vulnerability; however, it is required for functionality (e.g.,
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator).

* Suppress CVE-2019-20330 for htrace-core-4.0.1 (apache#9189) (apache#9191)

CVE-2019-20330 was updated on 14 Jan 2020, which now gets flagged by the
security vulnerability scan. Since the CVE is for jackson-databind, via
htrace-core-4.0.1, it can be added to the existing list of security
vulnerability suppressions for that dependency.

* Fix deserialization of maxBytesInMemory (apache#9092) (apache#9170)

* Fix deserialization of maxBytesInMemory

* Add maxBytes check

Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>

* Update Kinesis resharding information about task failures (apache#9104) (apache#9201)

* fix refresh button (apache#9195) (apache#9203)

Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>

* allow empty values to be set in the auto form (apache#9198) (apache#9206)

Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>

* fix null handling for arithmetic post aggregator comparator (apache#9159) (apache#9202)

* fix null handling for arithmetic postagg comparator, add test for comparator for min/max/quantile postaggs in histogram ext

* fix

* Link javaOpts to middlemanager runtime.properties docs (apache#9101) (apache#9204)

* Link javaOpts to middlemanager runtime.properties docs

* fix broken link

* reword config links

* Tutorials use new ingestion spec where possible (apache#9155) (apache#9205)

* Tutorials use new ingestion spec where possible

There are 2 main changes:
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks (1) is the equivalent
of an index task.

Instead of using a parseSpec, dimensionsSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported:
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A combining firehose does not support index_parallel

* fix typo

* Fix TSV bugs (apache#9199) (apache#9213)

* working

* - support multi-char delimiter for tsv
- respect "delimiter" property for tsv

* default value check for findColumnsFromHeader

* remove CSVParser to have a true and only CSVParser

* fix tests

* fix another test

* Fix LATEST / EARLIEST Buffer Aggregator does not work on String column (apache#9197) (apache#9210)

* fix buff limit bug

* add tests

* add test

* add tests

* fix checkstyle

* Doc update for the new input source and the new input format (apache#9171) (apache#9214)

* Doc update for new input source and input format.

- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are located in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs is documented in docs/ingestion/native-batch.md

* parquet

* add warning for range partitioning with sequential mode

* hdfs + s3, gs

* add fs impl for gs

* address comments

* address comments

* gcs

* [0.17.0] Speed up String first/last aggregators when folding isn't needed. (apache#9181) (apache#9215)

* Speed up String first/last aggregators when folding isn't needed. (apache#9181)

* Speed up String first/last aggregators when folding isn't needed.

Examines the value column, and disables fold checking via a needsFoldCheck
flag if that column can't possibly contain SerializableLongStringPairs. This
is helpful because it avoids calling getObject on the value selector when
unnecessary; say, because the time selector didn't yield an earlier or later
value.

* PR comments.

* Move fastLooseChop to StringUtils.

* actually fix conflict correctly

* remove unused import

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>

* fix topn aggregation on numeric columns with null values (apache#9183) (apache#9219)

* fix topn issue with aggregating on numeric columns with null values

* adjustments

* rename

* add more tests

* fix comments

* more javadocs

* computeIfAbsent

* first/last aggregators and nulls (apache#9161) (apache#9233)

* null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs

* initially null or not based on config

* review stuff, make string first/last consistent with null handling of numeric columns, more tests

* docs

* handle nil selectors, revert to primitive first/last types so groupby v1 works...

* Minor doc updates (apache#9217) (apache#9230)

* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec

* [Backport] Update docs for extensions (apache#9218) (apache#9228)

Backport of apache#9218 to 0.17.0.

* More tests for range partition parallel indexing (apache#9232) (apache#9236)

Add more unit tests for range partition native batch parallel indexing.

Also, fix a bug where ParallelIndexPhaseRunner incorrectly thinks that
identical collected DimensionDistributionReports are not equal due to
not overriding equals() in DimensionDistributionReport.

* Support both IndexTuningConfig and ParallelIndexTuningConfig for compaction task (apache#9222) (apache#9237)

* Support both IndexTuningConfig and ParallelIndexTuningConfig for compaction task

* tuningConfig module

* fix tests

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Chi Cao Minh <chi.caominh@gmail.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Maytas Monsereenusorn <52679095+maytasm3@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
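
The tutorial-spec entry (apache#9155) in the commit message above describes moving batch tutorials to index_parallel with inputSource and inputFormat. A minimal sketch of that layout follows. The helper name, data source, file paths, and column names are assumptions chosen for illustration; only the overall key layout is taken from the description.

```python
import json


def local_json_index_parallel_spec(data_source, base_dir, file_filter):
    """Assemble a minimal index_parallel task spec (hypothetical helper)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # Promoted to the dataSchema instead of living in a parseSpec.
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["channel", "page", "user"]},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # inputSource + inputFormat replace firehose + parser.
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
            },
            # With 1 concurrent subtask this is described as equivalent to an index task.
            "tuningConfig": {
                "type": "index_parallel",
                "maxNumConcurrentSubTasks": 1,
            },
        },
    }


if __name__ == "__main__":
    # Illustrative arguments; adjust to a real file location before use.
    spec = local_json_index_parallel_spec("example", "data/", "events.json")
    print(json.dumps(spec, indent=2))
```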