Skip to content

check paths used for shuffle intermediary data manager get and delete#9630

Merged
jihoonson merged 5 commits intoapache:masterfrom
clintropolis:sanitize-shuffle-file-path
Apr 7, 2020
Merged

check paths used for shuffle intermediary data manager get and delete#9630
jihoonson merged 5 commits intoapache:masterfrom
clintropolis:sanitize-shuffle-file-path

Conversation

@clintropolis
Copy link
Member

@clintropolis clintropolis commented Apr 6, 2020

Description

This PR fixes an issue spotted by LGTM where strings provided by an HTTP request to the ShuffleResource used by native batch indexing were not checked, allowing paths to be supplied to delete, and to a lesser extent, read files outside of the intended storage directory.

Both of the fix areas are flagged with: https://lgtm.com/rules/5970070/

The added tests would fail prior to this PR and serve to illustrate the bug. I moved logic of DataSchema. validateDatasourceName and RandomIdUtils into druid-core under TaskIdUtils so that this logic could be recycled to ensure that the supervisorTaskId are chill to use.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • IntermediaryDataManager
  • RandomIdUtils -> TaskIdUtils

@lgtm-com
Copy link

lgtm-com bot commented Apr 6, 2020

This pull request fixes 2 alerts when merging c009e2e into fc2897d - view on LGTM.com

fixed alerts:

  • 2 for Uncontrolled data used in path expression

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Nullable
public File findPartitionFile(String supervisorTaskId, String subTaskId, Interval interval, int partitionId)
{
TaskIdUtils.validateId("supervisorTaskId", supervisorTaskId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving the validation up the stack to right when the external data enters druid (i.e., in the API resource layer like ShuffleResource.getPartition()/deletePartitions())? For example, in the future, there may be clients of the lower level code that do not need to do the validation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems safer to me to be here where the problem is possible so that any future potential caller wouldn't need to worry about forgetting to do this and avoid the possibility.

@lgtm-com
Copy link

lgtm-com bot commented Apr 7, 2020

This pull request fixes 2 alerts when merging d120c28 into 79522f3 - view on LGTM.com

fixed alerts:

  • 2 for Uncontrolled data used in path expression

@jihoonson jihoonson merged commit d267b1c into apache:master Apr 7, 2020
@clintropolis clintropolis deleted the sanitize-shuffle-file-path branch April 7, 2020 18:52
clintropolis added a commit to clintropolis/druid that referenced this pull request Apr 7, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
fjy pushed a commit that referenced this pull request Apr 7, 2020
…#9630) (#9640)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
clintropolis added a commit to implydata/druid-public that referenced this pull request Apr 7, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
clintropolis added a commit to implydata/druid-public that referenced this pull request Apr 7, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
jon-wei pushed a commit to implydata/druid-public that referenced this pull request Apr 9, 2020
…apache#9630) (#78)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
ccaominh pushed a commit to implydata/druid-public that referenced this pull request Apr 9, 2020
…apache#9630) (#77)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jun 12, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
suneet-s pushed a commit to suneet-s/druid that referenced this pull request Jul 14, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java
suneet-s pushed a commit to suneet-s/druid that referenced this pull request Jul 14, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java
suneet-s added a commit to implydata/druid-public that referenced this pull request Jul 15, 2020
…ache#9630)(apache#9666) (#146)

* Indexing Service validates externally received taskId (apache#9666)

Addresses issues flagged by https://lgtm.com/rules/5970070/

* check paths used for shuffle intermediary data manager get and delete (apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java

* Add validation for authenticator and authorizer name (apache#10106)

* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix

Conflicts:
	core/src/main/java/org/apache/druid/common/utils/IdUtils.java
	extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisor.java
	extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/supervisor/KinesisSupervisor.java
	integration-tests/src/main/java/org/apache/druid/testing/utils/KafkaEventWriter.java
	 integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractKafkaIndexerTest.java

* Add commons-text to server dependencies

Co-authored-by: Clint Wylie <cwylie@apache.org>
suneet-s pushed a commit to suneet-s/druid that referenced this pull request Jul 16, 2020
…apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java
jon-wei pushed a commit to implydata/druid-public that referenced this pull request Jul 20, 2020
…ache#9630)(apache#9666) (#151)

* Indexing Service validates externally received taskId (apache#9666)

Addresses issues flagged by https://lgtm.com/rules/5970070/

* check paths used for shuffle intermediary data manager get and delete (apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java

* Add validation for authenticator and authorizer name (apache#10106)

* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix

Conflicts:
	core/src/main/java/org/apache/druid/common/utils/IdUtils.java
	core/src/test/java/org/apache/druid/common/utils/IdUtilsTest.java
	extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaConsumerConfigs.java
	extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisor.java
	extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/supervisor/KinesisSupervisor.java
	indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java
	indexing-service/src/main/java/org/apache/druid/indexing/worker/IntermediaryDataManager.java
	integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractKafkaIndexerTest.java
	server/src/main/java/org/apache/druid/segment/indexing/DataSchema.java

* Add commons-text to server dependencies

* remove unsupported endpoints from ITBasicAuthConfigurationTest

Co-authored-by: Clint Wylie <cwylie@apache.org>
jon-wei pushed a commit to implydata/druid-public that referenced this pull request Jul 20, 2020
…ache#9630)(apache#9666) (#147)

* Indexing Service validates externally received taskId (apache#9666)

Addresses issues flagged by https://lgtm.com/rules/5970070/

* check paths used for shuffle intermediary data manager get and delete (apache#9630)

* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
 Conflicts:
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java

* Add validation for authenticator and authorizer name (apache#10106)

* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix

Conflicts:
	extensions-core/druid-basic-security/src/main/java/org/apache/druid/security/basic/authentication/endpoint/BasicAuthenticatorResource.java
	extensions-core/druid-basic-security/src/main/java/org/apache/druid/security/basic/authorization/endpoint/BasicAuthorizerResource.java
	extensions-core/druid-basic-security/src/test/java/org/apache/druid/security/authentication/CoordinatorBasicAuthenticatorResourceTest.java
	indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java
	integration-tests/src/test/java/org/apache/druid/tests/security/ITBasicAuthConfigurationTest.java
	server/src/main/java/org/apache/druid/segment/indexing/DataSchema.java
	server/src/test/java/org/apache/druid/segment/indexing/DataSchemaTest.java
	extensions-core/druid-basic-security/src/test/java/org/apache/druid/security/basic/authorization/endpoint/BasicAuthorizerResourceTest.java
	indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java

* Add commons-text to server dependencies

* remove unsupported endpoints from ITBasicAuthConfigurationTest

Co-authored-by: Clint Wylie <cwylie@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants