Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Pinot minion segment generation and push task. #6340

Merged
merged 3 commits into from
Dec 30, 2020

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Dec 9, 2020

Description

Adding Pinot minion segment generation and push task.

Below is a table conf example with the simple version of batch ingestion config. More configs can be found in PR #6332.

{
  "tableName": "baseballStats",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "baseball",
    "replication": "1"
  },
  "tenants": {
  },
  "tableIndexConfig": {
    "loadMode": "HEAP",
    "invertedIndexColumns": [
      "playerID",
      "teamID"
    ]
  },
  "metadata": {
    "customConfigs": {
    }
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [
        {
          "inputDirURI": "s3://<my-bucket>/batch/baseballStats/rawdata",
          "includeFileNamePattern": "glob:**/*.csv",
          "excludeFileNamePattern": "glob:**/*.tmp",
          "inputFormat": "csv",
          "outputDirURI": "s3://<my-bucket>/batch/baseballStats/segments",
          "push.mode": "metadata"
        }
      ],
      "segmentNameSpec": {},
      "pushSpec": {}
    }
  },
  "task": {
    "taskTypeConfigsMap": {
      "SegmentGenerationAndPushTask": {
      }
    }
  }
}

For local test, the bare minimum configs are just inputDirURI and inputFormat:

{
  "tableName": "baseballStats",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "baseball",
    "replication": "1"
  },
  "tenants": {
  },
  "tableIndexConfig": {
    "loadMode": "HEAP",
    "invertedIndexColumns": [
      "playerID",
      "teamID"
    ]
  },
  "metadata": {
    "customConfigs": {
    }
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [
        {
          "inputDirURI": "examples/minions/batch/baseballStats/rawdata",
          "inputFormat": "csv"
        }
      ],
      "segmentNameSpec": {},
      "pushSpec": {}
    }
  },
  "task": {
    "taskTypeConfigsMap": {
      "SegmentGenerationAndPushTask": {
      }
    }
  }
}

@xiangfu0 xiangfu0 changed the title Adding pinot minion segment creation tasks Adding Pinot minion segment generation and push task. Dec 9, 2020
@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from f7c9ddb to 9d9fb90 Compare December 9, 2020 08:37
@codecov-io
Copy link

codecov-io commented Dec 9, 2020

Codecov Report

Merging #6340 (3828dd5) into master (1beaab5) will decrease coverage by 1.31%.
The diff coverage is 56.80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6340      +/-   ##
==========================================
- Coverage   66.44%   65.13%   -1.32%     
==========================================
  Files        1075     1318     +243     
  Lines       54773    63979    +9206     
  Branches     8168     9299    +1131     
==========================================
+ Hits        36396    41671    +5275     
- Misses      15700    19341    +3641     
- Partials     2677     2967     +290     
Flag Coverage Δ
unittests 65.13% <56.80%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...e/pinot/broker/api/resources/PinotBrokerDebug.java 0.00% <0.00%> (-79.32%) ⬇️
...ot/broker/broker/AllowAllAccessControlFactory.java 71.42% <ø> (-28.58%) ⬇️
.../helix/BrokerUserDefinedMessageHandlerFactory.java 33.96% <0.00%> (-32.71%) ⬇️
...ker/routing/instanceselector/InstanceSelector.java 100.00% <ø> (ø)
...ava/org/apache/pinot/client/AbstractResultSet.java 66.66% <0.00%> (+9.52%) ⬆️
.../main/java/org/apache/pinot/client/Connection.java 35.55% <0.00%> (-13.29%) ⬇️
...inot/client/JsonAsyncHttpPinotClientTransport.java 10.90% <0.00%> (-51.10%) ⬇️
...not/common/assignment/InstancePartitionsUtils.java 73.80% <ø> (+0.63%) ⬆️
...common/config/tuner/NoOpTableTableConfigTuner.java 100.00% <ø> (ø)
...ot/common/config/tuner/RealTimeAutoIndexTuner.java 100.00% <ø> (ø)
... and 1140 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea0bfa0...3828dd5. Read the comment docs.

@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch 2 times, most recently from 2845119 to 4d7abbd Compare December 9, 2020 22:50
@xiangfu0
Copy link
Contributor Author

@laxmanchekka this is the PR of batch ingestion minion task. Just fyi.

@xiangfu0 xiangfu0 marked this pull request as draft December 10, 2020 21:29
@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from 4d7abbd to 9ecf2f1 Compare December 16, 2020 06:52
@xiangfu0 xiangfu0 marked this pull request as ready for review December 16, 2020 22:44
@npawar
Copy link
Contributor

npawar commented Dec 17, 2020

From the code I see that the bare minimum config that has to be present in the batch config map is inputDirURI, inputFormat, outputDirURI. Is it possible to make outputDirURI also optional?

@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch 3 times, most recently from 8c9aa09 to 6101344 Compare December 18, 2020 05:57
@xiangfu0
Copy link
Contributor Author

From the code I see that the bare minimum config that has to be present in the batch config map is inputDirURI, inputFormat, outputDirURI. Is it possible to make outputDirURI also optional?

hmm, I think it's possible, we can just rely on the temp directory for segment build and then do tar push.

@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from 6101344 to c2cf4bd Compare December 18, 2020 06:16
@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Dec 18, 2020

From the code I see that the bare minimum config that has to be present in the batch config map is inputDirURI, inputFormat, outputDirURI. Is it possible to make outputDirURI also optional?

Updated the code to allow outputDirUri optional. See PR description for sample config

@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch 2 times, most recently from cc59e39 to f65dcc1 Compare December 18, 2020 10:10
@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from f65dcc1 to 5ec925d Compare December 20, 2020 07:21
@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from 5ec925d to 9b3d2bf Compare December 29, 2020 01:41
@xiangfu0 xiangfu0 force-pushed the adding_pinot_minion_segment_creation_tasks_2 branch from 9b3d2bf to 3828dd5 Compare December 29, 2020 20:15
@xiangfu0 xiangfu0 merged commit 93a4515 into master Dec 30, 2020
@xiangfu0 xiangfu0 deleted the adding_pinot_minion_segment_creation_tasks_2 branch December 30, 2020 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants