[CARBONDATA-1480] Min Max Index Example for DataMap #1359
Conversation
Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/32/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/153/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/782/
/**
 * End of block notification, fired once the index has been created.
 */
void onBlockEndWithIndex(String blockId, String directoryPath);
Why is this method required? Why isn't onBlockEnd enough?
onBlockEnd is called once the block is written. onBlockEndWithIndex is called once the index is also written, i.e. after the carbondata file has been written out.
I did not get the meaning of "index" here; this datamap is supposed to be independent of other indexes. I think the onBlockEnd event is enough for writing the index file.
But during onBlockEnd the carbonindex is not yet written, so we won't be able to access the carbonindex files. In the example I am gathering information from the carbonindex files too.
It is better to keep a hook after the index files are written as well. In future we may need some more hooks at different points.
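The ordering argument above can be sketched as a toy listener. All names here are hypothetical stand-ins, not CarbonData's actual DataMapWriter API: the point is only that onBlockEnd fires when the data file is written but before the carbonindex file exists, while onBlockEndWithIndex fires after the index is written, so only the second hook can safely read carbonindex files.

```java
import java.util.ArrayList;
import java.util.List;

// Toy listener mirroring the two hooks under discussion (hypothetical names,
// not the real CarbonData interface).
interface BlockEventListener {
    void onBlockEnd(String blockId);                      // data file written, index not yet
    void onBlockEndWithIndex(String blockId, String dir); // index file written too
}

public class HookOrderDemo {
    // Simulate the write flow and record the order in which the hooks fire.
    static List<String> run() {
        final List<String> events = new ArrayList<>();
        BlockEventListener listener = new BlockEventListener() {
            public void onBlockEnd(String blockId) {
                events.add("onBlockEnd:" + blockId);          // carbonindex unreadable here
            }
            public void onBlockEndWithIndex(String blockId, String dir) {
                events.add("onBlockEndWithIndex:" + blockId); // safe to read carbonindex now
            }
        };
        listener.onBlockEnd("block-0");                          // carbondata file flushed
        listener.onBlockEndWithIndex("block-0", "/store/seg_0"); // carbonindex flushed
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A datamap whose build needs the carbonindex contents must hang off the second event; hooking only the first would read files that do not exist yet.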
@@ -31,7 +31,8 @@
 /**
  * It is called to load the data map to memory or to initialize it.
  */
-  void init(String filePath) throws MemoryException, IOException;
+  void init(String blockletIndexPath, String customIndexPath, String segmentId)
The filePath is supposed to be either the index folder name or the index file name, so I don't think this extra information is required here. Also, blockletIndexPath is not supposed to be passed, as the carbonindex already exists in the other datamap and we are supposed to use it.
For Min Max Index creation I am taking input from the regular carbonindex file too, e.g. for segment properties. So by design one parameter can be the primitive index path and the other can be the new custom index file path.
It should be independent of other indexes.
In this example, along with the min and max information I am keeping a few more pieces of information for building the blocklet. Both indexes are independent, but in the current example implementation I read the min/max index and then also read the carbonindex in order to get the column cardinality and segment properties. These values are used to form the blocklet used for pruning.
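The core of the min/max idea being discussed can be sketched as follows. This is a minimal single-column illustration, not the PR's actual code: each blocklet records a min and a max, and a filter value prunes every blocklet whose range cannot contain it. The real example additionally pulls segment properties and column cardinality from the regular carbonindex file, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal min/max pruning sketch for a single long-typed column.
public class MinMaxPruneDemo {
    static final class BlockletMinMax {
        final int id; final long min; final long max;
        BlockletMinMax(int id, long min, long max) {
            this.id = id; this.min = min; this.max = max;
        }
    }

    // Keep only blocklets whose [min, max] range may contain the filter value.
    static List<Integer> prune(List<BlockletMinMax> index, long value) {
        List<Integer> candidates = new ArrayList<>();
        for (BlockletMinMax b : index) {
            if (value >= b.min && value <= b.max) {
                candidates.add(b.id);
            }
        }
        return candidates;
    }

    static List<BlockletMinMax> sampleIndex() {
        return Arrays.asList(
            new BlockletMinMax(0, 1, 100),
            new BlockletMinMax(1, 101, 200),
            new BlockletMinMax(2, 50, 150));
    }

    public static void main(String[] args) {
        // Blocklets 1 and 2 can contain 120; blocklet 0 is pruned away.
        System.out.println(prune(sampleIndex(), 120));
    }
}
```

The surviving ids are what a min/max datamap would hand on; turning them into full Blocklet objects is where the carbonindex-derived metadata comes in.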
@Override
public void init(AbsoluteTableIdentifier identifier, String dataMapName) {
  this.identifier = identifier;
  cache = CacheProvider.getInstance()
What is the use of this cache when it is not used anywhere?
Removed.
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/171/
Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/47/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/801/
Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/105/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/229/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/860/
Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/113/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/237/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/868/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/238/
Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/114/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/869/
 * @param blockletId
 * @return
 */
List<Blocklet> pruneBlockletFromBlockId(FilterResolverIntf filterExp, int blockletId);
What is blockletId? I don't think this method is required in the DataMap interface.
blockletId is the output of the Min Max DataMap, and the same is passed to BlockletDataMap in order to form the complete blocklet.
Instead of declaring pruneBlockletFromBlockId in the DataMap interface, the same can be made a local function.
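The two-stage handoff described above can be sketched roughly. Names and types here are hypothetical simplifications; the real interplay between the min/max datamap and BlockletDataMap is more involved. Stage one (min/max pruning) yields candidate blocklet ids; stage two resolves each id against a blocklet-level index to produce the full blocklet details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two-stage pruning handoff: the min/max stage produces blocklet
// ids, and a second lookup (standing in for BlockletDataMap) turns each id
// into full blocklet details. All names are hypothetical.
public class TwoStagePruneDemo {
    // Stand-in for the detailed Blocklet the second stage returns.
    static final class BlockletDetail {
        final int id; final String path;
        BlockletDetail(int id, String path) { this.id = id; this.path = path; }
    }

    // Stage 2: resolve candidate ids against the blocklet-level index;
    // ids with no entry are silently dropped.
    static List<BlockletDetail> resolve(List<Integer> candidateIds,
                                        Map<Integer, String> blockletPaths) {
        List<BlockletDetail> result = new ArrayList<>();
        for (int id : candidateIds) {
            String path = blockletPaths.get(id);
            if (path != null) {
                result.add(new BlockletDetail(id, path));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> blockletPaths = new HashMap<>();
        blockletPaths.put(1, "part-0-1.carbondata");
        blockletPaths.put(2, "part-0-2.carbondata");
        // Ids 1 and 2 survived the min/max stage; id 7 has no entry and is dropped.
        for (BlockletDetail d : resolve(Arrays.asList(1, 2, 7), blockletPaths)) {
            System.out.println(d.id + " -> " + d.path);
        }
    }
}
```

Folding the id-to-blocklet step into a local helper, rather than exposing pruneBlockletFromBlockId on the public DataMap interface, is the simplification the reviewers are asking for.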
@sounakr can you make it simpler? Please add a datamap that can just return blocklet details with block + blockletId. Let's work on integration in another PR.
@sounakr I feel the same as Ravindra; let's make the example as simple as possible, so that developers can understand the concept of a datamap and its usage in a short time.
@ravipesala and @jackylk, sure, will make it simple. Will check if some more interfaces need to be opened.
Retest this please
Datamap Example. Implementation of Min Max Index through Datamap, and using the index while pruning. This closes #1359
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1094/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2310/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2545/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1955/
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3154/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3189/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2286/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3523/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3369/