
[CARBONDATA-3560] Fixed issues for Add Segment #3426

Closed

Conversation

manishnalla1994 (Contributor)

Issue 1: When the format is given in uppercase, add segment fails with an unknown format error.
Solution 1: Made the format option case-insensitive.

Issue 2: The same path could be added as a segment repeatedly; this operation is now blocked.

Issue 3: Added a validation for the case where the folder contains no carbon files.
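
For illustration of the first fix, the statement below is taken from the test case updated later in this thread (the addsegment1 table and $newPath value are defined in that test); before this change, passing the format in uppercase failed with an unknown format error:

// The format option is now matched case-insensitively, so 'PARQUET' behaves the same as 'parquet'.
sql(s"alter table addsegment1 add segment options('path'='$newPath', 'format'='PARQUET')").show()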

Be sure to complete the following checklist to help us incorporate your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on:
    - Whether new unit test cases have been added, or why no new tests are required.
    - How it was tested. Please attach the test report.
    - Whether it is a performance-related change. Please attach the performance test report.
    - Any additional information to help reviewers test this change.

  • For large changes, please consider breaking them into sub-tasks under an umbrella JIRA.

@manishnalla1994 manishnalla1994 force-pushed the AddSegmentFixes branch 2 times, most recently from 633553d to 7f032f1 Compare October 30, 2019 06:03
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/713/

@CarbonDataQA

Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/721/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/720/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/714/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/721/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/722/

@manishnalla1994 (Contributor, Author)

@ravipesala Please review.

* @param segmentPath path to the segment folder of the table
* @return carbon index files found under the given segment path
*/
public static CarbonFile[] getListOfCarbonIndexFiles(String segmentPath) {
Contributor:

Does the input parameter point to the segment folder of a transactional table?

Contributor (Author):

Yes, it's the path to the segment folder, used to get the carbon files.

CarbonFile segmentFolder = FileFactory.getCarbonFile(segmentPath);
CarbonFile[] indexFiles = segmentFolder.listFiles(new CarbonFileFilter() {
  @Override public boolean accept(CarbonFile file) {
    return (file.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT) || file.getName()
Contributor:

move file.getName() to next line

Contributor (Author):

Done.


val allSegments = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)

for (currSegment <- allSegments) {
Contributor:

Use allSegments.contains or .exists instead of a for loop.

Contributor:

add comment to explain this validation

Contributor (Author):

Done.

@@ -92,6 +92,22 @@ case class CarbonAddLoadCommand(
val segmentPath = options.getOrElse(
"path", throw new UnsupportedOperationException("PATH is manadatory"))

val format = options.getOrElse("format", "carbondata")
val isCarbonFormat = format.equals("carbondata") || format.equals("carbon")
Contributor:

move to line 107

Contributor (Author):

Done.

@@ -542,7 +541,7 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
copy(path.toString, newPath)
checkAnswer(sql("select count(*) from addsegment1"), Seq(Row(30)))

sql(s"alter table addsegment1 add segment options('path'='$newPath', 'format'='parquet')").show()
sql(s"alter table addsegment1 add segment options('path'='$newPath', 'format'='PARQUET')").show()
Contributor:

why is this change required?

Contributor (Author):

This change is just to test the uppercase format, so that I need not add a new test case only for case sensitivity.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/725/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/732/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/733/

@manishnalla1994 manishnalla1994 force-pushed the AddSegmentFixes branch 3 times, most recently from 6a84526 to 44f9d68 Compare November 4, 2019 06:50
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/730/

val isCarbonFormat = format.equalsIgnoreCase("carbondata") || format.equalsIgnoreCase("carbon")

// If in the given location no carbon files are found then we can throw an exception
if (isCarbonFormat && SegmentFileStore.getListOfCarbonIndexFiles(segmentPath).isEmpty) {
Contributor:

I think this validation should come from SparkFileFormat, so ideally there should not be any changes in this class for format-level validation. Better to move it to the format level.

Contributor (Author):

@ravipesala Here we are just checking for the presence of carbon files in the given folder location, and since there was already an 'isCarbonFormat' check below, I just moved it up and reused it.

Contributor:

If we need to check this, it is better to check whether the index files and data files match:

  1. If there is a merge index file, all data files should be present in the merge index file.
  2. Otherwise, one index file should be present for each data file.

Please move this validation to another function and invoke it here.
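
For illustration only, a minimal sketch of the matching check the reviewer describes; this is not the PR's code, and the object name, method signature, and file extensions are assumptions:

// Hypothetical sketch of the suggested index/data file matching check; not part of this PR.
object SegmentFileValidator {
  // fileNames: names of all files found in the segment folder (assumed input).
  // mergeIndexEntries: data file names read from the merge index file, if any (assumed input).
  def validate(fileNames: Seq[String], mergeIndexEntries: Seq[String]): Unit = {
    val dataFiles     = fileNames.filter(_.endsWith(".carbondata"))
    val indexFiles    = fileNames.filter(_.endsWith(".carbonindex"))
    val hasMergeIndex = fileNames.exists(_.endsWith(".carbonindexmerge"))
    if (hasMergeIndex) {
      // 1. With a merge index file, every data file should be listed in it.
      val missing = dataFiles.filterNot(f => mergeIndexEntries.contains(f))
      if (missing.nonEmpty) {
        throw new IllegalArgumentException(
          s"Data files not covered by the merge index: ${missing.mkString(", ")}")
      }
    } else if (indexFiles.size != dataFiles.size) {
      // 2. Without a merge index, expect one index file per data file.
      throw new IllegalArgumentException(
        s"Found ${indexFiles.size} index files for ${dataFiles.size} data files")
    }
  }
}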

Contributor:

@manishnalla1994 The earlier condition is not for validation; it is just to update the metadata. How does SparkFileFormat schema inference behave if there are no carbondata or carbonindex files? I feel it is better to correct this in the lower layer than to add validations in the top layer.

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/737/

@CarbonDataQA

Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/738/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/734/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/740/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/742/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/746/

if (allSegments.exists(a =>
  a.getPath != null && a.getPath.equalsIgnoreCase(segmentPath)
)) {
  throw new AnalysisException(s"Cannot add the segment. This path is already in use by " +
Contributor:

Change the message to: "path already exists in table status file, can not add same segment path repeatedly: $path".
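
For illustration, applying the suggested wording to the duplicate-path check shown above would read roughly like this (a sketch, not the exact merged code; segmentPath is the variable from the snippet above):

// Sketch only: the duplicate-path check with the reviewer's suggested message.
if (allSegments.exists(a => a.getPath != null && a.getPath.equalsIgnoreCase(segmentPath))) {
  throw new AnalysisException(s"path already exists in table status file, " +
    s"can not add same segment path repeatedly: $segmentPath")
}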

Contributor (Author):

Done

val format = options.getOrElse("format", "carbondata")
val isCarbonFormat = format.equalsIgnoreCase("carbondata") || format.equalsIgnoreCase("carbon")

// If in the given location no carbon files are found then we can throw an exception
Contributor:

Suggested change:
- // If in the given location no carbon files are found then we can throw an exception
+ // If in the given location no carbon index files are found then we should throw an exception

Contributor (Author):

Done.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/747/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/753/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/755/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/769/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/775/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/777/

@ravipesala (Contributor)

LGTM

@manishnalla1994 (Contributor, Author)

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/785/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/791/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/793/

@ajantha-bhat (Member)

LGTM

@asfgit asfgit closed this in 9aa5973 Nov 11, 2019