Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tests for validating commit message. Issue #937 #1518

Closed
wants to merge 1 commit into from

Conversation

liscju
Copy link
Contributor

@liscju liscju commented Jul 31, 2015

To specification of patterns for commit message to match (from:
https://github.com/checkstyle/checkstyle/wiki/Release-notes-automation#validation-of-last-commit-message-to-reference-issue-number )
I added newline character before end of pattern because in my environment git always add it to the end of message. I cant find information in web about this behaviour , so i dont know if it is common for git or my env. Matter to discuss is extracting class like CommitMessageValidation where matching could be done, instead of doing it in test (lastCommitMessage.matches(ACCEPTED_COMMIT_MESSAGE_PATTERN) )
, if we extract it would be easier to test, but matching functionality is so small that it probably could stay in test.

@romani
Copy link
Member

romani commented Aug 12, 2015

Sorry for delay, i will review your update as soons as i return from vacation

@romani
Copy link
Member

romani commented Aug 26, 2015

  1. I update rules for validation at https://github.com/checkstyle/checkstyle/wiki/Release-notes-automation
    please review and update code .

Iterable call = new Git(repo).log().call();

please name object by noun

getLastCommitMessage()

please get last 10 commits from history, as some PRs contain several commits.

@Before
public void setUp() throws Exception {

please use BeforeClass annotation to cache revisions list between all Tests.

new format: ^Issue #\d_. ._ $ , ^Pull #\d_. ._ $
skip words: "^(minor|config|infra|doc|spelling): .*"

On test failure we should print to user all rules for message formats to let him fix message easily.

We need an option to ignore certain user from validation, user names of them should be hardcoded in UTs - that will be a list of users who can commit with any message. It is required as some time admins has to resolve problem quickly without wasting time on issue creation and version bump (during release) also done without following to format described above.

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch from a842cd3 to 95ed391 Compare August 27, 2015 15:14
@liscju
Copy link
Contributor Author

liscju commented Aug 27, 2015

  1. DONE, updated code
  2. DONE, renamed to allPreviousCommits
  3. DONE, now get maximum 10 commits when not commit is excluded due to being authored by one of users from exluded list
  4. DONE, annotated with BeforeClass
  5. DONE, but with minor change
    "^Issue #\d_. ._ $" => "^Issue #\d_. ._$"

In https://github.com/checkstyle/checkstyle/wiki/Release-notes-automation before end of string ($)
in issue pattern there is a space i removed. I done it because when i tried different commits messages with extra whitespaces in my dev environments(mingw in windows8,git in lubuntu 15) i encountered that git trims ending whitespaces.

In my dev environments it always changes ending whitespaces in one single newline character. What is more ,even when i do 'git -m "msg"' it replaces "msg" with "msg\n". I tried searching on the web about this behaviour but i couldn't find anything, so i assume that this is standard, but im not 100% sure.

  1. DONE, on message commit format error it prints information what rules it must preserve
  2. DONE, in variable USERS_EXCLUDED_FROM_VALIDATION

But...
What i don't like about this solution is that, it checks last 10 commits, even for other user commits. Because of this ,it leads now to ci build failures. Maybe slightly better solution would be if we only checks for 10 last commits of current user of repo, but it still will not be working if next user commiting will be the same as the last one that commited to main repo( it will check his old commit too). It certainly wouldnt work if user commited more than 10 times in PR, but probably no one would do such a thing:P

I think the best solution to this problem is to check only commits that following cmd would produce:

$ git log someRemote/master..currentBranch

where someRemote is the name of remote that url is main repository of checkstyle and currentBranch is name of the branch user is currently in. I prepared POC with such a solution here:

liscju@68413e8

The only requirement for this solution, is that a user must have a remote that points at main checkstyle git url, and have fetched master branch of this remote. For developers this requirements must satisfy( i really have no idea how anyone would want to contribute and not have remote with checkstyle url, or not fetched master) but for CI not. For CI to work, CI scripts would have to add remote to checkstyle and fetch master before invoking any test. Certainly, it would slow down those tests a bit. Please let me know what do you think about this solution.

@romani
Copy link
Member

romani commented Aug 28, 2015

please update format for commit message to have ":" (instead of ".") as delimiter between issue number and message . Example - https://github.com/checkstyle/checkstyle/commits/master

What i don't like about this solution is that, it checks last 10 commits, even for other user commits.

it is ok, I will enforce that format manually for 10 commits and than your UTs will pass :).

where someRemote is the name of remote that url is main repository of checkstyle and currentBranch is name of the branch user is currently in.

I do not like that, we should not demand remote existence from user, your UT should work in all enviroments (Travis (by commit and by PR), local, ......). Lets be simple for first release.

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch from 95ed391 to 9b116e8 Compare August 28, 2015 07:48
@liscju
Copy link
Contributor Author

liscju commented Aug 28, 2015

  1. DONE, format in https://github.com/checkstyle/checkstyle/wiki/Release-notes-automation probably should be updated too

@rdiachenko
Copy link
Contributor

I'm just curious about why should we hardcode excluded users names in such a way?

+    private static final List<String> USERS_EXCLUDED_FROM_VALIDATION =
+            Arrays.asList("Roman Ivanov");

Why don't we use a separate "ignore list" file for this?

@romani
Copy link
Member

romani commented Aug 28, 2015

Because config file is useful when you do not want to change code, but by having config file you will change code to change list of users. What is diff to change config file or test file ?
With all in one, all in incapsuleted in one place, no extra file with unclear content is hosted

@mkordas
Copy link
Contributor

mkordas commented Aug 28, 2015

Guys, why do you want to check 10 latest commits? What if PR has 11 of them? I think the more versatile solution is to check author of the last commit and in a while loop scan previous commits until someone's else commit reached. We very rarely have PRs with commits from multiple authors. What do you think?

@liscju
Copy link
Contributor Author

liscju commented Aug 28, 2015

@mkordas 👍

@romani
Copy link
Member

romani commented Aug 28, 2015

It depends thare were times that my commits had very sequential, that is why i tried to validate by limit. PR with more then 10 commits will raise a lot of attention from me for sure during code review.

Ok, if that is fun to implement that , lets make that as mode in test configured by field value, by default work as mkordas is suggested. Time will show who was right.

@mkordas
Copy link
Contributor

mkordas commented Aug 28, 2015

One more thing - we should probably give some hints whether letter after colon should be rather big or small and if dot at the end is suggested:

Issue #1: message.
Issue #1: Message.
Issue #1: message
Issue #1: Message

@romani
Copy link
Member

romani commented Aug 28, 2015

Lets start with simple implementation and enhance later on, there are lot of room for imprivements.
Michal, please update wiki as future enhacements.

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch 2 times, most recently from 6b00dfb to 045015c Compare August 28, 2015 21:36
@liscju
Copy link
Contributor Author

liscju commented Aug 28, 2015

@romani , i implemented as requested @mkordas suggestions to resolve past commits. Variable "previousCommitsResolutionType" enables changing between method with counter and method resolving last commit author commits.

@romani
Copy link
Member

romani commented Aug 28, 2015

Failed tests: 
  CommitValidationTest.testCommitMessageHasProperStructure:165 Commit message: "Merge 045015c7fda8bca7f4302ed43185c46ca777162d into 1e7ae5866daead0d81be2bfcf7febbd1ca0fcbd8
" is invalid
Proper commit message should adhere to the following rules:
    1) Must match one of the following patterns:
        ^Issue #\d*: .*$
        ^Pull #\d*: .*$
        ^(minor|config|infra|doc|spelling): .*$
    2) It contains only one line

Tests run: 1523, Failures: 1, Errors: 0, Skipped: 1

we should not any failure , as in this PR there is only one commit and it is correct.
that is due to technical merge commit of git, see what is real commit is. We need to support that behavior.

As you have in correct message format in your PR, your PR should pas on Travis and Appveyor.


private static String getRulesForCommitMessageFormatting() {
return "Proper commit message should adhere to the following rules:\n"
+ "\t1) Must match one of the following patterns:\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use anywhere any tabs. Let's use spaces instead :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch 3 times, most recently from eb297ba to 86f9627 Compare August 29, 2015 10:24
@codecov-io
Copy link

Current coverage is 99.32%

Merging #1518 into master will not affect coverage as of dbef478

@@            master   #1518   diff @@
======================================
  Files          272     272       
  Stmts        12038   12038       
  Branches      2660    2660       
  Methods          0       0       
======================================
  Hit          11957   11957       
  Partial         32      32       
  Missed          49      49       

Review entire Coverage Diff as of dbef478


Uncovered Suggestions

  1. +0.06% via ...ranslationCheck.java#232...238
  2. +0.05% via ...ranslationCheck.java#220...225
  3. +0.04% via ...tTypeAwareCheck.java#264...268
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

@liscju
Copy link
Contributor Author

liscju commented Aug 29, 2015

@romani , @mkordas :

  1. Changed tabs to spaces

"that is due to technical merge commit of git, see what is real commit is. We need to support that behavior."
I added OmitMergeCommitsIterable to omit merge commits when iterating log commits. Log command after most recent merge commit seems to iterate through PullRequest commits first before old master branch commits. Order of commits depends of output of 'git log' in branch fetched by:

  • git fetch origin +refs/pull/1518/merge ( where 1518 is this pull request id)

I added new commit to PR mostly to discuss the idea of skipping merge commits, i will remove it on reject or i will squash it to previous on acceptance.

@romani
Copy link
Member

romani commented Aug 29, 2015

  1. idea is good, please merge commits.

  2. as we have two parents its not determined what parent we will choose, I did not found that in documentation, please make a manual test (no need to push it) and make a branch and do commit to it with BAD message, after that checkout to master and do one more commit to it with GOOD commit message. After that merge (as NOT fastforward) your branch master and run your test - if it fail that mean that merged branch is returned next, if it successful it do ordering by date and we need more sophisticated logic.

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch 3 times, most recently from 074cac7 to 7612088 Compare August 30, 2015 19:42
@liscju
Copy link
Contributor Author

liscju commented Aug 30, 2015

@romani

  1. DONE

  2. It first showed commit from master branch, commit from second branch came later. It looks like it resolved by dates showing from newest to oldest. It seems we need more complex solution.

My next idea is to distinguish between two situation:

  • HEAD commit is not merge commit , and we use just log to resolve commits in date order as usual
  • HEAD commit is merge commit - than we calculate two parent commits of merge. First parent of merge commit is branch in which we performed merge operation, second parent is merged branch. When we have those parents we can calculate:

git log firstParent..secondParent

which will return all commits in secondParent which wasnt in firstParent. If first parent is master ,we get all new merged commits. This solution however has this limitation that it will work properly only if HEAD commit is merge commit of master and other ( so merge was done being in master branch). It will not work in situation where for example contributor will send PR with latest commit being merge commit between his local branches.

I added next commit to PR as a POC, let me know what do you think

@romani
Copy link
Member

romani commented Aug 31, 2015

good, but what if we will just validate two branches from merge-commit ? master-branch has to be correct always, merged-branch will be correct or not.

So will just make extra validation over commits that are already in master , but there HAVE to be no violations. In this case logic will be simple (performance is not an issue there).

@liscju
Copy link
Contributor Author

liscju commented Aug 31, 2015

@romani , so we should generate two iterators of commits - one for (git log firstParent) and the second (git log secondParent), then filter commits by one of two previously discussed strategies( by counter or by last commit author). Is this what You meant ?

@romani
Copy link
Member

romani commented Aug 31, 2015

No filtering, just validate two collection of commits. If any have problem - report that

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch 2 times, most recently from 2dd7292 to 1c1ecf6 Compare September 1, 2015 11:34
@liscju
Copy link
Contributor Author

liscju commented Sep 1, 2015

@romani now commits are checked from both branches, there is one build failure in orekit which seems that it is independent from this PR:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check (default-cli) on project orekit: Execution default-cli of goal org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check failed: An API incompatibility was encountered while executing org.apache.maven.plugins:maven-checkstyle-plugin:2.15:check: java.lang.NoSuchMethodError: com.puppycrawl.tools.checkstyle.PropertiesExpander.(Ljava/util/Properties;)V

@romani
Copy link
Member

romani commented Sep 1, 2015

please rebase on latest code, I fixed that problem. Ref #2065 .

@liscju
Copy link
Contributor Author

liscju commented Sep 1, 2015

@romani after rebase it succesfully passed tests, ready for next review

@romani
Copy link
Member

romani commented Sep 1, 2015

1)>RepoCommitsIteratorFactory

Please remove class and that by simple method.

2)>previousCommitsResolutionType

Please name it as xxxxxMode

3)>masterParent,mergeParent

As you do not know where is master and where is branch please name variables by general names

  1. please squash commits, if you two cimmits to show that you actually work, please do second commit with extra space some where in a code

@mkordas
Copy link
Contributor

mkordas commented Sep 1, 2015

One more thing - we should probably give some hints whether letter after colon should be rather big or small and if dot at the end is suggested:

Issue #1: message.
Issue #1: Message.
Issue #1: message
Issue #1: Message

Michal, please update wiki as future enhacements.

@romani, what should I exactly do? I think we should just agree on exact naming convention.

@romani
Copy link
Member

romani commented Sep 1, 2015

what should I exactly do? I think we should just agree on exact naming convention.

Lets vote , I am in favor of both of these: " Issue #1: message" , "Issue #1: Message". it is more natural to have lowercase letter after ":" . But uppercase looks more distinguishing after ":".

True to say, I do not really think we need to enforce any rules there.

@liscju
Copy link
Contributor Author

liscju commented Sep 1, 2015

@romani

  1. DONE - changed to method resolveRevCommitsPair() - name may be a bit misleading but i have no better idea how should i name it, i will appreciate comment on this
  2. DONE - enum CommitsResolutionMode , variable COMMITS_RESOLUTION_MODE
  3. DONE - changed to firstParent,secondParent
  4. DONE - squashed

@romani
Copy link
Member

romani commented Sep 2, 2015

Validate commit message has proper structure ,see Issue ...

please add to javadoc description of logic implementation that existing only in PR discussion (nuance with two branches to validate and other nuances)

private static List<String> getCommitsMessagesToCheck()
private static List<RevCommit> filterValidCommits(List<RevCommit> revCommits)

Please move all logic of filtering commits to Test methods, getCommitsMessagesToCheck should return list of RevCommit objects to Test method. We will grow amount of test in future, so extra logic should appear in Tests method only. Algorithm for getting list of commits should not change.

private static boolean isMergeCommit(RevCommit currentCommit)

please move declaration below its usage. Methods declaration should appear in class right after its usage and above first usage. Distance between first usage and declaration should be reasonably minimal.

private static List<RevCommit> getCommitsByCounter(Iterator<RevCommit> allPreviousCommits) {

I do not like when single object is named as collection "......s". Please rename or prove your intention.

public static void setUp() throws Exception

please move setUp and testXXXXXX methods above all methods in that class as this is "main" methods in a class and investigation of code should start from them

boolean isFoundNewLineCharacterAtTheEndOfMessage

please put a comment there to describe reason of this.

private static String getInvalidCommitMessageFormattingError(String invalidCommitMessage)

I prefer to report hash of commit together with message to be exact as commit messages could be the same.

@liscju liscju force-pushed the utValidatingCommitIssue#937 branch 2 times, most recently from d72fd3a to 0029a9a Compare September 5, 2015 09:33
@liscju
Copy link
Contributor Author

liscju commented Sep 5, 2015

@romani

  1. DONE, i hope im understandable:
/**
 * Validate commit message has proper structure.
 *
 * Commits to check are resolved from different places according
 * to type of commit in current HEAD. If current HEAD commit is
 * non-merge commit , previous commits are resolved due to current
 * HEAD commit. Otherwise if it is a merge commit, it will invoke
 * resolving previous commits due to commits which was merged.
 *
 * After calculating commits to start with ts resolves previous
 * commits according to COMMITS_RESOLUTION_MODE variable.
 * At default(BY_LAST_COMMIT_AUTHOR) it checks first commit author
 * and return all consecutive commits with same author. Second
 * mode(BY_COUNTER) makes returning first PREVIOUS_COMMITS_TO_CHECK_COUNT
 * commits after starter commit.
 *
 * Resolved commits are filtered according to author. If commit author
 * belong to list USERS_EXCLUDED_FROM_VALIDATION then this commit will
 * not be validated.
 *
 * Filtered commit list is checked if their messages has proper structure.
 *
 * @author <a href="mailto:piotr.listkiewicz@gmail.com">liscju</a>
 */
  1. DONE - Changed getCommitsMessagesToCheck to getCommitsToCheck returning List. Filtering moved to test methods.

  2. DONE

  3. DONE , Renamed from allPreviousCommits to previousCommitsIterator

  4. DONE

  5. DONE, comment looks like:

/**
* Git by default put newline character at the end of commit message. To check
* if commit message has a single line we have to make sure that the only
* newline character found is in the end of commit message.
*/
  1. DONE

Build failed because of method to deal with merge commits(it checks commits from both branches) - master first commit has bad structure:

junit.framework.AssertionFailedError: Commit d220d7c3cc74490c7f2923f2ea85d00133afe98c message: "Issue #2080: Fix typos in code
Fixes some `SpellCheckingInspection` inspection violations.
Description:
>Spellchecker inspection helps locate typos and misspelling in your code, comments and literals." is invalid
Proper commit message should adhere to the following rules:
    1) Must match one of the following patterns:
        ^Issue #\d*: .*$
        ^Pull #\d*: .*$
        ^(minor|config|infra|doc|spelling): .*$
    2) It contains only one line
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.assertTrue(Assert.java:22)
    at com.puppycrawl.tools.checkstyle.CommitValidationTest.testCommitMessageHasSingleLine(CommitValidationTest.java:134)

@mkordas
Copy link
Contributor

mkordas commented Sep 5, 2015

Two questions:

  • why we don't want to allow multiline commit descriptions? They are very beneficial from my experience.
  • why some messages must start with capital letter (Issue), while others are demanded to be lowercase (minor)? Let's be at least consistent.

@romani
Copy link
Member

romani commented Sep 5, 2015

@liscju , please remove "." from message format, it is not really useful, we do not do that now, we should not fight with ourself here.
It might be useful when message is multy sentenced, but I would rather append it later if we found enough reasons to demand it.

I will answer Michal questions later. I would merge you right now and later on enhance your test with extra validations as next PR.

@mkordas
Copy link
Contributor

mkordas commented Sep 5, 2015

@liscju
Copy link
Contributor Author

liscju commented Sep 6, 2015

@romani maybe i don't understand what You mean, by currently dot in message patterns is all about matching any character , not a dot character:

private static final String ISSUE_COMMIT_MESSAGE_REGEX_PATTERN = "^Issue #\\d*: .*$";
    private static final String PR_COMMIT_MESSAGE_REGEX_PATTERN = "^Pull #\\d*: .*$";
    private static final String OTHER_COMMIT_MESSAGE_REGEXPATTERN =
            "^(minor|config|infra|doc|spelling): .*$";

    private static final String ACCEPTED_COMMIT_MESSAGE_REGEX_PATTERN =
              "(" + ISSUE_COMMIT_MESSAGE_REGEX_PATTERN + ")|"
              + "(" + PR_COMMIT_MESSAGE_REGEX_PATTERN + ")|"
              + "(" + OTHER_COMMIT_MESSAGE_REGEXPATTERN + ")";

@mkordas all issues fixed, build failed because of violation in PropertiesExpander not touched by this PR

@romani
Copy link
Member

romani commented Sep 6, 2015

merged as FF.

question from @mkordas we will discuss in separate issue - #2128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants