GitHistoryParser: Unparseable date #1193

Closed
aemyllius opened this issue Sep 26, 2016 · 9 comments · Fixed by #1196

@aemyllius

aemyllius commented Sep 26, 2016

Git history retrieval fails when parsing the date from the git log output. Over a full index generation, the error occurs in only about 20% of the cases. The crucial point is that once the error occurs, the affected file is no longer indexed.

Initially the file appears to be handled by the correct analyzer. When checking the indexing results, however, the file exists in the index but has no contents.

Sep 26, 2016 2:07:37 PM org.opensolaris.opengrok.index.DefaultIndexChangedListener fileAdd
INFO: Add: <FILE-NAME-CHANGED>.cs (CSharpAnalyzer)
Sep 26, 2016 2:07:37 PM org.opensolaris.opengrok.history.GitHistoryParser process
WARNING: Failed to parse author date: AuthorDate: Sat Dec 20 03:25:39 2014
java.text.ParseException: Unparseable date: "Sat Dec 20 03:25:39 2014"
        at java.text.DateFormat.parse(Unknown Source)
        at org.opensolaris.opengrok.history.GitHistoryParser.process(GitHistoryParser.java:93)
        at org.opensolaris.opengrok.history.GitHistoryParser.processStream(GitHistoryParser.java:63)
        at org.opensolaris.opengrok.util.Executor.exec(Executor.java:178)
        at org.opensolaris.opengrok.history.GitHistoryParser.parse(GitHistoryParser.java:153)
        at org.opensolaris.opengrok.history.GitRepository.getHistory(GitRepository.java:406)
        at org.opensolaris.opengrok.history.GitRepository.getHistory(GitRepository.java:399)
        at org.opensolaris.opengrok.history.FileHistoryCache.get(FileHistoryCache.java:496)
        at org.opensolaris.opengrok.history.HistoryGuru.getHistory(HistoryGuru.java:230)
        at org.opensolaris.opengrok.history.HistoryGuru.getHistory(HistoryGuru.java:190)
        at org.opensolaris.opengrok.history.HistoryGuru.getHistoryReader(HistoryGuru.java:174)
        at org.opensolaris.opengrok.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:284)
        at org.opensolaris.opengrok.index.IndexDatabase.addFile(IndexDatabase.java:606)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:870)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:835)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:835)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:835)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:835)
        at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:835)
        at org.opensolaris.opengrok.index.IndexDatabase.update(IndexDatabase.java:383)
        at org.opensolaris.opengrok.index.IndexDatabase$1.run(IndexDatabase.java:168)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Sep 26, 2016 2:07:38 PM org.opensolaris.opengrok.index.IndexDatabase addFile
INFO: Skipped file '<FILE-NAME-CHANGED>.cs' because the analyzer didn't understand it.
@tarzanek
Contributor

since this is git related, I am marking as stopper
can you provide access to repo?
or some repo which reproduces the problem please?

tia
L

tarzanek added this to the 0.13 milestone Sep 27, 2016
@aemyllius
Author

Unfortunately, I cannot give you access to the git repo. Are there any commands you would like me to run on the repo? I have not had much time to investigate where git is actually called, but I suspect the problem may be in retrieving the DateTime format of the repo, rather than the other way around. I also tried setting everything to date format or to iso; that did not help either.

@tulinkry
Contributor

Hi,
if you build OpenGrok from source, you can try changing the line at GitRepository.java#L86
from the format
EEE MMM dd hh:mm:ss yyyy ZZZZ
to (without the timezone)
EEE MMM dd hh:mm:ss yyyy
then clean the affected index and index it again to see if it helps.

If so, we are not following the documentation:

$ man git-log
 ....
 --date=default is the default format, and is similar to --date=rfc2822, with a few exceptions:
 ·   there is no comma after the day-of-week
 ·   the time zone is omitted when the local time zone is used # <- !!!

and we should fix it.

Anyway, @tarzanek, we should consider using the --date option in the log command, because a user can configure a different date format than the one OpenGrok expects, and parsing will then fail like this.
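
For illustration, here is a minimal sketch that checks the two patterns quoted above against the date string from the reported warning. The class name `GitDateFormatCheck` and the helper `parses` are hypothetical and not part of OpenGrok, and the use of `Locale.ENGLISH` is an assumption made so the English day and month names match on any machine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper class, not OpenGrok code.
class GitDateFormatCheck {
    // Returns true if the given SimpleDateFormat pattern can parse the input.
    static boolean parses(String pattern, String input) {
        // Locale.ENGLISH is an assumption: git prints English day/month names.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        try {
            format.parse(input);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Date string taken from the warning in the original report.
        String fromLog = "Sat Dec 20 03:25:39 2014";

        // The pattern with a timezone rejects input that has no timezone.
        System.out.println(parses("EEE MMM dd hh:mm:ss yyyy ZZZZ", fromLog)); // prints false

        // The pattern without a timezone accepts it.
        System.out.println(parses("EEE MMM dd hh:mm:ss yyyy", fromLog)); // prints true
    }
}
```

This only reproduces the mismatch in isolation; whether it is the sole cause of the skipped files would still need to be confirmed against the real repository.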

@aemyllius
Author

I'll try to build OpenGrok from source and debug it myself. Thanks for pointing out where the info is retrieved.

@tulinkry
Contributor

tulinkry commented Sep 27, 2016

The history parser magic is here: GitHistoryParser.java#L94-L100

The command used to get the history (found in GitRepository.java) is:

$ git log --abbrev-commit --abbrev=8 --name-only --pretty=fuller

@tarzanek
Contributor

@tulinkry yes, we should use --date ... but looking at that man page, it seems we should accept both dates with timezone and without ...
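
One way to accept dates both with and without a timezone could look like the sketch below. It is an assumption about how such a fallback might be written, not the code that was merged: the class `LenientGitDateParser` is hypothetical, and it uses `HH` (24-hour clock) instead of the `hh` quoted earlier in the thread, since git prints 24-hour times.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical fallback parser, not OpenGrok code.
class LenientGitDateParser {
    // Tried in order. The timezone variant goes first: the shorter pattern
    // would also match a string with a timezone and silently drop the offset.
    private static final String[] PATTERNS = {
        "EEE MMM dd HH:mm:ss yyyy Z",
        "EEE MMM dd HH:mm:ss yyyy"
    };

    // Parses a git default-format date, with or without a timezone offset.
    static Date parse(String text) {
        for (String pattern : PATTERNS) {
            try {
                // A new formatter per call, because SimpleDateFormat is not thread-safe.
                return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(text);
            } catch (ParseException e) {
                // Fall through and try the next pattern.
            }
        }
        throw new IllegalArgumentException("Unparseable git date: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("Sat Dec 20 03:25:39 2014 +0100"));
        System.out.println(parse("Sat Dec 20 03:25:39 2014"));
    }
}
```

A date without an offset is interpreted in the JVM's default time zone here, which matches git's rule of omitting the zone only when the local zone is used, provided the indexer runs in the same zone as the git command.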

@aemyllius
Author

aemyllius commented Sep 27, 2016

Please note that if the git user defines a different format in their own config, a raw conversion might yield strange results, since a log without --date is rendered in the custom format defined in the git config file.

On the fix side, I tried appending --date=rfc to the git log command and then using "EE, d MMM yyyy HH:mm:ss Z" as the date format, and it seems to work much better.
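
The approach described above can be sketched as follows. The class `Rfc2822GitDate` and its members are hypothetical names for illustration; the argument list is the command quoted earlier in the thread with `--date=rfc` appended, and `Locale.ENGLISH` is an added assumption so the day and month names parse regardless of the machine's locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the merged OpenGrok change.
class Rfc2822GitDate {
    // The thread's git log command plus the --date override, so the output
    // format no longer depends on the user's git configuration.
    static final List<String> LOG_COMMAND = Arrays.asList(
        "git", "log", "--abbrev-commit", "--abbrev=8",
        "--name-only", "--pretty=fuller", "--date=rfc");

    // Pattern quoted in the comment above.
    static final String PATTERN = "EE, d MMM yyyy HH:mm:ss Z";

    // Parses a date such as "Sat, 20 Dec 2014 03:25:39 +0100".
    static Date parse(String text) {
        try {
            return new SimpleDateFormat(PATTERN, Locale.ENGLISH).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable git date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", LOG_COMMAND));
        System.out.println(parse("Sat, 20 Dec 2014 03:25:39 +0100"));
    }
}
```

Because the offset is always present in this format, the parsed instant does not depend on the time zone of the machine running the indexer.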

@tulinkry
Contributor

@aemyllius That's what I meant: override the user's config by using --date=something and then expect that format in the history parser.

I'd rather do that than stay without the --date option and accept both strings, with and without the timezone.

tulinkry self-assigned this Sep 28, 2016
@aemyllius
Author

I have explored the formats quite a bit, and the solution mentioned above seems to work fine if you also want the time zone. Other formats have a very unpleasant mismatch between the C++ specs, which return nothing for the timezone if it is not defined, and the Java DateTime converters, which use the letter Z to show that the timezone is not defined. I have compiled and used the modified jar, but it does not run the junit tests out of the box (not properly configured yet, I guess); otherwise I would have created a patch myself :) .

tulinkry added a commit to tulinkry/OpenGrok that referenced this issue Sep 30, 2016
tulinkry added a commit to tulinkry/OpenGrok that referenced this issue Sep 30, 2016
tulinkry added a commit to tulinkry/OpenGrok that referenced this issue Sep 30, 2016
tulinkry added a commit to tulinkry/OpenGrok that referenced this issue Sep 30, 2016
vegetablemao referenced this issue Dec 22, 2016
Currently only support for git
3 participants