cppreference_doc: Commit parsing scripts to this repository #874
Conversation
C++ Reference DocsDescription: C++ reference docs from cppreference.com Example Query: cpp array cend Tab Name: About Source: http://cppreference.com These are the important fields from the IA page. Please check these for errors or missing information and update the IA page This is an automated message which will be updated as changes are made to the IA page |
Ping :) |
@manrajgrover can you look into this, please? 😄 |
@p12tic Hi, first of all, apologies for the delay. 😅 I was waiting for getting more inputs from @moollaza regarding this, since this also involves licenses in comments of code, before code reviewing it. From what I understand from the case is, we have two repositories, one maintained for creating and uploading the archive to cppreference and one fathead where we download the archive and parse it using the scripts available in former repository. This makes sense to make it easy to update that script with respect to the changes made in that repository. Correct me if I'm wrong. Since this is a case of dependency, I would suggest we use |
Yes, that's correct.
Previously I though the same too. The primary issue is that this setup is really inconvenient for no real gain. One needs to create an archive from the cppreference-doc repository in order to test even simple code change in DDG parsing scripts. Now one is able to simply change stuff in this repository, run parse.sh and immediately see the results. We still depend on cppreference-doc repository to make an update when the structure of the archive changes, but we don't depend on that repository for changing output.txt. |
There are two cases to it:
Let's solve first point. If the scripts are kept in Now for second point. If the scripts are both repository, then we need to send PR both ways to keep things sync. If they are here, one can modify them here by pulling the archive from the link. There won't be a dependency then to Keeping scripts in one place, ie. this repository makes sense if above is true. Correct me if I'm wrong. |
I should perhaps clarify one thing: the scripts are themselves are not that much coupled together. There are two parts: the scripts that parse the archive and scripts that make output.txt in the appropriate format. If the format of the archive changes, then the Duckduckgo contributors can simply copy-paste the cppreference-doc version here with no changes. If the format of output.txt changes, it's not necessary to make a second PR to cppreference-doc repository. That's the primary motivation of this PR -- that DuckDuckGo contributors can make changes even if the maintainer of cppreference-doc disappears or becomes non-cooperative or something like that. Of course, the fact that I'm maintainer of For the foreseeable future, I can make PRs to zeroclickinfo-fathead whenever cppreference-doc archive parsing scripts are updated. Also, for the foreseeable future I can integrate changes from zeroclickinfo-fathead repository into cppreference-doc repository as long as I get notified of the PR. This solves both points as far as I can see. The only problem so far is that PRs to zeroclickinfo-fathead repository take rather long time to get approved and merged :-) My own primary motivation was this and similar comments #866 (comment). I understood that keeping parsing scripts within zeroclickinfo-fathead repository was preferred by DuckDuckGo staff, so I wanted to align with this. |
@manrajgrover Just to make it clear: I'm perfectly fine with this PR being rejected due to reasons outlined in your comment. However, whole purpose of the PR was to align to DuckDuckGo guideline to keep the parsing scripts within the repository (as outlined by @pjhampton here #781 and by @rasikapohankar here #866 (comment)). A rejection would go directly against these guidelines, so I'm a bit confused about what is the current DuckDuckGo stance with regard to script location :-) |
@p12tic Keeping scripts here make sense. I'm just clearing the flow of contribution after merging this PR. Let's discuss about the licenses in the scripts in the current PR. I would need inputs from @pjhampton and @moollaza for the same. |
@pjhampton, @moollaza Ping :-) What input is needed from my side? The code is released under GPL and as far as I can see that should be fine, as the repository already contains multiple scripts under GPL, see the results of this or this search. |
Ping :) |
Hey @p12tic Sorry, this is currently on hold while we work through a lot of stuff on the goodies / spice repo... hence the lack of contact. TBH, I'm fine with this being accepted but @moollaza who is in charge of DDH, the final call is his. I'm not sure if the GPL is compatible with the Apache 2.0 though? We might want to double check that. I really appreciate your commitment here @p12tic 👍 |
@p12tic Thanks! This LGTM 👍 |
Description of new Instant Answer, or changes
The strange and inconvenient setup of cppreference_doc fathead parsing plugins has been discussed several times: #781, #539. To recap, the current setup is this:
The first two tasks are performed by the cppreference.com maintainers, the last two - by DDG staff or contributors. As a result of this process, any change is inconvenient to develop, as the developer needs to work with both repositories.
Previously I have argued that it makes sense to keep maintaining the parsing scripts in single repository. However, it occurred to me that this shouldn't necessarily mean that the repositories need to be as coupled as they are. The scripts can be kept in both places and as long as there's cooperation and PRs are made then there are no downsides.
This PR copies the DDG parsing scripts from the cppreference-doc repository as of 89745148178ff699a27d58e3972b62fbbe90dbf2 and updates the scripts to use the local copy. output.txt does not change.
Related Issues and Discussions
(none)
People to notify
@moollaza, @sahildua2305, @manrajgrover, @pjhampton
Testing & Review
To be completed by Language Leader (or DDG Staff) when reviewing Pull Request
Pull Request
Instant Answer Page (for new Instant Answers)
Code
$ duckpan test <fathead_id>
)Pull Request Review Guidelines: https://docs.duckduckhack.com/programming-mission/pr-review.html
Instant Answer Page: https://duck.co/ia/view/cppreference_doc