Skip to content
This repository has been archived by the owner on Oct 15, 2022. It is now read-only.

cppreference_doc: Commit parsing scripts to this repository #874

Merged
merged 3 commits into from Sep 11, 2017

Conversation

p12tic
Copy link
Contributor

@p12tic p12tic commented May 5, 2017

Description of new Instant Answer, or changes

The strange and inconvenient setup of cppreference_doc fathead parsing plugins has been discussed several times: #781, #539. To recap, the current setup is this:

  • create a mirror of cppreference.com using the scripts within cppreference-doc repository
  • upload the resulting archive to cppreference.com
  • download the archive from cppreference.com (fetch.sh)
  • run the parsing scripts within the downloaded archive (parse.sh)

The first two tasks are performed by the cppreference.com maintainers, the last two - by DDG staff or contributors. As a result of this process, any change is inconvenient to develop, as the developer needs to work with both repositories.

Previously I have argued that it makes sense to keep maintaining the parsing scripts in single repository. However, it occurred to me that this shouldn't necessarily mean that the repositories need to be as coupled as they are. The scripts can be kept in both places and as long as there's cooperation and PRs are made then there are no downsides.

This PR copies the DDG parsing scripts from the cppreference-doc repository as of 89745148178ff699a27d58e3972b62fbbe90dbf2 and updates the scripts to use the local copy. output.txt does not change.

Related Issues and Discussions

(none)

People to notify

@moollaza, @sahildua2305, @manrajgrover, @pjhampton

Testing & Review

To be completed by Language Leader (or DDG Staff) when reviewing Pull Request

Pull Request

  • Title follows correct format (Specifies Instant Answer + Purpose)
  • Description contains a valid Instant Answer Page Link (e.g. https://duck.co/ia/view/my_ia)

Instant Answer Page (for new Instant Answers)

  • Instant Answer page is correctly filled out and contains:
    • One topic for the Search Space Language (Java, Python, Scala, Ruby, etc.)
    • One topic from: Reference, Help, Libraries, Tools
      • Documentation Fatheads are considered "Reference"
    • Description, Data source, and 2+ example queries
    • Perl Module (e.g. "DDG::Fathead::PerlDoc" -- we only need a name, not an actual file)
    • Source Name (for "More at <source_name>" link)
    • Source Domain (must contain http:// or https:// -- can be the same as Data Source)
    • Source Info (used as Subtitle for each Article -- usually matches the IA Name)
    • 'Skip Abstract' is checked off
    • Source ID (ping @moollaza to assign one, once Fathead is ready for Beta deploy)

Code

  • Uniformly indented, well commented
  • Fetch.sh and Parse.sh run without errors
  • Output contains no blank lines, or multi-line entries
  • Fathead Tests are passing (run $ duckpan test <fathead_id>)
    • Tester should report any failures

Pull Request Review Guidelines: https://docs.duckduckhack.com/programming-mission/pr-review.html


Instant Answer Page: https://duck.co/ia/view/cppreference_doc

@daxtheduck
Copy link

daxtheduck commented May 5, 2017

C++ Reference Docs

Description: C++ reference docs from cppreference.com

Example Query: cpp array cend

Tab Name: About

Source: http://cppreference.com

These are the important fields from the IA page. Please check these for errors or missing information and update the IA page


This is an automated message which will be updated as changes are made to the IA page

@p12tic
Copy link
Contributor Author

p12tic commented May 16, 2017

Ping :)

@pjhampton
Copy link
Contributor

@manrajgrover can you look into this, please? 😄

@manrajgrover
Copy link

@p12tic Hi, first of all, apologies for the delay. 😅 I was waiting for getting more inputs from @moollaza regarding this, since this also involves licenses in comments of code, before code reviewing it.

From what I understand from the case is, we have two repositories, one maintained for creating and uploading the archive to cppreference and one fathead where we download the archive and parse it using the scripts available in former repository. This makes sense to make it easy to update that script with respect to the changes made in that repository. Correct me if I'm wrong.

Since this is a case of dependency, I would suggest we use cppreference-doc as a dependency here with frequent version updates. This way, we keep the files in one repository and making only version updates when required. Again, correct me if I'm wrong.

@p12tic
Copy link
Contributor Author

p12tic commented May 20, 2017

From what I understand from the case is, we have two repositories, one maintained for creating and uploading the archive to cppreference and one fathead where we download the archive and parse it using the scripts available in former repository. This makes sense to make it easy to update that script with respect to the changes made in that repository. Correct me if I'm wrong.

Yes, that's correct.

Since this is a case of dependency, I would suggest we use cppreference-doc as a dependency here with frequent version updates. This way, we keep the files in one repository and making only version updates when required. Again, correct me if I'm wrong.

Previously I though the same too. The primary issue is that this setup is really inconvenient for no real gain. One needs to create an archive from the cppreference-doc repository in order to test even simple code change in DDG parsing scripts. Now one is able to simply change stuff in this repository, run parse.sh and immediately see the results. We still depend on cppreference-doc repository to make an update when the structure of the archive changes, but we don't depend on that repository for changing output.txt.

@manrajgrover
Copy link

There are two cases to it:

  1. If CPPReference makes changes to the archive and parsing scripts need to be updated.
  2. If output.txt file structure changes, and contributors need to change the parsing script in order to make the required change.

Let's solve first point. If the scripts are kept in cppreference-doc and in this repository, then we need to send a PR both places to keep things in sync. Why not just keep them in this repository only and make an update here if there are any changes in archive structure? Archiving does not depend on these scripts. Any testing for parsing script can be done locally before making a PR here.

Now for second point. If the scripts are both repository, then we need to send PR both ways to keep things sync. If they are here, one can modify them here by pulling the archive from the link. There won't be a dependency then to cppreference-doc.

Keeping scripts in one place, ie. this repository makes sense if above is true. Correct me if I'm wrong.

@p12tic
Copy link
Contributor Author

p12tic commented May 25, 2017

I should perhaps clarify one thing: the scripts are themselves are not that much coupled together. There are two parts: the scripts that parse the archive and scripts that make output.txt in the appropriate format. If the format of the archive changes, then the Duckduckgo contributors can simply copy-paste the cppreference-doc version here with no changes. If the format of output.txt changes, it's not necessary to make a second PR to cppreference-doc repository. That's the primary motivation of this PR -- that DuckDuckGo contributors can make changes even if the maintainer of cppreference-doc disappears or becomes non-cooperative or something like that.

Of course, the fact that I'm maintainer of cppreference-doc simplifies things a bit -- I don't plan to disappear or become non-cooperative :-)

For the foreseeable future, I can make PRs to zeroclickinfo-fathead whenever cppreference-doc archive parsing scripts are updated. Also, for the foreseeable future I can integrate changes from zeroclickinfo-fathead repository into cppreference-doc repository as long as I get notified of the PR. This solves both points as far as I can see. The only problem so far is that PRs to zeroclickinfo-fathead repository take rather long time to get approved and merged :-)

My own primary motivation was this and similar comments #866 (comment). I understood that keeping parsing scripts within zeroclickinfo-fathead repository was preferred by DuckDuckGo staff, so I wanted to align with this.

@p12tic
Copy link
Contributor Author

p12tic commented May 27, 2017

@manrajgrover Just to make it clear: I'm perfectly fine with this PR being rejected due to reasons outlined in your comment. However, whole purpose of the PR was to align to DuckDuckGo guideline to keep the parsing scripts within the repository (as outlined by @pjhampton here #781 and by @rasikapohankar here #866 (comment)). A rejection would go directly against these guidelines, so I'm a bit confused about what is the current DuckDuckGo stance with regard to script location :-)

@manrajgrover
Copy link

@p12tic Keeping scripts here make sense. I'm just clearing the flow of contribution after merging this PR.

Let's discuss about the licenses in the scripts in the current PR. I would need inputs from @pjhampton and @moollaza for the same.

@p12tic
Copy link
Contributor Author

p12tic commented Jun 25, 2017

@pjhampton, @moollaza Ping :-) What input is needed from my side? The code is released under GPL and as far as I can see that should be fine, as the repository already contains multiple scripts under GPL, see the results of this or this search.

@p12tic
Copy link
Contributor Author

p12tic commented Jul 18, 2017

Ping :)

@pjhampton
Copy link
Contributor

pjhampton commented Jul 18, 2017

Hey @p12tic

Sorry, this is currently on hold while we work through a lot of stuff on the goodies / spice repo... hence the lack of contact. TBH, I'm fine with this being accepted but @moollaza who is in charge of DDH, the final call is his.

I'm not sure if the GPL is compatible with the Apache 2.0 though? We might want to double check that. I really appreciate your commitment here @p12tic 👍

@moollaza
Copy link
Member

@p12tic Thanks! This LGTM 👍

@moollaza moollaza merged commit a720040 into duckduckgo:master Sep 11, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants