Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue295 fixes #295 #296

Merged
merged 3 commits into from
Sep 22, 2022
Merged

Issue295 fixes #295 #296

merged 3 commits into from
Sep 22, 2022

Conversation

JessicaTegner
Copy link
Owner

This issue fixes a critical error, in download_pandoc, where the urls to the binaries stopped being populated.

This is a suggested solution, that I'll leave open for discussion before merging.

@kevalmorabia97
Copy link

kevalmorabia97 commented Sep 20, 2022

Thanks for the fix!

I always see another error when running the pypandoc download:

[INFO] Downloading pandoc from https://github.com/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb ...
[INFO] Unpacking /tmp/tmpyhpyagw4/pandoc-2.19.2-1-amd64.deb to tempfolder...
[INFO] Copying pandoc to /tmp/runner/build-docs/bin ...
[INFO] Making /tmp/runner/build-docs/bin/pandoc executeable...
[INFO] Copying pandoc-citeproc to /tmp/runner/build-docs/bin ...
Error:  Didn't copy pandoc-citeproc

This is just a print statement and does the cause the program to crash but still is a bit annoying. Is this something that can be easily disabled?

@JessicaTegner
Copy link
Owner Author

@kevalmorabia97 this can be disabled indeed. See the README for instructions, specifically under the "logging messages" section.

Copy link

@kevalmorabia97 kevalmorabia97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix! I tested it on MacOS and it works.

@FedeMPouzols
Copy link

I came across this issue, thanks for the quick fix! I can confirm it works well on Ubuntu 22.04.

Querying api.github.com makes perfect sense to me, and will hopefully be more stable.

@JessicaTegner
Copy link
Owner Author

@FedeMPouzols good that it also works there.

Issue that I don't like, is that api.github.com has a ratelimiting of 60 requests / hour for non-authenticated users.

This might not seem like a big deal, but consider that before it was unlimite due to reading the html, so bigger pipelines might break if it's ran many times an hour (even our own tests, breaks sometime because of it).

I have not yet found a fix for this, that I like.
I do take suggestions.

@kevalmorabia97
Copy link

@JessicaTegner how about we use https://github.com/jgm/pandoc/releases/expanded_assets/2.19.2 urls? I think it might not work for latest version in which case we would need to fall back to what you have now. What do you think?

@JessicaTegner
Copy link
Owner Author

@kevalmorabia97 that would solve the rate limiting issue for every version except if using "latest".
Since we are only hitting api.github.com once, if we use "latest" as version, the rate limiting will still happen.

Reason for this, is that we are extracting the asset html download urls from the json data returned from the call.

If we could find another way of getting the latest version number, that didn't involve hitting api.github.com, your solution might just work perfectly.

@kevalmorabia97
Copy link

What if we cache the latest version and let the cache be alive for a minute, this would ensure we hit the api almost once per minute

@JessicaTegner
Copy link
Owner Author

that could also work, however I think I have found a solution between what you suggested and what I have found after doing some research. Do let me know what you think :)

Copy link

@kevalmorabia97 kevalmorabia97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is even better! I tested it in MacOS and it works

@JessicaTegner
Copy link
Owner Author

@kevalmorabia97 also works on Windows here.

I don't like that the tests are failing, but that seems to be for other reasons, not a cause of this pr.

I'll leave this open for a little, to see if other people has comments, while I look at better CI pipeline options.

@kevalmorabia97
Copy link

Sounds good. For now I'm passing a hard coded url to the pandoc_download function

@JessicaTegner JessicaTegner merged commit f06a23b into master Sep 22, 2022
@JessicaTegner JessicaTegner deleted the issue295 branch September 22, 2022 07:29
@enochtangg
Copy link

Is it possible to make a new release/tag for this working state?

response = urlopen(url)
content = response.read()
pattern = re.compile(r"pandoc\s*([\d.]+)")
version = re.search(pattern, content.decode("utf-8")).group(1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier/faster to parse the returned URL here:

>>> from urllib.request import urlopen
>>> url = "https://github.com/jgm/pandoc/releases/latest"
>>> response = urlopen(url)
>>> response.url
'https://github.com/jgm/pandoc/releases/tag/2.19.2'

And once you have that URL, you can .replace('tag', 'expanded_assets') to get the one with links.

@davidbgk
Copy link

@enochtangg as a temporary option, you can update your dependencies to pypandoc @ https://github.com/JessicaTegner/pypandoc/archive/refs/heads/master.zip to try that it works on your environment.

@JessicaTegner
Copy link
Owner Author

@enochtangg that has now been done.

PyPandoc v1.9 - pypandoc-binary v1.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants