
[BUG] number of bytes exceeds in rich's wrap_file() function #2307

Closed
777advait opened this issue May 28, 2022 · 5 comments

Comments

@777advait

I am using Rich's progress bars to show the download progress of files. Even though the number of bytes always exceeds the "Content-Length", the file is downloaded perfectly. The following is my code:

import os
from urllib.request import Request, urlopen
import ssl
from rich.console import Console
from rich.progress import wrap_file
from get_headers import get_header


console = Console()
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}


def getFilename(url):
    if ".html" not in url.split("/")[-1] and "text/html" in get_header("Content-Type", url):
        return url.split("/")[-1] + ".html"
    elif ".json" not in url.split("/")[-1] and "application/json" in get_header("Content-Type", url):
        return url.split("/")[-1] + ".json"
    else:
        return url.split("/")[-1]


def download_url(url: str, path: str):
    ssl._create_default_https_context = ssl._create_unverified_context
    url = url.rstrip("/")

    res = urlopen(Request(url, headers=headers))
    console.log(f"Requesting {url}")

    filename = getFilename(url)
    filesize = int(get_header("Content-Length", url))
    dest_path = os.path.join(path, filename)

    with wrap_file(res, filesize, description=filename) as data, open(dest_path, "wb") as f:
        for chunk in data:  # avoid shadowing the built-in name "bytes"
            f.write(chunk)


if __name__=="__main__":
    download_url("https://api.github.com/", "./")


@willmcgugan
Collaborator

The most likely explanation is that your filesize value is incorrect. You haven't provided a fully working example, so I can't be certain, but it looks like you aren't getting the filesize from the response. Try filesize = int(res.headers["Content-Length"]).
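A minimal sketch of that fix, assuming Rich is installed: the key point is to take the byte count from the same object you actually read, so the progress total matches the stream. A local file stands in for the HTTP response here (sample.bin and copy.bin are hypothetical names); with urlopen() you would instead use int(res.headers["Content-Length"]) on the response itself.

```python
import os
from rich.progress import wrap_file

# Create a stand-in "download source" of a known size (hypothetical file).
with open("sample.bin", "wb") as f:
    f.write(b"x" * 1024)

# Measure the size of the object we are about to read -- with urlopen()
# this would be: filesize = int(res.headers["Content-Length"])
filesize = os.path.getsize("sample.bin")

# wrap_file() shows a progress bar that advances as the file is consumed.
with wrap_file(open("sample.bin", "rb"), filesize, description="sample.bin") as data, \
        open("copy.bin", "wb") as out:
    for chunk in data:
        out.write(chunk)
```

Because the total is measured from the same source being read, the bar ends exactly at 100% instead of overshooting.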

@777advait
Author

Thanks! That works; I think this issue can now be closed. But do you know a way to always get the Content-Length of a web page? Some web pages don't have that header (e.g. google.com).


@willmcgugan
Collaborator

Not all web servers return the Content-Length, I'm afraid. Sometimes the server doesn't know in advance how large the file is, or doesn't report it for some other reason.

@cta102
Copy link

cta102 commented Mar 31, 2024

> Not all web-servers return the Content-Length I'm afraid. Sometimes the server doesn't know in advance how large the file is, or doesn't report it for some other reason.

Content-Length will be dropped by some servers because they compress certain file types on the fly; with .txt files, for example, the server may gzip the stream, so the final size is unknown.

You can disable the gzip process by adding "Accept-Encoding": None to your request headers.

The following is fairly typical of what I use:
with requests.get(fromAdd, stream=True, timeout=60, headers={"Accept-Encoding": None, "User-Agent": userAgent}) as r:

Alternatively, in the absence of a Content-Length, substitute a spinner or something for the progress bar if you want the gzipped file (but remember to append .gzip so people realise what it is).
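A hedged sketch combining both suggestions (the function name download and the 8192-byte chunk size are my own choices, not from the thread): stream with requests, suppress gzip by sending "Accept-Encoding": None so servers are more likely to report a size, and fall back to an indeterminate bar (total=None) when no Content-Length arrives.

```python
import requests
from rich.progress import Progress

def download(url: str, dest: str) -> None:
    # Setting a header to None tells requests to drop it entirely, so we
    # don't advertise gzip support and are more likely to get a sized body.
    headers = {"Accept-Encoding": None, "User-Agent": "Mozilla/5.0"}
    with requests.get(url, stream=True, timeout=60, headers=headers) as r:
        r.raise_for_status()
        size = r.headers.get("Content-Length")
        # total=None gives Rich an indeterminate (pulsing) bar, which
        # doubles as the "spinner" fallback when the size is unknown.
        total = int(size) if size is not None else None
        with Progress() as progress, open(dest, "wb") as f:
            task = progress.add_task(dest, total=total)
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                progress.update(task, advance=len(chunk))
```

With this shape the same code path handles both kinds of server: a real percentage bar when Content-Length is present, a pulsing bar otherwise.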
