
[BUG] number of bytes exceeds in rich's wrap_file() function #2307

Closed
777advait opened this issue May 28, 2022 · 5 comments

Comments

@777advait

I am using Rich's progress bars to show the download progress of files. Even though the number of bytes always exceeds the "Content-Length", the file is downloaded perfectly. The following is my code:

import os
from urllib.request import Request, urlopen
import ssl
from rich.console import Console
from rich.progress import wrap_file
from get_headers import get_header


console = Console()
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}


def getFilename(url):
    if ".html" not in url.split("/")[-1] and "text/html" in get_header("Content-Type", url):
        return url.split("/")[-1] + ".html"
    elif ".json" not in url.split("/")[-1] and "application/json" in get_header("Content-Type", url):
        return url.split("/")[-1] + ".json"
    else:
        return url.split("/")[-1]


def download_url(url: str, path: str):
    ssl._create_default_https_context = ssl._create_unverified_context
    url = url.rstrip("/")

    res = urlopen(Request(url, headers=headers))
    console.log(f"Requesting {url}")

    filename = getFilename(url)
    filesize = int(get_header("Content-Length", url))
    dest_path = os.path.join(path, filename)

    with wrap_file(res, filesize, description=filename) as data, open(dest_path, "wb") as f:
        for chunk in data:  # avoid shadowing the built-in name "bytes"
            f.write(chunk)


if __name__=="__main__":
    download_url("https://api.github.com/", "./")


@willmcgugan
Collaborator

The most likely explanation is that your filesize value is incorrect. You haven't provided a fully working example, so I can't be certain, but it looks like you aren't getting the filesize from the response. Try filesize = int(res.headers["Content-Length"]).
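A minimal sketch of that fix, assuming Rich is installed: the key point is to take the byte count from the same object you actually read, so the progress total matches the stream. A local file stands in for the HTTP response here (sample.bin and copy.bin are hypothetical names); with urlopen() you would instead use int(res.headers["Content-Length"]) on the response itself.

```python
import os
from rich.progress import wrap_file

# Create a stand-in "download source" of a known size (hypothetical file).
with open("sample.bin", "wb") as f:
    f.write(b"x" * 1024)

# Measure the size of the object we are about to read -- with urlopen()
# this would be: filesize = int(res.headers["Content-Length"])
filesize = os.path.getsize("sample.bin")

# wrap_file() shows a progress bar that advances as the file is consumed.
with wrap_file(open("sample.bin", "rb"), filesize, description="sample.bin") as data, \
        open("copy.bin", "wb") as out:
    for chunk in data:
        out.write(chunk)
```

Because the total is measured from the same source being read, the bar ends exactly at 100% instead of overshooting.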

@777advait
Author

Thanks! That works; I think this issue can now be closed. But do you know a way to always get the Content-Length of a web page? Some web pages don't have that header (e.g. google.com).


@willmcgugan
Collaborator

Not all web servers return the Content-Length, I'm afraid. Sometimes the server doesn't know in advance how large the file is, or doesn't report it for some other reason.

@cta102
Copy link

cta102 commented Mar 31, 2024

> Not all web-servers return the Content-Length I'm afraid. Sometimes the server doesn't know in advance how large the file is, or doesn't report it for some other reason.

Content-Length will be dropped by some servers because they compress certain file types on the fly; with .txt files, for example, the server may gzip the stream, so the final size is unknown.

You can disable the gzip process by adding "Accept-Encoding": None to your request headers.

The following is fairly typical of what I use:
with requests.get(fromAdd, stream=True, timeout=60, headers={"Accept-Encoding": None, "User-Agent": userAgent}) as r:

Alternatively, in the absence of a Content-Length, substitute a spinner or something for the progress bar if you want the gzipped file (but remember to append .gzip so people realise what it is).
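A hedged sketch combining both suggestions (the function name download and the 8192-byte chunk size are my own choices, not from the thread): stream with requests, suppress gzip by sending "Accept-Encoding": None so servers are more likely to report a size, and fall back to an indeterminate bar (total=None) when no Content-Length arrives.

```python
import requests
from rich.progress import Progress

def download(url: str, dest: str) -> None:
    # Setting a header to None tells requests to drop it entirely, so we
    # don't advertise gzip support and are more likely to get a sized body.
    headers = {"Accept-Encoding": None, "User-Agent": "Mozilla/5.0"}
    with requests.get(url, stream=True, timeout=60, headers=headers) as r:
        r.raise_for_status()
        size = r.headers.get("Content-Length")
        # total=None gives Rich an indeterminate (pulsing) bar, which
        # doubles as the "spinner" fallback when the size is unknown.
        total = int(size) if size is not None else None
        with Progress() as progress, open(dest, "wb") as f:
            task = progress.add_task(dest, total=total)
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                progress.update(task, advance=len(chunk))
```

With this shape the same code path handles both kinds of server: a real percentage bar when Content-Length is present, a pulsing bar otherwise.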
