Upload of content using github.Repository.update_file produces garbled content #2972

Open
miha42-github opened this issue May 18, 2024 · 2 comments

Comments

@miha42-github

Problem Introduction

When I upload a JSON-stringified dict to a repository with github.Repository.update_file, the resulting content in the repository is garbled.

Example calling code with explanation

As per Repository.py, the content argument can be either bytes or a string; a string is encoded before it is transmitted to the repository.

        content_to_transmit = json.dumps(obj)
        try:
            repo = self.github_instance.get_repo(f"{self.org_name}/{self.repo_name}")
            file_path = f"{container_name}/{self.object_files[container_name]}"
            # file_contents = repo.get_contents(file_path, ref=ref, sha=sha)
            write_response = repo.update_file(
                file_path, 
                f"Update object [{self.object_files[container_name]}]", 
                content=content_to_transmit,
                sha=sha, 
                branch=ref
            )
            return [
                True, 
                {
                    "status_msg": f"wrote object [{self.object_files[container_name]}] to container [{container_name}]",
                    "status_code": 200 
                },
                write_response
            ]
        except Exception as e:
            print(e)
            return [
                False, 
                {
                    "status_msg": f"unable to write object [{self.object_files[container_name]}] to container [{container_name}] due to [{str(e)}]",
                    "status_code": 503
                }, 
                str(e)
            ]

When this is called with just the string, the following error is returned: 'bytes' object has no attribute 'encode'. This led me to explore github.Repository.update_file.

Problem resolution

As I inspected the file I found that the if block on line 2495 was improperly indented.

Version in main branch

if not isinstance(content, bytes):
    content = content.encode("utf-8")
content = b64encode(content).decode("utf-8")

Looking at the code, I concluded that the output would not be correct whether bytes or a string was supplied; therefore, I indented the last line into the if block and tried the call again.

Verified fixed version

if not isinstance(content, bytes):
    content = content.encode("utf-8")
    content = b64encode(content).decode("utf-8")

With that change, the content was correctly received, stored as JSON, and viewable in the repository. I've provided the original file, the fixed file, and a patch file for review.

PyGithub.zip

@EnricoMi
Collaborator

If content is a string, it has to be converted into bytes (UTF-8 encoded) and then base64 encoded (yielding a UTF-8 string).
If content is bytes, it only has to be base64 encoded (yielding a UTF-8 string).
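
The two cases above can be sketched as a small standalone helper. This is only an illustration of the described behaviour, not the PyGithub source; the function name `to_base64_payload` is hypothetical.

```python
import json
from base64 import b64decode, b64encode


def to_base64_payload(content):
    """Return base64 text for content given as str or bytes (hypothetical helper)."""
    # A str is first turned into UTF-8 bytes; bytes are used as they are.
    if not isinstance(content, bytes):
        content = content.encode("utf-8")
    # Both cases are then base64 encoded and returned as text.
    return b64encode(content).decode("utf-8")


# A str and the equivalent bytes should produce the same payload.
from_str = to_base64_payload("hi")
from_bytes = to_base64_payload(b"hi")

# A JSON string should survive the round trip unchanged.
original = {"name": "example", "count": 1}
payload = to_base64_payload(json.dumps(original))
round_tripped = json.loads(b64decode(payload).decode("utf-8"))
```

With this shape, the base64 step runs for both input types, so a JSON string produced by json.dumps() and the same data passed as bytes should end up identical on the wire.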

The error 'bytes' object has no attribute 'encode' sounds like the code entered the if clause even though content was bytes.

Can you please provide the full error message and the type of content?
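
One possible way to capture both pieces of information is sketched below. The wrapper name `call_with_diagnostics` is hypothetical, and the lambda is only a stand-in for the real repo.update_file call, chosen because it fails with the same message when given bytes.

```python
import traceback


def call_with_diagnostics(func, content):
    """Call func(content); on failure report the content type and full traceback."""
    try:
        return {"ok": True, "result": func(content)}
    except Exception:
        return {
            "ok": False,
            "content_type": type(content).__name__,
            "traceback": traceback.format_exc(),
        }


# Stand-in for the failing call: calling .encode() on a bytes object.
report = call_with_diagnostics(lambda c: c.encode("utf-8"), b"{}")
print(report["content_type"])
print(report["traceback"])
```

Printing the full traceback rather than only str(e) shows which line raised the error, which helps tell a problem inside update_file apart from one in the calling code.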

There is a test case testUpdateFile in tests/Repository.py that tests with a string.

@miha42-github
Author

In the code above, content_to_transmit is a string and not a bytes object; see below.

content_to_transmit = json.dumps(obj)  # obj is a Python dict; json.dumps() returns a str, not a bytes object
        try:
            repo = self.github_instance.get_repo(f"{self.org_name}/{self.repo_name}")
            file_path = f"{container_name}/{self.object_files[container_name]}"
            # file_contents = repo.get_contents(file_path, ref=ref, sha=sha)
            write_response = repo.update_file(
                file_path, 
                f"Update object [{self.object_files[container_name]}]", 
                content=content_to_transmit, # I'm sending this string to repo.update_file()
                sha=sha, 
                branch=ref
            )

Unfortunately, I'm in the middle of some critical development work, so I won't be able to get back to reproducing the error for quite some time. However, I've included the patch, which indents the if stanza on line 2495 in github.Repository.update_file. After I made the change, I was able to proceed by passing the string into update_file() and saw the result as a JSON document in my target repository.
