Skip to content
This repository has been archived by the owner on May 31, 2020. It is now read-only.

Bytearray expandtabs #729

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
67 changes: 64 additions & 3 deletions python/common/org/python/types/Bytes.java
@@ -1,5 +1,6 @@
package org.python.types;

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
Expand Down Expand Up @@ -971,10 +972,70 @@ public org.python.Object endswith(org.python.Object suffix, org.python.Object st
}

@org.python.Method(
__doc__ = "B.expandtabs(tabsize=8) -> copy of B\n\nReturn a copy of B where all tab characters are expanded using spaces.\nIf tabsize is not given, a tab size of 8 characters is assumed."
__doc__ = "B.expandtabs(tabsize=8) -> copy of B\n\nReturn a copy of B where all tab characters are expanded using spaces.\nIf tabsize is not given, a tab size of 8 characters is assumed.",
default_args = {"tabsize"}
)
public org.python.Object expandtabs(java.util.List<org.python.Object> args, java.util.Map<java.lang.String, org.python.Object> kwargs, java.util.List<org.python.Object> default_args, java.util.Map<java.lang.String, org.python.Object> default_kwargs) {
throw new org.python.exceptions.NotImplementedError("bytes.expandtabs has not been implemented.");
public org.python.Object expandtabs(org.python.types.Object tabsize) {
// CPython implementation https://github.com/python/cpython/blob/master/Objects/stringlib/transmogrify.h#L21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that this link's L21 line number will become outdated as the master branch changes – it will be better to link it to a tag (v3.6.4) or a specific commit, either of which won't change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very neat suggestion -never thought about it this way before-, Thanks! I'll fix it

// TODO: check overflows
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is needed checking here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the C implementation, it checks for integer overflows. I'm not sure if we should check it here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, tbh I'm having trouble to follow the logic in this implementation. Maybe you got too much inspiration from the C implementation? =)
It's a bit strange that this is so different from the one for Str, have you checked that one? Why is this so much more involved, is this handling some cases which the other doesn't?


if (this.value == null) {
return null;
}

if (this.value.length == 0) {
return new org.python.types.Bytes(new byte[0]);
}

int tabsize_int = 8;

if (tabsize != null) {
tabsize_int = (int) ((org.python.types.Int) tabsize).value;
}


int output_size = 0;
for (int i = 0, j = 0; i < this.value.length; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this for loop is accomplishing, could you clarify?

It looks like it's meant to compute the output_size, but the code below doesn't seem to be using it in a meaningful way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I'm sorry about that I've forgotten it from an older version of the code, the C implementation first calculates the size of the output then makes a buffer accordingly, we don't need that now.

byte c = this.value[i];

if (c == '\t') {
if (tabsize_int > 0) {
j += (tabsize_int - (j % tabsize_int));
}
} else {
j++;

if (c == '\n' || c == '\r') {
output_size += j;
j = 0;
}
}
}

StringBuilder sb = new StringBuilder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably not use StringBuilder here -- a Bytes object contains only bytes (8 bits) and is backed by a byte[] (Java byte array), while a String contain Unicode codepoints (by default using UTF-16 charset, which means double the usage of memory).

You should probably just use an ArrayList, initializing it with capacity of this.value.length, let it expand to the right size and at the end convert it back to an array of byte primitives.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information, I updated it +1


for (int i = 0, j = 0; i < this.value.length; i++) {
byte c = this.value[i];

if (c == '\t') {
if (tabsize_int > 0) {
output_size = tabsize_int - (j % tabsize_int);
j += output_size;
while (output_size-- > 0) {
sb.append(' ');
}
}
} else {
j++;
sb.append((char)c);

if (c == '\n' || c == '\r') {
j = 0;
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion from ArrayList<Byte> to byte[] would probably be useful as a listToArray method, to be reused by other code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should I place that new method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can start by just creating a public static method in this same class.
Another option is to create a BytesUtils.java file for adding static methods like these, since they are probably going to be used in ByteArray.java as well.


return new Bytes(sb.toString());
}

@org.python.Method(
Expand Down
14 changes: 14 additions & 0 deletions tests/datatypes/test_bytes.py
Expand Up @@ -226,6 +226,20 @@ def test_count(self):
print(b'abcabca'.count(97, 3, [])) #Test Slicing Error on End
""", exits_early=True)

def test_expandtabs(self):
self.assertCodeExecution("""
print(b''.expandtabs())
print(b'\t'.expandtabs())
print(b'a\t'.expandtabs())
print(b'a\t'.expandtabs())
print(b'a \t'.expandtabs(tabsize=3))
print(b'a \t '.expandtabs(tabsize=3))
print(b'a \t '.expandtabs(tabsize=3))
print(b'a \t b'.expandtabs(tabsize=-1))
print(b'a \t b \t'.expandtabs(tabsize=-1))
print(b'a \t b\t '.expandtabs(tabsize=4))
""")

def test_find(self):
self.assertCodeExecution("""
print(b''.find(b'a'))
Expand Down