Skip to content

Commit

Permalink
Update zip.c
Browse files Browse the repository at this point in the history
  • Loading branch information
kuba-- committed Jul 7, 2022
1 parent e14fef2 commit 4215161
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions src/zip.c
Original file line number Diff line number Diff line change
Expand Up @@ -692,12 +692,12 @@ static int zip_central_dir_delete(mz_zip_internal_state *pState,
int end = 0;
int d_num = 0;
while (i < entry_num) {
while ((!deleted_entry_index_array[i]) && (i < entry_num)) {
while ((i < entry_num) && (!deleted_entry_index_array[i])) {
i++;
}
begin = i;

while ((deleted_entry_index_array[i]) && (i < entry_num)) {
while ((i < entry_num) && (deleted_entry_index_array[i])) {
i++;
}
end = i;
Expand All @@ -706,14 +706,14 @@ static int zip_central_dir_delete(mz_zip_internal_state *pState,

i = 0;
while (i < entry_num) {
while ((!deleted_entry_index_array[i]) && (i < entry_num)) {
while ((i < entry_num) && (!deleted_entry_index_array[i])) {
i++;
}
begin = i;
if (begin == entry_num) {
break;
}
while ((deleted_entry_index_array[i]) && (i < entry_num)) {
while ((i < entry_num) && (deleted_entry_index_array[i])) {
i++;
}
end = i;
Expand Down Expand Up @@ -759,21 +759,21 @@ static ssize_t zip_entries_delete_mark(struct zip_t *zip,
}

while (i < entry_num) {
while ((entry_mark[i].type == MZ_KEEP) && (i < entry_num)) {
while ((i < entry_num) && (entry_mark[i].type == MZ_KEEP)) {
writen_num += entry_mark[i].lf_length;
read_num = writen_num;
i++;
}

while ((entry_mark[i].type == MZ_DELETE) && (i < entry_num)) {
while ((i < entry_num) && (entry_mark[i].type == MZ_DELETE)) {
deleted_entry_flag_array[i] = MZ_TRUE;
read_num += entry_mark[i].lf_length;
deleted_length += entry_mark[i].lf_length;
i++;
deleted_entry_num++;
}

while ((entry_mark[i].type == MZ_MOVE) && (i < entry_num)) {
while ((i < entry_num) && (entry_mark[i].type == MZ_MOVE)) {
move_length += entry_mark[i].lf_length;
mz_uint8 *p = &MZ_ZIP_ARRAY_ELEMENT(
&pState->m_central_dir, mz_uint8,
Expand Down

6 comments on commit 4215161

@tansy
Copy link

@tansy tansy commented on 4215161 Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it change anything?
Logically (entry_mark[i].type == MZ_MOVE) && (i < entry_num) <=> (i < entry_num) && (entry_mark[i].type == MZ_MOVE).

Conjunction is commutative.
p ⋀ q ⇔ q ⋀ p

#include <stdio.h>

int main()
    {
    int p, q, y;

    printf("\np ^ q:\n");
    for(p=0; p<=1; p++)
        for(q=0; q<=1; q++)
            {
            y = (p && q);
            printf("%d & %d = %d\n", p, q, y);
            }

    printf("\nq ^ p:\n");
    for(q=0; q<=1; q++)
        for(p=0; p<=1; p++)
            {
            y = (q && p);
            printf("%d & %d = %d\n", q, p, y);
            }

    return 0;
    }
$ gcc -o pnq pnq.c; ./pnq

p ^ q:
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1

q ^ p:
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1

@kuba--
Copy link
Owner Author

@kuba-- kuba-- commented on 4215161 Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it change anything? It's not about logic rules (this theory knows every software engineer, or at least I believe should know!!!). It's about avoiding out of index failure!
Look at:

(i < entry_num) && (entry_mark[i].type == MZ_MOVE)

If i is >= entry_num then if you check (i < entry_num) first, then it's false and we do not need to check rest of the condition (unless you really want to compile with flags forcing to check all condition terms).
But if you check entry_mark[i].type and i is >= entry_num then you will get null pointer exception or index out of range segfault, so checking (i < entry_num) after (entry_mark[i].type == MZ_MOVE) does not make sense.

I hope it's clear

@tansy
Copy link

@tansy tansy commented on 4215161 Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dare to disagree. This is diff of assembled zip.c-e14fef2-1 and zip.c-4215161-2.c"

--- zip_zip_central_dir_delete-e14fef2-1.s	2023-09-10 22:34:09.781698309 +0000
+++ zip_zip_central_dir_delete-4215161-2.s	2023-09-10 22:34:14.239642577 +0000
@@ -1,4 +1,4 @@
-	.file	"zip_zip_central_dir_delete-e14fef2-1.c"
+	.file	"zip_zip_central_dir_delete-4215161-2.c"
 	.section	.text.unlikely,"ax",@progbits
 	.type	mz_zip_writer_create_zip64_extra_data, @function
 mz_zip_writer_create_zip64_extra_data:

It doesn't support your thesis.

There is a script attached to the post showing my point.

diff-4215161-2--e14fef2-1.zip

@codylico
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outsider comment; feel free to ignore:

@tansy, I noticed that your script uses -O2 for compilation. The optimizer likely saw the two conditions and concluded two things:

  • (i < entry_num) is easier to calculate than (entry_mark[i].type == MZ_MOVE)
  • (i < entry_num). if false, would invalidate the array access in (entry_mark[i].type == MZ_MOVE)

Running the compiler with -g instead reveals actual differences in the assembly. Feel free to analyze it for meaning, since I am not an assembly expert. Here is an excerpt; the rest is in the attached zip.

@@ -45634,22 +45634,22 @@ zip_central_dir_delete:
 	.loc 2 166 8
 	addl	$1, -24(%rbp)
 .L2898:
-	.loc 2 165 38
+	.loc 2 165 28
+	movl	-24(%rbp), %eax
+	cmpl	-52(%rbp), %eax
+	jge	.L2899
+	.loc 2 165 57 discriminator 1
 	movl	-24(%rbp), %eax
 	cltq
 	leaq	0(,%rax,4), %rdx
 	movq	-48(%rbp), %rax
 	addq	%rdx, %rax
 	movl	(%rax), %eax
-	.loc 2 165 43
+	.loc 2 165 28 discriminator 1
 	testl	%eax, %eax
-	je	.L2899
-	.loc 2 165 43 is_stmt 0 discriminator 1
-	movl	-24(%rbp), %eax
-	cmpl	-52(%rbp), %eax
-	jl	.L2900
+	jne	.L2900
 .L2899:
-	.loc 2 168 9 is_stmt 1
+	.loc 2 168 9
 	movl	-24(%rbp), %eax
 	movl	%eax, -4(%rbp)
 	.loc 2 169 5

diff-4215161-2--e14fef2-g.zip

@kuba--
Copy link
Owner Author

@kuba-- kuba-- commented on 4215161 Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tansy - what's your point? We do not have one assembler, one compiler and one operating system?
You can disagree, I'll not discuss it, it's up to you.

Look at the following silly code:

#include <stdio.h>
#include <stdlib.h>

#define N 10

struct S {
	int n;
};

int main() {
	struct S *arr[N + N] = {0};
	for (int i = 0; i < N; ++i) {
		arr[i] = malloc(sizeof(struct S));
		arr[i]->n = i;
	}

	for (int i = 0; i <= N + N; ++i) {
		if (i < N && arr[i]->n == i) {
			arr[i]->n = 0;
			printf("arr[%d] = %d\n", i, arr[i]->n);
		} else {
			printf("arr[%d] index out of range\n", i);
		}
	}

	for (int i = 0; i < N; ++i) {
		if (arr[i]) {
			free(arr[i]);
		}
	}
	return 0;
}

it prints:

arr[0] = 0
arr[1] = 0
arr[2] = 0
arr[3] = 0
arr[4] = 0
arr[5] = 0
arr[6] = 0
arr[7] = 0
arr[8] = 0
arr[9] = 0
arr[10] index out of range
arr[11] index out of range
arr[12] index out of range
arr[13] index out of range
arr[14] index out of range
arr[15] index out of range
arr[16] index out of range
arr[17] index out of range
arr[18]index out of range
arr[19] index out of range
arr[20] index out of range

Right?

Now, change the ordering in if statements to
if (arr[i]->n == i && i < N) {

What would you get?

arr[0] = 0
arr[1] = 0
arr[2] = 0
arr[3] = 0
arr[4] = 0
arr[5] = 0
arr[6] = 0
arr[7] = 0
arr[8] = 0
arr[9] = 0
[1]    13592 segmentation fault  ./a.out

@tansy
Copy link

@tansy tansy commented on 4215161 Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have one assembler, one compiler and one operating system?

Yes, but every program has to be assembled before it will me compiled into machine code.

Now, change the ordering in if statements to
if (arr[i]->n == i && i < N) {

What would you get?

segmentation fault

You're right. I didn't think it through (afer all, arguments have to be calculated first, before they will be used in expression, here: AND). It took me good few minutes to understand why it (your second and not first example) causes undefined behaviour.
It also implies that compiler doesn't do exactly what's been told, I mean does not perform logical AND. In assembler one can see it clearly - it does two comparisons and that's it. I guess it's how optimization works.

It also implies that in your example (if (i < N && arr[i]->n == i)) you don't mitigate UB either, but count on compiler's optimization; that it will do check i < N first and jmp out before computing second argument arr[i]->n == i.

--

Running the compiler with -g instead reveals actual differences in the assembly

Good point - that's true. I used -O2 as it what is used in real life.

Interestingly enough in kuba--'s example, using -O2 optimization makes compiler to warn about UB,

$ gcc -O2 oub2.c

oub2.c: In function ‘main’:
oub2.c:20:16: warning: iteration 20 invokes undefined behavior [-Waggressive-loop-optimizations]
         if (arr[i]->n == i && i < N) {
             ~~~^~~
oub2.c:18:5: note: within this loop
     for (int i = 0; i <= N + N; ++i) {
     ^~~

but using -O0 doesn't, although executable segfaults anyways.

Please sign in to comment.