
Rebased version of the PR to add support for ConvTranspose layers #844

Draft: jmitrevs wants to merge 13 commits into main

Conversation

jmitrevs (Contributor) commented Aug 3, 2023

Description

This is a rebased version of #644. Please look there for details, but in summary it adds support for both io_stream and io_parallel compilation of Conv1DTranspose and Conv2DTranspose.
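For reference, a minimal usage sketch (not part of this PR) of how a Keras model containing a Conv2DTranspose layer could be converted once this support is in place; the layer shapes, config granularity, io_type, and output directory are illustrative assumptions.

    # Minimal sketch, assuming this PR's ConvTranspose support; model and config values are illustrative.
    import tensorflow as tf
    import hls4ml

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4, 4, 3)),
        tf.keras.layers.Conv2DTranspose(4, (3, 3), strides=(2, 2), padding='same'),
    ])

    config = hls4ml.utils.config_from_keras_model(model, granularity='model')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        io_type='io_stream',  # the PR adds both io_stream and io_parallel implementations
        output_dir='hls4ml_prj_conv2dtranspose',
    )
    hls_model.compile()  # build the C simulation to compare against Keras predictions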

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Still a to-do.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

Jonathan-Shoemaker and others added 3 commits August 2, 2023 18:54
add new files for conv1dtranspose resource

clean up so that conv code is reached. Still need to get the actual implementation matching keras

implement conv1dtranspose super inefficiently (gets correct answer though)

try to fix indices to make code work

make the c code work for conv1dtranspose

reduce weight dimensions to properly reflect transposed kernel size

clean up so that transpose filter width is passed around from config

fix code such that simple transpose layer gets synthesized

move variables out of loops, optimize slightly and add in alternative method of computation to compute by kernel (that option is not optimized as of now)

add in conv1d transpose linebuffer format code. seems to work, unsure if it is optimized yet

trying to fix stream behavior

get transpose compilation working mostly as expected. weird jump in latency from reuse 1 to 2 still exists

initial conv2dtranspose addition. Output is permuted as of now.

output in correct order. using large array to buffer output though

fix up conv1dtranspose a bit to pad correctly. fix up stream instructions for both 1d and 2d transposes

fix allowed reuse factors for transpose layers

update to new conv methods for io_parallel. Still some issues with multiple filters as well as some padding issues

clean up error with multiple filters and larger kernels

optimize conv transpose resource to get it working reasonably well. may still have slight optimization left

fix output to conv1d transpose resource

add conv2dtranspose io_parallel implementation. Can still be optimized

small changeup to data storage in conv1d parallel

fix zero padding pass addition for transpose stream layers

move transposing of weight matrix to resource_strategy for transpose layers

change how stream loads in weights to be like parallel for conv transposes. unroll all stride steps completely

 fix output of 1d transpose parallel to be faster

change 1d transpose weight input to be 2-dimensional (passed from python code)

change 2d transpose weight input to be 3-dimensional (passed from python code)

small changes to transposes

Revert "fix nondefault project name handling (#626)". The commit breaks the Vivado Accelerator workflow, and the fix is unclear to me right now.

This reverts commit e8f048a.

steps towards getting integer inputs to work
jmitrevs marked this pull request as draft on August 3, 2023
vloncar (Contributor) commented Aug 7, 2023

Why do we need multidimensional weights here? Passing pointers doesn't work as expected? I like the idea, we'll need it in the future for Vitis, but the keep_dims approach feels very clunky.

Comment on lines +476 to +488
# (W,F,C) => (F,W,C)
data = np.transpose(data, axes=[1, 0, 2])
# now split the kernel into stride width kernels (F, W, C) -> (S, F, W/S, C)
n_filts, kern_width, n_chan = data.shape
new_weights = np.zeros((self.attributes['stride_width'], n_filts, self.attributes['trfilt_width'], n_chan))
for i_sw in range(self.attributes['stride_width']):
    for i_fw in range(self.attributes['trfilt_width']):
        filt_ind = i_sw + (self.attributes['trfilt_width'] - i_fw - 1) * self.attributes['stride_width']
        for i_nf in range(n_filts):
            for i_nc in range(n_chan):
                if filt_ind < kern_width:
                    new_weights[i_sw][i_nf][i_fw][i_nc] = data[i_nf][filt_ind][i_nc]
data = new_weights
jmduarte (Member) commented Aug 8, 2023


@vloncar I think the "need" for the multidimensional weights comes from this transpose. I think this can still be done even if all the weights are kept single dimensional, but maybe it's less "clean" or "obvious" what's going on?

maybe @Jonathan-Shoemaker can also comment more.


and @Jonathan-Shoemaker's reply:

#644 (comment)

What is the meaning of "keep_dims"?

keep_dims keeps the weight matrix from being entirely flattened, keeping the first "keep_dims" dimensions as is, flattening along all other dimensions.

The reason for this is that the ConvTranspose is computed as the interleaving of "stride" number of Conv layers. The dimensions kept are for indexing into these different Conv layers. The idea was that the weight matrix of a ConvTranspose layer can be thought of as a disjoint set of weight matrices for Conv layers and treating it as such was easier.

A Contributor replied:

Essentially, you want to consistently use disjoint submatrices of the weight matrix for the sub-computations and interleave the results from those sub-computations. My understanding is that the alternative would be to pass large slices of the matrix. I can't remember if I tested that, but I assume it would work; I think I just found it messier.
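To make the decomposition being discussed concrete, here is a small NumPy sketch (single channel, no padding; the helper names are invented for this example) showing that a stride-S 1D transposed convolution can be assembled from S interleaved sub-kernels sliced out of the original kernel. It illustrates only the phase splitting, not the exact tap reordering used in the code above.

    import numpy as np

    def conv1d_transpose_naive(x, w, stride):
        # reference transposed convolution: each input sample scatters a copy of the kernel
        out = np.zeros((len(x) - 1) * stride + len(w))
        for m, xv in enumerate(x):
            out[m * stride : m * stride + len(w)] += xv * w
        return out

    def conv1d_transpose_phased(x, w, stride):
        # same result built phase by phase: outputs at positions n with n % stride == r
        # only ever use the sub-kernel w[r::stride], i.e. one small Conv per phase
        out = np.zeros((len(x) - 1) * stride + len(w))
        for r in range(stride):
            sub_w = w[r::stride]
            for m, xv in enumerate(x):
                for j, wv in enumerate(sub_w):
                    out[(m + j) * stride + r] += xv * wv
        return out

    x, w = np.random.rand(8), np.random.rand(5)
    assert np.allclose(conv1d_transpose_naive(x, w, 2), conv1d_transpose_phased(x, w, 2))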

jmitrevs (Contributor, Author) commented:

I pushed a first version of a test and tried to make initial fixes, though there are many remaining errors.

jmitrevs (Contributor, Author) commented:

Looking at the new test, the results seem to be:

y_keras.shape=(10, 7, 7, 4)
y_hls4ml.shape=(10, 100)

and model_default_t w2[1][1][108]; seems to be the weight dimensions. Will need to investigate further.

jmitrevs added this to the v0.8.0 milestone on Aug 11, 2023
jmitrevs (Contributor, Author) commented:

The output dimension inconsistency occurs for padding="valid". For "same" the output is flattened but has the correct number of values; however, the values still don't match.

jmitrevs (Contributor, Author) commented:

I am having trouble getting this to work. I wrote a python implementation of conv2d_transpose that works and I think I understand the im2col style io_parallel implementation that's here, but it's not coming together. The keep_dim weights that are written out actually look strangely reordered. I may branch and try to use the code here while dropping the keep_dim concept.
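For context, this is roughly the kind of NumPy reference implementation being referred to; a minimal sketch, not the actual code used, assuming "valid" padding, an HWC input, and a (kH, kW, C_in, C_out) kernel layout chosen here for readability.

    import numpy as np

    def conv2d_transpose_ref(x, w, stride=1):
        # naive transposed 2D convolution: each input pixel scatters a weighted
        # (kH, kW, C_out) patch of the kernel into the (strided) output
        H, W, _ = x.shape
        kH, kW, C_in, C_out = w.shape
        out = np.zeros(((H - 1) * stride + kH, (W - 1) * stride + kW, C_out))
        for i in range(H):
            for j in range(W):
                for c in range(C_in):
                    out[i * stride : i * stride + kH, j * stride : j * stride + kW, :] += x[i, j, c] * w[:, :, c, :]
        return out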

Jonathan-Shoemaker (Contributor) commented:

I am having trouble getting this to work. I wrote a python implementation of conv2d_transpose that works and I think I understand the im2col style io_parallel implementation that's here, but it's not coming together. The keep_dim weights that are written out actually look strangely reordered. I may branch and try to use the code here while dropping the keep_dim concept.

One of the purposes of keep_dims is that the weights are deliberately written in a different order than in the original conv_transpose weight matrix. The reason is that the kernel of a conv transpose is better computed as separate kernels of smaller Conv layers. These sub-kernels lie within the transpose kernel but are not contiguous in the normal transpose weight layout.

jmitrevs (Contributor, Author) commented:

With these few changes conv2dtranspose seems to work for io_parallel. @Jonathan-Shoemaker, do you want to take a look at why the other cases in the pytest are failing?

jmitrevs (Contributor, Author) commented:

Actually, the conv1dtranspose io_parallel failure was due to the pytest being misconfigured, but the io_stream tests are still failing.

jmitrevs (Contributor, Author) commented Sep 1, 2023

I think the issue is with padding. With padding="same" it seems to work well.

jmitrevs (Contributor, Author) commented Sep 1, 2023

Also needed is checking for valid reuse factors for conv transpose.

jmitrevs modified the milestones: v0.8.0, v1.0.0 on Oct 20, 2023