
Error about transform #53

Closed
TuDou-PK opened this issue May 24, 2024 · 5 comments
@TuDou-PK
Hi, I think something may be wrong when computing the mapped result after obtaining the matrix pi.

Please see the transform code from lines 338 to 342 in fugw/src/fugw/mappings/dense.py:

  pi.T
  @ source_features_tensor.T
  / pi.sum(dim=0).reshape(-1, 1)

You use $\pi^{T} \cdot S^{T}$, i.e. the source data.

But the formula should be $\pi \cdot T$, i.e. the target data, not the source data. Please compare with the transform code from POT:

  transp = self.coupling_ / nx.sum(self.coupling_, axis=1)[:, None]

  # set nans to 0
  transp = nx.nan_to_num(transp, nan=0, posinf=0, neginf=0)

  # compute transported samples
  transp_Xs = nx.dot(transp, self.xt_)

I can show you proofs based on both application and theory.

Proof 1 - application

Here is an example based on your example "Transport distributions using dense solvers".

After training and getting pi, you can plot the training points alongside the mapped points:

# modified from transformed_data = mapping.transform(source_features_test)
transformed_data_train = mapping.transform(source_features_train)

fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot()
ax.set_title("Source and target features")
ax.set_aspect("equal", "datalim")
ax.scatter(source_features_train[0], source_features_train[1], label="Source")
ax.scatter(target_features_train[0], target_features_train[1], label="Target")
ax.scatter(transformed_data_train[0], transformed_data_train[1], label="trans")
ax.legend()
plt.show()

The plot will look like this:

[plot: the mapped "trans" points lie close to the source points]

You can see the mapped data is actually close to the source data.
If instead you use the POT way:

mapped_data_train = np.dot(pi, target_features_train.T) / pi.sum(axis=1).reshape(-1, 1)

fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot()
ax.set_title("Source and target features")
ax.set_aspect("equal", "datalim")
ax.scatter(source_features_train[0], source_features_train[1], label="Source")
ax.scatter(target_features_train[0], target_features_train[1], label="Target")
ax.scatter(mapped_data_train.T[0], mapped_data_train.T[1], label="trans")
ax.legend()
plt.show()

Then the plot will be:

[plot: the mapped "trans" points lie close to the target points]

You can see the mapped data is close to the target data.

Proof 2 - theory

Here I can show that the resulting shape does not make sense.

We assume:

$S_{s}$ is the source data in the source space. Its shape is [3000, 50]: 3000 points, each with 50 features, in a 50-dim space.

$T_{t}$ is the target data in the target space. Its shape is [9000, 100]: 9000 points, each with 100 features, in a 100-dim space.

The OT matrix $\pi$ has shape [3000, 9000].

POT code

From the POT code, if we want the mapped source data in the target space, $S_{t}$, we can use:

$$S_{t} = \pi \cdot T_{t}$$

The shape of $S_{t}$ will be [3000, 100]; the shapes according to the formula above:
$$[3000, 100] = [3000, 9000] \cdot [9000, 100]$$

The source data is mapped from shape [3000, 50] in the 50-dim space to [3000, 100] in the 100-dim space; the number of points does not change. Each point simply moves from the 50-dim space to the 100-dim space.

So the explanation of the OT algorithm is:

The OT algorithm maps data from the source space to the target space without changing the number of points.
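As a sanity check on these shapes, here is a minimal NumPy sketch (the plan and data are random stand-ins; all sizes are taken from the assumptions above):

```python
import numpy as np

rng = np.random.default_rng(0)
S_s = rng.normal(size=(3000, 50))    # source data, 50-dim space
T_t = rng.normal(size=(9000, 100))   # target data, 100-dim space
pi = rng.random(size=(3000, 9000))   # stand-in transport plan

# POT-style barycentric mapping: each source point becomes a
# pi-weighted average of target points, so it lands in target space.
S_t = (pi @ T_t) / pi.sum(axis=1, keepdims=True)
print(S_t.shape)  # (3000, 100)
```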

FUGW code

According to the FUGW code, to get the mapped source data in the target space, $S_{t}$, it uses:

$$S_{t} = \pi^{T} \cdot S_{s}$$

The resulting shape will be [9000, 50]; the shapes according to this formula:
$$[9000, 50] = [9000, 3000] \cdot [3000, 50]$$

So the source data goes from shape [3000, 50] in the 50-dim space to [9000, 50], which is still in the 50-dim space: the data never reaches the 100-dim target space! It does not make sense!
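The same kind of shape check for the FUGW direction (again a minimal NumPy sketch with random stand-ins; note the real code stores features as [n_features, n_points] and transposes, while here I use [n_points, n_features] for readability):

```python
import numpy as np

rng = np.random.default_rng(0)
S_s = rng.normal(size=(3000, 50))    # source features over 3000 nodes
pi = rng.random(size=(3000, 9000))   # stand-in transport plan

# FUGW-style transform: redistribute the 50 source features
# from the 3000 source nodes onto the 9000 target nodes.
out = (pi.T @ S_s) / pi.sum(axis=0).reshape(-1, 1)
print(out.shape)  # (9000, 50)
```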

Please let me know if I was wrong :)

Btw, thanks a lot for the contribution to FUGW, it helps me a lot.

@bthirion
Collaborator

@pbarbarant @alexisthual can you take a look?

@pbarbarant
Collaborator

Thanks @TuDou-PK for your interest in our repo!

FUGW basically consists of solving a Gromov-Wasserstein problem between graphs, leading to weighted assignments between source and target nodes. The problem thus differs from a domain-adaptation setting, as there is no geometric displacement of the points.

The idea of mapping.transform is to transport a fresh set of features from the source to the target space using a previously fitted transport plan.

Going back to your example, if you have $50$ new features (for example fMRI contrast maps) that you want to send from a graph with $3000$ nodes to a graph supported over $9000$ nodes, you effectively want your transformed data to take the shape $[9000, 50]$.

The ambient dimension in which each graph is embedded is only relevant for computing the geometric cost and is unrelated to the number of features. @alexisthual It might be worth emphasizing this in the documentation.

I hope my explanation is clear :)

@TuDou-PK
Author

TuDou-PK commented May 27, 2024

@pbarbarant Thanks for your reply. Please help me confirm whether I understand correctly:

In your graph application, the aim is to map the 3000-node graph to the 9000-node graph; then you can do further processing to compare the mapped 9000-node graph with the target 9000-node graph. As for the node features, it doesn't matter whether they are 50-dim or 100-dim, right? The aim only cares about the node (graph) geometry information.

Ah! Maybe I get it: that's why in your code the array shape is like [position, node_number]. You actually treat the "node_number" axis as the samples, which makes sense.

So the conclusion is that neither POT nor FUGW is wrong; they just have different aims?

@pbarbarant
Collaborator

I believe the easiest example is an image, where the underlying graph is the 2D grid and over each pixel (node of the graph) you have a set of features, for example the three RGB channels.

In POT's DA example, you would work only in the RGB space; your RGB points would move geometrically in this color space, so you would get a mapping between colors.

With FUGW, you also take into account the graph geometry between pixels: a slice of your data array at pixel $i$ looks like $[\textbf{red}_i, \textbf{green}_i, \textbf{blue}_i]$ and the total array has size [number_of_pixels, 3], so you get a mapping between pixels.
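To make the pixel viewpoint concrete, here is a minimal sketch with a toy image (the grid sizes and the random coupling are made up for illustration; POT's DA setting would instead couple the RGB points themselves in the 3-dim color space):

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt = 100, 400                  # pixels in the two grids
img_src = rng.random((n_src, 3))         # RGB features per source pixel
plan = rng.random((n_src, n_tgt))        # stand-in coupling between pixels

# FUGW viewpoint: the plan couples pixels, so the 3 RGB features are
# carried from the 100 source pixels onto the 400 target pixels.
rgb_on_target = (plan.T @ img_src) / plan.sum(axis=0).reshape(-1, 1)
print(rgb_on_target.shape)  # (400, 3)
```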

@TuDou-PK
Author

OK, thanks for your explanation 👍
