
Support Embedding visualization #247

Closed · cs2be opened this issue Feb 1, 2018 · 5 comments
cs2be (Collaborator) commented Feb 1, 2018

  • projection to 2D graphs
  • projection to 3D graphs
cs2be added this to TODO in 0.0.2 Release on Feb 1, 2018
varunarora commented
@Superjomn Could you give concrete examples of the functionality you would like to see? References to other visualization tools or papers would be welcome. If you don't have concrete ideas yet and just want us to figure it out, we can do that too.

jacquesqiao (Member) commented

@varunarora https://www.tensorflow.org/programmers_guide/embedding is a good reference. Embedding visualization is very useful for understanding word-to-vector (word2vec) models.

jetfuel added this to To do in 0.0.3 Release on Mar 15, 2018
jetfuel removed this from To do in 0.0.3 Release on Mar 15, 2018
jetfuel self-assigned this on Mar 27, 2018
jetfuel (Collaborator) commented Mar 27, 2018

We can divide this issue into smaller tasks:

  1. Frontend: How do we plot the chart for the embedding? If we were to build a demo/first version with just 2D, we could use ECharts' scatter chart, which makes it easy to put dots and labels on the plot (a stand-in sketch follows this list).

  2. Backend or frontend (this could live on either side, but if we want to let the user switch reduction methods, it should be on the frontend): How do we reduce the dimensions? scikit-learn already has this functionality, and we can borrow it for our demo; t-SNE can reduce the dimensionality.
    Example:

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> # perplexity must be smaller than the number of samples
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)
>>> X_embedded.shape
(4, 2)
  3. Backend: How should we persist the data? We could use the same old trick, where we call add_record(reduced_dimension_vect1, label1) and repeat over and over, or we could do add_records(embeddings, labels) and let the frontend do the transform.
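To make points 1 and 2 concrete, here is a minimal end-to-end sketch: reduce toy word vectors with scikit-learn's t-SNE and draw a labelled 2D scatter. matplotlib stands in for ECharts' scatter here (the real chart would live in the JS frontend), and the word list and vectors are made up for illustration.

# Minimal sketch: t-SNE reduction plus a labelled 2D scatter.
# matplotlib is only a stand-in for the ECharts scatter on the frontend side.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ["king", "queen", "man", "woman"]   # toy labels for illustration
embeddings = np.random.rand(4, 50)          # toy 50-d word vectors

# Reduce to 2D; perplexity must be smaller than the number of samples.
reduced = TSNE(n_components=2, perplexity=3).fit_transform(embeddings)

fig, ax = plt.subplots()
ax.scatter(reduced[:, 0], reduced[:, 1])
for (x, y), word in zip(reduced, words):
    ax.annotate(word, (x, y))               # label each dot with its word
plt.show()

Persisting per point 3 would then be a single add_records(reduced, words)-style call.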

jetfuel (Collaborator) commented Mar 28, 2018

Here are some thoughts on how to persist the data:

  1. We could mimic what the histogram data does. The histogram record is (int step, vector<float> data); we can change the parameters a bit to (string label, vector<float> data) to keep track of a single word. We might need to modify the proto format to be able to save the label.

  2. Create a protobuf message; the suggested format would be the following (a round-trip sketch follows this list):

message embeddingWord {
    repeated float latent = 1;
    string word = 2;
}

Con: this breaks the convention of using tablet, record, and entry.

  3. Save the embeddings in a separate file, e.g. in CSV format. That makes reading and loading very easy, but it is not very performant and defeats the purpose of using protobuf.
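For option 2, here is a minimal round-trip sketch. It assumes the message above has been compiled with protoc into a Python module named embedding_pb2; the module name and the field values are assumptions for illustration.

# Sketch only: assumes `protoc` generated embedding_pb2 from the message above.
import embedding_pb2

# Serialize one word and its reduced embedding.
record = embedding_pb2.embeddingWord(word="king", latent=[0.12, -0.34])
payload = record.SerializeToString()        # compact protobuf-encoded bytes

# Deserialize on the reading side.
restored = embedding_pb2.embeddingWord()
restored.ParseFromString(payload)
print(restored.word, list(restored.latent))

Option 3 would replace this with CSV rows of (word, latent values), which is easy to read back but gives up protobuf's compact encoding, as noted above.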

PeterPanZH (Collaborator) commented
Closing this outdated issue.
