
Support Embedding visualization #247

Closed · cs2be opened this issue Feb 1, 2018 · 5 comments
cs2be (Collaborator) commented Feb 1, 2018

  • projection to 2D graphs
  • projection to 3D graphs
cs2be added this to TODO in 0.0.2 Release on Feb 1, 2018
varunarora commented
@Superjomn Could you give concrete examples of the functionality you would like to see? References to other visualization tools or papers would be welcome. If you don't have concrete ideas yet and just want us to figure it out, we can do that too.

jacquesqiao (Member) commented

@varunarora https://www.tensorflow.org/programmers_guide/embedding is a good reference. Embedding visualization is very useful for understanding word-to-vector (word2vec) models.

jetfuel added this to To do in 0.0.3 Release on Mar 15, 2018
jetfuel removed this from To do in 0.0.3 Release on Mar 15, 2018
jetfuel self-assigned this on Mar 27, 2018
jetfuel (Collaborator) commented Mar 27, 2018

We can divide this issue into smaller tasks:

  1. Frontend: How do we plot the chart for the embedding? If we were to build a demo/first version with just 2D, we could use ECharts' scatter chart, which makes it easy to put dots and labels on the plot (a stand-in sketch follows this list).

  2. Backend or frontend (this could live on either side, but if we want to let the user switch reduction methods, it should be on the frontend): How do we reduce the dimensions? scikit-learn already has this functionality, and we can borrow it for our demo; t-SNE can reduce the dimensionality.
    Example:

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> # perplexity must be smaller than the number of samples
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)
>>> X_embedded.shape
(4, 2)
  3. Backend: How should we persist the data? We could use the same old trick, where we call add_record(reduced_dimension_vect1, label1) and repeat over and over, or we could do add_records(embeddings, labels) and let the frontend do the transform.
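To make points 1 and 2 concrete, here is a minimal end-to-end sketch: reduce toy word vectors with scikit-learn's t-SNE and draw a labelled 2D scatter. matplotlib stands in for ECharts' scatter here (the real chart would live in the JS frontend), and the word list and vectors are made up for illustration.

# Minimal sketch: t-SNE reduction plus a labelled 2D scatter.
# matplotlib is only a stand-in for the ECharts scatter on the frontend side.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ["king", "queen", "man", "woman"]   # toy labels for illustration
embeddings = np.random.rand(4, 50)          # toy 50-d word vectors

# Reduce to 2D; perplexity must be smaller than the number of samples.
reduced = TSNE(n_components=2, perplexity=3).fit_transform(embeddings)

fig, ax = plt.subplots()
ax.scatter(reduced[:, 0], reduced[:, 1])
for (x, y), word in zip(reduced, words):
    ax.annotate(word, (x, y))               # label each dot with its word
plt.show()

Persisting per point 3 would then be a single add_records(reduced, words)-style call.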

jetfuel (Collaborator) commented Mar 28, 2018

Here are some thoughts on how to persist the data:

  1. We could mimic what the histogram data does. The histogram record is (int step, vector<float> data); we can change the parameters a bit to (string label, vector<float> data) to keep track of a single word. We might need to modify the proto format to be able to save the label.

  2. Create a protobuf message; the suggested format would be the following (a round-trip sketch follows this list):

message embeddingWord {
    repeated float latent = 1;
    string word = 2;
}

Con: this breaks the convention of using tablet, record, and entry.

  3. Save the embeddings in a separate file, e.g. in CSV format. That makes reading and loading very easy, but it is not very performant and defeats the purpose of using protobuf.
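For option 2, here is a minimal round-trip sketch. It assumes the message above has been compiled with protoc into a Python module named embedding_pb2; the module name and the field values are assumptions for illustration.

# Sketch only: assumes `protoc` generated embedding_pb2 from the message above.
import embedding_pb2

# Serialize one word and its reduced embedding.
record = embedding_pb2.embeddingWord(word="king", latent=[0.12, -0.34])
payload = record.SerializeToString()        # compact protobuf-encoded bytes

# Deserialize on the reading side.
restored = embedding_pb2.embeddingWord()
restored.ParseFromString(payload)
print(restored.word, list(restored.latent))

Option 3 would replace this with CSV rows of (word, latent values), which is easy to read back but gives up protobuf's compact encoding, as noted above.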

PeterPanZH (Collaborator) commented
Closing this outdated issue.
