Questions to understand this repo well :D #2

Open

FormerAutumn opened this issue Mar 2, 2021 · 16 comments

Comments

@FormerAutumn

First, thank you for your great work :D

My question is what the title asks. The README says:

Using the centroids, videos are tokenized and text captions are punctuated. Using the timestamps for each caption, video ids are extracted and paired with the text captions in the training data file. Captions can be found here: https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/.

In short, which file should I download to get the 'Captions' mentioned in the last sentence of that quote? I downloaded howto100m_captions.zip (2.3G); is that correct? The files in it are all .csv ;_;
Also, has anyone run the repo that the author said this project was inspired by? It is https://github.com/MDSKUL/MasterProject

@ammesatyajit
Owner

Hi, what I used for the captions file was raw_caption_superclean.json, which I believe you can download as part of the raw caption zip file. I did run the other repo you mentioned, but I couldn't get any visual results; I only got the quantitative results the author did. Also, a tip: try to increase the model size as much as possible and run it on a GPU for the best results.

Thanks for your interest!
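
For anyone else setting this up, here is a minimal sketch of reading that captions file. It assumes raw_caption_superclean.json maps each video id to parallel "start"/"end"/"text" lists; that layout is an assumption, so check it against your download.

```python
import json

# Assumed layout (verify against your download):
# {video_id: {"start": [...], "end": [...], "text": [...]}}
with open("raw_caption_superclean.json") as f:
    captions = json.load(f)

video_id = next(iter(captions))
entry = captions[video_id]
for start, end, text in zip(entry["start"], entry["end"], entry["text"]):
    print(f"{video_id} [{start:.1f}-{end:.1f}s]: {text}")
```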

@FormerAutumn
Author

@ammesatyajit Thank you for your reply!
For some personal reasons, I want to focus on running the other repo (MasterProject). These days I keep comparing your repos (this one and your hkmeans repo) with the MasterProject repo. You really did a good job!
If I run into problems later, may I bother you again? (or maybe again and again :( )

@ammesatyajit
Owner

@FormerAutumn Sure! I'm happy to answer any questions you have.

@FormerAutumn
Author

FormerAutumn commented Mar 3, 2021

@ammesatyajit Thanks for your kindness!

Do you know where to get 'data/newest-data-max-len-20.npy', referenced in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ?
(I scanned all the URLs the author mentioned and only found centers.npy with the .npy suffix.)

Could I get your email or some other way to contact you, or is it fine to just ask questions here? If you are willing to share a contact, you can email cnzsy98@163.com. Thank you so much :D

@FormerAutumn
Author

Hi, sorry to disturb you.
In your inference.py, you define a function named 'text_next_tok_pred'.
Does it take some video clips as input and then choose centroid images according to the output, to complete the whole sequence? (I know you just pick the first 5 images for visualization; correct me if I am wrong.)

@FormerAutumn FormerAutumn changed the title What's the specific name of 'Captions' mentioned in step4? Questions to understand this reop well :D Mar 3, 2021
@FormerAutumn FormerAutumn changed the title Questions to understand this reop well :D Questions to understand this repo well :D Mar 3, 2021
@ammesatyajit
Owner

So for the text next token prediction, there is no video involved, and I am just using the model for next word prediction in a sentence (similar to GPT). This was useful as a sanity check to see if the model gained useful information which I later built on when I tested it on video (Note that I haven't added more inference functionality yet and it is relatively simple to do so). Hope that answers your question
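
For illustration, here is a rough sketch of that kind of sanity check using a stock masked-language model. This is not the repo's text_next_tok_pred; it just shows the general idea of predicting the "next" word by filling a trailing [MASK] token.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

text = "mix the flour with the"
# Append a [MASK] token and let the model fill it in as the next word.
inputs = tokenizer(text + " " + tokenizer.mask_token, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
predicted_id = logits[0, mask_index].argmax(-1).item()
print(tokenizer.decode([predicted_id]))
```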

@ammesatyajit
Owner

> @ammesatyajit Thanks for your kindness!
>
> Do you know where to get 'data/newest-data-max-len-20.npy', referenced in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ?
> (I scanned all the URLs the author mentioned and only found centers.npy with the .npy suffix.)
>
> Could I get your email or some other way to contact you, or is it fine to just ask questions here? If you are willing to share a contact, you can email cnzsy98@163.com. Thank you so much :D

@FormerAutumn Sorry for not replying earlier. I believe the author links the Google Drive file you are asking for. You can definitely contact me by email; my email is ammesatyajit@gmail.com.

@FormerAutumn
Author

FormerAutumn commented Mar 4, 2021

Thank you for your consistent replies :D
I'm now trying to understand your training pipeline, since your code seems clearer to me than the MasterProject repo (the other repo).
I'm sorry, I wrote the wrong name; the function I actually want to ask about is 'video_next_tok_pred'.
In short: does it take in a batch of video clips (with tok_embed embedding them using the cluster centers they belong to?) and output logits that you use to choose a corresponding cluster center?
If so, why not just use the video's cluster centers directly as the input sequence to the model?

I am adapting this method to implement an idea that occurred to me (it doesn't work yet; I might put it on GitHub in the future if possible).

@ammesatyajit
Owner

So video_next_tok_pred takes in the tokens from the validation set. It doesn't take in video clips. Hope that answers your question.
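
In other words, the inputs are already centroid ids (the tokenized validation videos), and the output logits over that same vocabulary pick the predicted next video token. A toy sketch of the input/output shapes, not the repo's actual model:

```python
import torch
import torch.nn as nn

VOCAB = 20736  # 12**4 centroid ids
toy_model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))

video_tokens = torch.tensor([[1042, 58, 19977, 310, 7]])  # hypothetical token ids
logits = toy_model(video_tokens)             # shape (1, seq_len, VOCAB)
next_token = logits[0, -1].argmax().item()   # predicted centroid id for the next clip
print(next_token)
```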

@joaanna

joaanna commented Mar 17, 2021

Hey, great work! I am also trying to understand your code better. In VideoBERT, and with the parameters used here, you take 4 HIERARCHIES and 12 clusters. The paper says that yields 12**4 = 20736 clusters, but in this code the README mentions concatenating the centroids, and then label_data labels features by the closest centroid. Wouldn't that yield 12*4 = 48 clusters, effectively 48 video tokens? How does it become 20736 clusters?

@ammesatyajit
Owner

Hi, sorry if the README was slightly confusing. The 20736 centroids were stored in separate files due to the hierarchical k-means; the only purpose of concatenating them was so I could access all of the centroids from one file. label_data takes in the video feature vectors and finds the closest of these 20736 centroids, effectively tokenizing each video. Hope that clears up any confusion.
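
As a rough sketch of those two steps (file names and shapes are placeholders, not the repo's exact scripts):

```python
import glob
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# 1) Concatenate the per-branch centroid files into one array.
centroid_files = sorted(glob.glob("centroid_dir/*.npy"))
centroids = np.concatenate([np.load(f) for f in centroid_files], axis=0)
print(centroids.shape)  # expected (20736, feat_dim) for 4 hierarchies of 12

# 2) Tokenize a video by assigning each clip feature to its nearest centroid.
features = np.load("video_features.npy")  # assumed shape (num_clips, feat_dim)
tokens = pairwise_distances_argmin(features, centroids)  # one centroid id per clip
```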

@joaanna

joaanna commented Mar 17, 2021

That makes sense, thank you!

Another question: I am able to run the clustering with this command:
python3 -m hkmeans_minibatch -r features -p ft_hp -b 40 -s vecs_dir -c centroid_dir -hr 2 -k 12 -e 1
which yields 12 sets of centroids, each of shape (12, feature_dim).
But when tuning the k and hr parameters I run into different issues.
For python3 -m hkmeans_minibatch -r features -p ft_hp -b 40 -s vecs_dir2r -c centroid_dir2 -hr 3 -k 15 -e 1
I get this error:
Traceback (most recent call last):
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 39, in <module>
    main()
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 35, in main
    hkmeans(root, prefix, h, k, batch_size, epochs, save_dir, 'vecs', centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 99, in hkmeans
    hkmeans_recursive(root, prefix, h, k, batch_size, epochs, save_dir, save_prefix, centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 94, in hkmeans_recursive
    centroids, labelled_data = minibatch_kmeans(root, prefix, k, batch_size, epochs)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 40, in minibatch_kmeans
    labelled_data[path] = list(kmeans.predict(np.load(path)))
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/cluster/_kmeans.py", line 1913, in predict
    check_is_fitted(self)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/sklearn/utils/validation.py", line 1019, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This MiniBatchKMeans instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

and for python3 -m hkmeans_minibatch -r features -p ft_hp -b 60 -s vecs_dir2r -c centroid_dir2 -hr 3 -k 15 -e 1 I get:
Traceback (most recent call last):
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 39, in <module>
    main()
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/__main__.py", line 35, in main
    hkmeans(root, prefix, h, k, batch_size, epochs, save_dir, 'vecs', centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 99, in hkmeans
    hkmeans_recursive(root, prefix, h, k, batch_size, epochs, save_dir, save_prefix, centroid_dir)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 91, in hkmeans_recursive
    save_prefix.format(i), centroid_dir, cur_h=cur_h + 1)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 87, in hkmeans_recursive
    save_sorted_vectors(centroids, labelled_data, batch_size, save_dir, save_prefix)
  File "/Users/joanna/miniconda3/envs/lola/lib/python3.6/site-packages/hkmeans_minibatch/hkmeans.py", line 56, in save_sorted_vectors
    sorted_vecs.append(np.expand_dims(vectors[j], axis=0))

Should the hr and k be in some relation to the batch size?

@FormerAutumn
Author

FormerAutumn commented Mar 18, 2021

@ammesatyajit Sorry for my late reply, and thank you for your kindness. I re-read the VideoBERT paper and found that the ViT model seems closer to what I want to implement, so I am switching to ViT. :D

@ammesatyajit
Owner

@joaanna Sorry for not replying earlier. I won't be able to provide a detailed response because I am a little busy at the moment for personal reasons, but if you want, you can read the code/docs for my hkmeans code: https://github.com/ammesatyajit/hierarchical-minibatch-kmeans. I will try to reproduce your error as soon as possible and get back to you on what the problem is. Also, could I ask you to tell me the dimensions of your input data files? The batch size should ideally be larger than the number of vectors in each input file; for example, I used a batch size of 500 when I ran hkmeans on files with 20 vectors each.
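
A quick way to check that relationship before running the command. The features/ft_hp*.npy pattern is only a guess based on the -r and -p arguments in your command, so adjust it to your layout.

```python
import glob
import numpy as np

batch_size = 40  # the -b value from the failing command
for path in sorted(glob.glob("features/ft_hp*.npy")):
    n_vecs = np.load(path).shape[0]
    if n_vecs > batch_size:
        print(f"{path}: {n_vecs} vectors > batch size {batch_size}; consider raising -b")
```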

@ammesatyajit
Owner

@FormerAutumn no problem. Vision transformer is really interesting, hope you find what you are looking for :)

@harshraj32

@joaanna Can you share the data you downloaded? The site is down or something and I am unable to download the cooking videos data.
