Composite Caption LineList Creation ? #38

Open
nmonet opened this issue Aug 17, 2023 · 1 comment

nmonet commented Aug 17, 2023

Hi! Thanks for the DisCo paper and the explanation of the TSV file preparation.

The composite YAML file references a caption linelist file:
caption_linelist: train_TiktokDance-coco-single_person-Lindsey_0411_youtube-SHHQ-1.0-deepfashion2-laion_human-masks-single_cap.caption.linelist.tsv
Could you explain how this file is created?


uk9921 commented Oct 19, 2023

I looked into the xxx.caption.linelist.tsv file; it has only two columns:

0	0
1	0
2	0
3	0
4	0
5	0
6	0
7	0
8	0
9	0
10	0
11	0
12	0
...

Please refer to the code here.
https://github.com/Wangt-CN/DisCo/blob/main/dataset/tsv_dataset.py#L490

if self.is_composite:
    assert op.isfile(self.cap_linelist_file)
    self.cap_line_list = [
        int(row[2]) for row in tsv_reader(self.cap_linelist_file)]
    self.img_line_list = [i for i in range(len(self.cap_line_list))]
elif self.cap_linelist_file:
    line_list = load_box_linelist_file(self.cap_linelist_file)
    self.img_line_list = line_list[0]
    self.cap_line_list = line_list[1]
else:
    # one caption per image/video
    self.img_line_list = [i for i in range(self.cap_tsv.num_rows())]
    self.cap_line_list = [0 for i in range(self.cap_tsv.num_rows())]

The left column appears to correspond to img_line_list: it runs from 0 up to the number of data rows minus one. The right column appears to correspond to cap_line_list and is always 0, which matches the "one caption per image/video" case in the code above.
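
For illustration, here is a minimal sketch of how a two-column file like the one shown above could be generated and read back. The function names `write_caption_linelist` and `read_caption_linelist` are hypothetical and not part of the DisCo repository, and the sketch assumes exactly one caption per image, so the second column is always 0.

```python
import csv


def write_caption_linelist(num_rows, out_path):
    """Write one line per image: <image row index> TAB <caption index>.

    Assumption: every image has a single caption, so the caption index is 0.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for img_idx in range(num_rows):
            writer.writerow([img_idx, 0])


def read_caption_linelist(path):
    """Read the file back into two parallel lists of ints."""
    img_line_list, cap_line_list = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            img_line_list.append(int(row[0]))
            cap_line_list.append(int(row[1]))
    return img_line_list, cap_line_list
```

Note that the composite branch quoted above reads `row[2]`, that is, a third column, so a two-column file like this one would presumably not satisfy that branch. The sketch only covers the two-column form observed here.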

Since there is no official documentation for xxx.caption.linelist, it seems that a small modification here is enough:

if self.is_composite:  # False in my setup, so this branch is skipped
    assert op.isfile(self.cap_linelist_file)
    self.cap_line_list = [
        int(row[2]) for row in tsv_reader(self.cap_linelist_file)]
    self.img_line_list = [i for i in range(len(self.cap_line_list))]
elif self.cap_linelist_file:  # training on the official data takes this branch
    line_list = load_box_linelist_file(self.cap_linelist_file)
    self.img_line_list = line_list[0]
    self.cap_line_list = line_list[1]
elif self.cap_tsv is not None:
    # one caption per image/video
    self.img_line_list = [i for i in range(self.cap_tsv.num_rows())]
    self.cap_line_list = [0 for i in range(self.cap_tsv.num_rows())]
else:
    # no caption linelist and no caption TSV: training on user data takes
    # this branch, with indices derived from the visual TSV instead
    self.img_line_list = [i for i in range(self.visual_tsv.num_rows())]
    self.cap_line_list = [0 for i in range(self.visual_tsv.num_rows())]
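
The fallback order in the modification above can also be sketched as a standalone function, which makes it easy to check outside the dataset class. This is only an illustration: `build_line_lists` and its parameters are hypothetical names, plain integers stand in for `cap_tsv.num_rows()` and `visual_tsv.num_rows()`, and the composite branch is left out.

```python
from typing import List, Optional, Tuple


def build_line_lists(
    linelist_rows: Optional[List[Tuple[int, int]]],
    num_cap_rows: Optional[int],
    num_visual_rows: int,
) -> Tuple[List[int], List[int]]:
    """Return (img_line_list, cap_line_list) using the same fallback order."""
    if linelist_rows is not None:
        # A linelist is available: take both columns from it.
        img_line_list = [row[0] for row in linelist_rows]
        cap_line_list = [row[1] for row in linelist_rows]
        return img_line_list, cap_line_list
    # No linelist: one caption per image. Prefer the caption row count,
    # otherwise fall back to the visual row count.
    n = num_cap_rows if num_cap_rows is not None else num_visual_rows
    return list(range(n)), [0] * n
```

With neither a linelist nor a caption TSV, `build_line_lists(None, None, 3)` yields the image indices 0 to 2 paired with caption index 0, which is the same shape as the two-column file shown earlier.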

I will give it a try first, and if there is any progress, I will update it here.
