
[Question] Detailed questions about the dataset #2

Closed
Kin-Zhang opened this issue Mar 23, 2022 · 30 comments

Comments


Kin-Zhang commented Mar 23, 2022

Thanks for providing the code, it's amazing work. 🤩

Here are some questions after reading the paper and the code README:

  1. Is the dataset provided in the repo the whole dataset from Table 7 of the paper, which has 399,776 frames?
  2. Is the expert agent the one from the LBC repo, as the link here suggests? Or is it the CARLA behavior agent, as the paper says and this link suggests? I didn't see the code for collecting the dataset.
  3. The paper says the dataset is collected in all towns, but as far as I know the official leaderboard public routes only cover Town01-06, and I didn't find any additional route files in this repo. Would you mind releasing the route files you used for data collection? If we want to compare against your method fairly, we should train on the same routes. Also, does "all towns" include Town07 and Town10HD as listed in Table 7 of the paper, or did you also build another map for training?

Looking forward to your reply, and thanks again for the paper and code.


dotchen commented Mar 23, 2022

Thank you for your interest in our project.

  1. Yes
  2. It is a slightly modified version of the behavior agent.
  3. "All towns" includes Town07 and Town10HD. You may follow these instructions to install them in CARLA.

@Kin-Zhang

Thank you for your interest in our project.

  1. Yes
  2. It is a slightly modified version of the behavior agent.
  3. "All towns" includes Town07 and Town10HD. You may follow these instructions to install them in CARLA.

Thanks for replying. Question 3 is about the route files: I know there are additional maps, but the CARLA leaderboard public routes don't include route files for these two towns. That's why I was curious about it.

Thanks again.


dotchen commented Mar 23, 2022

We use randomized routes to collect our dataset. This is similar to our previous project, World on Rails.


Kin-Zhang commented Mar 23, 2022

We use randomized routes to collect our dataset. This is similar to our previous project, World on Rails.

Oh, I see. I will check again. Thanks for answering, really appreciated.



@Kin-Zhang

Sorry to bother you, but is there any way to download the dataset through a Python script without a Box account? (I searched for a while; it seems to need the Box SDK and a login with a subscribed account.)
When I click download on the page, it cannot download everything at once.
[screenshot: Box error that the selected items exceed the download size limit]


dotchen commented Mar 23, 2022

Yes, I also just realized this issue and I am currently compressing the trajectories into .gz files. I will upload them in a few hours.

@Kin-Zhang

Ha! Thanks for replying so quickly.

By the way, I noticed that even a business Box account still has a maximum upload size, which is 150 GB here.


dotchen commented Mar 23, 2022

I will split the gz file so that each part is around 8 GB.

@Kin-Zhang

I will split the gz file so that each part is around 8 GB.

Thanks! And will a single download button be able to download everything at once, or will it show the error above about the selected items exceeding the download size limit?


dotchen commented Mar 23, 2022

Yes, that should be possible. The split-file format is going to be the same as the World on Rails dataset, except that the data is in lmdb and can be used directly with this repo. If not, separating them into something like two downloads should also work.


Kin-Zhang commented Mar 24, 2022

Yes, that should be possible. The split-file format is going to be the same as the World on Rails dataset, except that the data is in lmdb and can be used directly with this repo. If not, separating them into something like two downloads should also work.

Thanks! I will wait for the update. Thanks again!

@Watson52

Yes, that should be possible. The split-file format is going to be the same as the World on Rails dataset, except that the data is in lmdb and can be used directly with this repo. If not, separating them into something like two downloads should also work.

Hi, may I ask how to open the .mdb files? I want to have a look at the data.


Kin-Zhang commented Mar 24, 2022

Hi, may I ask how to open the .mdb files? I want to have a look at the data.

You should read the dataset code here; the repo shows how to read the data. For example:

import glob

import lmdb
import numpy as np

for full_path in glob.glob('{}/**'.format(self.data_dir)):
    # Toss a coin: randomly subsample trajectories
    if np.random.random() > self.percentage_data:
        continue
    # Open each trajectory lmdb read-only and start a transaction
    txn = lmdb.open(
        full_path,
        max_readers=1, readonly=True,
        lock=False, readahead=False, meminit=False).begin(write=False)
    n = int(txn.get('len'.encode()))            # number of frames in this trajectory
    town = str(txn.get('town'.encode()))[2:-1]  # town name stored as bytes
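
If you just want to peek at one trajectory outside the dataset classes, a minimal sketch like the following should work. It reuses the f'{tag}_{t:05d}' key pattern from basic_dataset.py; the trajectory path, the 'lidar' tag, and the float32 dtype are only assumptions for illustration.

import lmdb
import numpy as np

# Minimal sketch: inspect a single trajectory lmdb directly.
# Key pattern follows basic_dataset.py; the path, tag, and dtype below are
# assumptions for illustration, not necessarily the repo's exact layout.
trajectory_dir = 'LAV-full/some_trajectory'  # hypothetical path
txn = lmdb.open(
    trajectory_dir,
    max_readers=1, readonly=True,
    lock=False, readahead=False, meminit=False).begin(write=False)

num_frames = int(txn.get('len'.encode()))
print('frames:', num_frames)

raw = txn.get('lidar_00000'.encode())
if raw is not None:
    lidar = np.frombuffer(raw, dtype=np.float32)  # dtype assumed for illustration
    print('lidar buffer size:', lidar.shape[0])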


dotchen commented Mar 25, 2022

Hi @Kin-Zhang , the dataset is now on box: https://utexas.box.com/s/fcj52g9juilnp4mt5k5fsqcqkxae77cb

Let me know if you encounter any issues downloading or using the dataset. Thanks!


Watson52 commented Mar 25, 2022

Hi @Kin-Zhang , the dataset is now on box: https://utexas.box.com/s/fcj52g9juilnp4mt5k5fsqcqkxae77cb

Let me know if you encounter any issues downloading or using the dataset. Thanks!

Hi @dotchen, I also saw the new dataset link, thank you very much! By the way, may I ask how long it took to train LAV on 4 Titan Pascals? And how long does it take to collect the data, about 400K frames?

@penghao-wu

Hi, I wonder how I can decompress the downloaded files? I removed the postfix, but it says that they are not in gzip format.


dotchen commented Mar 25, 2022

Hi @Watson52,

Each stage takes a different amount of time, but they are all around 2-3 days with 4 Titan Pascals. It might be faster if you have better GPUs.

@dotchen dotchen reopened this Mar 25, 2022

dotchen commented Mar 25, 2022

Hi, I wonder how I can decompress the downloaded files? I removed the postfix, but it says that they are not in gzip format.

The files are a split tar.gz archive; please download them all and then decompress them together. There is no need to remove the postfix.


Kin-Zhang commented Mar 26, 2022

Here is a Stack Overflow answer on how to extract split files.

You can try the following command:

zcat LAV-full.gz.* | tar -x

[screenshot of the extraction in progress]
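
If the shell pipeline gives you trouble, a pure-Python equivalent is sketched below: it just concatenates the split parts and extracts the result with tarfile, assuming all parts named LAV-full.gz.* have been downloaded into the current directory.

import glob
import shutil
import tarfile

# Minimal sketch: join the split parts, then extract the joined archive.
# Assumes every part (LAV-full.gz.aa, LAV-full.gz.ab, ...) is present.
parts = sorted(glob.glob('LAV-full.gz.*'))

with open('LAV-full.tar.gz', 'wb') as joined:
    for part in parts:
        with open(part, 'rb') as f:
            shutil.copyfileobj(f, joined)

with tarfile.open('LAV-full.tar.gz', 'r:gz') as tar:
    tar.extractall('LAV-full')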


Kin-Zhang commented Mar 27, 2022

Hi @dotchen. I have some questions before training:

  1. [number of epochs for each step] There are four steps in the training. Do the epoch settings in this repo match the configuration used for the results in the paper? I saw that train_eg.py just uses 1 as the default number of epochs, which seems odd. I'd like to know this setting for your training, since I didn't see it in the paper appendix where the other training hyperparameters are listed.

    If possible, could you please tell me the number of epochs you set for these four training steps?

  2. [lidar sem data] The Point Painting step writes lidar_sem_ to the dataset; does the dataset provided here already have the lidar_sem data?


dotchen commented Mar 27, 2022

[number of epochs for each step] There are four steps in the training. Do the epoch settings in this repo match the configuration used for the results in the paper? I saw that train_eg.py just uses 1 as the default number of epochs, which seems odd. I'd like to know this setting for your training, since I didn't see it in the paper appendix where the other training hyperparameters are listed.

The provided weights are the 45 DS entry in the ablations. The number in the file names corresponds to the number of epochs they were trained for.

[lidar sem data] The Point Painting step writes lidar_sem_ to the dataset; does the dataset provided here already have the lidar_sem data?

Yes, it is already provided in the released dataset.
EDIT: Please use the point painting script.


Kin-Zhang commented Mar 28, 2022

The provided weights are the 45 DS entry in the ablations.

Thanks for letting me know.

Yes, it is already provided in the released dataset.

        self.seg_model = RGBSegmentationModel(self.seg_channels).to(self.device)
        self.seg_model.load_state_dict(torch.load(self.seg_model_dir, map_location=self.device))
        self.seg_model.eval()

@dotchen, does this seg model use seg_1.th, which was trained for only one epoch?


[lidar sem data] The Point Painting step writes lidar_sem_ to the dataset; does the dataset provided here already have the lidar_sem data?

I also found that the released dataset may not include all of the lidar_sem data: when I tried to train on all towns, some entries were missing (NoneType). Just checking with you to make sure I didn't miss something:

  File "/LAV/lav/utils/datasets/lidar_painted_dataset.py", line 27, in __getitem__
    lidar_painted = self.__class__.access('lidar_sem', lmdb_txn, index, 1).reshape(-1,len(self.seg_channels))
  File "/LAV/lav/utils/datasets/basic_dataset.py", line 83, in <listcomp>
    return np.stack([np.frombuffer(lmdb_txn.get((f'{tag}_{t:05d}{suffix}').encode()), dtype) for t in range(index,index+T)])
TypeError: a bytes-like object is required, not 'NoneType'
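
For anyone hitting the same error, a small scan like the sketch below can report which trajectories are missing lidar_sem frames. It reuses the lidar_sem_{t:05d} key pattern from basic_dataset.py and assumes an empty suffix; the data_dir path is hypothetical.

import glob

import lmdb

# Minimal sketch: list trajectories whose lmdb is missing lidar_sem frames.
# Key pattern follows basic_dataset.py (suffix assumed empty); data_dir is a
# hypothetical path to the extracted dataset.
data_dir = 'LAV-full'
for full_path in glob.glob('{}/**'.format(data_dir)):
    txn = lmdb.open(
        full_path,
        max_readers=1, readonly=True,
        lock=False, readahead=False, meminit=False).begin(write=False)
    n = int(txn.get('len'.encode()))
    missing = [t for t in range(n)
               if txn.get('lidar_sem_{:05d}'.format(t).encode()) is None]
    if missing:
        print('{}: {} of {} frames missing lidar_sem'.format(full_path, len(missing), n))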


dotchen commented Mar 29, 2022

I will have to take a deeper look. In the meantime, you can relabel the dataset by running the point painting script.


dotchen commented Mar 29, 2022

OK, it looks like I might have overwritten some of the lmdbs while testing the refactored code, causing some frames to be missing...
Please relabel the dataset by running the point painting script, sorry for the inconvenience.

@Kin-Zhang

Thanks for letting me know.

@keishihara

Thanks for all the discussion here. It was very helpful for catching up on the work.
I've read through it, and here are a couple of small notes for those working on this repo:

  1. Extracting the split gzip files with zcat LAV-full.gz.* | tar -x didn't work for me. Instead, just cat LAV-full.gz.* | tar -xz worked.
  2. I found that the trajectory atefbmmouv causes a core dump when loading the dataset, so I had to skip it to get rid of the error mentioned here:
File "/LAV/lav/utils/datasets/lidar_painted_dataset.py", line 27, in __getitem__
  lidar_painted = self.__class__.access('lidar_sem', lmdb_txn, index, 1).reshape(-1,len(self.seg_channels))
File "/LAV/lav/utils/datasets/basic_dataset.py", line 83, in <listcomp>
  return np.stack([np.frombuffer(lmdb_txn.get((f'{tag}_{t:05d}{suffix}').encode()), dtype) for t in range(index,index+T)])
TypeError: a bytes-like object is required, not 'NoneType'
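
A simple way to skip it is to filter the trajectory list before building the dataset; here is a minimal sketch, where the data_dir path is hypothetical and the blacklist contains only the trajectory that crashed for me.

import glob
import os

# Minimal sketch: drop known-bad trajectories before they reach the dataset loader.
BLACKLIST = {'atefbmmouv'}
data_dir = 'LAV-full'  # hypothetical path to the extracted dataset

trajectories = [
    p for p in glob.glob('{}/**'.format(data_dir))
    if os.path.basename(os.path.normpath(p)) not in BLACKLIST
]
print('kept {} trajectories'.format(len(trajectories)))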

@Kin-Zhang

I fixed the problem; I forgot to open a pull request to let others know.
Here is the commit: Kin-Zhang@fa045b6
@keishihara

@keishihara

@Kin-Zhang Thank you for your comment!
I will check it out :)

@JianLiMech

Thanks for all the discussion here. It was very helpful for catching up on the work. I've read through it, and here are a couple of small notes for those working on this repo:

1. Extracting the split gzip files with `zcat LAV-full.gz.* | tar -x` didn't work for me. Instead, just `cat LAV-full.gz.* | tar -xz` worked.

2. I found that the trajectory `atefbmmouv` causes a core dump when loading the dataset, so I had to skip it to get rid of the error mentioned [here](https://github.com/dotchen/LAV/issues/2#issuecomment-1080442621):
File "/LAV/lav/utils/datasets/lidar_painted_dataset.py", line 27, in __getitem__
  lidar_painted = self.__class__.access('lidar_sem', lmdb_txn, index, 1).reshape(-1,len(self.seg_channels))
File "/LAV/lav/utils/datasets/basic_dataset.py", line 83, in <listcomp>
  return np.stack([np.frombuffer(lmdb_txn.get((f'{tag}_{t:05d}{suffix}').encode()), dtype) for t in range(index,index+T)])
TypeError: a bytes-like object is required, not 'NoneType'

Hello, thank you for the information!

I had a problem when I tried to download and extract the dataset here.

I downloaded 2 parts of the compressed file (16 GB) and ran zcat LAV-full.gz.* | tar -x and cat LAV-full.gz.* | tar -xz, but neither worked. The error was:


gzip: LAV-full.gz.ab: not in gzip format

gzip: LAV-full.gz.ac: not in gzip format
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

Do you know how I should extract the file?

@Kin-Zhang

Do you know how I should extract the file?

You must download all parts of the dataset; with only two of them, it cannot be extracted correctly.
