
Polygonize mask - Inria dataset #9

Closed
sarathsrk opened this issue Oct 12, 2020 · 19 comments

@sarathsrk

From your answer to the "training flowline" issue, I understand the following:

  1. Masks in .png format can be converted to .geojson files, which can then be used for training.
  2. The polygonize_mask.py script can be used to convert image masks to polygon masks in .geojson format.

When I run the polygonize_mask.py script, it asks for a run_name in order to convert binary masks into polygon masks. Could you please provide trained weights for this operation?

@Lydorn
Owner

Lydorn commented Oct 12, 2020

Well, the easiest thing to do is to download the already-converted polygons. But I can upload the trained model that performs this operation tomorrow if you wish.

@sarathsrk
Author

sarathsrk commented Oct 12, 2020

@Lydorn Yes, that's right if I were re-training on the same data, but I would like to combine other open-source datasets and train them with this method. Please upload that model and share the URL. Thanks!!

@sarathsrk
Author

@Lydorn Is it possible to upload the trained model today? As soon as I get it, I will start training a model with the GPU resources currently available to me.

@Lydorn
Owner

Lydorn commented Oct 13, 2020

I tried to upload it today but I cannot find that run on my work computer... I'll look on my home computer this evening; maybe it's still there. Otherwise I'll have to train it again, which you can also do yourself if you wish, like so:
python main.py --config configs/config.inria_dataset_osm_mask_only.unet16
It does, however, require the original OSM annotations, which I just added to the shared data folder on Google Drive: https://drive.google.com/drive/folders/19yqseUsggPEwLFTBl04CmGmzCZAIOYhy?usp=sharing.
The original OSM annotations are in the folder named "gt_polygons" inside the "train" and "test" folders.
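For reference, here is a quick way to check that the annotations ended up where the training config expects them. This is only a sketch: the dataset root path is an assumption based on the paths used elsewhere in this thread, so adjust it to wherever you extracted the Drive folder.

```python
# Sanity check: list the original OSM annotations ("gt_polygons") for each split.
# The dataset root below is an assumption, not a path defined by the repo.
from pathlib import Path

root = Path("~/data/AerialImageDataset/raw").expanduser()
for split in ("train", "test"):
    gt_polygons = root / split / "gt_polygons"
    n_files = len(list(gt_polygons.glob("*"))) if gt_polygons.exists() else 0
    print(f"{split}: exists={gt_polygons.exists()}, files={n_files}")
```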

@sarathsrk
Author

@Lydorn Thanks for sharing the dataset; I will try to start training the model today. But if the trained model is available on your home computer, please just share it with me. It would save me some time.

@Lydorn
Owner

Lydorn commented Oct 13, 2020

@sarath0993 I found the trained model! I just uploaded it to the Google Drive. The name of the run is inria_dataset_osm_mask_only.unet16. It was trained on OSM masks from the Inria images, but it should work for other masks as well. It might fail if your new dataset contains very unusual building shapes, but that's quite unlikely.
I'll let you close the issue if it works, like you did with the previous one :-)

@sarathsrk
Author

Sorry, I'm stuck with this error once again:

Infer images:   0%|          | 0/3073 [00:00<?, ?it/s, status=Inference]
Traceback (most recent call last):
  File "polygonize_mask.py", line 198, in <module>
    main()
  File "polygonize_mask.py", line 194, in main
    polygonize_mask(config, args.filepath, backbone)
  File "polygonize_mask.py", line 144, in polygonize_mask
    tile_data = inference.inference(config, model, tile_data, compute_polygonization=False)
  File "/app/Polygonization-by-Frame-Field-Learning/frame_field_learning/inference.py", line 24, in inference
    inference_with_patching(config, model, tile_data)
  File "/app/Polygonization-by-Frame-Field-Learning/frame_field_learning/inference.py", line 120, in inference_with_patching
    pred, batch = network_inference(config, model, batch)
  File "/app/Polygonization-by-Frame-Field-Learning/frame_field_learning/inference.py", line 17, in network_inference
    pred, batch = model(batch, tta=config["eval_params"]["test_time_augmentation"])
KeyError: 'test_time_augmentation'

@Lydorn
Owner

Lydorn commented Oct 13, 2020

Ah, sorry about that. I fixed the config.json file on the Google Drive to include the missing parameters; you can download it again.
I used the polygonize_mask.py script like so:
python polygonize_mask.py --run_name inria_dataset_osm_mask_only.unet16 --filepath ~/data/AerialImageDataset/raw/train/gt/*.tif
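If you want to double-check the re-downloaded config, here is a minimal sketch. The run-folder path and the default values filled in below are assumptions for illustration, not the authoritative config; only the key names come from the traceback and from eval_params.

```python
# Check that eval_params contains the "test_time_augmentation" key accessed in inference.py.
# The run-folder path is illustrative; point it at the config.json you actually downloaded.
import json

with open("runs/inria_dataset_osm_mask_only.unet16/config.json") as f:
    config = json.load(f)

eval_params = config.setdefault("eval_params", {})
eval_params.setdefault("test_time_augmentation", False)  # assumed default if the key is missing
eval_params.setdefault("patch_size", 1024)               # 1024 is the stated default patch size
print(json.dumps(eval_params, indent=2))
```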

@sarathsrk
Author

sarathsrk commented Oct 14, 2020

@Lydorn Thanks, I am able to run inference now; it is running right now. Meanwhile, I have two questions:

  1. Does reading a binary image (black background, white buildings), vectorizing the white pixels, and saving them as GeoJSON really require a deep-learning approach? Couldn't these tasks be done with OpenCV or Pillow by reading the pixels, maybe similar to this one?

  2. I set batch_size=32 in the config.json and args.json files, but it still uses only 1 GB of GPU memory (11 GB available) and the process is quite slow. Any suggestion on which parameter to change to improve speed?

@Lydorn
Owner

Lydorn commented Oct 14, 2020

  1. Using OpenCV or skimage (which is what the approach you linked to uses; it is marching squares) still does not perfectly vectorize binary masks, because there are local ambiguities between corners and slanted walls: for example, marching squares rounds corners. When you don't look too closely it works, but in our case we use the polygons to compute the tangent-angle ground truth for the frame field, so the polygons need to be exactly right, especially at corners (see the sketch after this list).

  2. For this script the batch_size is not taken into account, because inference is done on one crop of the image at a time (and the image is then stitched back together). You can, however, increase the size of this crop via config.json/eval_params/patch_size (1024 by default).
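To make point 1 concrete, here is a minimal sketch of the "naive" marching-squares vectorization. The file path, simplification tolerance, and area threshold are illustrative assumptions, not values from this repository.

```python
# Naive mask vectorization with marching squares: it works roughly, but it rounds corners,
# which is why it is not accurate enough to derive the frame-field tangent-angle ground truth.
from skimage import io, measure
from shapely.geometry import Polygon

mask = io.imread("gt/austin1_mask.tif") > 0          # hypothetical binary building mask

polygons = []
for contour in measure.find_contours(mask.astype(float), level=0.5):
    if len(contour) < 4:
        continue                                     # skip degenerate contours
    xy = contour[:, ::-1]                            # (row, col) -> (x, y)
    poly = Polygon(xy).simplify(tolerance=1.0)       # Douglas-Peucker simplification
    if poly.is_valid and poly.area >= 10:            # drop tiny artifacts
        polygons.append(poly)

print(f"Extracted {len(polygons)} polygons (corners will be rounded).")
```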

@sarathsrk
Author

@Lydorn Got it! Thanks for the detailed answer. I have started training the model and have two questions now:

  1. I increased the train_fraction parameter from 0.75 to 0.90 in the config, and it is also updated in the config in the runs directory (just as a sanity check).
    I used 3073 samples in total, but during validation it runs over 760 images (a ratio of 0.1 would be only about 307 images).

  2. When I train, it uses the CPU cores along with GPU memory, which rapidly slows the machine down. Is there any parameter I should change to resolve this?
    Note: I already changed num_workers from null to 4.

@sarathsrk
Author

Another thing I noticed: I used your config.inria_dataset_polygonized.unet_resnet101_pretrained.leaderboard.json config file to train the model. In "polygonize_params" the method is listed as "method": ["simple", "acm"], but "asm" appears to be used by the algorithm. Please refer to the config below:

"polygonize_params": {
  "method": ["simple", "acm"],
  "common_params": {"init_data_level": 0.5},
  "simple_method": {"data_level": 0.5, "tolerance": [0.125], "seg_threshold": 0.5, "min_area": 10},
  "asm_method": {
    "init_method": "marching_squares",
    "data_level": 0.5,
    "loss_params": {
      "coefs": {
        "step_thresholds": [0, 100, 200, 300],
        "data": [1.0, 0.1, 0.0, 0.0],
        "crossfield": [0.0, 0.05, 0.0, 0.0],
        "length": [0.1, 0.01, 0.0, 0.0],
        "curvature": [0.0, 0.0, 1.0, 0.0],
        "corner": [0.0, 0.0, 0.5, 0.0],
        "junction": [0.0, 0.0, 0.5, 0.0]
      },
      "curvature_dissimilarity_threshold": 2,
      "corner_angles": [45, 90, 135],
      "corner_angle_threshold": 22.5,
      "junction_angles": [0, 45, 90, 135],
      "junction_angle_weights": [1, 0.01, 0.1, 0.01],
      "junction_angle_threshold": 22.5
    },
    "lr": 0.1,
    "gamma": 0.995,
    "device": "cuda",
    "tolerance": [0.125, 1],
    "seg_threshold": 0.5,
    "min_area": 10
  },
  "acm_method": {
    "steps": 500,
    "data_level": 0.5,
    "data_coef": 0.1,
    "length_coef": 0.4,
    "crossfield_coef": 0.5,
    "poly_lr": 0.01,
    "warmup_iters": 100,
    "warmup_factor": 0.1,
    "device": "cuda",
    "tolerance": [0.125, 1],
    "seg_threshold": 0.5,
    "min_area": 10
  }
}

Any comments on this?

@Lydorn
Owner

Lydorn commented Oct 15, 2020

  1. The config in the runs directory is used only when just the --run_name argument is given. If the --config argument is given, it is that config file that counts (and the one in the run folder is not loaded). It is difficult to know why your change from 0.75 to 0.90 was not taken into account; maybe you didn't make the change in the right file? When launching main.py, the structure of the config files being loaded is printed, which could help. (As a rough sanity check of the numbers you reported, see the sketch after this list.)

  2. The CPU is just used for loading data; the GPU then performs the data augmentations so that the CPU does not become the bottleneck (hopefully). In your case, is the CPU still the bottleneck? Are all cores used?

  3. About "asm" being used: that is very weird. If it's not in the "method" list it should not be launched... How did you check that "asm" was being used?
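Here is some illustrative arithmetic for the split; this is not the repository's actual split code, just the usual way a train_fraction split is computed.

```python
# Illustrative train/val split arithmetic for 3073 samples (not the repo's actual code).
n_samples = 3073

for train_fraction in (0.75, 0.90):
    n_train = int(n_samples * train_fraction)
    n_val = n_samples - n_train
    print(f"train_fraction={train_fraction}: {n_train} train / {n_val} val")

# train_fraction=0.75 -> 2304 train / 769 val   (close to the ~760 validation images observed,
#                                                which would be consistent with the 0.75 default
#                                                still being in effect)
# train_fraction=0.9  -> 2765 train / 308 val
```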

@sarathsrk
Author

sarathsrk commented Oct 15, 2020

> The CPU is just used for loading data; the GPU then performs the data augmentations so that the CPU does not become the bottleneck (hopefully). In your case, is the CPU still the bottleneck? Are all cores used?

Yes, during preprocessing all cores are fully utilized.
But during the validation part of training, the cores are not utilized, yet my system gets stuck or runs slowly.

> About "asm" being used: that is very weird. If it's not in the "method" list it should not be launched... How did you check that "asm" was being used?

In the runs directory, the config file contains an "asm_method" section, e.g. "asm_method": {"init_method": "marching_squares", ..., "junction_angle_weights": [1, 0.01, 0.1, 0.01], ...}. These values are also not in your config file.

Regarding train_fraction: I am using the correct configuration file from the configs directory, but changing the value doesn't change anything. Can you please tell me which file handles the data split for validation?

@sarathsrk
Author

sarathsrk commented Oct 15, 2020

Sorry for providing wrong information: during the validation phase, some CPU cores are utilized at 100%. What should I do to fix this?
Screenshot from 2020-10-15 17-38-08

Another observation I noted: during training, the train phase is very fast (34/34 completed in about 30 seconds), whereas the validation phase takes more than 20 minutes after every train phase.

@sarathsrk
Author

@Lydorn The slowness issue was resolved after changing num_workers=0.
For validation, I don't know whether this is correct: I used 3072 samples for training, with train_fraction: 0.95, input size: 725, patch size: 512. The training log shows:
# --- Start training --- #
Train dataset has 136 samples.

I am using 3072 samples, but it reports "136 samples"; is that correct?

It uses 3038 patches for validation; is that correct? Please refer to the screenshot below.

Screenshot from 2020-10-16 10-12-48

@Lydorn
Owner

Lydorn commented Oct 16, 2020

Hi, I'm sorry, I won't have time to help today (I'm defending my PhD :-)), but I'll get back to you when I can!

@sarathsrk
Author

@Lydorn Sure! Good luck!

@mohammadreza-sheykhmousa

Hi @sarathsrk, did you find a workaround for this issue? Do you have any recent updates? Thanks in advance :)
