RemoteSensing remote-sensing image segmentation: multi-band train_demo.py raises an error #297

Closed
KuntaHu opened this issue Jun 19, 2020 · 17 comments

KuntaHu commented Jun 19, 2020

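Hello. Following the tutorial, I converted the multi-band images to npy format and set up the dataset file lists.
Because the input has 6 bands, I set --channel 6 for training and then ran train_demo.py.
It fails with the following error: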

2020-06-19 23:58:38 [INFO] 40 samples in file data/dataset/train.txt
2020-06-19 23:58:38 [INFO] 30 samples in file data/dataset/val.txt
W0619 23:58:39.167647 1220 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0619 23:58:39.171162 1220 device_context.cc:245] device: 0, cuDNN Version: 7.3.
2020-06-19 23:58:40,660-INFO: Instantiated empty configuration.
HDFS initialization failed, please check if .hdfscli,cfg exists.
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aistudio/contrib/RemoteSensing/readers/base.py", line 85, in handle_worker
    r = mapper(sample[0], sample[1], sample[2])
  File "/home/aistudio/contrib/RemoteSensing/transforms/transforms.py", line 68, in __call__
    outputs = op(im, im_info, label)
  File "/home/aistudio/contrib/RemoteSensing/transforms/transforms.py", line 488, in __call__
    im = normalize(im, self.min_val, self.max_val, mean, std)
  File "/home/aistudio/contrib/RemoteSensing/transforms/ops.py", line 25, in normalize
    im = (im.astype(np.float32, copy=False) - min_value) / range_value
ValueError: operands could not be broadcast together with shapes (256,256,6) (3,)

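I then picked 3 of the bands, saved them as npy, set --channel to 3, and reloaded the data. That ran correctly, which suggests my data file format is right. Does train_demo.py only support 3-band input? How do I feed in multi-band data?

Thanks!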
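
The shapes in the error message, (256,256,6) against (3,), are what NumPy reports when a six-band image meets a three-element per-channel array. The following is a minimal sketch of that situation, not the RemoteSensing implementation: the `normalize` function is a hypothetical stand-in modelled on the line shown in the traceback, and all parameter values are made up for illustration.

```python
import numpy as np


def normalize(im, min_value, max_value, mean, std):
    """Scale an HWC image to [0, 1] per channel, then standardize it.

    Hypothetical stand-in: every per-channel parameter must have exactly
    one entry per band of the image.
    """
    channels = im.shape[-1]
    params = [np.asarray(p, dtype=np.float32)
              for p in (min_value, max_value, mean, std)]
    for p in params:
        if p.shape != (channels,):
            raise ValueError(
                "expected {} per-channel values, got shape {}".format(
                    channels, p.shape))
    min_value, max_value, mean, std = params
    range_value = max_value - min_value
    im = (im.astype(np.float32, copy=False) - min_value) / range_value
    return (im - mean) / std


# A 6-band image with values in 0-255 (illustrative data only).
im = np.random.randint(0, 256, size=(256, 256, 6)).astype(np.float32)

# Three-element parameters do not match six bands.
try:
    normalize(im, [0] * 3, [255] * 3, [0.5] * 3, [0.5] * 3)
    three_values_ok = True
except ValueError:
    three_values_ok = False

# Six-element parameters broadcast cleanly over the last axis.
out = normalize(im, [0] * 6, [255] * 6, [0.5] * 6, [0.5] * 6)
print(three_values_ok, out.shape)
```

Under this reading, any per-channel setting passed to the transforms would need six entries when training with --channel 6.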

KuntaHu (Author) commented Jun 20, 2020

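I found the cause: my own data had already been normalized, so the normalize step in transforms raised the error. The images need to be converted to the 0-255 range, while the annotations stay as single-channel images with values 0 and 1. With that change the script runs. However, when I run train_demo on my own multi-channel data to fine-tune the Unet model, the loss does not decrease and stays around 1, the IoU stays at 0.5, and kappa is -1. The trained model predicts 0 everywhere.

How can I use train_demo, or the cloud segmentation model already trained in RemoteSensing, to fine-tune on my own dataset? Thanks.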
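
The conversion described in this comment could look roughly like the sketch below. The helper names `to_0_255` and `check_label` are hypothetical, and the [0, 1] input range is an assumption about how the data had been normalized; adjust the bounds if it was scaled differently.

```python
import numpy as np


def to_0_255(im):
    """Rescale an image whose values lie in [0, 1] to the 0-255 range."""
    im = np.asarray(im, dtype=np.float32)
    if im.min() < 0.0 or im.max() > 1.0:
        raise ValueError("expected values in [0, 1]")
    return np.round(im * 255.0).astype(np.uint8)


def check_label(label, num_classes=2):
    """Return the class ids in a single-channel label, validating them."""
    label = np.asarray(label)
    if label.ndim != 2:
        raise ValueError("label must be single-channel (H, W)")
    values = np.unique(label)
    if values.min() < 0 or values.max() >= num_classes:
        raise ValueError("label values outside 0..{}".format(num_classes - 1))
    return values


# Illustrative data: a 6-band image in [0, 1] and a binary label map.
image = np.random.rand(256, 256, 6).astype(np.float32)
label = np.random.randint(0, 2, size=(256, 256))

scaled = to_0_255(image)
classes = check_label(label)
print(scaled.dtype, scaled.min(), scaled.max(), classes)
```

Printing the unique label values is also a quick way to spot a label file that contains only one class, which would be worth ruling out when predictions are all 0.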

wuyefeilin assigned wuyefeilin and LutaoChu and unassigned wuyefeilin Jun 22, 2020
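chang-png commented

Hello, could you share the code you used to convert the data to npy format? When I convert my data, some of the files are never converted. Thanks.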

LutaoChu (Contributor) commented Jun 23, 2020

@KuntaHu Hello, which version are you using: develop or release/v0.5.0? The latest develop version is recommended.
The training loss problem is most likely a configuration mismatch. Please share your train_demo script so I can take a look.
As for transferring the pretrained model: transfer learning only works if your data has the same number of bands as the pretrained model. If your dataset has the same number of bands as the cloud dataset, you can fine-tune directly. If your dataset is not large, I suggest training directly.

LutaoChu (Contributor) commented Jun 23, 2020

@chang-png Here is code for converting tif to npy, for reference:

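from osgeo import gdal  # the osgeo package is the supported import path for GDAL
import numpy as np
import os
import os.path as osp


def readTifImg(fileName):
    # Open the raster and return its pixels as a NumPy array.
    dataset = gdal.Open(fileName)
    if dataset is None:
        raise IOError('can not open {}'.format(fileName))
    im_data = dataset.ReadAsArray()
    return im_data


img_dir = 'xxx'
output_dir = 'xxx'
if not osp.exists(output_dir):
    os.makedirs(output_dir)
count = 0
for path, dir_list, file_list in os.walk(img_dir):
    for file_name in file_list:
        # Skip anything that is not a tif file.
        if not file_name.lower().endswith(('.tif', '.tiff')):
            continue
        file = osp.join(path, file_name)
        img = readTifImg(file)
        # Multi-band rasters come back as (bands, H, W); reorder to (H, W, bands).
        if img.ndim == 3:
            img = img.transpose((1, 2, 0))

        # splitext removes only the extension; rstrip('.tif') would also strip
        # trailing 't', 'i', 'f' and '.' characters from the name itself.
        output_file = osp.join(output_dir, osp.splitext(file_name)[0])
        np.save(output_file, img)
        count += 1
        if count % 10 == 0:
            print("current process: {}.".format(count))

print('total count = ', count)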

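The latest develop version supports four formats: tif, img, png, and npy. Data in these four formats can be read directly without conversion, so using that version is recommended.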
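
One detail worth checking when deriving output names from .tif file names: `str.rstrip('.tif')` removes any trailing run of the characters '.', 't', 'i', 'f', not the literal suffix, so different inputs can collapse to the same output name and overwrite each other. Whether this explains the files reported as missing earlier in the thread is only a guess; the file names below are invented for illustration.

```python
import os.path as osp

# Invented file names, chosen to show the difference.
names = ['tile_1.tif', 'part.tif', 'par.tif', 'cliff.tif']

# rstrip treats its argument as a set of characters to strip.
stripped = [n.rstrip('.tif') for n in names]

# splitext removes only the final extension.
stems = [osp.splitext(n)[0] for n in names]

print(stripped)  # ['tile_1', 'par', 'par', 'cl']
print(stems)     # ['tile_1', 'part', 'par', 'cliff']
```

With `rstrip`, 'part.tif' and 'par.tif' would both be saved as 'par.npy', so four inputs would yield only three outputs.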

KuntaHu (Author) commented Jun 25, 2020

Hello, I tried the develop version. Unlike 0.5.0, it uses gdal, and installing gdal with conda keeps failing with the following error:

ERROR conda.core.link:_execute(637): An error occurred while installing package 'conda-forge::parso-0.7.0-pyh9f0ad1d_0'.
PermissionError(13, 'Permission denied')
Attempting to roll back.

Rolling back transaction: | WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/__init__.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/_compatibility.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/cache.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/file_io.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/grammar.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/normalizer.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/parser.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/pgen2/__pycache__/__init__.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/pgen2/__pycache__/generator.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/pgen2/__pycache__/grammar_parser.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/__init__.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/diff.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/errors.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/parser.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/pep8.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/prefix.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/token.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/tokenize.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/python/__pycache__/tree.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/tree.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /opt/conda/lib/python3.6/site-packages/parso/__pycache__/utils.cpython-36.pyc. Please remove this file manually (you may need to reboot to free file handles)
done

[Errno 13] Permission denied: '/opt/conda/lib/python3.6/site-packages/parso-0.7.0.dist-info/AUTHORS.txt'

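Since I had already converted my data to npy locally, commenting out import gdal lets everything run normally. I would still like to be able to install gdal properly, though.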
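
Instead of commenting the import out by hand, the dependency can be made optional so that npy data still loads when gdal is missing. This is a minimal sketch under that assumption, not the reader shipped with RemoteSensing; `read_tif` and `read_image` are hypothetical helper names.

```python
import numpy as np

try:
    # GDAL's Python bindings are provided by the osgeo package.
    from osgeo import gdal
except ImportError:
    gdal = None  # fall back to formats that do not need GDAL


def read_tif(path):
    """Read a raster through GDAL, failing clearly if GDAL is unavailable."""
    if gdal is None:
        raise RuntimeError("gdal is not installed; convert the data to npy first")
    dataset = gdal.Open(path)
    if dataset is None:
        raise IOError("can not open {}".format(path))
    return dataset.ReadAsArray()


def read_image(path):
    """Load npy files with NumPy and hand everything else to GDAL."""
    if path.endswith('.npy'):
        return np.load(path)
    return read_tif(path)
```

With this layout the missing package only matters at the moment a tif or img file is actually requested.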

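A few more small questions:

  1. Can the model backbone be switched in RemoteSensing? And what are the default backbones of Unet and hrnet in the RemoteSensing models?
  2. Can the training metrics such as loss be saved? I want to plot the loss against the epoch. Does PaddleSeg have code for this? I noticed a log file saved in saved_model; is there a way to read it?

Thanks!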

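nepeplwu (Collaborator) commented

@KuntaHu

  1. Unet does not support switching backbones. hrnet can be switched between structures of different sizes: specify the number of channels and the number of modules for each stage when creating the model.
    For the interface parameters, see: https://github.com/PaddlePaddle/PaddleSeg/blob/release/v0.5.0/contrib/RemoteSensing/models/hrnet.py#L47

  2. The model.train interface has a use_vdl parameter. Setting it to True records log files automatically; you can then start a front-end page with the visualdl command to view the training logs.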

Rayaction commented

Regarding the statement above that the latest develop version supports tif, img, png, and npy: does "img" include jpg?

LutaoChu (Contributor) commented Nov 9, 2020

img is a separate format of its own and does not include jpg.
To read jpg you can use opencv or PIL; adding a few lines to read_img is enough:
https://github.com/PaddlePaddle/PaddleSeg/blob/develop/contrib/RemoteSensing/readers/reader.py

LutaoChu (Contributor) commented Nov 9, 2020

Note, however, that PIL and opencv differ in how they read images:
opencv: channel order is BGR
PIL: channel order is RGB
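
For a 3-channel image, switching between the two orders amounts to reversing the last axis. The sketch below shows this with NumPy alone so that it runs without opencv or PIL installed; `bgr_to_rgb` is a hypothetical helper, and for such images it should give the same result as cv2.cvtColor with cv2.COLOR_BGR2RGB.

```python
import numpy as np


def bgr_to_rgb(im):
    """Reverse the channel axis of an (H, W, 3) image."""
    if im.ndim != 3 or im.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) image")
    # copy() gives a contiguous array instead of a reversed view.
    return im[:, :, ::-1].copy()


# A pure-blue image as opencv would return it: blue is channel 0 in BGR.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])  # blue now sits in the last channel
```

Whether the swap is needed depends on the channel order the model was trained with; what matters is that training and prediction use the same order.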

Rayaction commented

So I need to add a cvtColor conversion, is that right?

LutaoChu (Contributor) commented

That depends on whether you need the conversion. If you do, convert with cvtColor.

Rayaction commented Nov 10, 2020

I added this branch to the reader:

elif ext == '.jpg':
    im_data = cv2.imread(img_path)
    im_data = cv2.cvtColor(im_data, cv2.COLOR_BGR2RGB)
    print(im_data.shape)
    return np.transpose(im_data, [2, 0, 1]).astype(np.float32)

But it still does not work. Does Paddle expect CHW or HWC input?
This is the error:
ValueError: operands could not be broadcast together with shapes (1000,1000,256) (3,)

LutaoChu (Contributor) commented

Paddle's input is NCHW.
Try removing the np.transpose operation.
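
The two statements fit together if the preprocessing transforms work on HWC arrays and the switch to NCHW happens only afterwards. That ordering is an assumption, suggested by the HWC shapes in the error messages in this thread rather than confirmed by the maintainers. The NumPy sketch below uses made-up sizes and a made-up mean to show why transposing too early breaks per-channel arithmetic.

```python
import numpy as np

# An HWC image and a per-channel mean (illustrative values).
hwc = np.zeros((256, 256, 3), dtype=np.float32)
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Per-channel values broadcast over the last axis of an HWC array.
normalized = hwc - mean

# After an early transpose to CHW the last axis is the width, not the
# channels, so the same subtraction no longer broadcasts.
chw = np.transpose(hwc, (2, 0, 1))
try:
    chw - mean
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Transposing once preprocessing is done, then adding a batch axis,
# gives the NCHW layout.
batch = np.transpose(normalized, (2, 0, 1))[np.newaxis]
print(broadcast_ok, batch.shape)
```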

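Rayaction commented

That seems to work, but with --train_batch_size anywhere from 32 down to 1 I get out-of-memory errors such as: Out of memory error on GPU 0. Cannot allocate 7.629395GB memory on GPU 0, available memory is only 5.876587GB. My GPU is a 32 GB V100.
With the batch size set to 1, it says it cannot allocate a bit over 100 MB. Why is that? I have also set the visible CUDA devices.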

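LutaoChu (Contributor) commented

The framework does not allocate the whole 32 GB of GPU memory at once; it allocates what it needs over multiple requests. So an error when allocating a bit over 100 MB is normal: it means that by the final request there was less than that much GPU memory left.

Is your model too large, or is the image size too large?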
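
A rough way to judge whether the image size is the problem is to estimate the size of a single float32 activation tensor. The sketch below is back-of-the-envelope arithmetic only. The 64-channel width and the 1000x1000 size are assumptions chosen because they happen to reproduce the 7.629395GB figure from the error message (reading "GB" as 2^30 bytes); nothing in the thread confirms that this is the tensor that failed.

```python
def tensor_gib(n, c, h, w, bytes_per_element=4):
    """Size in GiB of one dense N x C x H x W tensor (float32 by default)."""
    return n * c * h * w * bytes_per_element / 1024 ** 3


# Batch of 32, 64 channels, 1000 x 1000 pixels.
print(round(tensor_gib(32, 64, 1000, 1000), 6))  # 7.629395

# The same tensor for 256 x 256 inputs.
print(tensor_gib(32, 64, 256, 256))  # 0.5
```

If the inputs really reach the network at 1000x1000, cropping or resizing them to 256x256 in the transforms would cut each activation by a factor of about 15.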

Rayaction commented

I am using the remote sensing code as is. The images are 256x256 and the model is unet, so that should not be the case.

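Rayaction commented Nov 10, 2020

#574
Could you take a look at this one for me? The directory layout of the model I downloaded does not look right,
and loading it reports that the model was not found.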
