- 2 Jan 2025: PointSAM has been accepted by TGRS and is now available here.
- 8 Dec 2024: The complete code is released.
- 20 Sep 2024: The arXiv version is released here.
To ensure compatibility, use Python 3.10 or lower. Follow these steps to set up your environment:
```shell
conda create --name pointsam python=3.10
conda activate pointsam
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/Lans1ng/PointSAM.git
cd PointSAM
pip install -r requirements.txt
```

Note: The CUDA version in the pip install command is specified as cu118 (CUDA 11.8). If your system uses a different CUDA version (e.g., CUDA 12.1), replace cu118 with the appropriate version tag (e.g., cu121).
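The mapping from CUDA version to wheel-index tag can be sketched as a tiny helper (a hypothetical function, not part of this repo, shown only to illustrate how the `cuXXX` tag is formed):

```python
def torch_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel --index-url for a CUDA version, e.g. '11.8' -> .../whl/cu118."""
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

print(torch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```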
- Dataset download address: WHU Building Dataset.
- To convert semantic labels to instance labels, refer to the corresponding conversion script.
- Dataset download address: HRSID Dataset.
- Dataset download address: NWPU VHR-10 Dataset.
- Instance label download address: NWPU VHR-10 Instance Label.
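The semantic-to-instance conversion mentioned for the WHU dataset boils down to assigning each connected foreground region its own id. This is a minimal stdlib-only sketch of that idea (4-connected flood fill); the repo's actual conversion script may differ:

```python
def semantic_to_instances(mask):
    """Label each 4-connected foreground component with a unique id (0 = background)."""
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not inst[y][x]:
                next_id += 1                      # start a new instance
                stack = [(y, x)]
                inst[y][x] = next_id
                while stack:                      # flood-fill this component
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not inst[ny][nx]:
                            inst[ny][nx] = next_id
                            stack.append((ny, nx))
    return inst

# Toy example: two separate buildings become instances 1 and 2.
sem = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
inst = semantic_to_instances(sem)
print(max(max(row) for row in inst))  # 2
```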
For convenience, the necessary JSON annotations are included in this repo. You only need to download the corresponding images. Organize your dataset as follows:
```
data
├── WHU
│   ├── annotations
│   │   ├── WHU_building_train.json
│   │   ├── WHU_building_test.json
│   │   └── WHU_building_val.json
│   └── images
│       ├── train
│       │   ├── image
│       │   └── label
│       ├── val
│       │   ├── image
│       │   └── label
│       └── test
│           ├── image
│           └── label
├── HRSID
│   ├── Annotations
│   │   ├── all
│   │   ├── inshore
│   │   │   ├── inshore_test.json
│   │   │   └── inshore_train.json
│   │   └── offshore
│   └── Images
└── NWPU
    ├── Annotations
    │   ├── NWPU_instnaces_train.json
    │   └── NWPU_instnaces_val.json
    └── Images
```
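A quick sanity check that the layout matches can save a failed training run. This is an illustrative snippet (the listed paths are taken from the tree above; `check_layout` is a hypothetical helper, not part of the repo):

```python
from pathlib import Path

# A few representative annotation files from the expected layout above.
EXPECTED = [
    "WHU/annotations/WHU_building_train.json",
    "HRSID/Annotations/inshore/inshore_train.json",
    "NWPU/Annotations/NWPU_instnaces_train.json",
]

def check_layout(root: str) -> list:
    """Return the expected annotation paths that are missing under the data root."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]

missing = check_layout("data")
if missing:
    print("missing:", missing)
```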
Click the links below to download the checkpoint for the corresponding model type.
- `vit-h`: ViT-H SAM model.
- `vit-l`: ViT-L SAM model.
- `vit-b`: ViT-B SAM model.
After downloading, move the models into the `pretrain` folder.
Note: In our project, only the vit-b model is used.
For convenience, the scripts folder provides training instructions for three different methods implemented on the NWPU VHR-10, WHU, and HRSID datasets:
- Supervised Training – a fully supervised baseline that uses full mask annotations.
- Self-Training – a point supervised baseline that uses pseudo labels generated by SAM.
- PointSAM – the proposed PointSAM method trained with point-level annotations.
Here’s an example of training PointSAM on the WHU dataset:
```shell
bash scripts/train_whu_pointsam.sh
```
Here’s an example of how to perform inference:
```shell
python inference.py --cfg <CONFIG_FILE_PATH> --out_dir <OUTPUT_DIR> --ckpt <CHECKPOINT_PATH>
```
Please replace <CONFIG_FILE_PATH>, <OUTPUT_DIR>, and <CHECKPOINT_PATH> with the actual paths.
Note: The generated results consist of four images arranged in parallel:
- The first image is the original input image.
- The second image visualizes the ground-truth (GT) mask.
- The third image is the result of testing the original SAM directly.
- The fourth image is the result obtained with the provided checkpoint.
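If you want to inspect the panels separately, the composite can be split programmatically. This sketch assumes the four panels are concatenated horizontally with equal widths (load the saved image into an array first, e.g. with PIL via `np.asarray(Image.open(path))`); `split_panels` is a hypothetical helper, not part of the repo:

```python
import numpy as np

def split_panels(composite: np.ndarray, n: int = 4) -> list:
    """Split a horizontally concatenated result of shape (H, W*n, C) into n panels."""
    return np.split(composite, n, axis=1)  # requires the width to divide evenly by n

panels = split_panels(np.zeros((8, 40, 3)), 4)
print(len(panels), panels[0].shape)  # 4 (8, 10, 3)
```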
If you find this project useful in your research, please consider starring ⭐ and citing 📚:
```bibtex
@ARTICLE{10839471,
  author={Liu, Nanqing and Xu, Xun and Su, Yongyi and Zhang, Haojie and Li, Heng-Chao},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images},
  year={2025},
  volume={63},
  number={},
  pages={1-15},
  doi={10.1109/TGRS.2025.3529031}}
```

