# 🎓 Reproducing the **Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery** Paper

Welcome to this **step-by-step guide** for **reproducing the results** of the paper **"Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery"**! This guide will walk through the process of setting up, running, and verifying the paper's experiments.  

## 🔹 Steps to Reproduce:  
1. **Clone the repository** – Download the official code and set up the project.  
2. **Set up the environment** – Install dependencies and configure necessary settings.  
3. **Download and preprocess datasets** – Retrieve validation datasets and process them correctly.  
4. **Run experiments and reproduce results** – Validate key figures or tables from the paper.  

Let's get started and replicate the findings! 🚀  

### Summary of the paper:
The paper introduces SatMAE++, a pre-training method that improves how transformer models learn from multi-spectral satellite imagery. Earlier models such as SatMAE do not fully use all the available information, and SatMAE++ aims to handle data from different sensors and scales better. It does so with multi-scale reconstruction, which trains the model to reconstruct images at several sizes. The method was tested on six datasets, including fMoW-RGB, fMoW-Sentinel, and BigEarthNet. It reports a Top-1 accuracy of 78.14% on fMoW-RGB, learns faster, and improves land cover classification by 3.6%.
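
To make the idea of multi-scale reconstruction concrete, here is a toy sketch in plain Python. It is not the authors' implementation (the real model and loss live in the repository); it only illustrates the general idea of summing reconstruction errors over several image sizes. The function names, the 2x average pooling, and the use of mean squared error at every scale are assumptions made for this illustration.

```python
def downsample_2x(image):
    """Average-pool a 2D list of floats by a factor of 2 (even sides assumed)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def mean_squared_error(prediction, target):
    """Mean squared error between two 2D lists of the same shape."""
    squared = [
        (p - t) ** 2
        for pred_row, target_row in zip(prediction, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(squared) / len(squared)


def multi_scale_loss(target, predictions):
    """Sum the reconstruction errors over all scales.

    `predictions` is ordered from the finest scale (same size as `target`)
    to the coarsest; each later entry is half the size of the previous one.
    """
    total = 0.0
    scaled_target = target
    for level, prediction in enumerate(predictions):
        if level > 0:
            scaled_target = downsample_2x(scaled_target)
        total += mean_squared_error(prediction, scaled_target)
    return total


# Toy example: a perfect fine-scale prediction and a poor coarse-scale one.
target = [[1.0] * 4 for _ in range(4)]
fine_prediction = [[1.0] * 4 for _ in range(4)]
coarse_prediction = [[0.0] * 2 for _ in range(2)]
print(multi_scale_loss(target, [fine_prediction, coarse_prediction]))  # 1.0
```

The fine scale contributes an error of 0 and the coarse scale an error of 1, so a model trained on such a loss is penalised at every scale it reconstructs.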

## Challenges Encountered:

1. To speed up the experiments, I needed access to a GPU. I created an account on the FAU Remote CIP Pool and set up a directory under my IDM ID.
2. I then cloned the GitHub repository and started working on the project.
3. The README gives very little detail: it does not describe the environment setup or explain how to run the code.
4. Because of these incomplete instructions, setting up the environment was difficult. The Conda environment required significant storage, and I had to install the required pip packages separately to resolve version compatibility issues.
5. The README points to the dataset's GitHub page, but the download links there failed with credential issues, so I had to obtain and pre-process the data manually (see the Dataset section below).
6. For running the code, the README only gives the command. It does not say what has to be changed in the code or which model was used.
7. I had to modify `main_finetune.py`, `engine_finetune.py`, `misc.py`, and `dataset.py`. Even so, the accuracy I obtained does not match the paper's.


## 🔹No. 1: Clone the GitHub Repository 🛠️

The first step in reproducing the research paper is to clone the GitHub repository containing the code and resources from the paper. Below are the steps for cloning the repository to your **Remote machine**.

### A. **Cloning on Your Remote Machine** 🖥️

1. Open your terminal (Visual Studio prompt on Windows).
2. Go to the working folder:

   ```zsh
   cd /proj/ciptmp
   cd ev72erij
   ```
3. Run the following command to clone the repository:

   ```zsh
   git clone https://github.com/techmn/satmae_pp
   ```

   This creates a folder named after the repository (`satmae_pp`) in the current directory, holding a local copy of the repository on the FAU remote machine.
4. Navigate into the new folder:

   ```zsh
   cd satmae_pp
   ```




## 🔹No. 2: Set Up the Environment ⚙️

After cloning the repository, the next step is to set up the environment where the code will run. This typically involves installing dependencies.

The repository does not provide a `requirements.txt` or `cuda.yml` file, so the Conda environment has to be created by hand.

> Since you are working on a **remote machine**, it is recommended to create a virtual environment (using venv or conda) to keep project dependencies isolated.

### Manually Resolving Dependencies 🔧

1. Create the environment by running the following command in the terminal. Including `python` in the command installs an interpreter and pip into the new environment:

   ```zsh
   conda create --name satmaeenv python
   ```
2. Activate the environment:

   ```zsh
   conda activate satmaeenv
   ```
3. Use pip to install the missing packages manually, and repeat this for any other missing or conflicting dependencies until the environment is set up. For example:

   ```zsh
   pip install torchvision
   ```
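
Since the dependencies have to be found by trial and error, a small check like the one below can list what is still missing before the main script is run. This is a minimal sketch: `pandas` and `tqdm` are used by the extraction script later in this guide, while `torch`, `torchvision`, and `timm` are assumed requirements, not a confirmed list from the repository. Note that a module name can differ from its pip package name.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed candidates; adjust the list to whatever the import errors report.
CANDIDATES = ["torch", "torchvision", "timm", "pandas", "tqdm"]

for name in missing_packages(CANDIDATES):
    print(f"missing: {name}")
```

Run it inside the activated environment; anything it prints still needs to be installed.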


## 🔹No. 3: Reproducing the *Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery* Paper 🎓


Link to the paper's GitHub repo: https://github.com/techmn/satmae_pp



### 📁 Dataset:

Two datasets are available: fMoW-Sentinel and fMoW-RGB. I worked only with the fMoW-RGB dataset, because the datasets are too large to process both.
In the paper, the authors use distributed methods to run on these datasets.

### fMoW-RGB
You can download the dataset by following the instructions here [[fmow-github]](https://github.com/fMoW/dataset).

**Issues:** I ran into several problems when downloading the dataset.
AWS and torrent links are provided for the download, but the torrent links were not usable, and the AWS CLI download failed with credential issues.
The whole validation set is split across six Parquet files, so I had to write a Python script to extract all the data into one directory called **val**.

### Data pre-processing

After downloading the validation set, it has to be pre-processed. I wrote the following Python script for this:

```python
import base64
import os

import pandas as pd
from tqdm import tqdm

# Load the Parquet file
df = pd.read_parquet('val-00004-of-00005-c2208e03589db7a1.parquet')

# Base output folder
output_base_folder = 'extracted_images'
os.makedirs(output_base_folder, exist_ok=True)

failed = 0

# Iterate over each row and save the images
for index, row in tqdm(df.iterrows(), total=len(df)):
    filename = None  # Defined up front so the error message below always works
    try:
        filename = row['img_filename']
        file_path = row['img_path']  # Directory structure from the img_path column

        if pd.isna(row['image']) or pd.isna(filename) or pd.isna(file_path):
            continue  # Skip rows with missing image data, filename, or file path

        # Handle different data types for image data
        if isinstance(row['image'], dict):
            image_data = row['image'].get('data') or row['image'].get('bytes')
        elif isinstance(row['image'], str) and row['image'].startswith('data:image'):
            image_data = base64.b64decode(row['image'].split(',')[1])
        else:
            image_data = row['image']

        # Validate byte data
        if not isinstance(image_data, (bytes, bytearray)):
            print(f"Skipping {filename}: image data is not in byte format.")
            continue

        # Build the output directory from the directory part of file_path
        directory_path = os.path.join(output_base_folder, os.path.dirname(file_path))
        os.makedirs(directory_path, exist_ok=True)  # Ensure subdirectories exist

        full_file_path = os.path.join(directory_path, filename)

        # Save the image file
        with open(full_file_path, 'wb') as file:
            file.write(image_data)

    except Exception as e:
        failed += 1
        print(f"Failed to save row {index} ({filename}): {e}")

print(f"Done. Images were saved in their respective directories; {failed} row(s) failed.")
```
For convenience, save this code in a file named `file.py` and run it from the command prompt:

```cmd
python file.py
```
I wrote this script to pre-process the validation set and save the images into the **val** folder.
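
Because the validation set comes as several Parquet files, it is easy to miss one. The sketch below lists the shards in a folder and reports missing indices. It assumes the files follow the naming pattern of the file used above (`val-<index>-of-<total>-<hash>.parquet`) with zero-based indices; that convention is an assumption based on the single filename shown in this guide, and `find_missing_shards` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches names such as val-00004-of-00005-c2208e03589db7a1.parquet
SHARD_PATTERN = re.compile(r"^val-(\d+)-of-(\d+)-.*\.parquet$")


def find_missing_shards(directory):
    """Return (found_paths, missing_indices) for the val shards in `directory`."""
    found = {}
    total = 0
    for path in sorted(Path(directory).glob("val-*.parquet")):
        match = SHARD_PATTERN.match(path.name)
        if match:
            found[int(match.group(1))] = path
            total = max(total, int(match.group(2)))
    # Assumption: indices run from 0 to total - 1.
    missing = [index for index in range(total) if index not in found]
    return [found[index] for index in sorted(found)], missing


found_paths, missing_indices = find_missing_shards(".")
print(f"{len(found_paths)} shard(s) found, missing indices: {missing_indices}")
```

The returned paths can then be passed one by one to the extraction logic above instead of hard-coding a single filename.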

### 📁 Directory Structure for Data Organization

The directory structure of the dataset should look as follows:

```
[Root folder]
____ train_62classes.json
____ val_62classes.json
____ train
________ airport
________ airport_hangar
________ .......
____ val
________ airport
________ airport_hangar
________ .......
```
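
A quick way to confirm the layout before running anything is a small check like the following. It is a minimal sketch that only tests for the file and folder names shown in the tree above; `check_layout` is a hypothetical helper, and the check does not validate the contents of the JSON files or the images.

```python
from pathlib import Path

EXPECTED_FILES = ["train_62classes.json", "val_62classes.json"]
EXPECTED_DIRS = ["train", "val"]


def check_layout(root):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for name in EXPECTED_DIRS:
        split_dir = root / name
        if not split_dir.is_dir():
            problems.append(f"missing directory: {name}")
            continue
        # Each split is expected to contain one sub-folder per class.
        class_dirs = [path for path in split_dir.iterdir() if path.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in: {name}")
    return problems


for problem in check_layout("."):
    print(problem)
```

If only the validation set was downloaded, as in this guide, a warning about the `train` directory is expected.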





#### 📝 Explanation of the Structure:

The `train_62classes.json` and `val_62classes.json` files define the train and validation splits. Download them from [[data-split]](https://github.com/techmn/satmae_pp/tree/main/fmow_rgb_data_split).


With the data in place, the next step walks through reproducing the results from the *Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery* paper using the cloned repository.

## 🔹No. 4: Run Main Experiments

### Finetuning
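Use the following command to fine-tune the ViT model (the default is ViT-L). The README.md provides a command, but it has to be adapted to the local setup, so I edited it and put it in `run.sh` to make it easy to run.
Note that the edited command passes `--eval` together with `--resume` and the fine-tuned checkpoint, so it evaluates that checkpoint.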

This is the edited command:

    CUDA_LAUNCH_BLOCKING=0 \
    CUDA_VISIBLE_DEVICES=0 python main_finetune.py \
    --batch_size 8 --accum_iter 16 \
    --epochs 50 --warmup_epochs 5 \
    --input_size 224 --patch_size 16 \
    --model_type vanilla \
    --dataset_type rgb \
    --weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --lr 0.001 --num_workers 16 \
    --train_path ./train_62classes.json \
    --test_path ./val_62classes.json \
    --output_dir ./finetune_dir \
    --log_dir ./finetune_dir \
    --eval \
    --cls_token \
    --resume ./checkpoint_ViT-L_finetune_fmow_rgb.pth


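After putting the command in `run.sh`, run the script from the repository folder (make it executable first with `chmod +x run.sh` if needed):

```zsh
./run.sh
```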
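
As a worked example of the flags `--batch_size 8 --accum_iter 16` in the command above: assuming `--accum_iter` is the number of gradient accumulation steps, as the flag name suggests, the number of samples behind one optimizer step on a single GPU can be computed as follows. The helper name `effective_batch_size` is introduced here for illustration only.

```python
def effective_batch_size(batch_size, accum_iter, num_gpus=1):
    """Samples contributing to one optimizer step under gradient accumulation."""
    return batch_size * accum_iter * num_gpus


# Values from the command above; one GPU because CUDA_VISIBLE_DEVICES=0.
print(effective_batch_size(batch_size=8, accum_iter=16, num_gpus=1))  # 128
```

These values matter for training runs; an evaluation-only run takes no optimizer steps, so there only the per-step batch size affects memory use.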


------------------------------------------------------------------------------------

## Model Weights
The ViT-L model is used for fine-tuning. The fine-tuned checkpoint can be downloaded from the table below.

| Model | Dataset | Top1 Acc (%) |  Finetune |
| :---  |  :---:  |    :---:     |  :---:   |
| ViT-L | FMoW-RGB | 78.14 | [download](https://huggingface.co/mubashir04/checkpoint_ViT-L_finetune_fmow_rgb) |



## 📊 Evaluation Results

Running the evaluation command on the `fMoW-RGB` dataset only produced the following result:

![experiment.png](attachment:experiment.png)



## 📊 Comparison of Evaluation Results

### 📝 Comparison with Paper's Results

My experiment did not reach the expected result. The Top-1 accuracy I obtained was `45.6%`, well below the `78.14%` reported in the paper, a gap of roughly 32.5 percentage points. I believe this discrepancy is primarily due to poor code documentation: the lack of comments made the implementation details hard to understand, which slowed down the work and complicated the modifications I had to make. These factors likely contributed to the performance gap.

For comparison, the results reported in `Table 2` of the paper for all datasets are shown below.

#### Paper Reported Results:
![paperpic.png](attachment:paperpic.png)
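
For reference, Top-1 accuracy as compared above is the percentage of samples whose highest-scoring class equals the true label. The following is a minimal plain-Python sketch of that definition, not the evaluation code from the repository; the scores and labels are made-up toy values.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example with two classes and four samples (three predicted correctly).
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 1, 1, 0]
print(top1_accuracy(scores, labels))  # 75.0
```

Because the metric depends on class indices, it is only meaningful when the label order used at evaluation matches the one the checkpoint was trained with.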

