diff --git a/source/en/_static/image/internnav_process.png b/source/en/_static/image/internnav_process.png new file mode 100644 index 0000000..b128f0b Binary files /dev/null and b/source/en/_static/image/internnav_process.png differ diff --git a/source/en/_static/video/nav_demo.webm b/source/en/_static/video/nav_demo.webm new file mode 100644 index 0000000..c0bab98 Binary files /dev/null and b/source/en/_static/video/nav_demo.webm differ diff --git a/source/en/_static/video/nav_eval.gif b/source/en/_static/video/nav_eval.gif new file mode 100644 index 0000000..7018288 Binary files /dev/null and b/source/en/_static/video/nav_eval.gif differ diff --git a/source/en/user_guide/internnav/quick_start/create_model.md b/source/en/user_guide/internnav/quick_start/create_model.md deleted file mode 100644 index 5d2ff7b..0000000 --- a/source/en/user_guide/internnav/quick_start/create_model.md +++ /dev/null @@ -1,122 +0,0 @@ -# Create Your Model and Agent - -## Development Overview -The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then make model to predict and response to the client. - -The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms. -The main components include: - -- **Model**: Implements the specific neural network architecture and inference logic - -- **Agent**: Serves as a wrapper for the Model, handling environment interaction and data preprocessing - -- **Config**: Defines configuration parameters for the model and training - -## Custom Model -A Model is the concrete implementation of your algorithm. Implement model under `baselines/models`. A model ideally would inherit from the base model and implement the following key methods: - -- `forward(train_batch) -> dict(output, loss)` -- `inference(obs_batch, state) -> output_for_agent` - -## Create a Custom Config Class - -In the model file, define a `Config` class that inherits from `PretrainedConfig`. -A reference implementation is `CMAModelConfig` in [`cma_model.py`](../internnav/model/cma/cma_policy.py). - -## Registration and Integration - -In [`internnav/model/__init__.py`](../internnav/model/__init__.py): -- Add the new model to `get_policy`. -- Add the new model's configuration to `get_config`. - -## Create a Custom Agent - -The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference. -A custom Agent usually inherits from [`Agent`](../internnav/agent/base.py) and implements the following key methods: - -- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode. -- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action. -- `step(obs)`: The external interface, usually calls `inference`, and can include logging or timing. - -Example: [`CMAAgent`](../internnav/agent/cma_agent.py) - -For each step, the agent should expect an observation from environment. 
- -For the vln benchmark under internutopia: - -``` -action = self.agent.step(obs) -``` -**obs** has format: -``` -obs = [{ - 'globalgps': [X, Y, Z] # robot location - 'globalrotation': [X, Y, Z, W] # robot orientation in quaternion - 'rgb': np.array(256, 256, 3) # rgb camera image - 'depth': np.array(256, 256, 1) # depth image -}] -``` -**action** has format: -``` -action = List[int] # action for each environments -# 0: stop -# 1: move forward -# 2: turn left -# 3: turn right -``` - -## Create a Trainer - -The Trainer manages the training loop, including data loading, forward pass, loss calculation, and backpropagation. -A custom trainer usually inherits from the [`Base Trainer`](../internnav/trainer/base.py) and implements: - -- `train_epoch()`: Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update). -- `eval_epoch()`: Evaluates the model on the validation set and records metrics. -- `save_checkpoint()`: Saves model weights, optimizer state, and training progress. -- `load_checkpoint()`: Loads pretrained models or resumes training. - -Example: [`CMATrainer`](../internnav/trainer/cma_trainer.py) shows how to handle sequence data, compute action loss, and implement imitation learning. - -## Training Data - -The training data is under `data/vln_pe/traj_data`. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environment. -Each observation in the trajectory is paired with its corresponding action. - -You may also incorporate external datasets to improve model generalization. - -## Evaluation Data -In `raw_data/val`, for each task, the model should guide the robot at the start position and rotation to the target position with language instruction. - -## Set the Corresponding Configuration - -Refer to existing **training** configuration files for customization: - -- **CMA Model Config**: [`cma_exp_cfg`](../scripts/train/configs/cma.py) - -Configuration files should define: -- `ExpCfg` (experiment config) -- `EvalCfg` (evaluation config) -- `IlCfg` (imitation learning config) - -Ensure your configuration is imported and registered in [`__init__.py`](../scripts/train/configs/__init__.py). - -Key parameters include: -- `name`: Experiment name -- `model_name`: Must match the name used during model registration -- `batch_size`: Batch size -- `lr`: Learning rate -- `epochs`: Number of training epochs -- `dataset_*_root_dir`: Dataset paths -- `lmdb_features_dir`: Feature storage path - -Refer to existing **evaluation** config files for customization: - -- **CMA Model Evaluation Config**: [`h1_cma_cfg.py`](../scripts/eval/configs/h1_cma_cfg.py) - -Main fields: -- `name`: Evaluation experiment name -- `model_name`: Must match the name used during training -- `ckpt_to_load`: Path to the model checkpoint -- `task`: Define the tasks settings, number of env, scene, robots -- `dataset`: Load r2r or interiornav dataset -- `split`: Dataset split (`val_seen`, `val_unseen`, `test`, etc.) 
diff --git a/source/en/user_guide/internnav/quick_start/index.md b/source/en/user_guide/internnav/quick_start/index.md index 96c82fa..c484f2d 100644 --- a/source/en/user_guide/internnav/quick_start/index.md +++ b/source/en/user_guide/internnav/quick_start/index.md @@ -13,7 +13,7 @@ myst: :maxdepth: 2 installation +simulation +interndata train_eval -vln_evaluation -create_model ``` diff --git a/source/en/user_guide/internnav/quick_start/installation.md b/source/en/user_guide/internnav/quick_start/installation.md index e68c7c4..cc51af0 100644 --- a/source/en/user_guide/internnav/quick_start/installation.md +++ b/source/en/user_guide/internnav/quick_start/installation.md @@ -13,7 +13,16 @@ # Installation Guide -This page provides detailed guidance on simulation environment setup and quantitative model evaluation. If you want to reproduce the results of the [technical report](https://internrobotics.github.io/internvla-n1.github.io/), you should follow this page. Howerver, for inference-only usage, such as deploying InternVLA-N1 in your own robot or self-built dataset, you could follow this simpler [guideline](https://github.com/InternRobotics/InternNav/blob/main/scripts/eval/inference_only_demo.ipynb) to setup the environment and run inference with the model. +This page provides detailed instructions for installing **InternNav** in inference-only mode, such as when deploying **InternVLA-N1** on your own robot or with a custom dataset. +Follow the steps below to set up the environment and run inference with the model. + +If you want to **reproduce the results** presented in the [technical report](https://internrobotics.github.io/internvla-n1.github.io/), please follow this page, and also complete the following sections on [Simulation Environments Setup](./simulation.md), [Dataset Preparation](./interndata.md) and [Training and Evaluation](./train_eval.md). + +For more advanced examples, refer to these demos: + +- [**InternVLA-N1 Inference-only Demo**](https://githubtocolab.com/InternRobotics/InternNav/blob/main/scripts/notebooks/inference_only_demo.ipynb) +- [**Real-World Unitree Go2 Deploy Script**](https://github.com/kew6688/InternNav/tree/main/scripts/realworld) + ## Prerequisites @@ -166,299 +175,139 @@ We provide a flexible installation tool for users who want to use InternNav for ## Quick Installation ### Install InternNav -Clone the InternNav repository: +Clone the **InternNav** repository: ```bash git clone https://github.com/InternRobotics/InternNav.git --recursive ``` -After pull the latest code, install the package: -```bash -pip install -e . -``` -By default, only the core modules are installed. It allows you to inherit the base class and implement your own models or benchmarks. 
In order to use different functionalities of InternNav tool, several install flags are provided: -- `[isaac]`: install all requires for [Isaac environment](#isaac-sim-environment), follow the instructions below to install the evaluation environment -- `[habitat]`: install all requires for [Habitat environment](#habitat-environment), follow the instructions below to install the evaluation environment -- `demo`: install all requires to run the gradio demo for visualization usage -- `model`: install all requires to train and evaluate all provided models included cma, rdp, navdp, internvla_n1 -- `internvla_n1`: quick installation of internvla_n1 to inference - -usage example: -```bash -pip install -e .[model] -pip install -e .[isaac,demo] -pip install -e .[internvla_n1,habitat] -``` -### Install Models -For quick usage and deploy models, InternNav provide client-server design for easy use of model prediction. More details can be find at [inference_only_demo](https://githubtocolab.com/InternRobotics/InternNav/blob/main/scripts/notebooks/inference_only_demo.ipynb) and [real_world_agent_demo](). Install the requires: +After pull the latest code, install InternNav with models: ```bash -# create a new conda env, the model server can be isolated from the evaluation env -conda create -n python=3.10 libxcb=1.14 -pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 -pip install -e .[model] -``` - -Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model: - -- For quick trials and evaluations of the InternNav-N1 model, we recommend using the [Habitat environment](#habitat-environment). This option offer allowing you to quickly test and eval the InternVLA-N1 models with minimal configuration. -- If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the [Isaac Sim](#isaac-sim-environment) environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing. - -Choose the environment that best fits your specific needs to optimize your experience with the InternNav-N1 model. Note that both environments support the training of the system1 model NavDP. +# create a new isolated environment for model server +conda create -n python=3.10 libxcb=1.14 +conda activate -### Install with Isaac Sim Environment -#### Prerequisite -- Ubuntu 20.04, 22.04 -- Python 3.10.16 (3.10.* should be ok) -- NVIDIA Omniverse Isaac Sim 4.5.0 -- NVIDIA GPU (RTX 2070 or higher) -- NVIDIA GPU Driver (recommended version 535.216.01+) -- PyTorch 2.5.1, 2.6.0 (recommended) -- CUDA 11.8, 12.4 (recommended) +# install PyTorch (CUDA 11.8) +pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \ + --index-url https://download.pytorch.org/whl/cu118 -Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed. 
+# install InternNav with model dependencies +pip install -e .[model] -**Pull our latest Docker image with everything you need** (~17GB) -```bash -$ docker pull crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2 ``` -Run the container -```bash -$ xhost +local:root # Allow the container to access the display - -$ cd PATH/TO/INTERNNAV/ - -$ docker run --name internnav -it --rm --gpus all --network host \ - -e "ACCEPT_EULA=Y" \ - -e "PRIVACY_CONSENT=Y" \ - -e "DISPLAY=${DISPLAY}" \ - --entrypoint /bin/bash \ - -w /root/InternNav \ - -v /tmp/.X11-unix/:/tmp/.X11-unix \ - -v ${PWD}:/root/InternNav \ - -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \ - -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \ - -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \ - -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \ - -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \ - -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \ - -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \ - -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \ - -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:rw \ - crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2 -``` +To enable additional functionalities, several install flags are available: - -#### Conda installation from Scretch -```bash -conda create -n python=3.10 libxcb=1.14 +| Flag | Description | +| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | +| `.` | Install the core only for the InternNav framework. | +| `[model]` | Install all dependencies for training and evaluating models (CMA, RDP, NavDP, InternVLA-N1). | +| `[isaac]` | Install dependencies for the [Isaac environment](./simualtion.md). | +| `[habitat]` | Install dependencies for the [Habitat environment](./simualtion.md). | -# Install InternUtopia through pip.(2.1.1 and 2.2.0 recommended) -conda activate -pip install internutopia -# Configure the conda environment. -python -m internutopia.setup_conda_pypi -conda deactivate && conda activate -``` -For InternUtopia installation, you can find more detailed [docs](https://internrobotics.github.io/user_guide/internutopia/get_started/installation.html) in [InternUtopia](https://github.com/InternRobotics/InternUtopia?tab=readme-ov-file). -```bash -# Install PyTorch based on your CUDA version -pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 - -# Install other deps -cd Path/to/InternNav/ -pip install -e .[isaac,model] -``` +### Download Checkpoints +1. **InternVLA-N1 pretrained Checkpoints** +- Download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory. +2. **DepthAnything v2 Checkpoints** +- Download the DepthAnything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory. -If you need to train or evaluate models on [Habitat](#optional-habitat-environment) without physics simulation, we recommend the following setup and easier environment installation. 
- -### Install with Habitat Environment +## Verification -#### Prerequisite -- Python 3.9 -- Pytorch 2.6.0 -- CUDA 12.4 -- GPU: NVIDIA A100 or higher (optional for VLA training) +InternNav adopts a **client–server design** to simplify model deployment and prediction. +To verify the installation of **InternNav**, start the model server first. ```bash -conda create -n python=3.9 -conda activate +python scripts/eval/start_server.py --port 8087 ``` -Install habitat sim and habitat lab: -```bash -conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat -git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git -cd habitat-lab -pip install -e habitat-lab # install habitat_lab -pip install -e habitat-baselines # install habitat_baselines +The output should be: ``` -Install pytorch and other requirements: -```bash -pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 -cd Path/to/InternNav/ -pip install -e .[habitat,internvla_n1] +Starting Agent Server... +Registering agents... +INFO: Started server process [18877] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://localhost:8087 (Press CTRL+C to quit) ``` - -## Verification - -### Data/Checkpoints Preparation -To get started, we need to prepare the data and checkpoints. -1. **InternVLA-N1 pretrained Checkpoints** -- Download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory. -2. **DepthAnything v2 Checkpoints** -- Download the depthanything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory. -3. **InternData-N1 Dataset Episodes** -- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1). Extract them into the `data/vln_ce/` and `data/vln_pe/` directory. -4. **Scene-N1** -- Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) for `mp3d_ce`. Extract them into the `data/scene_data/` directory. -5. **Embodiments** -- Download the [Embodiments](https://huggingface.co/datasets/InternRobotics/Embodiments) for the `Embodiments/` - -6. **Baseline models** +To verify the installation of **internvla-n1**. 
Initialize the internvla-n1 agent by ```bash -# ddppo-models -$ mkdir -p checkpoints/ddppo-models -$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth -# longclip-B -$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long -# download r2r finetuned baseline checkpoints -$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/ +from internnav.configs.agent import AgentCfg +from internnav.utils import AgentClient + +agent=AgentCfg( + server_host='localhost', + server_port=8087, + model_name='internvla_n1', + ckpt_path='', + model_settings={ + 'policy_name': "InternVLAN1_Policy", + 'state_encoder': None, + 'env_num': 1, + 'sim_num': 1, + 'model_path': "checkpoints/InternVLA-N1", + 'camera_intrinsic': [[585.0, 0.0, 320.0], [0.0, 585.0, 240.0], [0.0, 0.0, 1.0]], + 'width': 640, + 'height': 480, + 'hfov': 79, + 'resize_w': 384, + 'resize_h': 384, + 'max_new_tokens': 1024, + 'num_frames': 32, + 'num_history': 8, + 'num_future_steps': 4, + 'device': 'cuda:0', + 'predict_step_nums': 32, + 'continuous_traj': True, + } +) +agent = AgentClient(cfg.agent) +``` +The output should be something like: +``` +Loading navdp model: NavDP_Policy_DPT_CriticSum_DAT +Pretrained: None +No pretrained weights provided, initializing randomly. +Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.06it/s] +INFO: ::1:38332 - "POST /agent/init HTTP/1.1" 201 Created ``` -The final folder structure should look like this: - -```bash -InternNav/ -├── data/ -│ ├── scene_data/ -│ │ ├── mp3d_ce/ -│ │ │ └── mp3d/ -│ │ │ ├── 17DRP5sb8fy/ -│ │ │ ├── 1LXtFkjw3qL/ -│ │ │ └── ... -│ │ └── mp3d_pe/ -│ │ ├──17DRP5sb8fy/ -│ │ ├── 1LXtFkjw3qL/ -│ │ └── ... -│ ├── vln_ce/ -│ │ ├── raw_data/ -│ │ │ ├── r2r -│ │ │ │ ├── train -│ │ │ │ ├── val_seen -│ │ │ │ │ └── val_seen.json.gz -│ │ │ │ └── val_unseen -│ │ │ │ └── val_unseen.json.gz -│ │ └── traj_data/ -│ └── vln_pe/ -│ ├── raw_data/ # JSON files defining tasks, navigation goals, and dataset splits -│ │ └── r2r/ -│ │ ├── train/ -│ │ ├── val_seen/ -│ │ │ └── val_seen.json.gz -│ │ └── val_unseen/ -│ └── traj_data/ # training sample data for two types of scenes -│ ├── interiornav/ -│ │ └── kujiale_xxxx.tar.gz -│ └── r2r/ -│ └── trajectory_0/ -│ ├── data/ -│ ├── meta/ -│ └── videos/ -├── checkpoints/ -│ ├── InternVLA-N1/ -│ │ ├── model-00001-of-00004.safetensors -│ │ ├── config.json -│ │ └── ... -│ ├── InternVLA-N1-S2 -│ │ ├── model-00001-of-00004.safetensors -│ │ ├── config.json -│ │ └── ... -│ ├── depth_anything_v2_vits.pth -│ ├── r2r -│ │ ├── fine_tuned -│ │ └── zero_shot -├── internnav/ -│ └── ... +Load a capture frame from RealSense DS455 camera: ``` -### Gradio demo +from scripts.iros_challenge.onsite_competition.sdk.save_obs import load_obs_from_meta +rs_meta_path = '/root/InternNav/scripts/iros_challenge/onsite_competition/captures/rs_meta.json' -Currently the gradio demo is only available in **habitat** environment. Replace the 'model_path' variable in 'vln_gradio_backend.py' with the path of InternVLA-N1 checkpoint. -```bash -conda activate -python3 scripts/demo/vln_gradio_backend.py +fake_obs_640 = load_obs_from_meta(rs_meta_path) +fake_obs_640['instruction'] = 'go to the red car' +print(fake_obs_640['rgb'].shape, fake_obs_640['depth'].shape) ``` -Find the IP address of the node allocated by Slurm. 
Then change the BACKEND_URL in the gradio client (navigation_ui.py) to the server's IP address. Start the gradio. -```bash -python scripts/demo/navigation_ui.py +The output should be: +``` +(480, 640, 3) (480, 640) ``` -Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI) but ensure there is proper network connectivity between the client and the server. Download the gradio scene assets from [huggingface](https://huggingface.co/datasets/InternRobotics/Scene-N1) and extract them into the `scene_assets` directory of the client. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). We can see the interface as shown below. -![img.png](../../../_static/image/gradio_interface.jpg) - -Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to ray server and simulate the VLN task with InternVLA-N1 models. Wait about 2 minutes, the VLN task will be finished and return a result video. We can see the result video in the gradio like this. -![img.png](../../../_static/image/gradio_result.jpg) - - -🎉 Congratulations! You have successfully installed InternNav. - - -## InternData-N1 Dataset Preparation +Test model inference ``` -Due to network throttling restrictions on HuggingFace, InternData-N1 has not been fully uploaded yet. Please wait patiently for several days. +action = agent.step([obs])[0]['action'][0] +print(f"Action taken: {action}") ``` -We also prepare high-quality data for **training** system1/system2 and **evaluation** on isaac sim environment. To set up the dataset, please follow the steps below: -1. Download Datasets -- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) for: - - `vln_pe/` - - `vln_ce/` - - `vln_n1/` +The output should be: +``` +============ output 1 ←←←← +s2 infer finish!! +get s2 output lock +=============== [2, 2, 2, 2] ================= +Output discretized traj: [2] 0 +INFO: ::1:46114 - "POST /agent/internvla_n1/step HTTP/1.1" 200 OK +Action taken: 2 +``` -- Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) for the `scene_data/`. +Congrats, now you have made one prediction. In this task, the agent convert the trajectory output to discrete action. Apply this action "turn left" (2) to real robot controller by using `internnav.env.real_world_env`. -2. Directory Structure +Checkout the real deploy demo video: -After downloading, organize the datasets into the following structure: + -```bash -data/ -├── scene_data/ -│ ├── mp3d_pe/ -│ │ ├── 17DRP5sb8fy/ -│ │ ├── 1LXtFkjw3qL/ -│ │ └── ... -│ ├── mp3d_ce/ -│ │ ├── mp3d/ -│ │ │ ├── 17DRP5sb8fy/ -│ │ │ ├── 1LXtFkjw3qL/ -│ │ │ └── ... -│ └── mp3d_n1/ -├── vln_pe/ -│ ├── raw_data/ -│ │ ├── train/ -│ │ ├── val_seen/ -│ │ │ └── val_seen.json.gz -│ │ └── val_unseen/ -│ │ └── val_unseen.json.gz -├── └── traj_data/ -│ └── mp3d/ -│ └── 17DRP5sb8fy/ -│ └── 1LXtFkjw3qL/ -│ └── ... -├── vln_ce/ -│ ├── raw_data/ -│ │ ├── r2r -│ │ │ ├── train -│ │ │ ├── val_seen -│ │ │ │ └── val_seen.json.gz -│ │ │ └── val_unseen -│ │ │ └── val_unseen.json.gz -│ └── traj_data/ -└── vln_n1/ - └── traj_data/ -``` +for more details, check out the [**Internvla_n1 Inference-only Demo**](https://githubtocolab.com/InternRobotics/InternNav/blob/main/scripts/notebooks/inference_only_demo.ipynb). 
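+
+If you want to drive your own base controller directly instead of going through `internnav.env.real_world_env`, the discrete action can first be translated into a velocity command. The snippet below is only a hedged illustration: the action ids follow the convention above (0 stop, 1 forward, 2 turn left, 3 turn right), while the velocity values and the `send_velocity_command` callback are assumptions to be replaced by your own robot driver.
+
+```python
+# Hedged sketch: map a discrete VLN action id to a (linear, angular) velocity command.
+# The numeric velocities are illustrative defaults, not values taken from InternNav.
+from typing import Callable
+
+ACTION_TO_TWIST = {
+    0: (0.0, 0.0),    # stop
+    1: (0.5, 0.0),    # move forward: 0.5 m/s
+    2: (0.0, 0.5),    # turn left: 0.5 rad/s
+    3: (0.0, -0.5),   # turn right: -0.5 rad/s
+}
+
+
+def apply_action(action: int, send_velocity_command: Callable[[float, float], None]) -> None:
+    """Translate a discrete action into a velocity command for your robot driver."""
+    linear, angular = ACTION_TO_TWIST.get(action, (0.0, 0.0))
+    send_velocity_command(linear, angular)
+```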
\ No newline at end of file diff --git a/source/en/user_guide/internnav/quick_start/interndata.md b/source/en/user_guide/internnav/quick_start/interndata.md new file mode 100644 index 0000000..3a64d88 --- /dev/null +++ b/source/en/user_guide/internnav/quick_start/interndata.md @@ -0,0 +1,76 @@ +# Dataset Preparation + +We prepared high-quality data for **training** system1/system2 and **evaluation** on isaac sim and habitat sim environment. These trajectories were collected using the **training episodes** from **R2R** and **RxR** under the Matterport3D environment. + + +## Data and Checkpoints Checklist +To get started with the training and evaluation, we need to prepare the data and checkpoints properly. +1. **InternVLA-N1 pretrained Checkpoints** +- Download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory. +2. **DepthAnything v2 Checkpoints** +- Download the depthanything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory. +3. **InternData-N1 Dataset Episodes** +- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1). You only need to download the dataset relevant to your chosen task. Download `vln_ce` for VLNCE evaluation in habitat, `vln_pe` for VLNPE evaluation in internutopia. +4. **Scene-N1** +- Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) for `mp3d_ce` or `mp3d_pe`. Extract them into the `data/scene_data/` directory. +5. **Embodiments** +- Download the [Embodiments](https://huggingface.co/datasets/InternRobotics/Embodiments) and place it under the `Embodiments/`. These embodiment assets are used by the Isaac Sim environment. + +The final folder structure should look like this: + +```bash +InternNav/ +├── checkpoints/ +│ ├── InternVLA-N1/ +│ │ ├── model-00001-of-00004.safetensors +│ │ ├── config.json +│ │ └── ... +│ ├── InternVLA-N1-S2 +│ │ ├── model-00001-of-00004.safetensors +│ │ ├── config.json +│ │ └── ... +│ ├── depth_anything_v2_vits.pth +│ └── r2r +│ ├── fine_tuned +│ └── zero_shot +├── data/ +| ├── Embodiments/ +│ ├── scene_data/ +│ │ ├── mp3d_ce/ +│ │ │ └── mp3d/ +│ │ │ ├── 17DRP5sb8fy/ +│ │ │ ├── 1LXtFkjw3qL/ +│ │ │ └── ... +│ │ └── mp3d_pe/ +│ │ ├──17DRP5sb8fy/ +│ │ ├── 1LXtFkjw3qL/ +│ │ └── ... +| ├── vln_n1/ +| | └── traj_data/ +│ ├── vln_ce/ +│ │ ├── raw_data/ +│ │ │ ├── r2r +│ │ │ │ ├── train +│ │ │ │ ├── val_seen +│ │ │ │ │ └── val_seen.json.gz +│ │ │ │ └── val_unseen +│ │ │ │ └── val_unseen.json.gz +│ │ └── traj_data/ +│ └── vln_pe/ +│ ├── raw_data/ # JSON files defining tasks, navigation goals, and dataset splits +│ │ └── r2r/ +│ │ ├── train/ +│ │ ├── val_seen/ +│ │ │ └── val_seen.json.gz +│ │ └── val_unseen/ +│ └── traj_data/ # training sample data for two types of scenes +│ ├── interiornav/ +│ │ └── kujiale_xxxx.tar.gz +│ └── r2r/ +│ └── trajectory_0/ +│ ├── data/ +│ ├── meta/ +│ └── videos/ +├── internnav/ +│ └── ... 
+```
\ No newline at end of file
diff --git a/source/en/user_guide/internnav/quick_start/simulation.md b/source/en/user_guide/internnav/quick_start/simulation.md
new file mode 100644
index 0000000..f095f64
--- /dev/null
+++ b/source/en/user_guide/internnav/quick_start/simulation.md
@@ -0,0 +1,115 @@
+# Simulation Environments Setup
+
+Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:
+
+- For quick trials and evaluations of the InternNav-N1 model, we recommend using the [Habitat environment](#install-with-habitat-environment). This option allows you to quickly test and evaluate the InternVLA-N1 models with minimal configuration.
+- If you require high-fidelity rendering, training capabilities, and physical property evaluations within the environment, we suggest using the [Isaac Sim](#install-with-isaac-sim-environment) environment. This solution provides enhanced graphical rendering and more accurate physics simulations for comprehensive testing.
+
+Choose the environment that best fits your specific needs to optimize your experience with the InternNav-N1 model. Note that both environments support the training of the system1 model NavDP.
+
+## Install with Isaac Sim Environment
+
+#### Install from Docker Image
+To help you get started quickly, we've prepared a **Docker image** pre-configured with Isaac Sim 4.5, InternUtopia and models. A detailed guideline can be found on the [challenge](https://github.com/InternRobotics/InternNav/tree/main/scripts/iros_challenge#-environment-setup) page.
+
+You can pull the image (~17GB) and run evaluations in the container using the following command:
+```bash
+docker pull crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
+```
+
+Run the container:
+```bash
+xhost +local:root # Allow the container to access the display
+
+cd PATH/TO/INTERNNAV/ # where the latest code was pulled
+
+docker run --name internnav -it --rm --gpus all --network host \
+    -e "ACCEPT_EULA=Y" \
+    -e "PRIVACY_CONSENT=Y" \
+    -e "DISPLAY=${DISPLAY}" \
+    --entrypoint /bin/bash \
+    -w /root/InternNav \
+    -v /tmp/.X11-unix/:/tmp/.X11-unix \
+    -v ${PWD}:/root/InternNav \
+    -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
+    -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
+    -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
+    -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
+    -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
+    -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
+    -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
+    -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
+    -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:rw \
+    crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
+```
+After the container has started, activate the environment and install InternNav:
+```bash
+conda activate internutopia
+pip install -e .[isaac,model]
+```
+
+#### Conda Installation from Scratch
+**Prerequisite**
+- Ubuntu 20.04, 22.04
+- Python 3.10.16 (3.10.* should be ok)
+- NVIDIA Omniverse Isaac Sim 4.5.0
+- NVIDIA GPU (RTX 2070 or higher)
+- NVIDIA GPU Driver (recommended version 535.216.01+)
+- PyTorch 2.5.1, 2.6.0 (recommended)
+- CUDA 11.8, 12.4 (recommended)
+
+Before proceeding with the installation, ensure that you have [Isaac Sim
4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed. + + +```bash +conda create -n python=3.10 libxcb=1.14 + +# Install InternUtopia through pip.(2.1.1 and 2.2.0 recommended) +conda activate +pip install internutopia + +# Configure the conda environment. +python -m internutopia.setup_conda_pypi +conda deactivate && conda activate +``` +For InternUtopia installation, you can find more detailed [docs](https://internrobotics.github.io/user_guide/internutopia/get_started/installation.html) in [InternUtopia](https://github.com/InternRobotics/InternUtopia?tab=readme-ov-file). +```bash +# Install PyTorch based on your CUDA version +pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118 + +# Install other deps +cd Path/to/InternNav/ +pip install -e .[isaac] +``` + +## Install with Habitat Environment +If you need to train or evaluate models on [Habitat](#optional-habitat-environment) without physics simulation, we recommend the following setup and easier environment installation. + +#### Prerequisite +- Python 3.9 +- Pytorch 2.6.0 +- CUDA 12.4 +- GPU: NVIDIA A100 or higher (optional for VLA training) + +```bash +conda create -n python=3.9 +conda activate +``` +Install habitat sim and habitat lab: +```bash +conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat +git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git +cd habitat-lab +pip install -e habitat-lab # install habitat_lab +pip install -e habitat-baselines # install habitat_baselines +``` +Install pytorch and other requirements: +```bash +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 +cd Path/to/InternNav/ +pip install -e .[habitat,internvla_n1] +``` diff --git a/source/en/user_guide/internnav/quick_start/train_eval.md b/source/en/user_guide/internnav/quick_start/train_eval.md index 77e1fa1..c83440f 100644 --- a/source/en/user_guide/internnav/quick_start/train_eval.md +++ b/source/en/user_guide/internnav/quick_start/train_eval.md @@ -1,46 +1,38 @@ # Training and Evaluation - -This document presents how to train and evaluate models for different systems with InternNav. +This document presents how to train and evaluate models for different systems with InternNav. ## Whole-system +### Training +The training pipeline is currently under preparation and will be open-sourced soon. + ### Evaluation Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1). -#### Evaluation on isaac sim +#### Evaluation on Isaac Sim The main architecture of the whole-system evaluation adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run. -First start the ray server: +First, change the 'model_path' in the cfg file to the path of the InternVLA-N1 weights. 
Start the evaluation server: ```bash -ray disable-usage-stats -ray stop -ray start --head +# from one process +conda activate +python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_cfg.py ``` -Then change the 'model_path' in the cfg file to the path of the InternVLA-N1 weights. Start the evaluation server: -```bash -python -m internnav.agent.utils.server --config scripts/eval/configs/h1_internvla_n1_cfg.py -``` - -Finally, start the client: +Then, start the client to run evaluation: ```bash +# from another process +conda activate MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py ``` The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 10 hours at RTX-4090 graphics platform. +The simulation can be visualized by set `vis_output=True` in eval_cfg. -Also, the Baselines can directly run: -```bash -# seq2seq model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py -# cma model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py -# rdp model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py -``` +My GIF -#### Evaluation on habitat +#### Evaluation on Habitat Sim Evaluate on Single-GPU: ```bash @@ -50,7 +42,7 @@ python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1 --cont For multi-gpu inference, currently we only support inference on SLURM. ```bash -./scripts/eval/eval_dual_system.sh +./scripts/eval/bash/eval_dual_system.sh ``` @@ -111,62 +103,6 @@ python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR} ## System2 -### Data Preparation - -Please download the following VLN-CE datasets and insert them into the `data` folder following the same structure. - -1. **VLN-CE Episodes** - - Download the VLN-CE episodes: - - [r2r](https://drive.google.com/file/d/18DCrNcpxESnps1IbXVjXSbGLDzcSOqzD/view) (rename R2R_VLNCE_v1/ -> r2r/) - - [rxr](https://drive.google.com/file/d/145xzLjxBaNTbVgBfQ8e9EsBAV8W-SM0t/view) (rename RxR_VLNCE_v0/ -> rxr/) - - [envdrop](https://drive.google.com/file/d/1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr/view) (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/) - - Extract them into the `data/datasets/` directory. - -2. **InternData-N1** - - We provide pre-collected observation-action trajectory data for training. These trajectories were collected using the **training episodes** from **R2R** and **RxR** under the Matterport3D environment. Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) and [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1). -The final folder structure should look like this: -```bash -data/ -├── scene_data/ -│ ├── mp3d_pe/ -│ │ ├── 17DRP5sb8fy/ -│ │ ├── 1LXtFkjw3qL/ -│ │ └── ... -│ ├── mp3d_ce/ -│ │ ├── mp3d/ -│ │ │ ├── 17DRP5sb8fy/ -│ │ │ ├── 1LXtFkjw3qL/ -│ │ │ └── ... 
-│ └── mp3d_n1/ -├── vln_pe/ -│ ├── raw_data/ -│ │ ├── train/ -│ │ ├── val_seen/ -│ │ │ └── val_seen.json.gz -│ │ └── val_unseen/ -│ │ └── val_unseen.json.gz -├── └── traj_data/ -│ └── mp3d/ -│ └── trajectory_0/ -│ ├── data/ -│ ├── meta/ -│ └── videos/ -├── vln_ce/ -│ ├── raw_data/ -│ │ ├── r2r -│ │ │ ├── train -│ │ │ ├── val_seen -│ │ │ │ └── val_seen.json.gz -│ │ │ └── val_unseen -│ │ │ └── val_unseen.json.gz -│ └── traj_data/ -└── vln_n1/ - └── traj_data/ -``` - ### Training Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details. @@ -195,7 +131,7 @@ python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1-S2 --m For multi-gpu inference, currently we only support inference on SLURM. ```bash -./scripts/eval/eval_system2.sh +./scripts/eval/bash/eval_system2.sh ``` #### Baseline Models @@ -211,26 +147,17 @@ $ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks Fa # download r2r finetuned baseline checkpoints $ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/ ``` -Start the Ray server: -```bash -ray disable-usage-stats -ray stop -ray start --head -``` - -Start the evaluation server: -```bash -python -m internnav.agent.utils.server --config scripts/eval/configs/h1_xxx_cfg.py -``` Start Evaluation: ```bash +# Please modify the first line of the bash file to your own conda path # seq2seq model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py +./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py # cma model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py +./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py # rdp model -./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py +./scripts/eval/bash/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py ``` + The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file. diff --git a/source/en/user_guide/internnav/quick_start/vln_evaluation.md b/source/en/user_guide/internnav/quick_start/vln_evaluation.md deleted file mode 100644 index 8a1c951..0000000 --- a/source/en/user_guide/internnav/quick_start/vln_evaluation.md +++ /dev/null @@ -1,93 +0,0 @@ -# VLN Evaluation - -## Overview of the Evaluation Process -The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run. 
- - -## Supported baselines -- InternVLA-N1 -- CMA (Cross-Modal Attention) -- RDP (Recurrent Diffusion Policy) -- Navid (RSS2023) -- Seq2Seq Policy - -## Supported Datasets -- R2R-CE -- Matterport3D - -## Evaluation Metrics -The project provides comprehensive evaluation metrics: - -- **Success Rate (SR)**: Proportion of episodes where the agent reaches the goal location within 3m -- **SPL**: Success weighted by Path Length -- **Trajectory Length (TL)**: Total length of the trajectory (m) -- **Navigation Error (NE)**: Euclidean distance between the agent's final position and the goal (m) -- **OS Oracle Success Rate (OSR)**: Whether any point along the predicted trajectory reaches the goal within 3m -- **Fall Rate (FR)**: Frequency of the agent falling during navigation -- **Stuck Rate (StR)**: Frequency of the agent becoming stuck during navigation - - -# Quick Start for Evaluation - -## 1. Start the ray server -```bash -ray disable-usage-stats -ray stop -ray start --head -``` - -## 2. Custom your evaluation config -```bash -eval_cfg = EvalCfg( - agent=AgentCfg( - server_port=8023, - model_name='internvla_n1', - ckpt_path='', - model_settings={ - }, - ), - env=EnvCfg( - env_type='vln_multi', - env_settings={ - 'use_fabric': True, # improve simulation efficiency - 'headless': True, # display option: set to False will open isaac-sim interactive window - }, - ), - task=TaskCfg( - task_name='test', - task_settings={ - 'env_num': 1, # number of env in one isaac sim - 'use_distributed': False, # Ray distributed framework - 'proc_num': 1, - }, - scene=SceneCfg( - scene_type='mp3d', - mp3d_data_dir='/path/to/mp3d', - ), - robot_name='h1', - robot_flash=True, - robot_usd_path='/robots/h1/h1_vln_multi_camera.usd', - camera_resolution=[640, 480] # (W,H) - ), - dataset=EvalDatasetCfg( - dataset_type="mp3d", - dataset_settings={ - 'base_data_dir': '/path/to/R2R_VLNCE_v1-3', - 'split_data_types': ['val_unseen'], - 'filter_stairs': True, - }, - eval_settings={ - 'save_to_json': False, # evaluation result saved in separate json file - 'vis_output': True # save simulation progress to video under logs/ - } - ), -``` -## 3. Launch the server -```bash -INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config path/to/cfg.py -``` - -## 4. Launch the client -```bash -python -m internnav.agent.utils.server --config path/to/cfg.py -``` diff --git a/source/en/user_guide/internnav/tutorials/agent.md b/source/en/user_guide/internnav/tutorials/agent.md new file mode 100644 index 0000000..73e4f73 --- /dev/null +++ b/source/en/user_guide/internnav/tutorials/agent.md @@ -0,0 +1,126 @@ +# Customizing Models and Agents in InternNav + +This tutorial provides a detailed guide for registering new agent and model within the InternNav framework + +--- + +## Development Overview +The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then make model to predict and response to the client. + +The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms. 
+
+The main components include:
+
+- **Model**: Implements the specific neural network architecture and inference logic
+
+- **Agent**: Serves as a wrapper for the Model, handling environment interaction and data preprocessing
+
+- **Config**: Defines configuration parameters for the model and training
+
+## Supported Models
+- InternVLA-N1
+- CMA (Cross-Modal Attention)
+- RDP (Recurrent Diffusion Policy)
+- Navid (RSS2023)
+- Seq2Seq Policy
+
+## Custom Model
+A Model is the concrete implementation of your algorithm. Implement your model under `baselines/models`. Ideally, a model should inherit from the base model and implement the following key methods:
+
+- `forward(train_batch) -> dict(output, loss)`
+- `inference(obs_batch, state) -> output_for_agent`
+
+## Create a Custom Config Class
+
+In the model file, define a `Config` class that inherits from `PretrainedConfig`.
+A reference implementation is `CMAModelConfig` in [`cma_policy.py`](https://github.com/InternRobotics/InternNav/blob/main/internnav/model/cma/cma_policy.py).
+
+## Registration and Integration
+
+In [`internnav/model/__init__.py`](https://github.com/InternRobotics/InternNav/blob/main/internnav/model/__init__.py):
+- Add the new model to `get_policy`.
+- Add the new model's configuration to `get_config`.
+
+## Create a Custom Agent
+
+The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference.
+A custom Agent usually inherits from [`Agent`](https://github.com/InternRobotics/InternNav/blob/main/internnav/agent/base.py) and implements the following key methods:
+
+- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode.
+- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.
+- `step(obs)`: The external interface; usually calls `inference`, and can include logging or timing.
+
+Example: [`CMAAgent`](https://github.com/InternRobotics/InternNav/blob/main/internnav/agent/cma_agent.py)
+
+At each step, the agent should expect an observation from the environment.
+
+For the VLN benchmark under InternUtopia:
+
+```
+action = self.agent.step(obs)
+```
+**obs** has the format:
+```
+obs = [{
+    'globalgps': [X, Y, Z],          # robot location
+    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
+    'rgb': np.ndarray,               # RGB camera image, shape (256, 256, 3)
+    'depth': np.ndarray,             # depth image, shape (256, 256, 1)
+    'instruction': str,              # language instruction for the navigation task
+}]
+```
+**action** has the format:
+```
+action = List[int]  # one action per environment
+# 0: stop
+# 1: move forward
+# 2: turn left
+# 3: turn right
+```
+## Registration
+The agent should be registered with `internnav.agent` so that it can be referenced by name in configs.
+```
+from internnav.agent.base import Agent
+from internnav.configs.agent import AgentCfg
+
+@Agent.register('cma')
+class NewAgent(Agent):
+    def __init__(self, agent_config: AgentCfg):
+        ...
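+
+    # A custom agent typically also implements the hooks described above:
+    #   reset()        -> clear per-episode state such as RNN hidden states
+    #   inference(obs) -> preprocess obs, call the model, return an action
+    #   step(obs)      -> external entry point, usually delegating to inference()
+    # (method bodies omitted; see CMAAgent for a complete reference implementation)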
+```
+Make sure you also import it inside `internnav/agent/__init__.py`:
+```
+# make the register decorator take effect
+from internnav.agent.internvla_n1_agent import InternVLAN1Agent
+```
+
+## Agent and Model Initialization
+
+Refer to existing **evaluation** config files for customization:
+```
+agent_cfg = AgentCfg(
+    server_host='localhost',
+    server_port=8023,
+    model_name='internvla_n1',
+    ckpt_path='',
+    model_settings={
+        'policy_name': 'InternVLAN1_Policy',
+        'state_encoder': None,
+    },
+)
+```
+
+## Typical Usage Example
+```
+from internnav.configs.agent import AgentCfg
+from internnav.utils import AgentClient
+
+cfg = AgentCfg(server_host="127.0.0.1", server_port=8087)
+client = AgentClient(cfg)
+
+# step once
+obs = [{"rgb": ..., "depth": ..., "instruction": "go to kitchen"}]
+action = client.step(obs)
+print("Predicted action:", action)
+
+# reset agent
+client.reset()
+```
\ No newline at end of file
diff --git a/source/en/user_guide/internnav/tutorials/core.md b/source/en/user_guide/internnav/tutorials/core.md
new file mode 100644
index 0000000..926f9d0
--- /dev/null
+++ b/source/en/user_guide/internnav/tutorials/core.md
@@ -0,0 +1,16 @@
+# Core Concepts
+## Overview
+
+
+
+The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run.
+## Main Process (WIP)
+
+![img.png](../../../_static/image/internnav_process.png)
+
+**Learn the Modules**
+1. [Dataset](./dataset.md)
+2. [Model](./model.md)
+3. [Training](./training.md)
+4. [Agent](./agent.md)
+5. [Env](./env.md)
diff --git a/source/en/user_guide/internnav/tutorials/env.md b/source/en/user_guide/internnav/tutorials/env.md
new file mode 100644
index 0000000..15479d6
--- /dev/null
+++ b/source/en/user_guide/internnav/tutorials/env.md
@@ -0,0 +1,50 @@
+# Customizing Environments and Tasks in InternNav
+
+This tutorial provides a step-by-step guide to defining a new environment and a new navigation task within the InternNav framework.
+
+---
+
+## Overview
+InternNav separates **navigation logic / policy** from **where the agent actually lives** (simulator vs real robot). The key ideas are:
+
+- `Env`: A unified interface. All environments must behave like an `Env`.
+
+- `Task`: A high-level navigation objective exposed to the agent, like "go to the kitchen sink" or "follow this instruction".
+
+- `Agent`: The agent consumes observations from `Env`, predicts an action, and sends that action back to `Env`.
+
+Because of this separation:
+
+- We can run the same agent in simulation (Isaac / InternUtopia) or on a real robot, as long as both environments implement the same API.
+
+- We can benchmark different tasks (VLN, PointGoalNav, etc.) in different worlds without rewriting the agent.
+
+InternNav already ships with the following environment backends:
+
+- **InternUtopiaEnv**:
+Simulated environment built on top of InternUtopia / Isaac Sim. This supports complex indoor scenes, object semantics, RGB-D sensing, and scripted evaluation loops.
+- **HabitatEnv** (WIP): Simulated environment built on top of Habitat Sim.
+
+- **RealWorldEnv**:
+Wrapper around an actual robot platform and its sensors (e.g. RGB camera, depth, odometry). This lets you deploy the same agent logic in the physical world.
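+
+To make the shared contract concrete, the sketch below shows what such a unified interface can look like. The class and method names are illustrative assumptions for this tutorial, not the actual contents of `internnav/env/base.py`; always check the real base class in the repository before implementing a backend.
+
+```python
+# Illustrative sketch of a unified environment interface (assumed names,
+# not the real InternNav base class).
+from abc import ABC, abstractmethod
+from typing import Any, Dict, List
+
+
+class NavEnvSketch(ABC):
+    """Hypothetical minimal contract shared by simulated and real-world backends."""
+
+    @abstractmethod
+    def reset(self) -> List[Dict[str, Any]]:
+        """Start a new episode and return the first batch of observations."""
+
+    @abstractmethod
+    def step(self, actions: List[int]) -> List[Dict[str, Any]]:
+        """Apply one action per sub-environment and return the next observations."""
+
+    @abstractmethod
+    def close(self) -> None:
+        """Release simulator handles or robot connections."""
+```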
+
+All of these backends are children of the same base [`Env`](https://github.com/InternRobotics/InternNav/blob/main/internnav/env/base.py) class.
+
+## Evaluation Task (WIP)
+For the VLN-PE benchmark, we build the task on top of InternUtopia, as shown in the diagram below.
+
+![img.png](../../../_static/image/agent_definition.png)
+
+
+## Evaluation Metrics (WIP)
+For the VLN-PE benchmark in InternUtopia, InternNav provides comprehensive evaluation metrics:
+- **Success Rate (SR)**: The proportion of episodes in which the agent successfully reaches the goal location within a 3-meter radius.
+- **Success Rate weighted by Path Length (SPL)**: Measures both efficiency and success. It is defined as the ratio of the shortest-path distance to the actual trajectory length, weighted by whether the agent successfully reaches the goal.
+A higher SPL indicates that the agent not only succeeds but does so efficiently, without taking unnecessarily long routes.
+- **Trajectory Length (TL)**: The total distance traveled by the agent during an episode, measured in meters.
+- **Navigation Error (NE)**: The Euclidean distance (in meters) between the agent’s final position and the goal location at the end of an episode.
+- **Oracle Success Rate (OSR)**: The proportion of episodes in which any point along the predicted trajectory comes within 3 meters of the goal, representing the agent’s potential success if it were to stop optimally.
+- **Fall Rate (FR)**: The frequency at which the agent falls or loses balance during navigation.
+- **Stuck Rate (StR)**: The frequency at which the agent becomes immobile or trapped (e.g., blocked by obstacles or unable to proceed).
+
+The implementation lives under `internnav/env/utils/internutopia_extensions`; we highly recommend following the [InternUtopia](../../internutopia) guide.
diff --git a/source/en/user_guide/internnav/tutorials/index.md b/source/en/user_guide/internnav/tutorials/index.md
index d92b0b1..717e705 100644
--- a/source/en/user_guide/internnav/tutorials/index.md
+++ b/source/en/user_guide/internnav/tutorials/index.md
@@ -12,9 +12,10 @@ myst:
 :caption: Tutorials
 :maxdepth: 2
 
+core
 dataset
-format_specification
 model
 training
-evaluation
+agent
+env
 ```
diff --git a/source/en/user_guide/internnav/tutorials/training.md b/source/en/user_guide/internnav/tutorials/training.md
index 8500445..a3141fc 100644
--- a/source/en/user_guide/internnav/tutorials/training.md
+++ b/source/en/user_guide/internnav/tutorials/training.md
@@ -117,3 +117,45 @@ For customizing the model structure or dataset format, see [model.md](./model.md
 
 ## System 2: InternVLA-N1-S2
 Currently we don't support the training of InternVLA-N1-S2 in this repository.
+
+## Baselines
+### Create a Trainer
+
+The Trainer manages the training loop, including data loading, forward pass, loss calculation, and backpropagation.
+A custom trainer usually inherits from the [`Base Trainer`](https://github.com/InternRobotics/InternNav/blob/main/internnav/trainer/base.py) and implements:
+
+- `train_epoch()`: Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update).
+- `eval_epoch()`: Evaluates the model on the validation set and records metrics.
+- `save_checkpoint()`: Saves model weights, optimizer state, and training progress.
+- `load_checkpoint()`: Loads pretrained models or resumes training.
+
+Example: [`CMATrainer`](https://github.com/InternRobotics/InternNav/blob/main/internnav/trainer/cma_trainer.py) shows how to handle sequence data, compute action loss, and implement imitation learning.
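+
+As a starting point, a custom trainer can follow the skeleton below. This is a hedged sketch: the base-class import (`BaseTrainer` from `internnav.trainer.base`) and the attribute names (`self.model`, `self.optimizer`, `self.train_loader`, `self.val_loader`) are assumptions made for illustration, so align them with the actual [`Base Trainer`](https://github.com/InternRobotics/InternNav/blob/main/internnav/trainer/base.py) and with `CMATrainer` before use.
+
+```python
+# Hedged skeleton of a custom trainer; the base-class and attribute names are assumed.
+import torch
+
+from internnav.trainer.base import BaseTrainer  # assumed export name, verify against base.py
+
+
+class MyTrainer(BaseTrainer):
+    def train_epoch(self):
+        """One pass over the training set: forward, loss, backward, update."""
+        self.model.train()
+        total_loss = 0.0
+        for batch in self.train_loader:
+            out = self.model.forward(batch)   # expected to return dict(output, loss)
+            loss = out["loss"]
+            self.optimizer.zero_grad()
+            loss.backward()
+            self.optimizer.step()
+            total_loss += loss.item()
+        return total_loss / max(len(self.train_loader), 1)
+
+    @torch.no_grad()
+    def eval_epoch(self):
+        """Evaluate on the validation split and record metrics."""
+        self.model.eval()
+        return sum(self.model.forward(batch)["loss"].item() for batch in self.val_loader)
+
+    def save_checkpoint(self, path):
+        torch.save({"model": self.model.state_dict(),
+                    "optimizer": self.optimizer.state_dict()}, path)
+
+    def load_checkpoint(self, path):
+        ckpt = torch.load(path, map_location="cpu")
+        self.model.load_state_dict(ckpt["model"])
+        self.optimizer.load_state_dict(ckpt["optimizer"])
+```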
+ +### Training Data + +The training data is under `data/vln_pe/traj_data`. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environment. +Each observation in the trajectory is paired with its corresponding action. + +You may also incorporate external datasets to improve model generalization. + +### Set the Corresponding Configuration + +Refer to existing **training** configuration files for customization: + +- **CMA Model Config**: [`cma_exp_cfg`](https://github.com/InternRobotics/InternNav/blob/main/scripts/train/configs/cma.py) + +Configuration files should define: +- `ExpCfg` (experiment config) +- `EvalCfg` (evaluation config) +- `IlCfg` (imitation learning config) + +Ensure your configuration is imported and registered in [`__init__.py`](https://github.com/InternRobotics/InternNav/blob/main/scripts/train/configs/__init__.py). + +Key parameters include: +- `name`: Experiment name +- `model_name`: Must match the name used during model registration +- `batch_size`: Batch size +- `lr`: Learning rate +- `epochs`: Number of training epochs +- `dataset_*_root_dir`: Dataset paths +- `lmdb_features_dir`: Feature storage path
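+
+Putting these pieces together, a new training config might look like the hedged sketch below. The import path of the config classes and the exact field nesting are assumptions made for illustration; mirror the real [`cma_exp_cfg`](https://github.com/InternRobotics/InternNav/blob/main/scripts/train/configs/cma.py) when writing your own, and remember to register the new config in `scripts/train/configs/__init__.py`.
+
+```python
+# Hedged sketch of a training config; the import path and nesting are assumed.
+# Copy the structure of cma_exp_cfg from scripts/train/configs/cma.py instead.
+from internnav.configs.trainer import ExpCfg, IlCfg  # assumed location of the config classes
+
+my_exp_cfg = ExpCfg(
+    name="my_model_r2r",               # experiment name
+    model_name="my_model",             # must match the name used at model registration
+    il=IlCfg(
+        batch_size=32,
+        lr=1e-4,
+        epochs=50,
+        lmdb_features_dir="data/lmdb_features/my_model",   # feature storage path
+        dataset_r2r_root_dir="data/vln_pe/traj_data/r2r",  # dataset_*_root_dir style path
+    ),
+)
+```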