[tinker] SkyRL tinkerification tracker

# Summary

This parent issue tracks all outstanding issues with the Tinker API server in SkyRL.

## SkyRL-Train Backend

- [ ] Support PPO loss with Tinker
    - Support critic model in `SkyRLTrainBackend`
- [ ] Support KL penalty (currently not supported)
    - Support reference model in `SkyRLTrainBackend`
- [ ] Support logprobs with the `sample()` API
    - Provide an example of using truncated importance sampling with the Tinker API server
    - _assigned to @tamoghnokandar_
- [ ] Improve validation of configuration parameters passed to `SkyRLTrainBackend`: https://github.com/NovaSky-AI/SkyRL/issues/1279
- [ ] Set LR outside of workers in skyrl-train backend: https://github.com/NovaSky-AI/SkyRL/issues/998 
- [ ] Add `sample` API to the new inference codepath: https://github.com/NovaSky-AI/SkyRL/issues/1286 
- [ ] Add `renderer` to the new inference codepath: https://github.com/NovaSky-AI/SkyRL/issues/1288 
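
For reference on the truncated importance sampling item above, here is a minimal sketch of the weighting that sampler logprobs make possible. It is plain Python and does not use the Tinker or SkyRL APIs; the function names, the `cap` default, and the list-of-floats inputs are all hypothetical and chosen only for illustration.

```python
import math


def truncated_is_weights(trainer_logprobs, sampler_logprobs, cap=2.0):
    """Per-token weights min(exp(lp_trainer - lp_sampler), cap).

    Hypothetical helper: `cap` is an illustrative default, not a SkyRL setting.
    """
    if len(trainer_logprobs) != len(sampler_logprobs):
        raise ValueError("logprob sequences must have the same length")
    return [
        min(math.exp(lp_t - lp_s), cap)
        for lp_t, lp_s in zip(trainer_logprobs, sampler_logprobs)
    ]


def tis_policy_loss(trainer_logprobs, sampler_logprobs, advantages, cap=2.0):
    """Mean of -(weight * advantage * trainer_logprob) over tokens.

    In a real trainer the weights would be treated as constants
    (no gradient flows through them); here everything is a plain float.
    """
    if not trainer_logprobs:
        raise ValueError("need at least one token")
    weights = truncated_is_weights(trainer_logprobs, sampler_logprobs, cap)
    total = sum(
        w * a * lp
        for w, a, lp in zip(weights, advantages, trainer_logprobs)
    )
    return -total / len(weights)
```

The cap bounds the correction applied when the trainer and the sampler disagree on a token's probability, which limits the variance of the update at the cost of some bias.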

## Jax Backend

- [ ] Improve the memory handling and reporting to make it easier to see where the memory goes
- [ ] Add multimodal support to APIs and add support for a VLM model

[tinker] SkyRL tinkerification tracker #1380
