
Add IRIS #30883

Open · wants to merge 82 commits into main
Conversation


@RUFFY-369 commented May 17, 2024

What does this PR do?

This PR adds IRIS, a reinforcement learning agent for sample-efficient RL.

Fixes #30882

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts @younesbelkada @NielsRogge @kashif

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Ready for review!!!

RUFFY-369 and others added 30 commits April 21, 2024 05:12
@amyeroberts (Collaborator) left a comment

Hi @RUFFY-369, thanks for opening this PR! This is a mammoth piece of work.

I've just done a very high-level pass, and more review rounds will be needed. At the moment, the structure of the modeling file is very far from the standard library patterns. In particular, there is a lot of logic for things that should be handled elsewhere, e.g. agents, environments, and downloading files.

I think the best and easiest way to make this model available is by adding the model directly on the hub. I would refer to the decision transformer to see what classes should be added and how to add them into the library to make them transformers-compatible.
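
For reference, a minimal sketch (class names, fields, and defaults are illustrative, not part of this PR) of the class layout a transformers model addition usually exposes, loosely following the decision transformer:

# Sketch only: typical top-level classes for a transformers model addition,
# loosely mirroring the decision transformer layout. Names are illustrative.
from transformers import PretrainedConfig, PreTrainedModel


class IrisConfig(PretrainedConfig):
    model_type = "iris"

    def __init__(self, embed_dim=256, num_layers=10, num_heads=4, **kwargs):
        self.embed_dim = embed_dim
        self.num_layers = num_layers
        self.num_heads = num_heads
        super().__init__(**kwargs)


class IrisPreTrainedModel(PreTrainedModel):
    config_class = IrisConfig
    base_model_prefix = "iris"


class IrisModel(IrisPreTrainedModel):
    def __init__(self, config: IrisConfig):
        super().__init__(config)
        # tokenizer / world model / actor-critic submodules would be registered here
        self.post_init()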

Comment on lines 60 to 64
IRIS_PRETRAINED_MODEL_ARCHIVE_LIST = [
"ruffy369/iris-breakout",
# See all Iris models at https://huggingface.co/models?filter=iris
]

Collaborator:

Model archive lists have been deprecated

Suggested change
IRIS_PRETRAINED_MODEL_ARCHIVE_LIST = [
"ruffy369/iris-breakout",
# See all Iris models at https://huggingface.co/models?filter=iris
]

Author:

removed and committed

tokens: torch.LongTensor


class Slicer(nn.Module):
Collaborator:

All model-specific submodules and layers should have the model prefix. The prefix should be camel case.

Suggested change
class Slicer(nn.Module):
class IrisSlicer(nn.Module):

Author:

working on it asap 👍

return (y, att)


class WorldModelEnv:
Collaborator:

Definition of the environment is outside the scope of the modeling file - this should just be the model

@RUFFY-369 (Author) commented May 24, 2024

Okay, will make the required changes, thank you for pointing that out 💯 Also, this class is not a gym env but a simulation of the env inside the world model: the world model is used to train the actor-critic component in imagination, without the actual env. The self.env attribute was only there for the reset() function in the original code and is not used in the modeling file. So I will modify the code to use just the IrisWorldModel, without any outer dependency on any kind of env.
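
For illustration, a rough sketch of what "training in imagination" against the world model alone looks like; the world_model and actor_critic interfaces below are hypothetical placeholders, not this PR's actual API:

# Sketch only: imagination rollout driven by the learned world model, with no gym env.
# world_model.step and the actor_critic call signature are hypothetical.
import torch


def imagine_rollout(world_model, actor_critic, initial_obs_tokens, horizon=20):
    tokens = initial_obs_tokens
    log_probs, values, rewards = [], [], []
    for _ in range(horizon):
        action_dist, value = actor_critic(tokens)        # policy and value from current tokens
        action = action_dist.sample()
        log_probs.append(action_dist.log_prob(action))
        values.append(value)
        # the world model predicts the next observation tokens and reward; no real env step
        tokens, reward, done = world_model.step(tokens, action)
        rewards.append(reward)
        if done.any():
            break
    return torch.stack(log_probs), torch.stack(values), torch.stack(rewards)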

self.register_buffer("mask", causal_mask if config.attention == "causal" else block_causal_mask)

def forward(self, x: torch.Tensor, kv_cache: Optional[KVCache] = None) -> torch.Tensor:
B, T, C = x.size()
Collaborator:

No single-letter var names - they should all be explicit, e.g. batch_size
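
For example (illustrative rename only):

B, T, C = x.size()                                   # before
batch_size, sequence_length, embed_dim = x.size()    # after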

Author:

will do that just in a while 👍

self._size += x.size(2)


class KVCache:
Collaborator:

Cache definition is outside the scope of the modeling file - the model should use the library's cache
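
For reference, a hedged sketch of leaning on the library's built-in cache (transformers.cache_utils.DynamicCache) instead of a custom KVCache; the layer wiring shown in comments is illustrative:

# Sketch only: use transformers' DynamicCache rather than a hand-rolled KVCache.
from transformers.cache_utils import DynamicCache

past_key_values = DynamicCache()
# Inside a hypothetical attention layer's forward, the cache would be updated with:
#   key_states, value_states = past_key_values.update(key_states, value_states, layer_idx)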

Author:

alright, will change that, thanks for mentioning 👍



def nonlinearity(x: torch.Tensor) -> torch.Tensor:
# swish
Collaborator:

You can use the swish activation already defined in the library
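
For reference, a minimal sketch of reusing the activation registry already in the library ("silu" is the same function as swish):

# Sketch only: reuse the library's activation instead of a local swish/nonlinearity helper.
import torch
from transformers.activations import ACT2FN

swish = ACT2FN["silu"]          # silu(x) = x * sigmoid(x), i.e. swish
y = swish(torch.randn(2, 4))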

Author:

oh! sure, thanks for mentioning 👍


class ScalingLayer(nn.Module):
def __init__(self) -> None:
super(ScalingLayer, self).__init__()
Collaborator:

Suggested change
super(ScalingLayer, self).__init__()
super().__init__()

"""A single linear layer which does a 1x1 conv"""

def __init__(self, chn_in: int, chn_out: int = 1, use_dropout: bool = False) -> None:
super(NetLinLayer, self).__init__()
Collaborator:

Suggested change
super(NetLinLayer, self).__init__()
super().__init__()

Comment on lines 649 to 659
layers = (
[
nn.Dropout(),
]
if (use_dropout)
else []
)
layers += [
nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False),
]
self.model = nn.Sequential(*layers)
Collaborator:

Suggested change
layers = (
[
nn.Dropout(),
]
if (use_dropout)
else []
)
layers += [
nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False),
]
self.model = nn.Sequential(*layers)
layers = [nn.Dropout()] if (use_dropout) else []
layers += [nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False)]
self.model = nn.Sequential(*layers)

Author:

make style applied this correction

Comment on lines 885 to 903

@dataclass
class TransformerConfig:
tokens_per_block: int
max_blocks: int
attention: str

num_layers: int
num_heads: int
embed_dim: int

embed_pdrop: float
resid_pdrop: float
attn_pdrop: float

@property
def max_tokens(self):
return self.tokens_per_block * self.max_blocks

Collaborator:

Suggested change
@dataclass
class TransformerConfig:
tokens_per_block: int
max_blocks: int
attention: str
num_layers: int
num_heads: int
embed_dim: int
embed_pdrop: float
resid_pdrop: float
attn_pdrop: float
@property
def max_tokens(self):
return self.tokens_per_block * self.max_blocks

Author:

removed and committed

RUFFY-369 and others added 24 commits May 24, 2024 16:21
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@RUFFY-369 (Author) commented May 27, 2024

Hi @amyeroberts, I have made the suggested changes and some refactoring in the modeling file to enhance compatibility. Please review the updated files at your convenience. Thank you!

This is a huge piece of work, and it represents only the third model that integrates reinforcement learning (RL) with transformers. If this model is successfully ported, it will significantly benefit the transformers + RL community. Since it is SOTA in sample-efficient RL among methods without lookahead search on the Atari 100k benchmark, its successful integration with transformers will provide considerable value for fine-tuning it or training it from scratch on various tasks with the Transformers Trainer. Moreover, achieving full compatibility and a successful port will serve as a blueprint for future RL models based on the transformer architecture to be added to the library, even for the papers' authors.

Following the successful porting, I will create and hyperlink a Colab notebook with a detailed guide on using the model with transformers, including training it from scratch, and will be more than happy to maintain the model here. 👍

@SangbumChoi (Contributor) commented

How did you convert the model file into the Hugging Face format? Usually, transformers requires a conversion script as well.

https://github.com/eloialonso/iris_pretrained_models/tree/main/pretrained_models

Maybe uploading a few more models would be good for other people :)

@RUFFY-369 (Author) commented May 28, 2024

How did you convert the model file into the Hugging Face format? Usually, transformers requires a conversion script as well.

https://github.com/eloialonso/iris_pretrained_models/tree/main/pretrained_models

@SangbumChoi Thank you for your comment. Basically, I converted the initial iris-breakout model weights file with a conversion script, kept separately, to make it compatible with Hugging Face. The conversion script was experimental and also quite simple, so I had it in a separate folder. I will push it to this branch too, as it worked fine.
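
For illustration, a minimal sketch of the usual shape of such a conversion script; the key renames, default config, and imports below are hypothetical placeholders, not the exact script used for this PR:

# Sketch only: typical transformers conversion-script pattern (rename keys, load, save_pretrained).
# The rename rule and the IrisConfig/IrisModel import are hypothetical placeholders.
import torch

from modeling_iris import IrisConfig, IrisModel  # hypothetical: this PR's modeling file


def convert_iris_checkpoint(original_ckpt_path, pytorch_dump_folder):
    state_dict = torch.load(original_ckpt_path, map_location="cpu")

    # Rename original keys to the transformers naming scheme (rule is illustrative).
    new_state_dict = {key.replace("world_model.", "iris.world_model."): value
                      for key, value in state_dict.items()}

    model = IrisModel(IrisConfig())
    model.load_state_dict(new_state_dict, strict=False)
    model.save_pretrained(pytorch_dump_folder)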

Maybe uploading a few more models would be good for other people :)

I just pushed all the converted models to the hub. You can check them out here
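
Once this PR's branch is installed, loading one of those converted checkpoints should presumably follow the usual pattern (untested sketch; the IrisModel class exists only on this branch):

# Sketch only: loading a converted checkpoint from the hub, assuming this PR's
# branch is installed and exposes IrisModel. Untested.
from transformers import IrisModel  # hypothetical: available only on this PR branch

model = IrisModel.from_pretrained("ruffy369/iris-breakout")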

@RUFFY-369 (Author) commented May 31, 2024

soft cc @amyeroberts, thank you

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Add IRIS
3 participants