v0.0.2 release
Sharad24 committed Sep 1, 2020
1 parent 1e568be commit dbdc190
Showing 23 changed files with 47 additions and 34 deletions.
34 changes: 26 additions & 8 deletions README.md
@@ -4,34 +4,36 @@
<br>
<p>

[![pypi](https://img.shields.io/badge/pypi%20package-v0.0.1-blue)](https://pypi.org/project/genrl/)
[![pypi](https://img.shields.io/badge/pypi%20package-v0.0.2-blue)](https://pypi.org/project/genrl/)
[![GitHub license](https://img.shields.io/github/license/SforAiDl/genrl)](https://github.com/SforAiDl/genrl/blob/master/LICENSE)
[![Build Status](https://travis-ci.com/SforAiDl/genrl.svg?branch=master)](https://travis-ci.com/SforAiDl/genrl)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/SforAiDl/genrl.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/SforAiDl/genrl/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/SforAiDl/genrl.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/SforAiDl/genrl/context:python)
[![codecov](https://codecov.io/gh/SforAiDl/genrl/branch/master/graph/badge.svg)](https://codecov.io/gh/SforAiDl/genrl)
[![Documentation Status](https://readthedocs.org/projects/genrl/badge/?version=latest)](https://genrl.readthedocs.io/en/latest/?badge=latest)
[![Maintainability](https://api.codeclimate.com/v1/badges/c3f6e7d31c078528e0e1/maintainability)](https://codeclimate.com/github/SforAiDl/genrl/maintainability)
![Lint, Test, Code Coverage](https://github.com/SforAiDl/genrl/workflows/Lint,%20Test,%20Code%20Coverage/badge.svg)
[![Lint, Test, Code Coverage](https://github.com/SforAiDl/genrl/workflows/Lint,%20Test,%20Code%20Coverage/badge.svg)
[![Slack - Chat](https://img.shields.io/badge/Slack-Chat-blueviolet)](https://join.slack.com/t/genrlworkspace/shared_invite/zt-gwlgnymd-Pw3TYC~0XDLy6VQDml22zg)

---

[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/0)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/0)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/1)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/1)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/2)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/2)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/3)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/3)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/4)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/4)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/5)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/5)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/6)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/6)[![](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/images/7)](https://sourcerer.io/fame/Sharad24/Sharad24/genrl/links/7)

---

**GenRL is a PyTorch reinforcement learning library centered around reproducible and generalizable algorithm implementations.**
**GenRL is a PyTorch reinforcement learning library centered around reproducible, generalizable algorithm implementations and improving accessibility in reinforcement learning.**

Reinforcement learning research is moving faster than ever before. In order to keep up with the growing trend and ensure that RL research remains reproducible, GenRL aims to aid faster paper reproduction and benchmarking by providing the following main features:

- **PyTorch-first**: Modular, Extensible and Idiomatic Python
- **Tutorials and Examples**: 20+ tutorials, from basic RL to SOTA deep RL algorithms (with explanations)!
- **Unified Trainer and Logging class**: code reusability and high-level UI
- **Ready-made algorithm implementations**: ready-made implementations of popular RL algorithms.
- **Faster Benchmarking**: automated hyperparameter tuning, environment implementations etc.

By integrating these features into GenRL, we aim to eventually support **any new algorithm implementation in less than 100 lines**.

**If you're interested in contributing, feel free to go through the issues and open PRs for code, docs, tests etc. In case of any questions, please check out the [Contributing Guidelines](https://github.com/SforAiDl/genrl/wiki/Contributing-Guidelines)**
**If you're interested in contributing, feel free to go through the issues and open PRs for code, docs, tests etc. In case of any questions, please check out the [Contributing Guidelines](CONTRIBUTING.md)**


## Installation
@@ -55,10 +57,9 @@ To train a Soft Actor-Critic model from scratch on the `Pendulum-v0` gym environ
```python
import gym

from genrl import SAC, QLearning
from genrl.classical.common import Trainer
from genrl.deep.common import OffPolicyTrainer
from genrl.agents import SAC, QLearning
from genrl.environments import VectorEnv
from genrl.trainers import ClassicalTrainer, OffPolicyTrainer

env = VectorEnv("Pendulum-v0")
agent = SAC('mlp', env)
@@ -69,13 +70,30 @@ trainer.train()
To train a Tabular Dyna-Q model from scratch on the `FrozenLake-v0` gym environment and plot rewards:
```python


env = gym.make("FrozenLake-v0")
agent = QLearning(env)
trainer = Trainer(agent, env, mode="dyna", model="tabular", n_episodes=10000)
trainer = ClassicalTrainer(agent, env, mode="dyna", model="tabular", n_episodes=10000)
episode_rewards = trainer.train()
trainer.plot(episode_rewards)
```
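
For readers unfamiliar with the `mode="dyna"`, `model="tabular"` options above, here is a minimal sketch of the general tabular Dyna-Q idea in plain Python. It is not GenRL's implementation: the corridor environment, the function name `dyna_q`, and all hyperparameter values are assumptions chosen for illustration. Each real step performs a Q-learning update, records the transition in a lookup-table model, and then replays a few remembered transitions as planning updates.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, n_episodes=200, n_planning=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy corridor (hypothetical environment).

    The agent starts in state 0 and receives reward 1 on reaching the
    last state, which ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)          # 0 = move left, 1 = move right
    q = defaultdict(float)    # (state, action) -> value estimate
    model = {}                # (state, action) -> (reward, next_state, done)

    def step(state, action):
        nxt = max(0, state - 1) if action == 0 else state + 1
        done = nxt == n_states - 1
        return (1.0 if done else 0.0), nxt, done

    def greedy(state):
        best = max(q[(state, a)] for a in actions)
        # Break ties randomly so the untrained agent still explores.
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            explore = rng.random() < epsilon
            action = rng.choice(actions) if explore else greedy(state)
            reward, nxt, done = step(state, action)
            update(state, action, reward, nxt, done)       # direct RL update
            model[(state, action)] = (reward, nxt, done)   # model learning
            for _ in range(n_planning):                    # planning from memory
                (ps, pa), (pr, pn, pd) = rng.choice(list(model.items()))
                update(ps, pa, pr, pn, pd)
            state = nxt
    return q


q = dyna_q()
for s in range(5):
    print(s, "right" if q[(s, 1)] > q[(s, 0)] else "left")
```

The planning loop is what separates Dyna-Q from plain Q-learning: setting `n_planning=0` in this sketch reduces it to ordinary one-step Q-learning.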

## Tutorials
- [Multi Armed Bandits](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/bandit_overview.html)
- [Upper Confidence Bound](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/ucb.html)
- [Thompson Sampling](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/thompson_sampling.html)
- [Bayesian](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/bayesian.html)
- [Softmax Action Selection](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/gradients.html)
- [Contextual Bandits](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/contextual_overview.html)
- [Linear Posterior Inference](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/linpos.html)
- [Variational Inference](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/variational.html)
  - [Bootstrap](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/bootstrap.html)
- [Parameter Noise Sampling](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/noise.html)
- [Deep Reinforcement Learning Background](https://genrl.readthedocs.io/en/latest/usage/tutorials/Deep/Background.html)
- [Vanilla Policy Gradients](https://genrl.readthedocs.io/en/latest/usage/tutorials/Deep/VPG.html)
- [Advantage Actor Critic](https://genrl.readthedocs.io/en/latest/usage/tutorials/Deep/A2C.html)
- [Proximal Policy Optimization](https://genrl.readthedocs.io/en/latest/usage/tutorials/Deep/PPO.html)

## Algorithms

### Deep RL
1 change: 0 additions & 1 deletion docs/source/api/agents/genrl.agents.classical.sarsa.rst
@@ -8,4 +8,3 @@ genrl.agents.classical.sarsa.sarsa module
:members:
:undoc-members:
:show-inheritance:

1 change: 0 additions & 1 deletion docs/source/api/agents/genrl.agents.deep.ppo1.rst
@@ -8,4 +8,3 @@ genrl.agents.deep.ppo1.ppo1 module
:members:
:undoc-members:
:show-inheritance:

1 change: 0 additions & 1 deletion docs/source/api/agents/genrl.agents.deep.sac.rst
@@ -9,4 +9,3 @@ genrl.agents.deep.sac.sac module
:members:
:undoc-members:
:show-inheritance:

1 change: 0 additions & 1 deletion docs/source/api/agents/genrl.agents.deep.td3.rst
@@ -8,4 +8,3 @@ genrl.agents.deep.td3.td3 module
:members:
:undoc-members:
:show-inheritance:

1 change: 0 additions & 1 deletion docs/source/api/agents/genrl.agents.deep.vpg.rst
@@ -9,4 +9,3 @@ genrl.agents.deep.vpg.vpg module
:members:
:undoc-members:
:show-inheritance:

1 change: 0 additions & 1 deletion docs/source/usage/tutorials/Classical/Q_Learning.rst
@@ -80,4 +80,3 @@ Great so far so good! Now moving towards the training process it is just calling
That's it! You have successfully trained a Q-Learning agent. You can now go ahead and play with your own environments using GenRL!

2 changes: 1 addition & 1 deletion docs/source/usage/tutorials/Classical/Sarsa.rst
@@ -67,4 +67,4 @@ Great so far so good! Now moving towards the training process it is just calling
trainer.train()
trainer.evaluate()
That's it! You have successfully trained a SARSA agent. You can now go ahead and play with your own environments using GenRL!
That's it! You have successfully trained a SARSA agent. You can now go ahead and play with your own environments using GenRL!
1 change: 1 addition & 0 deletions genrl/__init__.py
@@ -0,0 +1 @@
version = "0.0.2"
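
Note that this file defines `version`, not the more common `__version__`, so code that inspects `genrl.__version__` would not find it. The sketch below shows one hedged way to read either attribute. The helper `read_version` and the stand-in module are hypothetical and exist only so the example runs without GenRL installed.

```python
import types


def read_version(module):
    """Return a module's version string, trying common attribute names."""
    for attr in ("__version__", "version"):
        value = getattr(module, attr, None)
        if isinstance(value, str):
            return value
    return None


# Stand-in for the real package (assumption: mirrors genrl/__init__.py above).
fake_genrl = types.ModuleType("genrl")
fake_genrl.version = "0.0.2"

print(read_version(fake_genrl))
```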
4 changes: 2 additions & 2 deletions genrl/agents/bandits/contextual/common/base_model.py
@@ -2,8 +2,8 @@
from typing import Dict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import nn as nn
from torch.nn import functional as F

from genrl.agents.bandits.contextual.common.transition import TransitionDB

4 changes: 2 additions & 2 deletions genrl/agents/bandits/contextual/common/bayesian.py
@@ -1,8 +1,8 @@
from typing import Dict, Optional, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import nn as nn
from torch.nn import functional as F

from genrl.agents.bandits.contextual.common.base_model import Model
from genrl.agents.bandits.contextual.common.transition import TransitionDB
4 changes: 2 additions & 2 deletions genrl/agents/bandits/contextual/common/neural.py
@@ -1,8 +1,8 @@
from typing import Dict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import nn as nn
from torch.nn import functional as F

from genrl.agents.bandits.contextual.common.base_model import Model
from genrl.agents.bandits.contextual.common.transition import TransitionDB
4 changes: 2 additions & 2 deletions genrl/agents/deep/a2c/a2c.py
@@ -3,8 +3,8 @@
import gym
import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as opt
from torch import optim as opt
from torch.nn import functional as F

from genrl.agents.deep.base import OnPolicyAgent
from genrl.utils import get_env_properties, get_model, safe_mean
2 changes: 1 addition & 1 deletion genrl/agents/deep/ddpg/ddpg.py
@@ -2,7 +2,7 @@
from typing import Any, Dict

import numpy as np
import torch.optim as opt
from torch import optim as opt

from genrl.agents import OffPolicyAgentAC
from genrl.core import ActionNoise
2 changes: 1 addition & 1 deletion genrl/agents/deep/dqn/base.py
@@ -3,7 +3,7 @@

import numpy as np
import torch
import torch.optim as opt
from torch import optim as opt

from genrl.agents import OffPolicyAgent
from genrl.utils import get_env_properties, get_model, safe_mean
4 changes: 2 additions & 2 deletions genrl/agents/deep/ppo1/ppo1.py
@@ -3,8 +3,8 @@
import gym
import numpy as np
import torch
import torch.nn as nn
import torch.optim as opt
from torch import nn as nn
from torch import optim as opt

from genrl.agents import OnPolicyAgent
from genrl.utils import get_env_properties, get_model, safe_mean
2 changes: 1 addition & 1 deletion genrl/agents/deep/sac/sac.py
@@ -3,7 +3,7 @@

import numpy as np
import torch
import torch.optim as opt
from torch import optim as opt

from genrl.agents import OffPolicyAgentAC
from genrl.utils import get_env_properties, get_model, safe_mean
2 changes: 1 addition & 1 deletion genrl/agents/deep/vpg/vpg.py
@@ -3,7 +3,7 @@
import gym
import numpy as np
import torch
import torch.optim as opt
from torch import optim as opt

from genrl.agents import OnPolicyAgent
from genrl.utils import get_env_properties, get_model, safe_mean
2 changes: 1 addition & 1 deletion genrl/core/actor_critic.py
@@ -2,8 +2,8 @@

import numpy as np
import torch
import torch.nn as nn
from gym import spaces
from torch import nn as nn
from torch.distributions import Categorical, Normal

from genrl.core.base import BaseActorCritic
2 changes: 1 addition & 1 deletion genrl/core/bandit.py
@@ -2,7 +2,7 @@
from typing import List, Tuple, Union

import torch
import torch.nn.functional as F
from torch.nn import functional as F


class Bandit(ABC):
2 changes: 1 addition & 1 deletion genrl/core/base.py
@@ -1,7 +1,7 @@
from typing import Optional, Tuple

import torch
import torch.nn as nn
from torch import nn as nn
from torch.distributions import Categorical, Normal


2 changes: 1 addition & 1 deletion genrl/core/noise.py
@@ -2,7 +2,7 @@

import numpy as np
import torch
import torch.nn as nn
from torch import nn as nn


class ActionNoise(ABC):
2 changes: 1 addition & 1 deletion genrl/utils/utils.py
@@ -4,7 +4,7 @@
import gym
import numpy as np
import torch
import torch.nn as nn
from torch import nn as nn

from genrl.core.base import BaseActorCritic, BasePolicy, BaseValue
from genrl.core.noise import NoisyLinear
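
The import rewrites throughout this commit (`import torch.nn as nn` to `from torch import nn as nn`, and likewise for `torch.optim` and `torch.nn.functional`) appear to be purely stylistic: both forms bind the same module object to the same local name. The sketch below illustrates that equivalence with standard-library packages so it runs without PyTorch. The helper `same_module` is hypothetical, and the claim about the `torch` imports is an assumption that they behave like any other submodule import.

```python
import importlib

# Two spellings of the same import, using a stdlib package in place of torch.
import os.path as path_a
from os import path as path_b

print(path_a is path_b)  # True: both names refer to one module object


def same_module(dotted):
    """Check that `import a.b` and `from a import b` resolve to one object."""
    parent, _, child = dotted.rpartition(".")
    via_dotted = importlib.import_module(dotted)
    via_parent = getattr(importlib.import_module(parent), child)
    return via_dotted is via_parent


print(same_module("json.decoder"))
```

Because the bound object is identical, a change like this should not alter runtime behavior; it typically reflects an import-sorting or lint preference.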