This repository has been archived by the owner on Jul 4, 2023. It is now read-only.

Commit

Merge branch 'master' of github.com:PetrochukM/PyTorch-NLP
PetrochukM committed Apr 12, 2018
2 parents 9bb1f1f + 2ac2b1f commit 9b7bb53
Showing 3 changed files with 90 additions and 61 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
*.pyc
.idea
.vscode

# Operating system related file
.DS_Store
@@ -86,6 +87,7 @@ data/**

# Embeddings
.pretrained_embeddings_cache/
.word_vectors_cache/

# Test Files
.pytest_cache
11 changes: 11 additions & 0 deletions ISSUE_TEMPLATE.md
@@ -0,0 +1,11 @@
## Expected Behavior


## Actual Behavior


## Steps to Reproduce the Problem

1.
2.
3.
138 changes: 77 additions & 61 deletions README.md
@@ -1,19 +1,19 @@
<p align="center"><img width="55%" src="docs/_static/img/logo_horizontal_color.svg" /></p>

<h3 align="center">Supporting Rapid Prototyping with a Deep Learning NLP Toolkit&nbsp;&nbsp;
<a href="https://twitter.com/intent/tweet?text=Supporting%20rapid%20prototyping%20for%20research,%20PyTorch-NLP%20has%20LAUNCHED,%20a%20deep%20learning%20natural%20language%20processing%20(NLP)%20toolkit!%20&url=https://github.com/PetrochukM/PyTorch-NLP&hashtags=pytorch,nlp,research">
<img style='vertical-align: text-bottom !important;' src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social" alt="Tweet">
</a>
</h3>

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research.

Join our community: add datasets and neural network layers! Chat with us on [Gitter](https://gitter.im/PyTorch-NLP/Lobby) and join the [Google Group](https://groups.google.com/forum/#!forum/pytorch-nlp); we're eager to collaborate with you.

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pytorch-nlp.svg?style=flat-square)
[![Codecov](https://img.shields.io/codecov/c/github/PetrochukM/PyTorch-NLP/master.svg?style=flat-square)](https://codecov.io/gh/PetrochukM/PyTorch-NLP)
[![Documentation Status](https://img.shields.io/readthedocs/pytorchnlp/latest.svg?style=flat-square)](http://pytorchnlp.readthedocs.io/en/latest/?badge=latest&style=flat-square)
[![Build Status](https://img.shields.io/travis/PetrochukM/PyTorch-NLP/master.svg?style=flat-square)](https://travis-ci.org/PetrochukM/PyTorch-NLP)
[![Gitter chat](https://img.shields.io/gitter/room/PyTorch-NLP/Lobby.svg?style=flat-square)](https://gitter.im/PyTorch-NLP?style=flat-square)

## Installation

@@ -26,70 +26,87 @@ pip:
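
PyTorch-NLP is available on PyPI (see the badge above), so the standard install presumably applies:

```
pip install pytorch-nlp
```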

The complete documentation for PyTorch-NLP is available via [our ReadTheDocs website](https://pytorchnlp.readthedocs.io).

## Basics

Add PyTorch-NLP to your project by following one of the common use cases:

### Load a [Dataset](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.datasets.html)

Load the IMDB dataset, for example:

```python
from torchnlp.datasets import imdb_dataset

# Load the IMDB training dataset
train = imdb_dataset(train=True)
train[0]  # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
```
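
Each example is a plain `dict`, so the dataset drops straight into ordinary Python. A quick sketch (assuming the dataset downloads successfully on first use):

```python
from torchnlp.datasets import imdb_dataset

# Datasets behave like lists of dict examples.
train = imdb_dataset(train=True)
positive = [row for row in train if row['sentiment'] == 'pos']
print(len(train), len(positive))  # e.g. 25000 12500
```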

### Apply [Neural Network](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.nn.html) Layers

For example, from the neural network package, apply a Simple Recurrent Unit (SRU):

```python
from torchnlp.nn import SRU
import torch

input_ = torch.autograd.Variable(torch.randn(6, 3, 10))
sru = SRU(10, 20)

# Apply a Simple Recurrent Unit to `input_`
sru(input_)
# RETURNS: (
#   output [torch.FloatTensor (6x3x20)],
#   hidden_state [torch.FloatTensor (2x3x20)]
# )
```
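
The tuple return makes it easy to bolt a head onto the recurrent layer. A toy sketch (the mean-pooled classifier here is illustrative, not part of the library):

```python
import torch
from torchnlp.nn import SRU

sru = SRU(10, 20)
linear = torch.nn.Linear(20, 2)

input_ = torch.autograd.Variable(torch.randn(6, 3, 10))  # (seq_len, batch, input_size)

# Mean-pool the SRU outputs over time, then classify each batch element.
output, hidden_state = sru(input_)
logits = linear(output.mean(dim=0))  # (batch, 2)
```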

### [Encode Text](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.text_encoders.html)

Tokenize and encode text as a tensor. For example, a `WhitespaceEncoder` breaks text into terms whenever it encounters a whitespace character.

```python
from torchnlp.text_encoders import WhitespaceEncoder

# Create a `WhitespaceEncoder` with a corpus of text
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])

# Encode and decode phrases
encoder.encode("this ain't funny.") # RETURNS: torch.LongTensor([6, 7, 1])
encoder.decode(encoder.encode("This ain't funny.")) # RETURNS: "this ain't funny."
```
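
Encoded phrases come back as variable-length `torch.LongTensor`s. A minimal batching sketch, assuming a PyTorch version that provides `torch.nn.utils.rnn.pad_sequence` (padding with 0 is also an assumption; check your encoder's reserved tokens):

```python
import torch
from torchnlp.text_encoders import WhitespaceEncoder

corpus = ["now this ain't funny", "so don't you dare laugh"]
encoder = WhitespaceEncoder(corpus)

# Encode each phrase, then right-pad the batch to a common length with 0s.
encoded = [encoder.encode(text) for text in corpus]
batch = torch.nn.utils.rnn.pad_sequence(encoded, batch_first=True)
batch.shape  # e.g. torch.Size([2, 5])
```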

### Load [Word Vectors](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.word_to_vector.html)

For example, load FastText, state-of-the-art English word vectors:

```python
from torchnlp.word_to_vector import FastText

vectors = FastText()
# Load vectors for any word as a `torch.FloatTensor`
vectors['hello'] # RETURNS: [torch.FloatTensor of size 100]
```
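
A common follow-on is to copy these vectors into a `torch.nn.Embedding` aligned with an encoder's vocabulary. A sketch, not library API (`encoder.vocab` as a list of tokens is an assumption here):

```python
import torch
from torchnlp.text_encoders import WhitespaceEncoder
from torchnlp.word_to_vector import FastText

encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
vectors = FastText()  # the first call downloads the vectors, which are large

# Copy a pre-trained vector into each row of the embedding weight matrix.
weights = torch.stack([vectors[token] for token in encoder.vocab])
embedding = torch.nn.Embedding(weights.size(0), weights.size(1))
embedding.weight.data.copy_(weights)
```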

### Compute [Metrics](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.metrics.html)

Finally, compute common metrics such as the BLEU score.

```python
from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]

# Compute BLEU score with the official BLEU perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True) # RETURNS: 47.9
```
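
Note that this helper shells out to the Moses `multi-bleu.perl` script (downloading it if needed), so a working Perl interpreter is presumably required on your `PATH`.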

### Help :question:

For longer examples, see [`examples/`](examples/).

Need more help? We are happy to answer your questions via [Gitter Chat](https://gitter.im/PyTorch-NLP).

## Contributing

@@ -99,14 +116,13 @@ We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in

Read our [contributing guide](https://github.com/PetrochukM/PyTorch-NLP/blob/master/Contributing.md) to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.


## Related Work

### [torchtext](https://github.com/pytorch/text)

torchtext and PyTorch-NLP differ in architecture and feature set; otherwise, they are similar. Both provide pre-trained word vectors, datasets, iterators, and text encoders. PyTorch-NLP also provides neural network modules and metrics. Architecturally, both are object oriented, but torchtext relies on external coupling while PyTorch-NLP keeps coupling low.

### [AllenNLP](https://github.com/allenai/allennlp)

AllenNLP is designed to be a platform for research. PyTorch-NLP is designed to be a lightweight toolkit.

