This repository has been archived by the owner on Jul 4, 2023. It is now read-only.

Commit

Merge branch 'master' of github.com:PetrochukM/PyTorch-NLP
PetrochukM committed Apr 12, 2018
2 parents 9bb1f1f + 2ac2b1f commit 9b7bb53
Showing 3 changed files with 90 additions and 61 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
*.pyc
.idea
.vscode

# Operating system related file
.DS_Store
@@ -86,6 +87,7 @@ data/**

# Embeddings
.pretrained_embeddings_cache/
.word_vectors_cache/

# Test Files
.pytest_cache
11 changes: 11 additions & 0 deletions ISSUE_TEMPLATE.md
@@ -0,0 +1,11 @@
## Expected Behavior


## Actual Behavior


## Steps to Reproduce the Problem

1.
2.
3.
138 changes: 77 additions & 61 deletions README.md
@@ -1,19 +1,19 @@
<p align="center"><img width="55%" src="docs/_static/img/logo_horizontal_color.svg" /></p>

<h3 align="center">Supporting Rapid Prototyping with a Deep Learning NLP Toolkit&nbsp;&nbsp;
<a href="https://twitter.com/intent/tweet?text=Supporting%20rapid%20prototyping%20for%20research,%20PyTorch-NLP%20has%20LAUNCHED,%20a%20deep%20learning%20natural%20language%20processing%20(NLP)%20toolkit!%20&url=https://github.com/PetrochukM/PyTorch-NLP&hashtags=pytorch,nlp,research">
<img style='vertical-align: text-bottom !important;' src="https://img.shields.io/twitter/url/http/shields.io.svg?style=social" alt="Tweet">
</a>
</h3>

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research.

Join our community: add datasets and neural network layers! Chat with us on [Gitter](https://gitter.im/PyTorch-NLP/Lobby) and join the [Google Group](https://groups.google.com/forum/#!forum/pytorch-nlp); we're eager to collaborate with you.

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pytorch-nlp.svg?style=flat-square)
[![Codecov](https://img.shields.io/codecov/c/github/PetrochukM/PyTorch-NLP/master.svg?style=flat-square)](https://codecov.io/gh/PetrochukM/PyTorch-NLP)
[![Documentation Status](https://img.shields.io/readthedocs/pytorchnlp/latest.svg?style=flat-square)](http://pytorchnlp.readthedocs.io/en/latest/?badge=latest&style=flat-square)
[![Build Status](https://img.shields.io/travis/PetrochukM/PyTorch-NLP/master.svg?style=flat-square)](https://travis-ci.org/PetrochukM/PyTorch-NLP)
[![Gitter chat](https://img.shields.io/gitter/room/PyTorch-NLP/Lobby.svg?style=flat-square)](https://gitter.im/PyTorch-NLP?style=flat-square)

## Installation

@@ -26,70 +26,87 @@ pip:
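
PyTorch-NLP is available on PyPI (see the badge above), so the standard install presumably applies:

```
pip install pytorch-nlp
```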

The complete documentation for PyTorch-NLP is available via [our ReadTheDocs website](https://pytorchnlp.readthedocs.io).

## Basics

Add PyTorch-NLP to your project by following one of the common use cases:

### Load a [Dataset](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.datasets.html)

Load the IMDB dataset, for example:

```python
from torchnlp.datasets import imdb_dataset

# Load the IMDB training dataset
train = imdb_dataset(train=True)
train[0]  # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}
```
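
Each example is a plain `dict`, so the dataset drops straight into ordinary Python. A quick sketch (assuming the dataset downloads successfully on first use):

```python
from torchnlp.datasets import imdb_dataset

# Datasets behave like lists of dict examples.
train = imdb_dataset(train=True)
positive = [row for row in train if row['sentiment'] == 'pos']
print(len(train), len(positive))  # e.g. 25000 12500
```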

### Apply [Neural Network](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.nn.html) Layers

For example, from the neural network package, apply a Simple Recurrent Unit (SRU):

```python
from torchnlp.nn import SRU
import torch

input_ = torch.autograd.Variable(torch.randn(6, 3, 10))
sru = SRU(10, 20)

# Apply a Simple Recurrent Unit to `input_`
sru(input_)
# RETURNS: (
#   output [torch.FloatTensor (6x3x20)],
#   hidden_state [torch.FloatTensor (2x3x20)]
# )
```
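
The tuple return makes it easy to bolt a head onto the recurrent layer. A toy sketch (the mean-pooled classifier here is illustrative, not part of the library):

```python
import torch
from torchnlp.nn import SRU

sru = SRU(10, 20)
linear = torch.nn.Linear(20, 2)

input_ = torch.autograd.Variable(torch.randn(6, 3, 10))  # (seq_len, batch, input_size)

# Mean-pool the SRU outputs over time, then classify each batch element.
output, hidden_state = sru(input_)
logits = linear(output.mean(dim=0))  # (batch, 2)
```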

### [Encode Text](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.text_encoders.html)

Tokenize and encode text as a tensor. For example, a `WhitespaceEncoder` breaks text into terms whenever it encounters a whitespace character.

```python
from torchnlp.text_encoders import WhitespaceEncoder

# Create a `WhitespaceEncoder` with a corpus of text
encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])

# Encode and decode phrases
encoder.encode("this ain't funny.") # RETURNS: torch.LongTensor([6, 7, 1])
encoder.decode(encoder.encode("This ain't funny.")) # RETURNS: "this ain't funny."
```
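
Encoded phrases come back as variable-length `torch.LongTensor`s. A minimal batching sketch, assuming a PyTorch version that provides `torch.nn.utils.rnn.pad_sequence` (padding with 0 is also an assumption; check your encoder's reserved tokens):

```python
import torch
from torchnlp.text_encoders import WhitespaceEncoder

corpus = ["now this ain't funny", "so don't you dare laugh"]
encoder = WhitespaceEncoder(corpus)

# Encode each phrase, then right-pad the batch to a common length with 0s.
encoded = [encoder.encode(text) for text in corpus]
batch = torch.nn.utils.rnn.pad_sequence(encoded, batch_first=True)
batch.shape  # e.g. torch.Size([2, 5])
```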

### Load [Word Vectors](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.word_to_vector.html)

For example, load FastText, state-of-the-art English word vectors:

```python
from torchnlp.word_to_vector import FastText

vectors = FastText()
# Load vectors for any word as a `torch.FloatTensor`
vectors['hello'] # RETURNS: [torch.FloatTensor of size 100]
```
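
A common follow-on is to copy these vectors into a `torch.nn.Embedding` aligned with an encoder's vocabulary. A sketch, not library API (`encoder.vocab` as a list of tokens is an assumption here):

```python
import torch
from torchnlp.text_encoders import WhitespaceEncoder
from torchnlp.word_to_vector import FastText

encoder = WhitespaceEncoder(["now this ain't funny", "so don't you dare laugh"])
vectors = FastText()  # the first call downloads the vectors, which are large

# Copy a pre-trained vector into each row of the embedding weight matrix.
weights = torch.stack([vectors[token] for token in encoder.vocab])
embedding = torch.nn.Embedding(weights.size(0), weights.size(1))
embedding.weight.data.copy_(weights)
```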

### Compute [Metrics](http://pytorchnlp.readthedocs.io/en/latest/source/torchnlp.metrics.html)

Finally, compute common metrics such as the BLEU score.

```python
from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑"]

# Compute BLEU score with the official BLEU perl script
get_moses_multi_bleu(hypotheses, references, lowercase=True) # RETURNS: 47.9
```
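
Note that this helper shells out to the Moses `multi-bleu.perl` script (downloading it if needed), so a working Perl interpreter is presumably required on your `PATH`.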

### Help :question:

For longer examples, see [`examples/`](examples/).

Need more help? We are happy to answer your questions via [Gitter Chat](https://gitter.im/PyTorch-NLP).

## Contributing

@@ -99,14 +116,13 @@ We've released PyTorch-NLP because we found a lack of basic toolkits for NLP in

Read our [contributing guide](https://github.com/PetrochukM/PyTorch-NLP/blob/master/Contributing.md) to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.


## Related Work

### [torchtext](https://github.com/pytorch/text)

torchtext and PyTorch-NLP differ in architecture and feature set; otherwise, they are similar. Both provide pre-trained word vectors, datasets, iterators, and text encoders. PyTorch-NLP also provides neural network modules and metrics. Architecturally, both are object oriented, but torchtext relies on external coupling while PyTorch-NLP keeps coupling low.

### [AllenNLP](https://github.com/allenai/allennlp)

AllenNLP is designed to be a platform for research. PyTorch-NLP is designed to be a lightweight toolkit.

