Move documentation and update links
ophelielacroix committed Nov 13, 2020
1 parent 06b5f66 commit 0a443ad
Showing 16 changed files with 54 additions and 61 deletions.
11 changes: 0 additions & 11 deletions docs/frameworks.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/bert.md → docs/frameworks/bert.md
@@ -43,6 +43,6 @@ The tone analyzer consists of two BERT classification models.
The first model detects the polarity of a sentence, i.e. whether it is perceived as `positive`, `neutral` or `negative`.
The second model detects the tone of a sentence, between `subjective` and `objective`.

-The models are finetuned on manually annotated data from [Twitter Sentiment](datasets.md#twitter-sentiment) (train part) and [EuroParl sentiment 2](datasets.md#europarl-sentiment2).
+The models are finetuned on manually annotated data from [Twitter Sentiment](../datasets.md#twitter-sentiment) (train part) and [EuroParl sentiment 2](../datasets.md#europarl-sentiment2).
Both datasets can be loaded with the DaNLP package.
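Every link change in this commit follows one rule: a relative Markdown link is resolved against the directory of the file that contains it. Moving `docs/bert.md` one level down into `docs/frameworks/` therefore turns `datasets.md` into `../datasets.md`, while files that stay put but point at a moved page (such as `docs/models/ner.md` pointing at `../spacy.md`) need their target updated instead. The sketch below reproduces that rewrite with Python's standard `posixpath` module; `resolve_link` and `rewrite_link` are hypothetical helper names, not part of DaNLP or of any tooling used in this repository.

```python
import posixpath
from typing import Dict, Optional


def resolve_link(doc_path: str, target: str) -> str:
    """Resolve a relative link target against the directory of the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(doc_path), target))


def rewrite_link(old_doc: str, new_doc: str, link: str,
                 moved: Optional[Dict[str, str]] = None) -> str:
    """Return link text that still points at the same target after a move.

    old_doc / new_doc: path of the file containing the link, before and after the move.
    moved: optional {old_target_path: new_target_path} map for pages that moved themselves.
    """
    target, sep, anchor = link.strip().partition("#")
    if not target:  # pure in-page anchor such as "#bert-tone": nothing to rewrite
        return link
    absolute = resolve_link(old_doc, target)        # where the link pointed before the move
    absolute = (moved or {}).get(absolute, absolute)  # follow the target if it moved too
    new_target = posixpath.relpath(absolute, posixpath.dirname(new_doc) or ".")
    return new_target + sep + anchor                # keep any "#section" anchor as-is


# The file containing the link moved (docs/bert.md -> docs/frameworks/bert.md):
print(rewrite_link("docs/bert.md", "docs/frameworks/bert.md", "datasets.md#twitter-sentiment"))
# The link target moved (docs/spacy.md -> docs/frameworks/spacy.md):
print(rewrite_link("docs/models/ner.md", "docs/models/ner.md", "../spacy.md ",
                   moved={"docs/spacy.md": "docs/frameworks/spacy.md"}))
```

Stripping the link first also absorbs stray whitespace such as the `../spacy.md ` target that appears in `ner.md`.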

2 changes: 1 addition & 1 deletion docs/flair.md → docs/frameworks/flair.md
@@ -5,7 +5,7 @@ The [flair](https://github.com/flairNLP/flair) framework from Zalando is based o


Through the DaNLP package, we provide a pre-trained Part-of-Speech tagger and Named Entity recognizer using the flair framework.
-The models have been trained on the [Danish Dependency Treebank](datasets.md#dane) and use fastText word embeddings and [flair contextual word embeddings](models/embeddings.md#flair-embeddings) trained on data from Wikipedia and the EuroParl corpus.
+The models have been trained on the [Danish Dependency Treebank](../datasets.md#dane) and use fastText word embeddings and [flair contextual word embeddings](../models/embeddings.md#flair-embeddings) trained on data from Wikipedia and the EuroParl corpus.
The code for training can be found on flair's GitHub, and the following parameters are set:
`learning_rate=1`, `mini_batch_size=32`, `max_epochs=150`, `hidden_size=256`.

20 changes: 10 additions & 10 deletions docs/spacy.md → docs/frameworks/spacy.md
@@ -11,15 +11,15 @@ Note that the two models are not the same, e.g. the spaCy model in DaNLP perform

The spaCy model comes with **tokenization**, **dependency parsing**, **part-of-speech tagging**, **word vectors** and **named entity recognition**.

-The model is trained on the [Danish Dependency Treebank (DaNE)](datasets.md#dane), with additional NER data originating from news articles obtained through a collaboration with InfoMedia.
+The model is trained on the [Danish Dependency Treebank (DaNE)](../datasets.md#dane), with additional NER data originating from news articles obtained through a collaboration with InfoMedia.

-For comparisons with other models and more information on each task, see the individual task pages for [word embeddings](models/embeddings.md), [named entity recognition](models/ner.md), [part-of-speech tagging](models/pos.md) and [dependency parsing](models/dependency.md).
+For comparisons with other models and more information on each task, see the individual task pages for [word embeddings](../models/embeddings.md), [named entity recognition](../models/ner.md), [part-of-speech tagging](../models/pos.md) and [dependency parsing](../models/dependency.md).

-The DaNLP GitHub repository also provides a version of the spaCy model that contains a sentiment classifier; read more about it in the [sentiment analysis docs](models/sentiment_analysis.md).
+The DaNLP GitHub repository also provides a version of the spaCy model that contains a sentiment classifier; read more about it in the [sentiment analysis docs](../models/sentiment_analysis.md).

### Performance of the spaCy model

-The following lists the performance scores of the spaCy model provided in the DaNLP package on the [Danish Dependency Treebank (DaNE)](datasets.md#dane) test set. These scores, along with more detailed ones, can be found in the file `meta.json` that is shipped with the model when it is downloaded.
+The following lists the performance scores of the spaCy model provided in the DaNLP package on the [Danish Dependency Treebank (DaNE)](../datasets.md#dane) test set. These scores, along with more detailed ones, can be found in the file `meta.json` that is shipped with the model when it is downloaded.

| Task | Measures | Scores |
| ----------------------- | -------- | :----- |
@@ -66,7 +66,7 @@ for token in doc:

```

-![](imgs/ling_feat.PNG)
+![](../imgs/ling_feat.PNG)

**Visualizing the dependency tree**

@@ -78,9 +78,9 @@ displacy.serve(doc, style='dep')



-![](imgs/dep.PNG)
+![](../imgs/dep.PNG)

-Here is an example of using named entity recognition. You can read more about [NER](models/ner.md#named-entity-recognition) in the dedicated docs.
+Here is an example of using named entity recognition. You can read more about [NER](../models/ner.md#named-entity-recognition) in the dedicated docs.

```python
doc = nlp('Jens Peter Hansen kommer fra Danmark og arbejder hos Alexandra Instituttet')
@@ -107,13 +107,13 @@ Instituttet ORG

The spaCy framework provides an easy command line tool for training an existing model, for example by adding a text classifier. This short example shows how to do so using your own annotated data. It is also possible to use any static embedding provided in the DaNLP wrapper.

-As an example, we will use a small dataset for sentiment classification on Twitter. The dataset is under development and will be added to the DaNLP package when ready, and the spaCy model will be updated with the classification model as well. A first version of a spaCy model with a sentiment classifier can be loaded with the DaNLP wrapper; read more about it in the sentiment analysis [docs](models/sentiment_analysis.md).
+As an example, we will use a small dataset for sentiment classification on Twitter. The dataset is under development and will be added to the DaNLP package when ready, and the spaCy model will be updated with the classification model as well. A first version of a spaCy model with a sentiment classifier can be loaded with the DaNLP wrapper; read more about it in the sentiment analysis [docs](../models/sentiment_analysis.md).

**The first thing is to convert the annotated data into a data format readable by spaCy**

Suppose you have the data in, e.g., CSV format, split into a training part and a development part. At the time of creating this snippet, our Twitter data has 973 training examples and 400 evaluation examples, with the following labels: 'positive' marked by 0, 'neutral' marked by 1, and 'negative' marked by 2. Loaded into a pandas DataFrame, it looks like this:

-![](imgs/data_head.PNG)
+![](../imgs/data_head.PNG)

It needs to be converted into the format expected by spaCy for training the model, which can be done as follows:

@@ -159,7 +159,7 @@ prepare_data(df_dev, 'eval_dev.json')

The data now looks like this (truncated snippet):

-![](imgs/snippet_json.PNG)
+![](../imgs/snippet_json.PNG)
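The conversion code itself is collapsed in this diff, and the exact JSON layout it writes is only visible in the screenshot. As a hedged sketch of the core step only, the snippet below maps the integer labels described above (0 = positive, 1 = neutral, 2 = negative) to one-hot `cats` dictionaries, giving `(text, {"cats": {...}})` pairs in the shape used by spaCy v2's text-classification training examples. It does not write the file consumed by the `spacy train` command, and `to_textcat_examples` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple, Union

# Label encoding of the Twitter data described above: 0 = positive, 1 = neutral, 2 = negative.
LABELS = {0: "positive", 1: "neutral", 2: "negative"}


def to_textcat_examples(
    rows: Iterable[Tuple[str, Union[int, str]]],
) -> List[Tuple[str, Dict[str, Dict[str, float]]]]:
    """Turn (text, label_id) rows into (text, {"cats": {...}}) pairs with one-hot scores."""
    examples = []
    for text, label_id in rows:
        label_id = int(label_id)  # CSV readers often yield "0"/"1"/"2" as strings
        if label_id not in LABELS:
            raise ValueError(f"unknown label id: {label_id}")
        # 1.0 for the annotated class, 0.0 for every other class
        cats = {name: 1.0 if idx == label_id else 0.0 for idx, name in LABELS.items()}
        examples.append((text, {"cats": cats}))
    return examples


# Usage with two made-up rows (not taken from the actual dataset):
examples = to_textcat_examples([("Sikke en dejlig dag", 0), ("Det var en skuffelse", "2")])
print(examples[1])
```

With a pandas DataFrame, the rows could be passed as `df[[text_col, label_col]].itertuples(index=False)`, where `text_col` and `label_col` are placeholders for whatever the columns are called in your file.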

**Ensure you have the models and embeddings downloaded**

File renamed without changes.
File renamed without changes.
@@ -10,16 +10,16 @@ The DaNLP package provides you with several models for different NLP tasks using
In this section, you will get a quick tour of the main functions of the DaNLP package.
For a more detailed description of the tasks and frameworks, follow the links to the documentation:

-* [Embedding of text](../docs/models/embeddings.md) with flair, spaCy or Gensim
-* [Part of speech tagging](../docs/models/pos.md) (POS) with spaCy or flair
-* [Named Entity Recognition](../docs/models/ner.md) (NER) with spaCy, flair or BERT
-* [Sentiment Analysis](../docs/models/sentiment_analysis.md) with spaCy or BERT
-* [Dependency parsing and NP-chunking](../docs/models/dependency.md) with spaCy
+* [Embedding of text](../models/embeddings.md) with flair, spaCy or Gensim
+* [Part of speech tagging](../models/pos.md) (POS) with spaCy or flair
+* [Named Entity Recognition](../models/ner.md) (NER) with spaCy, flair or BERT
+* [Sentiment Analysis](../models/sentiment_analysis.md) with spaCy or BERT
+* [Dependency parsing and NP-chunking](../models/dependency.md) with spaCy


## All-in-one with the spaCy models

-To quickly get started with DaNLP and try out different NLP tasks, you can use the spaCy model ([see also](../docs/spacy.md)). The main advantages of the spaCy model are that it is fast and that it includes most of the basic NLP tasks that you need for pre-processing texts in Danish.
+To quickly get started with DaNLP and try out different NLP tasks, you can use the spaCy model ([see also](../frameworks/spacy.md)). The main advantages of the spaCy model are that it is fast and that it includes most of the basic NLP tasks that you need for pre-processing texts in Danish.

The main functions are:

@@ -28,7 +28,7 @@ The main functions are:

### Pre-processing tasks

-Perform [Part-of-Speech tagging](../docs/models/pos.md), [Named Entity Recognition](../docs/models/ner.md) and [dependency parsing](../docs/models/dependency.md) at the same time with the DaNLP spaCy model.
+Perform [Part-of-Speech tagging](../models/pos.md), [Named Entity Recognition](../models/ner.md) and [dependency parsing](../models/dependency.md) at the same time with the DaNLP spaCy model.
Here is a snippet to get started quickly:

```python
2 changes: 1 addition & 1 deletion docs/models/dependency.md
@@ -33,7 +33,7 @@ We provide a conversion function -- from dependencies to NP-chunks -- thus depen

## 🔧 SpaCy {#spacy}

-Read more about the spaCy model in the dedicated [spaCy docs](../spacy.md); it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) dataset.
+Read more about the spaCy model in the dedicated [spaCy docs](../frameworks/spacy.md); it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) dataset.

### Dependency Parser

13 changes: 0 additions & 13 deletions docs/models/models.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/models/ner.md
@@ -21,7 +21,7 @@ The BERT [(Devlin et al. 2019)](https://www.aclweb.org/anthology/N19-1423/) NER
has been finetuned on the [DaNE](../datasets.md#dane)
dataset [(Hvingelby et al. 2020)](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf). The finetuning has been done using the [Transformers](https://github.com/huggingface/transformers) library from HuggingFace.

-The BERT NER model can be loaded with the `load_bert_ner_model()` method. Note that it can take at most 512 tokens as input at a time. Longer texts should be split beforehand, for example using sentence boundary detection (e.g. with the [spaCy model](../spacy.md)).
+The BERT NER model can be loaded with the `load_bert_ner_model()` method. Note that it can take at most 512 tokens as input at a time. Longer texts should be split beforehand, for example using sentence boundary detection (e.g. with the [spaCy model](../frameworks/spacy.md)).
```python
from danlp.models import load_bert_ner_model

@@ -53,7 +53,7 @@ print(sentence.to_tagged_string())
```

#### 🔧 spaCy {#spacy}
-The [spaCy](https://spacy.io/) model is trained for several NLP tasks [(read more here)](../spacy.md) using the [DDT and DaNE](../datasets.md#dane) annotations.
+The [spaCy](https://spacy.io/) model is trained for several NLP tasks [(read more here)](../frameworks/spacy.md) using the [DDT and DaNE](../datasets.md#dane) annotations.
The spaCy model can be loaded with DaNLP to do NER predictions in the following way.
```python
from danlp.models import load_spacy_model
2 changes: 1 addition & 1 deletion docs/models/pos.md
@@ -47,7 +47,7 @@ print(sentence.to_tagged_string())

##### 🔧 SpaCy {#spacy}

-Read more about the spaCy model in the dedicated [spaCy docs](../spacy.md); it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) data.
+Read more about the spaCy model in the dedicated [spaCy docs](../frameworks/spacy.md); it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) data.

Below is a small getting-started snippet for using the spaCy POS tagger:

4 changes: 2 additions & 2 deletions docs/models/sentiment_analysis.md
@@ -77,11 +77,11 @@ classifier._clases()

SpaCy sentiment is a text classification model trained using spaCy's built-in command line interface. It uses the CoNLL2017 word vectors (read about them [here](embeddings.md)).

-The model is trained using hard distillation of [BERT Tone](#wrenchbert-tone) (beta), meaning that the BERT Tone model is used to make predictions on 50,000 sentences from Twitter and 50,000 sentences from [Europarl7](http://www.statmt.org/europarl/). These data are then used to train a spaCy model. Note that the dataset was first balanced between the classes by oversampling. The model recognizes the classes 'positiv', 'neutral' and 'negative'.
+The model is trained using hard distillation of [BERT Tone](#bert-tone) (beta), meaning that the BERT Tone model is used to make predictions on 50,000 sentences from Twitter and 50,000 sentences from [Europarl7](http://www.statmt.org/europarl/). These data are then used to train a spaCy model. Note that the dataset was first balanced between the classes by oversampling. The model recognizes the classes 'positiv', 'neutral' and 'negative'.

It is a first version.
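The description above only says that the distilled dataset was balanced "by oversampling" and does not name a method. A minimal sketch of plain random oversampling, assuming that is the variant meant, duplicates randomly chosen examples of the smaller classes until every class is as large as the biggest one. `oversample` is a hypothetical helper, not a DaNLP function.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def oversample(examples: Sequence[Tuple[str, str]], seed: int = 0) -> List[Tuple[str, str]]:
    """Random oversampling: resample minority-class examples with replacement
    until every label has as many examples as the most frequent one."""
    rng = random.Random(seed)  # fixed seed so the balanced set is reproducible
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)  # keep every original example once
        balanced.extend(rng.choices(items, k=target - len(items)))  # top up with duplicates
    rng.shuffle(balanced)
    return balanced


# Made-up toy data: three 'positiv' examples and one 'negative'.
toy = [("a", "positiv"), ("b", "positiv"), ("c", "positiv"), ("d", "negative")]
print(Counter(label for _, label in oversample(toy)))
```

Oversampling keeps every original prediction and only repeats minority examples, so no data is thrown away; the trade-off is that the duplicated sentences can encourage overfitting on the smaller classes.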

-Read more about using the Danish spaCy model [here](../spacy.md).
+Read more about using the Danish spaCy model [here](../frameworks/spacy.md).

Below is a small snippet for getting started using the spaCy sentiment model. Currently, the DaNLP package provides both a spaCy model with no classes in the textcat module (so it is empty for you to train from scratch) and the sentiment spaCy model, which has the pretrained classes 'positiv', 'neutral' and 'negative'. Note that the spaCy command line interface makes it possible to continue training the sentiment classes or to add new tags.

11 changes: 11 additions & 0 deletions readthedocs/frameworks.rst
@@ -0,0 +1,11 @@
Frameworks
==========


.. toctree::
   :maxdepth: 1
   :caption: Frameworks

   docs/frameworks/spacy.md
   docs/frameworks/flair.md
   docs/frameworks/bert.md
10 changes: 5 additions & 5 deletions readthedocs/index.rst
@@ -14,16 +14,16 @@ If you are new to NLP or want to know more about the project in a broader perspe
   :maxdepth: 1
   :caption: Getting started

-   gettingstarted/installation.md
-   gettingstarted/quickstart.md
-   gettingstarted/contributing.md
+   docs/gettingstarted/installation.md
+   docs/gettingstarted/quickstart.md
+   docs/gettingstarted/contributing.md

.. toctree::
   :maxdepth: 2
   :caption: Documentation

-   docs/models/models.rst
-   docs/frameworks.rst
+   models.rst
+   frameworks.rst
   docs/datasets.md

.. toctree::
7 changes: 0 additions & 7 deletions readthedocs/library/download.rst

This file was deleted.

13 changes: 13 additions & 0 deletions readthedocs/models.rst
@@ -0,0 +1,13 @@
Models
======


.. toctree::
   :maxdepth: 1
   :caption: Models

   docs/models/embeddings.md
   docs/models/pos.md
   docs/models/ner.md
   docs/models/dependency.md
   docs/models/sentiment_analysis.md
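Moving the toctree files also changes how their entries resolve: Sphinx interprets a relative toctree entry against the directory of the document containing the directive. Assuming `readthedocs/` is the Sphinx source directory (the diff does not show `conf.py`), `docs/models/embeddings.md` listed in `readthedocs/models.rst` refers to `readthedocs/docs/models/embeddings.md`. The rough sketch below pulls toctree entries out of reStructuredText and resolves them the same way, which can help check that every entry still exists after a move. It is a simplified parser with hypothetical helpers `toctree_entries` and `resolve_entries`, not Sphinx's own implementation.

```python
import posixpath
from typing import List


def toctree_entries(rst_text: str) -> List[str]:
    """Collect document entries from every `.. toctree::` directive in an RST file.

    Simplified parser: option lines such as `:maxdepth: 1` are skipped, and the
    directive body ends at the first non-blank line that is not indented.
    """
    entries: List[str] = []
    in_toctree = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not stripped:
            continue
        if not line[0].isspace():
            in_toctree = False  # unindented text closes the directive body
        elif not stripped.startswith(":"):
            entries.append(stripped)
    return entries


def resolve_entries(rst_path: str, rst_text: str) -> List[str]:
    """Resolve entries against the RST file's directory (entries starting with '/' are not handled)."""
    base = posixpath.dirname(rst_path)
    return [posixpath.normpath(posixpath.join(base, entry)) for entry in toctree_entries(rst_text)]


models_rst = """Models
======

.. toctree::
   :maxdepth: 1
   :caption: Models

   docs/models/embeddings.md
   docs/models/pos.md
"""
print(resolve_entries("readthedocs/models.rst", models_rst))
```

Each resolved path could then be checked with `os.path.exists`, adding the source suffix for entries that omit it.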
