Move documentation and update links

alexandrainst · Nov 13, 2020 · 0a443ad
1 parent 06b5f66

Showing 16 changed files with 54 additions and 61 deletions.
diff --git a/docs/frameworks.rst b/docs/frameworks.rst
diff --git a/docs/bert.md b/docs/frameworks/bert.md
@@ -43,6 +43,6 @@ The tone analyzer consists of two BERT classification models.
 The first model detects the polarity of a sentence, i.e. whether it is perceived as `positive`, `neutral` or `negative`.
 The second model detects the tone of a sentence, between `subjective` and `objective`. 
 
-The models are finetuned on manually annotated Twitter data from [Twitter Sentiment](datasets.md#twitter-sentiment) (train part) and [EuroParl sentiment 2](datasets.md#europarl-sentiment2)).
+The models are finetuned on manually annotated Twitter data from [Twitter Sentiment](../datasets.md#twitter-sentiment) (train part) and [EuroParl sentiment 2](../datasets.md#europarl-sentiment2)).
 Both datasets can be loaded with the DaNLP package.  
 
diff --git a/docs/flair.md b/docs/frameworks/flair.md
@@ -5,7 +5,7 @@ The [flair](https://github.com/flairNLP/flair) framework from Zalando is based o
 
 
 Through the DaNLP package, we provide a pre-trained Part-of-Speech tagger and Named Entity recognizer using the flair framework. 
-The models have been trained on the [Danish Dependency Treebank](datasets.md#dane) and use fastText word embeddings and [flair contextual word embeddings](models/embeddings.md#flair-embeddings) trained on data from Wikipedia and EuroParl corpus.
+The models have been trained on the [Danish Dependency Treebank](../datasets.md#dane) and use fastText word embeddings and [flair contextual word embeddings](../models/embeddings.md#flair-embeddings) trained on data from Wikipedia and EuroParl corpus.
 The code for training can be found on flair's GitHub, and the following parameters are set:
 `learning_rate=1`, `mini_batch_size=32`, `max_epochs=150`, `hidden_size=256`.
 

diff --git a/docs/spacy.md b/docs/frameworks/spacy.md
@@ -11,15 +11,15 @@ Note that the two models are not the same, e.g. the spaCy model in DaNLP perform
 
 The spaCy model comes with **tokenization**, **dependency parsing**, **part of speech tagging** , **word vectors** and **name entity recognition**. 
 
-The model is trained on the [Danish Dependency Treebank (DaNe)](datasets.md#dane), and with additional data for NER  which originates from news articles form a collaboration with InfoMedia. 
+The model is trained on the [Danish Dependency Treebank (DaNe)](../datasets.md#dane), and with additional data for NER  which originates from news articles form a collaboration with InfoMedia. 
 
-For comparison to other models and additional information of the tasks, check out the task individual pages for [word embeddings](models/embeddings.md), [named entity recognition](models/ner.md), [part of speech tagging](models/pos.md) and [dependency parsing](models/dependency.md).
+For comparison to other models and additional information of the tasks, check out the task individual pages for [word embeddings](../models/embeddings.md), [named entity recognition](../models/ner.md), [part of speech tagging](../models/pos.md) and [dependency parsing](../models/dependency.md).
 
-The DaNLP github also provides a version of the spaCy model which contains a sentiment classifier, read more about it in the [sentiment analysis docs](models/sentiment_analysis.md).
+The DaNLP github also provides a version of the spaCy model which contains a sentiment classifier, read more about it in the [sentiment analysis docs](../models/sentiment_analysis.md).
 
 ### Performance of the spaCy model
 
-The following lists the  performance scores of the spaCy model provided in DaNLP pakage on the [Danish Dependency Treebank (DaNe)](datasets.md#dane) test set. The scores and elaborating scores can be found in the file meta.json that is shipped with the model when it is downloaded. 
+The following lists the  performance scores of the spaCy model provided in DaNLP pakage on the [Danish Dependency Treebank (DaNe)](../datasets.md#dane) test set. The scores and elaborating scores can be found in the file meta.json that is shipped with the model when it is downloaded. 
 
 | Task                    | Measures | Scores |
 | ----------------------- | -------- | :----- |
@@ -66,7 +66,7 @@ for token in doc:
 
 ```
 
-![](imgs/ling_feat.PNG)
+![](../imgs/ling_feat.PNG)
 
 **Visualizing the dependency tree**
 
@@ -78,9 +78,9 @@ displacy.serve(doc, style='dep')
 
 
 
-![](imgs/dep.PNG)
+![](../imgs/dep.PNG)
 
-Here is an example of using Named entity recognitions . You can read more about [NER](models/ner.md#named-entity-recognition) in the specific doc. 
+Here is an example of using Named entity recognitions . You can read more about [NER](../models/ner.md#named-entity-recognition) in the specific doc. 
 
 ```python
 doc = nlp('Jens Peter Hansen kommer fra Danmark og arbejder hos Alexandra Instituttet') 
@@ -107,13 +107,13 @@ Instituttet ORG
 
 The spaCy framework provides an easy command line tool for training an existing model, for example by adding a text classifier.  This short example shows how to do so using your own annotated data. It is also possible to use any static embedding provided in the DaNLP wrapper. 
 
-As an example we will use a small dataset for sentiment classification on twitter. The dataset is under development and will be added in the DaNLP package when ready, and the spacy model will be updated with the classification model as well.  A first verison of  a spacy model with a sentiment classifier can be load with the danlp wrapper, read more about it in the sentiment analysis [docs](models/sentiment_analysis.md).
+As an example we will use a small dataset for sentiment classification on twitter. The dataset is under development and will be added in the DaNLP package when ready, and the spacy model will be updated with the classification model as well.  A first verison of  a spacy model with a sentiment classifier can be load with the danlp wrapper, read more about it in the sentiment analysis [docs](../models/sentiment_analysis.md).
 
  **The first thing is to convert the annotated data into a data format readable by spaCy**
 
 Imagine you have the data in an e.g csv format and have it split in development and training part. Our twitter data has (in time of creating this snippet)  973 training examples and 400 evaluation examples, with the following labels : 'positive' marked by 0, 'neutral' marked by 1, and 'negative' marked by 2. Loaded with pandas dataFrame it looks like this:  
 
-![](imgs/data_head.PNG)
+![](../imgs/data_head.PNG)
 
 It needs to be converted into the format expected by spaCy for training the model, which can be done as follows:
 
@@ -159,7 +159,7 @@ prepare_data(df_dev, 'eval_dev.json')
 
 The data now looks like this cutted snippet:
 
-![](imgs/snippet_json.PNG)
+![](../imgs/snippet_json.PNG)
 
 **Ensure you have the models and embeddings downloaded**
 

diff --git a/readthedocs/gettingstarted/contributing.md b/docs/gettingstarted/contributing.md
diff --git a/readthedocs/gettingstarted/installation.md b/docs/gettingstarted/installation.md
diff --git a/readthedocs/gettingstarted/quickstart.md b/docs/gettingstarted/quickstart.md
@@ -10,16 +10,16 @@ The DaNLP package provides you with several models for different NLP tasks using
 On this section, you will have a quick tour of the main functions of the DaNLP package. 
 For a more detailed description of the tasks and frameworks, follow the links to the documentation: 
 
-*  [Embedding of text](../docs/models/embeddings.md) with flair, spaCy or Gensim
-*  [Part of speech tagging](../docs/models/pos.md) (POS) with spaCy or flair
-*  [Named Entity Recognition](../docs/models/ner.md) (NER) with spaCy, flair or BERT
-*  [Sentiment Analysis](../docs/models/sentiment_analysis.md) with spaCy or BERT
-*  [Dependency parsing and NP-chunking](../docs/models/dependency.md) with spaCy
+*  [Embedding of text](../models/embeddings.md) with flair, spaCy or Gensim
+*  [Part of speech tagging](../models/pos.md) (POS) with spaCy or flair
+*  [Named Entity Recognition](../models/ner.md) (NER) with spaCy, flair or BERT
+*  [Sentiment Analysis](../models/sentiment_analysis.md) with spaCy or BERT
+*  [Dependency parsing and NP-chunking](../models/dependency.md) with spaCy
 
 
 ## All-in-one with the spaCy models
 
-To quickly get started with DaNLP and try out different NLP tasks, you can use the spaCy model ([see also](../docs/spacy.md)). The main advantages of the spaCy model is that it is fast and it includes most of the basic NLP tasks that you need for pre-processing texts in Danish. 
+To quickly get started with DaNLP and try out different NLP tasks, you can use the spaCy model ([see also](../frameworks/spacy.md)). The main advantages of the spaCy model is that it is fast and it includes most of the basic NLP tasks that you need for pre-processing texts in Danish. 
 
 The main functions are:  
 
@@ -28,7 +28,7 @@ The main functions are:
 
 ### Pre-processing tasks
 
-Perform [Part-of-Speech tagging](../docs/models/pos.md), [Named Entity Recognition](../docs/models/ner.md) and [dependency parsing](../docs/models/dependency.md) at the same time with the DaNLP spaCy model.
+Perform [Part-of-Speech tagging](../models/pos.md), [Named Entity Recognition](../models/ner.md) and [dependency parsing](../models/dependency.md) at the same time with the DaNLP spaCy model.
 Here is a snippet to quickly getting started: 
 
 ```python

diff --git a/docs/models/dependency.md b/docs/models/dependency.md
@@ -33,7 +33,7 @@ We provide a convertion function -- from dependencies to NP-chunks -- thus depen
 
 ## 🔧 SpaCy {#spacy}
 
-Read more about the SpaCy model in the dedicated [SpaCy docs](../spacy.md) , it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) dataset. 
+Read more about the SpaCy model in the dedicated [SpaCy docs](../frameworks/spacy.md) , it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) dataset. 
 
 ### Dependency Parser
 

diff --git a/docs/models/models.rst b/docs/models/models.rst
diff --git a/docs/models/ner.md b/docs/models/ner.md
@@ -21,7 +21,7 @@ The BERT [(Devlin et al. 2019)](https://www.aclweb.org/anthology/N19-1423/) NER
 has been finetuned on the [DaNE](../datasets.md#dane) 
 dataset [(Hvingelby et al. 2020)](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf). The finetuning has been done using the [Transformers](https://github.com/huggingface/transformers) library from HuggingFace.
 
-To use the BERT NER model it can be loaded with the `load_bert_ner_model()` method. Please notice that it can maximum take 512 tokens as input at a time. For longer text sequences split before hand, for example be using sentence boundary detection (eg. by using the [spacy model](../spacy.md ).) 
+To use the BERT NER model it can be loaded with the `load_bert_ner_model()` method. Please notice that it can maximum take 512 tokens as input at a time. For longer text sequences split before hand, for example be using sentence boundary detection (eg. by using the [spacy model](../frameworks/spacy.md ).) 
 ```python
 from danlp.models import load_bert_ner_model
 
@@ -53,7 +53,7 @@ print(sentence.to_tagged_string())
 ```
 
 #### 🔧 spaCy {#spacy}
-The [spaCy](https://spacy.io/) model is trained for several NLP tasks [(read more here)](../spacy.md) uing the [DDT and DaNE](../datasets.md#dane) annotations.
+The [spaCy](https://spacy.io/) model is trained for several NLP tasks [(read more here)](../frameworks/spacy.md) uing the [DDT and DaNE](../datasets.md#dane) annotations.
 The spaCy model can be loaded with DaNLP to do NER predictions in the following way.
 ```python
 from danlp.models import load_spacy_model

diff --git a/docs/models/pos.md b/docs/models/pos.md
@@ -47,7 +47,7 @@ print(sentence.to_tagged_string())
 
 ##### 🔧 SpaCy {#spacy}
 
-Read more about the spaCy model in the dedicated [spaCy docs](../spacy.md) , it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) data. 
+Read more about the spaCy model in the dedicated [spaCy docs](../frameworks/spacy.md) , it has also been trained using the [Danish Dependency Treebank](../datasets.md#dane) data. 
 
 Below is a small getting started snippet for using the Spacy pos tagger:
 

diff --git a/docs/models/sentiment_analysis.md b/docs/models/sentiment_analysis.md
@@ -77,11 +77,11 @@ classifier._clases()
 
 SpaCy sentiment is a text classification model trained using spacy built in command line interface. It uses the CoNLL2017 word vectors (read about it [here](embeddings.md)).
 
-The model is trained using hard distil of the [BERT Tone](#wrenchbert-tone) (beta) - Meaning,  the BERT Tone model is used to make predictions on 50.000 sentences from Twitter and 50.000 sentences from [Europarl7](http://www.statmt.org/europarl/). These data is then used to trained a spacy model. Notice the dataset has first been balanced between the classes by oversampling. The model recognizes the classses: 'positiv', 'neutral' and 'negative'.
+The model is trained using hard distil of the [BERT Tone](#bert-tone) (beta) - Meaning,  the BERT Tone model is used to make predictions on 50.000 sentences from Twitter and 50.000 sentences from [Europarl7](http://www.statmt.org/europarl/). These data is then used to trained a spacy model. Notice the dataset has first been balanced between the classes by oversampling. The model recognizes the classses: 'positiv', 'neutral' and 'negative'.
 
 It is a first version. 
 
-Read more about using the Danish spaCy model [here](../spacy.md).
+Read more about using the Danish spaCy model [here](../frameworks/spacy.md).
 
 Below is a small snippet for getting started using the spaCy sentiment model. Currently the danlp packages provide both a spaCy model which do not provide any classes in the textcat module (so it is empty for you to train from scratch), and the sentiment spacy model which have pretrained the classes 'positiv', 'neutral' and 'negative'. Notice it is possible with the spacy command line interface to continue training of the sentiment classes, or add new tags. 
 

diff --git a/readthedocs/frameworks.rst b/readthedocs/frameworks.rst
@@ -0,0 +1,11 @@
+Frameworks
+==========
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Frameworks
+
+   docs/frameworks/spacy.md
+   docs/frameworks/flair.md
+   docs/frameworks/bert.md
diff --git a/readthedocs/index.rst b/readthedocs/index.rst
@@ -14,16 +14,16 @@ If you are new to NLP or want to know more about the project in a broader perspe
    :maxdepth: 1
    :caption: Getting started
 
-   gettingstarted/installation.md
-   gettingstarted/quickstart.md
-   gettingstarted/contributing.md
+   docs/gettingstarted/installation.md
+   docs/gettingstarted/quickstart.md
+   docs/gettingstarted/contributing.md
 
 .. toctree::
    :maxdepth: 2
    :caption: Documentation
 
-   docs/models/models.rst
-   docs/frameworks.rst
+   models.rst
+   frameworks.rst
    docs/datasets.md
 
 .. toctree::

diff --git a/readthedocs/library/download.rst b/readthedocs/library/download.rst
diff --git a/readthedocs/models.rst b/readthedocs/models.rst
@@ -0,0 +1,13 @@
+Models
+======
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Models
+
+   docs/models/embeddings.md
+   docs/models/pos.md
+   docs/models/ner.md
+   docs/models/dependency.md
+   docs/models/sentiment_analysis.md
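
Most of the link edits in the framework pages follow one mechanical rule: when a file moves one directory deeper (for example `docs/spacy.md` to `docs/frameworks/spacy.md`) and its link targets stay put, each relative link gains a leading `../`. The sketch below illustrates that rule. The helper name `rebase_link` is hypothetical and not part of DaNLP or of this commit. It covers only the case where the linking file moves, not the case where the target moves (such as `../spacy.md` becoming `../frameworks/spacy.md`).

```python
import posixpath


def rebase_link(link, old_file, new_file):
    """Recompute a relative link so it still resolves after its file moves.

    Hypothetical helper for illustration; assumes '/'-separated repo paths
    and that the link target itself has not moved.
    """
    # Separate an optional '#anchor' from the path part of the link.
    path, sep, anchor = link.partition("#")
    # Leave pure in-page anchors and absolute URLs untouched.
    if not path or "://" in path:
        return link
    # Resolve the target relative to the directory of the old file location.
    target = posixpath.normpath(posixpath.join(posixpath.dirname(old_file), path))
    # Express the same target relative to the directory of the new location.
    new_rel = posixpath.relpath(target, posixpath.dirname(new_file))
    return new_rel + sep + anchor


if __name__ == "__main__":
    old, new = "docs/spacy.md", "docs/frameworks/spacy.md"
    for link in ["datasets.md#dane", "models/ner.md", "imgs/dep.PNG"]:
        print(link, "->", rebase_link(link, old, new))
```

Running such a check over the moved files before committing would be one way to catch links that were missed by hand.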