Upgrade pydoc-markdown #2117

Merged: 60 commits merged into master from upgrade-pydoc-markdown on Feb 4, 2022
Commits (60)
The file changes shown further below are from 2 of these commits.
fea2198
Upgrade pydoc-markdown and fix the YAMLs to work with it
ZanSara Feb 2, 2022
935ef42
Add latest docstring and tutorial changes
github-actions[bot] Feb 2, 2022
efcf56a
Pin pydoc-markdown to major version
ZanSara Feb 3, 2022
5f9bbb9
Add quotes
ZanSara Feb 3, 2022
3316ffa
Restore proper arguments rendering
ZanSara Feb 3, 2022
f922cb5
Add latest docstring and tutorial changes
github-actions[bot] Feb 3, 2022
c97148a
Reintroduce crossref too
ZanSara Feb 3, 2022
bcb39e4
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 3, 2022
8db6e46
Generalize pydoc-markdown workflow
ZanSara Feb 3, 2022
bb7247f
Fix some wrongly formatted return docstrings
ZanSara Feb 3, 2022
308d7ae
Fix some wrongly formatted parameter names
ZanSara Feb 3, 2022
cfaaf04
Add latest docstring and tutorial changes
github-actions[bot] Feb 3, 2022
687b855
Trigger a new workflow
ZanSara Feb 3, 2022
2946e09
Remane workflow
ZanSara Feb 3, 2022
d3373d2
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 3, 2022
f5dbbc0
Merge branch 'master' into upgrade-pydoc-markdown
ZanSara Feb 3, 2022
149ed30
Make a single Action to perform all tasks that require committing int…
ZanSara Feb 3, 2022
8489b00
Add one file to recreate docs/_scr/api/api
ZanSara Feb 3, 2022
302cb40
Update Documentation
github-actions[bot] Feb 3, 2022
b8d2740
Merge the code updates and the docs in the Linux CI to prevent the bo…
ZanSara Feb 3, 2022
7bfcce3
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 3, 2022
4ab8516
Installing Jupyter deps for Black
ZanSara Feb 3, 2022
1e61ea9
Add some comments to understand CI failure
ZanSara Feb 3, 2022
cb5d282
Try disabling fetch-depth
ZanSara Feb 3, 2022
7bcc925
Remove redundant trigger
ZanSara Feb 3, 2022
495825a
-> Update Documentation
github-actions[bot] Feb 3, 2022
4046e45
Build cache before running generation tasks
ZanSara Feb 4, 2022
60f96a4
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 4, 2022
52f3ed5
Merge branch 'master' into upgrade-pydoc-markdown
ZanSara Feb 4, 2022
4c8e0d5
Fix dependency among tasks
ZanSara Feb 4, 2022
85776ea
Update Documentation
github-actions[bot] Feb 4, 2022
11c7b92
Final tweaks to Linux CI
ZanSara Feb 4, 2022
d9e2121
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 4, 2022
bbc555e
Change trigger for Linux CI
ZanSara Feb 4, 2022
0dd863b
Change trigger for Linux CI
ZanSara Feb 4, 2022
3e5280b
Testing trigger
ZanSara Feb 4, 2022
2cfaa27
Change trigger for Linux CI
ZanSara Feb 4, 2022
f13c3e0
Add check not to run the code generation on master
ZanSara Feb 4, 2022
84ac0b6
Fix typo in Linux CI
ZanSara Feb 4, 2022
9ec55f7
Simplify push action
ZanSara Feb 4, 2022
c13e503
Make cache stick a bit longer and try pushing to head_ref
ZanSara Feb 4, 2022
5bdaefb
Typo in cache key
ZanSara Feb 4, 2022
adaa52e
Move commit & push together
ZanSara Feb 4, 2022
6acae9e
Simplify push command
ZanSara Feb 4, 2022
d751869
Persist credentials
ZanSara Feb 4, 2022
ef54e0e
Add more test deps in setup.cfg and remove from GH Action workflow
ZanSara Feb 4, 2022
3587d28
Trying to set the ref explicitly
ZanSara Feb 4, 2022
88021c9
Set an explicit requirement that was not properly enforced between ty…
ZanSara Feb 4, 2022
5d65f06
Add comment to setup.cfg
ZanSara Feb 4, 2022
47b282a
remove forced upgrades on pip install
ZanSara Feb 4, 2022
8961873
Remove constraint on PyYAML, probably unnecessary
ZanSara Feb 4, 2022
c4ccf7d
Re-enable persist credentials
ZanSara Feb 4, 2022
f2dc50c
Merge branch 'master' into upgrade-pydoc-markdown
ZanSara Feb 4, 2022
122bb6b
Update Documentation & Code Style
github-actions[bot] Feb 4, 2022
a279fff
Last test
ZanSara Feb 4, 2022
9adc149
Last test
ZanSara Feb 4, 2022
7fc7914
Test when bot should not trigger
ZanSara Feb 4, 2022
0010dd0
Update Documentation & Code Style
github-actions[bot] Feb 4, 2022
aba8eb5
Remove comment
ZanSara Feb 4, 2022
f6ea921
Merge branch 'upgrade-pydoc-markdown' of github.com:deepset-ai/haysta…
ZanSara Feb 4, 2022
2 changes: 1 addition & 1 deletion .github/workflows/update_docsstrings_tutorials.yml
@@ -28,7 +28,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pydoc-markdown==3.11.0
pip install pydoc-markdown
pip install mkdocs
pip install jupytercontrib
pip install watchdog==1.0.2
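The step above only installs pydoc-markdown; rendering the API pages in the diffs that follow is driven by the YAML configuration files the first commit refers to ("fix the YAMLs to work with it"). For orientation, a pydoc-markdown 4.x configuration generally has the shape sketched below. This is an illustrative assumption rather than the repository's actual config: the search path, module name, and option values are invented, and only the overall loaders/processors/renderer layout follows pydoc-markdown's documented schema.

```yaml
# Illustrative sketch of a pydoc-markdown 4.x config; the search_path, module
# name, and option values are assumptions, not the repository's real file.
loaders:
  - type: python
    search_path: [../../../../haystack/nodes/connector]
    modules: [crawler]
processors:
  - type: filter
  - type: smart
  - type: crossref   # cross-reference resolution, cf. the "Reintroduce crossref too" commit
renderer:
  type: markdown
  descriptive_class_title: false
  filename: crawler.md
```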
66 changes: 21 additions & 45 deletions docs/_src/api/api/crawler.md
@@ -1,7 +1,9 @@
<a name="crawler"></a>
<a id="crawler"></a>

# Module crawler

<a name="crawler.Crawler"></a>
<a id="crawler.Crawler"></a>

## Crawler

```python
@@ -20,31 +22,12 @@ Crawl texts from a website so that we can use them later in Haystack as a corpus
| filter_urls= ["haystack\.deepset\.ai\/overview\/"])
```

<a name="crawler.Crawler.__init__"></a>
#### \_\_init\_\_

```python
| __init__(output_dir: str, urls: Optional[List[str]] = None, crawler_depth: int = 1, filter_urls: Optional[List] = None, overwrite_existing_files=True)
```

Init object with basic params for crawling (can be overwritten later).
<a id="crawler.Crawler.crawl"></a>

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http(s) address(es) (can also be supplied later when calling crawl())
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content

<a name="crawler.Crawler.crawl"></a>
#### crawl

```python
| crawl(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None) -> List[Path]
def crawl(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None) -> List[Path]
```

Craw URL(s), extract the text from the HTML, create a Haystack Document object out of it and save it (one JSON
Expand All @@ -53,43 +36,36 @@ You can optionally specify via `filter_urls` to only crawl URLs that match a cer
All parameters are optional here and only meant to overwrite instance attributes at runtime.
If no parameters are provided to this method, the instance attributes that were passed during __init__ will be used.

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
:param output_dir: Path for the directory to store files
:param urls: List of http addresses or single http address
:param crawler_depth: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
:param filter_urls: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content
:param overwrite_existing_files: Whether to overwrite existing files in output_dir with new content

**Returns**:
:return: List of paths where the crawled webpages got stored

List of paths where the crawled webpages got stored
<a id="crawler.Crawler.run"></a>

<a name="crawler.Crawler.run"></a>
#### run

```python
| run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, return_documents: Optional[bool] = False) -> Tuple[Dict, str]
def run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, return_documents: Optional[bool] = False) -> Tuple[Dict, str]
```

Method to be executed when the Crawler is used as a Node within a Haystack pipeline.

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
:param output_dir: Path for the directory to store files
:param urls: List of http addresses or single http address
:param crawler_depth: How many sublinks to follow from the initial list of URLs. Current options:
0: Only initial list of urls
1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
:param filter_urls: Optional list of regular expressions that the crawled URLs must comply with.
All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content
- `return_documents`: Return json files content

**Returns**:
:param overwrite_existing_files: Whether to overwrite existing files in output_dir with new content
:param return_documents: Return json files content

Tuple({"paths": List of filepaths, ...}, Name of output edge)
:return: Tuple({"paths": List of filepaths, ...}, Name of output edge)
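
Taken together, the crawl() and run() signatures above can be exercised as in the following minimal sketch. The import path `from haystack.nodes import Crawler` and the example URL are assumptions for Haystack 1.x and are not taken from this diff; the parameter names mirror the docstrings above.

```python
# Minimal usage sketch; the import path and the URL are assumptions for Haystack 1.x.
from haystack.nodes import Crawler

crawler = Crawler(output_dir="crawled_files")  # one JSON file per crawled page
paths = crawler.crawl(
    urls=["https://haystack.deepset.ai/overview/get-started"],  # illustrative URL
    crawler_depth=1,  # follow links found on the initial URLs, but no further
    filter_urls=[r"haystack\.deepset\.ai\/overview\/"],
)
print(paths)  # list of Path objects pointing at the stored JSON files
```

When the node runs inside a pipeline, run() accepts the same parameters and returns the Tuple({"paths": List of filepaths, ...}, Name of output edge) described above.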

65 changes: 15 additions & 50 deletions docs/_src/api/api/document_classifier.md
@@ -1,26 +1,31 @@
<a name="base"></a>
<a id="base"></a>

# Module base

<a name="base.BaseDocumentClassifier"></a>
<a id="base.BaseDocumentClassifier"></a>

## BaseDocumentClassifier

```python
class BaseDocumentClassifier(BaseComponent)
```

<a name="base.BaseDocumentClassifier.timing"></a>
<a id="base.BaseDocumentClassifier.timing"></a>

#### timing

```python
| timing(fn, attr_name)
def timing(fn, attr_name)
```

Wrapper method used to time functions.

<a name="transformers"></a>
<a id="transformers"></a>

# Module transformers

<a name="transformers.TransformersDocumentClassifier"></a>
<a id="transformers.TransformersDocumentClassifier"></a>

## TransformersDocumentClassifier

```python
@@ -74,57 +79,17 @@ With this document_classifier, you can directly get predictions via predict()
| p.run(file_paths=file_paths)
```

<a name="transformers.TransformersDocumentClassifier.__init__"></a>
#### \_\_init\_\_
<a id="transformers.TransformersDocumentClassifier.predict"></a>

```python
| __init__(model_name_or_path: str = "bhadresh-savani/distilbert-base-uncased-emotion", model_version: Optional[str] = None, tokenizer: Optional[str] = None, use_gpu: bool = True, return_all_scores: bool = False, task: str = 'text-classification', labels: Optional[List[str]] = None, batch_size: int = -1, classification_field: str = None)
```

Load a text classification model from Transformers.
Available models for the task of text-classification include:
- ``'bhadresh-savani/distilbert-base-uncased-emotion'``
- ``'Hate-speech-CNERG/dehatebert-mono-english'``

Available models for the task of zero-shot-classification include:
- ``'valhalla/distilbart-mnli-12-3'``
- ``'cross-encoder/nli-distilroberta-base'``

See https://huggingface.co/models for full list of available models.
Filter for text classification models: https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
Filter for zero-shot classification models (NLI): https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads&search=nli

**Arguments**:

- `model_name_or_path`: Directory of a saved model or the name of a public model e.g. 'bhadresh-savani/distilbert-base-uncased-emotion'.
See https://huggingface.co/models for full list of available models.
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
- `tokenizer`: Name of the tokenizer (usually the same as model)
- `use_gpu`: Whether to use GPU (if available).
- `return_all_scores`: Whether to return all prediction scores or just the one of the predicted class. Only used for task 'text-classification'.
- `task`: 'text-classification' or 'zero-shot-classification'
- `labels`: Only used for task 'zero-shot-classification'. List of string defining class labels, e.g.,
["positive", "negative"] otherwise None. Given a LABEL, the sequence fed to the model is "<cls> sequence to
classify <sep> This example is LABEL . <sep>" and the model predicts whether that sequence is a contradiction
or an entailment.
- `batch_size`: batch size to be processed at once
- `classification_field`: Name of Document's meta field to be used for classification. If left unset, Document.content is used by default.

<a name="transformers.TransformersDocumentClassifier.predict"></a>
#### predict

```python
| predict(documents: List[Document]) -> List[Document]
def predict(documents: List[Document]) -> List[Document]
```

Returns documents containing classification result in meta field.
Documents are updated in place.

**Arguments**:

- `documents`: List of Document to classify

**Returns**:

List of Document enriched with meta information
:param documents: List of Document to classify
:return: List of Document enriched with meta information
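
To make the predict() contract above concrete, here is a minimal sketch. The import paths (`haystack.Document`, `haystack.nodes.TransformersDocumentClassifier`) are assumptions for Haystack 1.x, and the example text is invented; the default model name and the in-place update behavior come from the docstrings above.

```python
# Minimal sketch, assuming Haystack 1.x import paths; example content is invented.
from haystack import Document
from haystack.nodes import TransformersDocumentClassifier

classifier = TransformersDocumentClassifier(
    model_name_or_path="bhadresh-savani/distilbert-base-uncased-emotion",
    use_gpu=False,
)
docs = [Document(content="The new documentation workflow is a joy to use.")]
docs = classifier.predict(documents=docs)  # documents are updated in place
print(docs[0].meta)  # the classification result is stored in the document's meta field
```

Because predict() updates the documents in place, iterating over the returned list or the original input list is equivalent.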