* Add api pages
* Add latest docstring and tutorial changes
* First sweep of usage docs
* Add link to conversion script
* Add import statements
* Add summarization page
* Add web crawler documentation
* Add confidence scores usage
* Add crawler api docs
* Regenerate api docs
* Update summarizer and translator api
* Add indentation (pydoc-markdown 3.10.1)
* Comment out metadata
* Remove Finder deprecation message
* Remove Finder in FAQ
* Update tutorial link
* Incorporate reviewer feedback
* Regen api docs
* Add type annotations

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent b1e8ebf · commit 9626c0d · 33 changed files with 924 additions and 151 deletions.
<a name="crawler"></a>
# Module crawler

<a name="crawler.Crawler"></a>
## Crawler Objects

```python
class Crawler(BaseComponent)
```

Crawl texts from a website so that we can use them later in Haystack as a corpus for search, question answering, etc.

**Example:**
```python
from haystack.connector import Crawler

crawler = Crawler()
# crawl Haystack docs, i.e. all pages that include haystack.deepset.ai/docs/
docs = crawler.crawl(urls=["https://haystack.deepset.ai/docs/latest/get_startedmd"],
                     output_dir="crawled_files",
                     filter_urls=["haystack\.deepset\.ai\/docs\/"])
```

<a name="crawler.Crawler.__init__"></a>
#### \_\_init\_\_

```python
__init__(output_dir: str, urls: Optional[List[str]] = None, crawler_depth: int = 1, filter_urls: Optional[List] = None, overwrite_existing_files=True)
```

Init object with basic params for crawling (can be overwritten later).

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http(s) address(es) (can also be supplied later when calling crawl())
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
  0: Only the initial list of URLs
  1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
  All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content

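The `filter_urls` behaviour described above, keeping a URL only if it matches at least one of the regular expressions, can be sketched in plain Python. The helper name `keep_url` is illustrative only, and whether Haystack anchors the match at the start of the URL is an implementation detail; `re.search` is assumed here:

```python
import re
from typing import List, Optional

def keep_url(url: str, filter_urls: Optional[List[str]]) -> bool:
    """Return True if the URL matches at least one filter pattern.

    With no filters configured, every URL is kept.
    """
    if not filter_urls:
        return True
    return any(re.search(pattern, url) for pattern in filter_urls)

# Mirrors the class example: keep only pages under haystack.deepset.ai/docs/
filters = [r"haystack\.deepset\.ai\/docs\/"]
print(keep_url("https://haystack.deepset.ai/docs/latest/get_startedmd", filters))  # True
print(keep_url("https://example.com/blog/post", filters))                          # False
```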
<a name="crawler.Crawler.crawl"></a>
#### crawl

```python
crawl(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None) -> List[Path]
```

Crawl URL(s), extract the text from the HTML, create a Haystack Document object out of it and save it (one JSON
file per URL, including text and basic metadata).
You can optionally use `filter_urls` to crawl only URLs that match a certain pattern.
All parameters are optional here and only meant to overwrite instance attributes at runtime.
If no parameters are provided to this method, the instance attributes that were passed during `__init__` will be used.

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
  0: Only the initial list of URLs
  1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
  All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content

**Returns**:

List of paths where the crawled webpages got stored

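Since crawl() stores one JSON file per URL, the returned paths can be read straight back into dicts. The exact JSON keys ("text", "meta") below are an assumption for illustration, not a documented schema; a stand-in file is created so the snippet is self-contained:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Stand-in for one of the files crawl() writes: the extracted page text
# plus basic metadata. Key names are assumed, not guaranteed by the API.
with TemporaryDirectory() as tmp:
    path = Path(tmp) / "page_0.json"
    path.write_text(json.dumps({"text": "Haystack is an open-source framework.",
                                "meta": {"url": "https://haystack.deepset.ai/docs/"}}))

    # Load every crawled file into a plain dict, ready for indexing
    docs = [json.loads(p.read_text()) for p in [path]]
    print(docs[0]["text"])
```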
<a name="crawler.Crawler.run"></a>
#### run

```python
run(output_dir: Union[str, Path, None] = None, urls: Optional[List[str]] = None, crawler_depth: Optional[int] = None, filter_urls: Optional[List] = None, overwrite_existing_files: Optional[bool] = None, **kwargs) -> Tuple[Dict, str]
```

Method to be executed when the Crawler is used as a Node within a Haystack pipeline.

**Arguments**:

- `output_dir`: Path for the directory to store files
- `urls`: List of http addresses or single http address
- `crawler_depth`: How many sublinks to follow from the initial list of URLs. Current options:
  0: Only the initial list of URLs
  1: Follow links found on the initial URLs (but no further)
- `filter_urls`: Optional list of regular expressions that the crawled URLs must comply with.
  All URLs not matching at least one of the regular expressions will be dropped.
- `overwrite_existing_files`: Whether to overwrite existing files in output_dir with new content

**Returns**:

Tuple({"paths": List of filepaths, ...}, Name of output edge)

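When called programmatically, run()'s documented return shape, a (dict, edge-name) tuple, is unpacked as below. The values are stand-ins (running the real crawler needs a live Haystack install), and the edge name "output_1" is an assumption for illustration:

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Stand-in for Crawler.run()'s documented return value:
# ({"paths": List of filepaths, ...}, Name of output edge)
result: Tuple[Dict, str] = ({"paths": [Path("crawled_files/page_0.json")]}, "output_1")

output, edge = result
print(edge)                # the edge name routes data within a pipeline
print(output["paths"][0])  # filepaths of the stored webpages
```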
<a name="eval"></a>
# Module eval

<a name="eval.EvalRetriever"></a>
## EvalRetriever Objects

```python
class EvalRetriever()
```

This is a pipeline node that should be placed after a Retriever in order to assess its performance. Performance
metrics are stored in this class and updated as each sample passes through it. To view the results of the evaluation,
call EvalRetriever.print(). Note that results from this Node may differ from those of calling Retriever.eval(),
since the latter is a closed-domain evaluation. Have a look at our evaluation tutorial for more info about
open vs. closed-domain eval (https://haystack.deepset.ai/docs/latest/tutorial5md).

<a name="eval.EvalRetriever.__init__"></a>
#### \_\_init\_\_

```python
__init__(debug: bool = False, open_domain: bool = True)
```

**Arguments**:

- `open_domain`: When True, a document is considered correctly retrieved so long as the answer string can be found within it.
  When False, correct retrieval is evaluated based on document_id.
- `debug`: When True, a record of each sample and its evaluation will be stored in EvalRetriever.log

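The open- vs. closed-domain distinction above boils down to two different correctness checks, sketched here in plain Python. The function is illustrative, not the node's actual implementation:

```python
from typing import List

def is_correctly_retrieved(doc_text: str, doc_id: str,
                           answer: str, label_doc_ids: List[str],
                           open_domain: bool = True) -> bool:
    """Open domain: the answer string just has to appear in the document.
    Closed domain: the document id must match a labeled document.
    """
    if open_domain:
        return answer in doc_text
    return doc_id in label_doc_ids

# Open domain: correct because the answer string occurs in the text,
# even though the retrieved document is not the labeled one
print(is_correctly_retrieved("Paris is the capital of France.", "doc_42",
                             "Paris", ["doc_7"], open_domain=True))   # True
# Closed domain: only the labeled document counts
print(is_correctly_retrieved("Paris is the capital of France.", "doc_42",
                             "Paris", ["doc_7"], open_domain=False))  # False
```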
<a name="eval.EvalRetriever.run"></a>
#### run

```python
run(documents, labels: dict, **kwargs)
```

Run this node on one sample and its labels.

<a name="eval.EvalRetriever.print"></a>
#### print

```python
print()
```

Print the evaluation results.

<a name="eval.EvalReader"></a>
## EvalReader Objects

```python
class EvalReader()
```

This is a pipeline node that should be placed after a Reader in order to assess the performance of the Reader
individually or to assess the extractive QA performance of the whole pipeline. Performance metrics are stored in
this class and updated as each sample passes through it. To view the results of the evaluation, call EvalReader.print().
Note that results from this Node may differ from those of calling Reader.eval(),
since the latter is a closed-domain evaluation. Have a look at our evaluation tutorial for more info about
open vs. closed-domain eval (https://haystack.deepset.ai/docs/latest/tutorial5md).

<a name="eval.EvalReader.__init__"></a>
#### \_\_init\_\_

```python
__init__(skip_incorrect_retrieval: bool = True, open_domain: bool = True, debug: bool = False)
```

**Arguments**:

- `skip_incorrect_retrieval`: When set to True, this eval will ignore the cases where the retriever returned no correct documents
- `open_domain`: When True, extracted answers are evaluated purely on string similarity rather than the position of the extracted answer
- `debug`: When True, a record of each sample and its evaluation will be stored in EvalReader.log

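In open-domain mode, comparing answers "purely on string similarity" typically means a normalized string match, as sketched below. This is a common QA-evaluation convention, not necessarily the exact metric this node computes:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace,
    the usual normalization applied before comparing QA answer strings."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def open_domain_match(predicted: str, gold: str) -> bool:
    """Open-domain check: compare answer strings, ignore answer positions."""
    return normalize(predicted) == normalize(gold)

print(open_domain_match("The Eiffel Tower", "eiffel tower"))  # True
print(open_domain_match("Louvre", "Eiffel Tower"))            # False
```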
<a name="eval.EvalReader.run"></a>
#### run

```python
run(labels, answers, **kwargs)
```

Run this node on one sample and its labels.

<a name="eval.EvalReader.print"></a>
#### print

```python
print(mode)
```

Print the evaluation results.

<a name="base"></a>
# Module base

<a name="text_to_sparql"></a>
# Module text\_to\_sparql

<a name="text_to_sparql.Text2SparqlRetriever"></a>
## Text2SparqlRetriever Objects

```python
class Text2SparqlRetriever(BaseGraphRetriever)
```

Graph retriever that uses a pre-trained Bart model to translate natural language questions given in text form to queries in SPARQL format.
The generated SPARQL query is executed on a knowledge graph.

<a name="text_to_sparql.Text2SparqlRetriever.format_result"></a>
#### format\_result

```python
format_result(result)
```

Generate formatted dictionary output with text answer and additional info.

<a name="base"></a>
# Module base

<a name="graphdb"></a>
# Module graphdb

<a name="graphdb.GraphDBKnowledgeGraph"></a>
## GraphDBKnowledgeGraph Objects

```python
class GraphDBKnowledgeGraph(BaseKnowledgeGraph)
```

Knowledge graph store that runs on a GraphDB instance.

loaders:
  - type: python
    search_path: [../../../../haystack/connector]
    modules: ['crawler']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: crawler.md
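The filter expression in these configs keeps only public, documented members. Its effect can be sketched without pydoc-markdown; here `default()` is approximated as "has a docstring", matching the `documented_only: true` setting:

```python
def keep_member(name: str, has_docstring: bool) -> bool:
    # not name.startswith('_')  -> drop private and dunder members
    # default() approximated as "documented", per documented_only: true
    return not name.startswith("_") and has_docstring

print(keep_member("crawl", True))      # True: public and documented
print(keep_member("_helper", True))    # False: private name
print(keep_member("crawl", False))     # False: undocumented
```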
loaders:
  - type: python
    search_path: [../../../../haystack]
    modules: ['eval']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: evaluation.md
loaders:
  - type: python
    search_path: [../../../../haystack/graph_retriever]
    modules: ['base', 'text_to_sparql']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: graph_retriever.md
loaders:
  - type: python
    search_path: [../../../../haystack/knowledge_graph]
    modules: ['base', 'graphdb']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: knowledge_graph.md
loaders:
  - type: python
    search_path: [../../../../haystack/summarizer]
    modules: ['base', 'transformers']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: summarizer.md
loaders:
  - type: python
    search_path: [../../../../haystack/translator]
    modules: ['base', 'transformers']
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    expression: not name.startswith('_') and default()
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: true
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: translator.md