Config - Parsing

One way of viewing parsing config is as a set of modules, each extracting text. The extracted text can then either be converted into new requests or passed on as parsed items. Right now, I can think of a few:
For example, extract a list of text, and convert all of them into lists.
However, what if we want to extract a list of items (objects)? An example is a list of products (search results).
One way to model it is to use nested extraction rules.
For example, use a CSS selector to select all <li> elements, then use further CSS selectors on each element to query for the title, url, and description, resulting in a list of objects.
It should also be possible to combine multiple selectors and merge their results into one list of items. For example, what if the search results are split into two groups that require two different selectors? Or what if a selector returns nothing on certain page states? Merging allows for more parsing flexibility.
And what if we want to select different types of items that are present on each page? Then we would need multiple different sets of extraction rules, one for each type, and tag each parsed item with the corresponding type.
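The nested extraction described above (select fragments, query attributes within each fragment, merge results from several selectors, tag each item with its type) could be sketched as follows. This is only an illustration under stated assumptions: the `NestedExtraction` module, the rule keys (`fragment_selectors`, `attribute_selectors`), and all selectors are hypothetical, and `select` is an injected function standing in for whatever HTML query library ends up being used. The demo document is a plain map, not real HTML.

```elixir
defmodule NestedExtraction do
  # `select` is a hypothetical 2-arity function: (node, selector) -> list of matches.
  # It is injected so this sketch does not depend on any particular HTML parser.
  def extract_items(doc, rule, select) do
    rule.fragment_selectors
    # Merge the fragments found by every selector; a selector that
    # matches nothing simply contributes an empty list.
    |> Enum.flat_map(fn selector -> select.(doc, selector) end)
    |> Enum.map(fn fragment ->
      # Query each attribute inside the fragment (one level of nesting).
      attrs =
        Map.new(rule.attribute_selectors, fn {key, selector} ->
          {key, fragment |> select.(selector) |> List.first()}
        end)

      # Tag the parsed item with its type.
      Map.put(attrs, :item_type, rule.item_type)
    end)
  end
end

# Stub "document": a map from selector to matches (assumption, for demo only).
select = fn node, selector -> Map.get(node, selector, []) end

doc = %{
  "ul.results > li" => [%{".title" => ["Blue mug"], "a" => ["/p/1"]}],
  "ul.more-results > li" => []
}

rule = %{
  item_type: "product",
  fragment_selectors: ["ul.results > li", "ul.more-results > li"],
  attribute_selectors: %{title: ".title", url: "a"}
}

items = NestedExtraction.extract_items(doc, rule, select)

# The second selector is empty on this page state, so one item is returned.
[%{title: "Blue mug", url: "/p/1", item_type: "product"}] = items
```

Supporting several item types on one page would then amount to running `extract_items/3` once per rule and concatenating the results.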
%Extractor{} that defines the extraction method
Item extraction - a list of fragment extractors, each with a nested list of attribute extractors; each attribute has an extractor and an attr key. Limit nesting to 1 level for now: list extractors -> attribute extractors. Tag each item with an item_type.
Request extraction - a list of extractors, where the extracted text is converted into URLs.
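As a rough illustration of the three pieces listed above, a config could be shaped like the sketch below. Only `%Extractor{}`, the attr key, and `item_type` come from the notes; the struct fields (`method`, `selector`, `attribute`), the `ItemExtractor` module, the `:items` and `:requests` keys, and every selector are assumptions made for the example.

```elixir
defmodule Extractor do
  # Defines the extraction method and its query.
  # Hypothetical fields: `attribute` nil means "take the node text".
  defstruct method: :css, selector: nil, attribute: nil
end

defmodule ItemExtractor do
  # One level of nesting: fragment extractors -> attribute extractors.
  # `attributes` is a keyword list of attr key => extractor.
  defstruct item_type: nil, fragments: [], attributes: []
end

config = %{
  # Item extraction: each entry yields items tagged with its item_type.
  items: [
    %ItemExtractor{
      item_type: "product",
      fragments: [
        %Extractor{selector: "ul.results > li"},
        %Extractor{selector: "ul.more-results > li"}
      ],
      attributes: [
        title: %Extractor{selector: ".title"},
        url: %Extractor{selector: "a", attribute: "href"},
        description: %Extractor{selector: ".description"}
      ]
    }
  ],
  # Request extraction: the extracted text is converted into URLs.
  requests: [
    %Extractor{selector: "a.next", attribute: "href"}
  ]
}
```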
Ziinc changed the title from "parse the response using a variable configuration." to "app/requestor: parse a response using a variable configuration." on May 15, 2022