Chrome Extension

The web-scraper-helper Google Chrome extension allows you to easily create and test web scraping configurations to use with Coveo Cloud V2 Web and Sitemap source types.

The web scraping configuration developed with the extension can tell the crawler to exclude web page sections and extract metadata (see Web Scraping Configuration). The extension does not currently support testing the creation of sub-items.
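
As a rough illustration of what such a configuration looks like, here is a minimal sketch in TypeScript that builds one rule and prints it as JSON. The `ScrapingRule` and `Selector` types are hypothetical names introduced for this example, and the selectors (`header`, `//h1/text()`) and the `title` metadata name are assumptions; the field layout follows the Coveo web scraping configuration format as we understand it, so check the Web Scraping Configuration documentation before relying on it.

```typescript
// Hypothetical types for illustration; they are not part of the extension's code.
type Selector = { type: "CSS" | "XPATH"; path: string };

interface ScrapingRule {
  for: { urls: string[] };              // regex patterns of the pages the rule applies to
  exclude?: Selector[];                 // page sections the crawler should ignore
  metadata?: Record<string, Selector>;  // metadata name -> where to extract its value
}

// One rule: applies to every page, drops the header, extracts the first heading.
const config: ScrapingRule[] = [
  {
    for: { urls: [".*"] },
    exclude: [{ type: "CSS", path: "header" }],
    metadata: { title: { type: "XPATH", path: "//h1/text()" } },
  },
];

// The serialized string is what you would see in the extension's JSON editor.
const configJson: string = JSON.stringify(config, null, 2);
console.log(configJson);
```

The extension's GUI produces this kind of JSON for you; writing it by hand is only useful to understand what the JSON text editor shows.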

The extension provides both a GUI and a JSON text editor to create, save, and test your web scraping configuration on specific pages, and immediately see the results.

Installation

The extension is publicly available on the Chrome Web Store. This is the preferred way to install it.

Build

If you want to contribute, feel free to download the code, build it, modify it, and submit pull requests. Here are the build steps.

Download or clone this Git repo.
Build the Stencil application (see next section).
In Google Chrome:
1. Go to the Extensions page.
2. Select the Developer mode check box.
3. Click Load unpacked.
4. Browse to the chrome_extension folder of the repo, and then click Select.
5. Ensure the Enabled check box is selected for the web-scraper-helper extension.

Build Stencil Application

The UI shown in the Developer tools is built with Stencil. You need to build the Stencil application before loading the extension in Chrome.

In the ../panel-app folder:
1. Run npm install.
2. Run npm run build.

Usage

With Google Chrome, go to any web page for which you want to create a web scraping configuration.
Open the Chrome Developer tools (Mac: Option+Cmd+I | Windows: Ctrl+Shift+I).
In the Developer tools pane, select the new Web Scraping tab, and then:
1. Click Create a new file to be able to start the configuration.
2. Exclude a section of the page (such as the header that you typically do not want to index).
  
  The excluded section appears with a semi-transparent white overlay.
3. Click the Metadata to extract tab, and then extract a piece of the page as metadata.
  
  The extracted value appears in the Metadata name and Value table.
4. Click Save once you are happy with your web scraping configuration.
Test and fine-tune your web scraping configuration with other pages to which it applies.
Once you are happy with the web scraping configuration, copy the content of your saved file by clicking Copy to clipboard in the JSON tab.
In the Coveo Cloud V2 administration console, paste your JSON web scraping configuration into your source configuration:
- Web source (see Add/Edit Web Source - Panel)
- Sitemap source (see Add/Edit Sitemap Source - Panel)
Rebuild your source.
Validate that your web scraping configuration performed as expected on all source items.
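
Before pasting the copied JSON into a source configuration, a quick structural check can catch copy or editing mistakes. The sketch below is not part of the extension; `findConfigProblems` is a hypothetical helper, and it assumes the configuration is a JSON array in which every rule carries a non-empty `for.urls` list.

```typescript
// Hypothetical helper: returns the problems found in the JSON text (empty array when none).
function findConfigProblems(jsonText: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch (err) {
    return ["not valid JSON: " + (err as Error).message];
  }
  if (!Array.isArray(parsed)) {
    return ["top level must be an array of rules"];
  }
  const problems: string[] = [];
  parsed.forEach((rule: unknown, i: number) => {
    // Assumption: each rule names the pages it applies to under "for.urls".
    const urls = (rule as { for?: { urls?: unknown } } | null)?.for?.urls;
    if (!Array.isArray(urls) || urls.length === 0) {
      problems.push(`rule ${i}: missing "for.urls" patterns`);
    }
  });
  return problems;
}

// Sample text standing in for what Copy to clipboard would produce.
const sample = '[{"for": {"urls": [".*"]}, "exclude": [{"type": "CSS", "path": "header"}]}]';
console.log(findConfigProblems(sample));
```

An empty result only means the text is well-formed under these assumptions; rebuilding the source and checking the indexed items remains the real validation.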

Dependencies

Google Chrome or Chromium

References

https://developer.chrome.com/extensions/getstarted
https://github.com/lgcarrier/coveo-developer-insight-panel
