Skip to content

Commit

Permalink
Updated for Craft 3.
Browse files Browse the repository at this point in the history
  • Loading branch information
michaelrog committed Jun 1, 2019
1 parent a55353f commit d05b7fa
Show file tree
Hide file tree
Showing 64 changed files with 2,194 additions and 15,169 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,7 @@
# Scraper Changelog

## 3.0.0 - 2019-06-01

### Added

- 3.x Beta release!
17 changes: 17 additions & 0 deletions LICENSE.md
@@ -0,0 +1,17 @@
Copyright © Michael Rog

Permission is hereby granted to any person obtaining a copy of this software (the “Software”) to use, copy, modify, merge, publish and/or distribute copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

1. **Don’t plagiarize.** The above copyright notice and this license shall be included in all copies or substantial portions of the Software.

2. **Don’t use the same license on more than one project.** Each licensed copy of the Software shall be actively installed in no more than one production environment at a time.

3. **Don’t mess with the licensing features.** Software features related to licensing shall not be altered or circumvented in any way, including (but not limited to) license validation, payment prompts, feature restrictions, and update eligibility.

4. **Pay up.** Payment shall be made immediately upon receipt of any notice, prompt, reminder, or other message indicating that a payment is owed.

5. **Follow the law.** All use of the Software shall not violate any applicable law or regulation, nor infringe the rights of any other person or entity.

Failure to comply with the foregoing conditions will automatically and immediately result in termination of the permission granted hereby. This license does not include any right to receive updates to the Software or technical support. Licensees bear all risk related to the quality and performance of the Software and any modifications made or obtained to it, including liability for actual and consequential harm, such as loss or corruption of data, and any necessary service, repair, or correction.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, INCLUDING SPECIAL, INCIDENTAL AND CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
80 changes: 57 additions & 23 deletions README.md
@@ -1,55 +1,89 @@
# Scraper

_Easily fetch, slice, dice, and output HTML content from remote pages._
_Easily fetch, slice, dice, and output HTML (or XML) content from anywhere._

**Lovingly crafted by [Top Shelf Craft](https://topshelfcraft.com)**
**A [Top Shelf Craft](https://topshelfcraft.com) creation**
[Michael Rog](https://michaelrog.com), Proprietor


* * *


## tl;dr
## Installation

1. From your project directory, use Composer to require the plugin package:

```
composer require topshelfcraft/scraper
```

2. In the Control Panel, go to Settings → Plugins and click the “Install” button for Scraper.

**Scraper** allows you to easily fetch HTML content from any URL, create a DOM with it, select elements by CSS selector, find and manipulate DOM nodes, and save or output them using the power of Twig templates.
3. There is no Step 3.

_Scraper is also available for installation via the Craft CMS Plugin Store._

## Usage

Use **Scraper** to query content from remote URLs, select it by HTML and CSS selector, and output it in your Craft templates.
The Scraper plugin exposes a full-featured crawler object to your Twig template, allowing you to fetch, parse, and filter DOM elements from a remote source document.

For example:
### Instantiating a client

{% set acmeContent = craft.scraper.get("http://acmewidgets.com") %}
{% for widgets in acmeContent.find(".widget") %}
<div>{{ widget.innerText }}</div>
{% endfor %}
When invoking the plugin, you can choose whether to use SimpleHtmlDom or Symfony components to instantiate your crawler:

or...
```twig
{% set crawler = craft.scraper.using('symfony').get('https://zombo.com') %}
```
```twig
{% set crawler = craft.scraper.using('simplehtmldom').get('https://zombo.com') %}
```

{% set google = craft.scraper.get("http://google.com") %}
{% for link in google.find("a") %}
<li>{{ link.attr.href }}</li>
{% endfor% }
I generally recommend using the Symfony components; they are more powerful and resilient to malformed source code. (The SimpleHtmlDom crawler is included to provide backwards compatibility with Craft 2 projects.)

### Using the Symfony client

### What are the system requirements?
When you opt for Symfony components, the `get` method instantiates a full [BrowserKit](https://symfony.com/components/BrowserKit) client, giving you access to all the [BrowserKit](https://symfony.com/components/BrowserKit) and [DomCrawler](https://symfony.com/doc/current/components/dom_crawler.html) methods.

Craft 2.5+ and PHP 5.4+
You can iterate over the DOM elements from your source document like this:

```twig
{% for node in crawler.filter('h2 > a') %}
{{ node.text() }}
{% endfor %}
```

### I found a bug.
### Using the SimpleHtmlDom client

When you opt for the SimpleHtmlDom crawler, the `get` method instantiates a [SimpleHtmlDom](https://simplehtmldom.sourceforge.io/) client, giving you access to all the [SimpleHtmlDom methods](https://simplehtmldom.sourceforge.io/manual.htm).

You can iterate over the DOM elements from your source document like this:

Nah...
```twig
{% for node in crawler.find('h1') %}
{{ node.innertext() }}
{% endfor %}
```

### This is great! I still have questions.

### I triple-checked. It's a bug.
Ask a question on [StackExchange](http://craftcms.stackexchange.com/), and ping me with a URL via email or Discord.

Well, alright. Please open a GitHub Issue, submit a PR to the `dev` branch, or just email me to let me know.

### What are the system requirements?

Craft 3.0+ and PHP 7.0+


### I found a bug.

Please open a GitHub Issue, submit a PR to the `3.x.dev` branch, or just email me.


* * *

#### Contributors:

- Plugin development: [Michael Rog](http://michaelrog.com) / @michaelrog
- [Simple HTML DOM](http://simplehtmldom.sourceforge.net/): created by S. C. Chen
- Plugin development: [Michael Rog](http://michaelrog.com) / @michaelrog
- Includes the ["Simple HTML DOM"](http://simplehtmldom.sourceforge.net/) library, created by S. C. Chen
- Includes the Symfony [DomCrawler](https://symfony.com/doc/current/components/dom_crawler.html) via [Goutte](https://github.com/FriendsOfPHP/Goutte), created by S. C. Chen

57 changes: 57 additions & 0 deletions composer.json
@@ -0,0 +1,57 @@
{
"name": "topshelfcraft/scraper",
"type": "craft-plugin",
"description": "Easily fetch, parse, and rejigger HTML or XML from anywhere.",
"version": "3.0.0-beta.1",
"keywords": [
"craft",
"cms",
"craftcms",
"plugin",
"scraper",
"simplehtmldom",
"dom",
"fetch",
"html",
"remote",
"external",
"parse"
],
"license": "proprietary",
"homepage": "https://topshelfcraft.com",
"authors": [
{
"name": "Top Shelf Craft (Michael Rog)",
"homepage": "https://topshelfcraft.com"
}
],
"support": {
"email": "support@topshelfcraft.com",
"issues": "https://github.com/TopShelfCraft/Scraper/issues",
"source": "https://github.com/TopShelfCraft/Scraper",
"docs": "https://github.com/TopShelfCraft/Scraper"
},
"require": {
"php": ">=7",
"craftcms/cms": "^3.0",
"topshelfcraft/ranger": "^3.0",
"fabpot/goutte": "^3.2"
},
"autoload": {
"psr-4": {
"topshelfcraft\\scraper\\": "src/"
}
},
"extra": {
"name": "Scraper",
"handle": "scraper",
"schemaVersion": "0.0.0.0",
"hasSettings": false,
"hasCpSection": false,
"changelogUrl": "https://raw.githubusercontent.com/topshelfcraft/scraper/3.x/CHANGELOG.md",
"class": "topshelfcraft\\scraper\\Scraper",
"components": {
"scraper": "topshelfcraft\\scraper\\services\\Scraper"
}
}
}
146 changes: 0 additions & 146 deletions scraper/ScraperPlugin.php

This file was deleted.

27 changes: 0 additions & 27 deletions scraper/services/Scraper_BetaService.php

This file was deleted.

0 comments on commit d05b7fa

Please sign in to comment.