Added scraper node #1

Merged
merged 3 commits into from Jun 30, 2018

@ankitjain28may
Contributor

ankitjain28may commented Jun 26, 2018

No description provided.

Collaborator

shibasisp commented Jun 27, 2018

Capture the request object generated by Drupal and post it here.

Contributor

ankitjain28may commented Jun 27, 2018

Request Header -

Host: localhost
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://localhost/drupal/batch?id=533&op=start
X-Requested-With: XMLHttpRequest
Cookie: cookie value (I have removed)
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Content-Length: 0

Contributor

ankitjain28may commented Jun 27, 2018

Request body -

{ url:
   'https://www.amazon.in/s/ref=lp_14019572031_pg_2?rh=n%3A976389031%2Cn%3A%211318447031%2Cn%3A%211318449031%2Cn%3A14019572031&page=1&ie=UTF8&qid=1530099349',
  options:
   { context: 'li.s-result-item.celwidget',
     left_html: '',
     feeds_crawler: 1,
     no_of_pages: '2',
     delay: '0',
     url_pattern:
      'https://www.amazon.in/s/ref=lp_14019572031_pg_2?rh=n%3A976389031%2Cn%3A%211318447031%2Cn%3A%211318449031%2Cn%3A14019572031&page=$index&ie=UTF8&qid=1530099349',
     initial_value: '1',
     increment: '1',
     inner_feeds_scraper: 1,
     link_selector:
      'div.a-fixed-left-grid-col.a-col-right > div > div > a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal',
     base_url: '',
     inner_page_selector: '#dp-container',
     break_in_parts: 1,
     no_of_parts: '5' } }
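
This thread does not spell out how the crawler options are interpreted. One plausible reading, and it is only an assumption, is that `$index` in `url_pattern` is replaced by a counter that starts at `initial_value`, steps by `increment`, and runs for `no_of_pages` pages. A minimal Python sketch of that reading (the helper name `expand_url_pattern` and the example.com URL are hypothetical, not part of the repo):

```python
def expand_url_pattern(url_pattern, initial_value, increment, no_of_pages):
    """Return the page URLs implied by the crawler options.

    Assumption: "$index" is a plain placeholder replaced by
    initial_value, initial_value + increment, ... for no_of_pages pages.
    The options arrive as strings in the request body, so convert them first.
    """
    start = int(initial_value)
    step = int(increment)
    count = int(no_of_pages)
    return [
        url_pattern.replace("$index", str(start + i * step))
        for i in range(count)
    ]


# Placeholder pattern; the real request uses the Amazon URL shown above.
pages = expand_url_pattern(
    "https://example.com/s?page=$index&ie=UTF8",
    initial_value="1",
    increment="1",
    no_of_pages="2",
)
for page in pages:
    print(page)
```

Under this reading, the request body above would cover the pages with `page=1` and `page=2`.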

Contributor

ankitjain28may commented Jun 27, 2018

left_html will hold the remaining HTML that is still left to process.

Collaborator

shibasisp commented Jun 30, 2018

LGTM Thanks! 👍

@shibasisp shibasisp merged commit f1fe6e3 into dbjpanda:8.x-1.x Jun 30, 2018

Owner

dbjpanda commented Jun 30, 2018

I tried to send a POST request using Postman with the data below as raw (JSON), but I didn't get any output.
Am I missing something?

{
    "url": "https://www.amazon.in/s/ref=lp_14019572031_pg_2?rh=n%3A976389031%2Cn%3A%211318447031%2Cn%3A%211318449031%2Cn%3A14019572031&page=1&ie=UTF8&qid=1530099349",
    "options":{
            "context": "li.s-result-item.celwidget",
            "left_html": "",
            "feeds_crawler": 1,
            "no_of_pages": "2",
            "delay": "0",
            "url_pattern": "https://www.amazon.in/s/ref=lp_14019572031_pg_2?rh=n%3A976389031%2Cn%3A%211318447031%2Cn%3A%211318449031%2Cn%3A14019572031&page=$index&ie=UTF8&qid=1530099349",
            "initial_value": "1",
            "increment": "1",
            "inner_feeds_scraper": 1,
            "link_selector": "div.a-fixed-left-grid-col.a-col-right > div > div > a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal",
            "base_url": "",
            "inner_page_selector": "#dp-container",
            "break_in_parts": 1,
            "no_of_parts": "5"
    }
}

@ankitjain28may Have you set up any authentication for the API?

Contributor

ankitjain28may commented Jun 30, 2018

@dbjpanda Which route did you send the request to?

Owner

dbjpanda commented Jul 1, 2018

Just to update the issue in case anyone wants to use the repo.
I was posting to localhost:9000 instead of a proper URL/route.

There are two methods available so far: get-static, to scrape content from static sites, and get-dynamic, to get data from dynamic sites.

You need to POST to something like localhost:9000/get-static with a proper request object.
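
For anyone who prefers a script over Postman, here is a minimal Python sketch of that request. It assumes the service listens on localhost:9000 as mentioned above and accepts a JSON body with a `Content-Type: application/json` header (the thread only says the data was sent as raw JSON). The helper names `build_scrape_request` and `send` are hypothetical, and the payload is a shortened placeholder rather than the full options object.

```python
import json
import urllib.request


def build_scrape_request(base_url, route, payload):
    """Build (but do not send) a JSON POST request for the scraper service."""
    url = base_url.rstrip("/") + "/" + route.lstrip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the service parses JSON request bodies.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Shortened placeholder payload; use the full request object from this thread.
payload = {
    "url": "https://example.com/list?page=1",
    "options": {"context": "li.s-result-item.celwidget", "left_html": ""},
}

# Post to the route, not to the bare host: localhost:9000/get-static.
request = build_scrape_request("http://localhost:9000", "get-static", payload)
print(request.get_method(), request.full_url)

# Uncomment once the scraper service is running locally:
# print(send(request))
```

Swapping `"get-static"` for `"get-dynamic"` targets the other route.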
