proxy #143

Closed
Arnold1 opened this Issue Dec 31, 2016 · 13 comments

Arnold1 commented Dec 31, 2016

Hi,
How can I use a SOCKS5 proxy with your library? For example:

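  // Imports needed: log, net/http, golang.org/x/net/proxy

  // Create a SOCKS5 dialer
  dialer, err := proxy.SOCKS5("tcp", "127.0.0.1:9050", nil, proxy.Direct)
  if err != nil {
    log.Fatal(err)
  }

  // Set up the HTTP transport so connections go through the dialer
  tr := &http.Transport{
    Dial: dialer.Dial,
  }
  client := &http.Client{Transport: tr}

  res, err := client.Get("http://google.com")
  if err != nil {
    log.Fatal(err)
  }
  defer res.Body.Close()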

Also, how can I send a POST request to log in to a website?

Thanks,
Arnold

aaudis commented Jan 2, 2017

Arnold,

Use goquery.NewDocumentFromResponse on the response you get from client.Get.
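For the proxy half of the question, here is a minimal sketch that stays within the standard library. It assumes Go 1.9 or later, where http.Transport accepts a socks5:// proxy URL, so golang.org/x/net/proxy is not required. The helper names newSOCKS5Client and proxyFor are hypothetical, and the address is the one from the question. No request is actually sent; main only reports which proxy the client would use.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
	"time"
)

// newSOCKS5Client returns an HTTP client whose requests are routed through
// the SOCKS5 proxy listening at addr (host:port). Hypothetical helper name.
func newSOCKS5Client(addr string) (*http.Client, error) {
	proxyURL, err := url.Parse("socks5://" + addr)
	if err != nil {
		return nil, err
	}
	tr := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	return &http.Client{Transport: tr, Timeout: 30 * time.Second}, nil
}

// proxyFor reports which proxy the client would use for a GET to target,
// without sending anything over the network.
func proxyFor(client *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	tr, ok := client.Transport.(*http.Transport)
	if !ok || tr.Proxy == nil {
		return "", nil
	}
	u, err := tr.Proxy(req)
	if err != nil || u == nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	client, err := newSOCKS5Client("127.0.0.1:9050")
	if err != nil {
		log.Fatal(err)
	}
	p, err := proxyFor(client, "http://example.com/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("requests will go through:", p)
}
```

A response obtained with client.Get on such a client is an ordinary *http.Response, so it can be handed to goquery in the same way as any other response.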

Sending POST requests is a Go matter, not a goquery one. Use Go's standard net/http library (http.Post or http.PostForm, whichever suits your work best), then parse the response with goquery.
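A rough sketch of the POST step using only net/http. The form field names "username" and "password", the helper names login and newTestSite, and the local test server are all assumptions made so the example runs on its own; a real site will have its own URL and field names. The cookie jar is there so that a session cookie set by the login response is sent on later requests made with the same client.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"net/url"
)

// login posts form-encoded credentials to loginURL and returns the response
// body. The field names are assumptions; use the names from the real form.
func login(client *http.Client, loginURL, user, pass string) (string, error) {
	form := url.Values{"username": {user}, "password": {pass}}
	res, err := client.PostForm(loginURL, form)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("login failed: %s", res.Status)
	}
	body, err := io.ReadAll(res.Body)
	return string(body), err
}

// newTestSite starts a local stand-in for the website (hypothetical handler).
func newTestSite() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.FormValue("username") == "" {
			http.Error(w, "bad login", http.StatusUnauthorized)
			return
		}
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc"})
		fmt.Fprint(w, `<html><body><p id="greeting">Welcome</p></body></html>`)
	}))
}

func main() {
	srv := newTestSite()
	defer srv.Close()

	jar, err := cookiejar.New(nil)
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{Jar: jar} // the jar keeps the session cookie

	page, err := login(client, srv.URL, "arnold", "secret")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(page)
}
```

Instead of reading the body into a string, res.Body can be passed straight to goquery.NewDocumentFromReader, since it is an io.Reader.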

Arnold1 commented Jan 2, 2017

@aaudis thanks for your reply. Should I also use NewDocumentFromResponse for my second question (the POST request)?

Could you tell me how goquery is different from https://github.com/PuerkitoBio/gocrawl ?

aaudis commented Jan 2, 2017

> also use NewDocumentFromResponse for 2?

Yes.

> could you tell me how goquery is different from https://github.com/PuerkitoBio/gocrawl ?

gocrawl is more like a robot that walks through web pages.
goquery is for when you are targeting a specific page and specific things on it.

It depends on your use case. If you tell me more about what you want to accomplish, maybe I can advise on which one to use.

Arnold1 commented Jan 3, 2017

OK, I saved the response to a file. Is it possible to pass goquery.NewDocumentFromResponse the response string read from a file, or should I use NewDocumentFromReader?

Member

mna commented Jan 3, 2017

Use goquery.NewDocumentFromReader. The open file is an io.Reader, so it can be passed to this function directly.

Arnold1 commented Jan 7, 2017

Hi, I want to read the following information from this website:
https://gist.github.com/Arnold1/1ac3a4b10a5c2372b1686a9b3d4f1e17

1.) title: title="American Eagle Outfitters jacket"
2.) price: <div class="price">$10 or data-post-actual-price="$10"

How would that work?

Here is what I tried. Could you tell me what I am doing wrong?

  doc.Find(".right-col .masonry .listing-con.shopping-tile.masonry-brick").Each(func(i int, s *goquery.Selection) {
    // For each item found, get the title and price
    title := s.Find("a").Text()
    //price := s.Find("i").Text()
    fmt.Printf("%d: %s\n", i, title)
  })

mna commented Jan 7, 2017

Untested:

doc.Find(".listing-con.shopping-tile").Each(func(_ int, s *goquery.Selection) {
  price := s.AttrOr("data-post-actual-price", "$0")
  title := s.Find(".image-con").AttrOr("title", "<unknown>")
  fmt.Println(price, title)
})

Arnold1 commented Jan 7, 2017

Thanks for the fast reply :)
OK, where can I find the documentation for .Find? Can you explain the code a bit?

Is goquery good for parsing websites like that, or might gocrawl be a better fit?

mna commented Jan 7, 2017

The full documentation for goquery is on godoc: https://godoc.org/github.com/PuerkitoBio/goquery

But what you probably want is the documentation on CSS selectors (i.e. what you can pass in the string parameter to Find). These are very much the same as jQuery's, which in turn are very much the same as the browser-standard CSS selectors. The selector syntax is not documented as part of the goquery package, but there is a good reference here: https://developer.mozilla.org/en/docs/Web/Guide/CSS/Getting_started/Selectors

Goquery is exactly for that kind of job, manipulating an HTML document like that. Gocrawl is a web crawler/spider kind of tool.

Arnold1 commented Jan 7, 2017

Is it as powerful as Python's Scrapy?

mna commented Jan 8, 2017

Probably not? I don't know Scrapy. The scope of goquery is roughly jQuery in Go, minus some methods that don't make as much sense outside of a real live DOM (jQuery gets a live DOM in a browser, whereas goquery works on a static parsed tree).

Arnold1 commented Jan 8, 2017

How do I handle dynamically generated pages?

mna commented Jan 8, 2017

You will need other packages/tools to help you parse JavaScript-generated pages. Some pointers here: https://github.com/PuerkitoBio/goquery/wiki/Tips-and-tricks#handle-javascript-based-pages

mna closed this Jan 8, 2017
