[BUG]: Search classes with spaces fails every time (even in the weather example you provided) #24

Closed
Fef0 opened this issue Jul 7, 2018 · 6 comments

@Fef0

Fef0 commented Jul 7, 2018

Hi, I tried your weather example and it always throws an "invalid memory address" panic. I tried to reproduce the bug with another website, and the parser only finds classes whose names contain no spaces. I don't know why, but it seems to have stopped handling spaces in class names.
I added a fmt.Println() call to print the result of the only class search that contains spaces (grid). Here is the code:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/anaskhan96/soup"
)

func main() {
	fmt.Printf("Enter the name of the city : ")
	city, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	city = city[:len(city)-1]
	cityInURL := strings.Join(strings.Split(city, " "), "+")
	url := "https://www.bing.com/search?q=weather+" + cityInURL
	resp, err := soup.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	doc := soup.HTMLParse(resp)
	grid := doc.Find("div", "class", "b_antiTopBleed b_antiSideBleed b_antiBottomBleed")
	fmt.Println("Print grid:", grid)
	heading := grid.Find("div", "class", "wtr_titleCtrn").Find("div").Text()
	conditions := grid.Find("div", "class", "wtr_condition")
	primaryCondition := conditions.Find("div")
	secondaryCondition := primaryCondition.FindNextElementSibling()
	temp := primaryCondition.Find("div", "class", "wtr_condiTemp").Find("div").Text()
	others := primaryCondition.Find("div", "class", "wtr_condiAttribs").FindAll("div")
	caption := secondaryCondition.Find("div").Text()
	fmt.Println("City Name : " + heading)
	fmt.Println("Temperature : " + temp + "˚C")
	for _, i := range others {
		fmt.Println(i.Text())
	}
	fmt.Println(caption)
}

And that's the output:

Enter the name of the city : New York
Print grid: {<nil>  element `div` with attributes `class b_antiTopBleed b_antiSideBleed b_antiBottomBleed` not found}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x61d1f5]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x0, 0xc42005be68, 0x3, 0x3, 0xc420050000, 0x4aa247, 0xc420261e00)
	/home/fef0/go/src/github.com/anaskhan96/soup/soup.go:304 +0x315
github.com/anaskhan96/soup.Root.Find(0x0, 0x0, 0x0, 0x6e1e60, 0xc420242070, 0xc42005be68, 0x3, 0x3, 0x0, 0x0, ...)
	/home/fef0/go/src/github.com/anaskhan96/soup/soup.go:120 +0x8d
main.main()
	/home/fef0/Code/Go/Test/Test.go:26 +0x4e3
exit status 2

As the second line of the output shows, the grid could not be found, and that happens only because the class name contains spaces.
I hope you can fix this as soon as possible. Bye for now!

@anaskhan96
Owner

Hi @Fef0, thanks for bringing this up with me. I'll look into this as soon as possible.

@danielnovais92

I think I have the same issue. I'm doing:
newsDivs := doc.FindAll("div", "class", "media mt-15")

But mine is weirder: it works well on Windows 10 running with GoLand but fails with this error on a server running CentOS 7 (although the Go versions differ):

go version go1.9.3 windows/amd64
go version go1.8 linux/amd64

@danielnovais92

Somehow I managed to make it work. I should have been using FindAllStrict (and that works), but plain FindAll had worked for months.

@cskonopka

cskonopka commented May 5, 2019

FindAllStrict also worked for me.

@greenpipig

I think I have this issue too. I hope the author can fix it soon, thanks.

@anaskhan96
Owner

Hey guys, sorry for the 2+ year delay - work can be exhausting :/
This commit fixes the weather example.

It's been almost a year since the purpose of Find and FindAll was changed to matching any single word in a whitespace-separated phrase, and their previous purpose (matching the exact phrase) was moved to FindStrict and FindAllStrict. You can view this CHANGELOG for the details.
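
For anyone landing here later, the difference between the two matching modes can be illustrated with a small standalone sketch. It uses only the standard library and is not the library's source: containsWord and exactMatch are hypothetical helper names, and the sketch assumes the search value is compared against each whitespace-separated word of the attribute.

```go
package main

import (
	"fmt"
	"strings"
)

// containsWord is a hypothetical helper. It reports whether value equals one
// of the whitespace-separated words in attr, which is the "any word"
// behaviour described above for Find and FindAll.
func containsWord(attr, value string) bool {
	for _, word := range strings.Fields(attr) {
		if word == value {
			return true
		}
	}
	return false
}

// exactMatch is a hypothetical helper. It reports whether value equals the
// whole attribute string, which is the "exact phrase" behaviour described
// above for FindStrict and FindAllStrict.
func exactMatch(attr, value string) bool {
	return attr == value
}

func main() {
	class := "b_antiTopBleed b_antiSideBleed b_antiBottomBleed"

	// A single class name is one of the words, so word matching succeeds.
	fmt.Println(containsWord(class, "b_antiSideBleed"))

	// A value containing spaces can never equal a single word.
	fmt.Println(containsWord(class, class))

	// The full phrase only matches in exact mode.
	fmt.Println(exactMatch(class, class))
}
```

Under that assumption, a search value with spaces such as "media mt-15" can only match in the strict variants, which would be consistent with the reports above that FindAllStrict works where FindAll does not.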
