This repository has been archived by the owner on Apr 1, 2023. It is now read-only.

[Question] Issue scraping html table with Goutte #431

Closed
Gabotron-ES opened this issue Oct 14, 2020 · 0 comments

Comments

@Gabotron-ES

Hi everybody, I'm trying to scrape an HTML table of cities by population using Goutte in Laravel. I want to return the HTML table as a PHP array, convert it to JSON, and save it to disk.

For some reason, when I crawl the table I get an array full of null values. This is my code:

    public function crawlAustraliaHtmlTable(Request $request)
    {
        $html='';
        $client = new Client();
        $url = 'http://www.geoba.se/population.php?cc=AU&st=city_rank_country&asde=&page=1';
        $crawler = $client->request('GET', $url);
        //$crawler->addHTMLContent($html);
        
        $table = $crawler->filter('table')->filter('tr')->each(function ($tr, $i) {
            return $tr->filter('td')->each(function ($td, $i) {
                $td->filter('a')->each(function ($a, $i) {
                    return $a->attr('href');
                });
            });
        });
        
        //print_r($table);

        $json = json_encode($table);

        $filename = 'cities_in_australia.json';

        File::put(public_path('/uploads/'.$filename),$json);

        return response()->json([
            'json' => $json,
        ]);
    }

The result (note that every value is null):

[[null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],[null,null,null,null,null],...]]
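The shape of this output, one null per td, suggests that the callback passed to the td-level each() returns nothing. Crawler::each() builds its result array from the values its callback returns, and a PHP closure without a return statement evaluates to null. Below is a minimal sketch of that mechanism using plain arrays; array_map stands in for each(), and the sample data is made up for illustration.

```php
<?php
// Hypothetical stand-in data: two rows, each a list of cells, each cell a list of hrefs.
$rows = [
    [['population.php?cc=AU']],
    [['/location.php?query=1'], ['country.php?cc=AU']],
];

// Mirrors the posted code: the cell-level callback maps the links but never returns them.
$broken = array_map(function (array $cells) {
    return array_map(function (array $links) {
        array_map(function (string $href) {
            return $href;
        }, $links); // result is discarded, so this closure yields null
    }, $cells);
}, $rows);

// Same structure with the missing return added at the cell level.
$fixed = array_map(function (array $cells) {
    return array_map(function (array $links) {
        return array_map(function (string $href) {
            return $href;
        }, $links);
    }, $cells);
}, $rows);

echo json_encode($broken), PHP_EOL; // [[null],[null,null]]
echo json_encode($fixed), PHP_EOL;
```

If this is indeed the cause, adding `return` in front of `$td->filter('a')->each(...)` in the posted code should replace each null with an array of href values.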

The HTML table structure looks like this:

<table border=0 cellpadding=3 cellspacing=3 class="table table-condensed table-noline">

<tr style="font-size: 16px;">

<th class="bottom" valign=top width=50 align=left NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=crcountry&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By Rank'); return false;">Rank</a></b></td>
<th class="bottom" valign=top width=200 align=left NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=city&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By City'); return false;">City</a></b></td><th class="bottom" valign=top width=125 align=left><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=state&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By State'); return false;">State</a></b></td><th class="bottom" valign=top width=100 align=left><b>Country</b></td><th class="bottom" valign=top width=75 align=right NOWRAP><b><a class=redglow style="color:#0000FF;" href="population.php?cc=AU&st=pop&asde=d&page=1" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Sort By Population'); return false;">Population</a></b></td>
<td></td>
</tr>

	<tr style="font-size:13px;" class="bb">
	<td valign=top><a name="1"></a>1.</td>
	<td valign=top><a class=redglow style="color:#0000FF;" href="/location.php?query=2158177&geoid=Y" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Melbourne'); return false;">Melbourne</a></td>
	<td valign=top width=150><a class=redglow style="color:#0000FF;" href="population.php?sc=Victoria&state=Victoria" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Victoria'); return false;">Victoria</a></td><td valign=top><a class=redglow style="color:#0000FF;" href="country.php?cc=AU&year=2020" onClick="recordOutboundLink(this, 'Population - City - Australia', 'Australia'); return false;">Australia</a></td>
	<td valign=top align=right>3,730,206</td>
	
	<tr style="font-size:13px;" class="bb">
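Given this structure, one way to get the cell text of each row (rather than only the link targets) might look like the sketch below. It runs Symfony's DomCrawler directly on an inline HTML string, which is the same Crawler type that Goutte's request() returns. The markup is a simplified stand-in for the page, the empty-row filter is an illustrative choice, and the autoload path assumes a standard Composer layout.

```php
<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer
// (Goutte requires both, so a Goutte project should already have them).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Simplified stand-in for the page markup; most attributes are omitted.
$html = <<<'HTML'
<table>
<tr><th>Rank</th><th>City</th><th>Population</th><td></td></tr>
<tr><td><a name="1"></a>1.</td><td><a href="/location.php?query=2158177&amp;geoid=Y">Melbourne</a></td><td>3,730,206</td></tr>
</table>
HTML;

$crawler = new Crawler($html);

$rows = $crawler->filter('table tr')->each(function (Crawler $tr) {
    // Return the inner each() result so the row is not collected as null.
    return $tr->filter('td')->each(function (Crawler $td) {
        return trim($td->text());
    });
});

// Drop rows with no non-empty cell (here, the header row, whose only td is empty).
$rows = array_values(array_filter($rows, function (array $cells) {
    return count(array_filter($cells, 'strlen')) > 0;
}));

echo json_encode($rows, JSON_PRETTY_PRINT), PHP_EOL;
```

In the Laravel action, the resulting $rows could then be passed to json_encode and File::put in the same way as $table in the posted code.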
@FriendsOfPHP FriendsOfPHP locked and limited conversation to collaborators Aug 5, 2021
@fabpot fabpot closed this as completed Aug 5, 2021