# Private Property Search

In this example we execute a simple search on [Private Property](https://www.privateproperty.co.za/) and extract some useful information from the result using [LXML](http://lxml.de/).

In [None]:
import urllib.parse
import requests
import lxml.html
import pandas as pd

Documentation for `lxml.html` is [here](http://lxml.de/lxmlhtml.html).

## Retrieve Page

In [None]:
URL = 'https://www.privateproperty.co.za/Portal/Search/SearchBoxSearch'

Look at the form for the search box on the home page of Private Property and find the minimal set of parameters it requires.

In [None]:
params = {
    'locationPhrase' : 'Glenwood, Durban',
    'listingType' : 'Sales',
}
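
To see what query string these parameters turn into, the standard library's `urllib.parse.urlencode` can be used. `requests` applies a similar encoding when it is given `params`, so this is only a sketch for inspection; the names `query` and `full_url` are illustrative and not part of the original notebook.

```python
import urllib.parse

URL = 'https://www.privateproperty.co.za/Portal/Search/SearchBoxSearch'
params = {
    'locationPhrase': 'Glenwood, Durban',
    'listingType': 'Sales',
}

# Encode the parameters as a query string (spaces become '+', commas '%2C').
query = urllib.parse.urlencode(params)

# Illustrative only: assemble the URL by hand to inspect it.
full_url = URL + '?' + query
print(full_url)
```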

In [None]:
with requests.get(URL, params=params) as r:
    r.raise_for_status()  # fail early on an HTTP error status
    doc = lxml.html.fromstring(r.text)

In [None]:
doc

Open the document in a browser. This is a good check to ensure that you are getting what you expect.

In [None]:
lxml.html.open_in_browser(doc)

The styling is gone but the content is there.

## Parse Page

Locate the `<div>` elements that hold the information for each of the properties.

In [None]:
information = doc.cssselect('.infoHolder')

This is a list of `Element` objects.

In [None]:
information

### A Single Element

Let's look at the first property in the list.

In [None]:
first_property = information[0]

Extract the tag name.

In [None]:
first_property.tag

### Elements are like Dictionaries

Access to attributes is similar to a dictionary.

In [None]:
first_property.items()

In [None]:
first_property.keys()

The interface is similar to a dictionary, but not exactly the same. Indexing with an attribute name, for example, doesn't work, so you need to use the `get()` method.

In [None]:
first_property.get('class')
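
The attribute interface can be tried on a small, self-contained fragment. This is a minimal sketch: the HTML string and the `data-id` attribute are made up for illustration and are not taken from the Private Property page.

```python
import lxml.html

# Hypothetical fragment standing in for one search result.
div = lxml.html.fromstring(
    '<div class="infoHolder" data-id="42"><span>text</span></div>'
)

cls = div.get('class')            # value of an existing attribute
missing = div.get('title', 'n/a') # default returned for a missing attribute
via_attrib = div.attrib['class']  # .attrib is a dict-like view that supports indexing
names = sorted(div.keys())        # attribute names

print(cls, missing, via_attrib, names)
```

If dictionary-style indexing is preferred, the `attrib` property offers it, while `get()` is the safer choice when an attribute may be absent.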

### Elements are like Lists

You can access sub-elements by treating an `Element` as a list.

In [None]:
first_property[0]

In [None]:
first_property[0].text_content()
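
The list-like behaviour extends to `len()`, iteration and negative indices. A minimal sketch on an invented fragment (the tags and text are placeholders, not the structure of the real page):

```python
import lxml.html

# Hypothetical fragment with three child elements.
div = lxml.html.fromstring(
    '<div><h2>Title</h2><p>Address</p><p>Price</p></div>'
)

n_children = len(div)                      # number of direct children
child_tags = [child.tag for child in div]  # iterating yields the children
last_text = div[-1].text_content()         # negative indices work as for lists

print(n_children, child_tags, last_text)
```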

### Recursing into an Element

Extract three important attributes of the first property by searching further into the document tree.

In [None]:
first_property.cssselect('.title')[0].text_content()

In [None]:
first_property.cssselect('.address')[0].text_content()

In [None]:
first_property.cssselect('.priceDescription')[0].text_content()

We can do the same thing with a single selector and a list comprehension. There are a few ways to do this; here we use a group selector.

In [None]:
[f.text_content() for f in first_property.cssselect('.title, .address, .priceDescription')]

### Multiple Tags

Now do it systematically across all properties.

In [None]:
properties = [[f.text_content() for f in info.cssselect('.title, .address, .priceDescription')]
              for info in information]

In [None]:
properties

For ease of use we can convert this into a data frame.

In [None]:
pd.DataFrame(properties, columns=['description', 'price', 'address'])