For a real-estate organization, it is important to know whether a property is vacant. As a simpler first case, this project aims to classify whether a property is land or a house.
Here we use four different types of images.
- Assessor images (property images, collected manually and released once every 5 years)
- Aerial/Satellite Images extracted from Bing and Google
- Streetside Images extracted from Bing and Google
- OSM building corners (parcel boundary).
- Assessor images: The assessor images we have are 5-9 years old. In that time, many properties (houses) have been demolished and many houses have been built on vacant land. As a result, the label may say that a property is a house while the image shows land. There are many such cases, which makes model performance poor when training only on assessor images.
External Data Collection and Preparation:
Aerial and streetside images from Google Maps and Bing Maps are updated every 1-2 years and are therefore more recent. A model should be more reliable on these images. However, due to the 1-2 years of lag, we may still end
- Aerial images from Bing and Google Maps. This is the best readily available image source and the most recent compared to the others.
- Streetside images from Bing and Google Maps. These images are not always clear.
- Building boundary coordinates for Chicago from OpenStreetMap (OSM): OSM data is an open-source contribution, so it may not be updated as frequently. We overlay the building boundaries (collected from OSM) on static satellite images collected from Google Maps. For details, look here: Overlay building boundary on static images
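To overlay OSM building corners on a static satellite image, each corner's latitude and longitude has to be converted into a pixel position. The sketch below is one possible way to do that, assuming the static image uses the standard Web Mercator projection with 256-pixel tiles, is centered on a known latitude/longitude, and is fetched at scale 1. The function names, zoom level, and image size are hypothetical and not taken from this project.

```python
import math

TILE_SIZE = 256  # base tile size (pixels) of the Web Mercator world map at zoom 0


def latlon_to_world_px(lat, lon, zoom):
    """Project a WGS84 lat/lon to Web Mercator world pixel coordinates (x, y)."""
    scale = TILE_SIZE * (2 ** zoom)
    siny = math.sin(math.radians(lat))
    # Clamp to avoid infinities close to the poles.
    siny = min(max(siny, -0.9999), 0.9999)
    x = scale * (0.5 + lon / 360.0)
    y = scale * (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi))
    return x, y


def latlon_to_image_px(lat, lon, center_lat, center_lon, zoom, width, height):
    """Return the (col, row) of a point inside an image centered on (center_lat, center_lon)."""
    cx, cy = latlon_to_world_px(center_lat, center_lon, zoom)
    px, py = latlon_to_world_px(lat, lon, zoom)
    return (px - cx) + width / 2.0, (py - cy) + height / 2.0


# Hypothetical example: one building corner slightly north-east of the image center.
corner_col, corner_row = latlon_to_image_px(
    41.8801, -87.6299, 41.88, -87.63, zoom=19, width=640, height=640
)
```

Rows grow downward in image coordinates, so a corner north of the center ends up with a smaller row index than the center pixel.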
Even with the external data, many of our labels are not consistent with the images, so we have a data-quality issue. We plan to make the best use of all the image types (mixture-of-experts model), since there is a good chance that at least one image type is consistent with the label. We also use bootstrapping techniques to correct the labels and augment them at each iteration.
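As a rough illustration of the label-correction idea, the sketch below combines per-image-type predictions and flips a label only when the combined prediction confidently disagrees with it. This is an assumption-laden stand-in, not the project's actual method: a real mixture of experts would normally learn a gating network instead of taking a plain average, and the source names, the threshold of 0.9, and both function names are hypothetical.

```python
def combine_experts(probs_by_source):
    """Average the available per-source P(house) values; None marks a missing image."""
    available = [p for p in probs_by_source.values() if p is not None]
    if not available:
        return None
    return sum(available) / len(available)


def correct_label(label, probs_by_source, threshold=0.9):
    """Return a possibly corrected label (1 = house, 0 = land).

    The label is flipped only when the combined prediction disagrees with it
    and is at least `threshold` confident in the opposite class.
    """
    p_house = combine_experts(probs_by_source)
    if p_house is None:
        return label  # nothing to compare against, keep the label
    if label == 1 and p_house <= 1 - threshold:
        return 0
    if label == 0 and p_house >= threshold:
        return 1
    return label


# Hypothetical example: labeled "house", but both recent image types say "land".
new_label = correct_label(1, {"aerial": 0.05, "streetside": 0.10, "assessor": None})
```

Repeating this after each training round, with the corrected labels fed back into training, gives one simple form of the per-iteration bootstrapping described above.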
Here's a snapshot of the images:
Models (Deep Nets)
Let us now discuss the different models employed for the different types of images.
RESNET-18 + a little variation
The RESNET-18 model is trained on satellite images from Google Maps. RESNETs can go very deep and are robust to the vanishing-gradient problem, which lets them learn very complex features within an image. The center pixel of the image extracted from Google corresponds to the latitude and longitude of the address location. We use the OSM data to crop the surroundings of the building from an image. We then resize the crops to 128x128x3 and zero-pad them to a shape of 224x224x3. While 224x224x3 is not a necessity (since we don't use pretrained weights), we do it to respect the RESNET architecture.
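The resize-then-pad step can be sketched with NumPy as follows. This is a minimal illustration under assumptions: it uses nearest-neighbor resizing to stay dependency-free (the project may well use a different interpolation), and it centers the 128x128 crop inside the 224x224 canvas, which the text does not specify. The function names are hypothetical.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = image.shape[:2]
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return image[rows[:, None], cols[None, :]]


def zero_pad_center(image, target_h=224, target_w=224):
    """Zero-pad an (H, W, C) array so it sits centered in a target_h x target_w canvas."""
    h, w = image.shape[:2]
    pad_h, pad_w = target_h - h, target_w - w
    if pad_h < 0 or pad_w < 0:
        raise ValueError("image is larger than the target shape")
    top, left = pad_h // 2, pad_w // 2
    return np.pad(
        image,
        ((top, pad_h - top), (left, pad_w - left), (0, 0)),
        mode="constant",
        constant_values=0,
    )


def prepare_crop(crop):
    """Resize a building crop to 128x128x3, then zero-pad it to 224x224x3."""
    return zero_pad_center(resize_nearest(crop, 128, 128), 224, 224)


# Hypothetical example: a 60x90 crop around a building.
example = prepare_crop(np.ones((60, 90, 3), dtype=np.uint8))
```

With these numbers the padding is 48 pixels on every side, since (224 - 128) / 2 = 48.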
A Convnet model is trained on overlaid images. The idea here is to explicitly provide the model with knowledge of house and land using a coloring scheme: the rooftops of the houses are colored red. This allows the model to learn very quickly, in a few steps. Our experiments show that the model was able to learn a good distinction in just 2-3 steps. We use a simple Conv-net architecture because, thanks to the colors, the model no longer needs a deep architecture to learn simple features. We haven't tried it, but judging by the overlaid pictures we think even a simple model could do a decent job of classifying the images.
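The red rooftop coloring can be sketched as a polygon fill. The example below is a minimal NumPy version under assumptions: the building corners are already in pixel coordinates as (col, row) pairs, a pixel counts as inside when its center passes an even-odd ray-casting test, and the color is pure red. The function names are hypothetical, and the project may use an imaging library for this instead.

```python
import numpy as np


def polygon_mask(height, width, polygon):
    """Boolean mask of pixels whose centers lie inside the polygon (even-odd rule).

    `polygon` is a list of (col, row) vertices in pixel coordinates.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    x = cols + 0.5  # pixel-center coordinates
    y = rows + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does the edge straddle the pixel's row, and is the crossing to its right?
        crosses = (y0 > y) != (y1 > y)
        x_at_y = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (x < x_at_y)
    return inside


def overlay_building(image, polygon, color=(255, 0, 0)):
    """Return a copy of an (H, W, 3) image with the polygon interior painted `color`."""
    out = image.copy()
    out[polygon_mask(image.shape[0], image.shape[1], polygon)] = color
    return out


# Hypothetical example: a square rooftop on a blank 10x10 image.
blank = np.zeros((10, 10, 3), dtype=np.uint8)
painted = overlay_building(blank, [(2, 2), (6, 2), (6, 6), (2, 6)])
```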
Challenge: The building boundaries required to create overlaid images are collected from OpenStreetMap. These may not be updated as frequently as Google Maps. Moreover, getting building boundaries for all locations may not be feasible. One way to generate a colored image from a satellite view is to use Fully Convolutional Networks for semantic segmentation [TODO].
Autoencoders are used for assessor images. Assessor images are 5-9 years old, and there is a high chance that a property that was land then is a house now. This means that although the label might say house, the image might show land, so we could trust either the labels or the image. Autoencoders are unsupervised techniques that do not require labels to distinguish between two classes. We feed the autoencoder images of house and land and leave it to the autoencoder to find an encoding that can distinguish between land and house. We create an encoding space of 64 dimensions and try k-means clustering with 2 initial centers.
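The clustering step on the 64-dimensional encodings can be sketched as below. The autoencoder itself is not shown: the `encodings` array is random stand-in data, not real encoder output. The k-means here is a plain NumPy version with a farthest-point initialization, which is an assumption made for this example; the project may rely on a library implementation and a different initialization.

```python
import numpy as np


def kmeans(points, k=2, n_iter=50, seed=0):
    """Plain k-means. Returns (labels, centers) for an (N, D) array of points."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: random first center, then the point
    # farthest from all centers chosen so far.
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dist_to_nearest.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Stand-in data: two synthetic groups of 64-dimensional "encodings".
rng = np.random.default_rng(42)
encodings = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(10, 1, (20, 64))])
labels, centers = kmeans(encodings, k=2)
```

The cluster indices are arbitrary, so deciding which cluster means "house" and which means "land" still requires inspecting a few images from each cluster.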
Challenge: Assessor images might be expensive to obtain, since they are collected manually by organizations or individuals. In a real scenario, finding an assessor image for every address is unrealistic.
Data Pipeline (Using Apache Airflow):
Below is a view of the data pipeline, which is implemented using the Apache Airflow framework.