A Serverless Crawler For Real Estate Data in Vancouver Using AWS Lambda, DynamoDB, RDS MySQL and CloudWatch
Type Name Latest commit message Commit time
Bootstrapper Adding .ZIP packages for each lambda function Aug 22, 2017
Lambda_Setup - ZIP Packages for each function
ListingScraper Adding .ZIP packages for each lambda function Aug 22, 2017
Misc Initial Commit Aug 16, 2017
SearchResultsPaginator
.gitignore Initial commit Aug 16, 2017
LICENSE Initial commit Aug 16, 2017
README.md

ServerlessCrawler-Vancouver Real State

What is this project all about?

This project is a showcase of a concept I've been playing with for a while: Serverless Crawlers. (If you don't know what a crawler is, feel free to visit my Crawler101 repository.) The goals and the pros and cons of this architecture are covered in my Medium post.

The goal here was to write an automatic data mining process (crawler) to capture real estate data from Greater Vancouver Area listings. The catch? There's no actual server to maintain. Once this is set up, all you need is a trigger to start the capture, and it runs by itself 100% on AWS for close to zero dollars a month.

We can leverage the Free Tier of 2 out of the 4 AWS services used in the project. Only DynamoDB and RDS MySQL will cost anything, but still, you can keep a DynamoDB table running for 2 bucks a month, and an RDS MySQL database for cents (keeping it stopped while you're not using it). For more details, refer to the costs page on this project's wiki.
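As a rough illustration of keeping the database stopped between runs, here is a minimal sketch that is not part of this repository. It assumes a boto3 RDS client is passed in, and the helper name `stop_if_running` and the instance identifier in the usage comment are hypothetical. Keep in mind that AWS starts a stopped instance again automatically after seven days, so a long idle period needs the stop to be repeated.

```python
def stop_if_running(rds_client, instance_id):
    """Stop the RDS instance if it is currently available.

    Returns True if a stop was requested, False otherwise.
    `rds_client` is expected to behave like boto3.client("rds").
    """
    response = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    status = response["DBInstances"][0]["DBInstanceStatus"]
    if status != "available":
        # Already stopped, stopping, starting, etc.: nothing to do.
        return False
    rds_client.stop_db_instance(DBInstanceIdentifier=instance_id)
    return True


# Usage (hypothetical identifier, requires boto3 and AWS credentials):
#   import boto3
#   stop_if_running(boto3.client("rds"), "my-crawler-db")
```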

What do I need before I start?

An Amazon Web Services account and some Python knowledge.

What is the Tech Stack behind this?

  • AWS Lambda for the processing of the HTML pages and data scraping
  • DynamoDB for caching the URLs to be captured, and for triggering the Lambda functions
  • RDS MySQL as the end database for the processed and structured data to be stored
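To make the DynamoDB-to-Lambda hand-off above more concrete, here is a minimal sketch of a Lambda handler fed by a DynamoDB stream. It is an illustration under assumptions, not the code shipped in this repository: the attribute name `url`, the helper `urls_from_stream_event`, and the placeholder scraping step are all hypothetical, and the table's stream is assumed to include new item images.

```python
def urls_from_stream_event(event, attribute="url"):
    """Collect URL strings from newly inserted items in a DynamoDB Streams event.

    Assumes each cached item stores its URL as a string attribute
    named `attribute` (a hypothetical name chosen for this sketch).
    """
    urls = []
    for record in event.get("Records", []):
        # Only react to new items; ignore MODIFY and REMOVE events.
        if record.get("eventName") != "INSERT":
            continue
        new_image = record.get("dynamodb", {}).get("NewImage", {})
        value = new_image.get(attribute, {}).get("S")
        if value:
            urls.append(value)
    return urls


def lambda_handler(event, context):
    """Entry point AWS Lambda calls with a batch of stream records."""
    urls = urls_from_stream_event(event)
    for url in urls:
        # Placeholder: a real function would download the page, scrape it,
        # and write the structured result to the MySQL database.
        print(f"would scrape {url}")
    return {"queued": len(urls)}
```

Filtering on `INSERT` keeps the function from re-processing a URL when its cache entry is later updated or deleted.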

Architecture

About Me

Marcello Lins is passionate about technology and crunching data for fun. Feel free to connect with me through LinkedIn and find out more about what I'm working on via my AboutMe profile. Visit https://techflow.me/ for more awesomeness!
