cerinoligutom/connect-mls-scraper

connect-mls-scraper

An agent scraper for Connect MLS

Features

  • 4 search inputs that can be configured in the .env file
    • Status
    • Search Minimum Price
    • Search Maximum Price
    • Months Back
  • Summary of the scraped data
  • Cross-checks agents and updates their fields when duplicates are found
  • Creates 3 outputs
    1. The accumulated data
    2. A checkpoint of the accumulated data for each scrape instance
    3. The data scraped in that instance
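
The duplicate handling described above could look roughly like the following. This is a minimal sketch and not the scraper's actual code: the function name mergeAgents, the key field id, and the other field names are all assumptions made for illustration.

```javascript
// Hypothetical sketch: merge newly scraped agents into the accumulated list.
// The key "id" and the other field names are assumptions, not the real schema.
function mergeAgents(accumulated, scraped, key = "id") {
  // Index existing agents by key; copy each one so the inputs are not mutated.
  const byKey = new Map(accumulated.map((agent) => [agent[key], { ...agent }]));

  for (const agent of scraped) {
    const existing = byKey.get(agent[key]);
    if (!existing) {
      // First time this agent is seen: add it as-is.
      byKey.set(agent[key], { ...agent });
      continue;
    }
    // Duplicate found: overwrite only fields that carry a non-empty new value.
    for (const [field, value] of Object.entries(agent)) {
      if (value !== undefined && value !== null && value !== "") {
        existing[field] = value;
      }
    }
  }
  return [...byKey.values()];
}
```

With this approach an empty field in a fresh scrape never erases a value that an earlier scrape already captured.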

Prerequisites

  • Node.js (download from https://nodejs.org/)
  • Connect MLS account
  • .env file for credentials, parameters and configs
    • Make sure to fill in the USERID and PASSWORD values

Install dependencies

Run this command in your terminal/cmd.

$ npm i

Start scraping

Run this command in your terminal/cmd and wait for it to finish.

$ npm start

Output directory info

Each scrape instance generates 3 files in the ~/connect-mls-scraper/output folder.

~/connect-mls-scraper/output/
  • Contains agents.csv, which holds the most recent accumulated scraped data.
  • Also contains 2 folders named checkpoints and instances.
~/connect-mls-scraper/output/checkpoints/
  • File names follow the format agents-<date_in_milliseconds>.csv
  • Each file is the updated agents.csv for that scrape instance.
~/connect-mls-scraper/output/instances/
  • File names follow the format <date_in_milliseconds>.csv
  • Each file holds the agents scraped in that scrape instance.
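
To make the naming scheme concrete, here is a small sketch that derives the three paths for one scrape instance. The helper name outputPaths is hypothetical, and using Date.now() for <date_in_milliseconds> is an assumption about how the timestamp is produced.

```javascript
const path = require("node:path");

// Hypothetical helper: build the three output paths for one scrape instance.
// Assumes <date_in_milliseconds> is a Unix timestamp such as Date.now().
function outputPaths(outputDir, timestamp = Date.now()) {
  return {
    // Most recent accumulated data.
    accumulated: path.join(outputDir, "agents.csv"),
    // The updated agents.csv as of this scrape instance.
    checkpoint: path.join(outputDir, "checkpoints", `agents-${timestamp}.csv`),
    // Only the agents scraped in this instance.
    instance: path.join(outputDir, "instances", `${timestamp}.csv`),
  };
}
```

Passing the same timestamp to both the checkpoint and the instance file makes it easy to pair them up later.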

Example .env file

# Credentials
USERID=
PASSWORD=

# Parameters
STATUS_VALUE=ACT,NEW,BOM,EXT,PCH,AO,RFR,SLD,PND,PDB,EXP,CAN,WDN
SEARCH_PRICE_MIN=1
SEARCH_PRICE_MAX=2
MONTHS_BACK=24 Months

# Scraper Config

# Increase if it takes too long on your end to load the search form
NAVIGATE_TO_SEARCH_FORM_DELAY=5000  

# Increase if it takes too long to load each listing page on your end
LISTING_PAGE_DELAY=1000  

# Waiting time before agent tab closes
AGENT_PAGE_DELAY=2000  

# Set to "false" if you want to see the browser in action, "true" if not
SILENT=false

# Usually 500 at most, due to a limit on Connect MLS search results
SEARCH_RESULTS_LIMIT=
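
As an illustration of how these values might be consumed, the sketch below turns the raw strings into typed settings. It is not the scraper's actual code: the function name loadConfig and the fallback values (copied from the example above) are assumptions, and it expects the variables to already be present in process.env, for instance loaded by a package such as dotenv.

```javascript
// Hypothetical loader: convert raw environment strings into typed settings.
// Fallback values mirror the example .env and are assumptions, not real defaults.
function loadConfig(env = process.env) {
  // Parse an integer, falling back when the value is missing or not numeric.
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    // Comma-separated status codes, trimmed, with empty entries dropped.
    statuses: (env.STATUS_VALUE || "")
      .split(",")
      .map((status) => status.trim())
      .filter(Boolean),
    priceMin: toInt(env.SEARCH_PRICE_MIN, 0),
    priceMax: toInt(env.SEARCH_PRICE_MAX, 0),
    monthsBack: env.MONTHS_BACK || "",
    navigateToSearchFormDelay: toInt(env.NAVIGATE_TO_SEARCH_FORM_DELAY, 5000),
    listingPageDelay: toInt(env.LISTING_PAGE_DELAY, 1000),
    agentPageDelay: toInt(env.AGENT_PAGE_DELAY, 2000),
    // Only the exact string "true" hides the browser.
    silent: env.SILENT === "true",
    searchResultsLimit: toInt(env.SEARCH_RESULTS_LIMIT, 500),
  };
}
```

Taking the environment as a parameter keeps the function easy to try out with a plain object instead of real environment variables.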

Troubleshooting

  • If an error occurs midway, terminate the process with Ctrl+C or by closing the terminal/cmd.
  • If pages take too long to load on your end (usually because of a slow internet connection), try increasing the *_DELAY value in the config for the page concerned.

Bugs/Issues

  • Please open a ticket in the issues section and I will look into it when I find time.

About

An agent scraper for Connect MLS using puppeteer
