https://golang.org/
https://golang.org/doc/
https://github.com/jackdanger/collectlinks
https://godoc.org/github.com/PuerkitoBio/goquery
https://golang.org/dl/
https://golang.org/doc/install
1) Use the apt package installer:
sudo add-apt-repository ppa:longsleep/golang-backports
sudo apt update
sudo apt install golang-go
Or simply use the Snap installer:
sudo snap install --classic go
1. Set the $GOPATH environment variable to point to the Go project directory.
2. Add $GOPATH/bin to $PATH.
NOTE: Both can be set with a Bash alias; run it from the root of each Go project to reset the paths.
Define the alias on Linux with this command:
alias gopath='export GOPATH=$(pwd);export PATH=$PATH:$GOPATH/bin'
If Go is already installed on your system and the Go path is configured, follow the steps below to clone the repo and run the script in a Linux console:
go get "github.com/jackdanger/collectlinks"
cd go/src
git clone https://github.com/agrawalnaman/web-crawler-using-Golang.git
Command for running the script: go run crawl.go <Base URL>. For example:
cd web-crawler-using-Golang/
go run crawl.go http://rescale.com/
The crawler was run against some small websites and the output was checked manually for loops and infinite callbacks:
go run crawl.go https://www.crawler-test.com/links/repeated_internal_links
Other websites were crawled as well, for example:
go run crawl.go https://www.crawler-test.com/links/page_with_external_links
A good website for testing the crawler is:
go run crawl.go https://www.crawler-test.com/
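One common way to keep a crawler from looping on repeated internal links is to remember every URL it has already queued. The sketch below illustrates that general technique under stated assumptions; it is not taken from crawl.go. The function crawlOnce and the in-memory graph map are hypothetical stand-ins for fetching a page and extracting its links.

```go
package main

import "fmt"

// crawlOnce walks a link graph breadth-first from start and returns every
// reachable page exactly once, in the order it was first seen.
// graph is a hypothetical stand-in for "fetch the page and collect its links".
func crawlOnce(start string, graph map[string][]string) []string {
	visited := map[string]bool{start: true} // URLs already queued
	queue := []string{start}
	var order []string

	for len(queue) > 0 {
		page := queue[0]
		queue = queue[1:]
		order = append(order, page)

		for _, link := range graph[page] {
			if visited[link] {
				continue // already queued: skipping it breaks the loop
			}
			visited[link] = true
			queue = append(queue, link)
		}
	}
	return order
}

func main() {
	// Every page links back to an earlier one, so a crawler without a
	// visited set would never terminate on this graph.
	graph := map[string][]string{
		"/":  {"/a", "/b", "/"},
		"/a": {"/", "/b"},
		"/b": {"/a"},
	}
	fmt.Println(crawlOnce("/", graph)) // prints [/ /a /b]
}
```

Marking a URL as visited when it is queued, rather than when it is fetched, keeps the same link from being queued twice while it waits its turn.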
Start and stop timestamps are printed with the URLs of each page to show that multiple pages are being parsed simultaneously.
Test run output snippet:
ubuntu@ip-10-0-0-136:~/go/src/web-crawler-using-Golang$ go run crawl.go http://rescale.com/
[http://rescale.com/]
Start time of crawl of this URL: 2020-03-13 22:07:37.810176103 +0000 UTC m=+0.000699272
Stop time of crawl of this URL : 2020-03-13 22:07:38.578908454 +0000 UTC m=+0.769431501
http://rescale.com/
http://rescale.com/
http://rescale.com/jp/
http://rescale.com/chs/
http://rescale.com/kr/
http://rescale.com/products/
http://rescale.com/products/enterprise/
http://rescale.com/products/advanced/
http://rescale.com/products/elements/
http://rescale.com/products/government/
http://rescale.com/products/universities/
http://rescale.com/products/developer/
http://rescale.com/features/
http://rescale.com/features/visualization/
http://rescale.com/security/
http://rescale.com/pricing/
http://rescale.com/infrastructure/
http://rescale.com/features/file-management/
http://rescale.com/features/admin-portal/
http://rescale.com/features/api/
http://rescale.com/features/workflow/
http://rescale.com/
http://rescale.com/aws/
http://rescale.com/azure/
http://rescale.com/gcp/
http://rescale.com/ibm/
http://rescale.com/ansys/
http://rescale.com/avl/
http://rescale.com/siemens/
http://rescale.com/solutions/
http://rescale.com/solutions/engineers-and-scientists/
http://rescale.com/solutions/cxos/
http://rescale.com/solutions/hpc-management/
http://rescale.com/solutions/academia/
http://rescale.com/solutions/startups/
http://rescale.com/solutions/aerospace/
http://rescale.com/solutions/automotive/
http://rescale.com/solutions/consumer-products/
http://rescale.com/solutions/eda-and-electronics/
http://rescale.com/solutions/oil-and-gas/
http://rescale.com/solutions/life-sciences/
http://rescale.com/solutions/autonomous-driving/
http://rescale.com/solutions/automation-api-cli/
http://rescale.com/solutions/cae-mdo/
http://rescale.com/solutions/digital-twin/
http://rescale.com/solutions/driver-assistance-adas/
http://rescale.com/solutions/disaster-recovery/
http://rescale.com/solutions/iot-big-data/
http://rescale.com/solutions/machine-learning/
http://rescale.com/solutions/spdm/
http://rescale.com/storage/
http://rescale.com/software/
http://rescale.com/partners/
http://rescale.com/partners-list/
https://resources.rescale.com/
https://docs.rescale.com/
https://resources.rescale.com/?wpv-resource-type=white-paper
http://rescale.com/about/
http://rescale.com/investors/
https://resources.rescale.com//blog
https://resources.rescale.com//news
https://resources.rescale.com//events
http://rescale.com/jobs/
http://rescale.com/legal/
http://rescale.com/signup/
http://rescale.com/login/
https://info.rescale.com/case-studies/nissan
https://info.rescale.com/white-papers/cloud-3.0-the-rise-of-big-compute
https://info.rescale.com/case-studies/dinex-reduces-time-to-market-of-exhaust-systems-by-25-percent
https://www.youtube.com/watch?v=05HfJ8dZJXE
https://info.rescale.com/case-studies/boom-supersonic
https://www.youtube.com/watch?v=umiGy7fe5zc
https://www.youtube.com/watch?v=h1nsUGuklHw
https://www.youtube.com/watch?v=tPaq3Hmeg5Y
https://resources.rescale.com/?wpv-resource-type=video
https://resources.rescale.com/resource/a3-project-vahana-rescale-power-personal-flight/
https://resources.rescale.com/boom-technology-leverages-rescale-platform-to-enable-a-rebirth-of-supersonic-passenger-travel/
https://resources.rescale.com/resource/the-need-for-speed-drives-nascars-richard-childress-racing-to-the-cloud/
https://support.rescale.com/customer/en/portal/articles/2778993-trek-bicycle-uses-rescale-to-run-cutting-edge-coupled-optimization-analysis
http://rescale.com/booking/
https://resources.rescale.com/events/
https://resources.rescale.com/news/
https://resources.rescale.com/rescale-enables-faster-time-to-market-for-nissan/
https://resources.rescale.com/announcements/rescale-announces-strategic-partnership-offering-with-siemens-plm/
https://resources.rescale.com/announcements/rescale-announces-innovations-to-accelerate-time-to-results/
https://resources.rescale.com/announcements/rescale-receives-2018-hpcwire-editors-choice-award-for-best-hpc-in-the-cloud-platform/
https://resources.rescale.com/blog
http://info.rescale.com/contact_sales
https://www.linkedin.com/company/rescale/
https://twitter.com/rescaleinc
https://www.facebook.com/rescaleinc/
Start time of crawl of this URL: 2020-03-13 22:07:38.586394942 +0000 UTC m=+0.776918083
Stop time of crawl of this URL : 2020-03-13 22:07:39.315722492 +0000 UTC m=+1.506245536
http://rescale.com/products/
http://rescale.com/
http://rescale.com/products/
http://rescale.com/jp/products/
http://rescale.com/chs/products/
http://rescale.com/kr/products/
http://rescale.com/products/enterprise/
http://rescale.com/products/advanced/
http://rescale.com/products/elements/
http://rescale.com/products/government/
http://rescale.com/products/universities/
http://rescale.com/products/developer/
http://rescale.com/features/
http://rescale.com/features/visualization/
http://rescale.com/security/
http://rescale.com/pricing/
http://rescale.com/infrastructure/
http://rescale.com/features/file-management/
http://rescale.com/features/admin-portal/
http://rescale.com/features/api/
http://rescale.com/features/workflow/
http://rescale.com/products/
http://rescale.com/aws/
http://rescale.com/azure/
http://rescale.com/gcp/
http://rescale.com/ibm/
http://rescale.com/ansys/
http://rescale.com/avl/
http://rescale.com/siemens/
http://rescale.com/solutions/
http://rescale.com/solutions/engineers-and-scientists/
http://rescale.com/solutions/cxos/
http://rescale.com/solutions/hpc-management/
http://rescale.com/solutions/academia/
http://rescale.com/solutions/startups/
http://rescale.com/solutions/aerospace/
http://rescale.com/solutions/automotive/
http://rescale.com/solutions/consumer-products/
http://rescale.com/solutions/eda-and-electronics/
http://rescale.com/solutions/oil-and-gas/
http://rescale.com/solutions/life-sciences/
http://rescale.com/solutions/autonomous-driving/
http://rescale.com/solutions/automation-api-cli/
http://rescale.com/solutions/cae-mdo/
http://rescale.com/solutions/digital-twin/
http://rescale.com/solutions/driver-assistance-adas/
http://rescale.com/solutions/disaster-recovery/
http://rescale.com/solutions/iot-big-data/
http://rescale.com/solutions/machine-learning/
http://rescale.com/solutions/spdm/
http://rescale.com/storage/
http://rescale.com/software/
http://rescale.com/partners/
http://rescale.com/partners-list/
https://resources.rescale.com/
https://docs.rescale.com/
https://resources.rescale.com/?wpv-resource-type=white-paper
http://rescale.com/about/
http://rescale.com/investors/
https://resources.rescale.com//blog
https://resources.rescale.com//news
https://resources.rescale.com//events
http://rescale.com/jobs/
http://rescale.com/legal/
http://rescale.com/signup/
http://rescale.com/login/
http://rescale.com/products/pro/
http://rescale.com/booking/
http://info.rescale.com/contact_sales
https://resources.rescale.com/events/
https://resources.rescale.com/news/
https://resources.rescale.com/rescale-enables-faster-time-to-market-for-nissan/
https://resources.rescale.com/announcements/rescale-announces-strategic-partnership-offering-with-siemens-plm/
https://resources.rescale.com/announcements/rescale-announces-innovations-to-accelerate-time-to-results/
https://resources.rescale.com/announcements/rescale-receives-2018-hpcwire-editors-choice-award-for-best-hpc-in-the-cloud-platform/
https://resources.rescale.com/blog
https://www.linkedin.com/company/rescale/
https://twitter.com/rescaleinc
https://www.facebook.com/rescaleinc/
Start time of crawl of this URL: 2020-03-13 22:07:38.585727075 +0000 UTC m=+0.776250248
Stop time of crawl of this URL : 2020-03-13 22:07:39.326520789 +0000 UTC m=+1.517043764
http://rescale.com/chs/
http://rescale.com/chs/
http://rescale.com/
http://rescale.com/jp/
http://rescale.com/kr/
http://rescale.com/chs/products/
http://rescale.com/chs/products/enterprise/
http://rescale.com/chs/products/advanced/
http://rescale.com/chs/products/elements/
http://rescale.com/chs/products/government/
http://rescale.com/chs/products/universities/
http://rescale.com/chs/products/developer/
http://rescale.com/chs/features/
http://rescale.com/chs/features/visualization/
http://rescale.com/chs/security/
http://rescale.com/chs/pricing/
http://rescale.com/chs/infrastructure/
http://rescale.com/chs/features/file-management/
http://rescale.com/chs/features/admin-portal/
http://rescale.com/chs/features/api/
http://rescale.com/chs/features/workflow/
http://rescale.com/chs/
http://rescale.com/chs/aws/
http://rescale.com/chs/azure/
http://rescale.com/chs/gcp/
http://rescale.com/chs/ibm/
http://rescale.com/chs/ansys/
http://rescale.com/chs/avl/
http://rescale.com/chs/siemens/
http://rescale.com/chs/solutions/
http://rescale.com/chs/solutions/engineers-and-scientists/
http://rescale.com/chs/solutions/cxos/
http://rescale.com/chs/solutions/hpc-management/
http://rescale.com/chs/solutions/academia/
http://rescale.com/chs/solutions/startups/
http://rescale.com/chs/solutions/aerospace/
http://rescale.com/chs/solutions/automotive/
http://rescale.com/chs/solutions/consumer-products/
http://rescale.com/chs/solutions/eda-and-electronics/
http://rescale.com/chs/solutions/oil-and-gas/
http://rescale.com/chs/solutions/life-sciences/
http://rescale.com/chs/solutions/autonomous-driving/
http://rescale.com/chs/solutions/automation-api-cli/
http://rescale.com/chs/solutions/cae-mdo/
http://rescale.com/chs/solutions/digital-twin/
http://rescale.com/chs/solutions/driver-assistance-adas/
http://rescale.com/chs/solutions/disaster-recovery/
http://rescale.com/chs/solutions/iot-big-data/
http://rescale.com/chs/solutions/machine-learning/
http://rescale.com/chs/solutions/spdm/
http://rescale.com/chs/storage/
http://rescale.com/chs/software/
http://rescale.com/chs/partners/
http://rescale.com/chs/partners-list/
https://resources.rescale.com/
https://docs.rescale.com/
https://resources.rescale.com/?wpv-resource-type=white-paper
http://rescale.com/chs/about/
http://rescale.com/chs/investors/
https://resources.rescale.com//blog
https://resources.rescale.com//news
https://resources.rescale.com//events
http://rescale.com/chs/jobs/
http://rescale.com/chs/legal/
http://rescale.com/signup/
http://rescale.com/login/
https://info.rescale.com/case-studies/nissan
https://info.rescale.com/white-papers/cloud-3.0-the-rise-of-big-compute
https://info.rescale.com/case-studies/dinex-reduces-time-to-market-of-exhaust-systems-by-25-percent
https://www.youtube.com/watch?v=05HfJ8dZJXE
https://info.rescale.com/case-studies/boom-supersonic
https://www.youtube.com/watch?v=umiGy7fe5zc
https://www.youtube.com/watch?v=h1nsUGuklHw
https://www.youtube.com/watch?v=tPaq3Hmeg5Y
https://resources.rescale.com/?wpv-resource-type=video
https://resources.rescale.com/resource/a3-project-vahana-rescale-power-personal-flight/
https://resources.rescale.com/boom-technology-leverages-rescale-platform-to-enable-a-rebirth-of-supersonic-passenger-travel/
https://resources.rescale.com/resource/the-need-for-speed-drives-nascars-richard-childress-racing-to-the-cloud/
https://support.rescale.com/customer/en/portal/articles/2778993-trek-bicycle-uses-rescale-to-run-cutting-edge-coupled-optimization-analysis
http://rescale.com/chs/booking/
https://resources.rescale.com/events/
https://resources.rescale.com/blog
http://info.rescale.com/contact_sales
https://www.linkedin.com/company/rescale/
https://twitter.com/rescaleinc
https://www.facebook.com/rescaleinc/
Start time of crawl of this URL: 2020-03-13 22:07:38.5852948 +0000 UTC m=+0.775817962
Stop time of crawl of this URL : 2020-03-13 22:07:39.337943349 +0000 UTC m=+1.528466402
http://rescale.com/jp/
http://rescale.com/jp/
http://rescale.com/
http://rescale.com/chs/
http://rescale.com/kr/
http://rescale.com/jp/products/
http://rescale.com/jp/products/enterprise/
http://rescale.com/jp/products/advanced/
http://rescale.com/jp/products/elements/
http://rescale.com/jp/products/government/
http://rescale.com/jp/products/universities/
http://rescale.com/jp/products/developer/
http://rescale.com/jp/features/
http://rescale.com/jp/features/visualization/
http://rescale.com/jp/security/
http://rescale.com/jp/pricing/
http://rescale.com/jp/infrastructure/
http://rescale.com/jp/features/file-management/
http://rescale.com/jp/features/admin-portal/
http://rescale.com/jp/features/api/
http://rescale.com/jp/features/workflow/
http://rescale.com/jp/
http://rescale.com/jp/aws/
http://rescale.com/jp/azure/
http://rescale.com/jp/gcp/
http://rescale.com/jp/ibm/
http://rescale.com/jp/ansys/
http://rescale.com/jp/avl/
http://rescale.com/jp/siemens/
http://rescale.com/jp/solutions/
http://rescale.com/jp/solutions/engineers-and-scientists/
http://rescale.com/jp/solutions/cxos/