Euterpe

Euterpe is a web crawler that searches a website for internal and external broken links. The crawler is written in Python; the demo dashboard was bootstrapped with Create React App.

Available Scripts

Crawler

Requires Scrapy to be installed: pip install scrapy.

  • scrapy crawl check_anchor_tags -t <type> -o <filename>

    Note: this is only a prototype; it currently checks anchor tags only.

    Runs the crawler and logs its output to a file.
    -t json -o file.json logs output to file.json.
    -t csv -o file.csv logs output to file.csv.
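
    The spider's source isn't reproduced here, but as a rough sketch of the technique: a Scrapy spider that follows every anchor on a page and reports non-2xx responses could look like the following. The start URL is a placeholder (the real one is hardcoded, per the To Do list) and the output fields are illustrative.

        import scrapy


        class CheckAnchorTagsSpider(scrapy.Spider):
            """Follow every <a href> found on the start page and report broken links."""

            name = "check_anchor_tags"
            start_urls = ["https://example.com"]  # placeholder; the project hardcodes its target
            handle_httpstatus_list = list(range(400, 600))  # let error responses reach callbacks

            def parse(self, response):
                # Queue every anchor href, remembering which page it was found on.
                for href in response.css("a::attr(href)").getall():
                    yield response.follow(href, callback=self.check_link,
                                          meta={"found_on": response.url})

            def check_link(self, response):
                # Anything with a 4xx/5xx status is reported as a broken-link item.
                if response.status >= 400:
                    yield {
                        "url": response.url,
                        "status": response.status,
                        "found_on": response.meta["found_on"],
                    }

    Run inside a Scrapy project, scrapy crawl check_anchor_tags -t json -o broken.json would then write one record per broken link.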

Demo Dashboard

Go into the demo-dashboard folder and run any of the following commands:

  • npm start

    Runs the app in development mode.
    Open http://localhost:3000 to view it in the browser.

    The page will reload if you make edits.
    You will also see any lint errors in the console.

  • npm test

    Launches the test runner in interactive watch mode.
    See the Create React App documentation on running tests for more information.

  • npm run build

    Builds the app for production to the build folder.
    It correctly bundles React in production mode and optimizes the build for the best performance.

    The build is minified and the filenames include the hashes.
    Your app is ready to be deployed!

    See the Create React App documentation on deployment for more information.

  • npm run eject

    Note: this is a one-way operation. Once you eject, you can’t go back!

    If you aren’t satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.

    Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc.) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.

    You don’t ever have to use eject. The curated feature set is suitable for small and mid-sized deployments, and you shouldn’t feel obligated to use this feature. However, we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.

To Do

  1. Map data with clicks data.
  2. Allow input of the website URL (currently hardcoded).
  3. Allow checking of img src links.
  4. Allow checking of script source links, CSS stylesheet links, favicons, etc.
  5. Allow customisable settings for download timeout, DNS timeout, number of retries, etc. (see the sketch after this list).
  6. Improve error handling.
  7. Extract crawler statistics data (already collected, but currently only printed to the console).
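
Item 5 maps naturally onto Scrapy's built-in settings; a minimal sketch of what a configurable settings.py could expose (the setting names are standard Scrapy, the values are illustrative):

    # settings.py -- illustrative values; all names are standard Scrapy settings
    DOWNLOAD_TIMEOUT = 15   # seconds before a download attempt fails (Scrapy default: 180)
    DNS_TIMEOUT = 10        # seconds allowed for DNS resolution (default: 60)
    RETRY_ENABLED = True
    RETRY_TIMES = 3         # extra attempts after the first failure (default: 2)
    RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]  # statuses worth retrying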

Credits

This is a Government Digital Services (GDS) GovTech Hackweek Jan 2020 project. Team members involved are:

  • Cecilia Lim
  • Chan Win Hung
  • Cheong Jie Wei
  • Dave Quah
  • Lim Kim Yong
