A simple Dart program that scrapes a website's HTML, CSS, and JS files along with its assets.

aulolua/webscraper

webscraper

This Dart program downloads the source of a website (its HTML, CSS, JavaScript, and images) from a given URL and saves the files to a chosen output directory.

Here's a breakdown of what the code does:

  1. The asynchronous main function starts by prompting the user for the website URL and the output directory.
  2. After obtaining the input values, the program calls the extractSourceCode function with the website URL and output directory as parameters.
  3. The extractSourceCode function creates the output directory if it doesn't exist and uses the Dio package to make an HTTP GET request to the provided URL.
  4. If the response status code indicates a redirect (>= 300 and < 400), the function follows it and records the final URL.
  5. The HTML content of the webpage is parsed, and the main HTML file is saved in the output directory.
  6. CSS, JavaScript, and image files are identified within the HTML content and downloaded using the downloadFile function. The URLs for these resources are resolved relative to the base URL if they are not absolute URLs. The downloaded files are saved into the specified output directory.
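The URL resolution in step 6 can be illustrated with Dart's built-in Uri class. This is a minimal sketch rather than the repository's actual code: the helper names resolveResource and localFileName, the example URLs, and the choice to name each file after its last path segment are all assumptions made for illustration.

```dart
/// Resolves [reference] (an href or src value) against [baseUrl].
/// Absolute references are returned unchanged; relative ones are
/// interpreted relative to the page they were found on.
Uri resolveResource(Uri baseUrl, String reference) =>
    baseUrl.resolve(reference);

/// Picks a local file name from the last non-empty path segment.
/// This naming scheme is an assumption, not taken from the project.
String localFileName(Uri resourceUrl) {
  final segments = resourceUrl.pathSegments.where((s) => s.isNotEmpty);
  return segments.isEmpty ? 'index' : segments.last;
}

void main() {
  // Hypothetical page URL and resource references.
  final base = Uri.parse('https://example.com/blog/post.html');
  const references = [
    'css/site.css', // relative to the page's directory
    '/js/app.js', // relative to the site root
    'https://cdn.example.org/logo.png', // already absolute
  ];

  for (final reference in references) {
    final resolved = resolveResource(base, reference);
    print('$reference -> $resolved (${localFileName(resolved)})');
  }
}
```

Under these assumptions, css/site.css resolves to https://example.com/blog/css/site.css, /js/app.js resolves to https://example.com/js/app.js, and the CDN URL is left as it is.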

In summary, this Dart program fetches a specified website and saves its HTML, CSS, JavaScript, and images to a chosen directory, preserving the relative structure and content of the original website.
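The first part of that flow (prompt, fetch, save the main HTML file) might look roughly like the sketch below. It is a hedged reconstruction, not the repository's code: the saveHtml helper and the index.html file name are assumptions, and the sketch relies on Dio following redirects by default instead of handling redirect status codes by hand. It requires the dio package as a dependency.

```dart
import 'dart:io';

import 'package:dio/dio.dart';

/// Writes [html] to index.html inside [outputDirectory], creating the
/// directory first if needed. Helper and file name are assumptions.
Future<File> saveHtml(String html, String outputDirectory) async {
  await Directory(outputDirectory).create(recursive: true);
  final file = File('$outputDirectory/index.html');
  return file.writeAsString(html);
}

Future<void> main() async {
  // Prompt for the two inputs described in step 1.
  stdout.write('Enter the website URL: ');
  final websiteUrl = stdin.readLineSync()?.trim() ?? '';
  stdout.write('Enter the output directory: ');
  final outputDirectory = stdin.readLineSync()?.trim() ?? '';

  if (websiteUrl.isEmpty || outputDirectory.isEmpty) {
    stderr.writeln('Both a URL and an output directory are required.');
    exitCode = 64;
    return;
  }

  // Request the page as plain text. Dio follows redirects by default,
  // and realUri reports the URL that finally served the response.
  final response = await Dio().get<String>(
    websiteUrl,
    options: Options(responseType: ResponseType.plain),
  );

  final file = await saveHtml(response.data ?? '', outputDirectory);
  print('Saved ${response.realUri} to ${file.path}');
}
```

Keeping the file-writing step in its own function separates the network call from disk access, which makes the saving logic easy to exercise without a live website.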