
A script that takes a file containing a list of URLs and scrapes each URL with goroutines.


8ugr4/scrape


// UPDATE 19.06.24 21:20: net/url is not enough to scrape tags.

// use net/http, and maybe an HTML parser, to parse the HTML response
// and extract the charset from it
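A minimal sketch of the charset-extraction step: the declared charset can usually be read from the `Content-Type` response header with the standard `mime` package. The helper name `charsetOf` is illustrative, not from this repo.

```go
package main

import (
	"fmt"
	"mime"
)

// charsetOf extracts the charset parameter from a Content-Type header
// value, falling back to "utf-8" when none is declared or parsing fails.
func charsetOf(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "utf-8"
	}
	if cs, ok := params["charset"]; ok {
		return cs
	}
	return "utf-8"
}

func main() {
	fmt.Println(charsetOf("text/html; charset=ISO-8859-1")) // ISO-8859-1
	fmt.Println(charsetOf("text/html"))                     // utf-8
}
```

For charsets inside the document itself (a `<meta charset=...>` tag), an HTML parser would still be needed, as the note above suggests.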

// goal:
// input: a list of URLs in a text file (.txt), UTF-8
// output: one line per URL in the format "url":""

// ::: WORKING STRUCTURE OF THE PROGRAM :::
// read the file, save the URLs

// http scraping

// use goroutines to scrape every URL

// use channels (buffered with the total number of goroutines)

// while the READER goroutines are scraping, use one goroutine to carry the results to the output file

// use another goroutine to convert the input taken from the READER goroutines into the expected format

// format is: "url":""
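The formatting and writer steps might be sketched as follows; `formatLine` and the single writer goroutine are illustrative names under the structure described above, not the repo's code, and a `strings.Builder` stands in for the output file.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLine renders one scraped page in the "url":"" format.
func formatLine(url, body string) string {
	return fmt.Sprintf("%q:%q", url, body)
}

func main() {
	lines := make(chan string, 3) // buffered with the goroutine count
	done := make(chan struct{})
	var out strings.Builder // stands in for the output file

	// single writer goroutine: carries formatted lines to the output
	go func() {
		for l := range lines {
			out.WriteString(l + "\n")
		}
		close(done)
	}()

	lines <- formatLine("https://example.com", "")
	close(lines)
	<-done
	fmt.Print(out.String()) // "https://example.com":""
}
```

`%q` quotes and escapes both fields, so a body containing quotes or newlines still produces one well-formed output line.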

// target address URL

// EXAMPLE OUTPUT AT THE MOMENT (21:40):
/*

scrapedInput:= https://drstearns.github.io/tutorials/gojson/

scrapedInput:= https://leangaurav.medium.com/common-mistakes-when-using-golangs-sync-waitgroup-88188556ca54

scrapedInput:= https://stackoverflow.com/questions/48271388/for-loop-with-buffered-channel

Process finished with the exit code 0

*/
