Generic library functions

CleanHTML
clean an HTML source by removing attributes and styles, returning the raw content
FileTypeCheck
check whether the type of the file at a given source path matches a given file type
DateInSlice
check if a given date is in the given slice
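
As an illustration of the membership check described above, here is a minimal sketch. The signature, the use of `time.Time`, and the choice to compare calendar days rather than exact instants are all assumptions, not taken from the library; `sampleDates` is a hypothetical fixture.

```go
package main

import (
	"fmt"
	"time"
)

// sampleDates is a hypothetical fixture used only for illustration.
var sampleDates = []time.Time{
	time.Date(2024, time.January, 15, 0, 0, 0, 0, time.UTC),
	time.Date(2024, time.March, 2, 0, 0, 0, 0, time.UTC),
}

// DateInSlice reports whether d falls on the same calendar day as any
// entry in dates. Assumed signature; the library's own may differ.
func DateInSlice(d time.Time, dates []time.Time) bool {
	y, m, day := d.Date()
	for _, c := range dates {
		cy, cm, cd := c.Date()
		if y == cy && m == cm && day == cd {
			return true
		}
	}
	return false
}

func main() {
	// Same day as the second entry, different time of day.
	probe := time.Date(2024, time.March, 2, 18, 30, 0, 0, time.UTC)
	fmt.Println(DateInSlice(probe, sampleDates))
}
```

Comparing year, month and day ignores the time of day; if exact instants matter, `time.Time.Equal` would be the comparison to use instead.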
DownloadFile
download a file given the source and destination
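
A download helper of this kind could look roughly like the sketch below, assuming the source is an HTTP(S) URL and the destination is a local file path. The signature and the status-code check are assumptions; the URL and file name in `main` are placeholders.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// DownloadFile fetches src over HTTP and writes the response body to dest.
// Assumed signature; the library's own may differ.
func DownloadFile(src, dest string) error {
	resp, err := http.Get(src)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()

	// Stream the body to disk instead of buffering it in memory.
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	// Placeholder URL and path, for illustration only.
	if err := DownloadFile("https://example.com/", "example.html"); err != nil {
		fmt.Println("download failed:", err)
	}
}
```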
EnsureDirectory
create the directory if it does not exist
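
In Go's standard library, `os.MkdirAll` already has "create if missing" semantics, so a helper like this may be a thin wrapper. The sketch below assumes a single path argument and `0o755` permissions; neither is confirmed by the library.

```go
package main

import (
	"fmt"
	"os"
)

// EnsureDirectory creates path, including any missing parents.
// Assumed signature and permissions; the library's own may differ.
func EnsureDirectory(path string) error {
	// MkdirAll returns nil if path already exists as a directory.
	return os.MkdirAll(path, 0o755)
}

func main() {
	// Placeholder path, for illustration only.
	if err := EnsureDirectory("output/reports"); err != nil {
		fmt.Println("could not create directory:", err)
	}
}
```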
ExtractDomain
extract the main domain from a given source path
ExtractFileName
extract filename from a given source path
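
One way to extract a file name, assuming the source path may be either a URL or a slash-separated file path, is sketched below. The signature is an assumption, and Windows-style backslash paths are not handled.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// ExtractFileName returns the last path element of src.
// Assumed signature; the library's own may differ.
func ExtractFileName(src string) string {
	// For URLs, keep only the path so the query and fragment are dropped.
	if u, err := url.Parse(src); err == nil && u.Path != "" {
		src = u.Path
	}
	return path.Base(src)
}

func main() {
	fmt.Println(ExtractFileName("https://example.com/docs/report.pdf?v=2"))
	fmt.Println(ExtractFileName("/tmp/data/archive.zip"))
}
```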
FixUrl
convert relative URLs to absolute URLs
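
Resolving a relative URL requires a base URL to resolve against. The sketch below assumes the base is passed in explicitly and uses `net/url`'s `ResolveReference`; the two-argument signature is an assumption, not the library's documented API.

```go
package main

import (
	"fmt"
	"net/url"
)

// FixUrl resolves ref against base and returns the absolute URL.
// Assumed signature; the library's own may differ.
func FixUrl(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	// ResolveReference leaves already-absolute references unchanged.
	return b.ResolveReference(r).String(), nil
}

func main() {
	abs, err := FixUrl("https://example.com/blog/post.html", "../img/logo.png")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(abs)
}
```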
HTMLStringToDoc
convert an HTML string to a queryable document
Maximum
return the maximum of a slice of positive numbers
Minimum
return the minimum of a slice of positive numbers
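
`Maximum` and `Minimum` might be implemented along the lines below. The `[]int` element type and the choice to return 0 for an empty slice are assumptions; since the inputs are described as positive, 0 can serve as an unambiguous "no value" result.

```go
package main

import "fmt"

// Maximum returns the largest value in nums, or 0 if nums is empty.
// Assumed signature; the library's own may differ.
func Maximum(nums []int) int {
	if len(nums) == 0 {
		return 0
	}
	best := nums[0]
	for _, n := range nums[1:] {
		if n > best {
			best = n
		}
	}
	return best
}

// Minimum returns the smallest value in nums, or 0 if nums is empty.
func Minimum(nums []int) int {
	if len(nums) == 0 {
		return 0
	}
	lowest := nums[0]
	for _, n := range nums[1:] {
		if n < lowest {
			lowest = n
		}
	}
	return lowest
}

func main() {
	nums := []int{7, 3, 12, 5}
	fmt.Println(Maximum(nums), Minimum(nums))
}
```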
ObjectIdInSlice
check if a given object ID exists in a given slice
ParseCategoriesString
convert a categories string into a slice
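
The format of the categories string is not specified here. The sketch below assumes a comma-separated list and trims whitespace around each entry; both the delimiter and the signature are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseCategoriesString splits a comma-separated string into trimmed,
// non-empty entries. The comma delimiter is an assumption.
func ParseCategoriesString(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(ParseCategoriesString("news, sports,, tech "))
}
```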
ParsePdf
read and extract content from a PDF at a given source file path
ProcessNameString
standardize titles to make them URL-compatible by removing error-prone characters
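
Which characters count as error-prone is not stated. A common approach, sketched here as an assumption rather than the library's actual rule, is to lowercase the title and collapse every run of characters outside `a-z` and `0-9` into a single hyphen.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters other than lowercase ASCII letters and digits.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// ProcessNameString turns a title into a URL-friendly slug.
// Assumed behavior; the library's own rules may differ.
func ProcessNameString(title string) string {
	s := strings.ToLower(strings.TrimSpace(title))
	s = nonAlnum.ReplaceAllString(s, "-")
	// Drop hyphens left over at either end.
	return strings.Trim(s, "-")
}

func main() {
	fmt.Println(ProcessNameString("Hello, World! (2024)"))
}
```

Note that this sketch also strips non-ASCII letters, which may not be desirable for titles in other languages.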
StringContainsAnyInSlice
check if a given string is contained in any string in a given slice
StringInSlice
check if a given string exists in a given slice
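
A linear scan is enough for this check. The sketch below assumes exact, case-sensitive matching and an argument order of needle first; both are assumptions.

```go
package main

import "fmt"

// StringInSlice reports whether s is an element of list.
// Assumed signature; the library's own may differ.
func StringInSlice(s string, list []string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(StringInSlice("go", []string{"c", "go", "rust"}))
}
```

On Go 1.21 and later, `slices.Contains` from the standard library performs the same check.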
StringMatchPercentage
check the similarity percentage of two given strings
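
The similarity metric is not specified. One plausible choice, used here purely as an assumption, is normalized Levenshtein distance: 100 × (1 − distance / length of the longer string). `levenshtein` and `min3` are hypothetical helpers.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b using two rows.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// StringMatchPercentage returns a similarity score between 0 and 100.
// Assumed metric and signature; the library's own may differ.
func StringMatchPercentage(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 100 // two empty strings are treated as identical
	}
	return 100 * (1 - float64(levenshtein(ra, rb))/float64(longest))
}

func main() {
	fmt.Printf("%.1f\n", StringMatchPercentage("kitten", "sitting"))
}
```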