Find or replace (synchronously or asynchronously) attribute values of elements (e.g. img src) in an HTML string, for the browser and Node.js.
This is a new library and its stability has not been battle-tested. If you find a bug, please report it.
Size: 8.4 kB minified, 3.5 kB minified and gzipped.
npm i html-find-replace-element-attrs
Find all values of an attribute of a type of element.
htmlFindReplaceElementAttrs.find(
`<div><img src='a.jpg' alt=''><img src="b.jpg" alt=''><img src=c.jpg alt=''></div>`,
{ tag: "img", attr: "src" }
)
produces:
[
{
"value": "a.jpg",
"index": 15,
"quoteType": "'"
},
{
"value": "b.jpg",
"index": 39,
"quoteType": "\""
},
{
"value": "c.jpg",
"index": 62,
"quoteType": " "
}
]
If the attribute value is a URL, find can also resolve relative URLs for you (set parseAttrValueAsUrl to true and specify baseUrl in options):
htmlFindReplaceElementAttrs.find(
"<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
{ tag: "img", attr: "src", parseAttrValueAsUrl: true, baseUrl: "http://www.example.com/hello/world/" },
)
produces:
[
{
"value": "../a.jpg",
"index": 15,
"quoteType": "'",
"parsedUrl": "http://www.example.com/hello/a.jpg"
},
{
"value": "/b.jpg",
"index": 42,
"quoteType": "'",
"parsedUrl": "http://www.example.com/b.jpg"
},
{
"value": "c.jpg",
"index": 67,
"quoteType": "'",
"parsedUrl": "http://www.example.com/hello/world/c.jpg"
}
]
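The parsedUrl values above match standard relative-URL resolution against baseUrl. The sketch below reproduces them with the WHATWG URL constructor, which is available in browsers and Node.js. It only illustrates the resolution rules and is not a claim about how the library resolves URLs internally.

```javascript
const baseUrl = "http://www.example.com/hello/world/";

// Resolve each attribute value from the example against baseUrl
const resolved = ["../a.jpg", "/b.jpg", "c.jpg"].map(value => new URL(value, baseUrl).href);

console.log(resolved);
// http://www.example.com/hello/a.jpg
// http://www.example.com/b.jpg
// http://www.example.com/hello/world/c.jpg
```

Under these rules the trailing slash of baseUrl matters: without it, world is treated as a file name and c.jpg would resolve to http://www.example.com/hello/c.jpg.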
To parse protocol-independent URLs (those starting with //), set urlProtocol in options:
htmlFindReplaceElementAttrs.find(
"<div><img src='//example.com/a.jpg' alt=''><img src='//example.com/b.jpg' alt=''></div>",
{ tag: "img", attr: "src", parseAttrValueAsUrl: true, urlProtocol: "https" },
)
produces:
[
{
"value": "//example.com/a.jpg",
"index": 15,
"quoteType": "'",
"parsedUrl": "https://example.com/a.jpg"
},
{
"value": "//example.com/b.jpg",
"index": 53,
"quoteType": "'",
"parsedUrl": "https://example.com/b.jpg"
}
]
Options for find:
tag: The element to look for (e.g. "img").
attr: The attribute to look for (e.g. "src").
parseAttrValueAsUrl: If set to true, each item in the return value will also contain parsedUrl. Make sure to set baseUrl if the html contains relative URLs; otherwise, with parseAttrValueAsUrl turned on, an exception is thrown.
baseUrl: Used in conjunction with parseAttrValueAsUrl (e.g. "http://example.com/test/").
urlProtocol: Used when parsing protocol-independent URLs (defaults to the protocol of baseUrl, or "http").
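The default order for the protocol described above (urlProtocol first, then the protocol of baseUrl, then "http") can be sketched as follows. pickProtocol and completeUrl are hypothetical helper names used for illustration only; they are not part of the library's API.

```javascript
// Hypothetical helper: picks the protocol for a protocol-independent URL,
// following the documented default order (urlProtocol, then baseUrl, then "http")
function pickProtocol(options) {
  if (options.urlProtocol) return options.urlProtocol;
  // URL.protocol includes the trailing colon (e.g. "https:"), so strip it
  if (options.baseUrl) return new URL(options.baseUrl).protocol.replace(/:$/, "");
  return "http";
}

// Hypothetical helper: completes values that start with "//"
function completeUrl(value, options) {
  return value.startsWith("//") ? pickProtocol(options) + ":" + value : value;
}

console.log(completeUrl("//example.com/a.jpg", { urlProtocol: "https" }));
// https://example.com/a.jpg
console.log(completeUrl("//example.com/a.jpg", { baseUrl: "https://www.example.com/hello/" }));
// https://example.com/a.jpg
console.log(completeUrl("//example.com/a.jpg", {}));
// http://example.com/a.jpg
```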
The return value is an array of {value, index, quoteType, parsedUrl?} objects.
value: The value exactly as it appears in the html string.
index: Position of the value in the html string.
quoteType: The kind of quote surrounding the value. Can be ", ', or a single space. Caution: a space is used to indicate "no quotes", e.g. in <img width=100>.
parsedUrl: Present only when parseAttrValueAsUrl is set to true. It is the absolute URL that the value resolves to in the context you provide (baseUrl and urlProtocol). Useful when you need to act on the links in attribute values (e.g. when you need to download and localize all images referenced by img src).
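Since index points at the first character of the value and quoteType says what surrounds it, the two fields together locate the whole attribute value in the source string. A minimal sketch, with the items copied from the first find output in this document rather than produced by calling the library; valueWithQuotes is a hypothetical helper name.

```javascript
const html = `<div><img src='a.jpg' alt=''><img src="b.jpg" alt=''><img src=c.jpg alt=''></div>`;

// Items copied from the first find() output in this document
const items = [
  { value: "a.jpg", index: 15, quoteType: "'" },
  { value: "b.jpg", index: 39, quoteType: '"' },
  { value: "c.jpg", index: 62, quoteType: " " },
];

// Hypothetical helper: returns the value together with its surrounding quotes, if any
function valueWithQuotes(html, item) {
  const end = item.index + item.value.length;
  // quoteType " " means the value is unquoted, so there are no quote characters to include
  return item.quoteType === " "
    ? html.slice(item.index, end)
    : html.slice(item.index - 1, end + 1);
}

const spans = items.map(item => valueWithQuotes(html, item));
console.log(spans.join(" | "));
```

The space in quoteType is a marker only: the character before an unquoted value is the = sign, not a space.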
Replace values of an attribute of a type of element.
htmlFindReplaceElementAttrs.replace(
"<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
"http://www.abc.com/1.jpg",
{ tag: "img", attr: "src" },
)
produces:
<div><img src='http://www.abc.com/1.jpg' alt=''><img src='http://www.abc.com/1.jpg' alt=''><img src='http://www.abc.com/1.jpg' alt=''></div>
It works with callback functions.
htmlFindReplaceElementAttrs.replace(
"<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>",
item => item.value.toUpperCase(),
{ tag: "img", attr: "src" },
)
produces:
<div><img src='../A.JPG' alt=''><img src='/B.JPG' alt=''><img src='C.JPG' alt=''></div>
The parameter passed to the callback function is the same as an item in the return value of find (i.e. {value, index, quoteType, parsedUrl?}).
The options are the same as those of find, too; e.g. you may use parseAttrValueAsUrl.
htmlFindReplaceElementAttrs.replace(
"<div><img src='../a.jpg' alt=''><img src='//example2.com/b.jpg' alt=''></div>",
item => item.parsedUrl,
{
tag: "img",
attr: "src",
parseAttrValueAsUrl: true,
baseUrl: "https://www.example.com/hello/world/",
urlProtocol: "http",
},
)
produces:
<div><img src='https://www.example.com/hello/a.jpg' alt=''><img src='http://example2.com/b.jpg' alt=''></div>
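The relationship between find and replace described above can be sketched as follows: each find-style item is passed to the callback (or replaced by a fixed string), and the result is spliced back into the html at item.index. applyReplacements is a hypothetical helper for illustration, not the library's implementation, and the items are copied from the earlier find output.

```javascript
// Hypothetical helper: applies a string or callback replacer to find-style items
function applyReplacements(html, items, replacer) {
  let out = html;
  // Walk from the last match to the first so that earlier indices stay valid
  const ordered = [...items].sort((a, b) => b.index - a.index);
  for (const item of ordered) {
    const next = typeof replacer === "function" ? replacer(item) : replacer;
    out = out.slice(0, item.index) + next + out.slice(item.index + item.value.length);
  }
  return out;
}

const html = "<div><img src='../a.jpg' alt=''><img src='/b.jpg' alt=''><img src='c.jpg' alt=''></div>";
const items = [
  { value: "../a.jpg", index: 15, quoteType: "'" },
  { value: "/b.jpg", index: 42, quoteType: "'" },
  { value: "c.jpg", index: 67, quoteType: "'" },
];

const upper = applyReplacements(html, items, item => item.value.toUpperCase());
console.log(upper);
// <div><img src='../A.JPG' alt=''><img src='/B.JPG' alt=''><img src='C.JPG' alt=''></div>
```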
It can also add quotes automatically when the original value is unquoted and the new value needs them, as in this example where the new value contains a space.
htmlFindReplaceElementAttrs.replace(
"<div><img src=a.jpg></div>",
"hello world.jpg",
{
tag: "img",
attr: "src"
},
)
produces:
<div><img src="hello world.jpg"></div>
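The exact rule the library uses to decide when to add quotes is not stated here. One plausible rule, sketched below as an assumption, follows the HTML syntax for unquoted attribute values, which may not be empty or contain whitespace, quotes, =, <, > or a backtick. serializeValue is a hypothetical helper name.

```javascript
// Hypothetical helper: decides how a replacement value is written back
function serializeValue(newValue, quoteType) {
  // The source value was already quoted: the existing quotes stay in place
  if (quoteType !== " ") return newValue;
  // Characters that HTML does not allow in an unquoted attribute value
  const needsQuotes = newValue === "" || /[\s"'=<>`]/.test(newValue);
  // Assumption: wrap in double quotes and escape any double quote inside the value
  return needsQuotes ? '"' + newValue.replace(/"/g, "&quot;") + '"' : newValue;
}

console.log("<img src=" + serializeValue("hello world.jpg", " ") + ">");
// <img src="hello world.jpg">
console.log("<img src=" + serializeValue("b.jpg", " ") + ">");
// <img src=b.jpg>
```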
Options: same as those of find.
Return value: string
replaceAsync is the async version of replace.
It returns a Promise, and callbacks can return Promises too.
This is useful when you want to perform async operations during the replacement.
htmlFindReplaceElementAttrs.replaceAsync('<img src="./abc.jpg"/>',
_ => new Promise(resolve => {
setTimeout(() => resolve(_.value.toUpperCase()), 1000)
}),
{
tag: "img",
attr: "src",
}
)
produces a Promise that resolves after 1000 ms with the value:
<img src="./ABC.JPG"/>
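One way to picture the async behaviour is to wait for all callback results first and only then splice them into the html. The sketch below does this with Promise.all; replaceAsyncSketch is a hypothetical helper, the item is written by hand, and whether the library runs callbacks in parallel or in sequence is not stated in this document.

```javascript
// Hypothetical helper: resolves every replacement first, then splices them in.
// Assumes items are sorted by ascending index, as in the find output shown earlier.
async function replaceAsyncSketch(html, items, replacer) {
  // Callbacks may return plain values or Promises; Promise.all accepts both
  const replacements = await Promise.all(items.map(item => replacer(item)));
  let out = html;
  // Work from the last match backwards so earlier indices remain valid
  for (let i = items.length - 1; i >= 0; i--) {
    const { index, value } = items[i];
    out = out.slice(0, index) + replacements[i] + out.slice(index + value.length);
  }
  return out;
}

// Hand-written item for '<img src="./abc.jpg"/>': the value starts at offset 10
const items = [{ value: "./abc.jpg", index: 10, quoteType: '"' }];
const delayedUpper = item =>
  new Promise(resolve => setTimeout(() => resolve(item.value.toUpperCase()), 10));

const result = replaceAsyncSketch('<img src="./abc.jpg"/>', items, delayedUpper);
result.then(out => console.log(out)); // <img src="./ABC.JPG"/>
```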
Options: same as those of replace.
Return value: Promise<string>
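// Example: download every image referenced by img src and point the attribute at the local copy.
// Assumes axios is installed and that the package is loaded under the name used throughout this document.
const fs = require("fs");
const axios = require("axios");
const htmlFindReplaceElementAttrs = require("html-find-replace-element-attrs");

// html is the HTML string to process (defined elsewhere)
htmlFindReplaceElementAttrs.replaceAsync(
  html,
  // Replace each src with the path of the downloaded copy
  item => downloadImage(item.parsedUrl).then(downloadResult => downloadResult.path),
  { tag: "img", attr: "src", parseAttrValueAsUrl: true, baseUrl: "https://www.gravatar.com" },
).then(replacedHtml => {
  console.log(replacedHtml);
});

// Streams the image at imageUrl to a temporary file and resolves with its path
function downloadImage(imageUrl) {
  return axios({
    url: imageUrl,
    responseType: "stream",
  }).then(response => new Promise((resolve, reject) => {
    const tmpPath = "tmp_" + Math.random();
    const writer = fs.createWriteStream(tmpPath);
    response.data.pipe(writer);
    // Resolve once the file is fully written, not merely when the download stream ends
    writer.on("finish", () => resolve({ path: tmpPath }));
    // Surface download and write failures instead of leaving the Promise pending
    writer.on("error", reject);
    response.data.on("error", reject);
  }));
}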
MIT