So what exactly is HTML Agility Pack, and why is it used? It is often necessary to read, or in technical terms parse, an HTML document whose source may be a file, a string, or a web resource. HTML Agility Pack is a .NET library that lets C# developers read and write the DOM (Document Object Model), with explicit support for plain XPath and XSLT.
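The three sources mentioned above (a string, a file, a web address) can be sketched as follows. This is a minimal illustration, not the only way to load a document; the class name LoadSources and its method names are made up for this example, while HtmlDocument.LoadHtml, HtmlDocument.Load and HtmlWeb.Load are the library calls being shown.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class LoadSources
{
    // Parse markup that is already in memory.
    public static HtmlDocument FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Parse a file on disk; the path is supplied by the caller.
    public static HtmlDocument FromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);
        return doc;
    }

    // Download and parse a page; this needs network access.
    public static HtmlDocument FromWeb(string url)
    {
        return new HtmlWeb().Load(url);
    }

    public static void Main()
    {
        HtmlDocument doc = FromString("<html><head><title>Sample</title></head><body></body></html>");
        // Print the text of the <title> element.
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title").InnerText);
    }
}
```

Whichever source is used, the result is the same HtmlDocument type, so the querying code that follows does not need to know where the markup came from.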
- First, open the HtmlAgilityPack package page on NuGet.
- Under the Package Manager section, copy the install command. For example, if it shows PM> Install-Package HtmlAgilityPack -Version x.x.x, copy only the text that follows PM>.
- After copying the command, go to Visual Studio and click the Tools menu in the menu bar.
- From the drop-down, go to NuGet Package Manager (Library Package Manager in older Visual Studio versions) → Package Manager Console.
- The Package Manager Console opens in the lower half of the window with the cursor blinking.
- Paste the command you copied in step 2 with Ctrl+V.
- Press Enter, and Visual Studio will take care of the installation.
HtmlWeb web = new HtmlWeb();
// download the page and parse it into a document
HtmlAgilityPack.HtmlDocument doc = web.Load("https://technologycrowds.com");
HTML Agility Pack is a capable tool: as shown, it can traverse the entire HTML content of a web page from C#, and the same API makes it easy to manipulate that content.
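As a rough illustration of the manipulation side, the sketch below changes one attribute and removes some elements before serializing the document again. The class name EditExample, the method name Rewrite, and the choice of editing a elements and script elements are assumptions made for this example only.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class EditExample
{
    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Change an attribute on the first <a> element, if there is one.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a");
        if (link != null)
        {
            link.SetAttributeValue("href", "https://technologycrowds.com");
        }

        // Remove every <script> element. SelectNodes returns null when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts != null)
        {
            foreach (HtmlNode script in scripts)
            {
                script.Remove();
            }
        }

        // Serialize the modified tree back to markup.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("<div><a href='#'>Home</a><script>alert(1)</script></div>"));
    }
}
```

The edits happen on the in-memory tree; nothing is written anywhere until the markup is read back through OuterHtml or saved explicitly.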
using System;
using HtmlAgilityPack;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    public static void Main()
    {
        // declaring & loading dom
        HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load("https://en.wikipedia.org/wiki/Main_Page");
        // filter html elements on the basis of class name
        IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where(n => n.HasClass("mw-jump-link"));
        foreach (var item in nodes)
        {
            // displaying final output
            Console.WriteLine(item.InnerText);
        }
    }
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

// load the page and print its meta description
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://technologyCrowds.com");
GetMetaInformation(doc, "description");

// prints the content attribute of the <meta> tag with the given name
static void GetMetaInformation(HtmlAgilityPack.HtmlDocument htmldoc, string value)
{
    HtmlNode tcNode = htmldoc.DocumentNode.SelectSingleNode("//meta[@name='" + value + "']");
    if (tcNode != null)
    {
        // the content attribute may be missing, so check before reading it
        HtmlAttribute desc = tcNode.Attributes["content"];
        if (desc != null)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.Write(desc.Value);
        }
        Console.ReadLine();
    }
}
var html = @"<TD>
</TD>
<TD>
<INPUT value=Technology>
<INPUT value=Crowds>
</TD>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// select every <input> that is a direct child of a <td>
var nodes = htmlDoc.DocumentNode.SelectNodes("//td/input");
foreach (var node in nodes)
{
    Console.WriteLine(node.Attributes["value"].Value);
}
- Technology
- Crowds
SelectSingleNode is a method that takes an XPath expression and returns the first matching HtmlAgilityPack.HtmlNode. It returns null if no node matches.
var html = @"<TD>
</TD>
<TD>
<INPUT value=Technology>
<INPUT value=Crowds>
</TD>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// first <input> under a <td>, or null when there is none
var node = htmlDoc.DocumentNode.SelectSingleNode("//td/input");
if (node != null)
{
    Console.WriteLine(node.Attributes["value"].Value);
}
- Technology
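Because a query may match nothing, it can help to wrap the lookups in small null-safe helpers. The sketch below is one way to do that; the class name SafeSelect and its method names are invented for this example. It relies on SelectNodes also returning null (not an empty collection) when nothing matches, and on GetAttributeValue returning the supplied default when the attribute is absent.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class SafeSelect
{
    // Returns the value attribute of the first td/input, or null when no such node exists.
    public static string FirstInputValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode input = doc.DocumentNode.SelectSingleNode("//td/input");
        // GetAttributeValue falls back to the given default if the attribute is missing.
        return input == null ? null : input.GetAttributeValue("value", string.Empty);
    }

    // Returns how many td/input nodes the markup contains.
    public static int CountInputs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection inputs = doc.DocumentNode.SelectNodes("//td/input");
        // SelectNodes yields null, not an empty collection, when nothing matches.
        return inputs == null ? 0 : inputs.Count;
    }

    public static void Main()
    {
        string html = "<td><input value=Technology><input value=Crowds></td>";
        Console.WriteLine(FirstInputValue(html));
        Console.WriteLine(CountInputs(html));
        Console.WriteLine(CountInputs("<p>no inputs here</p>"));
    }
}
```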
var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");
foreach (var node in htmlNodes)
{
    // InnerHtml keeps the markup nested inside each <p>
    Console.WriteLine(node.InnerHtml);
}
- This is <b>C#, ASP.Net</b> paragraph
- This is <b>HTML Agility Pack</b> sample
var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");
foreach (var node in htmlNodes)
{
    // InnerText drops the tags and keeps only the text
    Console.WriteLine(node.InnerText);
}
- This is C#, ASP.Net paragraph
- This is HTML Agility Pack sample
var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");
foreach (var node in htmlNodes)
{
    Console.WriteLine(node.OuterHtml);
}
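To make the difference between the three properties used above concrete, the following sketch reads InnerHtml, InnerText and OuterHtml from the same paragraph node. The class name NodeViews and the method name Describe are assumptions for this example.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class NodeViews
{
    // Returns { InnerHtml, InnerText, OuterHtml } for the first //body/p node.
    public static string[] Describe(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode p = doc.DocumentNode.SelectSingleNode("//body/p");
        return new[] { p.InnerHtml, p.InnerText, p.OuterHtml };
    }

    public static void Main()
    {
        string[] views = Describe("<body><p>This is <b>HTML Agility Pack</b> sample</p></body>");
        Console.WriteLine(views[0]); // This is <b>HTML Agility Pack</b> sample
        Console.WriteLine(views[1]); // This is HTML Agility Pack sample
        Console.WriteLine(views[2]); // <p>This is <b>HTML Agility Pack</b> sample</p>
    }
}
```

In short, InnerHtml is the markup between the node's tags, InnerText is the same content with tags stripped, and OuterHtml includes the node's own tags as well.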
var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var node = htmlDoc.DocumentNode.SelectSingleNode("//body/h1");
HtmlNode parentNode = node.ParentNode;
Console.WriteLine(parentNode.Name);
- body
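ParentNode can also be followed repeatedly to walk from a node up to the top of the tree. The sketch below collects the element names along the way; the class name AncestorWalk and the method name PathToRoot are made up for this example, and it assumes only that ParentNode is null once the top of the document is reached.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class AncestorWalk
{
    // Collects node names starting at the given node and moving upward.
    public static List<string> PathToRoot(HtmlNode node)
    {
        var names = new List<string>();
        for (HtmlNode current = node; current != null; current = current.ParentNode)
        {
            names.Add(current.Name);
        }
        return names;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<body><p>This is <b>HTML Agility Pack</b> sample</p></body>");
        HtmlNode bold = doc.DocumentNode.SelectSingleNode("//b");
        // Starts with b, p, body; the final entry is the document node itself.
        Console.WriteLine(string.Join(" > ", PathToRoot(bold)));
    }
}
```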