title: How to identify the actual document type when the filename extension is not correct
description: This knowledge base article describes how to identify the actual document type when the filename extension is not correct
type: how-to
page_title: How to identify the actual document type when the filename extension is not correct
slug: how-to-identify-document-type
position: 0
tags: processing, file, filename, extension, incorrect
res_type: kb
| Product Version | Product | Author |
| --- | --- | --- |
| 2022.1.217 | WordsProcessing | Martin Velikov |

Description

This article describes how to identify the actual document type when the filename extension is incorrect, so that you can choose the appropriate format provider.

Solution

The following example reads two documents that both have the ".doc" filename extension but are actually different document types. The GetHeaderInfo method uses the StringBuilder class to build the document signature (header) as a hexadecimal string from the first 8 bytes of the file, and that string is then compared with predefined signature values. Once the actual document type is known, you can determine which format provider to use to import the document.

[C#] Example

{{region how-to-identify-document-type1}}

List<byte[]> documents = new List<byte[]>();
documents.Add(File.ReadAllBytes("rtf.doc"));
documents.Add(File.ReadAllBytes("doc.doc"));

foreach (byte[] document in documents)
{
	string headerCode = GetHeaderInfo(document).ToUpper();

	//! The signatures are taken from: https://www.filesignatures.net/index.php?page=search
	if (headerCode.StartsWith("7B5C72746631"))
	{
		//! The document is RTF
		RtfFormatProvider rtfFormatProvider = new RtfFormatProvider();
		using (MemoryStream stream = new MemoryStream(document))
		{
			RadFlowDocument rtfDocument = rtfFormatProvider.Import(stream);
		}
	}
	else if (headerCode.StartsWith("D0CF11E0A1B11AE1"))
	{
		//! The document is DOC
		DocFormatProvider docFormatProvider = new DocFormatProvider();
		RadFlowDocument docDocument = docFormatProvider.Import(document);
	}
}

{{endregion}}
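
The signature check above can also be separated from the import logic. The following is a minimal sketch rather than part of the original example: the `DocumentKind` enum, the `SignatureDetector` class, and its `Detect` method are hypothetical names introduced for illustration, and the code compares the raw bytes directly instead of building a hex string. It uses the same two signatures as the example above. Keep in mind that `D0 CF 11 E0 A1 B1 1A E1` identifies the OLE compound file container, which other legacy Office formats also use, so a match suggests a DOC file but does not prove it.

```csharp
// Hypothetical helper types, not part of any Telerik API.
public enum DocumentKind { Unknown, Rtf, Doc }

public static class SignatureDetector
{
    // ASCII bytes of "{\rtf1".
    private static readonly byte[] RtfSignature =
        { 0x7B, 0x5C, 0x72, 0x74, 0x66, 0x31 };

    // OLE compound file header, used by binary DOC files.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static DocumentKind Detect(byte[] data)
    {
        if (StartsWith(data, RtfSignature)) return DocumentKind.Rtf;
        if (StartsWith(data, OleSignature)) return DocumentKind.Doc;
        return DocumentKind.Unknown;
    }

    // True when data begins with every byte of signature, in order.
    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

With such a helper, the loop in the example could switch on `SignatureDetector.Detect(document)` and fall back to an error path for `DocumentKind.Unknown`, a case the original `if`/`else if` chain silently ignores.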

[C#] Getting document header

{{region how-to-identify-document-type1}}

private static string GetHeaderInfo(byte[] documentData)
{
	byte[] buffer = documentData.Take(8).ToArray();

	StringBuilder sb = new StringBuilder();
	foreach (byte b in buffer)
	{
		sb.Append(b.ToString("X2"));
	}

	return sb.ToString();
}

{{endregion}}
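
To see what `GetHeaderInfo` produces, the sketch below applies the same logic to an in-memory byte array instead of a file. The `HeaderDemo` class and the sample bytes are illustrative assumptions; the method body mirrors the one above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HeaderDemo
{
    // Same logic as GetHeaderInfo above: the first 8 bytes as uppercase hex.
    public static string GetHeaderInfo(byte[] documentData)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in documentData.Take(8))
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // ASCII bytes of a typical RTF opening: {\rtf1\ansi
        byte[] rtfSample = Encoding.ASCII.GetBytes(@"{\rtf1\ansi");

        string header = GetHeaderInfo(rtfSample);
        Console.WriteLine(header); // 7B5C727466315C61
        Console.WriteLine(header.StartsWith("7B5C72746631", StringComparison.Ordinal)); // True
    }
}
```

Because the `"X2"` format specifier already emits uppercase hexadecimal digits, the result can be compared against uppercase signature strings directly.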