Document metadata properties #33

Closed
mguinness opened this issue Apr 17, 2019 · 8 comments

@mguinness
Contributor

mguinness commented Apr 17, 2019

When the includeProperties option is set to true the metadata is included in the output. Would it be possible to expose a new property on the FilterReader class as a dictionary? I can put a PR together if you have no objections.

@Sicos1977
Owner

Sure ... if you make it optional

@Sicos1977
Owner

I just released a new package https://www.nuget.org/packages/IFilterTextReader/1.6.1

@mguinness
Contributor Author

New package works great, thanks!

@mantis
Contributor

mantis commented May 1, 2019

@Sicos1977 and @mguinness, the only problem with this is that metadata properties can be duplicated, e.g.

Names: foo
Names: bar

In this scenario, the dictionary throws a "key already exists" exception. I'll log a separate issue for this as well.

@mguinness
Contributor Author

Out of interest, what is the filtdump output for an example file? I imagine the tags are coming from different sections in the file. Changing the field type to List<KeyValuePair<string, object>> would work.
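
To make the suggestion concrete, here is a minimal sketch of the difference between the two field types. It does not use the library itself: the class name, the method names and the `Emitted` array are hypothetical stand-ins for whatever the filter returns, and the "foo"/"bar" values are taken from the example earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateMetadataSketch
{
    // Hypothetical stand-in for the properties a filter emits, in chunk order.
    public static readonly (string Name, object Value)[] Emitted =
    {
        ("Names", "foo"),
        ("Names", "bar"),
    };

    // Dictionary<TKey, TValue>.Add throws ArgumentException when the key
    // is already present, which is the failure described above.
    public static bool DictionaryRejectsDuplicates()
    {
        var properties = new Dictionary<string, object>();
        try
        {
            foreach (var (name, value) in Emitted)
                properties.Add(name, value);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // A list of pairs keeps every value, including repeats of the same name.
    public static List<KeyValuePair<string, object>> AsPairList()
    {
        return Emitted
            .Select(p => new KeyValuePair<string, object>(p.Name, p.Value))
            .ToList();
    }
}
```

The trade-off is that a list gives up keyed lookup, so a caller who wants all values for one name has to filter the list (or group it, for instance with `ToLookup`).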

@mantis
Contributor

mantis commented May 27, 2019

@mguinness - sorry, I didn't get back to this sooner. In this case it's the same section, but the 'different sections' case is also a problem. Here is the filtdump output, followed by the XMP from the file:

CHUNK: ---------------------------------------------------------------
Attribute = {2C443B1E-F1E2-404F-974D-E21FEF8E70AA}\Names
idChunk = 13
BreakType = 2 (Sentence)
Flags (chunkstate) = (Value)
Locale = 2057 (0x809)
IdChunkSource = 13
cwcStartSource = 0
cwcLenSource = 0

VALUE: ---------------------------------------------------------------
Type = 31 (0x1f), VT_LPWSTR
Value = "Test A"

CHUNK: ---------------------------------------------------------------
Attribute = {2C443B1E-F1E2-404F-974D-E21FEF8E70AA}\Names
idChunk = 14
BreakType = 2 (Sentence)
Flags (chunkstate) = (Value)
Locale = 2057 (0x809)
IdChunkSource = 14
cwcStartSource = 0
cwcLenSource = 0

VALUE: ---------------------------------------------------------------
Type = 31 (0x1f), VT_LPWSTR
Value = "Test B"


  <rdf:Description rdf:about=""
        xmlns:TestSchema="http://test">
     <TestSchema:Names>
        <rdf:Bag>
           <rdf:li>Test A</rdf:li>
           <rdf:li>Test B</rdf:li>
        </rdf:Bag>
     </TestSchema:Names>
  </rdf:Description>

Now, whilst we changed to <string, object> (and I'm going to look at this again soon), I seem to recall thinking that including the schema in the output would be useful. I'm pretty sure I found that the string key becomes just 'Names'. So if one purpose is to let an application filter on a specific property, the metadata property output doesn't let you tell apart the same name coming from different paths when there is a conflict. For example, I have:

[image]

Where we have System.Title, title and Title.

One of them is dc:title, another is TestSchema:Title, and presumably System.Title is the default document title outside the metadata. I think this is the issue that you were getting at?
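
One way to picture "including the schema" is to key each value on the property set GUID that filtdump prints in the Attribute line, together with the name. The sketch below is only an illustration under that assumption, not the library's API: the class, the `Collect` method and the tuple shapes are hypothetical, and `OtherSet` is a made-up GUID standing in for a second schema.

```csharp
using System;
using System.Collections.Generic;

public static class QualifiedKeySketch
{
    // Property set GUID taken from the filtdump output above.
    public static readonly Guid TestSchemaSet =
        new Guid("2C443B1E-F1E2-404F-974D-E21FEF8E70AA");

    // Made-up GUID, used only to stand in for a different schema.
    public static readonly Guid OtherSet =
        new Guid("11111111-2222-3333-4444-555555555555");

    // Groups values by (property set, name), so the same name under two
    // schemas stays separate and repeated chunks accumulate in a list.
    public static Dictionary<(Guid PropertySet, string Name), List<object>> Collect(
        IEnumerable<(Guid PropertySet, string Name, object Value)> chunks)
    {
        var result = new Dictionary<(Guid PropertySet, string Name), List<object>>();
        foreach (var chunk in chunks)
        {
            var key = (chunk.PropertySet, chunk.Name);
            if (!result.TryGetValue(key, out var values))
            {
                values = new List<object>();
                result.Add(key, values);
            }
            values.Add(chunk.Value);
        }
        return result;
    }
}
```

With the two "Names" chunks from the dump, this would yield a single entry holding "Test A" and "Test B", while a "Names" property from another property set would get its own entry.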

@mguinness
Contributor Author

Thanks for the reply. The example you cited seems more like an array of names. Can you upload a small example document?

@mantis
Contributor

mantis commented Aug 17, 2019

@mguinness - it was indeed an array of names. Sample image uploaded below (hopefully GitHub doesn't modify it):

[attached image: pixel]
