Handles varying types of bib file encodings #336

wants to merge 3 commits into


2 participants

This pull request addresses issue #335

The added code reads the first 32 bytes of a bib file and tries to determine the file's encoding from them. If it succeeds, it uses the detected encoding; otherwise it falls back to UTF-8 as the default.

It fixes issue #335 for the following encodings:

  • UTF-8 with BOM
  • UTF-16-LE with BOM
  • UTF-16-BE with BOM
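
For readers who want to see the idea in code, here is a minimal sketch of BOM-based detection along the lines described above. It is not the code from this pull request: the function name `detect_bib_encoding`, its `default` parameter, and the choice of returning the `utf-8-sig` and `utf-16` codec names are assumptions made for illustration.

```python
import codecs

# Hypothetical helper; the name and return values are assumptions,
# not the code from this pull request.
def detect_bib_encoding(path, default="utf-8"):
    # Read only the first 32 bytes, as described above; the context
    # manager closes the file as soon as the read is done.
    with open(path, "rb") as f:
        raw = f.read(32)
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # codec that strips the UTF-8 BOM on decode
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # codec that reads the BOM to pick the byte order
    return default          # no recognized BOM: fall back to UTF-8
```

The returned name could then be passed to `open(path, encoding=...)` so that the BOM is consumed by the codec rather than ending up in the decoded text.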

chid commented on 4f1eaa1 Mar 2, 2014

typo of the comment, addtion => addition

probably not too important, but could you close raw after you use it? perhaps just above the bibf line

otherwise looks good!

raw is not a file object, so it cannot be closed. I am by no means a Python guru, but I assume Python's GC takes care of the, let us say, anonymous file object after the 32 bytes are read.

chid commented Mar 3, 2014

Sorry my mistake, you are correct! (irrelevant information, some implementations of python may not close it though)

chid commented Mar 3, 2014

I suppose with a little bit of extra code you can guarantee that it closes after use (rather than waiting for Python's GC to do the work, though it possibly won't make much difference in real-life usage)

with open(bibfname, 'rb') as f:
    raw = f.read(32)
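
To illustrate the point about guaranteed closing, the following self-contained sketch checks the file object's `closed` attribute after the `with` block. The temporary file and its contents are stand-ins invented for this example, not part of the pull request.

```python
import os
import tempfile

# Hypothetical stand-in for the bib file, so the sketch runs on its own.
fd, bibfname = tempfile.mkstemp(suffix=".bib")
os.write(fd, b"@article{key, title={Example}}")
os.close(fd)

with open(bibfname, "rb") as f:
    raw = f.read(32)

# The with statement closes the file on exit, without relying on the GC.
file_closed = f.closed

os.remove(bibfname)
```

This holds on any Python implementation, whereas closing an unreferenced file object through garbage collection depends on when the collector runs.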

iamolivinius deleted the iamolivinius:st3 branch Jul 30, 2015
