Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] i.fusion.hpf - change file encoding from us-ascii to utf-8 #525

Closed
hellik opened this issue May 7, 2021 · 21 comments
Closed

[Bug] i.fusion.hpf - change file encoding from us-ascii to utf-8 #525

hellik opened this issue May 7, 2021 · 21 comments

Comments

@hellik
Copy link
Member

hellik commented May 7, 2021

Name of the addon
i.fusion.hpf

it seems not all text/python files in the i.fusion.hpf source are utf-8 encoded

$ file --mime-encoding *
Makefile:                        us-ascii
README.md:                       utf-8
constants.py:                    utf-8
high_pass_filter.py:             us-ascii
i.fusion.hpf.html:               us-ascii
i.fusion.hpf.py:                 us-ascii
i_fusion_hpf_lsat7_hpf_rgb.png:  binary
i_fusion_hpf_lsat7_orig_rgb.png: binary
licenses:                        binary
test_high_pass_filter.py:        us-ascii

this causes a g.extension crash in winGRASS

Expected behavior
no g.extension error

solution
encoding of files should changed to utf-8

@hellik
Copy link
Member Author

hellik commented May 7, 2021

though not sure when looking at other addons, e.g.

$ file --mime-encoding *
Makefile:                 us-ascii
i.sentinel.download.html: us-ascii
i.sentinel.download.py:   us-ascii

source downloaded via zip file from github

@neteler
Copy link
Member

neteler commented May 8, 2021

I have done a full repo check and grouped and counted files with identical encoding:

cd grass-addons/grass7/

# charset=us-ascii
for i in $(find . -type f) ; do file -i $i ; done | grep ascii | wc -l
2290

# charset=utf-8
for i in $(find . -type f) ; do file -i $i ; done | grep utf | wc -l
90

Likely it should be streamlined...

@hellik
Copy link
Member Author

hellik commented May 8, 2021

lets have closer look into the files

D:\temp\testencoding\grass-addons-master\grass7\imagery\i.fusion.hpf>cat constants.py
#!/usr/bin/env python
#-*- coding:utf-8 -*-

"""
@author: Nikos Alexandris | October 2014
"""

"""
Constants for the HPFA Image Fusion Technique:
Kernel Size, Center Value, Modulation Factor (all depend on Resolution Ratio).

Sources:

- "Optimizing the High-Pass Filter Addition Technique for Image Fusion",
Ute G. Gangkofner, Pushkar S. Pradhan, and Derrold W. Holcomb (2008).

- â?oERDAS IMAGINE.â?? Accessed March 19, 2015. <==

these quotings are suspicious:

- “ERDAS IMAGINE.” Accessed March 19, 2015.

@hellik
Copy link
Member Author

hellik commented May 8, 2021

these quotings are suspicious:

- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

with these quotes

>>> for r, d, f in os.walk(srcdir):
...    for file in f:
...       if file.endswith(".py"):
...          pyfiles.append(os.path.join(r, file))
...
>>> pyfiles
['D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\constants.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\high_pass_filter.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\i.fusion.hpf.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\test_high_pass_filter.py']
>>> for filename in pyfiles:
...    with fileinput.FileInput(filename, inplace=True) as file:
...       for line in file:
...          print(line.replace("#!/usr/bin/env python\n", "#!/usr/bin/env python3\n"), end="",)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "C:\OSGEO4~1\apps\Python37\lib\fileinput.py", line 252, in __next__
    line = self._readline()
  File "C:\OSGEO4~1\apps\Python37\lib\fileinput.py", line 366, in _readline
    return self._readline()
  File "C:\OSGEO4~1\apps\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 401: character maps to <undefined>

- “ERDAS IMAGINE.” Accessed March 19, 2015. changed to - "ERDAS IMAGINE." Accessed March 19, 2015. <= look at the changed quotes ;-)

>>> pyfiles
['D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\constants.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\high_pass_filter.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\i.fusion.hpf.py', 'D:\\temp\\testencoding\\grass-addons-master\\grass7\\imagery\\i.fusion.hpf\\test_high_pass_filter.py']
>>> for filename in pyfiles:
...    with fileinput.FileInput(filename, inplace=True) as file:
...       for line in file:
...          print(line.replace("#!/usr/bin/env python\n", "#!/usr/bin/env python3\n"), end="",)

this line.replace mechanism also used by g.extension works now.

so no file encoding change is needed for a working i.fusion.hpf installation in winGRASS; only quote types should be changed.

in a long term, encoding streamlining should be discussed.

@neteler
Copy link
Member

neteler commented May 8, 2021

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

I have changed such characters in #526, please try.

@nilason
Copy link
Contributor

nilason commented May 9, 2021

@hellik

would the following change of g.extensions.pymake any difference:

# replace:
with fileinput.FileInput(filename, inplace=True) as file:

# with:
with fileinput.FileInput(filename, inplace=True, openhook=fileinput.hook_encoded("utf-8")) as file:

@ninsbl
Copy link
Member

ninsbl commented May 9, 2021

@hellik

would the following change of g.extensions.pymake any difference:

# replace:
with fileinput.FileInput(filename, inplace=True) as file:

# with:
with fileinput.FileInput(filename, inplace=True, openhook=fileinput.hook_encoded("utf-8")) as file:

THanks for the suggestion @nilason . Unfortunately, openhook is not supported with inplace editing:
ValueError: FileInput cannot use an opening hook in inplace mode

But I also think it could/should be adressed at that line in g.extension as non-ascii characters might be required in module code. A possible workaround (without inplace editing) is mentioned here: https://stackoverflow.com/questions/25203040/combining-inplace-filtering-and-the-setting-of-encoding-in-the-fileinput-module

I could add such a solution to OSGeo/grass#1565 hopefuly a bit later today...

@ninsbl
Copy link
Member

ninsbl commented May 9, 2021

With this function for shebang replacement:

def replace_shebang(file):
    """"""
    tmp_name = 'tmp.txt'

    with codecs.open(file, 'r', encoding='utf8') as fi, \
         codecs.open(tmp_name, 'w', encoding='utf8') as fo:

        for line in fi:
            new_line = line.replace("#!/usr/bin/env python\n","#!/usr/bin/env python3\n")
            fo.write(new_line)

    os.remove(file) # remove original
    os.rename(tmp_name, file) # rename temp to original name    

i.fusion.hpf installs just fine...
Will add it to my PR...

@NikosAlexandris
Copy link
Member

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

Sorry for the late reply. Indeed, these quotes likely come from a copy-paste from some webpage.

@NikosAlexandris
Copy link
Member

only quote types should be changed.

I will have to go through the whole conversation. Is it only the erroneous quotes that have to be 'fixed'?

@hellik
Copy link
Member Author

hellik commented May 10, 2021

only quote types should be changed.

I will have to go through the whole conversation. Is it only the erroneous quotes that have to be 'fixed'?

Yes

@nilason
Copy link
Contributor

nilason commented May 10, 2021

Just installed i.fusion.hpf successfully with 7.9 on Windows, using only latest version of OSGeo/grass#1565 (with the new replace_shebang_win() function). No need for other changes for this.

@NikosAlexandris
Copy link
Member

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

Erasing the double quotes, and adding new double quotes, supposedly "normal" ones, does not show any git diff!
Is this normal?

@NikosAlexandris
Copy link
Member

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

Erasing the double quotes, and adding new double quotes, supposedly "normal" ones, does not show any git diff!
Is this normal?

I have done some change to the quotes, naive replacement to single quotes, then back to double quotes, hopefully correct ones.
Now, I am not sure I can split the pull request as I have some more changes in the master branch.

I don't know if it's possible to make a cherry-pick pull request. I guess I will have to create a new branch, roll back the commits I did in master, and only re-apply changes of interest (here: changes that concern the 'quotes' issue). Then re-do a pull request.

Is that right? Can someone confirm?

@neteler
Copy link
Member

neteler commented May 12, 2021

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

Erasing the double quotes, and adding new double quotes, supposedly "normal" ones, does not show any git diff!
Is this normal?

I guess no. Because the the proposed changes here, e.g. in grass7/imagery/i.fusion.hpf/constants.py:

https://github.com/OSGeo/grass-addons/pull/526/files#diff-1cf0c220e1a6bfdbcaf36aab0fb5026e515a949fd97f52c061752a881fbaa2e3L14

@neteler
Copy link
Member

neteler commented May 12, 2021

I have done some change to the quotes, naive replacement to single quotes, then back to double quotes, hopefully correct ones.
Now, I am not sure I can split the pull request as I have some more changes in the master branch.

Which PR do you refer to?

@nilason
Copy link
Contributor

nilason commented May 12, 2021

To return to the original reported issue of ascii vs. utf-8 encoding of files.

UTF-8 is fully compatible with ASCII, to quote wikipedia: "It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well."

With this in mind I found it suspect that file -i reports seemingly random variation in encoding of the files in source directory. The following test:

echo "asdf“" > a.txt | file -i a.txt
a.txt: text/plain; charset=utf-8
echo "asdf" > b.txt | file -i b.txt
b.txt: text/plain; charset=us-ascii

... suggests that file -i shows "utf-8" if there are characters in the file exceeding the ascii range, and "us-ascii" if only ascii range of characters are present.

I'd say there is probably no problem in general with the file encodings in the code base, but on Windows there are cases when reading python files doesn't default on assuming "utf-8". This has to be explicitly requested. The English style quoting character is a perfectly valid utf-8 character which shouldn't need to be changed to a universal "character.

@NikosAlexandris
Copy link
Member

these quotings are suspicious:
- “ERDAS IMAGINE.” Accessed March 19, 2015.

bingo, these suspicious quotes causes the crash mentioned in OSGeo/grass#1496 (comment)

Erasing the double quotes, and adding new double quotes, supposedly "normal" ones, does not show any git diff!
Is this normal?

I guess no. Because the the proposed changes here, e.g. in grass7/imagery/i.fusion.hpf/constants.py:

https://github.com/OSGeo/grass-addons/pull/526/files#diff-1cf0c220e1a6bfdbcaf36aab0fb5026e515a949fd97f52c061752a881fbaa2e3L14

Hmm? Not sure why I don't see a diff.

@NikosAlexandris
Copy link
Member

I have done some change to the quotes, naive replacement to single quotes, then back to double quotes, hopefully correct ones.
Now, I am not sure I can split the pull request as I have some more changes in the master branch.

Which PR do you refer to?

An 'old' draft -- #510

@nilason
Copy link
Contributor

nilason commented May 19, 2021

I believe this ticket may be closed as 'invalid'. See #525 (comment).

@NikosAlexandris
Copy link
Member

Closing it after @nilason's suggestion. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants