./updateHostsFile.py --auto crashes when run under path contains CJK characters using Python 2.7.12 #316

Closed
Vdragon opened this issue May 21, 2017 · 32 comments

Comments

@Vdragon

Vdragon commented May 21, 2017

Console Output

> python updateHostsFile.py --auto
Traceback (most recent call last):                                                        
  File "updateHostsFile.py", line 790, in <module>                                        
    main()                                                                                
  File "updateHostsFile.py", line 147, in main                                            
    options["outputsubfolder"])                                                           
  File "updateHostsFile.py", line 765, in path_join_robust                                
    "likely a LOCALE issue:\n\n" + str(e))                                                
locale.Error: Unable to construct path. This is likely a LOCALE issue:                    

'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

Reporter's Environment

Working Directory

/home/<username>/工作空間/第三方專案/Unified hosts file with base extensions - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others.

Operating System

KDE Neon(based on Ubuntu 16.04 AMD64)

Python

Python 2.7.12

Unified hosts file with base extensions

commit 231dc43

Locale

LANG=zh_TW.UTF-8
LANGUAGE=zh_TW:zh_CN
LC_CTYPE="zh_TW.UTF-8"
LC_NUMERIC="zh_TW.UTF-8"
LC_TIME="zh_TW.UTF-8"
LC_COLLATE="zh_TW.UTF-8"
LC_MONETARY="zh_TW.UTF-8"
LC_MESSAGES="zh_TW.UTF-8"
LC_PAPER="zh_TW.UTF-8"
LC_NAME="zh_TW.UTF-8"
LC_ADDRESS="zh_TW.UTF-8"
LC_TELEPHONE="zh_TW.UTF-8"
LC_MEASUREMENT="zh_TW.UTF-8"
LC_IDENTIFICATION="zh_TW.UTF-8"
LC_ALL=
@gfyoung
Contributor

gfyoung commented May 21, 2017

@Vdragon : The scripts don't have a help message, which is why the --help flag is being ignored. Afterwards, updateHostsFile.py is called, leading to the failure you're seeing here.

gfyoung added a commit to forking-repos/hosts that referenced this issue May 21, 2017
gfyoung added a commit to forking-repos/hosts that referenced this issue May 22, 2017
@Vdragon
Author

Vdragon commented May 22, 2017

I just noticed the suspicious file in the same directory and tried to see what it can do...

Anyway IMO updateHostsFile.py shouldn't crash even when it is called in a wrong way

@Vdragon Vdragon changed the title ./makeHosts --help crashes when run under path contains CJK characters ./updateHostsFile.py --auto crashes when run under path contains CJK characters May 22, 2017
@gfyoung
Contributor

gfyoung commented May 22, 2017

@Vdragon : It's not a "wrong" way (the invocation is perfectly fine). Rather, a LOCALE issue is preventing the script from running when we try to construct the file path. I suspect it's because you are using Python 2.x instead of Python 3.x, which handles CJK characters MUCH BETTER than Python 2.x does.

@Vdragon
Author

Vdragon commented May 22, 2017

I've updated the issue, as it can be simplified. Unless we no longer support Python 2, it's still a valid issue.

@Vdragon Vdragon changed the title ./updateHostsFile.py --auto crashes when run under path contains CJK characters ./updateHostsFile.py --auto crashes when run under path contains CJK characters using Python 2.7.12 May 22, 2017
@gfyoung
Contributor

gfyoung commented May 22, 2017

It's more that it's an issue with Python 2.x that's making this very difficult to execute. Try running the following commands FROM the directory of updateHostsFile.py in the Python interpreter:

>>> import os
>>> current_dir = os.getcwd()
>>> os.path.join(current_dir, "data")

What happens?

@Vdragon
Author

Vdragon commented May 22, 2017

What happens?

> python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> current_dir = os.getcwd()
>>> os.path.join(current_dir, "data")
'/home/<username>/\xe5\xb7\xa5\xe4\xbd\x9c\xe7\xa9\xba\xe9\x96\x93/\xe7\xac\xac\xe4\xb8\x89\xe6\x96\xb9\xe5\xb0\x88\xe6\xa1\x88/Unified hosts file with base extensions - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others./data'

@gfyoung
Contributor

gfyoung commented May 22, 2017

Okay, can you add the following lines of code above this line in the script (lines 146-147) and run it?

print(BASEDIR_PATH)
print(options["outputsubfolder"])

# This line already exists in the code, just wanted to indicate where!
options["outputpath"] = path_join_robust(BASEDIR_PATH, options["outputsubfolder"])

I'd be interested to see what gets printed.

@Vdragon
Author

Vdragon commented May 22, 2017

> python updateHostsFile.py --auto
/home/<username>/工作空間/第三方專案/Unified hosts file with base extensions - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others.

Traceback (most recent call last):
  File "updateHostsFile.py", line 791, in <module>
    main()
  File "updateHostsFile.py", line 148, in main
    options["outputsubfolder"])
  File "updateHostsFile.py", line 766, in path_join_robust
    "likely a LOCALE issue:\n\n" + str(e))
locale.Error: Unable to construct path. This is likely a LOCALE issue:

'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

@gfyoung
Contributor

gfyoung commented May 22, 2017

Interesting...it's printing the CJK correctly now. Could you add two more lines as follows:

print(type(BASEDIR_PATH))
print(type(options["outputsubfolder"]))

print(BASEDIR_PATH)
print(options["outputsubfolder"])

# This line already exists in the code, just wanted to indicate where!
options["outputpath"] = path_join_robust(BASEDIR_PATH, options["outputsubfolder"])

@gfyoung
Contributor

gfyoung commented May 23, 2017

@Vdragon : Any updates on this one? I unfortunately can't debug this locally, so I would need your feedback to diagnose the issue (or figure out a workaround).

@Vdragon
Author

Vdragon commented May 23, 2017

Stuck on the indentation error for a bit, anyway:

(shell) python updateHostsFile.py --auto
<type 'str'>
<type 'unicode'>
/home/<username>/工作空間/第三方專案/Unified hosts file with base extensions - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others.

Traceback (most recent call last):
  File "updateHostsFile.py", line 794, in <module>
    main()
  File "updateHostsFile.py", line 151, in main
    options["outputsubfolder"])
  File "updateHostsFile.py", line 769, in path_join_robust
    "likely a LOCALE issue:\n\n" + str(e))
locale.Error: Unable to construct path. This is likely a LOCALE issue:

'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

@gfyoung
Contributor

gfyoung commented May 23, 2017

Okay. I think we can now replicate the os.path.join issue. Can you create a Python file (e.g. test.py) in the same directory as updateHostsFile.py with the following code:

import os

BASEDIR_PATH = os.path.dirname(os.path.realpath(__file__))
print(os.path.join(BASEDIR_PATH, unicode("")))

and then run this with Python 2.x?

@Vdragon
Author

Vdragon commented May 23, 2017

Weird.

(shell) python test.py 
  File "test.py", line 5
    
                                                 ^
SyntaxError: invalid syntax

@gfyoung
Contributor

gfyoung commented May 23, 2017

There is no line 5 even...can you cat test.py ? That shouldn't break with a SyntaxError.

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) xclip -out -selection clipboard >test.py
(shell) cat test.py 
import os

BASEDIR_PATH = os.path.dirname(os.path.realpath(__file__))
print(os.path.join(BASEDIR_PATH, unicode(""))
(shell) python test.py 
  File "test.py", line 5
    
                                                 ^
SyntaxError: invalid syntax
(shell) python --version
Python 2.7.12
(shell) 

@gfyoung
Contributor

gfyoung commented May 23, 2017

Hmm...not sure why I didn't catch it earlier...there's a parenthesis missing at the end...oops 😄

Just add a ) at the end of the last line and run again.

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) xclip -out -selection clipboard >test.py
(shell) python test.py 
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print(os.path.join(BASEDIR_PATH, unicode("")))
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
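For anyone following along: the `path += '/' + b` line in this traceback concatenates a byte string with a unicode string, and Python 2 handles that by decoding the byte string with the ASCII codec first. The sketch below mimics that decode step explicitly so it also runs on Python 3. The path is a shortened stand-in with a hypothetical user name, and `implicit_ascii_decode` is an illustrative name, not something from updateHostsFile.py.

```python
# Shortened stand-in for the UTF-8 encoded directory that Python 2
# hands back as a byte string (hypothetical user name).
raw_path = b"/home/user/\xe5\xb7\xa5\xe4\xbd\x9c\xe7\xa9\xba\xe9\x96\x93"


def implicit_ascii_decode(data):
    """Mimic the decode Python 2 performs when str meets unicode."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw_path)
    error_message = None
except UnicodeDecodeError as exc:
    # Same family of error as in the traceback above.
    error_message = str(exc)

print(error_message)
```

The first non-ASCII byte (0xe5) is where the decode gives up, which matches the byte reported in the traceback.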

@gfyoung
Contributor

gfyoung commented May 23, 2017

There we go! That's the error I was looking for. So now that we have isolated the error, can you try this:

import os

BASEDIR_PATH = unicode(os.path.dirname(os.path.realpath(__file__)))
print(os.path.join(BASEDIR_PATH, unicode("")))

If this works, can you find this line in updateHostsFile.py:

BASEDIR_PATH = os.path.dirname(os.path.realpath(__file__))

and replace it with

BASEDIR_PATH = unicode(os.path.dirname(os.path.realpath(__file__)))

and try running the script again?

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) xclip -out -selection clipboard >test.py
(shell) python test.py 
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    BASEDIR_PATH = unicode(os.path.dirname(os.path.realpath(__file__)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
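This attempt fails the same way because calling `unicode(...)` on a byte string without an encoding argument also falls back to ASCII in Python 2. A minimal sketch of the alternative, decoding with an explicit codec, is below. It assumes the path bytes are UTF-8 (consistent with the reporter's zh_TW.UTF-8 locale) and uses a shortened stand-in path; it is not the fix that was adopted in this thread.

```python
# Shortened stand-in for the reporter's directory as UTF-8 bytes
# (hypothetical user name; the CJK bytes come from the output above).
raw_path = b"/home/user/\xe5\xb7\xa5\xe4\xbd\x9c\xe7\xa9\xba\xe9\x96\x93"

# Decode with an explicit codec instead of relying on the ASCII default.
text_path = raw_path.decode("utf-8")

# Both operands are now text, so joining needs no implicit decode.
joined = text_path + u"/" + u"data"
```

The design choice here is to convert once at the boundary and keep everything as text afterwards, rather than converting each literal back to a byte string.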

@Vdragon
Author

Vdragon commented May 23, 2017

I reported similar issues on another Python application, Git Cola, a while ago; maybe the fixes for those issues can help:
https://github.com/git-cola/git-cola/issues?utf8=%E2%9C%93&q=is%3Aissue%20author%3AVdragon%20is%3Aclosed%20UnicodeDecodeError

@gfyoung
Contributor

gfyoung commented May 23, 2017

I've been reporting similar issues on Git Cola a while ago, maybe the fixes of those issues can help

I'll take a look, but how about trying one more thing:

import os

BASEDIR_PATH = os.path.dirname(os.path.realpath(__file__))
print(os.path.join(BASEDIR_PATH, str(unicode(""))))  # str(unicode(...)) is weird but just try

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) xclip -out -selection clipboard >test.py
(shell) python test.py 
/home/<username>/工作空間/第三方專案/Unified hosts file with base extensions - Extending and consolidating hosts files from several well-curated sources like adaway.org, mvps.org, malwaredomainlist.com, someonewhocares.org, and potentially others./

@gfyoung
Contributor

gfyoung commented May 23, 2017

Okay, so what you can try now is changing this line in updateHostsFile.py:

options["outputpath"] = path_join_robust(BASEDIR_PATH, options["outputsubfolder"])

to this:

options["outputpath"] = path_join_robust(BASEDIR_PATH, str(options["outputsubfolder"]))

and run the script again. I think it will still crash, but the stacktrace should be different now.

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) xdg-open updateHostsFile.py
(shell) git diff
diff --git a/updateHostsFile.py b/updateHostsFile.py
index dbe75cf..038ced9 100644
--- a/updateHostsFile.py
+++ b/updateHostsFile.py
@@ -143,8 +143,7 @@ def main():
 
     options = vars(parser.parse_args())
 
-    options["outputpath"] = path_join_robust(BASEDIR_PATH,
-                                             options["outputsubfolder"])
+    options["outputpath"] = path_join_robust(BASEDIR_PATH, str(options["outputsubfolder"]))
     options["freshen"] = not options["noupdate"]
 
     settings = get_defaults()
(shell) python updateHostsFile.py 
Traceback (most recent call last):
  File "updateHostsFile.py", line 789, in <module>
    main()
  File "updateHostsFile.py", line 149, in main
    settings = get_defaults()
  File "updateHostsFile.py", line 74, in get_defaults
    "datapath": path_join_robust(BASEDIR_PATH, "data"),
  File "updateHostsFile.py", line 764, in path_join_robust
    "likely a LOCALE issue:\n\n" + str(e))
locale.Error: Unable to construct path. This is likely a LOCALE issue:

'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
(shell)

@gfyoung
Contributor

gfyoung commented May 23, 2017

Awesome, so it looks like we need to ensure that the path components we are joining are all of type str (the string literals are currently unicode because we imported unicode_literals from __future__). So I think if you start walking through the script and wrap all string arguments to path_join_robust (except for BASEDIR_PATH) in str(...), the script will work. Let me know how that works.
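Instead of wrapping every call site, the same idea could live in one helper that coerces each component to a single type before joining. This is only a sketch of that approach: `to_bytes` and `join_as_bytes` are hypothetical names, UTF-8 is an assumed encoding, and the example path is a shortened stand-in. It is not the code from the eventual PR.

```python
import os


def to_bytes(component, encoding="utf-8"):
    """Return component as a byte string, encoding text if needed."""
    if isinstance(component, bytes):
        return component
    return component.encode(encoding)


def join_as_bytes(path, *paths):
    """Join path components after coercing every one of them to bytes."""
    return os.path.join(to_bytes(path), *[to_bytes(p) for p in paths])


# Byte-string base directory (as Python 2 returns it) joined with
# unicode literals (as unicode_literals produces them).
base = b"/home/user/\xe5\xb7\xa5\xe4\xbd\x9c"
result = join_as_bytes(base, u"data", u"hosts")
```

Because every operand has the same type by the time `os.path.join` sees it, no implicit ASCII decode is triggered.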

@Vdragon
Author

Vdragon commented May 23, 2017

(shell) git diff updateHostsFile.py | cat
diff --git a/updateHostsFile.py b/updateHostsFile.py
index dbe75cf..83fbac9 100644
--- a/updateHostsFile.py
+++ b/updateHostsFile.py
@@ -52,7 +52,7 @@ def write_data(f, data):
 
 def list_dir_no_hidden(path):
     # This function doesn't list hidden files
-    return glob(path_join_robust(path, "*"))
+    return glob(path_join_robust(str(path), str("*")))
 
 
 # Project Settings
@@ -71,13 +71,13 @@ def get_defaults():
 
     return {
         "numberofrules": 0,
-        "datapath": path_join_robust(BASEDIR_PATH, "data"),
+        "datapath": path_join_robust(BASEDIR_PATH, str("data")),
         "freshen": True,
         "replace": False,
         "backup": False,
         "skipstatichosts": False,
         "keepdomaincomments": False,
-        "extensionspath": path_join_robust(BASEDIR_PATH, "extensions"),
+        "extensionspath": path_join_robust(BASEDIR_PATH, str("extensions")),
         "extensions": [],
         "outputsubfolder": "",
         "hostfilename": "hosts",
@@ -87,16 +87,16 @@ def get_defaults():
         "sourcesdata": [],
         "readmefilename": "readme.md",
         "readmetemplate": path_join_robust(BASEDIR_PATH,
-                                           "readme_template.md"),
+                                           str("readme_template.md")),
         "readmedata": {},
         "readmedatafilename": path_join_robust(BASEDIR_PATH,
-                                               "readmeData.json"),
+                                               str("readmeData.json")),
         "exclusionpattern": "([a-zA-Z\d-]+\.){0,}",
         "exclusionregexs": [],
         "exclusions": [],
         "commonexclusions": ["hulu.com"],
-        "blacklistfile": path_join_robust(BASEDIR_PATH, "blacklist"),
-        "whitelistfile": path_join_robust(BASEDIR_PATH, "whitelist")}
+        "blacklistfile": path_join_robust(BASEDIR_PATH, str("blacklist")),
+        "whitelistfile": path_join_robust(BASEDIR_PATH, str("whitelist"))}
 
 
 def main():
@@ -143,8 +143,7 @@ def main():
 
     options = vars(parser.parse_args())
 
-    options["outputpath"] = path_join_robust(BASEDIR_PATH,
-                                             options["outputsubfolder"])
+    options["outputpath"] = path_join_robust(BASEDIR_PATH, str(options["outputsubfolder"]))
     options["freshen"] = not options["noupdate"]
 
     settings = get_defaults()
@@ -174,9 +173,9 @@ def main():
     finalize_file(final_file)
 
     if settings["ziphosts"]:
-        zf = zipfile.ZipFile(path_join_robust(settings["outputsubfolder"],
-                                              "hosts.zip"), mode='w')
-        zf.write(path_join_robust(settings["outputsubfolder"], "hosts"),
+        zf = zipfile.ZipFile(path_join_robust(str(settings["outputsubfolder"]),
+                                              str("hosts.zip")), mode='w')
+        zf.write(path_join_robust(str(settings["outputsubfolder"]), str("hosts")),
                  compress_type=zipfile.ZIP_DEFLATED, arcname='hosts')
         zf.close()
 
@@ -192,9 +191,9 @@ def main():
 # Prompt the User
 def prompt_for_update():
     # Create hosts file if it doesn't exists
-    if not os.path.isfile(path_join_robust(BASEDIR_PATH, "hosts")):
+    if not os.path.isfile(path_join_robust(BASEDIR_PATH, str("hosts"))):
         try:
-            open(path_join_robust(BASEDIR_PATH, "hosts"), "w+").close()
+            open(path_join_robust(BASEDIR_PATH, str("hosts")), "w+").close()
         except:
             print_failure("ERROR: No 'hosts' file in the folder,"
                           "try creating one manually")
@@ -317,8 +316,8 @@ def update_all_sources():
             updated_file = updated_file.replace("\r", "")
 
             hosts_file = open(path_join_robust(BASEDIR_PATH,
-                                               os.path.dirname(source),
-                                               settings["hostfilename"]), "wb")
+                                               str(os.path.dirname(source)),
+                                               str(settings["hostfilename"])), "wb")
             write_data(hosts_file, updated_file)
             hosts_file.close()
         except:
@@ -346,12 +345,12 @@ def create_initial_file():
     # spin the sources for extensions to the base file
     for source in settings["extensions"]:
         for filename in recursive_glob(path_join_robust(
-                settings["extensionspath"], source), settings["hostfilename"]):
+                str(settings["extensionspath"]), str(source)), settings["hostfilename"]):
             with open(filename, "r") as curFile:
                 write_data(merge_file, curFile.read())
 
         for update_file_path in recursive_glob(path_join_robust(
-                settings["extensionspath"], source),
+                str(settings["extensionspath"]), str(source)),
                 settings["sourcedatafilename"]):
             update_file = open(update_file_path, "r")
             update_data = json.load(update_file)
@@ -379,7 +378,7 @@ def remove_dups_and_excl(merge_file):
         os.makedirs(settings["outputpath"])
 
     # Another mode is required to read and write the file in Python 3
-    final_file = open(path_join_robust(settings["outputpath"], "hosts"),
+    final_file = open(path_join_robust(str(settings["outputpath"]), str("hosts")),
                       "w+b" if PY3 else "w+")
 
     merge_file.seek(0)  # reset file pointer
@@ -479,7 +478,7 @@ def write_opening_header(final_file):
     write_data(final_file, "# Fetch the latest version of this file: "
                            "https://raw.githubusercontent.com/"
                            "StevenBlack/hosts/master/" +
-               path_join_robust(settings["outputsubfolder"], "") + "hosts\n")
+               path_join_robust(str(settings["outputsubfolder"]), str("")) + "hosts\n")
     write_data(final_file, "# Project home page: https://github.com/"
                            "StevenBlack/hosts\n#\n")
     write_data(final_file, "# ==============================="
@@ -499,7 +498,7 @@ def write_opening_header(final_file):
             write_data(final_file, "127.0.0.53 " + socket.gethostname() + "\n")
         write_data(final_file, "\n")
 
-    preamble = path_join_robust(BASEDIR_PATH, "myhosts")
+    preamble = path_join_robust(BASEDIR_PATH, str("myhosts"))
     if os.path.isfile(preamble):
         with open(preamble, "r") as f:
             write_data(final_file, f.read())
@@ -513,7 +512,7 @@ def update_readme_data():
         extensions_key = "-".join(settings["extensions"])
 
     generation_data = {"location": path_join_robust(
-        settings["outputsubfolder"], ""),
+        str(settings["outputsubfolder"]), str("")),
                        "entries": settings["numberofrules"],
                        "sourcesdata": settings["sourcesdata"]}
     settings["readmedata"][extensions_key] = generation_data
@@ -639,13 +638,13 @@ def flush_dns_cache():
 # Hotfix since merging with an already existing
 # hosts file leads to artifacts and duplicates
 def remove_old_hosts_file():
-    old_file_path = path_join_robust(BASEDIR_PATH, "hosts")
+    old_file_path = path_join_robust(BASEDIR_PATH, str("hosts"))
     # create if already removed, so remove wont raise an error
     open(old_file_path, "a").close()
 
     if settings["backup"]:
-        backup_file_path = path_join_robust(BASEDIR_PATH, "hosts-{}".format(
-            time.strftime("%Y-%m-%d-%H-%M-%S")))
+        backup_file_path = path_join_robust(BASEDIR_PATH, str("hosts-{}".format(
+            time.strftime("%Y-%m-%d-%H-%M-%S"))))
 
         # Make a backup copy, marking the date in which the list was updated
         shutil.copy(old_file_path, backup_file_path)
@@ -733,7 +732,7 @@ def recursive_glob(stem, file_pattern):
         matches = []
         for root, dirnames, filenames in os.walk(stem):
             for filename in fnmatch.filter(filenames, file_pattern):
-                matches.append(path_join_robust(root, filename))
+                matches.append(path_join_robust(str(root), str(filename)))
     return matches
 
 
(shell) python updateHostsFile.py --auto
Updating source ./data/someonewhocares.org from http://someonewhocares.org/hosts/zero/hosts
Updating source ./data/Badd-Boyz-Hosts from https://raw.githubusercontent.com/mitchellkrogza/Badd-Boyz-Hosts/master/hosts
Updating source ./data/UncheckyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/UncheckyAds/hosts
Updating source ./data/tyzbit from https://raw.githubusercontent.com/tyzbit/hosts/master/data/tyzbit/hosts
Updating source ./data/malwaredomainlist.com from https://www.malwaredomainlist.com/hostslist/hosts.txt
Updating source ./data/add.Spam from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Spam/hosts
Updating source ./data/laicure from https://raw.githubusercontent.com/Laicure/HostsY/master/custHost
Updating source ./data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Updating source ./data/StevenBlack from https://raw.githubusercontent.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source ./data/SpotifyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/SpotifyAds/hosts
Updating source ./data/yoyo.org from https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source ./data/add.2o7Net from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
Updating source ./data/mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source ./data/adaway.org from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
Updating source ./data/KADhosts from https://raw.githubusercontent.com/azet12/KADhosts/master/KADhosts.txt
Updating source ./data/add.Dead from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Dead/hosts
Updating source ./extensions/social from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/social-hosts
Updating source ./extensions/gambling from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/gambling-hosts
Updating source ./extensions/fakenews from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews
Updating source ./extensions/porn/clefspeare13 from https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/0.0.0.0/hosts
Updating source ./extensions/porn/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/pornography-hosts
updateHostsFile.py:730: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if stem == "*":
Success! The hosts file has been saved in folder 
It contains 36,011 unique entries.
(shell) 

@gfyoung
Contributor

gfyoung commented May 23, 2017

Okay! I'm not entirely sure where that UnicodeWarning is originating (i.e. the stacktrace). However, this patch at least seems to confirm what I thought was the issue. The diff you just showed should patch everything, so let me put together something a little more compact in a PR and see if that works for you too.

gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
@gfyoung
Contributor

gfyoung commented May 23, 2017

@Vdragon : Okay, PR is up. Let me know if the patch works for you (and whether you still get that warning).

gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
@Vdragon
Author

Vdragon commented May 23, 2017

(shell) git status
HEAD detached at gfyoung/unicode-str-join
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        test.py

nothing added to commit but untracked files present (use "git add" to track)
(shell) python updateHostsFile.py --auto
Updating source ./data/someonewhocares.org from http://someonewhocares.org/hosts/zero/hosts
Updating source ./data/Badd-Boyz-Hosts from https://raw.githubusercontent.com/mitchellkrogza/Badd-Boyz-Hosts/master/hosts
Updating source ./data/UncheckyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/UncheckyAds/hosts
Updating source ./data/tyzbit from https://raw.githubusercontent.com/tyzbit/hosts/master/data/tyzbit/hosts
Updating source ./data/malwaredomainlist.com from https://www.malwaredomainlist.com/hostslist/hosts.txt
Updating source ./data/add.Spam from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Spam/hosts
Updating source ./data/laicure from https://raw.githubusercontent.com/Laicure/HostsY/master/custHost
Updating source ./data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Updating source ./data/StevenBlack from https://raw.githubusercontent.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source ./data/SpotifyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/SpotifyAds/hosts
Updating source ./data/yoyo.org from https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source ./data/add.2o7Net from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
Updating source ./data/mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source ./data/adaway.org from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
Updating source ./data/KADhosts from https://raw.githubusercontent.com/azet12/KADhosts/master/KADhosts.txt
Updating source ./data/add.Dead from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Dead/hosts
Updating source ./extensions/social from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/social-hosts
Updating source ./extensions/gambling from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/gambling-hosts
Updating source ./extensions/fakenews from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews
Updating source ./extensions/porn/clefspeare13 from https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/0.0.0.0/hosts
Updating source ./extensions/porn/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/pornography-hosts
updateHostsFile.py:731: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if stem == "*":
Success! The hosts file has been saved in folder 
It contains 36,011 unique entries.
(shell) 

@gfyoung
Contributor

gfyoung commented May 23, 2017

I think that warning appears because stem contains your directory name, which can't be cast to unicode, so I think I'll leave that alone for now.

@gfyoung
Contributor

gfyoung commented May 23, 2017

Actually, I've changed my mind on that. Can you find this line:

if stem == "*":

and change it to:

if stem == str("*"):

on top of my changes and see if the warning goes away?
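Some background on why this might help: when Python 2 compares a byte string with a unicode string, it tries to decode the byte string as ASCII, and if that fails it emits the UnicodeWarning and treats the two as unequal. Comparing two byte strings skips the decode entirely. The sketch below approximates that behaviour on Python 3; `py2_style_equal` is a hypothetical helper written for illustration and the path is a shortened stand-in.

```python
def py2_style_equal(left, right):
    """Approximate Python 2's str == unicode comparison on Python 3.

    Bytes are decoded as ASCII first. If that fails, the values are
    treated as unequal (Python 2 also emits a UnicodeWarning here).
    """
    if isinstance(left, bytes) and isinstance(right, str):
        try:
            left = left.decode("ascii")
        except UnicodeDecodeError:
            return False
    return left == right


# Hypothetical non-ASCII path, standing in for the value of stem.
stem = b"/home/user/\xe5\xb7\xa5\xe4\xbd\x9c/data"

mixed = py2_style_equal(stem, u"*")      # decode fails, treated as unequal
same_type = py2_style_equal(stem, b"*")  # bytes vs bytes, no decode needed
wildcard = py2_style_equal(b"*", u"*")   # pure ASCII bytes decode fine
```

Either way the comparison with a non-ASCII path comes out unequal, so the result of the check does not change; only the failed decode, and with it the warning, is avoided.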

@Vdragon
Author

Vdragon commented May 23, 2017

Yep.

(shell) git diff
diff --git a/updateHostsFile.py b/updateHostsFile.py
index 1c55d58..0442efc 100644
--- a/updateHostsFile.py
+++ b/updateHostsFile.py
@@ -728,7 +728,7 @@ def recursive_glob(stem, file_pattern):
     if sys.version_info >= (3, 5):
         return glob(stem + "/**/" + file_pattern, recursive=True)
     else:
-        if stem == "*":
+        if stem == str("*"):
             stem = "."
         matches = []
         for root, dirnames, filenames in os.walk(stem):
(shell) python updateHostsFile.py --auto
Updating source ./data/someonewhocares.org from http://someonewhocares.org/hosts/zero/hosts
Updating source ./data/Badd-Boyz-Hosts from https://raw.githubusercontent.com/mitchellkrogza/Badd-Boyz-Hosts/master/hosts
Updating source ./data/UncheckyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/UncheckyAds/hosts
Updating source ./data/tyzbit from https://raw.githubusercontent.com/tyzbit/hosts/master/data/tyzbit/hosts
Updating source ./data/malwaredomainlist.com from https://www.malwaredomainlist.com/hostslist/hosts.txt
Updating source ./data/add.Spam from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Spam/hosts
Updating source ./data/laicure from https://raw.githubusercontent.com/Laicure/HostsY/master/custHost
Updating source ./data/add.Risk from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Risk/hosts
Updating source ./data/StevenBlack from https://raw.githubusercontent.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source ./data/SpotifyAds from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/SpotifyAds/hosts
Updating source ./data/yoyo.org from https://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source ./data/add.2o7Net from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
Updating source ./data/mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source ./data/adaway.org from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
Updating source ./data/KADhosts from https://raw.githubusercontent.com/azet12/KADhosts/master/KADhosts.txt
Updating source ./data/add.Dead from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.Dead/hosts
Updating source ./extensions/social from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/social-hosts
Updating source ./extensions/gambling from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/gambling-hosts
Updating source ./extensions/fakenews from https://raw.githubusercontent.com/marktron/fakenews/master/fakenews
Updating source ./extensions/porn/clefspeare13 from https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/0.0.0.0/hosts
Updating source ./extensions/porn/sinfonietta from https://raw.githubusercontent.com/Sinfonietta/hostfiles/master/pornography-hosts
Success! The hosts file has been saved in folder 
It contains 36,011 unique entries.

gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
gfyoung added a commit to forking-repos/hosts that referenced this issue May 23, 2017
indrajitr pushed a commit to indrajitr/adware-hosts that referenced this issue Jul 9, 2017
indrajitr pushed a commit to indrajitr/adware-hosts that referenced this issue Jul 9, 2017
mitchellkrogza pushed a commit to mitchellkrogza/hosts that referenced this issue Aug 14, 2017
Closes StevenBlackgh-316.


Former-commit-id: e983cb3
Former-commit-id: af7c10eda1667a10a498998bb48f3fb347b92089
mitchellkrogza pushed a commit to mitchellkrogza/hosts that referenced this issue Aug 14, 2017
Closes StevenBlackgh-316.


Former-commit-id: 265b800
Former-commit-id: df22c169726f960134951f465ec8ca2ca3ffd982