Global variable values in ColumnEncodingUtility not passed as expected to multiprocessing pool on Windows #58

Closed
lewyh opened this issue Nov 16, 2015 · 4 comments

lewyh commented Nov 16, 2015

Using the column encoding utility on Windows causes problems when values are supplied on the command line. The initial connection to Redshift is established inside main(), but when analyze() is then called through the multiprocessing pool's map(), the global variables it reads (db_user, db_pwd, etc.) do not hold the values given on the command line. Instead they hold the values assigned by the block earlier in the script (lines 65-84). Some of these variables can be set through environment variables, but others (e.g. db_pwd, analyze_schema) are expected to come from the command line. As a result, several variables required by analyze() are None.

The Python documentation recommends that, for Windows compatibility, arguments be passed explicitly to functions called via multiprocessing rather than relying on global variables.

An example of code that works on Linux but not as expected on Windows:

from multiprocessing import Pool

str_ex = "Not Updated"
def f(i):
    print(str_ex)

def main():
    global str_ex
    str_ex = "Updated"
    p = Pool(3)
    p.map(f, range(3))

if __name__ == "__main__":
    main()

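On Linux, where the workers are forked from the parent process and inherit its memory, f() sees the updated string. On Windows, each worker is a freshly spawned interpreter that re-imports the module, so str_ex is reset by the top-level assignment and "Not Updated" is printed three times.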
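
For comparison, here is a minimal sketch of the explicit-argument approach, applied to the toy example above rather than to the utility itself. It assumes functools.partial is an acceptable way to bind the value; passing a tuple per task or using the pool's initializer would be alternatives. The changed signature of f() and the returned strings are illustrative choices, not taken from the script.

```python
from functools import partial
from multiprocessing import Pool


def f(str_ex, i):
    # str_ex arrives as an argument, so the worker never reads a module global.
    return f"{i}: {str_ex}"


def main():
    # Stands in for a value parsed from the command line (assumption).
    str_ex = "Updated"
    with Pool(3) as p:
        # partial binds str_ex; the bound value is pickled and sent to the workers.
        results = p.map(partial(f, str_ex), range(3))
    for line in results:
        print(line)
    return results


if __name__ == "__main__":
    main()
```

Because the value travels with each task instead of living in module state, the result should not depend on whether the workers are forked or spawned.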

@IanMeyers
Contributor

Thanks so much for this great bug report. I will look to fix these issues and update this when I've resolved it.

IanMeyers self-assigned this Nov 16, 2015

joeVFC commented Jan 5, 2018

I'm hitting this same issue. Any updates?

@IanMeyers
Contributor

Unfortunately on Windows you can only use a thread count of 1 at this time.

@saeedma8
Contributor

Amazon Redshift Column Encoding Utility is now deprecated. Instead, please use Redshift's Automatic Table Optimization features through:

ALTER TABLE table_name ALTER SORTKEY AUTO;
ALTER TABLE table_name ALTER DISTSTYLE AUTO;

https://docs.aws.amazon.com/redshift/latest/dg/t_Creating_tables.html
