[AIRFLOW-6505] Let emoji encoded properly for json.dumps() -- BaseSQLToGoogleCloudStorageOperator

Make sure you have checked _all_ steps below.

### Jira

- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-6505) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-6505

### Description
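Emoji are not encoded properly when a row is serialized with `json.dumps()` and then encoded as UTF-8. By default (`ensure_ascii=True`), `json.dumps()` escapes every non-ASCII character, so an emoji is written as `\uXXXX` escape sequences instead of as its UTF-8 bytes. This PR fixes the problem by passing `ensure_ascii=False` to `json.dumps()` in `BaseSQLToGoogleCloudStorageOperator`.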

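For example, the emoji 🍻 (U+1F37B) is encoded differently depending on whether `ensure_ascii=False` is used:

- With `ensure_ascii=False` (correct UTF-8 encoding): the bytes `\xf0\x9f\x8d\xbb`
- Without it (an escaped UTF-16 surrogate pair): `\ud83c\udf7b`, which Python's `repr()` displays with doubled backslashes as `\\ud83c\\udf7b`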

Ref: https://stackoverflow.com/questions/51183947/python-json-dumps-doesnt-encode-emojis-properly
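A minimal sketch using only Python's standard library reproduces the 🍻 example above. The variable names are illustrative and are not taken from the operator:

```python
import json

# The emoji from the example: U+1F37B (CLINKING BEER MUGS).
beers = "\U0001F37B"

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
# A character outside the Basic Multilingual Plane becomes a UTF-16
# surrogate pair written as two \uXXXX escapes.
escaped = json.dumps(beers).encode("utf-8")

# With ensure_ascii=False the character is kept as-is, so encoding the
# JSON text to UTF-8 yields the emoji's four UTF-8 bytes.
raw = json.dumps(beers, ensure_ascii=False).encode("utf-8")

print(escaped)  # b'"\\ud83c\\udf7b"'
print(raw)      # b'"\xf0\x9f\x8d\xbb"'

# Both byte strings are valid JSON and decode to the same Python string;
# only the bytes written out differ.
assert json.loads(escaped) == json.loads(raw) == beers
```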



### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits

- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what they do
  - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release

### Code Quality

- [x] Passes `flake8`
damon09273 committed Jan 8, 2020
1 parent 80bd5ff commit 6444656
Showing 1 changed file with 1 addition and 1 deletion.
`airflow/contrib/operators/sql_to_gcs.py` (2 changes: 1 addition & 1 deletion)

```diff
@@ -177,7 +177,7 @@ def _write_local_data_files(self, cursor):
 row_dict = dict(zip(schema, row))
 
 # TODO validate that row isn't > 2MB. BQ enforces a hard row size of 2MB.
-tmp_file_handle.write(json.dumps(row_dict, sort_keys=True).encode('utf-8'))
+tmp_file_handle.write(json.dumps(row_dict, sort_keys=True, ensure_ascii=False).encode('utf-8'))
 
 # Append newline to make dumps BigQuery compatible.
 tmp_file_handle.write(b'\n')
```
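The patched branch can be read as a small standalone function. This sketch is hypothetical: `write_rows_as_ndjson` is not an Airflow API, and an in-memory `io.BytesIO` buffer stands in for the operator's temporary file handle. `schema` and `row` play the same roles as in the diff:

```python
import io
import json


def write_rows_as_ndjson(fh, schema, rows):
    """Hypothetical helper mirroring the patched JSON branch: write one
    UTF-8 encoded JSON object per line to the binary file object fh."""
    for row in rows:
        row_dict = dict(zip(schema, row))
        # ensure_ascii=False keeps non-ASCII text (e.g. emoji) as real
        # characters, which .encode('utf-8') turns into UTF-8 bytes.
        fh.write(json.dumps(row_dict, sort_keys=True, ensure_ascii=False).encode("utf-8"))
        # Newline-delimited JSON: one record per line.
        fh.write(b"\n")


buf = io.BytesIO()
write_rows_as_ndjson(buf, ["id", "comment"], [(1, "cheers \U0001F37B"), (2, "plain ascii")])
lines = buf.getvalue().splitlines()

print(buf.getvalue())
# b'{"comment": "cheers \xf0\x9f\x8d\xbb", "id": 1}\n{"comment": "plain ascii", "id": 2}\n'
```

Because every record is followed by `b'\n'`, the output stays newline-delimited JSON, which is what the "Append newline to make dumps BigQuery compatible" comment in the diff refers to.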
