Skip to content

Commit 3ef270b

Browse files
author
epriestley
committedAug 22, 2015
Allow transaction publishers to pass binary data to workers
Summary: Ref T8672. Ref T9187. Root issue in at least one case is: - User makes a commit including a file with some non-UTF8 text (say, a Japanese file full of Shift-JIS). - We pass the file to the TransactionEditor so it can inline or attach the patch if the server is configured for these things. - When inlining patches, we convert them to UTF8 before inlining. We must do this since the rest of the mail is UTF8. - When attaching patches, we send them in the original encoding (as file attachments). This is correct, and means we need to give the worker the raw patch in whatever encoding it was originally in: we can't just convert it to utf8 earlier, or we'd attach the wrong patch in some cases. - TransactionEditor does its thing (e.g., creates the commit), then gets ready to send mail about whatever it did. - The publishing work now happens in the daemon queue, so we prepare to queue a PublishWorker and pass it the patch (with some other data). - When we queue workers, we serialize the state data with JSON. So far, so good. But this is where things go wrong: - JSON can't encode binary data, and can't encode Shift-JIS. The encoding silently fails and we ignore it. Then we get to the worker, and things go wrong-er: - Since the data is bad, we fatal. This isn't a permanent failure, so we continue retrying the task indefinitely. This applies several fixes: # When queueing tasks, fail loudly when JSON encoding fails. # In the worker, fail permanently when data can't be decoded. # Allow Editors to specify that some of their data is binary and needs special handling. This is fairly messy, but some simpler alternatives don't seem like good ways forward: - We can't convert to UTF8 earlier, because we need the original raw patch when adding it as an attachment. - We could encode //only// this field, but I suspect some other fields will also need attention, so that adding a mechanism will be worthwhile. In particular, I suspect filenames //may// be causing a similar problem in some cases. - We could convert task data to always use a serialize()-based binary safe encoding, but this is a larger change and I think it's correct that things are UTF8 by default, even if it makes a bit of a mess. I'd rather have an explicit mess like this than a lot of binary data floating around. The change to make `LiskDAO` will almost certainly catch some other problems too, so I'm going to hold this until after `stable` is cut. These problems were existing problems (i.e., the code was previously breaking or destroying data) so it's definitely correct to catch them, but this will make the problems much more obvious/urgent than they previously were. Test Plan: - Created a commit with a bunch of Shift-JIS stuff in a file. - Tried to import it. Prior to patch: - Broken PublishWorker with distant, irrelevant error message. With patch partially applied (only new error checking): - Explicit, local error message about bad key in serialized data. With patch fully applied: - Import went fine and mail generated. Reviewers: chad Reviewed By: chad Subscribers: devurandom, nevogd Maniphest Tasks: T8672, T9187 Differential Revision: https://secure.phabricator.com/D13939
1 parent 1edc64c commit 3ef270b

File tree

4 files changed

+131
-12
lines changed

4 files changed

+131
-12
lines changed
 

‎src/applications/audit/editor/PhabricatorAuditEditor.php

+6
Original file line numberDiff line numberDiff line change
@@ -949,6 +949,12 @@ protected function getCustomWorkerState() {
949949
);
950950
}
951951

952+
protected function getCustomWorkerStateEncoding() {
953+
return array(
954+
'rawPatch' => self::STORAGE_ENCODING_BINARY,
955+
);
956+
}
957+
952958
protected function loadCustomWorkerState(array $state) {
953959
$this->rawPatch = idx($state, 'rawPatch');
954960
$this->affectedFiles = idx($state, 'affectedFiles');

‎src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php

+118-10
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,8 @@ abstract class PhabricatorApplicationTransactionEditor
6969
private $feedNotifyPHIDs = array();
7070
private $feedRelatedPHIDs = array();
7171

72+
const STORAGE_ENCODING_BINARY = 'binary';
73+
7274
/**
7375
* Get the class name for the application this editor is a part of.
7476
*
@@ -2637,6 +2639,21 @@ protected function addCustomFieldsToMailBody(
26372639
}
26382640

26392641

2642+
/**
2643+
* @task mail
2644+
*/
2645+
private function runHeraldMailRules(array $messages) {
2646+
foreach ($messages as $message) {
2647+
$engine = new HeraldEngine();
2648+
$adapter = id(new PhabricatorMailOutboundMailHeraldAdapter())
2649+
->setObject($message);
2650+
2651+
$rules = $engine->loadRulesForAdapter($adapter);
2652+
$effects = $engine->applyRules($rules, $adapter);
2653+
$engine->applyEffects($effects, $adapter, $rules);
2654+
}
2655+
}
2656+
26402657

26412658
/* -( Publishing Feed Stories )-------------------------------------------- */
26422659

@@ -3060,9 +3077,13 @@ final private function getWorkerState() {
30603077
$state[$property] = $this->$property;
30613078
}
30623079

3080+
$custom_state = $this->getCustomWorkerState();
3081+
$custom_encoding = $this->getCustomWorkerStateEncoding();
3082+
30633083
$state += array(
30643084
'excludeMailRecipientPHIDs' => $this->getExcludeMailRecipientPHIDs(),
3065-
'custom' => $this->getCustomWorkerState(),
3085+
'custom' => $this->encodeStateForStorage($custom_state, $custom_encoding),
3086+
'custom.encoding' => $custom_encoding,
30663087
);
30673088

30683089
return $state;
@@ -3080,6 +3101,21 @@ protected function getCustomWorkerState() {
30803101
}
30813102

30823103

3104+
/**
3105+
* Hook; return storage encoding for custom properties which need to be
3106+
* passed to workers.
3107+
*
3108+
* This primarily allows binary data to be passed to workers and survive
3109+
* JSON encoding.
3110+
*
3111+
* @return dict<string, string> Property encodings.
3112+
* @task workers
3113+
*/
3114+
protected function getCustomWorkerStateEncoding() {
3115+
return array();
3116+
}
3117+
3118+
30833119
/**
30843120
* Load editor state using a dictionary emitted by @{method:getWorkerState}.
30853121
*
@@ -3097,7 +3133,10 @@ final public function loadWorkerState(array $state) {
30973133
$exclude = idx($state, 'excludeMailRecipientPHIDs', array());
30983134
$this->setExcludeMailRecipientPHIDs($exclude);
30993135

3100-
$custom = idx($state, 'custom', array());
3136+
$custom_state = idx($state, 'custom', array());
3137+
$custom_encodings = idx($state, 'custom.encoding', array());
3138+
$custom = $this->decodeStateFromStorage($custom_state, $custom_encodings);
3139+
31013140
$this->loadCustomWorkerState($custom);
31023141

31033142
return $this;
@@ -3143,16 +3182,85 @@ private function getAutomaticStateProperties() {
31433182
);
31443183
}
31453184

3146-
private function runHeraldMailRules(array $messages) {
3147-
foreach ($messages as $message) {
3148-
$engine = new HeraldEngine();
3149-
$adapter = id(new PhabricatorMailOutboundMailHeraldAdapter())
3150-
->setObject($message);
3185+
/**
3186+
* Apply encodings prior to storage.
3187+
*
3188+
* See @{method:getCustomWorkerStateEncoding}.
3189+
*
3190+
* @param map<string, wild> Map of values to encode.
3191+
* @param map<string, string> Map of encodings to apply.
3192+
* @return map<string, wild> Map of encoded values.
3193+
* @task workers
3194+
*/
3195+
final private function encodeStateForStorage(
3196+
array $state,
3197+
array $encodings) {
3198+
3199+
foreach ($state as $key => $value) {
3200+
$encoding = idx($encodings, $key);
3201+
switch ($encoding) {
3202+
case self::STORAGE_ENCODING_BINARY:
3203+
// The mechanics of this encoding (serialize + base64) are a little
3204+
// awkward, but it allows us encode arrays and still be JSON-safe
3205+
// with binary data.
3206+
3207+
$value = @serialize($value);
3208+
if ($value === false) {
3209+
throw new Exception(
3210+
pht(
3211+
'Failed to serialize() value for key "%s".',
3212+
$key));
3213+
}
31513214

3152-
$rules = $engine->loadRulesForAdapter($adapter);
3153-
$effects = $engine->applyRules($rules, $adapter);
3154-
$engine->applyEffects($effects, $adapter, $rules);
3215+
$value = base64_encode($value);
3216+
if ($value === false) {
3217+
throw new Exception(
3218+
pht(
3219+
'Failed to base64 encode value for key "%s".',
3220+
$key));
3221+
}
3222+
break;
3223+
}
3224+
$state[$key] = $value;
31553225
}
3226+
3227+
return $state;
3228+
}
3229+
3230+
3231+
/**
3232+
* Undo storage encoding applied when storing state.
3233+
*
3234+
* See @{method:getCustomWorkerStateEncoding}.
3235+
*
3236+
* @param map<string, wild> Map of encoded values.
3237+
* @param map<string, string> Map of encodings.
3238+
* @return map<string, wild> Map of decoded values.
3239+
* @task workers
3240+
*/
3241+
final private function decodeStateFromStorage(
3242+
array $state,
3243+
array $encodings) {
3244+
3245+
foreach ($state as $key => $value) {
3246+
$encoding = idx($encodings, $key);
3247+
switch ($encoding) {
3248+
case self::STORAGE_ENCODING_BINARY:
3249+
$value = base64_decode($value);
3250+
if ($value === false) {
3251+
throw new Exception(
3252+
pht(
3253+
'Failed to base64_decode() value for key "%s".',
3254+
$key));
3255+
}
3256+
3257+
$value = unserialize($value);
3258+
break;
3259+
}
3260+
$state[$key] = $value;
3261+
}
3262+
3263+
return $state;
31563264
}
31573265

31583266
}

‎src/applications/transactions/worker/PhabricatorApplicationTransactionPublishWorker.php

+6-1
Original file line numberDiff line numberDiff line change
@@ -26,9 +26,14 @@ protected function doWork() {
2626
* Load the object the transactions affect.
2727
*/
2828
private function loadObject() {
29-
$data = $this->getTaskData();
3029
$viewer = PhabricatorUser::getOmnipotentUser();
3130

31+
$data = $this->getTaskData();
32+
if (!is_array($data)) {
33+
throw new PhabricatorWorkerPermanentFailureException(
34+
pht('Task has invalid task data.'));
35+
}
36+
3237
$phid = idx($data, 'objectPHID');
3338
if (!$phid) {
3439
throw new PhabricatorWorkerPermanentFailureException(

‎src/infrastructure/storage/lisk/LiskDAO.php

+1-1
Original file line numberDiff line numberDiff line change
@@ -1651,7 +1651,7 @@ protected function applyLiskDataSerialization(array &$data, $deserialize) {
16511651
if ($deserialize) {
16521652
$data[$col] = json_decode($data[$col], true);
16531653
} else {
1654-
$data[$col] = json_encode($data[$col]);
1654+
$data[$col] = phutil_json_encode($data[$col]);
16551655
}
16561656
break;
16571657
default:

0 commit comments

Comments
 (0)
Failed to load comments.