This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

fail to run reddit_tft example preprocess part on google cloud #57

Closed
suiyuan2009 opened this issue Jun 28, 2017 · 10 comments

Comments

@suiyuan2009

python preprocess.py --training_data fh-bigquery.reddit_comments.2015_12 \
                     --eval_data fh-bigquery.reddit_comments.2016_01 \
                     --predict_data fh-bigquery.reddit_comments.2016_02 \
                     --output_dir $GCS_PATH/preproc \
                     --project_id $PROJECT \
                     --cloud
No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "preprocess.py", line 258, in <module>
    main()
  File "preprocess.py", line 254, in main
    frequency_threshold=args.frequency_threshold)
  File "preprocess.py", line 149, in preprocess
    pipeline=pipeline))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/ptransform.py", line 709, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/ptransform.py", line 388, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 229, in apply
    return self.apply(transform, pvalueish)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 265, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.py", line 150, in apply
    return m(transform, input)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.py", line 156, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/beam/tft_beam_io/beam_metadata_io.py", line 57, in expand
    metadata_io.write_metadata(metadata, self._path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/metadata_io.py", line 56, in write_metadata
    version.write(metadata, vdir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/version_api.py", line 88, in write
    vdir.create()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/metadata_directory.py", line 57, in create
    tf.gfile.MakeDirs(self._basepath)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 367, in recursive_create_dir
    pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'object' must be a non-empty string.
@elmer-garduno
Contributor

Did you specify --output_dir? That seems to be the problem. I'll make sure to make it required.
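For illustration, making the flag required could look like the sketch below. It is a hypothetical parser built only from the flags that appear in this thread, not the actual argument handling in preprocess.py. Note that `required=True` only catches a missing flag; it would not catch a value such as `/preproc` produced by an unset `$GCS_PATH`.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the flags shown in this thread are included."""
    parser = argparse.ArgumentParser(description="Preprocess reddit comments.")
    parser.add_argument("--training_data")
    parser.add_argument("--eval_data")
    parser.add_argument("--predict_data")
    # required=True makes argparse exit with a usage error if the flag is absent.
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--project_id")
    parser.add_argument("--cloud", action="store_true")
    return parser
```

With this in place, running the script without `--output_dir` stops at argument parsing instead of failing later inside the pipeline.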

@suiyuan2009
Author

I used

python preprocess.py --training_data fh-bigquery.reddit_comments.2015_12 \
                     --eval_data fh-bigquery.reddit_comments.2016_01 \
                     --predict_data fh-bigquery.reddit_comments.2016_02 \
                     --output_dir $GCS_PATH/preproc \
                     --project_id $PROJECT \
                     --cloud

from https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/reddit_tft/README.md.

@elmer-garduno
Contributor

Judging by the symptoms: is $GCS_PATH defined in your current environment?

In other words, did all of these statements succeed?

PROJECT=$(gcloud config list project --format "value(core.project)")
BUCKET="gs://${PROJECT}-ml"

GCS_PATH="${BUCKET}/${USER}/reddit_comments"
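The same check can be done in Python before the pipeline starts. The sketch below assumes that the error comes from an output path with an empty bucket or object part, which this thread does not confirm. The helper names `split_gcs_path` and `output_dir_from_env` are hypothetical.

```python
import os


def split_gcs_path(path):
    """Split 'gs://bucket/object' into (bucket, object); raise ValueError otherwise."""
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError("not a gs:// path: %r" % path)
    bucket, _, obj = path[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("bucket and object must both be non-empty: %r" % path)
    return bucket, obj


def output_dir_from_env(environ=os.environ):
    """Build the '<GCS_PATH>/preproc' output directory, failing early if GCS_PATH is unset."""
    gcs_path = environ.get("GCS_PATH", "")
    if not gcs_path:
        raise ValueError("GCS_PATH is not set")
    output_dir = gcs_path.rstrip("/") + "/preproc"
    split_gcs_path(output_dir)  # validates the shape; the result is not needed here
    return output_dir
```

An unset variable would otherwise expand to the bare path `/preproc`, which `split_gcs_path` rejects because it does not start with `gs://`.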

@suiyuan2009
Author

echo $GCS_PATH
gs://megvii-test-ml/dongziming/reddit_comments

@elmer-garduno
Contributor

Thanks for the information. By any chance, are you running TensorFlow version 1.2?

@suiyuan2009
Author

>>> import tensorflow as tf
>>> tf.__version__
'1.2.0'
>>> 

I also tried TF 1.1.0; the error is the same.

@elmer-garduno
Contributor

Hi @suiyuan2009, I debugged this issue in a clean environment. Could you please try the following?

It ensures that the preprocessing code always runs against a clean output directory, in case that is the problem.

JOB_ID="reddit_${USER}$(date +%Y%m%d%H%M%S)"
PREPROCESS_OUTPUT="${GCS_PATH}/${JOB_ID}"
python preprocess.py --training_data fh-bigquery.reddit_comments.2015_12 \
                     --eval_data fh-bigquery.reddit_comments.2016_01 \
                     --predict_data fh-bigquery.reddit_comments.2016_02 \
                     --output_dir "${PREPROCESS_OUTPUT}" \
                     --project_id "${PROJECT}" \
                     --cloud
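The timestamped output directory built by the shell lines above can also be computed in Python, for example when launching the job from a script. This is a minimal sketch that mirrors the `JOB_ID` format used here; `unique_output_dir` is a hypothetical helper, not part of the sample.

```python
import datetime


def unique_output_dir(gcs_path, user, now=None):
    """Return '<gcs_path>/reddit_<user><YYYYmmddHHMMSS>', mirroring the shell JOB_ID."""
    now = now or datetime.datetime.now()
    job_id = "reddit_%s%s" % (user, now.strftime("%Y%m%d%H%M%S"))
    return "%s/%s" % (gcs_path, job_id)
```

Because the timestamp has one-second resolution, two runs started by the same user within the same second would share a directory.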

@elmer-garduno
Contributor

One more thing: could you please verify whether your output bucket is regional?

@elmer-garduno
Contributor

Also, could you let us know whether you are running on Mac (OS X)?

@elmer-garduno
Contributor

Please take a look at #56; this should be fixed.
