Merge pull request #6 from bxparks/develop
Add format='md' to pypandoc.convert() to prevent exception during 'pip3 install'.
bxparks committed Jan 23, 2018
2 parents 2805281 + 6b0ac5e commit b802f09
Showing 3 changed files with 20 additions and 14 deletions.
24 changes: 15 additions & 9 deletions README.md
@@ -53,21 +53,27 @@ the STDIN. (CSV is not supported currently.) It scans every record in the
input data file to deduce the table's schema. It prints the JSON formatted
schema file on the STDOUT. There are at least 3 ways to run this script:

-1) If you installed using `pip3`, then it should have installed a small helper
+1\. **Shell script**
+
+If you installed using `pip3`, then it should have installed a small helper
script named `generate-schema` in your local `./bin` directory of your current
environment (depending on whether you are using a virtual environment).

```
$ generate-schema < file.data.json > file.schema.json
```

-2) You can invoke the module directly using:
+2\. **Python module**
+
+You can invoke the module directly using:
```
$ python3 -m bigquery_schema_generator.generate_schema < file.data.json > file.schema.json
```
This is essentially what the `generate-schema` command does.

-3) If you retrieved this code from its [GitHub
+3\. **Python script**
+
+If you retrieved this code from its [GitHub
repository](https://github.com/bxparks/bigquery-schema-generator), then you can invoke
the Python script directly:
```
@@ -103,7 +109,7 @@ $ bq show --schema mydataset.mytable | python -m json.tool
(The `python -m json.tool` command will pretty-print the JSON formatted schema
file.) This schema file should be identical to `file.schema.json`.

-### Options
+### Flag Options

The `generate_schema.py` script supports a handful of command line flags:

@@ -112,15 +118,15 @@ The `generate_schema.py` script supports a handful of command line flags:
* `--debugging_interval lines` Number of lines between heartbeat debugging messages. Default 1000.
* `--debugging_map` Print the metadata schema map for debugging purposes

-#### Help
+#### Help (`--help`)

Print the built-in help strings:

```
$ generate-schema --help
```

-#### Null Values
+#### Keep Nulls (`--keep_nulls`)

Normally when the input data file contains a field which has a null, empty
array or empty record as its value, the field is suppressed in the schema file.
@@ -167,7 +173,7 @@ Example:
$ generate-schema --keep_nulls < file.data.json > file.schema.json
```

-#### Debugging Interval
+#### Debugging Interval (`--debugging_interval`)

By default, the `generate_schema.py` script prints a short progress message
every 1000 lines of input data. This interval can be changed using the
@@ -177,7 +183,7 @@ every 1000 lines of input data. This interval can be changed using the
$ generate-schema --debugging_interval 1000 < file.data.json > file.schema.json
```

-#### Debugging Map
+#### Debugging Map (`--debugging_map`)

Instead of printing out the BigQuery schema, the `--debugging_map` prints out
the bookkeeping metadata map which is used internally to keep track of the
@@ -228,7 +234,7 @@ INFO:root:Processed 1 lines

In most cases, the data file will be stored in a file:
```
-cat > file.data.json
+$ cat > file.data.json
{ "a": [1, 2] }
{ "i": 3 }
^D
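The README text in this diff describes what `generate-schema` does: it reads newline-delimited JSON records, scans every record to deduce a schema, suppresses fields whose value is null, an empty array or an empty record unless `--keep_nulls` is given, and logs a progress message every `--debugging_interval` lines. The following is a toy Python sketch of that flow under stated assumptions, not the project's implementation: `deduce_schema` and `_bq_type` are hypothetical names, only top-level fields are handled (RECORD contents are not deduced), conflicting types are not reconciled, and typing a kept null-only field as STRING is an assumption.

```python
import json
import logging


def _bq_type(value):
    """Map one JSON value to a BigQuery type name (toy subset)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        return "RECORD"  # toy: nested fields are not deduced
    raise TypeError(f"unsupported JSON value: {value!r}")


def deduce_schema(lines, keep_nulls=False, debugging_interval=1000):
    """Scan newline-delimited JSON records and deduce their top-level fields.

    Fields whose value is null, [] or {} are suppressed unless keep_nulls is
    true. A later non-empty value overwrites an earlier one; conflicting
    types are not reconciled.
    """
    fields = {}  # name -> {"mode", "name", "type"}, kept in first-seen order
    line_number = 0
    for line_number, line in enumerate(lines, start=1):
        if line_number % debugging_interval == 0:
            logging.info("Processed %d lines", line_number)  # heartbeat
        if not line.strip():
            continue
        for name, value in json.loads(line).items():
            mode = "REPEATED" if isinstance(value, list) else "NULLABLE"
            if isinstance(value, list):
                value = value[0] if value else None  # toy: inspect first element
            if value is None or value == {}:
                # Assumption: a kept null-only field is typed STRING.
                if keep_nulls and name not in fields:
                    fields[name] = {"mode": mode, "name": name, "type": "STRING"}
                continue
            fields[name] = {"mode": mode, "name": name, "type": _bq_type(value)}
    logging.info("Processed %d lines", line_number)  # final count
    return list(fields.values())
```

To mimic `generate-schema < file.data.json > file.schema.json`, call `json.dump(deduce_schema(sys.stdin), sys.stdout, indent=2)`. The progress messages only appear after `logging.basicConfig(level=logging.INFO)`, which uses the same `INFO:root:...` format seen in the hunk context above.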
6 changes: 3 additions & 3 deletions setup.py
@@ -3,13 +3,13 @@
# Convert README.md to README.rst because PyPI does not support Markdown.
try:
    import pypandoc
-    long_description = pypandoc.convert('README.md', 'rst')
-except OSError:
+    long_description = pypandoc.convert('README.md', 'rst', format='md')
+except (EnvironmentError, RuntimeError):
    with open('README.md', encoding="utf-8") as f:
        long_description = f.read()

setup(name='bigquery-schema-generator',
-      version='0.1.2',
+      version='0.1.3',
      description='BigQuery schema generator',
      long_description=long_description,
      url='https://github.com/bxparks/bigquery-schema-generator',
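The `setup.py` hunk converts `README.md` to reStructuredText with pypandoc and falls back to the raw Markdown if conversion fails. A minimal sketch of the same fallback wrapped in a function follows. It departs from the commit in two deliberate ways, both assumptions rather than the author's code: it calls `pypandoc.convert_file()`, which newer pypandoc releases provide in place of the deprecated `convert()`, and it also catches `ImportError`, because the commit's `except (EnvironmentError, RuntimeError)` does not cover the case where pypandoc itself is not installed (in Python 3, `EnvironmentError` is simply an alias of `OSError`). `read_long_description` is a hypothetical helper name.

```python
def read_long_description(path="README.md"):
    """Return the README as reStructuredText if pandoc works, else raw Markdown."""
    try:
        import pypandoc  # optional build-time dependency
        # Name the input format explicitly instead of letting pypandoc infer it.
        return pypandoc.convert_file(path, "rst", format="markdown")
    except (ImportError, OSError, RuntimeError):
        # ImportError: pypandoc is not installed.
        # OSError (EnvironmentError in the commit): e.g. no pandoc binary found.
        # RuntimeError: pypandoc or pandoc rejected the conversion.
        with open(path, encoding="utf-8") as f:
            return f.read()
```

In `setup.py` this would be used as `setup(..., long_description=read_long_description(), ...)`; either way the package installs, with or without pandoc on the build machine.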
4 changes: 2 additions & 2 deletions tests/README.md
@@ -8,7 +8,9 @@ file which is parsed by the unit test program. This has two advantages:
* the `testdata.txt` data can be reused for versions written in other languages

The output of `test_generate_schema.py` should look something like this:

```
$ ./test_generate_schema.py
----------------------------------------------------------------------
Ran 4 tests in 0.002s
@@ -27,5 +29,3 @@ Test chunk 11: First record: { "i": [1, 2] }
Test chunk 12: First record: { "r" : { "i": 3 } }
Test chunk 13: First record: { "r" : [{ "i": 4 }] }
```
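The `tests/README.md` context describes a data-driven test layout: test cases live in a data file (`testdata.txt`) that the unit test program parses, so the same data can be reused by versions written in other languages. A minimal sketch of that pattern follows. The `DATA`/`SCHEMA`/`END` markers and the helper names `parse_chunks` and `run_chunks` are hypothetical, since this commit does not show the real `testdata.txt` format; only the "Test chunk N: First record: ..." progress line mirrors the output shown above.

```python
import json


def parse_chunks(text):
    """Split test data into (data_lines, expected_schema) chunks.

    Assumed layout (hypothetical, not the project's documented format):
        DATA
        <one JSON record per line>
        SCHEMA
        <JSON schema, possibly spanning several lines>
        END
    """
    chunks, section, data, schema = [], None, [], []
    for line in text.splitlines():
        tag = line.strip()
        if tag == "DATA":
            section, data, schema = "data", [], []
        elif tag == "SCHEMA":
            section = "schema"
        elif tag == "END":
            chunks.append((data, json.loads("\n".join(schema))))
            section = None
        elif section == "data" and tag:
            data.append(tag)
        elif section == "schema":
            schema.append(line)
    return chunks


def run_chunks(chunks, deduce):
    """Run `deduce` on every chunk and return the number of mismatches."""
    failures = 0
    for i, (data, expected) in enumerate(chunks):
        print(f"Test chunk {i}: First record: {data[0] if data else ''}")
        if deduce(data) != expected:
            failures += 1
    return failures
```

Because the expectations live in plain text rather than in Python, a port in another language only needs its own small chunk parser to reuse the same cases.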

