target-amplitude

This is a Singer target that reads JSON-formatted data extracted from Google BigQuery (via tap-bigquery) and sends it to the Amplitude APIs, following the Singer spec.

Install target-amplitude-batch

Run the following command from the repository root. It runs setup.py and installs target-amplitude-batch into your prepared virtual environment. The -e flag installs the package in editable (development) mode, so changes to the source take effect without reinstalling.

pip install -e .

Create the target config file in the working directory, following sample_config.json (Step 4 passes it to the target as target_config.json):

{
  "api_key": "Amplitude API key of the project that should receive the data",
  "flush_queue_size": 10000,
  "flush_interval_millis": 1000,
  "flush_max_retries": 12,
  "use_batch": true,
  "is_batch_identify": false,
  "is_schemaless": false
}
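The config keys above are not described further in this README. As a hedged sketch, the helper below parses a target config and checks each key's type before a run. load_target_config is a hypothetical helper, and treating only api_key as required is an assumption. The flush_* and use_batch names match options of the Amplitude Python SDK's Config class, which suggests (but does not confirm) that the target forwards them there.

```python
import json
from typing import Any, Dict

# Expected type for each key in sample_config.json. Treating only api_key as
# required is an assumption of this sketch.
EXPECTED_TYPES = {
    "api_key": str,
    "flush_queue_size": int,
    "flush_interval_millis": int,
    "flush_max_retries": int,
    "use_batch": bool,
    "is_batch_identify": bool,
    "is_schemaless": bool,
}


def load_target_config(text: str) -> Dict[str, Any]:
    """Parse a target config JSON string and check key types (hypothetical helper)."""
    config = json.loads(text)
    if not config.get("api_key"):
        raise ValueError("api_key is required")
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject booleans for int fields.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{key} must be an integer, got a boolean")
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be {expected.__name__}, got {type(value).__name__}")
    return config
```

In practice you would pass it the contents of target_config.json read from disk; a TypeError then points at the offending key before any data is sent.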

How to use it

Step 1: Set up tap-bigquery

Follow the setup instructions in the tap-bigquery README.

Step 2: Add tap config file

Create a file called tap_config.json in the working directory:

{
  "streams": [
    {
      "name": "<some_schema_name>",
      "table": "`<project>.<dataset>.<table>`",
      "columns": ["<col_name_0>", "<col_name_1>", "<col_name_2>"],
      "datetime_key": "<your_key>",
      "filters": [
        "country='us'",
        "state='CA'",
        "registered_on>=DATE_ADD(current_date, INTERVAL -7 day)"
      ]
    }
  ],
  "start_datetime": "2017-01-01T00:00:00Z",
  "end_datetime": "2017-02-01T00:00:00Z",
  "limit": 100,
  "start_always_inclusive": false
}

JSON does not allow comments, so note the following: filters is optional, and its entries are added to the WHERE clause of the query; start_datetime can also be passed as a command line argument; end_datetime is optional; start_always_inclusive is optional and defaults to false.

At least one stream (one BigQuery table or view) to copy is required.
- Using * to select the columns is not recommended BigQuery practice: on-demand query cost is based on the bytes read from each selected column, so * can blow up the cost for a table with many columns.
- filters are optional, but we strongly recommend using them on large partitioned tables to control cost. (The authors of tap-bigquery are not responsible for the cost incurred by running this program. Always test thoroughly with a small data set first.)
- start_datetime must also be set, either in the config file or as a command line argument (see Step 4).
- limit caps the number of rows returned, but it does not reduce the query cost.
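The rules in the list above can be checked before spending money on a query. The function below is a hedged pre-flight sketch: check_tap_config is a hypothetical helper, not part of tap-bigquery, and it only encodes the points listed here (a stream is required, start_datetime must come from the config or the command line, * and missing filters are cost risks, limit does not cut cost).

```python
from typing import Any, Dict, List, Optional


def check_tap_config(config: Dict[str, Any],
                     cli_start_datetime: Optional[str] = None) -> List[str]:
    """Return cost warnings for a tap_config.json dict; raise on missing required settings."""
    streams = config.get("streams") or []
    if not streams:
        raise ValueError("at least one stream (BigQuery table or view) is required")
    if not (config.get("start_datetime") or cli_start_datetime):
        raise ValueError("start_datetime must be set in the config or on the command line")

    warnings = []
    for stream in streams:
        name = stream.get("name", "<unnamed>")
        if "*" in stream.get("columns", []):
            warnings.append(f"{name}: '*' selects every column and may raise the query cost")
        if not stream.get("filters"):
            warnings.append(f"{name}: no filters; consider filtering large partitioned tables")
    if "limit" in config:
        warnings.append("limit caps the rows returned but does not reduce the query cost")
    return warnings
```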

The table or view is expected to have a column indicating the creation or update date and time (set as datetime_key), so the tap can send the query with ORDER BY on that column and use it to record the bookmark (see the State section).
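To make the role of datetime_key concrete, the sketch below assembles an approximate query from one stream entry. It is an illustration only: tap-bigquery's real query text may differ (for example in timestamp formatting), build_query is a hypothetical helper, and treating end_datetime as an exclusive bound is an assumption.

```python
from typing import Any, Dict, Optional


def build_query(stream: Dict[str, Any], start_datetime: str,
                end_datetime: Optional[str] = None,
                limit: Optional[int] = None,
                start_inclusive: bool = True) -> str:
    """Assemble an approximate BigQuery SQL string for one stream (hypothetical helper)."""
    key = stream["datetime_key"]
    # An exclusive start (>) is what the State section describes for resumed runs.
    op = ">=" if start_inclusive else ">"
    conditions = [f"{key} {op} '{start_datetime}'"]
    if end_datetime:
        # Assumption: end_datetime is treated as an exclusive upper bound.
        conditions.append(f"{key} < '{end_datetime}'")
    # User-supplied filters are ANDed into the WHERE clause.
    conditions.extend(stream.get("filters", []))
    sql = (
        f"SELECT {', '.join(stream['columns'])} FROM {stream['table']} "
        f"WHERE {' AND '.join(conditions)} ORDER BY {key}"
    )
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql
```

Ordering by datetime_key means the last row emitted carries the newest timestamp, which is what makes it usable as a bookmark.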

Step 3: Create catalog

Run tap-bigquery in discovery mode to let it create the JSON schema (catalog) file. In the next step, tap-bigquery and target-amplitude-batch run together, with the output of tap-bigquery piped to target-amplitude-batch:

tap-bigquery -c tap_config.json -d > catalog.json
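To check what discovery produced, the helper below lists each stream in catalog.json with its property names. It assumes the file follows the standard Singer catalog layout (a streams list whose entries carry a JSON schema under schema.properties); summarize_catalog is a hypothetical helper.

```python
import json
from typing import Dict, List


def summarize_catalog(catalog_text: str) -> Dict[str, List[str]]:
    """Map each stream in a Singer catalog to its sorted property names."""
    catalog = json.loads(catalog_text)
    summary = {}
    for entry in catalog.get("streams", []):
        # Prefer the human-readable stream name, fall back to tap_stream_id.
        name = entry.get("stream") or entry.get("tap_stream_id")
        properties = entry.get("schema", {}).get("properties", {})
        summary[name] = sorted(properties)
    return summary
```

Calling it on the contents of catalog.json is a quick way to confirm that the columns listed in tap_config.json were discovered before running Step 4.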

Step 4: Run

tap-bigquery -c tap_config.json \
    --catalog catalog.json --start_datetime '2020-08-01T00:00:00Z' \
    --end_datetime '2020-08-02T01:00:00Z' | target-amplitude-batch --config target_config.json \
    > state.json
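The pipe above carries Singer messages (one JSON object per line, of type SCHEMA, RECORD or STATE) from tap-bigquery into the target. The loop below is a generic sketch of how a Singer target can consume that stream, not target-amplitude-batch's actual code: send_batch is a hypothetical callback standing in for the Amplitude calls, and batch_size is an illustrative parameter.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional


def run_target(lines: Iterable[str],
               send_batch: Callable[[str, List[dict]], None],
               emit_state: Callable[[str], None],
               batch_size: int = 10000) -> Optional[dict]:
    """Consume Singer messages, send RECORDs in batches, then emit the last STATE.

    send_batch is a hypothetical stand-in for the code that calls Amplitude.
    """
    buffers: Dict[str, List[dict]] = {}
    last_state: Optional[dict] = None

    def flush(stream: str) -> None:
        # Hand the buffered records to send_batch and start a fresh buffer.
        if buffers.get(stream):
            send_batch(stream, buffers[stream])
            buffers[stream] = []

    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message.get("type")
        if kind == "SCHEMA":
            buffers.setdefault(message["stream"], [])
        elif kind == "RECORD":
            buffer = buffers.setdefault(message["stream"], [])
            buffer.append(message["record"])
            if len(buffer) >= batch_size:
                flush(message["stream"])
        elif kind == "STATE":
            last_state = message["value"]

    # Flush leftovers before emitting state so the bookmark never points past
    # records that have not been sent yet.
    for stream in list(buffers):
        flush(stream)
    if last_state is not None:
        emit_state(json.dumps(last_state))
    return last_state
```

Called as run_target(sys.stdin, send_batch, print), the final state line goes to stdout, which is what the > state.json redirect captures.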

State

This target emits state. The tap-bigquery command also takes a state file as input via the --state <file-name> option. If a state is given, the start_datetime config value and command line argument are ignored, and the datetime value from the last_update key is used as the resuming point.
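The precedence just described can be sketched as follows. The state layout ({"bookmarks": {stream: {"last_update": ...}}}) and letting the command line value override the config value when no state is given are assumptions of this sketch; resolve_start is a hypothetical helper, not tap-bigquery's code.

```python
from typing import Optional


def resolve_start(state: Optional[dict], stream: str,
                  cli_start: Optional[str], config_start: Optional[str]) -> str:
    """Pick the datetime a run resumes from: state first, then start_datetime."""
    # Assumed state layout: {"bookmarks": {<stream>: {"last_update": <datetime>}}}.
    bookmark = (state or {}).get("bookmarks", {}).get(stream, {}).get("last_update")
    if bookmark:
        return bookmark  # state wins; start_datetime settings are ignored
    # Assumption: the command line value overrides the config file value.
    start = cli_start or config_start
    if not start:
        raise ValueError("start_datetime must be set in the config or on the command line")
    return start
```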

To avoid data duplication, the start datetime is exclusive (start_datetime < datetime_column) when the tap runs with the --state option. If you are concerned about data loss because of this, use the --start_datetime option instead of state.
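The trade-off can be shown with a few in-memory rows. select_rows is a hypothetical helper that applies a start bound as >= (inclusive) or > (exclusive); it assumes datetimes are ISO 8601 strings in the same format and time zone, so they compare correctly as text.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (datetime_column value, payload)


def select_rows(rows: List[Row], start: str, inclusive: bool) -> List[Row]:
    """Filter rows with a start bound: >= if inclusive, > if exclusive."""
    # Same-format ISO 8601 strings sort chronologically as plain text.
    if inclusive:
        return [r for r in rows if r[0] >= start]
    return [r for r in rows if r[0] > start]
```

With a bookmark equal to the last row's timestamp, the inclusive filter re-sends that row (a duplicate), while the exclusive filter also skips any row that lands later with exactly the same timestamp, which is the data-loss case mentioned above.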

The tap itself does not output a state file. It expects the target program or a downstream process to finalize the state safely and produce a state file.
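One way a downstream process could finalize the state safely is to keep only the last state line and replace the state file atomically, so an interrupted run never leaves a half-written file. This sketch assumes each state is a JSON line on the target's stdout and that the last one is the resume point; finalize_state is a hypothetical helper, not part of tap-bigquery or this target.

```python
import json
import os
import tempfile
from typing import Iterable, Optional


def finalize_state(target_output: Iterable[str], path: str) -> Optional[dict]:
    """Write the last JSON state line from the target's output to path atomically."""
    last = None
    for line in target_output:
        line = line.strip()
        if line:
            last = line
    if last is None:
        return None  # nothing to write; keep any existing state file untouched
    state = json.loads(last)
    # Write to a temp file in the same directory, then rename over the target.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.remove(tmp_path)
        raise
    return state
```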
