# Data Input and Output with a Teradata Database

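- This notebook introduces one way of moving data into and out of a Teradata database: **the `tdload` command**.
- `tdload` is a command-line tool specialized for moving data into and out of Teradata. It supports the following three operations:
  1. Loading a local file (a flat format such as CSV) into a Teradata table
  1. Exporting a Teradata table to a local file
  1. Copying data between Teradata tables
- `tdload` is intended for relatively simple configurations. For more complex configurations there is `tbuild`, which will be covered in a separate demo.
- Both `tdload` and `tbuild` are part of **Teradata Parallel Transporter (TPT)**. This document aims to be self-contained, but please also refer to the [user guide](https://docs.teradata.com/r/tOURC6AVX_E6BLP7rmJgQw/root).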

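Notes:

  - TPT is not installed on ClearScape Experience, so this notebook cannot be run there.
  - The outputs shown here were therefore produced in a separate environment. Please treat them as reference examples.
  - In particular, the connection was made over the internet, so the elapsed times are longer than usual.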

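## Dependencies

- `tdload` is a command-line tool, so it does not require a Python environment. To keep this demo self-contained, the notebook launches the command from Python, but in practice you can run it from a command prompt or a shell terminal.
- Various Python libraries are also used to create test data and to check command results, but they are not needed to run `tdload` itself.
- The following is an example command for installing the libraries this notebook requires.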

In [None]:
%pip install pandas teradataml ipython-sql

In [2]:
# Create a working directory for the demo files
import os

tmpdir = "tmp"
os.makedirs(tmpdir, exist_ok=True)

## Connecting to Teradata

In [3]:
from getpass import getpass
from urllib.parse import quote_plus

# Connection settings
host = getpass("Host > ")
user = "demo_user"
database = "demo_user"
password = getpass("Password > ")
dbs_port = 1025
encryptdata = "true"

connstr = (
  f"teradatasql://{user}:{quote_plus(password)}@{host}/"
  f"?database={database}"
  f"&dbs_port={dbs_port}"
  f"&encryptdata={encryptdata}"
)

%load_ext sql
%config SqlMagic.autopandas=True
%config SqlMagic.displaycon=False
%sql {connstr}

# Verify the connection
%sql SELECT database, current_timestamp

Host >  ·····················································
Password >  ········


1 rows affected.


Unnamed: 0,Database,Current TimeStamp(6)
0,DEMO_USER,2024-04-02 14:26:31.330000-04:00


In [4]:
# Start a teradataml context
from sqlalchemy import create_engine
from teradataml import create_context, remove_context, DataFrame
engine = create_engine(connstr)
context = create_context(tdsqlengine=engine, temp_database_name=user)

# Verify the connection
DataFrame('"dbc"."dbcInfoV"')

InfoKey,InfoData
LANGUAGE SUPPORT MODE,Standard
VERSION,17.20.03.23
RELEASE,17.20.03.23


In [5]:
# Helper that drops a table if it exists
def _drop_if_exists(table_name):
  import sys
  from teradataml import get_connection
  conn = get_connection()
  try:
    conn.execute(f"DROP TABLE {table_name}")
    print(f"Deleted {table_name}", file=sys.stderr)
  except Exception:
    # The table most likely does not exist, so ignore the error
    pass
  

def _clean_job_junks(table_name):
  # Remove objects left behind when a job fails
  _drop_if_exists(table_name)
  _drop_if_exists(table_name + "_ET")
  _drop_if_exists(table_name + "_UV")


## Installing tdload

`tdload` is part of Teradata Tools and Utilities. Download the package for your OS from Teradata Downloads and install it.

- [Windows](https://downloads.teradata.com/download/tools/teradata-tools-and-utilities-windows-installation-package)
- [Mac](https://downloads.teradata.com/download/tools/teradata-tools-and-utilities-macos-installation-package)
- [Linux](https://downloads.teradata.com/download/tools/teradata-tools-and-utilities-linux-installation-package-0)

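### Setting the path

After installation, the tool's location may not be on your `PATH`. Set it as needed.

- On Mac
  - `tdload` is normally installed in `"/Library/Application Support/teradata/client/17.00/bin"` (adjust the version number as appropriate).
  - For example, you can set the path by adding the following line to `~/.bash_profile`.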
    ```
    export PATH="$PATH:/Library/Application Support/teradata/client/17.00/bin"
    ```
- On Windows
  - TBA
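
As a minimal sketch of the path check described above, the following Python snippet looks for `tdload` on the current `PATH` and, if it is missing, appends a candidate directory for the current process only. The helper name `ensure_on_path` and the idea of passing the install directory as an argument are illustrative assumptions, not part of TPT.

```python
import os
import shutil


def ensure_on_path(command, candidate_dir):
    """Return the resolved path of `command`, or None if it cannot be found.

    `candidate_dir` is a hypothetical install location supplied by the caller.
    It is appended to PATH (for this process only) when `command` is missing.
    """
    found = shutil.which(command)
    if found is None and os.path.isdir(candidate_dir):
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + candidate_dir
        found = shutil.which(command)
    return found


# Example: the macOS location mentioned above (adjust the version number).
tdload_path = ensure_on_path(
    "tdload", "/Library/Application Support/teradata/client/17.00/bin"
)
print("tdload found at:", tdload_path)
```

This only affects the running Python process. A permanent setting still belongs in the shell profile or the system environment variables.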

In [6]:
# Check that the tdload command is available
!tdload

Teradata Load Utility Version 17.20.00.11 64-Bit

Usage : tdload -f filename -u username -t tablename
              [-h hostname] [-p password] [-c charset_id]
              [-d delimiter] [-j filename] [-L LogFilePath]
              [-I ConfigurationFileName]
              [-r CheckpointDirectory] [-R RetryLimit]
              [-w RestartWaitPeriod] [--NoLoadSlot]
              [-z CheckpointInterval] [-v] [-x] [--SourceInstances]
              [--TargetInstances] [--StagingTable staging_table_name]
              [--DefaultStagingTable] [--InsertStmt]
              [--SharedMemorySize size] [--help] [JobName]

    Short option descriptions:

        - f filename         - full path name of an input data file
        - t tablename        - name of a target table
        - u username         - user id of the Teradata logon account

        [-p userpasswd]      - password of the Teradata logon account
                               (If omitted, tdload will prompt for password.)
        [

## Basic tdload commands

For more detailed usage and options, see the demos below and the [user guide](https://docs.teradata.com/r/tOURC6AVX_E6BLP7rmJgQw/root).

### Local file --> Teradata

```shell
tdload -f <source file> -h <ip:port> -u <user> -p <password> -t <target table> [jobname]
```

### Teradata --> local file

```shell
# Export an entire table
tdload --SourceTdpid <ip:port> --SourceUserName <user> --SourceUserPassword <password> --SourceTable <source table> --TargetFileName <target file> [jobname]

# Export the result of a query
tdload --SourceTdpid <ip:port> --SourceUserName <user> --SourceUserPassword <password> --SelectStmt <select query> --TargetFileName <target file> [jobname]
```
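
When these commands are launched from Python, it can help to assemble the argument list in one place. The sketch below mirrors the command shapes shown above. The function names `build_load_command` and `build_export_command` are hypothetical, the host and credentials are placeholders, and the "exactly one of table or query" check is a choice made for this sketch rather than documented `tdload` behavior.

```python
def build_load_command(source_file, host, port, user, password,
                       target_table, job_name=None):
    """Assemble the argument list for a file-to-table tdload job."""
    command = [
        "tdload",
        "-f", source_file,
        "-h", f"{host}:{port}",
        "-u", user,
        "-p", password,
        "-t", target_table,
    ]
    if job_name:
        command.append(job_name)
    return command


def build_export_command(target_file, host, port, user, password,
                         source_table=None, select_stmt=None, job_name=None):
    """Assemble the argument list for a table-to-file tdload job.

    This sketch requires exactly one of `source_table` or `select_stmt`.
    """
    if (source_table is None) == (select_stmt is None):
        raise ValueError("specify exactly one of source_table or select_stmt")
    command = [
        "tdload",
        "--SourceTdpid", f"{host}:{port}",
        "--SourceUserName", user,
        "--SourceUserPassword", password,
    ]
    if source_table is not None:
        command += ["--SourceTable", source_table]
    else:
        command += ["--SelectStmt", select_stmt]
    command += ["--TargetFileName", target_file]
    if job_name:
        command.append(job_name)
    return command


# Placeholder values for illustration only.
print(build_load_command("data.csv", "192.0.2.10", 1025,
                         "demo_user", "secret", "tdload_test", "demo-load"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues with file names or queries that contain spaces.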

## Loading a local file into Teradata

### tdload with minimal settings

- Create the target table in advance.
- The default delimiter is a comma.
- tdload command:
  ```shell
  tdload -f <filename> -h <database ip and port> -u <user> -p <password> -t <target table>
  ```
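
Because the password is passed as a command-line argument, printing or logging the command as-is would expose it. The following is a small sketch of masking it before display. The helper `mask_secrets` is hypothetical, and the flag names it looks for are the ones used in this notebook.

```python
def mask_secrets(command, secret_flags=("-p", "--SourceUserPassword",
                                        "--TargetUserPassword")):
    """Return a copy of `command` with the value after each secret flag masked."""
    masked = []
    hide_next = False
    for arg in command:
        if hide_next:
            masked.append("********")
            hide_next = False
        else:
            masked.append(arg)
            hide_next = arg in secret_flags
    return masked


# Placeholder values for illustration only.
command = ["tdload", "-f", "data.csv", "-h", "192.0.2.10:1025",
           "-u", "demo_user", "-p", "secret", "-t", "tdload_test"]
print(" ".join(mask_secrets(command)))
```

The original list is left untouched, so it can still be passed to `subprocess.run`.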

In [7]:
def make_random_data(n):
  import random
  import pandas as pd
  x = random.choices(range(100), k=n)
  y = random.choices(["apple", "banana", "cherry", "durian"], k=n)
  z = [random.random() for _ in range(n)]
  return pd.DataFrame({"x":x, "y":y, "z":z})

# Create test data
df = make_random_data(1000)
display(df)

# Save as CSV
savename = os.path.join(tmpdir, "tdloadtest.csv")
df.to_csv(savename, index=False, header=False)

# Check the file contents
with open(savename) as f:
  print(f.read()[:200])

Unnamed: 0,x,y,z
0,96,durian,0.062467
1,5,apple,0.116924
2,92,apple,0.557616
3,42,cherry,0.720538
4,12,apple,0.120265
...,...,...,...
995,76,apple,0.364543
996,62,durian,0.119408
997,87,apple,0.257500
998,18,apple,0.942268


96,durian,0.062467175905832284
5,apple,0.1169243875575946
92,apple,0.5576164806903904
42,cherry,0.7205376843541185
12,apple,0.12026486963749572
53,banana,0.4415719404352604
41,apple,0.5311424481049186


In [8]:
# Create the target table
# You could run a CREATE query instead; this example uses teradataml
from teradataml import copy_to_sql
from teradatasqlalchemy import INTEGER, VARCHAR


table_name = "tdload_test"
_clean_job_junks(table_name)

empty_df = df.loc[[]]
types_ = {"x": INTEGER(), "y": VARCHAR(10, "LATIN")}
%time copy_to_sql(empty_df, table_name, types=types_, if_exists="replace")

# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test
display(a)
a = %sql SELECT count(*) FROM tdload_test
display(a)
a = %sql SHOW TABLE tdload_test
print(a.values[0,0].replace("\r", "\n"))

Deleted tdload_test


CPU times: user 15.1 ms, sys: 8.47 ms, total: 23.5 ms
Wall time: 1.36 s
0 rows affected.


1 rows affected.


Unnamed: 0,Count(*)
0,0


1 rows affected.
CREATE MULTISET TABLE DEMO_USER.tdload_test ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      x INTEGER,
      y VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      z FLOAT)
NO PRIMARY INDEX ;


In [9]:
# Run the tdload command
import subprocess

command = [
  "tdload", 
  "-f", savename,
  "-h", "{}:{}".format(host, dbs_port),
  "-u", user,
  "-p", password,
  "-t", "tdload_test",
  "my-first-loadjob"
]
%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

CPU times: user 5.19 ms, sys: 12.7 ms, total: 17.9 ms
Wall time: 24.4 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/my-first-loadjob-77.out
Job id is my-first-loadjob-77, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter DataConnector Operator Version 17.20.00.11
$FILE_READER[1]: Instance 1 directing private log report to 'FileReaderLog-1'.
$FILE_READER[1]: DataConnector Producer operator Instances: 1
$FILE_READER[1]: ECI operator ID: '$FILE_READER-31931'
$FILE_READER[1]: Operator instance 1 processing file 'tmp/tdloadtest.csv'.
Teradata Parallel Transporter Load Operator Version 17.

In [10]:
# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test
display(a)
a = %sql SELECT count(*) FROM tdload_test
display(a)

10 rows affected.


Unnamed: 0,x,y,z
0,92,apple,0.557616
1,12,apple,0.120265
2,53,banana,0.441572
3,41,apple,0.531142
4,64,banana,0.191872
5,48,banana,0.746083
6,82,durian,0.74272
7,42,cherry,0.720538
8,5,apple,0.116924
9,96,durian,0.062467


1 rows affected.


Unnamed: 0,Count(*)
0,1000


In [11]:
# Running the same command again appends more data
%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test
display(a)
a = %sql SELECT count(*) FROM tdload_test
display(a)

CPU times: user 23 ms, sys: 20 ms, total: 43 ms
Wall time: 47.8 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/my-first-loadjob-78.out
Job id is my-first-loadjob-78, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter DataConnector Operator Version 17.20.00.11
$FILE_READER[1]: Instance 1 directing private log report to 'FileReaderLog-1'.
$FILE_READER[1]: DataConnector Producer operator Instances: 1
$FILE_READER[1]: ECI operator ID: '$FILE_READER-31956'
$FILE_READER[1]: Operator instance 1 processing file 'tmp/tdloadtest.csv'.
Teradata Parallel Transporter Stream Operator Version 17.20.0

Unnamed: 0,x,y,z
0,92,apple,0.557616
1,12,apple,0.120265
2,53,banana,0.441572
3,41,apple,0.531142
4,64,banana,0.191872
5,48,banana,0.746083
6,96,durian,0.062467
7,5,apple,0.116924
8,92,apple,0.557616
9,42,cherry,0.720538


1 rows affected.


Unnamed: 0,Count(*)
0,2000


### Tab-delimited files

- Add the `-d TAB` option.
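
If the same script handles both comma- and tab-delimited files, the extra arguments can be chosen per file. This is only a sketch: inferring the delimiter from the file extension is an assumption of this example, and `delimiter_args` is a hypothetical helper.

```python
import os


def delimiter_args(filename):
    """Return the extra tdload arguments for the file's delimiter.

    Assumption for this sketch: .tsv or .tab means tab-delimited, and
    anything else relies on the default comma delimiter.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in (".tsv", ".tab"):
        return ["-d", "TAB"]
    return []


print(delimiter_args("tmp/tdloadtest_tab.tsv"))
print(delimiter_args("tmp/tdloadtest.csv"))
```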

In [12]:
df = make_random_data(1200)
display(df)

# Save as TSV (tab-delimited)
savename = os.path.join(tmpdir, "tdloadtest_tab.tsv")
df.to_csv(savename, index=False, header=False, sep="\t")

# Check the file contents
with open(savename) as f:
  print(f.read()[:200])

Unnamed: 0,x,y,z
0,27,cherry,0.004770
1,98,cherry,0.541251
2,93,cherry,0.769231
3,46,cherry,0.991705
4,20,banana,0.493578
...,...,...,...
1195,46,apple,0.860119
1196,48,banana,0.638892
1197,95,durian,0.766699
1198,96,durian,0.122196


27	cherry	0.004769568956497605
98	cherry	0.5412506513153533
93	cherry	0.7692308907142845
46	cherry	0.9917047074874928
20	banana	0.4935776106817925
20	banana	0.5400173092261606
31	cherry	0.842141164306


In [13]:
command = [
  "tdload", 
  "-f", savename,
  "-d", "TAB",
  "-h", "{}:{}".format(host, dbs_port),
  "-u", user,
  "-p", password,
  "-t", "tdload_test",
  "loadjob-with-tsv"
]
%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test
display(a)
a = %sql SELECT count(*) FROM tdload_test
display(a)

CPU times: user 44.5 ms, sys: 28.6 ms, total: 73.1 ms
Wall time: 49.4 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/loadjob-with-tsv-79.out
Job id is loadjob-with-tsv-79, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter DataConnector Operator Version 17.20.00.11
$FILE_READER[1]: Instance 1 directing private log report to 'FileReaderLog-1'.
Teradata Parallel Transporter Stream Operator Version 17.20.00.11
$STREAM: private log specified: StreamLog
$FILE_READER[1]: DataConnector Producer operator Instances: 1
$FILE_READER[1]: ECI operator ID: '$FILE_READER-31999'
$FILE_READER[1]: Opera

Unnamed: 0,x,y,z
0,92,apple,0.557616
1,12,apple,0.120265
2,53,banana,0.441572
3,41,apple,0.531142
4,64,banana,0.191872
5,48,banana,0.746083
6,96,durian,0.062467
7,5,apple,0.116924
8,92,apple,0.557616
9,42,cherry,0.720538


1 rows affected.


Unnamed: 0,Count(*)
0,3200


### Loading files with a header

- Adding the `--sourceSkipRows <n>` option skips the first n rows.
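
As a rough sketch, the snippet below writes a small CSV file with one header row and assembles a load command that skips it. The option is spelled exactly as in the bullet above. The file name, job name, host, and credentials are placeholders, and the command is only built here, not executed.

```python
import csv
import os
import tempfile

# Write a small CSV file that starts with one header row.
rows = [("x", "y", "z"), (96, "durian", 0.0625), (5, "apple", 0.117)]
header_file = os.path.join(tempfile.mkdtemp(), "tdloadtest_header.csv")
with open(header_file, "w", newline="") as f:
    csv.writer(f).writerows(rows)

n_header_rows = 1
command = [
    "tdload",
    "-f", header_file,
    "--sourceSkipRows", str(n_header_rows),  # option spelling as given above
    "-h", "192.0.2.10:1025",                 # placeholder host:port
    "-u", "demo_user",                       # placeholder user
    "-p", "secret",                          # placeholder password
    "-t", "tdload_test",
    "loadjob-with-header",                   # hypothetical job name
]
print(command)
```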



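## Exporting data from Teradata to a local file

### Basic export command

- `--SourceTdpid <host:port>` specifies the IP address and port of the source database.
- `--SourceTable <table name>` exports the entire table.
- The output has no column names, so supply them yourself as needed.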
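
One way to supply the missing column names is to attach them when reading the file back. The sketch below uses only the standard library, with an in-memory string standing in for the exported file. The column names `x`, `y`, `z` are those of the test table in this notebook.

```python
import csv
import io

# Stand-in for the exported file: no header row, comma-delimited.
exported = io.StringIO("96,durian,0.062467\n5,apple,0.116924\n")

# Column names are not part of the export, so supply them here.
columns = ["x", "y", "z"]
records = [dict(zip(columns, row)) for row in csv.reader(exported)]
print(records[0])
```

With pandas, the equivalent is `pd.read_csv(outfile, header=None, names=columns)`.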


In [16]:
import pandas as pd

outfile = os.path.join(tmpdir, "tdload-exported.csv")
command = [
  "tdload",
  "--SourceTdpid", "{}:{}".format(host, dbs_port),
  "--SourceUserName", user,
  "--SourceUserPassword", password,
  "--SourceTable", "tdload_test",
  "--TargetFileName", outfile,
  "my-first-exportjob"
]

%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

# Check the exported result
x = pd.read_csv(outfile, header=None)
x

CPU times: user 5.09 ms, sys: 13.3 ms, total: 18.3 ms
Wall time: 13.9 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/my-first-exportjob-81.out
Job id is my-first-exportjob-81, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter DataConnector Operator Version 17.20.00.11
$FILE_WRITER[1]: Instance 1 directing private log report to 'FileWriterLog-1'.
$FILE_WRITER[1]: DataConnector Consumer operator Instances: 1
$FILE_WRITER[1]: ECI operator ID: '$FILE_WRITER-32077'
$FILE_WRITER[1]: Operator instance 1 processing file 'tmp/tdload-exported.csv'.
Teradata Parallel Transporter Export Operator 

Unnamed: 0,0,1,2
0,96,durian,0.062467
1,5,apple,0.116924
2,92,apple,0.557616
3,42,cherry,0.720538
4,12,apple,0.120265
...,...,...,...
3895,46,apple,0.860119
3896,48,banana,0.638892
3897,95,durian,0.766699
3898,96,durian,0.122196


### Controlling what is exported
- `--SelectStmt <query>` lets you specify the data to export with a SELECT statement.
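
When the command is passed to `subprocess` as a list, the query needs no shell quoting, but string values inside the SQL still need SQL-level quoting. The following is a minimal sketch with two hypothetical helpers. It assumes the table name is trusted and only escapes the string literal.

```python
def sql_string_literal(value):
    """Quote `value` as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def select_where_y(table, y_value):
    """Build a SELECT statement for --SelectStmt (the table name is trusted)."""
    return f"SELECT * FROM {table} WHERE y = {sql_string_literal(y_value)}"


print(select_where_y("tdload_test", "apple"))
```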

In [17]:
outfile = os.path.join(tmpdir, "tdload-exported-apple.csv")
command = [
  "tdload",
  "--SourceTdpid", "{}:{}".format(host, dbs_port),
  "--SourceUserName", user,
  "--SourceUserPassword", password,
  "--SelectStmt", "SELECT * FROM tdload_test WHERE y = 'apple'",
  "--TargetFileName", outfile,
  "exportjob-with-query"
]

%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

# Check the exported result
x = pd.read_csv(outfile, header=None)
x

CPU times: user 5.15 ms, sys: 13.2 ms, total: 18.3 ms
Wall time: 13.6 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/exportjob-with-query-82.out
Job id is exportjob-with-query-82, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter DataConnector Operator Version 17.20.00.11
$FILE_WRITER[1]: Instance 1 directing private log report to 'FileWriterLog-1'.
Teradata Parallel Transporter Export Operator Version 17.20.00.11
$EXPORT: private log specified: ExportLog
$FILE_WRITER[1]: DataConnector Consumer operator Instances: 1
$FILE_WRITER[1]: ECI operator ID: '$FILE_WRITER-32092'
$FILE_WRITER[1

Unnamed: 0,0,1,2
0,5,apple,0.116924
1,92,apple,0.557616
2,12,apple,0.120265
3,41,apple,0.531142
4,5,apple,0.774828
...,...,...,...
996,32,apple,0.687444
997,64,apple,0.394531
998,54,apple,0.277423
999,73,apple,0.282837


## Loading data from a Teradata table into another table

In [18]:
q = """
CREATE MULTISET TABLE tdload_test2
  ,FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL
(
  x INTEGER,
  y VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
  z FLOAT
)
NO PRIMARY INDEX
"""
_clean_job_junks("tdload_test2")
%sql {q}

Deleted tdload_test2


0 rows affected.


In [19]:
# make the target table empty
%sql DELETE FROM tdload_test2

command = [
  "tdload",
  "--SourceTdpid", "{}:{}".format(host, dbs_port),
  "--SourceUserName", user,
  "--SourceUserPassword", password,
  #"--SourceTable", "tdload_test",
  "--SelectStmt", "SELECT * FROM tdload_test WHERE y = 'banana'",
  "--TargetTdpid", "{}:{}".format(host, dbs_port),
  "--TargetUserName", user,
  "--TargetUserPassword", password,
  "--TargetTable", "tdload_test2",
  # The target table may also be qualified with a database name (database.table)
  "teradata-to-teradata"
]

%time p = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.decode())
print("===========================")
print("Standard error:")
print(p.stderr.decode())
print("===========================")

0 rows affected.
CPU times: user 3.76 ms, sys: 6.76 ms, total: 10.5 ms
Wall time: 25.5 s
Return code: 0
Standard output:
Teradata Parallel Transporter Version 17.20.00.11 64-Bit
The global configuration file '/Library/Application Support/teradata/client/17.20/tbuild/twbcfg.ini' is used.
   Log Directory: /Library/Application Support/teradata/client/17.20/tbuild/logs
   Checkpoint Directory: /Library/Application Support/teradata/client/17.20/tbuild/checkpoint

Job log: /Library/Application Support/teradata/client/17.20/tbuild/logs/teradata-to-teradata-83.out
Job id is teradata-to-teradata-83, running on TD-C02Z50MMLVDQ
Teradata Parallel Transporter Export Operator Version 17.20.00.11
Teradata Parallel Transporter Load Operator Version 17.20.00.11
$EXPORT: private log specified: ExportLog
$LOAD: private log specified: LoadLog
$LOAD: connecting sessions
$EXPORT: connecting sessions
$LOAD: preparing target table
$LOAD: entering Acquisition Phase
$EXPORT: sending SELECT request
$EXPORT: ent

In [20]:
# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test2
display(a)

a = %sql SELECT y, count(*) FROM tdload_test2 GROUP BY y
display(a)

10 rows affected.


Unnamed: 0,x,y,z
0,48,banana,0.746083
1,75,banana,0.819887
2,56,banana,0.278024
3,44,banana,0.897795
4,35,banana,0.687687
5,80,banana,0.95862
6,86,banana,0.799677
7,68,banana,0.017246
8,64,banana,0.191872
9,53,banana,0.441572


1 rows affected.


Unnamed: 0,y,Count(*)
0,banana,934


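## Monitoring running jobs
- `twbstat`: lists the jobs that are currently running
- `twbcmd`: checks the status of, or controls, a running job
- A job ID is the job name you assigned followed by a sequence number: `{job name}-{number}`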
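
Given that naming rule, the job IDs for a particular job name can be picked out of the `twbstat` output with a regular expression. This is a sketch under the assumption that each job ID appears on its own line, as in the output captured in this notebook. `find_job_ids` is a hypothetical helper.

```python
import re


def find_job_ids(twbstat_output, job_name):
    """Return job IDs of the form '<job_name>-<number>' found in twbstat output."""
    pattern = re.compile(rf"^{re.escape(job_name)}-\d+$")
    return [line.strip() for line in twbstat_output.splitlines()
            if pattern.match(line.strip())]


sample_output = """Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Jobs running: 1

my-long-job-84
"""
print(find_job_ids(sample_output, "my-long-job"))
```

Matching on the full `{job name}-{number}` pattern, rather than on the prefix alone, avoids picking up jobs whose names merely start with the same text.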

In [21]:
# Prepare the data and table for a long-running job
df = make_random_data(5000000)
display(df)
savename = os.path.join(tmpdir, "tdloadtest_5m.csv")
df.to_csv(savename, index=False, header=False)
del df

_clean_job_junks("tdload_test")
q = """
CREATE MULTISET TABLE tdload_test
  ,FALLBACK
  ,NO BEFORE JOURNAL
  ,NO AFTER JOURNAL  
(
  x INTEGER,
  y VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
  z FLOAT
)
"""
%sql {q}

Unnamed: 0,x,y,z
0,82,cherry,0.318184
1,42,cherry,0.667681
2,46,banana,0.759340
3,25,banana,0.632203
4,94,durian,0.201442
...,...,...,...
4999995,93,apple,0.406113
4999996,2,durian,0.584759
4999997,0,durian,0.253913
4999998,63,banana,0.264331


Deleted tdload_test


0 rows affected.


In [22]:
# Start the job
# Using subprocess.Popen instead of subprocess.run runs it in the background,
# which corresponds to `... &` in a shell
command = [
  "tdload", 
  "-f", savename,
  "-h", "{}:{}".format(host, dbs_port),
  "-u", user,
  "-p", password,
  "-t", "tdload_test",
  "my-long-job"
]

p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

In [23]:
import time
time.sleep(5)
# Wait a little, because it takes a moment for the job to appear in the list

# twbstat: list jobs
p2 = subprocess.run(["twbstat"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out = p2.stdout.decode()
print("job list *****")
print(out)
print("********")

# twbcmd <jobid> JOB STATUS: show the job status
jobids = [v for v in out.split("\n") if v.startswith("my-long-job-")]
if len(jobids) >= 1:
  jobid = jobids[-1]
  print("status of", jobid)
  p3 = subprocess.run(["twbcmd", jobid, "JOB", "STATUS"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  print("-----")
  print(p3.stdout.decode())
  print("-----")
else:
  print("No job is running")  

job list *****
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Jobs running: 1

my-long-job-84

********
status of my-long-job-84
-----
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

-----


In [24]:
%%time
# Wait until the tdload job finishes
elapsed = 0
while p.poll() is None:
  p3 = subprocess.run(["twbcmd", jobid, "JOB", "STATUS"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  print("*** Current job status ***")
  print(p3.stdout.decode())
  time.sleep(60)
  elapsed += 1
  print(f"Elapsed: {elapsed} mins")

print("Return code:", p.returncode)
print("===========================")
print("Standard output:")
print(p.stdout.read().decode())
print("===========================")
print("Standard error:")
print(p.stderr.read().decode())
print("===========================")

*** Current job status ***
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

Elapsed: 1 mins
*** Current job status ***
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

Elapsed: 2 mins
*** Current job status ***
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

Elapsed: 3 mins
*** Current job status ***
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

Elapsed: 4 mins
*** Current job status ***
Using job directory /Library/Application Support/teradata/client/17.20/tbuild/logs

Command, JOB STATUS, successfully sent to my-long-job-84.

Elapsed: 5 mins
*** Current job status ***
Using job directory /Library/Application Su

In [25]:
# Check the result
a = %sql SELECT TOP 10 * FROM tdload_test
display(a)

a = %sql SELECT count(*) FROM tdload_test
display(a)

10 rows affected.


Unnamed: 0,x,y,z
0,28,banana,0.035027
1,90,cherry,0.328262
2,54,apple,0.620177
3,2,durian,0.437784
4,87,apple,0.567366
5,86,cherry,0.48672
6,13,durian,0.397314
7,42,durian,0.211646
8,41,banana,0.829438
9,70,durian,0.675324


1 rows affected.


Unnamed: 0,Count(*)
0,5000000
