<a href="https://colab.research.google.com/github/AlisonJD/tb_examples/blob/main/Add_a_Column_to_a_Data_Source_using_the_CLI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Add a Column to a Data Source using the CLI

Based on Tinybird blog post:

https://blog.tinybird.co/2021/05/25/add-column/

If you have opened this notebook in Google Colab, use `Copy to Drive` (see above) to save your own copy.

In [1]:
#@title Mount your Google Drive to save and use local files
from google.colab import drive
drive.mount('/content/gdrive', force_remount=False)

%cd "/content/gdrive/My Drive/Colab Notebooks/Tinybird/tb_examples"

Mounted at /content/gdrive
/content/gdrive/My Drive/Colab Notebooks/Tinybird/tb_examples


In [2]:
#@title Install Tinybird CLI, import os and set up your token
!pip install tinybird-cli -q

import os

if not os.path.isfile('.tinyb'):
  !tb auth

if not os.path.isdir('datasources'):
  !tb init

In [3]:
#@title Helper function to write to files
def write_text_to_file(filename, text):
  """Write text to filename, overwriting any existing file."""
  with open(filename, 'w') as f:
    f.write(text)

# Worked Example from Blog:

# Add a Column to a Data Source using the CLI

Business changes, so does data. New attributes in datasets are the norm, not the exception.

You can now add new columns to your existing Data Sources, without worrying about what happens with your existing data ingestion (we will keep importing data with the old schema and start accepting data with the new schema).

Since you can materialize data to other Data Sources at ingestion time, changing the schema of your Data Source could have downstream effects. Don't worry, we've solved that.

## 1. Create a Sample Data Source

In [48]:
filename="datasources/fixtures/my_ds.csv"
text='''
n,v
1,A
2,B
3,C
4,D
5,E
6,F
7,G

'''

write_text_to_file(filename, text)

In [49]:
!tb datasource generate datasources/fixtures/my_ds.csv --force

** Generated datasources/my_ds.datasource
** => Create it on the server running: $ tb push datasources/my_ds.datasource
** => Append data using: $ tb datasource append my_ds datasources/fixtures/my_ds.csv`
** => Generated fixture datasources/fixtures/my_ds.csv


In [50]:
!cat datasources/my_ds.datasource

DESCRIPTION generated from datasources/fixtures/my_ds.csv

SCHEMA >
    `n` Int16,
    `v` String

In [51]:
!tb datasource append my_ds datasources/fixtures/my_ds.csv
# the row in quarantine is the row containing the column names

** 🥚 starting import process
** There was an error with file contents: 1 row in quarantine.
** 🐥 done
** Appended 8 new rows
** Total rows in my_ds: 7
** Data appended to Data Source 'my_ds' successfully!
** Data pushed to my_ds


In [52]:
!tb sql "select * from my_ds order by n"

---------
| n | v |
---------
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
| 6 | F |
| 7 | G |
---------


## 2. Add New Columns



To add new columns, just add them to the end of the current schema definition and then do `tb push --force`.
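If you would rather not retype the whole file, the same edit can be sketched in Python. `append_columns` is a hypothetical helper, not part of the Tinybird CLI. It assumes the `SCHEMA` block is the last section of the `.datasource` file and that each column sits on its own line, as in the generated file above.

```python
def append_columns(datasource_text, new_columns):
    """Return datasource_text with new_columns added at the end of the schema.

    Hypothetical helper: assumes the SCHEMA block is the last section of the
    file and that every column definition is on its own line.
    """
    if not new_columns:
        return datasource_text
    lines = datasource_text.rstrip().splitlines()
    # The last existing column has no trailing comma, so add one first.
    lines[-1] = lines[-1].rstrip().rstrip(',') + ','
    for i, column in enumerate(new_columns):
        # Every new column except the final one is followed by a comma.
        separator = ',' if i < len(new_columns) - 1 else ''
        lines.append('    ' + column + separator)
    return '\n'.join(lines) + '\n'


old = """DESCRIPTION generated from datasources/fixtures/my_ds.csv

SCHEMA >
    `n` Int16,
    `v` String
"""

new = append_columns(old, [
    "`v_str_def_thing` String DEFAULT 'thing'",
    "`v_int_no_default` Int16",
])
print(new)
```

The result could be passed to `write_text_to_file` and pushed with `tb push --force` as in the cells that follow.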

In [53]:
filename="datasources/my_ds.datasource"
text='''
DESCRIPTION my_ds with some new columns

SCHEMA >
    `n` Int16,
    `v` String,
    `v_str_def_thing` String DEFAULT 'thing',
    `v_str_no_default` String,
    `v_int_def_3` Int16 DEFAULT 3,
    `v_int_no_default` Int16

'''

write_text_to_file(filename, text)

In [54]:
!cat datasources/my_ds.datasource


DESCRIPTION my_ds with some new columns

SCHEMA >
    `n` Int16,
    `v` String,
    `v_str_def_thing` String DEFAULT 'thing',
    `v_str_no_default` String,
    `v_int_def_3` Int16 DEFAULT 3,
    `v_int_no_default` Int16



In [55]:
!tb push datasources/my_ds.datasource --force --yes

** Processing datasources/my_ds.datasource
** Building dependencies
** Running my_ds 
** The schema of 'my_ds' has changed.
**   -  ADD COLUMN `v_str_def_thing` String DEFAULT 'thing'
**   -  ADD COLUMN `v_str_no_default` String
**   -  ADD COLUMN `v_int_def_3` Int16 DEFAULT 3
**   -  ADD COLUMN `v_int_no_default` Int16
** The Data Source has been correctly updated.
** Not pushing fixtures


## 3. Add Data with the New Columns

In [56]:
filename="datasources/fixtures/my_ds_new_cols.csv"
text='''
n,v,v_str_def_thing,v_str_no_default,v_int_def_3,v_int_no_default
8,H,other,word,5,10
9,I,,dog,0,6
10,J,again,words,1,8
'''

write_text_to_file(filename, text)

In [57]:
!tb datasource append my_ds datasources/fixtures/my_ds_new_cols.csv
# again the row of column names goes into quarantine

** 🥚 starting import process
** There was an error with file contents: 1 row in quarantine.
** 🐥 done
** Appended 4 new rows
** Total rows in my_ds: 10
** Data appended to Data Source 'my_ds' successfully!
** Data pushed to my_ds


## 4. Add Data with the Old Columns

In [58]:
!tb sql "select * from my_ds order by n"

--------------------------------------------------------------------------------
|  n | v | v_str_def_thing | v_str_no_default | v_int_def_3 | v_int_no_default |
--------------------------------------------------------------------------------
|  1 | A | thing           |                  |           3 |                0 |
|  2 | B | thing           |                  |           3 |                0 |
|  3 | C | thing           |                  |           3 |                0 |
|  4 | D | thing           |                  |           3 |                0 |
|  5 | E | thing           |                  |           3 |                0 |
|  6 | F | thing           |                  |           3 |                0 |
|  7 | G | thing           |                  |           3 |                0 |
|  8 | H | other           | word             |           5 |               10 |
|  9 | I |                 | dog           

By default, new columns are filled with an empty string or a 0, depending on the type. You can specify other default values for the new columns in the schema.
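The fill rule can be sketched in plain Python. This is only a model of the behaviour seen in the query output above, not Tinybird's implementation. `TYPE_DEFAULTS` and `fill_missing` are hypothetical names, and only the two column types used in this notebook are covered.

```python
# Fallback value per type when the schema gives no DEFAULT
# (assumption: only the two types used in this notebook).
TYPE_DEFAULTS = {"String": "", "Int16": 0}


def fill_missing(row, schema):
    """Return a complete row, filling in any column that `row` lacks.

    schema is a list of (name, type, explicit_default_or_None) tuples.
    """
    full = {}
    for name, col_type, default in schema:
        if name in row:
            # A supplied value is kept as-is, even if it is empty.
            full[name] = row[name]
        elif default is not None:
            full[name] = default
        else:
            full[name] = TYPE_DEFAULTS[col_type]
    return full


schema = [
    ("n", "Int16", None),
    ("v", "String", None),
    ("v_str_def_thing", "String", "thing"),
    ("v_str_no_default", "String", None),
    ("v_int_def_3", "Int16", 3),
    ("v_int_no_default", "Int16", None),
]

# A row that only has the old columns, like rows 1 to 7.
old_row = fill_missing({"n": 1, "v": "A"}, schema)
print(old_row)
```

In this model a value that is present but empty is kept rather than replaced by the default, which matches row 9 above, where `v_str_def_thing` was sent empty and stays empty.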

In [59]:
filename="datasources/fixtures/my_ds_old_cols.csv"
text='''
n,v
11,K
12,L

'''

write_text_to_file(filename, text)

In [60]:
!tb datasource append my_ds datasources/fixtures/my_ds_old_cols.csv
# again the row of column names goes into quarantine

** 🥚 starting import process
** There was an error with file contents: 1 row in quarantine.
** 🐥 done
** Appended 4 new rows
** Total rows in my_ds: 12
** Data appended to Data Source 'my_ds' successfully!
** Data pushed to my_ds


In [61]:
!tb sql "select * from my_ds order by n"

--------------------------------------------------------------------------------
|  n | v | v_str_def_thing | v_str_no_default | v_int_def_3 | v_int_no_default |
--------------------------------------------------------------------------------
|  1 | A | thing           |                  |           3 |                0 |
|  2 | B | thing           |                  |           3 |                0 |
|  3 | C | thing           |                  |           3 |                0 |
|  4 | D | thing           |                  |           3 |                0 |
|  5 | E | thing           |                  |           3 |                0 |
|  6 | F | thing           |                  |           3 |                0 |
|  7 | G | thing           |                  |           3 |                0 |
|  8 | H | other           | word             |           5 |               10 |
|  9 | I |                 | dog           

## 5. Notes
1. If you materialize views with `SELECT * FROM ...` from the Data Source you are adding columns to, the views will break because the target Data Sources won’t have all the columns. To avoid this, use column names instead of `*` when creating materialized views.

2. You can only add columns to Data Sources that have a `Null` engine or one in the `MergeTree` family.

3. You can keep importing data as if your schema hadn’t changed. Default values will be used for the new columns if a value is not provided for them. At any point, you can start importing with the new schema by sending data that contains the new columns.

4. All new columns must be added at the end of the existing Data Source's schema, not in between existing columns.
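
The rule in note 4 can be checked locally before pushing. A minimal sketch, assuming you have the old and new column names as plain lists; `added_columns` is a hypothetical helper and not part of the Tinybird CLI.

```python
def added_columns(old_cols, new_cols):
    """Return the columns appended in new_cols.

    Raises ValueError if the existing columns were reordered, renamed,
    removed, or had a new column inserted between them.
    """
    if new_cols[:len(old_cols)] != old_cols:
        raise ValueError(
            "existing columns must keep their position; "
            "add new columns at the end of the schema"
        )
    return new_cols[len(old_cols):]


old_cols = ["n", "v"]

# Appending at the end is accepted.
ok = added_columns(old_cols, ["n", "v", "v_str_def_thing", "v_int_def_3"])
print(ok)

# Inserting a column between existing ones is rejected.
try:
    added_columns(old_cols, ["n", "v_new", "v"])
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```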