# Add New Data to Existing Table Lineage

As more data is collected, a common task is to add it to an existing dataset so that it can be used to improve the model. This notebook demonstrates how to add new data to an existing 3LC dataset by creating a new table that merges two or more existing tables.

We will cover two examples:
1. Adding new data with the same classes.
2. Adding new data with different classes, requiring a new, merged schema.

In [None]:
from tlc_tools.common import get_dataset_path
import tlc

## Add new data with the same classes

We will reuse the cats and dogs dataset from the previous section and add a new batch of data.

Before we add it, we need to create a Table with the new data.

In [None]:
data_path = get_dataset_path("more-cats-and-dogs")

dataset_name = "cats-and-dogs"
project_name = "image-classification-table"

new_data_table = tlc.Table.from_image_folder(
    data_path,
    table_name="new-data",
    dataset_name=dataset_name,
    project_name=project_name,
    add_weight_column=True,
    if_exists="overwrite",
)

new_data_table

Next, we load the initial cats and dogs table created in the previous section.

In [None]:
initial_table = tlc.Table.from_names("initial", dataset_name, project_name)
initial_table

Now that we have the two tables, we are ready to combine them using Table.join_tables().

In [None]:
joined_table = tlc.Table.join_tables([initial_table, new_data_table], table_name="added-more-data")
joined_table
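Conceptually, a join of tables that share the same classes appends rows; it is not a relational, key-based join. The sketch below models that idea with plain Python lists. The helper `join_rows`, the file names, and the value map `{0: "cats", 1: "dogs"}` are hypothetical illustrations, not part of the 3LC API, and the row ordering shown is an assumption.

```python
def join_rows(*tables):
    """Concatenate row lists in the order given (hypothetical helper)."""
    joined = []
    for rows in tables:
        joined.extend(rows)
    return joined


# Both row lists are assumed to share the value map {0: "cats", 1: "dogs"}.
initial_rows = [
    {"image": "cat_0.jpg", "label": 0},
    {"image": "dog_0.jpg", "label": 1},
]
new_rows = [{"image": "cat_1.jpg", "label": 0}]

joined_rows = join_rows(initial_rows, new_rows)
print(len(joined_rows))  # 3
```

Because the classes are identical, the label indices carry over unchanged.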

## Add new data with different classes

We will now create a new image folder table containing animals in the categories "bats" and "frogs". This table will be joined with our existing table.


In [None]:
data_path = get_dataset_path("bats-and-frogs")

dataset_name = "cats-and-dogs"
project_name = "image-classification-table"

more_new_data_table = tlc.Table.from_image_folder(
    data_path,
    table_name="more-new-data",
    dataset_name=dataset_name,
    project_name=project_name,
    add_weight_column=True,  # Add a sample weight column to the new table
    weight_value=0.0,  # Initialize the weight of every new sample to zero
    if_exists="overwrite",
)

more_new_data_table

In [None]:
more_new_data_table.table_rows[2]

We now create another Table by joining the previously joined table with the new table of bat and frog images.

In [None]:
joined_again_table = tlc.Table.join_tables([joined_table, more_new_data_table], table_name="added-bats-and-frogs-data")
joined_again_table

Originally, the two tables had different value maps. Let's inspect them:

In [None]:
joined_table.get_simple_value_map("label")

In [None]:
more_new_data_table.get_simple_value_map("label")

Notice that the two value maps use overlapping class indices for different classes. Table.join_tables() resolves this by combining the two schemas into one:

In [None]:
joined_again_table.get_simple_value_map("label")

It also updated the label values in the rows to match the merged value map:

In [None]:
joined_again_table[25]
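To make the merging of value maps concrete, here is a minimal sketch in plain Python. It is not how 3LC implements the merge; `merge_value_maps` is a hypothetical helper, the class names are assumed, and the index order that Table.join_tables() actually assigns may differ from the one shown here.

```python
def merge_value_maps(first, second):
    """Merge two {index: class_name} maps (hypothetical helper).

    Returns the merged map and a remapping from the indices of `second`
    to the indices of the merged map.
    """
    merged = dict(first)
    name_to_index = {name: index for index, name in merged.items()}
    remap = {}
    for old_index, name in sorted(second.items()):
        if name not in name_to_index:
            # Unseen class: assign the next free index.
            new_index = max(merged, default=-1) + 1
            merged[new_index] = name
            name_to_index[name] = new_index
        remap[old_index] = name_to_index[name]
    return merged, remap


# Assumed value maps: both start at index 0, so the indices overlap.
cats_dogs = {0: "cats", 1: "dogs"}
bats_frogs = {0: "bats", 1: "frogs"}

merged, remap = merge_value_maps(cats_dogs, bats_frogs)
print(merged)  # {0: 'cats', 1: 'dogs', 2: 'bats', 3: 'frogs'}
print(remap)   # {0: 2, 1: 3}

# Rows from the second table must have their labels rewritten.
rows = [{"image": "bat_0.jpg", "label": 0}, {"image": "frog_0.jpg", "label": 1}]
remapped_rows = [{**row, "label": remap[row["label"]]} for row in rows]
print(remapped_rows[0]["label"])  # 2
```

The point of the sketch is that merging the schema alone is not enough: any row whose label index changed meaning has to be rewritten as well.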