<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/drive/1oMEenCfGl19MtRfHdCNdsjGxwDqlo085" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelpandas/blob/main/notebooks/local-files.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# _**Creating Data Rows from Local Files with LabelPandas**_

## _**Documentation**_

**Requirements:**

- A `row_data` column - This column must contain URLs that point to the assets to be uploaded
  - This notebook shows how to create these URLs for local files

- Either a `dataset_id` column or an input argument for `dataset_id`
  - If uploading to multiple datasets, provide a `dataset_id` column 
  - If uploading to one dataset, provide a `dataset_id` input argument
    - _This can still be a column if it's already in your CSV file_
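The `dataset_id` rule above amounts to grouping rows by their target dataset. The sketch below is a hypothetical helper, `group_rows_by_dataset`, written in plain pandas. It is not part of LabelPandas. It assumes a column literally named `dataset_id`, and it also assumes that an explicit argument takes precedence over a column. LabelPandas may resolve that case differently.

```python
from typing import Dict, Optional

import pandas as pd


def group_rows_by_dataset(
    table: pd.DataFrame, dataset_id: Optional[str] = None
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: map each target dataset ID to the rows bound for it."""
    if "row_data" not in table.columns:
        raise ValueError("A `row_data` column is required")
    if dataset_id is not None:
        # Assumption: a single `dataset_id` argument applies to every row,
        # even if the table also carries a `dataset_id` column.
        return {dataset_id: table}
    if "dataset_id" not in table.columns:
        raise ValueError("Provide a `dataset_id` column or a `dataset_id` argument")
    # Multiple datasets: split the table on its `dataset_id` column.
    return {str(key): rows for key, rows in table.groupby("dataset_id")}
```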

**Recommended:**
- A `global_key` column
  - This column contains unique identifiers for your data rows
  - If none is provided, it defaults to your `row_data` column
- An `external_id` column
  - This column contains non-unique identifiers for your data rows
  - If none is provided, it defaults to your `global_key` column
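The fallback chain described above (`global_key` from `row_data`, then `external_id` from `global_key`) can be sketched with pandas. `fill_identifier_defaults` is a hypothetical helper that only mirrors the documented behavior. It assumes that "none provided" means the column is absent entirely, and it adds a uniqueness check on global keys to match the "unique identifiers" requirement.

```python
import pandas as pd


def fill_identifier_defaults(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: global_key <- row_data, then external_id <- global_key."""
    table = table.copy()
    if "global_key" not in table.columns:
        table["global_key"] = table["row_data"]
    if "external_id" not in table.columns:
        table["external_id"] = table["global_key"]
    # Global keys must be unique; external IDs are allowed to repeat.
    duplicated = table["global_key"].duplicated(keep=False)
    if duplicated.any():
        dupes = sorted(table.loc[duplicated, "global_key"].unique())
        raise ValueError(f"Duplicate global keys: {dupes}")
    return table
```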

## Code

Install LabelPandas and Labelbox

```python
# Install LabelPandas
!pip install labelpandas --upgrade -q
```

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/187.7 KB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m163.8/187.7 KB[0m [31m4.5 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m187.7/187.7 KB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[?25h

```python
import labelpandas as lp
import pandas as pd
```

Define runtime variables

```python
csv_path = "https://raw.githubusercontent.com/Labelbox/labelpandas/main/datasets/local-files.csv"  # Path to your CSV file
api_key = ""  # Labelbox API key
```
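Rather than pasting the key into `api_key`, you may prefer to read it from an environment variable so it never ends up in a saved notebook. This is a minimal sketch; the variable name `LABELBOX_API_KEY` and the helper `read_api_key` are our own choices, not something LabelPandas requires.

```python
import os


def read_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is missing."""
    # Assumption: the key was exported under this (hypothetical) variable name.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook")
    return key
```

With the variable set, `api_key = read_api_key()` replaces the hard-coded string.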

Load the CSV (and, for this demo, clone the repository that contains the local image files)

```python
# Clone the LabelPandas repo to get the demo images referenced in the CSV
!git clone https://github.com/Labelbox/labelpandas -q

df = pd.read_csv(csv_path)
df.head(10)
```

| Unnamed: 0 | external_id | global_key | file_path |
|---|---|---|---|
| 0 | 6.jpg | labelpandas-test-art-6.jpg | labelpandas/datasets/images/6.jpg |
| 1 | 10.jpg | labelpandas-test-art-10.jpg | labelpandas/datasets/images/10.jpg |
| 2 | 2.jpg | labelpandas-test-art-2.jpg | labelpandas/datasets/images/2.jpg |
| 3 | 3.jpg | labelpandas-test-art-3.jpg | labelpandas/datasets/images/3.jpg |
| 4 | 8.jpg | labelpandas-test-art-8.jpg | labelpandas/datasets/images/8.jpg |
| 5 | 9.jpg | labelpandas-test-art-9.jpg | labelpandas/datasets/images/9.jpg |
| 6 | 4.jpg | labelpandas-test-art-4.jpg | labelpandas/datasets/images/4.jpg |
| 7 | 1.jpg | labelpandas-test-art-1.jpg | labelpandas/datasets/images/1.jpg |
| 8 | 5.jpg | labelpandas-test-art-5.jpg | labelpandas/datasets/images/5.jpg |
| 9 | 7.jpg | labelpandas-test-art-7.jpg | labelpandas/datasets/images/7.jpg |
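The `Unnamed: 0` column in the output above is what pandas produces when a CSV was written with its index (the default for `DataFrame.to_csv`) and read back without `index_col`. The sketch below, a hypothetical helper `load_file_table`, reads the first column back as the index instead and confirms that every local path exists before any upload is attempted. It assumes the paths are relative to the notebook's working directory, as they are after the `git clone` above.

```python
import os

import pandas as pd


def load_file_table(csv_path: str, file_path_column: str = "file_path") -> pd.DataFrame:
    """Read the CSV, reusing its saved index, and confirm every local file exists."""
    # index_col=0 turns the saved index into the DataFrame index
    # instead of an extra "Unnamed: 0" column.
    table = pd.read_csv(csv_path, index_col=0)
    missing = [path for path in table[file_path_column] if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"{len(missing)} file(s) not found, e.g. {missing[0]}")
    return table
```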


Create a Dataset (for demonstration purposes only)

```python
client = lp.Client(lb_api_key=api_key)
```

```python
datset_id = client.lb_client.create_dataset(name="LabelPandas-local-files").uid
```

Convert Local Files to Labelbox-ready URLs

```python
df = lp.load_local_files(
    client=client,
    table=df,
    file_path_column="file_path",
    verbose=True
)
df.head(10)
```

```
Creating URLs for 10 local files...
100%|██████████| 10/10 [00:09<00:00,  1.11it/s]
Success - 10 URLs created in `row_data` column
```
| Unnamed: 0 | external_id | global_key | file_path | row_data |
|---|---|---|---|---|
| 0 | 6.jpg | labelpandas-test-art-6.jpg | labelpandas/datasets/images/6.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 1 | 10.jpg | labelpandas-test-art-10.jpg | labelpandas/datasets/images/10.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 2 | 2.jpg | labelpandas-test-art-2.jpg | labelpandas/datasets/images/2.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 3 | 3.jpg | labelpandas-test-art-3.jpg | labelpandas/datasets/images/3.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 4 | 8.jpg | labelpandas-test-art-8.jpg | labelpandas/datasets/images/8.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 5 | 9.jpg | labelpandas-test-art-9.jpg | labelpandas/datasets/images/9.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 6 | 4.jpg | labelpandas-test-art-4.jpg | labelpandas/datasets/images/4.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 7 | 1.jpg | labelpandas-test-art-1.jpg | labelpandas/datasets/images/1.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 8 | 5.jpg | labelpandas-test-art-5.jpg | labelpandas/datasets/images/5.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
| 9 | 7.jpg | labelpandas-test-art-7.jpg | labelpandas/datasets/images/7.jpg | gs://labelbox-193903.appspot.com/cklgtitp0gi50... |
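Conceptually, this step uploads each local file and stores the returned URL in a new `row_data` column. The sketch below illustrates that idea; it is not LabelPandas' implementation. The upload function is passed in as a parameter. With the Labelbox Python SDK, `Client.upload_file(path)` uploads a local file and returns a URL, so `client.lb_client.upload_file` is one plausible argument, assuming `client.lb_client` is a standard `labelbox.Client` (as its use with `create_dataset` above suggests). The helper name `add_row_data_urls` is hypothetical.

```python
from typing import Callable

import pandas as pd


def add_row_data_urls(
    table: pd.DataFrame,
    upload: Callable[[str], str],
    file_path_column: str = "file_path",
) -> pd.DataFrame:
    """Hypothetical sketch: upload each local file and store its URL in `row_data`."""
    table = table.copy()
    # One upload call per file, in row order; each call returns a URL string.
    table["row_data"] = [upload(path) for path in table[file_path_column]]
    return table
```

Passing the uploader in keeps the helper testable offline: a stand-in function that returns fake URLs exercises the same logic without touching Labelbox.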


Upload to Labelbox

```python
results = client.create_data_rows_from_table(
    table=df,
    dataset_id=datset_id,
    skip_duplicates=False,  # If True, skips data rows whose global key is already in use
    verbose=True,  # If True, prints information about code execution
)
```

```
Creating upload list - 10 rows in Pandas DataFrame
Beginning data row upload for dataset ID cle3r5m6q4n1r070c9rl3gxwj: uploading 10 data rows
Batch #1: 10 data rows
Success: Upload batch number 1 successful
Upload complete - all data rows uploaded
```
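The log above shows the rows being turned into an upload list and sent in batches. The plain-Python sketch below illustrates that flow under stated assumptions. Each row becomes a dict with `row_data`, `global_key`, and `external_id` keys, the field names the Labelbox SDK accepts in `Dataset.create_data_rows`. `existing_keys` is a hypothetical set of global keys already in use, and the default batch size of 500 is an arbitrary choice. This is not how LabelPandas is implemented.

```python
from typing import Dict, Iterator, List, Sequence, Set, TypeVar

import pandas as pd

T = TypeVar("T")


def build_upload_list(
    table: pd.DataFrame, existing_keys: Set[str], skip_duplicates: bool
) -> List[Dict[str, str]]:
    """Hypothetical sketch: one dict per row, optionally dropping global keys already in use."""
    records = table[["row_data", "global_key", "external_id"]].to_dict("records")
    if skip_duplicates:
        # Keep only rows whose global key is not yet taken.
        records = [r for r in records if r["global_key"] not in existing_keys]
    return records


def batches(items: Sequence[T], batch_size: int = 500) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

In this sketch, `skip_duplicates=False` keeps every row and leaves any global-key conflict for the server to report. How LabelPandas itself handles that case is not covered here.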
