
How to update an unstructured dataset with a bunch of (large) files?

Sven Thiel edited this page May 2, 2022 · 3 revisions

Within BEXIS2, there is no limit on the number of files attached to an unstructured dataset. Although the system provides a facility for uploading a file to a given dataset, this workflow might not be desirable and can even be cumbersome.

First of all, transferring big files via BEXIS2 is error prone (e.g. timeouts, disconnects, limited bandwidth). Additionally, the system only allows uploading a single file at a time. If you need to upload several or even hundreds of files, this approach is impractical.

For this reason, the next section explains in detail how a set of files can be added to a dataset directly on the server via SQL statements, while keeping the dataset's integrity intact.

This approach assumes that the data is already on the server.

Please do following steps for a dataset update:

  1. Connect to the remote server that runs your BEXIS2 instance (typically via RDP).
  2. Identify the folder with all files that should be uploaded.
  3. Copy & paste the PowerShell script below and adapt the following values:
    1. dataset id
    2. dataset version id
    3. folder that contains the files
    4. (optionally) output file
  4. Execute the PowerShell script.
  5. Open "pgAdmin" and connect to the proper database.
  6. Copy & paste all SQL statements into a new query tool window within pgAdmin.
  7. Execute the SQL statements (INSERTs).
[Screenshot: pgAdmin query tool]

Optionally, go to the table "datasetversions" and change the field "changedescription" to a proper description.

[Screenshot: pgAdmin, "datasetversions" table]
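If you prefer SQL over editing the row in the pgAdmin grid, the description can also be changed with a single statement. A minimal sketch, assuming the "datasetversions" table lives in the public schema (like "contentdescriptors"), that its primary key column is named "id", and using the example version id 5286 from the script below; the description text itself is purely illustrative:

```sql
-- hypothetical sketch: adjust the version id and description text to your case
UPDATE public.datasetversions
SET changedescription = 'Added files directly on the server via SQL'
WHERE id = 5286;
```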

PowerShell Script for SQL INSERT statements

Please copy & paste the following content and save it as a "*.ps1" (PowerShell Script) file on the server.

# replace it with the actual dataset id
$datasetid = 3507

# replace it with the actual dataset version id
$datasetversionref = 5286

# replace it with the actual folder that contains all files to add
$files = Get-ChildItem "E:\iBDP212\Temp\sventhiel"

$result = foreach ($f in $files){
    # use $f.Name so only the file name (not a full path) ends up in the uri
    Write-Output "INSERT INTO public.contentdescriptors(
        versionno, extra, orderno, mimetype, name, uri, datasetversionref)
        VALUES (1, null, 1, 'application/x-zip-compressed', 'unstructuredData', 'Datasets\$datasetid\$($f.Name)', $datasetversionref);"
}

<#
replace the Out-File parameter to change the file that will contain the sql statements after execution.
#>
$result | Out-File sql_statements.txt
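For reference, the script above emits one statement per file into the output file. With the example ids above and a folder containing a file named "data_2022.zip" (an illustrative name), the generated statement would look like this:

```sql
INSERT INTO public.contentdescriptors(
    versionno, extra, orderno, mimetype, name, uri, datasetversionref)
    VALUES (1, null, 1, 'application/x-zip-compressed', 'unstructuredData', 'Datasets\3507\data_2022.zip', 5286);
```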

What are the limitations of the script?

Currently, the script has the following requirements:

  1. all files are located inside a single folder on a local drive.
  2. all files are of type 'application/x-zip-compressed'.

Additionally, the script only writes the set of SQL statements to the output file; it does not execute them automatically. Also, there might be other modifications you need to make afterwards, depending on your needs, e.g. changing the description of the related dataset version so that it reflects the multi-file upload. This is not mandatory, but helps with traceability.
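After running the INSERTs, a quick sanity check in pgAdmin helps to confirm that one row per file was created. A minimal sketch, using the example version id 5286 from the script:

```sql
-- count the content descriptors attached to the dataset version
SELECT count(*)
FROM public.contentdescriptors
WHERE datasetversionref = 5286;
```

The returned count should match the number of files in the folder you processed.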
