@max-zilla
Contributor

@max-zilla max-zilla commented Oct 10, 2023

This adds an extractor that will try to concatenate CSV/TSV/XLSX files when uploaded to Clowder.

To test:

export CLOWDER_VERSION=2
python concatenate.py

(there's also a Dockerfile)

Then upload two CSVs to a dataset and confirm that concatenated.csv is created. If the dataset already contains two or more CSVs when you upload a new one, all of them are merged into the output. Each additional CSV upload updates the concatenated file.

File types are currently kept separate, meaning CSVs get their own merged file, separate from Excel files and so on. This would be easy to change, but I'm not sure there's a good use case for it.
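For readers who want a feel for the per-type merge described above, here is a minimal sketch. It is not the code in concatenate.py: the helper names (`group_by_type`, `concatenate`, `merge_dataset`) are hypothetical, it uses only the standard library `csv` module rather than whatever the extractor uses internally, it works on local paths instead of Clowder files, and it leaves XLSX out. It assumes each file has a header row and that columns missing from a file should be left blank.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping for this sketch; XLSX is intentionally left out.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def group_by_type(paths):
    """Bucket paths by lowercase extension, skipping earlier merge outputs."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        ext = p.suffix.lower()
        if ext in DELIMITERS and p.stem != "concatenated":
            groups[ext].append(p)
    return dict(groups)


def concatenate(paths, out_path, delimiter=","):
    """Append the rows of every file, using the union of all column names."""
    fieldnames, rows = [], []
    for p in paths:
        with open(p, newline="") as f:
            reader = csv.DictReader(f, delimiter=delimiter)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        # restval fills columns that a given input file did not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames,
                                delimiter=delimiter, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def merge_dataset(paths, out_dir):
    """Write one concatenated file per type that has at least two inputs."""
    outputs = {}
    for ext, files in group_by_type(paths).items():
        if len(files) < 2:
            continue
        out = Path(out_dir) / f"concatenated{ext}"
        concatenate(sorted(files), out, DELIMITERS[ext])
        outputs[ext] = out
    return outputs
```

Calling `merge_dataset` again after a new file arrives rebuilds the output from all inputs of that type, which mirrors the "update as it goes" behavior in spirit. Merging across types would amount to dropping the grouping step.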

This uses pyclowder's files.delete, so it requires this version of pyclowder: clowder-framework/pyclowder#92

Member

@longshuicy longshuicy left a comment

The test works, but I haven't tested the built image in Docker.

Member

@ddey2 ddey2 left a comment

I tested in PyCharm. I had to install pandas and set CLOWDER_VERSION. It works well.

Approving this. I guess we can merge once you address the other comments.

@max-zilla max-zilla merged commit 0ba84d2 into main Nov 7, 2023
@max-zilla max-zilla deleted the csv_concatenator branch November 7, 2023 16:14
@longshuicy longshuicy linked an issue Nov 8, 2023 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

CSV concatenator demo

4 participants