Kaggle CLI Listing and Downloading files where archive has a space in the folder name #500

Closed
rholowczak opened this issue Sep 12, 2023 · 1 comment

@rholowczak

rholowczak commented Sep 12, 2023

The Kaggle CLI has problems with datasets that contain nested folders whose names include spaces.
One example is this dataset: viktoriiashkurenko/278k-spotify-songs

We can use the Kaggle CLI to get a list of files in the dataset:

$ kaggle datasets files viktoriiashkurenko/278k-spotify-songs
name                                size  creationDate
--------------------------------  ------  -------------------
artists.csv                          6MB  2023-05-18 17:11:45
music_genres.txt                     3KB  2023-05-18 17:11:45
final_tracks.csv                    61MB  2023-05-18 17:11:45
im_getting_these_vibes_uknow.txt     2KB  2023-05-18 17:11:45
main_dataset.csv                   115MB  2023-05-18 17:11:45
final_playlists.csv               1000KB  2023-05-18 17:11:45

However, this list omits the nested folder "Cleaned Analyses".

There seems to be no way to list the files in that folder, possibly because of the space in the folder name:

kaggle datasets files viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses
usage: kaggle [-h] [-v] {competitions,c,datasets,d,kernels,k,models,m,files,f,config} ...
kaggle: error: unrecognized arguments: Analyses
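
The `unrecognized arguments: Analyses` error is raised before Kaggle is ever contacted: the shell splits the unquoted command line at the space, so the CLI's argument parser receives `Analyses` as an extra argument. A minimal illustration using Python's standard `shlex.split`, which follows POSIX shell rules (the Windows command shell tokenizes differently in its details, but it also splits on unquoted spaces):

```python
import shlex

# The command from above, without quotes around the folder path.
unquoted = "kaggle datasets files viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses"
# The same command with the path enclosed in double quotes.
quoted = 'kaggle datasets files "viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses"'

# shlex.split tokenizes like a POSIX shell: unquoted spaces separate
# arguments, while quotes keep the enclosed text together and are removed.
unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

print(unquoted_args[3:])  # ['viktoriiashkurenko/278k-spotify-songs/Cleaned', 'Analyses']
print(quoted_args[3:])    # ['viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses']
```

Quoting therefore fixes the tokenization: the quoted form gets past the argument parser and only fails later, with a different error.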

Enclosing the path in single or double quotes does not help. Escaping the space, or replacing it with a URL-encoded (percent-encoded) space (%20), does not seem to work either. This is in the Windows command shell, if that makes a difference:

kaggle datasets files "viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses"
400 - Bad Request - Invalid datasetVersionNumber value
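
The 400 error mentions `datasetVersionNumber`, which suggests that the CLI reads a dataset reference as `owner/dataset-slug` plus an optional third segment holding a version number, so "Cleaned Analyses" lands where a version number is expected. The sketch below only illustrates that reading; `parse_dataset_ref` is a hypothetical helper written for this example, not code from the Kaggle package:

```python
from typing import Optional, Tuple


def parse_dataset_ref(ref: str) -> Tuple[str, str, Optional[int]]:
    """Hypothetical parser for an 'owner/slug[/version]' reference.

    This is NOT the Kaggle CLI's implementation; it only shows how a third
    path segment could be interpreted as a dataset version number.
    """
    parts = ref.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected owner/slug[/version], got {ref!r}")
    owner, slug = parts[0], parts[1]
    if len(parts) == 2:
        return owner, slug, None
    # Assumption: the third segment must be a numeric version.
    if not parts[2].isdigit():
        raise ValueError(f"invalid version number: {parts[2]!r}")
    return owner, slug, int(parts[2])


print(parse_dataset_ref("viktoriiashkurenko/278k-spotify-songs"))
# ('viktoriiashkurenko', '278k-spotify-songs', None)
try:
    parse_dataset_ref("viktoriiashkurenko/278k-spotify-songs/Cleaned Analyses")
except ValueError as err:
    print(err)  # invalid version number: 'Cleaned Analyses'
```

If the CLI does work this way, a folder name cannot be passed through the dataset reference at all, and no amount of quoting or escaping would change that.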

We can download a single file from a dataset by specifying the "-f" option:

kaggle datasets download -d viktoriiashkurenko/278k-spotify-songs -f artists.csv

Putting quotes around the full file path also seems to work for downloading individual files from the nested folder:

kaggle datasets download -d viktoriiashkurenko/278k-spotify-songs -f "Cleaned Analyses/Cleaned Analyses/000CbwTZICdj6uprlrc1f1.pickle"
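
When the CLI is driven from a script, shell quoting can be avoided entirely by passing the arguments as a list to `subprocess.run` without a shell; each list element reaches the program as one argument, spaces included (on Windows, Python quotes such elements itself when building the command line). A minimal sketch, assuming the `kaggle` command is installed, on PATH, and authenticated; `build_download_args` and `download_file` are hypothetical helpers, and only the `-d` and `-f` options shown above are used:

```python
import subprocess
from typing import List


def build_download_args(dataset: str, file_path: str) -> List[str]:
    """Build the argv for downloading one file (hypothetical helper).

    Each element is passed to the program as a single argument, so spaces
    inside file_path need no quoting or escaping.
    """
    return ["kaggle", "datasets", "download", "-d", dataset, "-f", file_path]


def download_file(dataset: str, file_path: str) -> None:
    # Assumes the kaggle CLI is on PATH with credentials configured.
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_download_args(dataset, file_path), check=True)
```

For example, `download_file("viktoriiashkurenko/278k-spotify-songs", "Cleaned Analyses/Cleaned Analyses/000CbwTZICdj6uprlrc1f1.pickle")` should run the same download as the quoted command, without depending on the shell's quoting rules.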

Perhaps I am just missing some obvious tricks or command-line options. Please let me know if you have any suggestions.

Thanks

@rholowczak
Author

I closed this issue because I believe the title is misleading. The real problem is that there is currently no way to get a complete list of all files in a dataset, including those in nested folders.
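
Until the listing covers nested folders, one possible workaround, at the cost of downloading the whole dataset (for example with `kaggle datasets download -d viktoriiashkurenko/278k-spotify-songs`), is to read the file list from the downloaded zip archive itself. A minimal sketch using only the standard library, assuming the archive preserves the folder structure; the function names are hypothetical:

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_top_folder(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive member names by top-level folder ("" = archive root)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if name.endswith("/"):
            continue  # zip directory entries end with "/"; keep only files
        # Zip member names always use "/" as the separator.
        parts = PurePosixPath(name).parts
        top = parts[0] if len(parts) > 1 else ""
        groups[top].append(name)
    return dict(groups)


def list_archive_by_folder(zip_path: str) -> Dict[str, List[str]]:
    """Read every member name, including nested ones, from a local zip file."""
    with zipfile.ZipFile(zip_path) as zf:
        return group_by_top_folder(zf.namelist())
```

Calling `list_archive_by_folder("278k-spotify-songs.zip")` (the archive file name is an assumption) would then return the root-level files under `""` and the nested files under `"Cleaned Analyses"`.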
