Can't import files with non alphanumeric chars in path #475

Closed
Nedfire2347 opened this issue Feb 21, 2020 · 7 comments

@Nedfire2347

Hello again, I'm currently working with @src7 on some dumps,
and we can't import these kinds of samples with bin/import_dir.py because of their names.

Sample examples:
Importing a folder named: Collection #1_BTC combos
Importing a hierarchy with files named: api_scrape_item.php?i=aZe0Rt1Y

Relevant code in this file:

for dirname, dirnames, filenames in os.walk(args.directory):

Note: this could also lead to an exploitable vulnerability.
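
To help narrow this down, here is a minimal sketch showing that, on a POSIX filesystem, os.walk returns names containing spaces, # and ? unchanged, so the walk is probably not where the names get mangled. list_files and demo are hypothetical helpers written for this illustration, not part of bin/import_dir.py.

```python
import os
import tempfile


def list_files(root):
    """Return every file path under root, relative to root, sorted."""
    found = []
    for dirname, dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirname, filename)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


def demo():
    """Build a throwaway tree using the names from this report and walk it."""
    # Assumes a POSIX filesystem: '?' is not a legal filename character on Windows.
    with tempfile.TemporaryDirectory() as root:
        awkward_dir = os.path.join(root, "Collection #1_BTC combos")
        os.makedirs(awkward_dir)
        awkward_file = os.path.join(awkward_dir, "api_scrape_item.php?i=aZe0Rt1Y")
        with open(awkward_file, "w") as handle:
            handle.write("sample")
        return list_files(root)
```

If demo() returns the original names intact, the problem most likely sits in whatever consumes those paths afterwards.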

Nedfire2347 changed the title from "Can't import files with non alphanumeric chars" to "Can't import files with non alphanumeric chars in path" on Feb 21, 2020
@src7

src7 commented Feb 21, 2020

Hello,

For now I use this workaround, which efficiently batch-renames
to_import/1970/01/01/api_scrape_item.php?i=aZe0Rt1Y
to
to_import/1970/01/01/aZe0Rt1Y.txt

Command (run in the to_import folder):
mmv ';*\=*' '#1#3.txt' (add -n first for a dry run)
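
For anyone without mmv, the same rename can be sketched in Python. This is only an approximation of the command above: target_name and rename_tree are hypothetical helpers, and splitting on the first = is a choice made for this sketch (it makes no difference for names with a single =).

```python
import os


def target_name(filename):
    """Map 'api_scrape_item.php?i=aZe0Rt1Y' to 'aZe0Rt1Y.txt'.

    Returns None when the name contains no '=' and should be left alone.
    """
    if "=" not in filename:
        return None
    # Keep only what follows the first '=' and add the .txt suffix.
    return filename.split("=", 1)[1] + ".txt"


def rename_tree(root, dry_run=True):
    """Walk root and rename matching files; return the (old, new) pairs found."""
    planned = []
    for dirname, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            new_name = target_name(filename)
            if new_name is None:
                continue
            old_path = os.path.join(dirname, filename)
            new_path = os.path.join(dirname, new_name)
            planned.append((old_path, new_path))
            # Only touch the disk when asked to, and never overwrite a file.
            if not dry_run and not os.path.exists(new_path):
                os.rename(old_path, new_path)
    return planned
```

Calling rename_tree("to_import") with the default dry_run=True only lists the planned renames, much like the -n flag.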

@Terrtia
Member

Terrtia commented Feb 25, 2020

Hi @Nedfire2347 @src7!
Thanks for the report!

The issue is related to the whitespace in the path (we use whitespace as a separator in the Mixer module).

Fixed with 72fe8a2
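
To illustrate why whitespace in a path is a problem when it also serves as the field separator, here is a sketch under an assumed message layout of "<path> <payload>". The real Mixer message format and the actual change in the commit may differ; parse_naive and parse_from_right are hypothetical names.

```python
def parse_naive(message):
    """Split on every run of whitespace and expect exactly two fields."""
    fields = message.split()
    if len(fields) != 2:
        return None  # looks malformed as soon as the path contains a space
    return fields[0], fields[1]


def parse_from_right(message):
    """Split once from the right so that spaces inside the path survive.

    Assumes the payload itself never contains a space, which holds for
    base64-encoded data.
    """
    fields = message.rsplit(" ", 1)
    if len(fields) != 2:
        return None
    return fields[0], fields[1]
```

With a path such as "import_dir/Collection #1_BTC combos/a.txt.gz", the first parser sees four fields and gives up, while the second recovers the path and the payload.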

@src7

src7 commented Feb 25, 2020

Hi,

What about the ? and = characters?

@Terrtia
Member

Terrtia commented Feb 25, 2020

You're right!
I removed some special characters with bdf2fce

@src7

src7 commented Feb 26, 2020

Nice

(It is a bit extreme, but this means two files named &.txt and ?.txt can't coexist, right? Not a problem for me.)
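
The collision I mean can be sketched like this, assuming the special characters are simply deleted from the name. The character set below is hypothetical and not taken from the commit; sanitize and find_collisions are names made up for the example.

```python
import re

# Hypothetical set of characters to drop; the real list in the fix may differ.
UNSAFE_CHARS = re.compile(r"[&?=#\s]")


def sanitize(name):
    """Delete every character that matches the unsafe set."""
    return UNSAFE_CHARS.sub("", name)


def find_collisions(names):
    """Group original names by sanitized name; keep groups with several members."""
    groups = {}
    for name in names:
        groups.setdefault(sanitize(name), []).append(name)
    return {clean: originals for clean, originals in groups.items() if len(originals) > 1}
```

Both &.txt and ?.txt collapse to .txt here, so find_collisions reports them as one group.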

@src7

src7 commented Feb 26, 2020

First of all, there is a new dependency to add: python3-magic

But is this output normal?
Some files are not gzipped?

import_dir/pastebin>>to_import/2020/01/09/xqjKa4qa.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xqShrUPm.txt
import_dir/pastebin>>to_import/2020/01/09/xqRx2kKG.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xqDegaHQ.txt
import_dir/pastebin>>to_import/2020/01/09/xpvqC3p5.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpmfG0Fx.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpi1fgiw.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xpDbbCTK.txt.gz
import_dir/pastebin>>to_import/2020/01/09/xnw2wdy2.txt
import_dir/pastebin>>to_import/2020/01/09/xnvEzU17.txt.gz

Terrtia added the bug label Feb 27, 2020
@Terrtia
Member

Terrtia commented Feb 27, 2020

Thanks for the feedback!
All the files to import need to be gzipped.
I removed the python3-magic dependency (it is already installed via the requirements).
The importer uses the magic number to check whether a file is gzip-compressed.

Fixed with 873797d
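
As a rough illustration of a magic-number check, the sketch below reads the two-byte gzip signature (0x1f 0x8b) directly from the file instead of going through a libmagic binding. It is not the importer's code; is_gzipped and demo are hypothetical helpers.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzipped(path):
    """Return True when the file starts with the gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def demo():
    """Write one plain and one gzipped file, then check both."""
    with tempfile.TemporaryDirectory() as root:
        plain = os.path.join(root, "xqShrUPm.txt")
        packed = os.path.join(root, "xqjKa4qa.txt.gz")
        with open(plain, "wb") as handle:
            handle.write(b"plain text")
        with gzip.open(packed, "wb") as handle:
            handle.write(b"plain text")
        return is_gzipped(plain), is_gzipped(packed)
```

A check like this looks at the content rather than the .gz extension, so a misnamed file is still classified correctly.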

Files with the same file path but different content are renamed by the Global module (with a UUIDv4).
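
A minimal sketch of that kind of rename, assuming content is compared by hash and the UUID is appended as a suffix. The naming scheme, the seen dictionary and resolve_path are all hypothetical and may not match what the Global module actually does.

```python
import hashlib
import uuid


def resolve_path(path, content, seen):
    """Pick the path to store content under.

    seen maps each stored path to the SHA-256 hex digest of its content.
    """
    digest = hashlib.sha256(content).hexdigest()
    if path not in seen:
        seen[path] = digest
        return path
    if seen[path] == digest:
        return path  # exact duplicate, nothing to rename
    # Same path but different content: append a random UUIDv4.
    new_path = f"{path}_{uuid.uuid4()}"
    seen[new_path] = digest
    return new_path
```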

Terrtia closed this as completed Feb 5, 2024