Filename of attachments are not decoded #26

Closed
ad-m opened this issue Mar 4, 2018 · 3 comments

@ad-m
Contributor

ad-m commented Mar 4, 2018

Hello,

I have an e-mail with the following content:

...
/span></a></span></p></div></div></div></div></div></div></div></div></div>
</div></div>
</div><br></div>

--f4030435bb14695e4d05669b618a--
--f4030435bb14695e5105669b618c
Content-Type: application/pdf; 
        name="=?UTF-8?Q?Prokuratura_Rejonowa_Warszawa=2D=C5=9Ar=C3=B3dmie=C5=9Bcie_p=C3=B3=C5=82noc_sygn?=
        =?UTF-8?Q?=2E_2Ds=2E_137414_=2D_RSK_pracownik=C3=B3w_Skarbowych_NSZZ_Solidarno=C5=9B?=
        =?UTF-8?Q?=C4=87_=2D_Zarz=C4=85dzenie_o_odmowie_dopuszczenia_SOWP_do_udzia=C5=82u_w_?=
        =?UTF-8?Q?postepowaniu=2Epdf?="
Content-Disposition: attachment; 
        filename="=?UTF-8?Q?Prokuratura_Rejonowa_Warszawa=2D=C5=9Ar=C3=B3dmie=C5=9Bcie_p=C3=B3=C5=82noc_sygn?=
        =?UTF-8?Q?=2E_2Ds=2E_137414_=2D_RSK_pracownik=C3=B3w_Skarbowych_NSZZ_Solidarno=C5=9B?=
        =?UTF-8?Q?=C4=87_=2D_Zarz=C4=85dzenie_o_odmowie_dopuszczenia_SOWP_do_udzia=C5=82u_w_?=
        =?UTF-8?Q?postepowaniu=2Epdf?="
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i53uo58b0

JVBERi0xLjIKJcjH0MRGCjQgMCBvYmoKPDwKL1R5cGUgL091dGxpbmVzCi9Db3VudCAwCj4+CmVu
ZG9iago1IDAgb2JqCjw8Ci9UeXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjAKL0Jh
c2VGb250IC9IZWx2ZXRpY2EKL0VuY29kaW5nIC9NYWNSb21hbkVuY29kaW5nCj4+CmVuZG9iago2
IDAgb2JqCjw8Ci9UeXBlIC9QYWdlCi9QYXJlbnQgMyAwIFIKL1Jlc291cmNlcyA4IDAgUgovTWVk
aWFCb3ggWyAwIDAgNTc2IDgyOS40NCBdCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKOSAwIG9i
ago8PAovVHlwZSAvWE9iamVjdAovU3VidHlwZSAvSW1hZ2UKL05hbWUgL0ltMAovV2lkdGggMTYw
MAovSGVpZ2h0IDIzMDQKL0JpdHNQZXJDb21wb25lbnQgMQovQ29sb3JTcGFjZSAvRGV2aWNlR3Jh
eQovRmlsdGVyIC9DQ0lUVEZheERlY29kZQovRGVjb2RlUGFybXMgPDwgL0sgLTEgL0NvbHVtbnMg
...

The filename returned by the parser is not decoded, which makes it useless:

=?UTF-8?Q?Prokuratura_Rejonowa_Warszawa=2D=C5=9Ar=C3=B3dmie=C5=9Bcie_p=C3=B3=C5=82noc_sygn?=\n\t=?UTF-8?Q?=2E_2Ds=2E_137414_=2D_RSK_pracownik=C3=B3w_Skarbowych_NSZZ_Solidarno=C5=9B?=\n\t=?UTF-8?Q?=C4=87_=2D_Zarz=C4=85dzenie_o_odmowie_dopuszczenia_SOWP_do_udzia=C5=82u_w_?=\n\t=?UTF-8?Q?postepowaniu=2Epdf?=

I suggest adding something like:

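import sys
from email.header import decode_header, make_header

from mailparser import mailparser

filename = sys.argv[1]

# Parse the message; the context manager closes the file afterwards.
with open(filename, 'r') as fp:
    mail = mailparser.parse_from_file_obj(fp)

for attachment in mail.attachments:
    # decode_header() may return several parts, and plain ASCII names come
    # back with no charset; make_header() joins every part into one string.
    print(str(make_header(decode_header(attachment['filename']))))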

I think the parser should return the decoded file name, not the raw header value that contains it.

@fedelemantuano
Contributor

Very good point. There is a function in mailparser that does this, but I used it only for headers:
https://github.com/SpamScope/mail-parser/blob/develop/mailparser/utils.py#L92
Thank you for reporting this bug.
I'm releasing a new version in which your test filename is now decoded as:

Prokuratura Rejonowa Warszawa-Śródmieście północ sygn. 2Ds. 137414 - RSK pracowników Skarbowych NSZZ Solidarność - Zarządzenie o odmowie dopuszczenia SOWP do udziału w postepowaniu.pdf
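
For readers who want to try the decoding step without mail-parser, here is a minimal sketch that uses only the Python 3 standard library. It is not the function linked above and may differ from what mail-parser actually does; `decode_filename` is a hypothetical helper name, and the sample value is a shortened string modeled on the folded header from the report.

```python
from email.header import decode_header, make_header


def decode_filename(raw):
    """Return a readable filename from a raw, possibly RFC 2047-encoded value.

    decode_filename is a hypothetical helper, not part of mail-parser.
    """
    # decode_header() splits the value into (bytes_or_str, charset) parts;
    # make_header() joins them back into a single Unicode string.
    return str(make_header(decode_header(raw)))


# Shortened sample modeled on the report: two encoded words separated by
# a line break and a tab, as in a folded MIME header.
raw = "=?UTF-8?Q?p=C3=B3=C5=82noc_sygn?=\n\t=?UTF-8?Q?=2E_2Ds=2Epdf?="
print(decode_filename(raw))
print(decode_filename("report.pdf"))  # unencoded names pass through unchanged
```

Joining all parts, rather than decoding only the first one, matters when a name mixes encoded and plain segments or uses more than one charset.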

fedelemantuano added a commit that referenced this issue Mar 5, 2018
@ad-m
Contributor Author

ad-m commented Mar 5, 2018

@fedelemantuano , thank you!

CC: @AgnieszkaZdanowicz
