Always use iconv, don't try to start it. #33

Merged
merged 1 commit into from Nov 25, 2013

Conversation

Projects
None yet
4 participants
Collaborator

mworrell commented Feb 19, 2013

This enables a choice between iconv implementations.

Zotonic is using the eiconv NIF as a R15 compatible iconv implementation.

/cc @mmzeeman

Lol someone else made another iconv nif.

Collaborator

mworrell commented Feb 19, 2013

Yes, looks like you both started around the same time.

Not a big deal. Just a bit silly.

Collaborator

mworrell commented Feb 19, 2013

@mmzeeman is checking your implementation, it was unknown to him. We might decide to go with yours when it looks good to him :)

Collaborator

mworrell commented Feb 19, 2013

In that case I will change this pull request to use iconverl

Contributor

mmzeeman commented Feb 19, 2013

Looks like your version can encode and decode in chunks. Like it.

I understand. I guess it would be nice to standardize on the same code
since we both run this in production.

On Tue, Feb 19, 2013 at 2:41 PM, Maas-Maarten Zeeman <
notifications@github.com> wrote:

Looks like your version can encode and decode in chunks. Like it.


Reply to this email directly or view it on GitHubhttps://github.com/Vagabond/gen_smtp/pull/33#issuecomment-13793598.

Sounds good.

Owner

Vagabond commented Feb 19, 2013

Should we just pick an iconv implementation and just require that? I'm happy to ditch mine, since I don't have the time to fix it.

Contributor

mmzeeman commented Feb 19, 2013

When I started writing the nif I was amazed there wasn't an agreed upon api for encoders and decoders like python has. Yesterday evening I did some speed testing of my nif and came to the conclusion that you have to keep the chunk size around 128 kb at max. Otherwise the processing takes too long.

I absolutely agree. I did set a very high limit (128MiB) as to not get in the way of the programmer. Also because it is quite hard to split raw binary and feed it to iconv because you won't know where the character boundaries will be for some encodings. Ideally it should be possible to so streaming by calling using a continuation. Hopefully without requiring a resource only Erlang code. I will look into adding support for it in iconverl if you guys think it could be useful to you.

I guess this would probably make everyone happy?

By the way I added the iconv interface.
edescourtis/iconverl@e35783d

Updated the link.

Contributor

mmzeeman commented Feb 20, 2013

Indeed, you need to wrap iconv's c-struct and place it in #cd. Sort of what my nif does after open. Maybe we could work together on this?

Yes I agree completely lets work on it together.

I started reading a little more into doing this. It looks like the iconv_t is completely opaque (because several versions exist such as libiconv and the glibc version) and because we have can have stateful characters encodings such as ISO 2022 we need to keep the iconv_t value around. Which is very unfortunate because this resource can only live on one node as a result. So it can't be as simple as storing some offsets or state. It seems like we will be working off your version since it's halfway there.

I guess we should define an interface for this thing.

I was thinking something similar to disk_log:chunk.

Random ideas:

Low level nif function might be more involved and work on fixed size chunks < 128 KiB. But something like this might work at the lowest level exposed to Erlang applications.

I guess we could have a set of helpers for people who don't care to get involved with a complex call.

Contributor

mmzeeman commented Feb 20, 2013

Yes, it definitely needs helper functions. It is indeed handy to have a
low level nif api, and then use erlang on top of it to make it
convenient to use.

I was also peeking at parsetools:token for interface ideas, the idea is
roughly the same. And looking at how other languages solve this (python
codecs, http://docs.python.org/2/library/codecs.html).

Maybe the interface can become a more generic interface for encoding
decoding. And maybe even chain them. I'm thinking about unzipping and
mapping from gb2312 to utf in one go. That maybe a bit to fancy for now
though.

Maas

Random ideas:

Low level nif function might be more involved and work on fixed size
chunks < 128 KiB. But something like this might work at the lowest level
exposed to Erlang applications.

chunk(Cd, TextFoldFun, Continuation) -> chunk_ret()

Cd = cd()
TextFoldFun = function(Acc1) -> {Acc2, iodata()}
Acc1 = term()
Acc2 = term()
Continuation = start | continuation()
continuation = {InputOffset, Rest}
chunk_ret() = {Continuation2 :: continuation, ConvertedText}
Rest = iodata()
ConvertedText = iodata()

I guess we could have a set of helpers for people who don't care to get
involved with a complex call like this.


Reply to this email directly or view it on GitHub
#33 (comment).

https://github.com/edescourtis/iconverl/tree/streaming

I have the low level access to iconverl for streaming now.

Still will need lots of work before it's easy to use.

Collaborator

mworrell commented Nov 25, 2013

As iconverl has now the same API I propose to merge this.

All ok?

mworrell merged commit 01f7aa6 into Vagabond:master Nov 25, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment