Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readline() method #13

Closed
coreyhuinker opened this issue Feb 14, 2015 · 11 comments
Closed

readline() method #13

coreyhuinker opened this issue Feb 14, 2015 · 11 comments
Assignees

Comments

@coreyhuinker
Copy link

This would be useful for extracting the header from a csv.

@piskvorky
Copy link
Owner

I agree. Can you implement it?

@coreyatmoat
Copy link

I'll give it a shot.

@piskvorky
Copy link
Owner

@coreyatmoat what's your progress?

@coreyatmoat
Copy link

Sadly no. A while back a co-worker wrote https://gist.github.com/coreyatmoat/c961f99dfb0cfabdda54 as a sort of work around, but I haven't had time to figure out how it'd fit into your framework.

@Mgutjahr
Copy link

+1

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 10, 2016

I'm having a look at this now, since one of my projects needs this to work too :)

Is there any reason why we need to maintain independent pointers for read and __iter__? The default implementation of readline shares the file pointer with read (and other I/O operations), to the best of my knowledge.

@piskvorky
Copy link
Owner

piskvorky commented Jun 11, 2016

Thanks @mpenkov .

The reason for the "independent pointer" was related to difficulties with buffering IIRC, but was only technical (CC @ziky90 ). That behaviour is not a part of the API contract (I don't think it's even documented).

If we can make a single pointer work, that's fine, even preferable.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 11, 2016

@piskvorky OK. I think I've got it working. Have a look here: https://github.com/mpenkov/smart_open/commit/aca3a18358afb5f42b5f7e5c1355e8ff93a9bcea

I made a separate branch (readline) for it, since an open pull request (#75) is currently using the clone's master. This branch also contains the gzip changes.

What do you think is the cleanest way of getting this into your repo? The simplest I can think of:

  1. Close Resolve issue #12 (unable to read gz from S3) #75
  2. Open a new pull request from my new readline branch

Another way is:

  1. You guys merge Resolve issue #12 (unable to read gz from S3) #75 whenever you're ready
  2. I update my clone
  3. I make a new pull request for the readline stuff

Let me know if there's a simpler or more convenient way.

@piskvorky
Copy link
Owner

Great stuff! @tmylk please review.

Re. branches: it's your call. If splitting the changes into two separate PRs is too complicated, doing both in one branch is fine with me. We appreciate your work, this is some much needed functionality.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 12, 2016

@piskvorky @tmylk I've authored a request to pull from my readline to your master (#76). Let me know if you need anything else from me.

tmylk pushed a commit that referenced this issue Jun 27, 2016
- Bundle gzipstream to enable streaming of gzipped content from S3
- Update gzipstream to avoid deep recursion
- Implement readline for S3
- Add pip requirements.txt
@mpenkov
Copy link
Collaborator

mpenkov commented Jul 25, 2016

@piskvorky @tmylk I think we can close this now. 78c461e resolved this.

@tmylk tmylk closed this as completed Sep 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants