Implements streaming CSVs. #3038

Merged
merged 5 commits into activeadmin:master on May 23, 2014

Conversation

@craigmcnamara
Contributor

commented Apr 2, 2014

Allows unlimited downloads of CSV data using ActiveRecord batch APIs and ActionController response streaming.

Works on Rails 3.2+
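
The core of the change (the exact code appears in the diff further down this thread) is to assign an Enumerator to response_body instead of rendering the whole CSV up front, so rows are generated and flushed batch by batch. A minimal sketch of that pattern, assuming collection is an ActiveRecord relation and csv_columns is an array of attribute names; the real implementation maps ActiveAdmin column objects instead:

require 'csv'

# Minimal sketch of the streaming pattern; `collection` and `csv_columns`
# stand in for the controller's real helpers.
def index_csv
  self.response_body = Enumerator.new do |lines|
    lines << CSV.generate_line(csv_columns.map(&:to_s))
    collection.find_each do |resource|               # batches of 1000 keep memory flat
      lines << CSV.generate_line(csv_columns.map { |col| resource.public_send(col) })
    end
  end
end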

Craig McNamara added 3 commits Apr 2, 2014
Craig McNamara
Implements streaming CSVs.
Allows unlimited downloads of CSV data using ActiveRecord batch APIs and ActionController response streaming.

Works on Rails 3.2+
@romansklenar

Contributor

commented Apr 2, 2014

👍 Looks great!

Craig McNamara added 2 commits Apr 3, 2014
@craigmcnamara

Contributor Author

commented Apr 3, 2014

Ok, I'm finished with this pull for real. I just downloaded 650,000+ records from one of my production systems. The stream starts right away now that I have the right response headers in place.
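
The comment doesn't spell out which headers were needed; for streamed Rack responses the ones that typically matter are along these lines (an illustration of common practice, not a quote from this PR):

# Illustrative header set for a streamed response; not necessarily the exact
# headers this PR ends up sending.
response.headers['Content-Type']        = 'text/csv; charset=utf-8'
response.headers['Content-Disposition'] = 'attachment; filename="export.csv"'
response.headers['Cache-Control']       = 'no-cache'  # avoid middleware buffering the body
response.headers['X-Accel-Buffering']   = 'no'        # ask nginx not to buffer the stream
response.headers.delete('Content-Length')             # length is unknown while streaming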

@seanlinsley

Member

commented Apr 8, 2014

One problem I noticed with this: if I manually stop the download, Rails just keeps chugging away generating that CSV.

@craigmcnamara

Contributor Author

commented Apr 8, 2014

Do you know how to detect in Rails when a client disconnects? For the sake of letting business analysts work with ActiveAdmin, I think that's a reasonable risk. I'll see what can be done about it, but is that the only barrier to merging?

@craigmcnamara

Contributor Author

commented Apr 8, 2014

After reading through Rack and Rails, I don't think there is a way to detect a cancelled download and stop the response. I could add a config option to enforce a max_csv_result limit if you're worried about creating a URL of death.
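
A hypothetical sketch of such a guard, capping how many rows the stream will emit; max_csv_records is an invented name here, not an ActiveAdmin setting, and columns, options, csv_line, and collection are borrowed from the diff later in the thread:

# Hypothetical cap on streamed rows; `max_csv_records` is an invented setting.
max_csv_records = 100_000

self.response_body = Enumerator.new do |lines|
  lines << CSV.generate_line(columns.map(&:name), options)
  emitted = 0
  collection.find_each do |resource|
    break if (emitted += 1) > max_csv_records        # stop a runaway export early
    lines << CSV.generate_line(csv_line(resource, columns), options)
  end
end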

@zorab47

Contributor

commented Apr 8, 2014

Another trade-off to note is that Rails' batch finders, like find_each, always return records sorted by the table's primary key. I think it is worthwhile to allow larger CSV files with smaller memory requirements, but it may not be clear to some users why the CSV doesn't use the same sort order as its corresponding index view.
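
A small example of the trade-off (Post is just a placeholder model): find_each walks the table in primary-key order and ignores any explicit ordering, so a sorted index view and its streamed CSV can disagree.

# find_each batches by primary key and drops the order(...) clause
# (ActiveRecord logs a warning), so rows stream out in id order even though
# the index view is sorted by created_at.
Post.order(created_at: :desc).find_each(batch_size: 1000) do |post|
  # yielded in ascending id order, not created_at order
end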

@joallard


commented Apr 9, 2014

Much related to #2449

But wow, thanks so much for implementing this.

@joallard


commented Apr 14, 2014

I still get timed out on Heroku for large batches (>30k). Is Transfer-Encoding: chunked set/needed?

Edit: Just tried that, and I got an empty response.

@craigmcnamara

Contributor Author

commented Apr 15, 2014

@joallard Is your query eager loading things? I was able to get a response of 650k records with no problem. What web server are you running on? Is there some kind of response buffering on?

@joallard


commented Apr 16, 2014

I found it: it wasn't an H12 Heroku timeout, but rather the timeout in Unicorn. I added timeout 600 to my config/unicorn.rb.

Works like a charm!
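
For reference, the change described above amounts to a single line in the Unicorn config:

# config/unicorn.rb
timeout 600  # allow up to 10 minutes for long streamed CSV downloads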

@joallard


commented Apr 23, 2014

What is the status? Is the issue that's been raised solvable?

@craigmcnamara

Contributor Author

commented Apr 28, 2014

I'm not clear on what needs to be done to get this merged.

@codejoust


commented May 5, 2014

Would be great if this gets merged.

@craigmcnamara

Contributor Author

commented May 20, 2014

@seanlinsley Is there something I can do to make this more acceptable? Perhaps a setting to limit max_records?

@seanlinsley

Member

commented May 23, 2014

This can't be cleanly merged because of #3097. I resolved the conflict locally with these changes:

$ git diff
diff --git a/lib/active_admin/resource_controller/streaming.rb b/lib/active_admin/resource_controller/streaming.rb
index c07dffa..ba99b0f 100644
--- a/lib/active_admin/resource_controller/streaming.rb
+++ b/lib/active_admin/resource_controller/streaming.rb
@@ -20,9 +20,15 @@ module ActiveAdmin

         protected

-        def csv_line(resource, columns)
+        def csv_line(resource, columns, options)
           columns.map do |column|
-            call_method_or_proc_on resource, column.data
+            s = call_method_or_proc_on resource, column.data
+
+            if options[:encoding] && s.respond_to?(:encode!)
+              s.encode! options[:encoding], options[:encoding_options]
+            else
+              s
+            end
           end
         end

@@ -37,7 +43,7 @@ module ActiveAdmin
           self.response_body = Enumerator.new do |csv|
             csv << CSV.generate_line(columns.map(&:name), options)
             collection.find_each do |resource|
-              csv << CSV.generate_line(csv_line(resource, columns), options)
+              csv << CSV.generate_line(csv_line(resource, columns, options), options)
             end
           end
         end

@seanlinsley merged commit 29b5019 into activeadmin:master on May 23, 2014

1 check passed: continuous-integration/travis-ci (The Travis CI build passed)
seanlinsley added a commit that referenced this pull request May 23, 2014
Merge pull request #3038
Conflicts:
	app/views/active_admin/resource/index.csv.erb
@seanlinsley

Member

commented May 23, 2014

It would be nice to be able to both set a hard limit as a developer, & choose to download only a certain number of records as a user (#2626), but this PR is a great improvement on its own. Thanks so much for this ❤️

@brandnewmedia


commented May 23, 2014

+1 thank you!

@shekibobo

Contributor

commented May 23, 2014

🍻

@tychio


commented Jun 25, 2014

+1, when will this be released on RubyGems?

@kushkella


commented Jul 9, 2014

I just migrated from ActiveAdmin 0.6 to the master branch. This streaming CSV export is really good, but it assumes that the primary key is included in the collection. In my current setup I have customised the collection in a way that the primary key cannot be included. Is there a way to use pagination scopes to get the same effect as find_each, and to use the config.sort_order with that pagination?
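
A rough sketch of the idea in this question: batching with limit/offset so a custom sort order survives, instead of find_each's primary-key walk. This is illustration only, not something ActiveAdmin shipped; lines is the enumerator yielder, sort_order is assumed to be the resource's configured sort order, and collection, csv_line, columns, and options are borrowed from the diff above.

# Illustration only: limit/offset batching preserves an arbitrary sort order,
# at the cost of offset pagination getting slower on very large tables.
batch_size = 1_000
offset     = 0
loop do
  batch = collection.reorder(sort_order).limit(batch_size).offset(offset).to_a
  break if batch.empty?
  batch.each do |resource|
    lines << CSV.generate_line(csv_line(resource, columns, options), options)
  end
  offset += batch_size
end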

@seanlinsley

Member

commented Jul 10, 2014

@kushkella off the top of my head I'm not sure what might work. Would you mind sharing the code / explaining why you can't include the primary key?

@seanlinsley

Member

commented Jul 10, 2014

Also, instead of commenting on a merged pull request, it would be better to create a new ticket.

@kushkella


commented Jul 10, 2014

@seanlinsley sorry for commenting on the old pull request. Please look at the new issue and let me know if there is a way to solve this problem. #3269

@jhblacklock


commented Aug 3, 2015

Loving this commit! It works great locally; cool idea: it starts streaming the file right away instead of building the whole CSV on the web server first.

In production on Heroku I get H12 errors trying to download around 10K records with children.

Are there any reasons this would not work on Heroku currently? I'm happy to do any research and provide more info about my implementation.

activeadmin (1.0.0.pre)
Rails 4.1.0

@joallard


commented Aug 4, 2015

@jhblacklock Sounds like the timeout value in Unicorn. That, or stop having children download your CSVs and have it done by an adult!

@jhblacklock


commented Aug 5, 2015

@joallard That is definitely a problem of mine (children) but they don't seem to be the issue I'm having with streaming CSVs ^_^.

My Unicorn timeout value is hardcoded at 600. I also run Unicorn locally and do not experience any issues.

The initial request completes in 10ms, but the download never starts before Heroku's router times out at 30 seconds.

2015-08-05T22:07:00.792698+00:00 app[web.1]: Started GET "/admin/newcomers.csv" for 97.77.148.58 at 2015-08-05 22:07:00 +0000
2015-08-05T22:07:00.852062+00:00 app[web.1]: Completed 200 OK in 10ms (ActiveRecord: 4.1ms)
2015-08-05T22:07:00.837900+00:00 app[web.1]: Processing by Admin::NewcomersController#index as CSV
2015-08-05T22:07:30.774222+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/admin/newcomers.csv" host=admin-ttr-challenge.herokuapp.com request_id=cf4050f3-f162-4717-b64a-205f0be0fb07 fwd="97.77.148.58" dyno=web.1 connect=1ms service=30000ms status=503 bytes=0

My csv block is lengthy but I wouldn't think the timeout would occur on the first iteration.

csv do
  [:first_name, :last_name, :num_of_children, :email].each do |col|
    column col
  end
  column(:event_name) { |newcomer| newcomer.event.name if newcomer.event }
  column(:event_date) { |newcomer| newcomer.event.starts_time if newcomer.event }
  column(:business_name) { |newcomer| newcomer.event.user.business_name if newcomer.event && newcomer.event.user }
  column(:host_first_name) { |newcomer| newcomer.event.user.first_name if newcomer.event && newcomer.event.user }
  column(:host_last_name) { |newcomer| newcomer.event.user.last_name if newcomer.event && newcomer.event.user }
  column(:newcomer_address_name) { |newcomer| newcomer.address.name if newcomer.address }
  column(:newcomer_address_one) { |newcomer| newcomer.address.address_one if newcomer.address }
  column(:newcomer_address_two) { |newcomer| newcomer.address.address_two if newcomer.address }
  column(:newcomer_city) { |newcomer| newcomer.address.city if newcomer.address }
  column(:newcomer_state) { |newcomer| newcomer.address.state if newcomer.address }
  column(:newcomer_zip) { |newcomer| newcomer.address.zip if newcomer.address }
  column(:children) { |newcomer| newcomer.children.pluck(:name, :gender, :age).reject(&:blank?).join(", ") if newcomer.children }
end

@jhblacklock


commented Aug 18, 2015

Yar. I feel sheepish. 😊

My Procfile was not being picked up by the Heroku deploy, so the app was defaulting to WEBrick. One process = not working.

Thanks for everything!
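
For reference, the missing piece described above was a Procfile entry declaring the Unicorn web process; something along these lines (the exact flags are an assumption):

# Procfile
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb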

@joallard


commented Aug 19, 2015

Methinks we should document this issue, seeing how frequent it is.

@seanlinsley

Member

commented Sep 1, 2015

It's also important to mention that if an exception occurs during the stream, the stack trace will be appended to the resulting file, but the request will only end after your web server times out. That's not necessarily a problem in production (and it might be very hard to fix), but in development it can make debugging much more difficult, which is why I created #3535

BTW @jhblacklock you can use syntax highlighting to make your code easier to read: https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown

@seanlinsley

Member

commented Sep 1, 2015

Actually, it is a problem in production, because the exception doesn't bubble up. So the only way you'll find out is by the users of your app reporting that their CSV exports don't include all the records they should.
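
One hedged workaround (a sketch, not something ActiveAdmin does) is to rescue inside the enumerator and log before re-raising, so at least the server logs show why an export came up short; columns, options, csv_line, and collection are borrowed from the diff above:

self.response_body = Enumerator.new do |lines|
  begin
    lines << CSV.generate_line(columns.map(&:name), options)
    collection.find_each do |resource|
      lines << CSV.generate_line(csv_line(resource, columns, options), options)
    end
  rescue => e
    # Without this, the failure is invisible: the stream just ends early.
    Rails.logger.error "CSV stream aborted: #{e.class}: #{e.message}"
    raise
  end
end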

@Fivell

Member

commented Jun 13, 2016

@seanlinsley, I ran into this problem and have no logs to find out why the CSV was not downloaded.

@haraldkrischner


commented Sep 1, 2016

This PR might be old, but we are running into the same issue as @seanlinsley, so is there any approach to it?

@luiszacheu


commented Oct 31, 2016

I installed ActiveAdmin 1.0.0.pre4, but would like to know how to use the streaming CSV export. Thanks!

@lucky1223


commented Jul 4, 2019

@seanlinsley Thanks for implementing this, but I have some questions.
I built an ActiveAdmin app with Rails and have 20k records that I want to download as a CSV, but I get a timeout error. The app is deployed on Heroku. How can I fix this?
Please help me.
Thanks.
