Skip to content

Conversation

craigmcnamara
Copy link
Contributor

Allows unlimited downloads of CSV data using ActiveRecord batch APIs and ActionController response streaming.

Works on Rails 3.2+

Craig McNamara added 3 commits April 1, 2014 23:48
Allows unlimited downloads of CSV data using ActiveRecord batch APIs and ActionController response streaming.

Works on Rails 3.2+
@romansklenar
Copy link
Contributor

👍 Looks great!

@craigmcnamara
Copy link
Contributor Author

Ok, I'm finished with this pull for real. I just downloaded 650,000+ records from one of my production systems. The stream starts right away now that I have the right response headers in place.

@seanlinsley
Copy link
Contributor

One problem I noticed with this: if I manually stop the download, Rails just keeps chugging away generating that CSV.

@craigmcnamara
Copy link
Contributor Author

Do you know how to detect in Rails if a client disconnects? For the sake of enabling business analysts to work with ActiveAdmin that's a risk that I think is reasonable. I'm going to see what can be done about that, but is that the only barrier to merging?

@craigmcnamara
Copy link
Contributor Author

After reading rack and rails I don't know that there is a way to detect a cancelled download and stop the response. I could add some config options to enforce a max_csv_result if you're worried about creating a URL of death.

@zorab47
Copy link
Contributor

zorab47 commented Apr 8, 2014

Another trade-off to note is Rails' batch finders, like find_each, always return records sorted by the table's primary key. I think it is worthwhile to allow larger CSV files with smaller memory requirements. Although, it may not be clear to some users why the CSV isn't rendering with the same sort order as its corresponding view.

@joallard
Copy link

joallard commented Apr 9, 2014

Much related to #2449

But wow, thanks so much for implementing this.

@joallard
Copy link

I still get timed out in Heroku for large batches (>30k). Is Transfer-Encoding: chunked set/needed?

Edit: Just tried that, and I got an empty response.

@craigmcnamara
Copy link
Contributor Author

@joallard Is your query eager loading things? I was able to get a response of 650k records with no problem. What web server are you running on? Is there some kind of response buffering on?

@joallard
Copy link

I found it: it wasn't an H12 Heroku timeout, but rather the timeout in Unicorn. I added timeout 600 to my config/unicorn.rb.

Works like a charm!

@joallard
Copy link

What is the status? Is the issue that's been raised solvable?

@craigmcnamara
Copy link
Contributor Author

I'm not clear on what needs to be done to get this merged.

@codejoust
Copy link

Would be great if this gets merged.

@craigmcnamara
Copy link
Contributor Author

@seanlinsley Is there something I can do to make this more acceptable? Perhaps a setting to limit max_records?

@seanlinsley
Copy link
Contributor

This can't be cleanly merged because of #3097. I resolved the conflict locally with these changes:

$ git diff
diff --git a/lib/active_admin/resource_controller/streaming.rb b/lib/active_admin/resource_controller/streaming.rb
index c07dffa..ba99b0f 100644
--- a/lib/active_admin/resource_controller/streaming.rb
+++ b/lib/active_admin/resource_controller/streaming.rb
@@ -20,9 +20,15 @@ module ActiveAdmin

         protected

-        def csv_line(resource, columns)
+        def csv_line(resource, columns, options)
           columns.map do |column|
-            call_method_or_proc_on resource, column.data
+            s = call_method_or_proc_on resource, column.data
+
+            if options[:encoding] && s.respond_to?(:encode!)
+              s.encode! options[:encoding], options[:encoding_options]
+            else
+              s
+            end
           end
         end

@@ -37,7 +43,7 @@ module ActiveAdmin
           self.response_body = Enumerator.new do |csv|
             csv << CSV.generate_line(columns.map(&:name), options)
             collection.find_each do |resource|
-              csv << CSV.generate_line(csv_line(resource, columns), options)
+              csv << CSV.generate_line(csv_line(resource, columns, options), options)
             end
           end
         end

@seanlinsley seanlinsley merged commit 29b5019 into activeadmin:master May 23, 2014
seanlinsley added a commit that referenced this pull request May 23, 2014
Conflicts:
	app/views/active_admin/resource/index.csv.erb
@seanlinsley
Copy link
Contributor

It would be nice to be able to both set a hard limit as a developer, & choose to download only a certain number of records as a user (#2626), but this PR is a great improvement on its own. Thanks so much for this ❤️

@brandnewmedia
Copy link

+1 thank you!

@shekibobo
Copy link
Contributor

🍻

@tychio
Copy link

tychio commented Jun 25, 2014

+1, when to release on RubyGems?

@kushkella
Copy link

I just migrated from ActiveAdmin 0.6 to master branch. Although this CSV export is really good, but it assumes that the primary key will be included in the collection. In my current setup I have customised the collection in a way that primary key cannot be included. Is there a way we can use pagination scopes to get the same effect as find_each. And use the config.sort_order with the pagination.

@seanlinsley
Copy link
Contributor

@kushkella off the top of my head I'm not sure what might work. Would you mind sharing the code / explaining why you can't include the primary key?

@seanlinsley
Copy link
Contributor

Also instead of commenting on a merged pull request it would be better to create a new ticket.

@kushkella
Copy link

@seanlinsley sorry for commenting on the old pull request. Please look at the new issue and let me know if there is a way to solve this problem. #3269

@jhblacklock
Copy link

Loving this commit! It works great locally; cool idea. Starts streaming the file instead of serving the csv from the web server.

For production on heroku I get H12 errors trying to download around 10K records with children.

Are there any reasons this would not work on Heroku currently? I'll be happy to do any research and provide any other info about my implementation.

activeadmin (1.0.0.pre)
Rails 4.1.0

@joallard
Copy link

joallard commented Aug 4, 2015

@jhblacklock Sounds like the timeout value in Unicorn. That, or stop having children download your CSVs and have it done by an adult!

@jhblacklock
Copy link

@joallard That is definitely a problem of mine (children) but they don't seem to be the issue I'm having with streaming CSVs ^_^.

My unicorn timeout value is hardcoded at 600. I also run unicorn local and do not experience any issues.

The initial web request hits in 10ms but the server doesn't initiate the download before the server times out in 30 seconds.

2015-08-05T22:07:00.792698+00:00 app[web.1]: Started GET "/admin/newcomers.csv" for 97.77.148.58 at 2015-08-05 22:07:00 +0000
2015-08-05T22:07:00.852062+00:00 app[web.1]: Completed 200 OK in 10ms (ActiveRecord: 4.1ms)
2015-08-05T22:07:00.837900+00:00 app[web.1]: Processing by Admin::NewcomersController#index as CSV
2015-08-05T22:07:30.774222+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/admin/newcomers.csv" host=admin-ttr-challenge.herokuapp.com request_id=cf4050f3-f162-4717-b64a-205f0be0fb07 fwd="97.77.148.58" dyno=web.1 connect=1ms service=30000ms status=503 bytes=0

My csv block is lengthy but I wouldn't think the timeout would occur on the first iteration.

csv do
    [:first_name, :last_name, :num_of_children, :email].each do |col|
      column col
    end
    column(:event_name) { |newcomer| newcomer.event.name if newcomer.event }
    column(:event_date) { |newcomer| newcomer.event.starts_time if newcomer.event }
    column(:business_name) { |newcomer| newcomer.event.user.business_name if newcomer.event && newcomer.event.user }
    column(:host_first_name) { |newcomer| newcomer.event.user.first_name if newcomer.event && newcomer.event.user }
    column(:host_last_name) { |newcomer| newcomer.event.user.last_name if newcomer.event && newcomer.event.user }
    column(:newcomer_address_name) { |newcomer| newcomer.address.name if newcomer.address }
    column(:newcomer_address_one) { |newcomer| newcomer.address.address_one if newcomer.address }
    column(:newcomer_address_two) { |newcomer| newcomer.address.address_two if newcomer.address }
    column(:newcomer_city) { |newcomer| newcomer.address.city if newcomer.address }
    column(:newcomer_state) { |newcomer| newcomer.address.state if newcomer.address }
    column(:newcomer_zip) { |newcomer| newcomer.address.zip if newcomer.address }
    column(:children) { |newcomer| newcomer.children.pluck(:name, :gender, :age).reject(&:blank?).join(", ") if newcomer.children }
  end

@jhblacklock
Copy link

Yar. I feel sheepish. 😊

My Procfile was not registering with the heroku deploy so it was defaulting to webrick. One process = not working.

Thanks for everything!

@joallard
Copy link

Methinks we should document this issue seeing how frequent it is

@seanlinsley
Copy link
Contributor

It's also important to mention that if an exception occurs during the stream, the stack trace will be appended to the resulting file, but the request will only end after your web server times out. That's not necessarily a problem in production (and it might be very hard to fix), but in development it can make debugging much more difficult. Which is why I created #3535

BTW @jhblacklock you can use syntax highlighting to make your code easier to read: https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown

@seanlinsley
Copy link
Contributor

Actually, it is a problem in production, because the exception doesn't bubble up. So the only way you'll find out is by the users of your app reporting that their CSV exports don't include all the records they should.

@Fivell
Copy link
Member

Fivell commented Jun 13, 2016

@seanlinsley , I faced with this problem, have no logs to find out why csv was not downloaded.

@haraldkrischner
Copy link

This PR might be old, but we are running into the same issue as @seanlinsley , so is there any approch to it?

@luiszacheu
Copy link

I downloaded active admin 1.0.0.pre4, but would like to know how to implement the export CSV streaming? Tks

@ManuelKu1223
Copy link

@seanlinsley Thanks for your implementing. but I have some questions.
I had built the Activeadmin with rails and I have 20k records. I hope to download this with CSV. But this happens the timeout error. I deployed this on Heroku. How can I fix this?
Please help me.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.