Skip to content
This repository has been archived by the owner on Nov 23, 2021. It is now read-only.

Ability to specify the model to import to. #19

Closed
weavermedia opened this issue May 24, 2014 · 8 comments
Closed

Ability to specify the model to import to. #19

weavermedia opened this issue May 24, 2014 · 8 comments

Comments

@weavermedia
Copy link

My app user must be able to import data into three different models, say Customer, Prospect and Lead. At the moment I have a MODEL_importer.rb class for each type, containing mostly the same code. The main difference is that they have different imports calls at the top of the file.

I've tried passing in the model name to import to within the params hash but it doesn't work. Maybe the imports call is run before the hash is processed?

Is there any way for me to reduce duplication and have one data_importer.rb class and pass things like class name, columns, etc in the params hash?

@gnapse
Copy link
Contributor

gnapse commented May 24, 2014

Interesting suggestion. We never had the situation of more than one model with exactly the same structure. This is currently not possible, but I don't see any problem in adapting the library to receive the model class as a parameter to the .import method call. I'll make some time next week to work on this and some of the other suggestions of yours. Once again, thanks for your contributions.

@gnapse
Copy link
Contributor

gnapse commented May 24, 2014

However, now that I think about it, there's a workaround. You can create a base DataImporter class, and then you create one subclass for each of the models:

class DataImporter < ActiveImporter::Base
  # column definitions here
end

class CustomerImporter < DataImporter
end

class ProspectImporter < DataImporter
end

class LeadImporter < DataImporter
end

Right now I'm not entirely sure that this will work, but I guess it will. At most, you'll need to include the imports ModelClass statement in each of the subclasses, but if I recall well, this is inferred from the importer class name, so it may even not be necessary.

Please try this and let me know how it went.

@weavermedia
Copy link
Author

I hit on the same workaround with subclassing. I even called my base class DataImporter :-)

It works fine this way.

Actually the way I'm using it is the base class defines all the common row_* methods for building the progress report to display to the user and the subclass files define all their own columns because they have slight naming differences.

I think it would possible to pass in the columns in the params and iterate through them in the base class but I haven't tried that yet.

After your last comment I did try leaving out the imports ModelClass line in the subclass files but I got an undefined method 'new' for nil:NilClass error on every row so I don't think the model name is inferred from the importer class name. Or maybe the DataImporter class is throwing it off because there's no Data model?

@gnapse
Copy link
Contributor

gnapse commented May 24, 2014

Well, if the column definitions vary from importer to importer, then in my opinion it makes no sense to try to achieve defining it all in a single importer, and attempt to pass the column definitions to the .import method. Column definitions are way too complex to attempt to pass them as parameters. That's what the importer DSL is for. In those cases, define each importer separately. Since column definitions are not the same, you have no code repetition problem, as I assumed initially. If the row event handlers are the same, you can always define them without repeating them, using the subclassing technique.

@weavermedia
Copy link
Author

You're right, that's what I've ended up with: common row handlers in a base class and column definitions in subclasses.

As you can see the base class holds the majority of the code so it's definitely worth doing this way.

Given that I could even omit the skip_rows_if line and just let my validations pick them up, the subclass could even end up just as a short list of column definitions. That's why I was considering passing them in as a hash from the .import call.

class CustomerImporter < DataImporter
  imports Customer

  column 'First', :first_name
  column 'Last', :last_name
  column 'Email', :email

  skip_rows_if { row['First'].blank? && row['Last'].blank? && row['Email'].blank? }
end
class DataImporter < ActiveImporter::Base

  attr_reader :rows_processed, :rows_imported, :rows_skipped, :errors, :summary

  on :import_started do
    @rows_processed = @rows_imported = @rows_skipped = 0
    @errors = @summary = ''
  end

  on :row_processed do
    @rows_processed += 1
  end

  on :row_success do
    @rows_imported += 1
  end

  on :row_error do |exception|
    @errors += "Error on row #{row_index} - #{exception.message}\n"
  end

  on :row_skipped do
    @rows_skipped += 1
  end

  on :import_finished do
    @summary += "#{@rows_processed} row(s) processed\n"
    @summary += "#{@rows_imported} row(s) imported successfully\n" 
    @summary += "#{@rows_skipped} blank row(s) skipped\n" if @rows_skipped > 0
    @summary += @errors
    # delete_s3_temp_file
    send_notification(@summary)
  end

  on :import_failed do |exception|
    send_notification("Fatal error while importing data: #{exception.message}")
  end

  private

  def send_notification(message)
    puts message
  end
end

@gnapse
Copy link
Contributor

gnapse commented May 24, 2014

After having a look at your code, allow me to make some suggestions.

First, the base importer class already keeps some counters, and it's my fault for not having documented these in any way, so it's understandable that you tried to replicate them.

Inside the code blocks, you can access the importer properties row_processed_count, row_success_count and row_error_count. And I've just noticed that the corresponding row_skipped_count is missing from the gem's code, so I will fix this during the weekend, to make it consistent, and you'll be able to avoid replicating this counting.

Also, earlier today I implemented the inheritable skip block definition, so you can already use this feature and define a single skip block in your base importer class. By the way, if you want to skip blank rows, you could define the condition without having to know the names of the columns, like show below:

class DataImporter < ActiveImporter::Base
  skip_rows_if do
    row.values.all?(&:blank?)
  end
end

This way it will work with whatever column definitions you have on any given subclass.

But even in the case where you would need to have some parametrization, you could achieve it by passing custom parameters to the importer. These can be passed to the .import method call, and you can access any custom parameter from within any code block in your importer, even those defined in superclasses. An example below:

class CustomerImporter < ActiveImporter::Base
  # required_columns parameter wrapped in a attr reader for convenience
  # and for providing a default value if the parameter was not passed
  def required_columns
    params[:required_columns] || []
  end

  skip_rows_if do
    required_columns.each do |column_name|
      return true if row[column_name].blank?
    end
    return false
  end
end

# Pass the parameter at the moment of invocation
CustomerImporter.import('file.xls', params: {
  required_columns: ['First name', 'Last name'],
})

And finally, if you would like to define these dynamic lists of required columns on each subclass, instead of having to repeat them on each invocation to .import, you could do the following:

class DataImporter < ActiveRecord::Importer
  def required_columns
    []
  end

  skip_rows_if do
    required_columns.each do |column_name|
      return true if row[column_name].blank?
    end
    return false
  end
end

class CustomerImporter < DataImporter
  def required_columns
    ['First name', 'Last name']
  end
end

class InvoiceImporter < DataImporter
  def required_columns
    ['Order number', 'Client name']
  end
end

I haven't tested any of these, but I'm pretty sure they'll work, now that skip_rows_if was made inheritable (see 94c74be).

@gnapse
Copy link
Contributor

gnapse commented May 24, 2014

By the way, would you care to close this issue if you consider it solved?

@weavermedia
Copy link
Author

Thanks for the tips. It's great to know that there are counters already defined.

When you document them you should also mention row_index which I don't think is included in the docs yet. I needed when creating the error messages.

Also I like the idea of defining required_columns but I'd have to be careful with balancing user feedback and automation. I wouldn't just skip over rows that didn't have all the required fields without warning the user afterwards in the summary.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants