Better way to skip extraneous lines at the start? #890

rorourke-iot · 2018-01-03T15:42:12Z

I'm running this example in LINQPad. CsvHelper is pulled in as a NuGet package, with the appropriate namespaces added to the code.

I have a set of files I'm reading; example content is included in the code sample. The files contain lines at the front that should be skipped. I was using a custom CSV processor that let me specify how many lines to skip before processing of the file started. With CsvHelper, I'm trying to handle these lines. Below is my current take on it, though you can see (commented out) that I originally had a set of explicit reads. I felt that was brittle, as I may want to have other comments or content at the top of the files. I'd initially thought about asking whether the Read method could take an int param to specify a number of consecutive reads (this might still be a good idea, but seems unnecessary for my specific case).

I could also comment out the "version" line, but this convention is prevalent in other (non-CSV) data files used in the application, and I don't want to change it without a good reason.

Is the approach below my best option for handling this content?

void Main()
{
  var data = @"; Gen 1.5 Package Repository List
; this file contains a list of packages with version and release

version=1
package,action,data
hotfix-monitorix-lighttpd-2.5.3-1.el6.noarch,remove,
monitorix-lighttpd-2.5.2-1.el6.noarch,remove,
openssl,removearch,i686
powerctl,removeolder,2.8.0-2.NS.el6
bash-4.1.2-15.el6_5.2.x86_64.rpm,install,
cmulogd-1.6.0-2.el6.pse.x86_64.rpm,install,";
  
  using (var reader = new StringReader(data))
  using (var csv = new CsvReader(reader))
  {
    csv.Configuration.RegisterClassMap<RpmRepoMap>();
    csv.Configuration.IgnoreBlankLines = true;
    csv.Configuration.AllowComments = true;
    csv.Configuration.Comment = ';';
    csv.Configuration.ShouldSkipRecord = content =>
    {
      if (content[0].StartsWith("version"))
        return true;
        
      if (content[0] == "package")
      {
        csv.ReadHeader();
        
        return true;
      }
      
      return false;
    };
    
//    csv.Read();
//    csv.Read();
//    csv.Read();
//    csv.Read();
//    csv.ReadHeader();
    csv.GetRecords<RpmRepo>().Dump();
  }
}

class RpmRepo
{
  public string Package { get; set; }
  public string Action { get; set; }
  public string Data { get; set; }
}

class RpmRepoMap : ClassMap<RpmRepo>
{
  public RpmRepoMap()
  {
    Map(m => m.Package).Name("package");
    Map(m => m.Action).Name("action");
    Map(m => m.Data).Name("data");
  }
}

JoshClose · 2018-01-03T22:32:26Z

If it works for all files you'll be reading, it seems fine to me.

Here are a couple other ways you could do it.

// Skip 4 rows.
for (var i = 0; i < 4; i++) 
{
    csv.Read();
}

// Skip until version= is found.
while (csv.Read())
{
    if (csv.Context.Record[0].StartsWith("version="))
    {
        csv.Read();
        csv.ReadHeader();
        break;
    }
}
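
A further option, not suggested in the thread, is to skip the preamble on the TextReader itself before CsvHelper ever sees it. The sketch below uses only the .NET base class library; `PreambleSkipper` and `SkipToHeader` are hypothetical names made up for illustration, not part of CsvHelper. It assumes the header row can be recognized by a known prefix such as `package,` and that the rest of the file is small enough to buffer in memory.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of CsvHelper): drops every line that
// appears before the first line starting with the given header prefix.
static class PreambleSkipper
{
    public static TextReader SkipToHeader(TextReader reader, string headerPrefix)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(headerPrefix, StringComparison.Ordinal))
            {
                // Put the header line back in front of the unread remainder.
                // Note: this buffers the rest of the input in memory.
                return new StringReader(line + Environment.NewLine + reader.ReadToEnd());
            }
        }

        // No header found: return an empty reader so the caller sees no records.
        return new StringReader(string.Empty);
    }
}
```

With the sample data above, something like `new CsvReader(PreambleSkipper.SkipToHeader(reader, "package,"))` should then see the header as its first row, so neither ShouldSkipRecord nor the comment settings would be needed for the preamble. Depending on the CsvHelper version, the CsvReader constructor may require additional arguments.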

JoshClose added the question label Jan 3, 2018

JoshClose closed this as completed Jun 13, 2018

conficient mentioned this issue Jul 3, 2019: Reading from a csv with a double header line #1321 (Open)
