
Support for Header Substring #1039

Closed
No1e opened this issue May 22, 2018 · 12 comments

Comments

@No1e

No1e commented May 22, 2018

Hello,

Is there a way to configure CsvReader to take only a portion of the CSV header?

I am parsing in a generic way, like this:

CsvReader csvReader = new CsvReader(reader, csvConfig);
foreach (var assetRecord in csvReader.GetRecords<dynamic>()) { /* ... */ }

In the CSV file, the header contains something like this: Name1(2,3) | Name2(4,2)...

So, instead of taking Name1(2,3), I would like to take only the part of the column name up to the brackets: Name1.

Is this possible?

Regards,
Novak

@JoshClose
Owner

void Main()
{
	using (var stream = new MemoryStream())
	using (var writer = new StreamWriter(stream))
	using (var reader = new StreamReader(stream))
	using (var csv = new CsvReader(reader))
	{
		writer.WriteLine("Id(1),Name(2)");
		writer.WriteLine("1,one");
		writer.WriteLine("2,two");
		writer.Flush();
		stream.Position = 0;
		
		csv.Configuration.PrepareHeaderForMatch = header => header.Substring(0, header.IndexOf("("));
		csv.GetRecords<dynamic>().ToList().Dump();
	}
}
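As a side note, the PrepareHeaderForMatch lambda above throws if a header has no opening bracket, because IndexOf returns -1 and Substring rejects a negative length. A defensive variant of the trimming logic (my own sketch, not from the thread):

```csharp
using System;

public static class HeaderTrim
{
    // Trims a CSV header like "Name1(2,3)" down to "Name1".
    // Falls back to the full header when there is no "(",
    // where IndexOf returns -1 and Substring would throw.
    public static string TrimHeader(string header)
    {
        var index = header.IndexOf('(');
        return index < 0 ? header : header.Substring(0, index);
    }

    public static void Main()
    {
        Console.WriteLine(TrimHeader("Name1(2,3)")); // Name1
        Console.WriteLine(TrimHeader("Plain"));      // Plain
    }
}
```

This can then be plugged in as csv.Configuration.PrepareHeaderForMatch = HeaderTrim.TrimHeader;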

@No1e
Author

No1e commented May 31, 2018

Cool thanks.

Is there any way to improve performance for this? I have a CSV file that contains close to 100K records, and on a mobile device it takes 23 seconds for this:
csv.GetRecords<dynamic>().ToList()

@JoshClose
Owner

JoshClose commented May 31, 2018

Don't use dynamic. There is an open issue to speed it up, but right now it's pretty slow. Using a normal class, or just calling GetField<int>("Field"), is really fast.

@No1e
Author

No1e commented Jun 1, 2018

Ok. But the problem is that I have to handle this in a generic way, so I cannot use a normal class, nor call GetField<int>("Field") or similar.

Do you have performance improvements for this use case on the roadmap?

@JoshClose
Owner

You should be able to handle reading in a generic way more easily using GetField than using dynamic. Once you have the dynamic object, how are you getting the data from its properties?

@No1e
Author

No1e commented Jun 1, 2018

The thing is that, because of the requirements, the whole model is completely generic: the names of the fields are not known and can be anything the user puts in the CSV file. That is why I am using dynamic. Later on I access each property from the dynamic object, which I also don't like very much, but currently I have no other solution.

@JoshClose
Owner

Can you give an example of how you're getting the data out of the dynamic object? Maybe I can translate that into using CsvHelper directly instead, to speed it up.

@No1e
Author

No1e commented Jun 3, 2018

Sure I can. Here is how it looks:

foreach (var record in csvReader.GetRecords<dynamic>())
{
	foreach (var recordProperty in record)
	{
		// ...
	}
}

Please note that calling GetRecords takes approx. 23 seconds.

@JoshClose
Owner

If you're doing GetRecords (plural), that is pulling the entire file into memory.

Try doing it like this. It should be a lot faster.

void Main()
{
	using (var stream = new MemoryStream())
	using (var writer = new StreamWriter(stream))
	using (var reader = new StreamReader(stream))
	using (var csv = new CsvReader(reader))
	{
		writer.WriteLine("Id(1),Name(2)");
		writer.WriteLine("1,one");
		writer.WriteLine("2,two");
		writer.Flush();
		stream.Position = 0;
		
		csv.Configuration.PrepareHeaderForMatch = header => header.Substring(0, header.IndexOf("("));
		csv.Read();
		csv.ReadHeader();
		while (csv.Read())
		{
			for (var i = 0; i < csv.Context.HeaderRecord.Length; i++)
			{
				var field = csv.GetField(i);
			}
		}
	}
}

public class Test
{
	public int Id { get; set; }
	public string Name { get; set; }
}

public class TestMap : ClassMap<Test>
{
	public TestMap()
	{
		Map(m => m.Id);
		Map(m => m.Name);
	}
}
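For the fully generic requirement in this thread (column names unknown ahead of time), the Read/GetField loop above can collect each row into a dictionary keyed by the trimmed header names. A sketch against the same 2018-era CsvHelper API used in this thread (CsvReader(reader), Context.HeaderRecord); newer versions changed these signatures. Note the trimming is done manually here, since PrepareHeaderForMatch only affects matching headers to class members, while Context.HeaderRecord keeps the raw header text:

```csharp
using System.Collections.Generic;
using System.IO;
using CsvHelper;

public class Program
{
    public static void Main()
    {
        using (var stream = new MemoryStream())
        using (var writer = new StreamWriter(stream))
        using (var reader = new StreamReader(stream))
        using (var csv = new CsvReader(reader))
        {
            writer.WriteLine("Id(1),Name(2)");
            writer.WriteLine("1,one");
            writer.WriteLine("2,two");
            writer.Flush();
            stream.Position = 0;

            csv.Read();
            csv.ReadHeader();
            var header = csv.Context.HeaderRecord;

            while (csv.Read())
            {
                // One dictionary per row, keyed by the header text
                // up to the opening bracket ("Id(1)" -> "Id").
                var row = new Dictionary<string, string>(header.Length);
                for (var i = 0; i < header.Length; i++)
                {
                    var name = header[i];
                    var paren = name.IndexOf('(');
                    if (paren >= 0)
                    {
                        name = name.Substring(0, paren);
                    }
                    row[name] = csv.GetField(i);
                }
                // Process row here instead of accumulating rows in a
                // list, so memory stays flat for large files.
            }
        }
    }
}
```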

@augustoproiete
Contributor

If you're doing GetRecords (plural), that is pulling the entire file into memory.

@JoshClose I'm probably missing something, but wouldn't the example code above, calling Read manually yourself, produce the same memory usage as calling GetRecords, given that GetRecords streams the records to the caller?

And with that said, in both cases you don't actually have the entire file in memory at any one point, as previous records get GC'ed...

@JoshClose
Owner

Sorry, you are correct. It yields records and there is no ToList or anything.

The speed improvements come from not using dynamic.

@No1e
Author

No1e commented Jun 18, 2018

Hi Josh.

Thanks. I'll give it a try and get back to you with the results.
