-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BCP based load to TPROC-H Schema for SQL Server #593
Comments
Hello, I work on the same team as Krithika Satish and I am using those PRs as templates to alter the TPROC-H loading performance. The template from those PRs are working well for this load process to with one issue. The address field for the supplier and customer tables is created with this command: This command will allow addresses to be created with commas in it. Which causes issues with BCP. The current solution we have is to run the command: |
Hi, We shouldn't change the data as that is what is derived from the TPC-H specification. However, I would be really surprised if BCP didn't work with commas in it, as I would think this would be a fairly typical requirement. Looking at the documentation: https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-ver16&redirectedfrom=MSDN It looks like the "-t" flag is used to set the delimiter and so far for TPROC-C we are setting a delimiter as a comma i.e. -t ","
Therefore, for data with commas in it be able to change the delimiter for something else. As an example of what do see the commit here: a0e96c0 In the procedures creating the data you could set a variable called delimiter and then do upvar 2 in the bcp procedure to reference the setting for a particular table being loaded and then this delimiter would be used in the bcp procedure as follows:
To be sure it is set you could also use the following so if delimiter is not set then a comma is used by default. upvar 2 delimiter delimiter For manual bulk loads I have previously used a pipe delimiter "|" https://www.hammerdb.com/docs/ch13s04.html |
Would it make sense instead to use pipes '|' for all temporary files in the TPCH load? That would simplify the codebase a little bit. |
The bcp utility has some restrictions to date formatting that are causing an issue. The function Currently I made a copy on the function If this is an adequate solution, does this new function belong in tpchcommon-1.0.tm or in mssqlsolap.tcl as this is the only place in the codebase where this function is needed? |
Yes, certainly if this is easier for you, then you can use a pipe delimiter for TPROC-H. A comma was only used as that is what was used for the TPROC-C BCP build. |
Good question and solution. Let's keep it in tpchcommon-1.0.tm so it is next to the original function for reference and maybe just add a very brief comment above why it is needed for a reminder in future. |
Pull Requests #587 and #592 have added a BCP based load to the TPROC-C schema that improves loading performance. The same approach could be added to the TPROC-H schema to also improve performance there, using these pull requests as a template for the changes that need to be added.
The text was updated successfully, but these errors were encountered: