Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't read numeric fields (again) #26

Closed
ghost opened this issue Apr 6, 2016 · 6 comments
Closed

can't read numeric fields (again) #26

ghost opened this issue Apr 6, 2016 · 6 comments

Comments

@ghost
Copy link

ghost commented Apr 6, 2016

My data table is recorded in SQL Server 2008 database and it contains only the types listed below:

  • INT
  • DECIMAL(10,5)
  • VARCHAR(150)

However, when trying to run the script below an error occurs with the type of data (numerical). I solved the problem inefficiently turning all the table fields in char type, which does not allow the best performance rsqlserver package (see about the performance, according to the test here for the numerical data rsqlserver package is much faster, https://github.com/agstudy/rsqlserver/wiki/benchmarking).

Although I have researched a lot about, I found nothing about how to resolve this problem, do not even exist comments for the issue #22, and I believe it is a similar problem to mine.

So I would like to correct this error and gain in performance.

My code:

Pacakge

library(rsqlserver)     

Driver

drv  <- dbDriver("SqlServer")   

My connection

conn <- dbConnect(drv, 
                  host     = "my_ip",
                  dbname   = "mydb", 
                  user     = "my_user", 
                  password = "my_pass") 

Data reading attempt

df <- dbGetQuery(conn, "select * FROM my_table")

Error message

Error in (function ()  : 
  clr_obj_ms_convert_to_SEXP: COM variant type code 14 unsupported. Returning NULL
Error in clrCall(res.Dict, "get_Item", Cnames[i]) : 
  Type:    RDotNet.EvaluationException
Message: Error in (function ()  : 
  clr_obj_ms_convert_to_SEXP: COM variant type code 14 unsupported. Returning NULL

Method:  RDotNet.ProtectedPointer evaluateCall(IntPtr)
Stack trace:
   at RDotNet.Function.evaluateCall(IntPtr call)
   at RDotNet.Function.createCallAndEvaluate(IntPtr argument)
   at RDotNet.Function.InvokeOrderedArguments(SymbolicExpression[] args)
   at Rclr.RDotNetDataConverter.ConvertObject(Object obj) in c:\Users\per202\AppData\Local\Temp\Rtmp8UJrfL\R.INSTALL23944e8d7aab\rClr\src\ClrFacade\RDotNetDataConverter.cs:line 558
   at Rclr.RDotNetDataConverter.ConvertToSexp(Object obj) in c:\Users\per202\AppData\Local\Temp\Rtmp8UJrfL\R.INSTALL23944e8d7aab\rClr\src\ClrFacade\RDotNetDataConverter.cs:line 368
   at Rclr.RDotNetDataConverter.ConvertAll(Object[] objects, Func`2 converter) in c:\Users\per202\AppData\Local\Temp\Rtmp8UJrfL\R.INSTALL23944e8d7aab\rClr\src\ClrFacade\RDotNetDataConverter.cs:line 387
   at Rclr.RDotNetD 
@ghost ghost changed the title can't read numeric fields (again) #26 can't read numeric fields (again) Apr 6, 2016
@wibeasley
Copy link
Contributor

Hi @davidfjr3, looking through the rClr.c code, the limitation appears to be in the clr_obj_ms_convert_to_SEXP function. (Sorry I don't know how to reference line numbers with codeplex's web interface.)

It looks like the default clause is catching your condition, which makes sense because I don't see the numeric/decimal condition supported.

Two questions

  1. Are you sure you want SQL Server's numeric data type in R? Notice it's a very different animal than R's numeric data type. The equivalent data type in SQL Server is float. The round() function used in the benchmarking vignette you referenced is producing the equivalent of a SQL Server float.
  2. If the answer is yes above, I guess I don't understand how you're supporting it in R that's not already in a character format. Could

@ghost
Copy link
Author

ghost commented Apr 6, 2016

Hi @wibeasley,

I really believed that to read the data of the SQL database server in numeric format would be as fast as we could see in benchmarking vignette.

Your tip to convert the fileds char into a float type worked very well .. and from there reading the numerical data from SQL Server to the R reduced the reading time by 65%.

Thank you for your very helpful participation.

@wibeasley
Copy link
Contributor

@davidfjr3, no problem. I've really benefited from this package, and I'm happy to help.

Is this field representing money or something that needs to be extremely precise? If not, consider keeping the data types as floats in the database.

Did you loose any/much precision? Theoretically if the values aren't well represented in powers-of-two. If it's a concern, consider uploading both versions to the same table and subtract the difference (as DECIMAL/NUMERIC types) to see if you're losing anything.

Like you, I'm surprised there was a substantial speed difference. I had assumed that the bcp utility was mostly text based underneath. But now that you point it out, I see the vignette's vertical axes change for the horizontal facets. I'm not sure the difference is big enough for me to change from my preferred data type choice for each variable, but I'm glad I'm more aware of it.

@ghost
Copy link
Author

ghost commented Apr 7, 2016

Hi, @wibeasley

no, I should not have extreme precision.

Now, let me show a summary of the conditions of my data and show the scenario that I got this performance gain:

  • Table recorded on the SQL server: 5,070,840 lines vs 87 columns, which 78 are of the numeric type and only 9 of char type.
  • Solution 1: After transforming all fields to char type, to read my table directly from the SQL Server I got a time of 21 minutes, line code:
    • df <- dbGetQuery(conn, "select * FROM my_table")
  • Solution 2 (it is your solution to turn the numerical to float for the 78 numeric fields in my case): now, to read directly from the SQL Server my table I got a time of 7 minutes (66% more efficient), same line code:
    • df <- dbGetQuery(conn, "select * FROM my_table")
  • Solution 3 (extra): I wrote a .csv file directly from SQL server via bcp command in less than 2.5 minutes. Then to put within the R I used the function data.table::fread (1.5 minutes) .. so with this solution except the inconvenience of generating externally to the R .csv I got an approximate total time of 4 minutes (one 80% reduction in time in relation to "Solution 1" and a reduction of 42% compared to the "Solution 2").

ruaridhw pushed a commit to ruaridhw/rsqlserver that referenced this issue Dec 22, 2016
@ghost
Copy link

ghost commented Dec 27, 2017

VT_DECIMAL is the decimal type that may be missing in rClr.c. That being said, you can write a stored procedure that will automatically convert any decimal column to a float and then call it from R similar to

df=dbGetQuery(connection, "exec sp_RSqlServer_Select @table='Time_Series', @Schema='dbo'")

You also could add @where parameter as well as parameters to order the dataset, etc.

The first block related to VT_DECIMAL should be

case VT_DECIMAL:
	rVals = (double*)malloc(sizeof(double)*n);
	if (pobj->decVal.sign == 128)
		rVals[0] = -(pobj->decVal.Lo32)*pow(.1, pobj->decVal.scale);
	else
		rVals[0] = (pobj->decVal.Lo32)*pow(.1, pobj->decVal.scale);
	result = make_numeric_sexp(n, rVals);
	free(rVals);
	break;

The block related to VT_ARRAY|VT_DECIMAL might be a modest modification of the code below, but I'd have to get rsqlserver configured in Visual Studio to figure it out. I tried replacing rClr's ClrFacade.dll, but that was not enough.

case VT_ARRAY | VT_DECIMAL :
get_array_variant(pobj, &array, &n, &uBound);
rVals = (double*)malloc(sizeof(double)*n);
for(long i = 0; i < n ; i++ ) {
SafeArrayGetElement(array, &i, &(rVals[i]));
}
result = make_numeric_sexp(n, rVals);
free(rVals);
break;

@ruaridhw
Copy link
Collaborator

That's a good option if you must run the transformation in-database.

As a result of the performance comparisons above, the rsqlserver function dbBulkWrite was added to be the fastest way to dump tables with erroneous data types as character types to CSV before reading in with fread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants