Data format of the vm and dbengine interface #40

Closed
zhaoyiping0622 opened this issue Nov 20, 2020 · 14 comments
@zhaoyiping0622
Collaborator

zhaoyiping0622 commented Nov 20, 2020

Data passed in by the vm:

[len(2),flag(1),null(1),string(len)]+

The number in parentheses is the size in bytes.

Data returned by dbengine:

len(4)[len(2),flag(1),null(1),string(len)]+
@zhaoyiping0622 added the documentation label Nov 20, 2020
@zhaoyiping0622
Collaborator Author

flag indicates the data type:
1 is int
2 is double
3 is string
4 is null
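
The format and flag values above can be sketched as a small encoder. This is only an illustration under assumptions the thread does not state: the multi-byte `len` fields are written big-endian, the `null` byte is 0 or 1, and the leading `len(4)` in the dbengine reply counts the bytes that follow. The names `Flag`, `Field`, `append_field`, `encode_vm_row` and `encode_dbengine_reply` are hypothetical and do not come from the project.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Flag values as listed in this thread.
enum class Flag : std::uint8_t { Int = 1, Double = 2, String = 3, Null = 4 };

// One field: [len(2), flag(1), null(1), string(len)].
struct Field {
    Flag flag;
    bool is_null;
    std::string text;  // the value in its string form
};

// Append one field to the buffer.
// ASSUMPTION: len is big-endian and the null byte is 0 or 1.
void append_field(std::vector<std::uint8_t>& out, const Field& f) {
    if (f.text.size() > 0xFFFF) throw std::length_error("field exceeds 2-byte len");
    const auto len = static_cast<std::uint16_t>(f.text.size());
    out.push_back(static_cast<std::uint8_t>(len >> 8));
    out.push_back(static_cast<std::uint8_t>(len & 0xFF));
    out.push_back(static_cast<std::uint8_t>(f.flag));
    out.push_back(f.is_null ? 1 : 0);
    out.insert(out.end(), f.text.begin(), f.text.end());
}

// vm -> dbengine: one or more fields, back to back.
std::vector<std::uint8_t> encode_vm_row(const std::vector<Field>& fields) {
    std::vector<std::uint8_t> out;
    for (const Field& f : fields) append_field(out, f);
    return out;
}

// dbengine -> vm: len(4) followed by the fields.
// ASSUMPTION: len(4) is the big-endian byte count of everything after it.
std::vector<std::uint8_t> encode_dbengine_reply(const std::vector<Field>& fields) {
    const std::vector<std::uint8_t> body = encode_vm_row(fields);
    assert(body.size() <= 0xFFFFFFFFu);  // must fit in 4 bytes
    const auto n = static_cast<std::uint32_t>(body.size());
    std::vector<std::uint8_t> out = {
        static_cast<std::uint8_t>(n >> 24),
        static_cast<std::uint8_t>((n >> 16) & 0xFF),
        static_cast<std::uint8_t>((n >> 8) & 0xFF),
        static_cast<std::uint8_t>(n & 0xFF)};
    out.insert(out.end(), body.begin(), body.end());
    return out;
}
```

With these assumptions, an int field holding "42" occupies 6 bytes (2 + 1 + 1 + 2) and a null field occupies 4 bytes.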

@EzioZz
Contributor

EzioZz commented Nov 20, 2020

We also need to distinguish which fields are keys, right? Which layer should do that?

@zhaoyiping0622
Collaborator Author

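? All I know is that this can be obtained from the cursor's metadata, but the vm doesn't know which field is the key.
I think your btree could just handle it along the way. It only needs to provide an interface for a node's key, and then nobody has to look up the metadata.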

@EzioZz
Contributor

EzioZz commented Nov 20, 2020

The interface I expose already keeps key and data separate.

@zhaoyiping0622
Collaborator Author

> The interface I expose already keeps key and data separate.

What dbengine passes to me also has key and data separate.

@EzioZz
Contributor

EzioZz commented Nov 20, 2020

Then dbengine should handle it: pick the key out of the data you pass in.

@zhaoyiping0622
Collaborator Author

Changed the first len in the data returned by dbengine to 4 bytes.

@QingQiz
Owner

QingQiz commented Nov 20, 2020

Could you update the wiki as well while you're at it?

@zhaoyiping0622
Collaborator Author

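I want to change this thing again.

Right now integers and floating-point numbers are simply converted to strings when passed between dbengine and the vm. I think this causes several problems:

  1. High memory usage: taking int as an example, the length can be anywhere from 1 to 11, averaging around 6 bytes. double takes even more; for 1/3, the number of 3s in 0.333333333 depends on the precision.
  2. Unfriendly to memory alignment, because the length is variable.
  3. Precision: a double has many valid spellings, e.g. 1e5, 1e+5, 100000, 100000.0, 100000.0000, and I don't know whether converting these spellings to double can introduce precision problems. Also, for something like 0.3333333, if the two sides don't agree on how many decimal places to keep, restoring the value from the string may well distort it, which then causes errors in deduplication and sorting.
  4. After the transfer, both sides have to convert the strings back to the corresponding data types, which takes a lot of multiplications and divisions and adds overhead.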
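
Point 3 can be checked with a small experiment. The sketch below uses only the standard `std::snprintf` and `std::strtod`; the helper names `format_double`, `parse_double` and `round_trips` are hypothetical. It assumes IEEE 754 doubles and a correctly rounding C library, in which case the different spellings of 100000 parse to the same value, while a string truncated to 7 significant digits does not restore 1/3 and one with 17 significant digits does.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double with the given number of significant digits ("%.<digits>g").
std::string format_double(double x, int digits) {
    assert(digits > 0 && digits < 40);  // keeps the output well inside buf
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return buf;
}

// Parse the string form back into a double.
double parse_double(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}

// True if x survives a format -> parse round trip at this precision.
bool round_trips(double x, int digits) {
    return parse_double(format_double(x, digits)) == x;
}
```

So the spelling itself is less of a risk than the number of digits kept: if the two sides format with too few digits, values that differ only in the dropped digits compare as equal after the transfer.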

So I'd like to switch from passing strings to passing the in-memory layout of each data type, i.e.:

  • For int, 4 bytes: x>>24, x>>16&0xff, x>>8&0xff, x&0xff
  • For double, 8 bytes: (*(unsigned long long*)&x)>>56, (*(unsigned long long*)&x)>>48&0xff, and so on
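
A minimal sketch of this byte layout, most significant byte first as in the shifts above. The function names `encode_int` and `encode_double` are hypothetical. Two details differ from the expressions in the list and are my own substitutions: the int is shifted as an unsigned value so that negative numbers behave predictably, and the double's bit pattern is copied with `std::memcpy` instead of a pointer cast, which avoids strict-aliasing problems. It assumes a 32-bit int and a 64-bit IEEE 754 double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// int -> 4 bytes, most significant byte first.
std::array<std::uint8_t, 4> encode_int(std::int32_t x) {
    // Shift the unsigned bit pattern so negative values are well defined.
    const auto u = static_cast<std::uint32_t>(x);
    return {static_cast<std::uint8_t>(u >> 24),
            static_cast<std::uint8_t>((u >> 16) & 0xFF),
            static_cast<std::uint8_t>((u >> 8) & 0xFF),
            static_cast<std::uint8_t>(u & 0xFF)};
}

// double -> 8 bytes, most significant byte first.
std::array<std::uint8_t, 8> encode_double(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &x, sizeof bits);  // copy the bit pattern, no pointer cast
    std::array<std::uint8_t, 8> out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = static_cast<std::uint8_t>((bits >> (56 - 8 * i)) & 0xFF);
    }
    return out;
}
```

Every int then takes exactly 4 bytes and every double exactly 8, and no decimal parsing is needed on the receiving side.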

@zhaoyiping0622
Collaborator Author

Writing it this way might cause memory alignment problems; this needs some investigation.
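
One way the alignment concern might be sidestepped, offered as a sketch rather than the project's approach: because each field header is 4 bytes and strings have variable length, a value can start at any offset in the buffer, and casting such an address to `int*` or `double*` and dereferencing it would be a misaligned access. Reading the value one byte at a time has no alignment requirement. The names `read_int` and `read_double` are hypothetical, and the code assumes the big-endian byte order described earlier, a 32-bit int and a 64-bit IEEE 754 double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a 4-byte big-endian int that starts at any offset in buf.
std::int32_t read_int(const std::uint8_t* buf, std::size_t offset) {
    assert(buf != nullptr);
    std::uint32_t u = 0;
    for (std::size_t i = 0; i < 4; ++i) u = (u << 8) | buf[offset + i];
    std::int32_t x = 0;
    std::memcpy(&x, &u, sizeof x);  // reinterpret the bits as a signed value
    return x;
}

// Read an 8-byte big-endian double that starts at any offset in buf.
double read_double(const std::uint8_t* buf, std::size_t offset) {
    assert(buf != nullptr);
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i) bits = (bits << 8) | buf[offset + i];
    double x = 0.0;
    std::memcpy(&x, &bits, sizeof x);  // reinterpret the bits as a double
    return x;
}
```

The caller is responsible for making sure `offset` plus the value size stays within the buffer.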

@zhaoyiping0622
Collaborator Author

The high memory usage point can be set aside for now.

@zhaoyiping0622
Collaborator Author

@EzioZz @QingQiz @osmium18452 let's discuss this.

@osmium18452
Contributor

> Then dbengine should handle it: pick the key out of the data you pass in.

Huh? How am I supposed to know which part is the key and which is the data?
