Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix download and upload large file bugs #321

Merged
merged 12 commits into from
Aug 21, 2017

Conversation

gongweibao
Copy link
Collaborator

Fix #313
Fix #296

Copy link
Collaborator

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that you take time to implemented this very important feature!

When reviewing this PR, it's a little hard for me to figure out which code is for client and which code is for server. Maybe using better package name could help (e.g., client, service).
pfsmodule provide little information: what does it do? The code is under path repo/go/filemanager, the path already tell us it's for "pfs". "module" don't have much information: a package is by definition a module.

I think for us who don't speak English natively, naming in English is very hard. But it's very very important for code clarity. Hope we can improve together in naming!

@@ -30,10 +28,16 @@ type ChunkMeta struct {
Len int64 `json:"len"`
}

// ToString pack a info tring of ChunkMeta.
func (m *ChunkMeta) ToString() string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a requirement, but could consider using the Stringer interface here: https://tour.golang.org/methods/17 .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

应该改名字为String()。Done。


const (
// ReadOnly means read only.
ReadOnly = os.O_RDONLY
Copy link
Collaborator

@helinwang helinwang Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of defining os.O_RDONLY, ... again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个本意是想对用户的输入做限制,确实没有必要。Done。

}

func getChunkData(m ChunkParam) (*Chunk, error) {
t := fmt.Sprintf("%s/api/v1/pfs/storage/chunks", Config.ActiveConfig.Endpoint)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better put /api/v1/pfs/storage/chunks as const, it's used in many places.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞。确实应该早点这么做了。Done。

}
log.V(2).Infof("diff chunkMeta:%#v\n", diffMeta)
// ColorError print red ERROR before message.
func ColorError(format string, a ...interface{}) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider changing to https://github.com/sirupsen/logrus, it supports colored [ERROR] message by default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对现在的日志改动有点大额。。。
我另外开一个PR来解决Color日志的问题?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯嗯,这个merge之后再开PR解决吧。

// ToURLParam encodes variables to url encoding parameters.
func (p *Chunk) ToURLParam() url.Values {
func (p *ChunkParam) ToURLParam() url.Values {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a requirement, but it seems a lot of effort to write ToURLParam and ParseChunkParam manually for each type that we want to pass over the wire. If we use json, the json encoder / decoder can do these automatically for us, and always correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是这样的,REST API中get方法标准的做法是body中没有任何的内容,也就是通过URL本身即可完成访问。而URL中大家用的最多的还是URL Parameters,好像很少把JSON串Encode到URL Parameters中。
不知道我的这个理解是否正确?

Copy link
Collaborator

@helinwang helinwang Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

谢谢!明白了,确实需要转成URL parameters。可以考虑写一个类似于json.Marshal和json.Unmarshal的函数。通过reflection自动encode / decode一个结构体。我搜了一下没找到开源的,自己写应该不难。这个repo有些相关,但其好像多做了很多东西,可以作为参考:https://github.com/tonto/qparams
不是这个PR的merge blocker哈。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

恩。这个一直想做,后来主线业务多把这个给耽误了。
不过现在想想,没有类似JSON.Unmarshal的好的开源,我要是搞出来会不会增加一下我们PaddlePadle的Star的数目?
有点意思,周末先搞这个一下。

Copy link
Collaborator

@helinwang helinwang Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gongweibao 赞,如果想让其他人用上,放成一个小repo比较方便(写到paddle里面别人要用的话依赖太大了),这个很独立,可以考虑自己建一个repo开源这个项目:)

@@ -135,14 +130,13 @@ func RunCp(cmd *CpCmd) error {
}

if err != nil {
fmt.Printf("%#v\n", err)
//fmt.Printf("%#v\n", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No commented out code please :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

part, error := partReader.NextPart()
if error == io.EOF {
break
m, err := r.GetChunkMeta(offset, size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems m and c are both calculated based on the file that is already on disk (already uploaded to server). But I thought the purpose of chunking is to avoid uploading file to server?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetChunkMata获取的只是ChunkMeta(要实际的取数据才能计算),只传输Meta信息,不传输data信息。

}

// Read loads filedata to io.Writer.
func (f *FileHandle) Read(w io.Writer, offset, len int64) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially it's calling os. CopyN, maybe name it as CopyN is more accurate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有道理。赞。Done。
不过下边还有一个CopyN的?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不好意思!我说错了,还是Read和下面的Write合适。。。

// Open file to read ,write or read-write.
// if flag == WriteOnly or flag == ReadAndWrite, this function will
// attempt to create a sized file on remote if it does't exist.
func (f *RFileHandle) Open(path string, flag int, size int64) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing that this function is called Open but it does not open a file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是想把RFileHandleFileHandle的接口保持统一。这样在调用的时候接口层感觉不到差别。RFileHandleClose接口目前也是空的。
(如果RFileHandle用的是长连接,确实会打开一个文件,现在确实没有。)
@helinwang 有什么好的命名建议没有?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFileHandle中的R是什么意思不仔细看这个类型的实现无法明白,感觉一个类型的名字还是要尽量表示清楚这个类型是干啥的。考虑用RemoteFile

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if flag == WriteOnly ||
flag == ReadAndWrite {

cmd := TouchCmd{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.OpenFile in few lines below already "touch" the file, why we still need to do "touch" here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touch的语义这里是说:如果有这个文件且大小等于指定的size则没有任何动作;否则,创建一个大小等于size的文件出来。

Copy link
Collaborator

@helinwang helinwang Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI并没有touch这个command,把touch抽象成一个cmd,用Run执行增加了理解代码的难度:现在读者需要追好多步才能明白,如果这里直接调用了CreateSizedFile,看起来就很清晰。感觉我们最好减少不必要的抽象。

另外touch作为一个特别有名的词,大家的理解好像是如果不存在就创建,而没有创建制定大小的文件的意思。复用了一个有名的词,而做的事情又不一样,可能让可读性降低了。

我看到了还有remoteTouch,是不是现在上传一个1G的文件要remote touch一下,然后在一块一块的传上去?
另一个可以考虑的实现方式是直接上传,服务器收到请求的时候直接seed到对应的offset写到临时文件夹下面的文件里,写完了再mv到指定地址。貌似不需要先创建好整个文件,再往里面写?(传到临时文件夹是为了防止上传其实失败了,文件仍然在那里,用户看到了不小心用了坏的文件。)
步骤越多就有越多的东西需要考虑,先touch一下是不是不一定必要?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞这个

CLI并没有touch这个command,把touch抽象成一个cmd,用Run执行增加了理解代码的难度:现在读者需要追好多步才能明白,如果这里直接调用了CreateSizedFile,看起来就很清晰。感觉我们最好减少不必要的抽象。
另外touch作为一个特别有名的词,大家的理解好像是如果不存在就创建,而没有创建制定大小的文件的意思。复用了一个有名的词,而做的事情又不一样,可能让可读性降低了。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另一个可以考虑的实现方式是直接上传,服务器收到请求的时候直接seed到对应的offset

这里有两个原因:
1.文件创建完以后seek到文件大小之外是不行的,越界了。
2.这样的好处是可以多线程并发。

写到临时文件夹下面的文件里,写完了再mv到指定地址。

放到临时文件夹中是一个好方法。我记录一下。

@gongweibao
Copy link
Collaborator Author

gongweibao commented Aug 17, 2017

关于client server pfsmodule的命名。
确实有点土。

一开始client的package是有的,只是后来发现多数的命令服务端和客户端都是需要,很多的工作在做网络通信,把命令的参数传递过来过去,client的远程命令对server来说其实还是本地,反过来也是一样。
pfsmoudle的意思是clientserver的通用部分。
目前看来,能放到client package文件upload.go download.go,只是这样一来pfsmoudleclient 将会相互的引用。

我觉得根本的问题可能在于网络通信的接口没有抽象好,导致看上去有点层次不清楚。

这个我也没有想好,helin能否给些建议?

@@ -22,6 +22,7 @@ func getToken(uri string, body []byte) ([]byte, error) {

// Token fetch and caches the token for current configured user
func Token(config *config.SubmitConfig) (string, error) {
return "gongwb", nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete the hard code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是用来测试的
多谢,后边统一去掉。Done。

@gongweibao gongweibao changed the title Fix download and upload large file bugs [WIP] Fix download and upload large file bugs Aug 17, 2017
@helinwang
Copy link
Collaborator

关于client server pfsmodule的命名。
确实有点土。
一开始client的package是有的,只是后来发现多数的命令服务端和客户端都是需要,很多的工作在做网络通信,把命令的参数传递过来过去,client的远程命令对server来说其实还是本地,反过来也是一样。
pfsmoudle的意思是client和server的通用部分。
目前看来,能放到client package文件upload.go download.go,只是这样一来pfsmoudle和client将会相互的引用。
我觉得根本的问题可能在于网络通信的接口没有抽象好,导致看上去有点层次不清楚。
这个我也没有想好,helin能否给些建议?

@gongweibao 原来的地方无法回复,在这里回复下吧:
这个改动太大了,不适合跟这个PR一起进去,不然等伟宝忙完眼前的事情,再想一想如何把结构条清晰吧。

if err != nil {
return err
}
// ToString packs info of Chunk
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToString -> String

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done。

@helinwang
Copy link
Collaborator

helinwang commented Aug 17, 2017

这些都不是这个PR必须要改的建议哈:
代码中http handler有很多地方是重复的error handle,可以参考here

@helinwang helinwang closed this Aug 17, 2017
@helinwang
Copy link
Collaborator

不好意思,点错了,不小心关掉了。

@helinwang helinwang reopened this Aug 17, 2017
@@ -30,10 +30,16 @@ type ChunkMeta struct {
Len int64 `json:"len"`
}

// ToString pack a info tring of ChunkMeta.
Copy link
Collaborator

@helinwang helinwang Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToString -> String
可以考虑搜索以下代码中的ToString字符串,需要的都可以String。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

惭愧。Done。


defer f.Close()

c, err := f.ReadChunk(p.Offset, p.ChunkSize)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里并没有read chunk(读chunk的内容),只是读了meta,考虑改成LoadMeta?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在FileHandle中加了接口。Done。

Offset: p.Offset,
Checksum: c.Checksum,
Len: c.Len,
}, err
Copy link
Collaborator

@helinwang helinwang Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

前面考虑改成

if err == io.EOF {
  err = nil
}

if err != nil {
  return nil, err
}

不然EOF的时候return了meta和error.EOF(成功了err却不是nil)。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我这个地方我也考虑过,
语义上来说,我们想取得Offset位置和ChunkSize大小的ChunkMeta,虽然得到了,但是ChunkSize却不是我们指定的,所以,取得了同时也要要告知调用者:虽然得到了但是却碰到了io.EOF。

这也是Go本身的Read, io.CopyN等函数的设计原则。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

明白了!


if len > defaultMaxChunkSize || len < defaultMinChunkSize {
return nil, errors.New(StatusBadChunkSize)
if len(resp.Err) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if resp.Err == "" {更容易理解。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞。Done。

part, error := partReader.NextPart()
if error == io.EOF {
break
m, errm := r.GetChunkMeta(offset, size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉GetChunkMeta没必要返回EOF了,可以err是EOF的时候就设成nil。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#321 (comment)
这里回复了。

}
log.V(2).Infoln("remote chunk info:" + m.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.V(2).Infoln("remote chunk info:", m)是一样的效果,因为m实现了fmt.Stinger接口。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞。我越来越理解Interface的巧妙了。Done。

Copy link
Collaborator

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please follow up with several small refactor PR to fix the issues that we have discussed.

@gongweibao gongweibao merged commit 30cd080 into PaddlePaddle:develop Aug 21, 2017
@gongweibao gongweibao deleted the fixcptimeout branch August 21, 2017 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants