Support registry resource #154

Merged
merged 5 commits into from
Jun 22, 2017
67 changes: 65 additions & 2 deletions doc/usage_cn.md
@@ -33,12 +33,21 @@ Usage: paddlecloud <flags> <subcommand> <subcommand args>

Subcommands:
commands list all command names
delete Delete the specified resource.
file Simple file operations.
get Print resources
help describe subcommands and their syntax
kill Stop the job. -rm will remove the job from history.
logs Print logs of the job.
registry Add registry secret on paddlecloud.
submit Submit job to PaddlePaddle Cloud.

Subcommands for PFS:
cp upload or download files
ls List files on PaddlePaddle Cloud
mkdir mkdir directories on PaddlePaddle Cloud
rm rm files on PaddlePaddle Cloud


Use "paddlecloud flags" for a list of top-level flags
```
@@ -123,13 +132,30 @@ scp -r my_training_package/ user@tunnel-server:/mnt/hdfs_mulan/idl/idl-dl/mypack
- Submit a training job based on the V1 API

```bash
paddlecloud submit -jobname my-paddlecloud-job -cpu 1 -gpu 0 -memory 1Gi -parallelism 10 -pscpu 1 -pservers 3 -psmemory 1Gi -passes 1 -topology trainer_config.py /pfs/[datacenter_name]/home/[username]/ctr_demo_package
paddlecloud submit -jobname my-paddlecloud-job \
-cpu 1 \
-gpu 0 \
-memory 1Gi \
-parallelism 10 \
-pscpu 1 \
-pservers 3 \
-psmemory 1Gi \
-passes 1 \
-topology trainer_config.py /pfs/[datacenter_name]/home/[username]/ctr_demo_package
```

- Submit a training job based on the V2 API

```bash
paddlecloud submit -jobname my-paddlecloud-job -cpu 1 -gpu 0 -memory 1Gi -parallelism 10 -pscpu 1 -pservers 3 -psmemory 1Gi -passes 1 -entry "python trainer_config.py" /pfs/[datacenter_name]/home/[username]/ctr_demo_package
paddlecloud submit -jobname my-paddlecloud-job \
-cpu 1 \
-gpu 0 \
-memory 1Gi \
-parallelism 10 \
-pscpu 1 \
-pservers 3 \
-psmemory 1Gi \
-entry "python trainer_config.py" /pfs/[datacenter_name]/home/[username]/ctr_demo_package
```

Parameters:
@@ -146,6 +172,43 @@ paddlecloud submit -jobname my-paddlecloud-job -cpu 1 -gpu 0 -memory 1Gi -parall
- `-passes`: number of training passes to run
- `package`: path of the training job package on HDFS

### Using a custom runtime Docker image
The runtime Docker image is the image that Kubernetes actually schedules. If a job needs its own Docker image, you can customize it as follows:
- Build a custom runtime Docker image
```bash
git clone https://github.com/PaddlePaddle/cloud.git && cd cloud/docker
./build_docker.sh {PaddlePaddle production image} {runtime Docker image}
docker push {runtime Docker image}
```
- Run a job with the custom runtime Docker image
```bash
paddlecloud submit -image {runtime Docker image} -jobname ...
```

- Use a runtime Docker image from a private registry
- Add the registry credentials on PaddleCloud
```bash
paddlecloud registry \
-username {your username} \
-password {your password} \
-server {your registry server} \
-name {your registry name}
```
- Submit a job using the private registry
```bash
paddlecloud submit \
-image {runtime Docker image} \
-registry {your registry name}
```
- List all registries
```bash
paddlecloud get registry
```
- Delete the specified registry
```bash
paddlecloud delete registry {your registry name}
```

## Checking job status

Users can check the current status of jobs, job nodes, and user-space quota.
2 changes: 1 addition & 1 deletion docker/build_docker.sh
@@ -25,7 +25,7 @@ docker run --rm -it -v $PWD:/cloud $base_image \
#Build Docker Image
cat > Dockerfile <<EOF
FROM ${base_image}
RUN pip install -U kubernetes && apt-get install -y iputils-ping
RUN pip install -U kubernetes && apt-get update -y && apt-get install -y iputils-ping
ADD ./paddle_k8s /usr/bin
ADD ./k8s_tools.py /root/
ADD ./python/dist/pcloud-0.1.1-py2-none-any.whl /tmp/
10 changes: 6 additions & 4 deletions go/cmd/paddlecloud/paddlecloud.go
@@ -18,10 +18,12 @@ func main() {
subcommands.Register(&paddlecloud.GetCommand{}, "")
subcommands.Register(&paddlecloud.KillCommand{}, "")
subcommands.Register(&paddlecloud.SimpleFileCmd{}, "")
subcommands.Register(&pfsmod.LsCmd{}, "")
subcommands.Register(&pfsmod.CpCmd{}, "")
subcommands.Register(&pfsmod.RmCmd{}, "")
subcommands.Register(&pfsmod.MkdirCmd{}, "")
subcommands.Register(&paddlecloud.RegistryCmd{}, "")
subcommands.Register(&paddlecloud.DeleteCommand{}, "")
subcommands.Register(&pfsmod.LsCmd{}, "PFS")
Collaborator

What does the second argument "PFS" mean?

Collaborator Author

The second argument of subcommands.Register is the group name. Subcommands registered with a group name are listed under a separate heading for that group in the help output, for example:

Subcommands:
	commands         list all command names
	delete           Delete the specified resource.
	file             Simple file operations.
...

Subcommands for PFS:
	cp               upload or download files
	ls               List files on PaddlePaddle Cloud
...

Collaborator

I see.
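
To make the grouping behavior concrete, here is a minimal stdlib-only sketch. It does not use or reproduce the `subcommands` library; the `commander` type and its `register` and `help` methods are hypothetical names that only mimic how a group argument could split the help listing into sections.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// command is a hypothetical stand-in for a registered subcommand.
type command struct {
	name, synopsis, group string
}

// commander collects commands; it mimics only the grouping behavior.
type commander struct {
	cmds []command
}

// register records a command under a group; "" means the default group.
func (c *commander) register(name, synopsis, group string) {
	c.cmds = append(c.cmds, command{name, synopsis, group})
}

// help renders one section per group, with the default group first.
func (c *commander) help() string {
	byGroup := map[string][]command{}
	for _, cmd := range c.cmds {
		byGroup[cmd.group] = append(byGroup[cmd.group], cmd)
	}
	groups := make([]string, 0, len(byGroup))
	for g := range byGroup {
		groups = append(groups, g)
	}
	sort.Strings(groups) // "" sorts first, so the default group leads

	var b strings.Builder
	for _, g := range groups {
		if g == "" {
			b.WriteString("Subcommands:\n")
		} else {
			fmt.Fprintf(&b, "Subcommands for %s:\n", g)
		}
		cmds := byGroup[g]
		sort.Slice(cmds, func(i, j int) bool { return cmds[i].name < cmds[j].name })
		for _, cmd := range cmds {
			fmt.Fprintf(&b, "\t%s\t%s\n", cmd.name, cmd.synopsis)
		}
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	var c commander
	c.register("delete", "Delete the specified resource.", "")
	c.register("ls", "List files on PaddlePaddle Cloud", "PFS")
	c.register("cp", "upload or download files", "PFS")
	fmt.Print(c.help())
}
```

Passing `""` keeps a command in the default section, which matches how the non-PFS commands are registered in this hunk.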

subcommands.Register(&pfsmod.CpCmd{}, "PFS")
subcommands.Register(&pfsmod.RmCmd{}, "PFS")
subcommands.Register(&pfsmod.MkdirCmd{}, "PFS")

flag.Parse()
ctx := context.Background()
52 changes: 52 additions & 0 deletions go/paddlecloud/delete.go
@@ -0,0 +1,52 @@
package paddlecloud

import (
"context"
"flag"
"fmt"
"os"

"github.com/google/subcommands"
)

// DeleteCommand deletes the specified resource
type DeleteCommand struct {
}

// Name is subcommands name
func (*DeleteCommand) Name() string { return "delete" }

// Synopsis is subcommands synopsis
func (*DeleteCommand) Synopsis() string { return "Delete the specified resource." }

// Usage is subcommands usage
func (*DeleteCommand) Usage() string {
return `delete registry [registry-name]
`
}

// SetFlags registers subcommands flags
func (p *DeleteCommand) SetFlags(f *flag.FlagSet) {
}

// Execute runs the delete command
func (p *DeleteCommand) Execute(_ context.Context, f *flag.FlagSet, _ ...interface{}) subcommands.ExitStatus {
if f.NArg() != 2 {
f.Usage()
return subcommands.ExitFailure
}
if f.Arg(0) == RegistryCmdName {
name := f.Arg(1)
r := RegistryCmd{SecretName: KubeRegistryName(name)}
err := r.Delete()
if err != nil {
fmt.Fprintf(os.Stderr, "error deleting registry: %v\n", err)
return subcommands.ExitFailure
}
fmt.Fprintf(os.Stdout, "registry: [%s] is deleted\n", name)
} else {
f.Usage()
return subcommands.ExitFailure
}
return subcommands.ExitSuccess
}
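
The argument handling in `Execute` can be isolated into a small pure function, which makes the `delete registry [registry-name]` contract easy to check. This is a sketch only: `deleteTarget`, `errUsage`, and the empty-name check are assumptions for illustration and are not part of this PR; the prefix value mirrors `RegistryPrefix` from registry.go.

```go
package main

import (
	"errors"
	"fmt"
)

// registryPrefix mirrors RegistryPrefix from registry.go in this PR.
const registryPrefix = "pcloud-registry"

// errUsage tells the caller to print usage and exit with a failure status.
var errUsage = errors.New("usage: delete registry [registry-name]")

// deleteTarget validates the positional arguments of "delete" and returns
// the Kubernetes secret name to remove. Hypothetical helper, not in the PR.
func deleteTarget(args []string) (string, error) {
	// Exactly two arguments are expected: the resource kind and its name.
	// Rejecting an empty name is an extra check assumed for this sketch.
	if len(args) != 2 || args[0] != "registry" || args[1] == "" {
		return "", errUsage
	}
	return registryPrefix + "-" + args[1], nil
}

func main() {
	for _, args := range [][]string{
		{"registry", "myhub"}, // valid
		{"registry"},          // name missing
		{"job", "x"},          // unsupported resource kind
	} {
		secret, err := deleteTarget(args)
		fmt.Printf("%q -> %q, err=%v\n", args, secret, err)
	}
}
```

Only the first call yields a secret name; the other two return the usage error, which corresponds to the `f.Usage()` branches above.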
34 changes: 32 additions & 2 deletions go/paddlecloud/get.go
@@ -27,7 +27,7 @@ func (*GetCommand) Synopsis() string { return "Print resources" }

// Usage is subcommands usage
func (*GetCommand) Usage() string {
return `get [jobs|workers [jobname]|quota]:
return `get [jobs|workers [jobname]|registry|quota]:
Print resources.
`
}
@@ -47,6 +47,8 @@ func (p *GetCommand) Execute(_ context.Context, f *flag.FlagSet, _ ...interface{
jobs()
} else if f.Arg(0) == "quota" {
quota()
} else if f.Arg(0) == "registry" {
registry()
} else if f.Arg(0) == "workers" {
if f.NArg() != 2 {
f.Usage()
@@ -91,7 +93,35 @@ func workers(jobname string) error {
w.Flush()
return nil
}

func registry() error {
respBody, err := utils.GetCall(utils.Config.ActiveConfig.Endpoint+"/api/v1/registry/", nil)
if err != nil {
fmt.Fprintf(os.Stderr, "err getting registry secret: %v\n", err)
return err
}
var respObj interface{}
err = json.Unmarshal(respBody, &respObj)
if err != nil {
return err
}
items := respObj.(map[string]interface{})["msg"].(map[string]interface{})["items"].([]interface{})
w := tabwriter.NewWriter(os.Stdout, 0, 0, 3, ' ', 0)
if len(items) > 0 {
fmt.Fprintf(w, "ID\tNAME\tDATA\n")
}
idx := 0
for _, r := range items {
metadata := r.(map[string]interface{})["metadata"].(map[string]interface{})
name := RegistryName(metadata["name"].(string))
if len(name) != 0 {
cTime := metadata["creation_timestamp"].(string)
fmt.Fprintf(w, "%d\t%s\t%s\n", idx, name, cTime)
idx++
}
}
w.Flush()
return nil
}
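
The chained type assertions in `registry()` panic if the response has an unexpected shape, for example when `msg` carries an error string instead of an object. One possible alternative is to decode into typed structs so that a mismatch surfaces as an error. The sketch below assumes the response shape implied by the assertions above; `registryList`, `registryRow`, and `parseRegistries` are hypothetical names, and the sample JSON is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// registryList mirrors the shape walked by registry():
// {"msg": {"items": [{"metadata": {"name": ..., "creation_timestamp": ...}}]}}
type registryList struct {
	Msg struct {
		Items []struct {
			Metadata struct {
				Name              string `json:"name"`
				CreationTimestamp string `json:"creation_timestamp"`
			} `json:"metadata"`
		} `json:"items"`
	} `json:"msg"`
}

// registryRow is one line of the table printed by "get registry".
type registryRow struct{ Name, Created string }

// secretPrefix is RegistryPrefix from this PR plus the "-" separator.
const secretPrefix = "pcloud-registry-"

// parseRegistries decodes the response body and keeps only secrets that
// carry the PaddleCloud prefix, returning their user-facing names.
func parseRegistries(body []byte) ([]registryRow, error) {
	var list registryList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decoding registry list: %w", err)
	}
	var rows []registryRow
	for _, it := range list.Msg.Items {
		name := strings.TrimPrefix(it.Metadata.Name, secretPrefix)
		if name == it.Metadata.Name || name == "" {
			continue // not a PaddleCloud registry secret
		}
		rows = append(rows, registryRow{name, it.Metadata.CreationTimestamp})
	}
	return rows, nil
}

func main() {
	body := []byte(`{"msg":{"items":[
		{"metadata":{"name":"pcloud-registry-myhub","creation_timestamp":"2017-06-22T00:00:00Z"}},
		{"metadata":{"name":"default-token-abcde","creation_timestamp":"2017-06-01T00:00:00Z"}}]}}`)
	rows, err := parseRegistries(body)
	fmt.Println(rows, err)

	// A string in "msg" yields an error rather than a panic.
	_, err = parseRegistries([]byte(`{"msg":"permission denied"}`))
	fmt.Println(err != nil)
}
```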
func jobs() error {
respBody, err := utils.GetCall(utils.Config.ActiveConfig.Endpoint+"/api/v1/jobs/", nil)
if err != nil {
121 changes: 121 additions & 0 deletions go/paddlecloud/registry.go
@@ -0,0 +1,121 @@
package paddlecloud

import (
"context"
"encoding/json"
"errors"
"flag"
"fmt"
"os"
"strings"

"github.com/PaddlePaddle/cloud/go/utils"
"github.com/golang/glog"
"github.com/google/subcommands"
)

const (
// RegistryCmdName is subcommand name
RegistryCmdName = "registry"
// RegistryPrefix is the prefix for Kubernetes secret name
RegistryPrefix = "pcloud-registry"
)

// RegistryCmd is Docker registry secret information
type RegistryCmd struct {
SecretName string `json:"name"`
Username string `json:"username"`
Password string `json:"password"`
Server string `json:"server"`
}

// Name is the subcommand name
func (r *RegistryCmd) Name() string { return RegistryCmdName }

// Synopsis is the subcommand's synopsis
func (r *RegistryCmd) Synopsis() string { return "Add registry secret on paddlecloud." }

// Usage is the subcommand's usage
func (r *RegistryCmd) Usage() string {
return `registry <options> [add|del]:
`
}

// SetFlags registers subcommands flags.
func (r *RegistryCmd) SetFlags(f *flag.FlagSet) {
f.StringVar(&r.SecretName, "name", "", "registry secret name")
f.StringVar(&r.Username, "username", "", "your Docker registry username")
f.StringVar(&r.Password, "password", "", "your Docker registry password")
f.StringVar(&r.Server, "server", "", "your Docker registry Server")
}
func (r *RegistryCmd) addRegistrySecret() error {
jsonString, err := json.Marshal(r)
if err != nil {
return err
}
glog.V(10).Infof("Add registry secret: %s to %s\n", jsonString, utils.Config.ActiveConfig.Endpoint+"/api/v1/registry/")
respBody, err := utils.PostCall(utils.Config.ActiveConfig.Endpoint+"/api/v1/registry/", jsonString)
if err != nil {
return err
}
var respObj interface{}
if err = json.Unmarshal(respBody, &respObj); err != nil {
return err
}
// FIXME: Return an error if error message is not empty. Use response code instead
errMsg := respObj.(map[string]interface{})["msg"].(string)
if len(errMsg) > 0 {
return errors.New(errMsg)
}
return nil
}

// Delete deletes the specified registry secret
func (r *RegistryCmd) Delete() error {
jsonString, err := json.Marshal(r)
if err != nil {
return err
}
glog.V(10).Infof("Delete registry secret: %s to %s\n", jsonString, utils.Config.ActiveConfig.Endpoint+"/api/v1/registry/")
respBody, err := utils.DeleteCall(utils.Config.ActiveConfig.Endpoint+"/api/v1/registry/", jsonString)
if err != nil {
return err
}

var respObj interface{}
if err = json.Unmarshal(respBody, &respObj); err != nil {
return err
}
// FIXME: Return an error if error message is not empty. Use response code instead
errMsg := respObj.(map[string]interface{})["msg"].(string)
if len(errMsg) > 0 {
return errors.New(errMsg)
}
return nil
}
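
The two FIXME notes suggest deriving errors from the response code rather than from a non-empty `msg`. A rough sketch of that idea follows, using only `net/http` from the standard library. It does not show how `utils.DeleteCall` works; `errorFromStatus`, `deleteRegistry`, and the stand-in test server are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// errorFromStatus maps an HTTP status code to an error; 2xx means success.
func errorFromStatus(status int, body string) error {
	if status >= 200 && status < 300 {
		return nil
	}
	return fmt.Errorf("registry request failed: HTTP %d: %s", status, strings.TrimSpace(body))
}

// deleteRegistry sends a DELETE request and derives the result from the
// status code. Hypothetical helper; the PR's utils.DeleteCall may differ.
func deleteRegistry(client *http.Client, url, payload string) error {
	req, err := http.NewRequest(http.MethodDelete, url, strings.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Cap the body read so an oversized error page cannot exhaust memory.
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
	return errorFromStatus(resp.StatusCode, string(body))
}

func main() {
	// Stand-in server: accepts only the secret named "ok".
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		if strings.Contains(string(b), `"ok"`) {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "secret not found", http.StatusNotFound)
	}))
	defer srv.Close()

	fmt.Println(deleteRegistry(srv.Client(), srv.URL, `{"name":"ok"}`))
	fmt.Println(deleteRegistry(srv.Client(), srv.URL, `{"name":"missing"}`))
}
```

With this approach the server side would also need to return non-2xx codes on failure, which is outside what this diff shows.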
// Execute adds the registry secret
func (r *RegistryCmd) Execute(_ context.Context, f *flag.FlagSet, _ ...interface{}) subcommands.ExitStatus {
if r.SecretName == "" || r.Username == "" || r.Password == "" || r.Server == "" {
f.Usage()
return subcommands.ExitFailure
}
r.SecretName = KubeRegistryName(r.SecretName)
err := r.addRegistrySecret()
if err != nil {
fmt.Fprintf(os.Stderr, "add registry secret failed: %s\n", err)
return subcommands.ExitFailure
}
return subcommands.ExitSuccess
}

// KubeRegistryName add a prefix for the name
func KubeRegistryName(name string) string {
return RegistryPrefix + "-" + name
}

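// RegistryName strips the prefix from a Kubernetes secret name and returns
// the PaddleCloud registry name, or "" if the name does not carry the prefix
func RegistryName(name string) string {
prefix := RegistryPrefix + "-"
if strings.HasPrefix(name, prefix) {
return strings.TrimPrefix(name, prefix)
}
return ""
}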
4 changes: 4 additions & 0 deletions go/paddlecloud/submit.go
@@ -30,6 +30,8 @@ type SubmitCmd struct {
Topology string `json:"topology"`
Datacenter string `json:"datacenter"`
Passes int `json:"passes"`
Image string `json:"image"`
Registry string `json:"registry"`
}

// Name is subcommands name.
@@ -59,6 +61,8 @@ func (p *SubmitCmd) SetFlags(f *flag.FlagSet) {
f.StringVar(&p.Entry, "entry", "", "Command of starting trainer process. Defaults to paddle train")
f.StringVar(&p.Topology, "topology", "", "Will be deprecated: .py file containing Paddle v1 job configs")
f.IntVar(&p.Passes, "passes", 1, "Pass count for training job")
f.StringVar(&p.Image, "image", "", "Runtime Docker image for the job")
f.StringVar(&p.Registry, "registry", "", "Registry secret name for the runtime Docker image")
}
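
For reference, this is roughly how the two new fields would appear in the JSON body of a submit request. The sketch uses a trimmed, hypothetical `submitPayload` struct that keeps only a few of the fields visible in this hunk; the real `SubmitCmd` has more fields, and the image and registry values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// submitPayload is a hypothetical, trimmed stand-in for SubmitCmd.
// Field names and JSON tags are copied from this hunk.
type submitPayload struct {
	Passes   int    `json:"passes"`
	Image    string `json:"image"`
	Registry string `json:"registry"`
}

// encode serializes the payload the same way json.Marshal would for SubmitCmd.
func encode(p submitPayload) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	// Both flags set: a custom runtime image pulled with a registry secret.
	s, err := encode(submitPayload{
		Passes:   1,
		Image:    "registry.example.com/team/paddle-runtime:latest", // placeholder
		Registry: "myhub",                                           // placeholder
	})
	fmt.Println(s, err)

	// Flags omitted: the tags have no omitempty, so empty strings are sent.
	s, err = encode(submitPayload{Passes: 1})
	fmt.Println(s, err)
}
```

Because the tags carry no `omitempty`, a job submitted without `-image` or `-registry` still sends both keys with empty values, so the server has to treat an empty string as "not specified".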

// Execute submit command.
1 change: 1 addition & 0 deletions paddlecloud/paddlecloud/urls.py
@@ -32,6 +32,7 @@
url(r"^api/v1/workers/", paddlejob.views.WorkersView.as_view()),
url(r"^api/v1/quota/", paddlejob.views.QuotaView.as_view()),
url(r"^api/v1/file/", paddlejob.views.SimpleFileView.as_view()),
url(r"^api/v1/registry/", paddlejob.registry.RegistryView.as_view()),
]

urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
3 changes: 2 additions & 1 deletion paddlecloud/paddlejob/__init__.py
@@ -1,2 +1,3 @@
from paddle_job import PaddleJob
__all__ = ["PaddleJob"]
import registry
__all__ = ["PaddleJob", "registry"]