feat(portal): when the adapter is K8S, add an input field to the job submission page: image URL #954

Open: wants to merge 15 commits into base: master
5 changes: 5 additions & 0 deletions .changeset/dry-clouds-complain.md
@@ -0,0 +1,5 @@
---
"@scow/grpc-api": minor
---

submitJob gains an extraOptions parameter of type map<string, string>; for now only the k8sImageUrl key is supported
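The changeset describes extraOptions as an open-ended string-to-string map in which only k8sImageUrl is currently meaningful, so a consumer may want to ignore any other keys. The following is a hypothetical sketch of that filtering; `ExtraOptions`, `SUPPORTED_EXTRA_OPTION_KEYS` and `pickSupportedExtraOptions` are illustrative names, not part of this PR.

```typescript
// Hypothetical TypeScript view of the proto field map<string, string> extra_options.
type ExtraOptions = { [key: string]: string };

// Per the changeset, k8sImageUrl is the only supported key for now.
const SUPPORTED_EXTRA_OPTION_KEYS: string[] = ["k8sImageUrl"];

// Return a copy of `options` that keeps only the supported keys.
function pickSupportedExtraOptions(options: ExtraOptions): ExtraOptions {
  const result: ExtraOptions = {};
  for (const key of Object.keys(options)) {
    if (SUPPORTED_EXTRA_OPTION_KEYS.indexOf(key) !== -1) {
      result[key] = options[key];
    }
  }
  return result;
}

// Usage: the unknown "foo" key is dropped and k8sImageUrl is kept.
const filtered = pickSupportedExtraOptions({ k8sImageUrl: "nginx:1.25", foo: "bar" });
console.log(filtered);
```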
7 changes: 7 additions & 0 deletions .changeset/tricky-schools-taste.md
@@ -0,0 +1,7 @@
---
"@scow/portal-server": minor
"@scow/portal-web": minor
"@scow/docs": minor
---

When the adapter is K8S, add an input field to the job submission page: image URL
3 changes: 3 additions & 0 deletions apps/portal-server/src/clusterops/api/job.ts
@@ -29,6 +29,9 @@ export interface JobTemplate {
errorOutput?: string;
memory?: string;
comment?: string | undefined;
extraOptions: {
[key: string]: string;
};
}

export interface ListJobTemplatesRequest {
3 changes: 3 additions & 0 deletions apps/portal-server/src/clusterops/job.ts
@@ -32,6 +32,9 @@ export interface JobMetadata {
submitTime: string;
workingDirectory: string;
memory?: string;
extraOptions: {
[key: string]: string;
};
}

export const jobOps = (cluster: string): JobOps => {
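JobMetadata now carries a non-optional extraOptions map. If templates saved before this change are stored without that key (an assumption: the persistence format is not part of this diff), reading them back would produce undefined where code such as `template.extraOptions.k8sImageUrl` expects an object. A minimal defensive sketch follows; `StoredJobMetadata`, `NormalizedJobMetadata` and `normalizeJobMetadata` are hypothetical and list only a subset of the real fields.

```typescript
// Hypothetical stored shape: records written before this PR may lack extraOptions.
interface StoredJobMetadata {
  jobName: string;
  extraOptions?: { [key: string]: string };
}

// Shape expected after this PR: extraOptions is always present.
interface NormalizedJobMetadata {
  jobName: string;
  extraOptions: { [key: string]: string };
}

// Default a missing extraOptions to an empty map so callers can index it safely.
function normalizeJobMetadata(raw: StoredJobMetadata): NormalizedJobMetadata {
  return { ...raw, extraOptions: raw.extraOptions ?? {} };
}

// Usage: a record written before the field existed.
const legacy = normalizeJobMetadata(JSON.parse("{\"jobName\":\"testJob\"}"));
console.log(legacy.extraOptions); // {}
```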
7 changes: 4 additions & 3 deletions apps/portal-server/src/services/job.ts
@@ -156,10 +156,9 @@ export const jobServiceServer = plugin((server) => {
},

submitJob: async ({ request, logger }) => {
-const { cluster, command, jobName, coreCount, gpuCount, maxTime, saveAsTemplate, userId,
+const { cluster, command, jobName, coreCount, gpuCount, maxTime, saveAsTemplate, userId, extraOptions,
nodeCount, partition, qos, account, comment, workingDirectory, output, errorOutput, memory } = request;


const client = getAdapterClient(cluster);
if (!client) { throw clusterNotFound(cluster); }

@@ -174,7 +173,8 @@ export const jobServiceServer = plugin((server) => {
const reply = await asyncClientCall(client.job, "submitJob", {
userId, jobName, account, partition: partition!, qos, nodeCount, gpuCount: gpuCount || 0,
memoryMb: Number(memory?.split("M")[0]), coreCount, timeLimitMinutes: maxTime,
-script: command, workingDirectory, stdout: output, stderr: errorOutput, extraOptions: [],
+script: command, workingDirectory, stdout: output, stderr: errorOutput,
+extraOptions: Object.values(extraOptions),
}).catch((e) => {
const ex = e as ServiceError;
const errors = parseErrorDetails(ex.metadata);
@@ -205,6 +205,7 @@ export const jobServiceServer = plugin((server) => {
output,
errorOutput,
memory,
extraOptions,
};

const clusterOps = getClusterOps(cluster);
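In submitJob, the portal's map<string, string> is flattened into the adapter's extraOptions, which is list-typed (the previous code passed `extraOptions: []`), by taking only the map's values. As a result the adapter receives the image URL without its key. The sketch below contrasts that values-only conversion with a hypothetical `key=value` encoding; `toValuesOnly` and `toKeyValuePairs` are illustrative names, and whether the adapter expects either format is an assumption that this diff does not settle.

```typescript
type ExtraOptions = { [key: string]: string };

// Same result as Object.values(options): the values only, in property order.
// An empty map yields [], so the empty case needs no special handling.
function toValuesOnly(options: ExtraOptions): string[] {
  return Object.keys(options).map((key) => options[key]);
}

// Hypothetical alternative: keep each key so the adapter can tell options apart.
function toKeyValuePairs(options: ExtraOptions): string[] {
  return Object.keys(options).map((key) => `${key}=${options[key]}`);
}

const options: ExtraOptions = { k8sImageUrl: "registry.example.com/app:1.0" };
console.log(toValuesOnly(options));
console.log(toKeyValuePairs(options));
```

Once a second key is supported, the values-only form can no longer tell options apart, which is the main argument for carrying the key.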
3 changes: 2 additions & 1 deletion apps/portal-server/tests/job/jobTemplate.test.ts
@@ -40,6 +40,7 @@ const jobInfo = {
"output": "job.%j.out",
"errorOutput": "job.%j.err",
"memory": "750MB",
"extraOptions": {},
};
const templateId = "testJob-1111";

@@ -101,7 +102,7 @@ it("delete job template", async () => {
cluster, userId, templateId,
});

expect(templateInfo?.template).toBeObject();
expect(templateInfo?.template?.jobName).toBe("testJob");

await asyncUnaryCall(client, "deleteJobTemplate", {
cluster, userId, templateId,
1 change: 1 addition & 0 deletions apps/portal-web/src/apis/api.mock.ts
@@ -161,6 +161,7 @@ export const mockApi: MockApi<typeof api> = {
output: "job.%j.out",
errorOutput: "job.%j.err",
workingDirectory: "/nfs/jobs/123",
extraOptions: {},
},
}),

1 change: 1 addition & 0 deletions apps/portal-web/src/i18n/en.ts
@@ -177,6 +177,7 @@ export default {
+ "the job submission or execution will fail.",
output: "Standard Output File",
errorOutput: "Error Output File",
k8sImageUrl: "Image URL",
totalNodeCount: "Total Nodes: ",
totalGpuCount: "Total GPU Cards: ",
totalCoreCount: "Total CPU Cores: ",
1 change: 1 addition & 0 deletions apps/portal-web/src/i18n/zh_cn.ts
@@ -177,6 +177,7 @@ export default {

output: "标准输出文件",
errorOutput: "错误输出文件",
k8sImageUrl: "镜像地址",
totalNodeCount: "总节点数:",
totalGpuCount: "总GPU卡数:",
totalCoreCount: "总CPU核心数:",
17 changes: 16 additions & 1 deletion apps/portal-web/src/pageComponents/job/SubmitJobForm.tsx
@@ -44,6 +44,7 @@ interface JobForm {
output: string;
errorOutput: string;
save: boolean;
k8sImageUrl: string | undefined;
}

// Generate a default job name following the pattern job-YYYYMMDD-HHmmss, e.g. job-20230510-103010
@@ -85,7 +86,7 @@ export const SubmitJobForm: React.FC<Props> = ({ initial = initialValues, submit

const submit = async () => {
const { cluster, command, jobName, coreCount, gpuCount, workingDirectory, output, errorOutput, save,
-maxTime, nodeCount, partition, qos, account, comment } = await form.validateFields();
+maxTime, nodeCount, partition, qos, account, comment, k8sImageUrl } = await form.validateFields();

setLoading(true);

@@ -95,6 +96,7 @@ export const SubmitJobForm: React.FC<Props> = ({ initial = initialValues, submit
gpuCount,
maxTime, nodeCount, partition, qos, comment,
workingDirectory, save, memory, output, errorOutput,
extraOptions: k8sImageUrl ? { k8sImageUrl } : {},
} })
.httpError(500, (e) => {
if (e.code === "SCHEDULER_FAILED") {
@@ -373,6 +375,19 @@ export const SubmitJobForm: React.FC<Props> = ({ initial = initialValues, submit
<Input />
</Form.Item>
</Col>

{clusterInfoQuery.data?.clusterInfo.scheduler.name === "k8s" ? (
<Col span={24}>
<Form.Item
label={t(p("k8sImageUrl"))}
name="k8sImageUrl"
rules={[{ pattern: /^\S+$/, message: "请输入正确的镜像地址" }]}
>
<Input />
</Form.Item>
</Col>
) : null}

<Col className="ant-form-item" span={12} sm={6}>
{t(p("totalNodeCount"))}{nodeCount}
</Col>
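The image URL input is rendered only when the cluster's scheduler name is k8s, its rule accepts any value made of non-whitespace characters, and on submit an empty field becomes an empty extraOptions map. The sketch below mirrors that logic outside React. `buildExtraOptions` is a hypothetical helper, and it assumes, as with antd's `pattern` rules on optional fields, that an empty value is skipped rather than rejected.

```typescript
// Same pattern as the form rule: one or more non-whitespace characters.
const IMAGE_URL_PATTERN = /^\S+$/;

// Hypothetical helper mirroring `extraOptions: k8sImageUrl ? { k8sImageUrl } : {}`.
function buildExtraOptions(k8sImageUrl: string | undefined): { [key: string]: string } {
  // Empty or missing input: send an empty map; the rule does not run on empty optional fields.
  if (!k8sImageUrl) {
    return {};
  }
  // Non-empty input must pass the same check the form applies.
  if (!IMAGE_URL_PATTERN.test(k8sImageUrl)) {
    throw new Error("Invalid image URL: whitespace is not allowed");
  }
  return { k8sImageUrl };
}

console.log(buildExtraOptions("nginx:1.25"));
console.log(buildExtraOptions(undefined));
```

Note that the pattern only rules out whitespace; it does not check that the value is a well-formed image reference.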
1 change: 1 addition & 0 deletions apps/portal-web/src/pages/api/job/getJobTemplate.ts
@@ -36,6 +36,7 @@ export const JobTemplate = Type.Object({
output: Type.Optional(Type.String()),
errorOutput: Type.Optional(Type.String()),
comment: Type.Optional(Type.String()),
extraOptions: Type.Record(Type.String(), Type.String()),
});
export type JobTemplate = Static<typeof JobTemplate>;

4 changes: 3 additions & 1 deletion apps/portal-web/src/pages/api/job/submitJob.ts
@@ -39,6 +39,7 @@ export const SubmitJobInfo = Type.Object({
memory: Type.Optional(Type.String()),
comment: Type.Optional(Type.String()),
save: Type.Boolean(),
extraOptions: Type.Record(Type.String(), Type.String()),
});

export type SubmitJobInfo = Static<typeof SubmitJobInfo>;
@@ -72,7 +73,7 @@ export default route(SubmitJobSchema, async (req, res) => {

if (!info) { return; }

-const { cluster, command, jobName, coreCount, gpuCount, maxTime, save,
+const { cluster, command, jobName, coreCount, gpuCount, maxTime, save, extraOptions,
nodeCount, partition, qos, account, comment, workingDirectory, output, errorOutput, memory } = req.body;

const client = getClient(JobServiceClient);
@@ -102,6 +103,7 @@ export default route(SubmitJobSchema, async (req, res) => {
output,
errorOutput,
saveAsTemplate: save,
extraOptions,
})
.then(async ({ jobId }) => {
await callLog(
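SubmitJobInfo declares extraOptions as `Type.Record(Type.String(), Type.String())` without wrapping it in `Type.Optional`, so, assuming the route validates request bodies against this schema, a submit request that omits extraOptions would be rejected; the form always sends at least `{}`. As a rough illustration of which values that schema accepts, here is a hand-written type guard. `isStringRecord` is hypothetical and is not TypeBox's implementation.

```typescript
// Rough runtime equivalent of Type.Record(Type.String(), Type.String()):
// a non-null, non-array object whose own values are all strings.
function isStringRecord(value: unknown): value is { [key: string]: string } {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as { [key: string]: unknown };
  return Object.keys(record).every((key) => typeof record[key] === "string");
}

console.log(isStringRecord({}));                       // true
console.log(isStringRecord({ k8sImageUrl: "nginx" })); // true
console.log(isStringRecord({ k8sImageUrl: 1 }));       // false
console.log(isStringRecord(undefined));                // false
```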
1 change: 1 addition & 0 deletions apps/portal-web/src/pages/jobs/submit.tsx
@@ -58,6 +58,7 @@ export const SubmitJobPage: NextPage<Props> = requireAuth(() => true)(
output: template.output,
errorOutput: template.errorOutput,
save: false,
k8sImageUrl: template.extraOptions.k8sImageUrl,
}));
} else {
return undefined;
2 changes: 1 addition & 1 deletion docs/docs/deploy/config/cluster-config.md
@@ -55,7 +55,7 @@ crossClusterFileTransfer:
enabled: true
# 传输节点的地址(ip地址:端口号)
transferNode: localhost:22222

```

## 注意
2 changes: 2 additions & 0 deletions protos/portal/job.proto
@@ -73,6 +73,7 @@ message JobTemplate {
optional string output = 11;
optional string error_output = 12;
optional string comment = 13;
map<string, string> extra_options = 14;
}

message ListAllJobsResponse {
@@ -151,6 +152,7 @@ message SubmitJobRequest {
optional string memory = 15;
optional string comment = 16;
bool save_as_template = 17;
map<string, string> extra_options = 18;
}

// NOT_FOUND: cluster is not found
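On the wire, a proto3 `map<string, string>` field such as extra_options is encoded as a repeated entry message with `key = 1` and `value = 2`. When a key appears more than once the last one seen wins, and an absent field decodes as an empty map, which is why the server sees `{}` rather than undefined when no image URL is given. The sketch below models that decoding step; `MapEntry` and `entriesToRecord` are illustrative names, not generated code.

```typescript
// Wire-level view of one map<string, string> entry (key = 1, value = 2).
interface MapEntry {
  key: string;
  value: string;
}

// Fold decoded entries into a plain object; a later duplicate key overwrites an earlier one.
function entriesToRecord(entries: MapEntry[]): { [key: string]: string } {
  const record: { [key: string]: string } = {};
  for (const entry of entries) {
    record[entry.key] = entry.value;
  }
  return record;
}

const decoded = entriesToRecord([
  { key: "k8sImageUrl", value: "nginx:1.24" },
  { key: "k8sImageUrl", value: "nginx:1.25" },
]);
console.log(decoded.k8sImageUrl); // nginx:1.25
```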