basic_cas.json pointless re_uploads #665
Oh this is very interesting and not something I'd ever thought would be an issue. The problem is that in the We require a We have a couple options here:
1 is much easier to implement quickly; 2 requires a bit of thought, since we expose special APIs for workers to interact with on the
I think for now we can do 1, then create a ticket to do 2 later.
@lukts30, thanks for pointing this out. If you are blocked by this, here's a config that should get you over this hurdle for now (untested, so please let me know if it doesn't work). I don't believe there are any significant happy-path performance penalties for this config:
{
  "stores": {
    "AC_MAIN_STORE": {
      "memory": {
        "eviction_policy": {
          // 100mb.
          "max_bytes": 100000000,
        }
      }
    },
    "FILESYSTEM_STORE": {
      "filesystem": {
        "content_path": "/tmp/nativelink/data-worker-test/content_path-cas",
        "temp_path": "/tmp/nativelink/data-worker-test/tmp_path-cas",
        "eviction_policy": {
          // 10gb.
          "max_bytes": 10000000000,
        }
      }
    },
    "WORKER_FAST_SLOW_STORE": {
      "fast_slow": {
        // "fast" must be a "filesystem" store because the worker uses it to make
        // hardlinks on disk to a directory where the jobs are running.
        "fast": {
          "ref_store": {
            "name": "FILESYSTEM_STORE"
          }
        },
        "slow": {
          "ref_store": {
            // Also forward to the same filesystem store.
            "name": "FILESYSTEM_STORE"
          }
        }
      }
    }
  },
  "schedulers": {
    "MAIN_SCHEDULER": {
      "simple": {
        "supported_platform_properties": {
          "cpu_count": "minimum",
          "memory_kb": "minimum",
          "network_kbps": "minimum",
          "disk_read_iops": "minimum",
          "disk_read_bps": "minimum",
          "disk_write_iops": "minimum",
          "disk_write_bps": "minimum",
          "shm_size": "minimum",
          "gpu_count": "minimum",
          "gpu_model": "exact",
          "cpu_vendor": "exact",
          "cpu_arch": "exact",
          "cpu_model": "exact",
          "kernel_version": "exact",
          "OSFamily": "priority",
          "container-image": "priority",
          // Example of how to set which docker images are available and set
          // them in the platform properties.
          // "docker_image": "priority",
        }
      }
    }
  },
  "workers": [{
    "local": {
      "worker_api_endpoint": {
        "uri": "grpc://127.0.0.1:50061",
      },
      "cas_fast_slow_store": "WORKER_FAST_SLOW_STORE",
      "upload_action_result": {
        "ac_store": "AC_MAIN_STORE",
      },
      "work_directory": "/tmp/nativelink/work",
      "platform_properties": {
        "cpu_count": {
          "values": ["16"],
        },
        "memory_kb": {
          "values": ["500000"],
        },
        "network_kbps": {
          "values": ["100000"],
        },
        "cpu_arch": {
          "values": ["x86_64"],
        },
        "OSFamily": {
          "values": [""]
        },
        "container-image": {
          "values": [""]
        },
        // Example of how to set which docker images are available and set
        // them in the platform properties.
        // "docker_image": {
        //   "query_cmd": "docker images --format {{.Repository}}:{{.Tag}}",
        // }
      }
    }
  }],
  "servers": [{
    "name": "public",
    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50051"
      }
    },
    "services": {
      "cas": {
        "main": {
          "cas_store": "FILESYSTEM_STORE"
        }
      },
      "ac": {
        "main": {
          "ac_store": "AC_MAIN_STORE"
        }
      },
      "execution": {
        "main": {
          "cas_store": "FILESYSTEM_STORE",
          "scheduler": "MAIN_SCHEDULER",
        }
      },
      "capabilities": {
        "main": {
          "remote_execution": {
            "scheduler": "MAIN_SCHEDULER",
          }
        }
      },
      "bytestream": {
        "cas_stores": {
          "main": "FILESYSTEM_STORE",
        }
      }
    }
  }, {
    "name": "private_workers_servers",
    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50061"
      }
    },
    "services": {
      "experimental_prometheus": {
        "path": "/metrics"
      },
      // Note: This should be served on a different port, because it has
      // a different permission set than the other services.
      // In other words, this service is a backend api. The ones above
      // are a frontend api.
      "worker_api": {
        "scheduler": "MAIN_SCHEDULER",
      },
      "admin": {}
    }
  }],
  "global": {
    "max_open_files": 512
  }
}
@blakehatch, do you want to take a swing at this?
What if you just change the no-op store to a ref store and point both fast and slow to the same file store? Just realised that's what you suggested in your config. That seems reasonable.
The reason I don't want to do this as an official suggestion is because if anything except workers use that
@allada yes I’ll take a crack at approach 1:
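For anyone applying the workaround discussed above, here is a minimal sketch of a sanity check that both halves of the worker's `fast_slow` store reference the same named store. It assumes the config has already been parsed into a plain Python dict (the pasted config uses comments and trailing commas, so the standard `json` module will not parse it as-is). The helper names `ref_target` and `fast_and_slow_share_store` are hypothetical, and the `noop` layout in the second example is only an assumption based on the "no-op store" mentioned in this thread, not a copy of the shipped basic_cas.json.

```python
from typing import Any, Optional


def ref_target(store: dict[str, Any]) -> Optional[str]:
    """Return the store name a `ref_store` entry points at, or None otherwise."""
    ref = store.get("ref_store")
    if isinstance(ref, dict):
        return ref.get("name")
    return None


def fast_and_slow_share_store(config: dict[str, Any], store_name: str) -> bool:
    """True when both halves of a `fast_slow` store reference the same named store."""
    fast_slow = config["stores"][store_name]["fast_slow"]
    fast = ref_target(fast_slow["fast"])
    slow = ref_target(fast_slow["slow"])
    return fast is not None and fast == slow


# Trimmed-down version of the workaround config from this thread.
workaround = {
    "stores": {
        "FILESYSTEM_STORE": {"filesystem": {}},
        "WORKER_FAST_SLOW_STORE": {
            "fast_slow": {
                "fast": {"ref_store": {"name": "FILESYSTEM_STORE"}},
                "slow": {"ref_store": {"name": "FILESYSTEM_STORE"}},
            }
        },
    }
}

# Assumed shape of a config whose slow side is a no-op store (illustrative only).
noop_style = {
    "stores": {
        "WORKER_FAST_SLOW_STORE": {
            "fast_slow": {
                "fast": {"filesystem": {}},
                "slow": {"noop": {}},
            }
        },
    }
}

print(fast_and_slow_share_store(workaround, "WORKER_FAST_SLOW_STORE"))
print(fast_and_slow_share_store(noop_style, "WORKER_FAST_SLOW_STORE"))
```

The check only inspects the config structure; it says nothing about whether the re-uploads actually stop, which still needs to be confirmed by running a build.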
Since commit f9f7908 I see the following behavior:
I have only tested with basic_cas.json; I'm unsure what the status of the other configs is.
I noticed this originally with buck2, but could reproduce the problem with Bazel as well.
With: 2a89ce6 (GOOD)
With: f9f7908 (BAD)
Notice: IP traffic sent: 1.0G