diff --git a/README.md b/README.md index cb09ad8a..b4ea5638 100644 --- a/README.md +++ b/README.md @@ -48,7 +48,7 @@ - [Registering your wallet](#registering-your-wallet) - [Running a Miner](#running-a-miner) - [Running a Validator](#running-a-validator) -- [New Releases](#new-releases) +- [Releases](#releases) - [Troubleshooting](#troubleshooting) - [Troubleshooting Subtensor](#troubleshooting-subtensor) - [License](#license) @@ -372,28 +372,9 @@ pm2 start neurons/validator.py \ > NOTE: to access the wandb UI to get statistics about the miners, you can click on this [link](https://wandb.ai/eclipsevortext/subvortex-team) and choose the validator run you want. -## New Releases +## Releases -When a new version of the subnet is released, each miner/validatior have to be updated. - -> Be sure you are in the SubVortex directory - -Get the lastest version of the subnet - -``` -git pull -``` - -Install the dependencies - -``` -pip install -r requirements.txt -pip install -e . -``` - -Restart miners/validators if running them in your base environment or restart pm2 by executing `pm2 restart all` if you are using pm2 as process manager. - -> NOTE: to access the wandb UI to get statistics about the miners, you can click on this [link](https://wandb.ai/eclipsevortext/subvortex-team) and choose the validator run you want. +- [Release-2.1.0](./scripts/release/release-2.1.0/RELEASE-2.1.0.md) ## Troubleshooting diff --git a/scripts/redis/docs/redis-backup.md b/scripts/redis/docs/redis-backup.md new file mode 100644 index 00000000..9f17bd5c --- /dev/null +++ b/scripts/redis/docs/redis-backup.md @@ -0,0 +1,77 @@ +This guide provides step-by-step instructions for creating and restoring a dump in Redis. Redis is an open-source, in-memory data structure store used as a database, cache, and message broker. Dumps are a way to back up and restore data in Redis. + +

Table of Contents
---

- [Creating a Redis Dump](#creating-a-redis-dump)
- [Restoring a Redis Dump](#restoring-a-redis-dump)

<br

## Creating a Redis Dump

To create a dump of your Redis database, follow these steps:

1. **Connect to Redis**: Open a terminal or command prompt and connect to your Redis instance using the `redis-cli` command:

   ```bash
   redis-cli -a $(sudo grep -Po '^requirepass \K.*' /etc/redis/redis.conf)
   ```

2. **Create the Dump**: Use the `SAVE` command to create a dump of the current database. This command saves the dataset to a file called `dump.rdb` in the Redis data directory. Note that `SAVE` blocks the server while writing; on a busy instance, `BGSAVE` creates the same dump in the background.

   ```bash
   SAVE
   ```

3. **Exit Redis**: Leave the Redis CLI

   ```bash
   exit
   ```

4. **Make a copy**: Copy the dump file `dump.rdb` located in `/var/lib/redis` to a backup file

   ```bash
   sudo cp /var/lib/redis/dump.rdb /var/lib/redis/dump.bak.rdb
   ```

5. **Verify the Dump**: Check that the copy of the dump file (`dump.bak.rdb`) has been created in `/var/lib/redis`.

   ```bash
   ls /var/lib/redis
   ```

## Restoring a Redis Dump

To restore a dump in Redis, follow these steps:

1. **Stop the Redis Server**: If Redis is running, stop the Redis server:

   ```bash
   sudo systemctl stop redis-server.service
   ```

2. **Replace the Dump File**: Replace the existing `dump.rdb` file in the Redis data directory with the dump file you want to restore. Make sure the restored file is still owned and readable by the `redis` user.

3. **Start the Redis Server**: Start the Redis server again.

   ```bash
   sudo systemctl start redis-server.service
   ```

4. **Verify the Restoration**: Connect to Redis using `redis-cli` and verify that the data has been restored correctly:

   ```bash
   redis-cli -a $(sudo grep -Po '^requirepass \K.*' /etc/redis/redis.conf)
   KEYS *
   ```

   This command displays all keys in the database, confirming that the restoration was successful.

## Additional Notes

- It's important to ensure that Redis is stopped before replacing the dump file to avoid data corruption.

For more information about Redis and its commands, refer to the [Redis Documentation](https://redis.io/documentation). 
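The copy in step 4 of the backup procedure overwrites a single `dump.bak.rdb` each time. To keep several generations of backups, a timestamped copy can be scripted instead. A minimal Python sketch — the `backup_dump` helper and its `data_dir` argument are illustrative, not part of SubVortex:

```python
import shutil
import time
from pathlib import Path


def backup_dump(data_dir: str) -> Path:
    """Copy dump.rdb to a timestamped backup file in the same directory."""
    src = Path(data_dir) / "dump.rdb"
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"dump-{stamp}.bak.rdb")
    shutil.copy2(src, dst)  # like `sudo cp`, but also preserves file timestamps
    return dst
```

Run it against `/var/lib/redis` (with sufficient privileges) after the `SAVE` step; `ls /var/lib/redis` then shows one `dump-<timestamp>.bak.rdb` file per backup.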
diff --git a/scripts/release/release-2.1.0/RELEASE-2.1.0.md b/scripts/release/release-2.1.0/RELEASE-2.1.0.md
new file mode 100644
index 00000000..bb913f13
--- /dev/null
+++ b/scripts/release/release-2.1.0/RELEASE-2.1.0.md
@@ -0,0 +1,179 @@
+This guide provides step-by-step instructions for upgrading to release 2.1.0.
+
+Previous Release: 2.0.0
+
+<br
+ +--- + +- [Validator](#validators) + - [Rollout Process](#validator-rollout-process) + - [Rollback Process](#validator-rollback-process) +- [Miner](#miner) + - [Rollout Process](#miner-rollout-process) + - [Rollback Process](#miner-rollback-process) +- [Additional Resources](#additional-resources) + +--- + +

# Validator

## Rollout Process

1. **Backup Database**: Before starting the rollout process, backup your database using the [Backup Guide](../../redis/docs/redis-backup.md#creating-a-redis-dump).

2. **Upgrade Subnet**: Check whether you are on main or on a tag

   ```bash
   git branch -vvv
   ```

   You will see something similar to this

   ```bash
   # If you are on a tag
   * (HEAD detached at v0.2.4) d6e233a Merge pull request #13 from eclipsevortex/release/0.2.4

   # If you are on main
   * main 13e555e [origin/main] Merge pull request #19 from eclipsevortex/release/2.0.0
   ```

   > IMPORTANT<br
> The `*` marks your active branch. It will be either on a tag or on the main branch.

   If you are on a tag, check out main

   ```bash
   git checkout main
   ```

   Then, get the latest version of the subnet

   ```bash
   git pull
   ```

   Then, install the dependencies

   ```bash
   pip install -r requirements.txt
   pip install -e .
   ```

3. **Restart validator**: Restart your validator to pick up the new version

   ```bash
   pm2 restart validator-92
   ```

4. **Check logs**: Check the validator logs and confirm that `New Block` entries appear

   ```bash
   pm2 logs validator-92
   ```

<br
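If you script this upgrade, the check in step 2 can be automated by parsing the `git branch -vvv` output for the active line. A small sketch — `is_detached` is an illustrative helper, not part of the SubVortex tooling:

```python
def is_detached(branch_output: str) -> bool:
    """Return True when the active line (the one marked with *) is a detached HEAD."""
    for line in branch_output.splitlines():
        if line.startswith("* "):
            return line.startswith("* (HEAD detached")
    raise ValueError("no active branch line found")
```

When it returns True, run `git checkout main` first; otherwise `git pull` directly.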

## Rollback Process

If any issues arise during or after the rollout, follow these steps to perform a rollback:

1. **Rollback Database**: Roll back the database by running, from the **SubVortex** directory

   ```bash
   python3 ./scripts/release/release-2.1.0/migration.py --run-type rollback
   ```

   You should see

   ```bash
   2024-03-29 22:08:27.867 | INFO | Loading database from localhost:6379
   2024-03-29 22:08:27.901 | INFO | Rollback starting
   2024-03-29 22:08:27.907 | INFO | Rollback done
   2024-03-29 22:08:27.908 | INFO | Checking rollback...
   2024-03-29 22:08:27.910 | INFO | Rollback checked successfully
   ```

   If anything goes wrong, restore your backup database using the [Backup Guide](../../redis/docs/redis-backup.md#restoring-a-redis-dump).

2. **Downgrade Subnet**: Get the tags

   ```bash
   git fetch --tags
   ```

   Check that the tag v2.0.0 exists

   ```bash
   git tag
   ```

   Checkout the tag

   ```bash
   git checkout tags/v2.0.0
   ```

   You will see

   ```
   Note: switching to 'tags/v2.0.0'.

   You are in 'detached HEAD' state. You can look around, make experimental
   changes and commit them, and you can discard any commits you make in this
   state without impacting any branches by switching back to a branch.

   If you want to create a new branch to retain commits you create, you may
   do so (now or later) by using -c with the switch command. Example:

     git switch -c <new-branch-name>

   Or undo this operation with:

     git switch -

   Turn off this advice by setting config variable advice.detachedHead to false

   HEAD is now at 13e555e Merge pull request #19 from eclipsevortex/release/2.0.0
   ```

   Then install the dependencies

   ```bash
   pip install -r requirements.txt
   pip install -e .
   ```

3. **Restart validator**: Restart your validator to pick up the previous version

   ```bash
   pm2 restart validator-92
   ```

4. **Check logs**: Check the validator logs and confirm that `New Block` entries appear

   ```bash
   pm2 logs validator-92
   ```

<br
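Under the hood, the rollback in step 1 deletes every `selection:*` key, as the `migration.py` script later in this diff shows with `scan_iter`. The same pattern, illustrated against a plain dict standing in for Redis so it runs without a live server:

```python
def rollback_selection(store: dict) -> int:
    """Delete every key with the selection: prefix; return how many were removed."""
    doomed = [key for key in store if key.startswith("selection:")]
    for key in doomed:
        del store[key]
    return len(doomed)
```

The script's check phase is the same scan run a second time, expecting zero matches.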
+ +# Miner + +## Rollout Process + +There is no rollout for miners. + +## Rollback Process + +There is no rollback for miners. + +
+ +# Additional Resources + +- [Backup Guide](../../redis/docs/redis-backup.md): Detailed instructions for backing up and restoring your database. + +
+ +For any further assistance or inquiries, please contact [**SubVortex Team**](https://discord.com/channels/799672011265015819/1215311984799653918) diff --git a/scripts/release/release-2.1.0/migration.py b/scripts/release/release-2.1.0/migration.py new file mode 100644 index 00000000..8c7a24d7 --- /dev/null +++ b/scripts/release/release-2.1.0/migration.py @@ -0,0 +1,93 @@ +import asyncio +import argparse +import bittensor as bt +from redis import asyncio as aioredis + +from subnet.shared.utils import get_redis_password +from subnet.shared.checks import check_environment + + +def check_redis(args): + try: + asyncio.run(check_environment(args.redis_conf_path)) + except AssertionError as e: + bt.logging.warning( + f"Something is missing in your environment: {e}. Please check your configuration, use the README for help, and try again." + ) + + +def rollout(): + bt.logging.info("No rollout") + + +async def rollback(args): + try: + bt.logging.info( + f"Loading database from {args.database_host}:{args.database_port}" + ) + redis_password = get_redis_password(args.redis_password) + database = aioredis.StrictRedis( + host=args.database_host, + port=args.database_port, + db=args.database_index, + password=redis_password, + ) + + bt.logging.info("Rollback starting") + async for key in database.scan_iter("selection:*"): + await database.delete(key) + bt.logging.info("Rollback done") + + bt.logging.info("Checking rollback...") + count = 0 + async for key in database.scan_iter("selection:*"): + count += 1 + if count == 0: + bt.logging.info("Rollback checked successfully") + else: + bt.logging.error( + f"Check rollback failed! You still have {count} keys to remove." 
+ ) + + except Exception as e: + bt.logging.error(f"Error during rollback: {e}") + + +async def main(args): + if args.run_type == "rollout": + rollout() + else: + await rollback(args) + + +if __name__ == "__main__": + try: + parser = argparse.ArgumentParser() + parser.add_argument( + "--run-type", + type=str, + default="rollout", + help="Type of migration you want too execute. Possible values are rollout or rollback)", + ) + parser.add_argument( + "--redis_password", + type=str, + default=None, + help="password for the redis database", + ) + parser.add_argument( + "--redis_conf_path", + type=str, + default="/etc/redis/redis.conf", + help="path to the redis configuration file", + ) + parser.add_argument("--database_host", type=str, default="localhost") + parser.add_argument("--database_port", type=int, default=6379) + parser.add_argument("--database_index", type=int, default=1) + args = parser.parse_args() + + asyncio.run(main(args)) + except KeyboardInterrupt: + print("KeyboardInterrupt") + except ValueError as e: + print(f"ValueError: {e}") diff --git a/subnet/validator/challenge.py b/subnet/validator/challenge.py index cc8cf376..97ba54ef 100644 --- a/subnet/validator/challenge.py +++ b/subnet/validator/challenge.py @@ -10,6 +10,7 @@ AVAILABILITY_FAILURE_REWARD, LATENCY_FAILURE_REWARD, DISTRIBUTION_FAILURE_REWARD, + RELIABILLITY_WEIGHT_FAILURE_REWARD, AVAILABILITY_WEIGHT, LATENCY_WEIGHT, RELIABILLITY_WEIGHT, @@ -17,7 +18,7 @@ ) from subnet.shared.subtensor import get_current_block from subnet.validator.event import EventSchema -from subnet.validator.utils import ping_and_retry_uids +from subnet.validator.utils import ping_and_retry_uids, get_next_uids, ping_uid from subnet.validator.localisation import get_country from subnet.validator.bonding import update_statistics from subnet.validator.state import log_event @@ -30,6 +31,20 @@ CHALLENGE_NAME = "Challenge" +DEFAULT_PROCESS_TIME = 5 + + +async def check_miner_availability(self, uid: int): + # Check the miner + 
availble = False + + try: + # Ping the miner - miner and subtensor are unique so we consider a failure if one or the other is not reachable + availble = await ping_uid(self, uid) + except Exception: + availble = False + + return availble async def handle_synapse(self, uid: int): @@ -40,6 +55,13 @@ async def handle_synapse(self, uid: int): country = get_country(ip) bt.logging.debug(f"[{CHALLENGE_NAME}][{uid}] Subtensor country {country}") + # Check miner is available + available = await check_miner_availability(self, uid) + if available == False: + bt.logging.warning(f"[{CHALLENGE_NAME}][{uid}] Miner is not reachable") + return available, country, DEFAULT_PROCESS_TIME + + # Check the subtensor is available process_time = None try: # Create a subtensor with the ip return by the synapse @@ -70,10 +92,10 @@ async def handle_synapse(self, uid: int): bt.logging.trace( f"[{CHALLENGE_NAME}][{uid}] Verified ? {verified} - val: {validator_block}, miner:{miner_block}" ) - except Exception as err: + except Exception: verified = False - process_time = 5 if process_time is None else process_time - bt.logging.warning(f"[{CHALLENGE_NAME}][{uid}] Verified ? 
False") + process_time = DEFAULT_PROCESS_TIME if process_time is None else process_time + bt.logging.warning(f"[{CHALLENGE_NAME}][{uid}] Subtensor not verified") return verified, country, process_time @@ -100,7 +122,8 @@ async def challenge_data(self): ) # Select the miners - uids, _ = await ping_and_retry_uids(self, k=10) + validator_hotkey = self.metagraph.hotkeys[self.uid] + uids = await get_next_uids(self, validator_hotkey, k=10) bt.logging.debug(f"[{CHALLENGE_NAME}] Available uids {uids}") # Initialise the rewards object @@ -121,6 +144,8 @@ async def challenge_data(self): reliability_scores = [] distribution_scores = [] + bt.logging.info(f"[{CHALLENGE_NAME}] Computing uids scores") + # Compute the score for idx, (uid, (verified, country, process_time)) in enumerate( zip(uids, responses) @@ -172,8 +197,10 @@ async def challenge_data(self): bt.logging.debug(f"[{CHALLENGE_NAME}][{uid}] Latency score {latency_score}") # Compute score for reliability - reliability_score = await compute_reliability_score( - uid, self.database, hotkey + reliability_score = ( + await compute_reliability_score(uid, self.database, hotkey) + if verified + else RELIABILLITY_WEIGHT_FAILURE_REWARD ) reliability_scores.append(reliability_score) bt.logging.debug( @@ -183,7 +210,7 @@ async def challenge_data(self): # Compute score for distribution distribution_score = ( compute_distribution_score(idx, responses) - if responses[idx][2] is not None + if verified and responses[idx][2] is not None else DISTRIBUTION_FAILURE_REWARD ) distribution_scores.append((uid, distribution_score)) @@ -254,7 +281,9 @@ async def challenge_data(self): 1 - alpha ) * self.moving_averaged_scores.to(self.device) event.moving_averaged_scores = self.moving_averaged_scores.tolist() - bt.logging.trace(f"[{CHALLENGE_NAME}] Updated moving avg scores: {self.moving_averaged_scores}") + bt.logging.trace( + f"[{CHALLENGE_NAME}] Updated moving avg scores: {self.moving_averaged_scores}" + ) # Display step time forward_time = 
time.time() - start_time diff --git a/subnet/validator/config.py b/subnet/validator/config.py index 7101f07a..1d374b79 100644 --- a/subnet/validator/config.py +++ b/subnet/validator/config.py @@ -246,7 +246,7 @@ def add_args(cls, parser): "--wandb.run_step_length", type=int, help="How many steps before we rollover to a new run.", - default=360, + default=720, ) parser.add_argument( "--wandb.notes", diff --git a/subnet/validator/score.py b/subnet/validator/score.py index 53b4de91..89706a47 100644 --- a/subnet/validator/score.py +++ b/subnet/validator/score.py @@ -21,8 +21,12 @@ async def compute_reliability_score(uid, database, hotkey: str): await database.hget(stats_key, "challenge_successes") or 0 ) challenge_attempts = int(await database.hget(stats_key, "challenge_attempts") or 0) - bt.logging.trace(f"[{uid}][Score][Reliability] # challenge attempts {challenge_attempts}") - bt.logging.trace(f"[{uid}][Score][Reliability] # challenge succeeded {challenge_successes}") + bt.logging.trace( + f"[{uid}][Score][Reliability] # challenge attempts {challenge_attempts}" + ) + bt.logging.trace( + f"[{uid}][Score][Reliability] # challenge succeeded {challenge_successes}" + ) # Step 2: Normalization normalized_score = wilson_score_interval(challenge_successes, challenge_attempts) @@ -33,7 +37,9 @@ async def compute_reliability_score(uid, database, hotkey: str): def compute_latency_score(idx, uid, validator_country, responses): initial_process_times = [response[2] for response in responses] bt.logging.trace(f"[{uid}][Score][Latency] Process times {initial_process_times}") - bt.logging.trace(f"[{uid}][Score][Latency] Process time {initial_process_times[idx]}") + bt.logging.trace( + f"[{uid}][Score][Latency] Process time {initial_process_times[idx]}" + ) # Step 1: Get the localisation of the validator validator_localisation = get_localisation(validator_country) @@ -54,7 +60,7 @@ def compute_latency_score(idx, uid, validator_country, responses): location["longitude"], ) - 
scaled_distance = distance / MAX_DISTANCE
+        scaled_distance = distance / MAX_DISTANCE if distance > 0 else 0
         tolerance = 1 - scaled_distance
         process_time = process_time * tolerance if process_time else 5
@@ -84,7 +90,11 @@ def compute_latency_score(idx, uid, validator_country, responses):
         score = relative_latency_scores[idx]
         bt.logging.trace(f"[{uid}][Score][Latency] Relative score {score}")
 
-    normalized_score = (score - min_score) / (max_score - min_score)
+    normalized_score = (
+        (score - min_score) / (max_score - min_score)
+        if max_score - min_score > 0
+        else 0
+    )
 
     return normalized_score
diff --git a/subnet/validator/utils.py b/subnet/validator/utils.py
index ade23698..789d3e20 100644
--- a/subnet/validator/utils.py
+++ b/subnet/validator/utils.py
@@ -148,6 +148,26 @@ async def get_available_query_miners(
     return get_pseudorandom_uids(self, muids, k=k)
 
 
+async def ping_uid(self, uid):
+    """
+    Ping a single UID to check its availability.
+    Returns True when the miner answers with a 200 status code, False otherwise.
+    """
+    try:
+        response = await self.dendrite(
+            self.metagraph.axons[uid],
+            bt.Synapse(),
+            deserialize=False,
+            timeout=5,
+        )
+
+        return response.dendrite.status_code == 200
+    except Exception as e:
+        bt.logging.error(f"Dendrite ping failed: {e}")
+
+    return False
+
+
 async def ping_uids(self, uids):
     """
     Ping a list of UIDs to check their availability. 
@@ -220,4 +240,52 @@ async def ping_and_retry_uids(
             f"Insufficient successful UIDs for k: {k} Success UIDs {successful_uids} Failed UIDs: {failed_uids}"
         )
 
-    return list(successful_uids)[:k], failed_uids
\ No newline at end of file
+    return list(successful_uids)[:k], failed_uids
+
+
+async def get_next_uids(self, ss58_address: str, k: int = 4):
+    # Get the list of uids already selected
+    uids_already_selected = await get_selected_miners(self, ss58_address)
+    bt.logging.debug(f"get_next_uids() uids already selected: {uids_already_selected}")
+
+    # Get the list of available uids
+    uids = await get_available_query_miners(self, k=k, exclude=uids_already_selected)
+    bt.logging.debug(f"get_next_uids() uids: {uids}")
+
+    # Get the k uids requested
+    uids_selected = list(set(uids) - set(uids_already_selected))
+
+    # If not enough uids are available, start a new rotation
+    if len(uids_selected) < k:
+        uids_already_selected = []
+
+        # Complete the selection with k - len(uids_selected) elements
+        # We always want to have k miners selected
+        new_uids_selected = await get_available_query_miners(self, k=k)
+        uids_selected = uids_selected + new_uids_selected[: k - len(uids_selected)]
+
+    bt.logging.debug(f"get_next_uids() uids selected: {uids_selected}")
+
+    # Store the new selection in the database
+    selection_key = f"selection:{ss58_address}"
+    selection = ",".join(str(uid) for uid in uids_already_selected + uids_selected)
+    await self.database.set(selection_key, selection)
+    bt.logging.debug(f"get_next_uids() new uids selection stored: {selection}")
+
+    return uids_selected
+
+
+async def get_selected_miners(self, ss58_address: str):
+    selection_key = f"selection:{ss58_address}"
+
+    # Get the uids selection
+    value = await self.database.get(selection_key)
+    if value is None:
+        bt.logging.debug("get_selected_miners() no uids")
+        return []
+
+    # Get the uids already selected
+    uids_str = value.decode("utf-8") if isinstance(value, bytes) else value
+    uids = [int(uid) for uid in uids_str.split(",")]
+
+    return uids
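The `get_next_uids` helper above rotates through miners: it excludes uids already recorded under `selection:<hotkey>` and starts a fresh rotation once fewer than `k` remain. A simplified synchronous sketch of that rotation — deterministic here, whereas the real helper draws pseudorandomly and persists the history in Redis:

```python
def next_uids(all_uids, already_selected, k=4):
    """Pick k uids not selected yet; reset the rotation when too few remain."""
    available = [uid for uid in all_uids if uid not in already_selected]
    selected = available[:k]
    if len(selected) < k:
        # Pool exhausted: forget the history and top the selection up to k uids
        already_selected = []
        refill = [uid for uid in all_uids if uid not in selected]
        selected = selected + refill[: k - len(selected)]
    return selected, already_selected + selected
```

Over successive calls every uid gets challenged before any uid is challenged twice, which is the point of the `selection:` bookkeeping.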