servers.txt server "Claiming" #4
Good idea. I'll look at it when I have some time.
Got the changes up here.
I like it, but we probably need a way to choose just a few servers instead of spanning the whole list.
Yeah, that'd probably be a good plan. I might try something like having it pick random ones, then add them to a 'selected.txt' if one hasn't already been generated.
I think the best case would be to pick different IPs based on the city... it wouldn't make sense to just scan Moscow IPs? Here are the top 5 cities the IPs I loaded are from... I can add more detail to another file where we can iterate through random IPs from the 5 cities... Moscow
I have a rudimentary script (legit just a while loop) that I've been using.
I wonder if it's possible to have a public document that the script could reach out to and grab an IP that has the least allocations. Load balancing, in principle.
It's possible. Assuming we won't ever use all of the IPs, just pull/push the IPs in use from something in a cloud, or add a check-in/check-out system, and you'd need to prove that you're actually using it...
Could use curl with PHP and a SQL DB on a VPS or something. Initially the script would curl and request IPs, and the server would return the ones with the least users. Then every loop (could use timestamps to make sure it doesn't run too often and overload the server, i.e. only run if the time is greater than...) it would send the list of IPs it's using along with an ID. If the server doesn't get a POST with that ID containing a certain IP for x time, it sets the user counter back by 1. So we'd need each script to have a unique ID, like a hash or something... could generate a hash from /dev/urandom input, then save it to id.txt or something so it keeps the same ID? But we could have the problem of the server getting DDoS'd. idk, either way it would be hard to actually prove the IPs are being used.
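A minimal sketch of the client side of that check-in idea, in the same bash register as the rest of the scripts. The endpoint URL, field names, interval, and `id.txt` contents here are all illustrative, not a real API:

```shell
#!/bin/bash
# Sketch: persistent per-client ID plus a rate-limited check-in POST.
# The https://example.com/checkin endpoint and its fields are made up.

ID_FILE="id.txt"

# Generate a unique ID once from /dev/urandom and reuse it every run.
if [ ! -f "$ID_FILE" ]; then
  head -c 32 /dev/urandom | sha256sum | awk '{print $1}' > "$ID_FILE"
fi
ID="$(cat "$ID_FILE")"

LAST_POST=0
MIN_INTERVAL=60   # seconds between check-ins, so we don't hammer the server

checkin() {
  local now
  now="$(date +%s)"
  # Only POST if enough time has passed since the last check-in.
  if [ $((now - LAST_POST)) -ge "$MIN_INTERVAL" ]; then
    curl -s -X POST "https://example.com/checkin" \
      --data-urlencode "id=${ID}" \
      --data-urlencode "ips=${1}"
    LAST_POST="$now"
  fi
}
```

The timestamp guard is the "only run if time is greater than..." check from the comment; the server side (decrementing stale counters) is not shown.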
Yeah, security is always going to be an issue. If we restrict within a login, then those have to be managed, and down the rabbit hole we go. Maybe we can just have another thread where we respond with the IPs we are working on and ask everyone to grab randomly. For example, I just went and grabbed 2500-2600 on the list.
I hate to say it, but decentralization is key, and the only place I know this is done is blockchain...
Wait, can't we just use one of those real-time document editing apps that run over IPFS? That way a bunch of people can all claim IPs at the same time, and only those with the IPNS name or hash will have access.
Also could make a link-editable Google spreadsheet. Gross.
That would be too close to being centralized, I thought.
The IPFS idea isn't; the spreadsheet is. But honestly it's just a list of IPs; it's kinda hard for anyone to submit a takedown reason to Google. That, and Google isn't on too great terms with Russia.
Added an option for the user to choose how many random servers to put in selected_servers.txt; here's the merge-ready branch.
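For what it's worth, the core of that random selection can be sketched with `shuf`; the demo input and the count below are invented for illustration, only the `servers.txt`/`selected_servers.txt` names come from this thread:

```shell
#!/bin/bash
# Demo input: a few fake server IPs (illustrative only).
printf '10.0.0.%s\n' 1 2 3 4 5 6 7 8 > servers.txt

COUNT=3   # how many random servers the user asked for

# Only generate a selection if one hasn't already been generated,
# so re-running the script keeps working on the same claimed set.
if [ ! -f selected_servers.txt ]; then
  shuf -n "$COUNT" servers.txt > selected_servers.txt
fi
```

`shuf -n` samples without replacement, so the selected file never contains duplicates.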
Just saw @morrowc's pull request. I did come across a somewhat minor issue with the ITER variable that is present in both @morrowc's and my code.
For your filesystem problem(s), I think you may consider something like this: it gives quite a spread of directories to fill with files, is easy to create on demand (a loop of a loop of a loop), and spreads the files you create over 36*36*36 possible end directories. You can mechanically create the path in the working while-loop easily as well, using something like this to get 'random' enough data to build the directory/path. Expanding a bit: you get a new directory to put the file in each run...
@morrowc I think it would be better to have it be time-based (unix, or a more standard year-month-day-hour) so you know what you're looking at and can debug issues etc. (remember the human!)
Humans are fallible; depend on machines for all of this.
```shell
_checkPath() {
  # Creates all paths required in the working directory.
  for one in {a..z} $(seq 0 9); do
    # makes the directories properly
    mkdir -p "$one"
  done
}
```
PR #17 has the above change and the random directory bits. |
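For anyone following along, the on-demand variant of the splay scheme discussed above could look something like this. The use of `md5sum` over `/dev/urandom` bytes is just one "random enough" choice for illustration, not necessarily what the PR does:

```shell
#!/bin/bash
# Sketch: instead of pre-creating all 36*36*36 directories, derive a
# 'random enough' a-z0-9 path per run and create just that one on demand.

# Hash some random bytes, keep only a-z0-9 characters, take three of them.
SPLAY="$(head -c 16 /dev/urandom | md5sum | tr -dc 'a-z0-9' | head -c 3)"

# One character per directory level, e.g. 'a/7/f'.
DIR="${SPLAY:0:1}/${SPLAY:1:1}/${SPLAY:2:1}"
mkdir -p "$DIR"
echo "$DIR"
```

Each run lands the output file in one of the possible leaf directories, which keeps any single directory from accumulating tens of thousands of files.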
I think I've got a fix for the ITER & COMP_ITER variables!
The only con (that I can think of) is that if people want a fresh start they need to delete the save files. @LogoiLab, if you don't want the entire iteration mumbo jumbo, I can make a pull request with just this, so we have persistence.
So this is solved by PR #15 and PR #16? Can we close this since we merged?
I guess claiming servers. The problem: we wanted to make sure every server gets traced, or as close to it as possible. Claiming servers: problem solved. NOTE: every time it's random, so if you stop the script and then start it again, you might have completely different servers.
Perhaps the question to ask here is: "What is ITER/COMP_ITER supposed to provide you?" If you want to make sure the traceroute data has a known time sequence, then add the time (a unix timestamp, for instance) to the filename. The actual number of times you've been over any particular IP (the iteration number, or number of iterations) is not important in the filename, when you can: or similar... Or, really: "Parse the files into a database, deal with the stats from there."
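A tiny sketch of the timestamp-in-the-filename idea; the target IP and output directory are illustrative, not from the repo:

```shell
#!/bin/bash
# Sketch: encode the unix timestamp in each output filename so the time
# sequence is recoverable without ITER/COMP_ITER counters.

IP="192.0.2.1"                        # example target (TEST-NET address)
OUT="traces/$(date +%s)_${IP}.trace"  # e.g. traces/1550244960_192.0.2.1.trace

mkdir -p traces
echo "traceroute output would go here" > "$OUT"
```

Sorting the filenames lexically within a directory then also sorts them chronologically, which is all the iteration counter was really buying.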
The problem is not clobbering; it's that very few filesystems behave well when you put large numbers of files into one directory. Flat file layouts are never a good idea if you need to scan them later (tar them, list them, stat all the files, etc.). All systems that generate lots (or could generate lots) of files split the files out over many, many directories (this exact same hashed mechanism), because filesystem performance degrades significantly as the number of files in a directory grows large.
The empty directories aren't important; you care about the files. You'll iterate over the files in the filesystem and pull data from them as required. If you want less hash/splay, then just make 2 levels, not 3, but really that isn't important here. What is important is not killing your system when you fill one directory with 40k files, etc.
Yes, filenames are immaterial, save (perhaps) the timestamp.
I had a whole write-up, nicely formatted... I would recommend testing and debugging before you PR, though.
I keep saying that the file structure should be human-readable. Even if no one should ever read it, there's no point in randomizing everything, because at some point SOMEONE will want to read it for whatever reason, and it's not like it hurts us to do this.
At Fri, 15 Feb 2019 15:36:00 +0000 (UTC),
gidoBOSSftw5731 <notifications@github.com> wrote:
> I keep on saying that the filestructure should be human-readable. Even
> if no one should ever read it theres no point to need to randomize
> everything, because at somepoint SOMEONE will want to read it for
> whatever reason, and its not like it hurts us to do this
That is wrong-headed. There are machines for this, and tooling to find files.
Saw this mentioned on reddit by /u/turn-down-for-what.
What if we had the script iterate through the servers?
The structure would look something like
And for servers.txt: it'll only use lines that don't have a '#' in them, so you can add comments (i.e. the pools and an explanation; the # can be anywhere in the line).
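That '#' rule (skip any line containing a '#', wherever it appears in the line) is a one-liner with `grep`; the demo file contents below are made up:

```shell
#!/bin/bash
# Demo servers.txt with comments mixed in (contents are illustrative).
cat > servers.txt <<'EOF'
# pool 1: Moscow
10.0.0.1
10.0.0.2   # flaky
10.0.0.3
EOF

# Use only lines that don't contain a '#' anywhere in the line.
grep -v '#' servers.txt > usable_servers.txt
```

Note that, exactly as described, a trailing comment on an IP line drops the whole line, not just the comment.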
I get that it wouldn't be as many traces per minute per server, but it might get a better overall image?
And if we had enough people running it, I feel like it'd have pretty good coverage.
pros:
cons:
I've got it implemented and it seems to be working fine. I'll get it up on my fork and you can see if you like it; if so, I can submit a pull request.
Might be worth only having some people run it this way? idk.