
Allow collection of server statistical data #196

Merged
merged 12 commits into Syncplay:master on Jul 28, 2018

Conversation

@albertosottile (Member) commented Jul 25, 2018

Because of the recent increase in Syncplay's update rate, we would like to have some data on our current user base.

This PR introduces a new server argument --stats-db-file [file] that creates a SQLite database in [file] and stores some anonymous information in it.

We only store non-sensitive, non-personal data that the clients already transmit under the current protocol. Specifically, for every user connected to the server at the time of stats collection, we store (EDIT):

  • server's Unix timestamp at the collection time
  • user's Syncplay client version

Data collection is scheduled to happen every hour. Nothing else is acquired nor stored in the snapshot.

Thanks to SQLite, full concurrency is supported and multiple servers can write to the same DB file at the same time. For further safety, a fixed delay, derived from the server's port number, is applied before the first collection. (EDIT): twisted.enterprise.adbapi is employed to ensure asynchronous consistency.
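The collection step described above can be sketched with the stdlib sqlite3 module. This is an illustrative reading of the description, not the PR's actual code; the helper names (ensure_schema, record_snapshot) are invented here:

```python
import sqlite3
import time

def ensure_schema(conn):
    # Matches the two recorded fields listed above: timestamp + client version.
    conn.execute("CREATE TABLE IF NOT EXISTS clients_snapshots "
                 "(snapshot_time INTEGER, version TEXT)")

def record_snapshot(conn, client_versions, now=None):
    # One row per connected client, all sharing the same collection timestamp.
    now = int(now if now is not None else time.time())
    conn.executemany("INSERT INTO clients_snapshots VALUES (?, ?)",
                     [(now, v) for v in client_versions])
    conn.commit()
    return now
```

Scheduling this hourly with a port-based initial delay is what the rest of the PR wires through twisted's reactor.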

@@ -445,7 +475,12 @@ def getName(self):
return self._name

def getVersion(self):
return self._connector.getVersion()
pattern = r'\A[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}\Z'
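As an aside, the dots in that pattern are unescaped, so each `.` matches any character; the intended check (with escaped dots) can be expressed with `re.fullmatch`. This is illustrative only; as the review below notes, the check was dropped anyway:

```python
import re

# Tightened variant of the version pattern from the diff above:
# three dot-separated groups of 1-3 digits, with the dots escaped.
VERSION_RE = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

def looks_like_version(s):
    # fullmatch anchors at both ends, like the \A ... \Z in the original.
    return VERSION_RE.fullmatch(s) is not None
```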
Contributor:

You don't need to do that; since it's a prepared statement anyway, you can store pretty much anything. (There could be a length check at most, but that should rather be the connector's job.)

Member Author:

I agree this can be removed and also that a check, if any, should be done in _connector. I'll remove this.

if statsDbFile is not None:
self.statsDbHandle = self._connectToStatsDb(statsDbFile)
logDelay = 5*(int(self.port)%10 + 1)
reactor.callLater(logDelay, self._scheduleClientSnapshot, self.statsDbHandle, self.port)
Contributor:

What's the point of scheduling this instead of running the code on user join?

Contributor:

The idea is for it to occur at set intervals from being started to provide a snapshot, and to have a delay to avoid multiple servers all writing at exactly the same time if they are started at the same time.
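The resulting schedule (port-staggered first write, then fixed intervals) can be illustrated with a small helper. This is a sketch of the idea only; the PR itself uses twisted's reactor.callLater rather than precomputing timestamps:

```python
SNAPSHOT_INTERVAL = 3600  # hourly collection, as described in the PR

def snapshot_times(port, start_time, count):
    """Return the first `count` snapshot timestamps for a server on `port`.

    Each server waits a port-derived delay (5-50 s, mirroring the PR's
    5*(port % 10 + 1)) before its first write, so servers started at the
    same moment don't hit the shared database simultaneously.
    """
    first = start_time + 5 * (int(port) % 10 + 1)
    return [first + i * SNAPSHOT_INTERVAL for i in range(count)]
```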

def _connectToStatsDb(self, dbPath):
conn = sqlite3.connect(dbPath)
c = conn.cursor()
c.execute('create table if not exists clients_snapshots (snapshot_time integer, port integer, version string, room_index integer, play_status integer)')
Contributor:

It would be nice if all the DB stuff:

  • connection
  • schema creation
  • queries

were put into a separate class (service)

Member Author:

I'll try. Should I keep this new class in server.py or move it to a new file and module? I would argue for the former, since all of this is server-related only.

playStatus = room.isPlaying()
for watcher in room.getWatchers():
content = (snapshotTime, int(portNumber), watcher.getVersion(), idx, playStatus, )
c.execute("INSERT INTO clients_snapshots VALUES (?, ?, ?, ?, ?)", content)
Contributor:

What's the point of saving room state?
Room index is non-deterministic as well, and can change across multiple iterations over the same dictionary (unless you are on Python 3.7).
You'll log multiple users multiple times without any real way to distinguish between them.
Also, logging the port makes very little sense, and you should have a separate file for every instance you run (https://www.sqlite.org/faq.html#q5, http://howfuckedismydatabase.com/sqlite/)

Contributor:

Instead of room_index, just store the room-name hash.

Contributor:

You can write to a single SQLite database from multiple processes; you just need to handle SQLITE_BUSY and wait for other inserts to finish.

Member Author:

By default, sqlite3 handles SQLITE_BUSY by retrying for up to 5 seconds before giving up. That's why a port-based fixed delay was introduced. Having a single SQLite database makes handling and analysis much easier, in my opinion.
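For reference, that 5-second figure is the default of the `timeout` parameter of Python's `sqlite3.connect`: the driver retries on SQLITE_BUSY for that long before raising `sqlite3.OperationalError`. A shared-file setup can tune it explicitly (the helper name here is illustrative):

```python
import sqlite3

def open_shared_db(path, busy_timeout=5.0):
    # `timeout` controls how long sqlite3 keeps retrying when another
    # process holds the write lock, before raising OperationalError.
    return sqlite3.connect(path, timeout=busy_timeout)
```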

Contributor:

Room index can be used to get a sense of the typical size of a session, which can help inform development/design decisions (e.g. the default size of the 'list of who is watching what' section of the client).

Contributor:

If encountering SQLITE_BUSY carries a potential delay of 5 seconds, won't that risk stalling the entire server process for 5 seconds during each iteration of this loop?

Member Author:

My understanding was that only sqlite3.commit locked the database, but I was probably wrong. In any case, I'll try to replace this loop of execute calls with a list and a single sqlite3.executescript at the end of the loop.
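One caveat on the plan above: `executescript` does not accept parameters, so the usual way to batch parameterized inserts in one call is `executemany`. A minimal sketch, with a schema reduced to the two fields the PR ultimately kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE clients_snapshots (snapshot_time INTEGER, version TEXT)")

# Collect all rows first, then issue a single parameterized batch insert
# instead of one execute per watcher inside the loop.
rows = [(1532800000, "1.5.1"), (1532800000, "1.5.0")]
c.executemany("INSERT INTO clients_snapshots VALUES (?, ?)", rows)
conn.commit()
```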

Member Author:

Upon further discussion, @Et0h and I decided to drop room_index and play_status for now. The only recorded stats will be the current timestamp, the client's version, and the server port, which is needed to understand the load on each server while keeping a single database file.

@@ -185,6 +194,17 @@ def setPlaylistIndex(self, watcher, index):
else:
watcher.setPlaylistIndex(room.getName(), room.getPlaylistIndex())

def _connectToStatsDb(self, dbPath):
conn = sqlite3.connect(dbPath)
Contributor:

Maybe you should implement a non-blocking DB connection:
https://howto.lintel.in/asynchronous-db-operations-twisted/
(I haven't tested it)
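The non-blocking idea can also be sketched with the stdlib alone: run the blocking sqlite3 calls on a worker thread and hand back a future, much as twisted.enterprise.adbapi returns a Deferred. The class below is an illustration of the pattern, not Syncplay code:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

class AsyncStatsDb:
    """Illustrative wrapper: blocking sqlite3 calls moved off the caller's thread."""

    def __init__(self, path):
        # check_same_thread=False lets the pool thread use this connection.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        # A single worker serializes all statements on one connection.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def run(self, sql, params=()):
        def work():
            cur = self._conn.execute(sql, params)
            self._conn.commit()
            return cur.fetchall()
        # Returns a Future, analogous to adbapi's Deferred.
        return self._pool.submit(work)
```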

Member Author:

We could do that but, quoting the FAQ you linked before, "SQLite allows multiple processes to have the database file open at once". So, I do not think we should rely on twisted for this.

@@ -185,6 +192,45 @@ def setPlaylistIndex(self, watcher, index):
else:
watcher.setPlaylistIndex(room.getName(), room.getPlaylistIndex())

class StatsRecorder(object):
def __init__(self):
self._dbPool = None
Contributor:

Nitpicking: I'd name this _connection

if self._dbPool is not None:
self._dbPool.close()

def startRecorder(self, dbpath, roomManager, delay):
Contributor:

Nitpicking: I'd move dbpath and roomManager into __init__

print("--- Error in initializing the stats database. Server Stats not enabled. ---")

def _initDatabase(self):
dbpool = adbapi.ConnectionPool("sqlite3", self._dbPath, check_same_thread=False)
@Uriziel (Contributor) commented Jul 26, 2018:

My idea for the separate class was to follow the "Single Responsibility Principle": move everything sqlite-related into a separate class, creating an abstraction layer between business code (which is "log this and that data") and app logic (which is "use a database to store data").
So something like this:

class Database(NameForChange):
    __init__(dbPath): pass
    connect(): pass
    addVersionLog(version, time): pass
    _createSchema(): pass

which can later be extended with functions like fetchUser(username), etc., that all relate to fetching data from the data layer (i.e. sqlite)
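One runnable reading of that sketch, reduced to the two fields the PR ended up keeping (the names are placeholders, as the sketch's author notes):

```python
import sqlite3
import time

class StatsDatabase:
    """Data layer for the stats DB: connection, schema, and queries in one place."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._conn = None

    def connect(self):
        self._conn = sqlite3.connect(self._db_path)
        self._create_schema()

    def add_version_log(self, version, when=None):
        when = int(when if when is not None else time.time())
        self._conn.execute(
            "INSERT INTO clients_snapshots VALUES (?, ?)", (when, version))
        self._conn.commit()

    def _create_schema(self):
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS clients_snapshots "
            "(snapshot_time INTEGER, version TEXT)")
        self._conn.commit()
```

Fetch-side methods (the fetchUser(username) mentioned above, say) would slot in alongside add_version_log without the callers ever touching SQL.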

@Uriziel (Contributor) left a review:

Keep up the good work

@Uriziel (Contributor) commented Jul 26, 2018

Reposting my last comment, as it got squashed by one of the changes (my bad).

My idea for the separate class was to follow the "Single Responsibility Principle": move everything sqlite-related into a separate class, creating an abstraction layer between business code (which is "log this and that data") and app logic (which is "use a database to store data").
So something like this (all the names were chosen at random, without a second thought given to them):

class Database(NameForChange):
    __init__(dbPath): pass
    connect(): pass
    addVersionLog(version, time): pass
    _createSchema(): pass

which can later be extended with functions like fetchUser(username), etc., that all relate to fetching data from the data layer (i.e. sqlite)

@Et0h Et0h merged commit b3a7ed8 into Syncplay:master Jul 28, 2018