Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements #232

Closed
lorenzopicoli opened this issue Apr 17, 2023 · 15 comments
Closed

Performance improvements #232

lorenzopicoli opened this issue Apr 17, 2023 · 15 comments

Comments

@lorenzopicoli
Copy link

Hello, thank you so much for the project!

I'm running the server for me and a few friends. I've noticed that the performance when checking a large time period is not really amazing.
I did an extended history import with 144189 songs. 4 other friends did the same with the similar amounts on average. If I check my yearly stats it takes several seconds to even load the page while my machine is using 100% of its CPU. For all time it takes even longer than that, some times even more than a minute. I can see that this is the result of very long queries.

I've tried running it both on a raspberry pi 4 and on a Hetzner instance with similar results.

Queries like these are very slow to run:

db.collection_name.aggregate([
  { 
    $match: { 
      owner: ObjectId("6418b3970362c4b75eb3aa2c"), 
      blacklistedBy: { $exists: 0 }, 
      played_at: { 
        $gt: ISODate("2015-01-11T20:01:15.000Z"), 
        $lt: ISODate("2023-04-17T04:12:12.007Z") 
      } 
    } 
  }, 
  { 
    $project: { 
      year: { $year: { date: "$played_at", timezone: "America/Toronto" } }, 
      month: { $month: { date: "$played_at", timezone: "America/Toronto" } }, 
      day: { $dayOfMonth: { date: "$played_at", timezone: "America/Toronto" } }, 
      week: { $week: { date: "$played_at", timezone: "America/Toronto" } }, 
      hour: { $hour: { date: "$played_at", timezone: "America/Toronto" } }, 
      id: 1 
    } 
  }, 
  { $lookup: { from: "tracks", localField: "id", foreignField: "id", as: "track" } }, 
  { $unwind: "$track" }, 
  { $group: { _id: { year: "$year" }, count: { $sum: "$track.duration_ms" } } }, 
  { $sort: { "_id.year": 1 } }
])

I'll probably take a look into these issues myself at some point, but I'm not super comfortable with NoSQL databases so I'm not sure I'll get good results. Are there any plans from the maintainer to tackle these performance issues?

@Yooooomi
Copy link
Owner

Hey! I'm sorry you're having performance issues. The query you use as an example can't get much smaller.
The statistics are built on demand that's why some require a bit more compute.
Which version of mongo are you using?

@lorenzopicoli
Copy link
Author

Yeah I can come back at some point this week with some better benchmark on what queries are taking long. I'm not too familiar with NoSQL but maybe there are some indexes (or equivalents?) that we could add to improve these checks?

I'm still on version 4. I saw that it was upgraded to 6 in the past few days. I'll give it a try and see if magically fixes it hahaha

@Yooooomi
Copy link
Owner

Since upgrading is not officially documented, you should follow the comment at #223 (comment)

@HeyBanditoz
Copy link

Is this something that could be improved if we just treat the data as relational? And use a SQL database instead, like Postgres.

@lorenzopicoli
Copy link
Author

I've upgraded to mongo 6 and it's still very slow when loading all data. It does run faster because I can't run mongo on my raspberry pi so I've moved to a better machine.

I love this project and would love to help contribute whenever I get some time. Would moving away from NoSQL be something you'd be open to @Yooooomi?

@Yooooomi
Copy link
Owner

Hello. I'm glad mongo 6 is a bit more performant.
As for the sql question, I actually have no idea how it would perform. I feel like mongo might not be the best choice there but I don't know what could be better. I chose mongo years ago when I had a little bit less knowledge. It allows very easy data manipulation and construction. I'm afraid that with Postgres things would not be easy at all.
Things are to be continued for sure but this will probably be a big deal.

@HeyBanditoz
Copy link

HeyBanditoz commented Apr 22, 2023

Maybe indexes would work? It seems mongodb supports them: https://www.mongodb.com/docs/manual/indexes/
Would you add the db scripts to the migrations folder to add them?

Would definitely be an easier option than shifting entire DBMSes

@Yooooomi
Copy link
Owner

I am already using indexes. If I was not queries would take several minutes instead of 5 seconds. I'll investigate a bit on the subject.

@Karakazz
Copy link

I just upgraded to mongodb 6.0 and performance improved by a lot.
I had to run the follwing inside the mongo container after upgrading to 5.0.16:
db.adminCommand( { setFeatureCompatibilityVersion: "6.0" } )

After that I was able to upgrade the db with no issues to 6.0

@Yooooomi
Copy link
Owner

Upgrade is done automatically, you didn't need to do the command with the last nightly.

@lorenzopicoli
Copy link
Author

After a week running on mongo v6 I can confirm that there's some performance improvement. Mongo CPU usage is still really high and requires a good host to be able to load all time stats, but it's usable for now

@Karakazz
Copy link

Karakazz commented Apr 28, 2023

Upgrade is done automatically, you didn't need to do the command with the last nightly.

I'm currently utilizing a version that has undergone some significant modifications and is based on a previous release, which didn't update automatically. I tried it though! :D

The performance has significantly improved after a week of testing. Currently, my "account" contains 173k songs out of a total of approximately half a million songs, and loading "All" on "All Stats" now only takes around 20 seconds, as compared to a couple of minutes previously.

@filcuk
Copy link

filcuk commented Jul 27, 2023

Also consider SQLite as an option.
It's often greatly underestimated and should be simple to implement.
Even Wordpress is switching to it.

@Yooooomi
Copy link
Owner

Hello,
I think SQLite is nowhere near the performances of MongoDB for this kind of treatment. Wordpress does not switch to it but allows websites created in wordpress to use it which is a huge difference.
I tried to migrate to postgresql in the past, but I feel like nosql is better for the moment. I am not myself a pro considering databases, but I feel like I need a database built for realtime computing and aggregation of data, which is not the case of postresql nor sqlite.

NB: all of this should be possible (certainly is) using a postgresql database, but boi I don't want to write those 200 lines queries.

@quentinguidee
Copy link
Contributor

quentinguidee commented Mar 2, 2024

Fixed by #348, #354 and #355 (release 1.8.0)

@Yooooomi Yooooomi closed this as completed Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants