Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update user listen history query to use new indexed table #2185

Merged
merged 9 commits into from
Jan 21, 2022

Conversation

isaacsolo
Copy link
Contributor

@isaacsolo isaacsolo commented Dec 21, 2021

Description

Query user_listening_history table directly instead of plays. This will enable pruning old plays. The only difference between the response should be deduplication.

Tests

Tested against prod snapshot. Confirmed de-duplication.

Remote box pointed to prod snapshot can take from 2 seconds to 30s. About the same as master pointed to prod snapshot.. This approach avoids querying plays directly but does query track. Not sure if there's a better way.

How will this change be monitored?

Manually checking /history.

@@ -371,7 +371,6 @@ def format_offset(args, max_offset=MAX_LIMIT):
def get_default_max(value, default, max=None):
if not isinstance(value, int):
return default
elif max is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think new linter asked for this change.

# bundle peripheral info into track results
tracks = populate_track_metadata(session, track_ids, tracks, current_user_id)

if args.get("with_users", False):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_users is always true and user data is already populated due to model relationship.

@isaacsolo isaacsolo marked this pull request as ready for review December 21, 2021 22:21
Copy link
Member

@raymondjacobson raymondjacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we miss a step where we need to migrate existing user history into this table?

tracks = helpers.query_result_to_list(sorted_track_results)

# bundle peripheral info into track results
tracks = populate_track_metadata(session, track_ids, tracks, current_user_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can shave a round trip to the DB by just having populate_track_metadata grab the raw track themselves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I'll see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't see a way we can avoid querying for track data. Do you see a potential optimization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i think you're right here. In the future, probably we could rewrite populate_track_metadata to be join based rather than 2 round trips

"with_users": True,
}
track_history = get_track_history(get_tracks_args)
get_tracks_args = GetUserListeningHistoryArgs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I agree with your change (that we should be using current user id), that's not how this worked before and it makes this endpoint semantically kind of weird because you could specify user_id and it effectively does nothing.

can we keep that existing behavior and pass both current_user_id and user_id through to the queries file?

To get the behavior you added, we could just do
if (user_id != current_user_id) return [] here

Copy link
Contributor Author

@isaacsolo isaacsolo Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup will do 👍

@isaacsolo
Copy link
Contributor Author

Did we miss a step where we need to migrate existing user history into this table?

Like this? https://github.com/AudiusProject/audius-protocol/pull/2133/files

It's not in prod yet so I'm gonna wait until it's populated before merging this.

tracks = helpers.query_result_to_list(sorted_track_results)

# bundle peripheral info into track results
tracks = populate_track_metadata(session, track_ids, tracks, current_user_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i think you're right here. In the future, probably we could rewrite populate_track_metadata to be join based rather than 2 round trips

@isaacsolo isaacsolo merged commit f13f1dd into master Jan 21, 2022
@isaacsolo isaacsolo deleted the is-listen-history-api branch January 21, 2022 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants