Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we limit what's returned from the /data/edges API? #821

Closed
Sophira opened this issue Jul 15, 2014 · 15 comments
Closed

Should we limit what's returned from the /data/edges API? #821

Sophira opened this issue Jul 15, 2014 · 15 comments

Comments

@Sophira
Copy link
Contributor

Sophira commented Jul 15, 2014

I'm developing a suite of tools (for my own use only as of right now) to use the data APIs to get data. For some reason I decided it would be a good idea to try it on dw_news. I'm not entirely sure why, but there you go.

I don't have verbose output yet for exactly what stages took the most time, but as a whole, the fetching and processing took about a minute. I looked in my database afterwards to see how many edges had been entered:

mysql> select type, count(*) from account_edges where from_id = 5 or to_id = 5 group by type;
+--------+----------+
| type   | count(*) |
+--------+----------+
| watch  |    50000 |
| member |        4 |
| admin  |        2 |
+--------+----------+
3 rows in set (0.13 sec)

Yep, that's quite a lot.

Do we want to limit the amount of data that's sent from the APIs? I don't want to run the query again without permission, but I'm guessing that the biggest bottleneck would have been the downstream bandwidth. For my server that's about 1MB/s, meaning that if the bandwidth is indeed the limiting factor, then the implications are rather worrying.

May I make this query one more time and save the data to a file locally so that I can test things without having get the data over the network again?

@Sophira
Copy link
Contributor Author

Sophira commented Jul 15, 2014

Going by the data I have saved from my LJ version of these tools, LJ probably limits fdata.bml output to 5000 total edges. We, on the other hand, seem to be limiting to 50000 for each individual edge.

@rahaeli
Copy link
Contributor

rahaeli commented Jul 15, 2014

Limit of 5k should be good. Or, not return edges for accounts that have the thing that forces an empty friends list turned on, which i'm blanking on what it is ...

On Jul 15, 2014, at 2:50 PM, Sophira notifications@github.com wrote:

Going by the data I have saved from my LJ version of these tools, LJ probably limits fdata.bml output to 5000 total edges. We, on the other hand, seem to be limiting to 50000 for each individual edge.


Reply to this email directly or view it on GitHub.

@Sophira
Copy link
Contributor Author

Sophira commented Jul 15, 2014

I just checked the dw_news edges again to get more accurate data. It seemed to be quicker this time, possibly because data had been loaded into memcache from the previous run. Total time getting the data was 22.584s, of which only about 2s of that was actual data transfer, meaning that about 20 seconds was spent getting the data on the server end. The actual data transfer was also not as bad as I had thought, but still a lot (2.4MB) and worth worrying about.

I'll take a look to see what the empty friends list option is. It's probably still worth doing the 5000 limit too, though!

@afuna
Copy link
Member

afuna commented Jul 22, 2014

checking -- did you mean to pick this up or did you want to leave it someone else? And if you're picking this up, I'd be happy to accept it straight into the release branch.

@Sophira
Copy link
Contributor Author

Sophira commented Jul 22, 2014

Oh, you're right, I did mean to pick it up. Claimed!

[edit: Hmm. Did I do that wrongly?]

@zorkian
Copy link
Member

zorkian commented Jul 22, 2014

Er, this should respect %LJ::FORCE_EMPTY_SUBSCRIPTIONS which excludes the news account (and a few others) from returning this data in the profile page. This should also apply to any APIs we have: you should not be able to fetch this data for these blacklisted accounts.

Adding a limit otherwise is fine and generally good, too, but at least make it respect the blacklist.

@rahaeli
Copy link
Contributor

rahaeli commented Jul 22, 2014

that was the thing i was trying to remember!

@Sophira
Copy link
Contributor Author

Sophira commented Jul 22, 2014

Yeah, that's what I've been doing.

Sophira added a commit to Sophira/dw-free that referenced this issue Jul 22, 2014
Data obtained by /data/edges can be quite large; limit the data to 5000
edges for most edge types. Also, respect the
%LJ::FORCE_EMPTY_SUBSCRIPTIONS configuration option.

This patch also fixes a potential race condition and adds functionality
to the member_userids and member_of_userids functions to request a limit
to the number of userids returned.
@afuna afuna removed the type: issue label Jul 30, 2014
@zorkian
Copy link
Member

zorkian commented Jan 26, 2018

cc @momijizukamori for your API stuff as a consideration. Since this hasn't cropped up as A Problem in 4 years I'm going to close it wrt our current APIs, but the new stuff should definitely make sure we have limits in them (at least planned, not saying we need them to begin with).

@zorkian zorkian closed this as completed Jan 26, 2018
@Sophira
Copy link
Contributor Author

Sophira commented Jan 27, 2018

Out of curiosity, was there something wrong with my patch?

@zorkian
Copy link
Member

zorkian commented Jan 27, 2018

What patch? This is an issue, not a PR?

@alierak
Copy link
Member

alierak commented Jan 27, 2018

Look up in the thread, there's a referencing commit in her repo, not a PR.

@zorkian
Copy link
Member

zorkian commented Jan 27, 2018

I don't review every random commit on every random fork. If someone wants a review, they need to submit a PR.

@Sophira
Copy link
Contributor Author

Sophira commented Jan 27, 2018

Yeah, I apologise, sorry. I thought I had made a PR. I can make one now if wanted. I apologise.

@Sophira
Copy link
Contributor Author

Sophira commented Jan 27, 2018

Oh, I had (#856), and there were changes requested. Sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

6 participants