Tennis Research Notes

A community resource for tennis research topics, ideas, and resources

About: I have more tennis research questions and ideas than I have time to pursue. (I suspect I'm not the only one.) People often ask me for suggestions for research topics. Maybe we can address both issues at once.

How this works: I'm going to share some of my notes, in the form of research topics and questions. If there is published work on the question, I may include a link or citation. Depending on how long the list gets, I may make some attempt to organize by category, maybe even splitting into multiple files. We'll see.

How you can help:

  • Add a topic/question
  • Add a link or citation to one of the existing topics or questions
  • Research something on the list. (And if you do so, add a link to your work.)

If you want to add something, please submit a pull request. As long as your submission is on topic (tennis analytics), doesn't duplicate something already on the list, and isn't overtly self-promotional or obnoxious, I'll accept it.

(These are numbered for convenience, not to imply any kind of ranking.)

Tennis Research Topics

  1. Are serve-and-volleyers streakier than other players?

  2. How often do betting odds favor the lower-ranked player? Is it more frequent on clay? Is it rare enough that using higher ranking as a proxy for market favorite is acceptable?

  3. Are breaks more frequent immediately after winning a set?

  4. Which players have the greatest variation in serve speed? (Among first serves, or between first and second serves.) Does it have any effect on, say, the success of second serves?

  5. Does winning a very long game (e.g. a game with 6+ deuces) influence the outcome of the next game, or represent some more general momentum shift?

  6. Can Elo ratings be improved to take surface into account?

  7. Can Elo ratings be improved to take missed time (due to injury, etc.) into account?

  8. How much does Elo inflation affect ATP/WTA ratings? Can this be addressed for better era comparisons?

  9. How to integrate challenger (and other lower-level) results into Elo ratings?

  10. Do big servers (Isner/Karlovic) win more return points in tiebreaks (or when close in sets) than otherwise? Might be a proxy for whether they're tanking return games.

  11. Any way to use MCP data -- rally length, shot direction -- to come up with approximate distance run, by combining MCP data with distance run when distance is available?

  12. Who are the best "frontrunners?" Win% when up a break ... when up a set.

  13. Serve + 1 tactic -- who does it the most often? Success rates when attempting it (i.e. winners vs UFEs on third shot) ... relationship to other stats, more general success rate.

  14. Are there players who get a lot of service winners (or unreturned serves in general) but not a lot of aces? Or are aces a reliable proxy for unreturned serves in general?

  15. When serving for set -- does failing to do so swing momentum in the other direction, esp if fail at 6-5 and then play tiebreak?

  16. "life change effect" for late first-time slam winners (like Wawrinka) -- do they play worse than expected for some period after the slam win?

  17. Can we get a more rounded perspective on surface speed using MCP data? (aces + service winners, serve+1, rally length)

  18. Which players best disguise serve direction? (Maybe quantify by comparing ace % to serve speed; best disguises are players with more aces than expected at any given speed.)

  19. What's the difference in success rate immediately after reaching tour level for older players vs younger players?

  20. which players are better/worse on more volatile points?

  21. effect of serve speeds -- slow serving doesn’t help, but how much does it hurt? is there a magic number for men? women? which players win the most service points despite slow serving, and vice versa? [see donald young, decent data from 2011 usopen]

  22. Dominance, as measured by % of available ranking points, or % of ranking points of entire top 100. Could also be used to measure best #2, #3, or best top 2, top 5, etc.

  23. Prob of comeback from one set down -- who is good/bad at coming back, good/bad at preventing it?

  24. Probability of withdrawals/retirements--more likely for underdogs, etc?

  25. Players who excel against higher-ranked opponents, less successful against lower-ranked opps.

  26. Is it increasingly difficult to hold deeper into a set? Into a match?

  27. Are players really more likely to break after being broken themselves?

  28. Are players using challenges wisely? (use leverage index on challenge situations)

  29. Based on LEV, do double faults occur at crucial moments? Does this vary player to player?

  30. Win prob effect of missing first serve

  31. Hot hand, using point-by-point:

  32. aces
  33. double faults
  34. first serves
  35. points won
  36. points won in each court

  37. Deuce/ad court differences

  38. certain players with stronger differences?
  39. lefty/righty differences?
  40. ace / 1st sv% differences?

  41. AGING

  42. What is typical peak?
  43. Does peak vary with (a) player type (big servers vs. counterpunchers), (b) country of origin, (c) college experience / age of going pro
  44. If you break into top 100 (200? etc.) by age 18/19/20/21/etc., what is probability of reaching top 10/etc.?
  45. Of all players who reached top 10 (/5/1/etc.), how early/late did they break into top (200/100/50/etc.)
  46. Aging delayed for injury (less time on court)?
  47. Aging delayed for clay-courters?
  48. adjustments for height/weight? (may overlap with player type)

  49. Comparing scoring systems

  50. maximize average leverage index
  51. also maximize p(better player wins)
  52. consider time management aspect

  53. Prize money

  54. more/less predictive than ranking points?
  55. theory that players want to maximize points and/or prize money, means ranking points are consistently predictive despite quirks/changes in system?

  56. Quantifying volatility in rankings -- particular levels (mid-100) where rankings are particularly meaningless/volatile?

  57. Challenges

  58. Are players more likely to win the point after a successful challenge than they are for all other points in match with same serving situation?
  59. How about after unsuccessful. i.e. any carryover psychologically. I ask because I know of researchers looking at carryover effects from points in which a player wins with lucky net cords.
  60. If we have names of umps for each match, or court, maybe check if number of successful challenges is tied to particular umps (bad eyes, or hesitant to overrule) or courts (maybe some harder to see or call).

  61. Certain players less successful after first matchup with a given player?

  62. Or, is there an initial surge into the highest level, like Raonic or Dolgopolov?
  63. theory being, players get familiar with your game, they learn how to beat it

  64. Age distribution at different tournaments, particularly Challenger vs ATP

  65. Lefties

  66. prevalence over time
  67. most/least successful against them?
  68. tactics that are more/less successful?
  69. different aging patterns?
  70. more common player types or surface preferences among lefties?
  71. RH/LH doubles combos more successful?

  72. Lefty tactics -- are they different than RH tactics? Do they play differently vs other lefties? In what ways is Nadal a typical lefty?

  73. Clutch: using leverage index, see who does the best in high-leverage points

  74. when 2 players meet, and one wins x% of points on serve, while opponent’s opponents generally win y% of service points, what is result? It's probably not exactly (x+y)/2, and it depends if x or y is exactly the average for service-points won -- if it is, then the other should dominate the result. Can extend this to other stats -- ace percentage, double-fault rate, and, where available, winner and unforced-error rate. And maybe even if enough data, get more sophisticated -- does extent to which result falls between x and y depend on surface? On home court? On experience levels/age/ranking of both players? On how many times they’ve played?

  75. which players choose tournaments most strategically, playing in ones with easiest fields relative to points/prize money on offer? (and relatedly, which tournaments are best to play in, in those terms)

  76. relatedly which players optimize their wins most/least, for most bang for buck i.e. clustering wins at one tournament for more ranking points/prize money vs. a guy who always wins in first round, loses in second

  77. What would be the most exciting format, if tennis were to revamp its score system (think World TeamTennis)? E.g. how would a series of best of 9 supertiebreaks work instead of sets? You’d probably want a system that maximizes expected median leverage over points, or 25th percentile leverage, while also delivering the better player the win some high enough percentage of the time.

  78. Relatedly, compare average leverage/volatility to other sports during average match

  79. Tweak the doubles ranking to take into account singles ranking. how much does one affect the other? What is optimal use of singles ranking to make dubs ranking more predictive, and vice versa?

  80. Separate the home-court advantage among courts that do and don’t have a challenge system

  81. Which tourneys have the most upsets, or most surprising results?

  82. Whether players get better at returning tough servers after the first or second time they play, or in second or third set

  83. betting lines, what influences them (some sort of regression model on rankings and other factors), any wacky ones that stand out and might be signs of funny business, or perhaps just low volume

  84. injury patterns, how they vary based on playing style, and who are biggest outliers

  85. guys who have the biggest variation between sets (bageled in one set, bagels opponent in next)

  86. Difficulty of winning masters 1000 vs. major

  87. time per point, game, set, match on various surfaces

  88. Does time per point decrease as careers go on (and guys shorten points to reduce stress on their bodies)

  89. controlling for everything else, once past first round of tourney is it better to have had a bye or not? [not sure how to control for everything else, because top eight seeds usually are ones with byes. Maybe compare how No. 9-16 seeds do at IW and Miami vs. at other Masters?]

  90. what's more predictive of success at majors, other tourneys on that surface; or other majors, regardless of surface; or most recent results

  91. break rates in doubles w or w/o no-ad scoring

  92. upset rates w/ no ad scoring, 10-pt tiebreak

  93. players who fade during matches, or get better during matches, in terms of score difference over successive sets

  94. which tournaments play most like clay, grass, how can we define each? particularly useful for categorizing various hard courts [what stats characterize them, in terms of length of points, ace rate, break rate]

  95. does home court matter more in decisive sets (third in best of 3, fifth in best of 5)?

  96. analyze tour schedule for average weather at outdoor locations -- is sked optimized to avoid extreme heat or cold and precipitation?

  97. categorize players into different styles by length of rallies, ace rate, aced rate, break rate, broken rate, and perhaps height/weight. then see how various groups match up -- is there some predictive power, above and beyond your rankings and H2H, by using player type?

  98. also based on categorizing players, is there a correlation between player type (or spot on aggressiveness spectrum) and consistency/inconsistency?

  99. are there fewer upsets among men in general than women? Perhaps rankings matter less for women (i.e. Serena)?

  100. percentage of sets that are 6-1 that have 2 breaks vs. 3, and same for 6-3 sets (1 break or 2)

  101. how close are games after a player goes up 1 or 2 breaks in a set, compared with earlier in the set? do players kinda tank the set, particularly if it doesn’t mean they lose the match

  102. who clusters return points most/least, is there overall clustering beyond what you’d expect from randomness (which would suggest returners are saving energy when down 30-0 or 40-0, or that servers are streaky with their serves)?

  103. How common are breaks at love, who has the most such breaks and the most times broken at love?

  104. deciding points percentages in doubles -- how often does return team win, and how does that compare to the same teams’ success overall in return games

  105. return points won for each player in doubles team, how that corresponds to who takes deciding pts -- is it the guy who usually wins more return points who returns, the one who is winning more that day, or is it random or qualitative who is chosen?

  106. hold percentages for first vs. second server in doubles

  107. mixed doubles: men/women hold percentages

  108. percentage of time server is broken when reaches deuce, man vs. woman, compare different players

  109. is there more (or less) than expected clustering of aces, double faults, winners, unforced errors, points won during matches

  110. hold percentages with new balls, with oldest balls

  111. how often after saving break points a player breaks in next game

  112. aces, double faults on break point, set point, match point, etc., do they differ from rates in rest of match for same match

  113. percentage of match points won on serve, on return -- and percentage of time loser held match points, and that holder of match points loses (can derive second from first)

  114. how often servers are broken in deciding game of set, is it more or less often than would be expected, who is best/worst at breaking or avoiding being broken in that game

  115. do players hit more or fewer unforced errors/winners when have to hold serve to stay in set?

  116. streakiest and steadiest players, game to game and set to set

  117. turning points in matches -- Is it possible to define such moments quantitatively, and then does anything characterize them or are they random

  118. relatedly, do rain delays or injury delays or other sorts of delays look like breaking points/turning points or are any differences between parts of matches before and after such delays about what you’d expect

  119. effect of return errors/winners on subsequent serves, do they lead to more aces, dfs

  120. are consecutive points/games won at about the rate you’d expect by chance (simulate matches then see if actual ones are more/less random)

  121. ace/double fault rates at various scores

  122. fed is said to have the fewest unreturned returns. is that true? who does?

  123. do bk pt conversions on 30-40 vary depending on how score got there (was it 0-40 then 30-40, or 30-0 then 30-40)

  124. are service games closer later in set

  125. does winner/ufe percentage stay steady between sets

  126. are there more breaks later in matches -- are players starting to time serves?

  127. shots per unforced error/winner/forced error, how that varies per player

  128. winning percentage from various spots on court, various speed shots

  129. how certain players do after certain number of shots in rally

  130. percentage of balls in net, men vs. women (and net clearance)

  131. for various players, percentage of points lost from winners, vs. errors into net, long, wide

  132. is there bias in results of challenges for certain players or types of players, like big servers? i.e. do big servers have in serves called out more than others?

  133. advantages for players after being the beneficiary of luck -- hit line, or lucky net cord? do such players lose next point because of unconscious guilt? These guys found that happened to lucky winners of points: http://www.sciencedirect.com/science/article/pii/S0022103110001241

  134. which kinds of shots force errors, and are winners

  135. can wind be guesstimated from data for a match?

  136. speed of serve, shots vs. success -- how does ue/w/pt won rate vary by speed of shot?

  137. success at net when brought to net by opponent’s short ball vs. choosing to come to net

  138. effect of return errors on subsequent serves, in terms of speed, placement

  139. challenges midpoint -- interrupting a point you’re still in: how often successful

  140. winning percentage on second serves after unsuccessful challenge on out call on first serves

  141. whether players are good at game theory, mixing up shots optimally based on how successful different ones are (like penalty kickers in soccer: http://www.slate.com/id/2144182/)

  142. what percentage of serves are aces/unreturned/won in 3 shots/4, etc. how this varies by player, surface, and stage of match

  143. how common are service-return winners, how these vary by player, surface, and stage of match

  144. are there more overheads hit in first set of 5 setter vs. fifth set, are they more or less successful

  145. test the notion that fewer breaks occur during the first few games in a match

  146. do players change their approach after a double fault, taking something off their serve to get the next first serve in, and do the receivers attack those points differently?

  147. is the same true at break point

  148. quantify the impact of the new atp 250 qualification draw rules (draw size reduced from 32 to 16, only two rounds need to be won, still 4 qualifiers into main draw). is it harder/easier now for youngsters to jump to atp level via qualies (see thiem 2014, kokkinakis 2015)? how often did players enter a quali-draw pre-2016 who would now not be able to compete (due to smaller draw size)? how often did these players successfully qualify? how did they perform in the main draw (compared to the other qualifiers, who would have also entered the 16-draw size quali draws)?
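
A possible starting point for topic 74, sketched under a stated assumption: it uses the odds-ratio ("log5") adjustment, which is a choice made here for illustration and not a method this list endorses. The function name `combine_rates` and the tour-average figure in the usage note are hypothetical. The adjustment has the property the topic describes: when one of the two rates equals the tour average, the other rate fully determines the result.

```python
def combine_rates(x, y, tour_avg):
    """Estimate the server's expected service-point win rate in one matchup.

    x        -- share of service points the server usually wins
    y        -- share of service points the returner's opponents usually win
    tour_avg -- tour-wide share of service points won (assumed input)

    All three are probabilities strictly between 0 and 1.
    Uses the odds-ratio ("log5") adjustment; this is an illustrative
    assumption, not a validated tennis model.
    """
    def odds(p):
        return p / (1.0 - p)

    # Multiply the two players' odds and divide out the baseline once,
    # because both x and y already contain the tour-average effect.
    matchup_odds = odds(x) * odds(y) / odds(tour_avg)
    return matchup_odds / (1.0 + matchup_odds)


if __name__ == "__main__":
    # Hypothetical inputs: a 70% server against a returner who usually
    # allows 60%, with an assumed tour average of 64%.
    print(round(combine_rates(0.70, 0.60, 0.64), 3))
```

With these made-up inputs the estimate is roughly 0.663, not the simple average of 0.65. Whether real results follow this curve, and how surface or experience shift it, is exactly the open question.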
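
Topics 102, 109 and 120 all ask whether outcomes cluster more than chance would predict. One hedged way to frame that is a permutation test on the number of runs: shuffle the observed point outcomes many times and see how often the shuffled sequence has as few runs as the real one. The sketch below is a simplification: it treats every point as exchangeable, so in practice it should be applied to one player's service points (or return points) at a time. The function names are hypothetical.

```python
import random


def count_runs(outcomes):
    """Count maximal blocks of identical consecutive outcomes."""
    if not outcomes:
        return 0
    return 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)


def streakiness_p_value(outcomes, n_sims=2000, seed=0):
    """Share of random shuffles with at most as many runs as observed.

    A small value means the real sequence has unusually few runs,
    i.e. wins and losses are more clustered than chance suggests.
    Shuffling keeps the total number of points won fixed.
    """
    rng = random.Random(seed)
    observed = count_runs(outcomes)
    shuffled = list(outcomes)
    at_most = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if count_runs(shuffled) <= observed:
            at_most += 1
    return at_most / n_sims


if __name__ == "__main__":
    # Made-up sequence: 1 = point won by server, 0 = point lost.
    points = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1]
    print(count_runs(points), streakiness_p_value(points))
```

The same function works for aces, double faults, or games won, as long as each event is coded as 0 or 1 in playing order.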
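
Several topics (28, 29, 50, 73, 77) lean on leverage index. As a rough illustration only, the sketch below computes leverage within a single service game: the difference in the server's chance of holding depending on whether the next point is won or lost. It assumes a constant point-win probability `p` and regular (ad) scoring, and it stops at the game level, whereas a full leverage index would be measured against match win probability. The function names are hypothetical.

```python
def game_win_prob(p, a, b):
    """Chance the server wins the game from a points to b points.

    p is the server's probability of winning any single point
    (assumed constant). a and b count points won, so 40-30 is (3, 2).
    """
    q = 1.0 - p
    if a >= 4 and a - b >= 2:
        return 1.0  # server has already won the game
    if b >= 4 and b - a >= 2:
        return 0.0  # returner has already won the game
    if a >= 3 and b >= 3:
        # Deuce or advantage: closed form for winning from deuce.
        from_deuce = p * p / (p * p + q * q)
        if a == b:
            return from_deuce
        if a > b:
            return p + q * from_deuce  # advantage server
        return p * from_deuce          # advantage returner
    return p * game_win_prob(p, a + 1, b) + q * game_win_prob(p, a, b + 1)


def point_leverage(p, a, b):
    """Swing in hold probability between winning and losing this point."""
    return game_win_prob(p, a + 1, b) - game_win_prob(p, a, b + 1)


if __name__ == "__main__":
    p = 0.64  # hypothetical service-point win rate
    for score in [(0, 0), (3, 0), (2, 3), (3, 3)]:
        print(score, round(point_leverage(p, *score), 3))
```

Extending the same recursion to sets and matches gives the match-level version, which is what questions about clutch play or alternative scoring formats would need.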
