# API

## The Problem

One of the wrestlers I've followed over the years is Luke Zilverberg, who wrestles at 157 pounds for SDSU. He placed 8th in 2018, but (as far as I understand) he was not particularly highly ranked in high school. I'd like to go back over the various wrestling web sites and try to mine past results; perhaps we could build a machine learning system that produces better rankings.


## The Open Mat

We'll start with a web site that has a very simple API: we simply pass the text query as the `s` parameter of the URL.

In [24]:
query = 'https://news.theopenmat.com/?s=zilverberg'
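Because the search term is just the `s` parameter of the query string, we don't have to paste it into the URL by hand. A minimal sketch using the standard library's `urllib.parse.urlencode`, which escapes spaces and other special characters in a wrestler's full name (the helper `openmat_search_url` is a hypothetical name, not part of any site API):

```python
from urllib.parse import urlencode

OPENMAT_BASE = "https://news.theopenmat.com/"

def openmat_search_url(term: str) -> str:
    """Build a search URL for The Open Mat (hypothetical helper).

    urlencode escapes the term, e.g. turning spaces into '+'.
    """
    return OPENMAT_BASE + "?" + urlencode({"s": term})

print(openmat_search_url("zilverberg"))       # https://news.theopenmat.com/?s=zilverberg
print(openmat_search_url("luke zilverberg"))  # https://news.theopenmat.com/?s=luke+zilverberg
```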

The search results are then loaded like any other web page:

In [25]:
from lxml import html
import requests
page = requests.get(query)
tree = html.fromstring(page.content)
print(tree)

<Element html at 0x124c57350>


We are mostly interested in the hyperlinks, so we collect the `href` attribute of every `<a>` element:

In [26]:
link_nodes = tree.xpath('//a/@href')
print(link_nodes[0:6])

['https://www.facebook.com/TOMwrestling/', 'https://www.twitter.com/theopenmat', 'https://www.instagram.com/theopenmat/', 'https://news.theopenmat.com/feed/rss', 'https://news.theopenmat.com/', 'https://news.theopenmat.com/category/college-wrestling-news']


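We further limit our crawling to ranking pages, i.e. links whose URL contains `rankings` and ends in a numeric ID, and try to read any HTML tables on each of them with `pandas.read_html`.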

In [36]:
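import io
import re
import pandas

for link in link_nodes:
    # Keep only ranking articles: URLs containing "rankings" that end in a numeric ID
    if re.search(r'rankings.*/[0-9]+$', link) is not None:
        print(link)
        page = requests.get(link)
        try:
            # Parse the HTML we already downloaded instead of fetching the page twice
            tbl = pandas.read_html(io.StringIO(page.text))
            print(tbl)
        except ValueError:
            # read_html raises ValueError when the page contains no <table>
            print("No tables at " + link)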

https://news.theopenmat.com/college-rankings/toms-ncaa-di-wrestling-team-rankings-tournament-dual-january-8th-2020/76100
[    Rank              Team  Tournament Points Previous  Unnamed: 4  Rank.1  \
0      1              Iowa              132.5        1         NaN       1   
1      2        Penn State               79.5        2         NaN       2   
2      3        Ohio State               56.0        3         NaN       3   
3      4         Minnesota               54.0        6         NaN       4   
4      5            Purdue               48.5       13         NaN       5   
5      6     Arizona State               46.5        3         NaN       6   
6      7    Oklahoma State               45.5        8         NaN       7   
7      7         Wisconsin               47.5        5         NaN       8   
8      9      Northwestern               42.0       10         NaN       9   
9     10          Nebraska               41.0        7         NaN      10   
10    11        Iowa
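The filter above keeps links that contain `rankings` and end in a numeric path segment, which on The Open Mat appears to be an article ID. A small sketch that captures that ID, so it could serve as a key to avoid downloading the same article twice (the helper `ranking_article_id` is hypothetical):

```python
import re

# Same idea as the loop above: "rankings" somewhere in the URL,
# and a final path segment that consists only of digits.
RANKING_LINK = re.compile(r"rankings.*/(\d+)$")

def ranking_article_id(link: str):
    """Return the trailing numeric ID of a ranking link, or None if it isn't one."""
    match = RANKING_LINK.search(link)
    return int(match.group(1)) if match else None

links = [
    "https://news.theopenmat.com/college-rankings/toms-ncaa-di-wrestling-team-rankings-tournament-dual-january-8th-2020/76100",
    "https://news.theopenmat.com/category/college-wrestling-news",
]
for link in links:
    print(ranking_article_id(link))  # 76100, then None
```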

## FloWrestling
FloWrestling has a similar API, but it also lets us restrict the search to a category with the `type` parameter (each category is also available as a tab on the main page).

In [37]:
query = 'https://www.flowrestling.org/search?q=zilverberg&type=ranking'
page = requests.get(query)
tree = html.fromstring(page.content)
link_nodes = tree.xpath('//a/@href')
print(link_nodes[0:6])

['https://arena.flowrestling.org', '/articles', '/events', '/rankings', '/results', '/training']
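Unlike The Open Mat, most of Flo's links are relative (`/articles`, `/rankings`), so they have to be resolved against the site root before we can request them. A sketch using `urllib.parse.urljoin`, which leaves absolute links such as `https://arena.flowrestling.org` untouched and also drops duplicates while keeping the original order (the helper `absolute_links` is hypothetical):

```python
from urllib.parse import urljoin

def absolute_links(hrefs, base="https://www.flowrestling.org/"):
    """Resolve each href against base and drop duplicates, keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        url = urljoin(base, href)  # absolute hrefs are returned unchanged
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result

print(absolute_links(['https://arena.flowrestling.org', '/articles', '/events', '/articles']))
# ['https://arena.flowrestling.org', 'https://www.flowrestling.org/articles', 'https://www.flowrestling.org/events']
```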


In [38]:
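import io
from urllib.parse import urljoin

for link in link_nodes:
    if 'recruiting' in link:
        # Flo's links are relative, so resolve them against the site root
        link_url = urljoin('https://www.flowrestling.org', link)
        page = requests.get(link_url)
        try:
            tbl = pandas.read_html(io.StringIO(page.text))
            print(tbl)
        except ValueError:
            # read_html raises ValueError when the page contains no <table>
            print("No tables at " + link_url)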

No tables at https://www.flowrestling.org/rankings/6194310-2018-recruiting-class-rankings/28105-2018-recruiting-class-rankings
No tables at https://www.flowrestling.org/rankings/6031443-2013-recruiting-class-rankings/14110-2013-recruiting-class-rankings
No tables at https://www.flowrestling.org/rankings/6031439-2012-recruiting-class-rankings/14106-2012-recruiting-class-rankings
No tables at https://www.flowrestling.org/rankings/6031438-2011-recruiting-class-rankings/14105-2011-recruiting-class-rankings
No tables at https://www.flowrestling.org/rankings/6031437-2010-recruiting-class-rankings/14104-2010-recruiting-class-rankings
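Even without tables, the recruiting links carry some structure: each path seems to follow the pattern `/rankings/<id>-<year>-recruiting-class-rankings/...`. Assuming that pattern holds, a sketch that pulls out the class year, so we could pick out a particular recruiting class (the helper `recruiting_class_year` is hypothetical, and the pattern is inferred only from the links printed above):

```python
import re

# Assumed URL pattern, inferred from the links printed above.
CLASS_YEAR = re.compile(r"/rankings/\d+-(\d{4})-recruiting-class-rankings/")

def recruiting_class_year(url: str):
    """Return the recruiting class year encoded in a Flo ranking URL, or None."""
    match = CLASS_YEAR.search(url)
    return int(match.group(1)) if match else None

urls = [
    "https://www.flowrestling.org/rankings/6194310-2018-recruiting-class-rankings/28105-2018-recruiting-class-rankings",
    "https://www.flowrestling.org/rankings/6031443-2013-recruiting-class-rankings/14110-2013-recruiting-class-rankings",
]
print([recruiting_class_year(u) for u in urls])  # [2018, 2013]
```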


Compare this with a search restricted to results (`type=result`). This query also includes some of the other parameters the search accepts: `page`, `limit` and `sort`.

In [40]:
query = 'https://www.flowrestling.org/search?q=zilverberg&page=1&limit=10&sort=recent&type=result'
page = requests.get(query)
tree = html.fromstring(page.content)
link_nodes = tree.xpath('//a/@href')
print(link_nodes[0:10])

['https://arena.flowrestling.org', '/articles', '/events', '/rankings', '/results', '/training', '/films', 'http://www.flosports.tv/work-with-us', 'http://www.flosports.tv/contact/', 'http://www.flosports.tv/faq/']
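Since the search is driven entirely by these URL parameters, we could step through several pages of results by varying `page`. A sketch, assuming `page` simply selects successive result pages (the generator `flo_search_urls` is hypothetical; the parameter names are copied from the query above, with `kind` passed as `type`):

```python
from urllib.parse import urlencode

FLO_SEARCH = "https://www.flowrestling.org/search"

def flo_search_urls(term, kind="result", pages=3, limit=10, sort="recent"):
    """Yield Flo search URLs for pages 1..pages (hypothetical helper)."""
    for page in range(1, pages + 1):
        # dict order fixes the parameter order in the generated URL
        params = {"q": term, "page": page, "limit": limit, "sort": sort, "type": kind}
        yield FLO_SEARCH + "?" + urlencode(params)

for url in flo_search_urls("zilverberg", pages=2):
    print(url)
```

The first URL produced is identical to the results query used above.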


In [41]:
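import io
from urllib.parse import urljoin

for link in link_nodes:
    if 'championship' in link:
        # Resolve the relative link against the site root
        link_url = urljoin('https://www.flowrestling.org', link)
        page = requests.get(link_url)
        try:
            tbl = pandas.read_html(io.StringIO(page.text))
            print(tbl)
        except ValueError:
            # read_html raises ValueError when the page contains no <table>
            print("No tables at " + link_url)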

No tables at https://www.flowrestling.org/results/6144120-2018-ncaa-championships/24035
No tables at https://www.flowrestling.org/results/6129591-big-12-championship-2018-ncaa-wrestling/23978


This example is a bit unfair: the pages returned from Flo don't look like ordinary HTML (or any other markup language, for that matter), so `read_html` finds no tables in them. Some of the sites in the Exercises have a more accessible API but, unfortunately, Flo is probably the most complete source.
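Since `pandas.read_html` raises a `ValueError` when it finds no tables, the loops above detect pages like Flo's only through the `except` branch. A cheaper check, sketched with the standard library's `html.parser` (the class `TableCounter` and function `count_tables` are hypothetical names), counts the `<table>` elements before handing a page to pandas:

```python
from html.parser import HTMLParser

class TableCounter(HTMLParser):
    """Count <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TABLE> also matches
        if tag == "table":
            self.tables += 1

def count_tables(html_text: str) -> int:
    counter = TableCounter()
    counter.feed(html_text)
    counter.close()
    return counter.tables

print(count_tables("<html><body><div id='app'></div></body></html>"))  # 0
print(count_tables("<TABLE><tr><td>1</td></tr></TABLE><table></table>"))  # 2
```

In the loops above, `count_tables(page.text) > 0` could then guard the call to `pandas.read_html`.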

## Trackwrestling
www.trackwrestling.com also has an extensive database, but its API is more cryptic: the URLs carry session parameters (`TIM`, `twSessionId`) alongside the profile ID `twId`. I'll include it for illustration only.

In [44]:
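# URL copied from a browser session; TIM and twSessionId are session-specific
# and will most likely need to be refreshed before this request works again
query = 'https://www.trackwrestling.com/tw/membership/ViewProfile.jsp?TIM=1580939416681&twSessionId=zvsxsjdwnd&twId=2098662009'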
page = requests.get(query)
tree = html.fromstring(page.content)
link_nodes = tree.xpath('//a/@href')
print(link_nodes)

['TWMemberList.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&', 'MyTrackSignIn.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=2098662009', 'MyTrackSignIn.jsp?createAccount&wrestler&TIM=1580960890358&twSessionId=zvsxsjdwnd&', '../Login.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd', '../seasons/index.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd', '../PortalPlayer.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&portalId=457091009', '../PortalPost.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&portalId=457091009', 'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=126104009', 'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=9881107', 'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=53400009', 'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=1880111009', 'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=994012096', 'MemberRankings.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd', 'ViewClub.jsp?TIM=1580960890358&twSessionId=zvsxsjd
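Many of these links share the form `ViewProfile.jsp?...&twId=<id>`. The `TIM` value appears to be a millisecond timestamp and `twSessionId` a session token, so `twId` seems to be the part that identifies a wrestler. A sketch using `urllib.parse.parse_qs` that collects the IDs of the linked profiles and skips every other kind of link (the helper `profile_ids` is hypothetical, and the interpretation of the parameters is an assumption):

```python
from urllib.parse import urlsplit, parse_qs

def profile_ids(hrefs):
    """Return the twId values of ViewProfile.jsp links, in order."""
    ids = []
    for href in hrefs:
        parts = urlsplit(href)
        # Only profile pages; other pages (e.g. sign-in links) may also carry a twId
        if not parts.path.endswith("ViewProfile.jsp"):
            continue
        params = parse_qs(parts.query)
        if "twId" in params:
            ids.append(params["twId"][0])
    return ids

hrefs = [
    'MyTrackSignIn.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=2098662009',
    'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=126104009',
    'ViewProfile.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd&twId=9881107',
    'MemberRankings.jsp?TIM=1580960890358&twSessionId=zvsxsjdwnd',
]
print(profile_ids(hrefs))  # ['126104009', '9881107']
```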

# Exercises

# 1

Note that in our first example from The Open Mat, some of the links are of the form
`https://news.theopenmat.com/page/2?s=zilverberg`. Write a script that parses the results from
https://news.theopenmat.com/?s=zilverberg and iterates over these additional pages in turn.


# 2

Iterate over each wrestler in the `ncaa2018.csv` file and write a script that crawls the web for wrestling results or rankings (perhaps video?) for each wrestler.

# 3

Visit the web sites [Quant Wrestling](https://quantwrestling.com/) and [The Intermat](https://intermatwrestle.com/rankings/college/).
Can you write code to load the rankings for the 149-pound weight class from each site and compare them?
