author: Gabriel Octavio Lozano Pizón 

email: golozanop@unal.edu.co

# Building an ancestry tree of mathematicians from their thesis advisors


Why? Well, I found this web page, https://genealogy.math.ndsu.nodak.edu/, and thought: wouldn't it be cool to see your mathematics department's ancestry? I wish there were an option for that! Since there is none, I'll build it myself, with little to no knowledge of any of the tools I plan to use. Anyway, here we go!




## Can I even make a single ancestry tree?

Before committing too much, I want to verify that I can draw a simple ancestry tree. After a strenuous 5-minute Google search I found https://plot.ly/python/tree-plots/, which left me frustrated because it is far too powerful a tool for what I actually wanted. Luckily, I later found this: https://stackoverflow.com/questions/10379448/plotting-directed-graphs-in-python-in-a-way-that-show-all-edges-separately which is extremely simple, just like I need it to be.

In [13]:
from graphviz import Digraph

dot = Digraph()
dot.node('1111', 'Some guy')
dot.node('2222', 'Another Guy')
dot.node('3333', 'Weird Guy')
dot.edge('1111','2222')
dot.edge('1111','3333')
dot.node('4444','Random Girl')
dot.edge('4444','1111')

print(dot.source)
dot.render("ejemplo", view=True)

digraph {
	1111 [label="Some guy"]
	2222 [label="Another Guy"]
	3333 [label="Weird Guy"]
	1111 -> 2222
	1111 -> 3333
	4444 [label="Random Girl"]
	4444 -> 1111
}


'ejemplo.pdf'

The output is a PDF, but to show it here I'm just converting it online and putting the image here!

<img src='ejemplo.png'>

# Can I parse the data from the web page?

It's nice to be able to draw the graphs, but now I need to get the data somehow! Let's start with one of my teachers, Juan Carlos Galvis, PhD. If you search for him on the site I linked above, this is his page:

https://genealogy.math.ndsu.nodak.edu/id.php?id=120652

So I'm thinking everybody has a unique ID, and I can reach each person's page with more or less the same link. GREAT! Now on to the web scraping (this is the first time in my life I've done it, so please forgive my ignorance, because I know everything is wrong!).
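To make the "same link, different ID" idea concrete, here is a minimal sketch using only the standard library. The helper names `id_to_url` and `url_to_id` are made up for illustration and are not used elsewhere in this notebook; the only thing taken from the site is the URL pattern shown above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address taken from the page linked above
BASE_URL = 'https://genealogy.math.ndsu.nodak.edu/id.php'

def id_to_url(person_id):
    """Build the page URL for a given ID (hypothetical helper)."""
    return BASE_URL + '?' + urlencode({'id': person_id})

def url_to_id(url):
    """Pull the 'id' query parameter back out of a page URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get('id')
    return values[0] if values else None

print(id_to_url('120652'))
print(url_to_id('https://genealogy.math.ndsu.nodak.edu/id.php?id=120652'))
```

Going through `urlencode` and `parse_qs` rather than string concatenation is just a design preference here: it keeps working if the ID arrives as an integer or if the URL carries extra query parameters.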

In [33]:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
from collections import deque

In [3]:
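def getAdvisorsID(id):
    # Download the page of the person with the given ID (passed as a string)
    url = 'https://genealogy.math.ndsu.nodak.edu/id.php?id='+id
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Prettify once and reuse the lines instead of redoing it for every match
    lines = soup.prettify().split('\n')
    # The advisor link is expected on the line right after each line mentioning 'Advisor'
    idxs = [i for i, v in enumerate(lines) if 'Advisor' in v]
    # Keep what follows the last 'id' on that line, minus the leading '=' and the trailing '">'
    return [lines[i+1].split('id')[-1][1:-2] for i in idxs]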
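The line-offset trick above depends on exactly how `prettify()` lays out the page. As a hedged alternative, the sketch below matches the advisor links directly with a regular expression. `SAMPLE_HTML` is a hand-written stand-in, not a copy of the real page, so the markup it assumes (an `Advisor` label followed by an `id.php?id=...` link) should be checked against the actual page source. The function name `advisor_ids_from_html` is hypothetical too.

```python
import re

# Hypothetical stand-in for the relevant part of a profile page; the real
# markup may differ, so compare with the page source before relying on this.
SAMPLE_HTML = '''
<p>Advisor 1: <a href="id.php?id=44584">Matthias Lesch</a><br />
Advisor 2: <a href="id.php?id=146589">Sylvie Paycha</a></p>
<p>Advisor: Unknown</p>
'''

# An 'Advisor' label, an optional number, a colon, then a link with a numeric id
ADVISOR_LINK = re.compile(r'Advisor(?:\s+\d+)?:\s*<a\s+href="id\.php\?id=(\d+)"')

def advisor_ids_from_html(html):
    """Return the numeric IDs (as strings) of every linked advisor in the HTML."""
    return ADVISOR_LINK.findall(html)

print(advisor_ids_from_html(SAMPLE_HTML))
```

Because the pattern only captures digits, an entry without a link (like the `Unknown` line in the sample) simply produces no match, so no separate integer check is needed afterwards.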

In [35]:
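def id2Name(id):
    url = 'https://genealogy.math.ndsu.nodak.edu/id.php?id='+id
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Line 6 (zero-based) of the prettified HTML is expected to start with the name,
    # followed by a '-'; note that a hyphenated name would be cut at its first hyphen
    return soup.prettify().split('\n')[6].split('-')[0].replace('  ',' ')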

In [24]:
def representsInt(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

In [5]:
%%time
print(getAdvisorsID('120652'))
print(id2Name('120652'))

['33622']
  Juan Galvis 
CPU times: user 77.8 ms, sys: 3.83 ms, total: 81.7 ms
Wall time: 1.21 s
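Most of that wall time is spent waiting on the network, and `getAdvisorsID` and `id2Name` each download the same page. One possible way to avoid the repeated download is to cache page text per ID. The sketch below shows the idea with `functools.lru_cache`. `make_cached_fetcher` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for `requests.get(url).text` so the example runs offline.

```python
from functools import lru_cache

def make_cached_fetcher(fetch):
    """Wrap a function mapping an ID to page text so each ID is fetched at most once."""
    @lru_cache(maxsize=None)
    def cached(person_id):
        return fetch(person_id)
    return cached

# Fake fetcher standing in for requests.get(url).text, so the sketch runs offline
calls = []
def fake_fetch(person_id):
    calls.append(person_id)
    return '<html>page for ' + person_id + '</html>'

get_page = make_cached_fetcher(fake_fetch)
get_page('120652')   # first call goes to the (fake) network
get_page('120652')   # second call is served from the cache
print(len(calls))    # 1
```

In the notebook, both functions could call a shared cached fetcher like this one instead of calling `requests.get` themselves. I have not timed that change here.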


In [48]:
%%time
# Professors: Juan Galvis, Victor Albis, Mauricio Bogoya, Carolina Neira, Zalamea, Lorenzo Acosta,
# Oswaldo Lezama, Agustin Moreno, John Alexander Cruz, Becerra, Sandra Carolina, Serafin,
# Ignacio Mantilla
starters = ['120652','12363', '151407', '146590', '39024', '31210', '118463','186973', '178856', '163433'
           ,'170514','211371', '26541']
ids = deque(starters)
seen = set()
seen_edges = set()
dot = Digraph()
cnt = 0
# Breadth-first walk up the advisor links, capped at 700 pages
while ids and cnt < 700:
    cnt += 1
    current = ids.popleft()
    if current not in seen:
        dot.node(current, id2Name(current))
        seen.add(current)
    for parent in getAdvisorsID(current):
        # Skip anything the scraper returned that is not a numeric ID
        if not representsInt(parent):
            continue
        if parent not in seen:
            # First time we meet this advisor: queue them and draw their node
            ids.append(parent)
            dot.node(parent, id2Name(parent))
            seen.add(parent)
        if (parent, current) not in seen_edges:
            dot.edge(parent, current)
            seen_edges.add((parent, current))

print(dot.source)
dot.render("Universidad_Nacional_de_Colombia", view=True)

digraph {
	120652 [label="  Juan Galvis "]
	33622 [label="  Marcus Sarkis "]
	33622 -> 120652
	12363 [label="  Victor Albis"]
	6501 [label="  Robert MacRae "]
	6501 -> 12363
	151407 [label="  Mauricio Bogoya López "]
	9329 [label="  Manuel Elgueta "]
	9329 -> 151407
	146590 [label="  Carolina Neira Jimenez "]
	44584 [label="  Matthias Lesch "]
	44584 -> 146590
	146589 [label="  Sylvie Paycha "]
	146589 -> 146590
	39024 [label="  Fernando Zalamea "]
	38925 [label="  Ernest Manes "]
	38925 -> 39024
	31210 [label="  Lorenzo Acosta Gempeler "]
	31250 [label="  Artibano Micali "]
	31250 -> 31210
	118463 [label="  José Lezama "]
	85202 [label="  Zenon Borevich "]
	85202 -> 118463
	186973 [label="  Agustín Moreno Cañadas "]
	85176 [label="  Alexander Zavadskij "]
	85176 -> 186973
	178856 [label="  John Alexander Cruz Morales "]
	36975 [label="  Martin Guest "]
	36975 -> 178856
	178857 [label="  Hokuto Uehara "]
	178857 -> 178856
	163433 [label="  Edward Becerra "]
	58456 [label="  Bernardo Ur

'Universidad_Nacional_de_Colombia.pdf'
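The loop that produced this graph is a breadth-first search over advisor links. To see the traversal without any scraping, here is a minimal offline sketch. `collect_ancestry` and the `advisors_of` dictionary are hypothetical stand-ins for the loop and for `getAdvisorsID`, and the toy data reuses a few of the IDs from the output above.

```python
from collections import deque

def collect_ancestry(starters, advisors_of):
    """Breadth-first walk over advisor links.

    advisors_of maps an ID to the list of its advisors' IDs (an in-memory
    stand-in for getAdvisorsID). Returns the nodes in discovery order and
    the (advisor, student) edges in the order they were found.
    """
    nodes = list(dict.fromkeys(starters))  # drop duplicate starters, keep order
    seen = set(nodes)
    queue = deque(nodes)
    edges = []
    seen_edges = set()
    while queue:
        current = queue.popleft()
        for parent in advisors_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                nodes.append(parent)
                queue.append(parent)
            if (parent, current) not in seen_edges:
                seen_edges.add((parent, current))
                edges.append((parent, current))
    return nodes, edges

# Toy data built from IDs that appear in the output above
toy = {'120652': ['33622'], '146590': ['44584', '146589']}
nodes, edges = collect_ancestry(['120652', '146590'], toy)
print(nodes)
print(edges)
```

Unlike the cell above, this sketch marks the starting IDs as seen before the walk begins, so a starter who is also someone's advisor is only expanded once. Feeding `nodes` and `edges` into `dot.node` and `dot.edge` would then draw the same kind of graph.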

# The End

I hope you liked my little 6-hour project! It was fun to make. Now go and check all your teachers' teachers.

If you have any questions about how to modify something, email me at golozanop@unal.edu.co