# GeoPlot of DH2018 Presenters

## GOAL:
- Create a table of presenters suitable for mapping, with columns for: 1) Presenter, 2) Institution, 3) Presentation Title

### Note 1:
- Some presenters' institutions are identified only by footnote number, and the footnotes are not in order. Use a regular expression to split each presenter entry into its name and footnote number, then match that number to the institution carrying the same footnote.
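A minimal sketch of the matching described in Note 1, assuming the author text looks like `"Name1, Name2"` and the organisation text like `"1: Institution; 2: Institution"`. The function name `match_footnotes` and the sample strings are hypothetical. Building a `{footnote: institution}` dictionary first makes the order of the footnotes irrelevant, and `\d+` (rather than a single `\d`) keeps footnote `10` from being read as `0`.

```python
import re


def match_footnotes(authors_text, orgs_text):
    """Return (name, institution) pairs matched by footnote number.

    Hypothetical helper; assumes authors_text like "Ana Pérez2, Ben Okafor1"
    and orgs_text like "1: University A; 2: University B".
    """
    # Lookup table {footnote number: institution}; \d+ also covers footnotes >= 10
    institutions = {}
    for part in orgs_text.split(';'):
        m = re.search(r'(\d+):\s*(\S.*)', part)
        if m:
            institutions[m.group(1)] = m.group(2).strip()

    pairs = []
    for author in authors_text.split(', '):
        # Split "Ana Pérez2" into the name and its trailing footnote number
        m = re.search(r'(\D+?)\s*(\d+)', author)
        if m and m.group(2) in institutions:
            pairs.append((m.group(1).strip(), institutions[m.group(2)]))
    return pairs


# Footnotes deliberately out of order, as Note 1 warns
print(match_footnotes("Ana Pérez2, Ben Okafor1, Chen Wei10",
                      "10: Institute C; 1: University A; 2: University B"))
```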

### Note 2:
- The presenter is used as the index (unit of measurement), but some presenters gave more than one paper. You could add a layer of analysis that aggregates the presentation titles from the same author/institution into one cell, or plot the presentation titles (or the institutions) instead of the people.
- One way to try different arrangements without creating endless new files is to use pandas DataFrames (see example: NS2018-DF notebook).
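A sketch of the first option in Note 2: group by presenter and institution, join that presenter's titles into one cell, and count them. The rows are hypothetical, laid out like `NSFinal.csv` (`Name`, `Institution`, `Title`); the grouping uses pandas named aggregation.

```python
import pandas as pd

# Hypothetical rows in the same Name/Institution/Title layout as NSFinal.csv
df = pd.DataFrame({
    "Name": ["Ana Pérez", "Ana Pérez", "Ben Okafor"],
    "Institution": ["University A", "University A", "University B"],
    "Title": ["Paper One", "Paper Two", "Paper Three"],
})

# One row per presenter/institution pair; titles joined into one cell and counted
by_presenter = (
    df.groupby(["Name", "Institution"], as_index=False)
      .agg(Titles=("Title", "; ".join), Papers=("Title", "count"))
)
print(by_presenter)
```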

In [3]:
# Import libraries
import csv
import re

import requests
from bs4 import BeautifulSoup

URL = ("https://www.conftool.pro/dh2018/index.php"
       "?page=browseSessions&print=head&doprint=yes&presentations=show")
FIELDNAMES = ['Name', 'Institution', 'Title']

# Create soup (raise an error instead of silently parsing an error page)
r = requests.get(URL)
r.raise_for_status()
soup = BeautifulSoup(r.content, "lxml")

# Open the output file once; 'w' mode means re-running the cell does not duplicate the header or rows
with open('NSFinal.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()

    # For each unit in the schedule
    for slot in soup.find_all("td", {"class": "whitebg topline_printonly leftline_printonly left"}):

        # Skip schedule cells that contain no paper title
        title = slot.find("p", {"class": "paper_title"})
        if title is None:
            continue
        title2 = title.get_text().strip()

        # Get authors and institutions (once per slot, not once per author)
        authors = slot.find("p", {"class": "paper_author"})
        unis = slot.find("p", {"class": "paper_organisation"})
        if authors is None or unis is None:
            continue
        a_list = authors.get_text().split(', ')
        unis2 = unis.get_text()

        # If author institutions are identified via footnote... do this:
        if authors.find("sup"):
            # Map footnote number -> institution, e.g. "1: University of Exeter" -> {"1": "University of Exeter"}
            footnotes = {}
            for u in unis2.split(';'):
                m = re.search(r'(\d+):\s*(\S.*)', u)
                if m:
                    footnotes[m.group(1)] = m.group(2).strip()

            for a in a_list:
                # Split "Leif Isaksen1" into name and footnote number (\d+ also handles footnotes >= 10)
                n = re.search(r'(\D+?)\s*(\d+)', a)
                if n is None:
                    continue
                name = n.group(1).strip()
                num = n.group(2)

                # If the institution footnote number matches the name footnote, write the row
                if num in footnotes:
                    writer.writerow({"Name": name, "Institution": footnotes[num], "Title": title2})

        else:
            # No footnotes: assume every author shares the single listed institution
            u_name = unis2.strip()
            for a in a_list:
                writer.writerow({"Name": a.strip(), "Institution": u_name, "Title": title2})

# Create a Dataframe (Index by Title, Name or Institution)

In [5]:
import pandas as pd
df = pd.read_csv("NSFinal.csv")
df.head()

|   | Name | Institution | Title |
|---|------|-------------|-------|
| 0 | Leif Isaksen | University of Exeter, United Kingdom | Getting to Grips with Semantic and Geo-annotat... |
| 1 | Gimena del Río Riande | Consejo Nacional de Investigaciones Científica... | Getting to Grips with Semantic and Geo-annotat... |
| 2 | Romina De León | Consejo Nacional de Investigaciones Científica... | Getting to Grips with Semantic and Geo-annotat... |
| 3 | Nidia Hernández | Consejo Nacional de Investigaciones Científica... | Getting to Grips with Semantic and Geo-annotat... |
| 4 | Elisa Beshero-Bondar | University of Pittsburgh at Greensburg, United... | An introduction to encoding and processing tex... |


In [16]:
#Number of Unique Presentations
print(len(df.Title.unique()))

#Number of Presenters
print(len(df.Name.unique()))

#Number of Institutions
print(len(df.Institution.unique()))

97
358
213
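One detail about counts like these: `len(...unique())` keeps a missing value (NaN/None) as its own entry, while `Series.nunique()` drops missing values by default, so the two can differ by one if, say, a scraped row has no institution. A small check on a hypothetical column:

```python
import pandas as pd

# Hypothetical column with one missing institution
inst = pd.Series(["University A", None, "University A", "University B"])

print(len(inst.unique()))          # 3: the missing value counts as a distinct entry
print(inst.nunique())              # 2: missing values are dropped (dropna=True by default)
print(inst.nunique(dropna=False))  # 3: same as len(inst.unique())
```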


In [35]:
#Create a new dataframe merging rows with the same title
df_a = df.groupby(['Title'])['Name'].apply(lambda x: '; '.join(x.astype(str))).reset_index()
df_b = df.groupby(['Title'])['Institution'].apply(lambda x: '; '.join(x.astype(str))).reset_index()
df_new = pd.merge(df_a, df_b, on='Title')
df_new

|   | Title | Name | Institution |
|---|-------|------|-------------|
| 0 | 4 Ríos: una construcción transmedia de memoria... | Elder Manuel Tobar Panchoaga | Orgánica Digital |
| 1 | A Corpus Approach to Manuscript Abbreviations ... | Alpo Honkapohja | University of Edinburgh, United Kingdom |
| 2 | Abundance and Access: Early Modern Political L... | Elizabeth Williamson | University of Exeter, United Kingdom |
| 3 | Alfabetización Digital, Prácticas y Posibilida... | Gimena del Rio Riande; Paola Ricaurte Quijano;... | CONICET (Consejo Nacional de Investigaciones C... |
| 4 | An introduction to encoding and processing tex... | Elisa Beshero-Bondar; Martina Scholger; Elli M... | University of Pittsburgh at Greensburg, United... |
| 5 | Archiving Small Twitter Datasets for Text Anal... | Ernesto Priego | City, University of London, United Kingdom |
| 6 | Archivos Abiertos y Públicos para el Postconfl... | Stefania Gallini | Universidad Nacional de Colombia, Colombia |
| 7 | Authorship Attribution Variables and Victorian... | David L. Hoover | New York University, United States of America |
| 8 | Balanceándonos entre la aserción de la identid... | Gunnar Eyal Wolf Iszaevich | Instituto de Investigaciones Económicas UNAM, ... |
| 9 | Beyond Image Search: Computer Vision in Wester... | Leonardo Laurence Impett; Peter Bell; Benoit A... | EPFL, Switzerland and University of Cambridge;... |


In [37]:
#Create a new dataframe merging rows with the same Institution
df_1 = df.groupby(['Institution'])['Name'].apply(lambda x: '; '.join(x.astype(str))).reset_index()
df_2 = df.groupby(['Institution'])['Title'].apply(lambda x: '; '.join(x.astype(str))).reset_index()
df_new2 = pd.merge(df_1, df_2, on='Institution')
df_new2

|   | Institution | Name | Title |
|---|-------------|------|-------|
| 0 | Academy of Motion Picture Arts & Sciences, Uni... | Teague Schneiter; Brendan Coates | Indexing Multilingual Content with the Oral Hi... |
| 1 | Activista en lenguas indígenas / Indigenous La... | Janet Chávez Santiago | Tramando la palabra |
| 2 | Aix-Marseille University, IrAsia | Christian Henriot | Bridging Cultures Through Mapping Practices: S... |
| 3 | Ashesi University College | Kajsa Hallberg Adu | Global Perspectives On Decolonizing Digital Pe... |
| 4 | Austrian Academy of Sciences | Tanja Wissik | Innovations in Digital Humanities Pedagogy: Lo... |
| 5 | BNF, France | Michel Jacobson | The Impact of FAIR Principles on Scientific Co... |
| 6 | Bard College | Maria Sachiko Cecire | Experimental Humanities |
| 7 | Biblioteca Nacional de colombia, Colombia | Tália Méndez Mahecha; Javier Beltrán; Stephani... | Si Las Humanidades Digitales Fueran Un Círculo... |
| 8 | Boston College, United States of America | Chelcie Rowell | Precarious Labor in the Digital Humanities |
| 9 | Boston University, United States of America | Vika Zafrin | Justice-Based DH, Practice, and Communities |
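Since the goal is a map, one possible next step (not part of the original notebook) is a per-institution presenter count, which could later size the markers once institutions are geocoded. A minimal sketch on hypothetical rows shaped like `NSFinal.csv`. It counts distinct names, so a presenter with two papers at the same institution is counted once.

```python
import pandas as pd

# Hypothetical rows shaped like NSFinal.csv
df = pd.DataFrame({
    "Name": ["Ana Pérez", "Ben Okafor", "Chen Wei", "Ana Pérez"],
    "Institution": ["University A", "University A", "University B", "University A"],
    "Title": ["Paper One", "Paper One", "Paper Two", "Paper Three"],
})

# Distinct presenters per institution, largest first
counts = (
    df.groupby("Institution")["Name"]
      .nunique()
      .sort_values(ascending=False)
      .rename("Presenters")
      .reset_index()
)
print(counts)
```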
