# Use of Semantic Scholar library
**Author:** Christian Byron  **Date:** 30-Mar-21

This demo extracts data from the [Semantic Scholar API](https://api.semanticscholar.org/) using the Python library `semanticscholar`.

---

#### Step 1 - Create a list of Digital Object Identifiers (DOI)
- [x] Read the list in from a file (scraped from a BibTeX file)
- [ ] Need to cater for different formats of the BibTeX fields (`DOI` vs `doi`)

In [5]:
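import re

DOI_List = []

# Iterate over the file line by line; the with-block closes it automatically
with open('temp.bib', 'r') as BibTex_File:
    for BibTex_Line in BibTex_File:
        # Only lines that start with exactly three spaces and 'DOI' are matched
        if BibTex_Line.startswith('   DOI'):
            # Capture the text between the first pair of braces
            DOI_Search = re.search(r'\{(.+?)\}', BibTex_Line)
            if DOI_Search:
                DOI_List.append(DOI_Search.group(1))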
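
The open task above (`DOI` vs `doi`) could be handled with a case-insensitive pattern. The following is a minimal sketch, not part of the original notebook: the names `DOI_FIELD` and `extract_dois` are hypothetical, and it assumes each DOI field sits on one line and is delimited by either braces or double quotes, with no nested braces.

```python
import re

# Hypothetical pattern: matches `doi = {...}` or `DOI = "..."` at the start of a
# field line, ignoring case and any leading whitespace.
DOI_FIELD = re.compile(r'^\s*doi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)

def extract_dois(lines):
    """Return the DOI values found in an iterable of BibTeX lines."""
    dois = []
    for line in lines:
        match = DOI_FIELD.match(line)
        if match:
            dois.append(match.group(1))
    return dois

sample = [
    '   DOI = {10.1109/AINA.2016.94},',
    '  doi="10.1145/3295747",',
    '   title = {GroupSense},',
]
print(extract_dois(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example `extract_dois(open('temp.bib'))`.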

#### Step 2 - Loop through the list and store the citation data from Semantic Scholar

In [6]:
import semanticscholar as sch

papers = []

# Request the record for each DOI and keep the results in order
for DOI in DOI_List:
    paper = sch.paper(DOI, timeout=2)
    papers.append(paper)
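
A single failed request currently stops the whole loop. The sketch below is one possible way to make the loop more tolerant, under stated assumptions: `fetch_papers` is a hypothetical helper, the lookup function is passed in as `fetch` so the helper does not depend on any particular library version, and the `pause` between requests is an assumed courtesy delay rather than a documented requirement.

```python
import time

def fetch_papers(dois, fetch, pause=1.0):
    """Call fetch(doi) for each DOI; store None when the lookup fails or is empty."""
    results = []
    for doi in dois:
        try:
            paper = fetch(doi)
        except Exception as error:  # e.g. a timeout or connection error
            print(f"Lookup failed for {doi}: {error}")
            paper = None
        results.append(paper or None)  # treat an empty result as "not found"
        time.sleep(pause)  # assumed pause between requests
    return results
```

With the notebook's own call this might be used as `papers = fetch_papers(DOI_List, lambda doi: sch.paper(doi, timeout=2))`. Keeping a `None` placeholder preserves the one-to-one order between `DOI_List` and `papers`.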
    

#### Step 3 - Loop through the resulting citation data and produce a table
- [ ] Need to handle the case where Semantic Scholar returns null because an article cannot be found (e.g. a recent publication)

In [7]:
from IPython.display import HTML, display

# One <tr> per paper: DOI, title and number of citations
rows = ''.join(
    '<tr><td>{}</td><td>{}</td><td>{}</td></tr>'.format(
        paper['doi'], paper['title'], len(paper['citations']))
    for paper in papers)

# The header row is closed with </tr> before the data rows start
display(HTML('<table><tr><th>DOI</th><th>Title</th><th>Citation Count</th></tr>{}</table>'.format(rows)))

| DOI | Title | Citation Count |
| --- | --- | --- |
| 10.1109/AINA.2016.94 | Energy Considerations for Continuous Group Activity Recognition Using Mobile Devices: The Case of GroupSense | 9 |
| 10.1186/s13174-019-0103-1 | GARSAaaS: group activity recognition and situation analysis as a service | 3 |
| 10.1145/3295747 | GroupSense | 0 |
| 10.1007/978-3-030-30645-8_26 | Mask Guided Fusion for Group Activity Recognition in Images | 1 |
| 10.1145/3316615.3316722 | Cooperative Hierarchical Framework for Group Activity Recognition: From Group Detection to Multi-activity Recognition | 1 |
| 10.1109/CVPR.2019.00808 | Convolutional Relational Machine for Group Activity Recognition | 19 |
| 10.1007/s00521-016-2346-0 | Constrained self-organizing feature map to preserve feature extraction topology | 2 |
| 10.1109/MDM.2014.62 | A Framework for Continuous Group Activity Recognition Using Mobile Devices: Concept and Experimentation | 6 |
| 10.1109/COMPSAC.2016.202 | Temporal Dependency Rule Learning Based Group Activity Recognition in Smart Spaces | 8 |
| 10.1016/J.AUTCON.2019.102886 | Two-step long short-term memory method for identifying construction activities through positional and attentional cues | 13 |
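
The open task for this step (papers that cannot be found) could be handled by skipping empty entries while building the rows. This is a minimal sketch under assumptions: `build_table` is a hypothetical helper, a missing paper is assumed to show up as `None` or an empty dict, and the keys `doi`, `title` and `citations` are the ones used in the cell above. Values are escaped with the standard library's `html.escape` so that titles containing `&` or `<` do not break the markup.

```python
from html import escape

def build_table(papers):
    """Return an HTML table of DOI, title and citation count, skipping empty entries."""
    rows = []
    for paper in papers:
        if not paper:  # None or {} when no record was returned
            continue
        cells = (
            paper.get('doi'),
            paper.get('title'),
            len(paper.get('citations') or []),
        )
        rows.append(
            '<tr>' + ''.join(f'<td>{escape(str(cell))}</td>' for cell in cells) + '</tr>'
        )
    header = '<tr><th>DOI</th><th>Title</th><th>Citation Count</th></tr>'
    return '<table>' + header + ''.join(rows) + '</table>'

sample = [{'doi': '10.1145/3295747', 'title': 'GroupSense', 'citations': []}, None]
print(build_table(sample))
```

In the notebook the result could be rendered with `display(HTML(build_table(papers)))`.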


#### Lessons Learnt

- Ways to open a file and read it line by line, in contrast to reading every line into a list or string
- Getting started with regular expressions is relatively straightforward
- Displaying HTML tables is potentially useful
- Debugging in Jupyter is painful - need to look at other ways to write Python initially
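
The first lesson can be illustrated with a small, self-contained comparison. This is only a sketch: it writes a throwaway temporary file (the file contents and variable names are made up for the example) and then reads it once by iterating over the file object and once with `readlines()`.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.bib', delete=False) as handle:
    handle.write('line one\nline two\nline three\n')
    path = handle.name

# Line by line: the file object yields one line at a time,
# so only the current line needs to be held in memory.
streamed_count = 0
with open(path) as bib_file:
    for line in bib_file:
        streamed_count += 1

# All at once: readlines() loads every line into a list up front.
with open(path) as bib_file:
    all_lines = bib_file.readlines()

os.remove(path)
print(streamed_count, len(all_lines))
```

For a short `.bib` file the difference is negligible; iterating line by line mainly matters when the file is large.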

In [None]:
# for i, paper in enumerate(papers):
#        print(f"{i} - {paper['title']}")

#print(papers[54])
print(DOI_List)