# First Step: Round 1

This notebook queries Google Scholar for the three 'seed' papers, retrieves roughly the first 100 'related articles' for each (about 300 in total), and selects the unique 'Well Cited Papers' among them, saving the result as a spreadsheet for dual coding.

Note that it requires a SerpAPI.com API key; a free two-week trial account is currently available.

In [1]:
# !pip install google-search-results
from serpapi import GoogleScholarSearch, GoogleSearch
import pandas as pd
import numpy as np
from ScholarUtils import GetPapers, GetPaper, WellCitedPapers, InitScholar, RelatedQuery
# Automatically reload ScholarUtils (and any other edited modules) before each cell runs
%reload_ext autoreload
%autoreload 2

In [2]:
InitScholar("APIKey.yaml")

In [3]:
initialPaperSearches=[
    'Software engineering for security: a roadmap PT Devanbu',
    'You Get Where You’re Looking For: The Impact of Information Sources on Code Security, Acar et al',
    'A Survey on Developer-Centred Security, by Tahaei & Vaniea'
]

In [4]:
initialPapers=[GetPaper({'q': search}) for search in initialPaperSearches]

GetPaper:{'q': 'Software engineering for security: a roadmap PT Devanbu'}
GetPaper:{'q': 'You Get Where You’re Looking For: The Impact of Information Sources on Code Security, Acar et al'}
GetPaper:{'q': 'A Survey on Developer-Centred Security, by Tahaei & Vaniea'}


In [5]:
initialPapersDf=WellCitedPapers(initialPapers) # Includes the 'Related' ID column, which the next step needs
initialPapersDf['Round']=1 # The seed papers are all round 1
initialPapersDf

Unnamed: 0,Key,Citations,Year,Title,Authors,Link,Related,Snippet,Round
0,17567618191233461182,645,2000,Software engineering for security: a roadmap,"PT Devanbu, S Stubblebine",https://dl.acm.org/doi/abs/10.1145/336512.336559,visp7qq3zPMJ,Is there such a thing anymore as a software sy...,1
1,7379463099867128855,188,2016,You get where you're looking for: The impact o...,"Y Acar, M Backes, S Fahl, D Kim…",https://ieeexplore.ieee.org/abstract/document/...,F-SuXJceaWYJ,Vulnerabilities in Android code--including but...,1
2,3515120953808077324,21,2019,A Survey on Developer-Centred Security,"M Tahaei, K Vaniea",https://ieeexplore.ieee.org/abstract/document/...,DEJDMuI3yDAJ,Software developers are key players in the sec...,1


In [6]:
allPapers = [foundPaper for relatedPaper in initialPapersDf.itertuples()
                 for foundPaper in GetPapers(RelatedQuery(relatedPaper.Related))
            ] # Flatten the per-seed result lists into a single list
print(len(allPapers))
firstCutRelatedDf=(WellCitedPapers(allPapers)
                   .assign(Round=2)
                  )
firstCutRelatedDf

Retrieving 101 papers for {'q': 'related:visp7qq3zPMJ:scholar.google.com/'}
Retrieving 101 papers for {'q': 'related:F-SuXJceaWYJ:scholar.google.com/'}
Retrieving 101 papers for {'q': 'related:DEJDMuI3yDAJ:scholar.google.com/'}
303


Unnamed: 0,Key,Citations,Year,Title,Authors,Link,Related,Snippet,Round
0,17567618191233461182,645,2000,Software engineering for security: a roadmap,"PT Devanbu, S Stubblebine",https://dl.acm.org/doi/abs/10.1145/336512.336559,visp7qq3zPMJ,Is there such a thing anymore as a software sy...,2
1,12813600874337390632,613,1999,Using abuse case models for security requireme...,"J McDermott, C Fox",https://ieeexplore.ieee.org/abstract/document/...,KNCDGBsO07EJ,The relationships between the work products of...,2
2,11287243237506407833,944,2002,UMLsec: Extending UML for secure systems devel...,J Jürjens,https://link.springer.com/chapter/10.1007/3-54...,md3_EOxXpJwJ,Developing secure-critical systems is difficul...,2
3,17600728651530679484,1060,2002,SecureUML: A UML-based modeling language for m...,"T Lodderstedt, D Basin, J Doser",https://link.springer.com/chapter/10.1007/3-54...,vLSGtnRZQvQJ,We present a modeling language for the model-d...,2
4,15499650706254585718,1378,2005,Eliciting security requirements with misuse cases,"G Sindre, AL Opdahl",https://link.springer.com/content/pdf/10.1007/...,dt_vWDLSGdcJ,Use cases have become increasingly common duri...,2
...,...,...,...,...,...,...,...,...,...
124,16879893825225754361,61,2018,Security in the software development lifecycle,"H Assal, S Chiasson",https://www.usenix.org/conference/soups2018/pr...,-QbkIgBuQeoJ,We interviewed developers currently employed i...,2
125,4325511156220754863,21,2019,""" If you want, I can store the encrypted passw...","A Naiakshina, A Danilova, E Gerlitz…",https://dl.acm.org/doi/abs/10.1145/3290605.330...,r_uGonpNBzwJ,"In 2017 and 2018, Naiakshina et al.(CCS'17, SO...",2
126,14295319890240191588,9,2019,"DevOps, a new approach to cloud development & ...","P Agrawal, N Rawat",https://ieeexplore.ieee.org/abstract/document/...,ZLxXSLMtY8YJ,Organization's proficiency to deliver services...,2
127,11415931451330937671,10,2020,Understanding privacy-related questions on Sta...,"M Tahaei, K Vaniea, N Saphra",https://dl.acm.org/doi/abs/10.1145/3313831.337...,R0PKuCuJbZ4J,ABSTRACT We analyse Stack Overflow (SO) to und...,2
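
The query strings logged above have the form `related:<id>:scholar.google.com/`, where `<id>` is the value in the `Related` column. The sketch below illustrates that pattern together with the list flattening used in the cell. Note that `related_query` and `fake_get_papers` are hypothetical stand-ins written for this illustration; they are not the actual `RelatedQuery` and `GetPapers` from `ScholarUtils`, whose implementations are not shown here, and no network call is made.

```python
def related_query(related_id):
    """Build a 'related articles' query dict in the form seen in the log output."""
    return {'q': f'related:{related_id}:scholar.google.com/'}


def fake_get_papers(query, count=3):
    """Hypothetical stand-in for GetPapers: returns placeholder records only."""
    return [{'Query': query['q'], 'Position': i} for i in range(count)]


# 'Related' IDs of the three seed papers, taken from the table above.
related_ids = ['visp7qq3zPMJ', 'F-SuXJceaWYJ', 'DEJDMuI3yDAJ']

# The nested comprehension flattens the per-seed result lists into one list.
all_papers = [paper
              for related_id in related_ids
              for paper in fake_get_papers(related_query(related_id))]

print(len(all_papers))  # 3 seeds x 3 placeholder records each
```

With the real helpers, each seed yields 101 results rather than 3, which gives the 303 printed above.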


In [7]:
['Charles','Pierre','AgreedScore']+list(initialPapersDf.columns.values)

['Charles',
 'Pierre',
 'AgreedScore',
 'Key',
 'Citations',
 'Year',
 'Title',
 'Authors',
 'Link',
 'Related',
 'Snippet',
 'Round']

In [8]:
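# Seeds go first: drop_duplicates keeps the first occurrence, so they retain Round=1
allPapersSoFarDf=(pd.concat([initialPapersDf, firstCutRelatedDf])
                .drop_duplicates(subset=['Key'])
                # Append empty columns for each coder's score and the agreed score
                .reindex(columns=list(initialPapersDf.columns.values)+['Charles','Pierre','AgreedScore'],fill_value='')
               )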
print(len(allPapersSoFarDf))
allPapersSoFarDf.to_excel('PapersToCode.xlsx')
allPapersSoFarDf

126


Unnamed: 0,Key,Citations,Year,Title,Authors,Link,Related,Snippet,Round,Charles,Pierre,AgreedScore
0,17567618191233461182,645,2000,Software engineering for security: a roadmap,"PT Devanbu, S Stubblebine",https://dl.acm.org/doi/abs/10.1145/336512.336559,visp7qq3zPMJ,Is there such a thing anymore as a software sy...,1,,,
1,7379463099867128855,188,2016,You get where you're looking for: The impact o...,"Y Acar, M Backes, S Fahl, D Kim…",https://ieeexplore.ieee.org/abstract/document/...,F-SuXJceaWYJ,Vulnerabilities in Android code--including but...,1,,,
2,3515120953808077324,21,2019,A Survey on Developer-Centred Security,"M Tahaei, K Vaniea",https://ieeexplore.ieee.org/abstract/document/...,DEJDMuI3yDAJ,Software developers are key players in the sec...,1,,,
1,12813600874337390632,613,1999,Using abuse case models for security requireme...,"J McDermott, C Fox",https://ieeexplore.ieee.org/abstract/document/...,KNCDGBsO07EJ,The relationships between the work products of...,2,,,
2,11287243237506407833,944,2002,UMLsec: Extending UML for secure systems devel...,J Jürjens,https://link.springer.com/chapter/10.1007/3-54...,md3_EOxXpJwJ,Developing secure-critical systems is difficul...,2,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
121,6285781247893872317,5,2020,Interventions for long‐term software security:...,"C Weir, I Becker, J Noble, L Blair…",https://onlinelibrary.wiley.com/doi/abs/10.100...,vZ6G0L2UO1cJ,Though some software development teams are hig...,2,,,
123,5117288349966358717,35,2018,Why developers cannot embed privacy into softw...,"A Senarath, NAG Arachchilage",https://dl.acm.org/doi/abs/10.1145/3210459.321...,vVj9SKRCBEcJ,Pervasive use of software applications continu...,2,,,
126,14295319890240191588,9,2019,"DevOps, a new approach to cloud development & ...","P Agrawal, N Rawat",https://ieeexplore.ieee.org/abstract/document/...,ZLxXSLMtY8YJ,Organization's proficiency to deliver services...,2,,,
127,11415931451330937671,10,2020,Understanding privacy-related questions on Sta...,"M Tahaei, K Vaniea, N Saphra",https://dl.acm.org/doi/abs/10.1145/3313831.337...,R0PKuCuJbZ4J,ABSTRACT We analyse Stack Overflow (SO) to und...,2,,,
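
Because `drop_duplicates` keeps the first occurrence by default, and the seed papers come first in the `pd.concat` list, a seed that reappears among its own related results keeps its `Round` value of 1. A minimal plain-Python sketch of that behaviour follows; the helper `drop_duplicate_keys` is illustrative only and is not part of the notebook or of pandas.

```python
def drop_duplicate_keys(records):
    """Keep the first record seen for each 'Key', preserving input order."""
    seen = set()
    unique = []
    for record in records:
        if record['Key'] not in seen:
            seen.add(record['Key'])
            unique.append(record)
    return unique


# One seed paper, then related results that include the same seed again.
seeds = [{'Key': 17567618191233461182, 'Round': 1}]
related = [
    {'Key': 17567618191233461182, 'Round': 2},  # the seed paper again
    {'Key': 12813600874337390632, 'Round': 2},
]

combined = drop_duplicate_keys(seeds + related)
print([(r['Key'], r['Round']) for r in combined])
```

The duplicate Round 2 copy of the seed is discarded, which matches the table above, where the first row still shows `Round` 1.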


In [9]:
# Open the spreadsheet in the default application (the `open` command is macOS-specific)
!open PapersToCode.xlsx