HOWTO: Add new author profiles
#1. Locate the platform or service's author search page
Using ProQuest as an example: http://www.scholaruniverse.com/namesearch?f=kathy&m=&l=smith
#2. Use the EmptyProfiles template to create a new profile type
Edit the url variable to match the search URL for the platform.
Example:
String url = "http://www.scholaruniverse.com/namesearch?" +
    "f=" + URLEncoder.encode(gn, "UTF-8") +
    "&m=&l=" + URLEncoder.encode(sn, "UTF-8");

View the source HTML of the results page and locate a pattern that includes the platform's author identifier.
Example:
<a href="profiles/people/78F7C707AC1BA51A0949B95D87EBEF53?q=firstname%3Akathy+lastname%3Asmith">
    <span class="scholarName">Katherine Ellinger-Smith</span>
</a>

Translate the tags into a regular expression (see the linkTag pattern in the EmptyProfiles template) to extract the identifier.
Example:
Pattern linkTag = Pattern.compile("<a href=[\"']profiles/people/([A-Za-z0-9]+)[^\"']+[\"']>\\s*<span class=\"scholarName\">([^<]+)</span>");

#3. Determine the author search result page URL
For example, a ProQuest author page URL looks like this: http://www.scholaruniverse.com/profiles/people/78F7C707AC1BA51A0949B95D87EBEF53
Set the personUrl variable based on this.
Example:
String personUrl = "http://www.scholaruniverse.com/profiles/people/" + linkMatch.group(1);

#4. Locate author's affiliation
View the source HTML for the author page and see if there is a tag that contains the author's affiliation.
If you find an affiliation tag, translate it into a regular expression in the EmptyProfiles affTag pattern, in the scrapeAffiliations method.
Example:
Pattern affTag = Pattern.compile("<b>Affiliation:</b> </td><td[^>]+>([^<]+)<");

Next, set the profile's affiliation property to the result of the scrapeAffiliations method:
p.setAffiliation(scrapeAffiliations(personUrl));

If you don't find affiliations, simply set the profile's affiliation property to "unknown".
Example:
p.setAffiliation("unknown");

#5. Locate author's total number of works
Just like affiliations, try to locate a tag that contains the author's total number of works. If you find one, translate it into a regular expression in the EmptyProfiles worksTag pattern, in the scrapeWorks method.
Example:
Pattern worksTag = Pattern.compile("<a href=\"#\" id=\"docCntLnk\"[^>]+>(\\d+)</a>");

Next, set the profile's works property to the result of the scrapeWorks method:
p.setWorks(scrapeWorks(personUrl));

If you don't find a tag for total works, simply set the profile's works property to 0.
Example:
p.setWorks(0);

#6. Add the new profile to the API
Edit the ProfileResource API to include the new profile you created.
Example:
// PROQUEST
ProquestProfiles proquestProfiles = new ProquestProfiles(surname, givenname);
profiles.addAll(proquestProfiles.getProfiles());
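
Before wiring a new profile type into the API, it can help to check the URL building and the three patterns against saved HTML. The following is a minimal offline sketch, not the EmptyProfiles template itself. The class name ProfileScrapeSketch, its method names, and the author-page sample HTML (including "Example University" and the count 42) are hypothetical and exist only for illustration. Unlike scrapeAffiliations and scrapeWorks, which take personUrl, these helpers take the HTML as a string so nothing is fetched over the network. The patterns are copied from the examples above, with the link pattern loosened to accept either quote style around the href value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the EmptyProfiles template.
class ProfileScrapeSketch {

    static final String BASE = "http://www.scholaruniverse.com/";

    // Results-page snippet taken from the example above.
    static final String SAMPLE_RESULTS =
        "<a href=\"profiles/people/78F7C707AC1BA51A0949B95D87EBEF53"
        + "?q=firstname%3Akathy+lastname%3Asmith\">\n"
        + "<span class=\"scholarName\">Katherine Ellinger-Smith</span>\n</a>";

    // Invented author-page snippet (assumption), shaped to fit the example patterns.
    static final String SAMPLE_AUTHOR =
        "<b>Affiliation:</b> </td><td class=\"v\">Example University</td>"
        + "<a href=\"#\" id=\"docCntLnk\" class=\"c\">42</a>";

    // Group 1 is the author identifier, group 2 the display name.
    static final Pattern LINK_TAG = Pattern.compile(
        "<a href=[\"']profiles/people/([A-Za-z0-9]+)[^\"']+[\"']>\\s*"
        + "<span class=\"scholarName\">([^<]+)</span>");

    static final Pattern AFF_TAG = Pattern.compile(
        "<b>Affiliation:</b> </td><td[^>]+>([^<]+)<");

    static final Pattern WORKS_TAG = Pattern.compile(
        "<a href=\"#\" id=\"docCntLnk\"[^>]+>(\\d+)</a>");

    // Builds the search URL from a given name (gn) and surname (sn).
    static String searchUrl(String gn, String sn) {
        try {
            return BASE + "namesearch?"
                + "f=" + URLEncoder.encode(gn, "UTF-8")
                + "&m=&l=" + URLEncoder.encode(sn, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns the author page URL for the first match, or null if none is found.
    static String personUrl(String resultsHtml) {
        Matcher m = LINK_TAG.matcher(resultsHtml);
        return m.find() ? BASE + "profiles/people/" + m.group(1) : null;
    }

    // Falls back to "unknown" when no affiliation tag is present.
    static String affiliation(String authorHtml) {
        Matcher m = AFF_TAG.matcher(authorHtml);
        return m.find() ? m.group(1).trim() : "unknown";
    }

    // Falls back to 0 when no works tag is present.
    static int works(String authorHtml) {
        Matcher m = WORKS_TAG.matcher(authorHtml);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("kathy", "smith"));
        System.out.println(personUrl(SAMPLE_RESULTS));
        System.out.println(affiliation(SAMPLE_AUTHOR));
        System.out.println(works(SAMPLE_AUTHOR));
    }
}
```

Keeping the fallbacks ("unknown" and 0) inside the helpers mirrors the two branches described in steps #4 and #5, so a page without either tag still yields a usable profile.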