
HOWTO: Add new author profiles

bmckinney edited this page Nov 22, 2016 · 3 revisions

1. Find an author search URL

Locate the platform or service's author search page.

Using ProQuest as an example: http://www.scholaruniverse.com/namesearch?f=kathy&m=&l=smith

Use the EmptyProfiles template to create a new profile type.

Edit the url variable to match the search URL for the platform.

Example:

String url = "http://www.scholaruniverse.com/namesearch?" +
    "f=" + URLEncoder.encode(gn,"UTF-8") +
    "&m=&l=" + URLEncoder.encode(sn,"UTF-8");

2. Find HTML tags that contain the author identifiers

View the source HTML of the results page and locate a pattern that includes the platform's author identifier.

Example:

<a href="profiles/people/78F7C707AC1BA51A0949B95D87EBEF53?q=firstname%3Akathy+lastname%3Asmith">
    <span class="scholarName">Katherine  Ellinger-Smith</span>
</a>

Translate the tags into a regular expression (see the linkTag pattern in the EmptyProfiles template) to extract the identifier.

Example:

Pattern linkTag = Pattern.compile("<a href=\"profiles/people/([A-Za-z0-9]+)[^\"]+\">\\s*<span class=\"scholarName\">([^<]+)</span>");
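
Before wiring a pattern into a new profile class, it can help to try it against a saved copy of the results HTML. The following is a minimal sketch, not the EmptyProfiles code itself: the class name LinkTagSketch and the extract helper are hypothetical, and the pattern is a variation that accepts either single or double quotes around the href value, on the assumption that the quoting may differ between what the browser shows and what the server sends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness; not part of the EmptyProfiles template.
class LinkTagSketch {

    // Group 1 = author identifier, group 2 = display name.
    // Assumption: the href may be quoted with either ' or ".
    static final Pattern LINK_TAG = Pattern.compile(
        "<a href=[\"']profiles/people/([A-Za-z0-9]+)[^\"']*[\"']>\\s*"
        + "<span class=\"scholarName\">([^<]+)</span>");

    // Returns one {identifier, name} pair per matching result link.
    static List<String[]> extract(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher linkMatch = LINK_TAG.matcher(html);
        while (linkMatch.find()) {
            String id = linkMatch.group(1);
            // Collapse repeated whitespace inside the display name.
            String name = linkMatch.group(2).trim().replaceAll("\\s+", " ");
            results.add(new String[] { id, name });
        }
        return results;
    }

    public static void main(String[] args) {
        // The sample markup from the example above.
        String html = "<a href=\"profiles/people/78F7C707AC1BA51A0949B95D87EBEF53"
            + "?q=firstname%3Akathy+lastname%3Asmith\">\n"
            + "    <span class=\"scholarName\">Katherine  Ellinger-Smith</span>\n"
            + "</a>";
        for (String[] r : extract(html)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

If the loop prints nothing, the pattern does not match the markup, which is usually a quoting or whitespace mismatch between the pattern and the page source.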

3. Determine the author search result page URL

For example, ProQuest author page URLs look like this: http://www.scholaruniverse.com/profiles/people/78F7C707AC1BA51A0949B95D87EBEF53

Set the personUrl variable based on this.

Example:

String personUrl = "http://www.scholaruniverse.com/profiles/people/" + linkMatch.group(1);

4. Locate author's affiliation

View the source HTML for the author page and see if there is a tag that contains the author's affiliation.

If you find an affiliation tag, translate it into a regular expression in the EmptyProfiles affTag pattern, in the scrapeAffiliations method.

Example:

Pattern affTag = Pattern.compile("<b>Affiliation:</b>&nbsp;&nbsp;</td><td[^>]+>([^<]+)<");

Next, set the profile's affiliation property to the result of the scrapeAffiliations method:

p.setAffiliation(scrapeAffiliations(personUrl));

If the author page has no affiliation tag, simply set the profile's affiliation property to "unknown".

Example:

p.setAffiliation("unknown");
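
The affTag pattern can be checked in isolation before it goes into scrapeAffiliations. The sketch below is an illustration under stated assumptions: parseAffiliation is a hypothetical helper that works on HTML already held in a string (the real method fetches the author page), the sample markup is made up to fit the pattern, and falling back to "unknown" when a single page lacks the tag is a design choice of this sketch, not something the template is known to do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness; not part of the EmptyProfiles template.
class AffiliationSketch {

    // Same expression as the affTag example above.
    static final Pattern AFF_TAG = Pattern.compile(
        "<b>Affiliation:</b>&nbsp;&nbsp;</td><td[^>]+>([^<]+)<");

    // Returns the captured affiliation, or "unknown" when the tag is absent.
    static String parseAffiliation(String html) {
        Matcher m = AFF_TAG.matcher(html);
        return m.find() ? m.group(1).trim() : "unknown";
    }

    public static void main(String[] args) {
        // Made-up markup shaped to fit the pattern; real pages will differ.
        String sample = "<tr><td><b>Affiliation:</b>&nbsp;&nbsp;</td>"
            + "<td class=\"value\">Example University</td></tr>";
        System.out.println(parseAffiliation(sample));
        System.out.println(parseAffiliation("<p>no affiliation here</p>"));
    }
}
```

Note that `<td[^>]+>` requires at least one character after `<td`, so a bare `<td>` cell would not match this pattern.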

5. Locate author's total number of works

Just like affiliations, try to locate a tag that contains the author's total number of works. If you find one, translate it into a regular expression in the EmptyProfiles worksTag pattern, in the scrapeWorks method.

Example:

Pattern worksTag = Pattern.compile("<a href=\"#\" id=\"docCntLnk\"[^>]+>(\\d+)</a>");

Next, set the profile's works property to the result of the scrapeWorks method:

p.setWorks(scrapeWorks(personUrl));

If you don't find a tag for total works, simply set the profile's works property to 0.

Example:

p.setWorks(0);
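
The works count arrives as text, so it has to be converted to a number after matching. A minimal sketch of that step follows, assuming the HTML is already in a string: parseWorks is a hypothetical helper, the sample markup is invented to fit the worksTag pattern, and returning 0 on a missing or unparseable count is an assumption that mirrors the fallback described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness; not part of the EmptyProfiles template.
class WorksSketch {

    // Same expression as the worksTag example above.
    static final Pattern WORKS_TAG = Pattern.compile(
        "<a href=\"#\" id=\"docCntLnk\"[^>]+>(\\d+)</a>");

    // Returns the captured count, or 0 when the tag is absent or too large for an int.
    static int parseWorks(String html) {
        Matcher m = WORKS_TAG.matcher(html);
        if (!m.find()) {
            return 0;
        }
        try {
            return Integer.parseInt(m.group(1));
        } catch (NumberFormatException e) {
            // Digits only, but outside the int range.
            return 0;
        }
    }

    public static void main(String[] args) {
        // Made-up markup shaped to fit the pattern; real pages will differ.
        String sample = "<a href=\"#\" id=\"docCntLnk\" class=\"count\">42</a>";
        System.out.println(parseWorks(sample));
        System.out.println(parseWorks("<p>no count here</p>"));
    }
}
```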

6. Add the new profile to the API

Edit the ProfileResource API to include the new profile you created.

Example:

// PROQUEST
ProquestProfiles proquestProfiles = new ProquestProfiles(surname, givenname);
profiles.addAll(proquestProfiles.getProfiles());
