## Finding predatory journals in Trove
#### Created by Ziting Zhang

    This notebook retrieves journal records from Trove and checks which of them appear on a list of predatory
    journals. The list used in this notebook is Beall's list.
    
    Tim Sherratt has already done related work in this area (https://glam-workbench.github.io/trove-journals/),
    which made it easier to find references for this notebook.
    
    Tim Sherratt's work retrieves the journals that have digital resources online. What I need to find is all
    journal articles, so a much larger number of records has to be retrieved.
    
    Let's start by importing the libraries required for the first part of the notebook.

## Part 1

## Libraries 
    All libraries used in this part.

In [1]:
#for HTTP requests and automatic retries
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# show a smart progress meter
from tqdm import tqdm_notebook
# display tools in IPython
from IPython.display import display, FileLink
# provide dataframe for data analysis
import pandas as pd
# json page API
import json


## Set API key and request session
    With the libraries for the first part in place, we can start using the Trove API.
    
    To use the Trove API, you first need your own API key. The key below is my own; replace it with yours
    once you have applied for one.
    
    Getting an API key is straightforward. Just follow these steps:
    1. Go to the Trove website (https://trove.nla.gov.au/).
    2. Click 'API' at the bottom of the page.
    3. Follow the instructions under 'How do I get an API key?' provided by Trove to get your API key.
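    
    Once you have a key, one option is to keep it out of the notebook itself. The sketch below is only an
    illustration of that idea: the function `load_api_key` and the environment variable name `TROVE_API_KEY`
    are assumptions made for this example, not part of Trove or of this notebook.

```python
import os


def load_api_key(env_var="TROVE_API_KEY", default=None):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, default)
    if not key:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable to your Trove API key")
    return key
```

    With the variable set in your shell, the key cell could then read `api_key = load_api_key()`.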

In [2]:
api_key = '9aqiim9kqb98hqvt'  #change to your own API key

    A session object persists certain parameters, such as cookies, across HTTP requests. It can also reuse a
    single TCP connection for multiple requests and responses, which improves performance.
    Retries are configured to ride out temporary connection issues.

In [3]:
s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
s.mount('https://', HTTPAdapter(max_retries=retries))
s.mount('http://', HTTPAdapter(max_retries=retries))

## Set configurations

    Now it's time to set up the search query for the Trove API!
    
    What do we need to retrieve from Trove? All the journal articles. So we need to know where these
    resources are stored. Trove groups its resources into several zones, one for each kind of material.
    The zone that contains journal articles is called 'Journals, articles and data sets'.


In [35]:
# base URL of the search query
api_search_url = 'https://api.trove.nla.gov.au/v2/result'
params = {
    'key': api_key,                 # key -- API key
    'zone': 'article',              # zone -- Trove zone. article refers to the zone "Journals, articles and data sets"
    'q': ' ',                       # q -- query
    'bulkHarvest': 'true',          # bulkHarvest -- return the results in a consistent order
    'n': 100,                       # n -- number of results per page. Maximum 100 per page.
    'encoding': 'json',             # encoding -- can be json or xml
    'l-format': 'Periodical',       # l-format -- set "Periodical" to exclude articles that are not in a journal.
    'l-availability': 'y',          # l-availability -- available online
    'l-australia': 'y',             # l-australia -- set to yes so that only Australian resources are returned
    'l-language': 'English',        # l-language -- set "English" to retrieve English-language resources
    's': '*'                        # s -- start cursor, used for paging through the results
}
these_params = params.copy()
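
    To see how such parameters end up in a request URL, here is a minimal sketch using only the standard
    library. It is an illustration rather than the notebook's own code: `demo_params` is a shortened copy of
    the dictionary above made up for this example, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

base_url = "https://api.trove.nla.gov.au/v2/result"

# Shortened, hypothetical copy of the parameters, with a placeholder key
demo_params = {
    "key": "YOUR_API_KEY",
    "zone": "article",
    "q": " ",
    "bulkHarvest": "true",
    "n": 100,
    "encoding": "json",
    "l-format": "Periodical",
    "s": "*",
}

# urlencode turns the space into '+' and the '*' cursor into '%2A'
demo_url = base_url + "?" + urlencode(demo_params)
print(demo_url)
```

    The blank query becomes `q=+` and the initial cursor becomes `s=%2A`, the same encoding that appears in
    the request URLs printed during harvesting.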

## Start harvesting

    Use the configuration above to fetch the results from Trove.
    The way works are located in each response follows the structure of the JSON returned by Trove.
    The title and Trove URL of each work are stored in a dataframe.
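    
    The paging idea behind the harvest (start at `*`, then keep following `nextStart` until it is missing)
    can be tried out without touching the network. The sketch below is a simplified illustration under
    stated assumptions: `iter_pages`, `fake_fetch` and `FAKE_PAGES` are made-up names, and the fake pages
    only imitate the part of the response structure that the harvest reads.

```python
def iter_pages(fetch_page, start="*"):
    """Yield the list of works on each page until no 'nextStart' is returned."""
    while start:
        data = fetch_page(start)
        records = data["response"]["zone"][0]["records"]
        yield records.get("work", [])
        # A missing 'nextStart' gives None, which ends the loop
        start = records.get("nextStart")


# Two invented pages standing in for real API responses
FAKE_PAGES = {
    "*": {"work": [{"title": "A"}, {"title": "B"}], "nextStart": "cursor-1"},
    "cursor-1": {"work": [{"title": "C"}]},
}


def fake_fetch(start):
    """Return a fake response shaped like the part of the JSON used above."""
    return {"response": {"zone": [{"records": FAKE_PAGES[start]}]}}


titles = [work["title"] for page in iter_pages(fake_fetch) for work in page]
print(titles)  # ['A', 'B', 'C']
```

    Swapping `fake_fetch` for a function that calls the session with the real parameters would give the
    same loop as the harvest cell.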

In [36]:
def zone_finder():

    start = '*'
    
    names = []
    troveURL = []
    while start:
        these_params['s'] = start # 's' is '*' for the first page, then the value of 'nextStart'.
        response = s.get(api_search_url, params=these_params)
        
        # The URL of each request is printed below. Click one to see the detailed information.
        # If you don't want to see these URLs, comment out the following line.
        print(response.url)
        
        # Store the JSON response in data
        data = response.json()
        records = data['response']['zone'][0]['records']
        
        # Follow the structure returned by Trove; a missing 'nextStart' means this is the last page
        start = records.get('nextStart')
        # Loop through the works on this page (the key may be absent when a page has no works)
        for work in records.get('work', []):
            # Add the title and Trove URL of each work to the lists
            names.append(work['title'])
            troveURL.append(work['troveUrl'])
    # Return the names and URLs as a dataframe
    return pd.DataFrame({'names': names, 'TroveURL': troveURL})

# call the function zone_finder()
result = zone_finder()

https://api.trove.nla.gov.au/v2/result?key=9aqiim9kqb98hqvt&zone=article&q=+&bulkHarvest=true&n=100&encoding=json&l-format=Periodical&l-availability=y&l-australia=y&l-language=English&s=%2A
https://api.trove.nla.gov.au/v2/result?key=9aqiim9kqb98hqvt&zone=article&q=+&bulkHarvest=true&n=100&encoding=json&l-format=Periodical&l-availability=y&l-australia=y&l-language=English&s=AoEqc3UxMDA1MjM0MA%3D%3D
https://api.trove.nla.gov.au/v2/result?key=9aqiim9kqb98hqvt&zone=article&q=+&bulkHarvest=true&n=100&encoding=json&l-format=Periodical&l-availability=y&l-australia=y&l-language=English&s=AoEqc3UxMDA5NTg3NQ%3D%3D
https://api.trove.nla.gov.au/v2/result?key=9aqiim9kqb98hqvt&zone=article&q=+&bulkHarvest=true&n=100&encoding=json&l-format=Periodical&l-availability=y&l-australia=y&l-language=English&s=AoEqc3UxMDE0MDM4OA%3D%3D

In [4]:
df_url = pd.read_csv("./predatory_journals_with_url.csv", sep=',', header=None)
pd.set_option('display.max_colwidth', 100)
df_url.head()

|   | 0 | 1 |
|---|---|---|
| 0 | names | TroveURL |
| 1 | Reports / North Atlantic Assembly, Defence and Security Committee | https://trove.nla.gov.au/work/10002103 |
| 2 | Reports / North Atlantic Assembly, Scientific and Technical Committee | https://trove.nla.gov.au/work/10002106 |
| 3 | Reports / North Atlantic Assembly, Political Committee | https://trove.nla.gov.au/work/10002110 |
| 4 | Western fisheries (Perth, W.A.) | https://trove.nla.gov.au/work/10002212 |


## Save dataframe into CSV file
    Save the data to a CSV file with two columns: the title and the Trove URL.
    Once the data is saved, the previous steps do not need to be rerun, which matters because they take a long time.
    The CSV file can simply be reused in later steps.
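    
    One detail worth knowing when the file is read back: the first line of such a CSV file holds the column
    names, not data. The sketch below shows this with the standard library `csv` module as a stand-in for
    pandas. It is an illustration only; the in-memory buffer replaces a real file, and the single row is
    taken from the sample output shown earlier.

```python
import csv
import io

rows = [
    {"names": "Western fisheries (Perth, W.A.)",
     "TroveURL": "https://trove.nla.gov.au/work/10002212"},
]

# Write a header line followed by the data, as a CSV export with column names does
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["names", "TroveURL"])
writer.writeheader()
writer.writerows(rows)

# Reading without header handling returns the column names as the first row
buffer.seek(0)
raw_lines = list(csv.reader(buffer))
print(raw_lines[0])  # ['names', 'TroveURL']

# Reading with header handling returns only the data rows
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]["names"])  # Western fisheries (Perth, W.A.)
```

    A reader that treats every line as data will therefore pick up `names` as if it were a journal title.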

In [37]:
#result is already a dataframe, so this simply wraps it
df = pd.DataFrame(result)
#save to csv file
df.to_csv('Trove_journals_with_url.csv', index=False)

    We now have 301847 items from Trove, which is a large number of journal records.
    This number can vary because the Trove database is updated over time.

## Part 2
## Comparison
    Compare the data extracted in the previous step with Beall's list of predatory journals.
    Fuzzy matching is used in this part so that a journal is still found when its name is written slightly differently in the two sources.
    The fuzzy ratio threshold is set to 90 out of 100 to avoid matching too many journals.

## Import libraries
    First we need to install the modules we do not have yet. Just run the following cell!

In [3]:
!pip install fuzzywuzzy
!pip install xlrd



    Then import the required libraries for part 2

In [3]:
#for calculations in Python
import numpy as np
#provide dataframe
import pandas as pd
#fuzzy function
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

## Read journals in predatory list from CSV file and transfer to list
    The predatory list used in this part is Beall's list. As Beall's list is easy to find online,
    I copied the names of the journals on the list from https://beallslist.weebly.com/ into a CSV file.
    
    Then load the CSV file into a dataframe for further use.

In [39]:
#load into dataframe
df2 = pd.read_csv("./list.csv", sep = ',' ,header = None)
#save the names into a list
list2 = df2[0].tolist()

## Compare extract data with predatory list
    In this section, the names from Trove and Beall's list are compared, and the matching ones are regarded as 
    the results. 
    
    During this comparison, I found that some names are slightly different. For example, “Scientific journals 
    international” from Trove and “Scientific Journals International” from Beall's list are the same journal, but 
    the names are not written identically. You will see this example in the results below. These differences can 
    cause the same journal to fail to match between the two datasets. To solve this problem, I added a fuzzy 
    function to the comparison. The fuzzy function calculates the similarity between two strings as a score 
    out of 100. 
    
    Here, I require a similarity score above 90 to catch similar names while ensuring that there are not too many errors.
    As only 30 journals are retrieved, I can manually check whether the two journals are the same in the two 
    datasets by using the links on the Beall's list website. 
    
    This is a long process; it will run faster on a high-performance server.
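    
    To get a feel for the similarity score, here is a minimal sketch that uses difflib from the Python standard
    library as a rough stand-in for fuzz.ratio, so it runs without installing anything. The helper name
    similarity is hypothetical, and the scores may differ slightly from those fuzzywuzzy reports. The last pair
    is taken from the results further down and shows that two different journals can still score above 90.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-100 similarity score (an approximation of fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

trove_title = "Scientific journals international"
beall_name = "Scientific Journals International"

# Capital letters count as differences unless both strings are lowercased first.
case_sensitive = similarity(trove_title, beall_name)
case_insensitive = similarity(trove_title.lower(), beall_name.lower())

# Two different journals whose names share most of their characters.
false_positive = similarity("dance research journal", "advanced research journals")

print(case_sensitive, case_insensitive, false_positive)
```

    This is why the strings are lowercased before scoring, and why the matches still need a manual check.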

In [40]:
#read journals from csv file (header=None gives integer column labels: 0 = title, 1 = url)
df_article = pd.read_csv("./Trove_journals_with_url.csv", sep = ',' ,header = None)
#transfer to list
list_art = df_article[0].tolist()
count_art = 0 #number of matches found
list_trove = [] #matched names from Trove
list_pre = [] #matched names from the predatory list
#find matchings
for p in list2: #name in predatory list
    for l in list_art: #name in Trove
        #use fuzzy function
        score = fuzz.ratio(l.lower(),p.lower()) #strings to lowercase
        if score > 90 :
            count_art = count_art + 1 #count the number
            list_trove.append(l) #add into list
            list_pre.append(p)
            break #keep only the first Trove name that scores above 90
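
    The loop above stops at the first Trove name that scores above 90, which is not always the closest one.
    A possible variation keeps the best-scoring name instead. The sketch below uses get_close_matches from the
    standard library difflib module rather than fuzzywuzzy; the function name best_matches and the short sample
    lists are only for illustration. Note that cutoff=0.9 also accepts a score of exactly 0.9, whereas the
    loop above requires a score strictly greater than 90.

```python
from difflib import get_close_matches

def best_matches(predatory_names, trove_titles, cutoff=0.9):
    """Pair each predatory name with its closest Trove title, if any reaches the cutoff."""
    # Map each lowercased title back to its original spelling.
    lowered = {title.lower(): title for title in trove_titles}
    candidates = list(lowered)
    pairs = []
    for name in predatory_names:
        # n=1 asks for the single best candidate at or above the cutoff.
        hits = get_close_matches(name.lower(), candidates, n=1, cutoff=cutoff)
        if hits:
            pairs.append((lowered[hits[0]], name))
    return pairs

# Small illustrative samples, not the full datasets.
trove_titles = [
    "Scientific journals international",
    "Western fisheries (Perth, W.A.)",
    "World scholar",
]
predatory_names = ["Scientific Journals International", "Net Journals", "World Scholars"]

pairs = best_matches(predatory_names, trove_titles)
for trove_title, beall_name in pairs:
    print(trove_title, "<->", beall_name)
```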

    Load the journal articles with the Trove url into the dataframe

## Count the number of journals found
    Here is the total number of items found by the comparison.

In [41]:
count_art

30

## List them all and manually check
    First, let's have a look at the names found in Trove.

In [42]:
list_trove

['Oasis',
 'Dance research journal',
 'American research journal',
 'RSC journals',
 'ARN journal',
 'AIDScience',
 'Computer Science Journal',
 'Esk journal',
 'European journal of educational studies',
 'International journal on computer science and engineering',
 'International CLIL research journal',
 'Air international',
 'Medical science',
 'ET journal',
 'Newpubli',
 'Open Journal Systems',
 'RIRDC publications',
 'APS Journals',
 'Revista acadêmica',
 'JSAP international',
 'ScienceAlert',
 'Science and technology publishing',
 'Scientific journals international',
 'Scitech',
 'Scitech',
 'RSCAS Publications',
 'CTI journal',
 'ULK scientific journal',
 'HVS international journal',
 'World scholar']

    Then let's find the URLs for these journal articles.
    
    Here a dictionary is used to find the URL paired with each journal title.
    A dictionary is a changeable collection of key-value pairs. In Python, dictionaries are written
    with curly brackets.
    We use the names as keys to look up their values, which are the URLs.

In [44]:
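#load the Trove titles and urls (df_url is not defined in an earlier cell shown here)
df_url = pd.read_csv("./Trove_journals_with_url.csv", sep = ',' ,header = None)
dict_trove = df_url.set_index(0).T.to_dict('list')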
urls = []
for title in list_trove:
    urls.append(dict_trove.get(title))
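
    The same lookup can be sketched with a plain Python dictionary, which may make the key and value idea
    easier to see. The rows below are a small sample taken from the results, and the names url_by_title and
    matched are only for illustration.

```python
# A few (title, url) rows standing in for the Trove CSV file.
rows = [
    ("Oasis", "https://trove.nla.gov.au/work/37193230"),
    ("Esk journal", "https://trove.nla.gov.au/work/20852549"),
]

# Build the dictionary: each title is a key, its value is a list of urls.
url_by_title = {}
for title, url in rows:
    url_by_title.setdefault(title, []).append(url)

# .get() returns None instead of raising an error when a title is missing.
matched = ["Oasis", "Esk journal", "Not in Trove"]
urls = [url_by_title.get(title) for title in matched]
print(urls)
```

    Collecting the URLs in a list means that a title occurring more than once in the file keeps all of its URLs.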



    Here are the results: the journal articles found, with their URLs.

In [45]:
trove = pd.DataFrame({'names': list_trove, 'TroveURL': urls})
trove

Unnamed: 0,names,TroveURL
0,Oasis,[https://trove.nla.gov.au/work/37193230]
1,Dance research journal,[https://trove.nla.gov.au/work/38309383]
2,American research journal,[https://trove.nla.gov.au/work/189974372]
3,RSC journals,[https://trove.nla.gov.au/work/28410399]
4,ARN journal,[https://trove.nla.gov.au/work/11351373]
5,AIDScience,[https://trove.nla.gov.au/work/28646447]
6,Computer Science Journal,[https://trove.nla.gov.au/work/165736143]
7,Esk journal,[https://trove.nla.gov.au/work/20852549]
8,European journal of educational studies,[https://trove.nla.gov.au/work/35153346]
9,International journal on computer science and ...,[https://trove.nla.gov.au/work/151780769]


    Then let's have a look at the ones found in Beall's list.
    
    Are they the same ones?

In [46]:
list_pre

['OASIS)',
 'Advanced Research Journals',
 'American Research Journals',
 'ARC Journals',
 'ARPN Journals',
 'Avid Science',
 'Computer Science Journals',
 'Eko Journal',
 'European Journals of Education Studies',
 'International Conference on Computer Science and Engineering',
 'International Skill Research Journals',
 'IORE International',
 'Medical science',
 'Net Journals',
 'Newpubli',
 'Open Journal Systems',
 'ORIC Publications',
 'PBS Journals',
 'Revistas Academicas',
 'SAVAP International',
 'Science Alert',
 'Science and Technology Publishing',
 'Scientific Journals International',
 'SCITECH',
 'Scitechz',
 'SS Publications',
 'TI Journals',
 'USN Scientific Journal',
 'VSRD International Journals',
 'World Scholars']

    Let's put them into one table to get a clearer look.

In [47]:
comparison = pd.DataFrame({'journals from Trove': list_trove, 'journals from Bealls list': list_pre})
comparison

Unnamed: 0,journals from Trove,journals from Bealls list
0,Oasis,OASIS)
1,Dance research journal,Advanced Research Journals
2,American research journal,American Research Journals
3,RSC journals,ARC Journals
4,ARN journal,ARPN Journals
5,AIDScience,Avid Science
6,Computer Science Journal,Computer Science Journals
7,Esk journal,Eko Journal
8,European journal of educational studies,European Journals of Education Studies
9,International journal on computer science and ...,International Conference on Computer Science a...


    Now we can see that some of them are the same, some we are not sure about, and some are different.
    Let's also save the results into a CSV file.

In [48]:
comparison.to_csv('compared_results.csv', index=False)
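
    Before checking by hand, the pairs that are identical apart from capital letters can be separated from
    the ones that need a closer look. This is only a sketch: the function name classify and the labels are
    hypothetical, and the four pairs are copied from the table above.

```python
def classify(pairs):
    """Label each (Trove title, Beall's name) pair as 'exact' or 'check manually'."""
    rows = []
    for trove_title, beall_name in pairs:
        # Ignore capital letters and surrounding spaces when comparing.
        same = trove_title.strip().lower() == beall_name.strip().lower()
        rows.append((trove_title, beall_name, "exact" if same else "check manually"))
    return rows

pairs = [
    ("Oasis", "OASIS)"),
    ("Dance research journal", "Advanced Research Journals"),
    ("Medical science", "Medical science"),
    ("Scientific journals international", "Scientific Journals International"),
]

rows = classify(pairs)
for row in rows:
    print(row)
```

    An 'exact' label only says that the names agree, so those pairs still deserve a quick look at the links.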

## Show the results in table with links
    In the previous part we found some journals that we are not sure are the same, so now let's check them and make the decision.
    
    First, go to the webpage https://beallslist.weebly.com/, where we got Beall's list. 
    Find the journals we are not sure about by name and follow the links to their official websites.
    Then use the TroveURL from the Trove list to have a look at the Trove page of each journal.
    Now we know which of them are the same. Put their information into a CSV file, and load it into a dataframe to see the results.

In [3]:
pd.set_option('max_colwidth',100)
#read journals from csv file
results = pd.read_csv("./results.csv", sep = ',')
results

Unnamed: 0,Title,Link
0,Scientific Journals International,http://www.scientificjournals.org/
1,Computer Science Journals,http://www.cscjournals.org/
2,Medical science,http://www.ghrnet.org/index.php/index/index
3,Newpubli,http://www.newpubli.com/index.shtml
4,Science and Technology Publishing,http://www.scitecpub.com/
5,Science Alert,https://scialert.net/
6,Open Journal Systems,http://ambs-journal.co.uk/ojs-2.4.7-1/index.php/index
7,American Research Journals,https://www.arjonline.org/
8,European Journals of Education Studies,https://oapub.org/edu/index.php/index


    Here it is! The titles and links are listed so that researchers can see them clearly.