# Scholarly Journal Impact Factor Display 
Background: The impact factor is a concept created by librarians to optimize the shelf arrangement of scientific articles. It indicates how much influence a specific journal has had on other scientific articles, based on how often it is cited in them. In simplified form, it is defined as the number of citations referring to the journal in question divided by the number of articles in the journal.
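
As a minimal sketch of that ratio, the snippet below computes it for a made-up journal. The function name `impact_factor` and the counts are illustrative assumptions, not data from any real journal.

```python
def impact_factor(citations: int, articles: int) -> float:
    """Citations referring to a journal divided by its number of articles."""
    if articles <= 0:
        raise ValueError("a journal needs at least one article")
    return citations / articles

# Hypothetical journal: 250 citations referring to 200 articles
example = impact_factor(250, 200)
print(f"{example:.3f}")  # 1.250
```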

Purpose: The program serves as a template for reading a text file line by line and extracting each journal's name and impact factor. The names and impact factors are stored together as tuples, sorted, and printed side by side in the output console. The program can later be adapted to read data files produced by web-scraping applications.

### Sources: 
- Flaig, Ruediger-Marcus. "Bioinformatics Programming in Python: A Practical Course for Beginners." WILEY-VCH Verlag GmbH & Co KGaA, 2008.
- "Insertion Sort." GeeksForGeeks.com, 21 Feb. 2025. Retrieved from URL: https://www.geeksforgeeks.org/python-program-for-insertion-sort/.
- "Filter() in Python." GeeksForGeeks.com, 11 Dec. 2024. Retrieved from URL: https://www.geeksforgeeks.org/filter-in-python/.
- "Python sys Module." GeeksForGeeks.com, 18 Nov. 2023. Retrieved from URL: https://www.geeksforgeeks.org/python-sys-module/.
- "sort." Python Reference (The Right Way). Retrieved from URL: https://python-reference.readthedocs.io/en/latest/docs/list/sort.html.

In [145]:
import string, sys # load modules for strings and system access. sys is used for scripting purposes in a command line terminal

In [96]:
delim = 25 # "magic number" for formatting, value is determined by the format of the file

In [98]:
# Getting the text file
fil, jlist, rlist = open("impact-factor.txt", "r"), [], []

In [108]:
for line in fil: # scan the file line by line to create a list of tuples
    if line.strip(): # skip empty or whitespace-only lines
        # cut the line at column delim: name on the left, impact factor on the right
        journal, fact = line[:delim].strip(), line[delim:].strip()
        if fact == "": fact = "0.0" # used if the journal does not have a recorded impact factor
        jlist.append((journal, float(fact))) # build a list of (journal, impact) items

In [110]:
jlist

[('AARGH BULL', 1.419),
 ('ABHOR MATH SEM HAMBURG', 0.115),
 ('ABOMINABLE IMAGING', 0.891),
 ('AC/DC', 2.904),
 ('ACRASIOLOGIA', 0.095),
 ('ACCOUNTS IRREPROD RES', 11.795),
 ('ADVANCED DARK ARTS J', 1.748),
 ('ZUCKER & EPILEPTIKER', 0.003),
 ('ZYZZYX LETTERS', 0.815)]
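
The fixed-width cut can be tried without the data file. The sketch below parses a few in-memory lines by slicing at the column given by `delim`. The helper name `parse_line` and the sample lines are assumptions for illustration; the layout (name padded with spaces, impact factor starting at column 27) is inferred from the output above, not taken from the actual file.

```python
def parse_line(line, delim=25):
    """Split one fixed-width line into a (journal, impact factor) tuple."""
    journal = line[:delim].strip()   # everything left of the cut column
    fact = line[delim:].strip()      # everything right of it
    return journal, float(fact) if fact else 0.0  # 0.0 when no factor is recorded

# Hypothetical lines mimicking the assumed layout of impact-factor.txt
sample = [
    "AARGH BULL".ljust(27) + "1.419\n",
    "AC/DC".ljust(27) + "2.904\n",
    "\n",               # blank line, skipped below
    "NO FACTOR J\n",    # journal without a recorded impact factor
]
parsed = [parse_line(line) for line in sample if line.strip()]
print(parsed)
```

Slicing before stripping keeps the two fields separate, so a journal name that happens to contain digits or a period is not trimmed by accident.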

In [104]:
# Screen the jlist for presence of keywords
for j in jlist:
    (a, b) = j # "a" is name, "b" is impact factor
    acceptable = True
    for arg in sys.argv[1:]:
        if a.find(arg) < 0: # find() returns -1 when the substring is not found
            acceptable = False # this keyword does not occur in the journal name
            break # no need to test for other args
    if acceptable: rlist += [j] # keep the journal only if every keyword matched

In [125]:
rlist

[]

In [106]:
# Display rlist, which now holds the "(name, impact)" tuples of all journals matching the query
rlist.sort() # tuples compare element by element, so this sorts by journal name
for item in rlist:
    print("%20s ---> %2.3f" % item)
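
When these cells run inside a notebook, `sys.argv[1:]` holds the kernel's launch arguments rather than search terms, which is a likely reason `rlist` is empty above. The sketch below repeats the screening step with an explicit keyword list instead. The function name `screen`, the sample tuples, and the keyword `"AB"` are assumptions for illustration.

```python
def screen(journals, keywords):
    """Keep the (name, impact) tuples whose name contains every keyword."""
    return [j for j in journals if all(k in j[0] for k in keywords)]

# Small sample in the same (name, impact) shape as jlist
journals = [("AARGH BULL", 1.419), ("ABOMINABLE IMAGING", 0.891), ("AC/DC", 2.904)]

hits = sorted(screen(journals, ["AB"]))  # sorted() orders the tuples by name
for item in hits:
    print("%20s ---> %2.3f" % item)
```

With an empty keyword list every journal passes, which mirrors what the loop above does when no arguments are given.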

### The Functional Reusable Code

In [241]:
def getfile(fname, delim):
    """Read the file fname and convert it into a list of (journal, impact) tuples.
    delim is the column at which each line is cut into name and impact factor."""
    
    jlist = []
    
    with open(fname, "r") as fil: # the file is closed automatically after reading
        for line in fil: # scan the file line by line to create a list of tuples
            if line.strip(): # skip empty or whitespace-only lines
                # cut the line: name left of delim, impact factor right of it
                journal, fact = line[:delim].strip(), line[delim:].strip()
                if fact == "": fact = "0.0" # used if the journal does not have a recorded impact factor
                jlist.append((journal, float(fact))) # build a list of (journal, impact) items
            
    return jlist

In [243]:
def jn_contains(j, t):
    """Check whether the journal name in tuple j contains the search term t
    and return True or False."""
    (a, b) = j # "a" is name, "b" is impact factor
    return a.find(t) >= 0

In [245]:
def ldump(l):
    """Sorts the list in place by impact factor and prints it"""
    
    def insertionSort(l):
        """In-place insertion sort. Compares the second component of each tuple,
        so the list is ordered by impact factor (ascending) instead of by name."""
        n = len(l)
    
        if n <= 1:
            return # list is already sorted if it has 0 or 1 elements
        
        for i in range(1, n): # iterate over the list, starting from the second element
            key = l[i] # stores the current element as the key to be inserted in the right position
            j = i - 1
            while j >= 0 and key[1] < l[j][1]: # walk left past elements whose impact factor is greater than the key's
                l[j+1] = l[j] # shift elements to the right
                j -= 1
            l[j+1] = key # insert the key after the last element whose impact factor is not greater

    # Sorting the list
    insertionSort(l)

    # Printing sorted list
    for item in l: print ("%30s ---> %2.3f" % item)
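
For comparison, the same ascending order by impact factor can be obtained from Python's built-in `sorted()` with a key function. This is a sketch of an alternative, not a replacement for the hand-written insertion sort above; the sample tuples are illustrative.

```python
from operator import itemgetter

# Small sample in the same (name, impact) shape as the list passed to ldump
journals = [("AC/DC", 2.904), ("ZUCKER & EPILEPTIKER", 0.003), ("AARGH BULL", 1.419)]

# itemgetter(1) picks the impact factor as the sort key;
# sorted() returns a new list and leaves `journals` untouched
by_impact = sorted(journals, key=itemgetter(1))

for item in by_impact:
    print("%30s ---> %2.3f" % item)
```

Unlike `insertionSort`, which reorders its argument in place, `sorted()` leaves the original list in file order.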

#### Main Program

In [248]:
jl = getfile("impact-factor.txt", 25) # 25 is the same cut column as delim above
jl

[('AARGH BULL', 1.419),
 ('ABHOR MATH SEM HAMBURG', 0.115),
 ('ABOMINABLE IMAGING', 0.891),
 ('AC/DC', 2.904),
 ('ACRASIOLOGIA', 0.095),
 ('ACCOUNTS IRREPROD RES', 11.795),
 ('ADVANCED DARK ARTS J', 1.748),
 ('ZUCKER & EPILEPTIKER', 0.003),
 ('ZYZZYX LETTERS', 0.815)]

In [252]:
ldump(jl)

          ZUCKER & EPILEPTIKER ---> 0.003
                  ACRASIOLOGIA ---> 0.095
        ABHOR MATH SEM HAMBURG ---> 0.115
                ZYZZYX LETTERS ---> 0.815
            ABOMINABLE IMAGING ---> 0.891
                    AARGH BULL ---> 1.419
          ADVANCED DARK ARTS J ---> 1.748
                         AC/DC ---> 2.904
         ACCOUNTS IRREPROD RES ---> 11.795


In [254]:
for arg in sys.argv[1:]: 
    # list() evaluates each filter immediately, while arg still holds the current search term
    jl = list(filter(lambda j: jn_contains(j, arg), jl))

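The filter() function tests each element of the given sequence with a function and keeps only the elements for which that function returns True. In Python 3 it returns a lazy iterator rather than a list; wrapping the call in list() evaluates it immediately.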
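
A minimal sketch of chained filtering, assuming two hypothetical search terms in place of real command-line arguments. The sample tuples and the names `terms` and `result` are illustrative.

```python
journals = [("AARGH BULL", 1.419), ("ABOMINABLE IMAGING", 0.891), ("AC/DC", 2.904)]
terms = ["A", "BULL"]  # hypothetical search terms standing in for sys.argv[1:]

result = journals
for term in terms:
    # list() forces evaluation now; a bare filter object would only run later,
    # when the lambda would see the final value of `term` for every pass
    result = list(filter(lambda j: term in j[0], result))

print(result)  # [('AARGH BULL', 1.419)]
```

Each pass narrows the list further, so only journals whose name contains every term remain.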

The line "for arg in sys.argv[1:]:" iterates over the command-line arguments passed to a Python script. 
- argv[0] is the name of the Python script
- argv[1:] are the arguments passed to the Python script 

Example:
##### bash
python script.py input.txt output.txt log.txt


##### Python
import sys

for arg in sys.argv[1:]:
    print(f"Argument: {arg}")


##### Output

Argument: input.txt
Argument: output.txt
Argument: log.txt