<a href="https://colab.research.google.com/github/Roopg/Research_Metrics/blob/main/Author_Citation_Metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



#### Objective:

Author metrics are used to quantify the reach and impact of a researcher's publications.

Calculate the following citation metrics for an author:

* Citations
* i10-index
* h-index


Input: A list of citation counts, one per publication, for a particular author.

Output: Citations, i10-index, h-index

#### Citations Metric

* This is the total number of citations an author has received across all of their publications.

In [None]:
def citations(A):
    # A is the list of per-publication citation counts
    return sum(A)

citations([34,12,11,23,20,45,1,22])

168

So the total number of citations for this author is 168.

#### i10-Index Metric
* The number of publications with at least 10 citations. 

In [None]:
def i10_index(A):
    N=len(A)
    i10=0
    for i in range(N):
        if A[i]>=10:
            i10=i10+1
    return i10

i10_index([34,12,11,23,20,45,1,22])

7
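
The same count can be written more compactly with a generator expression. This is only a sketch: the function name `i10_index_compact` and the `threshold` parameter are illustrative assumptions, not part of the notebook above.

```python
def i10_index_compact(citation_list, threshold=10):
    """Count publications with at least `threshold` citations (10 for the i10-index)."""
    return sum(1 for c in citation_list if c >= threshold)


print(i10_index_compact([34, 12, 11, 23, 20, 45, 1, 22]))  # 7
```

Only the publication with 1 citation falls below the threshold, so 7 of the 8 publications are counted.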

#### h-index or Hirsch index Metric

The h-index is the largest number h such that h publications have at least h citations each.

For example, if an author has five publications, with 17, 11, 8, 2, and 1 citations (ordered from greatest to least), then the author's h-index is 3, because the author has three publications with 3 or more citations, but not four publications with 4 or more.

Important Note: An author's h-index can only be as great as their number of publications. So if an author has N publications, the h-index cannot be more than N.
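
The definition can be checked directly with a brute-force sketch. The function name `h_index_bruteforce` is hypothetical and is not one of the notebook's approaches. It tries every candidate value from N down to 1, so it runs in O(N^2) time and is meant only to illustrate the definition.

```python
def h_index_bruteforce(citation_list):
    """Return the largest h such that at least h papers have >= h citations."""
    n = len(citation_list)
    # The h-index cannot exceed N, so start from N and work downwards.
    for h in range(n, 0, -1):
        papers_with_h_or_more = sum(1 for c in citation_list if c >= h)
        if papers_with_h_or_more >= h:
            return h
    return 0  # empty list, or no paper has any citations


print(h_index_bruteforce([17, 11, 8, 2, 1]))  # 3
```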


#### Approach 1: Sorting and Linear traversal

* Time Complexity: O(NlogN) 
* Space Complexity: O(N)

In [None]:
def hIndex_linear(citation_list):
    N=len(citation_list)
    h_index = 0

    # Step 1: Sort the citation list in ascending order (in place)
    citation_list.sort()


    # Step 2: Linearly traverse the sorted list. Because of the sort in Step 1,
    # the N-i papers from index i onward all have at least citation_list[i] citations.
    for i in range(N):
        if citation_list[i]>=N-i:
            # First index where the paper has at least N-i citations, so the h-index is N-i.
            h_index=N-i
            break
            
    return h_index

hIndex_linear([34,12,11,23,20,45,1,22])


7
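
Note that `list.sort()` reorders the caller's list in place. A variant that leaves the input untouched can be sketched as follows. It assumes a descending sort on a copy and a hypothetical function name `h_index_sorted_desc`, so it is an alternative formulation of the same idea rather than the code above.

```python
def h_index_sorted_desc(citation_list):
    """h-index via a descending sort; the input list is not modified."""
    ranked = sorted(citation_list, reverse=True)  # sorted() returns a new list
    h = 0
    # rank is the 1-based position of a paper in the descending order.
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # later papers have even fewer citations
    return h


data = [34, 12, 11, 23, 20, 45, 1, 22]
print(h_index_sorted_desc(data))  # 7
print(data)                       # original order is preserved
```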

#### Approach 2 : Sorting and Binary Search

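* Time Complexity: O(NlogN) + O(logN) = O(NlogN). The sort dominates; the binary search itself halves the search space at each step.
* Space Complexity: O(1) for the binary search itself (the sort is the same in-place sort used in Approach 1)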

In [None]:
def hIndex_Binary(citation_list):
    
    citation_list.sort()  # ascending order, in place
    N=len(citation_list)
    low=0
    high=N-1
    h_index=0

    while(low<=high):
        mid=(low+high)//2
        if citation_list[mid]>=N-mid:
            h_index=N-mid
            high=mid-1 # search the left half: a smaller index gives a larger N-mid, so a larger h_index may exist there
        else:
            low=mid+1 # otherwise search the right half
    return h_index


hIndex_Binary([34,12,11,23,20,45,1,22])

7
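
To see how the search space shrinks, the same binary search can be instrumented to record its `(low, mid, high)` values. This is a sketch for illustration: the name `h_index_binary_trace` and the `steps` list are assumptions, and the function sorts a copy instead of the caller's list.

```python
def h_index_binary_trace(citation_list):
    """Binary-search h-index that also records (low, mid, high) at each step."""
    ranked = sorted(citation_list)  # ascending copy; the input is left untouched
    n = len(ranked)
    low, high = 0, n - 1
    h_index = 0
    steps = []
    while low <= high:
        mid = (low + high) // 2
        steps.append((low, mid, high))
        if ranked[mid] >= n - mid:
            h_index = n - mid   # n - mid papers have >= n - mid citations
            high = mid - 1      # try to find a larger h_index on the left
        else:
            low = mid + 1
    return h_index, steps


h, steps = h_index_binary_trace([34, 12, 11, 23, 20, 45, 1, 22])
print(h)      # 7
print(steps)  # [(0, 3, 7), (0, 1, 2), (0, 0, 0)]
```

For the 8 publications in the example, the search finishes after three comparisons, whereas the linear traversal may need up to N.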

#### Approach 3 : Prefix Sum and Contribution technique

* Time Complexity: O(N)
* Space Complexity: O(N)
* No sorting required
* Prefix Sum Array and Carry Forward Technique

In [None]:
def hIndex_prefixsum(citation_list):
    
    """
    Step 1: Compute number of publications
    """

    N=len(citation_list)

    """
    Step 2: Create a zero-initialised counting array with one slot for each possible h_index value in the range [0,N]
    """
    
    hindex_array=[0]*(N+1)
   
    
    """
    Step 3: Traverse the citation list and count each publication in the slot for its citation count.
    Citation counts >= N are capped at N, because the h_index cannot exceed N.
    """
     
    for c in citation_list:
        if c>=N:
            hindex_array[N]=hindex_array[N]+1 
        else:
            hindex_array[c]=hindex_array[c]+1
    """
    Step 4 : Assign the h_index to the total number of publications that an author has published (N). 
    """
    h_index= N
    
    """
    Step 5: A publication with c citations supports every h_index value <= c, so we carry the counts
    forward from the end of the array towards the start (a running sum from the right). At the time of
    each check, hindex_array[h_index] is the number of publications with at least h_index citations;
    we stop at the first (largest) h_index for which that count is >= h_index.
    """
    
    while h_index>=1:
        hindex_array[h_index-1]+=hindex_array[h_index]
        print(f"h-index_array is {hindex_array}")

        if hindex_array[h_index]>=h_index:
            break
        else:
            h_index-=1

    print(f"Author with {N} publications has a h_index value of {h_index}")

    
hIndex_prefixsum([34,12,11,23,20,45,1,22])   

    

h-index_array is [0, 1, 0, 0, 0, 0, 0, 7, 7]
h-index_array is [0, 1, 0, 0, 0, 0, 7, 7, 7]
Author with 8 publications has a h_index value of 7
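
The function above prints its result. A variant that returns the value, like Approaches 1 and 2, can be sketched as follows. The name `h_index_counting` is hypothetical, and the sketch assumes non-negative integer citation counts. It keeps the running total in a separate variable instead of writing it back into the array.

```python
def h_index_counting(citation_list):
    """O(N) h-index using a counting array and a running sum from the right."""
    n = len(citation_list)
    counts = [0] * (n + 1)
    for c in citation_list:
        counts[min(c, n)] += 1  # cap at n: the h-index cannot exceed n

    papers = 0  # number of papers with at least h citations
    for h in range(n, 0, -1):
        papers += counts[h]
        if papers >= h:
            return h
    return 0


print(h_index_counting([34, 12, 11, 23, 20, 45, 1, 22]))  # 7
```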


### Conclusion:

We discussed three approaches above to compute the h-index of a particular author. The third approach, using the prefix sum and carry forward technique, has the best time complexity: O(N) versus O(NlogN) for the two sorting-based approaches, at the cost of O(N) extra space for the counting array.


