<h2 style="text-align:center;">ELICITATION OF FUNCTIONAL REQUIREMENT FEATURES: MODEL TESTING<br/></h2>

In [1]:
import os
from pdfminer.high_level import extract_text
from docx import Document
import streamlit as st

import numpy as np
from gensim.models import KeyedVectors
import joblib

import ipywidgets as widgets
from IPython.display import display

In [2]:
model = KeyedVectors.load_word2vec_format('Google-News-Vectors/GoogleNews-vectors-negative300.bin', binary=True)

In [3]:
keywords = joblib.load('keywords.joblib')['keywords'] # load saved requirement keywords

In [4]:
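def convert_pdf_to_txt(file):
    # Extract the raw text of the whole PDF
    text = extract_text(file)
    # Drop the private-use bullet glyph and form feeds (page breaks), then split on blank lines
    text_list = text.replace('\uf06c', '').replace('\x0c', '').split('\n\n')
    # Join the wrapped lines of each paragraph into a single string
    text_list = [i.replace('\n', '') for i in text_list]
    return text_list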
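
Removing `'\n'` without inserting a space can glue together the last word of one line and the first word of the next, which may explain fragments such as `Panelwithin 72 hoursof` in the output further down. The sketch below is a hypothetical variant, not the function used for the run shown in this notebook: `normalize_paragraphs` is an illustrative name, and it works on a plain string so that it can be tried without a PDF.

```python
def normalize_paragraphs(text):
    """Hypothetical helper: split text into paragraphs and tidy whitespace."""
    # Same clean-up as convert_pdf_to_txt: drop the bullet glyph and form feeds
    cleaned = text.replace('\uf06c', '').replace('\x0c', '')
    paragraphs = []
    for block in cleaned.split('\n\n'):
        # str.split() with no arguments splits on any whitespace run, so
        # line breaks and repeated spaces all become single spaces
        joined = ' '.join(block.split())
        if joined:  # skip blocks that are empty after clean-up
            paragraphs.append(joined)
    return paragraphs


sample = "Panel\nwithin 72  hours\n\n\n\nSecond  paragraph\x0c"
paragraphs = normalize_paragraphs(sample)
print(paragraphs)
```

Switching to this behaviour would change which tokens are found in the word2vec vocabulary, so similarity scores would not match the ones recorded below exactly.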

In [5]:
def convert_docx_to_txt(file):
    doc = Document(file)
    texts = [paragraph.text for paragraph in doc.paragraphs]
    return texts

In [6]:
def sentence_vector(sentence, word2vec_model):
    # Calculate the vector representation of a sentence
    word_vectors = [word2vec_model[word] for word in sentence.split() if word in word2vec_model]
    if not word_vectors:
        return None
    return np.mean(word_vectors, axis=0)
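
To see what the averaging does without loading the full GoogleNews vectors, the function can be tried on a tiny stand-in model. The `toy_model` dictionary and its 3-dimensional vectors are invented for illustration; the sketch only assumes that the model supports `word in model` and `model[word]`, which is how `sentence_vector` uses it.

```python
import numpy as np


def sentence_vector(sentence, word2vec_model):
    # Same logic as the cell above: average the vectors of in-vocabulary words
    word_vectors = [word2vec_model[word] for word in sentence.split() if word in word2vec_model]
    if not word_vectors:
        return None
    return np.mean(word_vectors, axis=0)


# Hypothetical 3-dimensional embeddings, for illustration only
toy_model = {
    "staff": np.array([1.0, 0.0, 0.0]),
    "shall": np.array([0.0, 1.0, 0.0]),
    "apply": np.array([0.0, 0.0, 1.0]),
}

# "online" is not in the toy vocabulary, so it is skipped
vec = sentence_vector("staff shall apply online", toy_model)
# No word is in the vocabulary, so the function returns None
empty = sentence_vector("unknown tokens only", toy_model)

print(vec)
print(empty)
```

Note that matching is case-sensitive and punctuation stays attached to the words after `split()`, so a token such as `Staff,` is looked up exactly as written.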

In [8]:
def calculate_similarity(word_list, sentence, word2vec_model):
    # Cosine similarity between the mean keyword vector and the mean sentence vector
    keyword_vectors = [word2vec_model[word] for word in word_list if word in word2vec_model]
    sentence_vector_ = sentence_vector(sentence, word2vec_model)

    # np.mean([]) returns nan rather than None, so test for an empty list explicitly
    if not keyword_vectors or sentence_vector_ is None:
        return None

    word_list_vector = np.mean(keyword_vectors, axis=0)
    norm_product = np.linalg.norm(word_list_vector) * np.linalg.norm(sentence_vector_)
    if norm_product == 0:
        # A zero vector has no direction, so cosine similarity is undefined
        return None

    similarity = np.dot(word_list_vector, sentence_vector_) / norm_product
    return similarity
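
The score is a cosine similarity: the dot product of the two mean vectors divided by the product of their norms. A minimal sketch on hand-made 2-dimensional vectors makes the range easy to check. The `cosine` helper and all vectors here are hypothetical and are not part of the notebook's pipeline.

```python
import numpy as np


def cosine(u, v):
    """Hypothetical helper: cosine similarity, or None if either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return None
    return float(np.dot(u, v) / denom)


# Mean of two made-up keyword vectors: [0.5, 0.5]
keyword_centroid = np.mean([np.array([1.0, 0.0]), np.array([0.0, 1.0])], axis=0)

same_direction = np.array([2.0, 2.0])   # parallel to the centroid
orthogonal = np.array([1.0, -1.0])      # at a right angle to the centroid

score_parallel = cosine(keyword_centroid, same_direction)
score_orthogonal = cosine(keyword_centroid, orthogonal)
score_zero = cosine(keyword_centroid, np.zeros(2))

print(score_parallel, score_orthogonal, score_zero)
```

Because the measure ignores vector length, a sentence vector pointing the same way as the keyword centroid scores close to 1 regardless of magnitude, while an orthogonal one scores close to 0.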

In [9]:
def main(file_path, threshold):
    # Lower-case the extension so that '.PDF' and '.Docx' are also recognised
    file_extension = os.path.splitext(file_path)[1].lower()

    if file_extension == '.pdf':
        file_contents = convert_pdf_to_txt(file_path)
    elif file_extension == '.docx':
        file_contents = convert_docx_to_txt(file_path)
    else:
        print("Unsupported file type. Please use a .pdf or .docx file.")
        return None
    
    count = 0
    
    elicited_requirements = ""
    
    for sentence in file_contents:
        similarity = calculate_similarity(keywords, sentence, model)

        if len(sentence.split()) <= 60 and similarity is not None and similarity >= threshold:
            elicited_requirements += f"Similarity Score: {similarity}"
            elicited_requirements += '\n\n'
            elicited_requirements += str(sentence) 
            elicited_requirements += '\n\n\n'
            
            print(f"Similarity Score: {similarity}", end='\n')
            print(sentence, end='\n\n')
            count += 1
    
    print(f"Total Extracted Functional Requirements: {count}")
    
    return elicited_requirements
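
The selection rule inside `main` (at most 60 words, and a score at or above the threshold) can be isolated from file reading and printing, which makes it easier to test. The following is a sketch under stated assumptions: `filter_requirements`, `toy_score`, and the sample sentences are hypothetical, and `toy_score` merely stands in for `calculate_similarity`.

```python
def filter_requirements(sentences, score_fn, threshold, max_words=60):
    """Hypothetical helper: return (score, sentence) pairs that pass both checks."""
    selected = []
    for sentence in sentences:
        # Skip long passages before paying for the similarity computation
        if len(sentence.split()) > max_words:
            continue
        score = score_fn(sentence)
        if score is not None and score >= threshold:
            selected.append((score, sentence))
    return selected


def toy_score(sentence):
    # Stand-in scorer, for illustration only: not a real similarity measure
    if not sentence.strip():
        return None
    return 0.9 if "shall" in sentence.split() else 0.1


sentences = ["The panel shall meet monthly.", "Background and history.", ""]
results = filter_requirements(sentences, toy_score, threshold=0.7)
print(results)
```

In the notebook itself, the scorer would be something like `lambda s: calculate_similarity(keywords, s, model)`.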

In [10]:
if __name__ == "__main__":
    # Accepted formats: pdf and docx
    file_path = "Papers/PROMOTION GUIDELINES 2021 APPROVED VERSION.pdf"
    
    elicited_requirements = main(file_path, threshold=0.7)
    
    # main() returns None for unsupported file types, so only write real results
    if elicited_requirements is not None:
        with open('elicited_requirements.txt', 'w', encoding='utf-8') as file:
            file.write(elicited_requirements)

Similarity Score: 0.7014855146408081
Departmental  Appointments  and  Promotions  Panels  shall  have  a  minimum  of  five  (5) members and where there are not enough qualified staff in a Department, such a Department should co-opt Staff from relevant Departments to make up the minimum. 

Similarity Score: 0.7860649824142456
(f) Promotion cases that fail at the departmental level       Promotion cases that fail at the departmental level should not be forwarded to the Faculty A & P Panel.  In  such  a  case,  the  candidate  should  be  formally  informed  in  writing  by  the DepartmentalA & P Panelwithin 72 hoursof that decision stating the reasons for the failure of his/her case. 

Similarity Score: 0.7358455657958984
the  particular Faculty/Department;  (b)  academic  units  at  the  university  level,  research  centres (public and private) and professional bodies (academic and technical); and (c) others as may be decided upon from time-to-time. 

Similarity Score: 0.7394348978996

Similarity Score: 0.7374932765960693
d.  For  Part  II  promotion  candidates,  publications  that  are  not  in  print  as  at  the  time  of application  for  promotion  shall  not  constitute  more  than  10%  of  the  candidate‟s publications for assessment.  

Similarity Score: 0.7684040665626526
f.  Candidates  seeking  promotion  should  digitise  their  publications  (e.g.  articles published  in  a  local  journal  not  yet  indexed  that  ordinarily  will  not  be  visible  on Google Scholar and other citation indexing bodies) in the University Library.  

Similarity Score: 0.7257642149925232
h.  Candidates  seeking  promotion  should  make  clear  photocopies  of  their publications  in  the  University  Library  or  any  other  reputable  photocopying outlet. 

Similarity Score: 0.7236863374710083
5.0 ASSESSMENT OF PUBLICATIONS (i)  Each  Departmental  A&P  Panel  shall  do  a  paper-by-paper  narrative  on  its  candidates‟ publications  as  part  of  its  recommendations 

Similarity Score: 0.7352656126022339
Fellow Promotion from Lecturer I to the grade of Senior Lecturer may be made based on:  (a) A minimum of three years teaching experience;  (b) Adequate research;  (c) Adequate publications; and  (d) Possession of a PhD or its equivalent is mandatory for this category of staff.  

Similarity Score: 0.7846730351448059
(ii)  Recommendation  for  promotions  up  to  the  grade  of  Senior  Lectureship  shall  be considered  by  the  appropriate  Faculty/College  Panel,  which  shall  decide  on  the  said recommendation with internal assessors‟ reports as sufficient basis.  

Similarity Score: 0.7532494068145752
 (a) adequate experience, including where applicable, professional competence;  (b) outstanding research and publications;  (c) adequate teaching ability for a minimum of 3 years as a Senior Lecturer; and  (d) possession of a higher degree of PhD or its equivalent.  

Similarity Score: 0.7602995038032532
(a) Adequate experience, including where 