# Paul Mathews Irigi #

# Automated Research Gap Finder #

Detecting research gaps is significant work in both industrial and academic research. Manually scanning large volumes of literature is inefficient and time-consuming. This project proposes automating the task with CrewAI and Llama 3.2, building a system that summarizes critical results from research papers, assesses existing literature for unresearched gaps, and offers new research directions based on the identified gaps.

This document discusses the implementation in detail, including how the code is organized and what each cell of the Jupyter Notebook does.

### Importing Necessary Libraries ###

In [1]:
import torch
import numpy as np

In [2]:
import time

In [4]:
import os
from crewai_tools import SerperDevTool, BrowserbaseLoadTool, EXASearchTool
from crewai import Agent
from crewai import Task
from crewai import LLM
from crewai import Crew, Process

In [5]:
from langchain_community.llms import Ollama


### Initializing Llama 3.2 ###

After the necessary libraries are imported, the next step is to initialize Llama 3.2, the language model used by the AI agents. The model is configured through CrewAI's LLM class, which points at the "ollama/llama3.2" model served by a local Ollama instance at http://localhost:11434. This allows the agents to generate structured responses and process textual information efficiently.

In [6]:
llm=LLM( model="ollama/llama3.2", base_url="http://localhost:11434" )

In [7]:
def timed_execution(task_function):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = task_function(*args, **kwargs)
        end_time = time.time()
        execution_time = round(end_time - start_time, 2)
        return result, execution_time
    return wrapper
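
The timed_execution decorator is defined above but not applied in the cells shown. The following is a minimal sketch of how it could be used, not the notebook's own code: it is a variant that uses time.perf_counter and functools.wraps, and slow_square is a hypothetical stand-in for an expensive call such as a crew run. The wrapped function returns a (result, seconds) tuple.

```python
import time
from functools import wraps

def timed_execution(task_function):
    """Wrap a callable so it returns (result, elapsed_seconds)."""
    @wraps(task_function)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # monotonic clock, suited to timing
        result = task_function(*args, **kwargs)
        execution_time = round(time.perf_counter() - start_time, 2)
        return result, execution_time
    return wrapper

# Hypothetical stand-in for an expensive call; not part of the notebook.
@timed_execution
def slow_square(x):
    time.sleep(0.05)
    return x * x

value, seconds = slow_square(4)
print(f"value={value}, took {seconds}s")
```

Because the wrapper changes the return type to a tuple, callers need to unpack both values.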

### Defining AI Agents ###

Following the language model initialization, the AI agents are defined. The agents each play a specific role in the research analysis process. The first agent, the Literature Extractor, pulls out key findings from existing research. The second agent, the Knowledge Gap Analyzer, is tasked with identifying gaps in research from the extracted summaries. The third agent, the Research Direction Recommender, suggests potential research questions that address the identified gaps. The agents run sequentially, with each step of the research gap identification process building upon the previous.

In [8]:
literature_extractor = Agent(
    role="Literature Extractor",
    goal="Generate and summarize key findings from existing research in a given field.",
    backstory=(
        "A highly skilled research assistant capable of synthesizing knowledge "
        "from various sources and providing structured insights."
    ),
    llm=llm,
    allow_delegation=False
)

In [9]:
gap_analyzer = Agent(
    role="Knowledge Gap Analyzer",
    goal="Identify research gaps and unanswered questions based on generated summaries.",
    backstory=(
        "An AI-driven analyst with deep expertise in recognizing missing knowledge areas "
        "within existing literature."
    ),
    llm=llm,
    allow_delegation=False
)

In [10]:
research_recommender = Agent(
    role="Research Direction Recommender",
    goal="Propose future research directions based on identified gaps.",
    backstory=(
        "A research strategist who formulates new research questions and aligns them "
        "with academic and industry trends."
    ),
    llm=llm,
    allow_delegation=False
)

### Defining Tasks ###

With the agents defined, their work is specified as tasks. The Literature Extractor summarizes research findings, including methodologies and primary conclusions. The Knowledge Gap Analyzer is meant to find at least three research gaps in the extracted summaries. Finally, the Research Direction Recommender is meant to suggest three to five research questions that cover those gaps. Each Task takes an expected_output parameter that describes the desired response, which helps keep the agents' output well structured.

In [11]:
extract_task = Task(
    description=(
        "Using Llama 3.2, generate and summarize key findings, methodologies, and conclusions "
        "from existing research on a given topic."
    ),
    agent=literature_extractor,
    expected_output="A structured summary of key findings, methodologies, and conclusions from existing research."
)

In [12]:
analyze_task = Task(
    description=(
        "Analyze the generated research summaries and identify gaps where no clear consensus "
        "or research exists."
    ),
    agent=gap_analyzer,
    expected_output="A detailed list of missing knowledge areas or gaps in the reviewed research."
)

In [13]:
recommend_task = Task(
    description=(
        "Based on the identified research gaps, suggest novel research questions and potential "
        "study areas for further exploration."
    ),
    agent=research_recommender,
    expected_output="A set of research questions and study directions that address the identified gaps."
)

### Creating Crew ###

Once the tasks are defined, the agents are assembled into a Crew that executes the tasks in order. The process parameter is set to Process.sequential so that the tasks run one after another, each agent working from the previous task's output. The verbose parameter is set to True to produce detailed execution logs, which makes the run easy to monitor.

In [14]:
research_gap_finder_crew = Crew(
    agents=[literature_extractor, gap_analyzer, research_recommender],
    tasks=[extract_task, analyze_task, recommend_task],
    verbose=True,
    process=Process.sequential  # run tasks in the listed order
)
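
The sequential hand-off described above can be pictured in plain Python. This is only an illustrative sketch, not how CrewAI is implemented: extract, analyze, and recommend are hypothetical stand-ins for the three agents, and no LLM is called.

```python
from typing import Callable, List

# Hypothetical stand-ins for the three agents; no LLM is called here.
def extract(topic: str) -> str:
    return f"summary of research on {topic}"

def analyze(summary: str) -> str:
    return f"gaps found in [{summary}]"

def recommend(gaps: str) -> str:
    return f"questions addressing [{gaps}]"

def run_sequential(steps: List[Callable[[str], str]], initial: str) -> List[str]:
    """Run each step on the previous step's output and collect every output."""
    outputs = []
    current = initial
    for step in steps:
        current = step(current)
        outputs.append(current)
    return outputs

outputs = run_sequential([extract, analyze, recommend], "smart manufacturing")
for idx, out in enumerate(outputs, 1):
    print(f"Step {idx}: {out}")
```

Each step only sees what the previous step produced, which is why the quality of the first summary bounds everything downstream.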

### Running the AI Workflow ###

In [15]:
results = research_gap_finder_crew.kickoff()

if isinstance(results, list): 
    for idx, result in enumerate(results, 1):
        print(f"\nStep {idx}: {result}\n")
else:
    print(f"\nResult: {results}\n")


[1m[95m# Agent:[00m [1m[92mLiterature Extractor[00m
[95m## Task:[00m [92mUsing Llama 3.2, generate and summarize key findings, methodologies, and conclusions from existing research on a given topic.[00m


[1m[95m# Agent:[00m [1m[92mLiterature Extractor[00m
[95m## Final Answer:[00m [92m
Research on the topic of "The Effects of Social Media on Mental Health in Young Adults" has yielded several key findings, methodologies, and conclusions. 

One study published in the Journal of Youth and Adolescence found that young adults who spent more time on social media were more likely to experience symptoms of depression and anxiety (Király et al., 2019). The researchers used a survey-based approach to collect data from a sample of 1,000 young adults aged 15-25.

Another study published in the journal Cyberpsychology, Behavior, and Social Networking found that social media use was positively correlated with feelings of loneliness and social isolation among young adults (Burke et

In [20]:
!pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp310-cp310-win_amd64.whl.metadata (15 kB)
Collecting scipy>=1.6.0 (from scikit-learn)
  Downloading scipy-1.15.2-cp310-cp310-win_amd64.whl.metadata (60 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.6.1-cp310-cp310-win_amd64.whl (11.1 MB)
Downloading joblib-1.4.2-py3-n

In [31]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.4.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.4.1-py3-none-any.whl (275 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.4.1


In [22]:
!pip install textstat

Collecting textstat
  Downloading textstat-0.7.5-py3-none-any.whl.metadata (15 kB)
Collecting pyphen (from textstat)
  Downloading pyphen-0.17.2-py3-none-any.whl.metadata (3.2 kB)
Collecting cmudict (from textstat)
  Downloading cmudict-1.0.32-py3-none-any.whl.metadata (3.6 kB)
Downloading textstat-0.7.5-py3-none-any.whl (105 kB)
Downloading cmudict-1.0.32-py3-none-any.whl (939 kB)
Downloading pyphen-0.17.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pyphen, cmudict, textstat
Successfully installed cmudict-1.0.32 pyphen-0.17.2 textstat-0.7.5


In [32]:
import time
import numpy as np
from sklearn.metrics import precision_score, recall_score
from textstat import flesch_reading_ease
from sentence_transformers import SentenceTransformer

In [33]:
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

In [48]:
def generate_embedding(text):
    """Encode a string into a dense vector with the sentence-transformers model."""
    return embedding_model.encode(text, convert_to_numpy=True)

def evaluate_relevance(results, known_gaps):
    """Mean, over results, of the best cosine similarity to any known gap."""
    result_embeddings = np.array([generate_embedding(gap) for gap in results])
    known_embeddings = np.array([generate_embedding(gap) for gap in known_gaps])

    # For each result: cosine similarity against every known gap, keep the maximum.
    similarities = [
        max(np.dot(result_emb, known_embeddings.T) / 
            (np.linalg.norm(result_emb) * np.linalg.norm(known_embeddings, axis=1))) 
        for result_emb in result_embeddings
    ]
    
    return round(np.mean(similarities), 2)
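
To make the relevance computation above easier to follow, here is a dependency-free sketch of the same idea: for each result vector, take the highest cosine similarity to any known vector, then average those maxima. The 2-D vectors are hypothetical toy values, not real sentence embeddings, and mean_best_match is an illustrative name rather than a function from the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_best_match(result_vecs, known_vecs):
    """For each result vector take its best match among known vectors, then average."""
    best = [max(cosine_similarity(r, k) for k in known_vecs) for r in result_vecs]
    return sum(best) / len(best)

# Toy 2-D vectors (hypothetical, not real sentence embeddings).
result_vecs = [[1.0, 0.0], [1.0, 1.0]]
known_vecs = [[1.0, 0.0], [0.0, 1.0]]

score = mean_best_match(result_vecs, known_vecs)
print(round(score, 3))  # 0.854
```

The first toy result matches a known vector exactly (similarity 1), while the second sits at 45 degrees to both (similarity about 0.707), so the average is about 0.854.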


In [49]:
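def evaluate_readability(results):
    """Mean Flesch Reading Ease score across the result strings."""
    scores = [flesch_reading_ease(gap) for gap in results]
    return round(np.mean(scores), 2)

def calculate_precision_recall(results, true_gaps):
    """Precision and recall of results against true_gaps (exact string match)."""
    predicted = [1 if gap in true_gaps else 0 for gap in results]
    # Note: `actual` is built positionally, assuming the true gaps come first in
    # `results` and that len(results) >= len(true_gaps). A result that is not a
    # true gap is therefore scored as a correct negative, not a false positive.
    actual = [1] * len(true_gaps) + [0] * (len(results) - len(true_gaps))
    
    precision = precision_score(actual, predicted, zero_division=1)
    recall = recall_score(actual, predicted, zero_division=1)
    
    return round(precision, 2), round(recall, 2)

def check_diversity(results):
    """Ratio of unique whitespace-separated terms to total terms (case-sensitive)."""
    unique_terms = set()
    for gap in results:
        unique_terms.update(gap.split())
    
    return round(len(unique_terms) / sum(len(gap.split()) for gap in results), 2)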
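
calculate_precision_recall above aligns predictions and labels by position. As a point of comparison only (an alternative formulation, not the notebook's method), precision and recall can also be computed from set overlap, where every returned item that is not a true gap counts against precision. set_precision_recall is a hypothetical helper name, and matching is still by exact string.

```python
def set_precision_recall(predicted, relevant):
    """Set-overlap precision and recall using exact string matching."""
    predicted_set, relevant_set = set(predicted), set(relevant)
    true_positives = len(predicted_set & relevant_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return round(precision, 2), round(recall, 2)

predicted = ["AI in supply chain risk management",
             "Bias in predictive maintenance models",
             "Blockchain applications in manufacturing"]
relevant = ["AI in supply chain risk management",
            "Bias in predictive maintenance models"]

precision, recall = set_precision_recall(predicted, relevant)
print(precision, recall)  # 0.67 1.0
```

Under this formulation two of the three returned items are relevant, giving a precision of 0.67, while both relevant items are found, giving a recall of 1.0.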

In [50]:
known_gaps = ["AI in supply chain risk management", 
              "Bias in predictive maintenance models", 
              "Data privacy in smart manufacturing"]

true_gaps = ["AI in supply chain risk management", 
             "Bias in predictive maintenance models"]

results = ["AI in supply chain risk management", 
           "Bias in predictive maintenance models", 
           "Blockchain applications in manufacturing"]


In [53]:
start_eval_time = time.perf_counter()

# Compute precision and recall once instead of calling the function twice.
precision, recall = calculate_precision_recall(results, true_gaps)

performance_metrics = {
    "Relevance Score": evaluate_relevance(results, known_gaps),
    "Precision": precision,
    "Recall": recall,
    "Diversity Score": check_diversity(results)
}

In [54]:
print("\n Performance Metrics:")
for metric, value in performance_metrics.items():
    print(f"{metric}: {value}")


 Performance Metrics:
Relevance Score: 0.8299999833106995
Precision: 1.0
Recall: 1.0
Diversity Score: 0.87
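
The Diversity Score of 0.87 can be reproduced by hand from the sample results list: the three strings contain 15 whitespace-separated tokens, of which 13 are distinct (only "in" repeats). A short sketch of that arithmetic, using the same strings as the notebook:

```python
results = ["AI in supply chain risk management",
           "Bias in predictive maintenance models",
           "Blockchain applications in manufacturing"]

# Flatten all whitespace-separated tokens; comparison is case-sensitive.
tokens = [word for gap in results for word in gap.split()]
unique_terms = set(tokens)

print(len(unique_terms), len(tokens))  # 13 15
diversity = round(len(unique_terms) / len(tokens), 2)
print(diversity)  # 0.87
```

Because the ratio is computed on raw tokens, common function words such as "in" lower the score even when the topics themselves do not overlap.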
