<center>
<h1 style="color: pink;">Welcome to Publish Mate 😊</h1>
</center>

Further improvements :
- feed back with like and dislike
- option to summerize using the whole paper (or other option to mention someone did it before)

## `00` Download Dependencies

In [152]:
# !pip3 install -U "crewai[tools,agentops]"

In [153]:
# !pip3 install python-dotenv
# !pip3 install gcloud
# !pip3 install google-genai

## `01` Import Libraries

In [154]:
import os

from dotenv import load_dotenv
from typing import List, Dict

from pydantic import BaseModel, Field

import google.generativeai as genai

from crewai import Agent, Task, Crew, LLM
from crewai.tools import tool

import agentops

from tavily import TavilyClient


## `02` load api key

In [155]:
load_dotenv()  # Load from .env

True

In [156]:
AGENTOPS_API_KEY = os.getenv("AGENTOPS_API_KEY") # replace by yours

In [157]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
PROJECT_ID = os.getenv("PROJECT_ID")
PROJECT_NAME = os.getenv("PROJECT_NAME")

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

# SERPER_API_KEY = os.getenv("SERPERDEV_API_KEY")
# os.environ["SERPER_API_KEY"] = SERPER_API_KEY
# os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

genai.configure(api_key=OPENAI_API_KEY)

## `03` Start AgentOps session

In [158]:
agentops.init(api_key=AGENTOPS_API_KEY,
               skip_auto_end_session=True, # Set to True to skip auto ending the session
               default_tags=['crewai']
               ) 

<agentops.legacy.Session at 0x707e26416d10>

The link will help us to monitor our agents

### Make sure it works

In [159]:
# print("AgentOps session initialized.")
# print(agentops.session)  # optional, shows session info if available
# print(agentops.__dict__)

## `04` Intro of the Crew

In [160]:
intro_prompt = (
    "Welcome to PublishMate! I am your research assistant mate here to help you with your academic paper journey.\n"
    "I will guide you step-by-step to find trending topics, recent papers, summaries, "
    "research gaps, and help with paper writing. \nLet's get started!\n"
)

def welcome_message():
    print(intro_prompt)

# Run this at the very beginning
welcome_message()

Welcome to PublishMate! I am your research assistant mate here to help you with your academic paper journey.
I will guide you step-by-step to find trending topics, recent papers, summaries, research gaps, and help with paper writing. 
Let's get started!



## `05` Set Output dir

In [161]:
output_dir = './PublishMate_agent_ouput'
os.makedirs(output_dir, exist_ok=True)

## `06` LLM will be used

In [162]:
basic_llm = LLM(
    model="gemini/gemini-1.5-flash",
    temperature=0.2,
    provider="google_ai_studio",
    api_key=os.environ["GEMINI_API_KEY"]
)

## `07` START AGENTS

### `7.1` Agent 1: Trending Topics Agent 

In [163]:
# !gcloud init

In [164]:
user_input = input("Enter your research field or keyword: ")

In [165]:
class TrendingTopicsOutput(BaseModel):
    topics: List[Dict[str, str]] = Field(..., title="Trending topics with description", min_items=1)

trending_topics_agent = Agent(
    role="Trending Topics Identification Agent",

    goal="\n".join([
        f"You are an expert research assistant that identifies the latest trending topics in the field of {user_input} only focus on it .",
        "Generate a detailed list of the top 3-5 trending topics or recent articles reflecting advances and high interest in this field.",
        "Base your answer on recent publication trends, conferences, or journal articles.",
        "Do not include unrelated or general topics.",
        "Output only a JSON object with a 'topics' list containing objects with 'name' and 'description'."
    ]),
    backstory="Designed to guide users by providing the most relevant and current trending research topics in their specified field.",
    llm=basic_llm,
    verbose=True,
)

trending_topics_task = Task(
    description="\n".join([
        f"you are an expert in a {user_input} field to help beginner researchers in their writings .",
        "Provide a list of 3 to 5 trending topics or articals with a brief description for each.",
        "Focus on recent research interests supported by publication trends.",
        "Output in JSON format with 'topics' as list of objects {name, description}."
    ]),
    expected_output="JSON object with list of trending topics and descriptions.",
    output_json=TrendingTopicsOutput,
    output_file=os.path.join(output_dir, "step_1_trending_topics.json"),
    agent=trending_topics_agent,
)

### `7.2` Agent 2: Recent Papers Retrieval Agent

In [166]:
search_client = TavilyClient(api_key=TAVILY_API_KEY)
 
@tool
def search_engine_tool(query: str):
    """Useful for search-based queries. Use this to find current information about any query related pages using a search engine"""
    return search_client.search(query)


In [167]:
class PaperInfo(BaseModel):
    title: str  
    year: int 
    url: str
    abstract: str                                   


class RecentPapersOutput(BaseModel):
    topic_papers: Dict[str, List[PaperInfo]] = Field(..., title="Recent papers grouped by topic")

recent_papers_agent = Agent(
    role="Recent Papers Retrieval Agent",

    goal = "\n".join([
        "You are a research paper search assistant.",
        "Given a list of trending topics, retrieve 3 recent, relevant publications per topic.",
        "Select papers from reputable sources published within the last 2 years.(2023 or 2024 or 2025)",
        "Provide title, authors, abstract, year, and valid URL for each paper.",
        "the URL must be valid and accessible.",
        "If no recent paper is available, state 'No recent papers found' for that topic.",
        "Output in JSON format grouped by topic."]),

    backstory="Helps beginner researchers quickly discover and review the latest relevant publications across the trending topics with the URLs that are valid and some info.",

    llm=basic_llm,
    
    verbose=True,
)

recent_papers_task = Task(
    description="\n".join([
        "Input is a list of trending topics.",
        "For each topic, find 3 papers with title, authors, abstract, year, and link which should be valid and accessable.",
        "Select papers from reputable journals or conferences (IEEE, Springer, Elsevier, ICRA, IROS, actual arXiv).",
        "Only include papers published in 2023 or 2024 or 2025.",
        "Get the abstract of the paper as it is in the paper or the site to help the agents after you, bring a good clean text."
        "Focus on papers from last 2 years from reputable conferences or journals.",
        "If no recent paper is available, state 'No recent papers found' for that topic.",
        "Output JSON grouped by topic."
    ]),
    expected_output="JSON with topics as keys and list of paper info objects as values.",
    output_json=RecentPapersOutput,
    output_file=os.path.join(output_dir, "step_2_recent_papers.json"),
    agent=recent_papers_agent,
    tools=[search_engine_tool],
    
)


### `7.3` Agent 3: Research Gap and Suggestion Agent

In [168]:
class ResearchGapOutput(BaseModel):
    research_gaps: List[str] = Field(..., title="List of research gaps and suggestions")

research_gap_agent = Agent(
    role="Research Gap Identification and Suggestion Agent",
    goal="\n".join([
        "Analyze summaries to identify gaps, limitations, and propose research directions or improvements.",
        "Use a friendly and encouraging tone suitable for beginners.",
        "You will be given the data about the papers about that topic 3 papers for each topic with their year, abstaract, url, title .",
        "analye the abstract to guess and detect gaps ",
        "Suggest these Gaps to the writer to can start from"
    ]),
    backstory="Helps users find novel contributions by highlighting unexplored areas and providing ideas.",
    llm=basic_llm,
    verbose=True,
)

research_gap_task = Task(
    description="\n".join([
        "Input is paper summaries.",
        "Output a list of research gaps, limitations, and suggestions for future research.",
        "Encourage beginners by providing feasible ideas."
        "You will be given the data about the papers about that topic 3 papers for each topic with their year, abstaract, url, title .",
        "analye the abstract to guess and detect gaps ",
        "Suggest these Gaps to the writer to can start from"
    ]),
    expected_output="JSON list of research gaps and improvement suggestions.",
    output_json=ResearchGapOutput,
    output_file=os.path.join(output_dir, "step_4_research_gaps.json"),
    agent=research_gap_agent,
)

### `07.4 Phase 1`: Run the first 3 tasks (up to research_gap_task)


In [171]:
first_crew = Crew(
    name="PublishMate Crew - Phase 1",
    description="Run up to research gap analysis.",
    agents=[
        trending_topics_agent,
        recent_papers_agent,
        research_gap_agent,
    ],
    tasks=[
        trending_topics_task,
        recent_papers_task,
        research_gap_task,
    ],
)

first_result = first_crew.kickoff()
print(first_result)

[1m[95m# Agent:[00m [1m[92mTrending Topics Identification Agent[00m
[95m## Task:[00m [92myou are an expert in a AI field to help beginner researchers in their writings .
Provide a list of 3 to 5 trending topics or articals with a brief description for each.
Focus on recent research interests supported by publication trends.
Output in JSON format with 'topics' as list of objects {name, description}.[00m


[1m[95m# Agent:[00m [1m[92mTrending Topics Identification Agent[00m
[95m## Final Answer:[00m [92m
{
  "topics": [
    {
      "name": "Large Language Models (LLMs) and their limitations",
      "description": "Research is intensely focused on improving LLMs' reasoning, factual accuracy, and mitigating biases.  Recent work explores techniques like chain-of-thought prompting,  improving training data, and developing methods for explainability and interpretability to address their limitations and potential harms."
    },
    {
      "name": "Generative AI and its societ

🖇 AgentOps: [34m[34mSession Replay for default.session trace: https://app.agentops.ai/sessions?trace_id=5f42080de7957850d17c888108092273[0m[0m


{'research_gaps': ['**Large Language Models (LLMs) and their limitations:**\n\n* **Quantifying and comparing hallucination rates across different LLMs:** While the abstracts mention hallucinations, a systematic comparison of hallucination rates across various LLMs under different prompting strategies and datasets is lacking.  Future research could develop standardized benchmarks and metrics for evaluating hallucination, enabling a more precise understanding of the problem and facilitating the development of mitigation techniques.\n* **Developing robust methods for detecting and correcting factual inaccuracies:** Current research focuses on identifying biases and adversarial attacks, but less attention is given to developing automated methods for detecting and correcting factual errors in LLM outputs.  This could involve combining LLMs with external knowledge bases or fact-checking systems.\n* **Investigating the impact of different training data on LLM performance and biases:** The abs

In [172]:
import os
import json

os.chdir("/home/israa/Desktop/PublishMate_CrewAgents")

def read_json_file(filepath):
    if os.path.exists(filepath):
        with open(filepath, "r") as f:
            return json.load(f)
    else:
        print(f"File not found: {filepath}")
        return None

trending_topics_path = "PublishMate_agent_ouput/step_1_trending_topics.json"
recent_papers_path = "PublishMate_agent_ouput/step_2_recent_papers.json"
research_gaps_path = "PublishMate_agent_ouput/step_4_research_gaps.json"

trending_topics = read_json_file(trending_topics_path)
recent_papers = read_json_file(recent_papers_path)
research_gaps = read_json_file(research_gaps_path)

# Print trending topics nicely
if trending_topics and "topics" in trending_topics:
    print("Trending Topics:")
    for topic in trending_topics["topics"]:
        print(f"- {topic['name']}: {topic['description']}\n")

# Print recent papers by topic
if recent_papers and "topic_papers" in recent_papers:
    print("Recent Papers by Topic:")
    for topic, papers in recent_papers["topic_papers"].items():
        print(f"{topic}:")
        if papers:
            for paper in papers:
                print(f"  * {paper}")
        else:
            print("  No papers found.")
        print()

# Print research gaps clearly
if research_gaps and "research_gaps" in research_gaps:
    print("Research Gaps:")
    for gap in research_gaps["research_gaps"]:
        print(f"- {gap}\n")


Trending Topics:
- Large Language Models (LLMs) and their limitations: Research is intensely focused on improving LLMs' reasoning, factual accuracy, and mitigating biases.  Recent work explores techniques like chain-of-thought prompting,  improving training data, and developing methods for explainability and interpretability to address their limitations and potential harms.

- Generative AI and its societal impact: The rapid advancement of generative AI, particularly in image, text, and code generation, has sparked significant debate about its ethical implications, copyright issues, and potential misuse for malicious purposes (e.g., deepfakes).  Research is exploring responsible AI development, mitigation strategies, and policy recommendations.

- AI for Science: AI is revolutionizing scientific discovery across various domains.  Trending research includes using AI for drug discovery, materials science, climate modeling, and fundamental physics research.  This involves developing speci

## `08` Crew 2 :

### `8.1` Agent 4 : Search about chosen gab Agent :

In [20]:
# 💬 Get user input
chosen_topic = input("Which topic did you get interested in more? ")
chosen_gap = input("Which gap do you like to start looking for ^-^? ")

In [21]:
class ResearchGapSection(BaseModel):
    section: str
    tips: str

class ResearchGapOutput(BaseModel):
    research_steps: List[ResearchGapSection] = Field(..., title="Research gap focused steps and tips")

research_starting_points_agent = Agent(
    role="Research Gap Exploration Agent",
    goal="\n".join([
        f"Provide a detailed and clear set of specific research starting points based on the chosen {chosen_gap} in the {chosen_topic}.",
        "Include practical and beginner-friendly tips for each step to help users start their research.",
        "Focus on actionable tasks tied directly to the selected gap (e.g., watermarking, hallucination, bias).",
        "Motivate users by giving confidence and clear direction."
    ]),
    backstory="Helps users dive into LLM research by breaking down complex gaps into simple, actionable steps.",
    llm=basic_llm,
    verbose=True,
)

research_starting_points_task = Task(
    description="\n".join([
        f"Input: the chosen research gap {chosen_gap} in the topic {chosen_topic} .",
        "Output: a structured list of specific research steps with detailed tips for each step.",
        "Goal: help beginners understand what to do first, what resources to use, and how to progress in a steps."
    ]),
    expected_output="JSON list of steps with detailed beginner tips.",
    output_json=ResearchGapOutput,
    output_file=os.path.join(output_dir, "step_research_gap.json"),
    agent=research_gap_agent,
)


### `8.2` Agent 5: Paper Structure and Writing Guide Agent

In [22]:
# Input: specific research steps from previous agent
class ResearchGapSection(BaseModel):
    section: str
    tips: str

# Output: paper structure with tips for writing
class PaperStructureSection(BaseModel):
    section: str
    tips: str

class PaperStructureOutput(BaseModel):
    paper_structure: List[PaperStructureSection] = Field(..., title="Paper structure sections and writing tips")

paper_structure_agent = Agent(
    role="Paper Structure and Writing Guide Agent",
    goal="\n".join([
        "Take research steps as input and produce a paper outline that reflects them.",
        "For each section in the paper, provide clear writing tips tailored to the input research.",
        "Help beginners turn their research process into a coherent academic paper.",
        "Add encouragement and make the structure simple to follow."
    ]),
    backstory="Transforms research plans into a proper academic paper structure with beginner tips.",
    llm=basic_llm,
    verbose=True,
)

paper_structure_task = Task(
    description="\n".join([
        "Input: List of research steps (sections with tips) from a research gap agent.",
        "Output: Structured academic paper outline based on those steps.",
        "Include tips for writing each section clearly and effectively.",
        "Make it easy to follow for someone new to academic writing."
    ]),
    expected_output="JSON list of paper sections with writing advice.",
    output_json=PaperStructureOutput,
    output_file=os.path.join(output_dir, "step_5_paper_structure.json"),
    agent=paper_structure_agent,
)


### `8.3` Agent 6: Related work draft (overview) Agent

In [23]:
class RelatedWorkOutput(BaseModel):
    related_work: str = Field(..., title="Composed related work section")

related_work_agent = Agent(
    role="Related Work Composer Agent",
    goal="\n".join([
        "Compose a comprehensive 'Related Work' section using the paper summaries.",
        "Group by themes, mention each paper's contribution.",
        "Maintain academic tone and cite like (e.g., 'Smith et al. 2023').",
        f"you have earlier the {chosen_topic} and {chosen_gap} related papers so you can write about them."

    ]),
    backstory="Helps users create strong literature review related content.",
    llm=basic_llm,
    verbose=True,
)

related_work_task = Task(
    description="\n".join([
        f"Input: list of paper summaries about {chosen_topic} in a {chosen_gap}.",
        "Group the papers realated from the recent paper agent and write a clear Related Work section.",
        "Use academic tone, smooth transitions, and citation style.",
        "Output a single string."
    ]),
    expected_output="Single string of the Related Work section.",
    output_json=RelatedWorkOutput,
    output_file=os.path.join(output_dir, "step_6_related_work.json"),
    agent=related_work_agent,
)


### `8.4` Agent 7: Paper draft Agent

In [24]:
class DraftOutput(BaseModel):
    draft: str = Field(..., title="Full academic paper draft text")

draft_writer_agent = Agent(
    role="Academic Paper Drafting Agent",
    goal="\n".join([
        f"Write a full academic paper draft using the structure, topic{chosen_topic}, research gap {chosen_gap}, and related work.",
        "Ensure clarity, academic tone, and smooth transitions.",
        "Support beginners by avoiding jargon and including helpful examples.",
    ]),
    backstory="Turns raw research insights into a complete paper draft.",
    llm=basic_llm,
    verbose=True,
)

draft_writer_task = Task(
    description="\n".join([
        f"Input is: topic{chosen_topic}, paper structure + tips and starting points + research gap {chosen_gap}  + related work.",
        "Use them to generate a coherent draft of the academic paper.",
        "Output in well-organized academic format (Intro, Method, etc.)."
    ]),
    expected_output="String containing the full paper draft.",
    output_json=DraftOutput,
    output_file=os.path.join(output_dir, "step_7_paper_draft.json"),
    agent=draft_writer_agent,
)

### `8.5 Phase 2`: Continue with remaining tasks


In [25]:
# Phase 2: Continue with remaining tasks
second_crew = Crew(
    name="PublishMate Crew - Phase 2",
    description="Suggest research starting points based on user-selected gap/topic.",
    agents=[
        research_starting_points_agent,
        paper_structure_agent,
        related_work_agent,
        draft_writer_agent
    ],
    tasks=[
        research_starting_points_task,
        paper_structure_task,
        related_work_task,
        draft_writer_task
    ],
)

second_result = second_crew.kickoff()
print(second_result)


[1m[95m# Agent:[00m [1m[92mResearch Gap Identification and Suggestion Agent[00m
[95m## Task:[00m [92mInput: the chosen research gap While research explores efficient retrieval methods, there's a gap in understanding the optimal balance between retrieval speed and accuracy, especially for very large knowledge bases. in the topic Improving Retrieval Efficiency in RAG Systems .
Output: a structured list of specific research steps with detailed tips for each step.
Goal: help beginners understand what to do first, what resources to use, and how to progress in a steps.[00m


[1m[95m# Agent:[00m [1m[92mResearch Gap Identification and Suggestion Agent[00m
[95m## Final Answer:[00m [92m
{
  "research_steps": [
    {
      "section": "1. Defining 'Optimal' Balance: A Benchmarking Framework",
      "tips": "Start by clearly defining what constitutes an 'optimal' balance between speed and accuracy. This isn't a single number; it depends on the application.  For example, a medical

🖇 AgentOps: [34m[34mSession Replay for default.session trace: https://app.agentops.ai/sessions?trace_id=5f42080de7957850d17c888108092273[0m[0m


{'draft': "## Improving Retrieval Efficiency in RAG Systems: A Comparative Study of Speed and Accuracy Trade-offs\n\n**Abstract**\n\nRetrieval Augmented Generation (RAG) systems rely heavily on efficient and accurate information retrieval.  While numerous methods exist, finding the optimal balance between retrieval speed and accuracy, especially for large knowledge bases, remains a significant challenge. This paper addresses this gap by presenting a comprehensive benchmarking framework to evaluate various retrieval techniques, including BM25, TF-IDF, and dense retrieval methods. We conduct a comparative study, exploring both individual methods and hybrid approaches that combine their strengths.  Our analysis focuses on the speed-accuracy trade-off, investigating the impact of hyperparameter optimization and scalability on increasingly large knowledge bases.  We identify best-performing methods and discuss the limitations of our approach, highlighting directions for future research in o

## Additionals Agents:

### Paper Summarization Agent (optional)

In [26]:
# class PaperSummariesOutput(BaseModel):
#     summaries: Dict[str, str] = Field(
#         ..., 
#         title="Paper title mapped to its summary", 
#         description="Each item has 'title' and 'summary'."
#     )

# paper_summarization_agent = Agent(
#     role="Academic Paper Summarization Agent",
#     goal="\n".join([
#         "Summarize each research paper into a detailed 120-150 word paragraph.",
#         "Mention the full paper title before the summary.",
#         "Focus on: main research problem, methodology, key findings, unique contributions.",
#         "Highlight any datasets, models, or diagrams used (in the paper).",
#         "Avoid generic descriptions. Be specific about what the paper achieves."
#     ]),
#     backstory="Provides clear and informative summaries to help users understand research papers quickly even if they are beginners.",
#     llm=basic_llm,
#     verbose=True,
# )

# paper_summarization_task = Task(
#     description="\n".join([
#         "Input is a list of papers with metadata and abstracts.",
#         "Produce a summary for each paper highlighting key points and visuals if any.",
#         "Output JSON mapping paper titles to summaries."
#     ]),
#     expected_output="JSON object mapping paper titles to summaries.",
#     output_json=PaperSummariesOutput,
#     output_file=os.path.join(output_dir, "step_3_paper_summaries.json"),
#     agent=paper_summarization_agent,
# )


In [46]:
import os
import json

os.chdir("/home/israa/Desktop/PublishMate_CrewAgents")

def read_json_file(filepath):
    if os.path.exists(filepath):
        with open(filepath, "r") as f:
            return json.load(f)
    else:
        print(f"File not found: {filepath}")
        return None

trending_topics_path = "PublishMate_agent_ouput/outputs/step_1_trending_topics.json"
recent_papers_path = "PublishMate_agent_ouput/outputs/step_2_recent_papers.json"
research_gaps_path = "PublishMate_agent_ouput/outputs/step_4_research_gaps.json"

trending_topics = read_json_file(trending_topics_path)
recent_papers = read_json_file(recent_papers_path)
research_gaps = read_json_file(research_gaps_path)

# Print trending topics nicely
if trending_topics and "topics" in trending_topics:
    print("Trending Topics:")
    for topic in trending_topics["topics"]:
        print(f"- {topic['name']}: {topic['description']}\n")

# Print recent papers by topic
if recent_papers and "topic_papers" in recent_papers:
    print("Recent Papers by Topic:")
    for topic, papers in recent_papers["topic_papers"].items():
        print(f"{topic}:")
        if papers:
            for paper in papers:
                print(f"  * {paper}")
        else:
            print("  No papers found.")
        print()

# Print research gaps clearly
if research_gaps and "research_gaps" in research_gaps:
    print("Research Gaps:")
    for gap in research_gaps["research_gaps"]:
        print(f"- {gap}\n")


Trending Topics:
- Multimodal RAG: Integrating various data modalities (text, images, audio, video) into RAG systems to enhance knowledge retrieval and response generation.  Research focuses on effective fusion techniques and handling different data types within a unified framework.

- Chain-of-Thought Prompting for RAG: Improving the reasoning capabilities of RAG systems by employing chain-of-thought prompting. This technique guides the large language model (LLM) to break down complex questions into smaller, manageable steps, leading to more accurate and explainable answers.

- Efficient and Scalable RAG Architectures: Developing efficient and scalable RAG architectures to handle large knowledge bases and high query loads.  This involves exploring techniques like vector databases, approximate nearest neighbor search, and optimized retrieval methods.

- RAG for Complex Reasoning Tasks: Applying RAG to complex reasoning tasks that require multiple steps of inference and knowledge integr

In [2]:
import re

url = "https://docs.google.com/forms/d/e/1FAIpQLSdiaaP9YJemZqlKky8z109JcR7E34O6iatezaKPa1aHbbUAqg/viewform"

pattern = r"/forms/d/e/([a-zA-Z0-9-_]+)"
match = re.search(pattern, url)

if match:
    form_id = match.group(1)
    print("Form ID:", form_id)
else:
    print("No form ID found.")


Form ID: 1FAIpQLSdiaaP9YJemZqlKky8z109JcR7E34O6iatezaKPa1aHbbUAqg
