<center>
<h1 style="color: pink;">Welcome to Publish Mate 😊</h1>
</center>

Further improvements :
- feed back with like and dislike
- option to summerize using the whole paper (or other option to mention someone did it before)

## `00` Download Dependencies

In [2]:
# !pip3 install -U "crewai[tools,agentops]"

In [3]:
# !pip3 install python-dotenv
# !pip3 install gcloud
# !pip3 install google-genai

## `01` Import Libraries

In [4]:
from dotenv import load_dotenv
import os
import google.generativeai as genai

  from .autonotebook import tqdm as notebook_tqdm


In [5]:
from crewai import Agent, Task, Crew, Process, LLM
from crewai.tools import tool
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
from crewai.llms.base_llm import BaseLLM

from pydantic import BaseModel, Field, HttpUrl
from typing import List, Dict

import agentops
import json
import gcloud
import google.generativeai as genai
from dotenv import load_dotenv
from vertexai.preview.generative_models import Content, Part

from tavily import TavilyClient
from crewai.tools import tool


## `02` load api key

In [6]:
load_dotenv()  # Load from .env

True

In [7]:
AGENTOPS_API_KEY = os.getenv("AGENTOPS_API_KEY") # replace by yours

In [8]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
PROJECT_ID = os.getenv("PROJECT_ID")
PROJECT_NAME = os.getenv("PROJECT_NAME")

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

# SERPER_API_KEY = os.getenv("SERPERDEV_API_KEY")
# os.environ["SERPER_API_KEY"] = SERPER_API_KEY
# os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

genai.configure(api_key=OPENAI_API_KEY)

## `03` Start AgentOps session

In [9]:
agentops.init(api_key=AGENTOPS_API_KEY,
               skip_auto_end_session=True, # Set to True to skip auto ending the session
               default_tags=['crewai']
               ) 

🖇 AgentOps: [34m[34mSession Replay for default trace: https://app.agentops.ai/sessions?trace_id=ec530ef96d76d2aa79906cd289bedb02[0m[0m


<agentops.legacy.Session at 0x7eb954e984c0>

The link will help us to monitor our agents

### Make sure it works

In [10]:
# print("AgentOps session initialized.")
# print(agentops.session)  # optional, shows session info if available
# print(agentops.__dict__)

## `04` Intro of the Crew

In [11]:
intro_prompt = (
    "Welcome to PublishMate! I am your research assistant mate here to help you with your academic paper journey.\n"
    "I will guide you step-by-step to find trending topics, recent papers, summaries, "
    "research gaps, and help with paper writing. \nLet's get started!\n"
)

def welcome_message():
    print(intro_prompt)

# Run this at the very beginning
welcome_message()

Welcome to PublishMate! I am your research assistant mate here to help you with your academic paper journey.
I will guide you step-by-step to find trending topics, recent papers, summaries, research gaps, and help with paper writing. 
Let's get started!



## `05` Set Output dir

In [12]:
output_dir = './PublishMate_agent_ouput'
os.makedirs(output_dir, exist_ok=True)

## `06` LLM will be used

In [13]:
basic_llm = LLM(
    model="gemini/gemini-1.5-flash",
    temperature=0.2,
    provider="google_ai_studio",
    api_key=os.environ["GEMINI_API_KEY"]
)

## `07` START AGENTS

### `7.1` Agent 1: Trending Topics Agent 

In [14]:
# !gcloud init

In [15]:
user_input = input("Enter your research field or keyword: ")

In [16]:
class TrendingTopicsOutput(BaseModel):
    topics: List[Dict[str, str]] = Field(..., title="Trending topics with description", min_items=1)

trending_topics_agent = Agent(
    role="Trending Topics Identification Agent",

    goal="\n".join([
        f"You are an expert research assistant that identifies the latest trending topics in the field of {user_input} only focus on it .",
        "Generate a detailed list of the top 3-5 trending topics or recent articles reflecting advances and high interest in this field.",
        "Base your answer on recent publication trends, conferences, or journal articles.",
        "Do not include unrelated or general topics.",
        "Output only a JSON object with a 'topics' list containing objects with 'name' and 'description'."
    ]),
    backstory="Designed to guide users by providing the most relevant and current trending research topics in their specified field.",
    llm=basic_llm,
    verbose=True,
)

trending_topics_task = Task(
    description="\n".join([
        f"you are an expert in a {user_input} field to help beginner researchers in their writings .",
        "Provide a list of 3 to 5 trending topics or articals with a brief description for each.",
        "Focus on recent research interests supported by publication trends.",
        "Output in JSON format with 'topics' as list of objects {name, description}."
    ]),
    expected_output="JSON object with list of trending topics and descriptions.",
    output_json=TrendingTopicsOutput,
    output_file=os.path.join(output_dir, "step_1_trending_topics.json"),
    agent=trending_topics_agent,
)

### `7.2` Agent 2: Recent Papers Retrieval Agent

In [17]:
search_client = TavilyClient(api_key=TAVILY_API_KEY)
 
@tool
def search_engine_tool(query: str):
    """Useful for search-based queries. Use this to find current information about any query related pages using a search engine"""
    return search_client.search(query)


In [18]:
class PaperInfo(BaseModel):
    title: str  
    year: int 
    url: str
    abstract: str                                   


class RecentPapersOutput(BaseModel):
    topic_papers: Dict[str, List[PaperInfo]] = Field(..., title="Recent papers grouped by topic")

recent_papers_agent = Agent(
    role="Recent Papers Retrieval Agent",

    goal = "\n".join([
        "You are a research paper search assistant.",
        "Given a list of trending topics, retrieve 3 recent, relevant publications per topic.",
        "Select papers from reputable sources published within the last 2 years.(2023 or 2024 or 2025)",
        "Provide title, authors, abstract, year, and valid URL for each paper.",
        "the URL must be valid and accessible.",
        "If no recent paper is available, state 'No recent papers found' for that topic.",
        "Output in JSON format grouped by topic."]),

    backstory="Helps beginner researchers quickly discover and review the latest relevant publications across the trending topics with the URLs that are valid and some info.",

    llm=basic_llm,
    
    verbose=True,
)

recent_papers_task = Task(
    description="\n".join([
        "Input is a list of trending topics.",
        "For each topic, find 3 papers with title, authors, abstract, year, and link which should be valid and accessable.",
        "Select papers from reputable journals or conferences (IEEE, Springer, Elsevier, ICRA, IROS, actual arXiv).",
        "Only include papers published in 2023 or 2024 or 2025.",
        "Get the abstract of the paper as it is in the paper or the site to help the agents after you, bring a good clean text."
        "Focus on papers from last 2 years from reputable conferences or journals.",
        "If no recent paper is available, state 'No recent papers found' for that topic.",
        "Output JSON grouped by topic."
    ]),
    expected_output="JSON with topics as keys and list of paper info objects as values.",
    output_json=RecentPapersOutput,
    output_file=os.path.join(output_dir, "step_2_recent_papers.json"),
    agent=recent_papers_agent,
    tools=[search_engine_tool],
    
)


### `7.3 optional` Agent 3: Paper Summarization Agent (optional)

In [1]:
# class PaperSummariesOutput(BaseModel):
#     summaries: Dict[str, str] = Field(
#         ..., 
#         title="Paper title mapped to its summary", 
#         description="Each item has 'title' and 'summary'."
#     )

# paper_summarization_agent = Agent(
#     role="Academic Paper Summarization Agent",
#     goal="\n".join([
#         "Summarize each research paper into a detailed 120-150 word paragraph.",
#         "Mention the full paper title before the summary.",
#         "Focus on: main research problem, methodology, key findings, unique contributions.",
#         "Highlight any datasets, models, or diagrams used (in the paper).",
#         "Avoid generic descriptions. Be specific about what the paper achieves."
#     ]),
#     backstory="Provides clear and informative summaries to help users understand research papers quickly even if they are beginners.",
#     llm=basic_llm,
#     verbose=True,
# )

# paper_summarization_task = Task(
#     description="\n".join([
#         "Input is a list of papers with metadata and abstracts.",
#         "Produce a summary for each paper highlighting key points and visuals if any.",
#         "Output JSON mapping paper titles to summaries."
#     ]),
#     expected_output="JSON object mapping paper titles to summaries.",
#     output_json=PaperSummariesOutput,
#     output_file=os.path.join(output_dir, "step_3_paper_summaries.json"),
#     agent=paper_summarization_agent,
# )


### `7.3` Agent 3: Research Gap and Suggestion Agent

In [20]:
class ResearchGapOutput(BaseModel):
    research_gaps: List[str] = Field(..., title="List of research gaps and suggestions")

research_gap_agent = Agent(
    role="Research Gap Identification and Suggestion Agent",
    goal="\n".join([
        "Analyze summaries to identify gaps, limitations, and propose research directions or improvements.",
        "Use a friendly and encouraging tone suitable for beginners.",
        "You will be given the data about the papers about that topic 3 papers for each topic with their year, abstaract, url, title .",
        "analye the abstract to guess and detect gaps ",
        "Suggest these Gaps to the writer to can start from"
    ]),
    backstory="Helps users find novel contributions by highlighting unexplored areas and providing ideas.",
    llm=basic_llm,
    verbose=True,
)

research_gap_task = Task(
    description="\n".join([
        "Input is paper summaries.",
        "Output a list of research gaps, limitations, and suggestions for future research.",
        "Encourage beginners by providing feasible ideas."
        "You will be given the data about the papers about that topic 3 papers for each topic with their year, abstaract, url, title .",
        "analye the abstract to guess and detect gaps ",
        "Suggest these Gaps to the writer to can start from"
    ]),
    expected_output="JSON list of research gaps and improvement suggestions.",
    output_json=ResearchGapOutput,
    output_file=os.path.join(output_dir, "step_4_research_gaps.json"),
    agent=research_gap_agent,
)

### Intermediate agent

In [None]:
# chosen_topic = input("Which topic did you got interested about more? ")
# chosen_gap = input("Which gap do you like to start looking for ^-^")

In [2]:
# # --- New Pydantic Model for Starting Points Output ---
# class StartingPoint(BaseModel):
#     area: str
#     description: str
#     actionable_steps: List[str] = Field(..., description="Concrete steps a researcher can take")

# class ResearchStartingPointsOutput(BaseModel):
#     suggested_starting_points: List[StartingPoint] = Field(
#         ...,
#         title="Suggested starting points for research based on selected gap/topic"
#     )

# # --- New Agent Definition ---
# research_starting_points_agent = Agent(
#     role="Research Starting Points Suggester",
#     goal="\n".join([
#         f"Based on a provided research topic {chosen_topic}, and its gap {chosen_gap} , and considering user interests, "
#         "suggest concrete and actionable starting points for a new research project.",
#         "Include ideas for initial exploration, types of articles to search for, "
#         "relevant methodologies, or specific areas of focus."
#     ]),
#     backstory="""
#     An expert advisor in academic research, skilled at translating broad research gaps into practical,
#     first steps for aspiring researchers. You help define the initial scope and direction.
#     """,
#     llm=basic_llm,
#     verbose=True,
# )

# # --- New Task Definition ---
# research_starting_points_task = Task(
#     description="\n".join([
#         f"Input is a selected research topic {chosen_topic}, and its gap {chosen_gap} "
#         "along with any user preferences (e.g., 'I want to build new architectures').",
#         "Based on this input, provide a list of 3-5 actionable starting points for a beginner researcher.",
#         "For each starting point, include:",
#         "- **Area:** A concise title for the starting point.",
#         "- **Description:** What this starting point entails.",
#         "- **Actionable Steps:** Specific, concrete actions the researcher can take immediately (e.g., 'Search for recent papers on X', 'Explore open-source libraries for Y', 'Read foundational surveys on Z', 'Look for benchmark datasets for W').",
#         "Consider different angles: theoretical, empirical, practical application, data-driven, model-building, etc.",
#         "Output the suggestions in JSON format as defined by ResearchStartingPointsOutput."
#     ]),
#     expected_output="JSON list of suggested research starting points.",
#     output_json=ResearchStartingPointsOutput,
#     output_file=os.path.join(output_dir, "step_4.1_Intermideiate_research_starting_points.json"),
#     agent=research_starting_points_agent,
#     # This task might also benefit from search_tool if it needs to validate
#     # if certain "starting points" (like a library) exist or are relevant.
#     # tools=[search_tool],
# )


### `7.5` Agent 5: Paper Structure and Writing Guide Agent

In [23]:
class PaperStructureSection(BaseModel):
    section: str
    tips: str

class PaperStructureOutput(BaseModel):
    paper_structure: List[PaperStructureSection] = Field(..., title="Paper structure sections and writing tips")

paper_structure_agent = Agent(
    role="Paper Structure and Writing Guide Agent",
    goal="\n".join([
        "Provide a clear outline for structuring an academic paper.",
        "Give detailed tips on what to write in each section to help beginners.",
        "Include motivational and supportive writing advice."
    ]),
    backstory="Guides users through the paper writing process with a beginner-friendly approach.",
    llm=basic_llm,
    verbose=True,
)

paper_structure_task = Task(
    description="\n".join([
        "Input is the chosen research topic.",
        "Output a recommended paper structure with sections and detailed writing tips for each.",
        "Help beginners understand what content belongs in each part of the paper."
    ]),
    expected_output="JSON list of sections with writing tips.",
    output_json=PaperStructureOutput,
    output_file=os.path.join(output_dir, "step_5_paper_structure.json"),
    agent=paper_structure_agent,
)


### `7.6` Agent 6: Related work draft Agent

In [24]:
class RelatedWorkOutput(BaseModel):
    related_work: str = Field(..., title="Composed related work section")

related_work_agent = Agent(
    role="Related Work Composer Agent",
    goal="\n".join([
        "Compose a comprehensive 'Related Work' section using the paper summaries.",
        "Organize by themes or trends, and mention each paper's key contributions.",
        "Maintain academic tone and proper citation-like references (e.g., 'Smith et al. 2023')."
    ]),
    backstory="Helps users create strong literature review content automatically.",
    llm=basic_llm,
    verbose=True,
)

related_work_task = Task(
    description="\n".join([
        "Input is the list of paper summaries.",
        "Group papers by similarity and write a flowing Related Work section.",
        "Ensure good transitions, academic tone, and clear references.",
        "Output as a single string."
    ]),
    expected_output="Single string of the Related Work section.",
    output_json=RelatedWorkOutput,
    output_file=os.path.join(output_dir, "step_6_related_work.json"),
    agent=related_work_agent,
)


### `7.7` Agent 7: Paper draft Agent

In [25]:
class DraftOutput(BaseModel):
    draft: str = Field(..., title="Full academic paper draft text")

draft_writer_agent = Agent(
    role="Academic Paper Drafting Agent",
    goal="\n".join([
        "Write a full academic paper draft using the structure, research gap, and related work.",
        "Ensure clarity, academic tone, and smooth transitions.",
        "Support beginners by avoiding jargon and including helpful examples."
    ]),
    backstory="Turns raw research insights into a complete paper draft.",
    llm=basic_llm,
    verbose=True,
)

draft_writer_task = Task(
    description="\n".join([
        "Input is: paper structure + research gap + related work.",
        "Use them to generate a coherent draft of the academic paper.",
        "Output in well-organized academic format (Intro, Method, etc.)."
    ]),
    expected_output="String containing the full paper draft.",
    output_json=DraftOutput,
    output_file=os.path.join(output_dir, "step_7_paper_draft.json"),
    agent=draft_writer_agent,
)

## `08` Crew

In [26]:
# # Define the Crew
# crew_agents = Crew(
#     name="PublishMate Crew",
    
#     description="A crew of agents designed to assist with academic research and paper writing.",

#     agents=[trending_topics_agent, 
#             recent_papers_agent, 
#         #     paper_summarization_agent, 
#             research_gap_agent, 
#             research_starting_points_agent,
#         #     paper_structure_agent, 
#         #     related_work_agent, 
#         #     draft_writer_agent
#             ],
    

#     tasks=[trending_topics_task, 
#            recent_papers_task, 
#         #    paper_summarization_task, 
#            research_gap_task, 
#            research_starting_points_task,
#         #    paper_structure_task, 
#         #    related_work_task, 
#         #    draft_writer_task
#            ],
#     # tools=[tavily_paper_search],
# )

# result = crew_agents.kickoff()
# print(result)

In [27]:
# Phase 1: Run the first 3 tasks (up to research_gap_task)
first_crew = Crew(
    name="PublishMate Crew - Phase 1",
    description="Run up to research gap analysis.",
    agents=[
        trending_topics_agent,
        recent_papers_agent,
        research_gap_agent,
    ],
    tasks=[
        trending_topics_task,
        recent_papers_task,
        research_gap_task,
    ],
)

# Run the first part
first_result = first_crew.kickoff()
print(first_result)

[1m[95m# Agent:[00m [1m[92mTrending Topics Identification Agent[00m
[95m## Task:[00m [92myou are an expert in a transformers field to help beginner researchers in their writings .
Provide a list of 3 to 5 trending topics or articals with a brief description for each.
Focus on recent research interests supported by publication trends.
Output in JSON format with 'topics' as list of objects {name, description}.[00m


[1m[95m# Agent:[00m [1m[92mTrending Topics Identification Agent[00m
[95m## Final Answer:[00m [92m
{
  "topics": [
    {
      "name": "Efficient Transformers for Long Sequences",
      "description": "Research focuses on addressing the quadratic complexity of self-attention in standard transformers, hindering processing of long sequences.  Trending approaches include linear attention mechanisms, sparse attention, and hierarchical architectures to enable efficient handling of longer contexts in tasks like long-document summarization and long-range dependency

🖇 AgentOps: [34m[34mSession Replay for default.session trace: https://app.agentops.ai/sessions?trace_id=ec530ef96d76d2aa79906cd289bedb02[0m[0m
[31;1m🖇 AgentOps: [agentops.InternalSpanProcessor] Error uploading logfile: Upload failed: 401[0m


{'research_gaps': ["**Efficient Transformers for Long Sequences:**\n\n* **Gap 1:  Memory Efficiency for Extremely Long Sequences:** Current efficient Transformer methods still struggle with extremely long sequences (e.g., exceeding 100k tokens). Research could focus on novel memory management techniques or architectural innovations to handle such lengths efficiently.  This could involve exploring techniques beyond linear attention, such as hierarchical chunking with sophisticated inter-chunk communication or novel memory-efficient attention mechanisms.\n* **Gap 2:  Task-Specific Optimization:** While general-purpose efficient Transformers exist, there's a need for task-specific optimizations.  For example, long-document summarization might benefit from attention mechanisms that prioritize important information, while time-series forecasting might require specialized handling of temporal dependencies.  Research could explore how to tailor efficient Transformer architectures to specific 

In [28]:
# 💬 Get user input
chosen_topic = input("Which topic did you get interested in more? ")
chosen_gap = input("Which gap do you like to start looking for ^-^? ")

In [None]:
class ResearchGapSection(BaseModel):
    section: str
    tips: str

class ResearchGapOutput(BaseModel):
    research_steps: List[ResearchGapSection] = Field(..., title="Research gap focused steps and tips")

research_gap_agent = Agent(
    role="Research Gap Exploration Agent",
    goal="\n".join([
        f"Provide a detailed and clear set of specific research starting points based on the chosen {chosen_gap} in the {chosen_topic}.",
        "Include practical and beginner-friendly tips for each step to help users start their research.",
        "Focus on actionable tasks tied directly to the selected gap (e.g., watermarking, hallucination, bias).",
        "Motivate users by giving confidence and clear direction."
    ]),
    backstory="Helps users dive into LLM research by breaking down complex gaps into simple, actionable steps.",
    llm=basic_llm,
    verbose=True,
)

research_gap_task = Task(
    description="\n".join([
        f"Input: the chosen research gap {chosen_gap} in the topic {chosen_topic} .",
        "Output: a structured list of specific research steps with detailed tips for each step.",
        "Goal: help beginners understand what to do first, what resources to use, and how to progress in a steps."
    ]),
    expected_output="JSON list of steps with detailed beginner tips.",
    output_json=ResearchGapOutput,
    output_file=os.path.join(output_dir, "step_research_gap.json"),
    agent=research_gap_agent,
)



# Phase 2: Continue with remaining tasks
second_crew = Crew(
    name="PublishMate Crew - Phase 2",
    description="Suggest research starting points based on user-selected gap/topic.",
    agents=[
        research_starting_points_agent,
    ],
    tasks=[
        research_starting_points_task,
    ],
)

second_result = second_crew.kickoff()
print(second_result)


[1m[95m# Agent:[00m [1m[92mResearch Starting Points Suggester[00m
[95m## Task:[00m [92mInput is a selected research topic transformers, and its gap fe along with any user preferences (e.g., 'I want to build new architectures').
Based on this input, provide a list of 3-5 actionable starting points for a beginner researcher.
For each starting point, include:
- **Area:** A concise title for the starting point.
- **Description:** What this starting point entails.
- **Actionable Steps:** Specific, concrete actions the researcher can take immediately (e.g., 'Search for recent papers on X', 'Explore open-source libraries for Y', 'Read foundational surveys on Z', 'Look for benchmark datasets for W').
Consider different angles: theoretical, empirical, practical application, data-driven, model-building, etc.
Output the suggestions in JSON format as defined by ResearchStartingPointsOutput.[00m


[1m[95m# Agent:[00m [1m[92mResearch Starting Points Suggester[00m
[95m## Final Answer

🖇 AgentOps: [34m[34mSession Replay for default.session trace: https://app.agentops.ai/sessions?trace_id=ec530ef96d76d2aa79906cd289bedb02[0m[0m


{'suggested_starting_points': [{'area': 'Survey of Transformer Architectures for Specific Tasks', 'description': 'Begin by understanding the current landscape of transformer architectures and their applications. Focus on a specific task (e.g., natural language processing, computer vision, time series forecasting) to narrow the scope.', 'actionable_steps': ["Search for recent survey papers on transformer architectures in your chosen task area (e.g., 'A Survey of Transformer Architectures for Natural Language Processing').", 'Identify key architectural innovations and their impact on performance.', 'Analyze the strengths and weaknesses of different architectures for your chosen task.', 'Create a table summarizing key architectures, their characteristics, and their performance on benchmark datasets.']}, {'area': 'Exploring Transformer Limitations and Open Research Problems', 'description': 'Identify the current limitations of transformer models, such as computational cost, data efficiency