In [1]:
# !pip install openai
# !pip install requests

- This is the MVP / proof of concept
- Using gpt-3.5-turbo because it's the cheapest

To Do:
- The current approach is basic, and some prompt engineering would help. Use the OpenAI Playground's Compare mode for this.
    - Refine enhance_query_with_openai() so it builds more helpful, elaborate queries. The queries it currently constructs are trivial.
    - Split suggest_research_directions() into a few functions. It should probably make its own arXiv calls based on user_feedback and research_interests.
    - Incorporate RAG.
- Port to LangChain to build a more conversational app.
- Find a more stable way of supplying the user's API key.
- A front end would be nice. Maybe GitHub Pages for a basic UI, starting from an online template?
- Connect to medRxiv and/or bioRxiv? Building a routing system that directs API calls to the most appropriate preprint server shouldn't be too hard.
    - Bug: It generates answers to biological or medical questions without even trying to find relevant literature to back them up. It doesn't know which questions are out of scope, so hallucinations could be an issue.
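
The routing idea in the last item could start as simple keyword matching. The sketch below is only an illustration: `route_query`, `SERVER_KEYWORDS`, and the keyword lists are hypothetical, and the returned names are labels rather than API endpoints. A real router would likely need a richer vocabulary or a classifier.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
SERVER_KEYWORDS = {
    "biorxiv": {"crispr", "gene", "genome", "protein", "cell", "rna", "dna"},
    "medrxiv": {"clinical", "patient", "patients", "trial", "vaccine", "epidemiology"},
}
DEFAULT_SERVER = "arxiv"


def route_query(query):
    """Return the label of the preprint server whose keywords best match the query."""
    # Lowercase the query and split it into alphanumeric words
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    best_server, best_hits = DEFAULT_SERVER, 0
    for server, keywords in SERVER_KEYWORDS.items():
        hits = len(words & keywords)
        # Only switch away from the default on a strictly better match
        if hits > best_hits:
            best_server, best_hits = server, hits
    return best_server


print(route_query("Latest advancements in CRISPR and gene editing?"))
```

Falling back to arXiv when nothing matches keeps the current behaviour as the default. The same check could also flag queries that look biological or medical, so the app can say it has no literature source for them instead of answering unsupported.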

In [34]:
from openai import OpenAI
import requests
import re

In [35]:
# Read the OpenAI API key from the OPENAI_API_KEY environment variable
# rather than hard-coding it in the notebook
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
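
One possible answer to the "more stable way of supplying the user's API key" item is to check the environment first and fall back to a hidden prompt. This is a minimal sketch, not a settled design: `load_api_key` is a hypothetical helper, and the `env` and `prompt` parameters exist only so it can be exercised without a real key.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, prompt=getpass):
    """Return the key from OPENAI_API_KEY, or ask for it without echoing input."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed key, so it does not end up in the notebook output
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

The client could then be created with `OpenAI(api_key=load_api_key())`, which keeps the key out of the saved notebook.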

In [39]:
def enhance_query_with_openai(original_query):
    prompt = f"""
    Given a natural language query, convert it into a structured search query for the arXiv API. The arXiv API query format uses field prefixes like 'au' for author, 'ti' for title, and 'cat' for category, and Boolean operators like 'AND' and 'OR'.

    Here are a few examples:
    
    Natural Language Query: Papers by Albert Einstein about relativity
    Structured arXiv API Query: au:"Albert Einstein" AND all:relativity

    Natural Language Query: Quantum computing research after 2015
    Structured arXiv API Query: all:"quantum computing" AND submittedDate:[201501010000 TO 209912312359]

    Natural Language Query: Machine learning applications in finance
    Structured arXiv API Query: all:"machine learning" AND all:finance

    Now, convert the following natural language query into a structured arXiv API query:
    '{original_query}'
    Structured arXiv API Query:
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
    )
    # Extracting the structured query from the response
    arxiv_query = response.choices[0].message.content.strip()
    return arxiv_query

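def search_arxiv(query):
    # Pass the query via params so requests URL-encodes spaces, quotes, and brackets
    url = 'http://export.arxiv.org/api/query'
    params = {'search_query': query, 'start': 0, 'max_results': 5}
    response = requests.get(url, params=params, timeout=30)
    # Raise on HTTP errors instead of handing an error page to the parser
    response.raise_for_status()
    return response.text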

def extract_titles_and_summaries(xml_response):
    # Regex patterns; re.DOTALL lets entries, titles, and summaries span multiple lines
    entry_pattern = re.compile(r'<entry[^>]*>(.*?)</entry>', re.DOTALL)
    title_pattern = re.compile(r'<title[^>]*>(.*?)</title>', re.DOTALL)
    summary_pattern = re.compile(r'<summary[^>]*>(.*?)</summary>', re.DOTALL)

    papers_info = []
    # Search inside each <entry> so the feed-level "ArXiv Query: ..." title is never
    # picked up and every title stays paired with its own summary
    for entry in entry_pattern.findall(xml_response):
        title = title_pattern.search(entry)
        summary = summary_pattern.search(entry)
        if title and summary:
            # Collapse the line breaks and indentation inside long fields
            papers_info.append({"title": " ".join(title.group(1).split()),
                                "summary": " ".join(summary.group(1).split())})

    return papers_info

def summarize_with_openai(initial_query, papers_info):
    prompt = f"The user asked: '{initial_query}'. Based on the following titles and summaries from academic papers, provide a detailed and accessible explanation of the topic:\n\n"
    
    for paper in papers_info:
        prompt += f"Title: {paper['title']}\nSummary: {paper['summary']}\n\n"
    
    prompt += "Please review the titles and summaries to provide a thoughtful response to the user's question."

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": prompt,
            }
        ],
    )
    
    thoughtful_response = response.choices[0].message.content.strip()
    return thoughtful_response

def suggest_research_directions(initial_query, thoughtful_response):
    """
    Generates novel research directions based on the user's feedback on a provided summary
    and their specific interests.

    Parameters:
    - initial_query: The original query posed by the user.
    - thoughtful_response: A comprehensive response to the initial query, summarizing relevant
      academic papers and insights.

    Returns:
    - A string containing suggestions for research trends, gaps, next steps, or future directions.
    """

    print("\n--- Research Direction Suggestion ---")
    user_feedback = input("What are your thoughts on the provided summary? Any specific areas of interest or questions that arise? ")

    research_interests = input("Could you specify any particular research interests or areas where you're seeking innovation? ")
    
    prompt = f"""
    Based on the initial inquiry about '{initial_query}', the researcher was given the following summary:

    {thoughtful_response}

    The researcher shared their thoughts on it: '{user_feedback}'. They expressed a particular interest in '{research_interests}'.

    Considering the current state of research and potential future developments, identify emerging trends and gaps in the literature, and suggest novel research directions or next steps that could significantly advance the field. Emphasize novelty and innovation in your suggestions.
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": prompt,
            }
        ],
    )

    research_suggestions = response.choices[0].message.content.strip()
    return research_suggestions
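
extract_titles_and_summaries() pulls fields out of the Atom feed with regular expressions. As an alternative, the feed could be parsed with the standard library's `xml.etree.ElementTree`. The sketch below assumes the response uses the standard Atom namespace. `parse_arxiv_feed` is a hypothetical name, and `SAMPLE_FEED` is made up for illustration rather than copied from a real arXiv response.

```python
import xml.etree.ElementTree as ET

# Atom namespace used to qualify element names in the feed
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_arxiv_feed(xml_response):
    """Return a list of {"title", "summary"} dicts, one per <entry> in the feed."""
    root = ET.fromstring(xml_response)
    papers = []
    # Only <entry> children are visited, so the feed-level <title> is ignored
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        # Collapse line breaks and indentation inside each field
        papers.append({
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
        })
    return papers


# Made-up feed for illustration only
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">ArXiv Query: search_query=all:example</title>
  <entry>
    <title>A made-up
      paper title</title>
    <summary>  A made-up abstract.
    </summary>
  </entry>
</feed>"""

print(parse_arxiv_feed(SAMPLE_FEED))
```

Because each title and summary is read from inside its own entry, the two can never drift out of alignment, and XML entities such as `&amp;` are decoded by the parser.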

In [40]:
def main():
    user_query = input("Enter your search query: ")
    print("Converting your query into an arXiv-friendly format...")
    arxiv_query = enhance_query_with_openai(user_query)
    print(f"Here is the arXiv Query I am using: {arxiv_query}\nFetching papers from arXiv...")
    arxiv_results = search_arxiv(arxiv_query)
    extracted_information = extract_titles_and_summaries(arxiv_results)
    # print(extracted_information)
    thoughtful_response = summarize_with_openai(user_query, extracted_information)
    print("Here's what I found:\n", thoughtful_response)
    research_suggestions = suggest_research_directions(user_query, thoughtful_response)
    print("Here are a few research ideas to inspire your work:\n", research_suggestions)

if __name__ == "__main__":
    main()

Enter your search query:  Latest advancements in CRISPR and gene editing?


Converting your query into an arXiv-friendly format...
Here is the arXiv Query I am using: all:"CRISPR AND gene editing" AND sortBy:submittedDate
Fetching papers from arXiv...
Here's what I found:
 Title: "Prime editing: precision genome editing by reverse transcription"
Summary: Prime editing is a new type of genome editing tool that allows for precise changes to the DNA sequence of an organism without the need for double-strand breaks, making it potentially safer and more efficient than previous gene editing techniques.

Title: "Cytosine base editors with minimized unguided DNA and RNA off-target events and high on-target activity"
Summary: Cytosine base editors are a type of gene editing tool that can change individual DNA bases in a highly targeted manner. This research focuses on improving the precision and reducing off-target effects of these base editors, making them more effective and safer for use in research and potential therapeutic applications.

Title: "CRISPR-Cas12a targe

What are your thoughts on the provided summary? Any specific areas of interest or questions that arise?  Pretty interesting, I'm interested in prime editing.
Could you specify any particular research interests or areas where you're seeking innovation?  Yes, I want to innovate on pegRNAs.


Here are a few research ideas to inspire your work:
 Given your interest in prime editing and pegRNAs, here are some emerging trends, gaps in the literature, and potential novel research directions that could significantly advance the field:

1. **Emerging Trends**:
   - **Improving Editing Efficiency**: Researchers are actively working on enhancing the efficiency of prime editing by optimizing the components involved, such as the pegRNA design, Cas enzyme, and delivery methods.
   - **Reducing Off-Target Effects**: Addressing off-target effects remains a critical area of focus, with advancements in developing more precise editing tools and strategies to minimize unintended genetic modifications.
   - **Expanding Targetable Genomic Regions**: Efforts are being made to increase the scope of targetable genomic regions using prime editing, including challenging loci with high GC content or repetitive sequences.

2. **Gaps in the Literature**:
   - **Functional Characterization of pegRNAs*