<a href="https://colab.research.google.com/github/bhaskatripathi/HypothesisHub/blob/main/Hypothesis_Hub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Hypothesis Hub**: *An AI Tool for Automated Research Question and Hypothesis Generation from a given Scientific Literature*

In [None]:
%pip install openai langchain gradio


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.25.0-py3-none-any.whl (17.5 MB)
Collecting semantic-version
  Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Collecting orjson
  Downloading orjson-3.8.10-cp39-cp39-manylinux_2_28_x86_64.whl (140 kB)
Collecting python-multipart
  Downloading python_multipart-0.0.6-py3-none-any.whl (45 kB)
Collecting websockets>=10.0
  Downloading websockets-11.0.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)

In [None]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate
from langchain.memory import SimpleMemory
import os

## **The following code generates three research questions and a pair of hypotheses (H0, H1) for each research question.**
The code can be modified to generate more research questions, depending on individual needs.
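
As a minimal sketch of that modification, the fixed three-question prompt could be built from a parameter instead. The helper name `build_question_template` and its `num_questions` argument are hypothetical and are not part of the notebook; the wording of the prompt follows the template used in the class below, with the word "three" replaced by a number.

```python
def build_question_template(num_questions: int = 3) -> str:
    """Build a prompt that asks for `num_questions` research questions.

    Hypothetical helper: the notebook hard-codes three questions instead.
    """
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")

    # Only the first literal is an f-string, so "{text}" stays a placeholder
    # that a prompt template can fill in later.
    instruction = (
        f"Given the following text, generate {num_questions} research "
        "questions related to the topic.\n\n"
        "Text: {text}\n\n"
    )
    # One numbered slot per requested question, matching the labels that
    # the notebook's parser looks for ("Research Question N: ").
    slots = "\n".join(
        f"Research Question {i}: " for i in range(1, num_questions + 1)
    )
    return instruction + slots


template = build_question_template(5)
print(template.format(text="<paste the abstract here>"))
```

The returned string still has `text` as its only placeholder, so it should be usable wherever `research_question_template` is passed to `PromptTemplate(input_variables=["text"], ...)`.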

# **Sequence Diagram**

In [60]:
from IPython.display import Image
Image(url='https://raw.githubusercontent.com/bhaskatripathi/HypothesisHub/main/Sequence%20diagram.PNG')


In [70]:
class ResearchAndHypothesisGenerator:
    def __init__(self, openai_api_key):
        self.llm = OpenAI(temperature=0.7, openai_api_key=openai_api_key, model_name="text-davinci-003")

        research_question_template = """Given the following text, generate three research questions related to the topic.

Text: {text}

Research Question 1: 
Research Question 2: 
Research Question 3: """
        self.rq_prompt_template = PromptTemplate(input_variables=["text"], template=research_question_template)
        self.research_question_chain = LLMChain(
            llm=self.llm, prompt=self.rq_prompt_template, output_key="research_questions"
        )

        hypothesis_template = """Given the research question "{research_question}", generate a null hypothesis (H0) and an alternate hypothesis (H1).
Null Hypothesis (H0):
Alternate Hypothesis (H1):"""
        self.hypothesis_prompt_template = PromptTemplate(input_variables=["research_question"], template=hypothesis_template)
        self.hypothesis_chain = LLMChain(
            llm=self.llm, prompt=self.hypothesis_prompt_template, output_key="hypotheses"
        )

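    def generate_research_questions(self, user_text):
        result = self.research_question_chain({"text": user_text})
        completion = result["research_questions"].strip()
        # The prompt already ends with "Research Question 1: ", so the completion
        # may begin with the first question's text and omit that label; restore it
        # so the first question is not filtered out below.
        if not completion.startswith("Research Question"):
            completion = "Research Question 1: " + completion
        research_questions = [rq.strip() for rq in completion.split('\n') if rq.strip().startswith('Research Question')]
        return research_questions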

    def generate_hypotheses1(self, research_questions):
        hypotheses = []
        for question in research_questions:
            result = self.hypothesis_chain({"research_question": question})
            hypothesis_lines = [line.strip() for line in result["hypotheses"].split('\n') if line.strip()]
            if len(hypothesis_lines) == 1:
                hypothesis = hypothesis_lines[0]
                if "H1:" in hypothesis:
                    h1 = hypothesis.replace("H1:", "").strip()
                    h0 = f"There are {h1.lower()} between the different {question.split(':', 1)[-1].strip()}."
                else:
                    h0 = hypothesis
                    h1 = "N/A"
            elif len(hypothesis_lines) == 2:
                h0 = hypothesis_lines[0].replace("H0:", "").strip()
                h1 = hypothesis_lines[1].replace("H1:", "").strip()
            else:
                print(f"Error generating hypotheses for question: {question}")
                h0, h1 = "N/A", "N/A"
            hypotheses.append((h0, h1))
        return hypotheses

    def generate_hypotheses(self, research_questions):
        """Return one (H0, H1) tuple per research question."""

        def split_lines(text):
            # Keep only the non-empty, stripped lines of a completion.
            return [line.strip() for line in text.split('\n') if line.strip()]

        def retry_h1(question):
            # Re-run the chain once and accept the result only if it is a single line.
            # The prompt has no "hypothesis_type" variable, so only the question is passed.
            retry = self.hypothesis_chain({"research_question": question})
            lines = split_lines(retry["hypotheses"])
            return lines[0] if len(lines) == 1 else "N/A"

        hypotheses = []
        for question in research_questions:
            result = self.hypothesis_chain({"research_question": question})
            hypothesis_lines = split_lines(result["hypotheses"])

            if len(hypothesis_lines) == 0:
                hypotheses.append(("N/A", "N/A"))
            elif len(hypothesis_lines) == 1:
                # Only one line came back; treat it as H0 and try once more for H1.
                h0 = hypothesis_lines[0]
                hypotheses.append((h0, retry_h1(question)))
            else:
                h0, h1 = hypothesis_lines[0], hypothesis_lines[1]
                if h1 == "N/A":
                    h1 = retry_h1(question)
                hypotheses.append((h0, h1))

        return hypotheses

if __name__ == "__main__":
    user_text = """Objective : The Effect of News on Stock Prices: Evidence from Natural Language Processing is a study aimed at analyzing the relationship between news and stock prices using natural
     language processing techniques. The study analyzes news articles published by major financial news outlets and their impact on stock prices.
      The researchers use a sentiment analysis model to analyze the tone of the news articles and determine whether they have a positive or negative impact on the stock prices. 
      The study finds that there is a significant relationship between news sentiment and stock prices, with positive news leading to an increase in stock prices and negative news leading 
      to a decrease in stock prices. The study highlights the importance of monitoring news and understanding its impact on the stock market.The study aims to investigate the effect of news 
      sentiment on stock prices using natural language processing techniques. The research focuses on analyzing news articles from various sources and measuring the sentiment of the news using 
      machine learning algorithms. The study aims to identify the relationship between news sentiment and stock prices and to explore whether news sentiment can be used as a predictor of stock prices. 
    The findings of the study can potentially have significant implications for investors and financial analysts in terms of identifying market trends and making informed investment decisions.
    """
    os.environ["OPENAI_API_KEY"] = "sk-XXXXXXXX"  # replace with your OpenAI API key
    api_key=os.environ["OPENAI_API_KEY"]
    generator = ResearchAndHypothesisGenerator(api_key)
    print("RUNNING CHAIN OF THOUGHTS...:")
    research_questions = generator.generate_research_questions(user_text)
    print("ORIGINAL TEXT:")
    print(user_text)
    print("\n")
    
    print("LIST OF RESEARCH QUESTIONS:")
    for i, question in enumerate(research_questions):
        print(f"{i + 1}: {question}")

    hypotheses = generator.generate_hypotheses(research_questions)
    print("\nLIST OF HYPOTHESES:")
    for i, hypothesis_pair in enumerate(hypotheses):
        print(f"\nRESEARCH QUESTION {i + 1}:")
        print(f"NULL HYPOTHESIS (H0): {hypothesis_pair[0]}")
        print(f"ALTERNATE HYPOTHESIS (H1): {hypothesis_pair[1]}")


RUNNING CHAIN OF THOUGHTS...:
ORIGINAL TEXT:
Objective : The Effect of News on Stock Prices: Evidence from Natural Language Processing is a study aimed at analyzing the relationship between news and stock prices using natural
     language processing techniques. The study analyzes news articles published by major financial news outlets and their impact on stock prices.
      The researchers use a sentiment analysis model to analyze the tone of the news articles and determine whether they have a positive or negative impact on the stock prices. 
      The study finds that there is a significant relationship between news sentiment and stock prices, with positive news leading to an increase in stock prices and negative news leading 
      to a decrease in stock prices. The study highlights the importance of monitoring news and understanding its impact on the stock market.The study aims to investigate the effect of news 
      sentiment on stock prices using natural language processing tech