<a href="https://colab.research.google.com/github/bhaskatripathi/HypothesisHub/blob/main/Hypothesis_Hub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Hypothesis Hub**: *An AI Tool for Automated Research Question and Hypothesis Generation from Scientific Literature*

In [1]:
pip install OpenAI langchain


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting OpenAI
  Downloading openai-0.27.4-py3-none-any.whl (70 kB)
Collecting langchain
  Downloading langchain-0.0.138-py3-none-any.whl (520 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting async-timeout<5.0.0,>=4.0.0
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting SQLAlchemy<2,>=1
  Downloading SQLAlchemy-1.4.47-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)

In [2]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate
from langchain.memory import SimpleMemory
import os

## **The following code generates three research questions and a pair of hypotheses (H0, H1) for each research question.**
The code can be modified to generate more research questions, depending on individual needs.
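One way to make the number of research questions configurable is to build the prompt text from a parameter instead of hard-coding three answer slots. The sketch below is an illustration only: `build_research_question_template` and `num_questions` are hypothetical names that do not appear in the notebook, and the prompt wording is shortened from the original.

```python
def build_research_question_template(num_questions: int = 3) -> str:
    """Return a prompt template with `num_questions` numbered answer slots.

    Hypothetical helper: the notebook hard-codes three slots instead.
    """
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    # One empty "Research Question i:" slot per requested question.
    slots = "\n".join(
        f"Research Question {i}: " for i in range(1, num_questions + 1)
    )
    return (
        f"Given the following text, generate {num_questions} research questions "
        "related to the topic.\n"
        # Plain (non-f) string, so the {text} placeholder is left for later formatting.
        "Text: {text}\n"
        f"{slots}"
    )


template = build_research_question_template(5)
print(template.format(text="<your abstract here>"))
```

Because the returned string still contains the `{text}` placeholder, it should be usable as the `template` argument of the `PromptTemplate(input_variables=["text"], ...)` call in the class.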

# **Sequence Diagram**

In [3]:
from IPython.display import Image
Image(url='https://raw.githubusercontent.com/bhaskatripathi/HypothesisHub/main/Sequence%20diagram.PNG')
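The class in the next cell asks the model for H0 and H1 in a single completion and then picks hypotheses by line position, retrying without an upper bound. As a hedged alternative, the sketch below parses the reply by its `H0`/`H1` labels and caps the number of retries. `parse_hypotheses`, `generate_pair`, `max_attempts`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for the real chain call so the example runs without an API key.

```python
import re
from typing import Callable, Optional, Tuple

# Matches lines such as "Null Hypothesis (H0): ..." or "H1: ...".
_LABEL = re.compile(r"\(?\b(H0|H1)\b\)?\s*:\s*(.+)", re.IGNORECASE)


def parse_hypotheses(reply: str) -> Optional[Tuple[str, str]]:
    """Return (H0, H1) if both labels are present and the texts differ."""
    found = {}
    for line in reply.splitlines():
        match = _LABEL.search(line)
        if match:
            # Keep the first occurrence of each label.
            found.setdefault(match.group(1).upper(), match.group(2).strip())
    h0, h1 = found.get("H0"), found.get("H1")
    if h0 and h1 and h0 != h1:
        return h0, h1
    return None


def generate_pair(
    ask: Callable[[str], str], question: str, max_attempts: int = 3
) -> Tuple[str, str]:
    """Call `ask` up to `max_attempts` times; fall back to ("N/A", "N/A")."""
    for _ in range(max_attempts):
        pair = parse_hypotheses(ask(question))
        if pair is not None:
            return pair
    return ("N/A", "N/A")


def fake_llm(question: str) -> str:
    # Stand-in for the real model call (assumption, for illustration only).
    return (
        "Null Hypothesis (H0): Exchange volume has no effect on the spread.\n"
        "Alternate Hypothesis (H1): Exchange volume has an effect on the spread."
    )


print(generate_pair(fake_llm, "How does exchange volume affect the spread?"))
```

Capping the retries trades completeness for predictability: a question may end up with `"N/A"` placeholders, but the loop cannot run (and bill API calls) indefinitely.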


In [7]:
class ResearchAndHypothesisGenerator:
    def __init__(self, openai_api_key):
        self.llm = OpenAI(temperature=0.7, openai_api_key=openai_api_key, model_name="text-davinci-003")

        research_question_template = """Given the following text, generate three research questions related to the topic. Ensure that the research questions are clear, focused, and can be investigated using a scientific method. Each research question should ideally involve one or more variables that can be measured or manipulated. 
Text: {text}
Examples of good research questions:
- How does variable A affect variable B?
- What is the relationship between variable X and variable Y?
- To what extent does factor M influence outcome N?
Research Question 1: 
Research Question 2: 
Research Question 3: """
        self.rq_prompt_template = PromptTemplate(input_variables=["text"], template=research_question_template)
        self.research_question_chain = LLMChain(
            llm=self.llm, prompt=self.rq_prompt_template, output_key="research_questions"
        )

        hypothesis_template = """Given the research question "{research_question}", generate a null hypothesis (H0) and an alternate hypothesis (H1). The null hypothesis should represent the absence of an effect, relationship, or difference, while the alternate hypothesis should represent the presence of an effect, relationship, or difference. Ensure that hypotheses H0 and H1 are never the same for the research question "{research_question}". 
Null Hypothesis (H0):
Alternate Hypothesis (H1):"""

        self.hypothesis_prompt_template = PromptTemplate(input_variables=["research_question"], template=hypothesis_template)
        self.hypothesis_chain = LLMChain(
            llm=self.llm, prompt=self.hypothesis_prompt_template, output_key="hypotheses"
        )

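    def generate_research_questions(self, user_text):
        """Run the research-question chain and return the lines labelled 'Research Question'."""
        result = self.research_question_chain({"text": user_text})
        # Strip each line first so indented or padded lines are not dropped by the prefix check.
        lines = (line.strip() for line in result["research_questions"].split('\n'))
        return [line for line in lines if line.startswith('Research Question')]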

    def generate_hypotheses(self, research_questions, max_attempts=5):
        """Return one (H0, H1) tuple per question, or ("N/A", "N/A") after max_attempts failed tries."""
        hypotheses = []

        for question in research_questions:
            h0, h1 = "N/A", "N/A"
            for _ in range(max_attempts):
                # The prompt has no hypothesis-type variable, so a single call returns both H0 and H1.
                result = self.hypothesis_chain({"research_question": question})
                lines = [line.strip() for line in result["hypotheses"].split('\n') if line.strip()]
                # Accept the reply only if it contains two distinct non-empty lines.
                if len(lines) >= 2 and lines[0] != lines[1]:
                    h0, h1 = lines[0], lines[1]
                    break

            hypotheses.append((h0, h1))

        return hypotheses

    def create_hypotheses(self, research_questions):
        hypotheses = []

        for question in research_questions:
            result = self.hypothesis_chain({"research_question": question})
            hypothesis_lines = [line.strip() for line in result["hypotheses"].split('\n') if line.strip()]

            if len(hypothesis_lines) < 2:
                # If we don't have two hypotheses, generate new hypotheses until we have both H0 and H1
                while len(hypothesis_lines) < 2:
                    result = self.hypothesis_chain({"research_question": question})
                    hypothesis_lines = [line.strip() for line in result["hypotheses"].split('\n') if line.strip()]

            h0 = hypothesis_lines[0]
            h1 = hypothesis_lines[1]

            # Ensure H0 and H1 are different, generate new hypotheses if needed
            while h1 == h0:
                result = self.hypothesis_chain({"research_question": question})
                hypothesis_lines = [line.strip() for line in result["hypotheses"].split('\n') if line.strip()]
                if len(hypothesis_lines) >= 2:
                    h0 = hypothesis_lines[0]
                    h1 = hypothesis_lines[1]

            hypotheses.append((h0, h1))

        return hypotheses


if __name__ == "__main__":
    user_text = """Objective: To identify the factors responsible for price inconsistencies across different major cryptocurrency exchanges .
Our third objective is to determine the factors responsible for price inconsistencies across different major Bitcoin exchanges. 
To examine the pricing inconsistencies across various exchanges, the Spread (Bid/Ask difference) in the top 3 volume traded Crypto exchanges will be studied. 
Factors such as Exchange volume, exchange market size, transaction fee cost of exchange, Anti Money Laundering (AML) or Know Your Customer (KYC) requirement will also be considered among others. 
It is also expected that higher transaction volumes lead to a lower difference between the prices at which a Cryptocurrencies are traded across different exchanges. 
Comparative prices of one exchange to another by using regression and other appropriate techniques will be analyzed. 
    """
    os.environ["OPENAI_API_KEY"] = "your_openai_api_key"  # replace with your own OpenAI API key
    api_key = os.environ["OPENAI_API_KEY"]
    generator = ResearchAndHypothesisGenerator(api_key)
    print("RUNNING CHAIN OF THOUGHTS...:")
    research_questions = generator.generate_research_questions(user_text)
    print("ORIGINAL TEXT:")
    print(user_text)
    print("\n")
    
    print("LIST OF RESEARCH QUESTIONS:")
    for i, question in enumerate(research_questions):
        print(f"{i + 1}: {question}")

    hypotheses = generator.create_hypotheses(research_questions)
    print("\nLIST OF HYPOTHESES:")
    for i, hypothesis_pair in enumerate(hypotheses):
        print(f"\nRESEARCH QUESTION {i + 1}:")
        print(f"NULL HYPOTHESIS (H0): {hypothesis_pair[0]}")
        print(f"ALTERNATE HYPOTHESIS (H1): {hypothesis_pair[1]}")


RUNNING CHAIN OF THOUGHTS...:
ORIGINAL TEXT:
Objective: To identify the factors responsible for price inconsistencies across different major cryptocurrency exchanges .
Our third objective is to determine the factors responsible for price inconsistencies across different major Bitcoin exchanges. 
To examine the pricing inconsistencies across various exchanges, the Spread (Bid/Ask difference) in the top 3 volume traded Crypto exchanges will be studied. 
Factors such as Exchange volume, exchange market size, transaction fee cost of exchange, Anti Money Laundering (AML) or Know Your Customer (KYC) requirement will also be considered among others. 
It is also expected that higher transaction volumes lead to a lower difference between the prices at which a Cryptocurrencies are traded across different exchanges. 
Comparative prices of one exchange to another by using regression and other appropriate techniques will be analyzed. 
    


LIST OF RESEARCH QUESTIONS:
1: Research Question 1: How 
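The printed list above still carries the `Research Question 1:` label in front of each question, because `generate_research_questions` keeps the whole matching line. If only the question text is wanted, a small regular-expression pass can strip the label. This is a sketch under that assumption; `extract_questions` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import List

# "Research Question <n>:" followed by non-empty question text.
_RQ_LINE = re.compile(r"^\s*Research Question\s*\d+\s*:\s*(\S.*?)\s*$")


def extract_questions(completion: str) -> List[str]:
    """Return the question text of every labelled, non-empty line."""
    questions = []
    for line in completion.splitlines():
        match = _RQ_LINE.match(line)
        if match:
            questions.append(match.group(1))
    return questions


sample = (
    "Research Question 1: How does exchange volume affect the spread?\n"
    "Research Question 2: What is the relationship between fees and price gaps?\n"
    "Research Question 3: "
)
for number, text in enumerate(extract_questions(sample), start=1):
    print(f"{number}: {text}")
```

Lines whose label is followed by no text, such as the third line of `sample`, are skipped rather than returned as empty questions.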