In [10]:
import json
from typing import Sequence

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_hub.semanticscholar.base import SemanticScholarReader

In [2]:
try:
    # reload the indexes from the directories written by the build cell below
    storage_context = StorageContext.from_defaults(persist_dir="./storage/docs1")
    docs1_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(persist_dir="./storage/docs2")
    docs2_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # nothing persisted yet (or loading failed): rebuild in the next cell
    index_loaded = False


In [12]:
# build indexes across the two data sources
s2reader = SemanticScholarReader()

if not index_loaded:
    # load data
    docs1 = s2reader.load_data(
        query="datasets for paragraph retrieval", limit=10
    )
    docs2 = s2reader.load_data(
        query="text ranking methods", limit=10
    )

    
    # build index
    docs1_index = VectorStoreIndex.from_documents(docs1)
    docs2_index = VectorStoreIndex.from_documents(docs2)


    # persist index
    docs1_index.storage_context.persist(persist_dir="./storage/docs1")
    docs2_index.storage_context.persist(persist_dir="./storage/docs2")


In [13]:
docs1_engine = docs1_index.as_query_engine(similarity_top_k=5)
docs2_engine = docs2_index.as_query_engine(similarity_top_k=5)


In [14]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=docs1_engine,
        metadata=ToolMetadata(
            name="paragraph_retrieval_datasets",
            description="This has information about datasets for paragraph retrieval. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    QueryEngineTool(
        query_engine=docs2_engine,
        metadata=ToolMetadata(
            name="text_ranking_methods",
            description="This has information about text ranking methods. "
            "Use a detailed plain text question as input to the tool.",
        ),
    )
]

In [15]:
from llama_index.schema import Document
from llama_index.agent import ContextRetrieverOpenAIAgent


# toy context index - holds a single empty document here, so no extra context is retrieved
texts = [""]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)

context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
)

In [18]:
response = context_agent.chat(
    "Think of a new research techniques for paragraph retrieval. Give me a detailed step by step description of the technique."
)

[33;1m[1;3mContext information is below.
---------------------

---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: Think of a new research techniques for paragraph retrieval. Give me a detailed step by step description of the technique.


In [19]:
from IPython.display import Markdown

# print markdown
Markdown(str(response))

A detailed step-by-step description of a new research technique for paragraph retrieval could be as follows:

1. Preprocessing: Start by preprocessing the paragraphs in the knowledge source. This may involve tokenization, stemming, and removing stop words to create a clean representation of the text.

2. Contextualized Sentence Representation: Utilize a pre-trained language model, such as BERT or GPT, to generate contextualized sentence representations for each sentence in the paragraphs. These models capture the contextual information and semantic meaning of the sentences.

3. Question Representation: Similarly, generate a contextualized representation for the question using the same language model. This representation will capture the contextual information and semantic meaning of the question.

4. Joint Vector Representation: Combine the contextualized question representation with each contextualized sentence representation to create a joint vector representation for each sentence in the paragraphs. This joint representation captures the relationship between the question and the sentences.

5. Similarity Calculation: Calculate the similarity between the joint vector representations of the question and each sentence in the paragraphs. This can be done using cosine similarity or other similarity metrics.

6. Ranking: Rank the sentences in the paragraphs based on their similarity scores with the question. The sentences with higher similarity scores are considered more relevant to the question.

7. Iterative Retrieval: Iterate the process by selecting the top-ranked sentences as supporting evidence and using them to refine the question representation. This can be done by incorporating the selected sentences into the question representation or by re-generating the question representation using the language model.

8. Paragraph Retrieval: Retrieve the paragraphs that contain the top-ranked sentences as supporting evidence. These paragraphs are likely to contain the necessary information to answer the question.

By following these steps, this technique leverages contextualized sentence representations and iterative retrieval to enhance the retrieval of relevant paragraphs for a given question.
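The similarity, ranking, and paragraph-retrieval steps above (5, 6 and 8) can be sketched in a few lines. This is only an illustration under stated assumptions: `embed` is a hypothetical stand-in that uses bag-of-words counts instead of BERT or GPT representations, and `rank_paragraphs` and `split_sentences` are names invented for this sketch, not part of llama_index.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase bag-of-words counts, not a BERT/GPT vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def rank_paragraphs(
    question: str, paragraphs: list[str], top_k: int = 2
) -> list[tuple[int, float]]:
    """Score each paragraph by its best-matching sentence and return the top_k."""
    q = embed(question)
    scored = []
    for idx, para in enumerate(paragraphs):
        # step 5: similarity between the question and every sentence
        best = max((cosine(q, embed(s)) for s in split_sentences(para)), default=0.0)
        scored.append((idx, best))
    # step 6: rank by score; step 8: the paragraph holding the best sentence wins
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat. It purred loudly.",
        "Dense retrieval ranks passages with embeddings. BM25 is a sparse baseline.",
    ]
    for idx, score in rank_paragraphs("How does dense retrieval rank passages?", corpus):
        print(idx, round(score, 3))
```

Swapping `embed` for a real sentence encoder would leave the ranking logic unchanged; the iterative refinement in step 7 is not covered by this sketch.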

# Raw GPT-4 Output
Technique 1: Biased Data Identification And Corrective Training
The first step should be to create a system to identify data that promotes anti-Muslim bias. This would involve monitoring, recognizing, and classifying the input and output of the language model. An AI component can be developed to determine bias by considering factors like context, sentiment, and disadvantage or harm to a specific religious group like Muslims. Once the biased-data is identified, it should be excluded from the training set and the model must be retrained in this corrected data.

Technique 2: Comprehensive Religious Text Analysis
In order to correctly understand the principles, beliefs, and values of Islam, a comprehensive text analysis of religious works and literature can be utilized. These texts should then be used as part of the training, thereby permeating the model with a greater understanding and respect for the Muslim faith.

Technique 3: Multi-dimensional Bias Detection Metric
A comprehensive evaluation metric can be formulated to measure the level of bias in the language model's output. This metric would evaluate results along a variety of dimensions, like religious discrimination, defamatory language, and intent to hurt, in order to provide a holistic understanding of the bias present.

Technique 4: Regular Audit And Adjustments
The language model's output should be audited at regular intervals to assess the dominance of anti-Muslim bias. Any trends or patterns of bias identified should be corrected by adjusting the model's algorithms or its training data.

Technique 5: Diversified Training
The model should be trained with text from diverse sources depicting varying cultural, religious, and ethnic viewpoints - making sure the perceptions and narratives around Islam and Muslims are balanced and inclusive. 

Technique 6: Public Engagement
Public participation can be encouraged to report instances of bias or inappropriate outputs by the model. This feedback can be used to continuously update and refine the model, to reduce any biases manifested.

Technique 7: Model Accountability and Transparency
Creating methods that make the model more accountable for its outputs and transparent about the presence or possibility of bias can be a powerful technique. For example, a feature can be created that flags potentially biased content or prompts for a review before generating such content.

Technique 8: Active Learning and Expert Input
Applying active learning where the model is capable of learning from the users' feedback on bias can be helpful. Additionally, involving domain experts, such as cultural anthropologists, religious scholars, and social justice advocates, can provide a nuanced understanding of the subtle aspects of bias and how to model algorithms sensitively and respectfully.

# GPT-4 Response: 2

Research Technique Name: Sentimental Analysis and Semantic Understanding

Description: This new research technique will aim to reduce anti-Muslim bias in language models by using a combination of Sentimental Analysis and Semantic Understanding algorithms, working hand in hand to identify and modify potentially bias-inducing elements in the system’s language patterns.

1. Sentimental Analysis: It entails parsing text for polarity identification i.e., determining whether a specific text or phrase conveys negative, positive, or neutral sentiment. This model will allow us to identify anti-Muslim sentiments in the existing language models.

2. Semantic Understanding: This step involves using techniques that help in understanding the context and intended meaning of words and phrases in sentences. This is useful because bias does not always present itself as overtly negative language -- sometimes it is manifested subtly in the form of certain stereotypes, misconceptions, or prejudiced undertones.

Steps of the Research Technique:

- Dataset Development: The first step would be to collect a relevant dataset. This can involve scraping online text or discourse related to Islam or Muslims, both positive and negative. This dataset should then be cleaned and preprocessed for further analysis.

- Biases Identification: Apply the Sentimental Analysis on the data to identify anti-Muslim sentiment. This can give basic understanding of any underlying negative attitudes or biases presented in the language model's output.

- Contextual Understanding: Utilize semantic understanding techniques to assess the contextual meaning behind sentences and phrases. Any subtle form of biases can be identified and sorted out through this method.

- Bias Tagging: Detect the biases and tag them according to the intensity and type of bias for example, weak, strong, racial, religious, etc.

- Model Training: Retrain the language model on this dataset. The model will be trained to reduce biases based on the tagged labels.

- Evaluation: Check the language model's output to see if anti-Muslim bias has been reduced. Use the model to generate responses on Muslim-related topics to test its performance.

- Iterative Retraining: Perform several iterations of training and evaluation, each time adjusting and improving the model according to the bias identified in the output.

- Integration of Anti-bias Filter: Create an anti-bias filter as a part of the language model to check real-time responses. This filter can analyze the polarity of responses and block or replace responses that show any type of anti-Muslim bias.

- User Feedback: Take user feedback on the anti-biased model responses. User feedback can provide additional insights and indicate areas requiring further improvements. The feedback can be used for further refining the model and improving its performance in a real-world context.

- Update the Dataset: As time progresses and language evolves, there will be need for the dataset to be updated. It can either be done periodically or in real-time as the language model is exposed to current user interactions to ensure it's able to adapt to the constant changes in language use.

- Continuous Monitoring: Implement consistent testing protocols to ensure anti-Muslim biases don't crop up unnoticed, even after successful training and evaluation, in order to properly maintain the integrity of the language model.

The goal of this technique is to make the language model more unbiased and fair in its text generation tasks. The Sentimental Analysis aids in identifying overt biases, while the Semantic Understanding helps detect tacit prejudices. By coupling these together, along with user feedback and continuous monitoring, we can work towards a model that deeply understands and respects all cultures and religions. This technique further allows for scalability and can be employed for a wide range of audience and other contexts as well.

In [22]:
response = context_agent.chat(
    "Can you point out some gaps in the current techniques? Now use these gaps to think of a new research technique for paragraph retrieval. Give me a detailed step by step description of the technique."
)
# print markdown
Markdown(str(response))

[33;1m[1;3mContext information is below.
---------------------

---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: Can you point out some gaps in the current techniques? Now use these gaps to think of a new research technique for paragraph retrieval. Give me a detailed step by step description of the technique.


Some gaps in the current techniques for paragraph retrieval include the limited consideration of fine-grained elements within paragraphs, the lack of effective utilization of contextual knowledge, and the need for more comprehensive evaluation metrics. 

Based on these gaps, a new research technique for paragraph retrieval could be developed as follows:

1. Fine-Grained Sentence Representation: Start by generating fine-grained sentence representations within each paragraph. This can be done using pre-trained language models or other sentence embedding techniques. By considering the individual sentences within a paragraph, we can capture more granular information and improve the retrieval process.

2. Contextualized Paragraph Representation: Utilize contextualized paragraph representations that take into account the relationships between sentences within a paragraph. This can be achieved by incorporating attention mechanisms or graph-based models that capture the contextual dependencies between sentences.

3. External Knowledge Integration: Incorporate external knowledge sources, such as domain-specific ontologies or knowledge graphs, to enhance the retrieval process. This can involve leveraging semantic relationships between concepts or entities mentioned in the paragraphs and the question.

4. Iterative Retrieval and Refinement: Implement an iterative retrieval process where the initial retrieval results are used to refine the paragraph representations and improve the relevance ranking. This can be done by incorporating feedback from users or by incorporating additional contextual information from the retrieved paragraphs.

5. Evaluation with Comprehensive Metrics: Develop comprehensive evaluation metrics that go beyond traditional precision and recall measures. Consider metrics that capture the relevance, informativeness, and coherence of the retrieved paragraphs. This can provide a more holistic assessment of the performance of the paragraph retrieval technique.

6. Fine-Tuning and Optimization: Fine-tune the parameters of the paragraph retrieval model using appropriate optimization techniques. This can involve techniques such as gradient descent or Bayesian optimization to optimize the performance of the model.

7. Experimental Validation: Conduct extensive experiments to validate the effectiveness of the proposed technique. Compare it with existing paragraph retrieval methods using benchmark datasets and evaluate its performance using the comprehensive evaluation metrics developed in step 5.

By following these steps, the proposed technique aims to address the gaps in current paragraph retrieval techniques by considering fine-grained elements, leveraging contextual knowledge, and using comprehensive evaluation metrics.
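Step 5 treats precision and recall as the baseline that richer metrics should extend. As a reference point, here is a minimal sketch of those baseline measures for a ranked list of retrieved paragraphs, plus reciprocal rank. The function names and the paragraph IDs are hypothetical, chosen for illustration only; precision@k here divides by `k` even if fewer than `k` items were returned.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved paragraphs that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of all relevant paragraphs that appear in the top k."""
    if not relevant_ids or k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / position of the first relevant paragraph, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


if __name__ == "__main__":
    ranked = ["p3", "p1", "p7", "p2"]  # hypothetical system output, best first
    relevant = {"p1", "p2"}            # hypothetical gold labels
    print(precision_at_k(ranked, relevant, 2))
    print(recall_at_k(ranked, relevant, 4))
    print(reciprocal_rank(ranked, relevant))
```

Measures of informativeness or coherence, as the step suggests, would need human or model-based judgments and are outside this sketch.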

In [32]:
docs1 = s2reader.load_data(
    query="limitations of current agricultural techniques", limit=10
)
docs2 = s2reader.load_data(
    query="deep learning in agriculture", limit=10
)

# build index
docs1_index = VectorStoreIndex.from_documents(docs1)
docs2_index = VectorStoreIndex.from_documents(docs2)

docs1_engine = docs1_index.as_query_engine(similarity_top_k=5)
docs2_engine = docs2_index.as_query_engine(similarity_top_k=5)

query_engine_tools = [
    QueryEngineTool(
        query_engine=docs1_engine,
        metadata=ToolMetadata(
            name="limitations_of_current_agricultural_techniques",
            description="This has information about limitations of current agricultural techniques. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    QueryEngineTool(
        query_engine=docs2_engine,
        metadata=ToolMetadata(
            name="deep_learning_in_agriculture",
            description="This has information about deep learning in agriculture. "
            "Use a detailed plain text question as input to the tool.",
        ),
    )
]

# toy context index - holds a single empty document here, so no extra context is retrieved
texts = [""]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)

context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
)

response = context_agent.chat(
    "Can you point out some gaps in the current techniques? Now use these gaps to think of a new research techniques. Think step by step."
)
# print markdown
Markdown(str(response))

[33;1m[1;3mContext information is below.
---------------------

---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: Can you point out some gaps in the current techniques? Now use these gaps to think of a new research techniques. Think step by step.

=== Calling Function ===
Calling function: limitations_of_current_agricultural_techniques with args: {
  "input": "What are the limitations of current agricultural techniques?"
}
Got output: The limitations of current agricultural techniques include the lack of detailed simulations in terms of dynamics and visuals, which affects the accuracy of simulation software. Additionally, there is a need for improvements in disease recognition in plants, as well as the incorporation of more sustainable sensing techniques in agricultural Internet of Things (IoT) systems. Furthermore, there are limitations in the use of traditional farming techniques, such as the 

Based on the limitations of current agricultural techniques, we can think of a new research technique that addresses these gaps. Here is a step-by-step approach:

1. Develop advanced simulation software: To overcome the limitation of inaccurate simulation software, researchers can focus on developing more detailed and accurate simulation models for agricultural systems. This can involve incorporating dynamic and visual elements to provide a more realistic representation of agricultural processes.

2. Improve disease recognition in plants: Researchers can explore the use of advanced technologies such as machine learning and computer vision to improve disease recognition in plants. This can involve developing algorithms that can analyze images of plants and identify signs of diseases at an early stage, allowing for timely intervention and prevention.

3. Enhance sustainable sensing techniques: Agricultural IoT systems can be improved by incorporating more sustainable sensing techniques. This can involve the development of sensors that are energy-efficient, use renewable energy sources, and have minimal environmental impact. These sensors can provide real-time data on various parameters such as soil moisture, temperature, and nutrient levels, enabling farmers to make informed decisions and optimize resource usage.

4. Promote knowledge about rationality fundamentals: To address the limitation of limited knowledge about rationality fundamentals in traditional farming techniques, research can focus on educating farmers about the principles of rational decision-making in agriculture. This can involve conducting workshops, training programs, and providing educational materials that highlight the importance of evidence-based practices and efficient resource management.

5. Explore automation and robotics: Researchers can investigate the use of automation and robotics in agriculture to reduce workload and production costs. This can involve developing autonomous farming systems that can perform tasks such as planting, harvesting, and crop monitoring. By reducing the reliance on manual labor, farmers can increase efficiency and productivity.

By focusing on these research techniques, we can overcome the limitations of current agricultural techniques and pave the way for more efficient and sustainable farming practices.

In [38]:
docs1 = s2reader.load_data(query="anti muslim bias in large language models", limit=10)
docs2 = s2reader.load_data(query="fixing biases in large language models", limit=10)
# docs3 = s2reader.load_data(query="ethical considerations in AI models", limit=10)

# build index
docs1_index = VectorStoreIndex.from_documents(docs1)
docs2_index = VectorStoreIndex.from_documents(docs2)
# docs3_index = VectorStoreIndex.from_documents(docs3)


docs1_engine = docs1_index.as_query_engine(similarity_top_k=5)
docs2_engine = docs2_index.as_query_engine(similarity_top_k=5)
# docs3_engine = docs3_index.as_query_engine(similarity_top_k=5)


query_engine_tools = [
    QueryEngineTool(
        query_engine=docs1_engine,
        metadata=ToolMetadata(
            name="anti_muslim_bias_in_large_language_models",
            description="This has information about anti muslim bias in large language models. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    QueryEngineTool(
        query_engine=docs2_engine,
        metadata=ToolMetadata(
            name="fixing_biases_in_large_language_models",
            description="This has information about fixing biases in large language models. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    # QueryEngineTool(
    #     query_engine=docs3_engine,
    #     metadata=ToolMetadata(
    #         name="ethical_considerations_in_AI_models",
    #         description="This has information about ethical considerations in AI models. "
    #         "Use a detailed plain text question as input to the tool.",
    #     ),
    # ),
]

# toy context index - holds a single empty document here, so no extra context is retrieved
texts = [""]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)

context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
)

response = context_agent.chat(
    "Can you point out some gaps in the current techniques? Now use these gaps to think of a new research techniques. Think step by step."
)
# print markdown
Markdown(str(response))

[33;1m[1;3mContext information is below.
---------------------

---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: Can you point out some gaps in the current techniques? Now use these gaps to think of a new research techniques. Think step by step.


To identify gaps in the current techniques, we need to consider the context information provided. However, since no specific context information is given in the prompt, I will provide a general approach to identifying gaps and proposing new research techniques.

1. Identify the current techniques: Determine the existing techniques or methods that are commonly used in the field related to the context information. This could include machine learning algorithms, data analysis methods, or any other relevant techniques.

2. Evaluate the limitations: Assess the limitations or shortcomings of the current techniques. This could involve considering factors such as accuracy, efficiency, scalability, interpretability, or any other relevant criteria. Identify the areas where the current techniques may fall short or have room for improvement.

3. Identify the gaps: Based on the limitations identified in step 2, determine the specific gaps in the current techniques. These gaps could be related to data quality, model performance, interpretability, generalization, or any other relevant aspect.

4. Propose new research techniques: Once the gaps are identified, brainstorm and propose new research techniques to address these gaps. Consider innovative approaches, novel algorithms, or alternative methodologies that could potentially overcome the limitations of the current techniques.

5. Step-by-step plan: Develop a step-by-step plan to implement and evaluate the proposed research techniques. This plan should outline the necessary data collection, preprocessing, model development, evaluation metrics, and any other relevant steps.

It is important to note that without specific context information, the above steps are provided in a general sense. The actual gaps and proposed research techniques will vary depending on the specific field or problem being addressed.