One way to query Phi-3 is through a locally run API server.

In [13]:
from functions import *
context = "List and analyze the limitations of the studies provided within a greater context. Identify and explain any potential for future research indicated."
prompt = core_query("wind+speed+prediction", 10).loc[0, "text"]
# Keep at most 4000 characters starting at the first "conclusion";
# fall back to the start of the text if the keyword is absent (find returns -1)
start = max(prompt.find("conclusion"), 0)
prompt = prompt[start:start + 4000]
print("Querying API server, please be patient...")
response, context = api_server_query(context, prompt)
print(response)

invalid pdf header: b'<scri'
EOF marker not found


Bad link: https://core.ac.uk/download/pdf/6182363.pdf
Querying API server, please be patient...
The studies presented in the abstract highlight the limitations of using Markov chain models for wind speed forecasting. Specifically, the researchers note that Markov chain models require certain assumptions on the distribution of wind speed and its variations, which may not always hold true in real-world scenarios. Additionally, the researchers observed that while the Markov chain models perform well in reproducing the statistical features of wind speed, their ability to forecast wind speed at different horizon times is dependent on the time scale used. This suggests that the models may not be universally applicable and may require further optimization for specific use cases.

The use of semi-Markov models, particularly the indexed semi-Markov chain (ISMC) model, is suggested as a potential solution to these limitations. The ISMC model does not require any assumptions on the distribution o
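
The windowing step in the cell above (take up to 4000 characters starting at the first "conclusion") can be factored into a small helper. This is a minimal sketch, not part of the `functions` module: the name `excerpt_from` and its parameters are hypothetical. Note that `str.find` is case-sensitive, so a heading spelled "Conclusion" would not match.

```python
def excerpt_from(text: str, keyword: str = "conclusion", max_chars: int = 4000) -> str:
    """Return up to max_chars characters of text, starting at the first keyword hit."""
    start = text.find(keyword)
    if start == -1:
        # Keyword absent: fall back to the beginning of the text
        start = 0
    # Slicing past the end of a string is safe in Python, so no length check is needed
    return text[start:start + max_chars]


sample = "intro " * 3 + "conclusion: models need tuning. " + "x" * 50
print(excerpt_from(sample, max_chars=20))
```

Handling the "not found" case explicitly avoids slicing from index -1, which would otherwise return almost nothing.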

Although it could be costly to scale, a serverless deployment through Azure AI is much faster, though still relatively slow because of the large context length.

In [2]:
import os

from dotenv import load_dotenv
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

def research_query(prompt):
    ''' Takes a text prompt and returns the Phi-3.5 response and a dictionary of token usage. '''
    load_dotenv()
    api_key = os.getenv("AZURE_INFERENCE_CREDENTIAL", '')
    if not api_key:
        raise Exception("A key should be provided to invoke the endpoint")

    client = ChatCompletionsClient(
        endpoint='https://Phi-3-5-mini-instruct-bzsru.eastus2.models.ai.azure.com',
        credential=AzureKeyCredential(api_key)
    )

    result = client.complete({
        "messages": [
          {
            "role": "user",
            "content": prompt + "List and explain within the list any potential for future research within the context of the field of research. Try to keep your response centered around 5 major gaps in the research."
          }
        ],
        "temperature": 0.8,
        "top_p": 0.95,
    })
    response = str(result.choices[0].message.content)

    # Render Markdown bold (**text**) with ANSI escape codes for terminal output
    while "**" in response:
        first = response.find("**")
        second = response.find("**", first + 2)
        if second == -1:
            break  # unmatched marker: leave the remaining text unchanged
        middle = '\033[1m' + response[first + 2:second] + '\033[0m'
        response = response[:first] + middle + response[second + 2:]

    # Append a numbered citation list; relies on a global paper_df defined elsewhere in the notebook
    response += "\n\n"
    for i in range(len(paper_df)):
        response += f"[{i + 1}] {paper_df.loc[i, 'citation']} \n\n"

    tokens_usage = {"prompt": result.usage.prompt_tokens, 
                    "completion": result.usage.completion_tokens, 
                    "total": result.usage.total_tokens}
    return response, tokens_usage
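
The bolding step inside `research_query` can also be written as one regular-expression substitution. The sketch below is an alternative, not the notebook's implementation; `bold_markdown`, `BOLD_ON`, and `BOLD_OFF` are hypothetical names introduced for illustration.

```python
import re

BOLD_ON, BOLD_OFF = "\033[1m", "\033[0m"  # ANSI escape codes: bold on, reset


def bold_markdown(text: str) -> str:
    """Replace each **span** with ANSI bold codes; an unmatched ** is left alone."""
    # Non-greedy match so that each pair of markers is handled separately
    return re.sub(r"\*\*(.+?)\*\*",
                  lambda m: f"{BOLD_ON}{m.group(1)}{BOLD_OFF}",
                  text)


print(bold_markdown("1. **Integration of Multiple Data Sources**: ..."))
```

Using a function as the replacement avoids any escaping issues in the replacement string, and a stray `**` without a partner passes through untouched.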

In [1]:
from functions import *
# "((stock+price)OR(valuation))AND(financial+metrics)"
# "(machine+learning)AND((price+prediction)OR(stock+price))"
print(research_query("(machine+learning)AND((price+prediction)OR(stock+price))", limit=9, offset=3)[0])

1. Integration of Multiple Data Sources: The research predominantly focuses on historical stock prices and traditional financial indicators. Future studies could explore the integration of alternative data sources such as social media sentiment, economic indicators, company news releases, and geopolitical events to enhance prediction models. This could provide a more holistic view of factors influencing stock prices and improve the robustness of predictions.

2. Real-time Data Analysis and Prediction: Most studies analyze data retrospectively. There is potential for research focused on developing models that can analyze and predict stock movements in real-time, utilizing streaming data from various sources. This would be highly beneficial for traders and investors seeking to make timely decisions.

3. Hyperparameter Optimization and Model Selection: While several machine learning models are employed, there is limited exploration into the optimal tuning of hyper

In [18]:
paper_df = semantic_scholar_query("((stock+price)OR(earnings))AND(financial+metrics)", 10)
prompt = ""
# Concatenate the available abstracts, skipping papers that have none
for text in paper_df['abstract']:
    if text is not None:
        prompt += text
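
Because the concatenated abstracts make for a long context, it can help to separate them and cap the total length before sending the prompt. The following is a minimal sketch under stated assumptions: `build_prompt` is a hypothetical helper, and the 12000-character default budget is an arbitrary illustration, not a limit taken from the model or the notebook.

```python
from typing import Iterable, Optional


def build_prompt(abstracts: Iterable[Optional[str]], max_chars: int = 12000) -> str:
    """Join non-empty abstracts with blank lines, stopping before max_chars is exceeded."""
    parts = []
    used = 0
    for text in abstracts:
        if not isinstance(text, str):
            continue  # skip None or NaN entries
        text = text.strip()
        if not text:
            continue  # skip empty abstracts
        cost = len(text) + (2 if parts else 0)  # 2 characters for the "\n\n" separator
        if used + cost > max_chars:
            break  # budget (an assumed value) would be exceeded
        parts.append(text)
        used += cost
    return "\n\n".join(parts)


print(build_prompt(["First abstract.", None, "Second abstract."]))
```

With a DataFrame like the one above, this could be called as `build_prompt(paper_df['abstract'])`, since iterating over a pandas column yields its values.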

In [20]:
paper_df.loc[1, 'abstract']

'Objectives: This paper sought to undertake a comprehensive analysis aimed at investigating the influence arising from the various financial metrics, namely the Current Ratio, Debt-to-Equity Ratio, Return On Assets, and Total Assets Turnover on Earnings Per Share (EPS), and Stock Prices as the moderating variable.\xa0Methodology: This research employs a quantitative descriptive methodology by collecting financial reports of Food and Beverage companies listed on the Indonesia Stock Exchange (BEI). The measurement model and hypothesis testing are Descriptive Statistics with a Panel Data Regression Model Selection.Finding: The study found that the Current Ratio had no significant direct effect on Earnings Per Share (EPS), while the Debt to Equity Ratio, Return on Assets, and Total Assets Turnover all had varying degrees of negative influence on EPS via Stock Price. The combined impact of these metrics was statistically significant, emphasizing the importance of considering multiple factor