In [1]:
import pandas as pd

df=pd.read_csv('Data/data.csv').drop(['Close'],axis=1)
input_data = df.iloc[0]
input_data

Date      2005-11-21
Open         1.94402
High        1.955116
Low         1.911029
Volume     511711200
ticker          AAPL
Year            2005
Month             11
Day               21
Name: 0, dtype: object

In [2]:
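import pickle as pc

def load_assets():
    # Context managers close each file handle as soon as unpickling finishes
    with open('Models/model.pkl', 'rb') as f:
        model = pc.load(f)
    with open('Scaler/scaler.pkl', 'rb') as f:
        scaler = pc.load(f)
    with open('encoders/encoder.pkl', 'rb') as f:
        encoder = pc.load(f)

    return model, scaler, encoder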

In [3]:
model, scaler, encoder = load_assets()

In [4]:
def transform_df(df, encoder, scaler):
    # Ensure df is a DataFrame (even if single row)
    if isinstance(df, pd.Series):
        df = df.to_frame().T

    df1 = df.copy()
    
    # Drop the Date column if present; errors='ignore' makes this a no-op otherwise
    df1 = df1.drop(columns=['Date'], errors='ignore')
    
    # Encode ticker
    df1['ticker'] = encoder.transform(df1['ticker'])
    
    # Scale all features
    final_df = pd.DataFrame(
        data=scaler.transform(df1), 
        columns=scaler.get_feature_names_out()
    )
    
    return final_df

In [5]:
transform_df(input_data,encoder=encoder,scaler=scaler)

       Open      High       Low    Volume    ticker      Year     Month      Day
0 -0.780733 -0.781438 -0.780499  1.299734 -1.224745 -1.799038  1.300118  0.60065
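
The scaled `ticker` value of -1.224745 can be reproduced by hand under two assumptions that the notebook does not confirm: that the encoder assigns integer codes in sorted label order (so AAPL becomes 0), and that the three tickers appear equally often in the training data. With a population standard deviation, as a standard scaler would use, a minimal sketch of the arithmetic is:

```python
from statistics import mean, pstdev

# Assumption: integer codes follow sorted label order (AAPL=0, GOOGL=1, MSFT=2).
tickers = ["AAPL", "MSFT", "GOOGL"]
codes = {t: i for i, t in enumerate(sorted(tickers))}

# Assumption: every ticker has the same number of training rows,
# so one row per ticker has the same mean and spread as the full column.
encoded_column = [codes[t] for t in tickers]

mu = mean(encoded_column)        # mean of the encoded column
sigma = pstdev(encoded_column)   # population standard deviation (ddof=0)

z_aapl = (codes["AAPL"] - mu) / sigma
print(round(z_aapl, 6))  # -1.224745
```

If the real training set were unbalanced, the mean and standard deviation would shift and the scaled value would differ from the one shown above.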


In [6]:
model.predict(transform_df(input_data,encoder=encoder,scaler=scaler))



array([1.93159489])

In [7]:
from langchain_groq import ChatGroq
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv('GROQ_API_KEY')
llm = ChatGroq(model='llama-3.1-8b-instant',api_key=api_key)
llm.invoke('Hello')

AIMessage(content='Hello, how can I assist you today?', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 36, 'total_tokens': 46, 'completion_time': 0.011139768, 'completion_tokens_details': None, 'prompt_time': 0.00196775, 'prompt_tokens_details': None, 'queue_time': 0.04986676, 'total_time': 0.013107518}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_4387d3edbb', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--a67bef69-a532-45ee-a898-d44bb0ab0890-0', usage_metadata={'input_tokens': 36, 'output_tokens': 10, 'total_tokens': 46})

In [8]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
You are a data extraction and inference assistant.

Your task is to take a natural-language sentence about a stock and convert it
into a JSON object with the following fields:

[
  "Open",
  "High",
  "Low",
  "Volume",
  "ticker",
  "Year",
  "Month",
  "Day"
]

### Rules:
- Always output **valid JSON**.
- Infer or estimate any missing values based on context.
- If the sentence gives close/high/low but not open, you may reasonably infer it from the trend.
- "Date" must be in **YYYY-MM-DD** format.
- "Year", "Month" and "Day" must match the parsed date.
- "ticker" must be ONLY: "AAPL", "MSFT", "GOOGL".
- If multiple tickers appear, choose the first one.
- All numeric values must be numbers (not strings).
- Ensure the final JSON contains **all fields with inferred values**.


### User sentence and date:
{sentence} and {date}

### JSON Output:
""")

In [9]:
from datetime import datetime

today = datetime.today().date()
print(today)

2025-11-20
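
Since the date is already known in Python, the `Year`, `Month` and `Day` fields do not have to come from the model at all. A minimal sketch, where `date_fields` is a hypothetical helper name and not part of the notebook:

```python
from datetime import date

def date_fields(d: date) -> dict:
    """Split a date into the Year/Month/Day columns used as model features."""
    return {"Year": d.year, "Month": d.month, "Day": d.day}

fields = date_fields(date(2025, 11, 20))
print(fields)  # {'Year': 2025, 'Month': 11, 'Day': 20}
```

Merging these values into the parsed JSON after the LLM call would leave one less thing for the model to get wrong.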


In [10]:

chain = prompt|llm
print(chain.invoke({'sentence':"Google shares went up today, closing at 171 with a trading volume of 2.3 million.",'date':today}).content)

To create the JSON object from the given sentence, I will follow the rules and infer any missing values.

Given sentence: "Google shares went up today, closing at 171 with a trading volume of 2.3 million. and 2025-11-20"

Since the sentence mentions "GOOGL" (Google), I will use it as the ticker.

I will parse the date: "2025-11-20"
- Year: 2025
- Month: 11
- Day: 20

Since the sentence mentions closing value "171", which is the low, high, or close value but not open, I will infer the open value based on the trend. However, without more information, I will leave the open value as 0 (zero) for now.

From the sentence, I can infer:
- High: 171 (since it's the closing value)
- Low: 171 (since it's the only value mentioned)
- Close: 171

Since only trading volume is mentioned, I will use that as the volume.
- Volume: 2,300,000

Here's the inferred JSON object:

```json
{
  "Open": 0,
  "High": 171,
  "Low": 171,
  "Volume": 2300000,
  "ticker": "GOOGL",
  "Year": 2025,
  "Month": 11,
  "Day

In [13]:

chain = prompt|llm
print(chain.invoke({'sentence':"Google shares went up today, closing at 171 with a trading volume of 2.3 million.",'date':today}).content)

To extract the required information from the sentence, I'll analyze the context and use the given date to infer missing values.

Given sentence: "Google shares went up today, closing at 171 with a trading volume of 2.3 million. and 2025-11-20"

From the sentence, we can infer the following:

- The stock was "GOOGL" (Google shares).
- The date is 2025-11-20 (given).
- Closing price is 171.
- Trading volume is 2.3 million.

Assuming a typical stock market trend where the opening price is usually lower than the closing price, we can infer the opening price based on the trend. For a 171 closing price, let's assume the opening price is 25% lower, which is a common trend in the stock market.

Inferred opening price = 171 * 0.75 = 127.75

Now, let's use the given date to infer the year, month, and day.

- Year: 2025 (given).
- Month: 11 (given).
- Day: 20 (given).

Here's the JSON output with inferred values:

```json
{
  "Open": 127.75,
  "High": 171,
  "Low": 127.75, // Assuming the lowest 

In [14]:
import re
from datetime import datetime
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq  # or your LLM import

# ------------------------
# 1️⃣ Prompt Template
# ------------------------
prompt = ChatPromptTemplate.from_template("""
You are a data extraction and inference assistant.

Your task is to take a natural-language sentence about a stock and convert it
into a JSON object with the following fields:

[
  "Open",
  "High",
  "Low",
  "Volume",
  "ticker",
  "Year",
  "Month",
  "Day"
]

### Rules:
- Always output **valid JSON**.
- Infer or estimate any missing values based on context.
- If the sentence gives close/high/low but not open, you may reasonably infer it from the trend.
- "Date" must be in **YYYY-MM-DD** format.
- "Year", "Month" and "Day" must match the parsed date.
- "ticker" must be ONLY: "AAPL", "MSFT", "GOOGL".
- If multiple tickers appear, choose the first one.
- All numeric values must be numbers (not strings).
- Ensure the final JSON contains **all fields with inferred values**.

### User sentence and date:
{sentence} and {date}

### JSON Output:
""")

# ------------------------
# 2️⃣ LLM Setup
# ------------------------
llm = ChatGroq(model="llama-3.1-8b-instant")  # replace with your model

chain = prompt | llm

# ------------------------
# 3️⃣ Today's Date
# ------------------------
today = datetime.today().date()

# ------------------------
# 4️⃣ User Sentence
# ------------------------
user_sentence = "Google shares went up today, closing at 171 with a trading volume of 2.3 million."

# ------------------------
# 5️⃣ Invoke LLM
# ------------------------
raw_output = chain.invoke({'sentence': user_sentence, 'date': today}).content
print("Raw LLM Output:\n", raw_output)

# ------------------------
# 6️⃣ Extract JSON using regex
# ------------------------
json_match = re.search(r"\{[\s\S]*?\}", raw_output)
if json_match:
    extracted_json = json_match.group()
    print("\nExtracted JSON:\n", extracted_json)
else:
    print("\nNo JSON found in the output.")


Raw LLM Output:
 To convert the provided sentence into a JSON object, I need to extract and infer the required information. Here's the breakdown:

- Ticker: Based on the sentence, the company mentioned is Google, which corresponds to the ticker "GOOGL".
- Date: The date provided in the sentence is "2025-11-20", which matches the format required. This date can be used to infer the Year, Month, and Day.
- Open: Since the sentence mentions closing at 171, it's not possible to infer the open price directly. However, assuming a typical trading day with the stock price increasing, I can reasonably infer the open price to be lower. Let's assume a 10% drop from the closing price, making the open price around 153.9 (171 * 0.9).
- High: Since the sentence mentions the stock price going up, it's likely that the high price is higher than the closing price. Let's assume a 10% increase from the closing price, making the high price around 187.1 (171 * 1.1).
- Low: Since the sentence doesn't provide a
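
The non-greedy pattern `\{[\s\S]*?\}` stops at the first closing brace, so it would cut a nested object short, and it cannot tell valid JSON from any other brace-delimited text. One possible alternative, sketched here with the standard library only, is to try `json.JSONDecoder.raw_decode` at each opening brace and keep the first object that parses. `extract_first_json_object` is a hypothetical helper name, and the sample strings are made up for illustration:

```python
import json

def extract_first_json_object(text: str):
    """Return the first valid JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        else:
            if isinstance(obj, dict):
                return obj
        start = text.find("{", start + 1)
    return None

# Illustrative input: prose around a JSON object that contains a nested object.
sample = 'Here is the result:\n{"Open": 153.9, "ticker": "GOOGL", "meta": {"source": "inferred"}}\nDone.'
parsed = extract_first_json_object(sample)
print(parsed["meta"]["source"])  # inferred

# A response that was cut off mid-object yields None instead of a partial match.
print(extract_first_json_object('{"Open": 0, "Day'))  # None
```

This still fails on output that is not strict JSON, such as an object containing `//` comments, so a `None` result needs to be handled by the caller.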

In [15]:
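import pandas as pd
import json

# json.loads raises json.JSONDecodeError if the extracted text is not strict JSON
extracted_json_data = json.loads(extracted_json)
# Wrap the dict in a list so it becomes a single-row DataFrame
extracted_df = pd.DataFrame(data=[extracted_json_data])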

extracted_df

    Open   High    Low   Volume ticker  Year  Month  Day
0  153.9  187.1  148.9  2300000  GOOGL  2025     11   20
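
Because every value in this row except the ticker, volume and date was invented by the model, it may be worth checking the record against the prompt's own rules before passing it to `model.predict`. The following is a minimal sketch; `validate_record` and the `Low <= Open <= High` check are assumptions added for illustration, not part of the notebook:

```python
from datetime import date

ALLOWED_TICKERS = ("AAPL", "MSFT", "GOOGL")
REQUIRED_FIELDS = ("Open", "High", "Low", "Volume", "ticker", "Year", "Month", "Day")

def is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def validate_record(record: dict) -> list:
    """Return a list of problems found in an extracted record (empty if none)."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        return [f"missing fields: {missing}"]

    problems = []
    if record["ticker"] not in ALLOWED_TICKERS:
        problems.append(f"unexpected ticker: {record['ticker']!r}")

    if not all(is_number(record[k]) for k in ("Open", "High", "Low", "Volume")):
        problems.append("Open, High, Low and Volume must all be numbers")
    elif not record["Low"] <= record["Open"] <= record["High"]:
        problems.append("expected Low <= Open <= High")

    try:
        date(record["Year"], record["Month"], record["Day"])
    except (TypeError, ValueError):
        problems.append("Year, Month and Day do not form a valid date")

    return problems

# The row shown above passes every check.
good = {"Open": 153.9, "High": 187.1, "Low": 148.9, "Volume": 2300000,
        "ticker": "GOOGL", "Year": 2025, "Month": 11, "Day": 20}
print(validate_record(good))  # []

# An earlier run returned Open=0 with High=Low=171, which the range check flags.
bad = dict(good, Open=0, High=171, Low=171)
print(validate_record(bad))  # ['expected Low <= Open <= High']
```

Passing these checks only shows the record is internally consistent; it says nothing about whether the inferred prices are close to the real ones.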


In [18]:
model.predict(transform_df(extracted_df,encoder,scaler))



array([176.20938885])