In [1]:
import os
import ssl
import dotenv

dotenv.load_dotenv()
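# WARNING: this disables HTTPS certificate verification for the whole process.
# Use it only as a local workaround, never in production.
ssl._create_default_https_context = ssl._create_unverified_context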


In [3]:
# load_dotenv() has already copied the .env values into os.environ.
# Assigning os.getenv(...) back would raise TypeError when a value is None, so check explicitly instead.
required = ["OPENAI_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT", "LANGSMITH_ENDPOINT"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

In [6]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",
)

In [8]:
llm.invoke(
    """You are an invoice analysis assistant. Based on the provided context, extract ONLY the invoice number.
Do not include any explanation, formatting, or extra words. Just output the invoice number as it appears in the document.

Context:
9 , 85 , 231 . 98 54 , 73 , 511 . 00 Tax Amount
References Dated Delivery Note Date Destination
BANK A / c No . : 7011645010 Branch  &  IFS Code
Invoice No . AHM / 2024 - 25 / 03 Delivery Note
    """
).content

'AHM / 2024 - 25 / 03'
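
The returned value keeps the stray spaces of the OCR text in the context. If downstream code expects a compact identifier, one option is a small post-processing step. The sketch below assumes that "clean" means no whitespace around `/` and `-`; `normalize_invoice_number` is a hypothetical helper, not part of LangChain.

```python
import re


def normalize_invoice_number(raw: str) -> str:
    """Collapse OCR-introduced spaces around '/' and '-' separators."""
    # Remove whitespace on either side of a separator, then trim the ends.
    compact = re.sub(r"\s*([/-])\s*", r"\1", raw)
    return compact.strip()


print(normalize_invoice_number("AHM / 2024 - 25 / 03"))  # AHM/2024-25/03
```

Whether to normalize at all depends on how the invoice number is stored elsewhere; if other systems keep the spaced form, the raw value should be preserved alongside the compact one.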

In [10]:
## Prompt template
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an invoice analytics expert. Answer the question asked."
        ),
        (
            "user",
            "{input_question}"
        )
    ]
)
input_question = "How to extract invoice date"


In [11]:
prompt

ChatPromptTemplate(input_variables=['input_question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are an invoice analytics expert. Answer the question asked.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_question'], input_types={}, partial_variables={}, template='{input_question}'), additional_kwargs={})])

In [12]:
chain = prompt | llm

In [13]:
chain

ChatPromptTemplate(input_variables=['input_question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are an invoice analytics expert. Answer the question asked.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_question'], input_types={}, partial_variables={}, template='{input_question}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x115f633d0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x116b0af90>, root_client=<openai.OpenAI object at 0x116b00510>, root_async_client=<openai.AsyncOpenAI object at 0x115f57990>, model_name='gpt-4o', model_kwargs={}, openai_api_key=SecretStr('**********'))
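
The `|` operator composes the prompt and the model into a single runnable: the output of the left side becomes the input of the right side. As a rough mental model only, the following pure-Python sketch mimics that behavior by overloading `__or__`. The `Step` class and both example steps are hypothetical stand-ins, not LangChain's implementation, and no API call is made.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: feed this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-in for the prompt template: fills the variable into a message string.
fake_prompt = Step(lambda variables: f"user: {variables['input_question']}")
# Stand-in for the model: echoes the prompt it received instead of calling an API.
fake_llm = Step(lambda text: f"[model saw] {text}")

fake_chain = fake_prompt | fake_llm
print(fake_chain.invoke({"input_question": "How to extract invoice date"}))
```

Adding a third step with another `|` works the same way, since each composition returns a new `Step`.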

In [14]:
response = chain.invoke(
    {
        "input_question": input_question
    }
)

In [17]:
print(response.content)

Extracting an invoice date can depend on the format and source of the invoice data. Here are common methods to extract invoice dates:

1. **From a Digital PDF:**
   - **Optical Character Recognition (OCR):** Use OCR software like Adobe Acrobat, Tesseract, or online OCR tools to convert the text from a scanned PDF into a machine-readable format.
   - **Text Parsing:** Once converted to text, use regular expressions (regex) or text parsing libraries in Python (such as nltk or re) to identify and extract the date based on common date formats (e.g., mm/dd/yyyy, dd-mm-yyyy).

2. **From a Spreadsheet (Excel, CSV):**
   - **Direct Access:** If the invoice data is structured in a spreadsheet, you can directly access and read the date from the specific cell or column allocated for invoice dates.
   - **Data Filtering:** Use Excel formulas or Python libraries like pandas to filter and extract the required date entries.

3. **From an ERP or Accounting System:**
   - **APIs:** Many accounting or E

In [18]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
chain = prompt | llm | output_parser  # the parser turns the AIMessage into a plain string
response = chain.invoke(
    {
        "input_question": input_question
    }
)
print(response)

To extract an invoice date from an invoice document, you can use several methods depending on the format of the invoice (e.g., PDF, image, digital file) and the tools available to you:

1. **Optical Character Recognition (OCR):** 
   - If the invoice is in an image-based format (like PDF scans or photos), use OCR tools like Adobe Acrobat, Tesseract, Google Cloud Vision, or AWS Textract to convert the image text into machine-readable text.
   - Once the text is extracted, employ text parsing techniques to locate the invoice date.

2. **Regular Expressions (Regex):**
   - Develop a regex pattern to identify date formats typically used in invoices, such as "MM/DD/YYYY," "DD/MM/YYYY," or "YYYY-MM-DD."
   - Search through the extracted or digital text for matches.

3. **Natural Language Processing (NLP):**
   - Use NLP libraries like SpaCy or NLTK to identify date entities in the text.
   - These libraries have pre-trained models that can recognize dates and other entities within a body of 
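
The regex approach described in the response can be sketched as follows. The pattern covers only the three numeric formats named in that response and does not validate calendar ranges; `find_dates` and the sample text are hypothetical, for illustration only.

```python
import re

# Matches YYYY-MM-DD, or two slash-separated parts followed by a four-digit year
# (which covers both MM/DD/YYYY and DD/MM/YYYY).
DATE_PATTERN = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b")


def find_dates(text: str) -> list[str]:
    """Return every substring that looks like one of the supported date formats."""
    return DATE_PATTERN.findall(text)


sample = "Invoice Date: 12/04/2024  Due: 2024-05-12"
print(find_dates(sample))  # ['12/04/2024', '2024-05-12']
```

A regex alone cannot tell MM/DD/YYYY from DD/MM/YYYY for a value such as 12/04/2024, so the day and month order has to come from knowledge of the invoice's origin.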