### 🧠 Deadline Manager Agent – EY AI Challenge

Modular notebook: OCR, date parsing, working-day arithmetic, an LLM agent for legal deadlines, and optional calendar integration.

In [None]:
# DEPENDENCIES: Some useful dependencies. They might not all be necessary.
# tesseract-ocr-por provides the Portuguese language data required by lang='por'
!apt-get update && apt-get install -y tesseract-ocr tesseract-ocr-por
!pip install --upgrade pytesseract PyPDF2 pillow dateparser python-dateutil holidays transformers huggingface_hub[hf_xet]

In [1]:
# IMPORTS: Some useful libraries. They might not all be necessary.
import json  # used by agent_process to parse the LLM reply
import os
from datetime import datetime, timedelta
from dateparser.search import search_dates
import dateparser
from dateutil.relativedelta import relativedelta
import holidays
import pytesseract
from PIL import Image
from PyPDF2 import PdfReader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

### 🖼️ OCR & PDF Extraction
Functions to extract text from images (via Tesseract) and from PDFs.

In [2]:
def extract_text_from_image(path):
    """Baseline text extraction from an image (using Tesseract's Portuguese model)."""
    return pytesseract.image_to_string(Image.open(path), lang='por')

def extract_text_from_pdf(path):
    """Baseline text extraction from every page of a PDF."""
    rdr = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in rdr.pages)
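
Since images and PDFs go through different extractors, a small dispatcher can pick one based on the file suffix. The sketch below is only an illustration: `extract_text`, its `image_reader` / `pdf_reader` parameters, and the `IMAGE_SUFFIXES` set are hypothetical names, not part of the notebook.

```python
from pathlib import Path

# Hypothetical list of suffixes treated as images; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}

def extract_text(path, image_reader, pdf_reader):
    """Route a file to the matching extractor based on its suffix.

    `image_reader` and `pdf_reader` are callables that take a path and
    return text (for example the two functions defined in this section).
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return pdf_reader(path)
    if suffix in IMAGE_SUFFIXES:
        return image_reader(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

In the notebook it could be called as `extract_text('scan.png', extract_text_from_image, extract_text_from_pdf)`.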

### 🧠 Date extraction (NLU)
Extract the first date found in free text with `dateparser.search.search_dates`, preferring future dates when a date is ambiguous.

In [3]:
def infer_deadline(text, base_date=None):
    """Baseline identification of a date in free text."""
    base = base_date or datetime.now()
    res = search_dates(
        text,
        languages=['pt','en'],
        settings={
            'PREFER_DATES_FROM':'future',
            'RELATIVE_BASE':base,
            'DATE_ORDER':'DMY'
        }
    )
    return res[0][1] if res else None
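
As far as the `dateparser` documentation describes it, `PREFER_DATES_FROM` only affects how incomplete or ambiguous dates are resolved, and `search_dates` returns matches in text order as `(matched_text, datetime)` tuples. `res[0]` is therefore the first date mentioned, which can lie in the past. A minimal sketch of an explicit "first future date" filter, assuming naive (timezone-free) datetimes; `first_future_date` is a hypothetical helper:

```python
from datetime import datetime

def first_future_date(candidates, base):
    """Return the first date in text order that is not before `base`, else None.

    `candidates` has the shape returned by search_dates: a list of
    (matched_text, datetime) tuples, or None when nothing matched.
    """
    for _matched_text, found in candidates or []:
        if found >= base:
            return found
    return None

base = datetime(2025, 5, 27)
candidates = [
    ("12/03/2025", datetime(2025, 3, 12)),  # already past relative to base
    ("30/06/2025", datetime(2025, 6, 30)),  # first date on or after base
]
print(first_future_date(candidates, base))  # 2025-06-30 00:00:00
```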

### 📅 Working-day calculation (PT)
Add working days to a date, excluding weekends and Portuguese public holidays.

In [None]:
def add_working_days(start_date, days):
    """Baseline helper that adds working days to a date; extend it to handle judicial holidays (férias judiciais), etc."""
    pt_hols = holidays.Portugal()
    curr = start_date
    added = 0
    while added < days:
        curr += relativedelta(days=1)
        if curr.weekday() < 5 and curr not in pt_hols:
            added += 1
    return curr
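
To make the counting rule concrete, here is a standard-library sketch of the same loop with the holidays passed in by hand. `add_working_days_simple` and `example_holidays` are hypothetical names, and the single holiday is entered manually for illustration rather than taken from the `holidays` package.

```python
from datetime import date, timedelta

def add_working_days_simple(start, days, holidays_set=frozenset()):
    """Advance `start` by `days` working days, skipping weekends and `holidays_set`."""
    current = start
    added = 0
    while added < days:
        current += timedelta(days=1)
        # weekday(): Monday is 0 ... Sunday is 6, so < 5 means Monday to Friday.
        if current.weekday() < 5 and current not in holidays_set:
            added += 1
    return current

# 10 June (Portugal Day) is entered by hand here, purely for illustration.
example_holidays = {date(2025, 6, 10)}

# Friday 6 June 2025 + 3 working days: Mon 9, (Tue 10 skipped), Wed 11, Thu 12.
print(add_working_days_simple(date(2025, 6, 6), 3, example_holidays))  # 2025-06-12
```

The start date itself is never counted: counting begins on the following day, which matches the loop in `add_working_days`.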

### 🤖 Deadline Agent (free LLM)
Uses a small open-source model (Flan-T5 small) to apply the following rules:
- Modelo 22: due by 31 July
- IES: due by 15 April (of the current year, or of the next year once that date has passed)
- Others: infer via NLP
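
The two fixed rules above are simple enough to check deterministically before involving a model. A minimal sketch, assuming that a request made on 15 April itself still counts as "before"; `rule_based_deadline` is a hypothetical helper and returns `None` for the "infer via NLP" case.

```python
import re
from datetime import date

def rule_based_deadline(text, reference):
    """Apply the two fixed rules; return a date, or None when neither applies."""
    if re.search(r"\bmodelo\s*22\b", text, re.IGNORECASE):
        return date(reference.year, 7, 31)
    if re.search(r"\bIES\b", text, re.IGNORECASE):
        this_year = date(reference.year, 4, 15)
        # Assumption: the deadline day itself still counts as on time.
        if reference <= this_year:
            return this_year
        return date(reference.year + 1, 4, 15)
    return None

print(rule_based_deadline("Entregar Modelo 22", date(2025, 5, 27)))  # 2025-07-31
print(rule_based_deadline("Enviar IES", date(2025, 5, 27)))          # 2026-04-15
```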

In [None]:
# Implementation using simple LLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model     = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def llm_generate(prompt: str, max_length: int = 256) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").input_ids
    outs = model.generate(
        inputs, num_beams=4, early_stopping=True, max_length=max_length
    )
    return tokenizer.decode(outs[0], skip_special_tokens=True)

def agent_process(text, reference_date=None):
    """Baseline agent that infers deadlines by applying the legal rules or plain natural language.

    Returns a dict meant for JSON output: {'deadline': datetime} or {'error': ...}.
    """

    ref = reference_date or datetime.now()
    
    prompt = f"""
You are a Portuguese legal deadline assistant. Determine the deadline for the request below using these rules:
- "Modelo 22": due by {ref.year}-07-31
- "IES": due by {ref.year}-04-15 if before, else {ref.year+1}-04-15
- Otherwise infer via natural language (e.g. "5 working days from now").
Reference date: {ref.strftime('%Y-%m-%d')}
Input: "{text}"
Return ONLY a JSON object with key "deadline" (ISO8601 date string).
"""
    
    raw = llm_generate(prompt)
    
    try:
        obj = json.loads(raw)
        d = dateparser.parse(obj['deadline'])
        return {'deadline': d}
    except Exception as e:
        return {'error': f'LLM parse error: {e} | raw: {raw}'}
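
A small model may not reliably return valid JSON, in which case `json.loads` fails and the agent reports an error even when the reply contains a usable date. The sketch below shows one tolerant way to read the reply: try JSON first, then fall back to scanning for an ISO date. `parse_deadline_reply` and `ISO_DATE` are hypothetical names, and the approach is an assumption rather than part of the original agent.

```python
import json
import re
from datetime import date

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def parse_deadline_reply(raw):
    """Return a date from the model reply, or None if no valid ISO date is found."""
    candidate = raw
    try:
        obj = json.loads(raw)
        if isinstance(obj, dict) and "deadline" in obj:
            candidate = str(obj["deadline"])
    except json.JSONDecodeError:
        pass  # not valid JSON; fall back to scanning the raw text
    match = ISO_DATE.search(candidate)
    if not match:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:
        return None  # looks like a date but is not one, e.g. 2025-02-30

print(parse_deadline_reply('{"deadline": "2025-07-31"}'))  # 2025-07-31
print(parse_deadline_reply("deadline: 2025-07-31"))        # 2025-07-31
```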

In [None]:
# Implementation using Gemini LLM

def config_llm_gemini(temperature: float):
  '''Configure an LLM client backed by the Gemini API (left as an exercise).'''
  # Steps for students:
  # - Go to https://aistudio.google.com/app/apikey and generate your Gemini API key.
  # - Add the necessary packages to your requirements.txt:
  #    langchain
  #    langchain-google-genai
  # - Run the following command to install them:
  #     !pip install -r requirements.txt
  # - Follow the official integration guide for LangChain + Google Generative AI:
  #     https://python.langchain.com/docs/integrations/chat/google_generative_ai/
  # Pay attention to the request limits of the chosen model.
  return "llm"  # Placeholder: should return the LLM response

In [4]:
import re

def process_deadline_from_image_or_text(input_data, is_image=True, base_date=None):
    """
    Process an image (via OCR) or plain text to identify legal deadlines, applying Portuguese rules.
    Returns a structured dict with the deadline date, source, applied rule, confidence, and cleaned text.
    """
    # 1. Run OCR if the input is an image
    if is_image:
        text = extract_text_from_image(input_data)
        source = f"OCR de {input_data}"
    else:
        text = input_data
        source = "Texto fornecido"

    # 2. Basic text cleanup (collapse runs of whitespace)
    clean_text = " ".join(text.split())

    # 3. Resolve the reference (base) date
    base = base_date or datetime.now()

    # 4. Search for explicit dates and deadline phrases
    prazo_inferido = infer_deadline(clean_text, base_date=base)

    # 5. Search for phrases such as "X dias úteis" (X working days)
    match = re.search(r'(\d+)\s*dias?\s*(?:úteis|útil)', clean_text, re.IGNORECASE)
    if match:
        dias_uteis = int(match.group(1))
        # Look for an explicit base date in the text
        data_base_match = search_dates(clean_text, languages=['pt'], settings={'PREFER_DATES_FROM':'future','RELATIVE_BASE':base,'DATE_ORDER':'DMY'})
        if data_base_match:
            data_base = data_base_match[0][1]
        else:
            data_base = base
        data_limite = add_working_days(data_base, dias_uteis)
        regra = f"{dias_uteis} dias úteis a partir de {data_base.strftime('%d/%m/%Y')}"
        confidence = 0.95
    elif prazo_inferido:
        data_limite = prazo_inferido
        regra = "Data explícita identificada"
        confidence = 0.8
    else:
        # Fallback: ask the LLM to infer the deadline
        agent_result = agent_process(clean_text, reference_date=base)
        if 'deadline' in agent_result:
            data_limite = agent_result['deadline']
            regra = "Inferido por LLM"
            confidence = 0.7
        else:
            data_limite = None
            regra = "Não identificado"
            confidence = 0.0

    return {
        "data_limite": data_limite.strftime('%Y-%m-%d') if data_limite else None,
        "fonte": source,
        "regra": regra,
        "confiança": confidence,
        "texto": clean_text
    }


### 🔗 Calendar integration (optional)
Function to create events in an external calendar tool.

In [None]:
# def create_calendar_event(summary, start, end, timezone='UTC'):
#     pass  # implement according to the chosen calendar API
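
One API-independent option is to emit an iCalendar (`.ics`) file, which most calendar tools can import. The sketch below builds a single all-day event with the standard library only. `build_ics_event` and its parameters are hypothetical and differ from the stub's signature, the `PRODID` value is made up, `stamp` is assumed to be a UTC datetime, and special characters in `summary` (commas, semicolons) are not escaped.

```python
from datetime import date, datetime, timedelta, timezone

def build_ics_event(summary, day, uid, stamp=None):
    """Return an iCalendar document containing one all-day event on `day`."""
    stamp = stamp or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//deadline-manager//sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp.strftime('%Y%m%dT%H%M%SZ')}",
        f"DTSTART;VALUE=DATE:{day.strftime('%Y%m%d')}",
        # DTEND is exclusive for all-day events, hence the following day.
        f"DTEND;VALUE=DATE:{(day + timedelta(days=1)).strftime('%Y%m%d')}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"

ics = build_ics_event(
    "Modelo 22",
    date(2025, 7, 31),
    uid="modelo22-2025@example.invalid",
    stamp=datetime(2025, 5, 27, 12, 0, tzinfo=timezone.utc),
)
print(ics)
```

Writing the returned string to a file with an `.ics` extension should be enough for a manual import.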

### 🧪 Use case examples

In [None]:
# OCR example:
# img_text = extract_text_from_image('scan.png')
# print(infer_deadline(img_text))

# Agent example:
# print(agent_process('Entregar Modelo 22'))
# print(agent_process('Enviar IES até dia 15 de abril'))

# Working days:
# base = datetime(2025,5,27)
# print(add_working_days(base,5))