## Docling (PDF to Markdown)

In [4]:
import docling

In [5]:
dir(docling)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']
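
The listing above contains only dunder attributes, which suggests that the top-level `docling` package does not import its submodules eagerly; a name such as `document_converter` only appears after an explicit import. A minimal sketch for discovering what a package ships, using the standard-library `pkgutil` module. The helper name `list_submodules` is hypothetical, and the demonstration uses the standard-library `json` package so that it runs without Docling installed.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the direct submodules of an importable package."""
    package = importlib.import_module(package_name)
    # iter_modules walks the package's search path without importing the submodules
    return sorted(m.name for m in pkgutil.iter_modules(package.__path__))


# Demonstrated on a standard-library package
print(list_submodules("json"))
```

With Docling installed, `list_submodules("docling")` should list its subpackages in the same way.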

In [7]:
from docling.document_converter import DocumentConverter

In [23]:
pdfs = ["./BIT301_Tutorial_1.pdf", "./agile-project-management-whats-the-story.pdf"]

In [14]:
# Create the converter; the document path is passed later to convert()
converter = DocumentConverter()

In [33]:
# Convert the document to a structured format
structured_data = converter.convert(pdfs[1])



In [34]:
output = structured_data.document.export_to_markdown()

In [35]:
type(output)

str

In [36]:
output[:1000]  # Display the first 1000 characters of the output

"<!-- image -->\n\nWhitepaper\n\n## Agile Project Management: What's the story?\n\nThere is no doubt that Agile project management, the adaptive methodology that spread like wildfire around the turn of the millennium, is here to stay.\n\nAs organizations make the shift to Agile methods, there remains confusion, skepticism, and rampant misconceptions about Agile. And like any methodology, Agile requires organizations to look at the triple foundation of people, process, and technology. To underplay any of those is to sabotage success.\n\nThis white paper will explore Agile from all three perspectives, and will offer guidelines for effective Agile project management. It will expose common myths, offer tips on the cultural shift to Agile, and address concerns about mixed environments where some departments use Agile and others do not. Finally, it will show how Agile is implemented in Planview Enterprise ®  - with and without adjacent tools such as Atlassian's CA Agile Central, Version One,

In [37]:
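from pathlib import Path

# Keep only the file name and swap the extension: "./x.pdf" -> "x.md"
output_file = Path(pdfs[1]).with_suffix(".md").name
output_dir = Path("./markdown_docs")
output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
output_path = output_dir / output_file
# Save the Markdown text to a file
with open(output_path, "w", encoding="utf-8") as file:
    file.write(output)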
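
The cells above convert and save a single entry of `pdfs`. The same steps can be wrapped in a loop over the whole list. This is a sketch under assumptions: `convert_and_save` is a hypothetical helper, and it relies only on the two calls already used in this notebook (`converter.convert(...)` and `.document.export_to_markdown()`), so any object exposing them can be passed in.

```python
from pathlib import Path


def convert_and_save(converter, pdf_paths, out_dir="./markdown_docs"):
    """Convert each PDF to Markdown, write it under out_dir, and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    for pdf in pdf_paths:
        # Same two calls as in the cells above
        markdown = converter.convert(pdf).document.export_to_markdown()
        # "./report.pdf" -> "<out_dir>/report.md"
        target = out_dir / Path(pdf).with_suffix(".md").name
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written
```

In this notebook the call would be `convert_and_save(converter, pdfs)`.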

## Using LLM

In [38]:
import ollama

In [40]:
# Prompt asking the model to summarize the extracted Markdown
prompt = f"""
You are an assistant who summarizes and analyzes documents.
Here is a document content extracted from a PDF:
---
{output} 
---
Please provide:
1. A brief summary (max 5 bullet points)
2. Key topics
3. Any important observations
"""
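
Because the whole document is interpolated into the prompt, a long PDF may exceed the model's context window. A minimal sketch of one way to cap the input, assuming a character budget is an acceptable stand-in for a token budget. The helper name `build_summary_prompt`, the `max_chars` parameter, its default of 12,000, and the truncation marker are all illustrative assumptions, not part of the original notebook.

```python
def build_summary_prompt(document_text, max_chars=12_000):
    """Build the summarization prompt, keeping at most max_chars characters of the document."""
    truncated = document_text[:max_chars]
    # Tell the model when it is only seeing part of the document
    note = "\n[Document truncated]" if len(document_text) > max_chars else ""
    return (
        "You are an assistant who summarizes and analyzes documents.\n"
        "Here is a document content extracted from a PDF:\n"
        "---\n"
        f"{truncated}{note}\n"
        "---\n"
        "Please provide:\n"
        "1. A brief summary (max 5 bullet points)\n"
        "2. Key topics\n"
        "3. Any important observations\n"
    )
```

Usage would be `prompt = build_summary_prompt(output)`.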

In [None]:
# Optional alternative to the qwen3 default; pass it as call_llama(prompt, model=model)
model = "llama3.1:latest"

In [59]:
def call_llama(prompt, model="qwen3:latest"):
    """Send a single user message to a local Ollama model and return the reply text."""
    response = ollama.chat(
        model=model,  # any model already pulled locally (see `ollama list`)
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response['message']['content']

In [48]:
call_llama(prompt)

"The provided text is a comprehensive guide to Agile project management, with a focus on Planview Enterprise, a portfolio and resource management solution that supports Agile methodologies. Here are the main points:\n\n**Agile Project Management**\n\n* Agile teams prioritize customer satisfaction through early delivery of working software.\n* Sprints or iterations (usually 2-4 weeks) help manage work in progress.\n* Story points measure effort required to complete tasks.\n* Velocity measures a team's average story point completion rate.\n* Burndown charts track the remaining story points for each sprint.\n\n**Planview Enterprise and Agile**\n\n* Planview Enterprise supports Agile column sets, including story creation and tracking.\n* It allows team-based capacity planning and burndown/velocity charting.\n* Integration with leading Agile tools (e.g., JIRA, CA Agile Central) enables multiple options for organizations.\n* Resource authorizations and reserves help manage resource demand.\n

In [67]:
# Define the prompt for NER
prompt_ner = f"""
You are a natural language assistant.
Identify entities such as person names, organization names, locations, and dates in the provided Markdown content.

Markdown content:
---
{output}
---

Return only the identified names, locations and dates.
"""
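
Reasoning models such as the `qwen3:latest` default of `call_llama` may prefix the answer with a `<think>...</think>` block, as the output of the next cell shows. When only the final answer is wanted, the block can be removed after the call. A minimal sketch with a regular expression; `strip_think` is a hypothetical helper, and it assumes the reasoning is wrapped in a properly closed `<think>` tag pair.

```python
import re

# Non-greedy match so that several separate <think> blocks are each removed
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text):
    """Remove <think>...</think> blocks from a model reply and trim the result."""
    return _THINK_BLOCK.sub("", text).strip()


reply = "<think>\nLet me check the text.\n</think>\n\nNames: Jerry Manas"
print(strip_think(reply))  # prints: Names: Jerry Manas
```

In this notebook the call would be `print(strip_think(call_llama(prompt_ner)))`.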

In [68]:
print(call_llama(prompt_ner))

<think>
Okay, let's see. The user wants me to return only the identified names, locations, and dates from the provided text. First, I need to go through the text carefully.

Starting with the author's name: Jerry Manas is mentioned. Then there's a mention of Tom Peters, who is a management guru, and Pat Williams, Senior VP of the Orlando Magic. Locations include places like Iceland, where Jerry appeared on a National TV program discussing economic recovery. Dates mentioned are 2015, when Gartner predicted three quarters of knowledge-based project work would be by virtual teams. Also, the book "Napoleon on Project Management" and "Managing the Gray Areas" are referenced, but those are titles, not dates. The Planview Enterprise demo is mentioned, but no specific date there. 

Wait, the user specified to return only names, locations, and dates. So I need to make sure I don't include any other information. Let me check again. 

Names: Jerry Manas, Tom Peters, Pat Williams, Angela Ahrendts,