# 🤖 Unified Semantic Search: PDF Profile + CSV Funding + LLM Draft

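This notebook:
- Extracts text from the user's PDF profile, asks GPT to condense it into a one-sentence search query, and embeds that query
- Queries Pinecone for funding entries. The plan covers two namespaces, `pdf-upload` and the CSV funding namespace; the cells below query only the CSV namespace, `openai-v3`
- Combines the results
- Uses GPT to generate a summary and funding recommendations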
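
Combining results from two namespaces can be as simple as pooling the matches, dropping duplicates, and re-sorting by score. A minimal sketch, assuming each match is a dict with `id`, `score`, and `metadata` keys; the helper name `merge_matches` and the sample matches are hypothetical:

```python
from typing import Any, Dict, List

Match = Dict[str, Any]


def merge_matches(*result_sets: List[Match], top_k: int = 5) -> List[Match]:
    """Pool matches from several queries, keep the best score per id, sort descending."""
    best: Dict[str, Match] = {}
    for matches in result_sets:
        for match in matches:
            current = best.get(match["id"])
            if current is None or match["score"] > current["score"]:
                best[match["id"]] = match
    return sorted(best.values(), key=lambda m: m["score"], reverse=True)[:top_k]


# Hypothetical matches standing in for the results of two namespace queries
pdf_matches = [
    {"id": "a", "score": 0.71, "metadata": {}},
    {"id": "b", "score": 0.55, "metadata": {}},
]
csv_matches = [
    {"id": "b", "score": 0.64, "metadata": {}},
    {"id": "c", "score": 0.60, "metadata": {}},
]

merged = merge_matches(pdf_matches, csv_matches, top_k=3)
print([(m["id"], m["score"]) for m in merged])
```

Scores from different namespaces are only comparable when both were built with the same embedding model and similarity metric.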


In [None]:
# import sys
# !{sys.executable} -m pip install -q openai pinecone pymupdf tiktoken python-dotenv tqdm pandas

In [8]:
import os
import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI
from pinecone import Pinecone
from tqdm import tqdm
import fitz  # PyMuPDF
import tiktoken
from typing import List

In [9]:
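# Load environment variables from a local .env file
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_ENV = os.getenv("PINECONE_ENV")  # checked here, but not passed to the Pinecone client below
assert OPENAI_API_KEY and PINECONE_API_KEY and PINECONE_ENV, \
    "Set OPENAI_API_KEY, PINECONE_API_KEY, and PINECONE_ENV in the environment or .env file"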
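
A single combined check does not say which variable is missing. A small sketch of a check that names the missing variables; `require_env` and `DEMO_KEY` are hypothetical names used only for illustration:

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    values = {name: os.getenv(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Hypothetical variable, set here only so the example runs without real keys
os.environ["DEMO_KEY"] = "abc"
print(require_env(["DEMO_KEY"]))
```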

In [10]:
# Init Pinecone + OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index("funding-search")

In [11]:
# Extract PDF profile text
pdf_path = "/Users/kiranmulawad/AI-Funding/openai_model/sample_user_profile.pdf"

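def extract_text_from_pdf(pdf_path: str) -> str:
    """Return the text of every page, joined by newlines."""
    # Open the document as a context manager so the file is closed after reading
    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)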

pdf_text = extract_text_from_pdf(pdf_path)
print("✅ PDF Text Sample:\n", pdf_text[:500])

✅ PDF Text Sample:
 Company Name: RoboAI Solutions
Industry: Artificial Intelligence, Robotics
Location: Rhineland-Palatinate, Germany
Company Description:
RoboAI Solutions is a startup focused on the intersection of artificial intelligence and robotics. We
are currently in the early research and prototyping phase for developing intelligent control systems
for industrial robots.
Goals:
- Advance AI-based robotic systems for automation
- Collaborate with academic institutions
- Apply for regional and national fundin
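
The printed sample stops mid-word because it is sliced at a fixed character count, and the profile text sent to GPT in the next cell is sliced the same way. A sketch of cutting back to the last complete line instead; `truncate_at_line` is a hypothetical helper, not part of the notebook:

```python
def truncate_at_line(text: str, max_chars: int) -> str:
    """Return at most max_chars characters, cut back to the last complete line."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    cut = head.rfind("\n")
    # Fall back to the hard cut when the first line alone exceeds the limit
    return head if cut == -1 else head[:cut]


sample = "Goals:\n- Advance AI-based robotic systems\n- Apply for regional and national funding"
print(truncate_at_line(sample, 60))
```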


In [12]:
# Generate a query from the PDF profile
prompt = f"""You are an expert grant advisor. Based on the following user profile text, generate a single-sentence query to find relevant public funding in Germany:

Profile:
\"\"\"
{pdf_text[:2000]}
\"\"\"

Only output the query — do not include explanations or headers."""

query_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You generate funding search queries from user descriptions."},
        {"role": "user", "content": prompt}
    ]
)

query = query_response.choices[0].message.content.strip()
print("📝 Auto-generated query:", query)


📝 Auto-generated query: "Public funding for Artificial Intelligence and Robotics research in Rhineland-Palatinate, Germany"
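
The generated query comes back wrapped in double quotes, and those quotes would be embedded along with the text. A small sketch of stripping them before embedding; `clean_query` is a hypothetical helper:

```python
def clean_query(raw: str) -> str:
    """Trim whitespace and remove one pair of matching wrapping quotes."""
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text


raw = '"Public funding for Artificial Intelligence and Robotics research in Rhineland-Palatinate, Germany"'
print(clean_query(raw))
```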


In [13]:
# Embed query
query_embedding = client.embeddings.create(
    input=[query],
    model="text-embedding-3-small"
).data[0].embedding
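
The embedding is a list of floats, and the vector index ranks stored entries by how close their vectors are to it. A minimal sketch of that comparison, assuming the index uses the cosine metric (the notebook does not state which metric `funding-search` was created with); the short vectors are made up for illustration:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # same direction, different length
orthogonal = [0.0, 1.0, 0.0]       # shares no component with query_vec

print(cosine_similarity(query_vec, same_direction))
print(cosine_similarity(query_vec, orthogonal))
```

Cosine similarity ignores vector length, so only the direction of the embedding affects the ranking under this metric.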

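The `openai-v3` namespace of the `funding-search` index holds the CSV-based funding entries. The next cell retrieves the five entries closest to the query embedding.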

In [14]:
# 🔍 Query CSV-based funding vectors
csv_results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    namespace="openai-v3"
)

In [15]:
# ✅ Show top matches
top_matches = []
for match in csv_results['matches']:
    score = match['score']
    meta = match['metadata']
    text = meta.get('description', '') + f"\nURL: {meta.get('url', '')}"
    top_matches.append((score, text))

for i, (score, text) in enumerate(top_matches, 1):
    print(f"🔹 Match {i} | Score: {score:.4f}\n{text[:400]}\n---")


🔹 Match 1 | Score: 0.6428
If you would like to submit an application (InnoTop), please click here:Program 358.
URL: https://isb.rlp.de/foerderung/269.html
---
🔹 Match 2 | Score: 0.6271
As part of this funding program, Rhineland-Palatinate companies are supported in the introduction of operational innovations. The grants are intended to contribute to creating or maintaining their performance and competitiveness. The grants are intended to provide investment incentives for the implementation of product innovations, innovative business models or innovations in the production proces
---
🔹 Match 3 | Score: 0.6182
Administrative regulation of the Ministry of Economic Affairs, Transport, Agriculture and Viticulture of 20 October 2021The funding program Promotion of innovation assistants in small and medium-sized enterprises in Rhineland-Palatinate is intended to help improve their innovation and competitiveness through technology and knowledge transfer. Funding is provided for the recruitment
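
Match 1 above is a one-line pointer to an application form rather than a program description, so it gives the LLM little to reason about. A sketch of filtering on score and text length before building the prompt; the thresholds and the helper `filter_matches` are illustrative assumptions, not tuned values, and the second and third sample texts are placeholders:

```python
from typing import List, Tuple

ScoredText = Tuple[float, str]


def filter_matches(
    matches: List[ScoredText],
    min_score: float = 0.6,
    min_chars: int = 150,
) -> List[ScoredText]:
    """Keep matches that score at least min_score and carry at least min_chars of text."""
    return [
        (score, text)
        for score, text in matches
        if score >= min_score and len(text) >= min_chars
    ]


sample_matches = [
    (0.6428, "If you would like to submit an application (InnoTop), please click here:Program 358.\n"
             "URL: https://isb.rlp.de/foerderung/269.html"),
    (0.6271, "Placeholder program description. " * 10),
    (0.4100, "Placeholder program description. " * 10),
]

kept = filter_matches(sample_matches)
print([score for score, _ in kept])
```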

In [17]:
# 🧠 Send to LLM for final recommendation
llm_prompt = f"""
The user's funding need, summarized from their profile, is:
{query}

Top matched funding programs:
"""

for i, (_, text) in enumerate(top_matches, 1):
    llm_prompt += f"{i}. {text}\n\n"

llm_prompt += """
Based on this, recommend the 2–3 most relevant funding opportunities with clear reasoning and how the user should proceed.
"""

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a funding advisor for startups in Germany."},
        {"role": "user", "content": llm_prompt}
    ]
)

print("\n🧾 GPT Recommendation:\n")
print(response.choices[0].message.content)


🧾 GPT Recommendation:

For a startup focused on Artificial Intelligence and Robotics research in Rhineland-Palatinate, Germany, the following funding programs appear to be the most relevant and beneficial based on the goals of advancing technological innovation and competitiveness:

1. **Administrative Regulation (Program 358) - Non-repayable Grants for R&D**:
   - URL: [Program 358](https://isb.rlp.de/foerderung/358.html)
   - **Relevance**: This program directly supports research and development in areas that include new technologies, processes, or services, making it highly pertinent to a startup in AI and robotics. The focus on industrial research and experimental development complements a high-tech startup’s objectives to innovate and develop cutting-edge technologies.
   - **Action Steps**: You should prepare a comprehensive application detailing your project, how it advances the state of the art in AI and robotics, and its feasibility studies. Emphasize any potential contributi
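
The recommendation cites `https://isb.rlp.de/foerderung/358.html`, while the Match 1 metadata shown earlier carries `https://isb.rlp.de/foerderung/269.html`. Only three of the five matches are visible in the output, so the cited URL may still be among the retrieved ones. A sketch of checking that every URL in the LLM answer also appears in the retrieved matches; `unverified_urls` is a hypothetical helper, and the regular expression is a simple heuristic rather than a full URL parser:

```python
import re
from typing import Iterable, List

# Heuristic: a URL runs until whitespace or a closing bracket
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def unverified_urls(answer: str, retrieved_urls: Iterable[str]) -> List[str]:
    """Return URLs found in the answer that are absent from the retrieved set."""
    known = {url.rstrip("/.,") for url in retrieved_urls}
    unverified: List[str] = []
    for url in URL_PATTERN.findall(answer):
        url = url.rstrip("/.,")  # drop trailing punctuation picked up from prose
        if url not in known and url not in unverified:
            unverified.append(url)
    return unverified


answer = "- URL: [Program 358](https://isb.rlp.de/foerderung/358.html)"
retrieved = ["https://isb.rlp.de/foerderung/269.html"]
print(unverified_urls(answer, retrieved))
```

Any URL this returns is worth checking by hand before it is passed on to the user.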