## Feature 1 — Ticket Classification

**What it does:**  
Automatically classifies support tickets with a large language model (Google Gemini `gemini-2.5-flash`, called through LangChain).

**How it works:**
- Loads tickets from a JSON file.
- Prompts the Gemini model to assign:
  - **Topic** (e.g., Product, Feedback)
  - **Sentiment** (e.g., Angry, Curious)
  - **Priority** (P0, P1, P2)

**Result:**  
Each ticket receives these three labels, so it can be triaged and routed more easily.


In [None]:
!pip install -U langchain-google-genai


In [None]:
!pip install -q langchain-community langchain-core

[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.

### Defining our LLM

In [None]:
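import os
from langchain_google_genai import ChatGoogleGenerativeAI

# Read the API key from the environment instead of hard-coding it in the notebook
# (in Colab it can be stored as a secret and exported to this variable).
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=GOOGLE_API_KEY)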

### Defining our Classification Prompt

In [None]:
from langchain.prompts import PromptTemplate
classification_prompt = PromptTemplate(
    input_variables=["ticket_id","ticker_subject","ticket_text"],
    template=(
        "You are a ticket classification assistant.\n"
        "Given the user support ticket below, label it with:\n"
        "  - Topic: one of [How-to, Product, Connector, Feedback, ...]\n"
        "  - Sentiment: one of [Frustrated, Curious, Angry, Neutral]\n"
        "  - Priority: one of [P0/High, P1/Medium, P2/Low]\n"
        "Ticket Id: {ticket_id}\n"
        "Subject: {ticker_subject}\n"
        "Ticket:\n---\n{ticket_text}\n---\n"
        "Return format:\n"
        "Topic: <topic>\nSentiment: <sentiment>\nPriority: <priority>\n"
    )
)
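The prompt fixes the allowed Sentiment and Priority values; the Topic list ends in `...`, so it is left open and only checked for presence. A minimal sketch of a validator for labels that have already been parsed into a dict, assuming keys named `Topic`, `Sentiment` and `Priority` (the function name `validate_labels` is hypothetical, not part of this notebook or LangChain):

```python
# Allowed values copied from the classification prompt above.
ALLOWED_SENTIMENTS = {"Frustrated", "Curious", "Angry", "Neutral"}
ALLOWED_PRIORITIES = {"P0/High", "P1/Medium", "P2/Low"}


def validate_labels(labels: dict) -> list:
    """Return a list of problems in a parsed label dict (empty list if valid).

    Hypothetical helper: assumes the keys "Topic", "Sentiment" and "Priority".
    Topic is only checked for presence because the prompt's topic list is open-ended.
    """
    problems = []
    if not labels.get("Topic"):
        problems.append("missing Topic")
    if labels.get("Sentiment") not in ALLOWED_SENTIMENTS:
        problems.append(f"unexpected Sentiment: {labels.get('Sentiment')!r}")
    if labels.get("Priority") not in ALLOWED_PRIORITIES:
        problems.append(f"unexpected Priority: {labels.get('Priority')!r}")
    return problems


print(validate_labels({"Topic": "Product", "Sentiment": "Curious", "Priority": "P1/Medium"}))  # []
```

Tickets whose labels come back with problems could be re-queued or flagged for manual review rather than silently accepted.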

### Creating the LLM Chain

In [None]:
from langchain.chains import LLMChain
classification_chain = LLMChain(
    llm=llm,
    prompt=classification_prompt
)


### Loading the Data from the `tickets_data.json` file

In [None]:
import json
with open('/content/tickets_data.json', 'r') as file:
    data = json.load(file)

ids = [ticket['id'] for ticket in data]
subjects = [ticket['subject'] for ticket in data]
bodies = [ticket['body'] for ticket in data]

print("IDs:", len(ids))
print("Subjects:", len(subjects))
print("Bodies:", len(bodies))

IDs: 30
Subjects: 30
Bodies: 30


### Passing each ticket to the LLM chain and storing the results in a list

In [None]:
from concurrent.futures import ThreadPoolExecutor

def classify_ticket(ticket_input):
    # LLMChain.apply takes a list of input dicts; return the text of the single result
    return classification_chain.apply([ticket_input])[0]["text"]

# One prompt-input dict per ticket (keys must match the prompt's input_variables)
inputs = [
    {"ticket_id": tid, "ticker_subject": subj, "ticket_text": body}
    for tid, subj, body in zip(ids, subjects, bodies)
]

# executor.map yields results in input order (as_completed does not),
# so results[i] corresponds to ids[i], subjects[i] and bodies[i]
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(classify_ticket, inputs))


Rate-limit log (condensed): several requests exceeded the Gemini free-tier quota `GenerateRequestsPerMinutePerProjectPerModel-FreeTier` (quota_value 10 for model `gemini-2.5-flash`, location `global`), with suggested retry delays of 57 to 59 seconds. See https://ai.google.dev/gemini-api/docs/rate-limits.

Topic: Connector
Sentiment: Neutral
Priority: P0/High
Topic: Product
Sentiment: Curious
Priority: P1/Medium
Topic: How-to
Sentiment: Frustrated
Priority: P0/High
Topic: How-to
Sentiment: Frustrated
Priority: P0/High
Topic: How-to
Sentiment: Neutral
Priority: P1/Medium
Topic: How-to
Sentiment: Neutral
Priority: P1/Medium
Topic: How-to
Sentiment: Curious
Priority: P1/Medium
Topic: How-to
Sentiment: Neutral
Priority: P1/Medium
Topic: Product
Sentiment: Frustrated
Priority: P0/High
Topic: How-to
Sentiment: Frustrated
Priority: P0/High
Topic: How-to
Sentiment: Neutral
Priority: P2/Low
Topic: How-to
Sentiment: Neutral
Priority: P2/Low
Topic: How-to
Sentiment: Curious
Priority: P1/Medium
Topic: Product
Sentiment: Neutral
Priority: P0/High
Topic: How-to
Sentiment: Curious
Priority: P1/Medium
Topic: How-to
Sentiment: Neutral
Priority: P1/Medium
Topic: Product
Sentiment: Neutral
Priority: P0/High
Topic: How-to
Sentiment: Neutral
Priority: P0/High
Topic: Product
Sentiment: Neutral
Priority: P2/Lo
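The quota errors logged above report a free-tier limit of 10 requests per minute for `gemini-2.5-flash`, which two worker threads over 30 tickets quickly exceed. A minimal, framework-agnostic sketch of client-side pacing, assuming that spacing request starts evenly is acceptable; `MinIntervalLimiter` is a hypothetical helper, not part of LangChain or the Gemini SDK:

```python
import threading
import time


class MinIntervalLimiter:
    """Space out call starts by at least 60 / max_per_minute seconds.

    Hypothetical helper. Keeping starts this far apart holds the average rate
    at or below `max_per_minute`. `clock` and `sleep` are injectable so the
    logic can be tested without actually waiting.
    """

    def __init__(self, max_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self._next_start = None
        self._lock = threading.Lock()  # lets worker threads share one limiter

    def wait(self):
        """Block until the next call is allowed to start."""
        with self._lock:
            now = self.clock()
            if self._next_start is not None and now < self._next_start:
                self.sleep(self._next_start - now)
                now = self._next_start
            self._next_start = now + self.interval


class _FakeClock:
    """Simulated clock for the demo: sleeping just advances the time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Demo: five back-to-back calls at 10 requests per minute.
fake = _FakeClock()
demo = MinIntervalLimiter(max_per_minute=10, clock=fake.now, sleep=fake.sleep)
starts = []
for _ in range(5):
    demo.wait()
    starts.append(fake.t)
print(starts)  # [0.0, 6.0, 12.0, 18.0, 24.0]
```

In the notebook, one shared `MinIntervalLimiter(max_per_minute=10)` with `limiter.wait()` called at the top of `classify_ticket` would pace both worker threads. With a fixed interval, extra workers no longer add throughput, which is the intended trade-off under a per-minute quota.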

In [None]:
results[1]

'Topic: Product\nSentiment: Curious\nPriority: P1/Medium'
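Each entry of `results`, like the one above, is the model's raw text in the three-line `Key: value` format the prompt requests. A minimal sketch of turning one such string into a dict, assuming one `Key: value` pair per line; `parse_classification` is a hypothetical helper, and any other lines the model emits are ignored:

```python
def parse_classification(text: str) -> dict:
    """Parse the three-line 'Key: value' classification response into a dict.

    Lines without a colon, and keys other than the three requested labels,
    are skipped so stray model chatter does not break parsing.
    """
    wanted = {"Topic", "Sentiment", "Priority"}
    labels = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            labels[key] = value.strip()
    return labels


print(parse_classification('Topic: Product\nSentiment: Curious\nPriority: P1/Medium'))
# {'Topic': 'Product', 'Sentiment': 'Curious', 'Priority': 'P1/Medium'}
```

Applied over the list, `[parse_classification(r) for r in results]` gives one label dict per result.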

In [None]:
ids[1]

'TICKET-246'

In [None]:
subjects[1]

'Which connectors automatically capture lineage?'

In [None]:
bodies[1]

"Hello, I'm new to Atlan and trying to understand the lineage capabilities. The documentation mentions automatic lineage, but it's not clear which of our connectors (we use Fivetran, dbt, and Tableau) support this out-of-the-box. We need to present a clear picture of our data flow to leadership next week. Can you explain how lineage capture differs for these tools?"

## Feature 2 — Documentation Crawler & Retriever

**What it does:**  
Crawls Atlan documentation sites to collect pages, builds a searchable knowledge base, and enables question-answering over the docs.

**How it works:**
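- Crawls every same-domain page reachable from https://docs.atlan.com/ and https://developer.atlan.com/, skipping PDFs and links containing `#`.
- Extracts the text of paragraphs, list items and h2/h3 headings from each page and splits it into 500-word chunks.
- Embeds each chunk with the `all-MiniLM-L6-v2` sentence-transformer and stores the vectors in a FAISS index for fast similarity search.
- Provides a question-answering (QA) interface powered by Gemini: for each question, the most relevant chunks are retrieved and passed to the LLM as context, so answers are based on the crawled documentation.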

**Result:**  
You can ask natural-language questions about Atlan and get answers drawn from its documentation, together with links to the source pages.


In [None]:
import requests
from bs4 import BeautifulSoup
from collections import deque
from urllib.parse import urljoin, urlparse

START_URL = "https://docs.atlan.com/"
visited = set()
to_visit = deque([START_URL])  # FIFO queue: deque.popleft() is O(1), list.pop(0) is O(n)
all_urls_ = set()

while to_visit:
    url = to_visit.popleft()
    # Skip pages already seen, PDFs, and links containing an in-page anchor
    if url in visited or ".pdf" in url or "#" in url:
        continue
    visited.add(url)  # mark before fetching so a failing URL is not retried
    try:
        resp = requests.get(url, timeout=10)
        all_urls_.add(url)
        soup = BeautifulSoup(resp.text, "html.parser")
        # Queue every link that stays on the same host as START_URL
        for link in soup.find_all("a", href=True):
            full_url = urljoin(url, link["href"])
            if urlparse(full_url).netloc == urlparse(START_URL).netloc and full_url not in visited:
                to_visit.append(full_url)
    except Exception as e:
        print(f"Error visiting {url}: {e}")

print(f"Discovered {len(all_urls_)} documentation URLs.")

Discovered 1082 documentation URLs.


In [None]:
import requests
from bs4 import BeautifulSoup
from collections import deque
from urllib.parse import urljoin, urlparse

START_URL = "https://developer.atlan.com/"
visited = set()
to_visit = deque([START_URL])  # FIFO queue: deque.popleft() is O(1), list.pop(0) is O(n)
all_urls = set()

while to_visit:
    url = to_visit.popleft()
    # Skip pages already seen, PDFs, and links containing an in-page anchor
    if url in visited or ".pdf" in url or "#" in url:
        continue
    visited.add(url)  # mark before fetching so a failing URL is not retried
    try:
        resp = requests.get(url, timeout=10)
        all_urls.add(url)
        soup = BeautifulSoup(resp.text, "html.parser")
        # Queue every link that stays on the same host as START_URL
        for link in soup.find_all("a", href=True):
            full_url = urljoin(url, link["href"])
            if urlparse(full_url).netloc == urlparse(START_URL).netloc and full_url not in visited:
                to_visit.append(full_url)
    except Exception as e:
        print(f"Error visiting {url}: {e}")

print(f"Discovered {len(all_urls)} documentation URLs.")

Discovered 603 documentation URLs.


In [None]:
all_urls

{'https://developer.atlan.com',
 'https://developer.atlan.com/',
 'https://developer.atlan.com/concepts/',
 'https://developer.atlan.com/concepts/review/',
 'https://developer.atlan.com/conventions/',
 'https://developer.atlan.com/endpoints/',
 'https://developer.atlan.com/events/',
 'https://developer.atlan.com/events/scenarios/asset-classify/',
 'https://developer.atlan.com/events/scenarios/asset-create/',
 'https://developer.atlan.com/events/scenarios/asset-declassify/',
 'https://developer.atlan.com/events/scenarios/asset-delete/',
 'https://developer.atlan.com/events/scenarios/asset-update/',
 'https://developer.atlan.com/events/scenarios/custom-metadata-add/',
 'https://developer.atlan.com/events/scenarios/custom-metadata-delete/',
 'https://developer.atlan.com/events/scenarios/lineage-create/',
 'https://developer.atlan.com/events/types/business_attribute_update/',
 'https://developer.atlan.com/events/types/classification_add/',
 'https://developer.atlan.com/events/types/classif
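The discovered set contains both `https://developer.atlan.com` and `https://developer.atlan.com/`, which most likely serve the same page, so that page would be fetched and indexed twice. A minimal sketch of a URL normalizer that could be applied before the `visited` check, assuming fragments, trailing slashes and host-name case never change the page served (worth verifying for these sites); `normalize_url` is a hypothetical helper:

```python
from urllib.parse import urldefrag, urlparse, urlunparse


def normalize_url(url: str) -> str:
    """Canonicalize a URL for de-duplication: drop the #fragment,
    lower-case scheme and host, and strip trailing slashes from the path."""
    url, _fragment = urldefrag(url)
    parts = urlparse(url)
    path = parts.path.rstrip("/")
    return urlunparse((parts.scheme.lower(), parts.netloc.lower(), path,
                       parts.params, parts.query, ""))


print(normalize_url("https://developer.atlan.com/"))            # https://developer.atlan.com
print(normalize_url("https://developer.atlan.com/events/#top"))  # https://developer.atlan.com/events
```

Stripping the fragment instead of skipping `#` links also lets the crawler follow anchor links to pages it has not seen yet.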

In [None]:
import json

# Save all_urls to a JSON file
with open("developer_atlan_urls.json", "w") as f:
    json.dump(list(all_urls), f)

# Save all_urls_ to a JSON file
with open("docs_atlan_urls.json", "w") as f:
    json.dump(list(all_urls_), f)

print("URLs saved to developer_atlan_urls.json and docs_atlan_urls.json")

URLs saved to developer_atlan_urls.json and docs_atlan_urls.json


In [None]:
all_urls_

{'https://docs.atlan.com/product/administration/labs/how-tos/enable-sample-data-download',
 'https://docs.atlan.com/apps/connectors/business-intelligence/metabase/how-tos/set-up-metabase',
 'https://docs.atlan.com/faq/data-connections-and-integration',
 'https://docs.atlan.com/product/integrations/collaboration/microsoft-teams',
 'https://docs.atlan.com/tags/n-8-n',
 'https://docs.atlan.com/apps/connectors/database/sap-hana/how-tos/set-up-sap-hana',
 'https://docs.atlan.com/apps/connectors/etl-tools/aws-glue/references/what-does-atlan-crawl-from-aws-glue',
 'https://docs.atlan.com/product/integrations/identity-management/sso/faq/google-dashboard-login-error',
 'https://docs.atlan.com/product/capabilities/atlan-ai/how-tos/implement-the-atlan-mcp-server',
 'https://docs.atlan.com/product/capabilities/insights/faq/monitor-runaway-queries',
 'https://docs.atlan.com/product/capabilities/data-products/how-tos/create-data-products',
 'https://docs.atlan.com/product/capabilities/governance/tag

### Retrieving the text from each link and storing it in the `documents` list

In [None]:
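def fetch_page_text(url, timeout=10):
    """Download a page and return the text of its paragraphs, list items and h2/h3 headings."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, 'html.parser')
    # Drop navigation, footer, script and style elements before extracting text
    for tag in soup(['nav', 'footer', 'script', 'style']):
        tag.decompose()
    return '\n'.join(p.get_text(separator=' ', strip=True) for p in soup.find_all(['p', 'li', 'h2', 'h3']))

def chunk_text(text, max_chunk_size=500):
    """Split text into consecutive, non-overlapping chunks of at most max_chunk_size words."""
    words = text.split()
    return [' '.join(words[i:i + max_chunk_size]) for i in range(0, len(words), max_chunk_size)]

documents = []
for url in all_urls_:
    try:
        raw_text = fetch_page_text(url)
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        continue
    for chunk in chunk_text(raw_text):
        documents.append({"text": chunk, "source": url})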


In [None]:
# fetch_page_text and chunk_text are defined in the previous cell;
# append the developer.atlan.com pages to the same documents list
for url in all_urls:
    try:
        raw_text = fetch_page_text(url)
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        continue
    for chunk in chunk_text(raw_text):
        documents.append({"text": chunk, "source": url})


In [None]:
len(documents)

4511
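`chunk_text` cuts each page into non-overlapping 500-word windows, so a passage that straddles a boundary is split across two chunks and may not be retrieved whole. A common alternative, not what this notebook does, is a sliding window with overlap. A minimal sketch, where `chunk_text_overlapping` and its `overlap` parameter are hypothetical; with `overlap=0` it yields the same chunks as `chunk_text`:

```python
def chunk_text_overlapping(text, max_chunk_size=500, overlap=50):
    """Split text into windows of up to max_chunk_size words, each starting
    (max_chunk_size - overlap) words after the previous one."""
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")
    words = text.split()
    step = max_chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_chunk_size]))
        if start + max_chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text_overlapping("a b c d e f g", max_chunk_size=4, overlap=1))
# ['a b c d', 'd e f g']
```

Overlap trades a larger index (more, partly redundant chunks) for better odds that a relevant passage appears intact in at least one chunk.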

In [None]:
!pip install -q faiss-cpu chromadb langchain


In [None]:
!pip install -q langchain-google-genai

[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.6.18 which is incompatible.[0m[31m
[0m

In [None]:
texts = [doc["text"] for doc in documents]


In [None]:
metadatas = [{"source": doc["source"]} for doc in documents]


In [None]:
!pip install -q sentence-transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(texts)



In [None]:
import faiss
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
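`IndexFlatL2` performs exact (brute-force) nearest-neighbour search: every query is compared against every stored vector by squared Euclidean distance. A pure-Python sketch of that idea on toy 2-D vectors, to show what the index computes; `flat_l2_search` is hypothetical and far slower than FAISS:

```python
import heapq


def flat_l2_search(vectors, query, k=2):
    """Return (squared_distance, index) pairs for the k stored vectors
    closest to `query`, by exhaustively scoring every vector."""
    scored = (
        (sum((v - q) ** 2 for v, q in zip(vec, query)), i)
        for i, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
print(flat_l2_search(stored, [1.0, 0.5], k=2))  # [(0.25, 1), (1.25, 0)]
```

The cost grows linearly with the number of stored chunks, which is fine for a few thousand chunks like here; FAISS also offers approximate index types for much larger collections.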


In [None]:
faiss.write_index(index, "faiss.index")
import json
with open("metadata.json", "w") as f:
    json.dump(documents, f)
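`IndexFlatL2` ranks neighbours by Euclidean distance, while sentence embeddings are often compared by cosine similarity. If the embeddings have unit length (which can be enforced with `normalize_embeddings=True` in `SentenceTransformer.encode`), the two rankings coincide, because for $\lVert a\rVert = \lVert b\rVert = 1$ and angle $\theta$ between them:

$$\lVert a-b\rVert^2 = \lVert a\rVert^2 + \lVert b\rVert^2 - 2\,a\cdot b = 2 - 2\cos\theta$$

so a smaller distance always means a larger cosine similarity.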

In [None]:
!pip install -q -U langchain-community

[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.6.18 which is incompatible.[0m[31m
[0m

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# Re-embed the chunks through LangChain so the store can be used as a retriever
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(texts, embedding=embeddings, metadatas=metadatas)
vectorstore.save_local("faiss_store")


In [None]:
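# Reload the saved store. Pickle deserialization is acceptable here only because
# "faiss_store" was written by this notebook, not downloaded from elsewhere.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.load_local("faiss_store", embeddings, allow_dangerous_deserialization=True)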

In [None]:
retriever = vectorstore.as_retriever()
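`as_retriever()` returns the best-matching chunks (four by default), and several of them can come from the same page, so a printed list of their sources may repeat URLs. A small sketch that keeps each source once, in retrieval order; `unique_sources` is hypothetical and takes plain metadata dicts so it runs without LangChain:

```python
def unique_sources(metadatas):
    """Return source URLs in first-seen order, skipping duplicates and missing values."""
    seen = set()
    ordered = []
    for meta in metadatas:
        url = meta.get("source") or meta.get("url")
        if url and url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


print(unique_sources([{"source": "a"}, {"source": "b"}, {"source": "a"}, {}]))  # ['a', 'b']
```

With a QA result, it could be called as `unique_sources(doc.metadata for doc in result["source_documents"])`.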


In [None]:
prompt = PromptTemplate(
    template="Context:\n{context}\n\nQuestion: {question}\n\nAnswer:",
    input_variables=["context", "question"]
)


### Creating the `qa_chain`

In [None]:
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
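With `chain_type="stuff"`, all retrieved chunks are concatenated into the `{context}` slot of the prompt and sent to the LLM in a single call. Roughly, and assuming LangChain's defaults of rendering each document as its plain `page_content` and joining documents with a blank line (worth checking for your LangChain version), the final prompt looks like what this hypothetical `stuff_prompt` helper builds:

```python
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"


def stuff_prompt(chunks, question, template=TEMPLATE, separator="\n\n"):
    """Approximate the 'stuff' strategy: join all retrieved chunk texts
    into one context block and substitute it into the prompt template."""
    return template.format(context=separator.join(chunks), question=question)


print(stuff_prompt(["Chunk one.", "Chunk two."], "What is lineage?"))
```

Because everything goes into one prompt, its length grows with the number of retrieved chunks times the chunk size (here up to four 500-word chunks), which is simple but bounded by the model's context window.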


### Testing with a sample query

In [None]:
query = "Hi team, we're trying to set up our primary Snowflake production database as a new source in Atlan, but the connection keeps failing. We've tried using our standard service account, but it's not working. Our entire BI team is blocked on this integration for a major upcoming project, so it's quite urgent. Could you please provide a definitive list of the exact permissions and credentials needed on the Snowflake side to get this working? Thanks."
result = qa_chain.invoke({"query": query})  # invoke() replaces the deprecated qa_chain(...) call
print("Answer:", result["result"])

if "source_documents" in result:
    print("\nSources:")
    for doc in result["source_documents"]:
        url = doc.metadata.get("source", None) or doc.metadata.get("url", None)
        print(f"- {url}")


Answer: Hi team,

It sounds like you're trying to set up the foundational connection for Atlan, likely including its Data Quality features, which require specific Snowflake configurations beyond a standard service account. The provided context details the exact setup needed for Atlan's Data Quality Studio, which typically forms the basis of a robust Snowflake integration.

Here is a definitive list of the exact permissions and credentials needed on the Snowflake side for Atlan to connect and enable its data quality features:

---

### **Snowflake Setup for Atlan Data Quality (Recommended for Primary Connection)**

This setup ensures Atlan has the necessary dedicated resources and permissions to manage data quality operations.

#### **I. Prerequisites (Performed by ACCOUNTADMIN or equivalent):**

1.  **Snowflake Edition:** Ensure you have **Snowflake Enterprise or Business Critical edition**.
2.  **Dedicated Warehouse:** Identify a **dedicated Snowflake warehouse** to be used specifical