### **DAY 14 (22/01/26) – AI-Powered Analytics: Genie & Mosaic AI**

### Learn:

- Databricks Genie (natural language → SQL)
- Mosaic AI capabilities
- Generative AI integration
- AI-assisted analysis

### 🛠️ Tasks:

1. Use Genie to query data with natural language
2. Explore Mosaic AI features
3. Build a simple NLP task
4. Create AI-powered insights

In [0]:
# I am using a pretrained sentiment analysis model to demonstrate
# how unstructured text (product reviews) can be analyzed inside Databricks
# as part of AI-powered analytics (no model training here, inference only).

from transformers import pipeline
import mlflow

# I intentionally use the default pretrained DistilBERT sentiment model
# to keep this focused on inference + AI-assisted insights
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="714eb0f"
)


# Sample product reviews (can later be replaced with real customer reviews)
reviews = [
    "This product is amazing!",
    "Terrible quality, waste of money",
    "Very good value for the price",
    "Not worth buying at all"
]

# Running sentiment inference
results = classifier(reviews)

# I log this run in MLflow to track AI inference experiments
with mlflow.start_run(run_name="sentiment_analysis_inference"):

    # I log the model used for transparency and reproducibility
    mlflow.log_param(
        "model_name",
        "distilbert-base-uncased-finetuned-sst-2-english"
    )

    # I calculate an interpretable metric instead of fake accuracy
    avg_confidence = sum(r["score"] for r in results) / len(results)
    mlflow.log_metric("avg_confidence_score", avg_confidence)

    # I also log sentiment distribution for business interpretation
    positive_count = sum(1 for r in results if r["label"] == "POSITIVE")
    negative_count = sum(1 for r in results if r["label"] == "NEGATIVE")

    mlflow.log_metric("positive_reviews", positive_count)
    mlflow.log_metric("negative_reviews", negative_count)

# Final output to quickly inspect predictions
results


Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998860359191895},
 {'label': 'NEGATIVE', 'score': 0.9998160004615784},
 {'label': 'POSITIVE', 'score': 0.9998692274093628},
 {'label': 'NEGATIVE', 'score': 0.9997952580451965}]
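
The raw predictions are easier to read when paired with the review text. Below is a minimal sketch of that step, using the reviews and scores copied from the cell and output above. The helper name `summarize` and the row layout are my own assumptions for illustration; they are not part of the `transformers` or MLflow APIs.

```python
from collections import Counter

# Reviews and predictions copied from the notebook cell and its output
reviews = [
    "This product is amazing!",
    "Terrible quality, waste of money",
    "Very good value for the price",
    "Not worth buying at all",
]
results = [
    {"label": "POSITIVE", "score": 0.9998860359191895},
    {"label": "NEGATIVE", "score": 0.9998160004615784},
    {"label": "POSITIVE", "score": 0.9998692274093628},
    {"label": "NEGATIVE", "score": 0.9997952580451965},
]


def summarize(reviews, results):
    """Pair each review with its prediction and count the labels (hypothetical helper)."""
    rows = [
        {"review": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(reviews, results)
    ]
    counts = Counter(row["label"] for row in rows)
    return rows, counts


rows, counts = summarize(reviews, results)

# Print one line per review: label, confidence, text
for row in rows:
    print(f'{row["label"]:<8} {row["score"]:.4f}  {row["review"]}')

print(dict(counts))
```

The same rows could be turned into a Spark or pandas DataFrame if the results need to be stored or queried later.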

### Possible Insights:

- Positive-sentiment reviews may indicate high purchase intent.
- Negative-sentiment reviews can flag quality issues early, before they show up as a revenue drop.
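
One way to act on the negative-sentiment insight is to flag confidently negative reviews for follow-up and to track the share of negative reviews over time. The sketch below assumes the same prediction format as the output above. The function names and the `min_confidence=0.95` threshold are hypothetical choices for illustration, not values taken from this notebook.

```python
# Reviews and predictions copied from the notebook cell and its output
reviews = [
    "This product is amazing!",
    "Terrible quality, waste of money",
    "Very good value for the price",
    "Not worth buying at all",
]
results = [
    {"label": "POSITIVE", "score": 0.9998860359191895},
    {"label": "NEGATIVE", "score": 0.9998160004615784},
    {"label": "POSITIVE", "score": 0.9998692274093628},
    {"label": "NEGATIVE", "score": 0.9997952580451965},
]


def flag_quality_issues(reviews, results, min_confidence=0.95):
    """Return reviews predicted NEGATIVE with at least min_confidence (assumed threshold)."""
    return [
        text
        for text, pred in zip(reviews, results)
        if pred["label"] == "NEGATIVE" and pred["score"] >= min_confidence
    ]


def negative_share(results):
    """Fraction of predictions labeled NEGATIVE; 0.0 for an empty batch."""
    if not results:
        return 0.0
    negatives = sum(1 for pred in results if pred["label"] == "NEGATIVE")
    return negatives / len(results)


flagged = flag_quality_issues(reviews, results)
share = negative_share(results)

print("Flagged for review:", flagged)
print("Negative share:", share)
```

With only four sample reviews the numbers are illustrative; on real customer data the threshold would need tuning, and the share is more useful when compared across time windows or products.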