In [0]:
df = spark.table("gold.customer_summary")
df.show(5)

+------------+---------------+------------------+
|     country|total_customers|        avg_income|
+------------+---------------+------------------+
|     Germany|           1008|102271.41964285714|
|       Japan|            495|106217.04242424242|
|      Canada|            828|103189.84299516908|
|South Africa|            394| 54797.36040609137|
|      France|            640|     55375.9109375|
+------------+---------------+------------------+
only showing top 5 rows


### Simulating Databricks Genie (Natural Language → SQL)

Databricks Genie lets users ask questions about their data in plain English
and answers them with generated SQL. Below, business questions are translated
into SQL by hand to illustrate the kind of query Genie would produce.

🔹 Business Question:
"Show average income by country"

In [0]:
%sql
SELECT
  country,
  AVG(avg_income) AS avg_income
FROM gold.customer_summary
GROUP BY country
ORDER BY avg_income DESC;

country,avg_income
Japan,106217.04242424242
Australia,106057.34649122808
USA,103614.36040184175
Canada,103189.84299516908
Germany,102271.41964285714
UK,102161.52912142152
Brazil,55385.631219512194
France,55375.9109375
South Africa,54797.36040609137
India,53936.98687664042


🔹 Business Question:
"Which country has the highest number of customers?"

In [0]:
%sql
SELECT
  country,
  total_customers
FROM gold.customer_summary
ORDER BY total_customers DESC
LIMIT 1;

country,total_customers
USA,2389
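
The two hand-written translations above can be collected into a small lookup, as a minimal sketch of the "question in, SQL out" idea. This is a toy stand-in and not how Genie works internally; the names `QUESTION_TO_SQL` and `question_to_sql` are hypothetical and do not come from any Databricks API.

```python
# Toy natural-language-to-SQL "translator": a fixed lookup table.
# It only covers the two business questions asked in this notebook.
QUESTION_TO_SQL = {
    "show average income by country": (
        "SELECT country, AVG(avg_income) AS avg_income "
        "FROM gold.customer_summary "
        "GROUP BY country "
        "ORDER BY avg_income DESC"
    ),
    "which country has the highest number of customers?": (
        "SELECT country, total_customers "
        "FROM gold.customer_summary "
        "ORDER BY total_customers DESC "
        "LIMIT 1"
    ),
}


def question_to_sql(question: str) -> str:
    """Return the SQL for a known question; raise KeyError for anything else."""
    # Normalize case and whitespace so small typing differences still match.
    key = " ".join(question.lower().split())
    return QUESTION_TO_SQL[key]


sql = question_to_sql("Which country has the highest number of customers?")
print(sql)
# In a Databricks notebook the query could then be run with:
#   spark.sql(sql).show()
```

A real assistant has to handle questions it has never seen, which a lookup cannot do; the sketch only makes the input and output of the translation step explicit.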


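🧠 AI-Assisted Insight:
In the results shown, the higher-income countries (USA, Germany, Canada) tend to have more customers,
which suggests income level as a possible driver of customer distribution.
Japan is an exception (highest average income, only 495 customers), so this is a hypothesis to test, not a conclusion.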
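
One way to check the insight above is to compute a correlation between average income and customer count. The sketch below uses only the six countries for which both values appear in the outputs above, so it is a rough check on partial data; the `pearson` helper is written here for illustration.

```python
from math import sqrt

# (country, total_customers, avg_income), copied from the cell outputs above.
# Only six of the ten countries have both values visible in this notebook.
rows = [
    ("Germany", 1008, 102271.41964285714),
    ("Japan", 495, 106217.04242424242),
    ("Canada", 828, 103189.84299516908),
    ("South Africa", 394, 54797.36040609137),
    ("France", 640, 55375.9109375),
    ("USA", 2389, 103614.36040184175),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


incomes = [income for _, _, income in rows]
customers = [count for _, count, _ in rows]
r = pearson(incomes, customers)
print(f"Pearson r (income vs. customers, 6 countries): {r:.2f}")

# On the full table in a Databricks notebook, the same statistic is available as:
#   df.stat.corr("avg_income", "total_customers")
```

With so few data points, and with the USA as a large outlier, the coefficient should be read as a hint and not as evidence of a causal link.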

### MOSAIC AI – NLP TASK

In [0]:
%pip install torch transformers

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
%restart_python

In [0]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

texts = [
    "The service quality was excellent",
    "Very poor experience, not satisfied",
    "Average support but good pricing"
]

results = classifier(texts)
results

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998401403427124},
 {'label': 'NEGATIVE', 'score': 0.9998027682304382},
 {'label': 'POSITIVE', 'score': 0.9994316697120667}]
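
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the same model and revision the warning reports; `build_classifier` and `pair_predictions` are hypothetical helper names introduced here. The pipeline is created inside a function, so the rest of the block runs without downloading a model.

```python
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "714eb0f"  # revision reported in the warning above


def build_classifier():
    """Create the sentiment pipeline with model and revision pinned."""
    # Imported lazily: requires the `transformers` and `torch` packages.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)


def pair_predictions(texts, results):
    """Attach each input text to its predicted label and rounded score."""
    return [
        {"text": text, "label": res["label"], "score": round(res["score"], 4)}
        for text, res in zip(texts, results)
    ]


texts = [
    "The service quality was excellent",
    "Very poor experience, not satisfied",
    "Average support but good pricing",
]

# Predictions copied from the cell output above. In the notebook they would
# come from: results = build_classifier()(texts)
results = [
    {"label": "POSITIVE", "score": 0.9998401403427124},
    {"label": "NEGATIVE", "score": 0.9998027682304382},
    {"label": "POSITIVE", "score": 0.9994316697120667},
]

for row in pair_predictions(texts, results):
    print(row)
```

Pairing inputs with outputs makes one limitation visible: this SST-2 model has only POSITIVE and NEGATIVE labels, so the mixed third review is forced into POSITIVE with a high score.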

### LOG AI WORK USING MLFLOW

In [0]:
import mlflow

with mlflow.start_run(run_name="ai_sentiment_analysis"):
    mlflow.log_param("task", "sentiment_analysis")
    mlflow.log_param("model_type", "transformers_pipeline")
    mlflow.log_metric("text_samples", len(texts))
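
The run above records only the task, the model type, and the sample count. A possible extension, sketched under the assumption that label counts and the raw predictions are also worth tracking; `sentiment_metrics` and `log_sentiment_run` are hypothetical helpers, and the artifact name `predictions.json` is an arbitrary choice.

```python
from collections import Counter


def sentiment_metrics(results):
    """Summarize pipeline output as flat numeric metrics suitable for MLflow."""
    if not results:
        raise ValueError("results must contain at least one prediction")
    counts = Counter(res["label"] for res in results)
    return {
        "text_samples": float(len(results)),
        "positive_count": float(counts.get("POSITIVE", 0)),
        "negative_count": float(counts.get("NEGATIVE", 0)),
        "mean_score": sum(res["score"] for res in results) / len(results),
    }


def log_sentiment_run(results, run_name="ai_sentiment_analysis"):
    """Log parameters, summary metrics, and raw predictions to one MLflow run."""
    import mlflow  # imported lazily: requires the `mlflow` package

    with mlflow.start_run(run_name=run_name):
        mlflow.log_param("task", "sentiment_analysis")
        mlflow.log_param("model_type", "transformers_pipeline")
        mlflow.log_metrics(sentiment_metrics(results))
        # Store the predictions as a JSON artifact next to the metrics.
        mlflow.log_dict({"results": results}, "predictions.json")


# Predictions copied from the sentiment cell output earlier in this notebook.
sample_results = [
    {"label": "POSITIVE", "score": 0.9998401403427124},
    {"label": "NEGATIVE", "score": 0.9998027682304382},
    {"label": "POSITIVE", "score": 0.9994316697120667},
]
print(sentiment_metrics(sample_results))
# In the notebook: log_sentiment_run(results)
```

Keeping the metric computation separate from the logging call means the summary can be checked without an MLflow tracking server.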

Note:
Databricks Genie and the Mosaic AI UI are not available in Databricks Community Edition.
This notebook therefore demonstrates the same AI-powered analytics concepts with
hand-written natural-language-to-SQL translations and a Python-based NLP workflow.