✅ Goals:
**Join two Gold tables**, for example:

`daily_country_vaccinations` (people vaccinated, aggregated per country and date)

`vaccinations_by_manufacturer` (manufacturer-level vaccination data)

**Demonstrate schema evolution** by adding a new column and writing it with the `mergeSchema` option.



1. 📥 **Load Gold Tables**


In [0]:
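# Point Spark at the Gold schema of the databricks_cat catalog
spark.sql("USE CATALOG databricks_cat")
spark.sql("USE SCHEMA gold")

# Load both Gold tables as DataFrames
df_daily = spark.table("daily_country_vaccinations")
df_manufacturer = spark.table("vaccinations_by_manufacturer")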



| vaccine | location | total_vaccinations |
|---|---|---|
| Pfizer/BioNTech | Italy | 75090024728.0 |
| Novavax | Italy | 11630500.0 |
| Oxford/AstraZeneca | Spain | 519977338.0 |
| Novavax | Germany | 50932135.0 |
| Johnson&Johnson | Germany | 2184070823.0 |
| Johnson&Johnson | Spain | 104090526.0 |
| Novavax | United States | 1778541.0 |
| Moderna | Italy | 21532502087.0 |
| Pfizer/BioNTech | France | 80533831482.0 |
| Sanofi/GSK | Italy | 6721.0 |


2. 🔗 **Join on `location` and `date`**

In [0]:
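# Inner join: keep only (location, date) pairs present in both tables.
# Passing the keys as a list of names means each key column appears once in the result.
df_joined = df_daily.join(
    df_manufacturer,
    on=["location", "date"],
    how="inner"
)
df_joined.display()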
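To make the `how="inner"` semantics concrete, here is a toy model in plain Python rather than Spark. `inner_join` is a hypothetical helper over lists of dicts, not a PySpark API. The sample rows are a small subset of the joined output shown further down in this notebook, split back into their two source tables; they are illustrative, not read from the Gold tables.

```python
from collections import defaultdict


def inner_join(left, right, keys):
    """Toy model of an inner join on `keys` over lists of dicts (not Spark)."""
    # Index the right-hand rows by their key tuple for constant-time lookup.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    joined = []
    for left_row in left:
        key = tuple(left_row[k] for k in keys)
        # Rows with no partner on the right are dropped (inner semantics);
        # every matching pair of rows yields one output row.
        for right_row in index.get(key, []):
            merged = dict(left_row)
            # Key columns appear once; non-key columns come from both sides.
            merged.update({c: v for c, v in right_row.items() if c not in keys})
            joined.append(merged)
    return joined


# Hypothetical sample: rows copied from the joined output, split into two inputs.
daily = [
    {"location": "Italy", "date": "2022-04-14", "people_vaccinated": 50834366.0},
    {"location": "France", "date": "2022-03-21", "people_vaccinated": 54388799.0},
    {"location": "Germany", "date": "2021-02-28", "people_vaccinated": 4153745.0},
]
manufacturer = [
    {"location": "Italy", "date": "2022-04-14", "vaccine": "Pfizer/BioNTech", "total_vaccinations": 88948803.0},
    {"location": "France", "date": "2022-03-21", "vaccine": "Pfizer/BioNTech", "total_vaccinations": 109709736.0},
    {"location": "Italy", "date": "2022-06-11", "vaccine": "Pfizer/BioNTech", "total_vaccinations": 90240527.0},
]

result = inner_join(daily, manufacturer, keys=["location", "date"])
for row in result:
    print(row)
```

In this sample, the Germany row has no manufacturer partner and the Italy 2022-06-11 manufacturer row has no daily partner, so both are dropped and two rows remain. Spark's inner join applies the same matching rule, distributed across the cluster.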


3. ➕ **Add a New Column** (Simulate Schema Evolution)


In [0]:
from pyspark.sql.functions import expr

# Derive a new column; try_divide returns NULL instead of failing
# when total_vaccinations is 0
df_updated = df_joined.withColumn(
    "people_per_dose",
    expr("try_divide(people_vaccinated, total_vaccinations)")
)

df_updated.display()


| location | date | people_vaccinated | vaccine | total_vaccinations | people_per_dose |
|---|---|---|---|---|---|
| Italy | 2022-02-28 | 50717107.0 | Novavax | 1061.0 | 47801.23185673892 |
| Italy | 2022-04-14 | 50834366.0 | Pfizer/BioNTech | 88948803.0 | 0.5715014062640056 |
| Italy | 2022-06-11 | 50873775.0 | Pfizer/BioNTech | 90240527.0 | 0.5637575121874011 |
| United States | 2021-02-23 | 50119355.0 | Moderna | 31569021.0 | 1.5876119503357422 |
| United States | 2021-05-09 | 158745361.0 | Pfizer/BioNTech | 138657097.0 | 1.1448772867356367 |
| United States | 2021-10-23 | 220181535.0 | Moderna | 154526345.0 | 1.4248802364412358 |
| France | 2022-03-21 | 54388799.0 | Pfizer/BioNTech | 109709736.0 | 0.4957517990928353 |
| Germany | 2021-02-28 | 4153745.0 | Johnson&Johnson | 98.0 | 42385.15306122449 |
| Italy | 2022-02-14 | 50611811.0 | Pfizer/BioNTech | 86230356.0 | 0.5869372845915191 |
| France | 2021-10-30 | 51838701.0 | Moderna | 11201363.0 | 4.627892248470119 |
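`try_divide` is a Spark SQL function that always performs floating-point division and returns NULL when the divisor is 0, whereas plain `/` raises a divide-by-zero error when ANSI mode is enabled. The plain-Python sketch below models that behavior, with `None` standing in for NULL; it is an illustration of the semantics, not Spark's implementation.

```python
def try_divide(dividend, divisor):
    """Toy model of Spark SQL try_divide; None plays the role of NULL."""
    # NULL inputs propagate, and a zero divisor yields NULL instead of an error.
    if dividend is None or divisor is None or divisor == 0:
        return None
    # try_divide always performs floating-point division.
    return float(dividend) / float(divisor)


print(try_divide(50834366.0, 88948803.0))  # Italy, 2022-04-14 row above
print(try_divide(50834366.0, 0.0))         # None rather than an exception
print(try_divide(None, 88948803.0))        # None: a NULL input propagates
```

The first call should match the Italy 2022-04-14 `people_per_dose` value in the output above, up to how each system formats the last digits of a double.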


4. 💾 **Save with Schema Evolution Enabled**


In [0]:
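# mergeSchema lets the write add columns (here people_per_dose)
# that the target Delta table does not have yet
df_updated.write \
    .format("delta") \
    .option("mergeSchema", "true") \
    .mode("overwrite") \
    .saveAsTable("schema_evolution_sample")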
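Delta's handling of a new column can be summarized with a simplified model. Without `mergeSchema`, a write whose DataFrame carries columns the target table lacks is rejected; with it, the new columns are appended after the existing ones. The sketch below is plain Python, not the Delta API: `evolve_schema` and `SchemaMismatchError` are hypothetical names, the column types are assumptions, and the model treats every type change as a conflict, ignoring any type changes Delta may permit. Evolution only takes effect when `schema_evolution_sample` already exists with the older schema (for example, after first saving `df_joined`); on a fresh table, the write simply creates it with all six columns.

```python
class SchemaMismatchError(Exception):
    """Hypothetical stand-in for the error Delta raises on a schema mismatch."""


def evolve_schema(table_schema, df_schema, merge_schema=False):
    """Toy model: return the table schema after writing a DataFrame.

    Schemas are lists of (column_name, type_name) pairs.
    """
    table_types = dict(table_schema)
    new_columns = []
    for name, dtype in df_schema:
        if name not in table_types:
            new_columns.append((name, dtype))
        elif table_types[name] != dtype:
            # Simplification: any type change counts as a conflict.
            raise SchemaMismatchError(
                f"column {name!r} changes type {table_types[name]} -> {dtype}"
            )
    if new_columns and not merge_schema:
        # Schema enforcement: extra columns are rejected unless mergeSchema is set.
        raise SchemaMismatchError(
            f"new columns {[n for n, _ in new_columns]} require mergeSchema"
        )
    # Existing columns keep their order; new columns are appended at the end.
    return list(table_schema) + new_columns


# Assumed types for the columns of df_joined (hypothetical, not read from the table).
joined_schema = [
    ("location", "string"),
    ("date", "date"),
    ("people_vaccinated", "double"),
    ("vaccine", "string"),
    ("total_vaccinations", "double"),
]
updated_schema = joined_schema + [("people_per_dose", "double")]

evolved = evolve_schema(joined_schema, updated_schema, merge_schema=True)
print([name for name, _ in evolved])

try:
    evolve_schema(joined_schema, updated_schema)  # mergeSchema not set
    rejected = False
except SchemaMismatchError as err:
    rejected = True
    print("rejected:", err)
```

With `merge_schema=True` the model returns the five original columns followed by `people_per_dose`; without it, the same write is rejected, which mirrors why the option matters once the target table already exists.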