# comercio_ext_auxiliares.tb_auxiliar_unidades_federativas
> ### Source: `bronze/autoloader/landingbeca2026jan/balancacomercial/UF_delta`
## 📌 File description
Reference table of **UF** (unidades federativas, the Brazilian federative units) and their regions.
|Column|Description|
|---|---|
|`CO_UF`|UF code|
|`SG_UF`|UF abbreviation|
|`NO_UF`|UF name|
|`NO_REGIAO`|Region name|

## Configuration
> #### **imports**
> #### **get files**
> #### **schema**

In [0]:
from pyspark.sql import functions as F
from pyspark.sql import types as T
from delta.tables import DeltaTable

# Hadoop FileSystem (used to check whether the silver directory exists)
jvm = spark._jvm
FileSystem = jvm.org.apache.hadoop.fs.FileSystem
Path = jvm.org.apache.hadoop.fs.Path
fs = FileSystem.get(spark._jsc.hadoopConfiguration())

In [0]:
bronzePath = "/mnt/bronze/autoloader/landingbeca2026jan/balancacomercial/uf/"
silverPath = "/mnt/silver/landingbeca2026jan/comercio_ext_auxiliares/tb_auxiliar_unidades_federativas/"
silverTable = "tb_auxiliar_unidades_federativas"  # optional: used to register the table in the metastore

In [0]:
silverSchema = T.StructType([
    T.StructField("CO_UF",     T.StringType(),   nullable=False),
    T.StructField("SG_UF",     T.StringType(),   nullable=False),
    T.StructField("NO_UF",     T.StringType(),   nullable=False),
    T.StructField("NO_REGIAO", T.StringType(),   nullable=False),
    T.StructField("TS_REF",    T.TimestampType(),nullable=False),
    T.StructField("NM_ORIGEM", T.StringType(),   nullable=False),
])

## Extraction
> #### **spark.read**

In [0]:
df_bronze_raw = spark.read.format("delta").load(bronzePath)

## Normalization
> #### **datatype**
> #### **regras**

In [0]:
df_normalized = (
    df_bronze_raw
    .withColumn("CO_UF",     F.upper(F.trim(F.col("CO_UF").cast(T.StringType()))))
    .withColumn("SG_UF",     F.upper(F.trim(F.col("SG_UF").cast(T.StringType()))))
    .withColumn("NO_UF",     F.col("NO_UF").cast(T.StringType()))
    .withColumn("NO_REGIAO", F.col("NO_REGIAO").cast(T.StringType()))
)
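The key columns above go through `cast`, `trim`, and `upper`. A minimal plain-Python model of that rule is sketched below, which can be handy for unit-testing expectations without a cluster. The helper name `normalize_code` is hypothetical, and the sketch assumes that Spark's `trim` strips only space characters and that NULL propagates through all three functions.

```python
from typing import Optional


def normalize_code(value: Optional[object]) -> Optional[str]:
    """Plain-Python model of upper(trim(cast(col as string))).

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    if value is None:
        # Assumption: cast, trim and upper all return NULL for NULL input.
        return None
    # Assumption: trim removes leading/trailing spaces only, so strip(" ")
    # is used instead of the broader str.strip().
    return str(value).strip(" ").upper()


print(normalize_code("  sp "))  # SP
print(normalize_code(35))       # 35
print(normalize_code(None))     # None
```

`NO_UF` and `NO_REGIAO` are only cast to string in the notebook, so this model applies to `CO_UF` and `SG_UF` alone.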

In [0]:
df_with_defaults = (
    df_normalized
    .withColumn("TS_REF",    F.current_timestamp())
    .withColumn("NM_ORIGEM", F.lit("/landingbeca2026jan/balancacomercial/UF_delta"))
)

## Validations
> #### **data quality**
> #### **deduplication**
> #### **schema fit**

In [0]:
df_valid = (
    df_with_defaults
    .filter(F.col("CO_UF").isNotNull())
    .filter(F.length(F.col("SG_UF")) <= 2)
)
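The two filters above interact with SQL NULL semantics in a way that is easy to miss: `length(NULL)` is NULL, and `filter` keeps only rows whose predicate is true, so rows with a NULL `SG_UF` are dropped even though only `CO_UF` is checked explicitly. The sketch below models that behavior in plain Python. The function name `passes_validation` and the sample rows are hypothetical.

```python
def passes_validation(row: dict) -> bool:
    """Plain-Python model of the two chained filters (illustrative only)."""
    co_uf = row.get("CO_UF")
    sg_uf = row.get("SG_UF")
    if co_uf is None:
        return False  # first filter: CO_UF must not be NULL
    if sg_uf is None:
        return False  # length(NULL) <= 2 evaluates to NULL, which filter drops
    return len(sg_uf) <= 2


# Made-up sample rows, not real data from the source table.
rows = [
    {"CO_UF": "35", "SG_UF": "SP"},
    {"CO_UF": None, "SG_UF": "SP"},
    {"CO_UF": "35", "SG_UF": None},
    {"CO_UF": "35", "SG_UF": "SPX"},
    {"CO_UF": "35", "SG_UF": ""},
]
print([passes_validation(r) for r in rows])
```

Note that an empty `SG_UF` passes, because its length is 0. If abbreviations are expected to have exactly two characters, comparing with `== 2` would be stricter, but that is an assumption about intent rather than something the notebook states.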


In [0]:
df_dedup = df_valid.dropDuplicates(["CO_UF"])

In [0]:
# Select the columns in the order defined by the target schema
df_silver = df_dedup.select(
    "CO_UF", "SG_UF", "NO_UF", "NO_REGIAO", "TS_REF", "NM_ORIGEM"
)
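The `select` above fixes the column order, but nothing in the notebook compares the result against `silverSchema`. One possible schema-fit check is sketched below. It works on `(name, type)` pairs, which is the shape returned by a DataFrame's `dtypes` attribute, so it can be exercised without a Spark session. The names `EXPECTED_COLUMNS` and `schema_mismatches` are hypothetical, and the expected type strings are an assumption derived from `silverSchema`.

```python
# Expected (column, Spark simple type name) pairs, mirroring silverSchema.
EXPECTED_COLUMNS = [
    ("CO_UF", "string"),
    ("SG_UF", "string"),
    ("NO_UF", "string"),
    ("NO_REGIAO", "string"),
    ("TS_REF", "timestamp"),
    ("NM_ORIGEM", "string"),
]


def schema_mismatches(actual, expected=EXPECTED_COLUMNS):
    """Return readable differences between two lists of (name, type) pairs."""
    problems = []
    actual_map = dict(actual)
    for name, dtype in expected:
        if name not in actual_map:
            problems.append(f"missing column {name}")
        elif actual_map[name] != dtype:
            problems.append(f"{name}: expected {dtype}, found {actual_map[name]}")
    expected_names = {name for name, _ in expected}
    for name, _ in actual:
        if name not in expected_names:
            problems.append(f"unexpected column {name}")
    return problems


# Made-up input standing in for df_silver.dtypes.
print(schema_mismatches([("CO_UF", "int"), ("SG_UF", "string")]))
```

In the notebook this could be called as `schema_mismatches(df_silver.dtypes)` before the load step. The sketch ignores column order and nullability, both of which `silverSchema` also declares.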

## Load
> #### **merge**

In [0]:
delta_target = DeltaTable.forName(spark, "silver_comercio_ext_auxiliares.tb_auxiliar_unidades_federativas")

merge_condition = """
    t.CO_UF = s.CO_UF
"""

(delta_target.alias("t")
    .merge(df_silver.alias("s"), merge_condition)
    .whenMatchedUpdate(set={
        "SG_UF": "s.SG_UF",
        "NO_UF": "s.NO_UF",
        "NO_REGIAO": "s.NO_REGIAO",
        "TS_REF": "s.TS_REF",
        "NM_ORIGEM": "s.NM_ORIGEM"
    })
    .whenNotMatchedInsertAll()
    .execute()
)
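The merge above is an upsert keyed on `CO_UF`: matching rows have their remaining columns overwritten, and unmatched rows are inserted. The sketch below models those semantics with plain dictionaries. The function name `upsert_by_key` and the sample rows are hypothetical. It assumes each key appears at most once in the source, which the earlier `dropDuplicates(["CO_UF"])` step is there to guarantee. Delta Lake generally rejects a merge in which several source rows match the same target row.

```python
def upsert_by_key(target, source, key="CO_UF"):
    """Plain-Python model of an update-when-matched, insert-otherwise merge."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: overwrite with source values
        else:
            merged[row[key]] = dict(row)   # not matched: insert the source row
    return list(merged.values())


# Made-up rows for illustration only.
target = [{"CO_UF": "01", "SG_UF": "AA", "NO_UF": "Old name"}]
source = [
    {"CO_UF": "01", "SG_UF": "AA", "NO_UF": "New name"},
    {"CO_UF": "02", "SG_UF": "BB", "NO_UF": "Another"},
]
print(upsert_by_key(target, source))
```

Target rows with no match in the source are left untouched, which also holds for the Delta merge in the cell above because it defines no delete clause.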

In [0]:
#df_silver.count()

In [0]:
#display(spark.sql("select * from silver_comercio_ext_auxiliares.tb_auxiliar_unidades_federativas"))