# Breweries Project Pipeline Silver

This project builds a data pipeline for breweries of the AB InBev Group. This notebook reads the Silver-layer Delta table and writes the Gold-layer aggregations.


**Responsible Engineer:** Ozeas Gomes

**Created on:** 02/14/2025

**Last updated:** 02/14/2025

#### Installing Required Dependencies

#### Importing Dependencies

In [0]:
%run /Users/ozeasjgomes@gmail.com/brewery_data_pipeline/pipeline_functions


In [0]:
from pyspark.sql.functions import col, trim, when, lit, regexp_replace
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, DateType


#### Reading data

In [0]:
# Load the Silver-layer Delta table; it is the input for the Gold aggregations below
gold_breweries_df = spark.read.format("delta").load("/dbfs/FileStore/project_breweries/silver/breweries_delta")
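
Before aggregating, it can help to confirm that the loaded table exposes the columns the aggregations below group by. The following is a minimal sketch under that assumption; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names introduced here for illustration and are not part of `pipeline_functions`.

```python
# Hypothetical helper, introduced for illustration only.
# The column names are the ones the Gold aggregations in this notebook group by.
REQUIRED_COLUMNS = ("brewery_type", "state", "country")


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in `required` order."""
    present = set(available)
    return [name for name in required if name not in present]


# In the notebook this could be called with the DataFrame's column list, e.g.:
#   missing = missing_columns(gold_breweries_df.columns)
#   if missing:
#       raise ValueError(f"Silver table is missing columns: {missing}")
```

Failing early here would surface a schema change in the Silver layer before any Gold table is overwritten.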

## Aggregations for the Gold Layer
### Count of Breweries by Type (brewery_type)

In [0]:
# Aggregation by brewery type

gold_brewery_type = gold_breweries_df.groupBy("brewery_type").count() \
                             .withColumnRenamed("count", "quantidade_cervejarias")

# Displaying the result
#gold_brewery_type.display()

brewery_type,quantidade_cervejarias
brewpub,2500
beergarden,3
bar,35
proprietor,69
regional,225
contract,192
location,1
closed,216
nano,13
micro,4308


### Count of Breweries by Location (state and country)

In [0]:
# Aggregation by location (state, country)
gold_location = gold_breweries_df.groupBy("state", "country").count() \
                          .withColumnRenamed("count", "quantidade_cervejarias")


# Displaying the result
#gold_location.display()

state,country,quantidade_cervejarias
Idaho,United States,67
Portalegre,Portugal,1
Louisiana,United States,44
Illinois,United States,257
Missouri,United States,141
Colorado,United States,448
Gangwondo,South Korea,7
Westmeath,Ireland,1
Longford,Ireland,1
Oklahoma,United States,44


### Count of Breweries by Type and Location

In [0]:
# Aggregation by brewery type and location
gold_type_location = gold_breweries_df.groupBy("brewery_type", "state", "country").count() \
                               .withColumnRenamed("count", "quantidade_cervejarias")


# Displaying the result
#gold_type_location.display()

brewery_type,state,country,quantidade_cervejarias
contract,New York,United States,22
planning,Arizona,United States,13
micro,Kentucky,United States,41
micro,Oklahoma,United States,30
brewpub,Gwangju,South Korea,1
proprietor,Georgia,United States,1
brewpub,Seoul,South Korea,14
regional,Arizona,United States,1
contract,Arkansas,United States,1
large,Washington,United States,3
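
For readers less familiar with Spark, `groupBy(...).count()` produces one row per distinct key combination, holding the number of input rows that carry that combination. The plain-Python sketch below mirrors that behavior on a few made-up rows. The sample rows and the `count_by` helper are illustrative assumptions, not data or code from the pipeline.

```python
from collections import Counter

# Illustrative rows only; they are not taken from the real Silver table.
sample_rows = [
    {"brewery_type": "micro", "state": "Colorado", "country": "United States"},
    {"brewery_type": "micro", "state": "Idaho", "country": "United States"},
    {"brewery_type": "brewpub", "state": "Seoul", "country": "South Korea"},
]


def count_by(rows, *keys):
    """Count rows per distinct combination of `keys`, similar to groupBy(*keys).count()."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


# One key: corresponds to the aggregation by brewery type.
by_type = count_by(sample_rows, "brewery_type")

# Three keys: corresponds to the aggregation by type and location.
by_type_location = count_by(sample_rows, "brewery_type", "state", "country")
```

Whatever the grouping keys, the counts sum to the number of input rows, which gives a simple consistency check across the three aggregations.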


### Saving the Gold Layer in Delta Lake

In [0]:
# Saving the aggregation by brewery type to the Gold layer
gold_brewery_type.write.format("delta").mode("overwrite").saveAsTable("gold_cervejarias_por_tipo")

# Saving the aggregation by location to the Gold layer
gold_location.write.format("delta").mode("overwrite").saveAsTable("gold_cervejarias_por_localizacao")

# Saving the combined aggregation (brewery type and location) to the Gold layer
gold_type_location.write.format("delta").mode("overwrite").saveAsTable("gold_cervejarias_tipo_localizacao")
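
The three writes above share the same call chain, so they could also be driven from a single mapping of table name to DataFrame. This is only a sketch of that idea; `save_gold_tables` is a hypothetical helper, not an existing function in `pipeline_functions`, and it assumes each value exposes the standard `DataFrame.write` interface.

```python
def save_gold_tables(tables, mode="overwrite"):
    """Write each DataFrame in `tables` (table name -> DataFrame) as a Delta table.

    Hypothetical helper for illustration; returns the table names it wrote.
    """
    for table_name, df in tables.items():
        # Same call chain as the individual writes, applied once per entry.
        df.write.format("delta").mode(mode).saveAsTable(table_name)
    return list(tables)


# In the notebook this could be called as:
#   save_gold_tables({
#       "gold_cervejarias_por_tipo": gold_brewery_type,
#       "gold_cervejarias_por_localizacao": gold_location,
#       "gold_cervejarias_tipo_localizacao": gold_type_location,
#   })
```

Keeping the table names in one mapping makes it harder for a new aggregation to be added without a matching write.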