# Data Engineering Nanodegree - Capstone Project

In [1]:
import re
from functools import reduce
from itertools import islice, chain, zip_longest
from datetime import date, datetime, timedelta
from pathlib import Path
import pandas as pd
import requests
from bs4 import BeautifulSoup
from IPython.display import Image
from pyspark.sql import functions as F
from pyspark.sql import SparkSession, Row
from pyspark.sql.window import Window
from pyspark import SparkFiles

In [2]:
spark = SparkSession.builder.\
config("spark.jars.repositories", "https://repos.spark-packages.org/").\
config("spark.jars.packages", "saurfang:spark-sas7bdat:2.0.0-s_2.11").\
enableHiveSupport().getOrCreate()

## Step 1: Scope the Project and Gather Data

### Introduction

When I started looking for an interesting data set to work on, I went through the suggested sources on the project page and was immediately struck with choice paralysis. There are many interesting data sets around, but I couldn't settle on one that had the required scale and was interesting to work with, so I decided to use the data from the Udacity Provided Project, which is based on the I94 Immigration Data.

### Scope

Having chosen the Udacity Provided Project, the next step was to choose what to do with the data. My idea is to assume the role of a data analyst of a transportation company that wants to extract useful information out of the immigration data. Possible usages of this data are:

* Increase availability of vehicles based on increased influx of passengers on major US airports.
* Prioritize assignment of drivers based on the demographics of the passengers to ensure culture/language fit.
* Ensure the vehicles assigned to the airports are compatible with the typical conditions of the day (number of expected passengers, weather, etc.)

Given that the I94 Immigration data is not publicly available (for free), I had to design the ETL process around the following restrictions:

* Do as much as possible within Udacity's workspace.
* Minimize the usage of the AWS resources so I don't run out of credits.
* Design the ETL process as if I had free access to all data sources and unlimited cloud resources, but only implement the parts that fit within the data access and budget restrictions.

With all these requirements and restrictions in mind, the goal is to build a model that represents the daily influx of immigrants for all the major US cities with international airports, reporting their country of origin, language and gender, alongside the typical climatological conditions for that day.

### Gather Data

#### I94 Immigration Data

(Source: https://www.trade.gov/national-travel-and-tourism-office)

In [3]:
df_i94_data = spark.read.format("com.github.saurfang.sas.spark").load("../../data/18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat")

In [4]:
df_i94_data.limit(5).toPandas()

Unnamed: 0,cicid,i94yr,i94mon,i94cit,i94res,i94port,arrdate,i94mode,i94addr,depdate,...,entdepu,matflag,biryear,dtaddto,gender,insnum,airline,admnum,fltno,visatype
0,6.0,2016.0,4.0,692.0,692.0,XXX,20573.0,,,,...,U,,1979.0,10282016,,,,1897628000.0,,B2
1,7.0,2016.0,4.0,254.0,276.0,ATL,20551.0,1.0,AL,,...,Y,,1991.0,D/S,M,,,3736796000.0,296.0,F1
2,15.0,2016.0,4.0,101.0,101.0,WAS,20545.0,1.0,MI,20691.0,...,,M,1961.0,09302016,M,,OS,666643200.0,93.0,B2
3,16.0,2016.0,4.0,101.0,101.0,NYC,20545.0,1.0,MA,20567.0,...,,M,1988.0,09302016,,,AA,92468460000.0,199.0,B2
4,17.0,2016.0,4.0,101.0,101.0,NYC,20545.0,1.0,MA,20567.0,...,,M,2012.0,09302016,,,AA,92468460000.0,199.0,B2


In [5]:
df_i94_data.printSchema()

root
 |-- cicid: double (nullable = true)
 |-- i94yr: double (nullable = true)
 |-- i94mon: double (nullable = true)
 |-- i94cit: double (nullable = true)
 |-- i94res: double (nullable = true)
 |-- i94port: string (nullable = true)
 |-- arrdate: double (nullable = true)
 |-- i94mode: double (nullable = true)
 |-- i94addr: string (nullable = true)
 |-- depdate: double (nullable = true)
 |-- i94bir: double (nullable = true)
 |-- i94visa: double (nullable = true)
 |-- count: double (nullable = true)
 |-- dtadfile: string (nullable = true)
 |-- visapost: string (nullable = true)
 |-- occup: string (nullable = true)
 |-- entdepa: string (nullable = true)
 |-- entdepd: string (nullable = true)
 |-- entdepu: string (nullable = true)
 |-- matflag: string (nullable = true)
 |-- biryear: double (nullable = true)
 |-- dtaddto: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- insnum: string (nullable = true)
 |-- airline: string (nullable = true)
 |-- admnum: double (nullable = 

This data set is the main source of information for the fact tables, as my goal is to aggregate ingresses by day and airport to get a sense of the scale of the demand that our fictitious transportation company needs to meet. What I'm most interested in is:

* The arrival date: `arrdate`
* The port of entry: `i94port`
* Country of origin: Either `i94cit` or `i94res`
* Age: `biryear`
* Gender: `gender`

Given that the semantics behind the values of some of the fields are described in the `I94_SAS_Labels_Descriptions.SAS` file, I've created a Jupyter notebook to extract the relevant information into CSV files. For the purpose of this project, I assumed this information to be static. Eventually, if a more appropriate way of sourcing this information is found, I could integrate it using the homegrown tables as an exchange format, so I can still use the existing ETL processes.

In [6]:
df_i94cntyl = spark.read.csv("i94cntyl.csv", header=True)

In [7]:
df_i94cntyl.limit(5).toPandas()

Unnamed: 0,value,description
0,582,"MEXICO Air Sea, and Not Reported (I-94, no lan..."
1,236,AFGHANISTAN
2,101,ALBANIA
3,316,ALGERIA
4,102,ANDORRA
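
The extraction notebook is not shown here. As a rough sketch only, assuming the label file lists each country code as `code = 'description'` (the sample text and the `parse_labels` and `to_csv` helpers below are hypothetical, not the code I actually ran), such lines could be turned into rows with the same `value,description` header like this:

```python
import csv
import io
import re

# Hypothetical excerpt; the layout of the real label file is assumed here.
SAMPLE = """
   236 =  'AFGHANISTAN'
   101 =  'ALBANIA'
   316 =  'ALGERIA'
"""

# One label per line: an integer code, '=', then a single-quoted description.
LABEL_RE = re.compile(r"^\s*(\d+)\s*=\s*'(.*)'\s*$")


def parse_labels(text):
    """Return (value, description) pairs for every line that looks like a label."""
    pairs = []
    for line in text.splitlines():
        match = LABEL_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs


def to_csv(pairs):
    """Serialize the pairs under a value,description header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["value", "description"])
    writer.writerows(pairs)
    return buffer.getvalue()


labels = parse_labels(SAMPLE)
csv_text = to_csv(labels)
```

Lines that do not match the assumed pattern are silently skipped, so a real run would need a check that no labels were lost.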


#### US Customs and Border Protection Port of Entry Codes

(Source: https://redbus2us.com/travel/usa/us-customs-and-border-protection-cbp-codes-port-of-entry-stamp/)

(Alternative source: https://web.archive.org/web/20210422115709/https://redbus2us.com/travel/usa/us-customs-and-border-protection-cbp-codes-port-of-entry-stamp/)

Although the provided data set contains information about the CBP in the `I94_SAS_Labels_Descriptions.SAS` file, the format makes it harder to combine with other sources of information. I've created a PySpark dataframe scraping the information from the source.

In [8]:
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"}
response = requests.get(
    "https://redbus2us.com/travel/usa/us-customs-and-border-protection-cbp-codes-port-of-entry-stamp/",
    headers=headers)

In [9]:
doc = BeautifulSoup(response.content, "html.parser")

In [10]:
df_cbp_codes = spark.createDataFrame(
    Row(code=t[0], location=t[1])
    for t in zip_longest(
        *[
            chain.from_iterable(
                filter(
                    # Skip rows without <td> cells, then keep rows that start with a 3-letter code
                    lambda t: len(t) > 0 and re.match("^[A-Z]{3}$", t[0]),
                    (
                        tuple(map(lambda e: e.text, tr_elem.find_all("td")))
                        for tr_elem in doc.find_all("tr")
                    )
                )
            )
        ]*2
    )
)

In [11]:
df_cbp_codes.limit(5).toPandas()

Unnamed: 0,code,location
0,ABE,"Aberdeen, WA"
1,ABG,"Alburg, VT"
2,ABQ,"Albuquerque, NM"
3,ABS,"Alburg Springs, VT"
4,ADT,"Amistad Dam, TX"


In [12]:
df_cbp_codes.printSchema()

root
 |-- code: string (nullable = true)
 |-- location: string (nullable = true)
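
The nested expression that builds `df_cbp_codes` boils down to one rule: keep the table rows whose first cell is a three-letter uppercase code. Here is a minimal plain-Python sketch of that rule, run on hand-written sample rows rather than the scraped page. `extract_codes` is a hypothetical helper, and it assumes one code/location pair per row.

```python
import re

CODE_RE = re.compile(r"^[A-Z]{3}$")


def extract_codes(rows):
    """Keep (code, location) pairs from rows of cell texts.

    Rows with fewer than two cells, or whose first cell is not a
    three-letter uppercase code, are skipped.
    """
    return [
        (cells[0], cells[1])
        for cells in rows
        if len(cells) >= 2 and CODE_RE.match(cells[0])
    ]


sample_rows = [
    ("Code", "Location"),   # header-like row, dropped
    ("ABE", "Aberdeen, WA"),
    (),                     # row without cells, dropped
    ("ABG", "Alburg, VT"),
]
codes = extract_codes(sample_rows)
```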



#### World Temperature Data

(Source: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data)

The idea for this data set is to get a sense of the weather conditions at the date of ingress, to determine the most appropriate vehicle for that day.

In [13]:
df_temperature = spark.read.csv("../../data2/GlobalLandTemperaturesByCity.csv", header=True)

In [14]:
df_temperature.limit(5).toPandas()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,6.068,1.737,Århus,Denmark,57.05N,10.33E
1,1743-12-01,,,Århus,Denmark,57.05N,10.33E
2,1744-01-01,,,Århus,Denmark,57.05N,10.33E
3,1744-02-01,,,Århus,Denmark,57.05N,10.33E
4,1744-03-01,,,Århus,Denmark,57.05N,10.33E


In [15]:
df_temperature.printSchema()

root
 |-- dt: string (nullable = true)
 |-- AverageTemperature: string (nullable = true)
 |-- AverageTemperatureUncertainty: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- Latitude: string (nullable = true)
 |-- Longitude: string (nullable = true)



#### UNData - National accounts

(Source: http://data.un.org/)

I also wanted to have an idea of the potential affluence of the passenger, so I'm going to use the UNData site's "National accounts" data set to extract the per-capita GDP of the passenger's country of origin. For some reason `SparkContext` doesn't like the file name (probably the spaces), so I had to download it manually.

In [16]:
response = requests.get("http://data.un.org/_Docs/SYB/CSV/SYB64_230_202110_GDP%20and%20GDP%20Per%20Capita.csv", stream=True)

In [17]:
with open('SYB64_230_202110_GDP_and_GDP_Per_Capita.csv', "w") as f:
    f.writelines(map(lambda l: f"{l.decode('latin1')}\n", islice(response.iter_lines(), 1, None)))  # We need to drop the first extra line

In [18]:
df_national_accounts = spark.read.csv("SYB64_230_202110_GDP_and_GDP_Per_Capita.csv", header=True)

In [19]:
df_national_accounts.limit(5).toPandas()

Unnamed: 0,Region/Country/Area,_c1,Year,Series,Value,Footnotes,Source
0,1,"Total, all countries or areas",1995,GDP in current prices (millions of US dollars),31140783,,"United Nations Statistics Division, New York, ..."
1,1,"Total, all countries or areas",2005,GDP in current prices (millions of US dollars),47623151,,"United Nations Statistics Division, New York, ..."
2,1,"Total, all countries or areas",2010,GDP in current prices (millions of US dollars),66272559,,"United Nations Statistics Division, New York, ..."
3,1,"Total, all countries or areas",2015,GDP in current prices (millions of US dollars),74985744,,"United Nations Statistics Division, New York, ..."
4,1,"Total, all countries or areas",2017,GDP in current prices (millions of US dollars),81056929,,"United Nations Statistics Division, New York, ..."


In [20]:
df_national_accounts.printSchema()

root
 |-- Region/Country/Area: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- Year: string (nullable = true)
 |-- Series: string (nullable = true)
 |-- Value: string (nullable = true)
 |-- Footnotes: string (nullable = true)
 |-- Source: string (nullable = true)



#### Airport Code Table

(Source: https://datahub.io/core/airport-codes#data)

I want to focus just on airports, so I used this airport table alongside additional information to estimate the airport through which each passenger entered the country. I used the city (municipality) as the link between the port (identified by the CBP code) and the airport.

In [21]:
df_airports = spark.read.csv("airport-codes_csv.csv", header=True)

In [22]:
df_airports.limit(10).toPandas()

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
0,00A,heliport,Total Rf Heliport,11,,US,US-PA,Bensalem,00A,,00A,"-74.93360137939453, 40.07080078125"
1,00AA,small_airport,Aero B Ranch Airport,3435,,US,US-KS,Leoti,00AA,,00AA,"-101.473911, 38.704022"
2,00AK,small_airport,Lowell Field,450,,US,US-AK,Anchor Point,00AK,,00AK,"-151.695999146, 59.94919968"
3,00AL,small_airport,Epps Airpark,820,,US,US-AL,Harvest,00AL,,00AL,"-86.77030181884766, 34.86479949951172"
4,00AR,closed,Newport Hospital & Clinic Heliport,237,,US,US-AR,Newport,,,,"-91.254898, 35.6087"
5,00AS,small_airport,Fulton Airport,1100,,US,US-OK,Alex,00AS,,00AS,"-97.8180194, 34.9428028"
6,00AZ,small_airport,Cordes Airport,3810,,US,US-AZ,Cordes,00AZ,,00AZ,"-112.16500091552734, 34.305599212646484"
7,00CA,small_airport,Goldstone /Gts/ Airport,3038,,US,US-CA,Barstow,00CA,,00CA,"-116.888000488, 35.350498199499995"
8,00CL,small_airport,Williams Ag Airport,87,,US,US-CA,Biggs,00CL,,00CL,"-121.763427, 39.427188"
9,00CN,heliport,Kitchen Creek Helibase Heliport,3350,,US,US-CA,Pine Valley,00CN,,00CN,"-116.4597417, 32.7273736"


In [23]:
df_airports.printSchema()

root
 |-- ident: string (nullable = true)
 |-- type: string (nullable = true)
 |-- name: string (nullable = true)
 |-- elevation_ft: string (nullable = true)
 |-- continent: string (nullable = true)
 |-- iso_country: string (nullable = true)
 |-- iso_region: string (nullable = true)
 |-- municipality: string (nullable = true)
 |-- gps_code: string (nullable = true)
 |-- iata_code: string (nullable = true)
 |-- local_code: string (nullable = true)
 |-- coordinates: string (nullable = true)



#### National And Official Languages

(Source: https://github.com/OpenBookPrices/country-data)

As I mentioned in the introduction, one of the objectives of the analysis was to provide drivers with the best culture/language fit for the passengers. I combined the information from this source with the i94 data to determine the nationality and language of the passenger.

##### Country

In [24]:
spark.sparkContext.addFile("https://raw.githubusercontent.com/OpenBookPrices/country-data/master/data/countries.csv")

In [25]:
df_countries = spark.read.csv(f"file://{SparkFiles.get('countries.csv')}", header=True)

In [26]:
df_countries.limit(10).toPandas()

Unnamed: 0,name,alpha2,alpha3,ccTLD,countryCallingCodes,currencies,emoji,ioc,languages,status
0,Afghanistan,AF,AFG,,+93,AFN,🇦🇫,AFG,pus,assigned
1,Albania,AL,ALB,,+355,ALL,🇦🇱,ALB,sqi,assigned
2,Algeria,DZ,DZA,,+213,DZD,🇩🇿,ALG,ara,assigned
3,American Samoa,AS,ASM,,+1 684,USD,🇦🇸,ASA,"eng,smo",assigned
4,Andorra,AD,AND,,+376,EUR,🇦🇩,AND,cat,assigned
5,Angola,AO,AGO,,+244,AOA,🇦🇴,ANG,por,assigned
6,Anguilla,AI,AIA,,+1 264,XCD,🇦🇮,,eng,assigned
7,Antarctica,AQ,ATA,,+672,,🇦🇶,,,assigned
8,Antigua And Barbuda,AG,ATG,,+1 268,XCD,🇦🇬,ANT,eng,assigned
9,Argentina,AR,ARG,,+54,ARS,🇦🇷,ARG,spa,assigned


In [27]:
df_countries.printSchema()

root
 |-- name: string (nullable = true)
 |-- alpha2: string (nullable = true)
 |-- alpha3: string (nullable = true)
 |-- ccTLD: string (nullable = true)
 |-- countryCallingCodes: string (nullable = true)
 |-- currencies: string (nullable = true)
 |-- emoji: string (nullable = true)
 |-- ioc: string (nullable = true)
 |-- languages: string (nullable = true)
 |-- status: string (nullable = true)



##### Language

In [28]:
spark.sparkContext.addFile("https://raw.githubusercontent.com/OpenBookPrices/country-data/master/data/languages.csv")

In [29]:
df_languages = spark.read.csv(f"file://{SparkFiles.get('languages.csv')}", header=True)

In [30]:
df_languages.limit(10).toPandas()

Unnamed: 0,name,alpha2,alpha3,bibliographic
0,Abkhazian,ab,abk,
1,Achinese,,ace,
2,Acoli,,ach,
3,Adangme,,ada,
4,Adygei,,ady,
5,Adyghe,,ady,
6,Afar,aa,aar,
7,Afrihili,,afh,
8,Afrikaans,af,afr,
9,Afro-Asiatic languages,,afa,


In [31]:
df_languages.printSchema()

root
 |-- name: string (nullable = true)
 |-- alpha2: string (nullable = true)
 |-- alpha3: string (nullable = true)
 |-- bibliographic: string (nullable = true)



## Step 2: Explore and Assess the Data

### I94 Immigration Data

Since the i94 immigration data is spread across multiple files according to their year and month, the first step is to list how many files are available to determine the possible timeframe of the analysis.

In [32]:
i94_data_path = Path("../../data/18-83510-I94-Data-2016")

In [33]:
list(i94_data_path.glob("**/*.sas7bdat"))

[PosixPath('../../data/18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_sep16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_nov16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_mar16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_jun16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_aug16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_may16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_jan16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_oct16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_jul16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_feb16_sub.sas7bdat'),
 PosixPath('../../data/18-83510-I94-Data-2016/i94_dec16_sub.sas7bdat')]

As we can see, only data from 2016 is available, so any other time-sensitive data has to cover the same period (or be assumed to). The next step is to analyze the state of the data in each of the columns of interest. We're only going to analyze a subset of the files and assume the rest share the same characteristics.

Due to S3 storage restrictions I had to limit the number of files read. I also selected only the relevant columns.

In [34]:
df_i94_data = reduce(
    lambda accum, df: accum.union(df),
    map(
        lambda f: spark.read.format("com.github.saurfang.sas.spark").load(f).select(
            ["i94port", "i94cit", "arrdate", "i94mode", "biryear", "gender"]
        ),
        [
            "../../data/18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat",
            "../../data/18-83510-I94-Data-2016/i94_may16_sub.sas7bdat",
        ]
    )
)

In [35]:
df_i94_data.columns

['i94port', 'i94cit', 'arrdate', 'i94mode', 'biryear', 'gender']

In [36]:
df_i94_data.count()  # Number of rows

6540562

In [37]:
df_i94_data.select([F.count(F.when(F.col("i94mode") == 1, True))]).show()  # Number of rows for Air travel

+--------------------------------------------+
|count(CASE WHEN (i94mode = 1) THEN true END)|
+--------------------------------------------+
|                                     6301766|
+--------------------------------------------+



There was no obvious global identifier for the I94 record, so I relied on a generated one.

### The arrival date

In [38]:
df_i94_data.select("arrdate").limit(5).toPandas()

Unnamed: 0,arrdate
0,20573.0
1,20551.0
2,20545.0
3,20545.0
4,20545.0


As mentioned in the Udacity forums, the "arrdate" represents the arrival date as the number of days since Jan 1st 1960.

In [39]:
df_i94_data.select([F.count(F.when(F.isnan("arrdate") | F.isnull("arrdate"), True))]).show()

+--------------------------------------------------------------------+
|count(CASE WHEN (isnan(arrdate) OR (arrdate IS NULL)) THEN true END)|
+--------------------------------------------------------------------+
|                                                                   0|
+--------------------------------------------------------------------+



In [40]:
df_i94_data.withColumn("arrdate", F.expr("date_add(to_date('1960-01-01'), arrdate)")).select("arrdate").limit(10).show()

+----------+
|   arrdate|
+----------+
|2016-04-29|
|2016-04-07|
|2016-04-01|
|2016-04-01|
|2016-04-01|
|2016-04-01|
|2016-04-01|
|2016-04-01|
|2016-04-01|
|2016-04-01|
+----------+
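
The same conversion can be checked outside Spark. A minimal sketch, assuming the 1960-01-01 epoch mentioned above (`sas_to_date` is a hypothetical helper name):

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)


def sas_to_date(days):
    """Convert a SAS date number (days since 1960-01-01) to a date, or None."""
    if days is None:
        return None
    return SAS_EPOCH + timedelta(days=int(days))


arrival = sas_to_date(20573.0)  # first arrdate value in the sample above
```

For `20573.0` this yields 2016-04-29, which agrees with the first row of the Spark output.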



#### Port of entry

The port of entry identification required the combination of multiple sources:

* The `i94port` field in the i94 data set.
* The CBP codes table.
* The airport codes table.

Since there was no direct link between the port and the airport, I used the CBP code to determine a municipality, and from there I chose an airport. The first step is to split the location from the CBP codes table to make it easier to join with the airports.

In [41]:
df_cbp_codes = df_cbp_codes\
    .withColumn("municipality", F.udf(lambda s: s.split(",")[0].strip() if "," in s else None)(F.col("location")))\
    .withColumn("state", F.udf(lambda s: s.split(",")[1].strip() if "," in s else None)(F.col("location")))

In [42]:
df_cbp_codes.limit(5).toPandas()

Unnamed: 0,code,location,municipality,state
0,ABE,"Aberdeen, WA",Aberdeen,WA
1,ABG,"Alburg, VT",Alburg,VT
2,ABQ,"Albuquerque, NM",Albuquerque,NM
3,ABS,"Alburg Springs, VT",Alburg Springs,VT
4,ADT,"Amistad Dam, TX",Amistad Dam,TX
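
For reference, the two UDFs above apply the following rule to each `location` string. This plain-Python sketch mirrors them. The `split_location` name and the guard for `None` input are additions for illustration.

```python
def split_location(location):
    """Split 'City, ST' into (municipality, state).

    Returns (None, None) when the value is missing or has no comma,
    mirroring the 'if "," in s else None' branch of the UDFs.
    """
    if location is None or "," not in location:
        return None, None
    parts = location.split(",")
    return parts[0].strip(), parts[1].strip()


municipality, state = split_location("Aberdeen, WA")
```

Locations with more than one comma keep only the first two fields, which is the same behavior as the UDFs.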


I was only interested in the international US airports, and I needed to split the ISO region to get the US state, which is used to link to a CBP code. Since there were multiple airports per municipality, I chose the first one (ordered by `ident`). This is not entirely correct, but I couldn't find a definitive source of information that links the i94 information with a specific airport.

In [43]:
municipality_state_window = Window.partitionBy("state", "municipality").orderBy(F.asc("ident"))

In [44]:
df_airports = df_airports\
    .filter(F.col("iso_country") == "US")\
    .filter(F.col("type") == "large_airport")\
    .filter(F.col("iata_code").isNotNull())\
    .withColumn("state", F.udf(lambda s: re.sub(r"US-", "", s))(F.col("iso_region")))\
    .withColumn("row_number", F.row_number().over(municipality_state_window))\
    .filter(F.col("row_number") == 1)\
    .drop("row_number")

In [45]:
df_airports.limit(5).toPandas()

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates,state
0,KIND,large_airport,Indianapolis International Airport,797,,US,US-IN,Indianapolis,KIND,IND,IND,"-86.294403, 39.7173",IN
1,KMCI,large_airport,Kansas City International Airport,1026,,US,US-MO,Kansas City,KMCI,MCI,MCI,"-94.713898, 39.2976",MO
2,KMCO,large_airport,Orlando International Airport,96,,US,US-FL,Orlando,KMCO,MCO,MCO,"-81.30899810791016, 28.429399490356445",FL
3,KERI,large_airport,Erie International Tom Ridge Field,732,,US,US-PA,Erie,KERI,ERI,ERI,"-80.1738667488, 42.0831270134",PA
4,KMGM,large_airport,Montgomery Regional (Dannelly Field) Airport,221,,US,US-AL,Montgomery,KMGM,MGM,MGM,"-86.39399719, 32.30059814",AL
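
The window with `row_number() == 1` keeps a single airport per state and municipality. A plain-Python sketch of the same first-per-group selection is below. The sample records are made up apart from `KIND`, and `first_airport_per_city` is a hypothetical helper.

```python
def first_airport_per_city(airports):
    """Keep, for each (state, municipality), the airport with the smallest ident."""
    chosen = {}
    for airport in airports:
        key = (airport["state"], airport["municipality"])
        if key not in chosen or airport["ident"] < chosen[key]["ident"]:
            chosen[key] = airport
    return chosen


sample = [
    {"ident": "KZZZ", "state": "XX", "municipality": "Example City"},
    {"ident": "KAAA", "state": "XX", "municipality": "Example City"},
    {"ident": "KIND", "state": "IN", "municipality": "Indianapolis"},
]
picked = first_airport_per_city(sample)
```

Ordering by `ident` is arbitrary from a traveller's point of view, so a city with several large airports will always resolve to whichever identifier sorts first.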


In [46]:
df_i94_data.select([F.count(F.when((F.col("i94port") == "") | F.isnull("i94port"), True))]).show()

+------------------------------------------------------------------+
|count(CASE WHEN ((i94port = ) OR (i94port IS NULL)) THEN true END)|
+------------------------------------------------------------------+
|                                                                 0|
+------------------------------------------------------------------+



There were no missing entries for the port of entry (at least from the data perspective). The next step was to get the most common ports of entry to determine whether we can use the information (combined with other sources of data) or not.

In [47]:
df_i94_data.createOrReplaceTempView("i94_data")

In [48]:
spark.sql("""\
SELECT i94port, COUNT(*)
FROM i94_data
WHERE i94mode == 1.0
GROUP BY 1
ORDER BY 2 DESC
LIMIT 50
""").show()

+-------+--------+
|i94port|count(1)|
+-------+--------+
|    NYC| 1107406|
|    MIA|  725672|
|    LOS|  671710|
|    SFR|  355210|
|    HHW|  311961|
|    CHI|  305348|
|    NEW|  304263|
|    ORL|  292313|
|    HOU|  214967|
|    ATL|  198455|
|    AGA|  181375|
|    WAS|  176069|
|    LVG|  164176|
|    FTL|  163759|
|    DAL|  160198|
|    BOS|  140514|
|    SEA|   99822|
|    DET|   74190|
|    SAI|   56093|
|    PHI|   51867|
+-------+--------+
only showing top 20 rows



In order for this information to be useful I needed to translate the `i94port` codes into something we can use to identify the port. This information is provided in the `I94_SAS_Labels_Descriptions.SAS` file, but I decided to use the cleaner source of CBP codes mentioned in the first section.

In [49]:
df_cbp_codes.createOrReplaceTempView("df_cbp_codes")

In [50]:
spark.sql("""\
SELECT d.i94port, c.code, c.municipality, c.state
FROM i94_data d
JOIN df_cbp_codes c ON d.i94port = c.code
WHERE d.i94mode == 1.0
LIMIT 10
""").show()

+-------+----+------------+-----+
|i94port|code|municipality|state|
+-------+----+------------+-----+
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
|    BGM| BGM|      Bangor|   ME|
+-------+----+------------+-----+



This was still not very useful, as we're interested in international airports. Sadly, the i94 data does not include airport identification codes, so I assumed that the entry (at least by air) was made at a large airport. With this in mind, I could combine the `i94port` with the airport codes using the municipality and state as identifiers.

In [51]:
df_airports.createOrReplaceTempView("df_airports")

In [52]:
spark.sql("""\
SELECT d.i94port, c.code, c.municipality, c.state, a.name
FROM i94_data d
JOIN df_cbp_codes c ON d.i94port = c.code
JOIN df_airports A ON c.municipality = a.municipality AND c.state = a.state
WHERE d.i94mode == 1.0
LIMIT 10
""").toPandas()

Unnamed: 0,i94port,code,municipality,state,name
0,BGM,BGM,Bangor,ME,Bangor International Airport
1,BGM,BGM,Bangor,ME,Bangor International Airport
2,BGM,BGM,Bangor,ME,Bangor International Airport
3,BGM,BGM,Bangor,ME,Bangor International Airport
4,BGM,BGM,Bangor,ME,Bangor International Airport
5,BGM,BGM,Bangor,ME,Bangor International Airport
6,BGM,BGM,Bangor,ME,Bangor International Airport
7,BGM,BGM,Bangor,ME,Bangor International Airport
8,BGM,BGM,Bangor,ME,Bangor International Airport
9,BGM,BGM,Bangor,ME,Bangor International Airport


At this point I had a table of i94 records of air travellers linked to a specific airport.

#### Country of origin

I needed to map an entry record with their country and (hopefully) their native language. I used the i94 data set `i94cit` column combined with the information in the description file about the field (stored in `df_i94cntyl`) and the language information. I could have just as easily used the `i94res` column as the source.

In [53]:
df_i94_data.select([F.count(F.when((F.col("i94cit") == "") | F.isnull("i94cit"), True))]).show()

+----------------------------------------------------------------+
|count(CASE WHEN ((i94cit = ) OR (i94cit IS NULL)) THEN true END)|
+----------------------------------------------------------------+
|                                                               0|
+----------------------------------------------------------------+



There is no missing data in the `i94cit` column, so I could safely join with the `df_i94cntyl` and `df_countries` tables.

In [54]:
df_i94cntyl.createOrReplaceTempView("df_i94cntyl")

In [55]:
df_countries.createOrReplaceTempView("df_countries")

In [56]:
spark.sql("""\
SELECT d.i94cit, c1.description, c2.languages
FROM i94_data d
LEFT JOIN df_i94cntyl c1 ON d.i94cit = c1.value
LEFT JOIN df_countries c2 ON LOWER(c1.description) = LOWER(c2.name)
WHERE d.i94mode == 1.0
LIMIT 10
""").toPandas()

Unnamed: 0,i94cit,description,languages
0,254.0,,
1,101.0,ALBANIA,sqi
2,101.0,ALBANIA,sqi
3,101.0,ALBANIA,sqi
4,101.0,ALBANIA,sqi
5,101.0,ALBANIA,sqi
6,101.0,ALBANIA,sqi
7,101.0,ALBANIA,sqi
8,101.0,ALBANIA,sqi
9,101.0,ALBANIA,sqi


There were entries without country information, so I had to create a specific entry in the dimension tables for those cases.
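
One way to picture that fallback entry is sketched below, assuming a placeholder label of `UNKNOWN`. The label and the `country_for` helper are hypothetical and not part of the final model.

```python
UNKNOWN_COUNTRY = "UNKNOWN"  # hypothetical placeholder for the dimension table


def country_for(i94cit, labels):
    """Look up the description for an i94cit code, falling back to UNKNOWN."""
    if i94cit is None:
        return UNKNOWN_COUNTRY
    key = str(int(i94cit))  # 101.0 -> "101", matching the string codes in the CSV
    return labels.get(key, UNKNOWN_COUNTRY)


labels = {"101": "ALBANIA", "236": "AFGHANISTAN"}
known = country_for(101.0, labels)
missing = country_for(254.0, labels)  # code without a description, as in the output above
```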

#### Age

I used `biryear` to determine the age of the passenger.

In [57]:
df_i94_data.select([F.count(F.when(F.isnan("biryear") | F.isnull("biryear"), True))]).show()

+--------------------------------------------------------------------+
|count(CASE WHEN (isnan(biryear) OR (biryear IS NULL)) THEN true END)|
+--------------------------------------------------------------------+
|                                                                1406|
+--------------------------------------------------------------------+



There are missing entries, so I had to take this into account when designing the model. The goal was to assign each passenger to a "bucket" based on their age, so the transport company can better accommodate their needs.

* Child (`CH`): Less than 12 years old (age < 12)
* Teenager (`TE`): Between 12 and 18 years old (12 <= age < 18)
* Adult (`AD`): Between 18 and 65 years old (18 <= age <= 65)
* Older Adult (`OA`): Over 65 years old (65 < age)
* Unknown (`UN`)
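
The buckets above can be sketched as a small function. This is an illustration under a few assumptions: age is approximated as arrival year minus birth year, the adult bucket includes age 65, and negative ages are treated as unknown. `age_bucket` is a hypothetical name.

```python
def age_bucket(biryear, arrival_year):
    """Map a birth year to CH, TE, AD, OA, or UN (missing or implausible)."""
    if biryear is None or biryear != biryear:  # None or NaN
        return "UN"
    age = arrival_year - int(biryear)
    if age < 0:
        return "UN"
    if age < 12:
        return "CH"
    if age < 18:
        return "TE"
    if age <= 65:
        return "AD"
    return "OA"


bucket = age_bucket(1979.0, 2016)
```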

#### Gender

I used `gender` to determine the gender of the passenger.

In [58]:
df_i94_data.select([F.count(F.when((F.col("gender") == "") | F.isnull("gender"), True))]).show()

+----------------------------------------------------------------+
|count(CASE WHEN ((gender = ) OR (gender IS NULL)) THEN true END)|
+----------------------------------------------------------------+
|                                                          916875|
+----------------------------------------------------------------+



In [59]:
df_i94_data.select("gender").distinct().show()

+------+
|gender|
+------+
|     F|
|  null|
|     M|
|     U|
|     X|
+------+



There are a lot of records with missing gender information, so we can't trust this value entirely. I couldn't find a definition for the gender values, so I assumed the following:

* M: Male
* F: Female
* U: Unknown
* X: Other
* null: Missing value
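
As a sketch of how this assumed mapping could be applied, the snippet below translates raw codes into labels. Treating empty or unexpected codes the same way as `null` is an extra assumption, and `gender_label` is a hypothetical helper.

```python
GENDER_LABELS = {"M": "Male", "F": "Female", "U": "Unknown", "X": "Other"}


def gender_label(code):
    """Translate a raw gender code; None, empty, or unexpected values count as missing."""
    if code is None:
        return "Missing value"
    return GENDER_LABELS.get(code.strip().upper(), "Missing value")


label = gender_label("M")
```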

#### World Temperature Data

The first thing I need to do is check whether we have data available for 2016, as that's the only year available for the immigration data. Before I can do that, I need to convert the `dt` column from a string to a date.

In [60]:
df_temperature = df_temperature.withColumn("dt", F.expr("to_date(dt)"))

In [61]:
df_temperature.limit(5).toPandas()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,6.068,1.737,Århus,Denmark,57.05N,10.33E
1,1743-12-01,,,Århus,Denmark,57.05N,10.33E
2,1744-01-01,,,Århus,Denmark,57.05N,10.33E
3,1744-02-01,,,Århus,Denmark,57.05N,10.33E
4,1744-03-01,,,Århus,Denmark,57.05N,10.33E


In [62]:
df_temperature.printSchema()

root
 |-- dt: date (nullable = true)
 |-- AverageTemperature: string (nullable = true)
 |-- AverageTemperatureUncertainty: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- Latitude: string (nullable = true)
 |-- Longitude: string (nullable = true)



In [63]:
df_temperature.select([
    F.count(
        F.when(
            (F.col("dt") >= date(2016, 1, 1)) & (F.col("dt") < date(2017, 1, 1)),
            True
        )
    )
]).show()

+---------------------------------------------------------------------------------------+
|count(CASE WHEN ((dt >= DATE '2016-01-01') AND (dt < DATE '2017-01-01')) THEN true END)|
+---------------------------------------------------------------------------------------+
|                                                                                      0|
+---------------------------------------------------------------------------------------+



Since there was no data available for the specific time period that we're looking for, we're going to take the weekly average of the last five years available, for cities within the US. This is far from ideal, but I wanted to make use of the provided data.

In [64]:
max_year = df_temperature.select([
    F.max(
        F.year(F.col("dt"))
    ).alias('max_year')
]).collect()[0].max_year
print(max_year)

2013


In [65]:
df_temperature\
    .where(F.col("Country") == "United States")\
    .where((F.year(F.col("dt")) > max_year - 5) & (F.year(F.col("dt")) <= max_year))\
    .groupBy([F.col("City"), F.weekofyear(F.col("dt")).alias("WeekOfYear")]).agg(F.mean("AverageTemperature").alias("AverageTemperature"))\
    .orderBy("City", "WeekOfYear").show()

+-------+----------+------------------+
|   City|WeekOfYear|AverageTemperature|
+-------+----------+------------------+
|Abilene|         1| 6.333500000000001|
|Abilene|         5|              7.81|
|Abilene|         9|13.662799999999999|
|Abilene|        13|19.519666666666666|
|Abilene|        14|           16.5495|
|Abilene|        17|           22.3795|
|Abilene|        18|            22.968|
|Abilene|        22|          28.87575|
|Abilene|        23|27.691999999999997|
|Abilene|        26|            29.511|
|Abilene|        27|27.887999999999998|
|Abilene|        30|29.221999999999998|
|Abilene|        31|           29.5425|
|Abilene|        35|          24.82625|
|Abilene|        36|            22.424|
|Abilene|        39|             18.38|
|Abilene|        40|              16.1|
|Abilene|        44|           12.6795|
|Abilene|        48| 6.989333333333334|
|Abilene|        49|             3.908|
+-------+----------+------------------+
only showing top 20 rows



I decided to categorize the temperature as follows:

* Very Cold (`VC`): average temperature < 5
* Cold (`CO`): 5 <= average temperature < 15
* Mild (`MI`): 15 <= average temperature < 25
* Hot (`HO`): 25 <= average temperature < 35
* Very Hot (`VH`): 35 <= average temperature
* Unknown (`UN`)

I had to include an `Unknown` category as there were gaps in the data set. Notice that there's no state associated with the city, so I assumed the city names to be unique.
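The bucketing rule above can be written down as a small function. This is only a plain-Python sketch of the thresholds listed (the function name `temperature_bucket` is hypothetical, and treating a missing or NaN average as `UN` is an assumption); in the Spark pipeline the same logic would more likely be expressed as a chain of column conditions.

```python
import math
from typing import Optional


def temperature_bucket(avg_temp: Optional[float]) -> str:
    """Map an average temperature to the two-letter bucket code."""
    # Assumption: gaps in the data show up as None or NaN.
    if avg_temp is None or math.isnan(avg_temp):
        return "UN"
    if avg_temp < 5:
        return "VC"  # Very Cold
    if avg_temp < 15:
        return "CO"  # Cold
    if avg_temp < 25:
        return "MI"  # Mild
    if avg_temp < 35:
        return "HO"  # Hot
    return "VH"      # Very Hot
```

Each lower bound is inclusive, so a value of exactly 15 falls into `MI`, matching the list above.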

#### UNData - National accounts

I'm only interested in the GDP per capita of each country for the year 2017 (the closest to the available I94 data set).

In [66]:
df_national_accounts.limit(5).toPandas()

|   | Region/Country/Area | _c1 | Year | Series | Value | Footnotes | Source |
|---|---------------------|-----|------|--------|-------|-----------|--------|
| 0 | 1 | Total, all countries or areas | 1995 | GDP in current prices (millions of US dollars) | 31140783 |  | United Nations Statistics Division, New York, ... |
| 1 | 1 | Total, all countries or areas | 2005 | GDP in current prices (millions of US dollars) | 47623151 |  | United Nations Statistics Division, New York, ... |
| 2 | 1 | Total, all countries or areas | 2010 | GDP in current prices (millions of US dollars) | 66272559 |  | United Nations Statistics Division, New York, ... |
| 3 | 1 | Total, all countries or areas | 2015 | GDP in current prices (millions of US dollars) | 74985744 |  | United Nations Statistics Division, New York, ... |
| 4 | 1 | Total, all countries or areas | 2017 | GDP in current prices (millions of US dollars) | 81056929 |  | United Nations Statistics Division, New York, ... |


In [67]:
df_national_accounts = df_national_accounts\
    .where(~F.col("Region/Country/Area").isin(1, 2, 15, 202, 14, 17, 18, 11, 19, 21, 419, 29, 13, 5, 142, 143, 30, 35, 34, 145, 150, 151, 154, 39, 155, 9, 53, 54, 57, 61))\
    .where(F.col("Series") == "GDP per capita (US dollars)")\
    .where(F.col("Year") == 2017)\
    .select([
        F.col("_c1").alias("country"),
        F.udf(lambda s: s.replace(",", ""))(F.col("Value")).cast("double").alias("gdp_per_capita")
    ])

In [68]:
df_national_accounts.limit(5).toPandas()

|   | country     | gdp_per_capita |
|---|-------------|----------------|
| 0 | Afghanistan | 513.0          |
| 1 | Albania     | 4514.0         |
| 2 | Algeria     | 4110.0         |
| 3 | Andorra     | 38963.0        |
| 4 | Angola      | 4096.0         |


In [69]:
df_national_accounts.printSchema()

root
 |-- country: string (nullable = true)
 |-- gdp_per_capita: double (nullable = true)



In [70]:
df_national_accounts.count()

212

In [71]:
df_national_accounts.select([F.count(F.when(F.isnan("gdp_per_capita") | F.isnull("gdp_per_capita"), True))]).show()

+----------------------------------------------------------------------------------+
|count(CASE WHEN (isnan(gdp_per_capita) OR (gdp_per_capita IS NULL)) THEN true END)|
+----------------------------------------------------------------------------------+
|                                                                                 0|
+----------------------------------------------------------------------------------+



## Step 3: Define the Data Model

I used [dbdiagram.io](https://dbdiagram.io) to generate the following warehouse DB schema diagram.

In [72]:
Image(url= "./images/Database Schema.png", width=800, height=800)

### dimDate

A dimension to represent a date (a specific point in time with daily granularity).

| Column       | Type | PK | Description |
|--------------|------|:--:|-------------|
| date_id      | int  | Y  | An integer with format `YYYYMMDD` that identifies a record of the dimension |
| year         | int  |    | Year |
| month        | int  |    | Month |
| day          | int  |    | Day of month |
| day_of_week  | int  |    | An integer between 1 (Sunday) and 7 (Saturday) that represents the day of the week |
| week_of_year | int  |    | Week of the year |

This table will be generated without using any data sources.

```python
end = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=30)
start = end - timedelta(days=3660)

dim_date_df = spark.sql(
    f"SELECT SEQUENCE({int(start.timestamp())}, {int(end.timestamp())}, 86400) AS date_timestamp")\
    .withColumn("date_timestamp", F.explode("date_timestamp"))\
    .select(F.to_date(F.col("date_timestamp").cast("timestamp")).alias("date"))\
    .withColumn("date_id", F.date_format(F.col("date"), "yyyyMMdd").cast("int"))\
    .withColumn("year", F.year(F.col("date")).cast("short"))\
    .withColumn("month", F.month(F.col("date")).cast("short"))\
    .withColumn("day", F.dayofmonth(F.col("date")).cast("short"))\
    .withColumn("day_of_week", F.dayofweek(F.col("date")).cast("short"))\
    .withColumn("week_of_year", F.weekofyear(F.col("date")).cast("short"))\
    .drop("date")\
    .orderBy("date_id")
```
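As a quick sanity check of the columns described in the table, a single row of the dimension can also be derived with the standard library alone. This is a sketch, not part of the ETL scripts: the helper name `date_dimension_row` is hypothetical. It mirrors Spark's `dayofweek` convention (1 = Sunday) and relies on `weekofyear` following ISO 8601 week numbering.

```python
from datetime import date


def date_dimension_row(d: date) -> dict:
    """Build one dimDate-style record for the given calendar day."""
    return {
        # YYYYMMDD as an integer, e.g. 20160401
        "date_id": d.year * 10000 + d.month * 100 + d.day,
        "year": d.year,
        "month": d.month,
        "day": d.day,
        # isoweekday() is 1 (Monday) .. 7 (Sunday); shift so Sunday becomes 1
        "day_of_week": d.isoweekday() % 7 + 1,
        # ISO 8601 week number
        "week_of_year": d.isocalendar()[1],
    }


print(date_dimension_row(date(2016, 4, 1)))
```

Comparing a few such rows against the Spark output is a cheap way to confirm that `date_id` really encodes year, month and day of month.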

### dimCountry

A dimension to represent the country of origin of the passenger.

| Column         | Type      | PK | Description |
|----------------|-----------|:--:|-------------|
| country_id     | char      | Y  | The ISO identifier of the country |
| name           | varchar   |    | Name of the country |
| languages      | varchar   |    | A list of languages spoken |
| gdp_per_capita | double    |    | GDP per capita |

This table is generated by combining the dataframes that hold the countries, their languages, and the economic data.

```python
dim_country_df = countries_df.select(["alpha3", "name", "languages"])\
    .withColumnRenamed("alpha3", "country_id")\
    .withColumn("languages", F.explode(F.col("languages")))\
    .join(
        languages_df,
        F.col("languages") == languages_df.alpha3,
        "left"
    ).select(
        "country_id",
        countries_df.name,
        languages_df.name.alias("languages")
    ).groupBy("country_id").agg(
        F.first("name").alias("name"),
        F.collect_list("languages").alias("languages")
    ).join(
        national_accounts_df,
        F.lower(F.col("name")) == F.lower(national_accounts_df.country),
        "left"
    ).withColumn("gdp_per_capita", F.coalesce("gdp_per_capita", F.lit(0)))\
    .withColumn("languages", F.udf(lambda languages: ",".join(languages) if languages else "Unknown")(F.col("languages")))\
    .drop("country")
```

### dimAirport

A dimension to represent the airport at which the passenger arrived.

| Column       | Type      | PK | Description |
|--------------|-----------|:--:|-------------|
| airport_id   | char      | Y  | The 4 character airport identifier |
| name         | varchar   |    | Name of the airport |
| municipality | varchar   |    | Municipality/city |
| state        | char(2)   |    | State |

This table is constructed just using the airport information with minimal changes:

```python
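# AIRPORT_CODES_FILE and municipality_state_window are defined elsewhere in the ETL scripts.
# Assumption: the window partitions by municipality and state, so one airport is kept per city.
airports_df = spark.read.csv(AIRPORT_CODES_FILE, header=True)\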
    .select(["ident", "name", "iso_country", "iso_region", "municipality", "type", "iata_code"])\
    .filter(F.col("iso_country") == "US")\
    .filter(F.col("type") == "large_airport")\
    .filter(F.col("iata_code").isNotNull())\
    .withColumn("state", F.udf(lambda s: re.sub(r"US-", "", s))(F.col("iso_region")))\
    .withColumn("row_number", F.row_number().over(municipality_state_window))\
    .filter(F.col("row_number") == 1)\
    .withColumnRenamed("ident", "airport_id")\
    .drop("iata_code", "iso_country", "iso_region", "row_number", "type")
```

### factIngress

This fact table represents the ingress of a person into the United States.

| Column             | Type     | PK | Description |
|--------------------|----------|:--:|-------------|
| ingress_id         | char(36) | Y  | Unique identifier for the fact record |
| date_id            | int      |    | Date of ingress |
| country_id         | char(3)  |    | Country of origin |
| airport_id         | char(4)  |    | Airport |
| gender             | char     |    | Gender. Possible values:<br><ul><li>`M` Male</li><li>`F` Female</li><li>`X` Other</li><li>`U` Unknown/Missing</li></ul> |
| age_bucket         | char(2)  |    | Age bucket. Possible values:<br><ul><li>`CH` Child - Age under 12</li><li>`TE` Teenager - Age between 12 and 18</li><li>`AD` Adult - Age between 18 and 65.</li><li>`OA` Older Adult - Age over 65</li><li>`UN` Unknown</li></ul> |
| temperature_bucket | char(2)  |    | Average weekly temperature bucket. Possible values: <br><ul><li>`VC` Very cold - Temperature below 5C</li><li>`CO` Cold - Temperature between 5C and 15C</li><li>`MI` Mild - Temperature between 15C and 25C</li><li>`HO` Hot - Temperature between 25C and 35C</li><li>`VH` Very Hot - Temperature over 35C</li><li>`UN` Unknown/Missing</li></ul> |

This is the most complex table to generate, as it incorporates information from most of the available dataframes:

```python
import uuid

fact_ingress_df = i94_data_df.join(
        cbp_codes_df,
        F.col("i94port") == cbp_codes_df.code
    ).join(
        airports_df,
        (cbp_codes_df.state == airports_df.state) & (cbp_codes_df.municipality == airports_df.municipality) 
    ).join(
        temperatures_df,
        (F.lower(cbp_codes_df.municipality) == F.lower(temperatures_df.city))
        & (F.weekofyear(F.col("arrdate")) == temperatures_df.week_of_year),
        "left"
    ).withColumn("temperature_bucket", F.coalesce(temperatures_df.temperature_bucket, F.lit("UN")))\
    .join(
        i94cntyl_df,
        F.col("i94cit") == i94cntyl_df.value
    ).join(
        countries_df,
        F.lower(i94cntyl_df.description) == F.lower(countries_df.name)
    ).withColumn("ingress_id", F.udf(lambda: str(uuid.uuid4()))())\
    .select(
        "ingress_id",
        "date_id",
        countries_df.alpha3.alias("country_id"),
        airports_df.airport_id.alias("airport_id"),
        "gender",
        "age_bucket",
        "temperature_bucket"
    )
```
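The snippet above selects `gender` and `age_bucket` as ready-made columns. A minimal sketch of how those two values could be derived from the raw fields is shown below. The function names are hypothetical, and the handling of the boundaries is an assumption (each lower bound is inclusive, so 12 is `TE`, 18 is `AD` and 65 is `OA`), since the table does not say which bucket the boundary ages belong to.

```python
from typing import Optional


def age_bucket(age: Optional[float]) -> str:
    """Map an age in years to the two-letter bucket code."""
    if age is None or age < 0:
        return "UN"  # Unknown
    if age < 12:
        return "CH"  # Child
    if age < 18:
        return "TE"  # Teenager
    if age < 65:
        return "AD"  # Adult
    return "OA"      # Older Adult


def normalize_gender(value: Optional[str]) -> str:
    """Keep the known codes M, F and X; everything else becomes U."""
    if value is None:
        return "U"
    code = value.strip().upper()
    return code if code in {"M", "F", "X"} else "U"
```

Both helpers are pure functions, so they can be unit tested without a Spark session.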

## Step 4: Run ETL to Model the Data

### Introduction

As I mentioned in the first section, the main factor behind the technical decisions on the ETL process was cost efficiency. The second factor was data availability. My main source of information is the I94 data, which is not publicly available, and I'm not sure if I'm allowed to copy or transfer the information outside of the context of Udacity's cloud.

These are the steps involved in the ETL process (a more detailed explanation will be embedded in the Python scripts):

1. Ensure the runtime environment is configured as expected:
   * Verify AWS basic configuration.
   * Verify access to target S3 bucket, create it if needed.
   * Verify access to Redshift cluster, create it if needed.
   * Ensure the warehouse schema exists, create it if needed.
1. Fetch all the information that is not available in the workspace and has no straightforward way of being consumed directly through an API.
1. With Spark, load the information into DataFrames, filtering, adapting, and/or transforming it as appropriate for each case.
1. Perform minor data health checks on the loaded DataFrames.
1. Construct the dimension and fact tables from the source material.
1. Perform minor data health checks on the generated tables.
1. Save the generated tables as parquet files using the appropriate modes and partitions.
1. Upload the parquet files onto S3.
1. Copy the information from the dimension tables parquet files into the warehouse dimension tables.
1. Perform data health checks on the dimension tables.
1. Copy the information from the fact tables parquet files into the warehouse fact tables.
1. Perform data health checks on the fact tables.

These steps should ensure that the cost incurred on my AWS account is minimal. The extra health checks are there to halt the process before any AWS services are consumed.
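To illustrate the idea of halting early, here is a minimal sketch of a health check that raises as soon as a table looks wrong. It is not the code used by the ETL scripts: the names `HealthCheckError` and `check_table` are hypothetical, and it works on plain lists of dictionaries rather than on Spark DataFrames so that it stays self-contained.

```python
from typing import Iterable, Mapping, Sequence


class HealthCheckError(Exception):
    """Raised when a table fails a data health check."""


def check_table(
    name: str,
    rows: Iterable[Mapping[str, object]],
    key_columns: Sequence[str],
    min_rows: int = 1,
) -> int:
    """Fail fast if the table is too small or a key column has missing values."""
    materialized = list(rows)
    if len(materialized) < min_rows:
        raise HealthCheckError(
            f"{name}: expected at least {min_rows} row(s), found {len(materialized)}"
        )
    for column in key_columns:
        missing = sum(1 for row in materialized if row.get(column) is None)
        if missing:
            raise HealthCheckError(f"{name}: {missing} row(s) with missing '{column}'")
    # Return the row count so the caller can log it.
    return len(materialized)
```

Because the check raises an exception, a caller that runs it before the upload step never reaches S3 or Redshift when the data is unhealthy.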

### Running the scripts

Due to the restrictions in accessing the i94 data set, the ETL scripts need to be run within the context of a Udacity workspace (or an equivalent environment). The user needs to set the appropriate values on `etl.cfg`:

```
[AWS]
region = us-west-2
aws_access_key_id = 
aws_secret_access_key = 

[S3]
output_bucket = nd027-capstone-project

[CLUSTER]
cluster_type = single-node
num_nodes = 4
node_type = dc2.large
identifier = capstone-project-cluster
host = 
db_name = capstone-project
db_user = etluser
db_password = 
db_port = 5439

[IAM_ROLE]
role_name = EtlRole
arn = 
```
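Since the credentials in this file have to be filled in by hand, an incomplete `etl.cfg` is an easy mistake to make. The sketch below shows one way such a file could be checked with the standard library `configparser`. The helper name `missing_mandatory_fields` is hypothetical and is not taken from the project's scripts.

```python
import configparser

# (section, key) pairs that must be filled in by the user.
MANDATORY_FIELDS = [
    ("AWS", "aws_access_key_id"),
    ("AWS", "aws_secret_access_key"),
    ("CLUSTER", "db_password"),
]


def missing_mandatory_fields(config: configparser.ConfigParser) -> list:
    """Return 'SECTION.key' for every mandatory field that is absent or blank."""
    missing = []
    for section, key in MANDATORY_FIELDS:
        value = config.get(section, key, fallback="")
        if not value.strip():
            missing.append(f"{section}.{key}")
    return missing


# interpolation=None keeps '%' characters in passwords from being interpreted.
sample = configparser.ConfigParser(interpolation=None)
sample.read_string(
    "[AWS]\n"
    "region = us-west-2\n"
    "aws_access_key_id = \n"
    "aws_secret_access_key = \n"
    "[CLUSTER]\n"
    "db_password = \n"
)
print(missing_mandatory_fields(sample))
```

For a real run, `sample.read("etl.cfg")` would replace the inline string.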

The mandatory fields are `aws_access_key_id`, `aws_secret_access_key`, and `db_password`. Once everything is in place the user needs to run:

```
python etl_prepare.py
```

This should generate output similar to:

```
root@968a6805982f:/home/workspace# python etl_prepare.py 
Validating configuration...
Preparing runtime environment...
Updating IAM configuration.
Creating cluster 'capstone-project-cluster'...
Waiting for cluster 'capstone-project-cluster' to become available...
{'ClusterIdentifier': 'capstone-project-cluster', 'NodeType': 'dc2.large', 'ClusterStatus': 'available', 'MasterUsername': 'etluser', 'DBName': 'capstone-project', 'Endpoint': {'Address': 'capstone-project-cluster.c3ypjoewlvec.us-west-2.redshift.amazonaws.com', 'Port': 5439}, 'ClusterCreateTime': datetime.datetime(2021, 12, 21, 10, 49, 15, 229000, tzinfo=tzlocal()), 'AutomatedSnapshotRetentionPeriod': 1, 'ClusterSecurityGroups': [], 'VpcSecurityGroups': [{'VpcSecurityGroupId': 'sg-0549c4695373f931b', 'Status': 'active'}], 'ClusterParameterGroups': [{'ParameterGroupName': 'default.redshift-1.0', 'ParameterApplyStatus': 'in-sync'}], 'ClusterSubnetGroupName': 'default', 'VpcId': 'vpc-0ade1658cab5cef52', 'AvailabilityZone': 'us-west-2d', 'PreferredMaintenanceWindow': 'fri:11:30-fri:12:00', 'PendingModifiedValues': {}, 'ClusterVersion': '1.0', 'AllowVersionUpgrade': True, 'NumberOfNodes': 1, 'PubliclyAccessible': True, 'Encrypted': False, 'ClusterPublicKey': 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCgxdOqd6dAmYtB3v7Y9wbvdF41Wwy95qkJtIebEHXyoYA11rw75oPGFTccOB6ROHITuhkHhu5qV99u40ruLJoEqiqOu9e0qziEsPG3PCuJQzX3SALTA3OlEYOUMEFnfOdfvuAWHdURb26BWRhKf6gUDZIUWopVk50KeteaDiasTTfY7Q90F3AHjw6X6tf/CMxftJs5cIxUWAhM2B8yFFFyQ73p7MqwOUsGqaZ3o0nJZhqnrsxoHbPtpgvFumdnoX1JxCgtSAqHJiz7fSgq5xxjhX0x1sf/wqDzUaZOCpqc+DWqEmYEECk72Ole7k6549Ddpye+8XXrtDBT5rPUIWJx Amazon-Redshift\n', 'ClusterNodes': [{'NodeRole': 'SHARED', 'PrivateIPAddress': '172.31.48.153', 'PublicIPAddress': '54.212.105.106'}], 'ClusterRevisionNumber': '34272', 'Tags': [], 'EnhancedVpcRouting': False, 'IamRoles': [{'IamRoleArn': 'arn:aws:iam::239473144879:role/EtlRole', 'ApplyStatus': 'in-sync'}], 'MaintenanceTrackName': 'current'}
Updating security group to allow external access to cluster 'capstone-project-cluster'.
Updating Redshift configuration.
Creating dimenension and tables...
Done.
```

This script creates the necessary components on AWS and updates `etl.cfg` to match the runtime configuration. It takes a while, as it needs to wait until the Redshift cluster is available. Once it finishes, the ETL process is run with:

```
python etl_run.py
```

This should generate output similar to:

```
root@968a6805982f:/home/workspace# python etl_run.py 
Validating configuration...
Checking runtime environment...
Running ETL process environment...
https://repos.spark-packages.org/ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.4.3-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
saurfang#spark-sas7bdat added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-521f8a68-b9a0-4dbe-9e82-c279329fb7a7;1.0
        confs: [default]
        found saurfang#spark-sas7bdat;2.0.0-s_2.11 in repo-1
        found com.epam#parso;2.0.8 in central
        found org.slf4j#slf4j-api;1.7.5 in central
        found org.apache.logging.log4j#log4j-api-scala_2.11;2.7 in central
        found org.scala-lang#scala-reflect;2.11.8 in central
:: resolution report :: resolve 708ms :: artifacts dl 27ms
        :: modules in use:
        com.epam#parso;2.0.8 from central in [default]
        org.apache.logging.log4j#log4j-api-scala_2.11;2.7 from central in [default]
        org.scala-lang#scala-reflect;2.11.8 from central in [default]
        org.slf4j#slf4j-api;1.7.5 from central in [default]
        saurfang#spark-sas7bdat;2.0.0-s_2.11 from repo-1 in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   5   |   0   |   0   |   0   ||   5   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-521f8a68-b9a0-4dbe-9e82-c279329fb7a7
        confs: [default]
        0 artifacts copied, 5 already retrieved (0kB/23ms)
21/12/21 11:03:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Generating staging PySpark dataframes...
Validating staging PySpark dataframes...                                                                                                                                                                                            
21/12/21 11:03:53 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.                                      
Generating dimension and fact tables parquet files...                                                                                                                                                                               
21/12/21 11:05:07 WARN CSVDataSource: CSV header does not conform to the schema.                                                                                                                                                    
 Header: Region/Country/Area, , Year, Series
 Schema: Region/Country/Area, _c1, Year, Series
Expected: _c1 but found: 
CSV file: file:///home/workspace/SYB64_230_202110_GDP_and_GDP_Per_Capita.csv
21/12/21 11:05:11 WARN CSVDataSource: CSV header does not conform to the schema.                                                                                                                                                    
 Header: Region/Country/Area, , Year, Series, Value
 Schema: Region/Country/Area, _c1, Year, Series, Value
Expected: _c1 but found: 
CSV file: file:///home/workspace/SYB64_230_202110_GDP_and_GDP_Per_Capita.csv
Uploading dimension and fact tables to S3...
Uploading 'parquet_files/dim_country/part-00129-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00041-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00192-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00005-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00103-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00083-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00128-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00096-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00161-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00059-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00194-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00065-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/_SUCCESS' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00147-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00117-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00002-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00028-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00135-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00182-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00089-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00099-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00082-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00119-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00107-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00140-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00159-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00168-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00058-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00175-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00163-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00110-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00026-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00088-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00013-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00095-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00183-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00086-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00143-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00038-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00130-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00170-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00030-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00137-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00154-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00191-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00071-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00094-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00155-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00166-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00104-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00171-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00185-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00033-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00053-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00179-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00034-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00116-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00067-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00190-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00087-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00069-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00172-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00008-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00019-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00077-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00073-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00133-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00156-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00072-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00152-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00113-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00180-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00039-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00126-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00132-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00031-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00153-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00092-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00196-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00091-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00001-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00162-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00050-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00138-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00108-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00049-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00109-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00079-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00136-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00076-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00056-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00165-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00112-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00134-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00097-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00051-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00036-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00046-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00007-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00016-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00176-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00045-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00122-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00081-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00193-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00157-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00189-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00074-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00004-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00098-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00198-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00015-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00125-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00048-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00186-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00006-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00012-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00055-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00070-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00100-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00148-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00029-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00027-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00145-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00111-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00057-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00164-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00044-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00101-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00025-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00142-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00149-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00146-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00118-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00127-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00062-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00032-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00187-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00158-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00000-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00068-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_country/part-00035-8e7ab829-bcc0-4a19-a02d-340391d2139b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00052-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00151-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00053-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00117-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00051-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00156-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00080-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00055-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00042-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/_SUCCESS' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00071-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00110-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00027-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00000-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00118-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00073-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00149-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00085-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00176-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00190-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00087-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00061-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00054-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00056-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00074-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00098-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00023-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00044-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00180-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00159-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00091-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00028-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00022-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00181-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00138-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00103-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00198-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00108-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00168-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00024-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00057-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00170-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00029-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00189-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00191-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00020-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00002-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00047-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00119-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00038-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00160-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00021-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00030-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00116-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00167-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00178-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00193-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00130-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00161-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00078-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00195-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00048-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00150-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00145-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00089-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00125-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00139-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00079-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00025-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00034-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00094-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00124-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00109-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00135-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00120-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00100-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00011-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00037-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00162-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00155-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00001-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00104-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00107-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00035-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00183-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00185-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00046-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00106-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00013-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00184-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00097-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00143-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00196-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00090-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00067-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00172-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00131-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00083-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00126-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00014-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00064-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00134-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00175-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00187-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00007-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00099-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00153-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00049-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00140-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00065-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00092-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00084-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00165-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00010-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00157-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00188-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00018-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00072-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00164-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00177-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00114-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00093-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00173-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00182-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00033-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00096-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00158-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00004-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00179-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00009-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00105-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00070-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00122-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00154-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00128-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00015-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00141-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00043-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00062-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00132-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00050-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00121-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00197-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00003-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00068-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/fact_ingress/part-00008-ba695105-2375-479e-96fc-4a31fb0edea4-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00084-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00122-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00188-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00135-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00115-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00131-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00041-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00161-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00012-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00160-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00022-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00117-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00093-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00046-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00173-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00129-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00073-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00107-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00062-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00108-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/_SUCCESS' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00137-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00127-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00158-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00026-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00015-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00189-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00186-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00168-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00056-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00144-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00139-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00049-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00065-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00166-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00167-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00195-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00010-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00153-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00083-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00054-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00198-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00019-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00130-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00086-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00181-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00102-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00159-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00121-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00048-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00197-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00082-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00068-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00036-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00169-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00105-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00025-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00177-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00007-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00199-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00170-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00157-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00109-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00097-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00076-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00044-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00133-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00094-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00162-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00148-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00191-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00087-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00058-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00141-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00003-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00174-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00128-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00053-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00064-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00051-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00008-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00089-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00085-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00164-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00050-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00060-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00070-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00031-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00088-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00030-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00071-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00179-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00114-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00059-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00001-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00079-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00021-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00098-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00192-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00075-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00045-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00113-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00136-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00185-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00069-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00091-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00123-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00134-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00178-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00032-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00040-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00182-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00119-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00156-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00118-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00018-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00020-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00017-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00011-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00147-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00180-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00055-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00101-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00096-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00039-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00002-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00004-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00138-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00116-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00067-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00163-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00077-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00112-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00078-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00033-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00125-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00013-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00183-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00124-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00090-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00024-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00029-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00154-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00126-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00145-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00057-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00000-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00132-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00047-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00006-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00042-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00035-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00152-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00016-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00063-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00028-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00140-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00005-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00080-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00100-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00149-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00190-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00150-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00034-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00023-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00176-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00194-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00043-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00081-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00009-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00142-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00037-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00196-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00171-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00038-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00120-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00187-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00095-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00175-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00103-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00184-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00111-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00061-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00066-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00104-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00110-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00143-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00072-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00052-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00092-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00099-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00014-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00027-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00151-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00193-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00146-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00172-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00155-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00106-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00074-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_date/part-00165-a98356b6-62cd-4e63-aba6-9da6272ef35b-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00150-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00181-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00112-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00175-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00053-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00034-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00197-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00157-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/_SUCCESS' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00057-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00035-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00110-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00061-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00162-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00078-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00152-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00024-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00193-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00064-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00166-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00115-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00081-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00092-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00158-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00106-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00188-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00086-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00151-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00075-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00012-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00028-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00089-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00084-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00054-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00184-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00187-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00044-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00155-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00037-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00134-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00090-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00113-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00008-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00141-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00124-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00055-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00021-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00145-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00062-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00104-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00014-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00003-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00126-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00041-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00194-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00083-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00125-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00149-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00105-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00142-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00109-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00056-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00013-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00163-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00066-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00172-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00009-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00131-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00198-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00147-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00079-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00127-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00022-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00011-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00047-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00159-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00190-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00099-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00199-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00077-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00065-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00135-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00168-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00129-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00031-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00137-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00070-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00060-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00120-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00179-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00132-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00195-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00026-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00156-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00042-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00000-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00029-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00114-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00016-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00052-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00186-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00130-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00059-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00100-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00085-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00154-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Uploading 'parquet_files/dim_airport/part-00039-871a4d47-a25e-42e5-82d9-3726e29f78d5-c000.snappy.parquet' to 's3.Bucket(name='nd027-capstone-project')'
Truncating dimension tables...
Loading dimension tables...
Loading fact tables...
Validating dimansion and fact tables...
Done.
```

## Step 5: Complete Project Write Up

## Introduction

The previous sections described the journey that led me to this conclusion. The main driving forces behind most decisions were cost savings and data availability, hence the artificial limitations.

One of the major limitations was the accessibility of the I94 data. I couldn't find any publicly available source for it, so I was restricted to whatever was available through Udacity's workspace. If a public source were available, I could have loaded that information directly into Redshift, saving a lot of time in the process.

This also limited the tools available, as the only reasonable tool to manage the data was the Spark instance running within the Udacity workspace. Of course, I could have spun up an EMR cluster and loaded the information there, but that would have increased the cost dramatically.

The choice of Redshift came out of familiarity with the tool. A service that can handle the scale and lets me load directly from S3 is hard to ignore. I suppose I could have used any of AWS's relational databases, but there's also the matter of personal preference.

The model itself was shaped by the available data: for example, having to average the temperature to estimate the temperature during ingress, or having to join on cities and states instead of a well-defined key. The primary key of the fact table is an artifact of the method used to ingest the rows (Redshift's `COPY` command), since an autoincrementing key could not be used. I could have loaded the data into a staging table and then `SELECT`ed from it into the fact table, but that would have increased the run cost.
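
For reference, the staging-table alternative mentioned above could look roughly like the sketch below. The table names (`stage_immigration`, `fact_immigration`), the column names, the S3 prefix and the IAM role are placeholders for illustration, not the names used in this project. The sketch also assumes the fact table declares its key as an `IDENTITY` column, so that Redshift generates it during the `INSERT ... SELECT`.

```python
def staged_load_statements(stage_table, fact_table, columns, s3_prefix, iam_role):
    """Build the SQL for a COPY into a staging table followed by INSERT ... SELECT.

    Every name passed in is a hypothetical placeholder.
    """
    column_list = ", ".join(columns)
    return [
        # Start from an empty staging table on every run.
        f"TRUNCATE TABLE {stage_table};",
        # Bulk-load the Parquet files from S3 into the staging table.
        f"COPY {stage_table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' FORMAT AS PARQUET;",
        # Move the rows across; the fact table's IDENTITY key is left out
        # of the column list so that Redshift assigns it.
        f"INSERT INTO {fact_table} ({column_list}) SELECT {column_list} FROM {stage_table};",
    ]


statements = staged_load_statements(
    stage_table="stage_immigration",
    fact_table="fact_immigration",
    columns=["airport_id", "arrival_date", "passenger_count"],
    s3_prefix="s3://example-bucket/parquet_files/fact_immigration/",
    iam_role="arn:aws:iam::123456789012:role/example-redshift-role",
)
for statement in statements:
    print(statement)
```

The extra `INSERT ... SELECT` pass over the data is the added run cost that made me stick with a direct `COPY`.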

## Other scenarios

### The data was increased by 100x.

Since the data we're working with is related to immigration, with a 100x scale increase, the most pressing problem would be humanitarian rather than technological. That being said, I think a reasonable path would be to use EMR to prepare the data (with some careful partitioning) and a larger Redshift cluster. This speaks to the flexibility of the tools.

### The pipelines would be run on a daily basis by 7 am every day.

This would be an ideal case for Airflow to orchestrate the ETL process, but we would also need to change the methodology to avoid truncating the dimension tables and to ease the ingestion of the fact table rows. As it stands now, the pipeline is intended to be run once a year.
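
One way to avoid truncating the dimension tables on every run is to merge each daily batch through a staging table: delete the rows that are about to be replaced, then insert the new batch, all within one transaction. The sketch below only builds the SQL statements. The table and key names (`dim_airport`, `stage_dim_airport`, `airport_id`) are assumptions for illustration, and the staging table is assumed to have the same column layout as the target.

```python
def upsert_statements(target_table, stage_table, key_column):
    """Build a delete-then-insert merge from a staging table into a target table.

    The table and column names are hypothetical placeholders.
    """
    return [
        # Run both steps in one transaction so readers never see a half-merged table.
        "BEGIN;",
        # Remove the target rows that the new batch is going to replace.
        f"DELETE FROM {target_table} USING {stage_table} "
        f"WHERE {target_table}.{key_column} = {stage_table}.{key_column};",
        # Insert the full batch: updated rows plus rows that are entirely new.
        f"INSERT INTO {target_table} SELECT * FROM {stage_table};",
        "END;",
    ]


merge_sql = upsert_statements("dim_airport", "stage_dim_airport", "airport_id")
for statement in merge_sql:
    print(statement)
```

An Airflow task could then execute these statements once per dimension table after the day's files have landed in S3.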

### The database needed to be accessed by 100+ people.

It depends on the use case. For the most part, Redshift could handle the load, assuming it has been scaled appropriately. If that weren't enough, we could expose the most common metrics as static content.

## Conclusions

This capstone project was incredibly fun, fulfilling and... frustrating. The interesting part is that whenever I had a problem, there was something shown in a lecture, or presented as an exercise during the courses, that could be used. In the end, I learned a lot, and I gained knowledge that I have had the chance to apply at my current job.