# Entity Merge

In this notebook, we merge the `bird_airports` records with the `icao` and `state` values from the `faker_airports` table, wherever those values are present. The merged results are written to a new table, `airline_stg.Airport`.

In [None]:
%%bigquery
select * from airline_stg.bird_airports
order by code
limit 5

Unnamed: 0,code,name,city,state,country
0,01A,Afognak Lake Airport,Afognak Lake,AK,US
1,03A,Bear Creek Mining Strip,Granite Mountain,AK,US
2,04A,Lik Mining Camp,Lik,AK,US
3,05A,Little Squaw Airport,Little Squaw,AK,US
4,06A,Kizhuyak Bay,Kizhuyak,AK,US


In [None]:
%%bigquery
select * from airline_raw.faker_airports
limit 5

Unnamed: 0,airport,iata,icao,city,state,country,load_time
0,Tarapoto airport,TPP,,Tarapoto,San Martin,Peru,2024-01-22 02:03:23.692748+00:00
1,El Loa airport,CJC,,Calama,Antofagasta,Chile,2024-01-22 02:03:23.692748+00:00
2,La Florida airport,LSC,,Compañía Alta,Coquimbo,Chile,2024-01-22 02:03:23.692748+00:00
3,Hefei-Luogang airport,HFE,,Hefei,Anhui,China,2024-01-22 02:03:23.692748+00:00
4,Guizhou,KWE,,Guiyang,Guizhou,China,2024-01-22 02:03:23.692748+00:00


In [None]:
%%bigquery
select (select count(*) from airline_raw.faker_airports
        where icao is not null) as faker_icao_count,
        (select count(*) from airline_stg.bird_airports) as bird_count

Unnamed: 0,faker_icao_count,bird_count
0,326,6510


In [None]:
%%bigquery
  select count(*) as state_count
  from airline_stg.bird_airports b
  join airline_raw.faker_airports f
  on b.code = f.iata
  where f.state is not null
  and b.state is null

Unnamed: 0,state_count
0,319


Note: `faker_airports` has state values for 319 airports whose state is null in `bird_airports`. We will merge those values in the **state** section of this notebook.

# icao

Merge `faker_airports.icao` into the airport records to enrich them with the ICAO code, which `bird_airports` does not carry. Only a small subset of the records will be enriched (326 of 6510).

In [None]:
%%bigquery
  select b.*, f.icao from airline_stg.bird_airports b
  join airline_raw.faker_airports f
  on b.code = f.iata
  where f.icao is not null
  limit 5

Unnamed: 0,code,name,city,state,country,icao
0,THR,Mehrabad International,Tehran,,Iran,OIII
1,MCT,Muscat International,Muscat,,Oman,OOMS
2,JUL,Juliaca Airport,Juliaca,,Peru,SPJL
3,LIM,Jorge Chavez International,Lima,,Peru,SPIM
4,AQP,Rodriguez Ballon International,Arequipa,,Peru,SPQU


In [None]:
%%bigquery
  create or replace table airline_stg.Airport as
    select b.code as iata, f.icao as icao,
    b.name, b.city, b.state, b.country, b.data_source, b.load_time
    from airline_stg.bird_airports b
    left join airline_raw.faker_airports f
    on b.code = f.iata;

Note: the left join keeps every record from `airline_stg.bird_airports`, including those with no match in `faker_airports`. An inner join would drop them.
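
The effect of the join type can be seen on a toy dataset. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for BigQuery (an assumption made so the example runs anywhere); the two-row tables are illustrative, with values borrowed from the outputs above.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table bird_airports (code text, name text);
    create table faker_airports (iata text, icao text);
    insert into bird_airports values
        ('LIM', 'Jorge Chavez International'),
        ('01A', 'Afognak Lake Airport');
    insert into faker_airports values ('LIM', 'SPIM');
""")

# Left join: every bird_airports row survives; unmatched rows get a null icao.
left_rows = conn.execute("""
    select b.code, f.icao
    from bird_airports b
    left join faker_airports f on b.code = f.iata
    order by b.code
""").fetchall()

# Inner join: rows without a match in faker_airports are dropped.
inner_rows = conn.execute("""
    select b.code, f.icao
    from bird_airports b
    join faker_airports f on b.code = f.iata
    order by b.code
""").fetchall()

print(left_rows)   # [('01A', None), ('LIM', 'SPIM')]
print(inner_rows)  # [('LIM', 'SPIM')]
```

The row-count comparison later in this section (merged table versus `bird_airports`) checks the same property on the real tables.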




In [None]:
%%bigquery
  select * from airline_stg.Airport
  where icao is not null
  limit 5

Unnamed: 0,iata,icao,name,city,state,country,data_source,load_time
0,THR,OIII,Mehrabad International,Tehran,,Iran,bird,2024-01-26 22:23:29.069725+00:00
1,MCT,OOMS,Muscat International,Muscat,,Oman,bird,2024-01-26 22:23:29.069725+00:00
2,LIM,SPIM,Jorge Chavez International,Lima,,Peru,bird,2024-01-26 22:23:29.069725+00:00
3,AQP,SPQU,Rodriguez Ballon International,Arequipa,,Peru,bird,2024-01-26 22:23:29.069725+00:00
4,CIX,SPHI,Capt. Jose A. Quinones Gonzales International,Chiclayo,,Peru,bird,2024-01-26 22:23:29.069725+00:00


One remaining problem is that `data_source` should no longer be "bird" for the merged records. Update it to "bird_faker" to indicate that these records draw on both data sources:

In [None]:
%%bigquery
update airline_stg.Airport set data_source = 'bird_faker' where icao is not null;

In [None]:
%%bigquery
  select * from airline_stg.Airport
  where icao is not null
  limit 5

Unnamed: 0,iata,icao,name,city,state,country,data_source,load_time
0,STL,KSTL,St Louis Lambert International,St. Louis,MO,US,bird_faker,2024-01-26 22:23:29.069725+00:00
1,EWR,KEWR,Newark Liberty International,Newark,NJ,US,bird_faker,2024-01-26 22:23:29.069725+00:00
2,LGA,KLGA,LaGuardia,New York,NY,US,bird_faker,2024-01-26 22:23:29.069725+00:00
3,HOU,KHOU,William P Hobby,Houston,TX,US,bird_faker,2024-01-26 22:23:29.069725+00:00
4,SLC,KSLC,Salt Lake City International,Salt Lake City,UT,US,bird_faker,2024-01-26 22:23:29.069725+00:00


In [None]:
%%bigquery
  select count(*) icao_count
  from airline_stg.Airport
  where icao is not null

Unnamed: 0,icao_count
0,326


In [None]:
%%bigquery
  select (select count(*) from airline_stg.Airport) as merged_table_count,
  (select count(*) from airline_stg.bird_airports) as bird_table_count

Unnamed: 0,merged_table_count,bird_table_count
0,6510,6510


# state

Find all the `Airport` records with a null state whose state exists in `faker_airports`, then merge `faker_airports.state` into `Airport.state` to enrich those records. Only a small subset of the records will be enriched (319 of 6510).
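
One way to restrict the update to exactly those records is to guard it with an `exists` condition, so that rows which `faker_airports` cannot fill in keep their original `data_source`. The following is a minimal sketch under that assumption, again using in-memory SQLite as a stand-in for BigQuery; the three-row tables are illustrative only.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Airport (iata text, state text, data_source text);
    create table faker_airports (iata text, state text);
    insert into Airport values
        ('ACE', null, 'bird'),   -- null state, faker has a value
        ('THR', null, 'bird'),   -- null state, faker has no value either
        ('STL', 'MO', 'bird');   -- state already present, must not change
    insert into faker_airports values
        ('ACE', 'Canary Islands'),
        ('THR', null),
        ('STL', 'Missouri');
""")

# Fill in the state only where it is null AND faker_airports has one to offer.
conn.execute("""
    update Airport
    set state = (select distinct f.state from faker_airports f
                 where f.iata = Airport.iata and f.state is not null),
        data_source = 'bird_faker'
    where state is null
      and exists (select 1 from faker_airports f
                  where f.iata = Airport.iata and f.state is not null)
""")

rows = conn.execute(
    "select iata, state, data_source from Airport order by iata"
).fetchall()
print(rows)
```

Without the `exists` guard, the `THR` row would be relabeled even though its state stays null.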

In [None]:
%%bigquery
select a.*, f.state state_from_faker
from airline_raw.faker_airports f
join airline_stg.Airport a
on f.iata = a.iata
where f.state is not null
and a.state is null
order by iata
limit 5

Unnamed: 0,iata,icao,name,city,state,country,data_source,load_time,state_from_faker
0,ACE,GCRR,Lanzarote,Arrecife,,Spain,bird_faker,2024-01-26 22:23:29.069725+00:00,Canary Islands
1,ADZ,SKSP,Gustavo Rojas Pinilla,San Andres Island,,Colombia,bird_faker,2024-01-26 22:23:29.069725+00:00,San Andres y Providencia
2,AEP,SABE,Aeroparque Jorge Newbery,Buenos Aires,,Argentina,bird_faker,2024-01-26 22:23:29.069725+00:00,Ciudad de Buenos Aires
3,AER,URSS,Adler/Sochi Airport,Adler/Sochi,,Russia,bird_faker,2024-01-26 22:23:29.069725+00:00,Krasnodarskiy Kray
4,AGP,LEMG,Malaga Airport,Malaga,,Spain,bird_faker,2024-01-26 22:23:29.069725+00:00,Andalucia


In [None]:
%%bigquery
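  update airline_stg.Airport a set state =
    (select distinct f.state
     from airline_raw.faker_airports f
     where f.iata = a.iata
     and f.state is not null), data_source = 'bird_faker'
  -- only touch rows that faker_airports can actually fill in
  where a.state is null
  and exists (select 1 from airline_raw.faker_airports f
              where f.iata = a.iata
              and f.state is not null);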

In [None]:
%%bigquery
  select * from airline_stg.Airport
  where iata in ('ACE', 'ADZ', 'AEP', 'AER', 'AGP')

Unnamed: 0,iata,icao,name,city,state,country,data_source,load_time
0,AGP,LEMG,Malaga Airport,Malaga,Andalucia,Spain,bird_faker,2024-01-26 22:23:29.069725+00:00
1,ACE,GCRR,Lanzarote,Arrecife,Canary Islands,Spain,bird_faker,2024-01-26 22:23:29.069725+00:00
2,AER,URSS,Adler/Sochi Airport,Adler/Sochi,Krasnodarskiy Kray,Russia,bird_faker,2024-01-26 22:23:29.069725+00:00
3,ADZ,SKSP,Gustavo Rojas Pinilla,San Andres Island,San Andres y Providencia,Colombia,bird_faker,2024-01-26 22:23:29.069725+00:00
4,AEP,SABE,Aeroparque Jorge Newbery,Buenos Aires,Ciudad de Buenos Aires,Argentina,bird_faker,2024-01-26 22:23:29.069725+00:00


# Primary Key

In [None]:
%%bigquery
alter table airline_stg.Airport
  add primary key (iata) not enforced;

In [None]:
%%bigquery
select iata, count(*) duplicate_records
from airline_stg.Airport
group by iata
having count(*) > 1
order by count(*) desc

Unnamed: 0,iata,duplicate_records


# Foreign Keys

In [None]:
%%bigquery
alter table airline_stg.Flight add foreign key (origin)
  references airline_stg.Airport (iata) not enforced;

In [None]:
%%bigquery
select count(*) as orphan_records
from airline_stg.Flight
where origin not in (select iata from airline_stg.Airport)

Unnamed: 0,orphan_records
0,0


In [None]:
%%bigquery
alter table airline_stg.Flight add foreign key (dest)
  references airline_stg.Airport (iata) not enforced;

In [None]:
%%bigquery
select count(*) as orphan_records
from airline_stg.Flight
where dest not in (select iata from airline_stg.Airport)

Unnamed: 0,orphan_records
0,0
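
The orphan checks above rely on `not in`. Because the keys are declared `not enforced`, a null `iata` is possible in principle, and a null inside the subquery makes `not in` evaluate to null for every non-matching row, so the check would report zero orphans regardless. This may well not apply to the data here. The sketch below illustrates the behavior and a `not exists` alternative, using in-memory SQLite as a stand-in for BigQuery (an assumption) and hypothetical toy rows.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Airport (iata text);
    create table Flight (origin text);
    insert into Airport values ('LIM'), (null);  -- hypothetical null key
    insert into Flight values ('LIM'), ('XXX');  -- 'XXX' has no airport
""")

# NOT IN: the null in the subquery turns the predicate into null for 'XXX'.
not_in_count = conn.execute("""
    select count(*) from Flight
    where origin not in (select iata from Airport)
""").fetchone()[0]

# NOT EXISTS: unaffected by nulls in Airport.iata.
not_exists_count = conn.execute("""
    select count(*) from Flight fl
    where not exists (select 1 from Airport a where a.iata = fl.origin)
""").fetchone()[0]

print(not_in_count, not_exists_count)  # 0 1
```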


# Cleanup

In [None]:
%%bigquery
drop table airline_stg.bird_airports
