## Information Visualization on Airport and Routes Datasets

The data comes from [OpenFlights](https://openflights.org/data.html).

## Objectives

The goal of this project is to create visuals that reveal and communicate insights from the dataset. The visuals will be used to answer obvious questions, such as:

1. How many airports are in each country?
2. Which airports are the busiest?
3. What are the routes with the least stops between airports X and Y?

There are also some less obvious questions, such as:

1. Are there any interesting patterns or trends in the data?
2. What insight can we gain from multi-variate visualisations of the data?
3. Which location has the densest flight connections?
4. Which routes are likely to cause jet lag?

In [127]:
# if you don't have altair installed you can uncomment the next line of code and run
#pip install altair vega_datasets

In [128]:
# let's import the necessary libraries

import warnings

warnings.filterwarnings("ignore")

import altair as alt
import pandas as pd

In [129]:
# reading in the first dataset
df1 = pd.read_csv('Airport.csv')

## Exploratory Data Analysis

In [130]:
df1.head(3)

Unnamed: 0,Airport_Id,Name,City,Country,IATA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,Tz_Database_Timezone,Type,Source
0,2,Madang Airport,Madang,Papua New Guinea,MAG,AYMD,-5.20708,145.789002,20.0,10,U,Pacific/Port_Moresby,airport,OurAirports
1,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.82679,144.296005,5388.0,10,U,Pacific/Port_Moresby,airport,OurAirports
2,4,Nadzab Airport,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569803,146.725977,239.0,10,U,Pacific/Port_Moresby,airport,OurAirports


In [131]:
df1.tail(5)

Unnamed: 0,Airport_Id,Name,City,Country,IATA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,Tz_Database_Timezone,Type,Source
12663,14108.0,Krechevitsy Air Base,Novgorod,Russia,\N,ULLK,58.625,31.385,85.0,\N,\N,\N,airport,OurAirports
12664,14109.0,Desierto de Atacama Airport,Copiapo,Chile,CPO,SCAT,-27.2612,-70.779198,670.0,\N,\N,\N,airport,OurAirports
12665,14110.0,Melitopol Air Base,Melitopol,Ukraine,\N,UKDM,46.880001,35.305,0.0,\N,\N,\N,airport,OurAirports
12666,14111.0,Lincoln Train Station LNK,Lincoln,United States,\N,\N,40.815833,-96.713889,1176.0,-5,A,\N,\N,\N
12667,,,,,,,,,,,,,,


In [132]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12668 entries, 0 to 12667
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Airport_Id            12668 non-null  object 
 1   Name                  12667 non-null  object 
 2   City                  12618 non-null  object 
 3   Country               12667 non-null  object 
 4   IATA                  12667 non-null  object 
 5   ICAO                  12666 non-null  object 
 6   Latitude              12667 non-null  float64
 7   Longitude             12667 non-null  float64
 8   Altitude              12667 non-null  float64
 9   Timezone              12667 non-null  object 
 10  DST                   12667 non-null  object 
 11  Tz_Database_Timezone  12667 non-null  object 
 12  Type                  12667 non-null  object 
 13  Source                12667 non-null  object 
dtypes: float64(3), object(11)
memory usage: 1.4+ MB


The last row consists of null values and needs to be removed.

In [133]:
df1 = df1[:-1]
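Slicing off the last row works here because the empty record happens to sit at the end of the file. A slightly more defensive variant is sketched below on a toy frame (the sample values, including the `?` placeholder, are made up for illustration). Since `df1.info()` reports `Airport_Id` as non-null in all 12,668 rows, the empty record apparently still carries some value in that column, so the sketch keys on `Name` instead of requiring every column to be missing.

```python
import pandas as pd

# Toy stand-in for Airport.csv; the values are illustrative only.
toy = pd.DataFrame({
    "Airport_Id": ["2", "?", "4"],  # "?" stands for whatever stray value the real file holds
    "Name": ["Madang Airport", None, "Nadzab Airport"],
    "Country": ["Papua New Guinea", None, "Papua New Guinea"],
})

# Drop every row without an airport name, wherever it appears in the frame.
cleaned = toy.dropna(subset=["Name"]).reset_index(drop=True)
print(cleaned)
```

Unlike positional slicing, this does not depend on where the empty record is located.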

In [134]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12667 entries, 0 to 12666
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Airport_Id            12667 non-null  object 
 1   Name                  12667 non-null  object 
 2   City                  12618 non-null  object 
 3   Country               12667 non-null  object 
 4   IATA                  12667 non-null  object 
 5   ICAO                  12666 non-null  object 
 6   Latitude              12667 non-null  float64
 7   Longitude             12667 non-null  float64
 8   Altitude              12667 non-null  float64
 9   Timezone              12667 non-null  object 
 10  DST                   12667 non-null  object 
 11  Tz_Database_Timezone  12667 non-null  object 
 12  Type                  12667 non-null  object 
 13  Source                12667 non-null  object 
dtypes: float64(3), object(11)
memory usage: 1.4+ MB


The dataset now consists of 14 features and 12,667 observations.

The City and ICAO columns contain null values.

According to the dataset description, the Type column can take several values (airport, unknown, station, port and \\N). We confirm this below and then keep only the rows whose type is airport.

In [135]:
#checking for unique values in column 'Type'
df1['Type'].unique()

array(['airport', 'unknown', 'station', 'port', '\\N'], dtype=object)

In [136]:
#extracting airport type only so as to concentrate on airport data only
df1 = df1[df1['Type']=='airport'] 

In [137]:
#airport data only
df1

Unnamed: 0,Airport_Id,Name,City,Country,IATA,ICAO,Latitude,Longitude,Altitude,Timezone,DST,Tz_Database_Timezone,Type,Source
0,2,Madang Airport,Madang,Papua New Guinea,MAG,AYMD,-5.207080,145.789002,20.0,10,U,Pacific/Port_Moresby,airport,OurAirports
1,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.826790,144.296005,5388.0,10,U,Pacific/Port_Moresby,airport,OurAirports
2,4,Nadzab Airport,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569803,146.725977,239.0,10,U,Pacific/Port_Moresby,airport,OurAirports
3,5,Port Moresby Jacksons International Airport,Port Moresby,Papua New Guinea,POM,AYPY,-9.443380,147.220001,146.0,10,U,Pacific/Port_Moresby,airport,OurAirports
4,6,Wewak International Airport,Wewak,Papua New Guinea,WWK,AYWK,-3.583830,143.669006,19.0,10,U,Pacific/Port_Moresby,airport,OurAirports
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12661,14106,Rogachyovo Air Base,Belaya,Russia,\N,ULDA,71.616699,52.478298,272.0,\N,\N,\N,airport,OurAirports
12662,14107,Ulan-Ude East Airport,Ulan Ude,Russia,\N,XIUW,51.849998,107.737999,1670.0,\N,\N,\N,airport,OurAirports
12663,14108,Krechevitsy Air Base,Novgorod,Russia,\N,ULLK,58.625000,31.385000,85.0,\N,\N,\N,airport,OurAirports
12664,14109,Desierto de Atacama Airport,Copiapo,Chile,CPO,SCAT,-27.261200,-70.779198,670.0,\N,\N,\N,airport,OurAirports


## 1. How Many Airports Are in Each Country?

Since the dataset now contains airport data only, we can answer the first question.

The dataset for visualization has 8,263 rows. Altair, however, limits the size of the data it will plot: by default a dataset must not exceed 5,000 rows.

We therefore use only part of the data, picked at random so that the sample stays well distributed and the visual remains insightful.

In [138]:
#using randomly picked 60 percent of the dataset
df3 = df1.sample(frac=0.6)

#getting the world map background visual

#pip install vega_datasets
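Because `sample(frac=0.6)` draws a different subset on every run, the map can change between executions. The sketch below shows one way to make the sampling reproducible and to sample only when the frame really exceeds the limit. The helper name `sample_for_altair`, its defaults and the toy frame are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def sample_for_altair(df, max_rows=5000, seed=42):
    """Return df unchanged if it fits, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    # random_state fixes the draw so repeated runs give the same chart
    return df.sample(n=max_rows, random_state=seed)

# Toy frame with 8,000 illustrative rows.
toy = pd.DataFrame({"Country": ["A", "B"] * 4000})
sampled = sample_for_altair(toy)
print(len(sampled))
```

Altair also offers `alt.data_transformers.disable_max_rows()` to lift the limit altogether, at the cost of embedding the full dataset in the notebook.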

In [139]:
from vega_datasets import data
data.world_110m.url

'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json'

In [140]:
#visualizing the numbers of airport in each country on a map


countries = alt.topo_feature(data.world_110m.url, 'countries')
hover = alt.selection_multi(fields=['count'],bind='legend')
#world map background
background = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project("equirectangular")

# airport positions on background
points = alt.Chart(df3).transform_aggregate(
    latitude='mean(Latitude)',
    longitude='mean(Longitude)',
    count='count()',
    groupby=['Country']
).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('count:Q', title='Number of Airports'),
    color=alt.value('indigo'),
    tooltip=['Country:N','count:Q']
).properties(
    title='Number of Airports in Each Country'
).add_selection(hover)

(background+points)

Hover over a point to see the country and its number of airports. Because the chart is drawn from a 60% random sample, the tooltip counts are sample counts, not full totals.

1. The visual above shows the United States as the country with the most airports.
2. The countries with the fewest airports are mostly islands.

In [141]:
#number of airports in each country for the whole dataset
airport_in_country = df1['Country'].value_counts()
airport_in_country = pd.Series(airport_in_country)
df2 = pd.DataFrame(airport_in_country.index, columns=['country'])
df2['no_of_airport'] = airport_in_country.values
df2

Unnamed: 0,country,no_of_airport
0,United States,1752
1,Canada,459
2,Australia,343
3,Russia,271
4,Brazil,269
...,...,...
232,Isle of Man,1
233,Jersey,1
234,West Bank,1
235,Gambia,1
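The per-country table can also be built in a single `groupby`, together with the mean coordinates the map needs. Since the result has one row per country, it would stay well under Altair's 5,000-row limit without any sampling. The following is a minimal sketch on toy records (the values are illustrative only); the output column names follow the ones used above.

```python
import pandas as pd

# Toy airport records; values are illustrative only.
airports = pd.DataFrame({
    "Country": ["United States", "United States", "Canada"],
    "Latitude": [33.6367, 41.9786, 49.1939],
    "Longitude": [-84.4281, -87.9048, -123.184],
})

per_country = (
    airports.groupby("Country")
    .agg(
        no_of_airport=("Latitude", "size"),  # rows per country
        latitude=("Latitude", "mean"),
        longitude=("Longitude", "mean"),
    )
    .sort_values("no_of_airport", ascending=False)
    .reset_index()
)
print(per_country)
```

Plotting such a pre-aggregated frame would show full counts in the tooltips instead of counts from a sample.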


There are 237 different countries in the dataset. The country with the most airports is the United States, with 1,752, while the countries with the fewest, such as Wake Island, have only one.

Since there are too many countries to show at once, we will visualize the 20 countries with the most airports.

In [142]:
top_20_countries = df2[:20]

top_20_countries

Unnamed: 0,country,no_of_airport
0,United States,1752
1,Canada,459
2,Australia,343
3,Russia,271
4,Brazil,269
5,Germany,257
6,China,251
7,France,221
8,United Kingdom,177
9,Indonesia,158


## 2. Which Airport Is the Busiest?

To identify the busiest airport we would ideally use the number of passengers enplaned and deplaned, but that is not provided. As a proxy, we count the airline routes that list each airport as a source or a destination.

In [143]:
# reading in the second dataset
route = pd.read_csv('Route.csv')

In [144]:
#print out the first five rows
route.head(5)

Unnamed: 0,Airline,Airline_id,Source_airport,Source_airport_id,Destination_airport,Destination_airport_id,Codeshare,Stops,Equipment
0,2B,410,AER,2965,KZN,2990,,0.0,CR2
1,2B,410,ASF,2966,KZN,2990,,0.0,CR2
2,2B,410,ASF,2966,MRV,2962,,0.0,CR2
3,2B,410,CEK,2968,KZN,2990,,0.0,CR2
4,2B,410,CEK,2968,OVB,4078,,0.0,CR2


In [145]:
route.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67664 entries, 0 to 67663
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Airline                 67664 non-null  object 
 1   Airline_id              67663 non-null  object 
 2   Source_airport          67663 non-null  object 
 3   Source_airport_id       67663 non-null  object 
 4   Destination_airport     67663 non-null  object 
 5   Destination_airport_id  67663 non-null  object 
 6   Codeshare               14597 non-null  object 
 7   Stops                   67663 non-null  float64
 8   Equipment               67645 non-null  object 
dtypes: float64(1), object(8)
memory usage: 4.6+ MB


There are 9 columns and 67,664 entries in the dataset.

Some of the columns contain null values.

We only need the Source_airport and Destination_airport columns to count the departing and arriving airline routes. Each of them has a single missing value out of 67,664 rows, which has no practical effect on the counts, so we do not need to handle the nulls.

In [146]:
# let's count the airline routes departing from each airport
source = {}
for row in route['Source_airport']:
    if row in source:
        source[row] += 1
    else:
        source[row] = 1
# let's count the airline routes arriving at each airport
destination = {}
for row in route['Destination_airport']:
    if row in destination:
        destination[row] += 1
    else:
        destination[row] = 1

In [147]:
# let's add the no of airline that arrive to and depart from each airport together in the source variable
for c in destination:
    if c in source:
        source[c] += destination[c]
    else:
        source[c] = destination[c]
            
    

In [148]:
# sorted list of airport and the total no of airline in descending order of no of airline
sorted_source=sorted(source.items(), key=lambda x:x[1], reverse=True)
sorted_source = [list(row) for row in sorted_source]
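The dictionary loops above can also be expressed with `value_counts`. The sketch below runs on toy routes (values are illustrative only) and is meant as an equivalent formulation, not a replacement for the cells above.

```python
import pandas as pd

# Toy route records; values are illustrative only.
routes = pd.DataFrame({
    "Source_airport": ["ATL", "ATL", "ORD", "LHR"],
    "Destination_airport": ["ORD", "LHR", "ATL", "ATL"],
})

departures = routes["Source_airport"].value_counts()
arrivals = routes["Destination_airport"].value_counts()

# fill_value=0 keeps airports that appear on only one side of a route.
total = departures.add(arrivals, fill_value=0).astype(int)
total = total.sort_values(ascending=False)
print(total)
```

`Series.add` aligns the two counts on the airport code, which plays the role of the `if c in source` check in the loop.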

The busiest airports are so far identified only by IATA code, as used in the route dataset.

Therefore, we compare these IATA codes with the IATA and Name columns of df1 to get the name of each airport.

In [149]:
# let's get the IATA, name, country, longitude and latitude columns from the df1 dataset
country = df1[['IATA','Name','Country','Longitude','Latitude']]
country = country.values.tolist()
# comparing the IATA code, get the name and append the airport name and the no of airline
i=[]
for row in country:
    for x in sorted_source:
        if row[0]==x[0]:
            i.append([row[1],x[1],row[0],row[2],row[3],row[4]])
i = sorted(i,key = lambda x:x[1],reverse=True)
airport = []
no_of_airline = []
IATA_code = []
airport_location = []
latitude = []
longitude = []
for row in i:
    airport.append(row[0])
    no_of_airline.append(row[1])
    IATA_code.append(row[2])
    airport_location.append(row[3])
    longitude.append(row[4])
    latitude.append(row[5])

# converting to pandas dataframe for easy visualization
busiest = pd.DataFrame({'airport':airport, 'no_of_airline': no_of_airline, 'IATA_code':IATA_code, 'airport_location':airport_location,
                       'longitude':longitude, 'latitude':latitude})
busy = busiest[:20]
busy

Unnamed: 0,airport,no_of_airline,IATA_code,airport_location,longitude,latitude
0,Hartsfield Jackson Atlanta International Airport,1826,ATL,United States,-84.428101,33.6367
1,Chicago O'Hare International Airport,1108,ORD,United States,-87.9048,41.9786
2,Beijing Capital International Airport,1069,PEK,China,116.584999,40.080101
3,London Heathrow Airport,1051,LHR,United Kingdom,-0.461941,51.4706
4,Charles de Gaulle International Airport,1041,CDG,France,2.55,49.012798
5,Frankfurt am Main Airport,990,FRA,Germany,8.570556,50.033333
6,Los Angeles International Airport,990,LAX,United States,-118.407997,33.942501
7,Dallas Fort Worth International Airport,936,DFW,United States,-97.038002,32.896801
8,John F Kennedy International Airport,911,JFK,United States,-73.7789,40.639801
9,Amsterdam Airport Schiphol,903,AMS,Netherlands,4.76389,52.308601
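The nested loops that match IATA codes compare every airport with every counted code. A join on the IATA column gives the same kind of table in one step. The sketch below uses toy inputs (the frames `counts` and `airports` are illustrative stand-ins, not the notebook's variables).

```python
import pandas as pd

# Toy inputs; values are illustrative only.
counts = pd.DataFrame({"IATA": ["ATL", "ORD", "QFX"],
                       "no_of_airline": [1826, 1108, 1]})
airports = pd.DataFrame({
    "IATA": ["ATL", "ORD"],
    "Name": ["Hartsfield Jackson Atlanta International Airport",
             "Chicago O'Hare International Airport"],
    "Country": ["United States", "United States"],
})

# An inner join keeps only codes present in both tables,
# mirroring the equality check in the nested loops.
busiest_sketch = (
    counts.merge(airports, on="IATA", how="inner")
    .sort_values("no_of_airline", ascending=False)
    .reset_index(drop=True)
)
print(busiest_sketch)
```

Codes without a matching airport record, such as `QFX` in the toy data, drop out of the result, just as they do in the loop version.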


The table above shows the 20 busiest airports. 'Hartsfield Jackson Atlanta International Airport' is the busiest, with 1,826 airline routes arriving and departing altogether.

In [150]:
# visualizing the top 20 busiest airports

countries = alt.topo_feature(data.world_110m.url, 'countries')
brush=alt.selection_multi(fields=['airport'],bind = 'legend')
brush1=alt.selection_multi(fields=['airport_location'], bind = 'legend')

base = alt.Chart(countries).mark_geoshape(
    fill='lightgrey',
    stroke='white'
).project('equirectangular')
main = alt.Chart(busy).mark_point().encode(
    latitude='latitude:Q',
    longitude='longitude:Q',
    size=alt.Size('no_of_airline', legend=None),
    color=alt.condition(
        brush | brush1,
        alt.Color('airport:N', legend=alt.Legend(title='Airports')),
        alt.value('lightgrey')
    ),
    shape=alt.Shape('airport_location', legend=alt.Legend(title='Country', orient='left')),
    tooltip=('no_of_airline', 'airport', 'IATA_code', 'airport_location')
).add_selection(brush).add_selection(brush1).properties(width=500, height=300)



(base+main)

Hover over a point to show its details, click an entry in the airport legend to locate that airport on the visual, and click an entry in the country legend to show the airports in that country.

The visual above shows the 20 busiest airports. The busiest is 'Hartsfield Jackson Atlanta International Airport', with a total of 1,826 airline routes departing from and arriving at the airport.

## 3. What Are the Routes with the Least Stops between Airports X and Y?

In [151]:
route['Stops'].value_counts()

0.0    67652
1.0       11
Name: Stops, dtype: int64

In [152]:
det = route[['Source_airport','Destination_airport','Stops']]
det.value_counts().sort_values(ascending=False)

Source_airport  Destination_airport  Stops
ORD             ATL                  0.0      20
ATL             ORD                  0.0      19
ORD             MSY                  0.0      13
HKT             BKK                  0.0      13
JFK             LHR                  0.0      12
                                              ..
TTN             RDU                  0.0       1
                MDW                  0.0       1
SEA             BZN                  0.0       1
TTN             MCO                  0.0       1
AAE             ALG                  0.0       1
Length: 37606, dtype: int64

In [153]:
# removing duplicate values
det = det.drop_duplicates(keep='first')
det1=det.sample(frac=0.6)
det['Stops'].value_counts()

0.0    37595
1.0       11
Name: Stops, dtype: int64
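The counts above summarize stops across all routes, whereas the question asks about a specific pair of airports. A minimal sketch of such a lookup follows, assuming a deduplicated frame shaped like `det`. The helper `fewest_stops` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy deduplicated routes; values are illustrative only.
det_toy = pd.DataFrame({
    "Source_airport": ["ORD", "ORD", "JFK"],
    "Destination_airport": ["ATL", "ATL", "LHR"],
    "Stops": [1.0, 0.0, 0.0],
})

def fewest_stops(routes, x, y):
    """Listed routes from x to y, fewest stops first (hypothetical helper)."""
    mask = (routes["Source_airport"] == x) & (routes["Destination_airport"] == y)
    return routes[mask].sort_values("Stops").reset_index(drop=True)

result = fewest_stops(det_toy, "ORD", "ATL")
print(result)
```

This only considers routes listed directly between X and Y. Finding multi-leg itineraries would require a graph search over the route table, which is outside the scope of this sketch.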

In [154]:
stop_counts = det['Stops'].value_counts()
# value_counts() skips missing values, so the stops and their counts stay aligned
stops = pd.DataFrame({'stops':stop_counts.index,'no_of_routes':stop_counts.values})

In [155]:
# visualizing no of routes and no of stops
alt.Chart(stops).mark_area().encode(
alt.Y(field='no_of_routes',type='quantitative'),
alt.X(field='stops',type='nominal'),
tooltip = ('stops','no_of_routes')
).properties(width=200)

Routes in the dataset have either 0 stops (direct) or 1 stop, and direct routes vastly outnumber routes with a stop.

There are 37,595 direct routes (about 99.97%) and 11 routes with 1 stop (about 0.03%).
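The shares can be checked with a short calculation. The sketch below takes the two counts from the `value_counts()` output of the deduplicated routes and converts them to percentages.

```python
import pandas as pd

# Counts taken from the value_counts() output of the deduplicated routes.
stop_counts = pd.Series({0.0: 37595, 1.0: 11}, name="no_of_routes")

# Share of each stop count as a percentage of all unique routes.
share = (stop_counts / stop_counts.sum() * 100).round(2)
print(share)
```

On the full data, `det['Stops'].value_counts(normalize=True)` would give the same proportions directly.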

## Less Obvious Questions

1. Are there any interesting patterns or trends in the data?
2. What insight can we gain from multi-variate visualisations of the data?
3. Which location has the densest flight connections?
4. Which routes are likely to cause jet lag?

### Observations

The United States has 1,752 airports, the most of any country.

The airports with the highest number of direct flight connections are located in the United States.

Most of the busiest airports are located in the United States.

Airport-type records make up approximately 65% of the total data.
The United States is one of the largest countries, so distances are long and flying is often the faster option. This suggests that the United States invests more in airports and relies more on domestic air travel than smaller countries, whose airports mostly serve international travel.

Most of the countries with the fewest airports are islands.

### Insights from Multi-variate Visualizations of the Data

The multi-variate visualization for this data is the top 20 busiest airports visual, which encodes location, number of routes, airport and country at once. It shows that most of the busiest airports are located in the United States.

## Which Location Has the Densest Flight Connections?

In [156]:
conn = route[['Source_airport','Destination_airport']]

In [157]:
flight1 = []
flight = conn.value_counts().tolist()
for row in conn.value_counts().index.tolist():
    row=list(row)
    flight1.append(row)
flight2 = pd.DataFrame(flight1, columns = ['Source_airport','Destination_airport'])
flight2['no_of_flight_conn']=flight
flight2

Unnamed: 0,Source_airport,Destination_airport,no_of_flight_conn
0,ORD,ATL,20
1,ATL,ORD,19
2,ORD,MSY,13
3,HKT,BKK,13
4,HKG,BKK,12
...,...,...,...
37590,IST,AQJ,1
37591,IST,AJI,1
37592,IST,AGP,1
37593,IST,ADF,1


In [158]:
# calculating the no of flight(to and fro) for each connection between two locations
des = list(flight2['Destination_airport'])
src = list(flight2['Source_airport'])
fcon = list(flight2['no_of_flight_conn'])
add = {}
for x,y,z in zip(des,src,fcon):
    a = y+'-'+x
    b = x+'-'+y
    if (a not in add) and (b in add):
        add[b] += z
    else:
        add[a] = z
add = dict(sorted(add.items(), key=lambda x:x[1], reverse=True))
sd = add.keys()
loc_x = []
loc_y = []
no_of_conn = list(add.values())
for i in sd:
    j = i.split('-')
    loc_x.append(j[0])
    loc_y.append(j[1])

In [159]:
flight_connection1 = pd.DataFrame({'location_x':loc_x, 'location_y':loc_y, 'no_of_flight_connection':no_of_conn})
flight_connection1


Unnamed: 0,location_x,location_y,no_of_flight_connection
0,ORD,ATL,39
1,HKG,BKK,24
2,ATL,MIA,24
3,JFK,LHR,24
4,HKT,BKK,23
...,...,...,...
19252,IRG,AUU,1
19253,INV,EDI,1
19254,IST,JRO,1
19255,IST,KAN,1
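The same to-and-fro totals can be obtained by giving both directions of a pair a common key and summing per key. The sketch below orders each pair alphabetically and groups on the result. The toy frame is illustrative, and the alphabetical ordering is an assumption of this sketch, so a pair may appear as ATL/ORD where the loop above produced ORD/ATL.

```python
import pandas as pd

# Toy directed route counts; values are illustrative only.
flights = pd.DataFrame({
    "Source_airport": ["ORD", "ATL", "JFK"],
    "Destination_airport": ["ATL", "ORD", "LHR"],
    "no_of_flight_conn": [20, 19, 12],
})

# Order each pair alphabetically so A->B and B->A share one key.
pairs = [tuple(sorted(p)) for p in
         zip(flights["Source_airport"], flights["Destination_airport"])]

undirected = (
    flights.assign(location_x=[p[0] for p in pairs],
                   location_y=[p[1] for p in pairs])
    .groupby(["location_x", "location_y"], as_index=False)["no_of_flight_conn"]
    .sum()
    .sort_values("no_of_flight_conn", ascending=False)
    .reset_index(drop=True)
)
print(undirected)
```

Keeping the two endpoints in separate columns also avoids splitting a joined string such as `ORD-ATL` afterwards.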


In [160]:
# calculating the no of flight connection for each location
unique_des = list(flight2['Destination_airport'].unique())
unique_src = list(flight2['Source_airport'].unique())
# checking the no of flight connection for source airport
loc = {}
for row,rew in zip(add.keys(),no_of_conn):
    for raw in unique_src:
        if raw in row:
            if raw in loc:
                loc[raw] += rew
            else:
                loc[raw] = rew
loc1 = {}
# checking the no of flight connection for destination airport
for row,rew in zip(add.keys(),no_of_conn):
    for raw in unique_des:
        if raw in row:
            if raw in loc1:
                loc1[raw] += rew
            else:
                loc1[raw] = rew
# combining the result for source and destination airport
for row,rew in zip(loc1.keys(),list(loc1.values())):
    if row not in loc:
        loc[row] = rew
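The loops above test each airport code as a substring of every `AAA-BBB` key, which rescans all pairs for every airport. An alternative, sketched here on toy data (values are illustrative only), stacks both endpoints into one column and sums per airport, so that codes are matched exactly.

```python
import pandas as pd

# Toy undirected pair counts; values are illustrative only.
pairs = pd.DataFrame({
    "location_x": ["ORD", "HKG", "ATL"],
    "location_y": ["ATL", "BKK", "MIA"],
    "no_of_flight_connection": [39, 24, 24],
})

# Stack both endpoints into one 'airport' column, then sum per airport.
endpoints = pd.concat([
    pairs[["location_x", "no_of_flight_connection"]]
        .rename(columns={"location_x": "airport"}),
    pairs[["location_y", "no_of_flight_connection"]]
        .rename(columns={"location_y": "airport"}),
])
per_airport = (
    endpoints.groupby("airport")["no_of_flight_connection"].sum()
    .sort_values(ascending=False)
)
print(per_airport)
```

Each pair contributes its count to both of its endpoints, which is the same total the nested loops accumulate.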


In [161]:
loc = dict(sorted(loc.items(), key=lambda x:x[1], reverse=True))
location = list(loc.keys())
no_of_flight = list(loc.values())
flight_connection = pd.DataFrame({'airport':location, 'no_of_flight_connection':no_of_flight})
# getting the location of each airport
ctry = []
for rew in flight_connection['airport']:
    f = list(df1['Country'][df1['IATA']== rew])
    ctry.append(f)

flight_connection['country'] = ctry
flight_connection

Unnamed: 0,airport,no_of_flight_connection,country
0,ATL,1826,[United States]
1,ORD,1108,[United States]
2,PEK,1069,[China]
3,LHR,1051,[United Kingdom]
4,CDG,1041,[France]
...,...,...,...
3420,UII,1,[Honduras]
3421,QFX,1,[]
3422,FMI,1,[Congo (Kinshasa)]
3423,KYK,1,[United States]


In [162]:
flight_connection['country'].value_counts()

TypeError: unhashable type: 'list'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'


[United States]    607
[Canada]           207
[China]            177
[Brazil]           124
[Russia]           114
                  ... 
[Burundi]            1
[Grenada]            1
[Nicaragua]          1
[Turkmenistan]       1
[Niue]               1
Name: country, Length: 226, dtype: int64
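The `TypeError: unhashable type: 'list'` message appears because each cell of the `country` column holds a Python list, and lists cannot be hashed. One way to avoid it, sketched below on toy inputs (values are illustrative only), is to map each IATA code to a single country string. Unmatched codes then become missing values instead of empty lists.

```python
import pandas as pd

# Toy inputs; values are illustrative only.
airports = pd.DataFrame({"IATA": ["ATL", "PEK"],
                         "Country": ["United States", "China"]})
connections = pd.DataFrame({"airport": ["ATL", "PEK", "QFX"],
                            "no_of_flight_connection": [1826, 1069, 1]})

# Build a code -> country lookup; drop_duplicates guards against repeated codes.
iata_to_country = airports.drop_duplicates("IATA").set_index("IATA")["Country"]

# Map each airport code to a plain string; codes with no match become NaN.
connections["country"] = connections["airport"].map(iata_to_country)

print(connections["country"].value_counts())
```

With plain strings in the column, `value_counts()` runs without the warning and the labels lose their square brackets.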

In [163]:
#getting the airport location, numbers of airline, longitude and latitude column from 'busiest' dataset
i = busiest['airport_location']
j = busiest['no_of_airline']
k = busiest['longitude']
l = busiest['latitude']

densest_location = {}
long = {}
lat = {}
con = {}

#Summing the route counts per country and accumulating longitude and latitude to compute their means
for w,x,y,z in zip(i,j,k,l):
    if w in densest_location:
        densest_location[w] += x
        long[w] += y
        lat[w] += z
        con[w] += 1
    else:
        densest_location[w] = x
        long[w] = y
        lat[w] = z
        con[w] = 1
        
p = list(long.values())
q = list(lat.values())
r = list(con.values())

m_longitude = []
m_latitude = []

for x,y,z in zip(p,q,r):
    m_longitude.append(x/z)
    m_latitude.append(y/z)
    
location = densest_location.keys()
no_flight_connection = list(densest_location.values())


densest = pd.DataFrame({'location':location,'no_of_flight_connection':no_flight_connection, 'latitude':m_latitude, 'longitude':m_longitude})
densest

Unnamed: 0,location,no_of_flight_connection,latitude,longitude
0,United States,26398,45.334488,-113.725334
1,China,16359,33.773252,110.455868
2,United Kingdom,5300,54.586718,-3.074396
3,France,3856,45.316255,1.194778
4,Germany,4688,51.106073,9.706741
...,...,...,...,...
220,Lesotho,2,-29.462299,27.552500
221,American Samoa,2,-14.331000,-170.710007
222,Tuvalu,2,-8.525000,179.195999
223,Cocos (Keeling) Islands,2,-12.188300,96.833900


By total flight connections, the United States leads with 26,398.

The airport with the most flight connections (1,826) is located in the United States.

The United States also has the most airports with flight connections (607).

All three measures point to the United States as the location with the densest flight connections.

In [164]:
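# densest is not sorted by count, so sort before taking the top 20
dense = densest.sort_values('no_of_flight_connection', ascending=False)[:20]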

In [165]:
countries = alt.topo_feature(data.world_110m.url, 'countries')
click = alt.selection_multi(fields=['location'], bind='legend')

background1 = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project("equirectangular")

# airport positions on background
points1 = alt.Chart(dense).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('no_of_flight_connection:Q', title='Number of Flight Connections',legend=None),
    color= alt.condition(click,alt.Color('location:N'),alt.value('grey')),
    tooltip=['location:N','no_of_flight_connection:Q']
).properties(title='Top 20 Densest Location').add_selection(click)
(background1+points1)

Click a legend entry to highlight that location; the other points turn grey.