# Write a Web Service

• Wrap the output of the second exercise in a web service that returns the data in JSON format (instead of printing to standard output).

• The web service should accept a parameter n > 0. For the top 10 airports, n is 10; for the top X airports, n is X.
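
The n > 0 requirement can be checked before any data is loaded. A minimal sketch, assuming a hypothetical helper `parse_top_n` (not part of the original exercise) that receives the raw text typed by the user:

```python
def parse_top_n(raw):
    """Convert raw user input into a positive integer.

    Hypothetical helper for illustration only. Raises ValueError for
    empty text, non-numeric text, decimals, zero and negative numbers.
    """
    n = int(raw.strip())  # int() raises ValueError for '', 'abc' or '3.5'
    if n <= 0:
        raise ValueError(f"n must be greater than 0, got {n}")
    return n


if __name__ == "__main__":
    for text in ["10", " 3 ", "0", "abc"]:
        try:
            print(repr(text), "->", parse_top_n(text))
        except ValueError as err:
            print(repr(text), "-> rejected:", err)
```

Raising `ValueError` for every invalid case means a single `except ValueError` branch in the web service can report all of them.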

## Step 1: Let's start with a sample

#### Pandas DataFrame and JSON output

In [30]:
%%writefile top_arrival_airports_2013.py
import streamlit as st
import pandas as pd

st.title('Top arrival airports in 2013')
st.subheader('This web service returns the top arrival airports by number of passengers in 2013, in JSON format')
st.markdown('Please insert the number of TOP airports you want to get. For instance, for the TOP 10 airports you will have to specify 10.')

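try:
    n = int(st.text_input("Insert a number:"))
    if n <= 0:
        # The exercise requires n > 0; head() with a negative n returns all but the last |n| rows
        raise ValueError

    # Load only the columns needed for the ranking
    bookings_sample = pd.read_csv('bookings.sample.csv.bz2', compression='bz2', sep='^', usecols=['year','arr_port','pax'])

    # Keep 2013 bookings, sum passengers per arrival airport and take the n largest totals
    bookings_sample_2013 = bookings_sample[bookings_sample['year'] == 2013]
    top_airports = bookings_sample_2013.groupby('arr_port')['pax'].sum().sort_values(ascending=False).head(n)

    st.table(top_airports)
    result_json = top_airports.to_json()
    st.json(result_json)

except ValueError:
    st.error('Only whole numbers greater than 0 are allowed as input')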

Overwriting top_arrival_airports_2013.py


#### JSON output only

In [31]:
%%writefile top_arrival_airports_2013.py
import streamlit as st
import pandas as pd

st.title('Top arrival airports in 2013')
st.subheader('This web service returns the top arrival airports by number of passengers in 2013, in JSON format')
st.markdown('Please insert the number of TOP airports you want to get. For instance, for the TOP 10 airports you will have to specify 10.')

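try:
    n = int(st.text_input("Insert a number:"))
    if n <= 0:
        # The exercise requires n > 0; head() with a negative n returns all but the last |n| rows
        raise ValueError

    # Load only the columns needed for the ranking
    bookings_sample = pd.read_csv('bookings.sample.csv.bz2', compression='bz2', sep='^', usecols=['year','arr_port','pax'])

    # Keep 2013 bookings, sum passengers per arrival airport and take the n largest totals
    bookings_sample_2013 = bookings_sample[bookings_sample['year'] == 2013]
    top_airports = bookings_sample_2013.groupby('arr_port')['pax'].sum().sort_values(ascending=False).head(n)

    result_json = top_airports.to_json()
    st.json(result_json)

except ValueError:
    st.error('Only whole numbers greater than 0 are allowed as input')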

Overwriting top_arrival_airports_2013.py
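
With its default orientation, `Series.to_json()` produces a JSON object keyed by airport code. The sketch below compares that shape with a list of records, which keeps the ranking explicit for clients that do not preserve key order. The three totals are hard-coded illustrative values, and the `orient='records'` variant is an alternative, not what the service above uses.

```python
import json

import pandas as pd

# Illustrative values only, shaped like the result of groupby(...)['pax'].sum()
top_airports = pd.Series(
    [9040.0, 7223.0, 7191.0],
    index=pd.Index(['LHR', 'MCO', 'LAX'], name='arr_port'),
    name='pax',
)

# Default orientation for a Series: one JSON object, airport code -> total
as_object = json.loads(top_airports.to_json())

# Alternative: turn the index into a column and emit an ordered list of records
as_records = json.loads(top_airports.reset_index().to_json(orient='records'))

print(as_object)
print(as_records)
```

The object form is more compact; the records form names both fields and carries the ranking in the list order.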


## Step 2: Let's do it now with the whole dataset using chunks

#### Pandas DataFrame and JSON output

In [33]:
%%writefile top_arrival_airports_2013.py
import streamlit as st
import pandas as pd

st.title('Top arrival airports in 2013')
st.subheader('This web service returns the top arrival airports by number of passengers in 2013, in JSON format')
st.markdown('Please insert the number of TOP airports you want to get. For instance, for the TOP 10 airports you will have to specify 10.')

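try:

    n = int(st.text_input("Insert a number:"))
    if n <= 0:
        # The exercise requires n > 0; head() with a negative n returns all but the last |n| rows
        raise ValueError

    # Read the full file in chunks of 100,000 rows so it never has to fit in memory at once
    chksize = 100000
    reader = pd.read_csv('/home/dsc/Data/challenge/bookings_without_duplicates.csv', sep='^', usecols=['year','arr_port','pax'], iterator=True, chunksize=chksize)
    all_chunks = []

    for df in reader:
        # Partial per-airport totals for the 2013 rows of this chunk
        df = df[df['year'] == 2013]
        result_chunk = df.groupby('arr_port')['pax'].sum()
        all_chunks.append(result_chunk)

    # An airport can appear in several chunks, so the partial totals are summed again
    pax_per_airport_2013 = pd.concat(all_chunks)
    top_airports = pax_per_airport_2013.reset_index().groupby('arr_port')['pax'].sum().sort_values(ascending=False).head(n)
    st.table(top_airports)

    result_json = top_airports.to_json()
    st.json(result_json)

except ValueError:
    st.error('Only whole numbers greater than 0 are allowed as input')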

Overwriting top_arrival_airports_2013.py


#### JSON output only

In [34]:
%%writefile top_arrival_airports_2013.py
import streamlit as st
import pandas as pd

st.title('Top arrival airports in 2013')
st.subheader('This web service returns the top arrival airports by number of passengers in 2013, in JSON format')
st.markdown('Please insert the number of TOP airports you want to get. For instance, for the TOP 10 airports you will have to specify 10.')

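try:

    n = int(st.text_input("Insert a number:"))
    if n <= 0:
        # The exercise requires n > 0; head() with a negative n returns all but the last |n| rows
        raise ValueError

    # Read the full file in chunks of 100,000 rows so it never has to fit in memory at once
    chksize = 100000
    reader = pd.read_csv('/home/dsc/Data/challenge/bookings_without_duplicates.csv', sep='^', usecols=['year','arr_port','pax'], iterator=True, chunksize=chksize)
    all_chunks = []

    for df in reader:
        # Partial per-airport totals for the 2013 rows of this chunk
        df = df[df['year'] == 2013]
        result_chunk = df.groupby('arr_port')['pax'].sum()
        all_chunks.append(result_chunk)

    # An airport can appear in several chunks, so the partial totals are summed again
    pax_per_airport_2013 = pd.concat(all_chunks)
    top_airports = pax_per_airport_2013.reset_index().groupby('arr_port')['pax'].sum().sort_values(ascending=False).head(n)

    result_json = top_airports.to_json()
    st.json(result_json)

except ValueError:
    st.error('Only whole numbers greater than 0 are allowed as input')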

Overwriting top_arrival_airports_2013.py
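
The chunked version relies on sums being combinable: each chunk yields partial per-airport totals, and summing those partial totals again gives the same result as a single pass over the whole file. A minimal sketch of that idea, assuming a tiny made-up dataset held in memory (the `SAMPLE` text and the function name `top_airports_chunked` are hypothetical) and a chunk size of 2 so that airports are split across chunks:

```python
import io

import pandas as pd

# Made-up rows in the same '^'-separated layout as the bookings file
SAMPLE = """year^arr_port^pax
2013^LHR^2
2013^MCO^1
2012^LHR^5
2013^LHR^3
2013^JFK^4
2013^MCO^1
"""


def top_airports_chunked(source, n, chunksize=2):
    """Return the n airports with the most 2013 passengers, reading in chunks."""
    partial_sums = []
    reader = pd.read_csv(source, sep='^', usecols=['year', 'arr_port', 'pax'],
                         chunksize=chunksize)
    for chunk in reader:
        chunk = chunk[chunk['year'] == 2013]
        # Partial totals: only the rows seen in this chunk
        partial_sums.append(chunk.groupby('arr_port')['pax'].sum())

    # The same airport may appear in several partial results, so group again
    combined = pd.concat(partial_sums)
    totals = combined.groupby(level=0).sum()
    return totals.sort_values(ascending=False).head(n)


result = top_airports_chunked(io.StringIO(SAMPLE), n=2)
print(result.to_json())
```

Here `groupby(level=0)` groups on the index directly, which is equivalent to the `reset_index().groupby('arr_port')` step used in the cells above.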


## Step 3: Let's do it again with our CSV uploaded online

To reduce the size of the CSV uploaded online, I run the groupby aggregation locally and upload only the per-airport totals, already sorted.

In [44]:
chksize = 100000
reader = pd.read_csv('/home/dsc/Data/challenge/bookings_without_duplicates.csv' , sep='^', usecols=['year','arr_port','pax'], iterator=True, chunksize=chksize)
all_chunks= []

for df in reader:
    df = df[df['year'] == 2013]
    result_chunk = df.groupby('arr_port')['pax'].sum()
    all_chunks.append(result_chunk)
    
pax_per_airport_2013 = pd.concat(all_chunks)
pax_per_airport_2013_sorted = pax_per_airport_2013.reset_index().groupby('arr_port')['pax'].sum().sort_values(ascending=False)

In [45]:
pax_per_airport_2013_sorted.head(25)

arr_port
LHR         9040.0
MCO         7223.0
LAX         7191.0
LAS         7079.0
JFK         6788.0
CDG         6513.0
BKK         6006.0
SFO         5929.0
MIA         5896.0
DXB         5647.0
ORD         5482.0
FCO         4576.0
IST         4442.0
DFW         4336.0
CUN         4279.0
LGA         4212.0
BCN         4206.0
MAD         4175.0
ATL         4101.0
EWR         3836.0
BOS         3673.0
DEL         3605.0
BOM         3392.0
SYD         3381.0
DEN         3361.0
Name: pax, dtype: float64

In [46]:
pax_per_airport_2013_sorted.shape

(2274,)

In [47]:
pax_per_airport_2013_sorted.to_csv('pax_per_airport_2013_sorted.csv', sep='^')

In [48]:
!ls

 bookings.sample.csv.bz2
'Exercise 1 - Counting the number of lines in a big file.ipynb'
'Exercise 2 - Top 10 arrival airports in 2013 .ipynb'
'Exercise 3 - Number of searches for Madrid, Barcelona and Malaga.ipynb'
'Exercise 4 - Searches with bookings match.ipynb'
'Exercise 5 - Write a Web Service.ipynb'
 pax_per_airport_2013_sorted.csv
 README.md
 searches.sample.csv.bz2
 top_arrival_airports_2013.py


The CSV file is now available online. Let's rebuild the web service so that it reads from it.

In [51]:
%%writefile top_arrival_airports_2013.py
import streamlit as st
import pandas as pd

st.title('Top arrival airports in 2013')
st.subheader('This web service returns the top arrival airports by number of passengers in 2013, in JSON format')
st.markdown('Please insert the number of TOP airports you want to get. For instance, for the TOP 10 airports you will have to specify 10.')

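try:

    n = int(st.text_input("Insert a number:"))
    if n <= 0:
        # The exercise requires n > 0; head() with a negative n returns all but the last |n| rows
        raise ValueError

    # The file was written with sep='^', so it must be read with the same separator
    pax_per_airport_2013 = pd.read_csv('https://github.com/Laurajmoreno/DS_Challenge/blob/main/pax_per_airport_2013_sorted.csv?raw=true', sep='^')
    # Rows are already sorted by passengers, so the first n rows are the top n airports
    top_airports = pax_per_airport_2013.reset_index().head(n)

    result_json = top_airports.to_json()
    st.json(result_json)

except ValueError:
    st.error('Only whole numbers greater than 0 are allowed as input')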

Overwriting top_arrival_airports_2013.py


In [52]:
test = pd.read_csv('pax_per_airport_2013_sorted.csv',sep='^')
test.head()

  arr_port     pax
0      LHR  9040.0
1      MCO  7223.0
2      LAX  7191.0
3      LAS  7079.0
4      JFK  6788.0
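
The file was written with `sep='^'`, so it has to be read back with the same separator. The sketch below shows what happens otherwise. It uses an in-memory buffer as a stand-in for the real file and three of the totals listed above.

```python
import io

import pandas as pd

# Three of the per-airport totals shown earlier, as a Series like the one saved to disk
totals = pd.Series(
    [9040.0, 7223.0, 7191.0],
    index=pd.Index(['LHR', 'MCO', 'LAX'], name='arr_port'),
    name='pax',
)

# Write with the '^' separator into a buffer instead of a file
buffer = io.StringIO()
totals.to_csv(buffer, sep='^')

# Reading with the matching separator recovers both columns
buffer.seek(0)
same_sep = pd.read_csv(buffer, sep='^')

# Reading with the default separator (',') leaves everything in one column
buffer.seek(0)
default_sep = pd.read_csv(buffer)

print(list(same_sep.columns))
print(list(default_sep.columns))
```

The same applies when the file is read from a URL, since `read_csv` does not infer the separator by default.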
