<a href="https://colab.research.google.com/github/SnSabu/Machine-Learning-Projects/blob/main/Real_Time_Flight_Data_Using_Duffel_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting Real Time Flight Data Using Duffel API



> *By Sneha Sabu*



In this notebook, we extract flight data such as the price, date and duration of flights from different airlines using the Duffel API. You will need to create a free Duffel account to get the access token used for requesting prices. For simplicity, the data is extracted for a single, static future travel date. The input data consists of the origin airport, the destination airport and the date of travel.
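
The shape of that input can be sketched with the standard library alone. This is an illustration rather than the notebook's own code: the airport codes below are hypothetical placeholders, and only the column names and the travel date are taken from the notebook.

```python
from itertools import product

# Hypothetical IATA codes, used only for illustration.
employee_airports = ["JFK", "ORD", "LHR"]
hub_airports = ["LHR", "SFO"]
departure_date = "2023-03-17"

# Cross join: pair every origin with every destination,
# skipping pairs where both ends are the same airport.
rows = [
    {"Origin Airport": o, "Destination Airport": d, "departure_date": departure_date}
    for o, d in product(employee_airports, hub_airports)
    if o != d
]

for r in rows:
    print(r)
```

Each dictionary corresponds to one row of the input table, and each row leads to one offer request.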

In [None]:
import re
import datetime
import time
import random

import pandas as pd
from duffel_api import Duffel


# Getting closest airports to hubs and employees
Office_Airports = pd.read_csv("Closest_airport_to_hubs.csv")
Emp_Airport = pd.read_csv("closest_airports_employees.csv")


# Stack both hub airport columns into a single destination column
hub_airport = pd.concat([Office_Airports['Airport 1'], Office_Airports['Airport 2']])
hub_airport = hub_airport.to_frame(name='Destination Airport')

# Same for the employees' airports, keeping unique values only
emp_airport = pd.concat([pd.Series(Emp_Airport['Airport 1'].unique()), pd.Series(Emp_Airport['Airport 2'].unique())])
emp_airport = emp_airport.to_frame(name='Origin Airport')


# Cross join: pair every origin airport with every destination airport
hub_airport['key'] = 1
emp_airport['key'] = 1
airport_combinations = pd.merge(emp_airport, hub_airport, on='key').drop(columns='key')


# Data cleaning: drop missing values and duplicate pairs
airport_combinations = airport_combinations.dropna().drop_duplicates().reset_index(drop=True)
# Removing rows with the same origin and destination airport
mask = airport_combinations['Origin Airport'] != airport_combinations['Destination Airport']
# Select only the rows where the mask is True; copy so new columns can be added safely
airport_combinations = airport_combinations.loc[mask].copy()

# Static travel date; it must still be in the future when the notebook is run
airport_combinations['departure_date'] = '2023-03-17'


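# Creating the client with a Duffel API access token.
# The token is read from an environment variable instead of being hardcoded;
# the variable name DUFFEL_ACCESS_TOKEN is an assumption, use whichever name you set.
import os
access_token = os.environ['DUFFEL_ACCESS_TOKEN']
client = Duffel(access_token=access_token)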

# Calling the Duffel API
# Define a function that requests offers for one origin/destination/date row
# and returns the duration, price, airline, departure time and arrival time
def airport_prices(row):
    destination = row['Destination Airport']
    origin = row['Origin Airport']
    departure_date = row['departure_date']

    slices = [
        {
            "origin": origin,
            "destination": destination,
            "departure_date": departure_date,
        },
    ]
    offer_request = (
        client.offer_requests.create()
        .passengers([{"type": "adult"}])
        .slices(slices)
        .return_offers()
        .execute()
    )
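    offers = offer_request.offers
    # No offers for this route and date: return empty values instead of failing
    if not offers:
        return None, None, None, None, None

    # Getting the top airline details.
    # Skip the "Duffel Airways" test carrier when another offer is available.
    if offers[0].owner.name == "Duffel Airways" and len(offers) > 1:
        i = 1
    else:
        i = 0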
    
#    while i < len(offers):
#        dept_time = offers[i].slices[0].segments[0].departing_at.time()
#        if 6 <= dept_time.hour < 21:
#            #print(f"Offer {i} has a departure time between 6AM and 9PM")            
#            break    
#        i += 1

#    if i >= len(offers):
#      i=1

    offer = offers[i] 

    # Extracting the flight duration in minutes.
    # Note: only the first segment of the slice is read, so for connecting
    # itineraries the duration and arrival time describe the first leg only.
    duration_string = offer.slices[0].segments[0].duration

    # Regular expression to match hours and minutes in ISO 8601 durations such as "PT2H26M"
    pattern = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?")
    match = pattern.match(duration_string) if duration_string else None
    if match:
        hours = int(match.group(1)) if match.group(1) else 0
        minutes = int(match.group(2)) if match.group(2) else 0
        total_minutes = hours * 60 + minutes
    else:
        # Duration missing or in an unexpected format
        total_minutes = None

    # Extracting flight info
    flight_price = float(offer.total_amount)
    airline = offer.owner.name
    departure_time = offer.slices[0].segments[0].departing_at.time()
    arrival_time = offer.slices[0].segments[0].arriving_at.time()

    return total_minutes, flight_price, airline, departure_time, arrival_time

#for index, row in airport_combinations.iloc[1059:].iterrows():
for index, row in airport_combinations.iterrows():
    print("Air travel details for row number: {}".format(index))
    airport_combinations.loc[index, ['Duration', 'Price', 'Airlines', 'Departure Time', 'Arrival Time']] = airport_prices(row)
    # Print the updated row, including the newly added flight details
    print(airport_combinations.loc[index])
    

# Print the resulting dataframe
print(airport_combinations)

# Save the DataFrame to a CSV file
airport_combinations.to_csv('Flight_Prices_2023_03_17_Any_hr.csv', index=False)
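
The duration handling inside `airport_prices` can be checked offline, without calling the API. The following is a minimal sketch and not part of the original notebook: `duration_to_minutes` is a hypothetical helper name, and it assumes durations of the form `P[nD]T[nH][nM]`, such as `PT2H26M`. Any other input, including a missing value or a string with a seconds component, yields `None`.

```python
import re

# Assumed format: P[nD]T[nH][nM], e.g. "PT2H26M" or "P1DT1H5M".
_DURATION = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")


def duration_to_minutes(duration_string):
    """Return the duration in whole minutes, or None if it cannot be parsed."""
    if not duration_string:
        return None
    match = _DURATION.match(duration_string)
    if match is None:
        return None
    # Missing components (for example no days part) count as zero.
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return days * 1440 + hours * 60 + minutes


print(duration_to_minutes("PT2H26M"))
```

For example, `duration_to_minutes("PT2H26M")` returns 146. Returning `None` for unparseable input keeps a single odd value from stopping a long extraction run.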
