#### Dataset Conditions

-> Dataset Structure

* All numerical values must be whole numbers.

* The dataset must contain 100,000 flights, spread evenly across 2022 & 2023.

* Use a list of international airports from around the world for the Origin_Airport & Destination_Airport attributes.

* Use short forms (airport codes) for airport names.

* There must be exactly 100 flight IDs, each with its own fixed travel route. Each of the 100,000 generated rows should pick its flight ID from these 100 predefined flight IDs.

* Vary Flight_Duration (units: hours) based on the travel distance between the origin and destination airports.

* Flight_Time has to be consistent. There are usually morning, afternoon, evening, night & early-hour flights with their own fixed timings. Allocate such fixed timings to each flight.

* The randomly generated dates should cover the entire 2022 & 2023 calendar years; allocate flights per month accordingly.

* Flight_Duration can vary from 2 hours to 16 hours depending on the distance between the two airports.

-> Festival Conditions

* Include major international festivals/holidays.

* Each festival takes place only once per calendar year.

* On the 2 days before a special event, all flights are at their max operating capacity. For example, if the holiday is on the 18th, the 16th & 17th witness peak Passengers_Booked, and on the 18th the flights operate at below-average bookings.

* On the date of the festival, Passengers_Booked dips 40-50% below the average, as many travelers have already reached their destination by then.

* The day after the festival, Passengers_Booked gets back to normal.
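
The festival rules above can be sketched as a date-based adjustment to bookings. This is only an illustration under stated assumptions: `FESTIVAL_DATES` is a small example subset, `festival_adjusted_bookings` and its `capacity` argument are hypothetical names that do not appear in the notebook, and the dip is applied to each flight's own base bookings rather than to a global average.

```python
import numpy as np
import pandas as pd

# Illustrative subset of festival dates (assumption); extend with the festivals you want to model
FESTIVAL_DATES = pd.to_datetime(["2022-12-25", "2023-01-01", "2023-12-25"])

def festival_adjusted_bookings(flight_dates, base_bookings, capacity, rng):
    """Apply the pre-festival peak and the festival-day dip to base bookings."""
    dates = pd.DatetimeIndex(flight_dates).normalize()
    booked = np.asarray(base_bookings, dtype=float).copy()
    seats = np.broadcast_to(capacity, booked.shape)

    one_day = pd.Timedelta(days=1)
    # The two days before each festival run at full capacity
    peak_days = (FESTIVAL_DATES - one_day).union(FESTIVAL_DATES - 2 * one_day)
    is_peak = dates.isin(peak_days)
    is_festival = dates.isin(FESTIVAL_DATES)

    booked[is_peak] = seats[is_peak]
    # A 40-50% dip means 50-60% of the base bookings remain on the festival day
    retained = rng.uniform(0.5, 0.6, size=booked.shape)
    booked[is_festival] *= retained[is_festival]
    return booked.astype(int)  # whole numbers only

rng = np.random.default_rng(0)
sample_dates = pd.to_datetime(["2022-12-23", "2022-12-24", "2022-12-25", "2022-12-26"])
adjusted = festival_adjusted_bookings(sample_dates, [200, 200, 200, 200], 300, rng)
print(adjusted)
```

The day after the festival needs no special handling here, because any date that is neither a peak day nor a festival day keeps its base bookings.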


-> Weather Conditions

* Weather_Conditions should only have two options: good or bad.

* Correlate each flight's travel date with the season at that point of the year. If a month falls in monsoon season, flights in that time frame are more likely to experience bad weather. Similarly, if a date or month falls in summer, flights in that time frame are less likely to experience bad weather.

* The No_show attribute depends on Weather_Conditions: fewer passengers show up in bad weather conditions and more passengers show up in good weather conditions.

* The No_show attribute has to show the number of passengers who did not show up. If 100 passengers booked tickets and only 80 got on the flight, No_show should display 20.
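
One possible way to tie weather to the season, and no-shows to the weather, is sketched below. The month ranges for monsoon and summer, the probabilities in `P_BAD`, and the no-show rates are all assumptions chosen for illustration (seasons differ by region), and `simulate_weather_and_no_shows` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

MONSOON_MONTHS = [6, 7, 8, 9]   # assumption: June to September
SUMMER_MONTHS = [3, 4, 5]       # assumption: March to May
P_BAD = {"monsoon": 0.6, "summer": 0.1, "other": 0.3}  # assumed probabilities of bad weather

def simulate_weather_and_no_shows(flight_dates, passengers_booked, rng):
    """Return (Weather_Conditions, No_show) arrays driven by the month of travel."""
    months = pd.DatetimeIndex(flight_dates).month.to_numpy()
    booked = np.asarray(passengers_booked)
    n = len(months)

    p_bad = np.where(np.isin(months, MONSOON_MONTHS), P_BAD["monsoon"],
                     np.where(np.isin(months, SUMMER_MONTHS), P_BAD["summer"], P_BAD["other"]))
    is_bad = rng.random(n) < p_bad
    weather = np.where(is_bad, "Bad", "Good")

    # Assumed no-show rates: 5-10% of bookings in bad weather, 0-4% in good weather
    rate = np.where(is_bad, rng.uniform(0.05, 0.10, n), rng.uniform(0.0, 0.04, n))
    no_show = np.floor(booked * rate).astype(int)  # whole numbers, never above bookings
    return weather, no_show

rng = np.random.default_rng(1)
july = pd.date_range("2023-07-01", periods=31).repeat(40)
april = pd.date_range("2023-04-01", periods=30).repeat(40)
weather_july, no_show_july = simulate_weather_and_no_shows(july, np.full(len(july), 200), rng)
weather_april, no_show_april = simulate_weather_and_no_shows(april, np.full(len(april), 200), rng)
print((weather_july == "Bad").mean(), (weather_april == "Bad").mean())
```

Expressing No_show as a fraction of Passengers_Booked keeps it from ever exceeding the number of bookings.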


-> Food Conditions

* NonVeg_Meal + Veg_Meal + Jain_Meal = Meals_Loaded

* Meals_Loaded should be slightly higher than Passengers_Booked

* Meals_Wasted should never exceed Meals_Loaded

* Meals_Wasted is higher when Passengers_Booked is higher and bad weather increases the number of No_show.

* Meals_Wasted cannot be negative; it can be zero if No_show is zero. However, keep in mind that Meals_Wasted = 0 is a very rare occurrence, as Meals_Loaded is higher than Passengers_Booked.

* Jain_Meal has to be high if there's a Jain festival.

* Flights early in the morning or late at night might have different meal consumption patterns compared to midday flights. Similarly, weekdays vs. weekends or holiday seasons could affect passenger numbers and meal consumption.

* On long flight journeys people are more likely to consume meals, whereas on shorter flights at odd timings passengers are less likely to consume meals.
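
The hard constraints in this list can be checked mechanically once a dataset exists. The sketch below counts violating rows per rule; `check_meal_constraints` is a hypothetical helper, and it assumes the column names used elsewhere in this notebook.

```python
import pandas as pd

def check_meal_constraints(df):
    """Return a dict mapping each rule name to the number of rows that violate it."""
    meal_sum = df["NonVeg_Meal"] + df["Veg_Meal"] + df["Jain_Meal"]
    return {
        "meal_types_sum_to_loaded": int((meal_sum != df["Meals_Loaded"]).sum()),
        "loaded_exceeds_booked": int((df["Meals_Loaded"] <= df["Passengers_Booked"]).sum()),
        "wasted_non_negative": int((df["Meals_Wasted"] < 0).sum()),
        "wasted_within_loaded": int((df["Meals_Wasted"] > df["Meals_Loaded"]).sum()),
    }

# Row 0 satisfies every rule; row 1 loads no more meals than passengers booked
sample = pd.DataFrame({
    "Passengers_Booked": [100, 100],
    "NonVeg_Meal": [55, 50],
    "Veg_Meal": [30, 30],
    "Jain_Meal": [20, 20],
    "Meals_Loaded": [105, 100],
    "Meals_Wasted": [12, 5],
})
violations = check_meal_constraints(sample)
print(violations)
```

A result of all zeros means the dataset satisfies the four rules; the softer rules (rare zero waste, consumption patterns by flight time and duration) need distribution checks rather than row-level checks.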

#### Importing Libraries

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random

# Basic parameters
total_flights = 100000
years = [2022, 2023]
flight_ids = [f"FLIGHT_{i:03d}" for i in range(1, 101)]  # FLIGHT_001 ... FLIGHT_100
flight_times = ['Morning', 'Afternoon', 'Evening', 'Night', 'Early Hours']


#### Airport Codes & Flight Routes

In [2]:
# Assuming a simplified list of international airport codes; replace with a full list as needed
airport_codes = ['LAX', 'JFK', 'CDG', 'DXB', 'HND', 'LHR', 'SIN', 'SYD', 'FRA', 'YYZ']

# Generate a fixed route for each flight ID; sampling without replacement keeps origin and destination distinct
np.random.seed(42)  # Ensure reproducibility
routes = [tuple(np.random.choice(airport_codes, size=2, replace=False)) for _ in flight_ids]
flight_route_map = dict(zip(flight_ids, routes))


#### Flight Duration based on Routes

In [3]:
# Simplification: assign each flight ID a random whole-number duration of 2-16 hours (randint's upper bound is exclusive);
# in practice, this should reflect the distance between the origin and destination airports
flight_durations = {flight_id: np.random.randint(2, 17) for flight_id in flight_ids}
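
To make Flight_Duration follow distance instead of being random, one option is a great-circle (haversine) estimate. The sketch below is illustrative only: the coordinates in `AIRPORT_COORDS` are rounded approximations for a few of the airports, `CRUISE_SPEED_KMH` is an assumed average speed, and `great_circle_km` and `estimate_duration_hours` are hypothetical helpers.

```python
import math

# Approximate (latitude, longitude) in degrees; rounded values for illustration only
AIRPORT_COORDS = {
    "JFK": (40.64, -73.78),
    "LHR": (51.47, -0.45),
    "SIN": (1.36, 103.99),
    "SYD": (-33.95, 151.18),
}
EARTH_RADIUS_KM = 6371
CRUISE_SPEED_KMH = 850  # assumed average speed, not taken from the notebook

def great_circle_km(origin, destination):
    """Haversine distance between two airports in kilometres."""
    lat1, lon1 = map(math.radians, AIRPORT_COORDS[origin])
    lat2, lon2 = map(math.radians, AIRPORT_COORDS[destination])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def estimate_duration_hours(origin, destination):
    """Whole-number duration, clipped to the 2-16 hour range from the dataset conditions."""
    hours = round(great_circle_km(origin, destination) / CRUISE_SPEED_KMH)
    return min(16, max(2, hours))

for route in [("JFK", "LHR"), ("SIN", "SYD"), ("JFK", "SIN")]:
    print(route, estimate_duration_hours(*route))
```

Clipping keeps very long routes inside the 2-16 hour range required above, at the cost of flattening differences between the longest routes.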


#### Distributing Flights across various dates

In [4]:
# Sample dates uniformly at random across 2022 and 2023 (per-day counts are approximately, not exactly, even)
date_start = datetime(2022, 1, 1)
date_end = datetime(2023, 12, 31)
all_dates = pd.date_range(start=date_start, end=date_end).tolist()
flight_dates = np.random.choice(all_dates, total_flights, replace=True)
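
If the per-day counts need to be as even as possible rather than approximately even, a simple alternative is to cycle through the calendar and shuffle the result. This is a sketch of that idea, not the notebook's method; `even_flight_dates` is a name introduced here to avoid clashing with `flight_dates`.

```python
import numpy as np
import pandas as pd

total_flights = 100_000
all_days = pd.date_range("2022-01-01", "2023-12-31")  # 730 calendar days

rng = np.random.default_rng(42)
# Cycle through the day indices so every date receives 136 or 137 flights, then shuffle the row order
day_index = np.arange(total_flights) % len(all_days)
rng.shuffle(day_index)
even_flight_dates = all_days[day_index]

counts = pd.Series(even_flight_dates).value_counts()
print(counts.min(), counts.max())
```

Because 100,000 is not divisible by 730, the first 720 days of the range get one more flight than the last 10; shuffling only changes the row order, not the counts.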


#### Allocating Flight Times

In [5]:
# Allocate fixed flight times to each flight ID (simplified to random allocation for this example)
flight_time_map = {flight_id: np.random.choice(flight_times) for flight_id in flight_ids}
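
The dataset conditions ask for fixed timings per slot, which the labels alone do not provide. A minimal sketch, assuming the hypothetical departure times in `SLOT_DEPARTURES` and an even split of the 100 flight IDs across the five slots:

```python
import numpy as np

# Hypothetical fixed departure time (24-hour clock) for each slot
SLOT_DEPARTURES = {
    "Early Hours": "03:30",
    "Morning": "08:00",
    "Afternoon": "13:30",
    "Evening": "18:00",
    "Night": "22:30",
}

flight_ids = [f"FLIGHT_{i:03d}" for i in range(1, 101)]
rng = np.random.default_rng(42)

# Repeat the slot names to cover all 100 IDs (20 per slot), then shuffle the assignment
slots = rng.permutation(np.resize(list(SLOT_DEPARTURES), len(flight_ids)))
slot_map = {fid: str(slot) for fid, slot in zip(flight_ids, slots)}
departure_map = {fid: SLOT_DEPARTURES[slot] for fid, slot in slot_map.items()}

print(slot_map["FLIGHT_001"], departure_map["FLIGHT_001"])
```

Each flight ID keeps the same slot and departure time on every date, which is what makes Flight_Time consistent across rows.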


#### Passenger Bookings & No Show Conditions

In [6]:
# Randomly generate passengers booked (whole numbers) and no-shows (whole numbers); adjust logic for festival and weather conditions as needed
passengers_booked = np.random.randint(50, 301, size=total_flights)  # Between 50 and 300 passengers, whole numbers
weather_conditions = np.random.choice(['Good', 'Bad'], size=total_flights)
# For no-shows, ensure whole numbers are generated
no_shows = np.where(weather_conditions == 'Good', np.random.randint(0, 11, total_flights), np.random.randint(10, 21, total_flights))


#### Special Events

In [None]:
# For simplicity, randomly assign 'Yes' or 'No' to flights; customize logic for actual event dates and impacts
special_events = np.random.choice(['Yes', 'No'], size=total_flights)


#### Generating Meal Data

In [None]:
# Simplified meal calculation ensuring whole numbers; adjust based on flight time, no-shows
# Load roughly 5% more meals than bookings (assumed margin) so Meals_Loaded is slightly higher than Passengers_Booked
meals_loaded = (passengers_booked * 1.05).astype(int) + 1
nonveg_meals = (meals_loaded * 0.5).astype(int)
veg_meals = (meals_loaded * 0.3).astype(int)
jain_meals = meals_loaded - nonveg_meals - veg_meals  # Remainder, so the three types sum exactly to Meals_Loaded
# Whole-number meals wasted (logic simplified for demonstration), capped so it never exceeds Meals_Loaded
meals_wasted = np.minimum(np.where(weather_conditions == 'Bad', (no_shows * 1.5).astype(int), no_shows), meals_loaded)


#### Compiling & Saving Dataset

In [None]:
# Draw one flight ID per row, then look up its fixed route, duration, and time so every row is internally consistent
assigned_ids = np.random.choice(flight_ids, total_flights)

# Compile all data into a DataFrame ensuring all numerical values are whole numbers
flights_data = pd.DataFrame({
    'Flight_ID': assigned_ids,
    'Origin_Airport': [flight_route_map[fid][0] for fid in assigned_ids],
    'Destination_Airport': [flight_route_map[fid][1] for fid in assigned_ids],
    'Flight_Duration': [flight_durations[fid] for fid in assigned_ids],
    'Flight_Date': [date.date() for date in flight_dates],
    'Flight_Time': [flight_time_map[fid] for fid in assigned_ids],
    'Passengers_Booked': passengers_booked,
    'No_show': no_shows,
    'Weather_Conditions': weather_conditions,
    'Special_Event': special_events,
    'NonVeg_Meal': nonveg_meals,
    'Veg_Meal': veg_meals,
    'Jain_Meal': jain_meals,
    'Meals_Loaded': meals_loaded,
    'Meals_Wasted': meals_wasted
})

# Save to CSV
flights_data.to_csv('synthetic_flight_data.csv', index=False)

