# Lab: Airline Routes Analysis

1. **Data Overview:**  
   - Load the CSV file and print the first 10 rows.  
   - Identify and describe the columns in the dataset. What does each column represent?

2. **Descriptive Statistics:**  
   - Calculate the mean, median, and standard deviation for both flight distances and ticket costs.  
   - What do these statistics tell you about the variability in flight distances and ticket pricing?

3. **Graph Construction:**  
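   - Build a directed graph from the dataset using NetworkX, storing flight distance and ticket cost as edge attributes. The dataset can contain several flights for the same airport pair, so decide how to handle duplicates (a plain `DiGraph` keeps only one edge per pair).  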
   - Discuss the benefits of visualizing airline routes as a graph. How does this help in understanding the overall network connectivity?

4. **Route Analysis:**  
   - Using Dijkstra’s algorithm, determine the shortest route (by flight distance) between `JFK` and `LAX`.  
   - Determine the cheapest route (by ticket cost) for the same pair of airports.  
   - Compare the two paths: Why might they differ? What factors could influence a traveler’s decision when choosing a route?

5. **Connection Metrics:**  
   - Compute the number of outbound flights per airport. Which airport appears to be the busiest in terms of departures?  
   - Compute the number of inbound flights per airport. Is there an airport that stands out as a major arrival hub?

6. **Cost Analysis:**  
   - For each airport, calculate the average and standard deviation of ticket costs for outbound flights.  
   - What might a high standard deviation indicate about ticket pricing at a specific airport?

7. **Visualization and Correlation:**  
   - Plot a histogram to show the distribution of ticket costs. What does the shape of the histogram suggest about the data?  
   - Create a scatter plot of flight distance versus ticket cost. Is there an observable correlation between the two? Discuss any trends you observe.

8. **Reflection and Extension:**  
   - What insights can you draw from the differences between the shortest (by distance) route and the cheapest (by cost) route?  
   - If you were a data analyst for an airline company, how might you use this data to inform business strategies such as pricing adjustments and route planning?  
   - *(Optional)* Modify the dataset, either by generating additional routes or by adjusting existing values, and observe how the statistics and graph change. Report your findings and insights.
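
As a starting point for tasks 3 and 4, here is a minimal sketch of the graph construction and the two Dijkstra queries. It uses a small hand-made list of routes, and those values are hypothetical, chosen only so that the shortest and cheapest paths differ. They are not taken from the lab's CSV. The edge attribute names simply mirror the CSV column names. Treat it as one possible approach rather than the expected solution.

```python
import networkx as nx

# Hypothetical toy routes: (source, destination, flight_distance_km, ticket_cost).
# These numbers are made up for illustration and are not from airline_routes.csv.
routes = [
    ("JFK", "ORD", 1190, 300),
    ("ORD", "LAX", 2800, 350),
    ("JFK", "LAX", 3980, 700),
    ("JFK", "DEN", 2620, 150),
    ("DEN", "LAX", 1380, 120),
]

# Build a directed graph with both measures stored as edge attributes.
# A DiGraph holds one edge per (source, destination) pair: adding the same
# pair again overwrites the attributes of the existing edge.
G = nx.DiGraph()
for src, dest, dist, cost in routes:
    G.add_edge(src, dest, flight_distance_km=dist, ticket_cost=cost)

# Shortest route by distance: Dijkstra weighted by flight_distance_km.
shortest_path = nx.dijkstra_path(G, "JFK", "LAX", weight="flight_distance_km")
shortest_km = nx.dijkstra_path_length(G, "JFK", "LAX", weight="flight_distance_km")

# Cheapest route by price: the same algorithm weighted by ticket_cost.
cheapest_path = nx.dijkstra_path(G, "JFK", "LAX", weight="ticket_cost")
cheapest_cost = nx.dijkstra_path_length(G, "JFK", "LAX", weight="ticket_cost")

print(shortest_path, shortest_km)    # ['JFK', 'LAX'] 3980
print(cheapest_path, cheapest_cost)  # ['JFK', 'DEN', 'LAX'] 270

# In a DiGraph, out_degree counts distinct destinations, not individual flights.
print(dict(G.out_degree()))
```

In this toy network the direct flight is the shortest in kilometres, while the connection through `DEN` is the cheapest. With the lab's dataset, repeated airport pairs need a decision before you build the graph: for example, keep the minimum weight per pair, or use `nx.MultiDiGraph` to retain every flight.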


In [3]:
# This code generates a CSV file of random flights.

import random
import csv

# List of sample airports.
airports = ['JFK', 'ATL', 'ORD', 'DFW', 'LAX', 'MIA', 'SFO', 'SEA', 'DEN', 'BOS']

# Set a seed for reproducibility.
random.seed(42)

# Create 400 entries.
num_entries = 400

with open('airline_routes.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    # Write header.
    writer.writerow(["source", "destination", "flight_distance_km", "ticket_cost"])
    for _ in range(num_entries):
        src = random.choice(airports)
        dest = random.choice(airports)
        # Ensure source and destination are not the same.
        while dest == src:
            dest = random.choice(airports)
        # Random flight distance between 300 km and 5000 km.
        flight_distance = random.randint(300, 5000)
        # Random ticket cost between $100 and $800.
        ticket_cost = random.randint(100, 800)
        writer.writerow([src, dest, flight_distance, ticket_cost])
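
Once the generator above has written `airline_routes.csv`, the descriptive statistics and the per-airport flight counts (tasks 2 and 5) can be approached with the standard library alone. The sketch below is one way to do it. The helper name `summarize_routes` is an assumption introduced here for illustration, and pandas would work just as well.

```python
import csv
import statistics
from collections import Counter
from pathlib import Path


def summarize_routes(path):
    """Return descriptive statistics and per-airport flight counts for a routes CSV.

    Hypothetical helper: it assumes the column names written by the generator
    (source, destination, flight_distance_km, ticket_cost).
    """
    with open(path, newline="") as csvfile:
        rows = list(csv.DictReader(csvfile))

    distances = [int(row["flight_distance_km"]) for row in rows]
    costs = [int(row["ticket_cost"]) for row in rows]

    def describe(values):
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }

    return {
        "distance": describe(distances),
        "cost": describe(costs),
        # Counting rows (not graph edges) counts every flight, duplicates included.
        "outbound": Counter(row["source"] for row in rows),
        "inbound": Counter(row["destination"] for row in rows),
    }


# Only run the demo if the generated file is present.
if Path("airline_routes.csv").exists():
    summary = summarize_routes("airline_routes.csv")
    print("Distance:", summary["distance"])
    print("Cost:", summary["cost"])
    print("Busiest departures:", summary["outbound"].most_common(3))
    print("Busiest arrivals:", summary["inbound"].most_common(3))
```

Counting rows rather than graph edges matters here: the generator draws 400 flights from only 90 possible ordered airport pairs, so many pairs repeat, and the degree of a node in a plain `DiGraph` would undercount flights.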