# Flight Data Exploration

This notebook explores the flight dataset by displaying its first 10 rows along with basic information: shape, column names, data types, and missing values in those preview rows.

## 1. Import Required Libraries

First, we need to import the libraries we'll use for data manipulation and analysis.

In [1]:
# Import pandas for data manipulation
import pandas as pd

# Import numpy for numerical operations
import numpy as np

# Set display options to show more columns
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

## 2. Load the Dataset

Now let's load the flight data from the CSV file in the data folder.

In [2]:
# Load the flight data
df = pd.read_csv('data/flights.csv')

print("Dataset loaded successfully!")
print("Data loaded from: data/flights.csv")

Dataset loaded successfully!
Data loaded from: data/flights.csv
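
For a file of this size it can help to preview a few rows or columns before loading everything. The sketch below is only an illustration and is not part of the original notebook: it reads from an in-memory stand-in (`sample_csv`, a hypothetical name, holding two rows copied from this dataset's preview and only a subset of its columns) so that it runs without `data/flights.csv`. `usecols` and `nrows` are standard `pd.read_csv` parameters.

```python
import io

import pandas as pd

# Stand-in for data/flights.csv: two rows from the preview, subset of columns (assumption for illustration)
sample_csv = io.StringIO(
    "Year,Month,DayofMonth,DayOfWeek,Carrier,DepDelay,ArrDelay\n"
    "2013,9,16,1,DL,4,13\n"
    "2013,9,23,1,WN,3,22\n"
)

# usecols keeps only the listed columns; nrows stops after that many data rows
preview = pd.read_csv(
    sample_csv,
    usecols=["Carrier", "DepDelay", "ArrDelay"],
    nrows=1,
)
print(preview)
```

Against the real file, the same call would take the path `'data/flights.csv'` in place of `sample_csv`.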


## 3. Display Basic Dataset Information

Let's get some basic information about our dataset: how many rows and columns it has, plus each column's non-null count and data type.

In [3]:
# Display basic information about the dataset
print("Dataset Shape (rows, columns):", df.shape)
print(f"Total number of rows: {df.shape[0]:,}")
print(f"Total number of columns: {df.shape[1]}")
print("\nDataset Info:")
df.info()

Dataset Shape (rows, columns): (271940, 20)
Total number of rows: 271,940
Total number of columns: 20

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271940 entries, 0 to 271939
Data columns (total 20 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   Year               271940 non-null  int64  
 1   Month              271940 non-null  int64  
 2   DayofMonth         271940 non-null  int64  
 3   DayOfWeek          271940 non-null  int64  
 4   Carrier            271940 non-null  object 
 5   OriginAirportID    271940 non-null  int64  
 6   OriginAirportName  271940 non-null  object 
 7   OriginCity         271940 non-null  object 
 8   OriginState        271940 non-null  object 
 9   DestAirportID      271940 non-null  int64  
 10  DestAirportName    271940 non-null  object 
 11  DestCity           271940 non-null  object 
 12  DestState          271940 non-null  object 
 13  CRSDepTime         271940 non-nu
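
The per-column counts that `df.info()` prints can also be collected into a DataFrame, which is handy when you want to filter or sort them. A minimal sketch, assuming a small stand-in frame (`sample`, built from three rows of this dataset's preview) rather than the full `df`; the name `summary` is hypothetical:

```python
import pandas as pd

# Stand-in for df: three rows and three columns taken from the preview (assumption for illustration)
sample = pd.DataFrame({
    "Carrier": ["DL", "WN", "AS"],
    "DepDelay": [4, 3, -3],
    "DepDel15": [0.0, 0.0, 0.0],
})

# One row per column: how many values are present, and the column's dtype as text
summary = pd.DataFrame({
    "non_null": sample.notna().sum(),
    "dtype": sample.dtypes.astype(str),
})
print(summary)
```

With the real data, replacing `sample` with `df` should give the same counts that `df.info()` reports.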

## 4. Show First 10 Entries

Now let's look at the first 10 rows of our flight data to understand what information we have.

In [4]:
# Display the first 10 rows of the dataset
print("First 10 entries of the flight dataset:")
print("=" * 50)
df.head(10)

First 10 entries of the flight dataset:


| | Year | Month | DayofMonth | DayOfWeek | Carrier | OriginAirportID | OriginAirportName | OriginCity | OriginState | DestAirportID | DestAirportName | DestCity | DestState | CRSDepTime | DepDelay | DepDel15 | CRSArrTime | ArrDelay | ArrDel15 | Cancelled |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013 | 9 | 16 | 1 | DL | 15304 | Tampa International | Tampa | FL | 12478 | John F. Kennedy International | New York | NY | 1539 | 4 | 0.0 | 1824 | 13 | 0 | 0 |
| 1 | 2013 | 9 | 23 | 1 | WN | 14122 | Pittsburgh International | Pittsburgh | PA | 13232 | Chicago Midway International | Chicago | IL | 710 | 3 | 0.0 | 740 | 22 | 1 | 0 |
| 2 | 2013 | 9 | 7 | 6 | AS | 14747 | Seattle/Tacoma International | Seattle | WA | 11278 | Ronald Reagan Washington National | Washington | DC | 810 | -3 | 0.0 | 1614 | -7 | 0 | 0 |
| 3 | 2013 | 7 | 22 | 1 | OO | 13930 | Chicago O'Hare International | Chicago | IL | 11042 | Cleveland-Hopkins International | Cleveland | OH | 804 | 35 | 1.0 | 1027 | 33 | 1 | 0 |
| 4 | 2013 | 5 | 16 | 4 | DL | 13931 | Norfolk International | Norfolk | VA | 10397 | Hartsfield-Jackson Atlanta International | Atlanta | GA | 545 | -1 | 0.0 | 728 | -9 | 0 | 0 |
| 5 | 2013 | 7 | 28 | 7 | UA | 12478 | John F. Kennedy International | New York | NY | 14771 | San Francisco International | San Francisco | CA | 1710 | 87 | 1.0 | 2035 | 183 | 1 | 0 |
| 6 | 2013 | 10 | 6 | 7 | WN | 13796 | Metropolitan Oakland International | Oakland | CA | 12191 | William P Hobby | Houston | TX | 630 | -1 | 0.0 | 1210 | -3 | 0 | 0 |
| 7 | 2013 | 7 | 28 | 7 | EV | 12264 | Washington Dulles International | Washington | DC | 14524 | Richmond International | Richmond | VA | 2218 | 4 | 0.0 | 2301 | 15 | 1 | 0 |
| 8 | 2013 | 10 | 8 | 2 | AA | 13930 | Chicago O'Hare International | Chicago | IL | 11298 | Dallas/Fort Worth International | Dallas/Fort Worth | TX | 1010 | 8 | 0.0 | 1240 | -10 | 0 | 0 |
| 9 | 2013 | 5 | 12 | 7 | UA | 12478 | John F. Kennedy International | New York | NY | 12892 | Los Angeles International | Los Angeles | CA | 1759 | 40 | 1.0 | 2107 | 10 | 0 | 0 |
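
The `CRSDepTime` and `CRSArrTime` values in the preview (for example 1539 and 710) look like clock times packed into an integer as hhmm. The notebook does not state this, so the sketch below treats it as an assumption. It splits three `CRSDepTime` values from the preview into hours and minutes and builds a readable label; `sample` and `DepClock` are hypothetical names.

```python
import pandas as pd

# Three CRSDepTime values copied from the preview rows
sample = pd.DataFrame({"CRSDepTime": [1539, 710, 810]})

# Assumption: the integer encodes hhmm, so integer division and remainder by 100 split it
hours = sample["CRSDepTime"] // 100
minutes = sample["CRSDepTime"] % 100

# Zero-pad each part to two digits and join them as HH:MM
sample["DepClock"] = (
    hours.astype(str).str.zfill(2) + ":" + minutes.astype(str).str.zfill(2)
)
print(sample)
```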


## 5. Examine Column Names and Data Types

Let's examine the columns and their data types, then check the first 10 rows for missing values. Note that this check covers only those 10 rows, not the full dataset.

In [5]:
# Display column names
print("Column Names:")
print("=" * 30)
for i, col in enumerate(df.columns, 1):
    print(f"{i:2d}. {col}")

print("\n" + "=" * 50)
print("Data Types:")
print("=" * 30)
print(df.dtypes)

print("\n" + "=" * 50)
print("Missing Values in First 10 Rows:")
print("=" * 30)
missing_values = df.head(10).isnull().sum()
print(missing_values[missing_values > 0] if missing_values.sum() > 0 else "No missing values in first 10 rows")

Column Names:
 1. Year
 2. Month
 3. DayofMonth
 4. DayOfWeek
 5. Carrier
 6. OriginAirportID
 7. OriginAirportName
 8. OriginCity
 9. OriginState
10. DestAirportID
11. DestAirportName
12. DestCity
13. DestState
14. CRSDepTime
15. DepDelay
16. DepDel15
17. CRSArrTime
18. ArrDelay
19. ArrDel15
20. Cancelled

Data Types:
Year                   int64
Month                  int64
DayofMonth             int64
DayOfWeek              int64
Carrier               object
OriginAirportID        int64
OriginAirportName     object
OriginCity            object
OriginState           object
DestAirportID          int64
DestAirportName       object
DestCity              object
DestState             object
CRSDepTime             int64
DepDelay               int64
DepDel15             float64
CRSArrTime             int64
ArrDelay               int64
ArrDel15               int64
Cancelled              int64
dtype: object

Missing Values in First 10 Rows:
No missing values in first 10 rows
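
Checking only the first 10 rows can miss gaps further down the file. One hint that this may matter here: `DepDel15` is `float64` while the similar flag `ArrDel15` is `int64`, and pandas commonly stores an otherwise integer column as float when it contains missing values. The notebook does not confirm this, so treat it as something to verify. Below is a minimal sketch of a whole-frame check, using a toy frame (`toy`, hypothetical, with one NaN inserted purely for illustration) in place of `df`:

```python
import numpy as np
import pandas as pd

# Toy data, not from the dataset: the NaN is inserted for illustration only
toy = pd.DataFrame({
    "DepDelay": [4, 3, -3, 35],
    "DepDel15": [0.0, 0.0, np.nan, 1.0],
})

# Count missing values per column across every row, then keep only columns that have any
missing = toy.isnull().sum()
missing = missing[missing > 0]
print(missing if not missing.empty else "No missing values")
```

Running the same two lines on `df` instead of `toy` would report missing values for the full dataset rather than just the preview.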


## Summary

This notebook has shown you:
1. How to import necessary libraries
2. How to load a CSV file into a pandas DataFrame
3. How to check basic information about your dataset
4. How to display the first 10 entries using `head(10)`
5. How to examine column names and data types, and check the first 10 rows for missing values

You can now explore your flight data further by running additional analysis or creating visualizations!
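
As one possible next step, here is a minimal sketch of a grouped summary: the mean departure and arrival delay per carrier. It uses only the 10 preview rows shown earlier (copied into a stand-in frame named `sample`), so the numbers say nothing about the full dataset; `mean_delays` is a hypothetical name.

```python
import pandas as pd

# Carrier, DepDelay, and ArrDelay copied from the 10 preview rows
sample = pd.DataFrame({
    "Carrier": ["DL", "WN", "AS", "OO", "DL", "UA", "WN", "EV", "AA", "UA"],
    "DepDelay": [4, 3, -3, 35, -1, 87, -1, 4, 8, 40],
    "ArrDelay": [13, 22, -7, 33, -9, 183, -3, 15, -10, 10],
})

# Average both delay columns within each carrier, largest departure delay first
mean_delays = (
    sample.groupby("Carrier")[["DepDelay", "ArrDelay"]]
    .mean()
    .sort_values("DepDelay", ascending=False)
)
print(mean_delays)
```

Swapping `sample` for `df` would apply the same aggregation to all rows.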