
Python Fundamentals Final Assignment

Titanic Dataset Analysis

Dataset: Titanic - Machine Learning from Disaster (Kaggle)

Purpose: This program analyzes the Titanic dataset to extract insights about passengers,
         including survival rates, demographics, and fare information. It demonstrates
         fundamental Python concepts including file I/O, data structures, functions,
         loops, and conditional logic.

Author: Arwa abulaila

Date: January 2026

Program: Remote Work & Freelancing Technical Training Program – Data Analysis Track
-----------------------------------------------------------------



# Task 1: Program Setup and Documentation

● Use the print() function to display:
○ Assignment title
○ Your name

● Add:
○ Single-line comments using #
○ A docstring at the beginning of your program explaining:
  ○ The dataset used
  ○ The purpose of the program

● Declare variables to store:
○ File name
○ Counters and summary values

In [None]:
# TASK 1: PROGRAM SETUP AND DOCUMENTATION
# ============================================================================

# Display assignment information
print("=" * 70)
print("PYTHON FUNDAMENTALS FINAL ASSIGNMENT")
print("Titanic Dataset Analysis")
print("Student Name: Arwa abulaila")
print("=" * 70)
print()


PYTHON FUNDAMENTALS FINAL ASSIGNMENT
Titanic Dataset Analysis
Student Name: Arwa abulaila



In [None]:
# Declare variables for file handling and counters
file_name = "train.csv"  # CSV file name
total_passengers = 0  # Counter for total passengers
total_age = 0  # Sum of ages for average calculation
age_count = 0  # Count of passengers with valid age data

# Task 2: Reading and Preparing the Data

● Open the CSV file

● Read all lines from the file

● Split each line into individual values

● Store the dataset as a nested list

● Display the number of records in the dataset


In [None]:
def read_csv_file(filename):
    """
    Reads a CSV file and returns the data as a nested list.

    Parameters:
        filename (str): Name of the CSV file to read

    Returns:
        list: Nested list containing all rows from the CSV file
    """
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            # Read all lines from the file
            lines = file.readlines()

            # Initialize nested list to store dataset
            dataset = []

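            # Process each line
            for line in lines:
                # Skip blank lines, such as one left by a trailing newline
                if not line.strip():
                    continue
                # Remove the newline and split on commas. This naive split also
                # breaks quoted fields such as "Braund, Mr. Owen Harris" in two.
                row = line.strip().split(',')
                dataset.append(row)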

            return dataset
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        print("Please upload the train.csv file to Google Colab.")
        return []


# Read the dataset
print("Reading dataset...")
titanic_data = read_csv_file(file_name)


Reading dataset...


In [None]:
# Print the first 50 raw rows (slicing stops early if the file is shorter)
for row in titanic_data[:50]:
    print(row)

['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
['1', '0', '3', '"Braund', ' Mr. Owen Harris"', 'male', '22', '1', '0', 'A/5 21171', '7.25', '', 'S']
['2', '1', '1', '"Cumings', ' Mrs. John Bradley (Florence Briggs Thayer)"', 'female', '38', '1', '0', 'PC 17599', '71.2833', 'C85', 'C']
['3', '1', '3', '"Heikkinen', ' Miss. Laina"', 'female', '26', '0', '0', 'STON/O2. 3101282', '7.925', '', 'S']
['4', '1', '1', '"Futrelle', ' Mrs. Jacques Heath (Lily May Peel)"', 'female', '35', '1', '0', '113803', '53.1', 'C123', 'S']
['5', '0', '3', '"Allen', ' Mr. William Henry"', 'male', '35', '0', '0', '373450', '8.05', '', 'S']
['6', '0', '3', '"Moran', ' Mr. James"', 'male', '', '0', '0', '330877', '8.4583', '', 'Q']
['7', '0', '1', '"McCarthy', ' Mr. Timothy J"', 'male', '54', '0', '0', '17463', '51.8625', 'E46', 'S']
['8', '0', '3', '"Palsson', ' Master. Gosta Leonard"', 'male', '2', '3', '1', '349909', '21.075', '', 'S']
['9'

In [None]:
# Check whether the data was loaded successfully
if not titanic_data:
    print(f"Failed to load dataset. Please ensure '{file_name}' is uploaded.")
else:
    # Display number of records
    print(f"Successfully loaded {len(titanic_data)} records")
    print(f"Number of data rows (excluding header): {len(titanic_data) - 1}")
    print()

Successfully loaded 892 records
Number of data rows (excluding header): 891
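The raw rows above show that `split(',')` cuts the quoted Name field in two, which is why Task 3 has to shift every later index by one. Python's standard `csv` module understands quoted fields, so each row keeps its 12 columns. The sketch below is an alternative, not the notebook's method: `read_csv_rows` and `SAMPLE_CSV` are hypothetical names, and the sample is the header plus the first two passengers of `train.csv`, read from an in-memory string so the example runs without the file.

```python
import csv
import io

# Header plus the first two passengers of train.csv, as raw CSV text
SAMPLE_CSV = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
)


def read_csv_rows(file_obj):
    """Return every non-empty row as a list of strings, keeping quoted fields intact."""
    # csv.reader yields an empty list for a blank line, so `if row` drops those
    return [row for row in csv.reader(file_obj) if row]


rows = read_csv_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(len(row), row[3])  # every row has 12 fields; index 3 is the whole name
```

To read the real file, open it with `open(file_name, newline="", encoding="utf-8")` (the `csv` documentation recommends `newline=""`) and pass the file object to the same function. The fields then line up with the header, so the shifted indexes in Task 3 would move back by one (Sex at `row[4]`, Age at `row[5]`, and so on).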



# **Task 3: Data Types and Type Conversion**

● Identify numeric and text values in the dataset

● Convert numeric fields (such as age and fare) to the correct data types

● Use the type() function to check variable types

● Handle missing or incorrect values using logical conditions

In [None]:
# Identify numeric and text values in the dataset

print("Identifying data types in raw dataset:\n")

# Show header
header = tuple(titanic_data[0])
print("Columns:", header)
print()

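# Show a sample raw row. Because split(',') breaks the quoted Name into two
# elements, the labels drift by one position from 'Sex' onward; the point here
# is only that every raw value is a str.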
sample_row = titanic_data[1]

for col, value in zip(header, sample_row):
    print(f"{col}: {value} --> {type(value)}")


Identifying data types in raw dataset:

Columns: ('PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked')

PassengerId: 1 --> <class 'str'>
Survived: 0 --> <class 'str'>
Pclass: 3 --> <class 'str'>
Name: "Braund --> <class 'str'>
Sex:  Mr. Owen Harris" --> <class 'str'>
Age: male --> <class 'str'>
SibSp: 22 --> <class 'str'>
Parch: 1 --> <class 'str'>
Ticket: 0 --> <class 'str'>
Fare: A/5 21171 --> <class 'str'>
Cabin: 7.25 --> <class 'str'>
Embarked:  --> <class 'str'>


In [None]:
# Convert numeric fields to correct data types

processed_data = []

for i in range(1, len(titanic_data)):
    row = titanic_data[i]

    # split(',') breaks the quoted 'Name' field (e.g. "Braund, Mr. Owen Harris")
    # into row[3] and row[4], so every field after Name sits one index to the
    # right of its header position. This relies on every name containing exactly
    # one comma, as in the raw rows shown above.
    passenger = {
        'PassengerId': int(row[0]) if row[0] else 0,
        'Survived': int(row[1]) if row[1] else 0,
        'Pclass': int(row[2]) if row[2] else 0,
        'Name': row[3] + ', ' + row[4],  # rejoin the two halves of the split name
        'Sex': row[5] if len(row) > 5 else '',
        'Age': float(row[6]) if row[6] else None,  # None marks a missing age
        'SibSp': int(row[7]) if row[7] else 0,
        'Parch': int(row[8]) if row[8] else 0,
        'Ticket': row[9],
        'Fare': float(row[10]) if row[10] else 0.0,
        'Cabin': row[11] if len(row) > 11 else '',  # frequently empty
        'Embarked': row[12] if len(row) > 12 else ''  # guard against a short row
    }

    processed_data.append(passenger)

print(f"Processed {len(processed_data)} passenger records.")

Processed 891 passenger records.
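The conversion above uses `x if row[i] else default`, which handles blank fields but would stop with a `ValueError` on a non-numeric value. A minimal sketch of helpers that treat both missing and incorrect values with logical conditions plus `try`/`except`; `to_float` and `to_int` are hypothetical names, not part of the notebook.

```python
def to_float(text, default=None):
    """Convert text to float; return `default` for blank or non-numeric text."""
    text = text.strip()
    if not text:  # missing value
        return default
    try:
        return float(text)
    except ValueError:  # incorrect value, e.g. 'male' in the Age column
        return default


def to_int(text, default=0):
    """Convert text to int; return `default` for blank or non-integer text."""
    text = text.strip()
    if not text:
        return default
    try:
        return int(text)
    except ValueError:
        return default


# Raw strings as they appear in the dataset, plus one deliberately wrong value
for raw in ["22", "", "7.25", "male"]:
    value = to_float(raw)
    print(repr(raw), "->", value, type(value))
```

Inside the dictionary above, `'Age': to_float(row[6])` would then replace the inline conditional.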


In [None]:
# Verify data types after conversion

print("Data Type Verification:\n")

sample = processed_data[0]

print("PassengerId:", type(sample['PassengerId']))
print("Survived:", type(sample['Survived']))
print("Pclass:", type(sample['Pclass']))
print("Age:", type(sample['Age']))
print("Fare:", type(sample['Fare']))
print("Name:", type(sample['Name']))
print("Sex:", type(sample['Sex']))


Data Type Verification:

PassengerId: <class 'int'>
Survived: <class 'int'>
Pclass: <class 'int'>
Age: <class 'float'>
Fare: <class 'float'>
Name: <class 'str'>
Sex: <class 'str'>


In [None]:
# Handle missing or incorrect values

missing_age_count = 0
missing_fare_count = 0

for passenger in processed_data:
    if passenger['Age'] is None:
        missing_age_count += 1

    if passenger['Fare'] == 0:
        missing_fare_count += 1

print(f"Passengers with missing age: {missing_age_count}")
print(f"Passengers with missing or zero fare: {missing_fare_count}")


Passengers with missing age: 177
Passengers with missing or zero fare: 15


In [None]:
# Handle missing or incorrect values by replacing them with default values

# Calculate average age (excluding missing)
ages = [p['Age'] for p in processed_data if p['Age'] is not None]
average_age = sum(ages) / len(ages)

fixed_age = 0
fixed_fare = 0
fixed_embarked = 0

for passenger in processed_data:

    # Replace missing age with average age
    if passenger['Age'] is None:
        passenger['Age'] = average_age
        fixed_age += 1

    # Zero (or negative) fares are set to 0.0 and counted; later fare
    # statistics skip them with a "Fare > 0" filter
    if passenger['Fare'] <= 0:
        passenger['Fare'] = 0.0
        fixed_fare += 1

    # Replace missing embarkation
    if passenger['Embarked'] == '' or passenger['Embarked'].isspace():
        passenger['Embarked'] = 'Unknown'
        fixed_embarked += 1

print("Missing / Incorrect Values Fixed:")
print(f"Ages replaced with average: {fixed_age}")
print(f"Invalid fares corrected: {fixed_fare}")
print(f"Missing embarkation replaced: {fixed_embarked}")


Missing / Incorrect Values Fixed:
Ages replaced with average: 177
Invalid fares corrected: 15
Missing embarkation replaced: 2
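Replacing missing ages with the average keeps the mean exactly where it was, but it piles many identical values into the middle of the distribution and moves the median. The sketch below illustrates this on the ages of the first eight passengers (Moran's age is missing). It suggests that the later "Median Age: 29.70" reflects the imputed values rather than the original ages; keeping a copy of the known ages before imputing is one way to avoid that.

```python
from statistics import mean, median

# Ages of the first eight passengers in train.csv; None is Moran's missing age
raw_ages = [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 2.0]

known = [a for a in raw_ages if a is not None]
fill_value = mean(known)
imputed = [fill_value if a is None else a for a in raw_ages]

# Mean is unchanged by mean imputation; the median shifts toward the mean
print(f"mean:   known={mean(known):.2f} imputed={mean(imputed):.2f}")
print(f"median: known={median(known):.2f} imputed={median(imputed):.2f}")
```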


# **Task 4: Calculations and Operations**

● Perform arithmetic calculations such as:

○ Total number of passengers

○ Average age

○ Highest and lowest fare

● Use comparison and logical operators to filter data

● Demonstrate the correct order of operations using parentheses

In [None]:
# Total number of passengers
total_passengers = len(processed_data)
print(f"Total passengers: {total_passengers}")

Total passengers: 891


In [None]:
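# Average age. After imputation every passenger has an age, so the filter below is
# only a safeguard; replacing missing ages with the mean leaves the mean unchanged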
ages = [p['Age'] for p in processed_data if p['Age'] is not None]
average_age = sum(ages) / len(ages)
print(f"Average age: {average_age:.2f}")

Average age: 29.70


In [None]:
# Highest and lowest fare
fares = [p['Fare'] for p in processed_data if p['Fare'] > 0]
highest_fare = max(fares)
lowest_fare = min(fares)
print(f"Highest fare: ${highest_fare:.2f}")
print(f"Lowest fare: ${lowest_fare:.2f}")

Highest fare: $512.33
Lowest fare: $4.01
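`max()` and `min()` over a list of fares return only the values. Passing the passenger dictionaries with a `key=` function returns the whole record, so you also learn who paid. A small sketch using the first eight passengers, trimmed to surname and fare for illustration:

```python
# The first eight passengers from train.csv, reduced to surname and fare
passengers = [
    {"Name": "Braund", "Fare": 7.25},
    {"Name": "Cumings", "Fare": 71.2833},
    {"Name": "Heikkinen", "Fare": 7.925},
    {"Name": "Futrelle", "Fare": 53.1},
    {"Name": "Allen", "Fare": 8.05},
    {"Name": "Moran", "Fare": 8.4583},
    {"Name": "McCarthy", "Fare": 51.8625},
    {"Name": "Palsson", "Fare": 21.075},
]

paying = [p for p in passengers if p["Fare"] > 0]  # same zero-fare exclusion as above
most_expensive = max(paying, key=lambda p: p["Fare"])
cheapest = min(paying, key=lambda p: p["Fare"])

print(f"Highest fare: {most_expensive['Name']} paid {most_expensive['Fare']:.2f}")
print(f"Lowest fare: {cheapest['Name']} paid {cheapest['Fare']:.2f}")
```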


In [None]:
# Filter passengers using comparison and logical operators

# Passengers older than 60 who survived
senior_survivors = []

for passenger in processed_data:
    if passenger['Age'] is not None and passenger['Age'] > 60 and passenger['Survived'] == 1:
        senior_survivors.append(passenger)

print(f"Passengers older than 60 who survived: {len(senior_survivors)}")


Passengers older than 60 who survived: 5


This code filters passengers who belong to either first or second class using comparison and logical operators, then counts how many passengers meet the condition.

In [None]:
# Passengers in Class 1 OR Class 2

upper_class_passengers = []

for passenger in processed_data:
    if passenger['Pclass'] == 1 or passenger['Pclass'] == 2:
        upper_class_passengers.append(passenger)

print(f"Passengers in Class 1 or 2: {len(upper_class_passengers)}")


Passengers in Class 1 or 2: 400
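The same filters can be written more compactly. This sketch, on the first eight passengers (fields trimmed for illustration), shows that a membership test with `in` matches the spelled-out `or`, and adds a chained comparison (`30 <= age < 60`) and a `not` condition:

```python
# First eight passengers from train.csv, reduced to the fields used here
passengers = [
    {"PassengerId": 1, "Pclass": 3, "Age": 22.0, "Survived": 0},
    {"PassengerId": 2, "Pclass": 1, "Age": 38.0, "Survived": 1},
    {"PassengerId": 3, "Pclass": 3, "Age": 26.0, "Survived": 1},
    {"PassengerId": 4, "Pclass": 1, "Age": 35.0, "Survived": 1},
    {"PassengerId": 5, "Pclass": 3, "Age": 35.0, "Survived": 0},
    {"PassengerId": 6, "Pclass": 3, "Age": None, "Survived": 0},
    {"PassengerId": 7, "Pclass": 1, "Age": 54.0, "Survived": 0},
    {"PassengerId": 8, "Pclass": 3, "Age": 2.0, "Survived": 0},
]

# "or" spelled out, as in the cell above
upper_or = [p for p in passengers if p["Pclass"] == 1 or p["Pclass"] == 2]
# Membership test: equivalent and easier to extend
upper_in = [p for p in passengers if p["Pclass"] in (1, 2)]
# Chained comparison, guarded so a missing age is never compared
middle_aged = [p["PassengerId"] for p in passengers
               if p["Age"] is not None and 30 <= p["Age"] < 60]
# "not" negates a condition: passengers who did not survive
lost = [p["PassengerId"] for p in passengers if not p["Survived"]]

print(len(upper_in), middle_aged, lost)
```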


In [None]:
# Demonstrate correct order of operations using parentheses

# Survival rate calculation
survivors = sum(1 for p in processed_data if p['Survived'] == 1)

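# Without parentheses: / and * have equal precedence and are evaluated
# left to right, so this already means (survivors / total_passengers) * 100
rate_without_parentheses = survivors / total_passengers * 100

# With parentheses: same result, but the intended grouping is explicit
rate_with_parentheses = (survivors / total_passengers) * 100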

print(f"Survival rate without parentheses: {rate_without_parentheses:.2f}%")
print(f"Survival rate with parentheses: {rate_with_parentheses:.2f}%")


Survival rate without parentheses: 38.38%
Survival rate with parentheses: 38.38%
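Both rates match because the parentheses only restate Python's default left-to-right grouping. To show parentheses actually changing the result, this sketch reuses the counts from the outputs above (342 survivors, 216 first-class and 184 second-class passengers) as plain numbers:

```python
# Counts taken from the outputs above: 342 survivors out of 891 passengers
survivors = 342
total_passengers = 891

# / and * share one precedence level and group left to right, so these match
implicit = survivors / total_passengers * 100
explicit = (survivors / total_passengers) * 100

# Moving the parentheses changes the grouping, and the answer
misgrouped = survivors / (total_passengers * 100)

# + binds more loosely than /, so an average needs parentheses
class_1, class_2 = 216, 184
average_with = (class_1 + class_2) / 2    # (216 + 184) / 2
average_without = class_1 + class_2 / 2   # 216 + (184 / 2)

print(f"{implicit:.2f}% {explicit:.2f}% {misgrouped:.6f}")
print(average_with, average_without)
```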


# Task 5: User Input and Conditional Logic

● Ask the user to enter a value (for example, passenger class or gender)

● Validate the user input

● Use if, elif, and else statements to filter the data based on the input

● Display the results using formatted output (f-strings)

In [None]:

def validate_class_input():
    """
    Validates user input for passenger class using a while loop.

    Returns:
        int: Valid passenger class (1, 2, or 3)
    """
    while True:
        try:
            user_class = input("Enter passenger class to analyze (1, 2, or 3): ")
            class_num = int(user_class)

            # Validate the input is within range
            if class_num in [1, 2, 3]:
                return class_num
            else:
                print("Invalid input. Please enter 1, 2, or 3.")
        except ValueError:
            print("Invalid input. Please enter a number.")

# Get user input
print("=" * 70)
print("FILTER DATA BY PASSENGER CLASS")
print("=" * 70)

selected_class = validate_class_input()

# Filter data based on user input
filtered_passengers = []

for passenger in processed_data:
    if passenger['Pclass'] == selected_class:
        filtered_passengers.append(passenger)

# Display filtered results
print(f"\n--- Class {selected_class} Passengers ---")
print(f"Total passengers in Class {selected_class}: {len(filtered_passengers)}")

if filtered_passengers:
    # Calculate statistics for filtered data
    class_survivors = sum(1 for p in filtered_passengers if p['Survived'] == 1)
    class_survival_rate = (class_survivors / len(filtered_passengers)) * 100

    class_ages = [p['Age'] for p in filtered_passengers if p['Age'] is not None]
    class_avg_age = sum(class_ages) / len(class_ages) if class_ages else 0

    class_fares = [p['Fare'] for p in filtered_passengers if p['Fare'] > 0]
    class_avg_fare = sum(class_fares) / len(class_fares) if class_fares else 0

    # Use f-strings for formatted output
    print(f"Survival Rate: {class_survival_rate:.1f}%")
    print(f"Average Age: {class_avg_age:.1f} years")
    print(f"Average Fare: ${class_avg_fare:.2f}")
print()


FILTER DATA BY PASSENGER CLASS
Enter passenger class to analyze (1, 2, or 3): 1

--- Class 1 Passengers ---
Total passengers in Class 1: 216
Survival Rate: 63.0%
Average Age: 37.0 years
Average Fare: $86.15
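The task also calls for `elif`, which the cell above does not use. A minimal sketch of an `if`/`elif`/`else` chain that accepts either a class number or a gender; `parse_filter` is a hypothetical helper, and it takes the text as an argument instead of calling `input()` so it can run non-interactively:

```python
def parse_filter(choice):
    """Map raw user text to a (field, value) filter, or None if it is invalid."""
    choice = choice.strip().lower()
    if choice in ("1", "2", "3"):
        return ("Pclass", int(choice))
    elif choice in ("male", "female"):
        return ("Sex", choice)
    else:
        return None


# First four passengers from train.csv, reduced to the fields used here
passengers = [
    {"Name": "Braund", "Pclass": 3, "Sex": "male"},
    {"Name": "Cumings", "Pclass": 1, "Sex": "female"},
    {"Name": "Heikkinen", "Pclass": 3, "Sex": "female"},
    {"Name": "Futrelle", "Pclass": 1, "Sex": "female"},
]

for user_text in ["1", " Female ", "4"]:
    rule = parse_filter(user_text)
    if rule is None:
        print(f"{user_text!r}: invalid, expected 1-3 or male/female")
        continue
    field, value = rule
    matches = [p["Name"] for p in passengers if p[field] == value]
    print(f"{user_text!r}: {field} == {value!r} -> {matches}")
```

In the notebook, `parse_filter(input("Class or gender: "))` inside the existing `while True` loop would combine this with the validation pattern already used.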



# Task 6: Working with Sequences

● Access specific data values using indexing

● Extract subsets of data using slicing


● Modify lists using built-in list functions such as:

○ append()
○ remove()
○ sort()
○ reverse()

In [None]:
print("=" * 70)
print("SEQUENCE OPERATIONS")
print("=" * 70)

# Create a list of passenger names
passenger_names = [p['Name'] for p in processed_data[:20]]

# Indexing: Access specific elements
print(f"First passenger: {passenger_names[0]}")
print(f"Last passenger: {passenger_names[-1]}")
print(f"Fifth passenger: {passenger_names[4]}")

SEQUENCE OPERATIONS
First passenger: "Braund,  Mr. Owen Harris"
Last passenger: "Masselmani,  Mrs. Fatima"
Fifth passenger: "Allen,  Mr. William Henry"


In [None]:
# Slicing: Extract subsets
print(f"\nFirst 5 passengers: {passenger_names[:5]}")
print(f"Last 3 passengers: {passenger_names[-3:]}")
print(f"Passengers 5 to 10: {passenger_names[5:10]}")


First 5 passengers: ['"Braund,  Mr. Owen Harris"', '"Cumings,  Mrs. John Bradley (Florence Briggs Thayer)"', '"Heikkinen,  Miss. Laina"', '"Futrelle,  Mrs. Jacques Heath (Lily May Peel)"', '"Allen,  Mr. William Henry"']
Last 3 passengers: ['"Williams,  Mr. Charles Eugene"', '"Vander Planke,  Mrs. Julius (Emelia Maria Vandemoortele)"', '"Masselmani,  Mrs. Fatima"']
Passengers 5 to 10: ['"Moran,  Mr. James"', '"McCarthy,  Mr. Timothy J"', '"Palsson,  Master. Gosta Leonard"', '"Johnson,  Mrs. Oscar W (Elisabeth Vilhelmina Berg)"', '"Nasser,  Mrs. Nicholas (Adele Achem)"']


In [None]:
# List modification operations
sample_list = passenger_names[:10]  # slicing already returns a new list
print(f"\nOriginal list length: {len(sample_list)}")


Original list length: 10


In [None]:
# append(): Add new element
sample_list.append("New Passenger")
print(f"After append: {len(sample_list)} items")

After append: 11 items


In [None]:
# remove(): Remove specific element
if "New Passenger" in sample_list:
    sample_list.remove("New Passenger")
    print(f"After remove: {len(sample_list)} items")

After remove: 10 items


In [None]:
# sort(): Sort the list
fares_list = [p['Fare'] for p in processed_data[:10]]
fares_list.sort()
print(f"\nSorted fares (first 10): {fares_list}")



Sorted fares (first 10): [7.25, 7.925, 8.05, 8.4583, 11.1333, 21.075, 30.0708, 51.8625, 53.1, 71.2833]


In [None]:
# reverse(): Reverse the list
fares_list.reverse()
print(f"Reversed fares: {fares_list}")
print()

Reversed fares: [71.2833, 53.1, 51.8625, 30.0708, 21.075, 11.1333, 8.4583, 8.05, 7.925, 7.25]
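Two details of the list methods above are easy to trip over: `sort()` changes the list in place and returns `None`, while the built-in `sorted()` returns a new list (and `reverse=True` gives descending order in one step); `remove()` deletes only the first match and raises `ValueError` when the value is absent. A short sketch using the ten fare values printed above:

```python
# The ten fare values printed above, listed unsorted
fares = [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708]

descending = sorted(fares, reverse=True)  # new list; fares is untouched here
result = fares.sort()                     # sorts fares in place and returns None

print(result)
print(fares[:3], descending[:3])

# remove() deletes only the first matching element
names = ["Braund", "Allen", "Braund"]
names.remove("Braund")
try:
    names.remove("Nobody")  # not in the list
except ValueError:
    print("'Nobody' is not in the list")
print(names)
```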



# Task 7: Using Data Structures

● Use a tuple to store column headers

● Use a set to find unique values in a column (such as embarkation port)

● Use a dictionary to store summary information (for example, number of
passengers by class)

● Iterate through the dictionary to display results

In [None]:
print("=" * 70)
print("DATA STRUCTURES")
print("=" * 70)


# Tuple: Store column headers (immutable)
column_headers = ('PassengerId', 'Survived', 'Pclass', 'Name', 'Sex',
                  'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked')
print(f"Column Headers (Tuple): {column_headers}")
print(f"Number of columns: {len(column_headers)}")


DATA STRUCTURES
Column Headers (Tuple): ('PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked')
Number of columns: 12


In [None]:
# Set: unique values in the Embarked column (includes the 'Unknown' placeholder assigned during cleaning)
embarkation_ports = set()
for passenger in processed_data:
    if passenger['Embarked'] and passenger['Embarked'].strip():
        embarkation_ports.add(passenger['Embarked'])

print(f"\nUnique Embarkation Ports (Set): {embarkation_ports}")
print(f"Number of unique ports: {len(embarkation_ports)}")


Unique Embarkation Ports (Set): {'C', 'Q', 'S', 'Unknown'}
Number of unique ports: 4


In [None]:
# Dictionary: count passengers by class
passengers_by_class = {
    1: 0,
    2: 0,
    3: 0
}

# Count passengers in each class
for passenger in processed_data:
    pclass = passenger['Pclass']
    if pclass in passengers_by_class:
        passengers_by_class[pclass] += 1

In [None]:
# Iterate through dictionary and display results
print("\nPassengers by Class (Dictionary):")
for class_num, count in passengers_by_class.items():
    percentage = (count / total_passengers) * 100 if total_passengers > 0 else 0
    print(f"  Class {class_num}: {count} passengers ({percentage:.1f}%)")
print()


Passengers by Class (Dictionary):
  Class 1: 216 passengers (24.2%)
  Class 2: 184 passengers (20.7%)
  Class 3: 491 passengers (55.1%)
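The dictionary above starts with the keys 1, 2 and 3 written out. When the possible values are not known in advance, `dict.get()` with a default of 0, or `collections.Counter`, builds the counts directly. A sketch on the Pclass and Embarked values of the first eight passengers:

```python
from collections import Counter

# Pclass and Embarked values of the first eight passengers in train.csv
classes = [3, 1, 3, 1, 3, 3, 1, 3]
ports = ["S", "C", "S", "S", "S", "Q", "S", "S"]

# dict.get() supplies 0 the first time a key is seen, so no keys are hard-coded
by_class = {}
for pclass in classes:
    by_class[pclass] = by_class.get(pclass, 0) + 1

# Counter does the same counting in one call
by_port = Counter(ports)

for pclass in sorted(by_class):
    print(f"Class {pclass}: {by_class[pclass]}")
print(set(ports), by_port.most_common(1))
```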



# Task 8: Loops and Flow Control

● Use for loops to process the dataset

● Use a while loop to validate user input

● Apply nested loops where appropriate

● Use break and continue to control loop execution

In [None]:
# For loop: Calculate gender distribution
gender_count = {'male': 0, 'female': 0}

for passenger in processed_data:
    gender = passenger['Sex'].lower()
    if gender in gender_count:
        gender_count[gender] += 1

print("Gender Distribution:")
for gender, count in gender_count.items():
    print(f"  {gender.capitalize()}: {count}")

Gender Distribution:
  Male: 577
  Female: 314


In [None]:
# Nested loops: survival analysis by class and gender
print("\nSurvival Analysis by Class and Gender:")

for class_num in [1, 2, 3]:
    print(f"\nClass {class_num}:")
    for gender in ['male', 'female']:
        # Filter passengers matching class and gender
        matching = [p for p in processed_data if p['Pclass'] == class_num and p['Sex'].lower() == gender]

        survived = sum(p['Survived'] for p in matching)
        total = len(matching)

        if total > 0:
            rate = (survived / total) * 100
            print(f"  {gender.capitalize()}: {survived}/{total} survived ({rate:.1f}%)")



Survival Analysis by Class and Gender:

Class 1:
  Male: 45/122 survived (36.9%)
  Female: 91/94 survived (96.8%)

Class 2:
  Male: 17/108 survived (15.7%)
  Female: 70/76 survived (92.1%)

Class 3:
  Male: 47/347 survived (13.5%)
  Female: 72/144 survived (50.0%)


In [None]:
# While loop with break: Find first passenger above certain age
print("\nFinding first passenger above age 70...")
target_age = 70
index = 0
found = False

while index < len(processed_data):
    passenger = processed_data[index]
    age = passenger['Age']

    if age is not None and age > target_age:
        print(f"  Found: {passenger['Name']}, Age: {age}")
        found = True
        break  # Exit loop when found

    index += 1

if not found:
    print(f"  No passenger found above age {target_age}")
print()


Finding first passenger above age 70...
  Found: "Goldschmidt,  Mr. George B", Age: 71.0
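The cells above use `break`, but the task also lists `continue`. A minimal sketch: `continue` skips the rest of the current iteration, here to ignore a missing age while averaging the ages of the first eight passengers.

```python
# Ages of the first eight passengers; None marks Moran's missing age
ages = [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 2.0]

total = 0.0
count = 0
for age in ages:
    if age is None:
        continue  # skip the missing value and move on to the next age
    total += age
    count += 1

print(f"Average of {count} known ages: {total / count:.2f}")
```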



# Task 9: Functions and Modular Programming

● Create at least two user-defined functions, such as:

○ A function to read data from the file

○ A function to calculate summary statistics

○ A function to filter data based on user input

● Use parameters and return values

● Save at least one function in a separate Python file and import it into your
notebook

In [None]:
%%writefile my_functions.py
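def filter_by_gender(data, gender):
    """
    Filters passengers by gender (case-insensitive).

    Parameters:
        data (list): List of passenger dictionaries
        gender (str): 'male' or 'female'

    Returns:
        list: Passengers whose 'Sex' matches the given gender
    """
    filtered = []
    for passenger in data:
        if passenger['Sex'].lower() == gender.lower():
            filtered.append(passenger)
    return filtered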

def filter_by_class(data, pclass):
    """
    Filters passengers by passenger class based on user input.

    Parameters:
        data (list): List of passenger dictionaries
        pclass (int): Passenger class to filter (1, 2, or 3)

    Returns:
        list: Filtered list of passengers in the specified class
    """
    filtered = []
    for passenger in data:
        if passenger['Pclass'] == pclass:
            filtered.append(passenger)
    return filtered


def calculate_statistics(data):
    """
    Calculates comprehensive statistics for a dataset.

    Parameters:
        data (list): List of passenger dictionaries

    Returns:
        dict: Dictionary containing various statistics
    """
    if not data:
        return {}

    stats = {
        'total': len(data),
        'survived': sum(1 for p in data if p['Survived'] == 1),
        'avg_age': 0,
        'avg_fare': 0,
        'male': sum(1 for p in data if p['Sex'].lower() == 'male'),
        'female': sum(1 for p in data if p['Sex'].lower() == 'female')
    }

    ages = [p['Age'] for p in data if p['Age'] is not None]
    stats['avg_age'] = sum(ages) / len(ages) if ages else 0

    fares = [p['Fare'] for p in data if p['Fare'] > 0]
    stats['avg_fare'] = sum(fares) / len(fares) if fares else 0

    stats['survival_rate'] = (stats['survived'] / stats['total'] * 100) if stats['total'] > 0 else 0

    return stats


Overwriting my_functions.py


In [None]:
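import importlib

import my_functions
importlib.reload(my_functions)  # Reload so the latest edits to my_functions.py take effect

# calculate_statistics is needed by the statistics demonstration below
from my_functions import filter_by_gender, filter_by_class, calculate_statistics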




In [None]:
# Demonstrate the custom functions
print("=" * 70)
print("MODULAR PROGRAMMING - USING CUSTOM FUNCTIONS")
print("=" * 70)


# DEMONSTRATION 1: Filter by gender using USER-DEFINED FUNCTION
print("--- User-Defined Function: filter_by_gender() ---")
female_passengers = filter_by_gender(processed_data, 'female')
male_passengers = filter_by_gender(processed_data, 'male')
print(f"Female passengers: {len(female_passengers)}")
print(f"Male passengers: {len(male_passengers)}")

MODULAR PROGRAMMING - USING CUSTOM FUNCTIONS
--- User-Defined Function: filter_by_gender() ---
Female passengers: 314
Male passengers: 577


In [None]:
# DEMONSTRATION 2: Filter by class using USER-DEFINED FUNCTION
print("\n--- User-Defined Function: filter_by_class() ---")
first_class = filter_by_class(processed_data, 1)
second_class = filter_by_class(processed_data, 2)
third_class = filter_by_class(processed_data, 3)

print(f"First Class passengers: {len(first_class)}")
print(f"Second Class passengers: {len(second_class)}")
print(f"Third Class passengers: {len(third_class)}")


--- User-Defined Function: filter_by_class() ---
First Class passengers: 216
Second Class passengers: 184
Third Class passengers: 491
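`filter_by_gender()` and `filter_by_class()` share the same loop and differ only in the field they compare. One possible refactor, sketched here with a hypothetical `filter_by()` helper (not part of `my_functions.py`), passes the field name as a parameter:

```python
def filter_by(data, field, value):
    """Return passengers whose `field` equals `value`; text is compared case-insensitively."""
    if isinstance(value, str):
        target = value.lower()
        return [p for p in data if str(p[field]).lower() == target]
    return [p for p in data if p[field] == value]


# First four passengers from train.csv, reduced to the fields used here
sample = [
    {"Name": "Braund", "Pclass": 3, "Sex": "male"},
    {"Name": "Cumings", "Pclass": 1, "Sex": "female"},
    {"Name": "Heikkinen", "Pclass": 3, "Sex": "female"},
    {"Name": "Futrelle", "Pclass": 1, "Sex": "female"},
]

print([p["Name"] for p in filter_by(sample, "Sex", "Female")])
print([p["Name"] for p in filter_by(sample, "Pclass", 3)])
```

The two existing functions stay easier to read for a single purpose; the generic version pays off only if more fields (such as Embarked) need filtering.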


In [None]:
# DEMONSTRATION 3: Calculate statistics using USER-DEFINED FUNCTION
print("\n--- User-Defined Function: calculate_statistics() ---")
print("\nFemale Passenger Statistics:")
female_stats = calculate_statistics(female_passengers)
for key, value in female_stats.items():
    if 'rate' in key or 'avg' in key:
        print(f"  {key}: {value:.2f}")
    else:
        print(f"  {key}: {value}")

print("\nMale Passenger Statistics:")
male_stats = calculate_statistics(male_passengers)
for key, value in male_stats.items():
    if 'rate' in key or 'avg' in key:
        print(f"  {key}: {value:.2f}")
    else:
        print(f"  {key}: {value}")


--- User-Defined Function: calculate_statistics() ---

Female Passenger Statistics:
  total: 314
  survived: 233
  avg_age: 28.22
  avg_fare: 44.48
  male: 0
  female: 314
  survival_rate: 74.20

Male Passenger Statistics:
  total: 577
  survived: 109
  avg_age: 30.51
  avg_fare: 26.21
  male: 577
  female: 0
  survival_rate: 18.89


In [None]:
%%writefile statistics_module.py
def calculate_median(numbers):
    """Return the median of a list of numbers (0 for an empty list)."""
    if not numbers:
        return 0

    sorted_numbers = sorted(numbers)
    n = len(sorted_numbers)

    # If odd number of elements, return middle
    if n % 2 == 1:
        return sorted_numbers[n // 2]
    # If even number of elements, return average of two middle values
    else:
        mid1 = sorted_numbers[n // 2 - 1]
        mid2 = sorted_numbers[n // 2]
        return (mid1 + mid2) / 2

def calculate_mode(values):
    """Return the most frequent value (None for an empty list); ties go to the value seen first."""
    if not values:
        return None

    # Count frequency of each value
    frequency = {}
    for value in values:
        if value in frequency:
            frequency[value] += 1
        else:
            frequency[value] = 1

    # Find value with highest frequency
    max_freq = 0
    mode_value = None

    for value, freq in frequency.items():
        if freq > max_freq:
            max_freq = freq
            mode_value = value

    return mode_value

def calculate_range(numbers):
    """Return the largest value minus the smallest (0 for an empty list)."""
    if not numbers:
        return 0

    return max(numbers) - min(numbers)

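def calculate_variance(numbers):
    """Return the population variance (squared deviations divided by n); fewer than two values give 0."""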
    if not numbers or len(numbers) < 2:
        return 0

    # Calculate mean
    mean = sum(numbers) / len(numbers)

    # Calculate sum of squared differences
    squared_diff = sum((x - mean) ** 2 for x in numbers)

    # Return variance
    return squared_diff / len(numbers)

def calculate_standard_deviation(numbers):
    """Return the population standard deviation, the square root of calculate_variance()."""
    variance = calculate_variance(numbers)
    return variance ** 0.5

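def get_percentile(numbers, percentile):
    """
    Return the value at `percentile` (0-100), interpolating linearly between
    neighbouring sorted values. Returns 0 for an empty list or an
    out-of-range percentile.
    """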
    if not numbers or percentile < 0 or percentile > 100:
        return 0

    sorted_numbers = sorted(numbers)
    index = (percentile / 100) * (len(sorted_numbers) - 1)

    # If index is integer, return that element
    if index.is_integer():
        return sorted_numbers[int(index)]
    # Otherwise, interpolate between two nearest values
    else:
        lower = sorted_numbers[int(index)]
        upper = sorted_numbers[int(index) + 1]
        fraction = index - int(index)
        return lower + (upper - lower) * fraction

Overwriting statistics_module.py
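Hand-written statistics are worth cross-checking against Python's standard `statistics` module. `calculate_variance()` divides by n, which is the population variance (`statistics.pvariance`); dividing by n - 1 gives the sample variance (`statistics.variance`). This sketch recomputes both formulas inline on the known ages of the first eight passengers, so it runs without `statistics_module.py`:

```python
from statistics import median, pstdev, pvariance, variance

# Known ages of the first eight passengers (Moran's missing age left out)
ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0]

n = len(ages)
mean_age = sum(ages) / n
squared_diff = sum((x - mean_age) ** 2 for x in ages)

population_variance = squared_diff / n      # the formula calculate_variance() uses
sample_variance = squared_diff / (n - 1)    # Bessel-corrected alternative

print(f"population: {population_variance:.4f} vs pvariance: {pvariance(ages):.4f}")
print(f"sample:     {sample_variance:.4f} vs variance:  {variance(ages):.4f}")
print(f"std dev:    {population_variance ** 0.5:.4f} vs pstdev: {pstdev(ages):.4f}")
print(f"median:     {median(ages)}")
```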


In [None]:
print("\n--- Using Imported Statistics Module ---")
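import importlib
import statistics_module
importlib.reload(statistics_module)  # Reload so the latest edits to statistics_module.py take effect
# Note: `ages` was last rebuilt after imputation, so it includes the mean-imputed
# values, which pull the median toward the mean
median_age = statistics_module.calculate_median(ages)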
mode_class = statistics_module.calculate_mode([p['Pclass'] for p in processed_data])
print(f"Median Age: {median_age:.2f} years")
print(f"Most Common Passenger Class: {mode_class}")
print()



--- Using Imported Statistics Module ---
Median Age: 29.70 years
Most Common Passenger Class: 3



In [None]:
print("=" * 70)
print("GENERATING SUMMARY REPORT")
print("=" * 70)

# Create comprehensive summary report
report_filename = "titanic_analysis_report.txt"

with open(report_filename, 'w', encoding='utf-8') as report:  # UTF-8 keeps the £ sign portable
    report.write("=" * 70 + "\n")
    report.write("TITANIC DATASET ANALYSIS - SUMMARY REPORT\n")
    report.write("=" * 70 + "\n\n")

    report.write(f"Student Name: Arwa abulaila\n")
    report.write(f"Analysis Date: January 2026\n")
    report.write(f"Dataset: Titanic - Machine Learning from Disaster\n\n")

    report.write("--- OVERALL STATISTICS ---\n")
    report.write(f"Total Passengers: {total_passengers}\n")

    report.write(f"Average Age: {average_age:.2f} years\n")
    report.write(f"Median Age: {median_age:.2f} years\n")
    report.write(f"Highest Fare: £{highest_fare:.2f}\n")
    report.write(f"Lowest Fare: £{lowest_fare:.2f}\n\n")

    report.write("--- PASSENGERS BY CLASS ---\n")
    for class_num, count in passengers_by_class.items():
        pct = (count / total_passengers) * 100 if total_passengers > 0 else 0
        report.write(f"Class {class_num}: {count} passengers ({pct:.1f}%)\n")

    report.write("\n--- GENDER DISTRIBUTION ---\n")
    report.write(f"Male: {male_stats['total']} ({male_stats['survival_rate']:.1f}% survived)\n")
    report.write(f"Female: {female_stats['total']} ({female_stats['survival_rate']:.1f}% survived)\n")

    report.write("\n--- EMBARKATION PORTS ---\n")
    for port in sorted(embarkation_ports):
        port_passengers = sum(1 for p in processed_data if p['Embarked'] == port)
        report.write(f"{port}: {port_passengers} passengers\n")

    report.write("\n" + "=" * 70 + "\n")
    report.write("END OF REPORT\n")
    report.write("=" * 70 + "\n")

print(f"Summary report saved to: {report_filename}")
print("\n" + "=" * 70)
print("ANALYSIS COMPLETE")
print("=" * 70)
print("\nDeliverables:")
print("1. ✓ Google Colab Notebook (.ipynb)")
print("2. ✓ Summary Report (titanic_analysis_report.txt)")
print("3. ✓ Python Module (statistics_module.py)")
print("\nAll tasks completed successfully!")

GENERATING SUMMARY REPORT
Summary report saved to: titanic_analysis_report.txt

ANALYSIS COMPLETE

Deliverables:
1. ✓ Google Colab Notebook (.ipynb)
2. ✓ Summary Report (titanic_analysis_report.txt)
3. ✓ Python Module (statistics_module.py)

All tasks completed successfully!
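A quick way to confirm that a report was written as intended is to read it back with the same encoding. A minimal sketch, assuming a temporary folder and a hypothetical `report.txt` name so nothing is left behind; the three lines reuse values from the outputs above:

```python
import os
import tempfile

lines = [
    "TITANIC DATASET ANALYSIS - SUMMARY REPORT",
    "Total Passengers: 891",
    "Highest Fare: £512.33",
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    # Write with an explicit encoding so the £ sign is stored the same way everywhere
    with open(path, "w", encoding="utf-8") as report:
        report.write("\n".join(lines) + "\n")
    # Read back with the same encoding and compare line by line
    with open(path, encoding="utf-8") as report:
        read_back = report.read().splitlines()

print(read_back == lines)
```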
