# Task 01 - Track A: Python Basics & Dataset Exploration

**Course:** Database Applications Development  
**Lesson:** 01 - Introduction to JupyterLab & Python Fundamentals  


---

## Instructions

Complete all exercises in this notebook. Each section includes example code to help guide you. Read the examples carefully, then complete the exercises below them.

**Submission:**
1. Save this notebook as `dbAppsTask01TrackA.ipynb`
2. Add, commit, and push to your `databaseApplications` repository on GitHub
3. Verify the file appears correctly on GitHub

---

## Titanic Dataset - Data Dictionary

You'll be working with the Titanic dataset throughout this task. Here's what each column means:

| Column | Description | Data Type | Example Values |
|--------|-------------|-----------|----------------|
| **pclass** | Passenger class (ticket class) | Integer | 1 = First Class<br>2 = Second Class<br>3 = Third Class |
| **survived** | Survival status | Integer | 0 = Did not survive<br>1 = Survived |
| **name** | Passenger's full name | String | "Braund, Mr. Owen Harris" |
| **sex** | Gender | String | "male" or "female" |
| **age** | Age in years | Float | 22.0, 38.0, 26.0 |
| **sibsp** | Number of siblings/spouses aboard | Integer | 0, 1, 2, etc. |
| **parch** | Number of parents/children aboard | Integer | 0, 1, 2, etc. |
| **ticket** | Ticket number | String | "A/5 21171", "PC 17599" |
| **fare** | Passenger fare (ticket price) | Float | 7.25, 71.28, 8.05 |
| **cabin** | Cabin number | String | "C85", "E46", "B96 B98" |
| **embarked** | Port of embarkation | String | C = Cherbourg<br>Q = Queenstown<br>S = Southampton |
| **boat** | Lifeboat number | String | "13", "4", "D" |
| **body** | Body identification number (recovered victims only) | Integer (loaded as float because of missing values) | 135, 22 |
| **home.dest** | Home/destination | String | "New York, NY", "Montreal, PQ" |

**Note:** Some columns may have missing values (NaN = Not a Number), which is common in real-world datasets.
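
As a minimal sketch of how missing values can be counted, the example below builds a tiny made-up table (the rows are illustrative assumptions, not taken from the Titanic file) and counts the NaN entries in each column with `isna().sum()`.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the Titanic file.
sample = pd.DataFrame({
    "name": ["Passenger A", "Passenger B", "Passenger C"],
    "age": [22.0, None, 38.0],
    "cabin": [None, None, "C85"],
})

# isna() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

The same two calls should work on any DataFrame, including the Titanic data once it is loaded.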

---

## Part 1: Variables and Data Types

Variables store data for later use. Python has several built-in data types.

### Example: Creating Variables

In [None]:
# Example of different data types
student_name = "Alice Johnson"  # String (text)
student_age = 17                # Integer (whole number)
student_gpa = 3.75              # Float (decimal number)
is_passing = True               # Boolean (True/False)

# Print with descriptive labels
print("Name:", student_name)
print("Age:", student_age)
print("GPA:", student_gpa)
print("Passing:", is_passing)

### Exercise 1.1: Create Your Own Variables

Create the following variables with your own information:
- `my_name` - your full name (string)
- `my_age` - your age (integer)
- `my_gpa` - a GPA value (float)
- `enrolled` - whether you're enrolled in this course (boolean)

Print each variable with a descriptive label.

In [2]:
my_name = "Max Kirsh"  # String (text)
my_age = 17            # Integer (whole number)
my_gpa = 3.75          # Float (decimal number)
enrolled = True        # Boolean (True/False)

# Print with descriptive labels
print("Name:", my_name)
print("Age:", my_age)
print("GPA:", my_gpa)
print("Enrolled:", enrolled)

Name: Max Kirsh
Age: 17
GPA: 3.75
Enrolled: True


### Example: Checking Data Types

In [None]:
# Use type() to check what kind of data is stored
course_name = "Database Applications"
course_code = 145085

print("Type of course_name:", type(course_name))  # <class 'str'>
print("Type of course_code:", type(course_code))  # <class 'int'>
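
Besides printing the full class, it can be handy to get just the type's short name or to ask a yes/no question about a type. The sketch below uses an illustrative variable, `value`, that is not part of the task.

```python
value = 3.75  # illustrative value, not part of the task

# type() returns the class itself; __name__ gives just the short name as a string.
print(type(value).__name__)

# isinstance() answers a yes/no question about the type.
is_float = isinstance(value, float)
is_text = isinstance(value, str)
print(is_float, is_text)
```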

### Exercise 1.2: Check Data Types

Use the `type()` function to check the data type of each variable you created in Exercise 1.1. Print the results.

In [3]:
# Use type() to check what kind of data is stored
course_name = "Database Applications"
course_code = 145085

print("Type of course_name:", type(course_name))  # <class 'str'>
print("Type of course_code:", type(course_code))  # <class 'int'>


Type of course_name: <class 'str'>
Type of course_code: <class 'int'>


---

## Part 2: Basic Operations

Python can perform calculations and manipulate strings.

### Example: Mathematical Operations

In [None]:
# Calculate total cost for movie tickets
ticket_price = 10.50
number_of_tickets = 4

total_cost = ticket_price * number_of_tickets

print("Total cost for", number_of_tickets, "tickets: $", total_cost)
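
Printing a float directly shows however many decimals Python chooses, and separating arguments with commas adds a space after the dollar sign. One possible way to produce a tidier currency string is an f-string with the `:.2f` format, sketched here with the same values as the example above; the name `receipt_line` is only illustrative.

```python
ticket_price = 10.50
number_of_tickets = 4
total_cost = ticket_price * number_of_tickets

# :.2f formats a number with exactly two digits after the decimal point.
receipt_line = f"Total cost for {number_of_tickets} tickets: ${total_cost:.2f}"
print(receipt_line)
```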

### Exercise 2.1: Calculate Ticket Prices

A theater charges:
- Adult tickets: $12.50
- Child tickets: $7.50

Calculate the total cost for a family with:
- 2 adults
- 3 children

Store the result in a variable called `total_cost` and print it with a descriptive message.

In [4]:
# Ticket prices and family size
adult_price = 12.50
child_price = 7.50
num_adults = 2
num_children = 3

total_cost = adult_price * num_adults + child_price * num_children

print("Total cost for the family: $", total_cost)

Total cost for the family: $ 47.5


### Example: String Concatenation

In [None]:
# Combining strings together
first = "John"
last = "Smith"
age = 25

# Method 1: Using + operator
full_name = first + " " + last
print(full_name)

# Method 2: Using f-strings (formatted string literals)
message = f"Hello! My name is {first} {last} and I am {age} years old."
print(message)
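
One difference between the two methods is worth seeing in code: the `+` operator only joins strings, so a number has to be converted with `str()` first, while an f-string does the conversion itself. A small sketch using the same kind of values as the example above (the names `with_plus` and `with_fstring` are illustrative):

```python
first = "John"
age = 25

# "+" cannot join a string and an integer directly; Python raises a TypeError.
try:
    broken = first + " is " + age
except TypeError as error:
    print("TypeError:", error)

# Converting the integer with str() makes "+" work.
with_plus = first + " is " + str(age) + " years old."

# An f-string converts the value automatically.
with_fstring = f"{first} is {age} years old."

print(with_plus)
print(with_plus == with_fstring)
```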

### Exercise 2.2: String Concatenation

Create three variables:
- `first_name`
- `last_name`
- `favorite_subject`

Combine them to create and print a sentence like:  
"Hello! My name is [first] [last] and my favorite subject is [subject]."

You can use either the + operator or f-strings.

In [5]:
# Combining strings together
first = "Max"
last = "Kirsh"
age = 17

# Method 1: Using + operator
full_name = first + " " + last
print(full_name)

# Method 2: Using f-strings (formatted string literals)
message = f"Hello! My name is {first} {last} and I am {age} years old."
print(message)

Max Kirsh
Hello! My name is Max Kirsh and I am 17 years old.


---

## Part 3: Working with Comments

Comments explain what your code does. They're ignored by Python but help humans understand your work.

### Example: Using Comments

In [None]:
# Calculate the area of a circle
# Formula: area = π × radius²

radius = 5          # Circle radius in centimeters
pi = 3.14159        # Approximation of π (pi)

area = pi * radius ** 2  # ** means "to the power of"

print("Area of circle:", area, "square cm")

### Exercise 3.1: Add Comments

The code below calculates the area of a rectangle. Add comments to explain each step.

In [None]:
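length = 10            # Rectangle length
width = 5              # Rectangle width
area = length * width  # Area of a rectangle = length × width
print("The area is:", area)  # Display the result with a label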


---

## Part 4: Introduction to Pandas

Pandas is a widely used Python library for working with tabular data. We'll use it to load and explore the Titanic dataset.

### Example: Importing Pandas

In [None]:
# Import pandas with the standard alias 'pd'
import pandas as pd

# Now we can use pandas functions by typing pd.function_name()
print("Pandas version:", pd.__version__)

### Exercise 4.1: Import Pandas

Import the pandas library using the standard alias `pd`.

In [8]:
# Import pandas with the standard alias 'pd'
import pandas as pd

# Now we can use pandas functions by typing pd.function_name()
print("Pandas version:", pd.__version__)

Pandas version: 2.2.3


### Example: Loading a CSV File

In [9]:
# Load a CSV file into a DataFrame
# Replace 'sample.csv' with your actual filename
data = pd.read_csv('sample.csv')

# The data is now stored in a DataFrame object
# A DataFrame is like a spreadsheet - it has rows and columns

FileNotFoundError: [Errno 2] No such file or directory: 'sample.csv'

### Exercise 4.2: Load the Titanic Dataset

Use pandas to read the `Titanic Dataset.csv` file into a DataFrame called `titanic`.

**Note:** Make sure the CSV file is in the same folder as this notebook!
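
If the file is not where Python expects it, `pd.read_csv` raises a `FileNotFoundError`. One possible way to check first is sketched below; `load_csv_if_present` is a hypothetical helper written for this illustration, not a pandas function.

```python
from pathlib import Path

import pandas as pd


def load_csv_if_present(filename):
    """Return a DataFrame if the file exists, otherwise None (hypothetical helper)."""
    path = Path(filename)
    if not path.exists():
        # Path.cwd() is the folder Python is currently running in.
        print(f"Could not find '{filename}' in {Path.cwd()}")
        return None
    return pd.read_csv(path)


titanic_preview = load_csv_if_present("Titanic Dataset.csv")
```

When the message appears, moving the CSV into the folder it names (or correcting the filename) is usually enough.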

In [10]:
# Load the Titanic CSV file into a DataFrame called `titanic`
titanic = pd.read_csv('Titanic Dataset.csv')

# Later cells refer to the same DataFrame as `data`
data = titanic


### Example: Viewing the First Rows

In [11]:
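# View the first 5 rows of any DataFrame (the default)
data.head()

# You can specify how many rows to show
data.head(10)  # Shows first 10 rows

# Note: a notebook cell displays only its last expression,
# so the output below comes from data.head(10).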

| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|--------|----------|------|-----|-----|-------|-------|--------|------|-------|----------|------|------|-----------|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NaN | St Louis, MO |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | NaN | Montreal, PQ / Chesterville, ON |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | 135.0 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
| 5 | 1 | 1 | Anderson, Mr. Harry | male | 48.0 | 0 | 0 | 19952 | 26.55 | E12 | S | 3 | NaN | New York, NY |
| 6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S | 10 | NaN | Hudson, NY |
| 7 | 1 | 0 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0 | A36 | S | NaN | NaN | Belfast, NI |
| 8 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S | D | NaN | Bayside, Queens, NY |
| 9 | 1 | 0 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C | NaN | 22.0 | Montevideo, Uruguay |


### Exercise 4.3: View the First Rows

Display the first 5 rows of the Titanic dataset using the `.head()` method.

In [12]:
# View the first 5 rows of the Titanic dataset
data.head()

| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|--------|----------|------|-----|-----|-------|-------|--------|------|-------|----------|------|------|-----------|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NaN | St Louis, MO |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | NaN | Montreal, PQ / Chesterville, ON |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | 135.0 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |


### Example: Dataset Information

In [13]:
# Get information about the DataFrame structure
data.info()

# This shows:
# - Number of rows and columns
# - Column names
# - Data types of each column
# - How many non-null (non-missing) values in each column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 14 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     1309 non-null   int64  
 1   survived   1309 non-null   int64  
 2   name       1309 non-null   object 
 3   sex        1309 non-null   object 
 4   age        1046 non-null   float64
 5   sibsp      1309 non-null   int64  
 6   parch      1309 non-null   int64  
 7   ticket     1309 non-null   object 
 8   fare       1308 non-null   float64
 9   cabin      295 non-null    object 
 10  embarked   1307 non-null   object 
 11  boat       486 non-null    object 
 12  body       121 non-null    float64
 13  home.dest  745 non-null    object 
dtypes: float64(3), int64(4), object(7)
memory usage: 143.3+ KB
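
The `body` column above is listed as `float64` even though the data dictionary describes it as an integer. A minimal sketch of the likely reason, using made-up values: NaN is itself a float, so pandas stores a column of whole numbers as floats as soon as one value is missing.

```python
import pandas as pd

# Made-up values for illustration.
complete = pd.Series([135, 22, 328])
with_gap = pd.Series([135, None, 328])

print(complete.dtype)   # an integer dtype: every value is present
print(with_gap.dtype)   # float64: the missing value became NaN, which is a float
```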


### Exercise 4.4: Dataset Information

Use the `.info()` method to see the structure of the Titanic dataset.

In [14]:
# Get information about the DataFrame structure
data.info()

# This shows:
# - Number of rows and columns
# - Column names
# - Data types of each column
# - How many non-null (non-missing) values in each column


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 14 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     1309 non-null   int64  
 1   survived   1309 non-null   int64  
 2   name       1309 non-null   object 
 3   sex        1309 non-null   object 
 4   age        1046 non-null   float64
 5   sibsp      1309 non-null   int64  
 6   parch      1309 non-null   int64  
 7   ticket     1309 non-null   object 
 8   fare       1308 non-null   float64
 9   cabin      295 non-null    object 
 10  embarked   1307 non-null   object 
 11  boat       486 non-null    object 
 12  body       121 non-null    float64
 13  home.dest  745 non-null    object 
dtypes: float64(3), int64(4), object(7)
memory usage: 143.3+ KB


### Exercise 4.5: Answer Questions

Based on the `.info()` output, answer these questions in the markdown cell below:

1. How many rows (passengers) are in the dataset?
2. How many columns are in the dataset?
3. What is the data type of the 'age' column?
4. What is the data type of the 'survived' column?

**Your Answers:**

1. Number of rows: 1309
2. Number of columns: 14
3. Data type of 'age': float64
4. Data type of 'survived': int64

---

## Part 5: Basic DataFrame Exploration

Let's explore the Titanic dataset using pandas methods.

### Example: Common DataFrame Methods

In [None]:
# Useful DataFrame methods:

data.tail()       # View last rows
data.columns      # Get column names
data.shape        # Get (rows, columns) as a tuple
data.describe()   # Get statistics for numerical columns

### Exercise 5.1: View Last Rows

Display the last 5 rows of the dataset using the `.tail()` method.

In [15]:
data.tail()       # View last rows


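| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|--------|----------|------|-----|-----|-------|-------|--------|------|-------|----------|------|------|-----------|
| 1304 | 3 | 0 | Zabour, Miss. Hileni | female | 14.5 | 1 | 0 | 2665 | 14.4542 | NaN | C | NaN | 328.0 | NaN |
| 1305 | 3 | 0 | Zabour, Miss. Thamine | female | NaN | 1 | 0 | 2665 | 14.4542 | NaN | C | NaN | NaN | NaN |
| 1306 | 3 | 0 | Zakarian, Mr. Mapriededer | male | 26.5 | 0 | 0 | 2656 | 7.225 | NaN | C | NaN | 304.0 | NaN |
| 1307 | 3 | 0 | Zakarian, Mr. Ortin | male | 27.0 | 0 | 0 | 2670 | 7.225 | NaN | C | NaN | NaN | NaN |
| 1308 | 3 | 0 | Zimmerman, Mr. Leo | male | 29.0 | 0 | 0 | 315082 | 7.875 | NaN | S | NaN | NaN | NaN |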


### Exercise 5.2: Get Column Names

Print all the column names in the dataset using `.columns`.

In [11]:
data.columns      # Get column names


Index(['pclass', 'survived', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket',
       'fare', 'cabin', 'embarked', 'boat', 'body', 'home.dest'],
      dtype='object')

### Exercise 5.3: Get Dataset Shape

Print the shape (rows, columns) of the dataset using `.shape`.

**Hint:** This will return a tuple like (1309, 14) meaning 1309 rows and 14 columns.
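
Because `.shape` is a tuple, its two numbers can be unpacked into separate variables. A short sketch on a made-up three-row table (the values and the name `small` are illustrative assumptions):

```python
import pandas as pd

# Made-up table with 3 rows and 2 columns.
small = pd.DataFrame({"pclass": [1, 2, 3], "fare": [71.28, 13.00, 7.25]})

# .shape is an attribute (no parentheses) holding a (rows, columns) tuple.
rows, columns = small.shape
print(f"{rows} rows and {columns} columns")
```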

In [16]:
data.shape        # Get (rows, columns) as a tuple


(1309, 14)

### Example: Descriptive Statistics

In [None]:
# Get basic statistics about numerical columns
data.describe()

# This shows:
# count - number of non-missing values
# mean - average value
# std - standard deviation
# min - minimum value
# 25% - first quartile
# 50% - median (middle value)
# 75% - third quartile
# max - maximum value
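
To make the rows of the `.describe()` table concrete, here is a sketch on a tiny made-up column whose statistics are easy to check by hand. The ages are illustrative, not from the Titanic file.

```python
import pandas as pd

# Made-up ages for illustration; one value is missing on purpose.
ages = pd.DataFrame({"age": [20.0, 30.0, 40.0, None]})

summary = ages.describe()
print(summary)

# Individual statistics can be read by row label and column name.
print("count:", summary.loc["count", "age"])   # the missing value is not counted
print("mean:", summary.loc["mean", "age"])
print("median:", summary.loc["50%", "age"])
```

With three known values of 20, 30, and 40, the count is 3 and both the mean and the median are 30.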

### Exercise 5.4: Basic Statistics

Use the `.describe()` method to see basic statistics about numerical columns in the Titanic dataset.

In [12]:
data.describe()   # Get statistics for numerical columns



### Exercise 5.5: Interpret Statistics

Based on the `.describe()` output, answer these questions:

1. What is the average (mean) age of passengers?
2. What is the maximum fare paid?
3. What percentage of passengers survived? (Hint: Look at the 'survived' column mean - it will be a decimal between 0 and 1)
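
The hint in question 3 relies on a property of 0/1 columns: their mean equals the fraction of 1s. A minimal sketch with made-up outcomes (not the real Titanic values) shows the conversion to a percentage.

```python
import pandas as pd

# Made-up outcomes: 1 = survived, 0 = did not survive.
outcomes = pd.Series([1, 0, 0, 1, 0])

# The mean of a 0/1 column is the fraction of 1s (here 2 out of 5).
fraction = outcomes.mean()
percentage = fraction * 100
print(f"{percentage:.1f}% survived")
```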

**Your Answers:**

1. Average age: 
2. Maximum fare: 
3. Survival rate (as percentage): 

---

## Submission Checklist

Before submitting, make sure you have:

- [ ] Completed all exercises
- [ ] Run all cells successfully (no errors)
- [ ] Added your name and date at the top
- [ ] Saved the notebook as `dbAppsTask01TrackA.ipynb`
- [ ] Pushed to your `databaseApplications` repository on GitHub
- [ ] Verified the file appears on GitHub

**Great work on your first database applications task!**