# Python Programming Tutorial: From Basics to Functions

Welcome to this comprehensive Python tutorial! This notebook will guide you through the fundamental concepts of Python programming, starting from the very basics and progressing to functions.

## What You'll Learn:
1. Variables and Data Types
2. Basic Operations
3. Input and Output
4. Strings and String Methods
5. Lists and Tuples
6. Dictionaries
7. Control Flow (if statements, loops)
8. Functions

Let's get started!

## 1. Variables and Data Types

In Python, variables are containers that store data values. Python has several built-in data types:
- **int**: Integer numbers (whole numbers)
- **float**: Floating-point numbers (numbers with a decimal point)
- **str**: Text (strings)
- **bool**: True or False values

Let's see how to create variables and check their types:

In [7]:
# Creating variables of different types
name = "Alice"           # String
age = 25                 # Integer
height = 5.6             # Float
is_student = True        # Boolean

# Print the variables and their types
print(f"Name: {name}, Type: {type(name)}")
print(f"Age: {age}, Type: {type(age)}")
print(f"Height: {height}, Type: {type(height)}")
print(f"Is Student: {is_student}, Type: {type(is_student)}")

Name: Alice, Type: <class 'str'>
Age: 25, Type: <class 'int'>
Height: 5.6, Type: <class 'float'>
Is Student: True, Type: <class 'bool'>


In [17]:
print(f"hello world, {name}, {age}")

hello world, Alice, 25


## 2. Basic Operations

Python supports various mathematical and logical operations:
- **Arithmetic**: +, -, *, /, %, ** (power), // (floor division)
- **Comparison**: ==, !=, <, >, <=, >=
- **Logical**: and, or, not

In [19]:
# Arithmetic operations
a = 10
b = 3

print(f"Addition: {a} + {b} = {a + b}")
print(f"Subtraction: {a} - {b} = {a - b}")
print(f"Multiplication: {a} * {b} = {a * b}")
print(f"Division: {a} / {b} = {a / b}")
print(f"Floor Division: {a} // {b} = {a // b}")
print(f"Modulus: {a} % {b} = {a % b}")
print(f"Power: {a} ** {b} = {a ** b}")

Addition: 10 + 3 = 13
Subtraction: 10 - 3 = 7
Multiplication: 10 * 3 = 30
Division: 10 / 3 = 3.3333333333333335
Floor Division: 10 // 3 = 3
Modulus: 10 % 3 = 1
Power: 10 ** 3 = 1000


In [20]:
# Comparison and logical operations
x = 5
y = 10

print(f"x == y: {x == y}")
print(f"x < y: {x < y}")
print(f"x > y: {x > y}")
print(f"x != y: {x != y}")

# Logical operations
print(f"(x < y) and (x > 0): {(x < y) and (x > 0)}")
print(f"(x > y) or (x > 0): {(x > y) or (x > 0)}")
print(f"not (x == y): {not (x == y)}")

x == y: False
x < y: True
x > y: False
x != y: True
(x < y) and (x > 0): True
(x > y) or (x > 0): True
not (x == y): True
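
The operators above interact in a few ways that are easy to miss. The following is a minimal sketch, with illustrative values, of how floor division and modulus fit together and why floats are better compared with a tolerance than with `==`. `math.isclose` comes from the standard library.

```python
import math

a, b = 10, 3

# Floor division and modulus together reconstruct the original number
quotient = a // b
remainder = a % b
print(f"{a} = {quotient} * {b} + {remainder}")
print(divmod(a, b))  # the same pair from a single call

# Floor division rounds toward negative infinity, not toward zero
neg_quotient = -7 // 2
print(f"-7 // 2 = {neg_quotient}")

# Floats are stored in binary, so compare them with a tolerance
total = 0.1 + 0.2
print(f"0.1 + 0.2 == 0.3: {total == 0.3}")
print(f"math.isclose(0.1 + 0.2, 0.3): {math.isclose(total, 0.3)}")
```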


## 3. Input and Output

- **Output**: Use `print()` to display information
- **Input**: Use `input()` to get user input (always returns a string)

When getting numeric input, you need to convert it using `int()` or `float()`:

In [None]:


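user_name = input("Enter your name: ")  # input() always returns a string
print(f"the user name is {user_name} and the type is {type(user_name)}")
# int() only works here because the sample input below happens to be all digits;
# a name such as "Alice" would raise ValueError
print(type(int(user_name)))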
user_age = input("Enter your age: ")
print(f"the user age is {user_age} and the type is {type(user_age)}")
user_age = int(user_age)

print(f"Hello, {user_name}! You are {user_age} years old.")
print(f"Next year, you will be {user_age + 1} years old.")


the user name is 12345678 and the type is <class 'str'>
<class 'int'>
the user age is 2567890 and the type is <class 'str'>
Hello, 12345678! You are 2567890 years old.
Next year, you will be 2567891 years old.
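
Because `int()` raises `ValueError` on text that is not a whole number, real programs usually guard the conversion. The sketch below shows one way to do that. The helper name `parse_age` is hypothetical, and the strings stand in for what `input()` would return.

```python
def parse_age(text):
    """Return the age as an int, or None if text is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

# Sample strings standing in for values typed by a user
for raw in ["25", " 42 ", "twenty", "3.5"]:
    age = parse_age(raw)
    if age is None:
        print(f"{raw!r} is not a valid age")
    else:
        print(f"{raw!r} -> {age}, next year {age + 1}")
```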


## 4. Strings and String Methods

Strings are sequences of characters. Python provides many useful methods to work with strings:

In [1]:
# Creating strings
message = "Hello, Python Programming!"
name = "james bond"

print(f"Original message: {message}")
print(f"Length: {len(message)}")
print(f"Uppercase: {message.upper()}")
print(f"Lowercase: {message.lower()}")
print(f"Capitalized name: {name.title()}")
print(f"Replace 'Python' with 'World': {message.replace('Python', 'World')}")

Original message: Hello, Python Programming!
Length: 26
Uppercase: HELLO, PYTHON PROGRAMMING!
Lowercase: hello, python programming!
Capitalized name: James Bond
Replace 'Python' with 'World': Hello, World Programming!


In [55]:
# String slicing and indexing
text = "Python"

print(f"First character: {text[0]}")
print(f"Last character: {text[-1]}")
print(f"First 3 characters: {text[0:3]}")
# print(f"Last 3 characters: {text[-3:]}")
# print(f"Every second character: {text[::2]}")
print(f"just the last character: {text[-1]}")
print(f"the last 3 letters: {text[-3:]}")

First character: P
Last character: n
First 3 characters: Pyt
just the last character: n
the last 3 letters: hon


In [9]:
# Useful string methods
sentence = "Python is awesome and Python is fun"

print(f"Count 'Python': {sentence.count('Python')}")
print(f"Find 'awesome': {sentence.find('awesome')}")
print(f"Replace 'Python' with 'Programming': {sentence.replace('Python', 'Programming')}")
print(f"Split into words: {sentence.split()}")
print(f"Starts with 'Python': {sentence.startswith('Python')}")
print(f"Ends with 'fun': {sentence.endswith('fun')}")
print(f"Is alphanumeric: {'Python3'.isalnum()}")

Count 'Python': 2
Find 'awesome': 10
Replace 'Python' with 'Programming': Programming is awesome and Programming is fun
Split into words: ['Python', 'is', 'awesome', 'and', 'Python', 'is', 'fun']
Starts with 'Python': True
Ends with 'fun': True
Is alphanumeric: True
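
Two points are worth seeing in code: string methods return new strings rather than changing the original, and `split()` has a natural counterpart in `join()`. A small sketch with an illustrative string:

```python
raw = "  Python is awesome  "

# strip() removes leading and trailing whitespace and returns a new string
cleaned = raw.strip()

# split() breaks the text into words; join() glues them back together
words = cleaned.split()
rejoined = "-".join(words)

# Strings are immutable: upper() returns a new string, cleaned is unchanged
shouted = cleaned.upper()

print(f"Cleaned: {cleaned!r}")
print(f"Words: {words}")
print(f"Rejoined: {rejoined}")
print(f"Shouted: {shouted}")
```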


In [13]:
name = "Alice"
age = 25
height = 5.6


In [14]:
print(f"{name} is {age} years old and {height} feet tall.")


Alice is 25 years old and 5.6 feet tall.


## 5. Lists and Tuples

### Lists
Lists are ordered, mutable (changeable) collections that can store multiple items:

In [56]:
# Creating and manipulating lists
fruits = ["apple", "banana", "orange", "grape"]
numbers = [1, 2, 3, 4, 5]
mixed = ["hello", 42, 3.14, True]




In [57]:
print(f"Fruits: {fruits}")
print(f"Numbers: {numbers}")
print(f"Mixed: {mixed}")

Fruits: ['apple', 'banana', 'orange', 'grape']
Numbers: [1, 2, 3, 4, 5]
Mixed: ['hello', 42, 3.14, True]


Because lists are ordered, we can access items by index:

In [58]:
print(f"First fruit: {fruits[0]}")

First fruit: apple


In [59]:
print(f"Last fruit: {fruits[-1]}")

Last fruit: grape


In [60]:

print(f"Number of fruits: {len(fruits)}")

Number of fruits: 4


In [61]:
type(mixed)

list

In [64]:
type(mixed[3])

bool

In [68]:
# List methods
colors = ["red", "green", "blue"]

In [69]:
# Adding elements
print(colors)
colors.append("yellow")  # Add to end
print(f"After adding: {colors}")

['red', 'green', 'blue']
After adding: ['red', 'green', 'blue', 'yellow']


In [80]:


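colors.insert(1, "purple")  # Insert at index 1
# insert() mutates the list in place, so every re-run of this cell adds another
# "purple"; the output below comes from a session where the cell ran several times
print(f"After adding: {colors}")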

After adding: ['red', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'green', 'blue', 'yellow']


In [81]:
# Removing elements
colors.remove("green")  # Remove by value
print(f"After removing: {colors}")

After removing: ['red', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'blue', 'yellow']


In [82]:

removed_color = colors.pop()  # Remove and return last element
print(f"After removing: {colors}")
print(f"Removed color: {removed_color}")

After removing: ['red', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'purple', 'blue']
Removed color: yellow


In [85]:

# Other useful methods
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
numbers.sort(reverse=True)  # Sort in place

print(f"Sorted numbers: {numbers}")
print(f"Index of 5: {numbers.index(5)}")

Sorted numbers: [9, 6, 5, 4, 3, 2, 1, 1]
Index of 5: 2


In [86]:
numbers.reverse()  # Reverse the list
print(f"After reversing: {numbers}")

After reversing: [1, 1, 2, 3, 4, 5, 6, 9]
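
The methods above change the list in place. When the original order should be kept, the built-in `sorted()` returns a new list instead. A short sketch of the difference, using illustrative values:

```python
numbers = [3, 1, 4, 1, 5]

# sorted() returns a new list and leaves the original untouched
ascending = sorted(numbers)
original_after_sorted = list(numbers)

# list.sort() sorts in place and returns None
result = numbers.sort()

print(f"sorted() result: {ascending}")
print(f"Original after sorted(): {original_after_sorted}")
print(f"Return value of sort(): {result}")
print(f"Original after sort(): {numbers}")
```

A common mistake is writing `numbers = numbers.sort()`, which replaces the list with `None`.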


### Tuples
Tuples are ordered, immutable (unchangeable) collections:

In [87]:
# Creating tuples
coordinates = (10, 20)
person_info = ("Alice", 25, "Engineer")

print(f"Coordinates: {coordinates}")
print(f"X coordinate: {coordinates[0]}")
print(f"Y coordinate: {coordinates[1]}")

# Tuple unpacking
name, age, profession = person_info
print(f"Name: {name}, Age: {age}, Profession: {profession}")

Coordinates: (10, 20)
X coordinate: 10
Y coordinate: 20
Name: Alice, Age: 25, Profession: Engineer
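
To make "immutable" concrete, the sketch below tries to assign to a tuple element, catches the resulting `TypeError`, and then builds a new tuple instead. The variable names are illustrative.

```python
coordinates = (10, 20)

# Assigning to an element of a tuple raises TypeError
assignment_failed = False
try:
    coordinates[0] = 99
except TypeError as error:
    assignment_failed = True
    print(f"Cannot modify tuple: {error}")

# To "change" a tuple, build a new one from the old values
moved = (coordinates[0] + 5, coordinates[1])
print(f"Original: {coordinates}")
print(f"Moved: {moved}")
```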


## 6. Dictionaries

Dictionaries store data in key-value pairs. They are mutable, and since Python 3.7 they preserve insertion order (earlier versions did not guarantee any order):

In [89]:
name = {}
type(name)

dict

In [90]:
# Creating dictionaries
student = {
    "name": "Bob",
    "age": 20,
    "major": "Computer Science",
    "gpa": 3.8
}

print(f"Student info: {student}")
print(f"Student name: {student['name']}")
print(f"Student GPA: {student['gpa']}")

Student info: {'name': 'Bob', 'age': 20, 'major': 'Computer Science', 'gpa': 3.8}
Student name: Bob
Student GPA: 3.8


In [91]:
# Dictionary methods
grades = {"math": 95, "science": 88, "history": 92}
print(grades)

{'math': 95, 'science': 88, 'history': 92}


In [92]:
# Adding/updating entries
grades["english"] = 90  # Add new key-value pair
grades["math"] = 98     # Update existing value

print(f"Updated grades: {grades}")

Updated grades: {'math': 98, 'science': 88, 'history': 92, 'english': 90}


In [93]:
grades.keys()

dict_keys(['math', 'science', 'history', 'english'])
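
Looking up a key that does not exist with square brackets raises `KeyError`. The sketch below, using an illustrative dictionary, shows some common ways to handle missing keys and to remove entries.

```python
grades = {"math": 98, "science": 88}

# get() returns None (or a chosen default) instead of raising KeyError
art = grades.get("art")
art_or_zero = grades.get("art", 0)

# 'in' checks whether a key exists
has_math = "math" in grades

# pop() removes a key and returns its value
removed = grades.pop("science")

print(f"art: {art}, art with default: {art_or_zero}")
print(f"Has math: {has_math}")
print(f"Removed science grade: {removed}")
print(f"Remaining: {grades}")
```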

## 7. Control Flow

### If Statements
Use if statements to make decisions in your code:

In [94]:
# Basic if statement
temperature = 75

if temperature > 80:
    print("It's hot outside!")
    print("So I'm happy")

elif temperature > 60:
    print("It's nice weather.")
else:
    print("It's cold outside.")

# Another example
score = 85

if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
elif score >= 70:
    grade = "C"
elif score >= 60:
    grade = "D"
else:
    grade = "F"

print(f"Score: {score}, Grade: {grade}")

It's nice weather.
Score: 85, Grade: B


### Loops

#### For Loops
Use for loops to iterate over sequences:

In [None]:
# Loop through a list
animals = ["cat", "dog", "bird", "fish"]

print("My favorite animals:")
for animal in animals:
    print(f"- {animal.title()}")

print("\nCounting to 5:")
for i in range(1, 6):  # range(start, stop)
    print(f"Count: {i}")

print("\nEven numbers from 0 to 10:")
for i in range(0, 11, 2):  # range(start, stop, step)
    print(i, end=" ")
print()  # New line

My favorite animals:
- Cat
- Dog
- Bird
- Fish

Counting to 5:
Count: 1
Count: 2
Count: 3
Count: 4
Count: 5

Even numbers from 0 to 10:
0 2 4 6 8 10 
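
The `stop` value of `range()` is exclusive, which is a frequent source of off-by-one errors. Converting a range to a list is a quick way to see exactly which numbers it produces, as in this small sketch:

```python
# range(start, stop) includes start but excludes stop
counting = list(range(1, 6))

# A step of 2 starting from 0 yields the even numbers up to 10
evens = list(range(0, 11, 2))

# A negative step counts down; stop is still excluded
countdown = list(range(5, 0, -1))

print(f"range(1, 6): {counting}")
print(f"range(0, 11, 2): {evens}")
print(f"range(5, 0, -1): {countdown}")
```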


In [4]:
# Loop through dictionary
student_ages = {"Alice": 20, "Bob": 22, "Charlie": 19}

print("Student ages:")
for name, age in student_ages.items():
    print(f"{name} is {age} years old")

# enumerate() adds a running index to each (name, age) pair
for index, (name, age) in enumerate(student_ages.items()):
    print(f"{index}: {name} is {age} years old")

Student ages:
Alice is 20 years old
Bob is 22 years old
Charlie is 19 years old
0: Alice is 20 years old
1: Bob is 22 years old
2: Charlie is 19 years old


#### While Loops
Use while loops to repeat code while a condition is true:

In [5]:
# Basic while loop
count = 1
print("Counting with while loop:")
while count <= 5:
    print(f"Count: {count}")
    count += 1  # Same as count = count + 1

# Finding a number
import random
target = random.randint(1, 10)
guess = 0
attempts = 0

print(f"\nGuessing game! The target is {target}")
while guess != target:
    guess = random.randint(1, 10)
    attempts += 1
    print(f"Attempt {attempts}: Guessed {guess}")

print(f"Found it in {attempts} attempts!")

Counting with while loop:
Count: 1
Count: 2
Count: 3
Count: 4
Count: 5

Guessing game! The target is 6
Attempt 1: Guessed 4
Attempt 2: Guessed 7
Attempt 3: Guessed 10
Attempt 4: Guessed 9
Attempt 5: Guessed 1
Attempt 6: Guessed 10
Attempt 7: Guessed 8
Attempt 8: Guessed 2
Attempt 9: Guessed 10
Attempt 10: Guessed 1
Attempt 11: Guessed 9
Attempt 12: Guessed 8
Attempt 13: Guessed 4
Attempt 14: Guessed 3
Attempt 15: Guessed 10
Attempt 16: Guessed 7
Attempt 17: Guessed 10
Attempt 18: Guessed 4
Attempt 19: Guessed 5
Attempt 20: Guessed 9
Attempt 21: Guessed 3
Attempt 22: Guessed 4
Attempt 23: Guessed 7
Attempt 24: Guessed 6
Found it in 24 attempts!
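
The guessing loop above runs a different number of times on each execution. The following sketch is a variation that is reproducible and guaranteed to stop. It assumes an arbitrary seed (`42`) and an attempt cap (`max_attempts`), neither of which is part of the original example.

```python
import random

rng = random.Random(42)  # seeded generator; the seed value is arbitrary
target = rng.randint(1, 10)

max_attempts = 1000  # safety cap so the loop always terminates
attempts = 0
guess = None

while guess != target and attempts < max_attempts:
    guess = rng.randint(1, 10)
    attempts += 1

found = guess == target
print(f"Target {target}, found: {found}, attempts: {attempts}")
```

Running the cell again with the same seed repeats the same sequence of guesses.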


### More Loop Examples and Techniques

Let's explore more advanced loop concepts and useful patterns:

In [6]:
# Nested loops - loops inside loops
print("Multiplication table (1-5):")
for i in range(1, 6):
    for j in range(1, 6):
        product = i * j
        print(f"{i} × {j} = {product:2d}", end="  ")
    print()  # New line after each row

print("\nPattern printing:")
for i in range(1, 6):
    for j in range(i):
        print("*", end="")
    print()  # New line after each row

Multiplication table (1-5):
1 × 1 =  1  1 × 2 =  2  1 × 3 =  3  1 × 4 =  4  1 × 5 =  5  
2 × 1 =  2  2 × 2 =  4  2 × 3 =  6  2 × 4 =  8  2 × 5 = 10  
3 × 1 =  3  3 × 2 =  6  3 × 3 =  9  3 × 4 = 12  3 × 5 = 15  
4 × 1 =  4  4 × 2 =  8  4 × 3 = 12  4 × 4 = 16  4 × 5 = 20  
5 × 1 =  5  5 × 2 = 10  5 × 3 = 15  5 × 4 = 20  5 × 5 = 25  

Pattern printing:
*
**
***
****
*****


In [7]:
# Loop control: break and continue
print("Using 'break' to exit a loop early:")
for i in range(1, 11):
    if i == 6:
        print(f"Breaking at {i}")
        break
    print(f"Number: {i}")

print("\nUsing 'continue' to skip iterations:")
for i in range(1, 11):
    if i % 2 == 0:  # Skip even numbers
        continue
    print(f"Odd number: {i}")

print("\nFinding the first number divisible by 7:")
for i in range(50, 100):
    if i % 7 == 0:
        print(f"Found: {i}")
        break
else:
    print("No number found")  # This runs if loop completes without break

Using 'break' to exit a loop early:
Number: 1
Number: 2
Number: 3
Number: 4
Number: 5
Breaking at 6

Using 'continue' to skip iterations:
Odd number: 1
Odd number: 3
Odd number: 5
Odd number: 7
Odd number: 9

Finding the first number divisible by 7:
Found: 56


In [8]:
# List comprehensions - a Pythonic way to create lists
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Traditional way with for loop
squares_traditional = []
for num in numbers:
    squares_traditional.append(num ** 2)

# Pythonic way with list comprehension
squares_pythonic = [num ** 2 for num in numbers]

print(f"Traditional: {squares_traditional}")
print(f"Pythonic: {squares_pythonic}")

# List comprehension with condition
even_squares = [num ** 2 for num in numbers if num % 2 == 0]
print(f"Even squares: {even_squares}")

# More examples
fruits = ["apple", "banana", "cherry", "date"]
uppercase_fruits = [fruit.upper() for fruit in fruits]
long_fruits = [fruit for fruit in fruits if len(fruit) > 5]

print(f"Uppercase: {uppercase_fruits}")
print(f"Long fruits: {long_fruits}")

Traditional: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Pythonic: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Even squares: [4, 16, 36, 64, 100]
Uppercase: ['APPLE', 'BANANA', 'CHERRY', 'DATE']
Long fruits: ['banana', 'cherry']
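
The same comprehension syntax also builds dictionaries and sets. A brief sketch, reusing the fruit list from the example above:

```python
fruits = ["apple", "banana", "cherry", "date"]

# Dictionary comprehension: map each fruit to its length
lengths = {fruit: len(fruit) for fruit in fruits}

# Set comprehension: collect the distinct lengths
unique_lengths = {len(fruit) for fruit in fruits}

print(f"Lengths: {lengths}")
print(f"Unique lengths: {sorted(unique_lengths)}")
```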


### Common Loop Pitfalls and Mistakes

When learning loops, students often make certain mistakes. Here are some common pitfalls to avoid:

In [None]:
# Pitfall 1: Trying to use C/C++ style loops

# WRONG - This doesn't work in Python!
# for (int i = 0; i < 10; i++) {
#     print(i)
# }

# CORRECT - Python way:
for i in range(10):
    print(i)

print("---")

# If you need the index, use enumerate() or range()
items = ["apple", "banana", "cherry"]

# Method 1: Using enumerate
for index, item in enumerate(items):
    print(f"Index {index}: {item}")

print("---")

# Method 2: Using range and len
for i in range(len(items)):
    print(f"Index {i}: {items[i]}")

print("---")

# But usually, you don't need the index at all:
for item in items:
    print(f"Item: {item}")

0
1
2
3
4
5
6
7
8
9
---
Index 0: apple
Index 1: banana
Index 2: cherry
---
Index 0: apple
Index 1: banana
Index 2: cherry
---
Item: apple
Item: banana
Item: cherry


In [None]:
# Pitfall 2: Trying to loop over dictionary using index

person = {"name": "Alice", "age": 30, "city": "New York"}

# WRONG - Dictionaries don't have numeric indices!
# for i in range(len(person)):
#     print(person[i])  # This will cause an error!

# CORRECT ways to loop over dictionaries:

# Method 1: Loop over keys (default behavior)
print("Keys:")
for key in person:
    print(f"{key}: {person[key]}")

print("\nKeys (explicit):")
for key in person.keys():
    print(f"{key}: {person[key]}")

print("\nValues:")
for value in person.values():
    print(value)

print("\nKey-Value pairs:")
for key, value in person.items():
    print(f"{key}: {value}")

# Note: Dictionary order is preserved in Python 3.7+
# but you shouldn't rely on order for older versions

In [None]:
# Pitfall 3: Modifying a list while iterating over it

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# WRONG - This can skip elements or cause unexpected behavior!
print("Original list:", numbers)
numbers_copy_wrong = numbers.copy()

# This might not work as expected
# for num in numbers_copy_wrong:
#     if num % 2 == 0:
#         numbers_copy_wrong.remove(num)  # Dangerous!

# BETTER approaches:

# Method 1: Iterate over a copy
numbers_copy1 = numbers.copy()
for num in numbers.copy():  # Iterate over copy
    if num % 2 == 0:
        numbers_copy1.remove(num)
print("Method 1 (iterate over copy):", numbers_copy1)

# Method 2: Use list comprehension (most Pythonic)
odd_numbers = [num for num in numbers if num % 2 != 0]
print("Method 2 (list comprehension):", odd_numbers)

# Method 3: Iterate backwards when removing by index
numbers_copy2 = numbers.copy()
for i in range(len(numbers_copy2) - 1, -1, -1):
    if numbers_copy2[i] % 2 == 0:
        del numbers_copy2[i]
print("Method 3 (backwards iteration):", numbers_copy2)

# Method 4: Use filter()
odd_numbers_filter = list(filter(lambda x: x % 2 != 0, numbers))
print("Method 4 (filter):", odd_numbers_filter)
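
To see why removing items during iteration is risky, the sketch below runs the "wrong" pattern on a small list of consecutive even numbers (illustrative values). Each removal shifts the remaining items left, so the loop skips the element that moves into the vacated position.

```python
values = [2, 4, 6, 8]

# Every element is even, so one might expect an empty list afterwards
for value in values:
    if value % 2 == 0:
        values.remove(value)

print(f"After removing during iteration: {values}")  # [4, 8]

# A list comprehension builds a new list and has no such problem
values = [2, 4, 6, 8]
odd_only = [value for value in values if value % 2 != 0]
print(f"With a list comprehension: {odd_only}")
```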

In [None]:
# Pitfall 4: Creating infinite loops

# WRONG - Forgetting to update the loop variable
# count = 0
# while count < 5:
#     print(count)
#     # Forgot to increment count - infinite loop!

# CORRECT - Always update the loop variable
count = 0
while count < 5:
    print(f"Count: {count}")
    count += 1  # Don't forget this!

print("---")

# Another common mistake with while loops
user_input = "no"
# WRONG - condition never changes
# while user_input != "yes":
#     print("Please type 'yes'")
#     # user_input never gets updated - infinite loop!

# CORRECT - update the condition variable
attempts = 0
max_attempts = 3
while user_input != "yes" and attempts < max_attempts:
    print(f"Attempt {attempts + 1}: Please type 'yes' (or we'll stop after {max_attempts} attempts)")
    # In a real program, you'd get input from user
    # user_input = input("Your choice: ")
    # For demo purposes:
    if attempts == 1:
        user_input = "yes"
    attempts += 1

print(f"Final input: {user_input}")
print(f"Total attempts: {attempts}")

# Always have a way to exit your while loops!

In [3]:
# Pitfall 5: Variable scope confusion in loops

# In Python, loop variables persist after the loop ends
for i in range(3):
    print(f"Inside loop: {i}")

print(f"After loop: {i}")  # i still exists and equals 2

print("---")

# This can cause confusion with nested loops
for i in range(3):
    for j in range(2):
        print(f"i={i}, j={j}")

print(f"After nested loops: i={i}, j={j}")  # Both still exist

print("---")

# Be careful with loop variables captured by lambdas (closures).
# This is a common gotcha:

# WRONG approach (every lambda looks up the same 'i' when it is called)
# functions = []
# for i in range(3):
#     functions.append(lambda: i)  # All will return 2!

# CORRECT approach - capture the variable
functions_correct = []
for i in range(3):
    functions_correct.append(lambda x=i: x)  # Capture current value of i

# Or use list comprehension
functions_comprehension = [lambda x=i: x for i in range(3)]

# Test the functions
print("Correct approach:")
for func in functions_correct:
    print(func())

print("List comprehension approach:")
for func in functions_comprehension:
    print(func())

Inside loop: 0
Inside loop: 1
Inside loop: 2
After loop: 2
---
i=0, j=0
i=0, j=1
i=1, j=0
i=1, j=1
i=2, j=0
i=2, j=1
After nested loops: i=2, j=1
---
Correct approach:
0
1
2
List comprehension approach:
0
1
2


In [6]:
# WRONG WAY (common mistake):
functions_wrong = []
for i in range(3):
    functions_wrong.append(lambda: i)  # All will return 2!
print(functions_wrong)
# Test wrong way:
for func in functions_wrong:
    print(func())  # Prints: 2, 2, 2 (all the same!)

print("---")

# CORRECT WAY (bind the current value of i as a default argument):
functions_correct = [lambda x=i: x for i in range(3)]

# Test correct way:
for func in functions_correct:
    print(func())  # Prints: 0, 1, 2 (different values!)

[<function <lambda> at 0x74db882ecc20>, <function <lambda> at 0x74db882edd00>, <function <lambda> at 0x74db882ece00>]
2
2
2
---
0
1
2


In [10]:
nums = [1, 2, 3, 4, 5]
# print(map(lambda x: x**2, nums))
result = map(lambda x: x**2, nums)
print(result)
for value in result:
    print(value)

<map object at 0x74db882fe530>
1
4
9
16
25
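
As the output shows, `map()` returns a lazy iterator rather than a list. One consequence is that it can only be consumed once. A short sketch with illustrative values:

```python
nums = [1, 2, 3]
squares = map(lambda x: x ** 2, nums)

# The first pass consumes the iterator
first_pass = list(squares)

# The second pass finds nothing left
second_pass = list(squares)

print(f"First pass: {first_pass}")
print(f"Second pass: {second_pass}")
```

If the values are needed more than once, store them in a list first.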


## 8. Functions

Functions are reusable blocks of code that perform specific tasks. They help organize code and avoid repetition:

In [None]:
# Basic function definition
def greet():
    """A simple function that prints a greeting."""
    print("Hello, welcome to Python!")

# Call the function
greet()

In [None]:
# Function with parameters
def greet_person(name, age):
    """Greet a person with their name and age."""
    print(f"Hello, {name}! You are {age} years old.")

# Call with arguments
greet_person("Alice", 25)
greet_person("Bob", 30)

In [None]:
# Function with return value
def add_numbers(a, b):
    """Add two numbers and return the result."""
    result = a + b
    return result

def multiply_numbers(x, y):
    """Multiply two numbers and return the result."""
    return x * y

# Using functions with return values
sum_result = add_numbers(5, 3)
product_result = multiply_numbers(4, 6)

print(f"5 + 3 = {sum_result}")
print(f"4 × 6 = {product_result}")

In [4]:

def introduce(name, age, city="Unknown"):
    """Introduce a person with optional city parameter."""
    return f"Hi, I'm {name}, {age} years old, from {city}."

# Call with and without optional parameter
print(introduce("Charlie", 28))
print(introduce("Diana", 32, "New York"))

Hi, I'm Charlie, 28 years old, from Unknown.
Hi, I'm Diana, 32 years old, from New York.
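
Arguments can also be passed by name, which lets the caller change their order and makes calls easier to read. The sketch below repeats the `introduce` function from the cell above so that it runs on its own.

```python
def introduce(name, age, city="Unknown"):
    """Introduce a person with optional city parameter."""
    return f"Hi, I'm {name}, {age} years old, from {city}."

# Keyword arguments may appear in any order
by_keyword = introduce(age=28, name="Charlie")

# Positional arguments come first, keyword arguments after them
mixed = introduce("Diana", 32, city="New York")

print(by_keyword)
print(mixed)
```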


In [None]:
# More complex function examples
def calculate_grade(score):
    """Calculate letter grade based on numeric score."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"

def is_even(number):
    """Check if a number is even."""
    return number % 2 == 0

def factorial(n):
    """Calculate factorial of a number."""
    if n <= 1:
        return 1
    else:
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result

# Test the functions
scores = [95, 82, 76, 68, 55]
for score in scores:
    grade = calculate_grade(score)
    print(f"Score {score}: Grade {grade}")

print(f"\nIs 10 even? {is_even(10)}")
print(f"Is 7 even? {is_even(7)}")

print(f"\n5! = {factorial(5)}")
print(f"3! = {factorial(3)}")

## Practice Exercises

Now it's time to practice! Try solving these exercises:

### Exercise 1: Personal Information
Create a function that takes a person's name, age, and favorite color as parameters and returns a formatted string with their information.

In [None]:
# Your solution here
def create_profile(name, age, favorite_color):
    # Write your code here
    pass

# Test your function
# print(create_profile("John", 25, "blue"))

### Exercise 2: List Operations
Write a function that takes a list of numbers and returns a dictionary with the following information:
- The sum of all numbers
- The average of all numbers
- The largest number
- The smallest number

In [None]:
# Your solution here
def analyze_numbers(numbers):
    # Write your code here
    pass

# Test your function
# test_numbers = [10, 5, 8, 15, 3, 12]
# print(analyze_numbers(test_numbers))

### Exercise 3: Word Counter
Create a function that takes a sentence as input and returns a dictionary with each word and how many times it appears in the sentence.

In [None]:
# Your solution here
def count_words(sentence):
    # Write your code here
    pass

# Test your function
# test_sentence = "the quick brown fox jumps over the lazy dog the fox is quick"
# print(count_words(test_sentence))

## 10. Introduction to NumPy

NumPy, short for "Numerical Python," is a fundamental open-source library for scientific computing in Python.

It provides support for large, multi-dimensional arrays and matrices, along with a comprehensive collection of high-level mathematical functions to perform operations on these arrays.

NumPy's core computational components are implemented primarily in C, which allows it to perform fast numerical operations. For linear algebra it relies on established low-level libraries such as BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package), many implementations of which are written in Fortran.

In [34]:
import numpy as np

# Create NumPy array
a = np.array([1, 2, 3, 4, 5])

print("Array:", a)
print("Type:", type(a))           # <class 'numpy.ndarray'>
print("Data type:", a.dtype)      # int64 (may vary by system)
print("Shape:", a.shape)          # (5,) - 1D array with 5 elements
print("Dimensions:", a.ndim)      # 1 - one dimensional
print("Size:", a.size)            # 5 - total number of elements


Array: [1 2 3 4 5]
Type: <class 'numpy.ndarray'>
Data type: int64
Shape: (5,)
Dimensions: 1
Size: 5


In [35]:
import numpy as np

# Different ways to create arrays
arr1 = np.array([1, 2, 3, 4, 5])                    # From list
arr2 = np.arange(1, 6)                               # Range 1 to 5
arr3 = np.zeros(5)                                   # Array of zeros
arr4 = np.ones(5)                                    # Array of ones
arr5 = np.linspace(0, 10, 5)                        # 5 evenly spaced numbers

print("From list:", arr1)
print("Range:", arr2) 
print("Zeros:", arr3)
print("Ones:", arr4)
print("Linspace:", arr5)

# 2D arrays
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:", matrix)
print("Shape:", matrix.shape)                        # (2, 3)
print("Dimensions:", matrix.ndim)                    # 2

# Array operations
arr = np.array([1, 2, 3, 4, 5])
print("\nOriginal:", arr)
print("Squared:", arr**2)
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Max:", np.max(arr))

From list: [1 2 3 4 5]
Range: [1 2 3 4 5]
Zeros: [0. 0. 0. 0. 0.]
Ones: [1. 1. 1. 1. 1.]
Linspace: [ 0.   2.5  5.   7.5 10. ]

2D Array: [[1 2 3]
 [4 5 6]]
Shape: (2, 3)
Dimensions: 2

Original: [1 2 3 4 5]
Squared: [ 1  4  9 16 25]
Sum: 15
Mean: 3.0
Max: 5
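
Operations such as `arr**2` apply to every element without an explicit loop. The same mechanism, called broadcasting, also combines arrays of different but compatible shapes. A minimal sketch with illustrative values:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A scalar is broadcast to every element
shifted = arr + 10

# A 1D row of length 3 is broadcast across each row of a (2, 3) matrix
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
broadcast_sum = matrix + row

print("Shifted:", shifted)
print("Matrix + row:\n", broadcast_sum)
```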


In [36]:

arr6 = np.random.rand(3,3)
print("\nRandom Array:\n", arr6)

arr7 = np.random.randint(1, 10, (5, 5))      # Random integers
print("Random Integers:\n", arr7)

# Draw 3 samples from a normal distribution (mean=0, std=2)
arr8 = np.random.normal(0, 2, 3)         # Normal distribution
print("Normal Distribution:\n", arr8)



Random Array:
 [[0.47408517 0.94699847 0.36295418]
 [0.68857765 0.99133117 0.73613164]
 [0.4431635  0.5903959  0.05647806]]
Random Integers:
 [[1 8 3 8 2]
 [2 8 9 1 6]
 [8 1 6 4 5]
 [6 9 2 4 6]
 [6 7 8 2 2]]
Normal Distribution:
 [-1.1880946   2.80461278 -1.93845354]
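
The random arrays above change on every run. When reproducible results are needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng` and a fixed seed. The sketch below assumes an arbitrary seed of `0`.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=0)
rng_b = np.random.default_rng(seed=0)

sample_a = rng_a.integers(1, 10, size=(2, 3))  # integers from 1 to 9
sample_b = rng_b.integers(1, 10, size=(2, 3))

print("Sample A:\n", sample_a)
print("Same as sample B:", np.array_equal(sample_a, sample_b))

# Floats in [0, 1) and normally distributed values (mean=0, std=2)
uniform = rng_a.random((3, 3))
normal = rng_a.normal(0, 2, size=3)
print("Uniform shape:", uniform.shape, "Normal shape:", normal.shape)
```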


In [37]:
arr9 = np.eye(4)                           # Identity matrix
print("Identity Matrix:\n", arr9)

arr10 = np.diag([1, 2, 3, 4])               # Diagonal matrix
print("Diagonal Matrix:\n", arr10)

arr11 = np.full((3, 3), 7)            # Fill with value
print("Full Array:\n", arr11)

arr12 = np.transpose(arr7)            # Transpose
print("Original Matrix:\n", arr7)
print("Transposed Matrix:\n", arr12)

arr12 = arr1.reshape(5, 1)                # Reshape to 5x1 (reuses the name arr12)
print("Reshaped Array:\n", arr12)

# Note: this inverts arr10 (the diagonal matrix above), not arr7
print("Inverse of Original Matrix:\n", np.linalg.inv(arr10))

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(arr10)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)


Identity Matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Diagonal Matrix:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
Full Array:
 [[7 7 7]
 [7 7 7]
 [7 7 7]]
Original Matrix:
 [[1 8 3 8 2]
 [2 8 9 1 6]
 [8 1 6 4 5]
 [6 9 2 4 6]
 [6 7 8 2 2]]
Transposed Matrix:
 [[1 2 8 6 6]
 [8 8 1 9 7]
 [3 9 6 2 8]
 [8 1 4 4 2]
 [2 6 5 6 2]]
Reshaped Array:
 [[1]
 [2]
 [3]
 [4]
 [5]]
Inverse of Original Matrix:
 [[1.         0.         0.         0.        ]
 [0.         0.5        0.         0.        ]
 [0.         0.         0.33333333 0.        ]
 [0.         0.         0.         0.25      ]]
Eigenvalues:
 [1. 2. 3. 4.]
Eigenvectors:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
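
Results from `np.linalg` can be checked numerically. The sketch below rebuilds the diagonal matrix from the cell above (as floats) and verifies the defining properties of the inverse and of the eigen-decomposition with `np.allclose`, which compares within a floating-point tolerance.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0])

# A matrix times its inverse should give the identity matrix
A_inv = np.linalg.inv(A)
identity_check = np.allclose(A @ A_inv, np.eye(4))

# Each eigenvector v (a column) should satisfy A @ v == eigenvalue * v
eigenvalues, eigenvectors = np.linalg.eig(A)
eigen_check = np.allclose(A @ eigenvectors, eigenvectors * eigenvalues)

print("A @ inv(A) is the identity:", identity_check)
print("A @ v equals lambda * v:", eigen_check)
```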


In [45]:
# ===== ARRAY INDEXING & SLICING =====
print("\n2. ARRAY INDEXING & SLICING")
print("-" * 30)

# 1D indexing
arr = np.array([10, 20, 30, 40, 50])
print(f"Original array: {arr}")
print(f"First element: {arr[0]}")
print(f"Last element: {arr[-1]}")
print(f"Slice [1:4]: {arr[1:4]}")
print(f"Every 2nd element: {arr[::2]}")

# 2D indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"\n2D Matrix:\n{matrix}")
print(f"Element at [1,2]: {matrix[1, 2]}")
print(f"First row: {matrix[0, :]}")
print(f"Second column: {matrix[:, 1]}")
print(f"Submatrix [0:2, 1:3]:\n{matrix[0:2, 1:3]}")

# Boolean indexing
arr = np.array([1, 5, 3, 8, 2, 7])
print(f"\nBoolean indexing:")
print(f"Original: {arr}")
print(f"Elements > 4: {arr[arr > 4]}")
print(f"Even numbers: {arr[arr % 2 == 0]}")


2. ARRAY INDEXING & SLICING
------------------------------
Original array: [10 20 30 40 50]
First element: 10
Last element: 50
Slice [1:4]: [20 30 40]
Every 2nd element: [10 30 50]

2D Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Element at [1,2]: 6
First row: [1 2 3]
Second column: [2 5 8]
Submatrix [0:2, 1:3]:
[[2 3]
 [5 6]]

Boolean indexing:
Original: [1 5 3 8 2 7]
Elements > 4: [5 8 7]
Even numbers: [8 2]
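
One difference from Python lists: a basic slice of a NumPy array is a view onto the same data, while boolean indexing returns a copy. A small sketch with illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A slice is a view: writing through it changes the original array
view = arr[1:4]
view[0] = 99
print("Original after editing the slice:", arr)

# Boolean indexing returns a copy: the original is not affected
selected = arr[arr > 40]
selected[0] = -1
print("Original after editing the selection:", arr)

# Use .copy() when an independent slice is needed
independent = arr[1:4].copy()
independent[0] = 0
print("Original after editing the copy:", arr)
```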


In [44]:
# ===== ARRAY OPERATIONS =====
print("\n3. ARRAY OPERATIONS")
print("-" * 30)

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Element-wise operations
print(f"a = {a}")
print(f"b = {b}")
print(f"Addition: {a + b}")
print(f"Subtraction: {a - b}")
print(f"Multiplication: {a * b}")
print(f"Division: {a / b}")
print(f"Power: {a ** 2}")

# Mathematical functions
print(f"\nMath functions:")
print(f"Square root: {np.sqrt(a)}")
print(f"Exponential: {np.exp(a)}")
print(f"Logarithm: {np.log(a)}")
print(f"Sine: {np.sin(a)}")

# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(f"\nMatrix operations:")
print(f"Matrix A:\n{A}")
print(f"Matrix B:\n{B}")
print(f"Matrix multiplication:\n{np.dot(A, B)}")
print(f"Element-wise multiplication:\n{A * B}")


3. ARRAY OPERATIONS
------------------------------
a = [1 2 3 4]
b = [5 6 7 8]
Addition: [ 6  8 10 12]
Subtraction: [-4 -4 -4 -4]
Multiplication: [ 5 12 21 32]
Division: [0.2        0.33333333 0.42857143 0.5       ]
Power: [ 1  4  9 16]

Math functions:
Square root: [1.         1.41421356 1.73205081 2.        ]
Exponential: [ 2.71828183  7.3890561  20.08553692 54.59815003]
Logarithm: [0.         0.69314718 1.09861229 1.38629436]
Sine: [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]

Matrix operations:
Matrix A:
[[1 2]
 [3 4]]
Matrix B:
[[5 6]
 [7 8]]
Matrix multiplication:
[[19 22]
 [43 50]]
Element-wise multiplication:
[[ 5 12]
 [21 32]]


In [46]:
# ===== STATISTICAL OPERATIONS =====
print("\n4. STATISTICAL OPERATIONS")
print("-" * 30)

data = np.random.randint(1, 100, 20)
print(f"Random data: {data}")

# Basic statistics
print(f"\nBasic Statistics:")
print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Standard deviation: {np.std(data):.2f}")
print(f"Variance: {np.var(data):.2f}")
print(f"Min: {np.min(data)}")
print(f"Max: {np.max(data)}")
print(f"Sum: {np.sum(data)}")

# Percentiles
print(f"\nPercentiles:")
print(f"25th percentile: {np.percentile(data, 25):.2f}")
print(f"50th percentile: {np.percentile(data, 50):.2f}")
print(f"75th percentile: {np.percentile(data, 75):.2f}")

# 2D statistics
matrix = np.random.randint(1, 10, (3, 4))
print(f"\n2D Statistics:")
print(f"Matrix:\n{matrix}")
print(f"Sum along axis 0 (columns): {np.sum(matrix, axis=0)}")
print(f"Sum along axis 1 (rows): {np.sum(matrix, axis=1)}")
print(f"Mean along axis 0: {np.mean(matrix, axis=0)}")


4. STATISTICAL OPERATIONS
------------------------------
Random data: [96 19 29 85  7 97 34 11 53 86 53 74 85 19 36 81 65 31 94 44]

Basic Statistics:
Mean: 54.95
Median: 53.00
Standard deviation: 29.90
Variance: 893.95
Min: 7
Max: 97
Sum: 1099

Percentiles:
25th percentile: 30.50
50th percentile: 53.00
75th percentile: 85.00

2D Statistics:
Matrix:
[[2 9 6 2]
 [6 3 1 8]
 [7 4 8 7]]
Sum along axis 0 (columns): [15 16 15 17]
Sum along axis 1 (rows): [19 18 26]
Mean along axis 0: [5.         5.33333333 5.         5.66666667]
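
Because `np.random.randint` draws new numbers on every run, the statistics printed above change each time the cell is executed. One way to make the output repeatable is a seeded generator from `np.random.default_rng`. This is a sketch, not what the notebook itself did, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce identical draws
rng_first = np.random.default_rng(42)
rng_second = np.random.default_rng(42)

data_first = rng_first.integers(1, 100, size=20)    # integers in [1, 100)
data_second = rng_second.integers(1, 100, size=20)

print(f"Seeded data: {data_first}")
print(f"Identical across generators: {np.array_equal(data_first, data_second)}")
print(f"Mean: {np.mean(data_first):.2f}")
```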


In [47]:
# ===== ARRAY MANIPULATION =====
print("\n5. ARRAY MANIPULATION")
print("-" * 30)

arr = np.arange(12)
print(f"Original array: {arr}")

# Reshaping
print(f"Reshaped to (3,4):\n{arr.reshape(3, 4)}")
print(f"Reshaped to (2,6):\n{arr.reshape(2, 6)}")

# Flattening
matrix = arr.reshape(3, 4)
print(f"Flattened: {matrix.flatten()}")
print(f"Raveled: {matrix.ravel()}")

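# Difference between flatten and ravel:
# flatten() always returns a copy; ravel() returns a view when possible
print("\nDiffering between flatten and ravel:")
print("Original matrix:")
print(matrix)

f = matrix.flatten()
r = matrix.ravel()
f[0] = 100  # Modifies the copy only; matrix is unchanged
print(f"the f array {f}")
print(f"Flattened:\n{matrix}")
r[0] = 200  # Modifies the view, so matrix[0, 0] changes too
print("the r array", r)
print(f"Raveled:\n{matrix}")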

# Concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(f"\nConcatenation:")
print(f"Horizontal: {np.concatenate([a, b])}")
print(f"Horizontal (hstack): {np.hstack([a, b])}")

# Splitting
arr = np.arange(10)
print(f"\nSplitting {arr}:")
split_arrays = np.split(arr, 5)
print(f"Split into 5 parts: {split_arrays}")


5. ARRAY MANIPULATION
------------------------------
Original array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped to (3,4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to (2,6):
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Raveled: [ 0  1  2  3  4  5  6  7  8  9 10 11]

Differing between flatten and ravel:
Original matrix:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
the f array [100   1   2   3   4   5   6   7   8   9  10  11]
Flattened:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
the r array [200   1   2   3   4   5   6   7   8   9  10  11]
Raveled:
[[200   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]

Concatenation:
Horizontal: [1 2 3 4 5 6]
Horizontal (hstack): [1 2 3 4 5 6]

Splitting [0 1 2 3 4 5 6 7 8 9]:
Split into 5 parts: [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7]), array([8, 9])]
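
The copy-versus-view behaviour shown above can be checked directly with `np.shares_memory`, without modifying any data. The sketch below also shows `np.array_split`, which, unlike `np.split`, accepts a number of parts that does not divide the array length evenly.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

flat_copy = matrix.flatten()   # always a new array
flat_view = matrix.ravel()     # a view here, because matrix is contiguous

copy_shares = np.shares_memory(matrix, flat_copy)
view_shares = np.shares_memory(matrix, flat_view)
print(f"flatten shares memory: {copy_shares}")
print(f"ravel shares memory:   {view_shares}")

# np.split needs equal parts; np.array_split tolerates a remainder
parts = np.array_split(np.arange(10), 3)
part_sizes = [len(p) for p in parts]
print(f"array_split sizes: {part_sizes}")
```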


In [48]:
# ===== PRACTICAL APPLICATIONS =====
print("\n7. PRACTICAL APPLICATIONS")
print("-" * 30)

# Example 1: Grade calculation
print("Example 1: Student Grade Analysis")
students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
subjects = ['Math', 'Physics', 'Chemistry']
grades = np.random.randint(60, 100, (5, 3))

print(f"Grades matrix (Students x Subjects):\n{grades}")
print(f"Students: {students}")
print(f"Subjects: {subjects}")

# Analysis
avg_per_student = np.mean(grades, axis=1)
avg_per_subject = np.mean(grades, axis=0)

print(f"\nAverage per student: {avg_per_student}")
print(f"Average per subject: {avg_per_subject}")

for i, student in enumerate(students):
    print(f"{student}: {avg_per_student[i]:.1f}")

# Find top performer
top_student_idx = np.argmax(avg_per_student)
print(f"\nTop performer: {students[top_student_idx]} ({avg_per_student[top_student_idx]:.1f})")


7. PRACTICAL APPLICATIONS
------------------------------
Example 1: Student Grade Analysis
Grades matrix (Students x Subjects):
[[91 78 97]
 [96 70 82]
 [83 90 73]
 [64 84 92]
 [98 62 83]]
Students: ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
Subjects: ['Math', 'Physics', 'Chemistry']

Average per student: [88.66666667 82.66666667 82.         80.         81.        ]
Average per subject: [86.4 76.8 85.4]
Alice: 88.7
Bob: 82.7
Charlie: 82.0
Diana: 80.0
Eve: 81.0

Top performer: Alice (88.7)
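
The same `argmax` idea extends to finding the best student in each subject by working along axis 0 instead of axis 1. The sketch below hard-codes the grades from the sample output above so the result is repeatable; the variable names `top_idx_per_subject` and `top_per_subject` are introduced here for illustration.

```python
import numpy as np

students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
subjects = ['Math', 'Physics', 'Chemistry']

# Grades copied from the sample output above (rows = students, columns = subjects)
grades = np.array([
    [91, 78, 97],
    [96, 70, 82],
    [83, 90, 73],
    [64, 84, 92],
    [98, 62, 83],
])

# argmax along axis 0 returns, for each column, the row index of the largest value
top_idx_per_subject = np.argmax(grades, axis=0)
top_per_subject = {
    subject: students[idx]
    for subject, idx in zip(subjects, top_idx_per_subject)
}

for subject, student in top_per_subject.items():
    print(f"Top in {subject}: {student}")
```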


## 10. Introduction to Pandas

Pandas is a powerful Python library for data analysis and manipulation. It provides easy-to-use data structures and data analysis tools.

### What is Pandas?
- **Pandas** = Python Data Analysis Library
- Built on top of NumPy
- Provides two main data structures: **Series** (1D) and **DataFrame** (2D)
- Makes data manipulation and analysis much easier than pure Python
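
To make the two data structures concrete, here is a minimal sketch that builds a Series and a DataFrame from small hand-written values. The names `scores` and `df_example` are illustrative only.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
scores = pd.Series([85, 92, 78], index=['Alice', 'Bob', 'Charlie'], name='Score')
print(scores)
print(f"Bob's score: {scores['Bob']}")

# A DataFrame is a two-dimensional table; each column is a Series
df_example = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78],
})
print(df_example)
print(f"Column type: {type(df_example['Score']).__name__}")
```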

In [2]:
# Let's start with importing pandas
import pandas as pd
import numpy as np

print("Pandas version:", pd.__version__)
print("NumPy version:", np.__version__)

Pandas version: 2.3.2
NumPy version: 2.3.2


### Creating Data Structures

Let's see how to create and work with data - first the traditional Python way, then the Pandas way.

In [3]:
# Traditional Python way to store student data
students_python = [
    ["Alice", 85, "Math"],
    ["Bob", 92, "Physics"], 
    ["Charlie", 78, "Chemistry"],
    ["Diana", 90, "Math"]
]

print("Traditional Python List of Lists:")
for student in students_python:
    print(f"Name: {student[0]}, Score: {student[1]}, Subject: {student[2]}")
    
print(f"\nTotal students: {len(students_python)}")
print("Accessing data requires remembering indices - student[0] = name, student[1] = score, etc.")

Traditional Python List of Lists:
Name: Alice, Score: 85, Subject: Math
Name: Bob, Score: 92, Subject: Physics
Name: Charlie, Score: 78, Subject: Chemistry
Name: Diana, Score: 90, Subject: Math

Total students: 4
Accessing data requires remembering indices - student[0] = name, student[1] = score, etc.


In [4]:
# Pandas way - much cleaner and more intuitive!
df_students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Score': [85, 92, 78, 90],
    'Subject': ['Math', 'Physics', 'Chemistry', 'Math']
})

print("Pandas DataFrame:")
print(df_students)
print(f"\nShape: {df_students.shape}")
print(" Much cleaner! Column names make data self-explanatory")

Pandas DataFrame:
      Name  Score    Subject
0    Alice     85       Math
1      Bob     92    Physics
2  Charlie     78  Chemistry
3    Diana     90       Math

Shape: (4, 3)
 Much cleaner! Column names make data self-explanatory


### Accessing Data

Let's see how to access specific data from our dataset.

In [5]:
# Python way - accessing all names (error-prone)
names_python = []
for student in students_python:
    names_python.append(student[0])  # Remember: 0 = name index

print("Python way - Get all names:")
print(names_python)

# Getting all scores
scores_python = [student[1] for student in students_python]  # 1 = score index
print(f"All scores: {scores_python}")
print(" Hard to remember which index is which!")

Python way - Get all names:
['Alice', 'Bob', 'Charlie', 'Diana']
All scores: [85, 92, 78, 90]
 Hard to remember which index is which!


In [6]:
# Pandas way - much more intuitive!
print("Pandas way - Get all names:")
print(df_students['Name'].tolist())

print(f"\nAll scores: {df_students['Score'].tolist()}")

# Even better - direct access to columns (.iloc[0] picks the first row by position)
print(f"\nFirst student name: {df_students['Name'].iloc[0]}")
print(f"Average score: {df_students['Score'].mean():.1f}")
print(" Column names make it self-explanatory and less error-prone!")

Pandas way - Get all names:
['Alice', 'Bob', 'Charlie', 'Diana']

All scores: [85, 92, 78, 90]

First student name: Alice
Average score: 86.2
 Column names make it self-explanatory and less error-prone!
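
Besides column access with square brackets, Pandas offers `.loc` (label-based) and `.iloc` (position-based) for picking individual cells or rows. The sketch below rebuilds the same student table so it can run on its own.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Score': [85, 92, 78, 90],
    'Subject': ['Math', 'Physics', 'Chemistry', 'Math']
})

# .loc selects by label: row label 1, column 'Name'
name_by_label = df_students.loc[1, 'Name']

# .iloc selects by integer position: second row, first column
name_by_position = df_students.iloc[1, 0]

# A list of column names returns a smaller DataFrame
name_and_score = df_students[['Name', 'Score']]

print(name_by_label, name_by_position)
print(name_and_score)
```

With the default index (0, 1, 2, ...), labels and positions coincide; they diverge once rows are filtered or re-indexed, which is when the distinction starts to matter.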


### Filtering Data

Let's see how to find students with specific criteria.

In [7]:
# Python way - find students with score > 85
high_scorers_python = []
for student in students_python:
    if student[1] > 85:  # Remember: 1 = score index
        high_scorers_python.append(student)

print("Python way - Students with score > 85:")
for student in high_scorers_python:
    print(f"{student[0]}: {student[1]}")
    
print(" Requires loops and manual index management")

Python way - Students with score > 85:
Bob: 92
Diana: 90
 Requires loops and manual index management


In [8]:
# Pandas way - elegant and readable!
high_scorers_pandas = df_students[df_students['Score'] > 85]

print("Pandas way - Students with score > 85:")
print(high_scorers_pandas)

# Even get just the names
print(f"\nJust the names: {high_scorers_pandas['Name'].tolist()}")
print(" One line of code! Much more readable and intuitive!")

Pandas way - Students with score > 85:
    Name  Score  Subject
1    Bob     92  Physics
3  Diana     90     Math

Just the names: ['Bob', 'Diana']
 One line of code! Much more readable and intuitive!
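
Filters can be combined. The sketch below, using the same student table, shows two common patterns: joining conditions with `&` and testing membership with `isin()`.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Score': [85, 92, 78, 90],
    'Subject': ['Math', 'Physics', 'Chemistry', 'Math']
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
math_high = df_students[(df_students['Score'] > 85) & (df_students['Subject'] == 'Math')]

# isin() tests membership in a list of values
science = df_students[df_students['Subject'].isin(['Physics', 'Chemistry'])]

print(math_high)
print(science)
```

The parentheses are required because `&` binds more tightly than comparison operators in Python.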


### Statistical Operations

Let's calculate some basic statistics on our data.

In [9]:
# Python way - calculate statistics manually
scores_python = [student[1] for student in students_python]

avg_score = sum(scores_python) / len(scores_python)
max_score = max(scores_python)
min_score = min(scores_python)

# Standard deviation calculation (manual)
# Note: dividing by N gives the population standard deviation
import math
variance = sum((score - avg_score) ** 2 for score in scores_python) / len(scores_python)
std_dev = math.sqrt(variance)

print("Python way - Statistics:")
print(f"Average: {avg_score:.2f}")
print(f"Maximum: {max_score}")
print(f"Minimum: {min_score}")
print(f"Std Dev: {std_dev:.2f}")
print(" Requires manual calculations and importing math module")

Python way - Statistics:
Average: 86.25
Maximum: 92
Minimum: 78
Std Dev: 5.40
 Requires manual calculations and importing math module


In [10]:
# Pandas way - built-in statistical methods!
print("Pandas way - Statistics:")
print(f"Average: {df_students['Score'].mean():.2f}")
print(f"Maximum: {df_students['Score'].max()}")
print(f"Minimum: {df_students['Score'].min()}")
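# Note: Series.std() defaults to the sample standard deviation (ddof=1),
# so it prints 6.24 rather than the population value 5.40 computed manually above
print(f"Std Dev: {df_students['Score'].std():.2f}")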

# Even better - get all statistics at once!
print(f"\nComplete statistical summary:")
print(df_students['Score'].describe())
print(" Built-in methods make statistics effortless!")

Pandas way - Statistics:
Average: 86.25
Maximum: 92
Minimum: 78
Std Dev: 6.24

Complete statistical summary:
count     4.000000
mean     86.250000
std       6.238322
min      78.000000
25%      83.250000
50%      87.500000
75%      90.500000
max      92.000000
Name: Score, dtype: float64
 Built-in methods make statistics effortless!
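
The manual calculation printed a standard deviation of 5.40, while Pandas printed 6.24. Both are correct: the manual version divides by N (population), whereas `Series.std()` divides by N - 1 (sample) by default. The sketch below shows how the `ddof` parameter switches between the two.

```python
import math
import pandas as pd

scores = pd.Series([85, 92, 78, 90], name='Score')

sample_std = scores.std()            # ddof=1 by default: divides by N - 1
population_std = scores.std(ddof=0)  # divides by N, like the manual calculation

# Manual population standard deviation for comparison
mean_score = sum(scores) / len(scores)
manual_std = math.sqrt(sum((s - mean_score) ** 2 for s in scores) / len(scores))

print(f"Sample std (ddof=1):     {sample_std:.2f}")
print(f"Population std (ddof=0): {population_std:.2f}")
print(f"Manual calculation:      {manual_std:.2f}")
```

Note that NumPy's `np.std` uses `ddof=0` by default, the opposite of Pandas, so it is worth stating `ddof` explicitly when comparing results across libraries.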


### Working with Real CSV Data

Now let's work with a real dataset: the BMW car listings in `bmw.csv`. This shows the real power of Pandas!

In [11]:
# Python way - reading CSV manually (very complex!)
import csv

cars_python = []
with open('bmw.csv', 'r') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)  # Get column names
    for row in csv_reader:
        cars_python.append(row)

print("Python way - Reading CSV:")
print(f"Headers: {headers}")
print(f"Total cars: {len(cars_python)}")
print("First 3 cars:")
for i in range(3):
    print(f"Car {i+1}: {cars_python[i]}")
print(" Complex file handling, manual parsing, no data types!")

Python way - Reading CSV:
Headers: ['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'tax', 'mpg', 'engineSize']
Total cars: 10781
First 3 cars:
Car 1: [' 5 Series', '2014', '11200', 'Automatic', '67068', 'Diesel', '125', '57.6', '2.0']
Car 2: [' 6 Series', '2018', '27000', 'Automatic', '14827', 'Petrol', '145', '42.8', '2.0']
Car 3: [' 5 Series', '2016', '16000', 'Automatic', '62794', 'Diesel', '160', '51.4', '3.0']
 Complex file handling, manual parsing, no data types!


In [12]:
# Pandas way - reading CSV is incredibly simple!
# df_bmw = pd.read_csv('bmw_with_missing_data.csv')
df_bmw = pd.read_csv('bmw.csv')
print("Pandas way - Reading CSV:")
print(f"Shape: {df_bmw.shape}")
print(f"Columns: {list(df_bmw.columns)}")
print("\nFirst 5 cars:")
print(df_bmw.head())
print("\n One line of code! Automatic data type detection!")


Pandas way - Reading CSV:
Shape: (10781, 9)
Columns: ['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'tax', 'mpg', 'engineSize']

First 5 cars:
       model  year  price transmission  mileage fuelType  tax   mpg  \
0   5 Series  2014  11200    Automatic    67068   Diesel  125  57.6   
1   6 Series  2018  27000    Automatic    14827   Petrol  145  42.8   
2   5 Series  2016  16000    Automatic    62794   Diesel  160  51.4   
3   1 Series  2017  12750    Automatic    26676   Diesel  145  72.4   
4   7 Series  2014  14500    Automatic    39554   Diesel  160  50.4   

   engineSize  
0         2.0  
1         2.0  
2         3.0  
3         1.5  
4         3.0  

 One line of code! Automatic data type detection!
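
To try `read_csv` without the `bmw.csv` file at hand, the sketch below feeds it three rows copied from the output above through `io.StringIO`. It also shows that the `model` values carry a leading space (visible as `' 5 Series'` in the manual CSV output), which `str.strip()` removes.

```python
import io
import pandas as pd

# Three rows copied from the output above; the model names carry a leading space
csv_text = """model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
 5 Series,2014,11200,Automatic,67068,Diesel,125,57.6,2.0
 6 Series,2018,27000,Automatic,14827,Petrol,145,42.8,2.0
 5 Series,2016,16000,Automatic,62794,Diesel,160,51.4,3.0
"""

# read_csv accepts any file-like object, so StringIO stands in for 'bmw.csv'
df_sample = pd.read_csv(io.StringIO(csv_text))
print(df_sample.dtypes)

# Strip the stray whitespace so that ' 5 Series' and '5 Series' compare equal
df_sample['model'] = df_sample['model'].str.strip()
print(df_sample['model'].tolist())
```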


### Data Exploration

Let's explore our BMW dataset to understand what information we have.

In [13]:
# Quick data exploration with Pandas
print("Dataset Information:")
print(f"Shape: {df_bmw.shape}")
print(f"\nColumn names and types:")
print(df_bmw.dtypes)

print(f"\nBasic info:")
df_bmw.info()

print(f"\nMissing values:")
print(df_bmw.isnull().sum())

Dataset Information:
Shape: (10781, 9)

Column names and types:
model            object
year              int64
price             int64
transmission     object
mileage           int64
fuelType         object
tax               int64
mpg             float64
engineSize      float64
dtype: object

Basic info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10781 entries, 0 to 10780
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   model         10781 non-null  object 
 1   year          10781 non-null  int64  
 2   price         10781 non-null  int64  
 3   transmission  10781 non-null  object 
 4   mileage       10781 non-null  int64  
 5   fuelType      10781 non-null  object 
 6   tax           10781 non-null  int64  
 7   mpg           10781 non-null  float64
 8   engineSize    10781 non-null  float64
dtypes: float64(2), int64(4), object(3)
memory usage: 758.2+ KB

Missing values:
model           0
year            0
price           0
transmission    0
mileage         0
fuelType        0
tax             0
mpg             0
engineSize      0
dtype: int64

### Dealing with Missing Data - A Critical Skill!

Real-world data is messy! Let's see how to handle missing values (NaN, empty cells).

In [14]:
# Python way - finding missing values is very complex!
import csv

missing_count_python = {}
total_rows = 0

with open('bmw_with_missing_data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)
    
    # Initialize counters for each column
    for header in headers:
        missing_count_python[header] = 0
    
    for row in csv_reader:
        total_rows += 1
        for i, value in enumerate(row):
            if not value.strip():  # Empty or whitespace-only values count as missing
                missing_count_python[headers[i]] += 1

print("Python way - Missing value analysis:")
for column, count in missing_count_python.items():
    percentage = (count / total_rows) * 100
    print(f"{column}: {count} missing ({percentage:.1f}%)")
    
print(" Complex file processing, manual checking, error-prone!")

Python way - Missing value analysis:
model: 9 missing (8.4%)
year: 20 missing (18.7%)
price: 17 missing (15.9%)
transmission: 14 missing (13.1%)
mileage: 16 missing (15.0%)
fuelType: 14 missing (13.1%)
tax: 7 missing (6.5%)
mpg: 14 missing (13.1%)
engineSize: 5 missing (4.7%)
 Complex file processing, manual checking, error-prone!


In [15]:
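# Pandas way - missing data analysis is incredibly simple!
# Note: df_bmw was loaded from 'bmw.csv', which has no missing values, so every count below is 0.
# To analyze the same file as the Python cell above, load 'bmw_with_missing_data.csv' instead.
print("Pandas way - Missing value analysis:")
missing_info = df_bmw.isnull().sum()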
print("Missing values per column:")
print(missing_info)

print(f"\nTotal missing values: {df_bmw.isnull().sum().sum()}")
print(f"Percentage of complete rows: {(len(df_bmw) - df_bmw.isnull().any(axis=1).sum()) / len(df_bmw) * 100:.1f}%")

# Percentage of missing values per column (only columns with missing data are shown)
print(f"\nMissing data percentage:")
missing_percentage = (df_bmw.isnull().sum() / len(df_bmw) * 100).round(1)
print(missing_percentage[missing_percentage > 0])
print(" Instant analysis with built-in methods!")

Pandas way - Missing value analysis:
Missing values per column:
model           0
year            0
price           0
transmission    0
mileage         0
fuelType        0
tax             0
mpg             0
engineSize      0
dtype: int64

Total missing values: 0
Percentage of complete rows: 100.0%

Missing data percentage:
Series([], dtype: float64)
 Instant analysis with built-in methods!
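
Since `df_bmw` contains no gaps, the counts above are all zero. To see the same methods report something, here is a sketch on a tiny hand-made table. The rows in `df_gaps` are hypothetical and are not taken from either CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; not taken from either CSV file
df_gaps = pd.DataFrame({
    'model': ['1 Series', None, 'X5', None],
    'year': [2015, 2018, np.nan, np.nan],
    'price': [12000.0, np.nan, 30000.0, np.nan],
})

# Count of missing values in each column
missing_per_column = df_gaps.isnull().sum()

# Number of rows with no missing value at all
complete_rows = int(df_gaps.notnull().all(axis=1).sum())

# isnull().mean() gives the fraction missing, since True counts as 1
missing_pct = (df_gaps.isnull().mean() * 100).round(1)

print(missing_per_column)
print(f"Complete rows: {complete_rows}")
print(missing_pct)
```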


### Data Cleaning - Handling Missing Values

Let's see different strategies for dealing with missing data.

In [16]:
# Python way - removing rows with missing data is very complex!
clean_cars_python = []

with open('bmw_with_missing_data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)
    
    for row in csv_reader:
        # Check if any value is missing
        has_missing = False
        for value in row:
            if not value.strip():  # Empty or whitespace-only values count as missing
                has_missing = True
                break
        
        if not has_missing:
            clean_cars_python.append(row)

print("Python way - Remove rows with ANY missing values:")
print(f"Original rows: {total_rows}")
print(f"Clean rows: {len(clean_cars_python)}")
print(f"Removed: {total_rows - len(clean_cars_python)} rows")
print(" Manual iteration, complex logic, slow!")

Python way - Remove rows with ANY missing values:
Original rows: 107
Clean rows: 5
Removed: 102 rows
 Manual iteration, complex logic, slow!


In [17]:
# Pandas way - data cleaning made easy!
# Note: df_bmw (from 'bmw.csv') has no missing values, so each strategy below keeps all 10781 rows
print("Pandas way - Multiple cleaning strategies:")

# Strategy 1: Remove rows with ANY missing values
df_clean_strict = df_bmw.dropna()
print(f"1. Drop rows with ANY missing values:")
print(f"   Original: {len(df_bmw)} rows")
print(f"   Clean: {len(df_clean_strict)} rows")
print(f"   Removed: {len(df_bmw) - len(df_clean_strict)} rows")

# Strategy 2: Remove rows only if ALL values are missing
df_clean_lenient = df_bmw.dropna(how='all')
print(f"\n2. Drop rows only if ALL values are missing:")
print(f"   Remaining: {len(df_clean_lenient)} rows")

# Strategy 3: Remove rows with missing values in specific columns
df_clean_subset = df_bmw.dropna(subset=['price', 'year'])
print(f"\n3. Drop rows with missing 'price' or 'year':")
print(f"   Remaining: {len(df_clean_subset)} rows")

# Strategy 4: Remove rows with less than a threshold of non-missing values
df_clean_threshold = df_bmw.dropna(thresh=7)  # Keep rows with at least 7 non-missing values
print(f"\n4. Drop rows with less than 7 non-missing values:")
print(f"   Remaining: {len(df_clean_threshold)} rows")

print("\n Multiple strategies, one line each, super fast!")

Pandas way - Multiple cleaning strategies:
1. Drop rows with ANY missing values:
   Original: 10781 rows
   Clean: 10781 rows
   Removed: 0 rows

2. Drop rows only if ALL values are missing:
   Remaining: 10781 rows

3. Drop rows with missing 'price' or 'year':
   Remaining: 10781 rows

4. Drop rows with less than 7 non-missing values:
   Remaining: 10781 rows

 Multiple strategies, one line each, super fast!
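
Because `df_bmw` has nothing to drop, the four strategies above all return the full dataset. The sketch below applies the same `dropna` options to a small hypothetical table (the values in `df_gaps` are made up for illustration) so that the differences between them become visible.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; not taken from either CSV file
df_gaps = pd.DataFrame({
    'model': ['1 Series', None, 'X5', None],
    'year': [2015, 2018, np.nan, np.nan],
    'price': [12000.0, np.nan, 30000.0, np.nan],
})

strict = df_gaps.dropna()                        # drop rows with ANY missing value
lenient = df_gaps.dropna(how='all')              # drop rows where ALL values are missing
needs_price = df_gaps.dropna(subset=['price'])   # drop rows with a missing price
at_least_two = df_gaps.dropna(thresh=2)          # keep rows with >= 2 non-missing values

row_counts = {
    'strict': len(strict),
    'lenient': len(lenient),
    'needs_price': len(needs_price),
    'at_least_two': len(at_least_two),
}
print(row_counts)
```

None of these calls modifies `df_gaps`; each returns a new DataFrame, which makes it easy to compare strategies side by side before committing to one.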


### Filling Missing Values

Sometimes removing data isn't the best option. Let's see how to fill missing values.

In [None]:
# Python way - filling missing values manually 
filled_cars_python = []

# First, calculate the mean price for filling (using only the complete rows found earlier)
price_values = []
for row in clean_cars_python:
    try:
        if row[2]:  # Assuming price is column 2
            price_values.append(float(row[2]))
    except (ValueError, IndexError):
        continue

mean_price = sum(price_values) / len(price_values) if price_values else 0

# Now fill missing values
with open('bmw_with_missing_data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)
    
    for row in csv_reader:
        filled_row = []
        for i, value in enumerate(row):
            if not value.strip():  # Empty or whitespace-only values count as missing
                if headers[i] == 'price':
                    filled_row.append(str(mean_price))
                elif headers[i] == 'model':
                    filled_row.append('Unknown')
                else:
                    filled_row.append('0')  # Default fill
            else:
                filled_row.append(value)
        filled_cars_python.append(filled_row)

print("Python way - Fill missing values:")
print(f"Mean price used for filling: {mean_price:.0f}")
print(" Complex calculations, manual logic, lots of code!")

Python way - Fill missing values:
Mean price used for filling: 19312
 Complex calculations, manual logic, lots of code!


In [31]:
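# Pandas way - multiple filling strategies made simple!
# Note: df_bmw has no missing values, so the fills below leave the data unchanged;
# they illustrate the calls you would use on a file with gaps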
df_filled = df_bmw.copy()

print("Pandas way - Smart filling strategies:")

# Strategy 1: Fill with specific values
df_filled['model'] = df_filled['model'].fillna('Unknown Model')
print("1. Fill missing models with 'Unknown Model'")

# Strategy 2: Fill with mean (for numeric columns)
# For Int64 columns, we need to round the mean to integer
if 'price' in df_filled.columns:
    mean_price = df_filled['price'].mean()
    df_filled['price'] = df_filled['price'].fillna(int(round(mean_price)))
    print(f"2. Fill missing prices with mean: {mean_price:.0f}")

# Strategy 3: Forward fill (use previous value) 
if 'year' in df_filled.columns:
    df_filled['year'] = df_filled['year'].ffill()  
    print("3. Forward fill missing years")

# Strategy 4: Fill with mode (most common value)
if 'transmission' in df_filled.columns:
    mode_transmission = df_filled['transmission'].mode()[0]
    df_filled['transmission'] = df_filled['transmission'].fillna(mode_transmission)
    print(f"4. Fill missing transmission with mode: {mode_transmission}")

print(f"\nMissing values after filling:")
print(df_filled.isnull().sum().sum())
print(" Multiple intelligent filling methods, automatic calculations!")

Pandas way - Smart filling strategies:
1. Fill missing models with 'Unknown Model'
2. Fill missing prices with mean: 22733
3. Forward fill missing years
4. Fill missing transmission with mode: Semi-Auto

Missing values after filling:
0
 Multiple intelligent filling methods, automatic calculations!
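
The fills above are no-ops on `df_bmw` because it has no gaps. The sketch below shows the same ideas acting on a small hypothetical table (`df_gaps` is made up for illustration): a dictionary passed to `fillna` sets a different fill value per column, and `ffill` carries the previous value forward.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; not taken from either CSV file
df_gaps = pd.DataFrame({
    'model': ['1 Series', None, 'X5', None],
    'year': [2015, 2018, np.nan, np.nan],
    'price': [12000.0, np.nan, 30000.0, np.nan],
})

# One fill value per column: a constant for text, the median for price
fill_values = {'model': 'Unknown', 'price': df_gaps['price'].median()}
df_filled_example = df_gaps.fillna(fill_values)

# Forward fill: each missing year takes the last known value above it
df_filled_example['year'] = df_filled_example['year'].ffill()

print(df_filled_example)
print(f"Remaining missing values: {df_filled_example.isnull().sum().sum()}")
```

The median is used here instead of the mean because it is less sensitive to a few very expensive cars; which statistic is appropriate depends on the data.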


### Data Quality Issues

Real data often has quality problems beyond just missing values.

In [20]:
# Let's identify data quality issues in our dataset
print("Data Quality Analysis:")

# Check for duplicates
print(f"1. DUPLICATES:")
duplicates = df_bmw.duplicated()
print(f"   Duplicate rows: {duplicates.sum()}")
if duplicates.sum() > 0:
    print("   Duplicate entries found:")
    print(df_bmw[duplicates])

# Check for invalid data types
print(f"\n2. DATA TYPE ISSUES:")
print("   Current data types:")
print(df_bmw.dtypes)

# Check for outliers/impossible values
print(f"\n3. IMPOSSIBLE VALUES:")
if 'year' in df_bmw.columns:
    invalid_years = df_bmw[(df_bmw['year'] < 1900) | (df_bmw['year'] > 2025)]
    print(f"   Invalid years: {len(invalid_years)}")
    
if 'price' in df_bmw.columns:
    negative_prices = df_bmw[df_bmw['price'] < 0]
    print(f"   Negative prices: {len(negative_prices)}")

# Check for inconsistent categories
print(f"\n4. INCONSISTENT CATEGORIES:")
if 'model' in df_bmw.columns:
    unique_models = df_bmw['model'].value_counts()
    print(f"   Unique models: {len(unique_models)}")
    print(f"   Suspicious entries:")
    suspicious = unique_models[unique_models == 1].head(5)  # Models appearing only once
    print(suspicious)

Data Quality Analysis:
1. DUPLICATES:
   Duplicate rows: 117
   Duplicate entries found:
          model  year  price transmission  mileage fuelType  tax    mpg  \
174          X4  2019  33998    Semi-Auto     7272   Diesel  150   42.8   
393          X1  2018  16995    Semi-Auto    17276   Petrol  150   46.3   
709    2 Series  2014  11999       Manual    31289   Diesel   30   62.8   
957    1 Series  2019  21898       Manual     4100   Petrol  150   41.5   
1173   1 Series  2017  20995    Semi-Auto    31544   Petrol  145   39.8   
...         ...   ...    ...          ...      ...      ...  ...    ...   
7808   5 Series  2019  31550    Automatic     1550   Hybrid  140  156.9   
9096         M4  2020  45488    Automatic       10   Petrol  150   34.0   
9797   4 Series  2019  25449    Automatic     6890   Diesel  145   65.7   
9940         M3  2009  16950       Manual    65000   Petrol  580   21.9   
9943   3 Series  2013  10985    Automatic    70000   Diesel  165   50.4   

      engi

In [32]:
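# Remove duplicates based on a single column
# Caution: different cars can share the same mileage, so this drops far more rows
# (2695) than the 117 fully duplicated rows
print(df_bmw.shape)                             # Rows and columns before cleaning
print(df_bmw['mileage'].duplicated().sum())     # Rows whose mileage already appeared earlier
print(df_bmw.duplicated().sum())                # Rows that duplicate an earlier row in every column
df_cleaned = df_bmw.drop_duplicates(subset=['mileage'])
print(df_cleaned.duplicated().sum())            # No full-row duplicates remain
print(df_cleaned.shape)                         # Rows and columns after cleaning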

(10781, 9)
2695
117
0
(8086, 9)
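
The gap between 117 and 2695 above comes from what counts as a duplicate. The sketch below uses a small hypothetical table (`df_cars` is made up for illustration) to contrast full-row deduplication with deduplication on a single column.

```python
import pandas as pd

# Hypothetical rows for illustration only
df_cars = pd.DataFrame({
    'model': ['X1', 'X1', 'X3', 'X1'],
    'mileage': [10000, 10000, 10000, 25000],
    'price': [15000, 15000, 22000, 14000],
})

# A row is a full duplicate only if every column matches an earlier row
full_dupes = int(df_cars.duplicated().sum())

# With subset, only the listed columns are compared
mileage_dupes = int(df_cars.duplicated(subset=['mileage']).sum())

dedup_full = df_cars.drop_duplicates()
dedup_mileage = df_cars.drop_duplicates(subset=['mileage'])

print(f"Full duplicates: {full_dupes}, mileage duplicates: {mileage_dupes}")
print(dedup_full)
print(dedup_mileage)
```

Deduplicating on `mileage` alone discards the X3 even though it is a different car, so `subset` is best reserved for columns that genuinely identify a record.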


In [22]:
# Pandas makes data cleaning incredibly easy!
print("Data Cleaning with Pandas:")

# 1. Remove duplicates
df_clean = df_filled.drop_duplicates().copy()
print(f"1. Removed duplicates: {len(df_filled) - len(df_clean)} rows")

# 2. Fix data types
print(f"\n2. Fix data types:")
print(f"The current data types are:")
print(df_clean.dtypes)
# Convert numeric columns that might be stored as text
numeric_columns = ['year', 'price', 'mileage', 'tax', 'mpg', 'engineSize']
for col in numeric_columns:
    if col in df_clean.columns:
        df_clean[col] = pd.to_numeric(df_clean[col])
        print(f"Converted {col} to numeric")

# 3. Remove impossible values
print(f"\n3. Remove impossible values:")
original_size = len(df_clean)
if 'year' in df_clean.columns:
    df_clean = df_clean[(df_clean['year'] >= 1900) & (df_clean['year'] <= 2025)]  # Same bounds as the quality check above
if 'price' in df_clean.columns:
    df_clean = df_clean[df_clean['price'] > 0]
print(f"   Removed {original_size - len(df_clean)} impossible values")

# 4. Standardize text data
print(f"\n4. Standardize text data:")
text_columns = ['model', 'transmission', 'fuelType']
for col in text_columns:
    if col in df_clean.columns:
        df_clean[col] = df_clean[col].str.strip()  # Remove whitespace
        print(f"   Cleaned {col}")

print(f"\nFinal clean dataset: {len(df_clean)} rows")
print(" Comprehensive cleaning in just a few lines!")

Data Cleaning with Pandas:
1. Removed duplicates: 117 rows

2. Fix data types:
The current data types are:
model            object
year              int64
price             int64
transmission     object
mileage           int64
fuelType         object
tax               int64
mpg             float64
engineSize      float64
dtype: object
Converted year to numeric
Converted price to numeric
Converted mileage to numeric
Converted tax to numeric
Converted mpg to numeric
Converted engineSize to numeric

3. Remove impossible values:
   Removed 0 impossible values

4. Standardize text data:
   Cleaned model
   Cleaned transmission
   Cleaned fuelType

Final clean dataset: 10664 rows
 Comprehensive cleaning in just a few lines!
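
In this dataset the numeric columns were already parsed correctly, so `pd.to_numeric` had nothing to fix. The sketch below shows what happens when a column really does contain text that is not a number; the `raw_years` values are hypothetical.

```python
import pandas as pd

# Hypothetical text column with one value that is not a number
raw_years = pd.Series(['2014', '2018', 'unknown', '2016'])

# Default behaviour (errors='raise'): the bad value stops the conversion
try:
    pd.to_numeric(raw_years)
    strict_failed = False
except ValueError:
    strict_failed = True

# errors='coerce' turns unparseable values into NaN instead
years = pd.to_numeric(raw_years, errors='coerce')

print(f"Strict conversion failed: {strict_failed}")
print(years)
```

After coercion, the bad entries show up as missing values and can be handled with the `dropna` or `fillna` strategies covered earlier.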


### Complex Data Analysis - Finding Patterns

Let's do some complex analysis that would be very difficult with pure Python.

In [23]:
# Let's find cars with specific criteria
# Example: BMW cars from 2015+ with price < 20000

# First let's see what columns we have
print("Available columns:")
print(list(df_bmw.columns))

# Let's look at the first few rows to understand the data better
print(f"\nSample data:")
print(df_bmw.head(3))

Available columns:
['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'tax', 'mpg', 'engineSize']

Sample data:
       model  year  price transmission  mileage fuelType  tax   mpg  \
0   5 Series  2014  11200    Automatic    67068   Diesel  125  57.6   
1   6 Series  2018  27000    Automatic    14827   Petrol  145  42.8   
2   5 Series  2016  16000    Automatic    62794   Diesel  160  51.4   

   engineSize  
0         2.0  
1         2.0  
2         3.0  


In [24]:
# Python way - filtering requires manual column lookup, type conversion, and error handling
# Let's say we want cars with price < 20000 and year >= 2015

filtered_cars_python = []
price_col = None
year_col = None

# First find which columns contain price and year data
with open('bmw.csv', 'r') as file:
    headers = file.readline().strip().split(',')
    for i, header in enumerate(headers):
        if 'price' in header.lower():
            price_col = i
        if 'year' in header.lower():
            year_col = i

# Now filter (assuming we found the columns)
for car in cars_python[:10]:  # Just first 10 for demo
    try:
        # Compare against None: a column index of 0 is valid but falsy
        if price_col is not None and year_col is not None:
            if float(car[price_col]) < 20000 and int(car[year_col]) >= 2015:
                filtered_cars_python.append(car)
    except (ValueError, IndexError):
        continue  # Skip rows with bad data

print(f"Python way - Found {len(filtered_cars_python)} cars (sample only)")
print(" Complex, error-prone, requires exception handling!")

Python way - Found 7 cars (sample only)
 Complex, error-prone, requires exception handling!


In [None]:
# Pandas way - elegant and powerful filtering!
print("Pandas way - Complex filtering:")

# Check if we have price and year columns (adjust based on actual data)
if 'price' in df_bmw.columns and 'year' in df_bmw.columns:
    # Both columns were parsed as int64 by read_csv, so no casting is needed
    filtered_cars_pandas = df_bmw[(df_bmw['price'] < 20000) & (df_bmw['year'] >= 2015)]
    print(f"Found {len(filtered_cars_pandas)} cars")
    print(filtered_cars_pandas.head())
else:
    # Show filtering with whatever columns exist
    print("Available numeric columns for demonstration:")
    numeric_cols = df_bmw.select_dtypes(include=[np.number]).columns
    print(list(numeric_cols))
    
    if len(numeric_cols) > 0:
        col = numeric_cols[0]
        filtered_demo = df_bmw[df_bmw[col] > df_bmw[col].median()]
        print(f"\nExample: Cars where {col} > median:")
        print(f"Found {len(filtered_demo)} cars")

print(" Clean, readable, handles data types automatically!")

Pandas way - Complex filtering:
Found 4145 cars
       model  year  price transmission  mileage fuelType  tax   mpg  \
2   5 Series  2016  16000    Automatic    62794   Diesel  160  51.4   
3   1 Series  2017  12750    Automatic    26676   Diesel  145  72.4   
5   5 Series  2016  14900    Automatic    35309   Diesel  125  60.1   
6   5 Series  2017  16000    Automatic    38538   Diesel  125  60.1   
7   2 Series  2018  16250       Manual    10401   Petrol  145  52.3   

   engineSize  
2         3.0  
3         1.5  
5         2.0  
6         2.0  
7         1.5  
 Clean, readable, handles data types automatically!
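
The same filter can be written in two equivalent styles. The sketch below rebuilds the first five rows from the `df_bmw.head()` output shown earlier (a subset of columns only) and compares a boolean mask with `DataFrame.query`, which takes the condition as a string.

```python
import pandas as pd

# First five rows copied from the df_bmw.head() output shown earlier
df_head = pd.DataFrame({
    'model': ['5 Series', '6 Series', '5 Series', '1 Series', '7 Series'],
    'year': [2014, 2018, 2016, 2017, 2014],
    'price': [11200, 27000, 16000, 12750, 14500],
    'fuelType': ['Diesel', 'Petrol', 'Diesel', 'Diesel', 'Diesel'],
})

# Boolean mask version; .loc also selects the columns to keep
mask = (df_head['price'] < 20000) & (df_head['year'] >= 2015)
by_mask = df_head.loc[mask, ['model', 'year', 'price']]

# Equivalent query() version: the condition is written as a string
by_query = df_head.query("price < 20000 and year >= 2015")[['model', 'year', 'price']]

print(by_mask)
print(f"Same result: {by_mask.equals(by_query)}")
```

Which style to use is largely a matter of taste: masks compose well in code, while `query` strings tend to read more naturally for long conditions.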


### Grouping and Aggregation

Let's group our data by categories and calculate statistics.

In [26]:
# Python way - grouping by category needs a manually managed dictionary of lists
# Let's try to group our students by subject and get average scores

subject_groups_python = {}
for student in students_python:
    subject = student[2]  # Remember: 2 = subject index
    score = int(student[1])  # 1 = score index
    
    if subject not in subject_groups_python:
        subject_groups_python[subject] = []
    subject_groups_python[subject].append(score)

# Calculate averages
print("Python way - Grouping by subject:")
for subject, scores in subject_groups_python.items():
    avg_score = sum(scores) / len(scores)
    print(f"{subject}: {avg_score:.1f}")
    
print(" Requires manual dictionary management and calculations")

Python way - Grouping by subject:
Math: 87.5
Physics: 92.0
Chemistry: 78.0
 Requires manual dictionary management and calculations


In [27]:
# Pandas way - grouping is incredibly simple!
print("Pandas way - Grouping by subject:")
subject_groups_pandas = df_students.groupby('Subject')['Score'].mean()
print(subject_groups_pandas)

# Multiple statistics at once!
print(f"\nComplete statistics by subject:")
print(df_students.groupby('Subject')['Score'].agg(['mean', 'max', 'min', 'count']))

# For BMW data - group by the first categorical column and average the first numeric column
# (here the first numeric column is 'year', so the result is the average model year)
cat_cols = df_bmw.select_dtypes(include=['object']).columns
num_cols = df_bmw.select_dtypes(include=[np.number]).columns

if len(cat_cols) > 0 and len(num_cols) > 0:
    cat_col, numeric_col = cat_cols[0], num_cols[0]
    print(f"\nBMW data - Grouping by {cat_col}:")
    print(df_bmw.groupby(cat_col)[numeric_col].mean().head())

print(" One line of code for powerful grouping and aggregation!")

Pandas way - Grouping by subject:
Subject
Chemistry    78.0
Math         87.5
Physics      92.0
Name: Score, dtype: float64

Complete statistics by subject:
           mean  max  min  count
Subject                         
Chemistry  78.0   78   78      1
Math       87.5   90   85      2
Physics    92.0   92   92      1

BMW data - Grouping by model:
model
1 Series    2016.608939
2 Series    2017.799837
3 Series    2016.680311
4 Series    2017.483417
5 Series    2016.884470
Name: year, dtype: float64
 One line of code for powerful grouping and aggregation!
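
When several statistics are needed, named aggregation gives each result column an explicit name. This is a small sketch on a hypothetical stand-in for `df_students` (the real frame is built earlier in the notebook). The output column names `avg_score`, `top_score`, and `n_students` are chosen here purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_students, using the same column names
students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Score": [85, 92, 78, 90],
    "Subject": ["Math", "Physics", "Chemistry", "Math"],
})

# Named aggregation: new_column=(source_column, aggregation)
subject_summary = students.groupby("Subject").agg(
    avg_score=("Score", "mean"),
    top_score=("Score", "max"),
    n_students=("Score", "count"),
)
print(subject_summary)
```

Compared with passing a list such as `['mean', 'max', 'min', 'count']`, this form lets the result columns describe what they hold.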


### Data Cleaning and Transformation

Pandas makes data cleaning and transformation much easier. As an example, let's add new columns calculated from existing ones.

In [28]:
# Adding new calculated columns
# Let's add a "Grade Category" based on scores

# Python way - adding calculated columns
students_with_grade_python = []
for student in students_python:
    name, score, subject = student[0], int(student[1]), student[2]
    
    if score >= 90:
        grade_category = "Excellent"
    elif score >= 80:
        grade_category = "Good"
    else:
        grade_category = "Average"
        
    students_with_grade_python.append([name, score, subject, grade_category])

print("Python way - Adding calculated column:")
for student in students_with_grade_python[:3]:
    print(f"{student[0]}: {student[3]}")
print(" Requires creating new data structure")

Python way - Adding calculated column:
Alice: Good
Bob: Excellent
Charlie: Average
 Requires creating new data structure


In [29]:
# Pandas way - adding calculated columns is super easy!
df_students_copy = df_students.copy()

# Method 1: apply() with a conditional expression (evaluated once per value)
df_students_copy['Grade_Category'] = df_students_copy['Score'].apply(
    lambda x: 'Excellent' if x >= 90 else 'Good' if x >= 80 else 'Average'
)

# Method 2: numpy.where for a vectorized two-way choice
df_students_copy['Performance'] = np.where(
    df_students_copy['Score'] >= 85, 'High', 'Normal'
)

print("Pandas way - Adding calculated columns:")
print(df_students_copy)
print(" Direct column assignment, multiple methods available!")

Pandas way - Adding calculated columns:
      Name  Score    Subject Grade_Category Performance
0    Alice     85       Math           Good        High
1      Bob     92    Physics      Excellent        High
2  Charlie     78  Chemistry        Average      Normal
3    Diana     90       Math      Excellent        High
 Direct column assignment, multiple methods available!
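
For score bands like the ones above, binning is another option. The sketch below uses `pd.cut` on a hypothetical stand-in frame (`scores`) rather than the notebook's `df_students_copy`. The bin edges are an assumption chosen to mirror the 80 and 90 thresholds used in the `apply` version.

```python
import pandas as pd

# Hypothetical stand-in data with the same scores as the example above
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Score": [85, 92, 78, 90],
})

# Bins: below 80 -> Average, 80 to 89 -> Good, 90 and above -> Excellent.
# right=False makes each bin include its left edge, so 80 is "Good" and 90 is "Excellent".
scores["Grade_Category"] = pd.cut(
    scores["Score"],
    bins=[float("-inf"), 80, 90, float("inf")],
    labels=["Average", "Good", "Excellent"],
    right=False,
)
print(scores)
```

The result is a categorical column, which keeps the labels in their defined order when sorting or grouping.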


### Summary: Why Pandas is Better

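The comparisons above show a consistent pattern: the plain-Python versions need explicit loops, index bookkeeping (`student[1]`, `student[2]`), and manual type conversion, while the Pandas versions express filtering, grouping, and calculated columns as single operations on named columns.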

### Practice Exercises - Try These!

Now practice what you've learned with these exercises. Try each one with both the plain-Python and the Pandas approach!

EXERCISE 1: Data Cleaning Challenge
Using your BMW dataset, try to:
1. Find how many missing values are in each column
2. Remove all rows with missing prices
3. Fill missing years with the most common year
4. Find and remove duplicate entries
5. Calculate average price after cleaning
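
If you get stuck, the steps above can be sketched on a tiny made-up table first. Everything in this example is hypothetical: the `cars` frame and its values are invented for illustration, and only the column names `model`, `year`, and `price` are taken from the BMW dataset shown earlier. Treat it as one possible starting point rather than the expected solution.

```python
import pandas as pd

# Hypothetical toy data with one missing year, one missing price, one duplicate row
cars = pd.DataFrame({
    "model": ["1 Series", "3 Series", "3 Series", "5 Series", "5 Series", "2 Series"],
    "year": [2016, 2017, 2017, None, 2017, 2018],
    "price": [12000, 15000, 15000, 18000, None, 16000],
})

# 1. Missing values per column
missing_per_column = cars.isna().sum()

# 2. Remove rows with a missing price (copy so later assignments are unambiguous)
cleaned = cars.dropna(subset=["price"]).copy()

# 3. Fill missing years with the most common year
most_common_year = cleaned["year"].mode()[0]
cleaned["year"] = cleaned["year"].fillna(most_common_year)

# 4. Remove exact duplicate rows
cleaned = cleaned.drop_duplicates()

# 5. Average price after cleaning
average_price = cleaned["price"].mean()

print(missing_per_column)
print(cleaned)
print(f"Average price: {average_price:.2f}")
```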


EXERCISE 2: Data Analysis Challenge
After cleaning the data, try to:
1. Find the most expensive car
2. Group cars by fuel type and get average price
3. Find cars from 2015+ with mileage < 50000
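
As with the first exercise, it can help to try the three steps on a small made-up table before running them on the full dataset. The `cars` frame below is hypothetical toy data. It borrows the column names `model`, `year`, `price`, `mileage`, and `fuelType` from the BMW output shown earlier, and the variable names are chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data using the BMW dataset's column names
cars = pd.DataFrame({
    "model": ["5 Series", "1 Series", "2 Series", "3 Series"],
    "year": [2016, 2017, 2018, 2014],
    "price": [16000, 12750, 16250, 9000],
    "mileage": [62794, 26676, 10401, 40000],
    "fuelType": ["Diesel", "Diesel", "Petrol", "Petrol"],
})

# 1. Most expensive car: idxmax gives the row label of the highest price
most_expensive = cars.loc[cars["price"].idxmax()]

# 2. Average price per fuel type
avg_price_by_fuel = cars.groupby("fuelType")["price"].mean()

# 3. Cars from 2015 or later with mileage below 50,000
#    (each condition needs its own parentheses when combined with &)
recent_low_mileage = cars[(cars["year"] >= 2015) & (cars["mileage"] < 50000)]

print(most_expensive)
print(avg_price_by_fuel)
print(recent_low_mileage)
```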





### Key Takeaways:
- **Pandas simplifies filtering, grouping, and transforming tabular data**
- **One line of Pandas code often replaces a multi-line plain-Python loop**
- **Built-in methods for statistics, filtering, grouping, and more**
- **Automatic data type handling (no manual `int()` conversions) and optimized column operations**
- **Self-documenting code: column names like `'Score'` instead of index positions like `student[1]`**

### Next Steps:
- Practice with your own CSV files
- Learn about data visualization with Pandas + Matplotlib
- Explore advanced Pandas features like merging, pivoting, and time series
- Try real-world datasets to solidify your understanding
