# Python Basics for Data Science

This notebook walks through core Python features used in data science. Each section gives a short definition, a runnable code example, and a typical real-life use.

## 1. Variables and Data Types
Variables are names bound to values. Python's basic built-in types include `int`, `float`, `str`, and `bool`, and a variable's type is determined by the value assigned to it.

**What's next:** You'll see how to declare variables of different types and check their types.

**Real-life use:** Storing user age, product price, or a name.

In [1]:
# Declaring variables of different types
age = 30  # int
price = 19.99  # float
name = 'Alice'  # str
is_active = True  # bool

# Checking types
print(type(age), type(price), type(name), type(is_active))  
# Output: <class 'int'> <class 'float'> <class 'str'> <class 'bool'>

<class 'int'> <class 'float'> <class 'str'> <class 'bool'>
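
Values often arrive as text (for example from a form or a CSV file) and need converting before use. The following is a minimal sketch of conversion with the built-in `int()`, `float()`, `str()`, and `bool()` constructors plus `isinstance()` for type checks. The variable names and values are made up for illustration.

```python
# Converting between basic types (names and values are illustrative)
age_text = '30'              # str, e.g. read from a form
age_number = int(age_text)   # str -> int
price_text = str(19.99)      # float -> str
ratio = float('0.25')        # str -> float

print(age_number + 1)                # 31
print(price_text + ' USD')           # 19.99 USD
print(isinstance(age_number, int))   # True

# Empty strings and zero are falsy; most other values are truthy
print(bool(''), bool('no'), bool(0), bool(42))  # False True False True
```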


## 2. Basic Operators
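Operators combine variables and values to produce new values, for example `+` for addition, `/` for true division (which always returns a float), and `**` for exponentiation.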

**What's next:** You'll see examples of arithmetic and exponentiation.

**Real-life use:** Calculating total price, discounts, or averages.

In [2]:
# Arithmetic operations
a = 10
b = 3
print('Addition:', a + b)       # Output: Addition: 13
print('Division:', a / b)      # Output: Division: 3.3333333333333335
print('Exponent:', a ** b)     # Output: Exponent: 1000

Addition: 13
Division: 3.3333333333333335
Exponent: 1000
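
Two more arithmetic operators come up often: floor division (`//`) and modulo (`%`). The sketch below also works through the discount and average calculations mentioned as real-life uses. The prices and scores are invented sample values.

```python
# Floor division, modulo, and simple calculations (sample values are made up)
a = 10
b = 3
print('Floor division:', a // b)  # Floor division: 3
print('Remainder:', a % b)        # Remainder: 1

price = 80.0
discount_rate = 0.25
total = price * (1 - discount_rate)
print('Discounted price:', total)  # Discounted price: 60.0

scores = [70, 85, 100]
average = sum(scores) / len(scores)
print('Average:', average)  # Average: 85.0
```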


## 3. Control Flow: if, elif, else
Control flow lets you make decisions in code.

**What's next:** You'll see how to use if-else to branch logic.

**Real-life use:** Checking if a user is old enough to access a service.

In [3]:
# Using if-else for decision making
age = 18
if age >= 18:
    print('Adult')  # This will be executed since age is 18
else:
    print('Minor')  # This will not be executed
    
# Output: Adult

Adult
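
When there are more than two outcomes, `elif` adds further conditions that are checked in order until one matches. A minimal sketch, assuming illustrative age thresholds that are not part of the original example:

```python
# Branching over several ranges with elif (thresholds are illustrative)
visitor_age = 15
if visitor_age < 13:
    category = 'Child'
elif visitor_age < 18:
    category = 'Teen'   # first condition that is true wins
else:
    category = 'Adult'

print(category)  # Teen
```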


## 4. Loops: for and while
Loops repeat actions. A `for` loop iterates over the items of a sequence or other iterable; a `while` loop repeats as long as its condition remains true.

**What's next:** You'll see both for and while loops in action.

**Real-life use:** Processing all rows in a dataset.

In [4]:
# For loop example
for i in range(3):
    print('Iteration', i)
# Output:
# Iteration 0
# Iteration 1
# Iteration 2

Iteration 0
Iteration 1
Iteration 2


In [5]:
# While loop example
count = 0
while count < 3:
    print('Count:', count)
    count += 1
# Output:
# Count: 0
# Count: 1
# Count: 2

Count: 0
Count: 1
Count: 2
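
For the "processing all rows" use case, a `for` loop is often combined with `enumerate()` to get a row index and with `continue` to skip bad rows. This is a small sketch over a made-up list of rows, not a real dataset.

```python
# Looping over rows of a small made-up dataset
rows = [('Alice', 30), ('Bob', None), ('Charlie', 35)]

valid_ages = []
for index, row in enumerate(rows):
    person_age = row[1]
    if person_age is None:
        print('Skipping row', index)  # Skipping row 1
        continue                      # move on to the next row
    valid_ages.append(person_age)

print(valid_ages)  # [30, 35]
```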


## 5. Functions and Lambda Expressions
Functions organize code into reusable blocks. Lambda creates small anonymous functions.

**What's next:** You'll see how to define and use functions and lambdas.

**Real-life use:** Data cleaning, feature engineering.

In [6]:
# Defining and using a function
def square(x):
    return x * x

print(square(5))  # Output: 25

# Lambda (anonymous) function
square_lambda = lambda x: x * x
print(square_lambda(6))  # Output: 36

25
36
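
In practice, lambdas are most useful as short one-off arguments to other functions, such as the `key` parameter of `sorted()`. The sketch below also shows a function with a default argument. The product list and the `apply_discount` helper are hypothetical.

```python
# A lambda passed as a sort key (sample data is made up)
products = [('pen', 1.50), ('notebook', 4.25), ('eraser', 0.75)]

# Sort by price, the second item of each tuple
by_price = sorted(products, key=lambda item: item[1])
print(by_price)  # [('eraser', 0.75), ('pen', 1.5), ('notebook', 4.25)]

# A function with a default argument (hypothetical helper)
def apply_discount(price, rate=0.1):
    return round(price * (1 - rate), 2)

print(apply_discount(100))        # 90.0
print(apply_discount(100, 0.25))  # 75.0
```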


## 6. List, Tuple, and Dictionary Operations
Lists, tuples, and dictionaries store collections of data.

**What's next:** You'll see how to create and access these data structures.

**Real-life use:** Storing columns of data, mapping names to values.

In [7]:
# Creating and accessing lists, tuples, and dictionaries
my_list = [1, 2, 3]          # A mutable ordered collection
my_tuple = (4, 5, 6)        # An immutable ordered collection
my_dict = {'a': 1, 'b': 2}  # A key-value mapping
print(my_list[0], my_tuple[1], my_dict['a'])  # Output: 1 5 1

1 5 1
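
The practical difference between these structures is what you can change after creation. A short sketch with made-up values: lists and dictionaries can be modified in place, while assigning to a tuple element raises `TypeError`.

```python
# Mutability of lists, tuples, and dictionaries (values are made up)
scores = [1, 2, 3]
scores.append(4)        # lists can grow
scores[0] = 10          # and be changed in place
print(scores)           # [10, 2, 3, 4]

point = (4, 5, 6)
try:
    point[0] = 40       # tuples cannot be changed
except TypeError as error:
    print('Tuples are immutable:', error)

prices = {'a': 1, 'b': 2}
prices['c'] = 3                 # add a new key
print(prices.get('z', 0))       # 0, the default for a missing key
print(list(prices))             # ['a', 'b', 'c']
```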


## 7. String Manipulation
Strings are text data. You can slice, format, and search them.

**What's next:** You'll see how to change case and replace text in strings.

**Real-life use:** Parsing CSV files, cleaning text data.

In [8]:
# String operations
text = 'Data Science'
print(text.lower())           # Output: data science
print(text.replace(' ', '_'))  # Output: Data_Science

data science
Data_Science
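
Slicing, searching, and formatting are mentioned above but not shown, so here is a brief sketch of each. The `student` and `score` values are invented for the formatting example.

```python
# Slicing, searching, splitting, and formatting strings
text = 'Data Science'
print(text[:4])              # Data
print(text[-7:])             # Science
print('Science' in text)     # True
print(text.find('Science'))  # 5
print(text.split())          # ['Data', 'Science']

# f-string formatting (sample values)
student, score = 'Alice', 0.91234
print(f'{student} scored {score:.2%}')  # Alice scored 91.23%
```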


## 8. List Comprehensions
List comprehensions create lists in a concise way.

**What's next:** You'll see how to use list comprehensions for transformations.

**Real-life use:** Creating new columns in dataframes.

In [9]:
# List comprehension example
squares = [x**2 for x in range(5)]
print(squares)

[0, 1, 4, 9, 16]
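
A comprehension can also filter items with a trailing `if`, or choose between values with a conditional expression. A minimal sketch with made-up prices and names:

```python
# Filtering and transforming with list comprehensions (sample data)
prices = [12.0, 45.5, 7.25, 99.0]

expensive = [p for p in prices if p > 10]           # keep matching items
print(expensive)  # [12.0, 45.5, 99.0]

labels = ['high' if p > 40 else 'low' for p in prices]  # map each item
print(labels)  # ['low', 'high', 'low', 'high']

names = ['  alice', 'BOB ']
cleaned = [n.strip().title() for n in names]
print(cleaned)  # ['Alice', 'Bob']
```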


## 9. Importing Libraries
Libraries add extra functionality. Use `import` to include them.

**What's next:** You'll see how to import and use standard libraries.

**Real-life use:** Importing pandas, numpy, matplotlib, etc.

In [10]:
# Importing libraries
import math
import random
print(math.sqrt(16))
print(random.randint(1, 10))

4.0
9
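
Imports can also use an alias (`import ... as ...`) or pull in specific names (`from ... import ...`), which is how libraries such as pandas and numpy are conventionally imported. The sketch below sticks to the standard library so it runs without extra installs. It also seeds `random`, since unseeded calls like the one above give a different number on each run.

```python
import random
import statistics as stats      # alias for a shorter name
from math import pi, floor      # import specific names only

values = [2.0, 4.0, 4.0, 5.0, 10.0]
print(stats.mean(values))    # 5.0
print(stats.median(values))  # 4.0
print(floor(pi))             # 3

# Seeding makes pseudo-random results repeatable
random.seed(42)
first = random.randint(1, 10)
random.seed(42)
second = random.randint(1, 10)
print(first == second)  # True
```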


## 10. File Handling
Read and write files with the built-in `open()` function. Opening the file in a `with` block closes it automatically when the block ends.

**What's next:** You'll see how to write to and read from files.

**Real-life use:** Loading datasets, saving results.

In [11]:
# Writing and reading a file
with open('example.txt', 'w') as f:
    f.write('Hello, file!')
with open('example.txt', 'r') as f:
    print(f.read())

Hello, file!
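
Mode `'w'` overwrites an existing file, while `'a'` appends to it. The sketch below shows both, then reads the file back line by line. It works inside a temporary folder so no real files are touched, and the file name `log.txt` is hypothetical.

```python
import os
import tempfile

# Work in a temporary folder that is deleted afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\n')
    with open(path, 'a', encoding='utf-8') as f:   # 'a' appends instead of overwriting
        f.write('second line\n')

    # Iterating over a file object yields one line at a time
    with open(path, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['first line', 'second line']
```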


## 11. Exception Handling
Handle errors gracefully using try-except blocks.

**What's next:** You'll see how to catch and handle exceptions.

**Real-life use:** Handling missing files or invalid user input.

In [12]:
# Exception handling example
try:
    result = 10 / 0
except ZeroDivisionError:
    print('Cannot divide by zero!')

Cannot divide by zero!
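
The same pattern covers the real-life cases named above. This sketch catches `ValueError` for invalid input and `FileNotFoundError` for a missing file. The `parse_quantity` helper is hypothetical, and the file name is assumed not to exist on your machine.

```python
# Invalid user input: int() raises ValueError for non-numeric text
def parse_quantity(raw):
    try:
        return int(raw)
    except ValueError:
        print(f'Could not read {raw!r} as a whole number')
        return None

print(parse_quantity('12'))   # 12
print(parse_quantity('abc'))  # prints a message, then None

# Missing file: open() raises FileNotFoundError (file name assumed absent)
try:
    with open('file_that_does_not_exist.txt') as f:
        content = f.read()
except FileNotFoundError:
    content = ''
    print('File not found, using empty content')
```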



## 12. Working with Dates and Times
Use the `datetime` module to work with dates and times.

**What's next:** You'll see how to get the current date/time and format dates.

**Real-life use:** Timestamping data, calculating durations.

In [13]:
# Working with dates and times
from datetime import datetime, timedelta
now = datetime.now()
print('Current date and time:', now)

# Date operations
one_week_later = now + timedelta(days=7)
print('One week from now:', one_week_later)

# Format dates
formatted_date = now.strftime('%Y-%m-%d %H:%M:%S')
print('Formatted date:', formatted_date)

Current date and time: 2025-05-16 23:15:40.158103
One week from now: 2025-05-23 23:15:40.158103
Formatted date: 2025-05-16 23:15:40
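
For the duration use case, subtracting one `datetime` from another gives a `timedelta`. The sketch below parses two fixed timestamps with `strptime` so the result is reproducible. The timestamps themselves are made up.

```python
from datetime import datetime

# Parse text into datetime objects (sample timestamps)
start = datetime.strptime('2025-05-16 09:00', '%Y-%m-%d %H:%M')
end = datetime.strptime('2025-05-18 17:30', '%Y-%m-%d %H:%M')

duration = end - start                  # a timedelta
print(duration)                         # 2 days, 8:30:00
print(duration.days)                    # 2
print(duration.total_seconds() / 3600)  # 56.5 (hours)
```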


## 13. Sets and Their Operations
Sets are unordered collections of unique elements that support mathematical operations such as union and intersection.

**What's next:** You'll learn how to create sets, add/remove elements, and perform set operations.

**Real-life use:** Finding unique values, comparing groups, efficient membership testing.

In [14]:
# Creating sets
fruits = {'apple', 'banana', 'cherry'}
citrus = {'orange', 'lemon', 'lime', 'cherry'}

In [15]:
# Add and remove elements from a set
fruits.add('mango')
fruits.remove('banana')
print('Updated fruits:', fruits)

Updated fruits: {'cherry', 'apple', 'mango'}


In [16]:
# Set operations: union, intersection, difference, symmetric difference
print('\nUnion:', fruits | citrus)  # All elements from both sets
print('Intersection:', fruits & citrus)  # Common elements
print('Difference (fruits - citrus):', fruits - citrus)  # In fruits but not in citrus
print('Symmetric difference:', fruits ^ citrus)  # In either set but not both


Union: {'orange', 'lemon', 'cherry', 'lime', 'apple', 'mango'}
Intersection: {'cherry'}
Difference (fruits - citrus): {'mango', 'apple'}
Symmetric difference: {'lemon', 'mango', 'lime', 'apple', 'orange'}


In [17]:
# Membership testing in sets
print('\nIs apple in fruits?', 'apple' in fruits)


Is apple in fruits? True
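
A common use of sets is removing duplicates from a list. Because sets are unordered, printed element order can differ between runs, so the sketch sorts before printing. It also shows `dict.fromkeys` as one way to deduplicate while keeping first-seen order. The visitor names are made up.

```python
# Finding unique values in a list (sample data)
visits = ['alice', 'bob', 'alice', 'carol', 'bob']

unique_visitors = set(visits)
print(len(unique_visitors))     # 3
print(sorted(unique_visitors))  # ['alice', 'bob', 'carol']

# dict.fromkeys removes duplicates while keeping first-seen order
ordered_unique = list(dict.fromkeys(visits))
print(ordered_unique)           # ['alice', 'bob', 'carol']

# remove() raises KeyError for a missing element; discard() does not
unique_visitors.discard('dave')
```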


## 14. Advanced Dictionary Techniques
Dictionaries can be used in sophisticated ways for data manipulation.

**What's next:** You'll see dictionary comprehensions, default dictionaries, nested dictionaries, and merging.

**Real-life use:** Feature extraction, counting occurrences, grouping related data.

In [18]:
# Dictionary comprehensions
squares_dict = {x: x**2 for x in range(1, 6)}
print('Square dictionary:', squares_dict)

Square dictionary: {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


In [19]:
# Default dictionaries for automatic default values
from collections import defaultdict
word_count = defaultdict(int)  # Default value is 0 for int
text = "data science is the science of data"
for word in text.split():
    word_count[word] += 1
print('\nWord count:', dict(word_count))


Word count: {'data': 2, 'science': 2, 'is': 1, 'the': 1, 'of': 1}


In [20]:
# Nested dictionaries for structured data
employee_data = {
    'Alice': {'department': 'Data Science', 'salary': 85000},
    'Bob': {'department': 'Engineering', 'salary': 92000}
}
print(f"\nAlice's department: {employee_data['Alice']['department']}")


Alice's department: Data Science


In [21]:
# Dictionary merging (Python 3.9+)
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
merged = dict1 | dict2  # dict2 values override dict1 where keys overlap
print('\nMerged dictionary:', merged)


Merged dictionary: {'a': 1, 'b': 3, 'c': 4}
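
For the grouping and counting use cases, `defaultdict(list)` collects items under a key and `collections.Counter` tallies occurrences directly. A minimal sketch with invented records:

```python
from collections import Counter, defaultdict

# Grouping people by department (sample records)
records = [('Alice', 'Data Science'), ('Bob', 'Engineering'), ('Carol', 'Data Science')]
by_department = defaultdict(list)   # missing keys start as an empty list
for person, department in records:
    by_department[department].append(person)
print(dict(by_department))
# {'Data Science': ['Alice', 'Carol'], 'Engineering': ['Bob']}

# Counter counts occurrences without an explicit loop
words = 'data science is the science of data'.split()
print(Counter(words).most_common(2))  # [('data', 2), ('science', 2)]
```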


## 15. Generators and Iterators
Generators produce values on-demand, saving memory for large datasets.

**What's next:** You'll learn to create generator functions, use generator expressions, and see how a generator keeps memory use low when reading a large file.

**Real-life use:** Processing large files, creating data pipelines.

In [22]:
# Generator function for Fibonacci numbers
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

In [23]:
# Using a generator to get the first 10 Fibonacci numbers
fib_10 = list(fibonacci(10))
print('First 10 Fibonacci numbers:', fib_10)

First 10 Fibonacci numbers: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


In [24]:
# Generator expression for even squares
even_squares = (x**2 for x in range(10) if x % 2 == 0)
print('\nEven squares:')
for square in even_squares:
    print(square, end=' ')


Even squares:
0 4 16 36 64 

In [25]:
# Memory efficiency example: reading a large file line by line
def read_large_file(file_path):
    """A generator that yields lines from a file"""
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Example usage (commented out):
# for line in read_large_file('huge_dataset.csv'):
#     process(line)  # Process one line at a time
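
To make the memory claim concrete, the sketch below compares a list with an equivalent generator expression using `sys.getsizeof`. Note that `getsizeof` reports only the container's own size, not the integers it refers to, so treat the comparison as a rough indication. Exact byte counts vary by Python version and are not shown.

```python
import sys

squares_list = [x * x for x in range(100_000)]   # all values stored at once
squares_gen = (x * x for x in range(100_000))    # values produced on demand

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

first_total = sum(squares_gen)    # consumes the generator
second_total = sum(squares_gen)   # nothing left to iterate
print(first_total == sum(squares_list))  # True
print(second_total)                      # 0
```

The last line shows the trade-off: a generator can be iterated only once, so create a new one if you need a second pass.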

## 16. Regular Expressions
Regex patterns help search, replace, and validate text.

**What's next:** You'll see how to extract, replace, and validate text using regex.

**Real-life use:** Data cleaning, extraction, validation.

In [26]:
import re

In [27]:
# Finding patterns (extracting emails from text)
text = "Contact us at info@example.com or support@company.org"
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', text)
print('Extracted emails:', emails)

Extracted emails: ['info@example.com', 'support@company.org']


In [28]:
# Replacing patterns (censoring emails)
censored = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL REDACTED]', text)
print('\nCensored text:', censored)


Censored text: Contact us at [EMAIL REDACTED] or [EMAIL REDACTED]


In [29]:
# Validation (checking if phone numbers are valid)
def is_valid_phone(phone):
    pattern = r'^(\+\d{1,3})?[-\s]?\(?\d{3}\)?[-\s]?\d{3}[-\s]?\d{4}$'
    return bool(re.match(pattern, phone))

phones = ['+1-555-123-4567', '123-456-7890', '5551234567', '12-34-567']
for phone in phones:
    print(f"'{phone}' is valid: {is_valid_phone(phone)}")

'+1-555-123-4567' is valid: True
'123-456-7890' is valid: True
'5551234567' is valid: True
'12-34-567' is valid: False
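
When extracting several fields at once, named groups make the result easier to read, and `re.fullmatch` is a convenient alternative to `^...$` anchors for validation. The log line and its format below are invented for illustration.

```python
import re

log_line = '2025-05-16 ERROR user=alice code=500'   # made-up log line
pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) user=(?P<user>\w+)'
)

match = pattern.search(log_line)
if match:
    print(match.group('date'))   # 2025-05-16
    print(match.group('level'))  # ERROR
    print(match.groupdict())     # {'date': '2025-05-16', 'level': 'ERROR', 'user': 'alice'}

# fullmatch succeeds only if the whole string matches the pattern
print(bool(re.fullmatch(r'\d{5}', '12345')))   # True
print(bool(re.fullmatch(r'\d{5}', '12345x')))  # False
```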


## 17. Map, Filter, and Reduce Functions
`map`, `filter`, and `reduce` are functional programming tools for transforming, selecting, and aggregating data.

**What's next:** You'll see how to use map, filter, and reduce for batch processing and pipelines.

**Real-life use:** Batch processing, data transformation pipelines.

In [30]:
from functools import reduce

In [31]:
# Map: apply function to each item
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x**2, numbers))
print('Squared numbers:', squares)

Squared numbers: [1, 4, 9, 16, 25]


In [32]:
# Filter: keep items that match condition
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print('Even numbers:', even_numbers)

Even numbers: [2, 4]


In [33]:
# Reduce: accumulate values
sum_result = reduce(lambda x, y: x + y, numbers)
print('Sum of numbers:', sum_result)

Sum of numbers: 15


In [34]:
# Real-world example: data processing pipeline
data = ['  alice  ', 'BOB', 'Charlie  ', '  DAVID']

# Clean, filter and transform data in one pipeline
processed = list(map(
    lambda x: x.capitalize(),
    filter(
        lambda x: len(x.strip()) > 3,
        map(lambda x: x.strip(), data)
    )
))
print('\nProcessed names:', processed)


Processed names: ['Alice', 'Charlie', 'David']
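
For comparison, the same name-cleaning pipeline can be written as a single list comprehension, which many Python programmers find easier to read. The sketch also shows `reduce` with an initial value, which avoids a `TypeError` on an empty list. The variable names are chosen here for illustration.

```python
from functools import reduce

raw_names = ['  alice  ', 'BOB', 'Charlie  ', '  DAVID']

# Strip, keep names longer than 3 characters, then capitalize
processed_names = [n.strip().capitalize() for n in raw_names if len(n.strip()) > 3]
print(processed_names)  # ['Alice', 'Charlie', 'David']

# The third argument is the starting value, returned as-is for an empty input
empty_total = reduce(lambda x, y: x + y, [], 0)
print(empty_total)  # 0

# For plain sums, the built-in sum() is simpler than reduce
print(sum([1, 2, 3, 4, 5]))  # 15
```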


## 18. Context Managers with `with`
Context managers set up a resource on entry to a `with` block and release it on exit, even if the block raises an exception.

**What's next:** You'll see how to use context managers for file operations and timing code.

**Real-life use:** File operations, database connections, API requests.

In [35]:
# File handling with context manager
with open('example.txt', 'w') as file:
    file.write('This file will be closed automatically\n')
    file.write('Even if an error occurs')
print('File is closed after with block:', file.closed)

File is closed after with block: True


In [36]:
# Custom context manager for timing code execution
from contextlib import contextmanager
import time

@contextmanager
def timer():
    """Measure execution time of a code block"""
    start = time.perf_counter()  # Monotonic clock suited to measuring intervals
    try:
        yield  # This is where the with-block's code executes
    finally:
        end = time.perf_counter()
        print(f"Execution took {end - start:.5f} seconds")

# Usage
with timer():
    # Simulate work
    sum(i**2 for i in range(1000000))

Execution took 0.66362 seconds
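
A context manager can also be written as a class with `__enter__` and `__exit__` methods, which is handy when you want to keep the result afterwards. The `Timer` class below is a hypothetical sketch of that approach. The elapsed time depends on your machine, so no output value is shown.

```python
import time

class Timer:
    """Hypothetical class-based timer usable in a with statement"""

    def __enter__(self):
        self.start = time.perf_counter()
        return self   # bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions from the block

with Timer() as t:
    total = sum(i * i for i in range(10_000))

print(f'Elapsed: {t.elapsed:.5f} seconds')
```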


## 19. Object-Oriented Programming
Classes allow you to create custom data types with behavior.

**What's next:** You'll define classes, use inheritance, and see how a class encapsulates data together with the logic that operates on it.

**Real-life use:** Creating custom data structures, encapsulating logic.

In [37]:
# Define a class for a data point
import math

class DataPoint:
    def __init__(self, x, y, label=None):
        self.x = x
        self.y = y
        self.label = label
        self._distance = None  # Cached distance; the leading underscore marks it as internal by convention
    
    def distance_from_origin(self):
        """Calculate distance from (0,0)"""
        if self._distance is None:  # Calculate only once
            self._distance = math.sqrt(self.x**2 + self.y**2)
        return self._distance
    
    def __repr__(self):
        """String representation"""
        if self.label:
            return f"DataPoint({self.x}, {self.y}, '{self.label}')"
        return f"DataPoint({self.x}, {self.y})"

In [38]:
# Class inheritance: LabeledDataPoint extends DataPoint
class LabeledDataPoint(DataPoint):
    def is_outlier(self, threshold=5.0):
        """Check if point is far from origin"""
        return self.distance_from_origin() > threshold

In [39]:
# Using the classes
points = [
    DataPoint(1, 2, 'A'),
    LabeledDataPoint(3, 4, 'B'),
    LabeledDataPoint(10, 10, 'Outlier')
]

for p in points:
    print(f"{p} - Distance: {p.distance_from_origin():.2f}")
    if isinstance(p, LabeledDataPoint):
        print(f"  Is outlier: {p.is_outlier()}")

DataPoint(1, 2, 'A') - Distance: 2.24
DataPoint(3, 4, 'B') - Distance: 5.00
  Is outlier: False
DataPoint(10, 10, 'Outlier') - Distance: 14.14
  Is outlier: True
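
One caveat with caching a value in `_distance` is that it will not update if `x` or `y` change later. A `@property` is one alternative: it looks like an attribute to callers but is recomputed on each access. The `Point` class below is a hypothetical, simplified sketch and not a replacement for `DataPoint`.

```python
import math

class Point:
    """Hypothetical class exposing a computed value as a property"""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def distance(self):
        # Recomputed on every access, so it always reflects x and y
        return math.hypot(self.x, self.y)

p = Point(3, 4)
print(p.distance)  # 5.0

p.x, p.y = 6, 8
print(p.distance)  # 10.0
```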


## 20. Error Handling Best Practices
Proper error handling makes your code more robust.

**What's next:** You'll see specific exception handling and how to create custom exceptions.

**Real-life use:** Making data pipelines resilient, graceful degradation.

In [40]:
# Specific exception handling
def safe_division(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("Error: Division by zero")
        return float('inf')  # Return infinity
    except TypeError as e:
        print(f"Error: {e}")
        return None
    finally:
        print("Division operation attempted")

print(safe_division(10, 2))   # Normal case
print(safe_division(10, 0))   # Division by zero
print(safe_division('10', 2)) # Type error

Division operation attempted
5.0
Error: Division by zero
Division operation attempted
inf
Error: unsupported operand type(s) for /: 'str' and 'int'
Division operation attempted
None


In [41]:
# Custom exceptions for data validation
class DataValidationError(Exception):
    """Exception raised for data validation errors"""
    pass

def process_age(age):
    try:
        age = int(age)
        if age < 0 or age > 120:
            raise DataValidationError(f"Age must be between 0 and 120, got {age}")
        return age
    except ValueError:
        raise DataValidationError(f"Age must be a number, got {age}")

# Testing the validation
test_ages = ['25', '-5', '200', 'thirty']
for age in test_ages:
    try:
        validated_age = process_age(age)
        print(f"Valid age: {validated_age}")
    except DataValidationError as e:
        print(f"Validation error: {e}")

Valid age: 25
Validation error: Age must be between 0 and 120, got -5
Validation error: Age must be between 0 and 120, got 200
Validation error: Age must be a number, got thirty
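
When converting a low-level error into a custom exception, `raise ... from ...` keeps the original error attached as `__cause__`, which helps when debugging a pipeline. A minimal sketch, with a hypothetical `parse_price` function and the exception class redefined so the block runs on its own:

```python
class DataValidationError(Exception):
    """Raised when a value fails validation"""

def parse_price(raw):
    try:
        return float(raw)
    except ValueError as error:
        # 'from error' records the original exception as the cause
        raise DataValidationError(f'Price must be a number, got {raw!r}') from error

try:
    parse_price('ten')
except DataValidationError as e:
    message = str(e)
    cause_name = type(e.__cause__).__name__

print(message)     # Price must be a number, got 'ten'
print(cause_name)  # ValueError
```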


## 21. Working with CSV and JSON
CSV and JSON are common data interchange formats.

**What's next:** You'll see how to write and read CSV and JSON files in Python.

**Real-life use:** Importing datasets, API communication, data storage.

In [42]:
import csv
import json

In [43]:
# Create sample data for CSV and JSON
data = [
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'Boston'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
]

In [44]:
# Writing and reading CSV
with open('people.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'age', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

print("Reading from CSV:")
with open('people.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(f"{row['name']}, {row['age']}, {row['city']}")

Reading from CSV:
Alice, 30, New York
Bob, 25, Boston
Charlie, 35, Chicago


In [45]:
# Writing and reading JSON
with open('people.json', 'w') as jsonfile:
    json.dump(data, jsonfile, indent=4)

print("\nReading from JSON:")
with open('people.json', 'r') as jsonfile:
    loaded_data = json.load(jsonfile)
    for person in loaded_data:
        print(f"{person['name']}, {person['age']}, {person['city']}")


Reading from JSON:
Alice, 30, New York
Bob, 25, Boston
Charlie, 35, Chicago
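
One difference between the two formats is worth seeing directly: JSON preserves numbers, booleans, and lists, while every value read from CSV comes back as a string and must be converted. The sketch below uses in-memory strings (`json.dumps`/`json.loads` and `io.StringIO`) instead of files. The sample record is made up.

```python
import csv
import io
import json

# JSON round trip keeps value types
record = {'name': 'Alice', 'age': 30, 'skills': ['python', 'sql'], 'active': True}
payload = json.dumps(record)      # dict -> JSON text
restored = json.loads(payload)    # JSON text -> dict
print(restored == record)         # True
print(type(restored['age']))      # <class 'int'>

# CSV values are always read as strings
csv_text = 'name,age\nAlice,30\nBob,25\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]['age'], type(rows[0]['age']))  # 30 <class 'str'>

ages = [int(row['age']) for row in rows]     # convert before doing math
print(sum(ages) / len(ages))                 # 27.5
```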


## 22. Unpacking and Multiple Assignment
Unpacking and multiple assignment allow elegant value extraction.

**What's next:** You'll see how to unpack tuples, lists, and dictionaries, and use multiple assignment in real code.

**Real-life use:** Data destructuring, tuple returns, coordinate handling.

In [46]:
# Basic unpacking
coords = (10, 20)
x, y = coords  # Unpack tuple into variables
print(f"Coordinates: x={x}, y={y}")

Coordinates: x=10, y=20


In [47]:
# Multiple assignment
a, b, c = 1, 2, 3
print(f"Values: a={a}, b={b}, c={c}")

Values: a=1, b=2, c=3


In [48]:
# Swapping variables (no temp variable needed)
a, b = b, a
print(f"After swap: a={a}, b={b}")

After swap: a=2, b=1


In [49]:
# Unpacking with * for remaining items
first, *middle, last = [1, 2, 3, 4, 5]
print(f"First: {first}, Middle: {middle}, Last: {last}")

First: 1, Middle: [2, 3, 4], Last: 5


In [50]:
# Unpacking in a for loop
points = [(1, 2), (3, 4), (5, 6)]
for x, y in points:
    print(f"Point: ({x}, {y})")

Point: (1, 2)
Point: (3, 4)
Point: (5, 6)


In [51]:
# Unpacking dictionary items
person = {'name': 'Alice', 'age': 30, 'job': 'Data Scientist'}
for key, value in person.items():
    print(f"{key}: {value}")

name: Alice
age: 30
job: Data Scientist


In [52]:
# Function with multiple return values
def get_statistics(numbers):
    """Return min, max, average of a list"""
    return min(numbers), max(numbers), sum(numbers)/len(numbers)

data = [5, 3, 8, 1, 9, 2]
minimum, maximum, average = get_statistics(data)
print(f"\nStats - Min: {minimum}, Max: {maximum}, Avg: {average:.2f}")


Stats - Min: 1, Max: 9, Avg: 4.67
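
Unpacking also works in function calls: `*` spreads a sequence into positional arguments and `**` spreads a dictionary into keyword arguments. The `**` form can also merge dictionaries on Python versions before 3.9. The `describe` and `add` functions and the settings dictionaries are hypothetical examples.

```python
# ** unpacks a dictionary into keyword arguments
def describe(name, age, job):
    return f'{name} ({age}) works as a {job}'

person = {'name': 'Alice', 'age': 30, 'job': 'Data Scientist'}
print(describe(**person))  # Alice (30) works as a Data Scientist

# * unpacks a sequence into positional arguments
def add(x, y):
    return x + y

point = (3, 4)
print(add(*point))  # 7

# Merging dictionaries with **; later values win on duplicate keys
defaults = {'sep': ',', 'header': True}
overrides = {'sep': ';'}
settings = {**defaults, **overrides}
print(settings)  # {'sep': ';', 'header': True}
```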
