# Python Basics for Data Science

This notebook covers the core Python features used in data science work. Each section gives a short definition, a code example, and a typical real-life use.

## 1. Variables and Data Types
Variables store data. Python supports types like int, float, str, and bool.

**Real-life use:** Storing user age, product price, or a name.

In [1]:
age = 30  # int
price = 19.99  # float
name = 'Alice'  # str
is_active = True  # bool
print(type(age), type(price), type(name), type(is_active))

<class 'int'> <class 'float'> <class 'str'> <class 'bool'>
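
Values read from files or user input usually arrive as strings, so converting between types is a common first step. The following is a minimal sketch; the variable names and raw values are illustrative.

```python
# Hypothetical raw values, as they might arrive from a form or a text file
raw_age = '30'
raw_price = '19.99'

age_value = int(raw_age)        # str -> int
price_value = float(raw_price)  # str -> float
age_label = str(age_value)      # int -> str

print(type(age_value), type(price_value), type(age_label))
print(isinstance(age_value, int), bool(0), bool('text'))
```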


## 2. Basic Operators
Operators perform operations on variables and values.

**Real-life use:** Calculating total price, discounts, or averages.

In [2]:
a = 10
b = 3
print('Addition:', a + b)
print('Division:', a / b)
print('Exponent:', a ** b)

Addition: 13
Division: 3.3333333333333335
Exponent: 1000
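
The real-life use mentioned above (totals, discounts, averages) can be sketched with the same operators. The prices and the 10% discount rate are made-up values for illustration.

```python
# Hypothetical order: unit prices and a 10% discount
prices = [20, 35, 45]
discount_rate = 0.10

subtotal = sum(prices)
total = subtotal * (1 - discount_rate)
average = subtotal / len(prices)     # / always returns a float

print('Total after discount:', round(total, 2))
print('Average price:', round(average, 2))
print('Floor division:', 10 // 3)    # drops the fractional part
print('Remainder:', 10 % 3)          # modulo
```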


## 3. Control Flow: if, elif, else
Control flow lets you make decisions in code.

**Real-life use:** Checking if a user is old enough to access a service.

In [3]:
age = 18
if age >= 18:
    print('Adult')
else:
    print('Minor')

Adult
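
The cell above uses only `if` and `else`. A sketch of the `elif` branch named in the heading follows; the function name and age thresholds are illustrative assumptions.

```python
def age_group(age):
    """Return a coarse age category (thresholds are illustrative)."""
    if age < 13:
        return 'Child'
    elif age < 18:
        return 'Teen'
    elif age < 65:
        return 'Adult'
    else:
        return 'Senior'

for sample in (8, 15, 30, 70):
    print(sample, '->', age_group(sample))
```

Conditions are checked from top to bottom, and only the first branch that matches runs.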


## 4. Loops: for and while
Loops repeat actions. `for` iterates over the items of a sequence or other iterable; `while` repeats as long as a condition stays true.

**Real-life use:** Processing all rows in a dataset.

In [4]:
for i in range(3):
    print('Iteration', i)

count = 0
while count < 3:
    print('Count:', count)
    count += 1

Iteration 0
Iteration 1
Iteration 2
Count: 0
Count: 1
Count: 2
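
Processing rows often means skipping some of them. The sketch below uses `enumerate` to get the row index and `continue` to skip missing values; the `rows` list is made-up data.

```python
# Hypothetical rows, with None standing in for a missing value
rows = [12.5, None, 7.0, 3.5]

running_total = 0.0
skipped = 0
for index, value in enumerate(rows):
    if value is None:
        skipped += 1
        continue          # move on to the next row
    running_total += value
    print(f'Row {index}: running total = {running_total}')

print('Skipped rows:', skipped)
```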


## 5. Functions and Lambda Expressions
Functions organize code into reusable blocks. Lambda creates small anonymous functions.

**Real-life use:** Data cleaning, feature engineering.

In [5]:
def square(x):
    return x * x

print(square(5))

square_lambda = lambda x: x * x
print(square_lambda(6))

25
36
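
A lambda is most useful when passed directly to another function, for example as a sort key. The sketch below also shows default argument values; the records and the `normalize` helper are illustrative.

```python
# Hypothetical (name, score) records
records = [('Alice', 82), ('Bob', 95), ('Charlie', 78)]

# Sort by the second field, highest score first
by_score = sorted(records, key=lambda record: record[1], reverse=True)
print(by_score)

def normalize(value, minimum=0, maximum=100):
    """Scale value to the range 0..1 (default bounds are illustrative)."""
    return (value - minimum) / (maximum - minimum)

print(normalize(82))
print(normalize(5, minimum=0, maximum=10))
```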


## 6. List, Tuple, and Dictionary Operations
Lists, tuples, and dictionaries store collections of data.

**Real-life use:** Storing columns of data, mapping names to values.

In [6]:
my_list = [1, 2, 3]
my_tuple = (4, 5, 6)
my_dict = {'a': 1, 'b': 2}
print(my_list[0], my_tuple[1], my_dict['a'])

1 5 1
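
Beyond indexing, the three types differ in what you can change. A short sketch with made-up values:

```python
numbers = [1, 2, 3]
numbers.append(4)          # lists are mutable
print(numbers[1:3])        # slicing -> [2, 3]

point = (4, 5, 6)
try:
    point[0] = 99          # tuples are immutable
except TypeError as error:
    print('Tuple error:', error)

fruit_prices = {'apple': 0.5, 'banana': 0.25}
fruit_prices['cherry'] = 3.0               # add a key
print(fruit_prices.get('mango', 'n/a'))    # .get avoids a KeyError
print(list(fruit_prices.keys()))
```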


## 7. String Manipulation
Strings are text data. You can slice, format, and search them.

**Real-life use:** Parsing CSV files, cleaning text data.

In [7]:
text = 'Data Science'
print(text.lower())
print(text.replace(' ', '_'))

data science
Data_Science
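
Slicing, searching, and formatting, which the description mentions, can be sketched as follows. The CSV-style line is a made-up example.

```python
text = 'Data Science'

print(text[:4])                 # slice: 'Data'
print(text[-7:])                # slice from the end: 'Science'
print('Science' in text)        # membership search: True
print(text.find('Science'))     # index of first match: 5
print(text.split())             # ['Data', 'Science']

# Hypothetical CSV line, split into fields
name, score = 'Alice,91.5'.split(',')
summary = f'{name} scored {float(score):.1f}'
print(summary)
```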


## 8. List Comprehensions
List comprehensions create lists in a concise way.

**Real-life use:** Creating new columns in dataframes.

In [8]:
squares = [x**2 for x in range(5)]
print(squares)

[0, 1, 4, 9, 16]
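
A comprehension can also filter items or choose between values. The sketch below uses made-up temperature readings.

```python
# Hypothetical readings in Celsius; None marks a missing value
celsius = [21.5, None, 19.0, 25.0]

# Filter and transform in one expression
fahrenheit = [c * 9 / 5 + 32 for c in celsius if c is not None]
print(fahrenheit)

# A conditional expression labels every item instead of dropping some
labels = ['warm' if c is not None and c >= 21 else 'other' for c in celsius]
print(labels)
```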


## 9. Importing Libraries
Libraries add extra functionality. Use `import` to include them.

**Real-life use:** Importing pandas, numpy, matplotlib, etc.

In [9]:
import math
import random
print(math.sqrt(16))
print(random.randint(1, 10))

4.0
9
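
Two other common import forms are aliases and importing specific names. The sketch uses only standard-library modules; third-party libraries follow the same pattern (for example, the conventional `import pandas as pd`).

```python
import statistics as stats        # alias for a shorter name
from math import pi, floor        # import specific names

values = [2, 4, 4, 5]
print(stats.mean(values))         # 3.75
print(stats.median(values))       # 4.0
print(floor(pi))                  # 3
```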


## 10. File Handling
Read and write files using Python's built-in functions.

**Real-life use:** Loading datasets, saving results.

In [10]:
with open('example.txt', 'w') as f:
    f.write('Hello, file!')
with open('example.txt', 'r') as f:
    print(f.read())

Hello, file!
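
Mode `'w'` overwrites a file, while `'a'` appends to it. The sketch below also reads the file line by line; it writes into a temporary directory, and the file name `log.txt` is a made-up example.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')   # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\n')
    with open(path, 'a', encoding='utf-8') as f:   # 'a' appends
        f.write('second line\n')

    with open(path, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]  # iterate line by line

print(lines)
```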


## 11. Exception Handling
Handle errors gracefully using try-except blocks.

**Real-life use:** Handling missing files or invalid user input.

In [11]:
try:
    result = 10 / 0
except ZeroDivisionError:
    print('Cannot divide by zero!')

Cannot divide by zero!


## 12. Working with Dates and Times
Use the `datetime` module to work with dates and times.

**Real-life use:** Timestamping data, calculating durations.

In [None]:
from datetime import datetime, timedelta
now = datetime.now()
print('Current date and time:', now)

# Date operations
one_week_later = now + timedelta(days=7)
print('One week from now:', one_week_later)

# Format dates
formatted_date = now.strftime('%Y-%m-%d %H:%M:%S')
print('Formatted date:', formatted_date)

Current date and time: 2025-05-06 17:20:25.699884
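
Timestamps in datasets usually arrive as text. A minimal sketch of parsing them with `strptime` and computing a duration, using made-up timestamps:

```python
from datetime import datetime

# Hypothetical timestamps in a fixed format
fmt = '%Y-%m-%d %H:%M:%S'
start = datetime.strptime('2025-05-01 09:00:00', fmt)
end = datetime.strptime('2025-05-06 17:30:00', fmt)

duration = end - start                  # subtracting gives a timedelta
hours = duration.total_seconds() / 3600
print('Days:', duration.days)           # 5
print('Hours:', hours)                  # 128.5
print('Weekday:', start.strftime('%A'))
```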


## 13. Sets and Their Operations
Sets are unordered collections of unique elements that support mathematical operations such as union and intersection.

**Real-life use:** Finding unique values, comparing groups, efficient membership testing.

In [None]:
# Creating sets
fruits = {'apple', 'banana', 'cherry'}
citrus = {'orange', 'lemon', 'lime', 'cherry'}

# Add and remove elements
fruits.add('mango')
fruits.remove('banana')
print('Updated fruits:', fruits)

# Set operations
print('\nUnion:', fruits | citrus)  # All elements from both sets
print('Intersection:', fruits & citrus)  # Common elements
print('Difference (fruits - citrus):', fruits - citrus)  # In fruits but not in citrus
print('Symmetric difference:', fruits ^ citrus)  # In either set but not both

# Membership testing (very efficient for large collections)
print('\nIs apple in fruits?', 'apple' in fruits)
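
Finding unique values and comparing groups can be sketched as follows. The user IDs are made-up data.

```python
# Hypothetical user IDs with repeats
visits = [101, 102, 101, 103, 102, 101]
unique_users = set(visits)
print(len(visits), 'visits from', len(unique_users), 'users')

buyers = {101, 103}
print(buyers.issubset(unique_users))    # True
print(unique_users - buyers)            # visited but did not buy: {102}
print(sorted(unique_users))             # sets are unordered; sort for display

unique_users.discard(999)               # unlike remove, no error if absent
```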

## 14. Advanced Dictionary Techniques
Dictionaries support comprehensions, default values, nesting, and merging, which makes them a flexible tool for data manipulation.

**Real-life use:** Feature extraction, counting occurrences, grouping related data.

In [None]:
# Dictionary comprehensions
squares_dict = {x: x**2 for x in range(1, 6)}
print('Square dictionary:', squares_dict)

# Default dictionaries
from collections import defaultdict
word_count = defaultdict(int)  # Default value is 0 for int
text = "data science is the science of data"
for word in text.split():
    word_count[word] += 1
print('\nWord count:', dict(word_count))

# Nested dictionaries
employee_data = {
    'Alice': {'department': 'Data Science', 'salary': 85000},
    'Bob': {'department': 'Engineering', 'salary': 92000}
}
print(f"\nAlice's department: {employee_data['Alice']['department']}")

# Dictionary merging (Python 3.9+)
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
merged = dict1 | dict2  # dict2 values override dict1 where keys overlap
print('\nMerged dictionary:', merged)
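
Grouping and counting, both named in the real-life uses, can be sketched with `defaultdict(list)` and `Counter`. The records are made-up data.

```python
from collections import Counter, defaultdict

# Hypothetical (department, employee) records
records = [('Data Science', 'Alice'), ('Engineering', 'Bob'),
           ('Data Science', 'Carol')]

by_department = defaultdict(list)   # missing keys start as an empty list
for department, employee in records:
    by_department[department].append(employee)
print(dict(by_department))

# Counter covers the word-count case directly
counts = Counter('data science is the science of data'.split())
print(counts.most_common(2))
```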

## 15. Generators and Iterators
Generators produce values on-demand, saving memory for large datasets.

**Real-life use:** Processing large files, creating data pipelines.

In [None]:
# Generator function
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Using a generator
fib_10 = list(fibonacci(10))
print('First 10 Fibonacci numbers:', fib_10)

# Generator expression
even_squares = (x**2 for x in range(10) if x % 2 == 0)
print('\nEven squares:', end=' ')
for value in even_squares:  # 'value' avoids shadowing the earlier square() function
    print(value, end=' ')
print()

# Memory efficiency example
def read_large_file(file_path):
    """A generator that yields lines from a file"""
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Would be used like this:
# for line in read_large_file('huge_dataset.csv'):
#     process(line)  # Process one line at a time
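
A generator computes each value only when it is requested and can be consumed only once. The following sketch shows this; `countdown` is an illustrative helper.

```python
from itertools import islice

def countdown(start):
    """Yield start, start - 1, ..., 1 (illustrative helper)."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
print(next(gen))        # 3
print(next(gen))        # 2
print(list(gen))        # remaining values: [1]
print(list(gen))        # exhausted: []

# islice takes a few items without building the full sequence
first_three = list(islice((n * n for n in range(10**9)), 3))
print(first_three)      # [0, 1, 4]
```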

## 16. Regular Expressions
Regex patterns help search, replace, and validate text.

**Real-life use:** Data cleaning, extraction, validation.

In [None]:
import re

# Finding patterns
text = "Contact us at info@example.com or support@company.org"
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', text)
print('Extracted emails:', emails)

# Replacing patterns
censored = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL REDACTED]', text)
print('\nCensored text:', censored)

# Validation
def is_valid_phone(phone):
    pattern = r'^(\+\d{1,3})?[-\s]?\(?\d{3}\)?[-\s]?\d{3}[-\s]?\d{4}$'
    return bool(re.match(pattern, phone))

phones = ['+1-555-123-4567', '123-456-7890', '5551234567', '12-34-567']
for phone in phones:
    print(f"'{phone}' is valid: {is_valid_phone(phone)}")
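
For extraction, named groups give each captured part a label. The sketch below assumes a made-up log line format of `<date> <LEVEL> <message>`.

```python
import re

# Hypothetical log line
line = '2025-05-06 ERROR Disk usage at 91%'
log_pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.+)'
)

match = log_pattern.fullmatch(line)
if match:
    print(match.group('date'))
    print(match.group('level'))
    print(match.groupdict())

# With one capture group, findall returns only the captured part
percentages = re.findall(r'(\d+)%', line)
print(percentages)      # ['91']
```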

## 17. Map, Filter, and Reduce Functions
`map`, `filter`, and `reduce` are functional programming tools for transforming, selecting, and aggregating data.

**Real-life use:** Batch processing, data transformation pipelines.

In [None]:
from functools import reduce

# Map: apply function to each item
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x**2, numbers))
print('Squared numbers:', squares)

# Filter: keep items that match condition
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print('Even numbers:', even_numbers)

# Reduce: accumulate values
sum_result = reduce(lambda x, y: x + y, numbers)
print('Sum of numbers:', sum_result)

# Real-world example: data processing pipeline
data = ['  alice  ', 'BOB', 'Charlie  ', '  DAVID']

# Clean, filter and transform data in one pipeline
processed = list(map(
    lambda x: x.capitalize(),
    filter(
        lambda x: len(x) > 3,  # items were already stripped by the inner map
        map(lambda x: x.strip(), data)
    )
))
print('\nProcessed names:', processed)
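
The same transformations are often written as comprehensions, which many Python programmers find easier to read. A sketch of equivalent forms, reusing the values from the cell above:

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4, 5]

squares = [x**2 for x in numbers]                   # instead of map
even_numbers = [x for x in numbers if x % 2 == 0]   # instead of filter
total = sum(numbers)                                # instead of reduce with +
product = reduce(operator.mul, numbers, 1)          # reduce still fits here

data = ['  alice  ', 'BOB', 'Charlie  ', '  DAVID']
processed = [name.strip().capitalize() for name in data
             if len(name.strip()) > 3]

print(squares, even_numbers, total, product)
print(processed)
```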

## 18. Context Managers with `with`
Context managers set up a resource and guarantee its cleanup, even when the code inside the `with` block raises an exception.

**Real-life use:** File operations, database connections, API requests.

In [None]:
# File handling with context manager
with open('example.txt', 'w') as file:
    file.write('This file will be closed automatically\n')
    file.write('Even if an error occurs')
print('File is closed after with block:', file.closed)

# Custom context manager
from contextlib import contextmanager
import time

@contextmanager
def timer():
    """Measure execution time of a code block"""
    start = time.perf_counter()  # monotonic clock intended for measuring intervals
    try:
        yield  # This is where the with-block's code executes
    finally:
        end = time.perf_counter()
        print(f"Execution took {end - start:.5f} seconds")

# Usage
with timer():
    # Simulate work
    sum(i**2 for i in range(1000000))
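
A context manager can also be written as a class with `__enter__` and `__exit__` methods. The `Timer` class below is an illustrative sketch, not part of any library.

```python
import time

class Timer:
    """Class-based timing context manager (illustrative)."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self                      # bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False                     # do not suppress exceptions

with Timer() as t:
    squares_total = sum(i * i for i in range(100_000))

print(f'Elapsed: {t.elapsed:.5f} seconds')
```

Returning the instance from `__enter__` makes the measured time available after the block ends.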

## 19. Object-Oriented Programming
Classes allow you to create custom data types with behavior.

**Real-life use:** Creating custom data structures, encapsulating logic.

In [None]:
import math

# Define a class
class DataPoint:
    def __init__(self, x, y, label=None):
        self.x = x
        self.y = y
        self.label = label
        self._distance = None  # Cached result; the leading underscore marks it as internal

    def distance_from_origin(self):
        """Calculate distance from (0,0)"""
        if self._distance is None:  # Calculate only once
            self._distance = math.sqrt(self.x**2 + self.y**2)
        return self._distance
    
    def __repr__(self):
        """String representation"""
        if self.label:
            return f"DataPoint({self.x}, {self.y}, '{self.label}')"
        return f"DataPoint({self.x}, {self.y})"

# Class inheritance
class LabeledDataPoint(DataPoint):
    def is_outlier(self, threshold=5.0):
        """Check if point is far from origin"""
        return self.distance_from_origin() > threshold

# Using the classes
points = [
    DataPoint(1, 2, 'A'),
    LabeledDataPoint(3, 4, 'B'),
    LabeledDataPoint(10, 10, 'Outlier')
]

for p in points:
    print(f"{p} - Distance: {p.distance_from_origin():.2f}")
    if isinstance(p, LabeledDataPoint):
        print(f"  Is outlier: {p.is_outlier()}")
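
For classes that mainly hold data, the standard library's `dataclasses` module generates `__init__`, `__repr__`, and `__eq__` automatically. The `Point` class below is a hypothetical sketch; its `distance` property is recomputed on each access, so it stays correct when `x` or `y` changes.

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    """Hypothetical 2-D point."""
    x: float
    y: float

    @property
    def distance(self):
        """Distance from (0, 0), recomputed on each access."""
        return math.hypot(self.x, self.y)

p = Point(3, 4)
print(p)                     # Point(x=3, y=4)
print(p.distance)            # 5.0
print(p == Point(3, 4))      # True
p.x = 6
print(p.distance)            # reflects the new x
```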

## 20. Error Handling Best Practices
Proper error handling makes your code more robust.

**Real-life use:** Making data pipelines resilient, graceful degradation.

In [None]:
# Specific exception handling
def safe_division(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("Error: Division by zero")
        return float('inf')  # Return infinity
    except TypeError as e:
        print(f"Error: {e}")
        return None
    finally:
        print("Division operation attempted")

print(safe_division(10, 2))   # Normal case
print(safe_division(10, 0))   # Division by zero
print(safe_division('10', 2)) # Type error

# Custom exceptions
class DataValidationError(Exception):
    """Exception raised for data validation errors"""
    pass

def process_age(age):
    try:
        age = int(age)
        if age < 0 or age > 120:
            raise DataValidationError(f"Age must be between 0 and 120, got {age}")
        return age
    except (TypeError, ValueError) as exc:  # int() raises TypeError for None
        raise DataValidationError(f"Age must be a number, got {age}") from exc

# Testing the validation
test_ages = ['25', '-5', '200', 'thirty']
for age in test_ages:
    try:
        validated_age = process_age(age)
        print(f"Valid age: {validated_age}")
    except DataValidationError as e:
        print(f"Validation error: {e}")
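
Two further tools are the `else` clause, which runs only when no exception occurred, and `raise ... from ...`, which keeps the original error attached to the new one. The `parse_ratio` helper is an illustrative sketch.

```python
def parse_ratio(text):
    """Parse 'a/b' into a float (illustrative helper)."""
    try:
        numerator, denominator = text.split('/')
        value = float(numerator) / float(denominator)
    except ValueError as error:
        # 'from error' stores the original exception as __cause__
        raise RuntimeError(f'Bad ratio: {text!r}') from error
    except ZeroDivisionError:
        return None
    else:
        return value      # runs only when no exception occurred

print(parse_ratio('3/4'))     # 0.75
print(parse_ratio('1/0'))     # None
try:
    parse_ratio('abc')
except RuntimeError as error:
    print(error, '| caused by', type(error.__cause__).__name__)
```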

## 21. Working with CSV and JSON
CSV and JSON are common data interchange formats.

**Real-life use:** Importing datasets, API communication, data storage.

In [None]:
import csv
import json

# Create sample data
data = [
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'Boston'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
]

# Writing and reading CSV
with open('people.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'age', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

print("Reading from CSV:")
with open('people.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(f"{row['name']}, {row['age']}, {row['city']}")

# Writing and reading JSON
with open('people.json', 'w') as jsonfile:
    json.dump(data, jsonfile, indent=4)

print("\nReading from JSON:")
with open('people.json', 'r') as jsonfile:
    loaded_data = json.load(jsonfile)
    for person in loaded_data:
        print(f"{person['name']}, {person['age']}, {person['city']}")
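
One difference matters when loading data: the `csv` module returns every field as a string, while `json` preserves numbers. A sketch with a made-up in-memory CSV:

```python
import csv
import io
import json

csv_text = 'name,age\nAlice,30\nBob,25\n'   # hypothetical in-memory CSV

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(rows[0]['age']))          # <class 'str'>

# Convert explicitly before doing arithmetic
ages = [int(row['age']) for row in rows]
print(sum(ages) / len(ages))         # 27.5

# JSON keeps numbers as numbers
people = json.loads(json.dumps([{'name': 'Alice', 'age': 30}]))
print(type(people[0]['age']))        # <class 'int'>
```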

## 22. Unpacking and Multiple Assignment
Unpacking assigns the items of a sequence to several variables in one statement, which avoids manual indexing.

**Real-life use:** Data destructuring, tuple returns, coordinate handling.

In [None]:
# Basic unpacking
coords = (10, 20)
x, y = coords  # Unpack tuple into variables
print(f"Coordinates: x={x}, y={y}")

# Multiple assignment
a, b, c = 1, 2, 3
print(f"Values: a={a}, b={b}, c={c}")

# Swapping variables (no temp variable needed)
a, b = b, a
print(f"After swap: a={a}, b={b}")

# Unpacking with * for remaining items
first, *middle, last = [1, 2, 3, 4, 5]
print(f"First: {first}, Middle: {middle}, Last: {last}")

# Unpacking in a for loop
points = [(1, 2), (3, 4), (5, 6)]
for x, y in points:
    print(f"Point: ({x}, {y})")

# Unpacking dictionary items
person = {'name': 'Alice', 'age': 30, 'job': 'Data Scientist'}
for key, value in person.items():
    print(f"{key}: {value}")

# Function with multiple return values
def get_statistics(numbers):
    """Return min, max, average of a list"""
    return min(numbers), max(numbers), sum(numbers)/len(numbers)

data = [5, 3, 8, 1, 9, 2]
minimum, maximum, average = get_statistics(data)
print(f"\nStats - Min: {minimum}, Max: {maximum}, Avg: {average:.2f}")
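
Unpacking also works in function calls and with `zip`. The `describe` function and the sample values below are illustrative.

```python
def describe(name, age, city='unknown'):
    """Format a short description (illustrative helper)."""
    return f'{name} ({age}) from {city}'

args = ('Alice', 30)
kwargs = {'city': 'New York'}
description = describe(*args, **kwargs)   # * unpacks positions, ** unpacks keywords
print(description)

names = ['Alice', 'Bob']
ages = [30, 25]
pairs = list(zip(names, ages))                 # [('Alice', 30), ('Bob', 25)]
unzipped_names, unzipped_ages = zip(*pairs)    # zip(*...) reverses the pairing
print(pairs, unzipped_names, unzipped_ages)

merged = {**{'a': 1}, **{'b': 2}}              # dictionary unpacking into a new dict
print(merged)
```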