# 17 - Modules and Packages

## Introduction

Modules and packages let you organize code into reusable units. You will rely on them constantly in data engineering, because libraries such as PySpark and pandas are themselves delivered as packages.

## What You'll Learn

- Importing built-in modules
- Using module functions
- Creating your own modules
- Importing specific functions
- Understanding packages


## What are Modules?

A module is a file containing Python code (functions, variables, classes). Python ships with a standard library of ready-made modules (often loosely called built-in modules), such as `math`, and you can also create your own.


In [1]:
# Import a module
import math

# Use functions from the module
print("Square root of 16:", math.sqrt(16))
print("Value of pi:", math.pi)
# math.pow always returns a float; the ** operator (2 ** 3) would return the int 8
print("Power of 2^3:", math.pow(2, 3))


Square root of 16: 4.0
Value of pi: 3.141592653589793
Power of 2^3: 8.0
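
One way to see that a module is simply an object holding names is to inspect it. The sketch below uses only the standard library; the variable name `public_names` is illustrative.

```python
import math
import sys

# A module is an ordinary object; its functions and constants are attributes.
print(type(math).__name__)   # module
print(math.__name__)         # math

# dir() lists the names a module defines; filter out the private ones.
public_names = [name for name in dir(math) if not name.startswith("_")]
print("sqrt" in public_names, "pi" in public_names)

# Imported modules are cached in sys.modules, so importing the same
# module again reuses the existing object instead of reloading the file.
print(sys.modules["math"] is math)
```

Because of that cache, importing `math` in several places of the same program does not run the module's code more than once.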


## Importing Specific Functions

You can import only the functions you need from a module.


In [2]:
# Import specific functions
from math import sqrt, pi

# Now you can use them directly without the module name
print("Square root of 25:", sqrt(25))
print("Value of pi:", pi)


Square root of 25: 5.0
Value of pi: 3.141592653589793
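
Names imported with `from ... import ...` land directly in your own namespace, so they can replace a name that already exists there. The following sketch compares the built-in `pow` with `math.pow` to show why that can matter; it keeps the `math.` prefix so that both stay available.

```python
import math

# The built-in pow keeps integers as integers.
builtin_result = pow(2, 3)

# math.pow always converts its arguments to float.
math_result = math.pow(2, 3)

print(builtin_result, type(builtin_result).__name__)  # 8 int
print(math_result, type(math_result).__name__)        # 8.0 float

# Writing `from math import pow` would rebind the name `pow` in this file
# to math.pow and silently hide the built-in version.
```

A reasonable rule of thumb is to import specific names when they are unambiguous (`sqrt`, `pi`) and to keep the module prefix when a name could collide with something else.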


## Common Built-in Modules

Python has many useful built-in modules. Let's explore a few important ones.


In [3]:
# Imports go at the top of the cell, before the code that uses them
import os
import random
from datetime import datetime

# Random module - for generating random numbers
print("Random number (1-10):", random.randint(1, 10))
print("Random choice:", random.choice(["apple", "banana", "orange"]))

# OS module - for operating system interactions
print("Current directory:", os.getcwd())

# Datetime module - for working with dates and times
print("Current date and time:", datetime.now())


Random number (1-10): 6
Random choice: banana
Current directory: /Users/rohityadav/ry_workspace/dev_de_tr/06 Python
Current date and time: 2025-12-26 00:29:50.747892
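
The random values, directory, and timestamp above will differ on every run and every machine. As a minimal sketch of how the same three modules can produce repeatable results, the example below uses a seeded generator, a relative path, and a fixed date. The file name `sales.csv` and the seed `42` are arbitrary choices for illustration.

```python
import os
import random
from datetime import datetime

# Two generators created with the same seed produce the same sequence.
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.randint(1, 10) for _ in range(5)]
draws_b = [rng_b.randint(1, 10) for _ in range(5)]
same_draws = draws_a == draws_b
print("Same seed, same draws:", same_draws)

# os.path.join builds a path with the separator of the current OS.
data_path = os.path.join("data", "raw", "sales.csv")  # hypothetical file
print("Joined path:", data_path)

# strftime turns a datetime into text in the format you choose.
fixed = datetime(2025, 1, 15, 9, 30)
stamp = fixed.strftime("%Y-%m-%d %H:%M")
print("Formatted:", stamp)  # 2025-01-15 09:30
```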


## Creating Your Own Module

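You can create your own module by saving Python code in a `.py` file. The file name without the `.py` extension becomes the module name, and the file must sit in a directory that Python searches, such as the current working directory.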


In [4]:
# First, let's create a simple module file
# (In a real scenario, you'd create a separate .py file)

# Create a file called my_utils.py with this content:
# def add(a, b):
#     return a + b
#
# def multiply(a, b):
#     return a * b
#
# PI = 3.14159

# Then you can import it like this:
# import my_utils
# result = my_utils.add(5, 3)
# print(result)

# For demonstration, we'll define functions here
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

print("Using our functions:")
print("Add:", add(5, 3))
print("Multiply:", multiply(5, 3))


Using our functions:
Add: 8
Multiply: 15
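
To try a real import without leaving the notebook, one option is to write the file from Python and then import it. The sketch below writes the `my_utils.py` content described in the comments above into a temporary directory and adds that directory to `sys.path`, the list of locations Python searches for modules. The temporary directory is only a convenience for the demonstration; in a project you would normally keep the file next to your script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code for the module, matching the my_utils.py outline.
module_source = '''
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

PI = 3.14159
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the module file into a throwaway directory.
    Path(tmp_dir, "my_utils.py").write_text(module_source)

    # Python looks for modules in the directories listed in sys.path.
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        import my_utils

        total = my_utils.add(5, 3)
        product = my_utils.multiply(5, 3)
        print("Add:", total)          # Add: 8
        print("Multiply:", product)   # Multiply: 15
        print("PI:", my_utils.PI)     # PI: 3.14159
    finally:
        # Restore sys.path so the demo leaves no trace.
        sys.path.remove(tmp_dir)
```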


## Using Aliases

You can give a module a shorter name with the `as` keyword. This is common practice for frequently used libraries.


In [5]:
# Import with alias
import math as m

print("Square root using alias:", m.sqrt(16))

# This is very common with data science libraries:
# import pandas as pd
# import numpy as np
# from pyspark.sql import functions as F


Square root using alias: 4.0
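
Aliases also work for individual names brought in with `from ... import ...`. The short sketch below uses the illustrative aliases `dt` and `square_root` to show that an alias is just a second name for the same object.

```python
import math
from datetime import datetime as dt
from math import sqrt as square_root

# The alias and the original name refer to the same function object.
print(square_root is math.sqrt)   # True
print(square_root(16))            # 4.0

# The alias behaves exactly like the original class.
year = dt(2025, 12, 26).year
print(year)                       # 2025
```

Short, widely recognized aliases such as `pd` and `np` tend to help readability; invented one-off aliases usually do not.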


## What are Packages?

A package is a directory of related modules, traditionally marked by an `__init__.py` file. You reach a module inside a package with a dotted name, such as `pyspark.sql.functions`.


In [6]:
# Packages are imported similarly to modules
# For example:
# from pyspark.sql import SparkSession
# from pyspark.sql.functions import col, lit

# Third-party packages must be installed before they can be imported.
# Run this in a terminal (not inside Python):
#   pip install pandas
# Then, in Python:
#   import pandas as pd

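# Prefer an alias over `import *`: the wildcard form would overwrite
# Python built-ins such as sum, min, and max with PySpark's column functions
print("In PySpark, you'll use:")
print("from pyspark.sql import SparkSession")
print("from pyspark.sql import functions as F")


In PySpark, you'll use:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F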
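
To make the directory structure concrete, here is a minimal sketch that builds a tiny package on disk and imports from it. The package name `etl_tools`, the module `cleaning`, and the function `strip_spaces` are hypothetical names chosen for this illustration, and the temporary directory stands in for a project folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout:
    #   etl_tools/
    #       __init__.py
    #       cleaning.py
    package_dir = Path(tmp_dir, "etl_tools")
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "cleaning.py").write_text(
        "def strip_spaces(text):\n"
        "    return text.strip()\n"
    )

    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        # The dotted name follows the directory tree: package.module
        from etl_tools.cleaning import strip_spaces

        cleaned = strip_spaces("  hello  ")
        print(repr(cleaned))  # 'hello'
    finally:
        # Restore sys.path so the demo leaves no trace.
        sys.path.remove(tmp_dir)
```

The import `from pyspark.sql import SparkSession` follows the same pattern: `pyspark` is a package and `sql` is a subpackage inside it.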


## Key Points to Remember

- Modules are Python files containing reusable code
- Use `import module_name` to import entire modules
- Use `from module import function` to import specific functions
- Use aliases (`as`) for commonly used modules
- Packages are collections of modules
- PySpark, pandas, and numpy are examples of packages you'll use in data engineering
- Place imports at the top of your file, as the PEP 8 style guide recommends
