# Python Data Serialization and I/O
This notebook covers data serialization and input/output (I/O) in Python, including pickle, CSV, XML, Excel, images, and PDFs, with real-life use cases, best practices, and code examples.

## 1. Pickle (Object Serialization)
**Definition:** The `pickle` module serializes Python objects to a byte stream and deserializes them back into objects. It is useful for saving models or data structures between runs.

**Syntax and Example:**

In [None]:
import pickle
data = {'a': 1, 'b': 2}
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)
with open('data.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(loaded)

**Output:**
{'a': 1, 'b': 2}

**Real-life use case:** Saving trained machine learning models for later use.

**Common mistakes:** Loading pickle files from untrusted sources. Unpickling can execute arbitrary code, so never unpickle data you do not trust.
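To see why unpickling untrusted data is risky, the sketch below uses a hypothetical `Payload` class (not part of any library) whose `__reduce__` method tells pickle to call a function on load. The call here is a harmless `sorted`, but an attacker could name any importable callable.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # Instructs pickle to "rebuild" the object by calling sorted([3, 1, 2])
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs sorted(...) instead of restoring a Payload
print(type(result).__name__, result)
```

The loaded value is whatever the named callable returns, not a `Payload` instance, which shows that the byte stream controls what code runs during loading.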

**Best practices:** Use pickle only for trusted data, and consider a language-neutral format such as JSON when the data must be read by programs written in other languages.
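As one possible alternative for simple data, the standard-library `json` module writes a text format that other languages can read. This is a minimal sketch; the file name `data.json` and the temporary directory are assumptions made for illustration, and JSON only handles basic types (dicts, lists, strings, numbers, booleans, `None`).

```python
import json
import os
import tempfile

data = {'a': 1, 'b': 2}

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)
    with open(path, 'r', encoding='utf-8') as f:
        loaded = json.load(f)

print(loaded)
```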

## 2. CSV Files
**Definition:** CSV (Comma-Separated Values) is a common format for tabular data.

**Syntax and Example:**

In [None]:
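import csv
# newline='' lets the csv module control line endings; encoding is explicit
with open('example.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'age'])
    writer.writerow(['Alice', 30])
    writer.writerow(['Bob', 25])
with open('example.csv', 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)  # every value is read back as a string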

**Output:**
['name', 'age']
['Alice', '30']
['Bob', '25']

**Real-life use case:** Importing/exporting data between Excel and Python.

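**Common mistakes:** Omitting `newline=''` when opening the file (on Windows this can produce blank rows between records), relying on the platform's default encoding, and forgetting that values are read back as strings.

**Best practices:** Always pass `newline=''` and an explicit `encoding` (for example `'utf-8'`) to `open()` when working with CSV files.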
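The practice above can be sketched with `csv.DictWriter` and `csv.DictReader`, which map rows to dictionaries keyed by column name. The file name `people.csv`, the sample rows, and the temporary directory are assumptions made for illustration.

```python
import csv
import os
import tempfile

rows = [{'name': 'Zoë', 'age': 30}, {'name': 'Bob', 'age': 25}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'people.csv')
    # newline='' and an explicit encoding on both write and read
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'age'])
        writer.writeheader()
        writer.writerows(rows)
    with open(path, 'r', newline='', encoding='utf-8') as f:
        loaded = [dict(row) for row in csv.DictReader(f)]

print(loaded)
```

The non-ASCII name survives the round trip because both `open()` calls use the same encoding. The `age` values come back as strings, so convert them with `int()` if numbers are needed.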

## 3. XML Files
**Definition:** XML (eXtensible Markup Language) is used for structured data exchange.

**Syntax and Example:**

In [None]:
import xml.etree.ElementTree as ET
xml_data = '<root><item>A</item><item>B</item></root>'
root = ET.fromstring(xml_data)
for item in root.findall('item'):
    print(item.text)

**Output:**
A
B

**Real-life use case:** Reading configuration files or data from web services.

**Common mistakes:** Not handling namespaces or large files efficiently.
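The namespace pitfall can be illustrated with a short sketch. The namespace URI `http://example.com/ns` and the prefix `x` are made up for this example; the point is that a plain tag name no longer matches once elements are namespaced.

```python
import xml.etree.ElementTree as ET

xml_data = (
    '<root xmlns:x="http://example.com/ns">'
    '<x:item>A</x:item><x:item>B</x:item>'
    '</root>'
)
root = ET.fromstring(xml_data)

# A bare tag name finds nothing because the real tag includes the namespace URI
missing = root.findall('item')

# Pass a prefix-to-URI mapping so the prefixed path resolves correctly
ns = {'x': 'http://example.com/ns'}
items = [item.text for item in root.findall('x:item', ns)]

print(missing, items)
```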

**Best practices:** Use `ET.iterparse` for large XML files so the whole document does not have to be held in memory at once.
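A minimal sketch of incremental parsing with `ET.iterparse` follows. It reads from an in-memory `io.BytesIO` buffer for convenience; with a real large file you would pass the file path or an open binary file instead.

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b'<root><item>A</item><item>B</item></root>'

texts = []
# 'end' events fire once an element has been completely parsed
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=('end',)):
    if elem.tag == 'item':
        texts.append(elem.text)
        elem.clear()  # drop the element's contents to keep memory use low

print(texts)
```

Calling `elem.clear()` after processing each element is what keeps memory bounded; without it the full tree is still built up as parsing proceeds.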

## 4. Excel Files
**Definition:** Excel files are widely used for data storage and analysis. Use `pandas` for easy reading/writing.

**Syntax and Example:**

In [None]:
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25]})
df.to_excel('example.xlsx', index=False)
df2 = pd.read_excel('example.xlsx')
print(df2)

**Output:**
    name  age
0  Alice   30
1    Bob   25

**Real-life use case:** Automating report generation for business analytics.

**Common mistakes:** Not installing the engine library pandas needs: `openpyxl` for `.xlsx` files, or `xlrd` for legacy `.xls` files.

**Best practices:** Pass `index=False` to `to_excel` unless you want the DataFrame index saved as a column.

## 5. Images
**Definition:** Use the Pillow library (imported as `PIL`) to create, open, and save images.

**Syntax and Example:**

In [None]:
from PIL import Image
img = Image.new('RGB', (60, 30), color='red')
img.save('pil_red.png')
# Open with a context manager so the underlying file is closed afterwards
with Image.open('pil_red.png') as img2:
    print(img2.size)

**Output:**
(60, 30)

**Real-life use case:** Generating thumbnails or processing images for machine learning.

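**Common mistakes:** Not closing image files after opening them. `Image.open()` is lazy and keeps the file open until the image data is loaded or the image is closed.

**Best practices:** Open images with a `with` statement, or call `close()` explicitly when you are done.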

## 6. PDFs
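**Definition:** Use the `PyPDF2` library to read PDF files. Note that PyPDF2 is no longer developed under that name; the project continues as `pypdf`, which provides the same `PdfReader` class.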

**Syntax and Example:**

In [None]:
import PyPDF2
with open('example.pdf', 'rb') as f:
    reader = PyPDF2.PdfReader(f)
    page = reader.pages[0]
    print(page.extract_text())
