# 02 - Reading and Writing Data

## Introduction

One of the most common tasks in data engineering is reading data from files and writing processed data back to files. The pandas library provides built-in reader and writer functions for many file formats, including CSV, Excel, and JSON.

## What You'll Learn

- Reading CSV files
- Writing CSV files
- Reading Excel files
- Writing Excel files
- Reading JSON files
- Writing JSON files
- Handling different file encodings


## Reading CSV Files

CSV (Comma-Separated Values) is one of the most common formats for tabular data. Use `pd.read_csv()` to load a CSV file into a DataFrame.


In [7]:
import pandas as pd

# First, let's create a sample CSV file
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'],
    'Salary': [50000, 60000, 70000, 55000, 65000]
}
df = pd.DataFrame(data)
df.to_csv('sample_data.csv', index=False)
print("Sample CSV file created!")


Sample CSV file created!


In [8]:
# Read the CSV file
df = pd.read_csv('sample_data.csv')
print("Data read from CSV:")
print(df)
print(f"\nShape: {df.shape}")


Data read from CSV:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Tokyo   70000
3    Diana   28     Paris   55000
4      Eve   32    Sydney   65000

Shape: (5, 4)


In [9]:
# Reading CSV with specific parameters
# Common parameters:
# - sep: delimiter (default is ',')
# - header: row number(s) to use as column names (default 'infer', which uses the first row)
# - nrows: number of rows to read
# - usecols: subset of columns to read

df = pd.read_csv('sample_data.csv', nrows=3)  # Read only first 3 rows
print("First 3 rows:")
print(df)


First 3 rows:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Tokyo   70000
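The cell above only exercises `nrows`. The sketch below, which also touches on file encodings, writes a hypothetical semicolon-delimited file saved as Latin-1 (the filename `latin1_data.csv` and its contents are made up for illustration). It then reads the file back with `sep`, `encoding`, and `usecols`. When you don't pass `encoding`, `read_csv` assumes UTF-8, so a file written in another encoding can fail to decode.

```python
import pandas as pd

# Hypothetical semicolon-delimited export saved as Latin-1 (illustrative data)
df = pd.DataFrame({
    'Name': ['Zoë', 'José', 'Renée'],
    'Age': [29, 41, 37],
    'City': ['Zürich', 'São Paulo', 'Montréal'],
})
df.to_csv('latin1_data.csv', sep=';', index=False, encoding='latin-1')

# Reading with the default UTF-8 encoding fails on the Latin-1 bytes
try:
    pd.read_csv('latin1_data.csv', sep=';')
except UnicodeDecodeError as err:
    print(f"UTF-8 decoding failed: {err.reason}")

# Match the delimiter and encoding, and load only two of the columns
subset = pd.read_csv('latin1_data.csv', sep=';', encoding='latin-1',
                     usecols=['Name', 'City'])
print(subset)
```

In practice, the encoding usually comes from whoever produced the file. `'utf-8'`, `'latin-1'`, and `'cp1252'` are frequent candidates.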


## Writing CSV Files

Use `df.to_csv()` to write DataFrames to CSV files.


In [10]:
# Create a DataFrame
df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'Price': [999, 25, 75, 299],
    'Quantity': [10, 50, 30, 15]
})

# Write to CSV
df.to_csv('products.csv', index=False)  # index=False skips writing the row index (0, 1, 2, ...) as an extra column
print("Data written to products.csv")
print(df)


Data written to products.csv
    Product  Price  Quantity
0    Laptop    999        10
1     Mouse     25        50
2  Keyboard     75        30
3   Monitor    299        15
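To see why `index=False` matters, here is a short sketch of the default behavior (`index=True`). The filename `products_with_index.csv` is only illustrative. With the default, the index is written as an unnamed first column, which turns into an extra `Unnamed: 0` column when the file is read back. If you do write the index, `index_col=0` restores it on read.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Laptop', 'Mouse'], 'Price': [999, 25]})

# Default index=True: the index is written as a first column with an empty header
df.to_csv('products_with_index.csv')

# Reading it back naively turns that column into an extra 'Unnamed: 0' column
with_index = pd.read_csv('products_with_index.csv')
print(with_index.columns.tolist())  # ['Unnamed: 0', 'Product', 'Price']

# index_col=0 tells read_csv to use the first column as the index instead
restored = pd.read_csv('products_with_index.csv', index_col=0)
print(restored.columns.tolist())    # ['Product', 'Price']
```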


## Reading Excel Files

Excel files (.xlsx, .xls) are common in business environments. Use `pd.read_excel()` to read them.

**Note:** Reading and writing `.xlsx` files requires the `openpyxl` library (legacy `.xls` files need `xlrd` for reading):
```bash
pip install openpyxl
```


In [11]:
# Create a DataFrame and write to Excel
df = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['IT', 'Sales', 'Marketing'],
    'Salary': [75000, 65000, 70000]
})

# Write to Excel
df.to_excel('employees.xlsx', index=False, sheet_name='Employees')
print("Data written to Excel file!")


Data written to Excel file!


In [12]:
# Read from Excel
try:
    df = pd.read_excel('employees.xlsx', sheet_name='Employees')
    print("Data read from Excel:")
    print(df)
except ImportError:
    print("openpyxl not installed. Install it with: pip install openpyxl")


Data read from Excel:
  Employee Department  Salary
0    Alice         IT   75000
1      Bob      Sales   65000
2  Charlie  Marketing   70000
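Workbooks often contain more than one sheet. Below is a minimal sketch, assuming an Excel engine such as `openpyxl` is installed. It writes two sheets into one file with `pd.ExcelWriter`, then loads them all at once with `sheet_name=None`, which returns a dictionary keyed by sheet name. The file `company.xlsx`, the `Departments` sheet, and its `Floor` values are hypothetical. As in the cell above, a missing engine is caught as an `ImportError`.

```python
import pandas as pd

employees = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['IT', 'Sales', 'Marketing'],
    'Salary': [75000, 65000, 70000],
})
departments = pd.DataFrame({
    'Department': ['IT', 'Sales', 'Marketing'],
    'Floor': [3, 1, 2],  # hypothetical values for illustration
})

try:
    # One ExcelWriter, several sheets in the same workbook
    with pd.ExcelWriter('company.xlsx') as writer:
        employees.to_excel(writer, sheet_name='Employees', index=False)
        departments.to_excel(writer, sheet_name='Departments', index=False)

    # sheet_name=None loads every sheet into a dict keyed by sheet name
    sheets = pd.read_excel('company.xlsx', sheet_name=None)
    for name, sheet in sheets.items():
        print(f"{name}: {sheet.shape}")
except ImportError:
    sheets = None
    print("Excel support not installed. Install it with: pip install openpyxl")
```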


## Reading JSON Files

JSON (JavaScript Object Notation) is common for API responses and configuration files. Use `pd.read_json()` to read JSON files and `df.to_json()` to write them.


In [13]:
# Create a DataFrame and write to JSON
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo']
})

# Write to JSON
df.to_json('data.json', orient='records', indent=2)
print("Data written to JSON file!")


Data written to JSON file!
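The `orient` parameter controls the shape of the JSON that `to_json()` produces. The sketch below compares `'records'` (used above) with `'columns'`, which is the DataFrame default. When you call `to_json()` without a path, it returns the JSON as a string instead of writing a file, which makes the difference easy to inspect.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_json returns the JSON text as a string
records_json = df.to_json(orient='records')  # list of row objects
columns_json = df.to_json(orient='columns')  # object keyed by column, then by index label

print(records_json)  # [{"Name":"Alice","Age":25},{"Name":"Bob","Age":30}]
print(columns_json)  # {"Name":{"0":"Alice","1":"Bob"},"Age":{"0":25,"1":30}}
```

`'records'` matches the row-per-object layout many APIs return. `'columns'` keeps the index labels, so it round-trips the index.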


In [14]:
# Read from JSON
df = pd.read_json('data.json', orient='records')  # match the orient used when writing
print("Data read from JSON:")
print(df)


Data read from JSON:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo

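Many data pipelines store JSON as JSON Lines, with one object per line, instead of a single array. Below is a minimal sketch using `lines=True`, which `to_json()` accepts only together with `orient='records'`. The filename `data.jsonl` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# lines=True (with orient='records') writes one JSON object per line
df.to_json('data.jsonl', orient='records', lines=True)
with open('data.jsonl', encoding='utf-8') as f:
    print(f.read())

# lines=True tells read_json to parse each line as a separate record
df_lines = pd.read_json('data.jsonl', lines=True)
print(df_lines)
```

Because each line is a complete record, you can append to a JSON Lines file or process it incrementally (for example, with the `chunksize` parameter of `read_json`) without parsing the whole file at once.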

## Summary

In this notebook, you learned:
- ✅ How to read CSV files with `pd.read_csv()`
- ✅ How to write CSV files with `df.to_csv()`
- ✅ How to read Excel files with `pd.read_excel()`
- ✅ How to write Excel files with `df.to_excel()`
- ✅ How to read JSON files with `pd.read_json()`
- ✅ How to write JSON files with `df.to_json()`

**Next:** Learn how to select and filter data in `03_data_selection_filtering.ipynb`
