# Working with CSV and Excel Files

This lecture covers how to read, write, and manipulate CSV and Excel files in Python, using the built-in `csv` module, `openpyxl`, and `pandas`.

## 1. Introduction to CSV and Excel file formats

### CSV (Comma-Separated Values)
- Simple text format for storing tabular data
- Each line represents a row, and columns are separated by commas (or other delimiters)
- Easy to read and write, but stores every value as plain text, with no data types, formatting, or multiple sheets
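
To make the delimiter point concrete, here is a minimal sketch using only the standard library. The sample rows are invented for illustration. It shows that `csv.writer` quotes a field containing the delimiter, and that a different delimiter can be used as long as the reader is given the same one.

```python
import csv
import io

# Illustrative data: the second field of the last row contains a comma.
rows = [["Name", "City"], ["Alice", "Portland, OR"]]

# Default comma delimiter: the writer quotes the field that contains a comma.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
comma_text = buffer.getvalue()

# Semicolon delimiter: no quoting is needed because no field contains ';'.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter=";").writerows(rows)
semicolon_text = buffer.getvalue()

# Reading back with the matching delimiter recovers the original rows.
parsed = list(csv.reader(io.StringIO(semicolon_text, newline=""), delimiter=";"))

print(repr(comma_text))
print(repr(semicolon_text))
print(parsed)
```

The in-memory `io.StringIO` buffers stand in for real files so the example runs without touching the disk.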

### Excel
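- Spreadsheet format developed by Microsoft; modern `.xlsx` files are zipped XML (Office Open XML), while the legacy `.xls` format is a proprietary binary format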
- Supports multiple sheets, formatting, formulas, and charts
- More complex than CSV, but offers richer features for data analysis and presentation

## 2. Reading and writing CSV files using the `csv` module

Python's built-in `csv` module provides functionality to read and write CSV files.

In [None]:
import csv

# Reading a CSV file (newline='' lets the csv module handle line endings itself)
with open('example.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

# Writing to a CSV file
data = [
    ['Name', 'Age', 'City'],
    ['Alice', '30', 'New York'],
    ['Bob', '25', 'San Francisco']
]

with open('output.csv', 'w', newline='') as file:
    csv_writer = csv.writer(file)
    csv_writer.writerows(data)

# Using DictReader and DictWriter (the first row of the file supplies the keys)
with open('example.csv', 'r', newline='') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        print(row['Name'], row['Age'])

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['Name', 'Age', 'City']
    csv_writer = csv.DictWriter(file, fieldnames=fieldnames)
    csv_writer.writeheader()
    csv_writer.writerow({'Name': 'Charlie', 'Age': '35', 'City': 'Chicago'})
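
One detail worth seeing in code: the `csv` module returns every value as a string, so numeric columns need explicit conversion before they can be compared or summed. The sketch below assumes the file has the `Name`, `Age`, `City` columns used above. The sample text and the `load_people` helper are hypothetical, introduced only for this illustration.

```python
import csv
import io

# Hypothetical sample standing in for example.csv.
sample = "Name,Age,City\nAlice,30,New York\nBob,25,San Francisco\n"

def load_people(file):
    """Read rows as dicts and convert the 'Age' field from str to int."""
    people = []
    for row in csv.DictReader(file):
        row["Age"] = int(row["Age"])  # raises ValueError on non-numeric text
        people.append(row)
    return people

people = load_people(io.StringIO(sample, newline=""))

# With Age converted, numeric comparison works as expected.
older = [person["Name"] for person in people if person["Age"] > 26]
print(older)  # ['Alice']
```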

## 3. Working with Excel files using `openpyxl`

`openpyxl` is a third-party library (installed with `pip install openpyxl`) for reading and writing Excel `.xlsx` files.

In [None]:
from openpyxl import Workbook, load_workbook

# Creating a new workbook
wb = Workbook()
sheet = wb.active
sheet.title = "Sample Sheet"

# Writing data
sheet['A1'] = "Name"
sheet['B1'] = "Age"
sheet['A2'] = "Alice"
sheet['B2'] = 30

# Saving the workbook
wb.save('example.xlsx')

# Reading an existing workbook
wb = load_workbook('example.xlsx')
sheet = wb.active

# Reading cell values
print(sheet['A1'].value)
print(sheet['B2'].value)

# Iterating through rows
for row in sheet.iter_rows(values_only=True):
    print(row)

# Formatting cells
from openpyxl.styles import Font, PatternFill

cell = sheet['A1']
cell.font = Font(bold=True)
cell.fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")

wb.save('example_formatted.xlsx')
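
Since multiple sheets and formulas are among the main reasons to use Excel over CSV, here is a small sketch of both. The sheet names and data are invented for illustration, and the workbook is saved to an in-memory buffer rather than a file so the example leaves nothing on disk. Note that `openpyxl` stores a formula as text and does not compute its result.

```python
import io
from openpyxl import Workbook, load_workbook

wb = Workbook()
people = wb.active
people.title = "People"

# append() writes a list of values into the next empty row.
for row in [["Name", "Age"], ["Alice", 30], ["Bob", 25]]:
    people.append(row)

# Add a second sheet holding a formula that refers to the first sheet.
summary = wb.create_sheet("Summary")
summary["A1"] = "Average age"
summary["B1"] = "=AVERAGE(People!B2:B3)"  # stored as text; Excel evaluates it on open

# Save to an in-memory buffer and load it back.
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)
loaded = load_workbook(buffer)

sheet_names = loaded.sheetnames
rows = list(loaded["People"].iter_rows(values_only=True))
formula = loaded["Summary"]["B1"].value

print(sheet_names)
print(rows)
print(formula)
```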

## 4. Data manipulation and analysis with `pandas`

`pandas` is a data manipulation library that loads both CSV and Excel files into a `DataFrame`. Reading and writing `.xlsx` files with pandas relies on `openpyxl` being installed.

In [None]:
import pandas as pd

# Reading CSV
df_csv = pd.read_csv('example.csv')

# Reading Excel
df_excel = pd.read_excel('example.xlsx')

# Basic operations
print(df_csv.head())
print(df_csv.describe())

# Filtering
filtered_df = df_csv[df_csv['Age'] > 30]

# Grouping and aggregation
grouped = df_csv.groupby('City')['Age'].mean()

# Writing to CSV and Excel
df_csv.to_csv('output_pandas.csv', index=False)
df_excel.to_excel('output_pandas.xlsx', index=False)
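
The filtering and grouping steps above are easier to follow with concrete data. The sketch below uses a made-up three-row sample, read from an in-memory string instead of `example.csv`, so the results can be checked by hand.

```python
import io
import pandas as pd

# Made-up sample with the same columns as the examples above.
sample = (
    "Name,Age,City\n"
    "Alice,30,New York\n"
    "Bob,25,San Francisco\n"
    "Charlie,35,New York\n"
)
df = pd.read_csv(io.StringIO(sample))  # 'Age' is inferred as an integer column

# Filtering: keep rows where Age is above 26.
over_26 = df[df["Age"] > 26]["Name"].tolist()

# Grouping and aggregation: mean age per city.
mean_age_by_city = df.groupby("City")["Age"].mean().to_dict()

# to_csv() returns the CSV text when no path is given.
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))

print(over_26)           # ['Alice', 'Charlie']
print(mean_age_by_city)  # {'New York': 32.5, 'San Francisco': 25.0}
```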

## 5. Error handling and data validation

External files can be missing, malformed, or contain invalid values, so handle I/O errors explicitly and validate the data before using it.

In [None]:
import csv
import pandas as pd

def read_csv_safe(filename):
    try:
        with open(filename, 'r', newline='') as file:
            reader = csv.reader(file)
            data = list(reader)
        return data
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
    except csv.Error as e:
        print(f"CSV Error: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    return None

def validate_data(df):
    # Check for missing values
    if df.isnull().values.any():
        print("Warning: Dataset contains missing values")
    
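    # Check data types
    if not pd.api.types.is_integer_dtype(df['Age']):
        print("Warning: 'Age' column is not of integer type")
    
    # Custom validation (coerce to numbers first so text columns can be compared)
    ages = pd.to_numeric(df['Age'], errors='coerce')
    if (ages < 0).any():
        print("Error: Negative ages found in the dataset")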

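# Example usage
data = read_csv_safe('example.csv')
if data:
    df = pd.DataFrame(data[1:], columns=data[0])
    # csv.reader returns every value as a string, so convert 'Age' before validating
    df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
    validate_data(df)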
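
As a possible variation on the validation above, a function can return the problems it finds instead of printing them, which lets the caller decide whether to log, raise, or continue. The `find_age_problems` helper and the sample data are hypothetical, and the sketch assumes an `Age` column that may contain text.

```python
import pandas as pd

def find_age_problems(df):
    """Return a list of human-readable problems found in the 'Age' column."""
    problems = []

    # Values that cannot be parsed as numbers become NaN.
    ages = pd.to_numeric(df["Age"], errors="coerce")

    bad_count = int(ages.isna().sum())
    if bad_count:
        problems.append(f"{bad_count} missing or non-numeric age(s)")

    negative_count = int((ages < 0).sum())  # NaN compares as False here
    if negative_count:
        problems.append(f"{negative_count} negative age(s)")

    return problems

# Made-up data with one negative and one unparseable value.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Eve"], "Age": ["30", "-5", "n/a"]})
problems = find_age_problems(df)
print(problems)
```

An empty list means the column passed both checks, so the result can be used directly in an `if problems:` test.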

## Conclusion

This lecture covered reading and writing CSV files with the `csv` module, creating and formatting Excel workbooks with `openpyxl`, manipulating data with `pandas`, and handling errors and validating data. Practice these techniques on real datasets to build proficiency.