# A Practical Course on Handling CSV and JSONL Files in Python

Welcome! In the world of data, two of the most common file formats you'll encounter are CSV (Comma-Separated Values) and JSONL (JSON Lines). This notebook will guide you from the basics to advanced techniques for reading and writing these files using Python's powerful standard library.

We'll cover:
*   What CSV and JSONL files are and when to use them.
*   Reading and writing these files using the `csv` and `json` modules.
*   Best practices, such as handling headers and ensuring data integrity.
*   Advanced topics like CSV dialects and memory-efficient streaming for large files.

### Table of Contents

**Part 1: Handling CSV Files**
1. [Setup: Creating Our Sample CSV File](#setup-csv)
2. [The Basics: Reading a CSV with `csv.reader`](#read-csv-basic)
3. [A Better Way: Reading a CSV into Dictionaries with `csv.DictReader`](#read-csv-dict)
4. [The Basics: Writing Data to a CSV with `csv.writer`](#write-csv-basic)
5. [A Better Way: Writing Dictionaries to a CSV with `csv.DictWriter`](#write-csv-dict)
6. [Advanced CSV: Handling Different Dialects (e.g., TSV)](#advanced-csv)

**Part 2: Handling JSONL Files**
7. [Setup: Creating Our Sample JSONL File](#setup-jsonl)
8. [Reading a JSONL File](#read-jsonl)
9. [Writing to a JSONL File](#write-jsonl)
10. [Advanced JSONL: Memory-Efficient Streaming](#advanced-jsonl)

**Part 3: Conclusion & Comparison**
11. [Summary: CSV vs. JSONL](#summary)

# Part 1: Handling CSV Files

A CSV file is a simple text file where values are separated by a delimiter, usually a comma. It's great for tabular data, like spreadsheets.

<a id='setup-csv'></a>
### 1. Setup: Creating Our Sample CSV File

Let's start by creating a sample CSV file to work with. This cell writes a file named `users.csv` to the notebook's current working directory, which is normally the directory the notebook lives in.

In [5]:
import csv

# Data we want to write
csv_data = [
    ['ID', 'Name', 'Age', 'City'],
    ['101', 'Alice', '30', 'New York'],
    ['102', 'Bob', '25', 'Los Angeles'],
    ['103', 'Charlie', '35', 'Chicago']
]

# Writing to users.csv
with open('users.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(csv_data)

print("users.csv created successfully.")

users.csv created successfully.


<a id='read-csv-basic'></a>
### 2. The Basics: Reading a CSV with `csv.reader`

The `csv.reader` object is the simplest way to read a CSV. It treats each row as a **list of strings**.

In [8]:
with open('users.csv', 'r', newline='') as file:  # newline='' lets the csv module handle line endings itself
    reader = csv.reader(file)
    print(reader)
    # The reader is an iterator. We can skip the header row with next().
    header = next(reader)
    print(f"Header: {header}")
    
    print("--- User Data ---")
    for row in reader:
        # Note: All values are read as strings!
        print(row)
        print(f"The datatype of row is: {type(row)}")
        user_id, name, age, city = row
        print(f"ID: {user_id}, Name: {name}, Age: {age} (a {type(age)}), City: {city}")
        # You would need to manually convert types, e.g., int(age)

<_csv.reader object at 0x7f078ebb9d90>
Header: ['ID', 'Name', 'Age', 'City']
--- User Data ---
['101', 'Alice', '30', 'New York']
The datatype of row is: <class 'list'>
ID: 101, Name: Alice, Age: 30 (a <class 'str'>), City: New York
['102', 'Bob', '25', 'Los Angeles']
The datatype of row is: <class 'list'>
ID: 102, Name: Bob, Age: 25 (a <class 'str'>), City: Los Angeles
['103', 'Charlie', '35', 'Chicago']
The datatype of row is: <class 'list'>
ID: 103, Name: Charlie, Age: 35 (a <class 'str'>), City: Chicago
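
Because every value arrives as a string, numeric columns have to be converted before you can do arithmetic on them. The following is a minimal sketch of that conversion step. It uses an in-memory `io.StringIO` stand-in for `users.csv` (an assumption made so the example runs on its own), and the names `users` and `average_age` are illustrative only.

```python
import csv
import io

# In-memory stand-in for users.csv so this sketch is self-contained.
sample = io.StringIO(
    "ID,Name,Age,City\n"
    "101,Alice,30,New York\n"
    "102,Bob,25,Los Angeles\n"
)

reader = csv.reader(sample)
header = next(reader)  # consume the header row

users = []
for user_id, name, age, city in reader:
    # csv.reader never converts types, so we do it explicitly.
    users.append((int(user_id), name, int(age), city))

average_age = sum(user[2] for user in users) / len(users)
print(users)
print(f"Average age: {average_age}")
```

If a column may contain non-numeric text, wrapping the `int(...)` call in a `try`/`except ValueError` is one way to keep a single bad row from stopping the whole loop.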


<a id='read-csv-dict'></a>
### 3. A Better Way: Reading a CSV into Dictionaries with `csv.DictReader`

`csv.DictReader` is more convenient. It reads each row into a **dictionary**, using the header row for keys. This makes your code much more readable and less prone to errors if the column order changes.

In [9]:
with open('users.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    
    print("--- User Data as Dictionaries ---")
    for row_dict in reader:
        print(row_dict)
        # Access data by column name - much better!
        print(f"ID: {row_dict['ID']}, Name: {row_dict['Name']}, City: {row_dict['City']}")

--- User Data as Dictionaries ---
{'ID': '101', 'Name': 'Alice', 'Age': '30', 'City': 'New York'}
ID: 101, Name: Alice, City: New York
{'ID': '102', 'Name': 'Bob', 'Age': '25', 'City': 'Los Angeles'}
ID: 102, Name: Bob, City: Los Angeles
{'ID': '103', 'Name': 'Charlie', 'Age': '35', 'City': 'Chicago'}
ID: 103, Name: Charlie, City: Chicago
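
To make the point about column order concrete, here is a small sketch that reads two hypothetical exports of the same record with the columns in a different order. The strings `export_a` and `export_b` and the helper `load_names_and_ages` are invented for illustration; only `csv.DictReader` itself comes from the standard library.

```python
import csv
import io

# Two hypothetical exports of the same data with different column orders.
export_a = "ID,Name,Age,City\n101,Alice,30,New York\n"
export_b = "City,Age,Name,ID\nNew York,30,Alice,101\n"

def load_names_and_ages(text):
    """Return (name, age) pairs, looking columns up by header name."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Name"], int(row["Age"])) for row in reader]

result_a = load_names_and_ages(export_a)
result_b = load_names_and_ages(export_b)

print(result_a)
print(result_a == result_b)
```

The same function would break with `csv.reader` and positional unpacking, because position 0 holds the ID in one file and the city in the other.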


<a id='write-csv-basic'></a>
### 4. The Basics: Writing Data to a CSV with `csv.writer`

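To write data, we use `csv.writer`. Its `writerows()` method accepts any iterable of rows; a list of lists is the most common choice.

**Crucial Tip:** Always open CSV files with `newline=''`, for reading as well as writing. When writing, this prevents extra blank rows from appearing on Windows; when reading, it ensures newlines embedded inside quoted fields are interpreted correctly.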

In [5]:
products_to_write = [
    ['SKU', 'ProductName', 'Price'],
    ['P-001', 'Laptop', '1200'],
    ['P-002', 'Mouse', '25'],
    ['P-003', 'Keyboard', '75']
]

with open('products.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(products_to_write)

print("products.csv created successfully.")

products.csv created successfully.
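
One reason to prefer `csv.writer` over joining strings with commas by hand is that it quotes fields for you. The sketch below writes a made-up product name containing both a comma and a double quote to an in-memory buffer, then reads it back. The product row is an illustrative assumption, not part of the dataset above.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["SKU", "ProductName", "Price"])
# This field contains the delimiter and a quote character.
writer.writerow(["P-004", 'Monitor, 27" wide', "300"])

raw_text = buffer.getvalue()
print(raw_text)

# Reading the text back recovers the original field unchanged.
round_trip = list(csv.reader(io.StringIO(raw_text)))
print(round_trip[1])
```

With the default settings the writer wraps the awkward field in quotes and doubles the embedded quote character, so the reader can split the row into exactly three fields again.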


<a id='write-csv-dict'></a>
### 5. A Better Way: Writing Dictionaries to a CSV with `csv.DictWriter`

If your data is a list of dictionaries, `csv.DictWriter` is the perfect tool. You must specify the `fieldnames` (the column headers) when you create the writer.

In [10]:
inventory_records = [
    {'ID': 'A1', 'Item': 'Apple', 'Stock': 500, 'Color': 'Red'},
    {'ID': 'B2', 'Item': 'Banana', 'Stock': 800, 'Color': 'Yellow'},
    {'ID': 'C3', 'Item': 'Orange', 'Stock': 650, 'Color': 'Orange'}
]

# Define the headers for your CSV file
fieldnames = ['ID', 'Item', 'Stock', 'Color']

with open('inventory.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()  # Writes the header row
    writer.writerows(inventory_records) # Writes all the dictionary rows

print("inventory.csv created successfully.")

# reading the inventory.csv file
with open('inventory.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    
    print("--- Inventory Records ---")
    for row in reader:
        print(f"ID: {row['ID']}, Item: {row['Item']}, Stock: {row['Stock']} (a {type(row['Stock'])}), Color: {row['Color']}")
        # Note: Stock is still a string, you would need to convert it to int if needed

inventory.csv created successfully.
--- Inventory Records ---
ID: A1, Item: Apple, Stock: 500 (a <class 'str'>), Color: Red
ID: B2, Item: Banana, Stock: 800 (a <class 'str'>), Color: Yellow
ID: C3, Item: Orange, Stock: 650 (a <class 'str'>), Color: Orange
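
Real-world dictionaries do not always match `fieldnames` exactly. The sketch below shows how `csv.DictWriter` can be configured for that case: `restval` supplies a value for missing keys, and `extrasaction='ignore'` drops keys that are not in `fieldnames`. The records here are hypothetical and deliberately irregular.

```python
import csv
import io

fieldnames = ["ID", "Item", "Stock"]
records = [
    {"ID": "A1", "Item": "Apple", "Stock": 500},
    {"ID": "B2", "Item": "Banana"},  # "Stock" is missing
    {"ID": "C3", "Item": "Orange", "Stock": 650, "Color": "Orange"},  # extra key
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    restval="N/A",           # written when a key is missing
    extrasaction="ignore",   # silently drop keys not in fieldnames
)
writer.writeheader()
writer.writerows(records)
lines = buffer.getvalue().splitlines()
print(lines)

# With the default extrasaction="raise", the extra "Color" key is an error.
strict_writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
try:
    strict_writer.writerow(records[2])
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)
```

Whether to ignore or raise on extra keys is a design choice: raising catches typos in key names early, while ignoring is convenient when you intentionally export only a subset of fields.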


<a id='advanced-csv'></a>
### 6. Advanced CSV: Handling Different Dialects (e.g., TSV)

Not all "comma-separated" files actually use commas. Some use tabs (TSV), semicolons, or pipes. The `csv` module handles this through formatting parameters such as `delimiter` and `quotechar`. You can pass them directly to the reader or writer, as in the cell below, or bundle them into a named **dialect**.

In [7]:
# Let's create a Tab-Separated-Values (TSV) file
data = [['Name', 'Score'], ['Alice', '95'], ['Bob', '88']]

with open('scores.tsv', 'w', newline='') as file:
    # Here we specify the delimiter is a tab character
    writer = csv.writer(file, delimiter='\t')
    writer.writerows(data)

print("scores.tsv created successfully.")

# Now let's read it back, telling the reader to expect tabs
print("\n--- Reading the TSV file ---")
with open('scores.tsv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

scores.tsv created successfully.

--- Reading the TSV file ---
['Name', 'Score']
['Alice', '95']
['Bob', '88']
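
Formatting parameters can also be grouped under a name. The sketch below uses the built-in `excel-tab` dialect for tab-separated text, then registers a custom dialect for pipe-delimited text. The dialect name `pipes` and the sample strings are assumptions chosen for illustration.

```python
import csv
import io

# The standard library ships a ready-made dialect for tab-separated data.
tsv_text = "Name\tScore\nAlice\t95\n"
tsv_rows = list(csv.reader(io.StringIO(tsv_text), dialect="excel-tab"))
print(tsv_rows)

# Register a custom, reusable dialect under a name of our choosing.
csv.register_dialect("pipes", delimiter="|", quotechar='"')

pipe_text = 'Name|Note\nBob|"contains | a pipe"\n'
pipe_rows = list(csv.reader(io.StringIO(pipe_text), dialect="pipes"))
print(pipe_rows)

# All registered dialect names, including the built-in ones.
print(csv.list_dialects())
```

A named dialect is mostly a convenience: it keeps the reader and the writer for a given file format in agreement without repeating the same keyword arguments in both places.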


# Part 2: Handling JSON Lines (JSONL) Files

A JSONL file (also called newline-delimited JSON) is a text file where each line is a separate, valid JSON value, most often an object. This format works well for streaming data and logs because you can process the file one line at a time without loading the whole thing into memory.

<a id='setup-jsonl'></a>
### 7. Setup: Creating Our Sample JSONL File

Let's create a `logs.jsonl` file. We will use the standard `json` module.

In [8]:
import json

log_entries = [
    {'timestamp': '2023-10-27T10:00:00Z', 'level': 'INFO', 'message': 'User logged in', 'user_id': '101'},
    {'timestamp': '2023-10-27T10:01:30Z', 'level': 'WARN', 'message': 'Failed login attempt', 'ip': '192.168.1.100'},
    {'timestamp': '2023-10-27T10:02:00Z', 'level': 'INFO', 'message': 'Data exported', 'user_id': '103'}
]

with open('logs.jsonl', 'w') as file:
    for entry in log_entries:
        # Convert the dictionary to a JSON string
        json_string = json.dumps(entry)
        # Write the string to the file, followed by a newline
        file.write(json_string + '\n')

print("logs.jsonl created successfully.")

logs.jsonl created successfully.
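
The one-record-per-line layout relies on `json.dumps()` producing a single line. The short sketch below checks this for a message that itself contains line breaks; the `entry` dictionary is a made-up example.

```python
import json

# A hypothetical log entry whose message spans several lines.
entry = {"level": "ERROR", "message": "Traceback:\n  line 1\n  line 2"}

json_line = json.dumps(entry)
print(json_line)

# Newlines inside strings are escaped, so the record stays on one line.
print("\n" in json_line)

# Parsing restores the original multi-line message.
restored = json.loads(json_line)
print(restored == entry)
```

This holds as long as you do not pass `indent=...` to `json.dumps()`, since indentation spreads one record over several lines and breaks the format.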


<a id='read-jsonl'></a>
### 8. Reading a JSONL File

Reading a JSONL file is beautifully simple. You read the file line by line, and use `json.loads()` (load **s**tring) to parse each line from a JSON string into a Python dictionary.

In [10]:
parsed_logs = []
with open('logs.jsonl', 'r') as file:
    for line in file:
        # Each line is a JSON string, parse it into a dictionary
        log_dict = json.loads(line)
        parsed_logs.append(log_dict)

print("--- Parsed Log Entries ---")
for log in parsed_logs:
    print(f"Level: {log.get('level')}, Message: {log.get('message')}")

--- Parsed Log Entries ---
Level: INFO, Message: User logged in
Level: WARN, Message: Failed login attempt
Level: INFO, Message: Data exported
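
Files produced by other tools sometimes contain blank lines, which `json.loads()` rejects. The following sketch shows one way to skip them while keeping track of line numbers for later error reporting. It reads from an in-memory `io.StringIO` sample rather than `logs.jsonl`, purely so it runs on its own.

```python
import io
import json

# In-memory stand-in for a JSONL file that contains a blank line.
sample = io.StringIO(
    '{"level": "INFO", "message": "User logged in"}\n'
    "\n"
    '{"level": "WARN", "message": "Failed login attempt"}\n'
)

records = []
for line_number, line in enumerate(sample, start=1):
    line = line.strip()
    if not line:
        continue  # skip blank lines instead of failing on them
    records.append((line_number, json.loads(line)))

for line_number, record in records:
    print(line_number, record["level"])
```

Keeping the line number next to each record makes it much easier to point at the offending line when a later validation step fails.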


<a id='write-jsonl'></a>
### 9. Writing to a JSONL File

Writing is the reverse of reading. For each Python dictionary you want to save, you use `json.dumps()` (dump **s**tring) to convert it to a JSON string and then write that string to the file with a newline character.

In [12]:
new_events = [
    {'event_id': 500, 'type': 'click', 'element': 'button#submit'},
    {'event_id': 501, 'type': 'scroll', 'depth': '75%'}
]

# Let's append these events to our existing logs file
with open('logs.jsonl', 'a') as file: # 'a' for append mode
    for event in new_events:
        file.write(json.dumps(event) + '\n')

print("Appended new events to logs.jsonl.")

Appended new events to logs.jsonl.
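
If you write JSONL in several places, it can help to wrap the pattern in small helpers. The sketch below is one possible shape, not a standard API: the function names `write_jsonl` and `read_jsonl` are invented here, and the file lives in a temporary directory so the example does not touch `logs.jsonl`.

```python
import json
import os
import tempfile

def write_jsonl(path, records, mode="w"):
    """Write each record as one JSON line; use mode='a' to append."""
    with open(path, mode, encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSONL file into a list, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events.jsonl")
    write_jsonl(path, [{"event_id": 500, "type": "click"}])
    write_jsonl(path, [{"event_id": 501, "type": "scroll", "depth": "75%"}], mode="a")
    events = read_jsonl(path)

print(events)
```

Appending works cleanly only if every existing line already ends with a newline, which is why the writer always adds one after each record.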


<a id='advanced-jsonl'></a>
### 10. Advanced JSONL: Memory-Efficient Streaming

The primary advantage of JSONL is its ability to handle datasets that are too large to fit in memory. You can process the file one record at a time without ever loading the whole thing.

Here's how you might process a huge log file to count the number of 'WARN' level messages, using a generator to be extra memory-efficient.

In [13]:
def read_logs_stream(filepath):
    """A generator function that yields one log entry at a time."""
    with open(filepath, 'r') as f:
        for line in f:
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Handle corrupted lines gracefully
                print(f"Skipping corrupted line: {line.strip()}")
                continue

# Imagine logs.jsonl is 100 GB. Reading it would take a while, but memory use stays
# small and constant because only one line is held in memory at a time.
warn_count = 0
log_stream = read_logs_stream('logs.jsonl')

for log_entry in log_stream:
    if log_entry.get('level') == 'WARN':
        warn_count += 1

print(f"\nTotal 'WARN' level logs found: {warn_count}")


Total 'WARN' level logs found: 1
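
The same streaming pattern extends to other aggregations. As a sketch, the code below counts every log level in a single pass with `collections.Counter`. The generator `iter_jsonl` is a hypothetical variant of `read_logs_stream` that accepts any iterable of lines, and the sample data (including one deliberately corrupted line) is invented for the example.

```python
import io
import json
from collections import Counter

def iter_jsonl(lines):
    """Yield one parsed record per non-blank line, skipping invalid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore corrupted lines in this sketch

# In-memory stand-in for a large log file; the third line is corrupted.
sample = io.StringIO(
    '{"level": "INFO"}\n'
    '{"level": "WARN"}\n'
    "{not valid json\n"
    '{"event_id": 500}\n'
    '{"level": "INFO"}\n'
)

# Counter consumes the generator lazily, one record at a time.
level_counts = Counter(
    entry.get("level", "UNKNOWN") for entry in iter_jsonl(sample)
)
print(level_counts)
```

Only the counter grows with the data, and its size depends on the number of distinct levels rather than the number of lines, so memory use stays flat however large the file is.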


# Part 3: Conclusion & Comparison

<a id='summary'></a>
### 11. Summary: CSV vs. JSONL

| Feature | CSV (Comma-Separated Values) | JSONL (JSON Lines) |
| :--- | :--- | :--- |
| **Structure** | Simple, flat, tabular. | Each line is a full JSON object. Can be nested and complex. |
| **Schema** | Implicit schema (header row). All rows should have the same columns. | Flexible schema. Each line can have different keys. |
| **Data Types** | All data is read as **strings**. Requires manual type conversion. | Preserves data types (strings, numbers, booleans, lists, etc.). |
| **Readability** | Human-readable as plain text; opens directly in spreadsheet tools. | Human-readable as text, very machine-readable. |
| **Best For** | Exporting from spreadsheets, relational databases, simple tabular data. | Logs, streaming API responses, complex records, semi-structured data. |
| **Library** | `import csv` | `import json` |
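
The **Data Types** row is easy to verify. The sketch below sends one hypothetical record through both formats in memory and compares what comes back; the `record` dictionary is made up for the demonstration.

```python
import csv
import io
import json

record = {"id": 101, "active": True, "score": 9.5}

# Round trip through CSV: every value comes back as a string.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_buffer.seek(0)
from_csv = next(csv.DictReader(csv_buffer))

# Round trip through a JSON line: int, bool, and float survive.
from_jsonl = json.loads(json.dumps(record))

print(from_csv)
print(from_jsonl)
```

After the CSV round trip the boolean has become the string `'True'`, so a naive truthiness check on it would also treat `'False'` as true. That is one reason explicit conversion matters when reading CSV.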

You are now equipped with the knowledge to handle two of the most important data formats in Python. Practice by finding sample datasets online and trying to read, transform, and write them!