# Introduction to HDF5 with `h5py`

In this notebook, we will explore the basics of working with HDF5 files using the `h5py` package in Python. We'll cover the concepts of HDF5 files, datasets, and how to manage and access data stored in HDF5 format.

## 1. HDF5 Files and Datasets

### HDF5 File
- An HDF5 file is a container that stores datasets and groups in a hierarchy.
- The file itself acts as the root group (`/`), and everything it contains is organized in a tree beneath it.

### Dataset
- A dataset in HDF5 is a multidimensional array of data.
- All elements of a dataset share a single data type (e.g., integers, floats, or strings).
- Each dataset is accessed by its name, which must be unique within its parent group.
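As a minimal sketch of these properties, the snippet below writes a small array and then inspects the dataset object without loading all of it. The file name `dataset_demo.h5` and the dataset name `values` are illustrative assumptions, not names used elsewhere in this notebook.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, chosen only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'dataset_demo.h5')

with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('values', data=np.arange(12, dtype='float64').reshape(3, 4))

with h5py.File(path, 'r') as h5f:
    dset = h5f['values']       # a handle to the dataset; no data is read yet
    dset_shape = dset.shape    # (3, 4)
    dset_dtype = dset.dtype    # float64
    first_row = dset[0, :]     # reads only the first row from disk
    print(dset_shape, dset_dtype, first_row)
```

Slicing the dataset handle, as in `dset[0, :]`, is what makes it practical to work with arrays that are larger than memory.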

### Groups
- Groups are containers within HDF5 files that can hold datasets and other groups.
- They help organize datasets hierarchically.
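The tree structure described above can be inspected programmatically. The following is a hedged sketch that builds a small hierarchy and walks it with `visititems`; the names `tree_demo.h5`, `experiment/run1`, `signal`, and `top_level` are made up for the illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name for this illustration
path = os.path.join(tempfile.mkdtemp(), 'tree_demo.h5')

with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('top_level', data=np.zeros(3))
    # The intermediate group 'experiment' is created automatically
    run = h5f.create_group('experiment/run1')
    run.create_dataset('signal', data=np.ones(5))

found = []
with h5py.File(path, 'r') as h5f:
    # visititems calls the function once for every group and dataset below the root
    def record(name, obj):
        kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
        found.append((name, kind))

    h5f.visititems(record)

for name, kind in found:
    print(f'{kind:8s} {name}')
```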

## 2. Basic Operations with `h5py`

### Example 1: Reading and Writing a NumPy Array
```python
import h5py
import numpy as np

# Create a simple NumPy array
data = np.arange(100).reshape(10, 10)

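# Write the array to an HDF5 file ('w' creates the file, overwriting any existing one)
with h5py.File('example1.h5', 'w') as h5f:
    h5f.create_dataset('my_dataset', data=data)

# Read the dataset back ('r' opens the file read-only)
with h5py.File('example1.h5', 'r') as h5f:
    loaded_data = h5f['my_dataset'][:]  # [:] copies the full dataset into a NumPy array
    print(loaded_data)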
```

### Example 2: Reading and Writing a CSV File
```python
import h5py
import numpy as np

# Load data from a CSV file into a NumPy array.
# Assumes 'example2.csv' exists and contains only numeric, comma-separated values.
csv_data = np.loadtxt('example2.csv', delimiter=',')

# Write the array to an HDF5 file
with h5py.File('example2.h5', 'w') as h5f:
    h5f.create_dataset('csv_dataset', data=csv_data)

# Read the dataset back into a NumPy array
with h5py.File('example2.h5', 'r') as h5f:
    loaded_csv_data = h5f['csv_dataset'][:]
    print(loaded_csv_data)
```
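Example 2 depends on a file named `example2.csv` that already exists and holds only numeric values. For readers who do not have such a file, here is a self-contained variant, offered as a sketch: it first writes a tiny CSV with `np.savetxt` and then performs the same round trip. The file names `sample.csv` and `sample.h5` and the sample values are assumptions made for this illustration.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, 'sample.csv')  # hypothetical file name
h5_path = os.path.join(workdir, 'sample.h5')    # hypothetical file name

# Create a small numeric CSV so the example does not depend on an existing file
original = np.array([[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]])
np.savetxt(csv_path, original, delimiter=',')

# Load the CSV into a NumPy array
csv_data = np.loadtxt(csv_path, delimiter=',')

# Store the array in an HDF5 file
with h5py.File(h5_path, 'w') as h5f:
    h5f.create_dataset('csv_dataset', data=csv_data)

# Read it back and compare with the values written to the CSV
with h5py.File(h5_path, 'r') as h5f:
    round_trip = h5f['csv_dataset'][:]

print(np.array_equal(original, round_trip))
```

Note that `np.loadtxt` expects purely numeric content; a CSV with a header row or text columns would need extra arguments such as `skiprows`, or a different loader.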

## 3. Using Dataset Options

### Compression
```python
# `data` is the 10x10 NumPy array created in Example 1
with h5py.File('compressed_example.h5', 'w') as h5f:
    h5f.create_dataset('compressed_dataset', data=data, compression='gzip', compression_opts=4)
```
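- **`compression='gzip'`**: Compresses the dataset with the gzip algorithm. Compressed datasets are always stored in chunks; h5py picks a chunk shape automatically when `chunks` is not given.
- **`compression_opts=4`**: Sets the gzip compression level, from 0 to 9. Higher levels usually produce smaller files at the cost of slower writes; h5py uses 4 when no level is given.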
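To see the effect of these options, the sketch below writes the same array with and without gzip and compares the resulting file sizes. The file names are hypothetical, and an all-zeros array is used deliberately because it compresses extremely well; real data will typically shrink far less.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, 'plain.h5')  # hypothetical file name
gzip_path = os.path.join(workdir, 'gzip.h5')    # hypothetical file name

# About 8 MB of zeros: a best case for compression, chosen only for illustration
big = np.zeros((1000, 1000), dtype='float64')

with h5py.File(plain_path, 'w') as h5f:
    h5f.create_dataset('d', data=big)

with h5py.File(gzip_path, 'w') as h5f:
    dset = h5f.create_dataset('d', data=big, compression='gzip', compression_opts=4)
    compression_name = dset.compression        # 'gzip'
    compression_level = dset.compression_opts  # 4
    auto_chunks = dset.chunks                  # chunk shape chosen by h5py

plain_size = os.path.getsize(plain_path)
gzip_size = os.path.getsize(gzip_path)
print(f'uncompressed: {plain_size} bytes, gzip: {gzip_size} bytes')
print('chunk shape chosen automatically:', auto_chunks)
```

Reading a compressed dataset needs no extra arguments; decompression happens transparently when the data is sliced.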

### Chunking
```python
with h5py.File('chunked_example.h5', 'w') as h5f:
    h5f.create_dataset('chunked_dataset', data=data, chunks=(5, 5))
```
- **`chunks=(5, 5)`**: Stores the dataset in 5 × 5 blocks. Each chunk is read and written as a unit, so the chunk shape should match how the data will be accessed.
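A short sketch of how the chunk layout can be checked and used: the `chunks` attribute reports the stored chunk shape, and a slice that lines up with a chunk boundary touches only that one block on disk. The file is written to a temporary directory here, which is an assumption of this illustration.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'chunked_example.h5')
data = np.arange(100).reshape(10, 10)

with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('chunked_dataset', data=data, chunks=(5, 5))

with h5py.File(path, 'r') as h5f:
    dset = h5f['chunked_dataset']
    chunk_shape = dset.chunks  # (5, 5)
    # Rows 0-4 and columns 5-9 coincide with exactly one 5x5 chunk
    block = dset[0:5, 5:10]

print(chunk_shape)
print(block)
```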

### Data Types
```python
with h5py.File('dtype_example.h5', 'w') as h5f:
    h5f.create_dataset('int_dataset', data=data, dtype='int32')
```
- **`dtype='int32'`**: Stores the values as 32-bit integers. If `data` has a different type, it is converted to this type when written.
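The conversion can be verified by reading the dataset back and checking its type. In this sketch the source array is explicitly created as `int64` (an assumption made so the result is the same on every platform) and stored as `int32`.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'dtype_example.h5')

# Source array is 64-bit; the dataset on disk will be 32-bit
data = np.arange(100, dtype='int64').reshape(10, 10)

with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('int_dataset', data=data, dtype='int32')

with h5py.File(path, 'r') as h5f:
    stored_dtype = h5f['int_dataset'].dtype  # int32
    loaded = h5f['int_dataset'][:]

print(data.dtype, '->', stored_dtype)
print(np.array_equal(loaded, data))
```

Choosing a narrower type halves the storage here, but values outside the target type's range would not survive the conversion, so the range of the data should be checked first.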

## 4. Working with Groups
```python
with h5py.File('group_example.h5', 'w') as h5f:
    group = h5f.create_group('my_group')
    group.create_dataset('grouped_dataset', data=data)
```
- **`create_group('my_group')`**: Creates a group named `my_group`.
- **`group.create_dataset()`**: Creates a dataset within the specified group.
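Reading the grouped dataset back works much like a file-system path. The sketch below recreates the same group and dataset in a temporary directory (an assumption of this illustration) and shows two equivalent ways to reach the data.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'group_example.h5')
data = np.arange(100).reshape(10, 10)

with h5py.File(path, 'w') as h5f:
    group = h5f.create_group('my_group')
    group.create_dataset('grouped_dataset', data=data)

with h5py.File(path, 'r') as h5f:
    member_names = list(h5f['my_group'].keys())          # names stored in the group
    has_dataset = 'my_group/grouped_dataset' in h5f      # membership test by path
    via_path = h5f['my_group/grouped_dataset'][:]        # slash-separated path
    via_group = h5f['my_group']['grouped_dataset'][:]    # step-by-step lookup

print(member_names)
print(np.array_equal(via_path, via_group))
```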

## Summary
In this notebook, we covered:
- The basic concepts of HDF5 files and datasets.
- How to read and write data using `h5py`.
- Various options for creating datasets, such as compression and chunking.
- How to use groups to organize datasets hierarchically.

With these concepts, you should be able to efficiently store and manage large datasets using HDF5.