# **Avro format**

Apache Avro is a data serialization format, developed as an Apache Software Foundation project, that provides a compact, schema-based way to store and exchange data. Every Avro file stores the schema it was written with alongside the data, so readers and writers can use different versions of a schema as a data structure evolves. Avro is commonly used in big data processing systems such as Apache Hadoop and Apache Kafka.

## **Avro using Python**

In Python, you can read and write Avro files with the fastavro library.

## **Install fastavro**

In [4]:
%pip install fastavro

Note: you may need to restart the kernel to use updated packages.





## **Write data to an Avro file**


In [5]:
import fastavro

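# Schema: a record type named 'Example' with a string field and an int field
schema = {
    'type': 'record',
    'name': 'Example',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'},
    ]
}

# A single record that conforms to the schema
data = {'name': 'Naruto', 'age': 30}

# Write to data.avro; the file must be opened in binary mode, and
# fastavro.writer expects an iterable of records, hence the list
with open('data.avro', 'wb') as avro_file:
    fastavro.writer(avro_file, schema, [data])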

## **Read data from an Avro file**

In [6]:
with open('data.avro', 'rb') as avro_file:
    avro_reader = fastavro.reader(avro_file)
    for record in avro_reader:
        # Process each record
        print(record)

{'name': 'Naruto', 'age': 30}
