<div style="text-align:center;font-size:22pt; font-weight:bold;color:white;border:solid black 1.5pt;background-color:#1e7263;">
    Reading and Writing Apache Avro Data
</div>

In [1]:
# ============================================================
#                                                            =
#             Title: Reading and Writing Apache Avro Data    =
#             ---------------------------------              =
#                                                            =
#             Author: Dr. Saad Laouadi                       =
#                                                            =
#             Copyright: Dr. Saad Laouadi                    =
# ============================================================
#                                                            =
#                       LICENSE                              =
#             ----------------------                         =
#                                                            =
#             This material is intended for educational      =
#             purposes only and may not be used directly in  =
#             courses, video recordings, or similar          =
#             without prior consent from the author.         =
#             When using or referencing this material,       =
#             proper credit must be attributed to the        =
#             author.                                        =
# ============================================================

In [2]:
# Environment Setup
import sys
sys.path.append('../../scripts/')  

# import the working libraries
from importlibs import *

******************************************
          The imported libs are:          
******************************************
polars version is :     0.20.2
pandas version is :      2.1.4
numpy version is  :     1.26.2
pyarrow version is:     14.0.2
******************************************
The imported builtin modules are:
['os', 'sys', 'pathlib', 'time', 'shutil', 're']
**************************************************************
The python executable path is:
 /usr/local/Caskroom/mambaforge/base/envs/plenv/bin/python3.12
**************************************************************

....................
Important Reminder:
....................

Before proceeding, please ensure that you have activated the appropriate virtual environment for this project.
This step is crucial to maintain consistent dependencies and project settings.
...............................................................................


In [10]:
# Data path settings
DATA_ROOT = Path("../../datasets").resolve()
WEATHER = DATA_ROOT / "london_weather.avro"

## The `polars.read_avro()` Function

The `pl.read_avro` function reads data stored in the Apache Avro format into a Polars DataFrame. Avro is a row-oriented, compact binary file format designed for data serialization, and each Avro file carries the schema of the records it contains.

Function Prototype

```python
pl.read_avro(
    source: 'str | Path | BytesIO | BinaryIO',
    *,
    columns: 'list[int] | list[str] | None' = None,
    n_rows: 'int | None' = None
) -> 'DataFrame'
```

- **The Parameters**
    - `source` (`str | Path | BytesIO | BinaryIO`):
        - The path to the Avro file, or a file-like object to read from.
        - A string or `Path` is interpreted as a file path.
        - A file-like object must provide a `read()` method (e.g., a file opened in binary mode with the built-in `open` function, or a `BytesIO` object).
    - `columns` (`list[int] | list[str] | None`, default `None`):
        - Selects which columns to read from the Avro file.
        - Accepts either a list of column indices (starting from 0) or a list of column names.
        - If `None`, all columns are read.
    - `n_rows` (`int | None`, default `None`):
        - The maximum number of rows to read.
        - If set to an integer, reading stops after `n_rows` rows.
        - If `None`, the entire file is read.

- **The Return Value**
    - A `DataFrame` containing the data read from the Avro file.


### Example 

```python
import polars as pl

# Reading an Avro file
df = pl.read_avro("path/to/your/file.avro")

# Reading with specific columns
df = pl.read_avro("path/to/file.avro", columns=["column1", "column2"])

# Reading a fixed number of rows
df = pl.read_avro("path/to/file.avro", n_rows=100)
```

### Practical Example

In [11]:
weather = pl.read_avro(WEATHER)

In [17]:
from pprint import pprint
pprint(weather.schema)

OrderedDict([('date', Int64),
             ('cloud_cover', Float64),
             ('sunshine', Float64),
             ('global_radiation', Float64),
             ('max_temp', Float64),
             ('mean_temp', Float64),
             ('min_temp', Float64),
             ('precipitation', Float64),
             ('pressure', Float64),
             ('snow_depth', Float64)])


In [18]:
print(weather.shape)

(15341, 10)


In [15]:
weather.head(10)

| date | cloud_cover | sunshine | global_radiation | max_temp | mean_temp | min_temp | precipitation | pressure | snow_depth |
|---|---|---|---|---|---|---|---|---|---|
| i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 19790101 | 2.0 | 7.0 | 52.0 | 2.3 | -4.1 | -7.5 | 0.4 | 101900.0 | 9.0 |
| 19790102 | 6.0 | 1.7 | 27.0 | 1.6 | -2.6 | -7.5 | 0.0 | 102530.0 | 8.0 |
| 19790103 | 5.0 | 0.0 | 13.0 | 1.3 | -2.8 | -7.2 | 0.0 | 102050.0 | 4.0 |
| 19790104 | 8.0 | 0.0 | 13.0 | -0.3 | -2.6 | -6.5 | 0.0 | 100840.0 | 2.0 |
| 19790105 | 6.0 | 2.0 | 29.0 | 5.6 | -0.8 | -1.4 | 0.0 | 102250.0 | 1.0 |
| 19790106 | 5.0 | 3.8 | 39.0 | 8.3 | -0.5 | -6.6 | 0.7 | 102780.0 | 1.0 |
| 19790107 | 8.0 | 0.0 | 13.0 | 8.5 | 1.5 | -5.3 | 5.2 | 102520.0 | 0.0 |
| 19790108 | 8.0 | 0.1 | 15.0 | 5.8 | 6.9 | 5.3 | 0.8 | 101870.0 | 0.0 |
| 19790109 | 4.0 | 5.8 | 50.0 | 5.2 | 3.7 | 1.6 | 7.2 | 101170.0 | 0.0 |
| 19790110 | 7.0 | 1.9 | 30.0 | 4.9 | 3.3 | 1.4 | 2.1 | 98700.0 | 0.0 |


### Writing Data in the Avro Format