# Data types and Apache Arrow

In [None]:
import polars as pl

csv_file = 'Titanic.csv'

df = pl.read_csv(csv_file)

As in pandas, every column in a polars DataFrame has a data type (dtype). The `.schema` attribute shows the dtype of each column, keyed by column name:

In [None]:
df.schema

{'PassengerId': Int64,
 'Survived': Int64,
 'Pclass': Int64,
 'Name': Utf8,
 'Sex': Utf8,
 'Age': Float64,
 'SibSp': Int64,
 'Parch': Int64,
 'Ticket': Utf8,
 'Fare': Float64,
 'Cabin': Utf8,
 'Embarked': Utf8}

Alternatively, the `.dtypes` attribute returns the same dtypes as a plain list, in column order and without column names:

In [None]:
df.dtypes

[Int64,
 Int64,
 Int64,
 Utf8,
 Utf8,
 Float64,
 Int64,
 Int64,
 Utf8,
 Float64,
 Utf8,
 Utf8]

### Apache Arrow

The key difference from pandas is that the dtypes of a polars Series or DataFrame come from Apache Arrow, whereas pandas data types are typically a mix of NumPy, Python and custom extension types. Apache Arrow is a columnar in-memory format designed for representing tabular data efficiently. Polars relies on a Rust implementation of Arrow, originally the Arrow2 crate (newer releases use polars' own fork of it, `polars-arrow`). Some of the key advantages of using Arrow:

* data sharing without copying (i.e. zero-copy)
* faster vectorized calculations
* a consistent representation of missing data (nulls) across all dtypes

Together, these properties contribute to polars' speed. The `to_arrow` method converts a polars DataFrame into a `pyarrow.Table`, but it is rarely needed in practice, since polars manages the Arrow representation internally.

In [None]:
df.to_arrow()

pyarrow.Table
PassengerId: int64
Survived: int64
Pclass: int64
Name: large_string
Sex: large_string
Age: double
SibSp: int64
Parch: int64
Ticket: large_string
Fare: double
Cabin: large_string
Embarked: large_string
----
PassengerId: [[1,2,3,4,5,...,887,888,889,890,891]]
Survived: [[0,1,1,1,0,...,0,1,0,1,0]]
Pclass: [[3,1,3,1,3,...,2,1,3,1,3]]
Name: [["Braund, Mr. Owen Harris","Cumings, Mrs. John Bradley (Florence Briggs Thayer)","Heikkinen, Miss. Laina","Futrelle, Mrs. Jacques Heath (Lily May Peel)","Allen, Mr. William Henry",...,"Montvila, Rev. Juozas","Graham, Miss. Margaret Edith","Johnston, Miss. Catherine Helen "Carrie"","Behr, Mr. Karl Howell","Dooley, Mr. Patrick"]]
Sex: [["male","female","female","female","male",...,"male","female","female","male","male"]]
Age: [[22,38,26,35,35,...,27,19,null,26,32]]
SibSp: [[1,1,0,1,0,...,0,0,1,0,0]]
Parch: [[0,0,0,0,0,...,0,0,2,0,0]]
Ticket: [["A/5 21171","PC 17599","STON/O2. 3101282","113803","373450",...,"211536","112053","W./C. 6607","1113