# Introduction to nested dtypes: List, Struct and Object
By the end of this lecture you will be able to:
- create columns with List, Struct and Object dtypes
- explain the difference between the List, Struct and Object dtypes
- unnest the fields in a Struct dtype

In [None]:
import polars as pl

## `pl.List` dtype
With a `pl.List` dtype each row holds a `Series`, and every `Series` in the column has the same dtype.

We can create a `pl.List` column manually from Python `list`s *where all elements of each `list` have the same type or can be cast to the same type, e.g. `int` to `float`*.

In [None]:
dfLists = pl.DataFrame({
    'ints':[ 
        [0,1], 
        [2,3]
    ],
    'floats':[ 
        [0.0,1], 
        [2,3]
    ],
    'strings':[ 
        ["0","1"],
        ["2","3"]
    ]
})
dfLists

We cover the `pl.List` dtype in the lectures that follow.

## `pl.Object` dtype
We get a column with a `pl.Object` dtype when the lists cannot be cast to a homogeneous type.

In [None]:
dfObject = pl.DataFrame({
    'mixed':[ 
        ['a',0],
        ['b',1]
    ]
})
dfObject

The "list" on each row in a **`pl.Object`** column is a standard Python `list` under the hood.

In [None]:
dfObject[0,0]

In [None]:
type(dfObject[0,0])

Operations on a `pl.Object` column are slow because they work on Python `list` objects rather than on fast Polars `Series`.

We generally want to avoid working with a `pl.Object` dtype if possible. For example, it may be better to cast integers to strings to have a string `pl.List` column rather than a `pl.Object` column.

## `pl.Struct` dtype
The `pl.Struct` dtype also has a collection of data on each row.

We create a `pl.Struct` column by passing a list of `dicts` where:
- the `dict` on each row has the same keys
- the values for each key on each row have the same dtype

In [None]:
dfStructs = pl.DataFrame(
    {
        "year": [2020, 2021],
        "trades": [
            {"exporter": "India", "importer": "USA", "quantity": 0.0},
            {"exporter": "India", "importer": "USA", "quantity": 1.5},
        ],
    }
)
dfStructs

The keys in a struct column are called `fields`.

We can list the field names with `struct.fields` on a `Series`.

In [None]:
dfStructs["trades"].struct.fields

## Accessing `pl.Struct` fields

We access fields within a struct column in an expression.

In [None]:
(
    dfStructs
    .select(
        pl.col("trades").struct.field("exporter")
    )
)

## Extracting data from a `pl.Struct`

We can convert a struct `Series` into its own multi-column `DataFrame`.

In [None]:
dfStructs["trades"].struct.unnest()

We can also unnest a `pl.Struct` column so that its fields become columns in the `DataFrame`.

In [None]:
dfStructs.unnest("trades")

Struct columns are useful for having nested data within a column of a `DataFrame`.

In this example we keep the `quantity` field at the top level of the `pl.Struct` but move the `importer`/`exporter` fields into a nested level within the `pl.Struct`.

In [None]:
dfStructsDeep = pl.DataFrame(
    {
        "trades": [
            {
                "countries": {"exporter": "India", "importer": "USA"},
                "quantity": 0.0,
            },
            {
                "countries": {"exporter": "India", "importer": "USA"},
                "quantity": 1.5,
            },
        ]
    }
)
dfStructsDeep

We can do fast operations on a `pl.Struct` dtype because we are working with Polars objects rather than Python `lists`.

## Exercises
In the quiz in this section you will develop your understanding of:
- creating `pl.List` columns
- creating `pl.Object` columns
- creating `pl.Struct` columns