## Pandas Data Structures – Overview

Let us go through the key details of Pandas.
* Pandas is not part of the Python standard library, so we need to install it using pip - `pip install pandas`.
* It has 2 main data structures - `Series` and `DataFrame`.
* `Series` is a one-dimensional labeled array while `DataFrame` is a two-dimensional labeled table.
* `Series` contains an index for each row and a single attribute or column.
* `DataFrame` contains an index for each row and multiple columns.
* Each attribute (column) in a `DataFrame` is a `Series`.
* We can perform all standard transformations using Pandas APIs.
* There are also SQL-based wrappers on top of Pandas which let us write queries.
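
The relationship between the two structures can be checked directly. The following is a minimal sketch, assuming Pandas is already installed; the names `emp_df` and `sal_col` are made up for illustration and are not used elsewhere in this section.

```python
import pandas as pd

# Hypothetical sample data: a small table with two columns.
emp_df = pd.DataFrame({'id': [1, 2, 3], 'sal': [1500.0, 2000.0, 2200.0]})

# Selecting one column gives back a Series that shares the row index.
sal_col = emp_df['sal']

print(type(sal_col).__name__)     # Series
print(sal_col.ndim, emp_df.ndim)  # 1 2
print(emp_df.shape)               # (3, 2)
```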

Here are the steps to get started with Pandas Data Structures:
* Make sure the Pandas library is installed using `pip`.
* Import the Pandas library - `import pandas as pd`
* We need a collection, or data in a file, to create Pandas Data Structures.
* Use the appropriate APIs on the data to create Pandas Data Structures.
  * `Series` for a one-dimensional array.
  * `DataFrame` for a two-dimensional array.

```{note}
Typically we use `Series` for a list of regular objects or a dict, and `DataFrame` for a list of tuples or a list of dicts. Let us use a list for `Series` and a list of tuples for `DataFrame`.
```
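
The note also mentions a dict for `Series` and a list of dicts for `DataFrame`, which the cells in this section do not cover. Here is a minimal sketch of both, assuming the same salary values; the names `sals_by_id`, `sals_records` and `sals_from_dicts` are illustrative only.

```python
import pandas as pd

# A dict becomes a Series whose index is taken from the dict keys.
sals_by_id = pd.Series({1: 1500.0, 2: 2000.0, 3: 2200.0}, name='sal')

# A list of dicts becomes a DataFrame; the dict keys become column names.
sals_records = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0},
    {'id': 3, 'sal': 2200.0},
]
sals_from_dicts = pd.DataFrame(sals_records)

print(sals_by_id.index.tolist())         # [1, 2, 3]
print(sals_from_dicts.columns.tolist())  # ['id', 'sal']
```

With a list of dicts the column names come from the data itself, so there is no need to pass `columns` separately.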

In [None]:
%pip install pandas

In [1]:
import pandas as pd

In [2]:
sals_l = [1500.0, 2000.0, 2200.00]

In [3]:
pd.Series?

Init signature:
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool' = False,
    fastpath: 'bool' = False,
)
Docstring:
One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exc

In [4]:
sals_s = pd.Series(sals_l, name='sal')

In [5]:
sals_s

0    1500.0
1    2000.0
2    2200.0
Name: sal, dtype: float64

In [6]:
sals_s[:2]

0    1500.0
1    2000.0
Name: sal, dtype: float64
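
The slice above works by position because `sals_s` has the default integer index. The docstring also mentions label-based indexing, so here is a small sketch contrasting the two using `.iloc` and `.loc`. The string labels `e1`, `e2` and `e3` and the name `sals_labeled` are assumptions made for illustration.

```python
import pandas as pd

# Same values as sals_s, but with hypothetical string labels as the index.
sals_labeled = pd.Series([1500.0, 2000.0, 2200.0],
                         index=['e1', 'e2', 'e3'], name='sal')

# .iloc selects by position; the end of the slice is excluded.
first_two = sals_labeled.iloc[:2]

# .loc selects by label; the end of the slice is included.
up_to_e2 = sals_labeled.loc[:'e2']

print(first_two.tolist())      # [1500.0, 2000.0]
print(up_to_e2.tolist())       # [1500.0, 2000.0]
print(sals_labeled.loc['e3'])  # 2200.0
```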

In [7]:
sals_ld = [(1, 1500.0), (2, 2000.0), (3, 2200.00)]

In [8]:
pd.DataFrame?

Init signature:
pd.DataFrame(
    data=None,
    index: 'Axes | None' = None,
    columns: 'Axes | None' = None,
    dtype: 'Dtype | None' = None,
    copy: 'bool | None' = None,
)
Docstring:
Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.

Parameters
----------
data : ndarray (structured or hom

In [9]:
sals_df = pd.DataFrame(sals_ld, columns=['id', 'sal'])

In [10]:
sals_df

   id     sal
0   1  1500.0
1   2  2000.0
2   3  2200.0

In [11]:
sals_df['id']

0    1
1    2
2    3
Name: id, dtype: int64
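
Selecting a column is the starting point for most transformations. As a rough sketch of a few common ones (filtering, aggregating and deriving a column), the example below rebuilds the same `sals_df`; the `bonus` column and its 10% rate are made-up values for illustration.

```python
import pandas as pd

sals_df = pd.DataFrame([(1, 1500.0), (2, 2000.0), (3, 2200.0)],
                       columns=['id', 'sal'])

# A single label returns a Series; a list of labels returns a DataFrame.
print(type(sals_df['sal']).__name__)    # Series
print(type(sals_df[['sal']]).__name__)  # DataFrame

# Filter rows with a boolean condition on a column.
high_sals = sals_df[sals_df['sal'] >= 2000.0]
print(high_sals['id'].tolist())  # [2, 3]

# Aggregate a column.
print(sals_df['sal'].sum())  # 5700.0

# Derive a new column; assign returns a new DataFrame and leaves sals_df as is.
with_bonus = sals_df.assign(bonus=sals_df['sal'] * 0.1)
print(with_bonus.columns.tolist())  # ['id', 'sal', 'bonus']
```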