# Introduction to Basic Data Structures in Pandas

Pandas provides two classes for handling data:
1. `Series`: a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.
2. `DataFrame`: a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.

Data alignment is intrinsic to both structures: the link between labels and data will not be broken unless you break it explicitly.
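
As a small illustration of what alignment means in practice, the sketch below adds two `Series` whose labels only partly overlap. The names `left`, `right`, and `total` and the values are made up for this example.

```python
import pandas as pd

# Two Series whose labels overlap only partially (illustrative values).
left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition matches values by label, not by position.
# Labels present in only one operand produce NaN in the result.
total = left + right
print(total)
```

The labels `b` and `c` exist in both operands, so only those two entries get a numeric sum.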

In [9]:
import numpy as np
import pandas as pd

## Series

The axis labels are referred to as the index. The basic method to create a `Series` is to call:

```py
s = pd.Series(data, index=index)
```
Here, `data` can be many different things, including:

- A Python dict
- An ndarray
- A scalar value like 5

The passed `index` is a list of axis labels. How it is used depends on what `data` is:

### From an ndarray
If `data` is an ndarray, `index` must be the same length as `data`. If the lengths differ, pandas raises a `ValueError`. If no index is passed, one will be created with values `[0, ..., len(data) - 1]`.

In [19]:
s = pd.Series(np.random.rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(type(s))
s

<class 'pandas.core.series.Series'>


a    0.961785
b    0.605480
c    0.102688
d    0.950922
e    0.112398
dtype: float64

In [11]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [12]:
pd.Series(np.random.rand(5))

0    0.782993
1    0.477155
2    0.992089
3    0.315390
4    0.656647
dtype: float64
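
The length requirement for ndarray input can be checked directly. The following is a minimal sketch; the names `values`, `mismatch_error`, and `default` are chosen only for this example.

```python
import numpy as np
import pandas as pd

values = np.random.rand(5)

# Five values but only three labels: the lengths do not match.
try:
    pd.Series(values, index=["a", "b", "c"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)

# Without an index, pandas builds a default one: 0, 1, ..., len(values) - 1.
default = pd.Series(values)
print(list(default.index))
```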

### From a Dict

A `Series` can be instantiated from a dict. When no index is passed, the keys become the index in the dict's insertion order:

In [13]:
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

If an index is passed, the values in `data` corresponding to the labels in the index will be pulled out. Dict keys that are not in the index are dropped, and index labels that are not dict keys appear in the result with the value `NaN`.

In [14]:
d = {"a": 0.0, "b": 1.0, "c": 2.0}
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [21]:
pd.Series(d, index=["b", "c"])

b    1.0
c    2.0
dtype: float64

In [22]:
pd.Series(d, index=["b", "c", "a", "d", "e"])

b    1.0
c    2.0
a    0.0
d    NaN
e    NaN
dtype: float64
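
One way to think about passing an index together with a dict is as building the `Series` from the dict and then reindexing it. The sketch below compares the two results for a single example. It shows that they match here and makes no claim about how pandas implements the constructor. The names `labels`, `from_index`, and `reindexed` are illustrative.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}
labels = ["b", "c", "d"]  # drops "a", adds the unknown label "d"

# Passing the index to the constructor ...
from_index = pd.Series(d, index=labels)

# ... compared with building the Series first and reindexing afterwards.
reindexed = pd.Series(d).reindex(labels)

# equals() treats NaN values in the same position as equal.
print(from_index.equals(reindexed))
```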

### From a Scalar Value
If `data` is a scalar value, the value will be repeated to match the length of `index`. If no index is passed, the result is a single-element `Series` with the label `0`.

In [23]:
pd.Series(8)

0    8
dtype: int64

In [24]:
pd.Series(5, index=['a', 'b', 'c'])

a    5
b    5
c    5
dtype: int64

## Series is ndarray-like

`Series` acts very similarly to an `ndarray` and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.

In [25]:
s

a    0.961785
b    0.605480
c    0.102688
d    0.950922
e    0.112398
dtype: float64

In [27]:
print(s.iloc[0])
s[:3]

0.9617845382948971


a    0.961785
b    0.605480
c    0.102688
dtype: float64

In [29]:
print(s.mean())
s[s > s.mean()]

0.5466547632641244


a    0.961785
b    0.605480
d    0.950922
dtype: float64

In [30]:
s.iloc[[4, 3, 1]]

e    0.112398
d    0.950922
b    0.605480
dtype: float64

In [31]:
np.exp(s)

a    2.616361
b    1.832132
c    1.108146
d    2.588096
e    1.118959
dtype: float64
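
The sketch below collects the ndarray-like operations from this section in one place, using fixed values instead of random ones so the results are reproducible. Positional access is written with `.iloc`, because recent pandas releases warn about integer keys passed to plain `[]` on a `Series` with a non-integer index. The variable names are illustrative, and `to_numpy()` is added here only to contrast a labeled `Series` with a plain array.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values (not the random ones used earlier).
s = pd.Series([0.9, 0.6, 0.1, 0.95, 0.11], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]            # positional access to the first element
picked = s.iloc[[4, 3, 1]]   # positional selection keeps the labels e, d, b
by_label = s["a"]            # label-based access to the same first element

# NumPy functions return a Series that carries the original index.
exp_s = np.exp(s)

# to_numpy() drops the labels and returns a plain ndarray.
raw = s.to_numpy()

print(picked)
print(type(exp_s), type(raw))
```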