# Data Structures 

In [1]:
import numpy as np
import pandas as pd

Here is a basic tenet to keep in mind: **data alignment is intrinsic**. The link between labels and data will not be broken unless you break it explicitly.

## Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call: 

In [2]:
# s= pd.Series(data, index=index)

#### from ndarray 

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

In [3]:
s = pd.Series(np.random.rand(5), index=['a', 'b', 'c', 'd', 'e'])

In [4]:
s

a    0.148030
b    0.234606
c    0.359841
d    0.586602
e    0.198871
dtype: float64

In [5]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
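The two rules for ndarray input (default index when none is passed, matching lengths when one is) can be checked with a small sketch. The names `values`, `default_indexed` and `mismatch_raised` are made up for this illustration.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])  # illustrative data only

# No index passed: pandas creates the labels 0, 1, ..., len(data) - 1.
default_indexed = pd.Series(values)
print(default_indexed.index.tolist())  # [0, 1, 2]

# An index whose length differs from the data is rejected.
mismatch_raised = False
try:
    pd.Series(values, index=['a', 'b'])
except ValueError as exc:
    mismatch_raised = True
    print('ValueError:', exc)
```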

Note that pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception is raised at that point, not when the Series is created. This lazy check is almost entirely for performance: many computations, such as parts of GroupBy, never use the index.
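A minimal sketch of that lazy behaviour, using `reindex` as one example of an operation that needs unique labels. The Series `dup` and the other variable names are illustrative.

```python
import pandas as pd

# Building a Series with a repeated label succeeds without complaint.
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(dup.index.is_unique)

# Looking up the repeated label returns every matching row.
selected = dup['a']
print(selected)

# reindex needs unique labels, so the error only appears here.
reindex_error = None
try:
    dup.reindex(['a', 'b', 'c'])
except ValueError as exc:
    reindex_error = exc
    print('ValueError:', exc)
```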

#### from dict 

If data is a dict and an index is passed, the values in data corresponding to the labels in the index will be pulled out. Otherwise, the index is built from the dict's keys: in insertion order on pandas 0.23 or later with Python 3.6 or later, and from the sorted keys (if possible) on older versions.

In [6]:
d = {'a': 0., 'b': 1., 'c': 2.}

In [7]:
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [8]:
pd.Series(d, index=['b', 'c', 'd', 'a'])

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

Note that NaN (not a number) is the standard missing-data marker used in pandas.
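Missing entries can be located with `isna()`. The sketch below rebuilds the Series from the previous cell under the illustrative name `with_missing`.

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

# 'd' is not a key of the dict, so its value is NaN.
with_missing = pd.Series(d, index=['b', 'c', 'd', 'a'])

# isna() returns a boolean Series with the same labels.
missing_mask = with_missing.isna()
print(missing_mask)

# Summing the mask counts the missing entries.
print(int(missing_mask.sum()))  # 1
```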

#### from scalar value 

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [9]:
pd.Series(5., index=['a', 'b', 'c'])

a    5.0
b    5.0
c    5.0
dtype: float64

#### slicing Series

Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.

In [10]:
s[:3]

a    0.148030
b    0.234606
c    0.359841
dtype: float64
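To see the labels travel with the data, here is a small sketch on fixed values (the Series `t` is illustrative, chosen so that the results do not depend on random numbers). It also shows a boolean mask, which behaves the same way, and `to_numpy()`, which drops the labels.

```python
import pandas as pd

t = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=['a', 'b', 'c', 'd', 'e'])

# A positional slice keeps the matching labels a, b, c.
head = t[:3]
print(head.index.tolist())   # ['a', 'b', 'c']

# A boolean mask keeps only the labels where the condition holds.
above = t[t > 0.25]
print(above.index.tolist())  # ['c', 'd', 'e']

# to_numpy() returns a plain ndarray with no labels attached.
arr = t.to_numpy()
print(type(arr).__name__)    # ndarray
```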

#### indexing Labels

A Series is like a fixed-size dict in that you can get and set values by index label:

In [11]:
s['e']

0.19887113674925827

In [12]:
s.e

0.19887113674925827

In [13]:
'e' in s

True

In [14]:
'f' in s

False
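The cells above only read values. Setting by label and handling a missing label can be sketched as follows, with `t` as an illustrative Series. Indexing with a label that does not exist raises `KeyError`, while `get` returns `None` or a default you supply.

```python
import pandas as pd

t = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Set a value by label, as with a dict.
t['b'] = 20.0
print(t['b'])  # 20.0

# A missing label raises KeyError when indexed directly.
key_error_raised = False
try:
    t['f']
except KeyError:
    key_error_raised = True

# get() returns None, or the supplied default, for a missing label.
missing = t.get('f')
fallback = t.get('f', -1.0)
print(missing, fallback)  # None -1.0
```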

#### Vectorized Operations

In [15]:
s

a    0.148030
b    0.234606
c    0.359841
d    0.586602
e    0.198871
dtype: float64

In [16]:
s + s

a    0.296060
b    0.469212
c    0.719682
d    1.173204
e    0.397742
dtype: float64

In [17]:
np.exp(s)

a    1.159547
b    1.264410
c    1.433102
d    1.797869
e    1.220025
dtype: float64

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN).

In [18]:
s[1:] + s[:-1]

a         NaN
b    0.469212
c    0.719682
d    1.173204
e         NaN
dtype: float64
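The same alignment rule applies to two Series built independently. In this sketch `left` and `right` are illustrative and share only the labels 'b' and 'c'.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Values are matched by label, not by position.
total = left + right

# The result carries the union of both indexes; 'a' and 'd' appear in
# only one operand, so their results are NaN.
print(total)
```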

**Note:**
In general, we chose to make the default result of operations between differently indexed objects yield the union of the indexes in order to avoid loss of information. Having an index label, even though the data is missing, is typically important information as part of a computation. You can drop labels with missing data with the dropna method.

In [19]:
n = s[1:] + s[:-1]

n.dropna()

b    0.469212
c    0.719682
d    1.173204
dtype: float64
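Dropping is not the only option. If treating an absent label as zero makes sense for your data (an assumption that depends on the computation), the `add` method accepts a `fill_value`. A minimal sketch with illustrative Series `left` and `right`:

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Plain addition leaves NaN where a label exists on one side only.
with_nan = left + right

# dropna() removes those labels.
dropped = with_nan.dropna()
print(dropped.index.tolist())  # ['b', 'c']

# add() with fill_value=0 treats the absent side as 0 instead.
filled = left.add(right, fill_value=0)
print(filled.tolist())         # [1.0, 12.0, 23.0, 30.0]
```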

#### Name Attributes

In [20]:
s = pd.Series(np.random.randn(5), name='something')

s

0   -0.039442
1   -1.585085
2    1.284594
3    0.964915
4   -0.254935
Name: something, dtype: float64

You can rename a Series with the pandas.Series.rename() method. Note that s and s2 refer to different objects.

In [21]:
s2 = s.rename('different')

s2.name

'different'
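A quick check that `rename` returns a new object and leaves the original name untouched. The names `original` and `renamed` are illustrative.

```python
import pandas as pd

original = pd.Series([1, 2, 3], name='something')

# rename with a scalar returns a new Series carrying the new name.
renamed = original.rename('different')

print(original.name)        # something
print(renamed.name)         # different
print(renamed is original)  # False
```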