In [69]:
import pandas as pd
import numpy as np

### Series
A Series is a 1D labeled array capable of holding any data type (integers, strings, floats, Python objects, ...).  
The axis labels are collectively referred to as the index.  
E.g.: s = pd.Series(data, index=index)  

Here, data can be many different things:  
   - a Python dict
   - an ndarray
   - a scalar value (like 5)

In [70]:
s = pd.Series(np.random.randn(5),index=list('abcde'))

In [71]:
s

a    1.307574
b    0.329669
c   -0.641361
d    0.528699
e   -0.793238
dtype: float64

In [72]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [73]:
pd.Series(np.random.randn(5))

0    1.264824
1   -0.493394
2    0.212319
3   -0.901024
4    1.133160
dtype: float64

### From dict
If data is a dict and an index is passed, the values in data corresponding to the labels in the index will be pulled out.  
Otherwise, an index will be constructed from the keys of the dict: older pandas versions sort them when possible, while recent versions keep the dict's insertion order.

In [74]:
d = {'a':0.,'b':1.,'c':2.}

In [75]:
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [76]:
pd.Series(d,index=list('bceda'))

b    1.0
c    2.0
e    NaN
d    NaN
a    0.0
dtype: float64
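
The lookup behavior above can be sketched as follows. The names `prices`, `wanted`, and `picked` are illustrative and not part of the original notebook. This is a minimal sketch that shows how to find the labels for which the dict had no entry:

```python
import pandas as pd

# Hypothetical data, for illustration only.
prices = {'a': 0.0, 'b': 1.0, 'c': 2.0}
wanted = ['b', 'c', 'e']

# Labels in `wanted` pull their values from the dict; 'e' has no entry.
picked = pd.Series(prices, index=wanted)

# isna() flags the labels that were not found in the dict.
missing_labels = picked.index[picked.isna()].tolist()
print(missing_labels)
```

The result keeps the order of the passed index, not the order of the dict.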

### From scalar value
If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [77]:
pd.Series(5.,index=list('abcd'))

a    5.0
b    5.0
c    5.0
d    5.0
dtype: float64

### Series as ndarray
A Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.

In [78]:
s.iloc[0]  # positional access; plain s[0] on a labeled index is deprecated in recent pandas

1.307574353133605

In [79]:
s[2:]

c   -0.641361
d    0.528699
e   -0.793238
dtype: float64

In [80]:
s[s>s.median()]

a    1.307574
d    0.528699
dtype: float64

In [81]:
s.iloc[[4, 2, 0]]  # positional access; s[[4,2,0]] on a labeled index is deprecated in recent pandas

e   -0.793238
c   -0.641361
a    1.307574
dtype: float64

In [82]:
np.exp(s)

a    3.697195
b    1.390508
c    0.526575
d    1.696723
e    0.452378
dtype: float64

In [83]:
s

a    1.307574
b    0.329669
c   -0.641361
d    0.528699
e   -0.793238
dtype: float64
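
As a small sketch of the ndarray-like behavior described in this section (the Series `values` is made-up data, not from the notebook), NumPy functions keep the labels, while `to_numpy()` gives back a plain array without them:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 4.0, 9.0], index=list('xyz'))  # illustrative data

# NumPy ufuncs applied to a Series return a Series with the same index.
roots = np.sqrt(values)

# Positional slicing keeps the matching labels alongside the values.
tail = values.iloc[1:]

# to_numpy() drops the labels and returns a plain ndarray.
raw = values.to_numpy()
```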

### Series as dict
A Series is like a fixed-size dict in that you can get and set values by index label:

In [84]:
s['a']

1.307574353133605

In [85]:
s['e']=12.23332

In [86]:
'e' in s

True

In [87]:
s.get('f',np.nan)

nan
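
To make the dict analogy concrete, here is a hedged sketch (the Series `s_demo` is illustrative only) that contrasts bracket access, which raises on a missing label, with `get`, which returns a default:

```python
import pandas as pd

s_demo = pd.Series([1.0, 2.0], index=['a', 'b'])  # illustrative data

# Bracket access on a missing label raises KeyError, as with a dict.
try:
    s_demo['f']
    found = True
except KeyError:
    found = False

# get() returns the supplied default instead of raising.
fallback = s_demo.get('f', -1.0)

# Assigning to an existing label updates the value in place.
s_demo['b'] = 20.0
```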

You can access an index label on a Series, a column on a DataFrame, and an item on a Panel directly as an attribute, and modify it the same way. (Panel exists only in older pandas versions.)

In [88]:
s.a

1.307574353133605
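
Attribute access is convenient, but it only works when the label does not collide with an existing attribute or method. The following sketch (with a made-up Series `s_attr`) illustrates one such collision:

```python
import pandas as pd

# 'mean' is both an index label here and the name of a Series method.
s_attr = pd.Series([10.0, 20.0], index=['a', 'mean'])

via_attr = s_attr.a              # works: 'a' does not clash with anything
via_label = s_attr['mean']       # bracket access always reaches the label
clashes = callable(s_attr.mean)  # attribute access finds the method instead
```

When in doubt, bracket access is the safer choice.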

#### Vectorized operations and label alignment with Series
When doing data analysis, as with raw NumPy arrays, looping through a Series value by value is usually not necessary. A Series can also be passed into most NumPy methods expecting an ndarray.

In [89]:
s+s

a     2.615149
b     0.659338
c    -1.282722
d     1.057397
e    24.466640
dtype: float64

In [90]:
s*2

a     2.615149
b     0.659338
c    -1.282722
d     1.057397
e    24.466640
dtype: float64

A key difference between ndarray and Series is that operations between Series automatically align the data based on label.  
Thus you can write computations without considering whether the Series involved have the same labels.

In [91]:
s[1:]+s[:-1]

a         NaN
b    0.659338
c   -1.282722
d    1.057397
e         NaN
dtype: float64

In [92]:
s2 = pd.Series(np.random.randn(4),index=list('abcd'))

In [93]:
s+s2

a    0.246488
b   -0.070441
c   -0.762355
d    1.154501
e         NaN
dtype: float64

The result of an operation between unaligned Series will have the union of the indices involved. If a label is not found in __one Series or the other__, the result will be marked as missing (__NaN__).  
You of course have the option of dropping the labels with missing data via the __dropna__ method.
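
A short sketch of the options for handling unaligned labels, using two made-up Series `left` and `right`: drop the missing results with `dropna`, or, as an alternative not covered above, treat the missing side as zero with `Series.add(..., fill_value=0)`.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=list('abc'))      # illustrative data
right = pd.Series([10.0, 20.0, 30.0], index=list('bcd'))  # illustrative data

total = left + right                    # union of labels; 'a' and 'd' become NaN
overlap = total.dropna()                # keep only labels present in both
filled = left.add(right, fill_value=0)  # treat a missing side as 0 instead
```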

## DataFrame
__DataFrame__ is a 2D labeled data structure with columns of potentially different types.
It accepts many kinds of input, such as:
- Dict of 1D ndarrays, lists, dicts, or Series
- 2D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame

Along with data, you can optionally pass the __index__ (row labels) and __columns__ (column labels) arguments. 
If you pass them, the resulting DataFrame is guaranteed to have exactly those labels. Thus a dict of Series plus a specific index will discard all data not matching the passed index.

### From dict of Series or dicts
The resulting __index__ will be the union of the indexes of the various Series.  
If there are any nested dicts, these will first be converted into Series.  
If no columns are passed, the columns will be the dict keys: sorted in older pandas versions (as in the outputs here), in insertion order in recent ones.

In [94]:
d = {'one':pd.Series([1.,2.,3.],index=list('abc')),
     'two':pd.Series(np.random.randn(4),index=list('bcde')),
    'three':pd.Series(np.random.randn(4),index=list('abcd'))
    }

In [95]:
df = pd.DataFrame(d)

In [97]:
df

   one     three       two
a  1.0 -0.346026       NaN
b  2.0  0.166429  1.706309
c  3.0 -1.478164 -0.618040
d  NaN -0.400184 -2.549495
e  NaN       NaN  0.585074


In [116]:
pd.DataFrame(d,index=list('dba'))

   one     three       two
d  NaN -0.400184 -2.549495
b  2.0  0.166429  1.706309
a  1.0 -0.346026       NaN


In [119]:
pd.DataFrame(d,index=list('dab'),columns=['one','two'])

   one       two
d  NaN -2.549495
a  1.0       NaN
b  2.0  1.706309


In [100]:
pd.DataFrame({"a":{"b":[1,2,3],"c":2},"a1":{"c":4}})

           a   a1
b  [1, 2, 3]  NaN
c          2  4.0


In [101]:
df.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [102]:
df.columns

Index(['one', 'three', 'two'], dtype='object')

#### From dict of ndarrays / lists
The ndarrays must all be the same length.  
If an index is passed, it must be the same length as the arrays.  
If no index is passed, the result will be range(n), where n is the array length.

In [123]:
pd.DataFrame(
    {
    'one':[1.,2.,3.,4.],
    'two':range(4)
    },
    index=list('abcd')
)

   one  two
a  1.0    0
b  2.0    1
c  3.0    2
d  4.0    3
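
The length requirement can be checked with a minimal sketch (the column names simply mirror the example above). Lists of unequal length are rejected with a `ValueError`, and omitting the index yields integer row labels:

```python
import pandas as pd

# Lists of different lengths cannot be combined into one DataFrame.
try:
    pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [1, 2]})
    built = True
except ValueError:
    built = False

# With equal lengths and no index, rows are labeled 0..n-1.
ok = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [1, 2, 3]})
```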


#### From structured or record array

In [104]:
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])  # 'S10': 10-byte string (the old 'a10' alias was removed in NumPy 2.0)

In [105]:
data

array([(0,  0., b''), (0,  0., b'')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

In [106]:
data[:]= [(1,2.,'Hello'),(2,3.3,'World')]

In [107]:
pd.DataFrame(data)

   A    B         C
0  1  2.0  b'Hello'
1  2  3.3  b'World'


In [108]:
pd.DataFrame(data,index=['first','second'])

        A    B         C
first   1  2.0  b'Hello'
second  2  3.3  b'World'


In [109]:
pd.DataFrame(data,columns=['C','A','B'])

          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.3
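
Column C above holds bytes objects because the field was declared as a byte string. As a hedged sketch (the names `records` and `frame` are illustrative), the column can be turned into regular strings with the `.str.decode` accessor, assuming the bytes are UTF-8 encoded:

```python
import numpy as np
import pandas as pd

# Same layout as the structured array above, built in one step.
records = np.array([(1, 2.0, b'Hello'), (2, 3.3, b'World')],
                   dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])

frame = pd.DataFrame(records)

# Decode the bytes column to str; the encoding is an assumption.
frame['C'] = frame['C'].str.decode('utf-8')
```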


#### From a list of dicts

In [110]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [111]:
pd.DataFrame(data2)

   a   b     c
0  1   2   NaN
1  5  10  20.0


In [112]:
pd.DataFrame(data2,index=['first','second'])

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [113]:
pd.DataFrame(data2,columns=['b','a','c'])

    b  a     c
0   2  1   NaN
1  10  5  20.0
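
In the outputs above, column c shows 20.0 although the input was the integer 20. A small sketch of why (the names `rows` and `frame` are illustrative): a key missing from one dict becomes NaN, and NaN forces that column to a floating-point dtype.

```python
import pandas as pd

rows = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
frame = pd.DataFrame(rows)

# The first dict has no 'c', so that cell is NaN.
c_missing = frame['c'].isna().tolist()

# Columns without gaps stay integer; 'c' is upcast to float to hold NaN.
dtypes = frame.dtypes.astype(str).to_dict()
print(dtypes)
```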
