# Intro to Data Structures
http://pandas.pydata.org/pandas-docs/stable/dsintro.html

In [1]:
# Load packages

import numpy as np
import pandas as pd

## Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is to call `pd.Series(data, index=index)`, where `data` can be an ndarray, a Python dict, or a scalar value:

In [6]:
# Create a Series from an ndarray, with and without an explicit index

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(pd.Series(np.random.randn(5)))  # no index passed: defaults to 0..4
print(s)
print(s.index)

0   -1.614897
1    0.609289
2    0.059510
3    0.286530
4    0.210087
dtype: float64
a   -1.774420
b   -0.679498
c   -1.310559
d    1.359342
e    1.259230
dtype: float64
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')


In [8]:
# Instantiation from a dict

d = {'b': 1, 'a': 0, 'c': 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64
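
When an index is passed along with a dict, pandas pulls out the values that match those labels, and any label missing from the dict gets NaN. A minimal sketch of that behavior (the name `s_from_dict` and the extra label `'d'` are illustrative and not part of the original notebook):

```python
import numpy as np
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# The index controls both the order and the set of labels.
# 'd' has no entry in the dict, so its value is NaN and the
# dtype is upcast from int64 to float64.
s_from_dict = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s_from_dict)
```

Note that the integer values become floats here, because NaN cannot be stored in a plain int64 Series.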

In [9]:
# Instantiation from a scalar value (repeated to match the length of the index)
pd.Series(5, index=['a', 'b', 'c'])

a    5
b    5
c    5
dtype: int64

## Series is ndarray-like
Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.

In [12]:
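print(s.iloc[0])          # first element by position
print(s[:3])              # slicing also slices the index
print(s[s > s.median()])  # boolean mask: values above the median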


-1.7744202459969702
a   -1.774420
b   -0.679498
c   -1.310559
dtype: float64
d    1.359342
e    1.259230
dtype: float64


In [19]:
'e' in s

True
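
Because a Series carries labels as well as values, it supports positional access, label-based access, and dict-style membership tests. The sketch below uses fixed values instead of random ones so the results are reproducible; `s_demo` and the other variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

first = s_demo.iloc[0]      # positional access
by_label = s_demo['b']      # label-based access, like a dict
missing = s_demo.get('z')   # .get returns None for an absent label
sliced = s_demo.iloc[:2]    # the slice keeps the matching index labels
arr = s_demo.to_numpy()     # the underlying values as a plain ndarray

print(first, by_label, missing)
print(sliced.index.tolist())
print(type(arr))
```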

## Vectorized operations and label alignment with Series

In [20]:
s + s 

a   -3.548840
b   -1.358995
c   -2.621119
d    2.718684
e    2.518461
dtype: float64

In [21]:
s * 2

a   -3.548840
b   -1.358995
c   -2.621119
d    2.718684
e    2.518461
dtype: float64

In [22]:
np.exp(s)

a    0.169582
b    0.506872
c    0.269669
d    3.893630
e    3.522709
dtype: float64

In [23]:
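# Operands are aligned on labels, not positions. 'a' and 'e' each
# appear in only one of the two slices, so the result there is NaN.
s[1:] + s[:-1]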

a         NaN
b   -1.358995
c   -2.621119
d    2.718684
e         NaN
dtype: float64
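
The result of an operation between differently indexed Series has the union of the two indexes, with NaN wherever a label is missing from either operand. A small sketch of two common ways to handle those gaps, using fixed values (`s_demo`, `tail`, and `head` are illustrative names):

```python
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

tail = s_demo.iloc[1:]    # labels b, c, d
head = s_demo.iloc[:-1]   # labels a, b, c

aligned = tail + head     # union of labels; 'a' and 'd' become NaN
dropped = aligned.dropna()              # discard labels with missing data
filled = tail.add(head, fill_value=0)   # treat a missing operand as 0

print(aligned)
print(dropped)
print(filled)
```

Whether to drop or fill depends on what a missing label means in the data at hand; neither is a universal default.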

In [29]:
s = s.rename('Test')
print(s.name)

Test
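
A Series name can also be set at construction time, and `rename` returns a new Series without changing the original. A brief sketch (the names `'counts'` and `'totals'` are made up for illustration):

```python
import pandas as pd

s_named = pd.Series([1, 2, 3], name='counts')
s_renamed = s_named.rename('totals')   # new object; s_named is unchanged

# The name becomes the column label when converting to a DataFrame
frame = s_named.to_frame()

print(s_named.name, s_renamed.name)
print(frame.columns.tolist())
```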


## DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input, such as:

- a dict of 1D ndarrays, lists, dicts, or Series
- a 2-D numpy.ndarray
- a structured or record ndarray
- a Series
- another DataFrame
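
A few of the input kinds listed above can be sketched as follows. All variable names, column names, and values here are illustrative and not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column names
df_lists = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [4.0, 5.0, 6.0]},
                        index=['a', 'b', 'c'])

# From a 2-D ndarray: column labels are supplied separately
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

# From a single named Series: the name becomes the column label
df_series = pd.DataFrame(pd.Series([1, 2, 3], name='vals'))

print(df_lists)
print(df_array)
print(df_series)
```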

In [30]:
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

In [32]:
df = pd.DataFrame(d)

In [33]:
df.head()

|   | one | two |
|---|-----|-----|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


In [34]:
pd.DataFrame(d, index=['d','b','a'], columns=['two', 'andy'])

|   | two | andy |
|---|-----|------|
| d | 4.0 | NaN  |
| b | 2.0 | NaN  |
| a | 1.0 | NaN  |
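
Passing `index` and `columns` together with a dict of Series selects and reorders the data: dict keys not listed in `columns` are dropped, and requested columns with no matching key are filled with NaN. A self-contained sketch of the call above (`df_sub` is an illustrative name):

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

# 'one' is not requested, so it is dropped.
# 'andy' is not a key in d, so the whole column is NaN.
df_sub = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'andy'])

print(df_sub)
print(df_sub['andy'].isna().all())
```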


In [35]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [36]:
df.columns

Index(['one', 'two'], dtype='object')
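
The row and column labels together determine the shape of the frame, and selecting a single column returns a Series that shares the frame's index. A short sketch rebuilding the same `df` so it runs on its own (`col`, `n_rows`, and `n_cols` are illustrative names):

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

n_rows, n_cols = df.shape   # one row per index label, one column per key
col = df['one']             # a Series indexed by df.index

print(n_rows, n_cols)
print(col)
```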