# Introduction
- Pandas is a package built on top of NumPy
- Pandas provides an efficient implementation of a DataFrame
- Pandas offers a convenient storage interface for labelled data
- Pandas implements a number of powerful data operations familiar to users of both DATABASE FRAMEWORKS and SPREADSHEET PROGRAMS

## DataFrames
- DATAFRAMES are two-dimensional arrays with attached row and column labels
- DATAFRAMES are heterogeneous: each column can hold a different data type
- DATAFRAMES are a collection of one or more SERIES that share the same index
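
The points above can be sketched with a small example. The names and values below are hypothetical sample data, chosen only for illustration: each column keeps its own dtype, and selecting one column gives back a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate mixed column types.
people = pd.DataFrame({
    'name': ['Asha', 'Ravi', 'Meera'],
    'age': [31, 45, 27],
    'height_m': [1.62, 1.75, 1.68],
})

# Each column carries its own dtype (text, integer, float).
column_dtypes = people.dtypes

# Selecting a single column returns a Series with the same row index.
age_column = people['age']

print(column_dtypes)
print(type(age_column))
```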

In [1]:
import pandas as pd


## Pandas Objects:
- SERIES: 1d data, i.e. a single column. Consists of a NumPy array of values plus an index
- DATAFRAMES: a collection of Series that share an index

In [5]:
s=pd.Series([5,6,7,8,9])

In [6]:
s

0    5
1    6
2    7
3    8
4    9
dtype: int64

In [7]:
type(s.values)

numpy.ndarray

In [8]:
s.values

array([5, 6, 7, 8, 9])

In [9]:
s.index

RangeIndex(start=0, stop=5, step=1)

In [12]:
t = pd.Series([7,8,9,10,11,12], index = ['abc','def','ghi','jkl','lmn','opq'])

In [13]:
t

abc     7
def     8
ghi     9
jkl    10
lmn    11
opq    12
dtype: int64

In [20]:
s[4]

9

In [22]:
t['def']

8

In [27]:
# Unlike dictionary keys, Series index labels can repeat
t = pd.Series([7,8,9,10,11,12], index = ['abc','abc','abc','abc','abc','abc'])
t

abc     7
abc     8
abc     9
abc    10
abc    11
abc    12
dtype: int64
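
Repeated labels change what a lookup returns. A minimal sketch, using a hypothetical Series named `dup`: a label that occurs more than once gives back a Series of all matches, while a label that occurs once gives back a scalar. `Index.is_unique` reports which case applies.

```python
import pandas as pd

# 'abc' appears twice, 'xyz' once.
dup = pd.Series([7, 8, 9], index=['abc', 'abc', 'xyz'])

# A repeated label returns every matching entry as a Series...
repeated = dup['abc']
# ...while a label that occurs once returns a single value.
single = dup['xyz']

print(dup.index.is_unique)  # False
print(list(repeated))       # [7, 8]
print(single)               # 9
```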

In [48]:
#Converting dictionary to Series:
db = {'mumbai':10000, 'kashmir':5000,'delhi':3000}

In [49]:
db

{'delhi': 3000, 'kashmir': 5000, 'mumbai': 10000}

In [50]:
sb_s = pd.Series(db)

In [51]:
sb_s

delhi       3000
kashmir     5000
mumbai     10000
dtype: int64

In [52]:
sb_s.index

Index([u'delhi', u'kashmir', u'mumbai'], dtype='object')

In [53]:
sb_s['delhi':'mumbai']

delhi       3000
kashmir     5000
mumbai     10000
dtype: int64
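
Label slices behave differently from ordinary Python slices: both endpoints are included. The sketch below contrasts that with a positional slice. It builds the index explicitly in the alphabetical order of the recorded output, because on recent Python and pandas versions a Series built from a dict keeps the dict's insertion order instead. The name `cities` is only illustrative.

```python
import pandas as pd

cities = pd.Series([3000, 5000, 10000], index=['delhi', 'kashmir', 'mumbai'])

# Label-based slice: 'delhi' and 'kashmir' are both included.
by_label = cities.loc['delhi':'kashmir']

# Position-based slice: the stop position (1) is excluded.
by_position = cities.iloc[0:1]

print(list(by_label.index))     # ['delhi', 'kashmir']
print(list(by_position.index))  # ['delhi']
```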

In [54]:
sb_s['delhi'] = 8000

In [55]:
sb_s

delhi       8000
kashmir     5000
mumbai     10000
dtype: int64

In [56]:
s = pd.Series([1,2,3,4], index=['a','b','c','d'])

In [57]:
s['a']

1

In [60]:
t = pd.Series(15,index=[1000,2000,3000])

In [61]:
t

1000    15
2000    15
3000    15
dtype: int64

In [62]:
t[1000]

15

## DataFrame Object
- A SERIES is analogous to a 1d array, whereas a DATAFRAME is analogous to a 2d array
- DATAFRAMES are a collection of Series objects

In [63]:
s1 = pd.Series([1,2,3,4])
s2 = pd.Series(['a','b','c','d'])
# Note: no index is given, so both Series get the default index 0, 1, 2, 3

In [66]:
df = pd.DataFrame({'names': s2, 'age':s1})

In [67]:
df

   age names
0    1     a
1    2     b
2    3     c
3    4     d


In [85]:
s1 = pd.Series([1,2,3,4])
s2 = pd.Series(['a','b','c','d'],index=[4,5,6,7])

In [86]:
df = pd.DataFrame({'name': s2, 'age': s1})

In [87]:
# The two indexes differ (0-3 and 4-7), so pandas aligns on the union of the labels and fills the gaps with NaN
df

   age name
0  1.0  NaN
1  2.0  NaN
2  3.0  NaN
3  4.0  NaN
4  NaN    a
5  NaN    b
6  NaN    c
7  NaN    d


In [88]:
df.index

Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

In [89]:
df['age']

0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    NaN
6    NaN
7    NaN
Name: age, dtype: float64
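
The `age` column became `float64` because the missing entries are stored as NaN, which is a floating-point value. If the intent is to pair the two Series by position rather than by label, one possible approach (a sketch, not the only option) is to drop the custom labels with `reset_index(drop=True)` before building the DataFrame.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series(['a', 'b', 'c', 'd'], index=[4, 5, 6, 7])

# Aligning on labels: the result covers the union of both indexes.
aligned = pd.DataFrame({'name': s2, 'age': s1})

# Pairing by position: discard s2's labels so both Series use 0..3.
paired = pd.DataFrame({'name': s2.reset_index(drop=True), 'age': s1})

print(aligned.shape)         # (8, 2)
print(aligned['age'].dtype)  # float64
print(paired.shape)          # (4, 2)
print(paired['age'].dtype)   # int64
```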

In [90]:
import numpy as np

In [106]:
data = np.random.randint(10,size=(5,5))

In [107]:
data

array([[8, 8, 8, 2, 8],
       [4, 4, 4, 9, 9],
       [1, 6, 1, 6, 4],
       [8, 4, 0, 2, 2],
       [7, 5, 2, 0, 8]])

In [108]:
# Attach column names and row labels to a randomly generated NumPy array
df1 = pd.DataFrame(data, columns = ['a','b','c','d','e'], index = ['r1','r2','r3','r4','r5'])

In [109]:
df1

    a  b  c  d  e
r1  8  8  8  2  8
r2  4  4  4  9  9
r3  1  6  1  6  4
r4  8  4  0  2  2
r5  7  5  2  0  8


In [110]:
type(df1)

pandas.core.frame.DataFrame

In [111]:
type(df)

pandas.core.frame.DataFrame

In [112]:
# Chained lookup: select column 'd' first, then row 'r4' within it
df1['d']['r4']

2
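
The same cell can also be reached in a single step. The sketch below retypes the recorded array so the result is reproducible, then compares the chained lookup with `.loc` (row label, column label) and `.at` (a scalar-only accessor). The single-step forms are generally the safer habit, especially for assignment.

```python
import numpy as np
import pandas as pd

# Same values as the recorded random array, typed in so the result is reproducible.
data = np.array([[8, 8, 8, 2, 8],
                 [4, 4, 4, 9, 9],
                 [1, 6, 1, 6, 4],
                 [8, 4, 0, 2, 2],
                 [7, 5, 2, 0, 8]])
df1 = pd.DataFrame(data, columns=list('abcde'),
                   index=['r1', 'r2', 'r3', 'r4', 'r5'])

chained = df1['d']['r4']     # column first, then row
by_loc = df1.loc['r4', 'd']  # row label and column label in one step
by_at = df1.at['r4', 'd']    # fast access to a single scalar

print(chained, by_loc, by_at)  # 2 2 2
```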

In [118]:
# A slice on a DataFrame selects rows; the step of 2 keeps every second row
df1[::2]

    a  b  c  d  e
r1  8  8  8  2  8
r3  1  6  1  6  4
r5  7  5  2  0  8
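
Row slicing can be written more explicitly with `iloc` (positions) or `loc` (labels). A minimal sketch on a hypothetical one-column frame with the same row labels:

```python
import pandas as pd

frame = pd.DataFrame({'a': [8, 4, 1, 8, 7]},
                     index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Every second row, selected by position.
every_second = frame.iloc[::2]

# A range of rows selected by label; both end labels are included.
label_range = frame.loc['r2':'r4']

print(list(every_second.index))  # ['r1', 'r3', 'r5']
print(list(label_range.index))   # ['r2', 'r3', 'r4']
```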


## Pandas Index Object
- An index can be built from a range. Note that pd.Index (the class, capital I) is distinct from the index argument: an Index object can be stored in a variable and reused wherever an index is needed

In [122]:
ab = pd.Index(range(5,10))

In [123]:
data = pd.DataFrame({'Number': range(10,15)},index = ab)

In [124]:
data

   Number
5      10
6      11
7      12
8      13
9      14
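
Two properties of Index objects are worth knowing when reusing them: they are immutable, and they support set-like operations. The sketch below assumes a second, hypothetical index named `other` for the comparison.

```python
import pandas as pd

ab = pd.Index(range(5, 10))
other = pd.Index([8, 9, 10, 11])  # hypothetical second index

# Set-like operations return new Index objects.
common = ab.intersection(other)
combined = ab.union(other)

# An Index cannot be modified in place; item assignment raises TypeError.
try:
    ab[0] = 100
    mutated = True
except TypeError:
    mutated = False

print(list(common))    # [8, 9]
print(list(combined))  # [5, 6, 7, 8, 9, 10, 11]
print(mutated)         # False
```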


## Data Indexing and Selection

Here we will learn:
- loc: accesses elements by index label
- iloc: accesses elements by integer position
- ix: behaves like loc when labels are present, else like iloc; it was deprecated in pandas 0.20 and removed in pandas 1.0, so prefer loc or iloc


In [134]:
d = pd.Series(range(10,20), index = range(100,120,2))

In [135]:
d

100    10
102    11
104    12
106    13
108    14
110    15
112    16
114    17
116    18
118    19
dtype: int64

In [136]:
d.loc[110]

15

In [137]:
d.iloc[9]

19
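
The difference between the two accessors is easiest to see with slices and with values that are valid for one accessor but not the other. A minimal sketch on the same Series: `loc` slices include the stop label, `iloc` slices exclude the stop position, and using a label with `iloc` (or a position with `loc`) raises an error.

```python
import pandas as pd

d = pd.Series(range(10, 20), index=range(100, 120, 2))

by_label = d.loc[100:104]  # labels 100, 102, 104: stop label included
by_position = d.iloc[0:2]  # positions 0 and 1: stop position excluded

# 110 is a label, not a position, so iloc rejects it.
try:
    d.iloc[110]
    position_found = True
except IndexError:
    position_found = False

# 9 is a position, not a label, so loc rejects it.
try:
    d.loc[9]
    label_found = True
except KeyError:
    label_found = False

print(list(by_label))     # [10, 11, 12]
print(list(by_position))  # [10, 11]
print(position_found, label_found)  # False False
```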