# Pandas
##### A Python software library for data manipulation and analysis
##### Offers data structures and operations for manipulating numerical tables and time series
### Pandas main data structures
##### 1. DataFrame: tabular data, like an Excel sheet
##### 2. Series: a single column or row of a table; the data is arranged in a 1-D array
#####    A Series carries its own index, just as a DataFrame does
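To make the relationship between the two structures concrete, here is a minimal sketch. The column names and values are made up for illustration.

```python
import pandas as pd

# A DataFrame is a 2-D table; the column names here are hypothetical.
df = pd.DataFrame({"name": ["Asha", "Ben"], "marks": [82, 91]})

# Selecting a single column returns a Series with the same row index.
marks = df["marks"]

print(type(df).__name__)             # DataFrame
print(type(marks).__name__)          # Series
print(marks.index.equals(df.index))  # True
```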
                    

# Series
### Syntax 
##### series_obj = pd.Series(data, index=labels) --------- index is optional; without it the labels default to 0, 1, 2, ...

#### 1. from a list of values

In [4]:
import pandas as pd 
s1=pd.Series([1,2,3])
print(s1)

0    1
1    2
2    3
dtype: int64


In [5]:
s2=pd.Series([1,2,3],index=['a','b','c'])
print(s2)

a    1
b    2
c    3
dtype: int64
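`pd.Series` also accepts a single scalar instead of a list. In that case an index is needed so pandas knows how many times to repeat the value. A small sketch, with an illustrative variable name:

```python
import pandas as pd

# One scalar is repeated once for every label in `index`.
s_scalar = pd.Series(5, index=["a", "b", "c"])
print(s_scalar)
```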


#### 2. from dictionary

In [6]:
s3=pd.Series({1:'a',2:'b',3:'c'})
print(s3)

1    a
2    b
3    c
dtype: object
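The dictionary keys become the index labels. If an explicit `index` is passed as well, it selects and orders the keys, and labels that are not in the dictionary come out as NaN. A sketch with made-up data:

```python
import pandas as pd

prices = {"apple": 30, "banana": 10}  # hypothetical data

# Only labels listed in `index` are kept; "cherry" is not a key, so it becomes NaN.
s_dict = pd.Series(prices, index=["banana", "cherry"])
print(s_dict)
```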


#### 3. from numpy

In [8]:
import numpy as np
a=np.array([3,5,7])
s=pd.Series(a)
print(s)

0    3
1    5
2    7
dtype: int64


In [11]:
import numpy as np
a=np.array([3,5,7])
s=pd.Series(a,index=['!','@','#'])
print(s)

!    3
@    5
#    7
dtype: int64


## Accessing elements of Series

### Indexing 
#### 1. Positional indexing --> access an element by its position (`.iloc`)
#### 2. Labelled indexing --> access an element by its index label (`.loc` or `[]`)

In [19]:
s2=pd.Series([1,2,3],index=['a','b','c'])
print(s2)
print(s2.iloc[1])    # positional access; plain s2[1] on a labelled index is deprecated in recent pandas
print(s2['b'])

a    1
b    2
c    3
dtype: int64
2
2
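The difference between the two kinds of indexing is easiest to see when the labels are integers that do not match the positions. The sketch below uses the explicit accessors `.loc` (label) and `.iloc` (position); the variable names are illustrative.

```python
import pandas as pd

# Integer labels 1, 2, 3 sit at positions 0, 1, 2.
s_int = pd.Series([10, 20, 30], index=[1, 2, 3])

by_label = s_int.loc[1]      # label 1 -> first element
by_position = s_int.iloc[1]  # position 1 -> second element
print(by_label, by_position)
```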




### Slicing
#### Positional --> excludes the stop position
#### Labelled --> includes the stop label

In [21]:
s2=pd.Series([10,20,30,40,50],index=['a','b','c','d','e'])
print(s2[0:2])
print(s2['a':'c'])

a    10
b    20
dtype: int64
a    10
b    20
c    30
dtype: int64
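The same two slices can be written with the explicit accessors, which makes it clear which rule applies. A minimal sketch:

```python
import pandas as pd

s_slice = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

positional = s_slice.iloc[0:2]   # stop position 2 is excluded -> a, b
labelled = s_slice.loc["a":"c"]  # stop label "c" is included -> a, b, c
print(positional)
print(labelled)
```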


## Attributes of Series
#### name  --> gets or sets the name of the Series
#### index.name --> gets or sets the name of the index
#### size --> number of elements in the Series
#### empty --> True if the Series has no elements
#### values --> the data as a NumPy array, without the index

In [32]:
s2=pd.Series([[1,2,3,4,5],[6,7,8,9,10],11])
print(s2)

0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
2                  11
dtype: object


In [33]:
s2.name='Marks'
s2

0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
2                  11
Name: Marks, dtype: object

In [34]:
s2.index.name='roll no'
s2

roll no
0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
2                  11
Name: Marks, dtype: object

In [35]:
s2.size

3

In [36]:
s2.empty

False

In [37]:
s2.values

array([list([1, 2, 3, 4, 5]), list([6, 7, 8, 9, 10]), 11], dtype=object)
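For comparison, here is the same set of attributes on a plain numeric Series. The marks and roll numbers are made up; `to_numpy()` is used as an alternative to `values`.

```python
import pandas as pd

marks = pd.Series([71, 85, 90], index=[101, 102, 103], name="Marks")
marks.index.name = "roll no"

print(marks.name)        # Marks
print(marks.index.name)  # roll no
print(marks.size)        # 3
print(marks.empty)       # False
print(marks.to_numpy())  # the values as a NumPy array

# A Series with no elements reports empty == True.
no_marks = pd.Series([], dtype="float64")
print(no_marks.empty)    # True
```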

## Methods of Series
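#### head(n) --> returns the first n values (default 5); a negative n returns everything except the last |n| values
#### tail(n) --> returns the last n values (default 5); a negative n returns everything except the first |n| values
#### count() --> number of non-missing (non-NaN) values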

In [38]:
s2.head(1)

roll no
0    [1, 2, 3, 4, 5]
Name: Marks, dtype: object

In [41]:
s2.head(-2)

roll no
0    [1, 2, 3, 4, 5]
Name: Marks, dtype: object

In [42]:
s2.head(-1)

roll no
0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
Name: Marks, dtype: object

In [43]:
s2.head()

roll no
0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
2                  11
Name: Marks, dtype: object

In [44]:
s2.tail(1)

roll no
2    11
Name: Marks, dtype: object

In [45]:
s2.tail(-1)

roll no
1    [6, 7, 8, 9, 10]
2                  11
Name: Marks, dtype: object

In [46]:
s2.tail(-2)

roll no
2    11
Name: Marks, dtype: object

In [47]:
s2.tail()

roll no
0     [1, 2, 3, 4, 5]
1    [6, 7, 8, 9, 10]
2                  11
Name: Marks, dtype: object

In [48]:
s2.count()

3
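In the example above `count()` and `size` agree because nothing is missing. The sketch below, with made-up values, shows where they differ and repeats the negative-`n` behaviour of `head`.

```python
import numpy as np
import pandas as pd

s_nan = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

print(s_nan.size)     # 5: every element, including NaN
print(s_nan.count())  # 3: NaN entries are skipped

first_two = s_nan.head(2)
all_but_last = s_nan.head(-1)  # negative n drops the last |n| rows
print(len(first_two), len(all_but_last))  # 2 4
```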

## Mathematical operations

In [55]:
# Addition aligns on index labels; a label present in only one Series gives NaN
a=pd.Series([1,2,3,4,15],index=['a',2,'c',4,'e'])
b=pd.Series([6,7,8,9,10],index=['a','b','c',4,5])
a+b

2     NaN
4    13.0
5     NaN
a     7.0
b     NaN
c    11.0
e     NaN
dtype: float64

In [56]:
a.add(b,fill_value=3) # a label missing from one Series is treated as 3 there, then the addition is performed

2     5.0
4    13.0
5    13.0
a     7.0
b    10.0
c    11.0
e    18.0
dtype: float64

In [57]:
a-b # equivalent to a.sub(b)

2    NaN
4   -5.0
5    NaN
a   -5.0
b    NaN
c   -5.0
e    NaN
dtype: float64
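The other arithmetic operators follow the same alignment rule. As a sketch, here is multiplication with and without `fill_value`, on two small made-up Series with string labels only:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=["a", "b", "c"])
y = pd.Series([10, 20, 30], index=["b", "c", "d"])

plain = x * y                    # NaN wherever a label is missing on one side
filled = x.mul(y, fill_value=1)  # the missing side is treated as 1
print(plain)
print(filled)
```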

# Dataframes
### can be created in 4 ways
#### 1. from ndarray
#### 2. list of dicts
#### 3. dict of lists
#### 4. from Series

#### 1. from ndarray

In [71]:
s2=np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]])
pd.DataFrame(s2)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 6 | 7 | 8 | 9 | 10 |
| 2 | 11 | 12 | 13 | 14 | 15 |
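By default the rows and columns are labelled 0, 1, 2, and so on. Labels can be supplied through the `index` and `columns` arguments; the names below are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 7).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]

# Row and column labels here are hypothetical.
df_arr = pd.DataFrame(arr, index=["r1", "r2"], columns=["x", "y", "z"])
print(df_arr)
```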


#### 2. list of dicts

In [82]:
l=[{1:'a',2:'b'},{1:'c',4:'d',7:'e'}]
pd.DataFrame(l)

|   | 1 | 2 | 4 | 7 |
|---|---|---|---|---|
| 0 | a | b | NaN | NaN |
| 1 | c | NaN | d | e |


#### 3. dict of lists

In [84]:
d={1:['a','b'],2:['c','d'],3:['e','c']} # each key becomes a column; all lists must have the same length
pd.DataFrame(d)

|   | 1 | 2 | 3 |
|---|---|---|---|
| 0 | a | c | e |
| 1 | b | d | c |


In [85]:
d={1:['a','b'],1:['c','d'],2:['e','c']} # duplicate key 1: the later value overwrites the earlier one
pd.DataFrame(d)

|   | 1 | 2 |
|---|---|---|
| 0 | c | e |
| 1 | d | c |
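The same-length requirement can be checked directly. In this sketch the column names are made up and the lists deliberately differ in length, which pandas rejects with a `ValueError`:

```python
import pandas as pd

error_message = None
try:
    # "a" has two values but "b" has only one.
    pd.DataFrame({"a": [1, 2], "b": [3]})
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```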


#### 4. from Series

In [86]:
a=pd.Series([1,2,3,4,15],index=['a',2,'c',4,'e'])
b=pd.Series([6,7,8,9,10],index=['a','b','c',4,5])
pd.DataFrame([a,b])

|   | a | 2 | c | 4 | e | b | 5 |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 | 4.0 | 15.0 | NaN | NaN |
| 1 | 6.0 | NaN | 8.0 | 9.0 | NaN | 7.0 | 10.0 |
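Passing a list of Series, as above, turns each Series into a row. Passing a dict of Series instead turns each Series into a column, aligned on the index labels. A sketch with made-up data:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 30], index=["x", "z"])

# Dict keys become column names; label "y" is missing from b, so it becomes NaN.
df_cols = pd.DataFrame({"a": a, "b": b})
print(df_cols)
```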


### Operations