### Pandas (from "Panel Data"):
- Open-source Python library for data analysis that provides powerful data structures.
- Its core data structures are **Series** and **DataFrame**.
  - A **Series** is a one-dimensional, array-like object that provides many ways to index data.
    - A Series acts like an ndarray, but it can hold any data type (integers, strings, floating-point numbers, Python objects, etc.).
    - The axis labels are collectively referred to as the index, and we can get and set values by these index labels.
  - A **DataFrame** is a 2-D, size-mutable data structure that supports heterogeneous data, with labeled axes for rows and columns.
    - Arithmetic operations align on both row and column labels.
- Pandas provides efficient tools for splitting data sets, transforming them, and recombining them.
- Pandas simplifies exploratory data analysis (EDA).
- Pandas handles **time-series** data effectively through native methods to ingest, transform, and analyze it.
- Pandas also offers native methods for handling missing data, pivoting, sorting, and summarizing (describing) data, fast plotting, and Boolean indexing for filtering and other masking operations.
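To make the label-alignment point concrete, here is a minimal sketch with two small, made-up frames (a and b are hypothetical names). The result is built on the union of row and column labels, and only cells present in both inputs get a number; every other cell becomes NaN.

```python
import pandas as pd

# Made-up frames whose row and column labels only partly overlap
a = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
b = pd.DataFrame({'y': [10, 20], 'z': [5, 6]}, index=['r2', 'r3'])

# Addition aligns on BOTH axes: only (r2, y) exists in both frames,
# so it is the only cell with a value (4 + 10); the rest are NaN
total = a + b
print(total)
```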

**Import:** Import the Pandas package as pd and then use pd to access the functions from that package.

In [1]:
import pandas as pd

### Pandas Series:  
- Series is a 1-D labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
- Syntax: pd.Series(data, index=index)
  - data can be a 1-D ndarray or list, a Python dictionary, or a scalar value (like 5).
  - The passed index is a list of axis labels.
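The data kinds listed above can be tried directly. A minimal sketch with made-up values: a dictionary supplies its own labels, a scalar needs an explicit index, and a dictionary combined with an explicit index fills unmatched labels with NaN.

```python
import pandas as pd

# From a dict (made-up values): keys become the index labels
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# From a scalar: an index is required; the value is repeated for every label
s_scalar = pd.Series(5, index=['x', 'y', 'z'])

# From a dict plus an explicit index: labels missing from the dict become NaN
s_subset = pd.Series({'a': 1, 'b': 2}, index=['a', 'b', 'd'])

print(s_dict, s_scalar, s_subset, sep='\n\n')
```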


- **Create a Series with the default index:**

In [2]:
s1=pd.Series(['Python','C','C++'])
s1

0    Python
1         C
2       C++
dtype: object

- **Set the index manually:**

In [3]:
s1.index=['a','b','c']
s1

a    Python
b         C
c       C++
dtype: object

- **Create a Series with a manual index:**

In [4]:
s1=pd.Series(data=['Python','C','C++'],index=['a','b','c'])
s1

a    Python
b         C
c       C++
dtype: object

In [5]:
s1=pd.Series(['Python','C','C++'],index=range(1,4))
s1

1    Python
2         C
3       C++
dtype: object

In [6]:
s1=pd.Series(['Python','C','C++'],index=pd.RangeIndex(1,4))
s1

1    Python
2         C
3       C++
dtype: object

In [7]:
s2=pd.Series(['drama', 'love', 'love', 'action', 'drama', 'thriller', 'drama', 'comedy', 'animation', 'love'])
s2

0        drama
1         love
2         love
3       action
4        drama
5     thriller
6        drama
7       comedy
8    animation
9         love
dtype: object

- **Check the index labels:**

In [8]:
s2.index

RangeIndex(start=0, stop=10, step=1)

- **Check the number of dimensions:**

In [9]:
s2.ndim

1

- **Check shape:**

In [10]:
s2.shape

(10,)

- **Check element datatype:**

In [11]:
s2.dtypes

dtype('O')

- **Check unique elements:**

In [12]:
s2.unique()

array(['drama', 'love', 'action', 'thriller', 'comedy', 'animation'],
      dtype=object)

- **Check the number of unique elements:**

In [13]:
s2.nunique()

6

- **Check the frequency of each unique element:**

In [14]:
s2.value_counts()

love         3
drama        3
action       1
animation    1
thriller     1
comedy       1
dtype: int64

- **Check whether all elements of the Series are unique:**

In [15]:
s2.is_unique

False

### **Access Data:**
- **Indexing with [] requires an index label of the Series:**

**NOTE: Pass only index labels that are defined, either manually or by default by pandas.**
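The note above can be checked with a small, made-up Series: a defined label returns its value, an undefined label raises KeyError with loc[], and Series.get can return a fallback value instead of raising.

```python
import pandas as pd

# Made-up Series with manual labels 'a', 'b', 'c'
s = pd.Series(['Python', 'C', 'C++'], index=['a', 'b', 'c'])

print(s['b'])                    # a defined label works
print(s.get('z', 'missing'))     # get() returns the default instead of raising

try:
    s.loc['z']                   # 'z' was never defined as a label
    found = True
except KeyError:
    found = False
print('z defined?', found)
```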

In [16]:
s2[1]

'love'

- **Get the value at an index label using loc[]:**

In [17]:
s2.loc[1]

'love'

- **Access multiple locations:** pass a list of index labels.

In [18]:
s2.loc[[1, 4]]

1     love
4    drama
dtype: object

- **Access data by integer position using iloc[] (here 1:5:3 selects positions 1 and 4):**

In [19]:
s2.iloc[1:5:3]

1     love
4    drama
dtype: object
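loc[] and iloc[] return the same rows on s2 only because its default labels coincide with positions. A sketch with made-up integer labels that differ from the positions shows the difference, including that label slices include their end point while position slices do not.

```python
import pandas as pd

# Made-up integer labels that do NOT match the positions 0, 1, 2
s = pd.Series(['Python', 'C', 'C++'], index=[10, 20, 30])

print(s.loc[20])               # label-based: the element labelled 20 -> 'C'
print(s.iloc[0])               # position-based: the first element -> 'Python'
print(s.loc[10:20].tolist())   # label slice includes the end label -> ['Python', 'C']
print(s.iloc[0:1].tolist())    # position slice excludes the end -> ['Python']
```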

- **Convert to a NumPy ndarray** (s2.to_numpy() is the recommended equivalent in recent pandas versions):

In [20]:
s2.values

array(['drama', 'love', 'love', 'action', 'drama', 'thriller', 'drama',
       'comedy', 'animation', 'love'], dtype=object)

- **Convert to a Python list:**

In [21]:
s2.tolist()

['drama',
 'love',
 'love',
 'action',
 'drama',
 'thriller',
 'drama',
 'comedy',
 'animation',
 'love']

### Pandas DataFrame:
- 2-D labeled data structure whose columns can be of different types.
- Used to represent and work with tabular data.
- Accepts many kinds of input (such as lists, dicts, Series, NumPy ndarrays, or another DataFrame).
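The input kinds listed above can be sketched as follows; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Dict of equal-length lists (made-up data): keys become column labels
df_from_dict = pd.DataFrame({'name': ['abc', 'xyz'], 'age': [23, 26]})

# 2-D NumPy array: column labels are supplied explicitly
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=['A', 'B'])

# Another DataFrame is accepted as input as well
df_copy = pd.DataFrame(df_from_array)

print(df_from_dict, df_from_array, df_copy, sep='\n\n')
```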


- **Create a DataFrame with the default row index and column label:**

In [22]:
df1 = pd.DataFrame(['Python','C','C++'])
df1

        0
0  Python
1       C
2     C++


- **Change the row index and column label:**

In [23]:
df1.columns = ['A']
df1.index = ['a','b','c']
df1

        A
a  Python
b       C
c     C++


- **Create a DataFrame with a manual row index and column label:**

In [24]:
df1 = pd.DataFrame(['Python','C','C++'], columns=['A'], index=['a','b','c'])
df1

        A
a  Python
b       C
c     C++


In [25]:
df2 = pd.DataFrame([['abc',23,4500,'Delhi'],['xyz',26,5000,'Mumbai'],['prq',25,6200,'Pune']])
df2

     0   1     2       3
0  abc  23  4500   Delhi
1  xyz  26  5000  Mumbai
2  prq  25  6200    Pune


In [26]:
df2.columns=['name', 'age', 'salary', 'location']
df2

   name  age  salary  location
0   abc   23    4500     Delhi
1   xyz   26    5000    Mumbai
2   prq   25    6200      Pune


- **Create a DataFrame from a list of Python dictionaries:**

In [27]:
l = [{'name': 'abc', 'age': 23, 'salary':4500, 'location':'Delhi'}, 
     {'name': 'xyz', 'age': 26, 'salary':5000, 'location':'Mumbai'},
     {'name': 'pqr', 'age': 25, 'salary':6200, 'location':'Pune'}]
df2= pd.DataFrame(l)
df2

   age  location  name  salary
0   23     Delhi   abc    4500
1   26    Mumbai   xyz    5000
2   25      Pune   pqr    6200
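The output above lists the columns alphabetically, which is what older pandas releases did for a list of dicts; newer releases keep the key order instead, so the column order can differ depending on the version. A minimal sketch (with a shortened, made-up record list) pins the order explicitly with the columns parameter:

```python
import pandas as pd

# Shortened, made-up records in the same shape as the list above
records = [{'name': 'abc', 'age': 23, 'salary': 4500, 'location': 'Delhi'},
           {'name': 'xyz', 'age': 26, 'salary': 5000, 'location': 'Mumbai'}]

# columns= fixes the column order (and selects which keys to keep)
df = pd.DataFrame(records, columns=['name', 'age', 'salary', 'location'])
print(df.columns.tolist())  # ['name', 'age', 'salary', 'location']
```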


- **Create a DataFrame from a dictionary of pandas Series:**

In [28]:
d= {'name' : pd.Series(['abc','xyz','pqr'], index=[0,1,2]), 
    'age' : pd.Series([23,26,25], index=[0,1,2]),
    'salary' : pd.Series([4500,5000,6200,], index=[0,1,2]),
    'location' : pd.Series(['Delhi','Mumbai', 'Pune'], index=[0,1,2])}
df2= pd.DataFrame(d)
df2

   name  age  salary  location
0   abc   23    4500     Delhi
1   xyz   26    5000    Mumbai
2   pqr   25    6200      Pune


In [29]:
df2['name']
# type(df2['name'])

0    abc
1    xyz
2    pqr
Name: name, dtype: object

In [30]:
df2[['name']]

   name
0   abc
1   xyz
2   pqr
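The difference between df2['name'] and df2[['name']] lies in the return type, which a small check makes explicit. The frame here is a trimmed, made-up copy of df2.

```python
import pandas as pd

# Trimmed, made-up copy of df2
df = pd.DataFrame({'name': ['abc', 'xyz', 'pqr'], 'age': [23, 26, 25]})

col_as_series = df['name']     # single label -> Series
col_as_frame = df[['name']]    # list with one label -> DataFrame

print(type(col_as_series).__name__)            # Series
print(type(col_as_frame).__name__)             # DataFrame
print(col_as_series.shape, col_as_frame.shape)  # (3,) (3, 1)
```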


In [31]:
df2[['name','location']]

   name  location
0   abc     Delhi
1   xyz    Mumbai
2   pqr      Pune


In [32]:
df2.loc[0:1, 'age':'location']

   age  salary  location
0   23    4500     Delhi
1   26    5000    Mumbai


In [33]:
df2.loc[[0,1], ['age','salary','location']]

   age  salary  location
0   23    4500     Delhi
1   26    5000    Mumbai


In [34]:
df2.iloc[0:2, 1:]

   age  salary  location
0   23    4500     Delhi
1   26    5000    Mumbai


In [35]:
df2.iloc[[0,1], [1,2,3]]

   age  salary  location
0   23    4500     Delhi
1   26    5000    Mumbai


- **Select rows that satisfy a condition (Boolean indexing):**

In [36]:
df2[df2.salary>5000]

   name  age  salary  location
2   pqr   25    6200      Pune
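Conditions can also be combined. A sketch on a made-up copy of df2, assuming the same columns: each condition needs its own parentheses, because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Made-up copy of df2 with the same columns
df = pd.DataFrame({'name': ['abc', 'xyz', 'pqr'],
                   'age': [23, 26, 25],
                   'salary': [4500, 5000, 6200],
                   'location': ['Delhi', 'Mumbai', 'Pune']})

# & (and), | (or), ~ (not) combine Boolean masks element-wise
both = df[(df.salary >= 5000) & (df.age > 25)]
either = df[(df.location == 'Delhi') | (df.salary > 6000)]

# isin() tests membership against a list of values
in_cities = df[df.location.isin(['Mumbai', 'Pune'])]

print(both, either, in_cities, sep='\n\n')
```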


- **Add a new column to a DataFrame:**

In [37]:
df2['post'] = ['p1','p2','p3']
df2

   name  age  salary  location  post
0   abc   23    4500     Delhi    p1
1   xyz   26    5000    Mumbai    p2
2   pqr   25    6200      Pune    p3


- **Insert a column at a particular column position:**

In [38]:
df2.insert(0,'emp_ID', [101, 102, 103])
df2

   emp_ID  name  age  salary  location  post
0     101   abc   23    4500     Delhi    p1
1     102   xyz   26    5000    Mumbai    p2
2     103   pqr   25    6200      Pune    p3


- **Add a new row to a DataFrame:**

In [39]:
df2.loc[3] = [104, 'rst', 22, 2100, 'Hyderabad', 'p4']
df2

   emp_ID  name  age  salary   location  post
0     101   abc   23    4500      Delhi    p1
1     102   xyz   26    5000     Mumbai    p2
2     103   pqr   25    6200       Pune    p3
3     104   rst   22    2100  Hyderabad    p4


In [40]:
type(df2)

pandas.core.frame.DataFrame

In [41]:
df2.ndim

2

In [42]:
df2.index

Int64Index([0, 1, 2, 3], dtype='int64')

In [43]:
df2.columns

Index(['emp_ID', 'name', 'age', 'salary', 'location', 'post'], dtype='object')

In [44]:
df2.shape

(4, 6)

In [45]:
df2.dtypes

emp_ID       int64
name        object
age          int64
salary       int64
location    object
post        object
dtype: object

In [46]:
df2.location.unique()

array(['Delhi', 'Mumbai', 'Pune', 'Hyderabad'], dtype=object)

In [47]:
df2['location'].nunique()

4

In [48]:
df2.location.value_counts()

Mumbai       1
Pune         1
Delhi        1
Hyderabad    1
Name: location, dtype: int64
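The description capabilities mentioned in the introduction can be sketched with describe(), which summarises count, mean, standard deviation, min/max and quartiles for numeric columns in one call. The values below are the age and salary columns of df2.

```python
import pandas as pd

# Age and salary values taken from df2
df = pd.DataFrame({'age': [23, 26, 25, 22],
                   'salary': [4500, 5000, 6200, 2100]})

# One row per statistic: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)
```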

- **Sorting:**

In [49]:
df2.sort_values('salary', inplace=False)

   emp_ID  name  age  salary   location  post
3     104   rst   22    2100  Hyderabad    p4
0     101   abc   23    4500      Delhi    p1
1     102   xyz   26    5000     Mumbai    p2
2     103   pqr   25    6200       Pune    p3
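sort_values can also sort in descending order and by several columns at once, and sort_index restores the original row order. A sketch on the name, age and salary values of df2:

```python
import pandas as pd

# name, age and salary values taken from df2
df = pd.DataFrame({'name': ['abc', 'xyz', 'pqr', 'rst'],
                   'age': [23, 26, 25, 22],
                   'salary': [4500, 5000, 6200, 2100]})

# Largest salary first
by_salary_desc = df.sort_values('salary', ascending=False)

# Several keys, each with its own direction
by_age_then_salary = df.sort_values(['age', 'salary'], ascending=[True, False])

# Sort by the row index instead of a column
by_index = by_salary_desc.sort_index()

print(by_salary_desc['name'].tolist())  # ['pqr', 'xyz', 'abc', 'rst']
```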


- **Remove columns and rows:**

In [50]:
df2.drop('emp_ID', axis=1, inplace=False)  # axis=1 drops the emp_ID column; returns a new DataFrame, df2 is unchanged

   name  age  salary   location  post
0   abc   23    4500      Delhi    p1
1   xyz   26    5000     Mumbai    p2
2   pqr   25    6200       Pune    p3
3   rst   22    2100  Hyderabad    p4


In [51]:
df2.drop(3, axis=0, inplace=False)   # axis=0 drops the row with index label 3 (its values in all columns)

   emp_ID  name  age  salary  location  post
0     101   abc   23    4500     Delhi    p1
1     102   xyz   26    5000    Mumbai    p2
2     103   pqr   25    6200      Pune    p3
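drop also accepts the columns= and index= keywords, which avoids having to remember which axis number means what. A sketch on a trimmed, made-up frame:

```python
import pandas as pd

# Trimmed, made-up frame
df = pd.DataFrame({'emp_ID': [101, 102, 103],
                   'name': ['abc', 'xyz', 'pqr']})

# Keyword form of drop(): no axis number needed
no_id = df.drop(columns='emp_ID')
no_first_row = df.drop(index=0)

# drop() returns a new DataFrame by default; df itself is unchanged
print(df.shape, no_id.shape, no_first_row.shape)  # (3, 2) (3, 1) (2, 2)
```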


In [52]:
del df2['post']
df2

   emp_ID  name  age  salary   location
0     101   abc   23    4500      Delhi
1     102   xyz   26    5000     Mumbai
2     103   pqr   25    6200       Pune
3     104   rst   22    2100  Hyderabad


In [53]:
popp= df2.pop('emp_ID')
popp

0    101
1    102
2    103
3    104
Name: emp_ID, dtype: int64

In [54]:
df2

   name  age  salary   location
0   abc   23    4500      Delhi
1   xyz   26    5000     Mumbai
2   pqr   25    6200       Pune
3   rst   22    2100  Hyderabad


- **Check for missing values:**

In [55]:
df2.isnull().sum()

name        0
age         0
salary      0
location    0
dtype: int64
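df2 has no missing values, so every count above is zero. A sketch with a small, made-up frame that does contain gaps shows what isnull().sum() reports and how fillna and dropna handle them; the fill values (the column mean and 'unknown') are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column
df = pd.DataFrame({'age': [23, np.nan, 25],
                   'location': ['Delhi', 'Mumbai', None]})

print(df.isnull().sum())  # one missing value per column

# Per-column fill values passed as a dict
filled = df.fillna({'age': df['age'].mean(), 'location': 'unknown'})

# Keep only rows without any missing value
complete_rows = df.dropna()

print(filled, complete_rows, sep='\n\n')
```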

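#### Appending DataFrames:
Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.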

In [57]:
df3 = pd.DataFrame([[1,2],[3,4]],columns=['A','B'])
df3

   A  B
0  1  2
1  3  4


In [58]:
df4 = pd.DataFrame([[5,6],[7,8]],columns=['A','B'])
df4

   A  B
0  5  6
1  7  8


In [61]:
df5 = pd.concat([df3, df4], ignore_index=True, sort=False)  # stack df4's rows below df3's and renumber the index
df5

   A  B
0  1  2
1  3  4
2  5  6
3  7  8


In [62]:
df4=pd.DataFrame([[5,6],[7,8]])
df4

   0  1
0  5  6
1  7  8


In [63]:
df5 = pd.concat([df4, df1], ignore_index=True)  # columns are unioned; cells with no value become NaN
df5

     0    1       A
0  5.0  6.0     NaN
1  7.0  8.0     NaN
2  NaN  NaN  Python
3  NaN  NaN       C
4  NaN  NaN     C++
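pd.concat also controls which columns survive: the default join='outer' keeps the union of columns, as in the result above, while join='inner' keeps only the columns shared by every input. A sketch with made-up frames:

```python
import pandas as pd

# Made-up frames that share columns A and B; only the second has C
df_num = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'])
df_more = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'C'])

# join='outer' (default): every column kept, gaps filled with NaN
outer = pd.concat([df_num, df_more], ignore_index=True)

# join='inner': only the columns common to all inputs
inner = pd.concat([df_num, df_more], ignore_index=True, join='inner')

print(outer.columns.tolist(), inner.columns.tolist())  # ['A', 'B', 'C'] ['A', 'B']
```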
