# PANDAS

**INTRODUCTION**  
Pandas is an open-source Python library that provides easy-to-use data structures for data analysis.
It is well suited to data manipulation, data analysis, and data visualization.

#### WHY PANDAS?

1. We can easily read from and write to CSV files, and even databases.
2. Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.
3. We can manipulate the data by columns. Columns can be inserted into and deleted from DataFrames and higher dimensional objects.
4. Intuitive merging and joining of data sets.
5. Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
6. Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data.
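
A minimal sketch of the first two points, CSV input/output and missing data. The sample values and the names `people`, `buffer`, and `restored` are made up for illustration, and an in-memory `io.StringIO` buffer stands in for a real file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; one age is missing on purpose.
people = pd.DataFrame({"Name": ["Harry", "John", "Andrew"],
                       "Age": [15, np.nan, 13]})

# Write the frame as CSV text into memory, then read it back.
buffer = io.StringIO()
people.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The missing value survives the round trip as NaN.
missing_count = int(restored["Age"].isna().sum())

# One possible way to handle it: fill with the column mean.
filled = restored["Age"].fillna(restored["Age"].mean())
print(missing_count)
print(filled.tolist())
```

With a real file, a path such as `"people.csv"` would replace the buffer in both `to_csv` and `read_csv`.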



In [1]:
import pandas as pd
import numpy as np

#### DATAFRAME

A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns. A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype)

# Creating a DataFrame from a list of rows (each inner list is one row)
data = [['Harry', 15], ['John', 14], ['Andrew', 13]]
df = pd.DataFrame(data, columns=['Name','Age'])
df
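
The constructor also accepts a dictionary that maps column names to values. The sketch below shows that form together with the `index` and `dtype` arguments; the index labels `a`, `b`, `c` and the variable names are chosen here only for illustration.

```python
import pandas as pd

# Dictionary form: keys become column names, values become column data.
data_dict = {"Name": ["Harry", "John", "Andrew"], "Age": [15, 14, 13]}

# Optional `index` supplies row labels (hypothetical labels here).
df_from_dict = pd.DataFrame(data_dict, index=["a", "b", "c"])

# Optional `dtype` forces a type for the data.
ages_as_float = pd.DataFrame({"Age": [15, 14, 13]}, dtype="float64")

print(df_from_dict)
print(ages_as_float.dtypes)
```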

#### CREATING RANDOM DATA

In [2]:
data = np.random.randint(0,10,(5,4)) # 5x4 matrix of random integers from 0 to 9 (the upper bound 10 is excluded)

In [3]:
print(data)

[[7 9 8 2]
 [9 5 8 6]
 [9 7 3 9]
 [5 1 8 2]
 [3 0 7 7]]
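
Because the values are random, they change on every run. If reproducible numbers are wanted, one option is a seeded generator, sketched below. The seed `42` and the variable names are arbitrary choices, not part of the original notebook.

```python
import numpy as np

# A seeded generator produces the same "random" matrix on every run.
rng = np.random.default_rng(42)
seeded_data = rng.integers(0, 10, size=(5, 4))  # values 0 to 9, shape 5x4

# A second generator with the same seed yields identical values.
repeat = np.random.default_rng(42).integers(0, 10, size=(5, 4))
print(np.array_equal(seeded_data, repeat))
```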


In [4]:
# Creating a DataFrame from the random numbers, with string row labels
my_index = '1 2 3 4 5'.split()
print(my_index)
df = pd.DataFrame(data,index=my_index,columns='A B C D'.split())
df

['1', '2', '3', '4', '5']


|   | A | B | C | D |
|---|---|---|---|---|
| 1 | 7 | 9 | 8 | 2 |
| 2 | 9 | 5 | 8 | 6 |
| 3 | 9 | 7 | 3 | 9 |
| 4 | 5 | 1 | 8 | 2 |
| 5 | 3 | 0 | 7 | 7 |


#### INDEXING COLUMNS

In [5]:
df[['B','D']] # selects columns B and D; a single column can also be accessed as df.B

|   | B | D |
|---|---|---|
| 1 | 9 | 2 |
| 2 | 5 | 6 |
| 3 | 7 | 9 |
| 4 | 1 | 2 |
| 5 | 0 | 7 |
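
The bracket style determines the type of the result. As a small sketch (using a two-row frame named `df_demo`, built here only for illustration), a single label returns a Series while a list of labels returns a DataFrame:

```python
import pandas as pd

# Two rows taken from the table above, rebuilt so the example is self-contained.
df_demo = pd.DataFrame({"A": [7, 9], "B": [9, 5], "D": [2, 6]}, index=["1", "2"])

one_column = df_demo["B"]         # single label -> Series
sub_frame = df_demo[["B", "D"]]   # list of labels -> DataFrame

# Attribute access is equivalent to single-label bracket access.
same = df_demo.B.equals(df_demo["B"])

print(type(one_column).__name__)
print(type(sub_frame).__name__)
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so bracket access is the safer default.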


In [6]:
df['Sum'] = df['A'] + df['B']
df

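|   | A | B | C | D | Sum |
|---|---|---|---|---|-----|
| 1 | 7 | 9 | 8 | 2 | 16 |
| 2 | 9 | 5 | 8 | 6 | 14 |
| 3 | 9 | 7 | 3 | 9 | 16 |
| 4 | 5 | 1 | 8 | 2 | 6 |
| 5 | 3 | 0 | 7 | 7 | 3 |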


In [7]:
df.drop('Sum',axis=1) # axis=1 drops a column; this returns a new DataFrame and leaves df unchanged

|   | A | B | C | D |
|---|---|---|---|---|
| 1 | 7 | 9 | 8 | 2 |
| 2 | 9 | 5 | 8 | 6 |
| 3 | 9 | 7 | 3 | 9 |
| 4 | 5 | 1 | 8 | 2 |
| 5 | 3 | 0 | 7 | 7 |


In [8]:
df = df.drop('Sum',axis=1) # reassign the result to make the column drop permanent

In [9]:
df = df.drop('1',axis=0) # axis=0 drops a row, here the row labelled '1'
df

|   | A | B | C | D |
|---|---|---|---|---|
| 2 | 9 | 5 | 8 | 6 |
| 3 | 9 | 7 | 3 | 9 |
| 4 | 5 | 1 | 8 | 2 |
| 5 | 3 | 0 | 7 | 7 |
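
The cells above rely on the fact that `drop` returns a new DataFrame rather than modifying the original. A short sketch of that behaviour follows, using the `columns=` and `index=` keywords as an alternative to `axis`; the frame `df_demo` is a small stand-in built just for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [7, 9], "B": [9, 5]}, index=["1", "2"])
df_demo["Sum"] = df_demo["A"] + df_demo["B"]

# drop() returns a new frame; df_demo itself still has the 'Sum' column.
without_sum = df_demo.drop(columns="Sum")   # same as drop('Sum', axis=1)
without_row = df_demo.drop(index="1")       # same as drop('1', axis=0)

print("Sum" in df_demo.columns)
print(list(without_sum.columns))
print(list(without_row.index))
```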


In [10]:
df.loc['4']

A    5
B    1
C    8
D    2
Name: 4, dtype: int32

In [11]:
df.loc['2','D'] # select a particular cell by row label and column label

6

In [12]:
df.iloc[2,3] # select a particular cell by integer position (third row, fourth column)

2

In [13]:
df.loc[['3','2'],['A','B']]

|   | A | B |
|---|---|---|
| 3 | 9 | 7 |
| 2 | 9 | 5 |
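
To make the difference between `loc` (labels) and `iloc` (integer positions) explicit, the sketch below rebuilds the four-row frame shown above under the name `frame` (a name chosen for this example) and reads the same cell both ways. It also shows that label slices include their end point while position slices do not.

```python
import pandas as pd

# Same values as the frame above, after row '1' was dropped.
frame = pd.DataFrame(
    {"A": [9, 9, 5, 3], "B": [5, 7, 1, 0], "C": [8, 3, 8, 7], "D": [6, 9, 2, 7]},
    index=["2", "3", "4", "5"],
)

by_label = frame.loc["4", "D"]     # row labelled '4', column 'D'
by_position = frame.iloc[2, 3]     # third row, fourth column: the same cell

label_slice = frame.loc["2":"3"]   # label slice: end label is included
position_slice = frame.iloc[0:1]   # position slice: end position is excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```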


In [14]:
df

|   | A | B | C | D |
|---|---|---|---|---|
| 2 | 9 | 5 | 8 | 6 |
| 3 | 9 | 7 | 3 | 9 |
| 4 | 5 | 1 | 8 | 2 |
| 5 | 3 | 0 | 7 | 7 |


In [15]:
df>0

|   | A | B | C | D |
|---|---|---|---|---|
| 2 | True | True | True | True |
| 3 | True | True | True | True |
| 4 | True | True | True | True |
| 5 | True | False | True | True |


In [16]:
df[df>0]

|   | A | B | C | D |
|---|---|---|---|---|
| 2 | 9 | 5.0 | 8 | 6 |
| 3 | 9 | 7.0 | 3 | 9 |
| 4 | 5 | 1.0 | 8 | 2 |
| 5 | 3 | NaN | 7 | 7 |
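
In the output above, the one cell that fails the condition becomes NaN, and column B is shown as floats because NaN is a floating point value. The sketch below reproduces that on a rebuilt copy of the frame (named `frame` here for illustration) and shows two common variations: filtering whole rows by a condition on one column, and replacing failing cells with a chosen value via `where`. The replacement value `-1` is an arbitrary choice.

```python
import pandas as pd

frame = pd.DataFrame(
    {"A": [9, 9, 5, 3], "B": [5, 7, 1, 0], "C": [8, 3, 8, 7], "D": [6, 9, 2, 7]},
    index=["2", "3", "4", "5"],
)

mask = frame > 0                 # Boolean frame, True where the value is positive
masked = frame[mask]             # failing cells become NaN

# Condition on a single column keeps or drops entire rows instead.
rows_with_positive_b = frame[frame["B"] > 0]

# where() keeps values that satisfy the mask and substitutes the rest.
replaced = frame.where(mask, other=-1)

print(masked["B"].dtype)
print(list(rows_with_positive_b.index))
```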
