# PANDAS

**INTRODUCTION  
Pandas is an open-source Python library that provides easy-to-use data structures for data analysis.**
**Pandas is well suited to data manipulation, data analysis, and basic data visualization.**

#### WHY PANDAS?

1. We can easily read from and write to CSV files and even databases.
2. Easy handling of missing data (represented as NaN) in floating-point as well as non-floating-point data.
3. Data can be manipulated by column: columns can be inserted into and deleted from DataFrames and higher-dimensional objects.
4. Intuitive merging and joining of data sets.
5. Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
6. Robust IO tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving data back out.
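Two of the points above, CSV IO and missing-data handling, can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the CSV text and the variable names (`csv_text`, `people`, `people_filled`) are made up for the example, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV; the empty Age field for John is read as NaN.
csv_text = "Name,Age\nHarry,15\nJohn,\nAndrew,13\n"
people = pd.read_csv(io.StringIO(csv_text))

# Count the missing values per column.
missing_per_column = people.isna().sum()

# Fill the missing Age with the mean of the known ages.
people_filled = people.fillna({"Age": people["Age"].mean()})

# Write the cleaned table back out as CSV text, without the index column.
csv_out = people_filled.to_csv(index=False)
print(missing_per_column)
print(csv_out)
```

Passing a file path instead of the `StringIO` buffer to `read_csv`, and a path to `to_csv`, would work the same way on real files.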



In [66]:
import pandas as pd
import numpy as np

#### DATAFRAME

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A pandas DataFrame can be created using the following constructor:
`pandas.DataFrame(data, index, columns, dtype)`

# Creating a DataFrame from a list of lists (one inner list per row)
data = [['Harry', 15], ['John', 14], ['Andrew', 13]]
df = pd.DataFrame(data, columns=['Name','Age'])
df
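For comparison, the same table can also be built from a dictionary, where each key becomes a column name. The sketch below is an addition to the notebook; the names `data_dict` and `df_from_dict` and the row labels `a`, `b`, `c` are chosen only for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data_dict = {"Name": ["Harry", "John", "Andrew"], "Age": [15, 14, 13]}

# The optional index argument supplies row labels (hypothetical labels here).
df_from_dict = pd.DataFrame(data_dict, index=["a", "b", "c"])
print(df_from_dict)
```

With a list of lists the column names have to be passed separately through `columns`, whereas a dictionary carries them along with the data.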

#### CREATING RANDOM DATA

In [68]:
data = np.random.randint(0,10,(5,4)) # 5x4 matrix of random integers from 0 to 9 (the upper bound 10 is exclusive)

In [52]:
print(data)

[[1 0 2 6]
 [1 2 1 5]
 [7 0 3 2]
 [5 7 8 5]
 [1 4 0 2]]


In [69]:
#creating dataframe from random numbers
my_index = '1 2 3 4 5'.split()
print(my_index)
df = pd.DataFrame(data,index=my_index,columns='A B C D'.split())
df

['1', '2', '3', '4', '5']


Unnamed: 0,A,B,C,D
1,4,8,0,2
2,5,8,8,4
3,1,5,1,2
4,6,5,3,6
5,5,6,7,6
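The array printed earlier and the DataFrame shown here hold different values, presumably because the random cell was run again between the two outputs (the cell numbers are out of order). If repeatable output is wanted, one option is a seeded generator. The following is a sketch under that assumption; the seed value and the names `rng`, `data_fixed`, and `df_fixed` are arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator yields the same "random" numbers on every run.
rng = np.random.default_rng(seed=0)  # the seed value is an arbitrary choice

# 5x4 matrix of integers from 0 to 9 (the upper bound is exclusive).
data_fixed = rng.integers(0, 10, size=(5, 4))

df_fixed = pd.DataFrame(
    data_fixed,
    index="1 2 3 4 5".split(),
    columns="A B C D".split(),
)
print(df_fixed)
```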


#### INDEXING COLUMNS

In [54]:
df[['B','D']] # a list of names selects several columns; df['B'] or df.B selects a single column

Unnamed: 0,B,D
1,0,6
2,2,5
3,0,2
4,7,5
5,4,2
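The different selection forms return different types, which the short sketch below makes explicit. It rebuilds the DataFrame from the array printed earlier so that it runs on its own; the variable names `one_column`, `same_column`, and `two_columns` are illustrative.

```python
import numpy as np
import pandas as pd

# The 5x4 array printed earlier in the notebook.
data = np.array([[1, 0, 2, 6],
                 [1, 2, 1, 5],
                 [7, 0, 3, 2],
                 [5, 7, 8, 5],
                 [1, 4, 0, 2]])
df = pd.DataFrame(data, index="1 2 3 4 5".split(), columns="A B C D".split())

one_column = df["B"]          # single label: a Series
same_column = df.B            # attribute access: the same Series
two_columns = df[["B", "D"]]  # list of labels: a DataFrame

print(type(one_column).__name__, type(two_columns).__name__)
```

Attribute access such as `df.B` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.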


In [55]:
df['Sum'] = df['A'] + df['B']
df

Unnamed: 0,A,B,C,D,Sum
1,1,0,2,6,1
2,1,2,1,5,3
3,7,0,3,2,7
4,5,7,8,5,12
5,1,4,0,2,5


In [70]:
df.drop('Sum',axis=1) # drop a column (axis=1); returns a new DataFrame and leaves df unchanged

ValueError: labels ['Sum'] not contained in axis

In [70]:
df = df.drop('Sum',axis=1) # reassign to keep the result; fails if 'Sum' has already been dropped

ValueError: labels ['Sum'] not contained in axis
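The error above is what `drop` reports when the label is no longer present, which is likely here because these cells were run after the column had already been removed. A minimal sketch of the behaviour, using a small made-up frame, is shown below; newer pandas versions may raise a `KeyError` with a similar message instead of a `ValueError`.

```python
import pandas as pd

# Small hypothetical frame with a derived Sum column.
df = pd.DataFrame({"A": [1, 1, 7], "B": [0, 2, 0]}, index=["1", "2", "3"])
df["Sum"] = df["A"] + df["B"]

# drop returns a new DataFrame; the original still has the column.
without_sum = df.drop("Sum", axis=1)
still_has_sum = "Sum" in df.columns

# Reassigning keeps the result.
df = df.drop("Sum", axis=1)

# Dropping again would raise; errors="ignore" skips labels that are missing.
df = df.drop("Sum", axis=1, errors="ignore")
print(df)
```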

In [57]:
df = df.drop('1',axis=0) # row drop: axis=0
df

Unnamed: 0,A,B,C,D
2,1,2,1,5
3,7,0,3,2
4,5,7,8,5
5,1,4,0,2


In [59]:
df.loc['4']

A    5
B    7
C    8
D    5
Name: 4, dtype: int32

In [60]:
df.loc['2','D'] #selecting particular cell

5

In [61]:
df.iloc[2,3] # selecting a particular cell by integer position (third row, fourth column)

5
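Both lookups above return 5, but they reach different cells: `loc['2','D']` uses labels, while `iloc[2,3]` counts positions from zero and lands on row `'4'`. The sketch below rebuilds the four-row frame shown above to contrast the two; the variable names are illustrative.

```python
import pandas as pd

# The frame as it looks after row '1' was dropped.
df = pd.DataFrame(
    [[1, 2, 1, 5], [7, 0, 3, 2], [5, 7, 8, 5], [1, 4, 0, 2]],
    index=["2", "3", "4", "5"],
    columns=list("ABCD"),
)

by_label = df.loc["4", "D"]    # row labelled '4', column 'D'
by_position = df.iloc[2, 3]    # third row, fourth column: the same cell

# Label slices include the end label; position slices exclude the end.
label_slice = df.loc["2":"4"]  # rows '2', '3', '4'
position_slice = df.iloc[0:2]  # rows at positions 0 and 1

print(by_label, by_position, len(label_slice), len(position_slice))
```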

In [62]:
df.loc[['3','2'],['A','B']]

Unnamed: 0,A,B
3,7,0
2,1,2


In [63]:
df

Unnamed: 0,A,B,C,D
2,1,2,1,5
3,7,0,3,2
4,5,7,8,5
5,1,4,0,2


In [64]:
df>0

Unnamed: 0,A,B,C,D
2,True,True,True,True
3,True,False,True,True
4,True,True,True,True
5,True,True,False,True


In [65]:
df[df>0]

Unnamed: 0,A,B,C,D
2,1,2.0,1.0,5
3,7,,3.0,2
4,5,7.0,8.0,5
5,1,4.0,,2
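In the result above, cells that fail the condition become NaN, and the affected columns are converted to floating point to hold it. The sketch below, which rebuilds the same four-row frame, shows an equivalent `where` call, a row filter on a single column, and one way to replace the NaN values; the variable names are illustrative.

```python
import pandas as pd

# The frame as it looks after row '1' was dropped.
df = pd.DataFrame(
    [[1, 2, 1, 5], [7, 0, 3, 2], [5, 7, 8, 5], [1, 4, 0, 2]],
    index=["2", "3", "4", "5"],
    columns=list("ABCD"),
)

masked = df[df > 0]            # keeps the shape; failing cells become NaN
same_mask = df.where(df > 0)   # equivalent spelling of the same operation

# A boolean Series filters whole rows instead of masking cells.
positive_b_rows = df[df["B"] > 0]

# Replace NaN with 0 and convert back to integers.
filled = masked.fillna(0).astype(int)

print(positive_b_rows)
print(filled)
```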
