# PANDAS

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures.
The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

In 2008, developer Wes McKinney started developing Pandas when he needed a high-performance, flexible tool for data analysis.

Before Pandas, Python was used mainly for data munging and preparation and contributed little to data analysis itself. Pandas filled that gap.

Using Pandas, we can carry out five typical steps in processing and analyzing data, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.
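The five steps can be sketched on a toy data set. This is only an illustration: the inline CSV, the column names, and the `rating_pct` column are made up for the example, and the "model" step is reduced to a group mean, since real modelling is usually handed to another library.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a real data source.
raw = io.StringIO("name,age,rating\nTom,25,4.23\nJames,26,\nRicky,25,3.98\n")

# 1. Load
people = pd.read_csv(raw)

# 2. Prepare: drop rows whose rating is missing (James)
people = people.dropna(subset=["rating"])

# 3. Manipulate: derive a new column from an existing one
people["rating_pct"] = people["rating"] / 5 * 100

# 4-5. Model / analyze: a per-age mean stands in for a real model
mean_by_age = people.groupby("age")["rating"].mean()
print(mean_by_age)
```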

Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.

In [3]:
import numpy as np
import pandas as pd

In [4]:
labels = ['a', 'b', 'c']
my_data = [10, 20, 30]
arr = np.array(my_data)
d = {'a': 10, 'b': 20, 'c': 30}

In [5]:
pd.Series(my_data)

0    10
1    20
2    30
dtype: int64

In [6]:
pd.Series(data=labels,
    index=my_data)

10    a
20    b
30    c
dtype: object
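The cells above define `arr` and `d` but build Series only from plain lists. As a small sketch, the same constructor also accepts a NumPy array with explicit labels, or a dict whose keys become the index. The names `s_arr` and `s_dict` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}

s_arr = pd.Series(arr, index=labels)  # NumPy array plus explicit labels
s_dict = pd.Series(d)                 # dict keys become the index

print(s_arr['b'])                     # label-based lookup
print((s_arr == s_dict).all())        # both Series hold the same labelled values
```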

In [13]:
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.045352 | 0.219573 | -0.971534 | 0.721871 |
| B | 0.030455 | -1.369694 | -0.576068 | -0.037799 |
| C | -1.101202 | 0.520145 | -1.387104 | 1.373238 |
| D | 0.292514 | 0.132660 | -0.257738 | -1.363254 |
| E | 0.030203 | -1.000912 | -2.426332 | 0.524185 |


In [9]:
df['W']  # output is from an earlier run (cell 9), so the random values differ from the table above

A    0.666206
B    0.408021
C    0.058140
D    1.031532
E    0.730754
Name: W, dtype: float64

In [10]:
df['W']['B']

0.4080214683684016
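`df['W']['B']` looks up the column first and then the row. A minimal sketch of equivalent single-step lookups follows. The seeded generator is an assumption made only so the example is reproducible, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((5, 4)),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

chained = df['W']['B']     # column first, then row (two lookups)
direct = df.loc['B', 'W']  # row and column labels in one lookup
scalar = df.at['B', 'W']   # accessor intended for single values

subset = df[['W', 'Z']]    # a list of names selects several columns

print(chained == direct == scalar)
print(subset.shape)
```

The single-step forms are generally the safer habit when assigning values, because chained indexing may operate on a temporary copy.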

In [11]:
df['new'] = df['W'] + df['Z']

In [26]:
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.045352 | 0.219573 | -0.971534 | 0.721871 |
| B | 0.030455 | -1.369694 | -0.576068 | -0.037799 |
| C | -1.101202 | 0.520145 | -1.387104 | 1.373238 |
| D | 0.292514 | 0.132660 | -0.257738 | -1.363254 |
| E | 0.030203 | -1.000912 | -2.426332 | 0.524185 |
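The table above shows only columns W to Z, presumably because `df` was re-created (cell 13) before it was displayed (cell 26). As a sketch of what the assignment does, the example below rebuilds the frame from the rounded values in the table, adds the column, and removes it again with `drop`. The name `trimmed` is introduced here for illustration.

```python
import pandas as pd

# Rounded values copied from the table above.
df = pd.DataFrame(
    {'W': [-0.045352, 0.030455, -1.101202, 0.292514, 0.030203],
     'X': [0.219573, -1.369694, 0.520145, 0.132660, -1.000912],
     'Y': [-0.971534, -0.576068, -1.387104, -0.257738, -2.426332],
     'Z': [0.721871, -0.037799, 1.373238, -1.363254, 0.524185]},
    index=['A', 'B', 'C', 'D', 'E'])

df['new'] = df['W'] + df['Z']     # element-wise sum, aligned on the row labels
print(df.columns.tolist())        # the new column is appended last

trimmed = df.drop(columns='new')  # returns a copy; df itself keeps 'new'
print(trimmed.columns.tolist())
```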


In [30]:
df.loc['B']

W    0.030455
X   -1.369694
Y   -0.576068
Z   -0.037799
Name: B, dtype: float64

In [28]:
df.iloc[0]

W   -0.045352
X    0.219573
Y   -0.971534
Z    0.721871
Name: A, dtype: float64
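`loc` selects by label and `iloc` by integer position, and both accept a row selector and a column selector together. A short sketch, using seeded random data (an assumption for reproducibility) and illustrative variable names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((5, 4)),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

by_label = df.loc[['A', 'B'], ['W', 'Y']]  # rows and columns by label
by_position = df.iloc[[0, 1], [0, 2]]      # the same cells by position
print(by_label.equals(by_position))

# Label slices include the end point; positional slices exclude it.
label_slice = df.loc['A':'C']  # rows A, B, C
pos_slice = df.iloc[0:2]       # rows A, B
print(len(label_slice), len(pos_slice))
```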

In [31]:
df > 0

| | W | X | Y | Z |
|---|---|---|---|---|
| A | False | True | False | True |
| B | True | False | False | False |
| C | False | True | False | True |
| D | True | True | False | False |
| E | True | False | False | True |


In [32]:
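df[df > 0]

| | W | X | Y | Z |
|---|---|---|---|---|
| A | NaN | 0.219573 | NaN | 0.721871 |
| B | 0.030455 | NaN | NaN | NaN |
| C | NaN | 0.520145 | NaN | 1.373238 |
| D | 0.292514 | 0.132660 | NaN | NaN |
| E | 0.030203 | NaN | NaN | 0.524185 |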
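Masking the whole frame keeps its shape and fills the non-matching cells with NaN. Masking with a single boolean column instead drops the non-matching rows. The sketch below uses the rounded values from the tables above. The variable names are illustrative.

```python
import pandas as pd

# Rounded values copied from the tables above.
df = pd.DataFrame(
    {'W': [-0.045352, 0.030455, -1.101202, 0.292514, 0.030203],
     'X': [0.219573, -1.369694, 0.520145, 0.132660, -1.000912],
     'Y': [-0.971534, -0.576068, -1.387104, -0.257738, -2.426332],
     'Z': [0.721871, -0.037799, 1.373238, -1.363254, 0.524185]},
    index=['A', 'B', 'C', 'D', 'E'])

w_positive = df[df['W'] > 0]            # keeps rows B, D, E
both = df[(df['W'] > 0) & (df['X'] > 0)]  # keeps row D only
either = df[(df['W'] > 0) | (df['Z'] > 1)]  # keeps rows B, C, D, E

print(w_positive.index.tolist())
print(both.index.tolist())
print(either.index.tolist())
```

Conditions are combined with `&` and `|`, each wrapped in parentheses. Python's `and` and `or` do not work element-wise on a Series.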


In [35]:
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame and display its transpose (rows and columns swapped)
df = pd.DataFrame(d)
print("The transpose of the DataFrame is:")
df.T

The transpose of the DataFrame is:


| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Name | Tom | James | Ricky | Vin | Steve | Smith | Jack |
| Age | 25 | 26 | 25 | 23 | 30 | 29 | 23 |
| Rating | 4.23 | 3.24 | 3.98 | 2.56 | 3.2 | 4.6 | 3.8 |
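One side effect of transposing a frame with mixed column types is worth knowing: each transposed column now holds a name, an age, and a rating, so pandas falls back to the generic `object` dtype. A minimal sketch on a shortened version of the data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'],
                   'Age': [25, 26],
                   'Rating': [4.23, 3.24]})

t = df.T
print(t.shape)   # (3, 2): former columns are now rows
print(t.dtypes)  # every transposed column has dtype object

# Transposing back does not restore the numeric dtypes by itself;
# infer_objects() asks pandas to re-detect them.
restored = t.T.infer_objects()
print(restored.dtypes)
```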


In [36]:
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our DataFrame is:")
df

Our DataFrame is:


| | Name | Age | Rating |
|---|---|---|---|
| 0 | Tom | 25 | 4.23 |
| 1 | James | 26 | 3.24 |
| 2 | Ricky | 25 | 3.98 |
| 3 | Vin | 23 | 2.56 |
| 4 | Steve | 30 | 3.20 |
| 5 | Smith | 29 | 4.60 |
| 6 | Jack | 23 | 3.80 |
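When a DataFrame is built from a dictionary of Series, the Series are aligned on their index labels rather than on position. The Series above all share the default index 0 to 6, so the effect is invisible. A small sketch with deliberately mismatched, made-up Series shows it:

```python
import pandas as pd

# Hypothetical Series of different lengths and labels.
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([10, 20], index=['a', 'b'])}

aligned = pd.DataFrame(d)
print(aligned)

# Label 'c' has no value in 'two', so that cell is NaN,
# which in turn makes the column floating point.
print(pd.isna(aligned.loc['c', 'two']))
print(aligned['two'].dtype)
```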
