## Pandas 
pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive.

In [1]:
import pandas as pd

<b> Data Creation </b>
There are two core object types in pandas:<br>
DataFrame and <br>
Series

In [17]:
# DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
# You can think of it like a spreadsheet or SQL table, or a dict of Series objects.

A=pd.DataFrame({"A1":[2,3,4],"A2":[5,1,9]})
A

   A1  A2
0   2   5
1   3   1
2   4   9
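
The comment in the cell above describes a DataFrame as a dict of Series objects. A minimal sketch of that view follows; the names `height`, `name`, and `people` and their values are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical example data).
height = pd.Series([1.2, 3.4, 5.6], index=["a", "b", "c"])
name = pd.Series(["x", "y", "z"], index=["a", "b", "c"])

# A dict of Series becomes a DataFrame; the dict keys become the column names.
people = pd.DataFrame({"height": height, "name": name})

# Each column keeps its own dtype, and selecting a column returns a Series.
print(people.dtypes)
print(type(people["height"]))
```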


In [6]:
# A Series is a one-dimensional labeled array capable of holding data of any type
# (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.

pd.Series([1,2,5,7,3,4])

0    1
1    2
2    5
3    7
4    3
5    4
dtype: int64

In [10]:
pd.Series(['A','B','C','D'], index=['N1','N2','N3','N4'])

N1    A
N2    B
N3    C
N4    D
dtype: object
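
Once a Series has custom labels, values can be fetched either by label or by position. The sketch below reuses the Series from the cell above; the variable names are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'D'], index=['N1', 'N2', 'N3', 'N4'])

# Label-based lookup uses the custom index.
second_by_label = s.loc['N2']

# Position-based lookup ignores the labels and counts from 0.
second_by_position = s.iloc[1]

# The labels themselves are available through the index attribute.
labels = list(s.index)

print(second_by_label, second_by_position, labels)
```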

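<b>Reading CSV Files </b> 
For example, to read a CSV file (the raw string keeps backslashes in a Windows path from being read as escape sequences such as \f):
pd.read_csv(r"path\filename.csv")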

In [12]:
df=pd.read_csv(r"C:\Users\ASUS\Desktop\Iris.csv")
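
The path above only exists on one machine, so here is a minimal sketch that runs anywhere by reading the same column layout from an in-memory buffer. The two data rows are copied from the first rows of the Iris file as shown in this notebook; using `io.StringIO` in place of a file, and the names `sample` and `sample_by_id`, are assumptions of this sketch.

```python
import io
import pandas as pd

csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# index_col uses an existing column as the row index instead of 0, 1, 2, ...
sample_by_id = pd.read_csv(io.StringIO(csv_text), index_col="Id")

print(sample.shape)
print(sample_by_id.index.tolist())
```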

Data details

In [14]:
df.head(5)

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


In [15]:
df.shape

(150, 6)

<b> Saving to CSV </b>

In [18]:
# to_csv is called on the DataFrame you want to save, for example:
# df.to_csv("path/saved file name.csv")
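
A minimal sketch of the save step, assuming an in-memory `io.StringIO` buffer stands in for a real path so the example runs anywhere. The `index=False` argument is a choice made here, not something the original cell specifies.

```python
import io
import pandas as pd

A = pd.DataFrame({"A1": [2, 3, 4], "A2": [5, 1, 9]})

# index=False leaves out the row labels, so reading the data back
# does not produce an extra "Unnamed: 0" column.
buffer = io.StringIO()
A.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

With a real file, the buffer would simply be replaced by a path string.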

 <b> Indexing </b>
pandas has its own accessor operators, loc and iloc.

<b>loc</b> is the label-based accessor: we pass the labels of the rows or columns we want to select, or a boolean mask as in the cell below. It is used with square brackets rather than called like a function.

In [19]:
df.loc[df.Species=='Iris-setosa']

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa
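
Besides a boolean mask, `loc` also takes explicit row and column labels. The sketch below uses a three-row stand-in frame, named `iris_sample` here as an assumption, with values copied from the output above.

```python
import pandas as pd

# Small stand-in for the Iris frame (values copied from the output above).
iris_sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# Row label 0 and column label "Species" give a single value.
one_value = iris_sample.loc[0, "Species"]

# A list of column labels returns a smaller DataFrame.
two_columns = iris_sample.loc[:, ["Id", "SepalLengthCm"]]

# A boolean mask picks the rows; the second argument still picks the column.
long_sepal_ids = iris_sample.loc[iris_sample.SepalLengthCm > 4.8, "Id"]

print(one_value)
print(two_columns.shape)
print(long_sepal_ids.tolist())
```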


 <b> iloc </b>is the integer-position-based accessor: we pass integer positions (or slices and lists of them) in square brackets to select specific rows/columns.

In [20]:
df.iloc[5:10,:2]

   Id  SepalLengthCm
5   6            5.4
6   7            4.6
7   8            5.0
8   9            4.4
9  10            4.9


In [21]:
df.iloc[[1,2,3],[1,5]]

   SepalLengthCm      Species
1            4.9  Iris-setosa
2            4.7  Iris-setosa
3            4.6  Iris-setosa
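
One difference between the two accessors is easy to miss: an `iloc` slice excludes its end position, while a `loc` slice includes its end label. A minimal sketch on a hypothetical Series `s` whose labels happen to equal its positions:

```python
import pandas as pd

# Hypothetical data; the labels 0..3 coincide with the positions.
s = pd.Series([10, 20, 30, 40], index=[0, 1, 2, 3])

# Positions 1 and 2: the end of an iloc slice is excluded.
by_position = s.iloc[1:3]

# Labels 1, 2 and 3: the end of a loc slice is included.
by_label = s.loc[1:3]

print(by_position.tolist())
print(by_label.tolist())
```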


Counting categorical values

In [23]:
df.Species.value_counts()

Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: Species, dtype: int64
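
`value_counts` can also report proportions instead of absolute counts. The sketch below uses a small hypothetical Series rather than the Iris column, so that the categories are not all equally frequent.

```python
import pandas as pd

# Hypothetical categorical data.
species = pd.Series(["setosa", "setosa", "virginica", "versicolor"])

# Absolute counts, most frequent value first.
counts = species.value_counts()

# normalize=True returns fractions that sum to 1.
shares = species.value_counts(normalize=True)

print(counts)
print(shares)
```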

In [24]:
df.describe()

              Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count      150.0          150.0         150.0          150.0         150.0
mean        75.5       5.843333         3.054       3.758667      1.198667
std    43.445368       0.828066      0.433594        1.76442      0.763161
min          1.0            4.3           2.0            1.0           0.1
25%        38.25            5.1           2.8            1.6           0.3
50%         75.5            5.8           3.0           4.35           1.3
75%       112.75            6.4           3.3            5.1           1.8
max        150.0            7.9           4.4            6.9           2.5


In [26]:
df.SepalLengthCm.mean()

5.843333333333334

<b>map() </b>is a Series method that transforms each value using a function, a dict, or another Series. When the mapper is a Series, each value of the first Series is looked up in the index of the second, so that index should hold unique labels.

In [40]:
A=pd.Series([2,3,4,5,1,9])
m=A.mean()
A.map(lambda x:x-m)

0   -2.0
1   -1.0
2    0.0
3    1.0
4   -3.0
5    5.0
dtype: float64

In [41]:
df['Species'].map('Type is {}'.format)

0         Type is Iris-setosa
1         Type is Iris-setosa
2         Type is Iris-setosa
3         Type is Iris-setosa
4         Type is Iris-setosa
                ...          
145    Type is Iris-virginica
146    Type is Iris-virginica
147    Type is Iris-virginica
148    Type is Iris-virginica
149    Type is Iris-virginica
Name: Species, Length: 150, dtype: object
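
The two cells above pass functions to `map`. A minimal sketch of the other two mapper kinds, a dict and a Series, follows; the short names and integer codes are hypothetical values chosen for illustration.

```python
import pandas as pd

species = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-setosa"])

# Dict mapper: each value is looked up as a key.
codes = species.map({"Iris-setosa": 0, "Iris-virginica": 1})

# Series mapper: each value is looked up in the mapper's index.
short_names = pd.Series(["setosa", "virginica"],
                        index=["Iris-setosa", "Iris-virginica"])
renamed = species.map(short_names)

# Values with no match in the mapper become NaN.
partial = species.map({"Iris-setosa": 0})

print(codes.tolist())
print(renamed.tolist())
print(partial.isna().tolist())
```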

In [44]:
df.dtypes

Id                 int64
SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species           object
dtype: object

Searching for <b>Missing Values</b>

In [47]:
df.isnull().sum()

Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64
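
The Iris data has no gaps, so the check above returns all zeros. A minimal sketch on a hypothetical frame `gaps` shows what the same check reports when values are missing, along with two common follow-up steps (`dropna` and `fillna`) that the original notebook does not cover.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
gaps = pd.DataFrame({"A1": [2.0, np.nan, 4.0], "A2": [5.0, 1.0, np.nan]})

# Number of missing values per column.
missing_per_column = gaps.isnull().sum()

# Keep only the rows that have no missing values.
dropped = gaps.dropna()

# Replace each missing value with the mean of its column.
filled = gaps.fillna(gaps.mean())

print(missing_per_column)
print(dropped)
print(filled)
```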

In [48]:
df.Species.replace('Iris-setosa','setosa')

0              setosa
1              setosa
2              setosa
3              setosa
4              setosa
            ...      
145    Iris-virginica
146    Iris-virginica
147    Iris-virginica
148    Iris-virginica
149    Iris-virginica
Name: Species, Length: 150, dtype: object
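
Note that `replace` returns a new Series and leaves the original unchanged, so the result has to be assigned to keep it. A minimal sketch on a two-value stand-in Series; the dict form and the short names are assumptions added for illustration.

```python
import pandas as pd

species = pd.Series(["Iris-setosa", "Iris-virginica"])

# replace returns a new Series; `species` itself is not modified.
shortened = species.replace("Iris-setosa", "setosa")

# A dict replaces several values in one call.
all_short = species.replace({"Iris-setosa": "setosa",
                             "Iris-virginica": "virginica"})

print(species.tolist())
print(shortened.tolist())
print(all_short.tolist())
```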

Combining Datasets

In [51]:
A=pd.DataFrame({"A1":[2,3,4],"A2":[5,1,9]})
B=pd.DataFrame({"B1":[12,13,14],"B2":[51,11,19]})
pd.concat([A,B],axis=1)


   A1  A2  B1  B2
0   2   5  12  51
1   3   1  13  11
2   4   9  14  19
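
With `axis=1` the frames are placed side by side. A minimal sketch of the other direction, stacking rows with the default `axis=0`, follows; the frame `C` and its values are hypothetical.

```python
import pandas as pd

A = pd.DataFrame({"A1": [2, 3, 4], "A2": [5, 1, 9]})
C = pd.DataFrame({"A1": [7, 8], "A2": [6, 0]})  # hypothetical extra rows

# axis=0 (the default) stacks rows; ignore_index renumbers them from 0.
stacked = pd.concat([A, C], ignore_index=True)

print(stacked)
```

Without `ignore_index=True`, the result would keep the original row labels 0, 1, 2, 0, 1.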
