# Pandas
Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools.

Tutorial reference - https://www.tutorialspoint.com/python_pandas


## Content 
This notebook tutorial covers the following topics (and a few extras):

- Section 1: read_csv() - reading csv files
- Section 2: df.head() - checking data
- Section 3: df.describe() - getting basic statistics about data
- Section 5: filtering - how to select data in a column (just like in Excel)
- Section 6: sum - how to sum column or row values
- Section 6: count - how to count column or row elements
- Section 7: inplace - how to apply an operation to the same variable instead of a copy
- fillna - how to fill missing values
- drop - how to delete a particular column or row
- rename - how to rename a column
- loc - how to get data from a specific row or column by label or condition
- iloc - how to get data from a specific row or column by integer position
- index - get the indexes of a dataframe
- creating data frame - how to create a data frame
- append - join two data frames or add a row to a dataframe
- apply - apply an operation to each row, column or cell

### Let's start!!  Pandas is the Excel of Python

![image](media/excel.jpg)

image reference - https://www.addictivetips.com/microsoft-office

### Pandas deals with the following data structures −

- Series
- DataFrame

Data Structure | Dimensions	| Description
--- | --- | ---
Series | 1 | 1D labeled homogeneous array, size-immutable.
DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.

### Series

A Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …

10 | 23	| 56 | 17 | 52	| 61	| 73	| 90	| 26	| 72

- Homogeneous data
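As a minimal sketch, the series above could be built like this. The variable name `s` is only an illustration and is not used elsewhere in the notebook.

```python
import pandas as pd

# Build a Series from the ten integers listed above
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s.dtype)    # a single dtype is shared by every element
print(s.iloc[2])  # element at position 2 (the third one)
print(len(s))     # number of elements
```

Every element shares one dtype, which is what "homogeneous" means here.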

### DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For example,

Name | Age	| Gender	| Rating
--- | --- | --- | ---
Steve	| 32 | Male	| 3.45
Lia	| 28	| Female	| 4.6
Vin	| 45	| Male	| 3.9
Katie | 38	| Female	| 2.78

### Data Type of Columns
The data types of the four columns are as follows −

Column	| Type
--- | ---
Name | String
Age	| Integer
Gender	| String
Rating	| Float
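The example table can be reproduced with a small sketch like the one below. The name `people` is made up for this illustration; `dtypes` then reports one type per column.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
people = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

# Age is stored as an integer dtype and Rating as a float dtype.
# Text columns are typically reported as object (or a string dtype,
# depending on the pandas version).
print(people.dtypes)
```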


### Install pandas from notebook
Run the cell below to install pandas and numpy

In [None]:
%%sh
pip install pandas
pip install numpy

## Let's dive into handling data
We will import a csv file (the kind we generally get from a client or from a prepared problem)

#### Import pandas

You have to import a Python package before you can use it. Run the cell below to import pandas and the other necessary packages

In [1]:
import pandas as pd
import numpy as np

### <a name="load_data"></a> 1. Loading CSV

syntax:
- df=pd.read_csv(filepath)

In [2]:
data=pd.read_csv('data/titanic_train.csv')
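If the Titanic file is not at hand, `read_csv` can be tried on in-memory text. This is only a sketch: the three rows below are a made-up sample, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# A tiny CSV held in a string; the last row has an empty Age field
csv_text = """PassengerId,Survived,Pclass,Gender,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
"""

# read_csv accepts a path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (rows, columns)
print(sample["Age"].isna().sum())  # empty fields are read as NaN
```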

### <a name="quick_view"></a>2. Quickly viewing a few records of data

syntax:
- df.head(n) - first n rows from the top (default n=5)
- df.tail(n) - last n rows from the bottom (default n=5)

In [3]:
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
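A short sketch of how the row count argument behaves, using a throwaway ten-row frame (`numbers` is a name invented for this example):

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(10)})

first_three = numbers.head(3)  # rows at positions 0, 1, 2
last_two = numbers.tail(2)     # rows at positions 8, 9
default_head = numbers.head()  # with no argument, 5 rows are returned

print(first_three)
print(last_two)
```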


### <a name="quick_stat"></a>3. Quick data statistics

syntax:
- df.describe()

In [4]:
data.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


#### Extras

In [5]:
# 3.1 to get all columns in the data
data.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Gender', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [6]:
# 3.2 get some info about each column
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Gender         891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


### 4. Digging deeper into the data

#### Let's see what the survival ratio of males and females was

#### 4.1 Déjà vu
Arrays: how do you select particular data from an array? Let's see.

In [12]:
x=[1,2,3,4,5]

#1 to select the 3rd element (the number 3), use index 2 because indexing starts at 0
print("first print: ",x[2])

y=np.array([[1,2,3],
           [4,5,6],
           [7,8,9]])

#2 to select the value in the 2nd row, 1st column (the number 4)
print("second print: ",y[1,0])
#3 to select the whole 2nd row
print("third print: ",y[1])

first print:  3
second print:  4
third print:  [4 5 6]


#### 4.2 Back to Pandas - Selecting columns
Let's see how to select a column in Pandas

In [13]:
# selecting only one column (like picking a single column in Excel)
data['Gender']

#to select multiple columns, pass a list of column names e.g. data[['Gender','Survived']]

0        male
1      female
2      female
3      female
4        male
5        male
6        male
7        male
8      female
9      female
10     female
11     female
12       male
13       male
14     female
15     female
16       male
17       male
18     female
19     female
20       male
21       male
22     female
23       male
24     female
25     female
26       male
27       male
28     female
29       male
        ...  
861      male
862    female
863    female
864      male
865    female
866    female
867      male
868      male
869      male
870      male
871    female
872      male
873      male
874    female
875    female
876      male
877      male
878      male
879    female
880    female
881      male
882    female
883      male
884      male
885    female
886      male
887    female
888    female
889      male
890      male
Name: Gender, Length: 891, dtype: object
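The difference between single and double brackets can be sketched on a toy frame (the three rows here are made up and only reuse the Titanic column names):

```python
import pandas as pd

toy = pd.DataFrame({
    "Gender": ["male", "female", "female"],
    "Survived": [0, 1, 1],
    "Age": [22.0, 38.0, 26.0],
})

one_col = toy["Gender"]                # a column name gives a Series
two_cols = toy[["Gender", "Survived"]]  # a list of names gives a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
```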

### <a name="filters"></a>5. Applying filters in Pandas (same as in Excel)

In [14]:
#applying filter
data[data['Gender']=='female']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
14,15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14.0,0,0,350406,7.8542,,S
15,16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55.0,0,0,248706,16.0000,,S
18,19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",female,31.0,1,0,345763,18.0000,,S


In [15]:
# multiple filters: wrap each condition in parentheses and combine them with & (and) or | (or)
data[(data['Gender']=='female') & (data['Survived']==1)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
15,16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55.0,0,0,248706,16.0000,,S
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C
22,23,1,3,"McGowan, Miss. Anna ""Annie""",female,15.0,0,0,330923,8.0292,,Q
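Under the hood a filter is just a boolean Series used as a mask. The sketch below, on a made-up four-row frame, shows the mask itself and how `&`, `|` and `~` combine masks.

```python
import pandas as pd

toy = pd.DataFrame({
    "Gender": ["male", "female", "female", "male"],
    "Survived": [0, 1, 0, 1],
    "Pclass": [3, 1, 3, 2],
})

# The comparison yields one True/False value per row
is_female = toy["Gender"] == "female"
print(is_female.tolist())

female_survivors = toy[is_female & (toy["Survived"] == 1)]  # both conditions
female_or_first = toy[is_female | (toy["Pclass"] == 1)]     # either condition
not_female = toy[~is_female]                                # negated condition
```

The parentheses around each comparison matter, because `&` and `|` bind more tightly than `==` in Python.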


In [16]:
#Question - what percent of females survived?
#233 female survivors out of 314 female passengers
233/314

0.7420382165605095

In [17]:
# A more general way: mean of the 0/1 Survived column per gender = survival rate per gender
data[['Gender','Survived']].groupby('Gender').mean()

Unnamed: 0_level_0,Survived
Gender,Unnamed: 1_level_1
female,0.742038
male,0.188908
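Rather than typing the counts by hand, the ratio can be computed from the data. The sketch below uses a made-up five-row frame (not the Titanic file) to show that the manual ratio and the `groupby` mean agree.

```python
import pandas as pd

toy = pd.DataFrame({
    "Gender": ["female", "female", "female", "male", "male"],
    "Survived": [1, 1, 0, 0, 1],
})

# Manual route: survivors among females divided by number of females
females = toy[toy["Gender"] == "female"]
manual_rate = females["Survived"].sum() / len(females)

# groupby route: the mean of a 0/1 column is the share of ones
by_gender = toy.groupby("Gender")["Survived"].mean()

print(manual_rate)
print(by_gender)
```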


### <a name="math_function"></a>6. Introducing basic math functions
How many people survived in total?
- use of df.sum() function, as in Excel
- use of df.count() function, as in Excel

In [18]:
data['Survived'].sum()

342

In [19]:
#percentage of people survived
print("Total number of people",data.shape[0])
data['Survived'].sum()/data.shape[0]
#you can also use - data['Survived'].sum()/data['Survived'].count()

Total number of people 891


0.3838383838383838
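One detail worth seeing in isolation: `count()` skips missing values, while `shape[0]` counts every row. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Survived": [1, 0, 1, 1],
    "Age": [22.0, np.nan, 26.0, 35.0],
})

total_survived = toy["Survived"].sum()  # adds up the column
rows = toy.shape[0]                     # all rows, missing or not
known_ages = toy["Age"].count()         # only the non-missing values

# axis=1 sums across the columns of each row (NaN is skipped by default)
row_sums = toy.sum(axis=1)

print(total_survived, rows, known_ages)
print(row_sums.tolist())
```

This is why the two ways of computing the percentage give the same answer only when the column has no missing values.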

### <a name="add_column"></a>7. Adding new column into the present dataframe
New concepts introduced:-

- np.nan
- fillna()
- inplace

In [20]:
#check current state of data
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [21]:
#1. create new column with value 2 and check
data['new_column1']=2
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,2
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,2
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,2
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,2


In [22]:
#2. create new column with null value
data['new_column2']=np.nan
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,new_column2
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,2,
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2,
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,2,
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,2,
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,2,


In [23]:
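# 3. fill holes, i.e. replace missing (NaN) values in the table with a chosen value
# (calling fillna on the dataframe with a {column: value} dict keeps inplace=True reliable on newer pandas)
data.fillna({'new_column2':0},inplace=True)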
data.head()


Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,new_column2
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,2,0.0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2,0.0
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,2,0.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,2,0.0
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,2,0.0
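The effect of `inplace` is easiest to see side by side. This is a sketch on a throwaway one-column frame; `toy` and `filled` are names chosen for the example.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Without inplace: a new dataframe is returned, the original keeps its NaN
filled = toy.fillna(0)
missing_before = toy["a"].isna().sum()

# With inplace=True: the original dataframe itself is modified
toy.fillna(0, inplace=True)
missing_after = toy["a"].isna().sum()

print(missing_before, missing_after)
```

Reassigning (`toy = toy.fillna(0)`) achieves the same end result as `inplace=True` and is often easier to read.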


In [24]:
#calculate Fare/age for each passenger
data['ratio']=data['Fare']/data['Age']

### <a name="delete_rename"></a>8. Let's delete unwanted columns or rename columns

New concepts introduced:-
- df.drop()
- df.rename()

In [26]:
#drop column (axis=1 means columns, axis=0 means rows). This returns a new dataframe; data itself is unchanged
data.drop(['new_column2'],axis=1)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,2,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,2,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,2,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,2,0.230000
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,2,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,2,0.960417
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,2,10.537500
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,2,0.412344
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,2,2.147914


In [28]:
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,2,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,2,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,2,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,2,0.23


In [34]:
# drop column INPLACE
data.drop(['new_column2'],axis=1,inplace=True)
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,0.23


In [29]:
data.rename(columns={'Gender':'Blunder'}) #not in place: returns a renamed copy, data itself is unchanged

Unnamed: 0,PassengerId,Survived,Pclass,Name,Blunder,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,2,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,2,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,2,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,2,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,2,0.230000
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,2,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,2,0.960417
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,2,10.537500
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,2,0.412344
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,2,2.147914
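A compact sketch of the same two operations on a made-up frame, showing that neither changes the original unless the result is assigned back. The `columns=` keyword of `drop` is an alternative to passing `axis=1`.

```python
import pandas as pd

toy = pd.DataFrame({"Gender": ["male", "female"], "Age": [22, 38], "tmp": [0, 0]})

dropped = toy.drop(columns=["tmp"])             # same as toy.drop(["tmp"], axis=1)
renamed = toy.rename(columns={"Gender": "Sex"})
columns_after_calls = list(toy.columns)         # toy still has all three columns

# To keep the changes, assign the result back (calls can be chained)
toy = toy.drop(columns=["tmp"]).rename(columns={"Gender": "Sex"})

print(columns_after_calls)
print(list(toy.columns))
```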


### <a name="loc_row"></a>9. Time to manipulate rows
New concepts introduced:-

1. df.index
2. df.iloc - for integer position based indexing
3. df.loc - for label based or condition based indexing

[Appendix 1](#Apendix1) contains more df.loc use cases

##### iloc

selecting data from a row and column

In [30]:
data['Fare'].iloc[3]

53.1

##### loc

In [31]:
data.loc[(data['Fare'] > 100) & (data['Fare'] <= 200)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_column1,ratio
31,32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C,2,
195,196,1,1,"Lurette, Miss. Elise",female,58.0,0,0,PC 17569,146.5208,B80,C,2,2.526221
215,216,1,1,"Newell, Miss. Madeleine",female,31.0,1,0,35273,113.275,D36,C,2,3.654032
268,269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58.0,0,1,PC 17582,153.4625,C125,S,2,2.645905
269,270,1,1,"Bissette, Miss. Amelia",female,35.0,0,0,PC 17760,135.6333,C99,S,2,3.875237
297,298,0,1,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,S,2,75.775
305,306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S,2,164.728261
306,307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C,2,
307,308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (M...",female,17.0,1,0,PC 17758,108.9,C65,C,2,6.405882
318,319,1,1,"Wick, Miss. Mary Natalie",female,31.0,0,2,36928,164.8667,C7,S,2,5.318281
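With the default 0, 1, 2, … index, positions and labels coincide, which hides the difference between `iloc` and `loc`. The sketch below uses a made-up frame with labels 10 to 40 so the two can be told apart.

```python
import pandas as pd

toy = pd.DataFrame({"Fare": [7.25, 71.28, 7.92, 53.10]}, index=[10, 20, 30, 40])

by_position = toy["Fare"].iloc[3]   # 4th row, whatever its label is
by_label = toy.loc[30, "Fare"]      # row whose label is 30

# Slices differ too: iloc excludes the end position, loc includes the end label
first_two_positions = toy.iloc[0:2]
labels_10_to_20 = toy.loc[10:20]

# loc also accepts a boolean condition, optionally with a column selection
expensive = toy.loc[toy["Fare"] > 50, "Fare"]

print(by_position, by_label)
print(expensive.index.tolist())
```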


In [36]:
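#Deleting rows: collect the index labels of the rows to remove, then drop them
#(drop returns a new dataframe; data itself is unchanged)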
idx=data[(data['Pclass']==2)].index
data.drop(idx)


Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,0.230000
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,0.960417
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,10.537500
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,0.412344
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S,4.175000
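Notice that the remaining rows keep their old labels, so the index has gaps. A sketch on a made-up frame, including an equivalent filter and an optional renumbering step:

```python
import pandas as pd

toy = pd.DataFrame({"Pclass": [3, 1, 2, 3, 2], "Name": list("ABCDE")})

# Labels of the rows to delete, then drop them
idx = toy[toy["Pclass"] == 2].index
kept = toy.drop(idx)

# Keeping the complement with a filter gives the same rows
same = toy[toy["Pclass"] != 2]

# reset_index(drop=True) renumbers the rows 0, 1, 2, ...
renumbered = kept.reset_index(drop=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```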


#### 9.1 APPEND function

In [50]:
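### Adding new rows to DataFrame

#Method 1:
#create a new dataframe with the values. Keep its columns the same as the original columns
df2=pd.DataFrame([[900,0,3,'abhishek','male',23,1,0,'dasd',8,'c123','s',2.3]],columns=data.columns)

# append this data frame to the old one (returns a new dataframe; data itself is unchanged)
# note: DataFrame.append was removed in pandas 2.0; there, use pd.concat([data,df2],ignore_index=True)
data.append(df2,ignore_index=True)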



Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,0.230000
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,0.960417
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,10.537500
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,0.412344
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,2.147914


In [42]:
df2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,900,0,3,abhishek,male,23,1,0,dasd,8,c123,s,2.3


In [43]:
#Method 2: assign the new row to the next free index label with loc (this modifies df2 directly)
length=df2.shape[0]
df2.loc[length]=[900,0,3,'abhishek','male',23,1,0,'dasd',8,'c123','s',2.3]

In [45]:
df2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,900,0,3,abhishek,male,23,1,0,dasd,8,c123,s,2.3
1,900,0,3,abhishek,male,23,1,0,dasd,8,c123,s,2.3


#### 9.2 CONCAT function

In [49]:
pd.concat([data,df2],axis=0) #returns a new dataframe; concat has no inplace option, so assign the result to keep it

Unnamed: 0,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,ratio
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,0.329545
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,1.875876
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,0.304808
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,1.517143
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,0.230000
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,0.960417
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,10.537500
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,0.412344
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,2.147914
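A sketch of row-wise `concat` on two made-up frames, showing what happens to the index with and without `ignore_index`:

```python
import pandas as pd

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5], "b": [6]})

# axis=0 stacks rows; each frame keeps its own labels, so 0 appears twice
stacked = pd.concat([first, second], axis=0)

# ignore_index=True renumbers the result 0, 1, 2
renumbered = pd.concat([first, second], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(len(first))  # the inputs are left untouched
```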


### <a name="create_from_np"></a>10. Creating DataFrame from numpy array

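New concept introduced:-
- pd.DataFrame() - build a dataframe from rows of values (a nested list is used here; a numpy array is passed the same way)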

In [51]:
new_df=pd.DataFrame([[1,2,3],[1,4,3],[1,5,3],[1,6,3]],columns=['a','b','c'])

In [52]:
new_df

Unnamed: 0,a,b,c
0,1,2,3
1,1,4,3
2,1,5,3
3,1,6,3


In [53]:
new_df.iloc[3][['b']]

b    6
Name: 3, dtype: int64
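For completeness, here is a sketch that builds the same table from an actual numpy array and, as an alternative, from a dictionary of columns. The names `arr`, `from_array` and `from_dict` are chosen for this example.

```python
import numpy as np
import pandas as pd

# Rows of the table as a 2D numpy array
arr = np.array([[1, 2, 3], [1, 4, 3], [1, 5, 3], [1, 6, 3]])
from_array = pd.DataFrame(arr, columns=["a", "b", "c"])

# The same table described column by column
from_dict = pd.DataFrame({"a": [1, 1, 1, 1], "b": [2, 4, 5, 6], "c": [3, 3, 3, 3]})

print(from_array)
print(from_array.loc[3, "b"])  # single value: row label 3, column 'b'
```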

### <a name="apply"></a>11. Applying operations in dataframe 
New concept introduced
- df.apply()


In [56]:
df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
df

Unnamed: 0,A,B
0,4,9
1,4,9
2,4,9


In [57]:
df.apply(np.sqrt)

Unnamed: 0,A,B
0,2.0,3.0
1,2.0,3.0
2,2.0,3.0


In [58]:
#operation column wise
df.apply(np.sum, axis=0)

A    12
B    27
dtype: int64

In [59]:
#applying operations row wise
df.apply(np.sum, axis=1)

0    13
1    13
2    13
dtype: int64
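`apply` is not limited to numpy functions. As a sketch, the hypothetical helper `spread` below receives one row at a time when `axis=1` is used; the built-in `sum` method is shown alongside for comparison.

```python
import pandas as pd

df_demo = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Built-in reductions give the same results as apply(np.sum, ...)
col_sums = df_demo.sum(axis=0)  # one value per column
row_sums = df_demo.sum(axis=1)  # one value per row

def spread(row):
    """Hypothetical helper: difference between B and A within one row."""
    return row["B"] - row["A"]

# axis=1 passes each row (as a Series) to the function
diff = df_demo.apply(spread, axis=1)

print(col_sums.tolist(), row_sums.tolist(), diff.tolist())
```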

In [60]:
#counts how many times each unique value occurs in a series
df['A'].value_counts()

4    3
Name: A, dtype: int64

### <a name="lambda"></a>12. Lambda Function with df.apply function

In [61]:
df['A'].apply(lambda x: x/1.25) 

0    3.2
1    3.2
2    3.2
Name: A, dtype: float64
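Two follow-up points, sketched on the same kind of toy frame: simple arithmetic does not need `apply` at all, and a lambda becomes more useful when it has to look at several columns of a row. The "big"/"small" labels are invented for the example.

```python
import pandas as pd

df_demo = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Element-wise lambda and plain vectorised arithmetic give the same values
scaled_apply = df_demo["A"].apply(lambda x: x / 1.25)
scaled_vec = df_demo["A"] / 1.25

# A row-wise lambda (axis=1) can combine several columns
label = df_demo.apply(lambda row: "big" if row["A"] + row["B"] > 10 else "small", axis=1)

print(scaled_vec.tolist())
print(label.tolist())
```

The vectorised form is usually the shorter and faster choice when it is available.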

In [62]:
df2 = pd.DataFrame([[4, 9],] * 2, columns=['A', 'B'])

In [63]:
df

Unnamed: 0,A,B
0,4,9
1,4,9
2,4,9


In [64]:
df3=pd.concat([df,df2],axis=1)

In [65]:
df3

Unnamed: 0,A,B,A.1,B.1
0,4,9,4.0,9.0
1,4,9,4.0,9.0
2,4,9,,
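The empty cells in the last row come from index alignment: with `axis=1` the frames are matched on row labels, and the shorter frame has no row 2. A sketch, with the optional `keys` argument used to tell the two sets of identically named columns apart (`left` and `right` are names made up here):

```python
import pandas as pd

left = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])
right = pd.DataFrame([[4, 9]] * 2, columns=["A", "B"])

# Columns are placed side by side; rows are matched on their index labels
side_by_side = pd.concat([left, right], axis=1)
missing_in_last_row = side_by_side.iloc[2].isna().sum()

# keys adds an outer column level, so each source frame stays addressable
labelled = pd.concat([left, right], axis=1, keys=["left", "right"])

print(side_by_side.shape)
print(missing_in_last_row)
print(labelled["right"])
```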


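### <a name="Apendix1"></a>Appendix 1

- Some more loc use cases

Collection taken from [here](https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/#loc-selection). The column names used below (first_name, email, city, ...) are not Titanic columns, so these snippets will not run on the Titanic dataframe as-is.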

In [None]:

# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
 
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]   
 
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]   
       
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')] 
 
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']] 
 
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)] 
 
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company']]
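To try a few of these patterns without the original dataset, here is a sketch on a tiny made-up table. The frame `contacts` and all of its values are hypothetical; only the column names follow the snippets above.

```python
import pandas as pd

contacts = pd.DataFrame({
    "id": [50, 150, 250],
    "first_name": ["Antonio", "Eric", "Antonio"],
    "city": ["Rome", "Oslo", "Lima"],
    "email": ["antonio@gmail.com", "eric@hotmail.com", "a2@hotmail.com"],
})

# Rows where first_name is Antonio, columns from 'city' through 'email' (inclusive)
antonio = contacts.loc[contacts["first_name"] == "Antonio", "city":"email"]

# Rows whose email ends with 'hotmail.com', all columns
hotmail = contacts.loc[contacts["email"].str.endswith("hotmail.com")]

# Rows with id in (100, 200], returning two chosen columns
mid_ids = contacts.loc[(contacts["id"] > 100) & (contacts["id"] <= 200), ["first_name", "email"]]

print(antonio)
print(hotmail)
print(mid_ids)
```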