# Pandas Deep Dive

## Outline


* Basic Data Exploration
* Missing Values
* Duplicates
* Selecting / Dropping Columns
* String operations on whole columns
* Column splits/transformations
* Querying a Dataframe
* `DataFrame.at` vs. `DataFrame.loc`
* Joins
* Sorting
* Column / Row Concatenation

In [1]:
import pandas as pd
import numpy as np

# Making a copy of the data

It is important to work on a copy of the data: if we corrupt `df` by accident, the untouched original is still in memory and we do not have to read the file again.

In [2]:
data = pd.read_csv('startup_funding.csv',thousands=',')
df = data.copy()
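
As a minimal sketch of why `.copy()` matters, the toy frame below (invented for illustration, not the real CSV) shows that plain assignment only creates a second name for the same object, while `.copy()` gives an independent frame whose changes do not reach the original.

```python
import pandas as pd

# Hypothetical stand-in for the data read from the CSV
data = pd.DataFrame({"StartupName": ["A", "B"], "AmountInUSD": [100.0, None]})

alias = data        # plain assignment: same object, nothing is copied
df = data.copy()    # independent copy (deep by default)

# Change only the copy
df["AmountInUSD"] = df["AmountInUSD"].fillna(0)

alias_is_same = alias is data
original_nulls = int(data["AmountInUSD"].isnull().sum())
copy_nulls = int(df["AmountInUSD"].isnull().sum())
print(alias_is_same, original_nulls, copy_nulls)
```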

In [3]:
df

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,
...,...,...,...,...,...,...,...,...,...,...,...
2417,45,45,28/07/2017,Jhakaas,Consumer Internet,App-based Aggregator of Offline Businesses,Mumbai,Amen Dhyllon,Seed Funding,,
2418,46,46,28/07/2017,BigStylist,Consumer Internet,Beauty Services Marketplace,Mumbai,Info Edge (India) Ltd,Private Equity,1250000.0,
2419,47,47,28/07/2017,Gympik.com,Consumer Internet,online marketplace for discovering fitness cen...,bangalore,RoundGlass Partners,Seed Funding,,
2420,48,48,01/06/2017,Tripeur,Technology,Mobile based travel ERP platform,Bangalore,"Grace Grace Techno Ventures LLP, Rajul Garg & ...",Seed Funding,,


## Basic Data Exploration

**Getting Shape of the Data**

In [4]:
df.shape

(2422, 11)

There are 2422 rows and 11 columns in the data

**Getting a list of all columns in the dataframe**

In [5]:
df.columns

Index(['index', 'SNo', 'Date', 'StartupName', 'IndustryVertical',
       'SubVertical', 'CityLocation', 'InvestorsName', 'InvestmentType',
       'AmountInUSD', 'Remarks'],
      dtype='object')

**Checking data types of all columns**

In [6]:
df.dtypes

index                 int64
SNo                   int64
Date                 object
StartupName          object
IndustryVertical     object
SubVertical          object
CityLocation         object
InvestorsName        object
InvestmentType       object
AmountInUSD         float64
Remarks              object
dtype: object

**Getting top/bottom 5 values**

By default, `df.head()` shows the first five rows of the data frame. Passing a number, as in `df.head(50)` below, asks for that many rows instead.

In [7]:
df.head(50)

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,
3,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,
4,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,
5,5,5,01/07/2017,Billion Loans,Consumer Internet,Peer to Peer Lending platform,Bangalore,Reliance Corporate Advisory Services Ltd,Seed Funding,1000000.0,
6,6,6,03/07/2017,Ecolibriumenergy,Technology,Energy management solutions provider,Ahmedabad,"Infuse Ventures, JLL",Private Equity,2600000.0,
7,7,7,04/07/2017,Droom,eCommerce,Online marketplace for automobiles,Gurgaon,"Asset Management (Asia) Ltd, Digital Garage Inc",Private Equity,20000000.0,
8,8,8,05/07/2017,Jumbotail,eCommerce,online marketplace for food and grocery,Bangalore,"Kalaari Capital, Nexus India Capital Advisors",Private Equity,8500000.0,
9,9,9,05/07/2017,Moglix,eCommerce,B2B marketplace for Industrial products,Noida,"International Finance Corporation, Rocketship,...",Private Equity,12000000.0,


By default, `df.tail()` shows the last five rows of the data frame.

In [8]:
df.tail()

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
2417,45,45,28/07/2017,Jhakaas,Consumer Internet,App-based Aggregator of Offline Businesses,Mumbai,Amen Dhyllon,Seed Funding,,
2418,46,46,28/07/2017,BigStylist,Consumer Internet,Beauty Services Marketplace,Mumbai,Info Edge (India) Ltd,Private Equity,1250000.0,
2419,47,47,28/07/2017,Gympik.com,Consumer Internet,online marketplace for discovering fitness cen...,bangalore,RoundGlass Partners,Seed Funding,,
2420,48,48,01/06/2017,Tripeur,Technology,Mobile based travel ERP platform,Bangalore,"Grace Grace Techno Ventures LLP, Rajul Garg & ...",Seed Funding,,
2421,49,49,02/06/2017,RentOnGo,eCommerce,"Online Marketplace for Renting Bikes, Electron...",Bangalore,TVS Motor Company,Private Equity,,


**Getting top/bottom n values**

In [9]:
df.head(3)

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,,
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,,


In [10]:
df.tail(7)

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
2415,43,43,26/07/2017,ThinkerBell,Consumer Internet,Assisted Learning Startup,Bangalore,"Indian Angel Network, Anand Mahindra",Seed Funding,200000.0,
2416,44,44,27/07/2017,1mg,eCommerce,Online Pharmacy,Gurgaon,"HBM Healthcare Investments, Maverick Capital V...",Private Equity,15000000.0,
2417,45,45,28/07/2017,Jhakaas,Consumer Internet,App-based Aggregator of Offline Businesses,Mumbai,Amen Dhyllon,Seed Funding,,
2418,46,46,28/07/2017,BigStylist,Consumer Internet,Beauty Services Marketplace,Mumbai,Info Edge (India) Ltd,Private Equity,1250000.0,
2419,47,47,28/07/2017,Gympik.com,Consumer Internet,online marketplace for discovering fitness cen...,bangalore,RoundGlass Partners,Seed Funding,,
2420,48,48,01/06/2017,Tripeur,Technology,Mobile based travel ERP platform,Bangalore,"Grace Grace Techno Ventures LLP, Rajul Garg & ...",Seed Funding,,
2421,49,49,02/06/2017,RentOnGo,eCommerce,"Online Marketplace for Renting Bikes, Electron...",Bangalore,TVS Motor Company,Private Equity,,


**Getting a summary of all columns**

This will display the basic stats of the **numerical columns** in the data frame

In [11]:
df.describe()

Unnamed: 0,index,SNo,AmountInUSD
count,2422.0,2422.0,1553.0
mean,1161.532205,1161.532205,11923850.0
std,697.598259,697.598259,63466800.0
min,0.0,0.0,16000.0
25%,555.25,555.25,375000.0
50%,1160.5,1160.5,1100000.0
75%,1765.75,1765.75,6000000.0
max,2371.0,2371.0,1400000000.0


This will display the basic stats of the **object type** columns in the data frame. The 'O' means object in the following code. 

In [12]:
df.describe(include=['O'])

Unnamed: 0,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,Remarks
count,2422,2422,2251,1486,2243,2413,2421,419
unique,701,2001,743,1364,71,1885,7,69
top,02/02/2015,Swiggy,Consumer Internet,Online Pharmacy,Bangalore,Undisclosed Investors,Seed Funding,Series A
freq,11,7,795,10,649,33,1300,177


To display all the columns at once use the following command.

In [13]:
df.describe(include='all')

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
count,2422.0,2422.0,2422,2422,2251,1486,2243,2413,2421,1553.0,419
unique,,,701,2001,743,1364,71,1885,7,,69
top,,,02/02/2015,Swiggy,Consumer Internet,Online Pharmacy,Bangalore,Undisclosed Investors,Seed Funding,,Series A
freq,,,11,7,795,10,649,33,1300,,177
mean,1161.532205,1161.532205,,,,,,,,11923850.0,
std,697.598259,697.598259,,,,,,,,63466800.0,
min,0.0,0.0,,,,,,,,16000.0,
25%,555.25,555.25,,,,,,,,375000.0,
50%,1160.5,1160.5,,,,,,,,1100000.0,
75%,1765.75,1765.75,,,,,,,,6000000.0,


**Getting unique values of a single column**

After running this command you will see that it returns all unique values that are present in the column **InvestmentType**.

Note that **PrivateEquity** and **Private Equity** are two unique values.

In [14]:
df['InvestmentType'].unique()

array(['Private Equity', 'Seed Funding', 'Debt Funding', nan,
       'SeedFunding', 'PrivateEquity', 'Crowd funding', 'Crowd Funding'],
      dtype=object)
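
To see how often each spelling occurs, `value_counts()` can be used alongside `unique()`. The sketch below uses a small made-up Series rather than the real column; `dropna=False` is passed so that missing values are counted too.

```python
import numpy as np
import pandas as pd

# Toy values invented for illustration
s = pd.Series(["Seed Funding", "SeedFunding", "Seed Funding", np.nan])

# Frequency of every distinct value, including NaN
counts = s.value_counts(dropna=False)
print(counts)

n_distinct_with_nan = len(s.unique())   # unique() keeps NaN
n_distinct_without_nan = s.nunique()    # nunique() ignores NaN by default
```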

## Missing Values

**Checking which columns have missing values and how many**

In [15]:
df.isnull().sum()

index                  0
SNo                    0
Date                   0
StartupName            0
IndustryVertical     171
SubVertical          936
CityLocation         179
InvestorsName          9
InvestmentType         1
AmountInUSD          869
Remarks             2003
dtype: int64

**Treating Missing Values of a single column (Series)**

In [16]:
df['AmountInUSD'] = df['AmountInUSD'].fillna(0)

After running the following command you'll see that AmountInUSD is now filled with 0 and it doesn't have any missing/null values.

In [17]:
df.isnull().sum()

index                  0
SNo                    0
Date                   0
StartupName            0
IndustryVertical     171
SubVertical          936
CityLocation         179
InvestorsName          9
InvestmentType         1
AmountInUSD            0
Remarks             2003
dtype: int64

**Treating all Missing Values at once**

In [18]:
df.fillna('Others', inplace=True)

The **inplace=True** parameter in the above code modifies `df` directly instead of returning a new data frame, so no re-assignment is needed.
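
A small sketch of the two styles, on toy frames invented for illustration: with `inplace=True` the call returns `None` and changes the frame itself, while without it the call returns a new frame that has to be assigned to a name.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"City": ["Mumbai", np.nan], "Remarks": [np.nan, "Series A"]})

# inplace=True: toy is modified, and the return value is None
result = toy.fillna("Others", inplace=True)

# Without inplace: the filled frame is returned and must be assigned
other = pd.DataFrame({"City": ["Pune", np.nan]})
filled = other.fillna("Others")

remaining_nulls = int(toy.isnull().sum().sum())
print(result, remaining_nulls, filled["City"].tolist())
```

Because the in-place call returns `None`, writing `df = df.fillna(..., inplace=True)` would replace the data frame with `None`.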

In [19]:
df.isnull().sum()

index               0
SNo                 0
Date                0
StartupName         0
IndustryVertical    0
SubVertical         0
CityLocation        0
InvestorsName       0
InvestmentType      0
AmountInUSD         0
Remarks             0
dtype: int64

Now you can see that every column has **0** null values.
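
Filling everything with one string works here only because the numeric column was already filled with 0. As a hedged alternative, `fillna` also accepts a dictionary that maps column names to fill values, so each column keeps a sensible type. The frame below is a toy example, not the real data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "CityLocation": ["Mumbai", np.nan],
    "AmountInUSD": [np.nan, 500000.0],
})

# One fill value per column: text for the text column, 0 for the numeric one
filled = toy.fillna({"CityLocation": "Others", "AmountInUSD": 0})
print(filled)
print(filled.dtypes)
```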

## Duplicates

**Checking for duplicates in a specific column**

In [20]:
df.duplicated(['StartupName']).sum()

421

This means that **421** rows have a StartupName that already appeared in an earlier row.





**Checking for whole row duplicates**

In [21]:
df.duplicated().sum()

50

This returns the number of rows that are exact copies of an earlier row.

In [22]:
df[df.duplicated()]

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
2372,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,Others
2373,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0.0,Others
2374,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0.0,Others
2375,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,Others
2376,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,Others
2377,5,5,01/07/2017,Billion Loans,Consumer Internet,Peer to Peer Lending platform,Bangalore,Reliance Corporate Advisory Services Ltd,Seed Funding,1000000.0,Others
2378,6,6,03/07/2017,Ecolibriumenergy,Technology,Energy management solutions provider,Ahmedabad,"Infuse Ventures, JLL",Private Equity,2600000.0,Others
2379,7,7,04/07/2017,Droom,eCommerce,Online marketplace for automobiles,Gurgaon,"Asset Management (Asia) Ltd, Digital Garage Inc",Private Equity,20000000.0,Others
2380,8,8,05/07/2017,Jumbotail,eCommerce,online marketplace for food and grocery,Bangalore,"Kalaari Capital, Nexus India Capital Advisors",Private Equity,8500000.0,Others
2381,9,9,05/07/2017,Moglix,eCommerce,B2B marketplace for Industrial products,Noida,"International Finance Corporation, Rocketship,...",Private Equity,12000000.0,Others


These are the rows flagged as duplicates, i.e. the second and later copies of a row; the first occurrence of each is not included. A row may have more than two copies.
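
To list every copy of a duplicated row, including the first occurrence, `duplicated` accepts `keep=False`. A minimal sketch on a toy frame invented for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "StartupName": ["Zepo", "Droom", "Zepo", "Zepo"],
    "AmountInUSD": [500000, 20000000, 500000, 500000],
})

# Default keep='first': only the later copies are flagged
later_copies = toy[toy.duplicated()]

# keep=False: every row that has at least one identical twin is flagged
all_copies = toy[toy.duplicated(keep=False)]

print(later_copies.index.tolist(), all_copies.index.tolist())
```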

**Deleting duplicates from a specific column**

In [23]:
df.shape

(2422, 11)

The following code deletes the duplicates present in **StartupName**. The **keep='first'** parameter keeps the first occurrence of each duplicated value and deletes the rest, so only unique values are left.

In [24]:
df.drop_duplicates(['StartupName'], keep='first').shape # Default is keep=first

(2001, 11)

In [25]:
df.drop_duplicates(['StartupName'], keep='last').shape #keep=last keeps the last instance and deletes the rest

(2001, 11)

In [26]:
df.drop_duplicates(['StartupName'], keep=False).shape # keep=False deletes every occurrence of a duplicated value, so none is left

(1679, 11)

**NOTE:** In none of the calls above did we pass inplace=True or assign the result back, so `df` itself is unchanged.
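
The three `keep` options can be compared side by side on a small frame. The data below is made up for illustration; the `SNo` column simply mirrors the row position so it is easy to see which rows survive.

```python
import pandas as pd

toy = pd.DataFrame({
    "StartupName": ["Swiggy", "Zepo", "Swiggy", "Droom", "Swiggy"],
    "SNo": [0, 1, 2, 3, 4],
})

first = toy.drop_duplicates(["StartupName"], keep="first")  # keeps rows 0, 1, 3
last = toy.drop_duplicates(["StartupName"], keep="last")    # keeps rows 1, 3, 4
none = toy.drop_duplicates(["StartupName"], keep=False)     # keeps rows 1, 3

# None of the calls used inplace=True, so toy still has all five rows
print(first["SNo"].tolist(), last["SNo"].tolist(), none["SNo"].tolist(), toy.shape)
```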

**Deleting whole row duplicates**

In [27]:
df.drop_duplicates().shape

(2372, 11)

In [28]:
df.shape

(2422, 11)

Now we will use the inplace=True parameter to save the data frame

In [29]:
df.drop_duplicates(inplace=True)
df.shape

(2372, 11)

## Selecting / Dropping Columns

**Selecting Columns**

To select multiple columns, pass a list of their names inside the indexing brackets.

In [30]:
df[['Date', 'CityLocation', 'AmountInUSD']]

Unnamed: 0,Date,CityLocation,AmountInUSD
0,01/08/2017,Bangalore,1300000.0
1,02/08/2017,Mumbai,0.0
2,02/08/2017,New Delhi,0.0
3,02/08/2017,Mumbai,500000.0
4,02/08/2017,Hyderabad,850000.0
...,...,...,...
2367,29/01/2015,Others,4500000.0
2368,29/01/2015,Others,825000.0
2369,30/01/2015,Others,1500000.0
2370,30/01/2015,Others,0.0


**Dropping Columns**

In [31]:
df.head()

Unnamed: 0,index,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,Others
1,1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0.0,Others
2,2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0.0,Others
3,3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,Others
4,4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,Others


In [32]:
df.drop('index', axis=1).shape

(2372, 10)

In [33]:
df.shape

(2372, 11)

Use the inplace=True parameter to save the change to the data frame.

In [34]:
df.drop('index', axis=1,inplace=True)

In [35]:
df.shape

(2372, 10)

In [36]:
df.head()

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000.0,Others
1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0.0,Others
2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0.0,Others
3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000.0,Others
4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000.0,Others


## String Operations on whole columns

**String Replacement**

In [37]:
df['InvestmentType'].unique()

array(['Private Equity', 'Seed Funding', 'Debt Funding', 'Others',
       'SeedFunding', 'PrivateEquity', 'Crowd funding', 'Crowd Funding'],
      dtype=object)

We want to replace *PrivateEquity*, which is written as one word, with *Private Equity*.

In [38]:
# the first argument to replace() is the value we want to replace and
# the second one is the value we are replacing it with
df['InvestmentType'].replace('PrivateEquity', 'Private Equity').unique()

array(['Private Equity', 'Seed Funding', 'Debt Funding', 'Others',
       'SeedFunding', 'Crowd funding', 'Crowd Funding'], dtype=object)

In [39]:
df['InvestmentType'].unique()

array(['Private Equity', 'Seed Funding', 'Debt Funding', 'Others',
       'SeedFunding', 'PrivateEquity', 'Crowd funding', 'Crowd Funding'],
      dtype=object)

As you can see, the old value is still in the df because we did not assign the result back. The `.str.replace()` method used next has no **inplace** parameter, so we have to reassign the result manually.
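
The two replacement methods that appear in this section behave differently, which the sketch below tries to make visible. `Series.replace` swaps values that match in full, while `Series.str.replace` swaps a substring inside each value. The value `'PrivateEquity Round'` is hypothetical and is there only to show the difference.

```python
import pandas as pd

# Toy values; 'PrivateEquity Round' is invented for illustration
s = pd.Series(["PrivateEquity", "PrivateEquity Round", "Seed Funding"])

# Whole-value match: only the first element changes
whole = s.replace("PrivateEquity", "Private Equity")

# Substring match: the first two elements change
sub = s.str.replace("PrivateEquity", "Private Equity", regex=False)

print(whole.tolist())
print(sub.tolist())
```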

In [40]:
df['InvestmentType'] = df['InvestmentType'].str.replace('PrivateEquity', 'Private Equity')
df['InvestmentType'] = df['InvestmentType'].str.replace('SeedFunding', 'Seed Funding')
df['InvestmentType'] = df['InvestmentType'].str.replace('Crowd funding', 'Crowd Funding')

In [41]:
df['InvestmentType'].unique()

array(['Private Equity', 'Seed Funding', 'Debt Funding', 'Others',
       'Crowd Funding'], dtype=object)
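
When several spellings need fixing, one possible shortcut is to pass a dictionary to `Series.replace` so that all whole-value replacements happen in a single call. This is a sketch on toy values; the name `fixes` is only illustrative.

```python
import pandas as pd

s = pd.Series(["PrivateEquity", "SeedFunding", "Crowd funding", "Debt Funding"])

# Map each bad spelling to its corrected form
fixes = {
    "PrivateEquity": "Private Equity",
    "SeedFunding": "Seed Funding",
    "Crowd funding": "Crowd Funding",
}

cleaned = s.replace(fixes)
print(cleaned.tolist())
```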

**Capitalization**

To convert every value in a specific column to lowercase, use the following code. As before, the result must be assigned back to the column if you want to keep it.

In [42]:
df['CityLocation'].str.lower()

0       bangalore
1          mumbai
2       new delhi
3          mumbai
4       hyderabad
          ...    
2367       others
2368       others
2369       others
2370       others
2371       others
Name: CityLocation, Length: 2372, dtype: object
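
Besides `str.lower()`, the `.str` accessor offers `upper()`, `title()` and `capitalize()`. The sketch below runs them on a few toy city names so the differences are easy to compare.

```python
import pandas as pd

cities = pd.Series(["bangalore", "New Delhi", "MUMBAI"])

lower = cities.str.lower()              # all lowercase
upper = cities.str.upper()              # all uppercase
title = cities.str.title()              # first letter of every word capitalized
capitalized = cities.str.capitalize()   # only the first letter of the string capitalized

print(title.tolist())
print(capitalized.tolist())
```

For city names, `title()` is likely the most useful choice, since it would make `bangalore` and `Bangalore` identical.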

**Checking if there is a substring in each column value**

In [43]:
df[df['InvestorsName'].str.contains('Khan')]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
81,81,16/06/2017,Fincash,Consumer Internet,Personal Finance platform,Mumbai,"Mohammed Khan, Sameer Narayan & Others",Seed Funding,100000.0,Others
780,780,17/08/2016,MaalGaadi,Logistics,Online Logistics Marketplace,Indore,"Swan Angel Network,Sachin Khandelwal and others",Seed Funding,375000.0,Others
871,871,15/07/2016,BaggOut,eCommerce,Women’s Fashion etailer,New Delhi,"Sumit Jain, Sumit Jain, Anurag Gupta, Varun Kh...",Seed Funding,0.0,Others
920,920,10/06/2016,Kickstart Jobs,Technology,Entry level hiring platform,Gurgaon,"Vivek Joshi, Mohit Satyanand, Amit Banati, Aru...",Seed Funding,0.0,Others
947,947,17/06/2016,BYG,Consumer Internet,Fitness centre Discovery & Booking Mobile app,Bangalore,"Sanjay Verma, Amit Khanna (LetsVenture)",Seed Funding,0.0,Others
1094,1094,13/04/2016,Legalraasta,Consumer Internet,Online legal Services for Startups,New Delhi,"Pravin Khandelwal, Yatin Kumar Jain",Seed Funding,1000000.0,Others
1155,1155,03/3/2016,Imarticus Learning,Education,Financial Services & Analytics Education Insti...,Mumbai,"Blinc Advisors, Amit Nanavati, Tashwinder Sing...",Private Equity,1000000.0,Others
1581,1581,19/11/2015,PlaceofOrigin,Online Gourmet Food Marketplace,Others,Bangalore,"S.D. Shibulal, Kris Gopalakrishnan, Srinath Ba...",Seed Funding,0.0,Others
1598,1598,25/11/2015,Tooler,On Demand Laundry Services App,Others,New Delhi,"Raghu Khanna, Sameer Gupta",Seed Funding,110000.0,Others
1793,1793,29/09/2015,LoanCircle,Consumer lending marketplace,Others,Bangalore,"Zishaan Hayath, Rahul Khanna & Others",Seed Funding,0.0,Others


In [44]:
df[df['InvestorsName'].str.contains(' Khan')]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
81,81,16/06/2017,Fincash,Consumer Internet,Personal Finance platform,Mumbai,"Mohammed Khan, Sameer Narayan & Others",Seed Funding,100000.0,Others
780,780,17/08/2016,MaalGaadi,Logistics,Online Logistics Marketplace,Indore,"Swan Angel Network,Sachin Khandelwal and others",Seed Funding,375000.0,Others
871,871,15/07/2016,BaggOut,eCommerce,Women’s Fashion etailer,New Delhi,"Sumit Jain, Sumit Jain, Anurag Gupta, Varun Kh...",Seed Funding,0.0,Others
920,920,10/06/2016,Kickstart Jobs,Technology,Entry level hiring platform,Gurgaon,"Vivek Joshi, Mohit Satyanand, Amit Banati, Aru...",Seed Funding,0.0,Others
947,947,17/06/2016,BYG,Consumer Internet,Fitness centre Discovery & Booking Mobile app,Bangalore,"Sanjay Verma, Amit Khanna (LetsVenture)",Seed Funding,0.0,Others
1094,1094,13/04/2016,Legalraasta,Consumer Internet,Online legal Services for Startups,New Delhi,"Pravin Khandelwal, Yatin Kumar Jain",Seed Funding,1000000.0,Others
1155,1155,03/3/2016,Imarticus Learning,Education,Financial Services & Analytics Education Insti...,Mumbai,"Blinc Advisors, Amit Nanavati, Tashwinder Sing...",Private Equity,1000000.0,Others
1581,1581,19/11/2015,PlaceofOrigin,Online Gourmet Food Marketplace,Others,Bangalore,"S.D. Shibulal, Kris Gopalakrishnan, Srinath Ba...",Seed Funding,0.0,Others
1598,1598,25/11/2015,Tooler,On Demand Laundry Services App,Others,New Delhi,"Raghu Khanna, Sameer Gupta",Seed Funding,110000.0,Others
1793,1793,29/09/2015,LoanCircle,Consumer lending marketplace,Others,Bangalore,"Zishaan Hayath, Rahul Khanna & Others",Seed Funding,0.0,Others


In [45]:
df['InvestorsName'].str.contains('Khan').sum()

11

In [46]:
df['InvestorsName'].str.contains('khan').sum()

3

**What do you observe?**
<br>The str.contains() function is case sensitive: 'Khan' with a capital K and 'khan' with a lowercase k are searched for as different strings, which is why the two counts differ.
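
If a case-insensitive search is wanted, `str.contains` takes `case=False`. The sketch below also uses `na=False`, which treats missing values as non-matches, and a regex word boundary `\b` to avoid matching longer names such as Khandelwal. The names are toy values, partly borrowed from the output above.

```python
import pandas as pd

names = pd.Series(["Mohammed Khan", "Sachin Khandelwal", "amit khanna", "Kae Capital", None])

# Case-sensitive (default): matches 'Khan' and 'Khandelwal'
sensitive = int(names.str.contains("Khan", na=False).sum())

# Case-insensitive: also matches 'khanna'
insensitive = int(names.str.contains("khan", case=False, na=False).sum())

# Whole word only: the pattern is a regular expression by default
whole_word = int(names.str.contains(r"\bKhan\b", na=False).sum())

print(sensitive, insensitive, whole_word)
```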

## Column Transformations / Splits

**Changing Data types**

___1. Normal conversion___

In [47]:
df.dtypes

SNo                   int64
Date                 object
StartupName          object
IndustryVertical     object
SubVertical          object
CityLocation         object
InvestorsName        object
InvestmentType       object
AmountInUSD         float64
Remarks              object
dtype: object

In [48]:
df['AmountInUSD']=df['AmountInUSD'].astype(np.int64)

In [49]:
df.dtypes

SNo                  int64
Date                object
StartupName         object
IndustryVertical    object
SubVertical         object
CityLocation        object
InvestorsName       object
InvestmentType      object
AmountInUSD          int64
Remarks             object
dtype: object
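
The conversion above works because the missing amounts were filled with 0 earlier. As a sketch of what happens otherwise, the toy Series below still contains a NaN: casting to `np.int64` fails, while the nullable `'Int64'` dtype (capital I) keeps the missing value.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([1300000.0, np.nan, 500000.0])

# A plain integer dtype cannot hold NaN, so this cast raises an error
try:
    amounts.astype(np.int64)
    cast_failed = False
except ValueError:
    cast_failed = True

# Option 1: nullable integer dtype keeps the missing value as <NA>
nullable = amounts.astype("Int64")

# Option 2: fill first, then cast (the approach taken in this notebook)
filled = amounts.fillna(0).astype(np.int64)

print(cast_failed, nullable.dtype, filled.tolist())
```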

___2. Dates Conversion___

This will raise an error because some values are not in a recognizable date format. The function below only works when every date is in a consistent format such as 1/1/2001.

In [50]:
#pd.to_datetime(df['Date'])

In [51]:
df['Date'].str.contains('//').sum()

1

We saw in the data that there were some anomalies. For example, some dates used double slashes instead of single ones, so we had to replace them.
<br> The errors='coerce' parameter converts any value that still cannot be parsed to NaT (a missing date) instead of raising an error, and converts the rest.

In [52]:
df.head()

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks
0,0,01/08/2017,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000,Others
1,1,02/08/2017,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others
2,2,02/08/2017,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others
3,3,02/08/2017,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000,Others
4,4,02/08/2017,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000,Others


In [53]:
df['New Date'] = pd.to_datetime(df['Date'].str.replace('//', '/'), dayfirst=True, errors='coerce')
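
The effect of the arguments in the call above can be seen on a few toy strings. In this sketch, `dayfirst=True` makes `01/08/2017` parse as 1 August, and `errors='coerce'` turns the unparseable string into `NaT`. The value `'not a date'` is invented for illustration.

```python
import pandas as pd

raw = pd.Series(["01/08/2017", "05//07/2017", "not a date"])

# Repair the double slash first, then parse day-first and coerce failures to NaT
cleaned = raw.str.replace("//", "/", regex=False)
parsed = pd.to_datetime(cleaned, dayfirst=True, errors="coerce")

print(parsed)
print(parsed.isna().tolist())
```

Counting `parsed.isna().sum()` after the conversion is a quick way to check how many values were silently coerced.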

In [54]:
df.dtypes

SNo                          int64
Date                        object
StartupName                 object
IndustryVertical            object
SubVertical                 object
CityLocation                object
InvestorsName               object
InvestmentType              object
AmountInUSD                  int64
Remarks                     object
New Date            datetime64[ns]
dtype: object

In [55]:
df.isnull().sum()

SNo                 0
Date                0
StartupName         0
IndustryVertical    0
SubVertical         0
CityLocation        0
InvestorsName       0
InvestmentType      0
AmountInUSD         0
Remarks             0
New Date            0
dtype: int64

In [56]:
df[df['New Date'].isnull()]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,New Date


In [57]:
df['Date']=df['Date'].str.replace('//', '/')
df['Date']=df['Date'].str.replace('.', '/', regex=False)  # regex=False: treat '.' as a literal dot, not as the regex "any character"
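
A dot needs some care because, in a regular expression, `.` means "any character". Whether `str.replace` treats the pattern as a regex by default has differed between pandas versions, so passing `regex=False` explicitly is the safer choice. The sketch below uses toy dates and shows the regex meaning of the dot with Python's own `re.sub`.

```python
import re

import pandas as pd

dates = pd.Series(["01.07.2015", "05//07/2017"])

# Literal replacement: only real dots and double slashes are changed
literal = (
    dates.str.replace("//", "/", regex=False)
         .str.replace(".", "/", regex=False)
)

# For comparison: as a regular expression, '.' matches every character
as_regex = re.sub(".", "/", "01.07.2015")

print(literal.tolist())
print(as_regex)
```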

In [58]:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')

In [59]:
df.dtypes

SNo                          int64
Date                datetime64[ns]
StartupName                 object
IndustryVertical            object
SubVertical                 object
CityLocation                object
InvestorsName               object
InvestmentType              object
AmountInUSD                  int64
Remarks                     object
New Date            datetime64[ns]
dtype: object

In [60]:
df.isnull().sum()

SNo                 0
Date                0
StartupName         0
IndustryVertical    0
SubVertical         0
CityLocation        0
InvestorsName       0
InvestmentType      0
AmountInUSD         0
Remarks             0
New Date            0
dtype: int64

In [61]:
df.drop('New Date',axis=1,inplace=True)

**Getting months/year/days as separate columns from a datetime column**

In [62]:
df['month'] = pd.DatetimeIndex(df['Date']).month

In [63]:
df.head()

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month
0,0,2017-08-01,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000,Others,8
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8
3,3,2017-08-02,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000,Others,8
4,4,2017-08-02,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000,Others,8


**Exercise:** Extract day and year

In [64]:
df['day'] = pd.DatetimeIndex(df['Date']).day
df['year'] = pd.DatetimeIndex(df['Date']).year
df.head()

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
0,0,2017-08-01,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000,Others,8,1,2017
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8,2,2017
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017
3,3,2017-08-02,Zepo,Consumer Internet,DIY Ecommerce platform,Mumbai,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding,500000,Others,8,2,2017
4,4,2017-08-02,Click2Clinic,Consumer Internet,healthcare service aggregator,Hyderabad,"Narottam Thudi, Shireesh Palle",Seed Funding,850000,Others,8,2,2017
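
An alternative to wrapping the column in `pd.DatetimeIndex` is the `.dt` accessor, which is available on any datetime Series. A minimal sketch on two toy dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2017-08-01", "2015-01-29"]))

# The .dt accessor exposes the date components as new Series
months = dates.dt.month
days = dates.dt.day
years = dates.dt.year

print(months.tolist(), days.tolist(), years.tolist())
```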


Extract the current date and time

In [65]:
import datetime

datetime.datetime.today()

datetime.datetime(2020, 12, 8, 23, 19, 8, 948473)

**Extracting new columns from existing string columns**

In [66]:
df['InvestmentType'].str.split(' ')

0       [Private, Equity]
1       [Private, Equity]
2         [Seed, Funding]
3         [Seed, Funding]
4         [Seed, Funding]
              ...        
2367    [Private, Equity]
2368    [Private, Equity]
2369    [Private, Equity]
2370    [Private, Equity]
2371      [Seed, Funding]
Name: InvestmentType, Length: 2372, dtype: object

In [67]:
df['InvestmentType'].str.split(' ').str[0].head(10)

0    Private
1    Private
2       Seed
3       Seed
4       Seed
5       Seed
6    Private
7    Private
8    Private
9    Private
Name: InvestmentType, dtype: object
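
To turn the pieces into real columns in one step, `str.split` accepts `expand=True`, which returns a data frame instead of a Series of lists. The column names `first_word` and `second_word` below are made up for this sketch.

```python
import pandas as pd

types = pd.Series(["Private Equity", "Seed Funding", "Debt Funding"])

# expand=True spreads the pieces across columns 0, 1, ...
parts = types.str.split(" ", expand=True)
parts.columns = ["first_word", "second_word"]  # hypothetical names

print(parts)
```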

## Querying a Dataframe

**Querying based on a single value**

In [68]:
df['AmountInUSD'] < 100000

0       False
1        True
2        True
3       False
4       False
        ...  
2367    False
2368    False
2369    False
2370     True
2371    False
Name: AmountInUSD, Length: 2372, dtype: bool

In [69]:
df[df['AmountInUSD'] < 100000]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8,2,2017
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017
11,11,2017-07-06,Minjar,Technology,Cloud Solutions provider,Bangalore,"Blume Ventures, Contrarian Capital India Partn...",Seed Funding,0,Others,7,6,2017
12,12,2017-07-06,MyCity4kids,Consumer Internet,parenting blog and kids’ events discovery plat...,Gurgaon,Others,Seed Funding,0,Others,7,6,2017
14,14,2017-07-07,Upwardly.in,Consumer Internet,MF investment platform,Bangalore,"Sreeram Iyer, Suvo Sarkar, Anita Gupta, Likemi...",Seed Funding,0,Others,7,7,2017
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2355,2355,2015-05-21,Knit,Others,Others,Others,"Rohit Jain, Amit Rambhia & Others",Seed Funding,0,Others,5,21,2015
2358,2358,2015-01-22,Freshmonk,Others,Others,Others,"August Capital Partners, Michael Blakey",Seed Funding,0,Others,1,22,2015
2359,2359,2015-01-22,Englishleap.com,Others,Others,Others,ANALEC,Private Equity,0,Majority Stake,1,22,2015
2363,2363,2015-01-24,Impartus,Others,Others,Others,Kaizen Private Equity,Private Equity,0,Series A,1,24,2015


**Querying based on multiple columns**

In [70]:
df[(df['AmountInUSD'] < 100000) & (df['CityLocation'] == 'New Delhi')]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017
36,36,2017-07-21,FableStreet,eCommerce,Women Work wear etailer,New Delhi,"Harmeet Bajaj, Pameela P, Fusiontech Ventures ...",Seed Funding,0,Others,7,21,2017
37,37,2017-07-21,Monsoon Fintech,Technology,Machine Learning Access platform,New Delhi,"Sunil Kalra, Aditya Singh, Rishi Srivastava, R...",Seed Funding,0,Others,7,21,2017
41,41,2017-07-26,Creator’s Gurukul,Others,Co-Working Space Provider,New Delhi,Yuvraj Singh,Seed Funding,0,Others,7,26,2017
57,57,2017-06-06,GyanDhan,Consumer Internet,Education Marketplace,New Delhi,Sundaram Finance Holdings,Private Equity,0,Others,6,6,2017
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2127,2127,2015-05-25,Newgen Payments,Payments Solution Provider,Others,New Delhi,Jan Manten,Seed Funding,0,Others,5,25,2015
2175,2175,2015-04-22,FindYahan,Hyperlocal services marketplace,Others,New Delhi,The Phoenix Fund,Private Equity,0,Series A,4,22,2015
2183,2183,2015-04-23,EazyDiner,Restaurant reservation app,Others,New Delhi,"Deepak Shahdadpuri, Gulpreet Kohli",Seed Funding,0,Others,4,23,2015
2184,2184,2015-04-23,Phone Warrior,Spam Call block App,Others,New Delhi,Lightspeed Ventures,Seed Funding,0,Pre-Series A,4,23,2015
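
The boolean-mask filter above can also be written with `DataFrame.query`. The sketch below uses a small made-up frame (`toy` and its values are illustrative assumptions, not rows from the funding data) to show that the two forms select the same rows. In the mask form each comparison needs its own parentheses, because `&` binds more tightly than `<` and `==` in Python.

```python
import pandas as pd

# Toy frame standing in for the funding data (values are made up).
toy = pd.DataFrame({
    "StartupName": ["A", "B", "C", "D"],
    "CityLocation": ["New Delhi", "Mumbai", "New Delhi", "Bangalore"],
    "AmountInUSD": [0, 50000, 250000, 0],
})

# Boolean-mask form: wrap each condition in parentheses before combining with &.
mask_result = toy[(toy["AmountInUSD"] < 100000) & (toy["CityLocation"] == "New Delhi")]

# Query-string form: column names are referenced directly inside the string.
query_result = toy.query("AmountInUSD < 100000 and CityLocation == 'New Delhi'")

print(mask_result.equals(query_result))
```

The query-string form tends to read better when several conditions are combined, while the mask form works with column names that are not valid Python identifiers.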


**Querying based on a list of values**

In [71]:
df['CityLocation'].isin(['New Delhi', 'Bangalore', 'Mumbai'])

0        True
1        True
2        True
3        True
4       False
        ...  
2367    False
2368    False
2369    False
2370    False
2371    False
Name: CityLocation, Length: 2372, dtype: bool

In [72]:
df[(df['CityLocation'].isin(['New Delhi', 'Bangalore', 'Mumbai'])) & (df['AmountInUSD'] < 100000)]

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8,2,2017
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017
11,11,2017-07-06,Minjar,Technology,Cloud Solutions provider,Bangalore,"Blume Ventures, Contrarian Capital India Partn...",Seed Funding,0,Others,7,6,2017
14,14,2017-07-07,Upwardly.in,Consumer Internet,MF investment platform,Bangalore,"Sreeram Iyer, Suvo Sarkar, Anita Gupta, Likemi...",Seed Funding,0,Others,7,7,2017
18,18,2017-07-11,Design Cafe,Consumer Internet,Online Interior Design platform,Bangalore,"Fireside Ventures, Apurva Salarpuria, Sidharth...",Seed Funding,0,Others,7,11,2017
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2183,2183,2015-04-23,EazyDiner,Restaurant reservation app,Others,New Delhi,"Deepak Shahdadpuri, Gulpreet Kohli",Seed Funding,0,Others,4,23,2015
2184,2184,2015-04-23,Phone Warrior,Spam Call block App,Others,New Delhi,Lightspeed Ventures,Seed Funding,0,Pre-Series A,4,23,2015
2189,2189,2015-04-27,GIBBS,Clean Tech,Others,New Delhi,Infuse Ventures fund,Private Equity,0,Series B,4,27,2015
2191,2191,2015-04-28,Urban Ladder,Online Furniture ecommerce,Others,Bangalore,"Anand Rajaraman, Venky Harinarayan",Private Equity,0,Series D,4,28,2015
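
An `isin` mask can be inverted with `~` to keep the rows whose value is *not* in the list. A minimal sketch on a made-up frame (the `toy` data and the `metros` name are assumptions for illustration):

```python
import pandas as pd

# Toy frame with illustrative values only.
toy = pd.DataFrame({
    "CityLocation": ["New Delhi", "Hyderabad", "Mumbai", "Pune"],
    "AmountInUSD": [0, 850000, 500000, 0],
})

metros = ["New Delhi", "Bangalore", "Mumbai"]

# Rows whose city is in the list.
in_metros = toy[toy["CityLocation"].isin(metros)]

# ~ flips the boolean mask: rows whose city is NOT in the list.
not_in_metros = toy[~toy["CityLocation"].isin(metros)]

print(in_metros)
print(not_in_metros)
```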


## `DataFrame.at` vs. `DataFrame.loc`

In [73]:
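# .at accepts exactly one row label and one column label,
# so passing a list of columns raises an error:
#df.at[133,['StartupName', 'AmountInUSD']]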

In [74]:
# .at can also assign to a single cell in place
df.at[133, 'StartupName'] = "Zubair Labs"

In [75]:
df.at[133,'StartupName']

'Zubair Labs'

In [76]:
df.loc[133,['StartupName', 'AmountInUSD']]

StartupName    Zubair Labs
AmountInUSD          50000
Name: 133, dtype: object

In [77]:
%timeit df.at[133,'StartupName']

3.91 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [78]:
%timeit df.loc[133,'StartupName']

8.25 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
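
In the timings above, `.at` took roughly half as long as `.loc` for a single cell. The trade-off is that `.at` reads or writes exactly one scalar, while `.loc` also accepts lists and slices. The sketch below illustrates this on a made-up two-row frame (`toy` and the name "Example Labs" are assumptions for illustration). The exception raised when `.at` receives a list differs between pandas versions, so it is caught broadly.

```python
import pandas as pd

# Toy frame with illustrative values only.
toy = pd.DataFrame(
    {"StartupName": ["TouchKin", "Ethinos"], "AmountInUSD": [1300000, 0]}
)

# .at: one row label + one column label -> one scalar (get or set).
toy.at[1, "StartupName"] = "Example Labs"
single_value = toy.at[1, "StartupName"]

# .loc: also accepts a list of columns and returns a Series here.
row_subset = toy.loc[1, ["StartupName", "AmountInUSD"]]

# A list of columns is not valid for .at; the exact exception type
# depends on the pandas version, hence the broad except.
at_rejected_list = False
try:
    toy.at[1, ["StartupName", "AmountInUSD"]]
except Exception:
    at_rejected_list = True

print(single_value)
print(row_subset)
print(at_rejected_list)
```

A reasonable rule of thumb is to use `.at` for single-cell access inside tight loops and `.loc` everywhere else.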


## Joins

In [79]:
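# Two overlapping row slices: SNo 0-9 and SNo 5-11
rows1 = df[0:10]
rows2 = df[5:12]
print(rows1.shape)
# Keep different columns on each side; 'SNo' is the shared join key
table1 = rows1[['SNo', 'Date', 'CityLocation']]
table2 = rows2[['SNo', 'StartupName', 'IndustryVertical', 'InvestmentType']]
print(table1.shape)
table1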

(10, 13)
(10, 3)


Unnamed: 0,SNo,Date,CityLocation
0,0,2017-08-01,Bangalore
1,1,2017-08-02,Mumbai
2,2,2017-08-02,New Delhi
3,3,2017-08-02,Mumbai
4,4,2017-08-02,Hyderabad
5,5,2017-07-01,Bangalore
6,6,2017-07-03,Ahmedabad
7,7,2017-07-04,Gurgaon
8,8,2017-07-05,Bangalore
9,9,2017-07-05,Noida


In [80]:
print(table2.shape)
table2

(7, 4)


Unnamed: 0,SNo,StartupName,IndustryVertical,InvestmentType
5,5,Billion Loans,Consumer Internet,Seed Funding
6,6,Ecolibriumenergy,Technology,Private Equity
7,7,Droom,eCommerce,Private Equity
8,8,Jumbotail,eCommerce,Private Equity
9,9,Moglix,eCommerce,Private Equity
10,10,Timesaverz,Consumer Internet,Private Equity
11,11,Minjar,Technology,Seed Funding


In [81]:
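# Right join: keep every row of table2; columns from table1 are
# NaT/NaN where an SNo (here 10 and 11) has no match in table1
table1.merge(table2, how='right', on='SNo')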

Unnamed: 0,SNo,Date,CityLocation,StartupName,IndustryVertical,InvestmentType
0,5,2017-07-01,Bangalore,Billion Loans,Consumer Internet,Seed Funding
1,6,2017-07-03,Ahmedabad,Ecolibriumenergy,Technology,Private Equity
2,7,2017-07-04,Gurgaon,Droom,eCommerce,Private Equity
3,8,2017-07-05,Bangalore,Jumbotail,eCommerce,Private Equity
4,9,2017-07-05,Noida,Moglix,eCommerce,Private Equity
5,10,NaT,,Timesaverz,Consumer Internet,Private Equity
6,11,NaT,,Minjar,Technology,Seed Funding
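
The `how` argument controls which keys survive the merge. The sketch below compares `inner`, `outer`, and `right` joins on two small made-up tables (`left` and `right` are illustrative stand-ins for `table1` and `table2`), and uses `indicator=True` to add a `_merge` column showing where each row came from.

```python
import pandas as pd

# Illustrative stand-ins for table1 / table2; only SNo 9 appears in both.
left = pd.DataFrame({"SNo": [8, 9], "CityLocation": ["Bangalore", "Noida"]})
right = pd.DataFrame({"SNo": [9, 10], "StartupName": ["Moglix", "Timesaverz"]})

# inner: only keys present in both tables.
inner = left.merge(right, how="inner", on="SNo")

# outer: all keys from both tables; indicator=True adds a '_merge' column
# with the values 'left_only', 'right_only', or 'both'.
outer = left.merge(right, how="outer", on="SNo", indicator=True)

# right: every row of the right table, with missing left-side values as NaN.
right_join = left.merge(right, how="right", on="SNo")

print(inner)
print(outer)
print(right_join)
```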


## Sorting

In [82]:
df.sort_values(by=['CityLocation'])

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
696,696,2016-09-13,ShoeKonnect,eCommerce,Footwear & Leather B2B App,Agra,Indian Angels Network,Seed Funding,0,Others,9,13,2016
619,619,2016-10-13,Shoekonnect,eCommerce,B2B Marketplace for Shoes,Agra,Indian Angel Network,Seed Funding,0,Others,10,13,2016
1655,1655,2015-10-12,Gridle,Cloud Based Collaboration platform,Others,Ahmedabad,LetsVenture,Seed Funding,100000,Others,10,12,2015
2112,2112,2015-05-18,Awaaz De,Enterprise Communication Platform,Others,Ahmedabad,Samir Shah,Seed Funding,0,Others,5,18,2015
1287,1287,2016-02-16,Salebhai,eCommerce,"Sweets, Eatables, Handicrafts Online Marketplace",Ahmedabad,"Virendra Shekhawat, Deepak Chokhani, Yogesh Pa...",Seed Funding,0,Others,2,16,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,259,2017-03-02,Kreate Konnect,Technology,End-to-End Seller e-commerce solutions Provider,Vadodara,Langoor,Seed Funding,0,Others,3,2,2017
1555,1555,2015-11-11,Gingercrush,Product Customization Platform,Others,Vadodara,TV Mohandas Pai’s family office,Seed Funding,0,Others,11,11,2015
935,935,2016-06-15,Oneway.cab,Consumer Internet,Taxi Rental Platform,Vadodara,Indian Angel Network,Seed Funding,450000,Others,6,15,2016
1251,1251,2016-02-05,DawaiLelo,Consumer Internet,Healthcare Services & Online Pharmacy Mobile App,Varanasi,Undisclosed investors,Seed Funding,52000,Others,2,5,2016


In [83]:
df.sort_values(by=['AmountInUSD'], ascending=False)

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
294,294,2017-03-21,Flipkart,eCommerce,ECommerce Marketplace,Bangalore,"Microsoft, eBay, Tencent Holdings",Private Equity,1400000000,Others,3,21,2017
158,158,2017-05-18,Paytm,ECommerce,Mobile Wallet & ECommerce platform,Bangalore,SoftBank Group,Private Equity,1400000000,Others,5,18,2017
1976,1976,2015-07-28,Flipkart.com,Online Marketplace,Others,Bangalore,Steadview Capital and existing investors,Private Equity,700000000,"Late Stage, 10th Round More here",7,28,2015
1787,1787,2015-09-29,Paytm,E-Commerce & M-Commerce platform,Others,New Delhi,"Alibaba Group, Ant Financial",Private Equity,680000000,Late Stage (Alibaba @ 40% equity),9,29,2015
1572,1572,2015-11-18,Ola,Car Aggregator & Retail Mobile App,Others,Bangalore,"Baillie Gifford, Falcon Edge Capital, Tiger Gl...",Private Equity,500000000,Series F ( More Details Here),11,18,2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...
394,394,2017-01-14,SecururAX,Technology,Cloud-based solutions provider,Bangalore,"Axilor Ventures, Parampara Early Stage Opportu...",Private Equity,0,Others,1,14,2017
1171,1171,2016-03-08,Finomena,Technology,Credit Worthiness Big Data Analytics,New Delhi,Matrix Partners India,Private Equity,0,Others,3,8,2016
1746,1746,2015-09-15,Mubble,Prepaid Mobile Bill Manager App,Others,Bangalore,Nandan Nilekani,Seed Funding,0,Others,9,15,2015
1169,1169,2016-03-07,Wooplr,ECommerce,Curated Fashion ECommerce portal,Bangalore,"Naveen Tewari, Abhay Singhal, Amit Gupta, Piyu...",Seed Funding,0,Others,3,7,2016
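
`sort_values` also accepts several columns, each with its own direction. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration): sort by city A to Z, then by amount from largest to smallest within each city. Like the calls above, it returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Toy frame with illustrative values only.
toy = pd.DataFrame({
    "CityLocation": ["Mumbai", "Agra", "Mumbai", "Agra"],
    "AmountInUSD": [500000, 0, 1250000, 100000],
})

# One ascending flag per sort key: city ascending, amount descending.
sorted_toy = toy.sort_values(
    by=["CityLocation", "AmountInUSD"], ascending=[True, False]
)

print(sorted_toy)
```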


## Column / Row Concatenation

In [84]:
df.columns

Index(['SNo', 'Date', 'StartupName', 'IndustryVertical', 'SubVertical',
       'CityLocation', 'InvestorsName', 'InvestmentType', 'AmountInUSD',
       'Remarks', 'month', 'day', 'year'],
      dtype='object')

In [85]:
part1 = df[['SNo', 'Date', 'StartupName']]
part2 = df[['InvestorsName', 'InvestmentType']]
print(part1.shape)
part1.head()

(2372, 3)


Unnamed: 0,SNo,Date,StartupName
0,0,2017-08-01,TouchKin
1,1,2017-08-02,Ethinos
2,2,2017-08-02,Leverage Edu
3,3,2017-08-02,Zepo
4,4,2017-08-02,Click2Clinic


In [86]:
print(part2.shape)
part2.head()

(2372, 2)


Unnamed: 0,InvestorsName,InvestmentType
0,Kae Capital,Private Equity
1,Triton Investment Advisors,Private Equity
2,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding
3,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding
4,"Narottam Thudi, Shireesh Palle",Seed Funding


In [87]:
pd.concat([part1, part2], axis=1)

Unnamed: 0,SNo,Date,StartupName,InvestorsName,InvestmentType
0,0,2017-08-01,TouchKin,Kae Capital,Private Equity
1,1,2017-08-02,Ethinos,Triton Investment Advisors,Private Equity
2,2,2017-08-02,Leverage Edu,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding
3,3,2017-08-02,Zepo,"Kunal Shah, LetsVenture, Anupam Mittal, Hetal ...",Seed Funding
4,4,2017-08-02,Click2Clinic,"Narottam Thudi, Shireesh Palle",Seed Funding
...,...,...,...,...,...
2367,2367,2015-01-29,Printvenue,Asia Pacific Internet Group,Private Equity
2368,2368,2015-01-29,Graphene,KARSEMVEN Fund,Private Equity
2369,2369,2015-01-30,Mad Street Den,"Exfinity Fund, GrowX Ventures.",Private Equity
2370,2370,2015-01-30,Simplotel,MakeMyTrip,Private Equity
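
The column-wise result above lines up cleanly because `part1` and `part2` share the same index. With `axis=1`, `pd.concat` matches rows by index label rather than by position, so frames with different indexes produce `NaN` where a label is missing on one side. A small sketch with made-up frames (`a` and `b` are illustrative assumptions):

```python
import pandas as pd

# Two frames whose indexes overlap only on label 1.
a = pd.DataFrame({"StartupName": ["TouchKin", "Ethinos"]}, index=[0, 1])
b = pd.DataFrame({"InvestmentType": ["Seed Funding", "Private Equity"]}, index=[1, 2])

# Rows are aligned on index labels; unmatched cells become NaN.
aligned = pd.concat([a, b], axis=1)

print(aligned)
```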


**Concatenation**

`pd.concat` joins two or more DataFrames (or Series) either row-wise or column-wise. By default it concatenates row-wise, i.e. `axis=0`. The inputs do not need to have the same shape: pandas aligns them on the other axis and fills any missing entries with `NaN`.

**Concatenate row-wise**
![Row-wise concatenation of two DataFrames](http://www.datasciencemadesimple.com/wp-content/uploads/2017/11/rowbind-in-python-pandas-0.png)

**Concatenate column-wise**
![Column-wise concatenation of two DataFrames](http://www.datasciencemadesimple.com/wp-content/uploads/2017/11/column-bind-in-python-pandas-1.png)

In [88]:
rows1 = df[:3]
rows2 = df[5:7]
print(rows1.shape)
rows1

(3, 13)


Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
0,0,2017-08-01,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000,Others,8,1,2017
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8,2,2017
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017


In [89]:
print(rows2.shape)
rows2

(2, 13)


Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
5,5,2017-07-01,Billion Loans,Consumer Internet,Peer to Peer Lending platform,Bangalore,Reliance Corporate Advisory Services Ltd,Seed Funding,1000000,Others,7,1,2017
6,6,2017-07-03,Ecolibriumenergy,Technology,Energy management solutions provider,Ahmedabad,"Infuse Ventures, JLL",Private Equity,2600000,Others,7,3,2017


In [90]:
pd.concat([rows1, rows2], axis=0)

Unnamed: 0,SNo,Date,StartupName,IndustryVertical,SubVertical,CityLocation,InvestorsName,InvestmentType,AmountInUSD,Remarks,month,day,year
0,0,2017-08-01,TouchKin,Technology,Predictive Care Platform,Bangalore,Kae Capital,Private Equity,1300000,Others,8,1,2017
1,1,2017-08-02,Ethinos,Technology,Digital Marketing Agency,Mumbai,Triton Investment Advisors,Private Equity,0,Others,8,2,2017
2,2,2017-08-02,Leverage Edu,Consumer Internet,Online platform for Higher Education Services,New Delhi,"Kashyap Deorah, Anand Sankeshwar, Deepak Jain,...",Seed Funding,0,Others,8,2,2017
5,5,2017-07-01,Billion Loans,Consumer Internet,Peer to Peer Lending platform,Bangalore,Reliance Corporate Advisory Services Ltd,Seed Funding,1000000,Others,7,1,2017
6,6,2017-07-03,Ecolibriumenergy,Technology,Energy management solutions provider,Ahmedabad,"Infuse Ventures, JLL",Private Equity,2600000,Others,7,3,2017
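
In the row-wise result above, the original index labels (0, 1, 2, 5, 6) are carried over. Passing `ignore_index=True` renumbers the rows from 0 instead. A minimal sketch with made-up frames (`top` and `bottom` are illustrative stand-ins for `rows1` and `rows2`):

```python
import pandas as pd

# Illustrative stand-ins for rows1 / rows2.
top = pd.DataFrame(
    {"SNo": [0, 1, 2], "StartupName": ["TouchKin", "Ethinos", "Leverage Edu"]}
)
bottom = pd.DataFrame(
    {"SNo": [5, 6], "StartupName": ["Billion Loans", "Ecolibriumenergy"]},
    index=[5, 6],
)

# Default: the original index labels are preserved.
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards them and builds a fresh 0..n-1 index.
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```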
