<img src="https://i.imgur.com/6U6q5jQ.png"/>

# Concatenating Data Frames in Python

Appending is an operation at the data frame level: whole data frames are stacked one on top of another (vertical appending). It is straightforward when all the data frames have the **same** column names, in the same order.
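
As a minimal sketch of vertical appending, consider two made-up data frames. The names `top` and `bottom` and their values are illustrative only and are not taken from the files used in this notebook:

```python
import pandas as pd

# Two toy data frames with the same column names in the same order
top = pd.DataFrame({"War": ["A", "B"], "Location": ["X", "Y"]})
bottom = pd.DataFrame({"War": ["C"], "Location": ["Z"]})

# Stack one on top of the other; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked.shape)
```

The result has as many rows as both inputs together and the same columns as each of them.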

Let me load some data frames. First, I collect the names of the files:

In [5]:
import pandas as pd
import glob
import os

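# sorted() keeps the file order reproducible: glob.glob() returns matches in arbitrary order
all_names = sorted(glob.glob(os.path.join('FilesToConcatenate', "*P.csv")))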
all_names

['FilesToConcatenate/wars1P.csv',
 'FilesToConcatenate/wars2P.csv',
 'FilesToConcatenate/wars3P.csv',
 'FilesToConcatenate/wars4P.csv']

Now, I will create a list of data frames:

In [7]:
dfs=[]
for name in all_names:
    dfs.append(pd.read_csv(name))

Let me check if the column names are the same:

In [8]:
for df in dfs:
    print(df.columns)

Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['Deathrange', 'War', 'Date', 'url', 'Location', 'Notes'], dtype='object')
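
With many files, reading the printed indexes by eye gets tedious. A possible sketch of a programmatic check follows, assuming the first data frame is taken as the reference layout. The list `frames` below is a hypothetical stand-in built from empty data frames, not the real `dfs`:

```python
import pandas as pd

# Toy stand-ins for the list of data frames (hypothetical, no rows)
frames = [
    pd.DataFrame(columns=["War", "Deathrange", "Date", "Combatants", "Location", "Notes"]),
    pd.DataFrame(columns=["Deathrange", "War", "Date", "url", "Location", "Notes"]),
]

reference = list(frames[0].columns)  # treat the first frame as the reference layout
mismatches = {}
for position, frame in enumerate(frames):
    current = list(frame.columns)
    if current != reference:
        mismatches[position] = {
            "missing": [c for c in reference if c not in current],
            "extra": [c for c in current if c not in reference],
            # True when only the order differs, not the names
            "same_names_different_order": sorted(current) == sorted(reference),
        }
print(mismatches)
```

Each key in `mismatches` is the position of a data frame whose columns differ from the reference, together with the names it lacks and the names it has in excess.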


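The last data frame differs from the others: it has a `url` column instead of `Combatants`, and `War` and `Deathrange` are swapped. I need to fix its columns before concatenating: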

In [9]:
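# add the missing column (no data available, so it is filled with None)
dfs[3]['Combatants']=None
# keep the columns in the right order; this selection also drops 'url'
dfs[3]=dfs[3][['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes']]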
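
An alternative worth knowing is `DataFrame.reindex`, which can do both steps in one call. This is only a sketch on a made-up data frame (`odd` and its values are hypothetical), not the approach used in the cell above:

```python
import pandas as pd

wanted = ["War", "Deathrange", "Date", "Combatants", "Location", "Notes"]

# Toy frame mimicking the odd one out (hypothetical values)
odd = pd.DataFrame({
    "Deathrange": ["1,000+"],
    "War": ["Example War"],
    "Date": ["100 BC"],
    "url": ["http://example.com"],
    "Location": ["Somewhere"],
    "Notes": [None],
})

# reindex keeps the wanted columns in the given order, drops the rest ('url'),
# and creates any missing column ('Combatants') filled with NaN
fixed = odd.reindex(columns=wanted)
print(fixed.columns.tolist())
```

Note that `reindex` fills the new column with `NaN` rather than `None`; both are treated as missing values by pandas.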

Let's verify:

In [10]:
# do this again:
for df in dfs:
    print(df.columns)

Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')
Index(['War', 'Deathrange', 'Date', 'Combatants', 'Location', 'Notes'], dtype='object')


Now we can concatenate them and count the number of rows:

In [13]:
allWars=pd.concat(objs=dfs, # DFs as a list
                  axis=0, # one DF on top of the other
                  ignore_index=True, # very important: renumber rows 0..n-1 instead of repeating each file's index
                  copy=False)
allWars.shape

(361, 6)
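
To see why `ignore_index=True` matters, and to confirm that no rows were lost, a small sketch on toy data may help. The list `parts` is made up for illustration:

```python
import pandas as pd

# Two toy data frames, each with its own 0-based index
parts = [
    pd.DataFrame({"War": ["A", "B"]}),
    pd.DataFrame({"War": ["C", "D", "E"]}),
]

kept = pd.concat(parts, axis=0)                            # original row labels are kept
renumbered = pd.concat(parts, axis=0, ignore_index=True)   # rows relabeled from 0

print(kept.index.tolist())        # [0, 1, 0, 1, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# The result should have as many rows as all the inputs together
assert len(renumbered) == sum(len(p) for p in parts)
```

Without `ignore_index=True` the row labels repeat, so selecting a row by label can return several rows. The same row-count check can be applied to `allWars` and `dfs`.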

In [14]:
# this is it:

allWars.head(20)

| | War | Deathrange | Date | Combatants | Location | Notes |
|---|---|---|---|---|---|---|
| 0 | Conquests of Cyrus the Great | 100,000+ | 549 BC–530 BC | Persian Empire vs. various states | Middle East | Number given is the sum of all deaths in battl... |
| 1 | Greco–Persian Wars | 300,000+ | 499 BC–449 BC | Greek City-States vs. Persian Empire | Greece | |
| 2 | Samnite Wars | 33,500+ | 343 BC–290 BC | Roman Republic vs. Samnites | Italy | Number given is the sum of all deaths in battl... |
| 3 | Wars of Alexander the Great | 142,000+ | 336 BC–323 BC | Macedonian Empire and other Greek City-States ... | Middle East / North Africa / Central Asia / India | Number given is the sum of all deaths in battl... |
| 4 | Punic Wars | 1,250,000–1,850,000 | 264 BC–146 BC | Roman Republic vs. Carthaginian Empire | Western Europe / North Africa | |
| 5 | First Punic War | 400,000+ | 264 BC–241 BC | Roman Republic vs. Carthaginian Empire | Southern Europe / North Africa | – Part of the Punic Wars |
| 6 | Second Punic War | 770,000+ | 218 BC–201 BC | Roman Republic vs. Carthaginian Empire | Western Europe / North Africa | [1] – Part of the Punic Wars |
| 7 | Third Punic War | 150,000–250,000 | 149 BC–146 BC | Roman Republic vs. Carthaginian Empire | Tunisia | – Part of the Punic Wars |
| 8 | Kalinga War | 150,000–200,000 | 262 BC–261 BC | Maurya Empire vs. State of Kalinga | India | |
| 9 | Qin's Wars of Unification | 700,000+[citation needed] | 230 BC–221 BC | Qin state vs. Han, Zhao, Yan, Wei, Chu, Qi States | China | – Part of Warring States Period |


You can save the result now, although it still needs cleaning and formatting.

In [None]:
# pathAndName=os.path.join('FilesToAppend','AllWars_messy.csv')
# allWars.to_csv(pathAndName,index=False)
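
As a rough sketch of the saving step, the example below writes a toy data frame to a temporary folder and reads it back. The names `combined` and `combined_messy.csv` are hypothetical; in the notebook you would use `allWars` and your own path:

```python
import os
import tempfile

import pandas as pd

# Toy data frame standing in for the concatenated result (hypothetical values)
combined = pd.DataFrame({"War": ["A", "B"], "Deathrange": ["100+", "200+"]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "combined_messy.csv")
    # index=False avoids writing the row index as an extra unnamed column
    combined.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored.shape)
```

Writing with `index=False` means the file read back has the same columns as the data frame that was saved, with no `Unnamed: 0` column added.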