In [1]:
import pandas as pd

This tutorial shows how to use the pandas concat function to join or append two or more DataFrames (df's).

###### Creating our df's

In [5]:
india_weather = pd.DataFrame({
    "city": ["mumbai","delhi","bangalore"],
    "temperature": [32,45,30],
    "humidity": [80, 60, 78]
})
india_weather

Unnamed: 0,city,humidity,temperature
0,mumbai,80,32
1,delhi,60,45
2,bangalore,78,30


In [4]:
us_weather = pd.DataFrame({
    "city": ["new york","chicago","orlando"],
    "temperature": [21,14,35],
    "humidity": [68, 65, 75]
})
us_weather

Unnamed: 0,city,humidity,temperature
0,new york,68,21
1,chicago,65,14
2,orlando,75,35


###### Scenario - We want to join our two df's so that we get a single df which has the weather data for all the cities in both India and the USA

In [6]:
df = pd.concat([india_weather, us_weather])

In the concat function, you only need to pass a list of the df's that you want to join together.

In [7]:
df

Unnamed: 0,city,humidity,temperature
0,mumbai,80,32
1,delhi,60,45
2,bangalore,78,30
0,new york,68,21
1,chicago,65,14
2,orlando,75,35


We can see that a single df has been created which has data from all of the cities. While this works well, we do have a problem with the index. It reuses the indices from the two original df's, which means that the labels 0, 1 and 2 each appear twice. To cure this, we need to pass an extra argument to the concat function...

In [8]:
df = pd.concat([india_weather, us_weather], ignore_index = True)

In [9]:
df 

Unnamed: 0,city,humidity,temperature
0,mumbai,80,32
1,delhi,60,45
2,bangalore,78,30
3,new york,68,21
4,chicago,65,14
5,orlando,75,35
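To make the duplicate-index problem concrete, here is a minimal sketch that rebuilds the two df's from above and compares the results with and without ignore_index. The names stacked, renumbered and raised are only illustrative. The verify_integrity argument shown at the end is an optional extra that this tutorial does not otherwise use.

```python
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 78],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})

# Default behaviour: each df keeps its own 0, 1, 2 labels
stacked = pd.concat([india_weather, us_weather])
print(stacked.index.is_unique)   # False
print(stacked.loc[0])            # two rows come back: mumbai and new york

# ignore_index=True throws the old labels away and renumbers from 0
renumbered = pd.concat([india_weather, us_weather], ignore_index=True)
print(renumbered.index.is_unique)  # True
print(renumbered.loc[0, "city"])   # mumbai

# Optional: ask concat to refuse duplicate labels instead of keeping them
raised = False
try:
    pd.concat([india_weather, us_weather], verify_integrity=True)
except ValueError:
    raised = True
print(raised)  # True
```

With duplicate labels, a lookup such as stacked.loc[0] returns more than one row, which is easy to miss in larger df's.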


For more information on the arguments that the concat function takes, visit https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

###### keys
This allows you to associate a key with each df that you pass to the concat function. The keys become an extra, outer level of the index.

In [11]:
df = pd.concat([india_weather, us_weather], keys = ['India', 'USA'])
df

Unnamed: 0,Unnamed: 1,city,humidity,temperature
India,0,mumbai,80,32
India,1,delhi,60,45
India,2,bangalore,78,30
USA,0,new york,68,21
USA,1,chicago,65,14
USA,2,orlando,75,35


N.B. - Do not combine the keys argument with ignore_index = True. ignore_index replaces the whole index with 0, 1, 2, ..., so the key labels are lost.

To retrieve a subset of your df, you can now say...

In [12]:
df.loc['India'] # This is useful when you have merged or concatenated your df's into one big df

Unnamed: 0,city,humidity,temperature
0,mumbai,80,32
1,delhi,60,45
2,bangalore,78,30
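As a small extension of the keys idea, the sketch below also names the two index levels and then turns the key level into an ordinary column. The level names "country" and "row" and the variable names are assumptions made for this example only.

```python
import pandas as pd

india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 78],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})

# names= labels the index levels created by keys= (outer) and the old index (inner)
labelled = pd.concat(
    [india_weather, us_weather],
    keys=["India", "USA"],
    names=["country", "row"],
)

# Select one block by its key, exactly as with df.loc['India'] above
usa = labelled.loc["USA"]
print(usa)

# Move the key level out of the index and into a regular column
flat = labelled.reset_index(level="country")
print(flat)
```

Having the key as a column can be handy if you later want to filter or group by it.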


###### Appending df's as columns rather than rows

In [13]:
temperature_df = pd.DataFrame({
    "city": ["mumbai","delhi","bangalore"],
    "temperature": [32,45,30],
}, index=[0,1,2])
temperature_df

Unnamed: 0,city,temperature
0,mumbai,32
1,delhi,45
2,bangalore,30


In [14]:
windspeed_df = pd.DataFrame({
    "city": ["delhi","mumbai"],
    "windspeed": [7,12],
}, index=[1,0])
windspeed_df

Unnamed: 0,city,windspeed
1,delhi,7
0,mumbai,12


Note the index argument that we passed to pd.DataFrame when creating each df. When concatenating df's as columns, concat lines the rows up by index label, not by position. In the first df, mumbai has an index of 0 and delhi has an index of 1, as that is the order in which they appear. In the second df their positions are reversed, so we pass index=[1,0] to give delhi the label 1 and mumbai the label 0, matching the first df. If we left the index argument out, delhi would get the default label 0 in the second df and its windspeed would end up on mumbai's row. Make sure the indices of your df's agree before concatenating them as columns.

When we append these two df's, ideally what we want to see is windspeed_df appear as a column next to temperature in a single df. Our df should have columns for city, temp and windspeed...

In [16]:
df02 = pd.concat([temperature_df, windspeed_df], axis = 1)
df02

Unnamed: 0,city,temperature,city.1,windspeed
0,mumbai,32,mumbai,12.0
1,delhi,45,delhi,7.0
2,bangalore,30,,
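The sketch below shows why the index labels matter here. It builds the windspeed df without an explicit index (so it gets the default 0, 1), concatenates it as columns, and then shows one possible way to avoid the problem by using the city as the index. The names windspeed_default, misaligned and by_city are illustrative, and set_index is one option among several.

```python
import pandas as pd

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
})

# No index argument: delhi gets label 0 and mumbai gets label 1
windspeed_default = pd.DataFrame({
    "city": ["delhi", "mumbai"],
    "windspeed": [7, 12],
})

# Rows are matched on index label, so mumbai's temperature is paired
# with delhi's windspeed on row 0
misaligned = pd.concat([temperature_df, windspeed_default], axis=1)
print(misaligned)

# Using the city as the index lets concat match rows on the city name
by_city = pd.concat(
    [temperature_df.set_index("city"), windspeed_default.set_index("city")],
    axis=1,
)
print(by_city)
```

In by_city, bangalore has no windspeed reading, so that cell is NaN, just as in the output above.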


By default, the axis argument is zero and, as we know, that refers to rows. So the concat function will, by default, concat df's by adding new rows. If we want it to add new columns then we have to pass it the axis = 1 argument.
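When adding columns, concat keeps every index label it sees and fills the gaps with NaN, as in the bangalore row above. As a hedged aside, the join argument of concat can restrict the result to labels present in every df. The sketch below reuses the two df's from this section; the name inner is illustrative.

```python
import pandas as pd

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
}, index=[0, 1, 2])

windspeed_df = pd.DataFrame({
    "city": ["delhi", "mumbai"],
    "windspeed": [7, 12],
}, index=[1, 0])

# join="inner" keeps only the index labels found in both df's (0 and 1),
# so the bangalore row, which has no windspeed, is dropped
inner = pd.concat([temperature_df, windspeed_df], axis=1, join="inner")
print(inner)
```

Whether dropping unmatched rows is appropriate depends on your data; the default keeps them.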

###### Joining a Series to our DataFrame

In [17]:
s = pd.Series(["Humid","Dry","Rain"], name="event")
s

0    Humid
1      Dry
2     Rain
Name: event, dtype: object

We are going to concat our series data to the temperature_df df but as a column...

In [18]:
df03 = pd.concat([temperature_df,s], axis=1)
df03

Unnamed: 0,city,temperature,event
0,mumbai,32,Humid
1,delhi,45,Dry
2,bangalore,30,Rain
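Two details are worth checking for yourself: the Series name becomes the new column name, and the rows are again matched on index label. The sketch below illustrates both. The shifted Series with index 1, 2, 3 is a made-up example used only to show what a mismatch looks like.

```python
import pandas as pd

temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
})

# Same Series as above: default index 0, 1, 2 lines up with temperature_df
s = pd.Series(["Humid", "Dry", "Rain"], name="event")
with_event = pd.concat([temperature_df, s], axis=1)
print(list(with_event.columns))  # ['city', 'temperature', 'event']

# Hypothetical Series whose labels are shifted by one
shifted = pd.Series(["Humid", "Dry", "Rain"], index=[1, 2, 3], name="event")
partial = pd.concat([temperature_df, shifted], axis=1)
print(partial)  # label 0 has no event; label 3 has no city or temperature
```

If the Series comes from another source, checking that its index matches the df's index before concatenating avoids these NaN gaps.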
