### Setup

In [1]:
import pandas as pd

__Our data__:

We have three DataFrames, one each for Bangladesh, India and the USA, each holding weather data for a few cities. We want to combine them into a single DataFrame.

In [2]:
bd_weather = pd.DataFrame({
    'city': ['Dhaka', 'Chittagong', 'Rajshahi'],
    'temp': [34, 33, 30],
    'humidity': [80, 72, 65]
})
bd_weather

Unnamed: 0,city,temp,humidity
0,Dhaka,34,80
1,Chittagong,33,72
2,Rajshahi,30,65


In [3]:
us_weather = pd.DataFrame({
    'city': ['New York', 'Chicago', 'Boston'],
    'temp': [21, 14, 27],
    'humidity': [63, 55, 61]
})
us_weather

Unnamed: 0,city,temp,humidity
0,New York,21,63
1,Chicago,14,55
2,Boston,27,61


In [4]:
ind_weather = pd.DataFrame({
    'city': ['Mumbai', 'Delhi', 'Bangalore'],
    'temp': [32, 43, 33],
    'humidity': [85, 78, 80]
})
ind_weather

Unnamed: 0,city,temp,humidity
0,Mumbai,32,85
1,Delhi,43,78
2,Bangalore,33,80


# Concatenating DataFrames
Concatenation simply means joining DataFrames end to end. So when we concatenate these __three DataFrames__ vertically, the rows of each one are placed below the rows of the previous one, in the given order.

## Vertical Concatenation

In [5]:
df = pd.concat([bd_weather, ind_weather, us_weather])
df

Unnamed: 0,city,temp,humidity
0,Dhaka,34,80
1,Chittagong,33,72
2,Rajshahi,30,65
0,Mumbai,32,85
1,Delhi,43,78
2,Bangalore,33,80
0,New York,21,63
1,Chicago,14,55
2,Boston,27,61
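
Note that each DataFrame keeps its original row labels, so `0`, `1` and `2` each appear more than once in the result. The following is a minimal sketch of what that means for label lookups, using two small stand-in frames (`bd` and `us` are hypothetical names, not the notebook's variables). It also uses the `verify_integrity` kwarg of `pd.concat`, which raises an error when index labels overlap.

```python
import pandas as pd

# Hypothetical stand-ins for two of the weather DataFrames.
bd = pd.DataFrame({'city': ['Dhaka', 'Chittagong'], 'temp': [34, 33]})
us = pd.DataFrame({'city': ['New York', 'Chicago'], 'temp': [21, 14]})

stacked = pd.concat([bd, us])

# Label 0 now exists twice, so a label lookup returns two rows.
rows_at_zero = stacked.loc[0]
print(rows_at_zero)

# The combined index is not unique.
print(stacked.index.is_unique)

# verify_integrity=True turns overlapping labels into an error.
try:
    pd.concat([bd, us], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```

Whether duplicated labels are a problem depends on how the rows are looked up later; if they are, the options shown next avoid them.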


If we want a __continuous index__ instead of the repeated row labels, we can set the `ignore_index` kwarg to `True`.

In [6]:
df = pd.concat([bd_weather, ind_weather, us_weather], ignore_index=True)
df

Unnamed: 0,city,temp,humidity
0,Dhaka,34,80
1,Chittagong,33,72
2,Rajshahi,30,65
3,Mumbai,32,85
4,Delhi,43,78
5,Bangalore,33,80
6,New York,21,63
7,Chicago,14,55
8,Boston,27,61


### Multi index concatenation
We can also have a __Multi level index__ (`MultiIndex`) as our row index, by specifying the names of the groups in the `keys` kwarg.

In [7]:
df = pd.concat([bd_weather, ind_weather, us_weather], keys=['Bangladesh', 'India', 'USA'])
df

Unnamed: 0,Unnamed: 1,city,temp,humidity
Bangladesh,0,Dhaka,34,80
Bangladesh,1,Chittagong,33,72
Bangladesh,2,Rajshahi,30,65
India,0,Mumbai,32,85
India,1,Delhi,43,78
India,2,Bangalore,33,80
USA,0,New York,21,63
USA,1,Chicago,14,55
USA,2,Boston,27,61
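
The two index levels above have no names. As a small sketch, `pd.concat` also accepts a `names` kwarg to label the levels, and `reset_index` can then move one level back into an ordinary column. The frames `bd` and `us` and the level names `country` and `row` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-ins for two of the weather DataFrames.
bd = pd.DataFrame({'city': ['Dhaka', 'Rajshahi'], 'temp': [34, 30]})
us = pd.DataFrame({'city': ['New York', 'Boston'], 'temp': [21, 27]})

# Name the two index levels while concatenating.
labelled = pd.concat([bd, us], keys=['Bangladesh', 'USA'], names=['country', 'row'])
print(labelled.index.names)

# Move the 'country' level out of the index into a regular column.
flat = labelled.reset_index(level='country')
print(flat)
```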


Normal bracket indexing still selects columns:

In [8]:
df['temp']

Bangladesh  0    34
            1    33
            2    30
India       0    32
            1    43
            2    33
USA         0    21
            1    14
            2    27
Name: temp, dtype: int64

Use `df.loc` to get the rows of one group.

In [9]:
df.loc['USA', :]

Unnamed: 0,city,temp,humidity
0,New York,21,63
1,Chicago,14,55
2,Boston,27,61
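
Selection can go one level deeper. The sketch below, built on small stand-in frames (`bd` and `us` are hypothetical), passes a tuple to `df.loc` to pick a single row inside a group, and uses `DataFrame.xs` to take the same inner label from every group.

```python
import pandas as pd

# Hypothetical stand-ins for two of the weather DataFrames.
bd = pd.DataFrame({'city': ['Dhaka', 'Rajshahi'], 'temp': [34, 30]})
us = pd.DataFrame({'city': ['New York', 'Boston'], 'temp': [21, 27]})
df = pd.concat([bd, us], keys=['Bangladesh', 'USA'])

# A tuple addresses both index levels: group 'USA', inner label 1.
boston = df.loc[('USA', 1)]
print(boston)

# Add a column label to get a single value.
boston_temp = df.loc[('USA', 1), 'temp']
print(boston_temp)

# Cross-section: the row with inner label 0 from every group.
first_rows = df.xs(0, level=1)
print(first_rows)
```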


## Horizontal Concatenation
Previously, we added the new DataFrames as new rows.

Now we will add the new DataFrames as new columns.

__Our data__:

In [10]:
temp_df = pd.DataFrame({
    'city': ['Mumbai', 'Delhi', 'Bangalore'],
    'temp': [32, 45, 30]
})
temp_df

Unnamed: 0,city,temp
0,Mumbai,32
1,Delhi,45
2,Bangalore,30


__New data__:

In [11]:
wind_df = pd.DataFrame({
    'city': ['Delhi', 'Bangalore', 'Mumbai'],
    'windspeed': [12, 9, 7]
})
wind_df

Unnamed: 0,city,windspeed
0,Delhi,12
1,Bangalore,9
2,Mumbai,7


In [12]:
humid_df = pd.DataFrame({
    'city': ['Delhi', 'Mumbai'],
    'windspeed': [77, 65]
})
humid_df

Unnamed: 0,city,windspeed
0,Delhi,77
1,Mumbai,65


## Concat by columns
To concatenate horizontally (by columns), we set the `axis` kwarg to `1`.

In [13]:
pd.concat([temp_df, humid_df, wind_df], axis=1)

Unnamed: 0,city,temp,city.1,windspeed,city.2,windspeed.1
0,Mumbai,32,Delhi,77.0,Delhi,12
1,Delhi,45,Mumbai,65.0,Bangalore,9
2,Bangalore,30,,,Mumbai,7


However, this way we get repeated `city` columns, and the rows are misaligned: `concat` matches rows by their index labels, not by city name. We can fix the alignment by giving matching index labels to matching cities before concatenating.

We use the `set_index` method of each DataFrame to make sure that the same city has the same index label in every DataFrame.

___Remember___: 
- `set_index` assigns new index labels to the rows, without changing their existing order.

- `reindex` reorders the rows to match the given labels, without assigning any new labels. Labels that do not exist yet produce rows filled with `NaN`.
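
The difference between the two methods can be sketched on a single small frame. Here `wind` is a hypothetical copy of the wind speed data, and the new labels are passed as a `pd.Index` (a plain list would be interpreted as column names by `set_index`).

```python
import pandas as pd

# Hypothetical copy of the wind speed data, with the default labels 0, 1, 2.
wind = pd.DataFrame({
    'city': ['Delhi', 'Bangalore', 'Mumbai'],
    'windspeed': [12, 9, 7]
})

# set_index: same row order, new labels.
relabelled = wind.set_index(pd.Index([1, 2, 0]))
print(relabelled)

# reindex: same labels, new row order.
reordered = wind.reindex([2, 0, 1])
print(reordered)

# reindex with a label that does not exist yields a row of NaN.
with_missing = wind.reindex([0, 1, 5])
print(with_missing)
```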

In [14]:
import numpy as np

In [15]:
temp_df.set_index(np.array([0, 1, 2]), inplace=True)
temp_df

Unnamed: 0,city,temp
0,Mumbai,32
1,Delhi,45
2,Bangalore,30


In [16]:
wind_df.set_index(np.array([1, 2, 0]), inplace=True)
wind_df

Unnamed: 0,city,windspeed
1,Delhi,12
2,Bangalore,9
0,Mumbai,7


In [17]:
humid_df.set_index(np.array([1, 0]), inplace=True)
humid_df

Unnamed: 0,city,windspeed
1,Delhi,77
0,Mumbai,65


Now that all cities have the same index in each DataFrame, we can concatenate them.

In [18]:
pd.concat([temp_df, humid_df, wind_df], axis=1)

Unnamed: 0,city,temp,city.1,windspeed,city.2,windspeed.1
0,Mumbai,32,Mumbai,65.0,Mumbai,7
1,Delhi,45,Delhi,77.0,Delhi,12
2,Bangalore,30,,,Bangalore,9
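
The rows are now aligned, but the `city` column is still repeated. One possible alternative, sketched below under a few assumptions, is to make `city` itself the index of each frame so that `concat` aligns on city names directly. The frames are rebuilt here under hypothetical names, and the humidity readings are stored in a column called `humidity`, which is an assumption of this sketch rather than the column name used above.

```python
import pandas as pd

# Hypothetical copies of the three frames; 'humidity' is an assumed column name.
temp = pd.DataFrame({'city': ['Mumbai', 'Delhi', 'Bangalore'], 'temp': [32, 45, 30]})
wind = pd.DataFrame({'city': ['Delhi', 'Bangalore', 'Mumbai'], 'windspeed': [12, 9, 7]})
humid = pd.DataFrame({'city': ['Delhi', 'Mumbai'], 'humidity': [77, 65]})

# Use the city name as the row label in every frame.
by_city = [frame.set_index('city') for frame in (temp, wind, humid)]

# Rows are matched on the city label; missing cities become NaN.
combined = pd.concat(by_city, axis=1)
print(combined)

# join='inner' keeps only the cities present in every frame.
common = pd.concat(by_city, axis=1, join='inner')
print(common)
```

With this approach there is a single set of city labels, and no index values have to be worked out by hand.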


# Appending Series to DataFrame

In [19]:
temp_df

Unnamed: 0,city,temp
0,Mumbai,32
1,Delhi,45
2,Bangalore,30


In [20]:
ser = pd.Series(["Humid", "Dry", "Rain"], name="event")
ser

0    Humid
1      Dry
2     Rain
Name: event, dtype: object

## Adding the Series as a new column

In [21]:
pd.concat([temp_df, ser], axis=1)

Unnamed: 0,city,temp,event
0,Mumbai,32,Humid
1,Delhi,45,Dry
2,Bangalore,30,Rain
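
The Series lines up with the DataFrame here because both use the labels `0`, `1` and `2`. As a short sketch of what happens otherwise, the Series below (a hypothetical variant of `ser`) only has values for labels `0` and `2`, so the row without a matching label gets `NaN`.

```python
import pandas as pd

# Hypothetical copy of the temperature data.
temp = pd.DataFrame({'city': ['Mumbai', 'Delhi', 'Bangalore'], 'temp': [32, 45, 30]})

# A Series that only covers index labels 0 and 2.
event = pd.Series(['Humid', 'Rain'], index=[0, 2], name='event')

# Values are matched by index label, not by position.
result = pd.concat([temp, event], axis=1)
print(result)
```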
