---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.20 (Pandas-12)</h1>

## _Appending and Concatenating Dataframes.ipynb_

## Learning agenda of this notebook

1. Append Dataframes using `df.append()`
2. Concatenating Dataframes using `pd.concat()`
    - Row wise Concatenation
    - Column wise Concatenation
    - Adding a single Row/Column in a Dataframe

<img align="right" width="310" height="100"  src="images/append.png"  >

## 1. Append DataFrames
- The `df1.append(df2)` method appends the rows of the second dataframe to the end of the first dataframe. Columns that are present only in the second dataframe are added as new columns, and the missing values are filled with NaN.
- The calling dataframe is treated as the main object; rows are added to it from the dataframe(s) passed as the argument.
- It returns a new dataframe object consisting of the rows of the caller followed by the rows of `other`. The dataframe that called the `append()` method remains unchanged.
- Note: `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so the `append()` cells in this section run only on older pandas versions. On current versions, use `pd.concat()` (Section 2) instead.
```
df.append(other, ignore_index=False, verify_integrity=False, sort=False)
```

Where,
- `other`: DataFrame or Series/dict-like object, or list of these (the data to append)
- `ignore_index`: If True, the resulting axis will be labeled 0, 1, …, n - 1 (default is False)
- `verify_integrity`: If True, raise ValueError on creating an index with duplicates (default is False)
- `sort`: Sort columns if the columns of `self` and `other` are not aligned (default is False)
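
The `ignore_index` and `verify_integrity` flags described above are also accepted by `pd.concat()`, so their effect can be tried on any pandas version. The following is a minimal sketch; the two small frames `left` and `right` are hypothetical and are not part of the lecture data.

```python
import pandas as pd

# Two small, hypothetical frames; both carry the default index 0, 1
left = pd.DataFrame({"city": ["Lahore", "Karachi"], "temperature": [35, 39]})
right = pd.DataFrame({"city": ["Dubai", "Ajman"], "temperature": [41, 47]})

# ignore_index=True discards the old labels and relabels the rows 0..n-1
relabelled = pd.concat([left, right], ignore_index=True)
print(list(relabelled.index))

# verify_integrity=True refuses to build a result whose index has duplicates
try:
    pd.concat([left, right], verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(type(duplicate_error).__name__)
```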

### a. Append Two DataFrames
- In most real-life projects you will not get your data from a single source. You may need to combine data gathered from multiple sources.
- There are different ways to join dataframes. Let us use the `df.append()` method to join two dataframes row-wise.

In [None]:
# Let us create a simple data frame
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    'humidity' : [76, 95, 72, 81, 70],
})
Pak_Weather

In [None]:
import pandas as pd

UAE_Weather = pd.DataFrame({
    'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],
    'temperature' : [41, 44, 47, 45],
    'humidity' : [88, 99, 79, 86],
})
UAE_Weather

In [None]:
# Append UAE_Weather below Pak_Weather
# (requires pandas < 2.0, where DataFrame.append() still exists)
df2 = Pak_Weather.append(UAE_Weather)
df2

In [None]:
# Set ignore_index=True so that the rows are relabeled 0, 1, ..., n - 1
df2 = Pak_Weather.append(UAE_Weather, ignore_index=True)
df2
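
Because `DataFrame.append()` is not available in pandas 2.0 and later, the two cells above can be reproduced with `pd.concat()`. This is a sketch of one possible equivalent; the helper name `append_rows` is hypothetical and is not a pandas function.

```python
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [35, 39, 33, 29, 15],
    'humidity': [76, 95, 72, 81, 70],
})
UAE_Weather = pd.DataFrame({
    'city': ['Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],
    'temperature': [41, 44, 47, 45],
    'humidity': [88, 99, 79, 86],
})

def append_rows(first, second, ignore_index=False):
    """Hypothetical helper: stack the rows of `second` below `first`."""
    return pd.concat([first, second], ignore_index=ignore_index)

# Same idea as Pak_Weather.append(UAE_Weather, ignore_index=True)
combined = append_rows(Pak_Weather, UAE_Weather, ignore_index=True)
print(combined)
```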

### b. Append a Row to a DataFrame

In [None]:
# Let us create a simple data frame
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    'humidity' : [76, 95, 72, 81, 70],
})
Pak_Weather

#### Creating a row to be appended

In [None]:
d1 = pd.DataFrame({"city": "Multan", "temperature": 45, "humidity": 75}, index=[5])
d1

**Append this single-row dataframe to the Pak_Weather dataframe**

In [None]:
df3 = Pak_Weather.append(d1)
df3

## 2. Concatenation of DataFrames (Row Wise + Column Wise)

<img align="left" width="350" height="90"  src="images/row.png"  >
<img align="right" width="490" height="100"  src="images/concat_2.png" >

<br><br><br><br><br><br><br><br><br><br>


- The `pd.concat()` function concatenates pandas objects along a particular axis, with optional set logic (union or intersection) along the other axes.
```
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)
```

Where,
- `objs`: a sequence or mapping of Series or DataFrame objects
- `axis`: The axis to concatenate along. {0/’index’, 1/’columns’}, default 0
- `join`: {‘inner’, ‘outer’}, default ‘outer’ (How to handle indexes on the other axis or axes.)
- `ignore_index`: If True, the resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. (default is False)
- `keys`: sequence, default None (Construct hierarchical index using the passed keys as the outermost level.)

## 3. Row-Wise Concatenation
<img align="left" width="350" height="90"  src="images/row.png"  >

### a. Creating Two Simple Dataframes

In [1]:
# Let us create a simple data frame
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    'humidity' : [76, 95, 72, 81, 70],
})
Pak_Weather

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70


In [2]:
import pandas as pd

UAE_Weather = pd.DataFrame({
    'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],
    'temperature' : [41, 44, 47, 45],
    'humidity' : [88, 99, 79, 86],
})
UAE_Weather

Unnamed: 0,city,temperature,humidity
0,Dubai,41,88
1,Sharja,44,99
2,Ajman,47,79
3,Abu Dhabi,45,86


### b. Concatenate Dataframes (row-wise)

In [4]:
pd.concat([Pak_Weather, UAE_Weather], axis=0)

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70
0,Dubai,41,88
1,Sharja,44,99
2,Ajman,47,79
3,Abu Dhabi,45,86


- Notice that the index labels are carried over as they are, so the labels 0 to 3 appear twice.
- To handle this, pass `ignore_index=True`, so that the resulting axis is labeled 0, …, n - 1.
- This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.
- Note that the labels on the other axis (i.e., the column names) are still respected in the join.

In [5]:
pd.concat([Pak_Weather,UAE_Weather], ignore_index=True)

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70
5,Dubai,41,88
6,Sharja,44,99
7,Ajman,47,79
8,Abu Dhabi,45,86


- If, in addition to the numeric index, you want an extra index level for your subgroups, you can pass the `keys` argument to `pd.concat()`.
- This produces a hierarchical index (MultiIndex).
- Remember that this works only if the `ignore_index` argument is False, which is the default.

In [7]:
# Keep the result in df so that it can be used in the next cell
df = pd.concat([Pak_Weather, UAE_Weather], keys=["Pak", "UAE"])
df

Unnamed: 0,Unnamed: 1,city,temperature,humidity
Pak,0,Lahore,35,76
Pak,1,Karachi,39,95
Pak,2,Peshawer,33,72
Pak,3,Islamabad,29,81
Pak,4,Muree,15,70
UAE,0,Dubai,41,88
UAE,1,Sharja,44,99
UAE,2,Ajman,47,79
UAE,3,Abu Dhabi,45,86


- The advantage of doing this is that you can use `df.loc` to get a subset of your dataframe.
- So, after building one big dataframe, the `keys` argument lets you get back any of the dataframes from which it was created.

In [8]:
df.loc['UAE']

Unnamed: 0,city,temperature,humidity
0,Dubai,41,88
1,Sharja,44,99
2,Ajman,47,79
3,Abu Dhabi,45,86
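
The selection above can be taken one step further. The sketch below assumes the same two weather dataframes and additionally uses the `names` argument of `pd.concat()` to label the two index levels; the level names `country` and `row` are chosen here only for illustration.

```python
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [35, 39, 33, 29, 15],
    'humidity': [76, 95, 72, 81, 70],
})
UAE_Weather = pd.DataFrame({
    'city': ['Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],
    'temperature': [41, 44, 47, 45],
    'humidity': [88, 99, 79, 86],
})

# keys builds the outer index level; names labels both levels
combined = pd.concat(
    [Pak_Weather, UAE_Weather],
    keys=["Pak", "UAE"],
    names=["country", "row"],
)

# Select a whole subgroup by its key
uae_only = combined.loc["UAE"]

# Select a single row by (key, original row label)
ajman = combined.loc[("UAE", 2)]
print(uae_only)
print(ajman["city"])
```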


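### c. What Happens if One of the Dataframes Has an Additional Column
- If you combine two DataFrame objects that do not have all the same columns, then the columns outside the intersection are filled with NaN values for the rows that lack them.
- Because NaN is a floating-point value, an integer column that receives NaN values is converted to float.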

In [14]:
import pandas as pd

# This dataframe doesn't have the Humidity Column
Pak_Weather = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    
})
Pak_Weather

Unnamed: 0,city,temperature
0,Lahore,35
1,Karachi,39
2,Peshawer,33
3,Islamabad,29
4,Muree,15


In [15]:
import pandas as pd


UAE_Weather = pd.DataFrame({
    'city': [ 'Dubai', 'Sharja', 'Ajman', 'Abu Dhabi'],
    'temperature' : [41, 44, 47, 45],
    'humidity' : [88, 99, 79, 86],
})
UAE_Weather

Unnamed: 0,city,temperature,humidity
0,Dubai,41,88
1,Sharja,44,99
2,Ajman,47,79
3,Abu Dhabi,45,86


In [16]:
# NaN will be placed where values are missing
df = pd.concat([Pak_Weather,UAE_Weather], ignore_index=True)
df

Unnamed: 0,city,temperature,humidity
0,Lahore,35,
1,Karachi,39,
2,Peshawer,33,
3,Islamabad,29,
4,Muree,15,
5,Dubai,41,88.0
6,Sharja,44,99.0
7,Ajman,47,79.0
8,Abu Dhabi,45,86.0
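
If the NaN-filled column is not wanted, the `join` parameter can restrict the result to the columns that both dataframes share. The following is a small sketch with shortened, two-row versions of the frames above (the names `pak` and `uae` are used only in this example).

```python
import pandas as pd

# Shortened versions of the two frames; `pak` has no humidity column
pak = pd.DataFrame({'city': ['Lahore', 'Karachi'], 'temperature': [35, 39]})
uae = pd.DataFrame({
    'city': ['Dubai', 'Ajman'],
    'temperature': [41, 47],
    'humidity': [88, 79],
})

# Default join='outer': union of the columns, NaN where a value is missing
outer = pd.concat([pak, uae], ignore_index=True)

# join='inner': only the columns present in every frame are kept
inner = pd.concat([pak, uae], ignore_index=True, join='inner')

print(outer.dtypes)          # humidity becomes float because it now holds NaN
print(list(inner.columns))
```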


## 4. Column Wise Concatenation
- Column-wise concatenation is generally not advised, because `pd.concat()` aligns rows by their index labels, not by the values of any column. If you still want to do it, first check that:
    - the number of rows is the same in both dataframes, and
    - the indexes of both dataframes carry the same labels, in the same order, for the same records
- Once these checks pass, you can simply pass `axis=1` to do the job.
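
The checks listed above can be sketched as a small helper. The function name `safe_to_concat_columns` is hypothetical, and the three-row frames are shortened stand-ins for the lecture data.

```python
import pandas as pd

def safe_to_concat_columns(left, right):
    """Hypothetical check: same row count and identical index labels in the same order."""
    return len(left) == len(right) and left.index.equals(right.index)

temp_df = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer'],
    'temperature': [35, 39, 33],
})
wind_df = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer'],
    'wind speed': [9, 12, 7],
})

# Same rows, but in reversed order, so the index becomes 2, 1, 0
reversed_wind_df = wind_df.iloc[::-1]

print(safe_to_concat_columns(temp_df, wind_df))
print(safe_to_concat_columns(temp_df, reversed_wind_df))

if safe_to_concat_columns(temp_df, wind_df):
    side_by_side = pd.concat([temp_df, wind_df], axis=1)
    print(side_by_side)
```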

<img align="left" width="490" height="100"  src="images/concat_2.png"  >

### a. Creating Two Simple Dataframes

In [26]:
import pandas as pd

temp_df = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
})
temp_df

Unnamed: 0,city,temperature
0,Lahore,35
1,Karachi,39
2,Peshawer,33
3,Islamabad,29
4,Muree,15


In [27]:
wind_df = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'wind speed' : [9, 12, 7, 13, 18],
})
wind_df

Unnamed: 0,city,wind speed
0,Lahore,9
1,Karachi,12
2,Peshawer,7
3,Islamabad,13
4,Muree,18


### b. Concatenate Dataframes (column-wise)

In [29]:
# We have to use the argument axis=1
df = pd.concat([temp_df,wind_df], axis=1)
df

Unnamed: 0,city,temperature,city.1,wind speed
0,Lahore,35,Lahore,9
1,Karachi,39,Karachi,12
2,Peshawer,33,Peshawer,7
3,Islamabad,29,Islamabad,13
4,Muree,15,Muree,18


### c. What Happens if Data Is Missing in Our Dataframes

In [30]:
import pandas as pd
# This dataframe does not have the temperature for Lahore
temp_df = pd.DataFrame({
    'city': [ 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [39, 33, 29, 15],
})
temp_df

Unnamed: 0,city,temperature
0,Karachi,39
1,Peshawer,33
2,Islamabad,29
3,Muree,15


In [31]:
# This dataframe does not have the wind speed for Islamabad
wind_df = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Muree'],
    'wind speed' : [9, 12, 7, 18],
})
wind_df

Unnamed: 0,city,wind speed
0,Lahore,9
1,Karachi,12
2,Peshawer,7
3,Muree,18


In [32]:
df1 = pd.concat([temp_df,wind_df], axis=1)
df1

Unnamed: 0,city,temperature,city.1,wind speed
0,Karachi,39,Lahore,9
1,Peshawer,33,Karachi,12
2,Islamabad,29,Peshawer,7
3,Muree,15,Muree,18


**This does not look correct**
- The rows are paired by their default index labels (0, 1, 2, 3), not by city, so a row can mix two different cities: for example, Karachi's temperature sits next to Lahore's wind speed.
- The solution is to give both dataframes an index in which the same record carries the same label.
- In Pandas, while creating a DataFrame, you can pass an `index` argument, which is what `pd.concat()` uses to align rows from different dataframes.
- This is shown below
---

In [35]:
import pandas as pd
# This dataframe does not have the temperature for Lahore
temp_df = pd.DataFrame({
    'city': [ 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [39, 33, 29, 15],
},index=[0,1,2,3])
temp_df

Unnamed: 0,city,temperature
0,Karachi,39
1,Peshawer,33
2,Islamabad,29
3,Muree,15


In [36]:
# This dataframe does not have the wind speed for Islamabad
wind_df = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Muree'],
    'wind speed' : [9, 12, 7, 18],
}, index=[4,0,1,3])
wind_df

Unnamed: 0,city,wind speed
4,Lahore,9
0,Karachi,12
1,Peshawer,7
3,Muree,18


---
#### Note that each city now carries the same index label in both dataframes, so the concatenation will align the rows correctly
---

In [37]:
df = pd.concat([temp_df,wind_df], axis=1)
df

Unnamed: 0,city,temperature,city.1,wind speed
0,Karachi,39.0,Karachi,12.0
1,Peshawer,33.0,Peshawer,7.0
2,Islamabad,29.0,,
3,Muree,15.0,Muree,18.0
4,,,Lahore,9.0
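
Instead of typing matching index labels by hand, one option is to use the shared `city` column itself as the index before concatenating. This is an alternative sketch, not the approach used in the cells above; it assumes that city names are unique within each dataframe.

```python
import pandas as pd

# No temperature for Lahore
temp_df = pd.DataFrame({
    'city': ['Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [39, 33, 29, 15],
})
# No wind speed for Islamabad
wind_df = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Muree'],
    'wind speed': [9, 12, 7, 18],
})

# With city as the index, axis=1 concatenation aligns the rows by city name
by_city = pd.concat(
    [temp_df.set_index('city'), wind_df.set_index('city')],
    axis=1,
)
print(by_city)
```

Each city appears once, and NaN marks the value that was missing in one of the inputs.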


- Concatenating Dataframes along axis = 0 stacks one Dataframe below the other.
- Concatenating Dataframes along axis = 1 places one Dataframe beside the other. It behaves like a full outer join on the index, placing NaN where a row has no match in the left or the right Dataframe.
- By default, a concatenation results in a set union (`join='outer'`), where all data is preserved, as you have seen in the example above. You can change this with the `join` parameter.
- Students should try using the `join` parameter and passing it both values, `'outer'` and `'inner'`.
- Joins are covered in more detail in the next session on merging and joining.
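
As a starting point for that exercise, the sketch below repeats the two indexed dataframes from above and compares the default `join='outer'` with `join='inner'` along `axis=1`.

```python
import pandas as pd

temp_df = pd.DataFrame({
    'city': ['Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [39, 33, 29, 15],
}, index=[0, 1, 2, 3])
wind_df = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Muree'],
    'wind speed': [9, 12, 7, 18],
}, index=[4, 0, 1, 3])

# Outer join (default): every index label from either frame is kept
outer = pd.concat([temp_df, wind_df], axis=1, join='outer')

# Inner join: only index labels present in both frames are kept
inner = pd.concat([temp_df, wind_df], axis=1, join='inner')

print(outer)
print(inner)
```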

## 5. Adding a Single Row/Column in a Dataframe
- Just as a row can be added to a dataframe with the `df.append()` method, it can be added with the `pd.concat()` function. With `pd.concat()` we can add a column as well.

### a. Adding a Row in a Dataframe

In [40]:
import pandas as pd

df1 = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    'humidity' : [76, 95, 72, 81, 70],
})
df1

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70


In [41]:
df2 = pd.DataFrame({"city": "Multan", "temperature": 45, "humidity": 75}, index=[5])
df2

Unnamed: 0,city,temperature,humidity
5,Multan,45,75


In [42]:
df3 = pd.concat([df1, df2], ignore_index=True, axis = 0)
df3

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70
5,Multan,45,75


In [43]:
# Let us place the new row at the desired location using the slicing operator
df3 = pd.concat([df1[:2], df2, df1[2:]], ignore_index = True)
df3

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Multan,45,75
3,Peshawer,33,72
4,Islamabad,29,81
5,Muree,15,70
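
The slicing pattern above can be wrapped in a small reusable function. This is only a sketch; `insert_row` is a hypothetical helper name, not a pandas function, and it uses `iloc` so that the position is counted from the top regardless of the index labels.

```python
import pandas as pd

def insert_row(frame, row, position):
    """Hypothetical helper: return a new frame with the one-row frame `row` at `position`."""
    return pd.concat(
        [frame.iloc[:position], row, frame.iloc[position:]],
        ignore_index=True,
    )

df1 = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [35, 39, 33, 29, 15],
    'humidity': [76, 95, 72, 81, 70],
})
df2 = pd.DataFrame({"city": "Multan", "temperature": 45, "humidity": 75}, index=[5])

# Place Multan as the third row (position 2)
df3 = insert_row(df1, df2, 2)
print(df3)
```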


### b. Adding a Column in a Dataframe

In [44]:
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': [ 'Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature' : [35, 39, 33, 29, 15],
    'humidity' : [76, 95, 72, 81, 70],
})
Pak_Weather

Unnamed: 0,city,temperature,humidity
0,Lahore,35,76
1,Karachi,39,95
2,Peshawer,33,72
3,Islamabad,29,81
4,Muree,15,70


In [45]:
s = pd.Series(["Humid", 'Dry', 'Rainy', 'Humid', 'Rainy'], name="event")
s

0    Humid
1      Dry
2    Rainy
3    Humid
4    Rainy
Name: event, dtype: object

In [46]:
df = pd.concat([Pak_Weather, s], axis=1)
df

Unnamed: 0,city,temperature,humidity,event
0,Lahore,35,76,Humid
1,Karachi,39,95,Dry
2,Peshawer,33,72,Rainy
3,Islamabad,29,81,Humid
4,Muree,15,70,Rainy
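
The column lined up neatly because the Series and the dataframe share the index labels 0 to 4. The sketch below uses a hypothetical, shorter Series `events` with labels 0 and 2 only, to show that `pd.concat()` with `axis=1` aligns on index labels and fills the remaining rows with NaN.

```python
import pandas as pd

Pak_Weather = pd.DataFrame({
    'city': ['Lahore', 'Karachi', 'Peshawer', 'Islamabad', 'Muree'],
    'temperature': [35, 39, 33, 29, 15],
    'humidity': [76, 95, 72, 81, 70],
})

# Hypothetical Series that has values only for index labels 0 and 2
events = pd.Series(['Humid', 'Dry'], index=[0, 2], name='event')

# The Series name becomes the new column name; rows align on index labels
with_events = pd.concat([Pak_Weather, events], axis=1)
print(with_events)
```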


<h1 style="color:red;">Conclusion</h1>

- The `df.append()` method returns a new dataframe on every call, so appending many dataframes one by one copies the existing rows again each time (and `append()` is no longer available from pandas 2.0 onward).
- The `pd.concat()` function can combine a whole list of dataframes in a single operation, which makes it faster than repeated `append()` calls.
- Concatenating Dataframes along axis = 0 stacks one Dataframe below the other.
- Concatenating Dataframes along axis = 1 places one Dataframe beside the other. It is like performing a full outer join on the index, placing NaN for non-matching rows in the left as well as the right Dataframe.
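
The performance point can be illustrated with a short sketch. The list `batches` is hypothetical data standing in for dataframes that arrive piece by piece; both patterns give the same result, but the second builds it in one step.

```python
import pandas as pd

# Hypothetical batches, each a one-row dataframe
batches = [
    pd.DataFrame({'city': [f'city_{i}'], 'temperature': [30 + i]})
    for i in range(5)
]

# Pattern to avoid: growing the frame inside a loop copies the earlier rows every time
grown = batches[0]
for batch in batches[1:]:
    grown = pd.concat([grown, batch], ignore_index=True)

# Preferred pattern: collect the pieces in a list and concatenate once
combined = pd.concat(batches, ignore_index=True)

print(combined)
```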