### Pandas Concatenation

Concatenation means joining multiple DataFrames together, one after the other, either vertically (row-wise) or horizontally (column-wise).

In [1]:
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Age': [35, 40]
})

# Concatenate vertically
result = pd.concat([df1, df2])
result

Unnamed: 0,Name,Age
0,Alice,25
1,Bob,30
0,Charlie,35
1,David,40


Notice: the result keeps each DataFrame's original index labels, so 0 and 1 both appear twice.
You can fix that using ignore_index=True:

In [2]:
pd.concat([df1, df2], ignore_index=True)

Unnamed: 0,Name,Age
0,Alice,25
1,Bob,30
2,Charlie,35
3,David,40
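
To see why the repeated labels matter, here is a small sketch that rebuilds df1 and df2 from the cell above and looks up label 0 in both results. The variable names `stacked`, `renumbered`, `rows_labelled_0` and `row_0` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Without ignore_index, the labels 0 and 1 each appear twice
stacked = pd.concat([df1, df2])
rows_labelled_0 = stacked.loc[0]      # two rows come back: Alice and Charlie
print(rows_labelled_0)

# With ignore_index=True, every label is unique
renumbered = pd.concat([df1, df2], ignore_index=True)
row_0 = renumbered.loc[0]             # a single row (a Series): Alice
print(row_0)
```

With duplicate labels, a lookup such as `.loc[0]` silently returns several rows, which is usually not what you want after stacking tables.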




---



### Horizontal Concatenation
Horizontal concatenation is like gluing tables side by side: this time you're adding more columns to your table rather than more rows.

In [2]:
df3 = pd.DataFrame({
    'Height': [160, 175]
})

# Concatenate horizontally
result = pd.concat([df1, df3], axis=1)
result

Unnamed: 0,Name,Age,Height
0,Alice,25,160
1,Bob,30,175
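
Horizontal concatenation lines rows up by index label, not by position. The sketch below illustrates this with a hypothetical frame, `shifted`, whose labels are 1 and 2 instead of 0 and 1. It is not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
# Hypothetical frame: same heights as df3, but labelled 1 and 2
shifted = pd.DataFrame({'Height': [160, 175]}, index=[1, 2])

# Rows are matched on index labels, so labels 0 and 2 get NaN on one side
side_by_side = pd.concat([df1, shifted], axis=1)
print(side_by_side)

# Resetting the index first makes the rows line up by position instead
by_position = pd.concat([df1, shifted.reset_index(drop=True)], axis=1)
print(by_position)
```

If the two tables should match row for row, it is worth checking that their indexes agree before using axis=1.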




---



### 5 Simple Exercises (for warm-up)

`1.`
Combine students1.csv and students2.csv vertically to make a single table of all students.

In [3]:
stud1 = pd.read_csv('students1.csv')
stud1

Unnamed: 0,Name,Age
0,Alice,22
1,Bob,24


In [4]:
stud2 = pd.read_csv('students2.csv')
stud2

Unnamed: 0,Name,Age
0,Charlie,23
1,David,25


In [5]:
students = pd.concat([stud1, stud2], ignore_index = True)
students

Unnamed: 0,Name,Age
0,Alice,22
1,Bob,24
2,Charlie,23
3,David,25




---



`2.` Join grades1.csv and grades2.csv vertically, and reset the index.

In [6]:
grade1 = pd.read_csv('grades1.csv')
grade1

Unnamed: 0,Grade,Subject
0,A,Math
1,B,English


In [7]:
grade2 = pd.read_csv('grades2.csv')
grade2

Unnamed: 0,Grade,Subject
0,C,Science
1,A,History


In [8]:
grades = pd.concat([grade1, grade2], ignore_index = True)
grades

Unnamed: 0,Grade,Subject
0,A,Math
1,B,English
2,C,Science
3,A,History




---



`3.` Concatenate students1.csv and info1.csv side-by-side (horizontally).

In [9]:
inf1 = pd.read_csv('info1.csv')
inf1

Unnamed: 0,Name,Height
0,Alice,160
1,Bob,170


In [10]:
stud_inf = pd.concat([stud1, inf1],axis = 1, ignore_index = True)  ## Concatenating: with axis = 1, ignore_index = True replaces the column names with 0..3
stud_inf

Unnamed: 0,0,1,2,3
0,Alice,22,Alice,160
1,Bob,24,Bob,170


In [11]:
st = pd.merge(stud1, inf1, on = 'Name', how = 'inner')  ## Merging on 'Name' matches rows by name and keeps a single Name column
st

Unnamed: 0,Name,Age,Height
0,Alice,22,160
1,Bob,24,170
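
As a possible middle ground between the two cells above, here is a sketch that concatenates horizontally without ignore_index, so the column names survive, and then drops the repeated Name column. The frame contents are assumed from the outputs shown above, and `named` and `deduped` are illustrative names.

```python
import pandas as pd

# Assumed contents, copied from the outputs shown above
stud1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [22, 24]})
inf1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Height': [160, 170]})

# Without ignore_index the original column names are kept, Name twice
named = pd.concat([stud1, inf1], axis=1)
print(list(named.columns))

# Keep only the first occurrence of each column name
deduped = named.loc[:, ~named.columns.duplicated()]
print(deduped)
```

Dropping the duplicate is only safe when both Name columns hold the same values in the same order. When that is not guaranteed, pd.merge on 'Name' is the more reliable choice.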




---



`4.`  Concatenate grades1.csv and students1.csv vertically and observe what happens when columns don't match.

In [12]:
stud_grade = pd.concat([stud1, grade1], ignore_index = True)   ##  When columns do not match, the missing cells are filled with NaN by default (Age becomes float)
stud_grade

Unnamed: 0,Name,Age,Grade,Subject
0,Alice,22.0,,
1,Bob,24.0,,
2,,,A,Math
3,,,B,English
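
The NaN filling above is the default `join='outer'` behaviour. As a sketch of the alternative, `join='inner'` keeps only the columns that every frame shares. The `clubs` frame below is hypothetical and was made up so that the two tables have one column in common.

```python
import pandas as pd

# Assumed contents of students1.csv, copied from the output shown earlier
stud1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [22, 24]})
# Hypothetical frame that shares only the Name column with stud1
clubs = pd.DataFrame({'Name': ['Charlie', 'David'], 'Club': ['Chess', 'Drama']})

# Default (outer): union of columns, gaps filled with NaN
outer = pd.concat([stud1, clubs], ignore_index=True)
print(outer)

# Inner: only the columns present in every frame survive
inner = pd.concat([stud1, clubs], ignore_index=True, join='inner')
print(inner)
```

With stud1 and grade1 from the exercise there is no shared column at all, so an inner join would leave four rows and no columns.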




---



`5.` Read students1.csv and concatenate it with itself 3 times vertically.

In [13]:
stud3 = pd.concat([stud1, stud1, stud1], ignore_index = True)
stud3

Unnamed: 0,Name,Age
0,Alice,22
1,Bob,24
2,Alice,22
3,Bob,24
4,Alice,22
5,Bob,24




---



### 5 Challenging Exercises (for deeper understanding)

`1.`
Concatenate info1.csv and info2.csv horizontally, and then merge it with students2.csv based on names.

In [14]:
inf2 = pd.read_csv('info2.csv')
inf2

Unnamed: 0,Name,Weight
0,Charlie,65
1,David,72


In [16]:
inf3 = pd.concat([inf1, inf2], axis = 1, ignore_index = True)
inf3

Unnamed: 0,0,1,2,3
0,Alice,160,Charlie,65
1,Bob,170,David,72


In [17]:
inf3.columns = ['name', 'Height', 'Name', 'Weight']
inf_stud = pd.merge(inf3, stud2, on = 'Name', how = 'inner')
inf_stud

Unnamed: 0,name,Height,Name,Weight,Age
0,Alice,160,Charlie,65,23
1,Bob,170,David,72,25


In [18]:
stud2

Unnamed: 0,Name,Age
0,Charlie,23
1,David,25
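
One way to avoid retyping the column list by hand is to suffix one table's columns before concatenating. This is only a sketch: the frame contents are assumed from the outputs above, and the `_info1` suffix is an arbitrary choice.

```python
import pandas as pd

# Assumed contents, copied from the outputs shown above
inf1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Height': [160, 170]})
inf2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Weight': [65, 72]})
stud2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [23, 25]})

# Suffix info1's columns so the two Name columns stay distinguishable
wide = pd.concat([inf1.add_suffix('_info1'), inf2], axis=1)
print(list(wide.columns))

# 'Name' now refers only to info2's names, which match students2
merged = pd.merge(wide, stud2, on='Name', how='inner')
print(merged)
```

Note that the horizontal concat pairs rows purely by index, so Alice's height ends up next to Charlie's weight. The result is a positional pairing, not a per-person record.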




---



`2.` Concatenate mixed1.csv and mixed2.csv vertically, and analyze what happens to columns that are missing in one file.

In [19]:
mi1 = pd.read_csv('mixed1.csv', index_col = 0)
mi2 = pd.read_csv('mixed2.csv', index_col = 0)
print(mi1, '\n-------------------------')
print(mi2)

     Name  Score
ID              
1   Alice     88
2     Bob     90 
-------------------------
       Name    City
ID                 
3   Charlie   Paris
4     David  London


In [20]:
mixes = pd.concat([mi1, mi2])
mixes

ID,Name,Score,City
1,Alice,88.0,
2,Bob,90.0,
3,Charlie,,Paris
4,David,,London
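
The output shows Score as 88.0 and 90.0: once NaN appears in an integer column, pandas stores it as float64. The sketch below rebuilds the two frames inline (contents assumed from the printout above) and shows one option for getting whole numbers back, the nullable `Int64` dtype.

```python
import pandas as pd

# Assumed contents of mixed1.csv and mixed2.csv, read with index_col=0
mi1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [88, 90]},
                   index=pd.Index([1, 2], name='ID'))
mi2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'City': ['Paris', 'London']},
                   index=pd.Index([3, 4], name='ID'))

mixes = pd.concat([mi1, mi2])
score_dtype_before = str(mixes['Score'].dtype)   # float64, because of the NaN rows
print(mixes.dtypes)

# Nullable integer dtype: keeps whole numbers and marks gaps as <NA>
mixes['Score'] = mixes['Score'].astype('Int64')
print(mixes)
```

Whether to keep the gaps as missing values or fill them depends on what the Score column will be used for later.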




---



`3.` Create a DataFrame of your own with just one column: Name, then concatenate it vertically with students1.csv and handle the missing Age values.

In [21]:
stud1

Unnamed: 0,Name,Age
0,Alice,22
1,Bob,24


In [22]:
mydf = pd.DataFrame({
    'Name':['Kaizen', 'Jinwoo', 'Jinho']
})
mydf

Unnamed: 0,Name
0,Kaizen
1,Jinwoo
2,Jinho


In [23]:
studm = pd.concat([stud1, mydf])
studm

Unnamed: 0,Name,Age
0,Alice,22.0
1,Bob,24.0
0,Kaizen,
1,Jinwoo,
2,Jinho,


In [25]:
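import numpy as np
# Fill only the Age column with the mean of the known ages, then cast back to integers
studm['Age'] = studm['Age'].fillna(studm['Age'].mean())
studm['Age'] = studm['Age'].astype(np.int8)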
studm

Unnamed: 0,Name,Age
0,Alice,22
1,Bob,24
0,Kaizen,23
1,Jinwoo,23
2,Jinho,23
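
Here the mean happens to be exactly 23, so the integer cast loses nothing. When the mean is not a whole number, a plain astype truncates it. The sketch below uses hypothetical ages (including a made-up third student, Eve) to show the difference between truncating and rounding first.

```python
import pandas as pd

# Hypothetical data chosen so that the mean age is not a whole number
ages = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eve'], 'Age': [22, 24, 25]})
new_names = pd.DataFrame({'Name': ['Kaizen', 'Jinwoo']})

combined = pd.concat([ages, new_names], ignore_index=True)
mean_age = combined['Age'].mean()        # NaN values are skipped: 71 / 3
filled = combined['Age'].fillna(mean_age)

truncated = filled.astype(int)           # 23.67 becomes 23
rounded = filled.round().astype(int)     # 23.67 becomes 24
print(truncated.tolist())
print(rounded.tolist())
```

Using ignore_index=True in the concat also avoids the repeated 0 and 1 labels seen in the output above.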




---



`4.`
Concatenate all four CSVs: students1.csv, students2.csv, grades1.csv, grades2.csv. Try both vertical and horizontal ways. Write what’s different in each result.

In [27]:
students = pd.concat([stud1, stud2, grade1, grade2], ignore_index = True)
students

Unnamed: 0,Name,Age,Grade,Subject
0,Alice,22.0,,
1,Bob,24.0,,
2,Charlie,23.0,,
3,David,25.0,,
4,,,A,Math
5,,,B,English
6,,,C,Science
7,,,A,History


In [37]:
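from random import choice
# Fill missing names with a placeholder string
students['Name'] = students['Name'].fillna('names')
# Fill each missing grade with a random pick, so the output can differ between runs
students['Grade'] = students['Grade'].apply(
    lambda x: choice(['A', 'B', 'C']) if pd.isna(x) else x)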
students

Unnamed: 0,Name,Age,Grade,Subject
0,Alice,22.0,C,
1,Bob,24.0,C,
2,Charlie,23.0,C,
3,David,25.0,C,
4,names,,A,Math
5,names,,B,English
6,names,,C,Science
7,names,,A,History


In [39]:
students['Age'] = students['Age'].fillna(students['Age'].mean()).astype(np.int8)
students['Subject'] = students['Subject'].fillna('Not Available')

In [40]:
students

Unnamed: 0,Name,Age,Grade,Subject
0,Alice,22,C,Not Available
1,Bob,24,C,Not Available
2,Charlie,23,C,Not Available
3,David,25,C,Not Available
4,names,23,A,Math
5,names,23,B,English
6,names,23,C,Science
7,names,23,A,History
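
The exercise also asks for the horizontal direction. Below is a minimal sketch of that comparison, with the four frames rebuilt inline from the outputs shown earlier (their contents are assumed to match the CSV files). The names `tall` and `wide` are illustrative.

```python
import pandas as pd

# Assumed contents of the four CSV files, copied from earlier outputs
stud1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [22, 24]})
stud2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [23, 25]})
grade1 = pd.DataFrame({'Grade': ['A', 'B'], 'Subject': ['Math', 'English']})
grade2 = pd.DataFrame({'Grade': ['C', 'A'], 'Subject': ['Science', 'History']})

frames = [stud1, stud2, grade1, grade2]

# Vertical: rows are stacked, unmatched columns are filled with NaN
tall = pd.concat(frames, ignore_index=True)
print(tall.shape, int(tall.isna().sum().sum()))

# Horizontal: columns are placed side by side, rows matched on index 0 and 1
wide = pd.concat(frames, axis=1)
print(wide.shape, list(wide.columns))
```

Vertically the result has 8 rows and 4 columns with half of the cells missing. Horizontally it has 2 rows and 8 columns with no missing cells, but the column names repeat and each row mixes unrelated students and grades.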
