<div style="color:white; background-color:#5642C5; padding: 10px; border-radius: 15px; font-size: 150%; font-family: Verdana; text-align:center; -webkit-text-stroke-width: 1px; -webkit-text-stroke-color: black; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7);">
    PANDAS | MERGE
</div>

<div style="margin:20px; padding:20px; border-radius:10px;
            border:2px solid #4CAF50; background-color:#E6F7E2;
            font-family: Arial, sans-serif; line-height:1.6; color:black;">

  <p><b>📌 Important:</b></p>

  <p><b>PURPOSE:</b> Combines two DataFrames based on a common key (like SQL JOIN)</p>

  <p><b>SYNTAX:</b> <span style="background:#f5f5f5; padding:2px 4px; border-radius:4px;">
  pd.merge(df1, df2, on="key", how="inner")</span></p>

  <p><b>PARAMETERS:</b></p>
  <ul>
    <li><b>on</b> → name of the column (or list of columns) shared by both DataFrames to join on</li>
    <li><b>how</b> → type of join:
      <ul>
        <li><b>"inner"</b> (default) → only rows whose key appears in both DataFrames</li>
        <li><b>"left"</b> → all rows from the left DataFrame, plus matching rows from the right</li>
        <li><b>"right"</b> → all rows from the right DataFrame, plus matching rows from the left</li>
        <li><b>"outer"</b> → all rows from both DataFrames; values with no match are filled with NaN</li>
      </ul>
    </li>
  </ul>
</div>

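To compare the four `how` values listed above side by side, the following minimal sketch merges the same two small frames with each option and records which `ID` values survive. The names `left`, `right`, and `ids_by_how` are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["A", "B", "C"]})
right = pd.DataFrame({"ID": [2, 3, 4], "Score": [90, 85, 88]})

# Merge with each join type and collect the IDs that remain.
ids_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="ID", how=how)
    ids_by_how[how] = merged["ID"].tolist()
    print(f"{how:>5}: {ids_by_how[how]}")
```

Only IDs 2 and 3 exist in both frames, so they are the only ones kept by every join type.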

### (i) inner:

In [5]:
import pandas as pd
df1 = pd.DataFrame({"ID":[1,2,3],"Name":["A","B","C"]})
print(df1)
df2 = pd.DataFrame({"ID":[2,3,4],"Score":[90,85,88]})
print(df2)
result = pd.merge(df1,df2,on="ID",how="inner")
print(result)

   ID Name
0   1    A
1   2    B
2   3    C
   ID  Score
0   2     90
1   3     85
2   4     88
   ID Name  Score
0   2    B     90
1   3    C     85


### (ii) left:

In [9]:
# Keep all rows from the left DataFrame (df1)

df1 = pd.DataFrame({"ID":[1,2,3],"Name":["A","B","C"]})
print(df1)
df2 = pd.DataFrame({"ID":[2,3,4],"Score":[90,85,88]})
print(df2)
result = pd.merge(df1,df2,on="ID",how="left")
print(result)

   ID Name
0   1    A
1   2    B
2   3    C
   ID  Score
0   2     90
1   3     85
2   4     88
   ID Name  Score
0   1    A    NaN
1   2    B   90.0
2   3    C   85.0




### (iii) right:

In [10]:
# Keep all rows from the right DataFrame (df2)

df1 = pd.DataFrame({"ID":[1,2,3],"Name":["A","B","C"]})
print(df1)
df2 = pd.DataFrame({"ID":[2,3,4],"Score":[90,85,88]})
print(df2)
result = pd.merge(df1,df2,on="ID",how="right")
print(result)

   ID Name
0   1    A
1   2    B
2   3    C
   ID  Score
0   2     90
1   3     85
2   4     88
   ID Name  Score
0   2    B     90
1   3    C     85
2   4  NaN     88


### (iv) outer:

In [11]:
# Keep all rows from both dataframes.

df1 = pd.DataFrame({"ID":[1,2,3],"Name":["A","B","C"]})
print(df1)
df2 = pd.DataFrame({"ID":[2,3,4],"Score":[90,85,88]})
print(df2)
result = pd.merge(df1,df2,on="ID",how="outer")
print(result)

   ID Name
0   1    A
1   2    B
2   3    C
   ID  Score
0   2     90
1   3     85
2   4     88
   ID Name  Score
0   1    A    NaN
1   2    B   90.0
2   3    C   85.0
3   4  NaN   88.0

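After an outer merge it is often useful to know which side each row came from. One way to check, sketched here with the same `df1` and `df2`, is the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The variable name `unmatched` is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["A", "B", "C"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [90, 85, 88]})

# indicator=True adds a "_merge" column: "left_only", "right_only", or "both".
result = pd.merge(df1, df2, on="ID", how="outer", indicator=True)
print(result)

# Rows that found no partner on the other side.
unmatched = result[result["_merge"] != "both"]
print(unmatched["ID"].tolist())
```

The unmatched rows are exactly the ones that show NaN in the outer result.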



<div style="color:white; background-color:#5642C5; padding: 10px; border-radius: 15px; font-size: 150%; font-family: Verdana; text-align:center; -webkit-text-stroke-width: 1px; -webkit-text-stroke-color: black; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7);">
    PANDAS | JOIN
</div>

<div style="margin:20px; padding:20px; border-radius:10px;
            border:2px solid #4CAF50; background-color:#E6F7E2;
            font-family: Arial, sans-serif; line-height:1.6; color:black;">

  <p><b>📌 Important:</b></p>

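  <p><b>PURPOSE:</b> Joins two DataFrames on their index (default), or matches a key column of the calling DataFrame against the index of the other.</p>

  <p>Simpler than <code>merge</code> for index-based joins, but less flexible: the other DataFrame is always matched on its index.</p>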

  <p><b>SYNTAX:</b> <span style="background:#f5f5f5; padding:2px 4px; border-radius:4px;">
  df1.join(df2, how="left")</span></p>

</div>

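To make the relationship with `merge` concrete, this small sketch checks that an index-based `join` gives the same result as `pd.merge` with `left_index=True` and `right_index=True`. The frames `people` and `cities` and their `K0`/`K1`/`K2` labels are made up for the example.

```python
import pandas as pd

people = pd.DataFrame({"Age": [27, 24]}, index=["K0", "K1"])
cities = pd.DataFrame({"City": ["Prayagraj", "Varanasi"]}, index=["K0", "K2"])

# join matches on the index and defaults to how="left".
joined = people.join(cities, how="left")

# The same operation spelled out with merge.
merged = pd.merge(people, cities, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```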
### (i) DataFrame.join()

In [15]:
data1 = {'Name':['Jai','Prince','Gaurav','Anuj'],
         'Age':[27,24,22,32]}
data2 = {'Address':['Prayagraj','Varanasi','Prayagraj','Varanasi'],
         'Qualification':['MCA','Phd','BCom','BHons']}
df = pd.DataFrame(data1,index=['KO','KO1','KO2','KO3'])
df1 = pd.DataFrame(data2,index=['KO','KO2','KO3','KO4'])
print(df,"\n\n",df1)
result = df.join(df1)
print("\n\n",result)

       Name  Age
KO      Jai   27
KO1  Prince   24
KO2  Gaurav   22
KO3    Anuj   32 

        Address Qualification
KO   Prayagraj           MCA
KO2   Varanasi           Phd
KO3  Prayagraj          BCom
KO4   Varanasi         BHons


        Name  Age    Address Qualification
KO      Jai   27  Prayagraj           MCA
KO1  Prince   24        NaN           NaN
KO2  Gaurav   22   Varanasi           Phd
KO3    Anuj   32  Prayagraj          BCom


### (ii) Join using the "on" argument:

In [18]:
data1 = {'Name':['Jai','Prince','Gaurav','Anuj'],
         'Age':[27,24,22,32],
         'Key':['KO','K1','K2','K3']}
data2 = {'Address':['Prayagraj','Varanasi','Prayagraj','Varanasi'],
         'Qualification':['MCA','Phd','BCom','BHons']}
df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2,index=['KO','K2','K3','K4'])
print(df,"\n\n",df1)
result1 = df.join(df1,on="Key")
print("\n\n",result1)

     Name  Age Key
0     Jai   27  KO
1  Prince   24  K1
2  Gaurav   22  K2
3    Anuj   32  K3 

       Address Qualification
KO  Prayagraj           MCA
K2   Varanasi           Phd
K3  Prayagraj          BCom
K4   Varanasi         BHons


      Name  Age Key    Address Qualification
0     Jai   27  KO  Prayagraj           MCA
1  Prince   24  K1        NaN           NaN
2  Gaurav   22  K2   Varanasi           Phd
3    Anuj   32  K3  Prayagraj          BCom

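The `on` argument can be combined with `how`. As a rough sketch, passing `how="inner"` drops the rows whose key has no match in the other frame's index instead of filling them with NaN. The frames below are a trimmed, hypothetical variant of the example above.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Jai", "Prince", "Gaurav", "Anuj"],
                     "Key": ["K0", "K1", "K2", "K3"]})
right = pd.DataFrame({"Address": ["Prayagraj", "Varanasi", "Prayagraj"]},
                     index=["K0", "K2", "K3"])

# how="inner" keeps only rows whose Key appears in right's index.
matched = left.join(right, on="Key", how="inner")
print(matched)
```

Here the row for `Prince` disappears because `K1` is not a label in `right`.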

<div style="color:white; background-color:#5642C5; padding: 10px; border-radius: 15px; font-size: 150%; font-family: Verdana; text-align:center; -webkit-text-stroke-width: 1px; -webkit-text-stroke-color: black; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7);">
    PANDAS | CONCATENATING
</div>

<div style="margin:20px; padding:20px; border-radius:10px;
            border:2px solid #4CAF50; background-color:#E6F7E2;
            font-family: Arial, sans-serif; line-height:1.6; color:black;">

  <p><b>📌 Important:</b></p>

  <p><b>PURPOSE:</b> Concatenates multiple DataFrames either row-wise or column-wise.</p>

  <p><b>SYNTAX:</b></p>
  <ul>
    <li><span style="background:#f5f5f5; padding:2px 4px; border-radius:4px;">
      pd.concat([df1, df2], axis=0)   # rows
    </span></li>
    <li><span style="background:#f5f5f5; padding:2px 4px; border-radius:4px;">
      pd.concat([df1, df2], axis=1)   # columns
    </span></li>
  </ul>

</div>

In [21]:
# Row-wise: stack df2 below df1 (axis=0)

df1 = pd.DataFrame({"ID":[1,2],"Name":["A","B"]})
df2 = pd.DataFrame({"ID":[3,4],"Name":["C","D"]})
result = pd.concat([df1,df2],axis=0)
print(result)

   ID Name
0   1    A
1   2    B
0   3    C
1   4    D

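Note that the row-wise result above keeps each frame's original index, so the labels 0 and 1 appear twice. If a fresh index is wanted, one option is `ignore_index=True`, sketched below with the same data.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2], "Name": ["A", "B"]})
df2 = pd.DataFrame({"ID": [3, 4], "Name": ["C", "D"]})

# Default: original labels are kept, so they repeat.
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```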

In [22]:
# Column-wise: place df2 beside df1 (axis=1)

df1 = pd.DataFrame({"ID":[1,2],"Name":["A","B"]})
df2 = pd.DataFrame({"ID":[3,4],"Name":["C","D"]})
result = pd.concat([df1,df2],axis=1)
print(result)


   ID Name  ID Name
0   1    A   3    C
1   2    B   4    D

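Column-wise concatenation lines rows up by index label, not by position. The sketch below uses two hypothetical frames, `names` and `scores`, whose indexes only partly overlap, to show the default `join="outer"` behaviour next to `join="inner"`.

```python
import pandas as pd

names = pd.DataFrame({"Name": ["A", "B"]}, index=[0, 1])
scores = pd.DataFrame({"Score": [90, 85]}, index=[1, 2])

# Default join="outer": union of index labels, gaps become NaN.
wide = pd.concat([names, scores], axis=1)
print(wide)

# join="inner": only labels present in every frame.
common = pd.concat([names, scores], axis=1, join="inner")
print(common)
```

Only label 1 exists in both frames, so it is the only row left by the inner variant.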

<div style="color:white; background-color:#5642C5; padding: 10px; border-radius: 15px; font-size: 150%; font-family: Verdana; text-align:center; -webkit-text-stroke-width: 1px; -webkit-text-stroke-color: black; text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7);">
    PANDAS | GROUPBY() 
</div>

<div style="margin:20px; padding:20px; border-radius:10px;
            border:2px solid #4CAF50; background-color:#E6F7E2;
            font-family: Arial, sans-serif; line-height:1.6; color:black;">

  <p><b>📌 Important:</b></p>

  <p><b>PURPOSE:</b> Splits data into groups, applies a function, and then combines results.</p>

  <p>Used for aggregation and analysis.</p>

  <p><b>SYNTAX:</b></p>
  <span style="background:#f5f5f5; padding:2px 4px; border-radius:4px;">
    df.groupby("column")["value"].mean()
  </span>

</div>

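The split, apply, combine steps described above can be spelled out by hand and compared with the built-in aggregation. This is a minimal sketch: `manual` and `summary` are illustrative names, and `agg` is used here to compute several statistics per group in one call.

```python
import pandas as pd

df = pd.DataFrame({"Team": ["A", "A", "B", "B"],
                   "Points": [10, 20, 15, 25]})

# Split into groups, apply mean() to each, combine into a dict.
manual = {team: group["Points"].mean() for team, group in df.groupby("Team")}
print(manual)

# The same idea in one call, with several aggregations at once.
summary = df.groupby("Team")["Points"].agg(["mean", "sum", "count"])
print(summary)
```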
In [24]:
df = pd.DataFrame({"Team":["A","A","B","B"],
                   "Points":[10,20,15,25]})
result = df.groupby("Team")["Points"].mean()
print(result)

Team
A    15.0
B    20.0
Name: Points, dtype: float64


In [32]:
import pandas as pd

data = {
  'co2': [95, 90, 99, 104, 105, 94, 99, 104],
  'model': ['Citigo', 'Fabia', 'Fiesta', 'Rapid', 'Focus', 'Mondeo', 'Octavia', 'B-Max'],
  'car': ['Skoda', 'Skoda', 'Ford', 'Skoda', 'Ford', 'Ford', 'Skoda', 'Ford']
}

df = pd.DataFrame(data)

print(df.groupby("car")["co2"].mean())

car
Ford     100.5
Skoda     97.0
Name: co2, dtype: float64


In [37]:
import pandas as pd

data = {
  'co2': [95, 90, 99, 104, 105, 94, 99, 104],
  'model': ['Citigo', 'Fabia', 'Fiesta', 'Rapid', 'Focus', 'Mondeo', 'Octavia', 'B-Max'],
  'car': ['Skoda', 'Skoda', 'Ford', 'Skoda', 'Ford', 'Ford', 'Skoda', 'Ford'],
  'fuel': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol', 'Diesel']  
}

df = pd.DataFrame(data)
print(df.groupby(["car", "fuel"])["co2"].mean())

car    fuel  
Ford   Diesel     99.0
       Petrol    102.0
Skoda  Diesel     97.0
       Petrol     97.0
Name: co2, dtype: float64

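Grouping by two columns returns a Series with a two-level index, as in the output above. If a flat table is easier to work with, two common options are `reset_index` and `unstack`. The sketch below reuses the car data; the column name `co2_mean` is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "co2": [95, 90, 99, 104, 105, 94, 99, 104],
    "car": ["Skoda", "Skoda", "Ford", "Skoda", "Ford", "Ford", "Skoda", "Ford"],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel",
             "Petrol", "Diesel", "Petrol", "Diesel"],
})

means = df.groupby(["car", "fuel"])["co2"].mean()

# Option 1: turn the index levels back into ordinary columns.
flat = means.reset_index(name="co2_mean")
print(flat)

# Option 2: pivot the inner level ("fuel") into columns.
table = means.unstack("fuel")
print(table)
```

`flat` has one row per (car, fuel) pair, while `table` has one row per car and one column per fuel type.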