### What is a DataFrame?

- A **DataFrame** is a two-dimensional, size-mutable, heterogeneous **tabular data structure** with labeled axes:
  - **Rows** → `index`
  - **Columns** → `columns`

- Think of it like:
  - A spreadsheet (Excel)
  - A SQL table
  - A dictionary of Series (columns share the same row index)

### Why use a DataFrame?
- Easy and intuitive **row/column selection**
- Built-in support for **missing data handling**
- Powerful **grouping and aggregation** tools
- Seamless **I/O with CSV, Excel, SQL, JSON**, and more
- Built on **NumPy** → **fast** and **vectorized** computations
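A minimal sketch of three of these features on a small made-up table. The `team` and `points` columns are hypothetical and only illustrate missing-data handling, vectorized arithmetic, and grouping.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two teams, one missing score
df = pd.DataFrame({'team':   ['A', 'A', 'B', 'B'],
                   'points': [10, np.nan, 7, 5]})

# Missing data: count the NaNs, then fill them with 0
print(df['points'].isna().sum())   # 1
df['points'] = df['points'].fillna(0)

# Vectorized arithmetic: the whole column is processed at once, no Python loop
df['double'] = df['points'] * 2

# Grouping and aggregation: total points per team
totals = df.groupby('team')['points'].sum()
print(totals)
```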

---

### Creating a DataFrame 
#### 1. From a Dictionary

#### Syntax
```python
pd.DataFrame(data, index=None, columns=None, dtype=None)
```
Rule of thumb: each key becomes a column; each value supplies that column’s data.
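The `dtype` and `columns` parameters from the syntax line are not exercised in the examples that follow. A small sketch (the `a`/`b` data is made up) of what each one does:

```python
import pandas as pd

data = {'a': [1, 2, 3], 'b': [4, 5, 6]}

# dtype= applies one type to every column (the integers become floats)
df = pd.DataFrame(data, dtype='float64')
print(df.dtypes)

# columns= chooses which keys become columns, and in what order
df_b = pd.DataFrame(data, columns=['b'])
print(df_b.columns.tolist())   # ['b']
```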

#### Variant A – Dict of Lists / Arrays

```python
import pandas as pd

# 1 – Basic data: one string column and one integer column
data = {'Name': ['Ana', 'Ben', 'Cara'],
        'Age':  [23,   25,   22]}
df1 = pd.DataFrame(data)
print(df1)
```

```
   Name  Age
0   Ana   23
1   Ben   25
2  Cara   22
```


```python
df1
```

|   | Name | Age |
|---|------|-----|
| 0 | Ana  | 23  |
| 1 | Ben  | 25  |
| 2 | Cara | 22  |
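The variant title mentions arrays as well as lists, and the values must line up row by row. A short sketch showing that NumPy arrays are accepted like lists, and that lists of unequal length are rejected with a `ValueError` (the exact message may vary between pandas versions):

```python
import numpy as np
import pandas as pd

# Lists of unequal length cannot form rectangular columns
raised = False
try:
    pd.DataFrame({'Name': ['Ana', 'Ben'], 'Age': [23]})
except ValueError as err:
    raised = True
    print('ValueError:', err)

# NumPy arrays are accepted the same way as lists
df_np = pd.DataFrame({'x': np.arange(3), 'y': np.linspace(0.0, 1.0, 3)})
print(df_np)
```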


#### 2 – Mixed dtypes + custom row index

```python
data = {'City':     ['Pune', 'Delhi', 'Mumbai'],
        'Temp_C':   [32.0,   36.5,    34.2],
        'Humidity': [60,     55,      70]}
df2 = pd.DataFrame(data, index=['Mon', 'Tue', 'Wed'])
print(df2)
```

```
       City  Temp_C  Humidity
Mon    Pune    32.0        60
Tue   Delhi    36.5        55
Wed  Mumbai    34.2        70
```


#### 3 – Select / reorder columns at construction

```python
cols = ['Temp_C', 'City']          # omit Humidity on purpose
df3  = pd.DataFrame(data, columns=cols, index=['Mon', 'Tue', 'Wed'])
print(df3)
```

```
     Temp_C    City
Mon    32.0    Pune
Tue    36.5   Delhi
Wed    34.2  Mumbai
```


```python
cols = ['City', 'Humidity']        # omit Temp_C on purpose
df3  = pd.DataFrame(data, columns=cols, index=['Mon', 'Tue', 'Wed'])
print(df3)
```

```
       City  Humidity
Mon    Pune        60
Tue   Delhi        55
Wed  Mumbai        70
```
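With a dictionary as input, `columns=` can also name a column that is not a key of the dictionary. In that case pandas adds the column and fills it with `NaN` rather than raising. A sketch, where `Wind` is a hypothetical column name:

```python
import pandas as pd

data = {'City':     ['Pune', 'Delhi', 'Mumbai'],
        'Temp_C':   [32.0,   36.5,    34.2],
        'Humidity': [60,     55,      70]}

# 'Wind' is not a key in data, so it is created and filled with NaN
df = pd.DataFrame(data, columns=['City', 'Wind'], index=['Mon', 'Tue', 'Wed'])
print(df)
```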


### Variant B — Creating a DataFrame from a **Dictionary of Series**

> **Pattern** `pd.DataFrame({col_name: series, …}, index=None)`

#### Rule of Thumb
> *“A DataFrame built from a dict of Series behaves like an **outer join on the row labels**, forming one column per Series.”*


```python
s_sales  = pd.Series([250, 300, 400], index=['Q1', 'Q2', 'Q3'])
s_profit = pd.Series([ 80, 110],      index=['Q1', 'Q4'])
df4 = pd.DataFrame({'Sales': s_sales, 'Profit': s_profit})
print(df4)
```

```
    Sales  Profit
Q1  250.0    80.0
Q2  300.0     NaN
Q3  400.0     NaN
Q4    NaN   110.0
```
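Two details of this output are worth checking directly: the row labels are the union of both Series' indexes, and the integer inputs were upcast to `float64` because `NaN` is a float value. A sketch, which also shows converting to pandas' nullable `Int64` dtype when whole numbers are preferred:

```python
import pandas as pd

s_sales  = pd.Series([250, 300, 400], index=['Q1', 'Q2', 'Q3'])
s_profit = pd.Series([ 80, 110],      index=['Q1', 'Q4'])
df4 = pd.DataFrame({'Sales': s_sales, 'Profit': s_profit})

# Row labels: union of both Series' indexes
print(df4.index.tolist())   # ['Q1', 'Q2', 'Q3', 'Q4']

# The integer Series became float64 so that they can hold NaN
print(df4.dtypes)

# Nullable integer dtype keeps whole numbers and shows gaps as <NA>
df4_int = df4.astype('Int64')
print(df4_int)
```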


#### 2 – Supplying an explicit overall index

```python
df5 = pd.DataFrame({'Sales': s_sales, 'Profit': s_profit},
                   index=['Q1', 'Q2', 'Q3', 'Q5'])
df5
```

|    | Sales | Profit |
|----|-------|--------|
| Q1 | 250.0 | 80.0   |
| Q2 | 300.0 | NaN    |
| Q3 | 400.0 | NaN    |
| Q5 | NaN   | NaN    |
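An explicit `index=` acts like a reindex: labels absent from the Series (here `Q5`) become `NaN` rows, and labels not listed (here `Q4`) are dropped. A sketch comparing the constructor with an explicit `.reindex()` on the union-indexed frame; the two frames should compare equal:

```python
import pandas as pd

s_sales  = pd.Series([250, 300, 400], index=['Q1', 'Q2', 'Q3'])
s_profit = pd.Series([ 80, 110],      index=['Q1', 'Q4'])
wanted = ['Q1', 'Q2', 'Q3', 'Q5']

# Explicit index at construction time
df5 = pd.DataFrame({'Sales': s_sales, 'Profit': s_profit}, index=wanted)

# Same idea in two steps: build on the union of labels, then keep only the wanted rows
df5_alt = pd.DataFrame({'Sales': s_sales, 'Profit': s_profit}).reindex(wanted)

print(df5.equals(df5_alt))
```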


#### 3 – Adding a constant (scalar) column

```python
df5['Currency'] = 'INR'
df5
```

|    | Sales | Profit | Currency |
|----|-------|--------|----------|
| Q1 | 250.0 | 80.0   | INR      |
| Q2 | 300.0 | NaN    | INR      |
| Q3 | 400.0 | NaN    | INR      |
| Q5 | NaN   | NaN    | INR      |
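`df5['Currency'] = 'INR'` modifies `df5` in place. When the original frame should stay untouched, `DataFrame.assign` broadcasts the scalar the same way but returns a new DataFrame. A sketch on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [250.0, 300.0]}, index=['Q1', 'Q2'])

# assign() returns a NEW DataFrame; df itself is left unchanged
df_inr = df.assign(Currency='INR')

# The scalar is broadcast to every row, as with df['Currency'] = 'INR'
print(df_inr)
print('Currency' in df.columns)   # False
```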


### Variant C – Creating a DataFrame from a **Nested Dictionary**
> Pattern: `pd.DataFrame({col1: {row1: val1, …}, col2: {…}})`

---


#### 🔹 1. Outer keys ➜ Columns
- Each **outer dictionary key** becomes a **column label**.
- Each **inner dictionary** contains key–value pairs where:
  - **Inner keys** become **row labels (index)**.
  - **Inner values** become **data values** in the respective column.


```python
nested = {'Math': {'Alice': 85, 'Bob': 78},
          'Sci' : {'Bob': 82, 'Cara': 91}}
df6 = pd.DataFrame(nested)
df6
```

|       | Math | Sci  |
|-------|------|------|
| Alice | 85.0 | NaN  |
| Bob   | 78.0 | 82.0 |
| Cara  | NaN  | 91.0 |
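When the outer keys should be rows instead of columns, `DataFrame.from_dict` with `orient='index'` swaps the roles of the two key levels. A sketch using the same `nested` dictionary; for this data the result holds the same values as the transpose `df6.T`:

```python
import pandas as pd

nested = {'Math': {'Alice': 85, 'Bob': 78},
          'Sci' : {'Bob': 82, 'Cara': 91}}

# Default constructor: outer keys -> columns, inner keys -> rows
df6 = pd.DataFrame(nested)

# orient='index': outer keys -> rows, inner keys -> columns
df6_rows = pd.DataFrame.from_dict(nested, orient='index')
print(df6_rows)

# Same cell, reached through the flipped frame and through the transpose
print(df6_rows.loc['Sci', 'Cara'], df6.T.loc['Sci', 'Cara'])
```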


### Some Operations on Data using Pandas

### 1. merge() – SQL-Style Joins in Pandas
* `merge()` in pandas is similar to SQL joins (INNER, LEFT, RIGHT, OUTER).
* It combines rows from two DataFrames based on common columns or the index.
* Syntax: `pd.merge(left, right, how='inner', on='key')`
* Parameters:
    * `left`, `right`: DataFrames to merge
    * `how`: type of join – 'inner' (default), 'outer', 'left', 'right'
    * `on`: column name(s) to join on; they must exist in both DataFrames
    * `left_on`, `right_on`: join columns in the left and right DataFrames when the column names differ



```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Score': [85, 90, 75]})

result = pd.merge(df1, df2, on='ID', how='inner')
result
```

|   | ID | Name    | Score |
|---|----|---------|-------|
| 0 | 2  | Bob     | 85    |
| 1 | 3  | Charlie | 90    |


```python
result = pd.merge(df1, df2, on='ID', how='left')
result
```

|   | ID | Name    | Score |
|---|----|---------|-------|
| 0 | 1  | Alice   | NaN   |
| 1 | 2  | Bob     | 85.0  |
| 2 | 3  | Charlie | 90.0  |
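The `outer` join from the parameter list keeps unmatched keys from both sides. Passing `indicator=True` adds a `_merge` column recording whether each row came from the left frame only, the right frame only, or both, which is handy for auditing a join. A sketch with the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Score': [85, 90, 75]})

# Outer join keeps IDs 1..4; '_merge' records where each row came from
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer)
```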


### Merge on Different Column Names


```python
df3 = pd.DataFrame({'emp_id': [1, 2, 3], 'dept': ['HR', 'IT', 'Finance']})
result = pd.merge(df1, df3, left_on='ID', right_on='emp_id', how='inner')
result
```

|   | ID | Name    | emp_id | dept    |
|---|----|---------|--------|---------|
| 0 | 1  | Alice   | 1      | HR      |
| 1 | 2  | Bob     | 2      | IT      |
| 2 | 3  | Charlie | 3      | Finance |
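With `left_on`/`right_on`, both key columns survive (`ID` and `emp_id` above), so the redundant one is often dropped afterwards. A related case: non-key columns that share a name get suffixes, `_x`/`_y` by default, which `suffixes=` can rename. A sketch; the `Score` data and the suffix names in the second part are made up:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df3 = pd.DataFrame({'emp_id': [1, 2, 3], 'dept': ['HR', 'IT', 'Finance']})

# left_on/right_on keep BOTH key columns; drop the duplicate afterwards
result = (pd.merge(df1, df3, left_on='ID', right_on='emp_id', how='inner')
            .drop(columns='emp_id'))
print(result.columns.tolist())   # ['ID', 'Name', 'dept']

# Overlapping non-key columns get suffixes; here we choose them explicitly
a = pd.DataFrame({'ID': [1, 2], 'Score': [70, 80]})
b = pd.DataFrame({'ID': [1, 2], 'Score': [75, 85]})
both = pd.merge(a, b, on='ID', suffixes=('_test1', '_test2'))
print(both.columns.tolist())     # ['ID', 'Score_test1', 'Score_test2']
```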


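```python
# Method form: same result as pd.merge(df1, df2, on='ID', how='inner')
result = df1.merge(df2, on='ID', how='inner')
result
```

|   | ID | Name    | Score |
|---|----|---------|-------|
| 0 | 2  | Bob     | 85    |
| 1 | 3  | Charlie | 90    |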


### Difference Between pd.merge() and df1.merge(df2)
| Feature            | `pd.merge()`                                       | `df1.merge(df2)`                                |
| ------------------ | -------------------------------------------------- | ----------------------------------------------- |
| **Calling Style**  | Function-style                                     | Method-style (called on a DataFrame)            |
| **First Argument** | Both DataFrames are passed as arguments            | `df1` is the calling object; `df2` is passed    |
| **Parameters**     | Same join options (`how`, `on`, `left_on`, `right_on`) | Same join options                           |
| **Usage Context**  | Fits function-style scripts                        | Fits object-oriented code and method chaining   |
| **Readability**    | Slightly more verbose but explicit                 | Cleaner when working with one primary DataFrame |


| Use `pd.merge()` when...                                          | Use `df.merge()` when...                             |
| ----------------------------------------------------------------- | ---------------------------------------------------- |
| You want both DataFrames to appear symmetrically as arguments     | You are merging into a "main" DataFrame (`df`)       |
| You prefer function-style logic                                   | You prefer object-oriented syntax or method chaining |
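Because `df.merge()` returns a DataFrame, further methods can be chained directly after it. A sketch contrasting the two calling styles on the same data; the sort step is only there to show chaining:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Score': [85, 90, 75]})

# Function style: both frames are arguments
a = pd.merge(df1, df2, on='ID', how='inner')

# Method style: df1 is the left frame, and later steps chain naturally
b = (df1.merge(df2, on='ID', how='inner')
        .sort_values('Score', ascending=False)
        .reset_index(drop=True))
print(b)
```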


### join() – Join by Index

* `join()` combines DataFrames on the index; with `on=`, it matches a column of the calling DataFrame against the other DataFrame's index.
* The default behavior is a left join.
* Its syntax is simpler than `merge()` when index-based joining is intended.
* It is useful when one DataFrame has a meaningful index and the other has a key column or index that matches it.
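The last bullet, a key column on one side matched against an index on the other, is what the `on=` parameter covers. A sketch with hypothetical `orders` and `customers` frames:

```python
import pandas as pd

# Hypothetical data: orders carry a key column; customers are indexed by that key
orders = pd.DataFrame({'order': [101, 102, 103], 'cust_id': ['c1', 'c2', 'c1']})
customers = pd.DataFrame({'city': ['Pune', 'Delhi']}, index=['c1', 'c2'])

# on='cust_id' matches the caller's COLUMN against the other frame's INDEX
joined = orders.join(customers, on='cust_id')
print(joined)
```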

```python
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
df2 = pd.DataFrame({'Score': [90, 85, 88]}, index=[2, 3, 4])

df1
```

|   | Name    |
|---|---------|
| 1 | Alice   |
| 2 | Bob     |
| 3 | Charlie |


```python
df2
```

|   | Score |
|---|-------|
| 2 | 90    |
| 3 | 85    |
| 4 | 88    |


```python
result = df1.join(df2, how='inner')
result
```

|   | Name    | Score |
|---|---------|-------|
| 2 | Bob     | 90    |
| 3 | Charlie | 85    |


```python
result = df1.join(df2)
result
```

|   | Name    | Score |
|---|---------|-------|
| 1 | Alice   | NaN   |
| 2 | Bob     | 90.0  |
| 3 | Charlie | 85.0  |


```python
df3 = pd.DataFrame({'Age': [25, 30, 22]}, index=[1, 2, 3])
result = df1.join([df2, df3])
result
```

|   | Name    | Score | Age  |
|---|---------|-------|------|
| 1 | Alice   | NaN   | 25.0 |
| 2 | Bob     | 90.0  | 30.0 |
| 3 | Charlie | 85.0  | 22.0 |
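The joins above work because the column names (`Name`, `Score`, `Age`) do not overlap. When both frames share a column name, `join()` raises a `ValueError` unless `lsuffix` and/or `rsuffix` is given. A sketch with made-up frames:

```python
import pandas as pd

left = pd.DataFrame({'Score': [1, 2]}, index=['a', 'b'])
right = pd.DataFrame({'Score': [3, 4]}, index=['a', 'b'])

# Overlapping column names raise unless suffixes are supplied
raised = False
try:
    left.join(right)
except ValueError as err:
    raised = True
    print('ValueError:', err)

fixed = left.join(right, lsuffix='_left', rsuffix='_right')
print(fixed.columns.tolist())   # ['Score_left', 'Score_right']
```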


### concat() – Stack DataFrames Vertically or Horizontally

* `concat()` concatenates multiple DataFrames along a particular axis.
* It stacks vertically with `axis=0` (the default) or side-by-side with `axis=1`.
* It does not match on key columns or remove duplicates; it only aligns on the labels of the other axis (an outer join by default, so unmatched labels produce `NaN`).
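The examples that follow use `axis=1`. A sketch of vertical stacking (`axis=0`), showing that the original row labels are kept as-is unless `ignore_index=True` is passed, and that `keys=` tags each piece with an outer index level (the `batch1`/`batch2` labels are arbitrary):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Charlie', 'David']})

# axis=0 (default) stacks rows; the original labels 0, 1, 0, 1 are kept
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

# keys= adds an outer index level naming the source of each row
labelled = pd.concat([df1, df2], keys=['batch1', 'batch2'])
print(renumbered)
```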

```python
result = pd.concat([df1, df2], axis=1)
result
```

|   | Name    | Score |
|---|---------|-------|
| 1 | Alice   | NaN   |
| 2 | Bob     | 90.0  |
| 3 | Charlie | 85.0  |
| 4 | NaN     | 88.0  |


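```python
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Charlie', 'David']})

# Side-by-side: rows align on index 0, 1 and the column labels repeat
result = pd.concat([df1, df2], axis=1)
result
```

|   | ID | Name  | ID | Name    |
|---|----|-------|----|---------|
| 0 | 1  | Alice | 3  | Charlie |
| 1 | 2  | Bob   | 4  | David   |

The result has duplicate column labels (`ID` and `Name` each appear twice), so `result['ID']` returns both `ID` columns.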


| Operation  | Method        | Joins On                                          | Default Join | Notes                                 |
| ---------- | ------------- | ------------------------------------------------- | ------------ | ------------------------------------- |
| `merge()`  | `pd.merge()`  | Column(s) or index                                | `inner`      | Most flexible, SQL-like joins         |
| `join()`   | `df1.join()`  | Index (or a caller column with `on=`)             | `left`       | Simpler syntax for index-based joins  |
| `concat()` | `pd.concat()` | No key matching; aligns labels of the other axis  | `outer`      | Concatenates DataFrames along an axis |
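For a purely index-aligned inner join, the three tools in the table overlap. A sketch reusing the `Name`/`Score` frames from the `join()` section, where `merge()` uses `left_index`/`right_index` and `concat()` uses `join='inner'`; all three should produce the same rows:

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[1, 2, 3])
right = pd.DataFrame({'Score': [90, 85, 88]}, index=[2, 3, 4])

# Three ways to do an inner join on the index
via_merge = pd.merge(left, right, left_index=True, right_index=True, how='inner')
via_join = left.join(right, how='inner')
via_concat = pd.concat([left, right], axis=1, join='inner')

for name, frame in [('merge', via_merge), ('join', via_join), ('concat', via_concat)]:
    print(name, frame.to_dict())
```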
