### 1. What is a DataFrame?
A DataFrame is the core data structure in pandas: a 2-dimensional, size-mutable tabular structure with labeled axes (rows and columns) in which each column can hold a different data type.

Think of it as an in-memory Excel spreadsheet with built-in high-performance operations.

```python
# Creating a DataFrame from a dictionary:
# each key becomes a column label and each list holds that column's values
import pandas as pd

data = {
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
```

```
  Employee Department  Salary
0    Alice         HR   50000
1      Bob    Finance   60000
2  Charlie         IT   70000
```
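The same table can also be built row by row from a list of dictionaries, a common shape for JSON-like records. A minimal sketch reusing the three employees; `records`, `df_records` and `df_columns` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Each dict is one row; its keys become the column labels
records = [
    {'Employee': 'Alice', 'Department': 'HR', 'Salary': 50000},
    {'Employee': 'Bob', 'Department': 'Finance', 'Salary': 60000},
    {'Employee': 'Charlie', 'Department': 'IT', 'Salary': 70000},
]
df_records = pd.DataFrame(records)

# The dict-of-lists form used above, for comparison
df_columns = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

print(df_records.equals(df_columns))  # True: same labels, values and dtypes
```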


### Basic Attributes & Metadata

| Attribute    | Description               | Example           |
| ------------ | ------------------------- | ----------------- |
| `df.shape`   | Returns (rows, columns)   | `(3, 3)`          |
| `df.columns` | Column labels (an `Index` object) | `Index([...])`    |
| `df.index`   | Index object (row labels) | `RangeIndex(...)` |
| `df.dtypes`  | Data type of each column  | `object, int64`   |
| `df.info()`  | Method that prints dtypes, non-null counts, and memory usage | -                 |
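A quick way to check these attributes on the sample data. This sketch rebuilds the table under the hypothetical name `staff` so it does not disturb `df`; the dtype reported for text columns depends on the pandas version (`object` traditionally, a dedicated string dtype in newer releases).

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

print(staff.shape)          # (3, 3)
print(list(staff.columns))  # ['Employee', 'Department', 'Salary']
print(staff.index)          # RangeIndex(start=0, stop=3, step=1)
print(staff.dtypes)         # Salary is int64; text columns depend on the pandas version
staff.info()                # prints the index, dtypes, non-null counts and memory usage
```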


### Viewing Data
| Method          | Description                    | Example         |
| --------------- | ------------------------------ | --------------- |
| `df.head(n)`    | First n rows                   | `df.head(5)`    |
| `df.tail(n)`    | Last n rows                    | `df.tail(5)`    |
| `df.sample(n)`  | Random n rows                  | `df.sample(2)`  |
| `df.describe()` | Summary statistics (numerical) | `df.describe()` |
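A sketch of the viewing methods on the same three rows (again as a hypothetical `staff` frame). Passing `random_state` to `sample` is an addition here; it makes the random draw repeatable between runs.

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

first_two = staff.head(2)                    # rows labeled 0 and 1
last_one = staff.tail(1)                     # row labeled 2
picked = staff.sample(n=2, random_state=42)  # random_state fixes the random draw
stats = staff.describe()                     # count, mean, std, min, quartiles, max of Salary

print(first_two)
print(last_one)
print(picked)
print(stats)
```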


### Selection and Filtering

```python
# Column access
print(df['Employee'])              # a single label returns a Series
print(df[['Employee', 'Salary']])  # a list of labels returns a DataFrame
```

```
0      Alice
1        Bob
2    Charlie
Name: Employee, dtype: object
  Employee  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   70000
```


```python
# Row access
print(df.iloc[0])  # by integer position
print(df.loc[0])   # by index label
```

```
Employee      Alice
Department       HR
Salary        50000
Name: 0, dtype: object
Employee      Alice
Department       HR
Salary        50000
Name: 0, dtype: object
```
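In the cell above, `iloc[0]` and `loc[0]` return the same row only because the default `RangeIndex` makes labels and positions coincide. The sketch below, on a hypothetical copy named `staff`, shows where the two diverge: slicing, and a frame whose rows have been dropped.

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

# Slices differ: .loc includes the end label, .iloc excludes the end position
loc_slice = staff.loc[0:1]    # labels 0 and 1 -> 2 rows
iloc_slice = staff.iloc[0:1]  # position 0 only -> 1 row

# After dropping row 0, the labels are 1 and 2 but the positions are 0 and 1
trimmed = staff.drop(index=0)
by_position = trimmed.iloc[0]  # first remaining row: Bob
by_label = trimmed.loc[2]      # row labeled 2: Charlie
# trimmed.loc[0] would now raise KeyError, because label 0 no longer exists

# Both accept a column selector as the second argument
bob_salary = trimmed.loc[1, 'Salary']  # label-based: row 1, column 'Salary'
charlie_dept = trimmed.iloc[1, 1]      # position-based: row 1, column 1 ('Department')

print(by_position['Employee'], by_label['Employee'], bob_salary, charlie_dept)
```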


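```python
# Boolean filtering: keep only the rows where the condition is True
df[df['Salary'] > 55000]
```

|   | Employee | Department | Salary |
| - | -------- | ---------- | ------ |
| 1 | Bob      | Finance    | 60000  |
| 2 | Charlie  | IT         | 70000  |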
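Conditions can be combined with `&` (and), `|` (or) and `~` (not), each wrapped in parentheses; Python's `and`/`or` raise a `ValueError` on a Series because its truth value is ambiguous. A minimal sketch on a hypothetical `staff` copy:

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

# OR: well paid, or in HR
high_or_hr = staff[(staff['Salary'] > 55000) | (staff['Department'] == 'HR')]
# AND: well paid and in IT
it_and_high = staff[(staff['Salary'] > 55000) & (staff['Department'] == 'IT')]

# isin() tests membership against a list of values
finance_or_it = staff[staff['Department'].isin(['Finance', 'IT'])]

# .loc filters rows and selects a column in one step
names = staff.loc[staff['Salary'] > 55000, 'Employee']

print(high_or_hr)
print(it_and_high)
print(finance_or_it)
print(names.tolist())  # ['Bob', 'Charlie']
```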


### Adding / Modifying Columns

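```python
# Adding a new column: 10% of the current (pre-raise) salary
df['Bonus'] = df['Salary'] * 0.10

# Modifying an existing column: a flat raise of 5000
df['Salary'] = df['Salary'] + 5000

df
```

|   | Employee | Department | Salary | Bonus  |
| - | -------- | ---------- | ------ | ------ |
| 0 | Alice    | HR         | 55000  | 5000.0 |
| 1 | Bob      | Finance    | 65000  | 6000.0 |
| 2 | Charlie  | IT         | 75000  | 7000.0 |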
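The assignments above modify `df` in place. `DataFrame.assign` is an alternative that returns a new frame and leaves the original alone; a sketch reproducing the same bonus and raise on a hypothetical `staff` copy:

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 70000],
})

raised = staff.assign(
    Bonus=staff['Salary'] * 0.10,         # evaluated before the call, from pre-raise salaries
    Salary=lambda d: d['Salary'] + 5000,  # a callable receives the frame being built
)

print(raised)
print(staff)  # unchanged: no Bonus column, original salaries
```

Keyword arguments are applied in order, so the `Salary` lambda sees the frame after `Bonus` has been added; `Bonus` itself comes from the pre-raise salaries, matching the cell above.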


### Dropping Rows or Columns

```python
df.drop(columns='Bonus', inplace=True)  # drop a column
df.drop(index=1, inplace=True)          # drop a row by index label
df
```

|   | Employee | Department | Salary |
| - | -------- | ---------- | ------ |
| 0 | Alice    | HR         | 55000  |
| 2 | Charlie  | IT         | 75000  |
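`inplace=True` mutates `df` and returns `None`. A sketch of the reassignment style instead, plus `reset_index` to close the gap left in the labels (the surviving rows above are labeled 0 and 2). `staff` is a hypothetical copy matching the state before the drop.

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Department': ['HR', 'Finance', 'IT'],
    'Salary': [55000, 65000, 75000],
    'Bonus': [5000.0, 6000.0, 7000.0],
})

# Without inplace=True, drop() returns a new DataFrame and leaves staff untouched
trimmed = staff.drop(columns='Bonus').drop(index=1)

# The surviving labels are 0 and 2; reset_index renumbers them 0, 1
# (drop=True discards the old labels instead of adding them as a column)
renumbered = trimmed.reset_index(drop=True)

print(trimmed)
print(renumbered)
```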


### Sorting

```python
print(df.sort_values(by='Salary', ascending=False))  # highest salary first
print(df.sort_values(by='Salary', ascending=True))   # lowest salary first
```

```
  Employee Department  Salary
2  Charlie         IT   75000
0    Alice         HR   55000
  Employee Department  Salary
0    Alice         HR   55000
2  Charlie         IT   75000
```
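Sorting also accepts several keys with a per-key direction, and `nlargest` returns the top rows directly. This sketch uses a hypothetical four-row variant of the table (the employee `Dana` and the department assignments are invented) so that each department has more than one row.

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Department': ['HR', 'IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 55000],
})

# Department A to Z, then Salary high to low within each department
ordered = staff.sort_values(by=['Department', 'Salary'], ascending=[True, False])

# The two highest salaries, without sorting the whole frame by hand
top_two = staff.nlargest(2, 'Salary')

print(ordered)
print(top_two)
```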


### Aggregations and Grouping

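```python
df = pd.DataFrame(data)  # rebuild the original three rows: earlier cells changed df in place
df.groupby('Department')['Salary'].mean()  # average salary per department
```

```
Department
Finance    60000.0
HR         50000.0
IT         70000.0
Name: Salary, dtype: float64
```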
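`groupby` can compute several aggregates at once with named aggregation, and `transform` broadcasts a per-group result back onto every row (handy for features such as salary relative to the department average). A sketch with a hypothetical fourth employee so that `IT` has two rows:

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Department': ['HR', 'Finance', 'IT', 'IT'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Named aggregation: output_column=(input_column, function)
summary = staff.groupby('Department').agg(
    headcount=('Employee', 'count'),
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
)

# transform returns one value per original row, aligned to staff's index
staff['dept_avg'] = staff.groupby('Department')['Salary'].transform('mean')

print(summary)
print(staff)
```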

### Missing Data Handling

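```python
df['Bonus'] = 0  # add a Bonus column (no missing values yet)
df
```

|   | Employee | Department | Salary | Bonus |
| - | -------- | ---------- | ------ | ----- |
| 0 | Alice    | HR         | 50000  | 0     |
| 1 | Bob      | Finance    | 60000  | 0     |
| 2 | Charlie  | IT         | 70000  | 0     |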


```python
print(df.isnull())   # detect nulls (all False here: nothing is missing yet)
print(df.dropna())   # drop rows containing any null
print(df.fillna(0))  # replace nulls with 0
```

```
   Employee  Department  Salary  Bonus
0     False       False   False  False
1     False       False   False  False
2     False       False   False  False
  Employee Department  Salary  Bonus
0    Alice         HR   50000      0
1      Bob    Finance   60000      0
2  Charlie         IT   70000      0
  Employee Department  Salary  Bonus
0    Alice         HR   50000      0
1      Bob    Finance   60000      0
2  Charlie         IT   70000      0
```



### Merge, Join, and Concatenate

```python
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})
df2 = pd.DataFrame({
    'ID': [2, 3, 4],
    'Salary': [60000, 70000, 80000]
})
```

```python
print(pd.concat([df1, df2]))        # stack rows vertically; columns missing from one frame become NaN
print(pd.merge(df1, df2, on='ID'))  # SQL-style inner join: only IDs present in both frames
```

```
   ID     Name   Salary
0   1    Alice      NaN
1   2      Bob      NaN
2   3  Charlie      NaN
0   2      NaN  60000.0
1   3      NaN  70000.0
2   4      NaN  80000.0
   ID     Name  Salary
0   2      Bob   60000
1   3  Charlie   70000
```
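`merge` defaults to an inner join, which is why IDs 1 and 4 disappear above. A sketch of the other join types and of `DataFrame.join` (which aligns on the index); `people` and `pay` are hypothetical names for copies of `df1` and `df2`. `ignore_index=True` in `concat` avoids the repeated 0, 1, 2 labels seen in the stacked output.

```python
import pandas as pd

people = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
pay = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 70000, 80000]})

inner = pd.merge(people, pay, on='ID')             # only IDs in both: 2, 3
left = pd.merge(people, pay, on='ID', how='left')  # every ID from people: 1, 2, 3
# Full outer join; indicator=True adds a _merge column saying where each row came from
outer = pd.merge(people, pay, on='ID', how='outer', indicator=True)

# join() aligns on the index rather than a column; its default is how='left'
joined = people.set_index('ID').join(pay.set_index('ID'))

# ignore_index=True renumbers the stacked rows 0..5
stacked = pd.concat([people, pay], ignore_index=True)

print(outer)
print(joined)
```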


### Exporting & Importing

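```python
df['Bonus'] = [1000.0, None, 0.0]     # assumed setup (not in the original cell): one missing Bonus to clean up later
df.to_csv('output.csv', index=False)  # export to CSV without the row index
```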


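```python
df = pd.read_csv('output.csv')  # import from CSV
df
```

|   | Employee | Department | Salary | Bonus  |
| - | -------- | ---------- | ------ | ------ |
| 0 | Alice    | HR         | 50000  | 1000.0 |
| 1 | Bob      | Finance    | 60000  | NaN    |
| 2 | Charlie  | IT         | 70000  | 0.0    |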
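A column named `Unnamed: 0` appears when a CSV written with its index (the `to_csv` default) is read back, which is why the export above passes `index=False`. A sketch that round-trips through an in-memory buffer (`io.StringIO`) instead of a file; `demo` is a hypothetical frame.

```python
import io

import pandas as pd

demo = pd.DataFrame({'Employee': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# to_csv() with no path returns the CSV text; by default it includes the index
with_index = pd.read_csv(io.StringIO(demo.to_csv()))
print(with_index.columns.tolist())     # ['Unnamed: 0', 'Employee', 'Salary']

# index=False leaves the index out, so no extra column comes back
without_index = pd.read_csv(io.StringIO(demo.to_csv(index=False)))
print(without_index.columns.tolist())  # ['Employee', 'Salary']

# If the index was written, index_col=0 turns that first column back into the index
restored = pd.read_csv(io.StringIO(demo.to_csv()), index_col=0)
print(restored.columns.tolist())       # ['Employee', 'Salary']
```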


```python
print(df.isnull())   # detect nulls: Bob's Bonus is now missing
print(df.dropna())   # drop rows containing any null
print(df.fillna(0))  # replace nulls with 0
```

```
   Employee  Department  Salary  Bonus
0     False       False   False  False
1     False       False   False   True
2     False       False   False  False
  Employee Department  Salary   Bonus
0    Alice         HR   50000  1000.0
2  Charlie         IT   70000     0.0
  Employee Department  Salary   Bonus
0    Alice         HR   50000  1000.0
1      Bob    Finance   60000     0.0
2  Charlie         IT   70000     0.0
```
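Beyond a single fill value, `fillna` accepts a per-column dictionary and `dropna` can look at selected columns only. A sketch on a hypothetical frame `gaps`, with a missing salary added for illustration:

```python
import pandas as pd

gaps = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, None, 70000],  # hypothetical gap, for illustration
    'Bonus': [1000.0, None, 0.0],
})

missing_per_column = gaps.isnull().sum()  # number of nulls in each column

# A different fill value per column: the column mean for Salary, 0 for Bonus
filled = gaps.fillna({'Salary': gaps['Salary'].mean(), 'Bonus': 0})

# Drop rows only when a particular column is missing
with_bonus = gaps.dropna(subset=['Bonus'])

print(missing_per_column)
print(filled)
print(with_bonus)
```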


### Advanced Operations (Used in ML & Gen AI)
| Operation                    | Use Case                                       |
| ---------------------------- | ---------------------------------------------- |
| `df.apply(func)`             | Column-wise transformations (`axis=1` for row-wise) |
| `df.map(func)`               | Element-wise transformation of every cell (pandas 2.1+, formerly `df.applymap`); `Series.map` does the same for one column |
| `df.pivot_table()`           | Summary aggregation                            |
| `df.corr(numeric_only=True)` | Correlation matrix of numeric columns (used in feature selection) |
| `df.query("Salary > 60000")` | SQL-like querying                              |
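A sketch exercising each row of the table on a small hypothetical frame; the `Years` column and the fourth employee are invented for illustration. It uses `Series.map` rather than `DataFrame.map` so that it also runs on pandas versions before 2.1.

```python
import pandas as pd

staff = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Department': ['HR', 'Finance', 'IT', 'IT'],
    'Salary': [50000, 60000, 70000, 80000],
    'Years': [2, 5, 7, 10],  # hypothetical tenure in years
})

# corr: Pearson correlation between numeric columns; numeric_only=True skips text columns
corr = staff.corr(numeric_only=True)

# pivot_table: mean Salary per Department
pivot = staff.pivot_table(values='Salary', index='Department', aggfunc='mean')

# query: filter rows with a string expression
senior = staff.query('Salary > 60000')

# apply with axis=1: run a function once per row
staff['Label'] = staff.apply(lambda row: f"{row['Employee']} ({row['Department']})", axis=1)

# Series.map: transform each element of one column
staff['Salary_k'] = staff['Salary'].map(lambda s: s / 1000)

print(corr)
print(pivot)
print(senior)
print(staff)
```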


### Summary
| Capability                   | Business Value                                       |
| ---------------------------- | ---------------------------------------------------- |
| Fast columnar operations     | Efficient data wrangling and transformation          |
| Native missing data handling | Robust pipelines for real-world, incomplete datasets |
| Integration-ready            | Seamless use with NumPy, scikit-learn, TensorFlow    |
