## 📊 Pandas DataFrame - Explanation

A **DataFrame** is the core data structure in pandas used to store and manipulate **tabular data**, similar to an Excel sheet or a SQL table.

It consists of **rows** and **columns**, where:
- Each **column** is a `Series` (with a name and data type)
- Each **row** represents a single record or observation
- Each **cell** is a value at the intersection of a row and column
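
A minimal sketch of those three building blocks, using a small hypothetical table of users (the names and values are made up for illustration). A column and a row both come back as a `Series`, while a single cell is a plain scalar:

```python
import pandas as pd

# Hypothetical example data: three users with a name, an age and a score
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo"],
        "age": [31, 25, 40],
        "score": [88.5, 92.0, 79.5],
    }
)

col = df["age"]           # a single column is a pandas Series
row = df.iloc[0]          # a single row is also returned as a Series
cell = df.loc[1, "name"]  # a single cell is a scalar value

print(type(col).__name__)  # Series
print(type(row).__name__)  # Series
print(cell)                # Ben
```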

### ✅ Key Features:
- **2D labeled structure**: rows and columns both have labels (indexes and column names)
- **Heterogeneous data**: each column can have a different data type (e.g., string, integer, float)
- **Flexible indexing**: rows and columns can be accessed by integer position (`iloc`) or by label (`loc`)
- **Powerful tools**: built-in methods for filtering, sorting, grouping, merging, handling missing data, and more
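
To make the "heterogeneous data" and "powerful tools" points concrete, here is a short sketch on a hypothetical sales table (column names and numbers are invented for the example). It shows per-column dtypes, then a boolean filter, a sort and a group-by sum:

```python
import pandas as pd

# Hypothetical sales table mixing a text column, an integer column and a float column
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East"],
        "units": [10, 4, 7, 12],
        "price": [2.5, 3.0, 2.5, 1.75],
    },
    index=["a", "b", "c", "d"],
)

print(sales.dtypes)  # each column keeps its own dtype (text, integer, float)

big = sales[sales["units"] > 5]                        # boolean mask keeps rows a, c, d
by_units = sales.sort_values("units", ascending=False)  # row order becomes d, a, c, b
totals = sales.groupby("region")["units"].sum()        # units summed per region

print(big, by_units, totals, sep="\n\n")
```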

### 🧠 Analogy:
If a `Series` is like a single column or a single record (e.g., one user's data),
then a `DataFrame` is like a full **table of users**, with multiple records (rows) and fields (columns).
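
The analogy can be sketched directly: one hypothetical user record as a `Series`, and several records of the same shape stacked into a `DataFrame` (the user data here is illustrative only):

```python
import pandas as pd

# One user's record as a Series: the field names become the index
user = pd.Series({"name": "Ana", "age": 31, "city": "Lisbon"})

# Several records with the same fields stacked into a table
users = pd.DataFrame(
    [
        {"name": "Ana", "age": 31, "city": "Lisbon"},
        {"name": "Ben", "age": 25, "city": "Oslo"},
    ]
)

print(user)
print(users)
print(users.shape)  # (2, 3): 2 records (rows), 3 fields (columns)
```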

DataFrames are the main tool used for:
- Cleaning and preparing data
- Analyzing and summarizing data
- Building datasets for machine learning and statistics
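
A compact sketch of those three uses on a hypothetical measurements table: filling a missing value, printing summary statistics, and splitting features from a target column. Filling with the column mean is just one simple choice, assumed here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements with one missing height
raw = pd.DataFrame(
    {
        "height_cm": [170.0, np.nan, 182.0, 165.0],
        "weight_kg": [65.0, 72.0, 80.0, 58.0],
        "label": [0, 1, 1, 0],
    }
)

# Cleaning: fill the missing height with the mean of the known heights
clean = raw.fillna({"height_cm": raw["height_cm"].mean()})

# Summarizing: count, mean, std, min, quartiles and max per numeric column
print(clean.describe())

# Preparing for a model: separate the feature columns from the target column
X = clean[["height_cm", "weight_kg"]]
y = clean["label"]
print(X.shape, y.shape)  # (4, 2) (4,)
```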


In [None]:
import pandas as pd
import numpy as np
from numpy.random import randn
# Fix the global seed so randn() returns reproducible values.
# Each call advances the generator, so re-run this cell before re-creating df2 to get the same numbers.
np.random.seed(100212251)
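
Why the seed has to be re-run: every `randn()` call advances NumPy's global generator, so only re-seeding replays the same numbers. The sketch below checks that, and also shows `np.random.default_rng`, a local `Generator` that keeps its own state. Note that a `Generator` seeded with the same number produces different values from the legacy `np.random.seed` sequence:

```python
import numpy as np
from numpy.random import randn

# Re-seeding the legacy global generator replays the same sequence
np.random.seed(100212251)
first = randn(3, 3)
np.random.seed(100212251)
second = randn(3, 3)
print(np.array_equal(first, second))  # True

# Alternative: a local Generator object carries its own state,
# so its draws are not affected by other code using np.random
rng = np.random.default_rng(100212251)
third = rng.standard_normal((3, 3))
print(third.shape)  # (3, 3)
```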

In [None]:
# A scalar value (1) is broadcast to every cell of a 3x2 table
df1 = pd.DataFrame(1, index=['X', 'Y', 'Z'], columns=['col1', 'col2'])
df1
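
The scalar constructor above is one of several ways to build the same table. This sketch builds the identical 3x2 frame from a dict of columns and from a 2D NumPy array, then compares them with `DataFrame.equals` (the explicit `int64` dtype is chosen so all three frames match):

```python
import numpy as np
import pandas as pd

# A scalar is broadcast to every cell of the requested shape
ones = pd.DataFrame(1, index=["X", "Y", "Z"], columns=["col1", "col2"])

# The same table from a dict of columns (keys become column names)
from_dict = pd.DataFrame({"col1": [1, 1, 1], "col2": [1, 1, 1]}, index=["X", "Y", "Z"])

# ...or from a 2D NumPy array with explicit labels
from_array = pd.DataFrame(
    np.ones((3, 2), dtype="int64"), index=["X", "Y", "Z"], columns=["col1", "col2"]
)

print(ones.shape)  # (3, 2)
print(ones.equals(from_dict), ones.equals(from_array))
```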

In [None]:
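df1.reset_index()  # returns a new DataFrame with the labels X, Y, Z moved into a column named 'index'; df1 is unchanged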
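
A few related calls around `reset_index()`: `drop=True` discards the old labels instead of keeping them as a column, and `set_index()` reverses the operation. `rename_axis(None)` clears the leftover index name so the result matches the original:

```python
import pandas as pd

df1 = pd.DataFrame(1, index=["X", "Y", "Z"], columns=["col1", "col2"])

moved = df1.reset_index()             # old labels become a column named "index"
dropped = df1.reset_index(drop=True)  # old labels are discarded, new index is 0..n-1
restored = moved.set_index("index").rename_axis(None)  # back to labels X, Y, Z

print(list(moved.columns))  # ['index', 'col1', 'col2']
print(list(dropped.index))  # [0, 1, 2]
print(list(df1.index))      # ['X', 'Y', 'Z'] (df1 itself is unchanged)
```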

In [None]:
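df2 = pd.DataFrame(randn(3, 3), index=['X', 'Y', 'Z'], columns=['C1', 'C2', 'C3'])
df2['C4'] = df2['C2'] + df2['C3']  # new column: element-wise sum of C2 and C3
print(df2[['C1', 'C2']])  # a list of column names (double brackets) returns a DataFrame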

print("-"*20)

print(df2)

print("-"*20)

print(df2.loc['X'])  # .loc selects a row by its label

print("-"*20)

print(df2.iloc[2])  # .iloc selects a row by integer position (X=0, Y=1, Z=2), so this is row 'Z'

print("-"*20)

print(df2.loc[['X', 'Z']])  # a list of labels selects several rows

print("-"*20)

print(df2.loc[['X', 'Z'], ['C1', 'C2']])  # rows and columns together: .loc[row_labels, column_labels]

print("-"*20)

print(df2.loc['X', 'C2'])  # one row label and one column label return a single scalar value
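
Some selection details the cell above does not cover: `.loc` label slices include both endpoints while `.iloc` position slices exclude the stop, a boolean mask filters rows by a condition, and `.at`/`.iat` give fast access to one cell. The frame here mirrors `df2`'s layout but uses a seeded `default_rng(0)` (an arbitrary choice), so its values differ from `df2`:

```python
import numpy as np
import pandas as pd

# Same layout as df2; values come from an arbitrarily seeded Generator
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 3)), index=["X", "Y", "Z"], columns=["C1", "C2", "C3"])

print(df.loc["X":"Y"])  # label slice: rows X AND Y (end label included)
print(df.iloc[0:1])     # position slice: row X only (stop excluded, like Python lists)

print(df[df["C1"] > 0])  # boolean mask: keep rows where C1 is positive

# .at uses labels, .iat uses positions; both address the same single cell here
print(df.at["X", "C2"] == df.iat[0, 1])  # True
```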

In [None]:
new_df2 = df2.drop('C1', axis=1)  # axis=1 drops a column; drop() returns a new DataFrame and leaves df2 unchanged
print(new_df2)

print("-"*20)

new_df2 = df2.drop('X', axis=0)  # axis=0 (the default) drops a row
print(new_df2)

print("-"*20)

df2.drop('C1', axis=1, inplace=True)  # inplace=True modifies df2 itself (and returns None) instead of returning a new DataFrame
print(df2)

print("-"*20)

df2.drop(['C2', 'C3'], axis=1, inplace=True)  # pass a list to drop several columns at once
print(df2)
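
`drop()` also accepts `columns=` and `index=` keywords, which name the axis explicitly instead of relying on `axis=0`/`axis=1`, and `errors="ignore"` skips labels that do not exist. Reassigning the result (rather than `inplace=True`) is a common style preference because the returned frame can be chained and inspected. A small sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"C1": [1, 2], "C2": [3, 4], "C3": [5, 6]}, index=["X", "Y"])

no_c1 = df.drop(columns="C1")  # same as df.drop("C1", axis=1)
no_x = df.drop(index="X")      # same as df.drop("X", axis=0)
both = df.drop(index="Y", columns=["C2", "C3"])  # rows and columns in one call

# Reassignment instead of inplace=True
df = df.drop(columns=["C2", "C3"])

# A missing label normally raises KeyError; errors="ignore" skips it instead
safe = df.drop(columns="not_there", errors="ignore")
print(no_c1, no_x, both, df, safe, sep="\n\n")
```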
