# 🐼 Pandas - Class 2: DataFrame Basics
Welcome to **Class 2**. Today we'll learn how to **inspect**, **select**, and **organize** data inside a DataFrame.

## 1 Inspecting Data
Quick ways to understand your dataset:
- `head(n)` → first *n* rows (default 5)
- `tail(n)` → last *n* rows (default 5)
- `info()` → columns, non-null counts, and dtypes
- `shape` → (rows, columns)
- `dtypes` → data type of each column

In [2]:
import pandas as pd

# Create a small DataFrame
a = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95]
}

b = pd.DataFrame(a)

# 1. See the first rows
print("First 3 rows:")
print(b.head(3))

First 3 rows:
      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79


In [3]:
# 2. See the last rows
print("\nLast 2 rows:")
print(b.tail(2))


Last 2 rows:
    Name  Age  Score
3  David   40     85
4   Emma   22     95


In [4]:
# 3. Get a concise summary
print("\nInfo about DataFrame:")
b.info()  # info() prints its report itself and returns None, so no print() is needed


Info about DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   Score   5 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 252.0+ bytes
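`info()` writes its report to standard output and returns `None`, which is why wrapping it in `print()` produces a trailing `None` line. The sketch below illustrates this and shows one way to capture the report as text through the `buf` parameter. The small DataFrame and the names `result`, `buffer`, and `report` are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# info() prints the summary and returns None
result = df.info()
print(result is None)  # True

# To keep the summary as a string, pass a text buffer via buf=
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print("RangeIndex" in report)  # True
```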


In [5]:
# 4. Shape of the DataFrame (rows, columns)
print("\nShape:")
print(b.shape)

# 5. Data types of each column
print("\nData types:")
print(b.dtypes)


Shape:
(5, 3)

Data types:
Name     object
Age       int64
Score     int64
dtype: object
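Because `shape` is a plain tuple (an attribute, not a method), it can be unpacked directly. A minimal sketch, assuming the same five-row data as above; the names `n_rows` and `n_cols` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95],
})

# shape is a (rows, columns) tuple, so no parentheses are needed to read it
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 5 3

# len(df) gives the number of rows only
print(len(df))  # 5

# The dtype of a single column is available through the column itself
print(df["Age"].dtype)
```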


## 2 Selecting Columns & Rows
Ways to access parts of a DataFrame:
- Single column → `df['col']`
- Multiple columns → `df[['col1','col2']]`
- Label-based selection → `df.loc[row_label, col_label]`
- Integer-index selection → `df.iloc[row_idx, col_idx]`

Tip: `:` means "all rows/columns" in that dimension.

In [12]:

# 1. Single column
print("Single column (Name):")
print(b["Name"])

# 2. Multiple columns
print("\nMultiple columns (Name & Score):")
print(b[["Name", "Score"]])
print()
print(b.head())

Single column (Name):
0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object

Multiple columns (Name & Score):
      Name  Score
0    Alice     88
1      Bob     92
2  Charlie     79
3    David     85
4     Emma     95

      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95
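Single and double brackets return different types, which is easy to miss in the printed output. The sketch below, using a made-up copy of the data, shows that `df["Name"]` is a one-dimensional Series while `df[["Name"]]` is a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Score": [88, 92, 79, 85, 95],
})

single = df["Name"]    # single brackets: a Series
double = df[["Name"]]  # double brackets (a list of names): a DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame

# The shapes differ as well: 1-D versus 2-D
print(single.shape, double.shape)  # (5,) (5, 1)
```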


In [16]:
# 3. Label-based selection with .loc
print("\nLabel-based selection (row index 2, column 'Score'):")
print(b.loc[2, "Score"])

print("\nRows 0 to 3 with columns Name & Age:")
print(b.loc[0:3, ["Name", "Age"]])  # .loc slices include the end label


Label-based selection (row index 2, column 'Score'):
79

Rows 0 to 3 with columns Name & Age:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40


In [9]:
# 4. Integer-index selection with .iloc
print("\nRow at index 3 (all columns):")
print(b.iloc[3])

print("\nRows 0 to 2, columns 0 to 1:")
print(b.iloc[0:3, 0:2])


Row at index 3 (all columns):
Name     David
Age         40
Score       85
Name: 3, dtype: object

Rows 0 to 2, columns 0 to 1:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
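With the default `0, 1, 2, ...` index, labels and positions coincide, so the difference between `.loc` and `.iloc` is hard to see. The sketch below uses a hypothetical index of `10, 20, 30, 40` to separate the two: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Score": [88, 92, 79, 85]},
    index=[10, 20, 30, 40],  # labels that differ from positions
)

# .loc: label-based, end label 30 is included
by_label = df.loc[10:30, "Name"]
print(by_label.tolist())  # ['Alice', 'Bob', 'Charlie']

# .iloc: position-based, end position 2 is excluded
by_position = df.iloc[0:2, 0]
print(by_position.tolist())  # ['Alice', 'Bob']
```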


## 3 Index & Columns Overview
- `df.index` → row labels
- `df.columns` → column names
- `df.values` → the data as a NumPy array (treat it as read-only and prefer DataFrame ops for changes; `df.to_numpy()` is the accessor pandas recommends)

In [19]:
print(b)

# Row labels (index)
print("\nIndex:")
print(b.index)

# Column names
print("\nColumns:")
print(b.columns)

      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95

Index:
RangeIndex(start=0, stop=5, step=1)

Columns:
Index(['Name', 'Age', 'Score'], dtype='object')


In [20]:
# Underlying NumPy array
print("\nValues:")
print(b.values)

# Optional: shape of the underlying array
print("\nShape of values:")
print(b.values.shape)


Values:
[['Alice' 25 88]
 ['Bob' 30 92]
 ['Charlie' 35 79]
 ['David' 40 85]
 ['Emma' 22 95]]

Shape of values:
(5, 3)
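The array above mixes text and numbers, so NumPy has to fall back to a generic `object` dtype. A minimal sketch, using `to_numpy()` (the accessor the pandas documentation recommends over `.values`) on made-up data, shows that selecting only the numeric columns first yields an integer array instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [88, 92, 79],
})

# Mixed text and numbers: the array uses the generic object dtype
mixed = df.to_numpy()
print(mixed.dtype)

# Numeric columns only: the array keeps an integer dtype
numeric = df[["Age", "Score"]].to_numpy()
print(numeric.dtype)
print(numeric.shape)  # (3, 2)
```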


## 4 Renaming Columns & Setting an Index
- Rename columns → `df.rename(columns={'old':'new'}, inplace=False)`
- Set a column as index → `df.set_index('column_name', inplace=False)`
- Reset index → `df.reset_index(inplace=False)`

By default (`inplace=False`), each method returns a new DataFrame and leaves the original unchanged, so remember to assign the result back to a variable.

In [24]:

# 1. Rename columns
c = b.rename(columns={"Score": "Marks"})
print("After renaming 'Score' to 'Marks':")
print(c)

# b.rename(columns={"Score": "Marks"}, inplace=True)
# With inplace=True no assignment is needed: the original DataFrame b is modified directly.

After renaming 'Score' to 'Marks':
      Name  Age  Marks
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95
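A quick way to confirm that `rename` does not touch the original is to compare the column names before and after. This is a small sketch on made-up data; `renamed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [88, 92]})

# rename returns a new DataFrame; df itself keeps its old column names
renamed = df.rename(columns={"Score": "Marks"})
print(list(df.columns))       # ['Name', 'Score']
print(list(renamed.columns))  # ['Name', 'Marks']

# To keep the change under the same name, assign the result back
df = df.rename(columns={"Score": "Marks"})
print(list(df.columns))       # ['Name', 'Marks']
```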


In [22]:
# 2. Set a column as index
d = c.set_index("Name")
print("\nAfter setting 'Name' as index:")
print(d)


After setting 'Name' as index:
         Age  Marks
Name               
Alice     25     88
Bob       30     92
Charlie   35     79
David     40     85
Emma      22     95
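One practical effect of setting `Name` as the index is that `.loc` can then look rows up by name instead of by number. A minimal sketch, assuming the same renamed data as above; `indexed`, `bob_marks`, and `subset` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Marks": [88, 92, 79, 85, 95],
})

indexed = df.set_index("Name")

# Row labels are now the names, so .loc accepts them directly
bob_marks = indexed.loc["Bob", "Marks"]
print(bob_marks)  # 92

# A list of labels selects several rows at once
subset = indexed.loc[["Alice", "Emma"], "Age"]
print(subset.tolist())  # [25, 22]
```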


In [23]:
# 3. Reset the index back to default
e = d.reset_index()
print("\nAfter resetting the index:")
print(e)


After resetting the index:
      Name  Age  Marks
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79
3    David   40     85
4     Emma   22     95


## 5 Basic Attributes Recap
- `df.columns` → names of all columns
- `df.index` → row labels
- `df.values` → NumPy array (data only)

These are handy for quick checks and loops (though vectorized ops are preferred).
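To make the loop-versus-vectorized remark concrete, the sketch below loops over `df.columns` for a quick check and then applies a single vectorized operation to a whole column. The data and the 5-point adjustment stored in `curved` are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [88, 92, 79],
})

# Looping over column names is fine for quick inspection
for col in df.columns:
    print(col, df[col].dtype)

# For calculations, operate on the whole column at once instead of row by row
curved = df["Score"] + 5
print(curved.tolist())  # [93, 97, 84]
```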

## Mini Practice
1. Create a DataFrame with at least 5 rows and 4 columns (mix of numeric + text).
2. Show the first 3 rows, last 2 rows, and the info.


In [25]:
import pandas as pd

# 1. Create the DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "City": ["Delhi", "Mumbai", "Pune", "Bangalore", "Chennai"],
    "Score": [85, 91, 78, 88, 95]
}

df = pd.DataFrame(data)

# 2. Show the first 3 rows, last 2 rows, and info
print("First 3 rows:")
print(df.head(3))

print("\nLast 2 rows:")
print(df.tail(2))

print("\nInfo about DataFrame:")
df.info()  # info() prints its report itself and returns None, so no print() is needed

First 3 rows:
      Name  Age    City  Score
0    Alice   25   Delhi     85
1      Bob   30  Mumbai     91
2  Charlie   35    Pune     78

Last 2 rows:
    Name  Age       City  Score
3  David   40  Bangalore     88
4   Emma   22    Chennai     95

Info about DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
 3   Score   5 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 292.0+ bytes


## 3. Select a sub-DataFrame with only 2 columns.

In [26]:
# 3. Select a sub-DataFrame with only 2 columns (Name and Score)
sub_df = df[["Name", "Score"]]
print("\nSub-DataFrame with Name and Score:")
print(sub_df)


Sub-DataFrame with Name and Score:
      Name  Score
0    Alice     85
1      Bob     91
2  Charlie     78
3    David     88
4     Emma     95


## 4. Rename one column and set any column as index, then reset it back.


In [29]:
# 4. Rename one column and set any column as index, then reset it back
df_renamed = df.rename(columns = {"Score":"Marks"})
print("\nAfter renaming Score to Marks:")
print(df_renamed)

df_indexed = df_renamed.set_index("Name")
print("\nAfter setting Name as index:")
print(df_indexed)

df_reset = df_indexed.reset_index()
print("\nAfter resetting the index:")
print(df_reset)


After renaming Score to Marks:
      Name  Age       City  Marks
0    Alice   25      Delhi     85
1      Bob   30     Mumbai     91
2  Charlie   35       Pune     78
3    David   40  Bangalore     88
4     Emma   22    Chennai     95

After setting Name as index:
         Age       City  Marks
Name                          
Alice     25      Delhi     85
Bob       30     Mumbai     91
Charlie   35       Pune     78
David     40  Bangalore     88
Emma      22    Chennai     95

After resetting the index:
      Name  Age       City  Marks
0    Alice   25      Delhi     85
1      Bob   30     Mumbai     91
2  Charlie   35       Pune     78
3    David   40  Bangalore     88
4     Emma   22    Chennai     95


## 5. Print `df.shape`, `df.dtypes`, and `df.columns`.

In [28]:
# 5. Print df.shape, df.dtypes, and df.columns
print("\nShape of original DataFrame:")
print(df.shape)

print("\nData types of each column:")
print(df.dtypes)

print("\nColumn names:")
print(df.columns)


Shape of original DataFrame:
(5, 4)

Data types of each column:
Name     object
Age       int64
City     object
Score     int64
dtype: object

Column names:
Index(['Name', 'Age', 'City', 'Score'], dtype='object')


---
## ✅ Summary
- You learned to **inspect** a DataFrame (`head`, `tail`, `info`, `shape`, `dtypes`).
- You practiced **selecting** columns/rows using `[]`, `.loc`, and `.iloc`.
- You explored **index/columns** and basic attributes.
- You **renamed** columns and **set/reset** an index.

Next up: **Data Cleaning Essentials** (missing values, types, duplicates).