
This section teaches how to quickly examine and explore your datasets using:

- `head()` and `tail()`
- `info()`
- `describe()`
- Shape and size
- Column names
- Data types
- Additional tools: missing values, unique values, correlations, memory usage, and random sampling

These tools help you understand the structure, quality, and completeness of your data.


🟦 1. Prepare Sample DataFrame

In [1]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"]
}

df = pd.DataFrame(data)
df

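|   | Name | Age | Score | City |
|---|------|-----|-------|------|
| 0 | Alice | 24.0 | 88.0 | Toronto |
| 1 | Bob | 19.0 | 75.0 | Montreal |
| 2 | Charlie | 22.0 | 90.0 | Vancouver |
| 3 | David | 28.0 | 65.0 | Calgary |
| 4 | Eva | 32.0 | NaN | Toronto |
| 5 | Frank | NaN | 72.0 | Calgary |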
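
Age and Score are displayed as floats (24.0 rather than 24) because pandas stores the missing `None` as `NaN`, which is a float value, so the whole column becomes `float64`. The minimal sketch below, reusing the Age values from the sample data, shows this and one alternative: the nullable `"Int64"` dtype, which keeps integers and marks the gap as `<NA>`. Whether that dtype suits your own data is a judgment call.

```python
import pandas as pd

# Same Age values as the sample DataFrame, including one missing entry
raw_ages = [24, 19, 22, 28, 32, None]

# Default behaviour: None becomes NaN and the column is upcast to float64
ages_float = pd.Series(raw_ages)
print(ages_float.dtype)

# Nullable integer dtype: values stay integers, the gap becomes <NA>
ages_int = pd.Series(raw_ages, dtype="Int64")
print(ages_int.dtype)
print(ages_int.isna().sum())
```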


🟦 2. head() and tail()

`head()` and `tail()` return the first and last rows of the DataFrame (5 by default; pass a number to change this), giving you a quick preview of the data.

In [3]:
df.head()    # first 5 rows

|   | Name | Age | Score | City |
|---|------|-----|-------|------|
| 0 | Alice | 24.0 | 88.0 | Toronto |
| 1 | Bob | 19.0 | 75.0 | Montreal |
| 2 | Charlie | 22.0 | 90.0 | Vancouver |
| 3 | David | 28.0 | 65.0 | Calgary |
| 4 | Eva | 32.0 | NaN | Toronto |


In [4]:
df.tail(3)   # last 3 rows

|   | Name | Age | Score | City |
|---|------|-----|-------|------|
| 3 | David | 28.0 | 65.0 | Calgary |
| 4 | Eva | 32.0 | NaN | Toronto |
| 5 | Frank | NaN | 72.0 | Calgary |
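
Both methods accept a negative `n` as well: `head(-n)` returns every row except the last `n`, and `tail(-n)` every row except the first `n`. A small self-contained sketch, recreating part of the sample data so it runs on its own:

```python
import pandas as pd

# Subset of the sample DataFrame; enough to show which rows come back
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
})

first_two = df.head(2)             # rows 0-1
last_two = df.tail(2)              # rows 4-5
all_but_last_two = df.head(-2)     # rows 0-3
all_but_first_two = df.tail(-2)    # rows 2-5

print(first_two)
print(all_but_last_two)
```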


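🟦 3. info(): DataFrame Summary

`info()` prints a concise summary showing:
- the index type and number of rows (entries)
- column names
- non-null counts per column
- data types
- memory usage (a trailing `+`, as in `324.0+ bytes`, means the contents of object columns were not measured, so the figure is a lower bound)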

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    6 non-null      object 
 1   Age     5 non-null      float64
 2   Score   5 non-null      float64
 3   City    6 non-null      object 
dtypes: float64(2), object(2)
memory usage: 324.0+ bytes
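
`info()` prints its report and returns `None`, so its result cannot simply be assigned to a variable. A minimal sketch of two workarounds, assuming you want the report as a string (for logging, say): the `buf` parameter accepts any writable text buffer, and `memory_usage="deep"` also measures the string contents, which drops the `+`. For the non-null counts alone, `df.count()` returns them as a Series.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

# Redirect the info() report into an in-memory text buffer
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")
report = buffer.getvalue()
print(report)

# Non-null counts per column as a Series (the same numbers info() shows)
non_null = df.count()
print(non_null)
```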


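🟦 4. describe(): Descriptive Statistics

By default, `describe()` computes summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for numeric columns only, ignoring missing values. Passing `include="all"` also summarizes non-numeric columns with `count`, `unique`, `top` (most frequent value) and `freq` (how often it occurs).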

In [6]:
df.describe()

|       | Age | Score |
|-------|-----|-------|
| count | 5.0 | 5.0 |
| mean | 25.0 | 78.0 |
| std | 5.09902 | 10.700467 |
| min | 19.0 | 65.0 |
| 25% | 22.0 | 72.0 |
| 50% | 24.0 | 75.0 |
| 75% | 28.0 | 88.0 |
| max | 32.0 | 90.0 |
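
Two details of this table are worth confirming yourself: `std` is the sample standard deviation (divided by n - 1, not n), and the quartile rows can be swapped for other cut points via the `percentiles` argument. The sketch below recomputes the Age standard deviation by hand from the five non-missing values and requests the 10th, 50th and 90th percentiles instead of the quartiles:

```python
import pandas as pd

ages = pd.Series([24, 19, 22, 28, 32, None], name="Age")

# Custom percentiles instead of the default quartiles
stats = ages.describe(percentiles=[0.1, 0.5, 0.9])
print(stats)

# Recompute the sample standard deviation (ddof=1) from the non-missing values
valid = ages.dropna()
n = len(valid)
manual_std = (((valid - valid.mean()) ** 2).sum() / (n - 1)) ** 0.5
print(manual_std)
```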


In [7]:
df.describe(include="all")

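|        | Name | Age | Score | City |
|--------|------|-----|-------|------|
| count | 6 | 5.0 | 5.0 | 6 |
| unique | 6 | NaN | NaN | 4 |
| top | Alice | NaN | NaN | Toronto |
| freq | 1 | NaN | NaN | 2 |
| mean | NaN | 25.0 | 78.0 | NaN |
| std | NaN | 5.09902 | 10.700467 | NaN |
| min | NaN | 19.0 | 65.0 | NaN |
| 25% | NaN | 22.0 | 72.0 | NaN |
| 50% | NaN | 24.0 | 75.0 | NaN |
| 75% | NaN | 28.0 | 88.0 | NaN |
| max | NaN | 32.0 | 90.0 | NaN |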
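
`include="all"` mixes numeric and categorical statistics, which fills much of the table with `NaN`. If you only want the text columns summarized, one option is to exclude numeric dtypes instead; this sketch uses `exclude=["number"]`, which keeps `Name` and `City` and returns just the `count`, `unique`, `top` and `freq` rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

# Summarize only the non-numeric columns (Name, City)
text_summary = df.describe(exclude=["number"])
print(text_summary)
```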


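🟦 5. Checking Shape and Size

These attributes give the dataset's dimensions at a glance: `shape` is a `(rows, columns)` tuple, `size` is the total number of cells, and `ndim` is the number of axes (2 for a DataFrame).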

In [8]:
df.shape       # (rows, columns)

(6, 4)

In [11]:
df.size        # total number of elements

24

In [10]:
df.ndim        # number of dimensions

2
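
The attributes are related: `size` is the product of the two entries in `shape` (6 × 4 = 24 here) and counts every cell, missing values included, while `len(df)` gives the row count alone. A quick check, recreating the sample DataFrame so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

n_rows, n_cols = df.shape          # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(df), len(df.columns))    # row count and column count
print(df.size)                     # rows * columns, NaN cells included
```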

🟦 6. Viewing Column Names and Index

In [12]:
df.columns

Index(['Name', 'Age', 'Score', 'City'], dtype='object')

In [13]:
df.index

RangeIndex(start=0, stop=6, step=1)
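
`df.columns` and `df.index` are pandas `Index` objects rather than plain lists. Two common follow-ups are sketched below: converting the labels to a Python list, and checking whether a column exists before using it. The `"Email"` column is a hypothetical name used only to show a failed lookup; it is not in the data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

cols = df.columns.tolist()          # column labels as a plain list
has_age = "Age" in df.columns       # membership test on the column Index
has_email = "Email" in df.columns   # hypothetical column, not in the data
row_labels = list(df.index)         # RangeIndex expanded to [0, ..., 5]

print(cols, has_age, has_email, row_labels)
```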

🟦 7. Checking Data Types

In [14]:
df.dtypes

Name      object
Age      float64
Score    float64
City      object
dtype: object

In [15]:
df.dtypes.value_counts()

object     2
float64    2
Name: count, dtype: int64
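
Beyond listing dtypes, `select_dtypes()` picks columns by type, which helps when a later step (such as `corr()` or scaling) should only touch numeric data. In this sketch `"number"` matches every numeric dtype, including the float64 Age and Score columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

# Column names grouped by whether their dtype is numeric
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```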

🟦 8. Checking for Missing Values

In [16]:
df.isna().sum()

Name     0
Age      1
Score    1
City     0
dtype: int64
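
Raw counts are easier to compare across datasets of different sizes when expressed as percentages. Because `isna()` returns booleans, `.mean()` gives the fraction missing per column, and `isna().any(axis=1)` flags rows with at least one gap. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

# Percentage of missing values per column (True counts as 1)
missing_pct = df.isna().mean() * 100
print(missing_pct.round(1))

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```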

🟦 9. Viewing Unique Values & Counts

In [17]:
df["City"].unique()

array(['Toronto', 'Montreal', 'Vancouver', 'Calgary'], dtype=object)

In [18]:
df["City"].value_counts()

City
Toronto      2
Calgary      2
Montreal     1
Vancouver    1
Name: count, dtype: int64
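
A few related options, sketched below: `nunique()` counts distinct values per column (ignoring `NaN` by default), `value_counts(normalize=True)` returns proportions instead of counts, and `value_counts(dropna=False)` counts missing values as their own entry. City has no gaps, so the last option is shown on Age.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

distinct_per_column = df.nunique()                       # NaN not counted
city_share = df["City"].value_counts(normalize=True)     # proportions summing to 1
age_counts_with_nan = df["Age"].value_counts(dropna=False)  # NaN kept as an entry

print(distinct_per_column)
print(city_share)
print(age_counts_with_nan)
```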

🟦 10. Basic Correlation Analysis (Advanced)

In [19]:
df.corr(numeric_only=True)

|       | Age | Score |
|-------|-----|-------|
| Age | 1.0 | -0.425212 |
| Score | -0.425212 | 1.0 |
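
`numeric_only=True` matters here: since pandas 2.0, `corr()` no longer silently drops text columns and raises an error on `Name` and `City` without it. The default method is Pearson, computed pairwise on rows where both values are present, so only the four rows with both Age and Score contribute. The sketch below recomputes that coefficient by hand and also shows the rank-based `method="spearman"` option:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

pearson = df.corr(numeric_only=True)
spearman = df.corr(method="spearman", numeric_only=True)

# Manual Pearson check using only rows where both Age and Score are present
pair = df[["Age", "Score"]].dropna()
x = pair["Age"] - pair["Age"].mean()
y = pair["Score"] - pair["Score"].mean()
manual_r = (x * y).sum() / ((x ** 2).sum() * (y ** 2).sum()) ** 0.5

print(pearson)
print(spearman)
print(manual_r)
```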


🟦 11. Viewing Memory Usage

In [20]:
df.memory_usage(deep=True)

Index    132
Name     322
Age       48
Score     48
City     339
dtype: int64

🟦 12. Sampling Random Rows

In [21]:
df.sample(3)

|   | Name | Age | Score | City |
|---|------|-----|-------|------|
| 0 | Alice | 24.0 | 88.0 | Toronto |
| 5 | Frank | NaN | 72.0 | Calgary |
| 4 | Eva | 32.0 | NaN | Toronto |
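
`sample()` draws different rows on each run, so your output will likely differ from the one above. Passing `random_state` fixes the draw so results are reproducible, and `frac` samples a fraction of the rows instead of a fixed number. A minimal sketch (the seeds 42 and 0 are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Age": [24, 19, 22, 28, 32, None],
    "Score": [88, 75, 90, 65, None, 72],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary", "Toronto", "Calgary"],
})

# Same seed, same rows every time
sample_a = df.sample(3, random_state=42)
sample_b = df.sample(3, random_state=42)

# Half of the rows (3 of 6), also seeded
half = df.sample(frac=0.5, random_state=0)

print(sample_a)
print(half)
```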


# Summary


In this section, you learned how to inspect and understand your data using:

### 🔍 **Previewing**
- `head()` and `tail()`
- `sample()`

### 🧱 **Structure & Schema**
- `info()`
- `shape`, `size`, `ndim`
- Column names and index
- Data types (`dtypes`)

### 📊 **Statistics & Quality**
- `describe()` (numeric + all columns)
- Unique values & counts
- Missing value detection (`isna().sum()`)
- Correlations (`corr()`)

### 🗂 **Advanced Exploration**
- Memory usage (`memory_usage`)
- Dtype distribution (`dtypes.value_counts()`)

