## 💻 Initial Data Exploration with Pandas

This notebook demonstrates fundamental techniques for auditing and exploring a dataset using Pandas. It covers:
*   **Loading Data:** Reading a dataset from a CSV file.
*   **Structural Inspection:** Checking dimensions, data types, and memory usage.
*   **Data Preview:** Viewing the first and last rows to understand the data format.
*   **Statistical Summaries:** Generating descriptive statistics for numerical and categorical features.
*   **Data Access:** Selecting specific columns, rows, and individual elements.
*   **Data Auditing:** Counting missing (null) values in each column.

```python
# `display` is built into Jupyter; the explicit import makes the dependency visible
# and lets the cell also run as a plain script (IPython must be installed)
from IPython.display import display
import pandas as pd

# Load the Iris dataset
df = pd.read_csv('datasets/iris.csv')

# 1. Get a concise summary of the DataFrame (index, dtypes, non-null counts, memory)
print("DataFrame Info:")
df.info()

# 2. View the first and last 5 rows
print("\nFirst 5 rows:")
display(df.head())

print("\nLast 5 rows:")
display(df.tail())

# 3. Statistical summary (numerical columns only, the default)
print("\nStatistical Summary (Numerical):")
display(df.describe())

# 4. Statistical summary (all columns, including categorical)
print("\nStatistical Summary (All Columns):")
display(df.describe(include='all'))

# 5. Check dimensions (rows, columns)
print(f"\nShape of the dataset: {df.shape}")

# 6. Accessing columns
print("\n--- Accessing Columns ---")
print("First 5 rows of 'sepal_length':")
display(df['sepal_length'].head())                # single label -> Series

print("\nFirst 5 rows of 'sepal_length' and 'species':")
display(df[['sepal_length', 'species']].head())   # list of labels -> DataFrame

# 7. Accessing rows
print("\n--- Accessing Rows ---")
print("Row 0 (by label .loc):")
display(df.loc[0])      # row whose index label is 0

print("\nRow 0 (by position .iloc):")
display(df.iloc[0])     # first row by position; same row here because of the default RangeIndex

print("\nFirst 3 rows (slicing):")
display(df.iloc[0:3])   # positions 0, 1, 2 (end position excluded)

# 8. Specific element access
val = df.loc[0, 'species']
print(f"\nValue at row 0, column 'species': {val}")

# 9. Checking for missing values
print("\n--- Missing Values ---")
display(df.isnull().sum())
```
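The cell above reads `datasets/iris.csv` from disk. To experiment with the loading step without that file, here is a minimal sketch that feeds `pd.read_csv` an in-memory CSV through `io.StringIO`. The three rows are copied from the outputs shown below, and the column names are assumed to match the real file.

```python
import io

import pandas as pd

# A few rows copied from the notebook's output, written as CSV text.
# Column names are assumed to match datasets/iris.csv.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
6.7,3.0,5.2,2.3,virginica
"""

# read_csv accepts any file-like object, so StringIO stands in for a file path
sample = pd.read_csv(io.StringIO(csv_text))

# dtypes are inferred per column: the measurements parse as float64
print(sample.dtypes)
print(sample.shape)  # (3, 5)
```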

```
DataFrame Info:
<class 'pandas.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    str
dtypes: float64(4), str(1)
memory usage: 6.0 KB
```
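`df.info()` prints its report and returns `None`. When the same structural facts are needed as values (for example, to assert on them in a data-quality check), they are available as attributes and methods. A minimal sketch on a hypothetical three-row sample built from values in the output; `memory_usage(deep=True)` also counts the bytes held by string values, and `df.info(memory_usage='deep')` applies the same deeper measurement to the printed report.

```python
import pandas as pd

# Hypothetical 3-row sample using values from the notebook's output
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.7],
    "sepal_width": [3.5, 3.0, 3.0],
    "petal_length": [1.4, 1.4, 5.2],
    "petal_width": [0.2, 0.2, 2.3],
    "species": ["setosa", "setosa", "virginica"],
})

# The facts df.info() prints, as objects you can test against
n_rows, n_cols = sample.shape                     # dimensions
dtypes = sample.dtypes                            # one dtype per column
non_null = sample.notna().sum()                   # the "Non-Null Count" column of info()
mem_bytes = sample.memory_usage(deep=True).sum()  # includes the index; deep=True counts string contents

print(n_rows, n_cols)  # 3 5
print(non_null.to_dict())
print(f"{mem_bytes} bytes")
```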

First 5 rows:

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

Last 5 rows:

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |

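`head()` and `tail()` default to five rows, but both accept an `n` argument, and a negative `n` counts from the opposite end. A sketch on a hypothetical six-row frame whose single integer column makes the row order easy to follow:

```python
import pandas as pd

# Hypothetical 6-row frame with values 0..5
sample = pd.DataFrame({"row": range(6)})

print(sample.head(2)["row"].tolist())   # [0, 1]        first 2 rows
print(sample.tail(2)["row"].tolist())   # [4, 5]        last 2 rows
print(sample.head(-2)["row"].tolist())  # [0, 1, 2, 3]  all but the last 2
print(sample.tail(-2)["row"].tolist())  # [2, 3, 4, 5]  all but the first 2
```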
Statistical Summary (Numerical):

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |

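Two rows of the numerical summary are easy to misread: `std` is the sample standard deviation (`ddof=1`), and `50%` is the median. Other quantiles can be requested with the `percentiles` argument. The sketch below checks both facts on a hypothetical five-value sample taken from the `petal_length` values shown earlier; 0.5 is listed explicitly so the median row is always present.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: petal_length values copied from rows shown above
petal = pd.Series([1.4, 1.4, 1.3, 5.2, 5.0], name="petal_length")

# Custom quantiles; 0.5 is listed explicitly so the median row is present
summary = petal.describe(percentiles=[0.1, 0.5, 0.9])

# The "std" row matches Series.std(), which uses ddof=1 (sample standard deviation)
sample_std = petal.std(ddof=1)
# NumPy's np.std defaults to ddof=0 (population), which gives a smaller value
population_std = np.std(petal.to_numpy())

print(summary)
print(sample_std, population_std)
```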
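Statistical Summary (All Columns):

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150 |
| unique | NaN | NaN | NaN | NaN | 3 |
| top | NaN | NaN | NaN | NaN | setosa |
| freq | NaN | NaN | NaN | NaN | 50 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 | NaN |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 | NaN |
| min | 4.3 | 2.0 | 1.0 | 0.1 | NaN |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | NaN |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | NaN |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | NaN |
| max | 7.9 | 4.4 | 6.9 | 2.5 | NaN |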

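For a non-numeric column, `describe()` reports `count`, `unique`, `top` (the most frequent value) and `freq` (its count), which mirror `nunique()` and the first entry of `value_counts()`. In the output above, `freq` is 50 with 3 unique values across 150 rows, so every species has exactly 50 rows and `setosa` is listed as `top` only because one value of a three-way tie has to be shown. A sketch on a hypothetical five-row sample without ties:

```python
import pandas as pd

# Hypothetical sample: three setosa rows and two virginica rows
species = pd.Series(["setosa", "setosa", "setosa", "virginica", "virginica"], name="species")

summary = species.describe()     # non-numeric data: count, unique, top, freq
counts = species.value_counts()  # frequency table, most common value first

print(summary)
print(counts.to_dict())
```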
```
Shape of the dataset: (150, 5)

--- Accessing Columns ---
First 5 rows of 'sepal_length':
0    5.1
1    4.9
2    4.7
3    4.6
4    5.0
Name: sepal_length, dtype: float64
```

First 5 rows of 'sepal_length' and 'species':

|   | sepal_length | species |
|---|---|---|
| 0 | 5.1 | setosa |
| 1 | 4.9 | setosa |
| 2 | 4.7 | setosa |
| 3 | 4.6 | setosa |
| 4 | 5.0 | setosa |

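Single brackets with one label return a one-dimensional `Series`, while a list of labels (even a list with one label) returns a `DataFrame`, which is why the two outputs above look different. A sketch with a hypothetical three-row sample:

```python
import pandas as pd

# Hypothetical sample using the first three rows shown above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

one = sample["sepal_length"]     # single label -> Series (1-D)
both = sample[["sepal_length"]]  # list with one label -> DataFrame (2-D)

print(type(one).__name__, one.shape)    # Series (3,)
print(type(both).__name__, both.shape)  # DataFrame (3, 1)
```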
```
--- Accessing Rows ---
Row 0 (by label .loc):
sepal_length       5.1
sepal_width        3.5
petal_length       1.4
petal_width        0.2
species         setosa
Name: 0, dtype: object

Row 0 (by position .iloc):
sepal_length       5.1
sepal_width        3.5
petal_length       1.4
petal_width        0.2
species         setosa
Name: 0, dtype: object
```

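Here `df.loc[0]` and `df.iloc[0]` return the same row only because `read_csv` assigns a default `RangeIndex`, so labels and positions coincide. After sorting or filtering, they diverge. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical values; the default index is 0, 1, 2
sample = pd.DataFrame({"petal_length": [1.4, 5.2, 1.3]})

# After sorting, index labels no longer match row positions
sorted_df = sample.sort_values("petal_length")
print(sorted_df.index.tolist())           # [2, 0, 1]

print(sorted_df.loc[0, "petal_length"])   # 1.4 -> the row whose *label* is 0
print(sorted_df.iloc[0]["petal_length"])  # 1.3 -> the *first* row in current order
```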
First 3 rows (slicing):

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |

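Slicing follows different rules for the two indexers: `.iloc` excludes the end position, like ordinary Python slices, while `.loc` includes the end label. On a hypothetical six-row frame with the default index:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..5
sample = pd.DataFrame({"row": range(6)})

print(sample.iloc[0:3]["row"].tolist())  # [0, 1, 2]     positions, end excluded
print(sample.loc[0:3]["row"].tolist())   # [0, 1, 2, 3]  labels, end included
```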
```
Value at row 0, column 'species': setosa
```

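When only a single value is needed, `.at` (by label) and `.iat` (by position) are dedicated scalar accessors and return the same value as `.loc[row, column]`. A sketch on a hypothetical two-row sample:

```python
import pandas as pd

# Hypothetical sample using values from the first rows shown above
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "species": ["setosa", "setosa"],
})

by_label = sample.at[0, "species"]  # row label 0, column label "species"
by_position = sample.iat[0, 1]      # row position 0, column position 1 ("species")

print(by_label)  # setosa
```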
```
--- Missing Values ---
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64
```
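The Iris file has no missing values, so every count above is 0. To see what the audit reports when gaps do exist, the sketch below injects `NaN`/`None` into a hypothetical sample. `isna()` is an alias of `isnull()`; taking `.mean()` instead of `.sum()` gives the share of missing rows per column, and `any(axis=1)` flags rows with at least one gap.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps injected on purpose (the Iris data has none)
sample = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 4.7, 4.6],
    "species": ["setosa", "setosa", None, "setosa"],
})

missing_count = sample.isna().sum()                # nulls per column
missing_share = sample.isna().mean()               # fraction of rows missing per column
rows_with_gaps = sample.isna().any(axis=1).sum()   # rows with at least one null

print(missing_count.to_dict())
print(missing_share.to_dict())
print(rows_with_gaps)  # 2
```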