
# STEP 1: Load & Inspect the Data (Complete Pandas Guide)

This notebook covers the **most common practical ways** to load and inspect data in Pandas.
The patterns are useful for **interviews**, **projects**, and **real-world datasets**.


In [None]:

import pandas as pd
import numpy as np


## 1. Load Data from CSV

In [None]:

# Basic CSV load
# df = pd.read_csv("data.csv")

# CSV with custom separator
# df = pd.read_csv("data.csv", sep=';')

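# Handle missing values while loading: 'NA', 'N/A', 'null', '' and similar strings
# are already treated as missing by default; na_values adds extra markers
# df = pd.read_csv("data.csv", na_values=['?', '-', 'missing'])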
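A minimal, runnable sketch of the CSV options above. It reads from an in-memory string via `io.StringIO` instead of a real `data.csv`; the sample rows and the `?` marker are hypothetical.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data; "?" is a custom missing-value marker
raw = io.StringIO(
    "id;name;salary\n"
    "1;Alice;50000\n"
    "2;Bob;?\n"
    "3;Charlie;NA\n"
)

df_csv = pd.read_csv(raw, sep=";", na_values=["?"])

print(df_csv)
# "?" (custom marker) and "NA" (default marker) both become NaN
print(df_csv["salary"].isna().sum())
```

Because `salary` now contains NaN, pandas stores it as `float64` rather than an integer column.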


## 2. Load Excel Data

In [None]:

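# Single sheet (reads the first sheet by default; .xlsx files need openpyxl installed)
# df = pd.read_excel("data.xlsx")

# Specific sheet, by name or 0-based position
# df = pd.read_excel("data.xlsx", sheet_name='Sheet1')

# All sheets: returns a dict mapping each sheet name to a DataFrame
# dfs = pd.read_excel("data.xlsx", sheet_name=None)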


## 3. Load Text / TSV Files

In [None]:

# read_table uses a tab separator by default
# df = pd.read_table("data.txt")

# Equivalent explicit form with read_csv
# df = pd.read_csv("data.tsv", sep='\t')
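A small check that the two TSV calls parse the same content identically. The tab-separated text is a hypothetical stand-in for `data.txt` / `data.tsv`.

```python
import io

import pandas as pd

# Hypothetical tab-separated content
tsv_text = "id\tname\tdepartment\n1\tAlice\tIT\n2\tBob\tHR\n"

df_table = pd.read_table(io.StringIO(tsv_text))        # sep defaults to "\t"
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # explicit separator

print(df_table.equals(df_tsv))
```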


## 4. Load JSON Data

In [None]:

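# Flat JSON (e.g. a list of records)
# df = pd.read_json("data.json")

# Nested JSON: json_data is an already-parsed Python object (e.g. from json.load)
# df = pd.json_normalize(json_data)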
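A sketch of why nested JSON needs `json_normalize`. The records (with a nested `address` object) are hypothetical, standing in for something like a parsed API response.

```python
import io
import json

import pandas as pd

# Hypothetical nested records
json_text = """[
  {"id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}},
  {"id": 2, "name": "Bob", "address": {"city": "Berlin", "zip": "10115"}}
]"""

# read_json keeps each nested object as one column holding dicts
flat_attempt = pd.read_json(io.StringIO(json_text))

# json_normalize expands nested keys into dotted column names
json_data = json.loads(json_text)
df_json = pd.json_normalize(json_data)

print(flat_attempt.columns.tolist())
print(df_json.columns.tolist())
```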


## 5. Load Data from SQL

In [None]:

# Requires SQLAlchemy plus a database driver (PyMySQL for this URL)
# from sqlalchemy import create_engine
# engine = create_engine("mysql+pymysql://user:password@localhost/db")
# df = pd.read_sql("SELECT * FROM table_name", engine)
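A self-contained sketch of `read_sql` that runs without a database server. It swaps MySQL for an in-memory SQLite database (pandas accepts a `sqlite3` connection directly); the `employees` table and its rows are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical "employees" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 50000.0), (2, "Bob", 60000.0), (3, "Charlie", None)],
)
conn.commit()

# Name the columns you need rather than SELECT *
df_sql = pd.read_sql("SELECT id, name, salary FROM employees", conn)

# Pass values through params instead of formatting them into the SQL string
df_high = pd.read_sql(
    "SELECT id, name FROM employees WHERE salary > ?", conn, params=(55000,)
)
conn.close()

print(df_sql)
print(df_high)
```

SQL `NULL` arrives as NaN, so the missing-value checks later in this notebook apply unchanged.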


## 6. Load Data from URL / API

In [None]:

# df = pd.read_csv("https://example.com/data.csv")


## 7. Load Parquet / Feather (Columnar Binary Formats)

In [None]:

# Both need an Arrow backend such as pyarrow (Parquet can also use fastparquet)
# df = pd.read_parquet("data.parquet")
# df = pd.read_feather("data.feather")


## 8. Load Pickle File

In [None]:

# Only load pickles from trusted sources: unpickling can execute arbitrary code
# df = pd.read_pickle("data.pkl")
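A round-trip sketch showing the main reason to use pickle: dtypes such as `datetime64` and `category` survive unchanged, whereas a CSV round trip would read them back as plain values. The frame and the temporary file path are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with dtypes that CSV would not preserve
df_orig = pd.DataFrame({
    "id": [1, 2],
    "joined": pd.to_datetime(["2024-01-01", "2024-02-15"]),
    "dept": pd.Categorical(["IT", "HR"]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    df_orig.to_pickle(path)
    df_back = pd.read_pickle(path)

print(df_back.dtypes)
```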


## 9. Create DataFrame from Python Objects

In [None]:

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"]
})
df
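Two other common constructors, sketched with hypothetical records: a list of dicts (one dict per row), where a missing key becomes NaN, and a list of lists with column names passed separately.

```python
import pandas as pd

# Hypothetical records; Bob's dict has no "salary" key
rows = [
    {"id": 1, "name": "Alice", "salary": 50000},
    {"id": 2, "name": "Bob"},
]
df_rows = pd.DataFrame(rows)  # the missing key becomes NaN

# Same shape of data as a list of lists, with explicit column names
df_lists = pd.DataFrame(
    [[1, "Alice", 50000], [2, "Bob", None]],
    columns=["id", "name", "salary"],
)

print(df_rows)
print(df_lists)
```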


## 10. Preview Data

In [None]:

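# Only the last expression in a cell is shown automatically,
# so wrap each call in display() (available in Jupyter/IPython)
display(df.head())                      # first 5 rows
display(df.tail())                      # last 5 rows
display(df.sample(2, random_state=42))  # 2 random rows, reproducible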


## 11. Shape & Size

In [None]:

print(df.shape)  # (rows, columns)
print(len(df))   # number of rows
print(df.size)   # total cells = rows * columns
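A quick sketch contrasting `size` with `count()`, re-creating the sample `df` from section 9 so it runs on its own: `size` counts every cell, while `count()` skips missing ones, so the gap between them is the number of missing cells.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"],
})

n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)  # size counts every cell, NaN included
print(df.size)                     # 16
print(df.count().sum())            # 15: count() skips the one missing salary
```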


## 12. Columns & Index

In [None]:

print(df.columns)  # column labels
print(df.index)    # row labels (a RangeIndex by default)
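Inspecting `df.columns` often reveals stray whitespace in headers, which makes `df['name']` fail with a `KeyError`. A sketch using a hypothetical CSV header with spaces after the commas:

```python
import io

import pandas as pd

# Hypothetical CSV whose header has a space after each comma
raw = "id, name, salary\n1,Alice,50000\n2,Bob,60000\n"

df_raw = pd.read_csv(io.StringIO(raw))
print(df_raw.columns.tolist())  # ['id', ' name', ' salary']

# Option 1: strip the labels after loading
df_raw.columns = df_raw.columns.str.strip()

# Option 2: have the parser skip spaces that follow the delimiter
df_clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(df_clean.columns.tolist())
```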


## 13. Data Types

In [None]:

df.dtypes
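Once the dtypes are known, `select_dtypes` splits columns into numeric and non-numeric groups, which is handy when later steps treat them differently. The sketch re-creates the sample `df` from section 9.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['id', 'salary']
print(other_cols)    # ['name', 'department']
```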


## 14. Complete Info Summary

In [None]:

df.info()
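`info()` prints rather than returns its summary. A sketch, re-creating the sample `df` from section 9, that captures the output in a string (useful for logging) and requests exact memory figures with `memory_usage="deep"`:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"],
})

buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")  # write the summary into the buffer
info_text = buffer.getvalue()
print(info_text)
```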


## 15. Statistical Summary

In [None]:

display(df.describe())               # numeric columns only
display(df.describe(include='all'))  # adds unique/top/freq for non-numeric columns
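A sketch of reading specific cells from `describe()` and requesting custom percentiles, re-creating the sample `df` from section 9. The median (`50%`) is always included alongside the requested percentiles.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"],
})

summary = df.describe(include="all", percentiles=[0.1, 0.9])
print(summary)

print(summary.loc["count", "salary"])     # non-missing salaries
print(summary.loc["top", "department"])   # most frequent department
print(summary.loc["freq", "department"])  # how often it appears
```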


## 16. Unique Values

In [None]:

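print(df['department'].unique())   # distinct values, in order of appearance
print(df['department'].nunique())  # number of distinct values (NaN excluded by default)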
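`unique()` and `nunique()` treat missing values differently, which can make their results look inconsistent. A sketch with a hypothetical department column containing one missing entry:

```python
import pandas as pd

dept = pd.Series(["IT", "HR", None, "IT"])

print(dept.unique())               # the missing value appears in the result
print(dept.nunique())              # 2: missing values are not counted
print(dept.nunique(dropna=False))  # 3: missing counted as its own value
```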


## 17. Value Counts

In [None]:

df['department'].value_counts()
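Two `value_counts` options that are often needed during inspection: `normalize=True` for proportions and `dropna=False` to see how many values are missing. The series is hypothetical.

```python
import pandas as pd

dept = pd.Series(["IT", "HR", None, "IT", "Finance"])

counts = dept.value_counts()                    # missing values dropped by default
shares = dept.value_counts(normalize=True)      # proportions instead of counts
with_missing = dept.value_counts(dropna=False)  # keeps a row for missing values

print(counts)
print(shares)
print(with_missing)
```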


## 18. Missing Values Check

In [None]:

display(df.isnull().sum())                    # missing count per column
display((df.isnull().sum() / len(df)) * 100)  # missing percentage per column
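A shorter equivalent of the percentage calculation, plus a way to list the affected rows, re-creating the sample `df` from section 9. The mean of a boolean mask is the share of `True` values.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
    "salary": [50000, 60000, None, 80000],
    "department": ["IT", "HR", "IT", "Finance"],
})

missing_pct = df.isna().mean() * 100           # same result as sum / len * 100
rows_with_missing = df[df.isna().any(axis=1)]  # rows with at least one NaN

print(missing_pct)
print(rows_with_missing)
```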


## 19. Duplicate Check

In [None]:

display(df.duplicated())      # boolean mask: True for repeats of an earlier row
print(df.duplicated().sum())  # number of duplicate rows
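The sample `df` has no duplicates, so this sketch uses a hypothetical frame to show `keep=False` (flag every copy, not just the repeats) and `subset` (judge duplicates on selected columns only).

```python
import pandas as pd

# Hypothetical data: one exact repeat (Bob) and a repeated name with a new id (Alice)
people = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Alice", "Bob", "Bob", "Alice"],
})

exact_repeats = people.duplicated().sum()               # rows after the first copy
all_copies = people[people.duplicated(keep=False)]      # every copy of a repeated row
name_repeats = people.duplicated(subset=["name"]).sum() # judged on "name" only

print(exact_repeats)
print(all_copies)
print(name_repeats)
```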


## 20. Memory Usage

In [None]:

df.memory_usage(deep=True)
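For object-dtype text columns, `deep=True` counts the string contents rather than only the references. A sketch, on a hypothetical low-cardinality column, showing how much converting repeated strings to `category` can save:

```python
import pandas as pd

# Hypothetical text column: 3 distinct values repeated 1,000 times
dept = pd.Series(["IT", "HR", "Finance"] * 1000)

as_text = dept.memory_usage(deep=True)
as_category = dept.astype("category").memory_usage(deep=True)

print(as_text, as_category)
```

The category version stores each distinct string once plus small integer codes per row, which is why it is much smaller here.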



## ✅ Summary
- `read_*()` methods load data
- `head()`, `sample()` preview data
- `info()` and `describe()` reveal structure & quality
- Missing values and duplicates are detected early
- What you find here (types, missing values, duplicates) drives the cleaning strategy
