# 📊 Pandas Data Loading & Inspection Cheatsheet

This guide demonstrates how to read various data file formats and perform basic exploratory steps using **pandas** in Python.

## 📥 Reading Data from Files

```python
import pandas as pd

# Reading different file formats
df_csv = pd.read_csv('sales_data.csv')
df_excel = pd.read_excel('quarterly_report.xlsx', sheet_name='Q1')  # .xlsx files need the openpyxl package
df_json = pd.read_json('api_response.json')
```

| File Type | Method Used |
|-----------|-------------|
| CSV | `pd.read_csv('file.csv')` |
| Excel | `pd.read_excel('file.xlsx')` |
| JSON | `pd.read_json('file.json')` |
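Real files often need a few extra arguments. The sketch below shows `sep`, `parse_dates`, and `na_values` on `pd.read_csv`; the CSV text, the column names, and the `'unknown'` marker are made up for illustration, and `io.StringIO` stands in for a file path so the snippet runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'sales_data.csv'
csv_text = """order_id;order_date;amount
1;2024-01-05;19.99
2;2024-01-06;unknown
3;2024-01-07;5.50
"""

df_orders = pd.read_csv(
    io.StringIO(csv_text),        # a file path works the same way
    sep=';',                      # field delimiter (the default is ',')
    parse_dates=['order_date'],   # parse this column as datetimes
    na_values=['unknown'],        # treat this marker as a missing value
)

print(df_orders.dtypes)
print(df_orders)
```

Setting these options at load time usually saves a round of type conversion later.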

## 📚 Accessing Built-in Datasets

```python
import seaborn as sns

# List all available datasets in seaborn (the list is fetched online, so this needs internet access)
print("Available datasets:", sns.get_dataset_names())

# Load a specific dataset
df_tips = sns.load_dataset('tips')
df_iris = sns.load_dataset('iris')
df_flights = sns.load_dataset('flights')
```

| Source | Method Used |
|--------|-------------|
| Seaborn | `sns.get_dataset_names()` to list, `sns.load_dataset('name')` to load |
| Scikit-learn | `from sklearn.datasets import load_iris` (varies by dataset; `load_boston` was removed in scikit-learn 1.2) |
| Statsmodels | `import statsmodels.api as sm; sm.datasets.get_rdataset('dataset_name').data` (downloads from the Rdatasets repository) |
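Scikit-learn loaders return a container object rather than a DataFrame by default. As a minimal sketch, assuming scikit-learn 0.23 or newer is installed, passing `as_frame=True` to `load_iris` gives a pandas DataFrame through the `.frame` attribute (the variable name `df_sklearn_iris` is only illustrative):

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to build pandas objects
iris = load_iris(as_frame=True)

# .frame holds the feature columns plus a 'target' column
df_sklearn_iris = iris.frame

print(df_sklearn_iris.shape)
print(df_sklearn_iris.columns.tolist())
```

This dataset ships with scikit-learn, so no download is needed.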

## 🔍 Previewing the Data

```python
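# df is any DataFrame loaded above, e.g. df = df_csv
print(df.head())     # First 5 rows
print(df.tail())     # Last 5 rows
print(df.sample(10)) # 10 random rows (raises ValueError if df has fewer than 10 rows)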
```
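The preview calls above can be made a little more robust. The following sketch uses a small made-up DataFrame and shows two habits that may help: capping the sample size at the number of rows, and passing `random_state` so the same rows come back on every run.

```python
import pandas as pd

# Small illustrative DataFrame (8 rows, so df.sample(10) would fail)
df = pd.DataFrame({'item': list('abcdefgh'), 'qty': range(8)})

print(df.head(3))   # head() and tail() accept a row count

# Never ask for more rows than the DataFrame has
n = min(10, len(df))

# random_state makes the random sample reproducible
sample = df.sample(n=n, random_state=42)
print(sample)
```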

## 🧠 Getting Data Overview

```python
df.info()             # Prints column dtypes, non-null counts, and memory usage
print(df.describe())  # Statistical summary of numerical columns
print(df.shape)       # Tuple of (row_count, column_count)
```
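By default `describe()` skips non-numeric columns. As a small sketch on made-up data, `include='all'` adds count, unique, top, and freq rows for text columns, and `df.shape` can be unpacked directly into row and column counts.

```python
import pandas as pd

# Illustrative data with one text column and one numeric column
df = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Oslo', None],
    'temp_c': [3.5, 21.0, 4.1, 18.2],
})

n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

df.info()   # prints its report directly; no print() needed

# include='all' summarizes text columns as well as numeric ones
summary = df.describe(include='all')
print(summary)
```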

## 🛠️ Additional Useful Checks

```python
print("Columns:", df.columns.tolist())    # List all column names
print("Nulls:\n", df.isnull().sum())      # Count of missing values per column
print("Unique:\n", df.nunique())          # Number of unique values per column
```
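The separate checks above can also be gathered into one table, which is often easier to scan on wide datasets. This is only a sketch on made-up data; the `overview` name and its column labels are arbitrary.

```python
import pandas as pd

# Illustrative data with a missing value and a repeated value
df = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Oslo', None],
    'temp_c': [3.5, 21.0, 4.1, 18.2],
})

# One row per column of df: dtype, missing count, distinct count
overview = pd.DataFrame({
    'dtype': df.dtypes.astype(str),
    'nulls': df.isnull().sum(),
    'unique': df.nunique(),   # nunique() ignores missing values by default
})

print(overview)
```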

## 💡 Notes

* Always use `.head()` or `.sample()` before working with unfamiliar data.
* Use `.info()` to detect missing data or unexpected data types early.
* Use `.describe()` to quickly get statistical insights on numeric columns.
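As an example of catching an unexpected data type early, the sketch below builds a made-up `price` column that was stored as text because of one stray `'n/a'` entry. `df.info()` reports it with a text dtype rather than a numeric one, and `pd.to_numeric` with `errors='coerce'` is one way to repair it.

```python
import pandas as pd

# Illustrative column: numbers stored as text, with one bad entry
df = pd.DataFrame({'price': ['10.5', '7', 'n/a', '3.25']})

df.info()   # 'price' shows up as a text column, not a numeric one

# Convert to numbers; values that cannot be parsed become NaN
df['price'] = pd.to_numeric(df['price'], errors='coerce')

print(df['price'].dtype)
print("Missing after conversion:", df['price'].isnull().sum())
```

Coercing silently turns bad entries into missing values, so it is worth checking the null count afterwards.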

## 🚀 Quick Start Examples

### Working with Your Own Data
```python
import pandas as pd

# Load your data
df = pd.read_csv('your_data.csv')

# Quick inspection workflow
print("Dataset shape:", df.shape)
print("\nFirst few rows:")
print(df.head())
print("\nData types and info:")
df.info()
print("\nMissing values:")
print(df.isnull().sum())
print("\nSummary statistics:")
print(df.describe())
```

### Working with Built-in Datasets
```python
import pandas as pd
import seaborn as sns

# Explore available datasets
print("Available datasets:")
for dataset in sns.get_dataset_names():
    print(f"  - {dataset}")

# Load and inspect a dataset
df = sns.load_dataset('tips')
print("\nLoaded 'tips' dataset:")  # the loop variable `dataset` would hold the last listed name here, not 'tips'
print("Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nFirst few rows:")
print(df.head())

# Identifying missing data
null_counts = df.isnull().sum()
null_percent = (null_counts / len(df)) * 100

# Combine null info into a single DataFrame for better visibility
null_summary = pd.DataFrame({
    'Null Count': null_counts,
    'Null Percentage': null_percent
})

print("\nMissing Data Summary:")
print(null_summary)
```

___

Available seaborn datasets:
  - anagrams
  - anscombe
  - attention
  - brain_networks
  - car_crashes
  - diamonds
  - dots
  - dowjones
  - exercise
  - flights
  - fmri
  - geyser
  - glue
  - healthexp
  - iris
  - mpg
  - penguins
  - planets
  - seaice
  - taxis
  - tips
  - titanic