# 📊 Hands-on: Load CSV dataset, Basic Filtering, Data Preview

In this session, you will learn how to:
- Load a CSV file into a Pandas DataFrame
- Preview data to understand its structure
- Perform basic filtering on the data

Loading, previewing, and filtering are usually the first steps in any analysis of real-world data.

---

## 1️⃣ What is CSV?
- CSV (Comma-Separated Values) is a common file format to store tabular data.
- Each line in a CSV file represents a row, with values separated by commas; the first line usually holds the column names.
- It is widely used for exporting and sharing datasets.
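
To make the format concrete, here is a minimal sketch that parses CSV text held in a string. The rows are the first three from the sample data used later in this session; `csv_text` and `demo_df` are names chosen only for this illustration.

```python
import io
import pandas as pd

# A small CSV document: one header line, then one line per row.
csv_text = """Name,Age,City,Salary
Alice,28,Mumbai,45000
Bob,35,Delhi,52000
Charlie,22,Chennai,38000
"""

# io.StringIO makes the string behave like a file, so read_csv can parse it.
demo_df = pd.read_csv(io.StringIO(csv_text))

print(demo_df)
print(demo_df.shape)  # (3, 4): three data rows, four columns
```

The header line becomes the column names, and each remaining line becomes one row of the DataFrame.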

## 2️⃣ Loading CSV files in Pandas
- Use `pd.read_csv()` to load CSV files into a DataFrame.
- Make sure the CSV file is in your current working directory or provide the full path.
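
If loading fails with a "file not found" error, it can help to check where Python is looking. The sketch below uses the standard `pathlib` module; the file name is just the one from this session, so replace it with your own path.

```python
from pathlib import Path

# The directory that relative paths are resolved against.
print("Current working directory:", Path.cwd())

# File name assumed from this session; replace it with your own path.
csv_path = Path("sample_data.csv")

if csv_path.exists():
    print("Found:", csv_path.resolve())
else:
    print("Not found. Move the file here or pass its full path to pd.read_csv().")
```

`pd.read_csv()` accepts a `Path` object as well as a plain string, so `csv_path` can be passed to it directly.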

In [1]:
# Import pandas
import pandas as pd

## 3️⃣ Loading sample CSV file

Let's load a sample CSV file, `sample_data.csv`, which contains the columns `Name`, `Age`, `City`, and `Salary`.

In [2]:
# Load CSV file into DataFrame
df = pd.read_csv('sample_data.csv')  # replace with your file path if needed

# Show first 5 rows
print("First 5 rows of the data:")
print(df.head())

First 5 rows of the data:
      Name  Age     City  Salary
0    Alice   28   Mumbai   45000
1      Bob   35    Delhi   52000
2  Charlie   22  Chennai   38000
3    David   40   Mumbai   60000
4      Eva   30    Delhi   48000


### Explanation:
- `pd.read_csv()` reads the CSV file and returns a DataFrame.
- `df.head()` returns the first 5 rows by default; pass a number to change this, for example `df.head(10)`.
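
The row-count argument can be tried out on a small in-memory DataFrame. This is only a sketch: `numbers` is a hypothetical 8-row dataset, built here so the example runs without a file.

```python
import pandas as pd

# Hypothetical data: a single column holding the numbers 1 to 8.
numbers = pd.DataFrame({"n": range(1, 9)})

print(numbers.head())    # first 5 rows (the default)
print(numbers.head(3))   # first 3 rows
print(numbers.head(20))  # more rows requested than exist: all 8 are returned
print(numbers.tail(2))   # last 2 rows, the counterpart of head()
```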

---

## 4️⃣ Basic Data Preview

The following attributes and methods give a quick overview of the data:

In [3]:
# Display number of rows and columns
print("Shape of the data (rows, columns):", df.shape)

# Display column names
print("Columns in the dataset:", df.columns.tolist())

# Summary statistics for numerical columns
print("Summary statistics for numerical data:")
print(df.describe())

# Display data types of columns
print("Data types of columns:")
print(df.dtypes)

Shape of the data (rows, columns): (10, 4)
Columns in the dataset: ['Name', 'Age', 'City', 'Salary']
Summary statistics for numerical data:
             Age        Salary
count  10.000000     10.000000
mean   29.900000  47700.000000
std     5.258855   7024.560089
min    22.000000  38000.000000
25%    27.250000  42750.000000
50%    29.500000  47500.000000
75%    32.500000  51750.000000
max    40.000000  60000.000000
Data types of columns:
Name      object
Age        int64
City      object
Salary     int64
dtype: object
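
By default, `describe()` on a DataFrame with mixed column types summarises only the numeric columns, which is why `Name` and `City` are missing from the summary above. The sketch below shows two complementary checks, `info()` and `value_counts()`, on a hypothetical 5-row DataFrame (`people`) rebuilt from the first rows of the sample data.

```python
import pandas as pd

# Hypothetical mini-dataset: the first five rows of the sample data.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [28, 35, 22, 40, 30],
    "City": ["Mumbai", "Delhi", "Chennai", "Mumbai", "Delhi"],
    "Salary": [45000, 52000, 38000, 60000, 48000],
})

# Column names, non-null counts, and dtypes in a single report.
people.info()

# For a text column, count how often each value occurs.
city_counts = people["City"].value_counts()
print(city_counts)
print("Distinct cities:", people["City"].nunique())
```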


---
## 5️⃣ Basic Filtering Examples

Filter rows by placing a condition on a column, such as `Age`, `Salary`, or `City`, inside `df[...]`.

In [4]:
# Filter rows where Age > 30
age_above_30 = df[df['Age'] > 30]
print("Rows where Age is greater than 30:")
print(age_above_30)

# Filter rows where Salary is greater than 50000
high_salary = df[df['Salary'] > 50000]
print("Rows where Salary is greater than 50000:")
print(high_salary)

# Filter rows where City is 'Delhi'
city_delhi = df[df['City'] == 'Delhi']
print("Rows where City is Delhi:")
print(city_delhi)

Rows where Age is greater than 30:
    Name  Age    City  Salary
1    Bob   35   Delhi   52000
3  David   40  Mumbai   60000
6  Grace   33  Mumbai   55000
9   Jack   31  Mumbai   51000
Rows where Salary is greater than 50000:
    Name  Age    City  Salary
1    Bob   35   Delhi   52000
3  David   40  Mumbai   60000
6  Grace   33  Mumbai   55000
9   Jack   31  Mumbai   51000
Rows where City is Delhi:
     Name  Age   City  Salary
1     Bob   35  Delhi   52000
4     Eva   30  Delhi   48000
7  Hannah   24  Delhi   39000
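
It may help to see what the condition inside the brackets produces on its own. The sketch below splits a filter into two steps on a hypothetical 5-row DataFrame (`people`) rebuilt from the first rows of the sample data; the name `mask` is chosen only for this illustration.

```python
import pandas as pd

# Hypothetical mini-dataset: the first five rows of the sample data.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [28, 35, 22, 40, 30],
    "City": ["Mumbai", "Delhi", "Chennai", "Mumbai", "Delhi"],
    "Salary": [45000, 52000, 38000, 60000, 48000],
})

# Step 1: the comparison yields one True/False value per row.
mask = people["Age"] > 30
print(mask)
print("Matching rows:", mask.sum())  # True counts as 1

# Step 2: indexing with the mask keeps only the rows marked True.
older = people[mask]
print(older)
```

The filtered result keeps the original index labels, which is why the outputs above show labels such as 1, 3, 6, and 9 rather than 0, 1, 2, and 3.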


### Filtering with multiple conditions
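- Use `&` for AND and `|` for OR. The Python keywords `and` and `or` do not work on DataFrame columns.
- Wrap each condition in parentheses. `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so the parentheses are required.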
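
The sketch below shows what goes wrong when the parentheses are left out. It uses a hypothetical 5-row DataFrame (`people`) rebuilt from the first rows of the sample data; the `raised` flag exists only to record that the error occurred.

```python
import pandas as pd

# Hypothetical mini-dataset: the first five rows of the sample data.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [28, 35, 22, 40, 30],
    "City": ["Mumbai", "Delhi", "Chennai", "Mumbai", "Delhi"],
    "Salary": [45000, 52000, 38000, 60000, 48000],
})

# Correct: each comparison is wrapped in parentheses.
both = people[(people["Age"] > 30) & (people["Salary"] > 50000)]
print(both)

# Without parentheses, Python evaluates 30 & people["Salary"] first,
# and the resulting chained comparison raises a ValueError.
raised = False
try:
    people[people["Age"] > 30 & people["Salary"] > 50000]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```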

In [5]:
# Filter rows where Age > 30 AND Salary > 50000
filtered_data = df[(df['Age'] > 30) & (df['Salary'] > 50000)]
print("Rows where Age > 30 AND Salary > 50000:")
print(filtered_data)

# Filter rows where Age < 25 OR City is Mumbai
filtered_data_or = df[(df['Age'] < 25) | (df['City'] == 'Mumbai')]
print("Rows where Age < 25 OR City is Mumbai:")
print(filtered_data_or)

Rows where Age > 30 AND Salary > 50000:
    Name  Age    City  Salary
1    Bob   35   Delhi   52000
3  David   40  Mumbai   60000
6  Grace   33  Mumbai   55000
9   Jack   31  Mumbai   51000
Rows where Age < 25 OR City is Mumbai:
      Name  Age     City  Salary
0    Alice   28   Mumbai   45000
2  Charlie   22  Chennai   38000
3    David   40   Mumbai   60000
6    Grace   33   Mumbai   55000
7   Hannah   24    Delhi   39000
9     Jack   31   Mumbai   51000


---
## 6️⃣ Task for Students
Try these on your own:
1. Load a CSV file named `sample_data.csv`.
2. Display the first 10 rows.
3. Show column names and data types.
4. Find all rows where Salary is less than 40000.
5. Find all rows where City is either 'Mumbai' or 'Chennai'.
6. Filter rows where Age is between 25 and 35 (inclusive).

---

## 7️⃣ MCQs (Multiple Choice Questions)

**Q1:** Which pandas function loads CSV data into a DataFrame?

- a) pd.load_csv() ❌
- b) pd.read_csv() ✅
- c) pd.csv_read() ❌
- d) pd.get_csv() ❌

---

**Q2:** What does `df.head()` show?

- a) First 5 rows of the DataFrame ✅
- b) Last 5 rows of the DataFrame ❌
- c) Summary statistics ❌
- d) Data types of columns ❌

---

**Q3:** How do you filter rows where Age is greater than 30?

- a) `df[df['Age'] > 30]` (valid, but not the complete answer)
- b) `df[df.Age > 30]` (valid, but not the complete answer)
- c) Both a and b ✅
- d) None of the above ❌
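
A quick way to confirm that both forms in Q3 select the same rows is to compare their results. This is a sketch on hypothetical data; `people` and `scores` are illustration-only names, and the `Test Score` column is invented to show a case where attribute access cannot be used.

```python
import pandas as pd

# Hypothetical mini-dataset based on the first rows of the sample data.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [28, 35, 22, 40, 30],
})

by_bracket = people[people["Age"] > 30]
by_attribute = people[people.Age > 30]
print(by_bracket.equals(by_attribute))  # True: both select Bob and David

# Attribute access needs a column name that is a valid Python identifier.
# A name containing a space only works with brackets.
scores = pd.DataFrame({"Test Score": [55, 80, 91]})
print(scores[scores["Test Score"] > 60])
```

Bracket access works for every column name, so it is the safer habit; attribute access also fails when a column name matches a DataFrame method such as `count`.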

---

## ✅ Summary
- Loaded a CSV into a DataFrame
- Previewed data using `head()`, `shape`, `columns`, `describe()`, and `dtypes`
- Filtered rows using simple and multiple conditions
- Practiced with simple exercises and tested knowledge with MCQs