# Module 4: Introduction to Data Analysis
## Lesson 1: Introduction to Pandas

Welcome to the world of data analysis! In this module, you'll learn how to use powerful libraries to work with data. We'll start with **Pandas**, one of the most widely used libraries for data manipulation and analysis in Python.

### What is Pandas?

Pandas provides fast, flexible data structures and tools for working with tabular data. Its two core data structures are the **Series**, a one-dimensional labeled array, and the **DataFrame**, which is the one you'll use most often.

### What is a DataFrame?

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, a SQL table, or a dictionary of Series objects.
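To see the "dictionary of Series" idea in action, here is a minimal sketch that builds a small DataFrame from a Python dictionary, where each key becomes a column name and each list becomes that column's values. The names, ages, and grades are made-up example data, not the contents of the sample CSV file; the `Age` column in particular is an assumption for illustration.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column
data = {
    "Name": ["Alice", "Bob", "Chen"],
    "Age": [15, 16, 15],
    "Grade": [92, 85, 78],
}

df_example = pd.DataFrame(data)
print(df_example)

# Pulling out one column gives you a Series
grades = df_example["Grade"]
print(type(grades))
```

Notice that the columns can hold different types: `Name` holds text while `Age` and `Grade` hold integers.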

### Reading Data

One of the most common tasks is to read data from a file. Pandas can read from a variety of file formats, including CSV (Comma-Separated Values). We'll use the `read_csv()` function. We have provided a sample file at `../data/sample_student_data.csv`.

In [None]:
import pandas as pd # It's a standard convention to import pandas as pd

# Load the data from the CSV file into a DataFrame
df = pd.read_csv('../data/sample_student_data.csv')

# Display the first 5 rows of the DataFrame
df.head()
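
If you want to experiment without the sample file, `read_csv()` also accepts any file-like object. The sketch below wraps a short, made-up CSV string in `io.StringIO` so Pandas can read it as if it were a file; the column names and values are illustrative assumptions, not the real sample data.

```python
import io

import pandas as pd

# A small, made-up CSV held in a string (header row first, then one row per student)
csv_text = """Name,Age,Grade
Alice,15,92
Bob,16,85
Chen,15,78
"""

# io.StringIO makes the string behave like an open file
df_from_text = pd.read_csv(io.StringIO(csv_text))
print(df_from_text.head())
```

The first line of the CSV is used as the column names by default, and each following line becomes one row.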

### Exploring the Data

Pandas provides several attributes and methods for getting a quick first look at your data.

In [None]:
# Get the dimensions of the DataFrame (rows, columns)
print("Shape of the data:", df.shape)

# Get a concise summary of the DataFrame
print("\nInfo:")
df.info()

# Get descriptive statistics for the numerical columns
print("\nDescription:")
df.describe()
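
Here is a minimal sketch of the same exploration steps on a small, self-contained DataFrame (the data is hypothetical). It also shows `dtypes`, which lists the type Pandas inferred for each column, and illustrates that `describe()` skips non-numeric columns such as `Name` by default.

```python
import pandas as pd

# Hypothetical example data, not the sample CSV
df_example = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dana"],
    "Age": [15, 16, 15, 17],
    "Grade": [92, 85, 78, 90],
})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple
rows, cols = df_example.shape
print(rows, cols)

# The data type Pandas inferred for each column
print(df_example.dtypes)

# By default, describe() summarizes only the numeric columns
summary = df_example.describe()
print(summary)
```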

### Selecting Data

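You can select specific parts of your data. Square brackets with a single column name return that column as a Series; square brackets with a list of column names return a new DataFrame containing just those columns.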

In [None]:
# Select a single column
names = df['Name']
print("The 'Name' column:")
print(names)

# Select multiple columns
name_and_grade = df[['Name', 'Grade']]
print("\nThe 'Name' and 'Grade' columns:")
print(name_and_grade)
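
The cell above selects columns. To select rows, Pandas provides `.iloc` (by integer position) and `.loc` (by index label). The sketch below uses a small hypothetical DataFrame with the default integer index, so the labels happen to be 0, 1, 2.

```python
import pandas as pd

# Hypothetical example data with the default index (0, 1, 2)
df_example = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen"],
    "Grade": [92, 85, 78],
})

# iloc selects by position: the first row, returned as a Series
first_row = df_example.iloc[0]
print(first_row)

# loc selects by label; unlike Python slices, loc includes the end label,
# so 0:1 returns two rows
first_two = df_example.loc[0:1, ["Name", "Grade"]]
print(first_two)

# A single value, by row label and column name
bob_grade = df_example.loc[1, "Grade"]
print(bob_grade)
```

With the default index, `.loc` and `.iloc` often look interchangeable, but they diverge as soon as the index labels are no longer 0, 1, 2, ... (for example, after filtering or sorting).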

### Filtering Data

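You can also filter your data to keep only the rows that meet a condition. A comparison such as `df['Grade'] >= 90` produces a Series of `True`/`False` values, and putting it inside the brackets keeps the rows where the value is `True`.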

In [None]:
# Find all students who scored 90 or above
high_achievers = df[df['Grade'] >= 90]
print(high_achievers)
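
To filter on more than one condition, combine the boolean Series with `&` (and) or `|` (or); Python's `and`/`or` keywords do not work element by element here. This minimal sketch uses hypothetical data and an assumed `Age` column.

```python
import pandas as pd

# Hypothetical example data
df_example = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dana"],
    "Age": [15, 16, 15, 17],
    "Grade": [92, 85, 78, 90],
})

# Each comparison gives a boolean Series. The parentheses are required
# because & binds more tightly than comparison operators like >= and >.
mask = (df_example["Grade"] >= 85) & (df_example["Age"] > 15)
print(mask.tolist())

# Keep only the rows where mask is True
filtered = df_example[mask]
print(filtered)
```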

### Practice Time!

1. Find the average (mean) grade of all students.
2. Select all students who are 15 years old.

In [None]:
# Write your code here

This is just the beginning of what you can do with Pandas! 

Next, we'll learn how to create visualizations from our data using **Matplotlib**.