# Pandas Crash Course

Welcome to this crash course on **pandas**, one of the most powerful libraries for data manipulation in Python. By the end of this course, you'll understand the basics of pandas and be able to use it to load, explore, and manipulate datasets.

## Topics Covered

1.  Introduction to Pandas

2.  Pandas Data Structures (Series and DataFrame)

3.  Loading Data into Pandas

4.  Inspecting Data

5.  Selecting Data

6.  Data Manipulation (Adding/Removing Columns)

7.  Filtering Data

8.  Handling Missing Data

9.  Grouping and Aggregating Data

10. Sorting Data

11. Applying Functions to Data

12. Loading Files with Pandas

## 1. Introduction to Pandas

Pandas is a Python library used for working with data. It makes data analysis and manipulation easy by providing powerful, flexible data structures and functions.

To use pandas, you first need to import it:

```python
import pandas as pd
```

In pandas, data is primarily handled using **Series** and **DataFrames**.


## 2. Pandas Data Structures: Series and DataFrame

### **Series**:

A **Series** is a one-dimensional labeled array of data. You can think of it as a list in which every value also carries an index label, which you can use to look values up and to align data.

**Example**:


```python
import pandas as pd

# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:")
print(series)
```

```
Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
```
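Because no index was given above, pandas labeled the values 0 to 4. The following is a minimal sketch of a Series with explicit labels, showing label lookup, vectorized arithmetic, and boolean selection. The fruit names and prices are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data: prices labeled by fruit name instead of 0, 1, 2
prices = pd.Series([1.2, 0.5, 3.0], index=["apple", "banana", "cherry"], name="price")

# Look up a value by its index label
print(prices["banana"])  # 0.5

# Arithmetic is applied to every element at once (vectorized)
discounted = prices * 0.9
print(discounted)

# A boolean condition selects the matching elements and keeps their labels
expensive = prices[prices > 1.0]
print(expensive)
```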


### **DataFrame**:

A **DataFrame** is a two-dimensional, labeled table of data. All rows share one index, and each column can hold a different data type. It's the most commonly used data structure in pandas.

**Example**:


```python
# Creating a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)
```

```
DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```
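One way to picture a DataFrame is as a collection of Series that share one row index. The sketch below reuses the sample data from the example above to show that a single column is a Series. It also shows how `shape` and `columns` describe the table.

```python
import pandas as pd

# Same sample data as in the DataFrame example
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# Each column is a Series that shares the DataFrame's row index
ages = df["Age"]
print(type(ages))        # <class 'pandas.core.series.Series'>

# shape is (rows, columns); columns holds the column labels
print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']
```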


## 3. Loading Data into Pandas

You can load data into a pandas DataFrame from many sources, such as CSV, Excel, JSON, and SQL databases. One of the most common formats is the **CSV file**.

**Example**:


```python
# Loading data from a CSV file
# df = pd.read_csv("your_file.csv")  # Uncomment this line to load your own file

# For demonstration, we'll create a small DataFrame instead.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)
df
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 35  | Chicago     |
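`read_csv()` accepts a file-like object as well as a path. This lets you try it without a file on disk. The sketch below assumes a small hypothetical CSV held in a string and wrapped in `io.StringIO`. It also shows that pandas infers column types, such as integers for `Age`, while parsing.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv reads from any file-like object, so StringIO replaces the file path
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Column types are inferred while parsing
print(df.dtypes)
```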


## 4. Inspecting Data

After loading data, it's good practice to inspect it. Pandas provides several useful methods:

- `.head()`: Shows the first rows of the DataFrame (five by default).

- `.info()`: Shows a summary of the DataFrame, including column names, non-null counts, and data types.

- `.describe()`: Shows summary statistics for the numerical columns.

**Example**:


```python
# First 5 rows
print("First 5 rows:")
df.head()
```

```
First 5 rows:
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 35  | Chicago     |

```python
# Summary of DataFrame
print("\nDataFrame Info:")
df.info()
```

```
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
```

```python
# Statistical summary
print("\nStatistical Summary:")
df.describe()
```

```
Statistical Summary:
```

|       | Age  |
|-------|------|
| count | 3.0  |
| mean  | 30.0 |
| std   | 5.0  |
| min   | 25.0 |
| 25%   | 27.5 |
| 50%   | 30.0 |
| 75%   | 32.5 |
| max   | 35.0 |
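`describe()` summarizes only `Age` here because it is the only numeric column. The sketch below shows a few more inspection tools on the same sample data. `head()` and `tail()` both take an optional row count, `shape` gives the table size, and `dtypes` lists one type per column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

first_two = df.head(2)  # head() takes an optional row count (default 5)
last_one = df.tail(1)   # tail() is the counterpart for the end of the frame
print(first_two)
print(last_one)

print(df.shape)   # (rows, columns) as a tuple
print(df.dtypes)  # one dtype per column
```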


## 5. Selecting Data

You can select data in pandas with square brackets `[]`, with `.loc[]` (by label), or with `.iloc[]` (by integer position).

**Example**:


```python
# Here is our original DF.
df
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 35  | Chicago     |

```python
# Selecting a single column
print("Selecting 'Name' column:")
df["Name"]
```

```
Selecting 'Name' column:
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
```

```python
# Selecting multiple columns
print("\nSelecting 'Name' and 'Age' columns:")
df[["Name", "Age"]]
```

```
Selecting 'Name' and 'Age' columns:
```

|   | Name    | Age |
|---|---------|-----|
| 0 | Alice   | 25  |
| 1 | Bob     | 30  |
| 2 | Charlie | 35  |

```python
# Selecting a row by label with .loc[] and by position with .iloc[]
print("\nSelecting first row (by label):")
df.loc[0]  # loc uses label-based indexing
```

```
Selecting first row (by label):
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
```

```python
print("\nSelecting first row (by position):")
df.iloc[0]  # iloc uses position-based indexing
```

```
Selecting first row (by position):
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
```

Both calls return the same row here because the default index labels (0, 1, 2) match the row positions. They diverge once rows are reordered or relabeled.
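A quick way to see `.loc[]` and `.iloc[]` disagree is to sort the rows first. Sorting keeps each row's original index label, so label 0 is no longer at position 0. The sketch below uses the same sample data. It also shows `.loc[]` taking a boolean row mask and a list of columns at once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# Sorting keeps the original labels, so labels and positions stop lining up
by_age_desc = df.sort_values("Age", ascending=False)

print(by_age_desc.loc[0, "Name"])   # label 0 -> Alice
print(by_age_desc.iloc[0]["Name"])  # position 0 -> Charlie (the oldest)

# .loc[] also accepts a boolean mask for rows plus a list of columns
masked = df.loc[df["Age"] >= 30, ["Name", "City"]]
print(masked)
```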

## 6. Data Manipulation (Adding/Removing Columns)

You can easily add or remove columns in a DataFrame.

**Example**:


```python
# Adding a new column
df["Salary"] = [50000, 60000, 70000]
print("DataFrame after adding 'Salary' column:")
df
```

```
DataFrame after adding 'Salary' column:
```

|   | Name    | Age | City        | Salary |
|---|---------|-----|-------------|--------|
| 0 | Alice   | 25  | New York    | 50000  |
| 1 | Bob     | 30  | Los Angeles | 60000  |
| 2 | Charlie | 35  | Chicago     | 70000  |

```python
# Removing a column
df = df.drop(columns=["Salary"])
print("\nDataFrame after removing 'Salary' column:")
df
```

```
DataFrame after removing 'Salary' column:
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 35  | Chicago     |
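New columns are often computed from existing ones rather than typed in. Also note that `drop()` returns a new DataFrame, which is why the example above reassigns `df`. In the sketch below, the column name `Age in 10 Years` is a hypothetical illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# Derive a column from an existing one; the addition runs on the whole column
df["Age in 10 Years"] = df["Age"] + 10

# drop() returns a new DataFrame and leaves df unchanged unless reassigned
trimmed = df.drop(columns=["Age in 10 Years"])
print(list(df.columns))       # still includes 'Age in 10 Years'
print(list(trimmed.columns))  # ['Name', 'Age', 'City']
```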


## 7. Filtering Data

You can filter data based on conditions.

**Example**:


```python
# Filtering rows where 'Age' is greater than 30
filtered_df = df[df["Age"] > 30]
print("Filtered DataFrame (Age > 30):")
filtered_df
```

```
Filtered DataFrame (Age > 30):
```

|   | Name    | Age | City    |
|---|---------|-----|---------|
| 2 | Charlie | 35  | Chicago |
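Conditions can be combined. Use `&` (and), `|` (or), and `~` (not), with each condition wrapped in parentheses because these operators bind more tightly than comparisons. The sketch below uses the same sample data to show a combined filter, `isin()` for membership tests, and the equivalent string-based `query()`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# Both conditions must hold; the parentheses are required
older_not_chicago = df[(df["Age"] > 25) & (df["City"] != "Chicago")]
print(older_not_chicago)

# isin() keeps rows whose value appears in the given list
coastal = df[df["City"].isin(["New York", "Los Angeles"])]
print(coastal)

# query() expresses the first filter as a string
via_query = df.query("Age > 25 and City != 'Chicago'")
print(via_query)
```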


## 8. Handling Missing Data

Pandas provides methods to handle missing data: `isna()` detects missing values, `dropna()` removes rows (or columns) that contain them, and `fillna()` replaces them with a value you choose.

**Example**:


```python
# Creating a DataFrame with missing values
df_with_na = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", None],
        "Age": [25, None, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)
print("DataFrame with missing values:")
df_with_na
```

```
DataFrame with missing values:
```

|   | Name  | Age  | City        |
|---|-------|------|-------------|
| 0 | Alice | 25.0 | New York    |
| 1 | Bob   | NaN  | Los Angeles |
| 2 | None  | 35.0 | Chicago     |

```python
# Filling missing values with a default value
df_filled = df_with_na.fillna("Unknown")
print("\nDataFrame after filling missing values:")
df_filled
```

```
DataFrame after filling missing values:
```

|   | Name    | Age     | City        |
|---|---------|---------|-------------|
| 0 | Alice   | 25.0    | New York    |
| 1 | Bob     | Unknown | Los Angeles |
| 2 | Unknown | 35.0    | Chicago     |

```python
# Dropping rows with missing values
df_dropped = df_with_na.dropna()
print("\nDataFrame after dropping missing values:")
df_dropped
```

```
DataFrame after dropping missing values:
```

|   | Name  | Age  | City     |
|---|-------|------|----------|
| 0 | Alice | 25.0 | New York |
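Two details in the example above are worth knowing. `Age` shows `25.0` because a missing value forces an integer column to float. `fillna("Unknown")` also stores a string in that numeric column, which turns it into an `object` column. The sketch below is one common alternative, not the only approach. It counts missing values with `isna().sum()`, fills each column with a value that suits its type (here, the mean age), and restricts `dropna()` to chosen columns with `subset`.

```python
import pandas as pd

df_with_na = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", None],
        "Age": [25, None, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# isna() marks missing cells True; summing counts them per column
print(df_with_na.isna().sum())

# A dict passed to fillna() gives each column its own fill value
df_filled = df_with_na.fillna({"Name": "Unknown", "Age": df_with_na["Age"].mean()})
print(df_filled)

# subset limits dropna() to the listed columns
df_has_age = df_with_na.dropna(subset=["Age"])
print(df_has_age)
```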


## 9. Grouping and Aggregating Data

You can group data based on certain columns and calculate aggregate values.

**Example**:


```python
# Here is our original DF.
df
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 35  | Chicago     |

```python
# Grouping by 'City' and calculating the average age
grouped_df = df.groupby("City")["Age"].mean()
print("Average age by city:")
grouped_df
```

```
Average age by city:
City
Chicago        35.0
Los Angeles    30.0
New York       25.0
Name: Age, dtype: float64
```
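Each city above has only one row, so each average is simply that person's age. Grouping becomes more useful when groups contain several rows. The sketch below assumes a small hypothetical sales table. It uses `agg()` to compute several statistics per group, then uses named aggregation to choose the output column names.

```python
import pandas as pd

# Hypothetical sales data with repeated cities, so each group has several rows
sales = pd.DataFrame(
    {
        "City": ["New York", "Chicago", "New York", "Chicago", "Chicago"],
        "Amount": [100, 80, 150, 120, 40],
    }
)

# Several statistics per group in one call; group keys are sorted by default
summary = sales.groupby("City")["Amount"].agg(["count", "sum", "mean"])
print(summary)

# Named aggregation: output_column=(input_column, function)
named = sales.groupby("City").agg(total=("Amount", "sum"), orders=("Amount", "count"))
print(named)
```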

## 10. Sorting Data

You can sort the data using the `.sort_values()` method.

**Example**:


```python
# Sorting DataFrame by 'Age' in descending order
sorted_df = df.sort_values(by="Age", ascending=False)
print("DataFrame sorted by 'Age':")
sorted_df
```

```
DataFrame sorted by 'Age':
```

|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 2 | Charlie | 35  | Chicago     |
| 1 | Bob     | 30  | Los Angeles |
| 0 | Alice   | 25  | New York    |
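`sort_values()` also accepts lists, so you can sort by several columns with a separate direction for each. The sketch below assumes hypothetical data with tied ages and breaks the ties alphabetically by name. Because sorting keeps the original index labels, `sort_index()` restores the original row order.

```python
import pandas as pd

# Hypothetical rows with tied ages to show the secondary sort key
people = pd.DataFrame(
    {
        "Name": ["Dana", "Alice", "Bob", "Eve"],
        "Age": [30, 25, 30, 25],
    }
)

# Age descending first, then Name ascending within equal ages
ordered = people.sort_values(by=["Age", "Name"], ascending=[False, True])
print(ordered)

# The index labels travel with the rows, so sort_index() undoes the sort
restored = ordered.sort_index()
print(restored)
```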


## 11. Applying Functions to Data

You can apply a custom function to every value in a column with `Series.apply()`, or to each row or column of a DataFrame with `DataFrame.apply()`.

**Example**:


```python
# Defining a function to categorize people by age
def categorize_age(age):
    if age < 30:
        return "Young"
    else:
        return "Adult"


# Applying the function to the 'Age' column
df["Age Category"] = df["Age"].apply(categorize_age)
print("DataFrame with 'Age Category':")
df
```

```
DataFrame with 'Age Category':
```

|   | Name    | Age | City        | Age Category |
|---|---------|-----|-------------|--------------|
| 0 | Alice   | 25  | New York    | Young        |
| 1 | Bob     | 30  | Los Angeles | Adult        |
| 2 | Charlie | 35  | Chicago     | Adult        |
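The sketch below shows a few variations on the example above. A `lambda` handles short one-off logic. `DataFrame.apply(..., axis=1)` passes each row as a Series, so a function can combine several columns. `numpy.where` gives a vectorized equivalent of `categorize_age` that is usually faster on large data than calling a Python function per value. The `Summary` and `Vectorized Category` column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

# Same rule as categorize_age, written as a lambda
df["Age Category"] = df["Age"].apply(lambda age: "Young" if age < 30 else "Adult")

# axis=1 passes each row as a Series, so several columns can be combined
df["Summary"] = df.apply(lambda row: f"{row['Name']} ({row['City']})", axis=1)

# Vectorized version of the same rule: no per-value Python function call
df["Vectorized Category"] = np.where(df["Age"] < 30, "Young", "Adult")
print(df)
```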


## 12. Loading Files with Pandas

Pandas makes it easy to load data from files, such as CSV and Excel, into a DataFrame that you can then manipulate and analyze.

CSV (Comma-Separated Values) files are one of the most common formats for storing tabular data. In pandas, you can load a CSV file into a DataFrame using the `read_csv()` function.

```python
# Load a CSV file into a DataFrame
# df = pd.read_csv("YOUR PATH")
```

Excel files are a common format for spreadsheets. Pandas can read them using the `read_excel()` function. For `.xlsx` files, pandas relies on the `openpyxl` library, so install it first if it is not already available (for example, with `pip install openpyxl`).

```python
# Load an Excel file into a DataFrame
# df = pd.read_excel("YOUR PATH")
```

If the Excel file has multiple sheets, you can choose one with the `sheet_name` parameter (by default, the first sheet is loaded):

```python
# df = pd.read_excel("YOUR PATH", sheet_name="SHEET NAME")
```
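One common surprise when reading CSV files is an extra `Unnamed: 0` column. By default, `to_csv()` writes the row index as an unlabeled first column, and `read_csv()` then treats it as data. The sketch below writes the sample data to a temporary file and reads it back both ways. The file name `people.csv` is a hypothetical example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")  # hypothetical file name

    # Default to_csv() also writes the index, which reads back as 'Unnamed: 0'
    df.to_csv(path)
    with_index = pd.read_csv(path)
    print(list(with_index.columns))

    # index_col=0 turns that first column back into the index
    restored = pd.read_csv(path, index_col=0)

    # Alternatively, leave the index out when writing
    df.to_csv(path, index=False)
    clean = pd.read_csv(path)
    print(list(clean.columns))
```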

## Conclusion

This pandas crash course covered:

- Pandas Data Structures (Series and DataFrames)

- Loading, inspecting, and selecting data

- Filtering, grouping, and manipulating data

- Handling missing data and applying custom functions

Pandas is a powerful tool for data analysis. Keep practicing with real datasets to improve your skills!
