# Pandas Mastery: From Zero to Hero

## Course Overview
This notebook is a beginner-friendly introduction to **Pandas**, one of the most widely used data analysis libraries in Python.

We will cover:
1.  **Introduction**: What is Pandas and why do we use it?
2.  **Setup**: Installing and importing.
3.  **The Building Blocks**: Series (1D data) and DataFrames (2D data).
4.  **Working with Files**: Reading CSVs.
5.  **Data Analysis**: Inspecting your data with `head`, `tail`, and `info`.

---

## 1. What is Pandas?

**Pandas** is a Python library used for working with data sets. It allows us to analyze, clean, explore, and manipulate data.

### Why is it called "Pandas"?
The name is a play on the phrase "**Pan**el **Da**ta" (an econometrics term for data sets that track the same subjects over multiple time periods) and "Python Data Analysis".

### Why do we need it?
* **Excel for Python**: Think of Pandas as a programmable version of Excel.
* **Data Cleaning**: Real-world data is messy. Pandas helps us fix missing values and wrong formats.
* **Analysis**: It can quickly calculate averages, max/min values, and correlations.

**Key Concept:** If you want to work with data tables in Python, you use Pandas.

## 2. Getting Started

### Installation
If you haven't installed it yet, you would typically run this in your terminal:
```bash
pip install pandas
```

### Import with Alias
We need to import the library to use it. The standard convention is to import pandas as `pd`.

* **Alias**: An alias is a nickname. Instead of typing `pandas.DataFrame`, we can just type `pd.DataFrame`.
* **Convention**: Almost every Python programmer uses `pd`, so you should too!

In [1]:
# Import pandas with the standard alias
import pandas as pd

# Let's check the version to make sure it loaded correctly
print("Pandas Version:", pd.__version__)

Pandas Version: 2.3.3


## 3. The Core Object: Pandas Series

### What is a Series?
A **Series** is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, etc.).

**Visualizing it:**
Think of a Series as **a single column** in an Excel sheet.

### Creating a Series
You can create a series easily from a Python list.

In [2]:
# A simple list of numbers
numbers = [10, 20, 30]

# Converting the list into a Pandas Series
my_series = pd.Series(numbers)

print(my_series)

0    10
1    20
2    30
dtype: int64


### Understanding the Output
When you printed `my_series` above, you saw two columns:
1.  **Left Column (0, 1, 2)**: This is the **Index**. It acts as the label or address for the data.
2.  **Right Column (10, 20, 30)**: This is your actual data.
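The two parts of a Series can also be inspected separately. The sketch below assumes the same three-number list used above; it reads the labels through `.index` and the data through `.to_numpy()`:

```python
import pandas as pd

my_series = pd.Series([10, 20, 30])

# The index holds the labels; by default it counts up from 0
labels = list(my_series.index)

# The values are the data itself, returned here as a NumPy array
values = my_series.to_numpy()

print(labels)  # [0, 1, 2]
print(values)  # [10 20 30]
```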

### Custom Index (Labels)
By default, the index starts at 0. But we can give our data names (labels) using the `index` argument.

**Why do this?** It makes accessing data easier. Instead of asking for "item 0", you can ask for "item x".

In [3]:
data = [10, 20, 30]

# Create series with custom labels x, y, and z
labeled_series = pd.Series(data, index=["x", "y", "z"])

print(labeled_series)

# Accessing data using the label
print("Value at label 'y':", labeled_series["y"])

x    10
y    20
z    30
dtype: int64
Value at label 'y': 20
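Labels and positions are two different ways to reach the same value. As a minimal sketch, assuming the same labeled Series as above, `.loc` looks a value up by label and `.iloc` looks it up by integer position:

```python
import pandas as pd

labeled_series = pd.Series([10, 20, 30], index=["x", "y", "z"])

by_label = labeled_series.loc["y"]    # look up by label
by_position = labeled_series.iloc[1]  # look up by 0-based position

print(by_label, by_position)  # 20 20
```

Being explicit avoids ambiguity: on a Series with custom labels, a bare integer such as `labeled_series[1]` is deprecated in recent Pandas versions and triggers a warning.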


### Key/Value Objects (Dictionaries)
If you have a Python dictionary, you can turn it directly into a Series. The dictionary **keys** automatically become the Series **index**.

In [4]:
calories_dict = {"Day1": 420, "Day2": 380, "Day3": 390}

diet_series = pd.Series(calories_dict)
print(diet_series)

Day1    420
Day2    380
Day3    390
dtype: int64
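If only part of the dictionary is needed, the `index` argument can pick out specific keys. A small sketch, reusing the same `calories_dict` (the name `first_two_days` is just for illustration):

```python
import pandas as pd

calories_dict = {"Day1": 420, "Day2": 380, "Day3": 390}

# Only the keys listed in `index` are kept, in the order given
first_two_days = pd.Series(calories_dict, index=["Day1", "Day2"])

print(first_two_days)
```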


--- 
### 📝 Exercise 1: Series
**Task:**
1. Create a list of 3 items representing prices: `[5.50, 10.00, 2.50]`.
2. Create a Pandas Series from this list.
3. Give it an index of `["Burger", "Pizza", "Soda"]`.
4. Print the price of the "Pizza".

In [7]:
# --- WRITE YOUR CODE BELOW THIS LINE ---
prices = [5.50, 10.00, 2.50]
my_series = pd.Series(prices, index=["Burger", "Pizza", "Soda"])
# Look the price up by its label rather than by position
print(my_series["Pizza"])


10.0


## 4. The Main Event: Pandas DataFrames

### What is a DataFrame?
A **DataFrame** is a 2-dimensional data structure. It is essentially a table with rows and columns.

* **Series** = One single column.
* **DataFrame** = A collection of Series (the whole table).

### Creating a DataFrame
A common way to create a DataFrame by hand is from a dictionary of lists: each key becomes a column name, and each list becomes that column's values.

In [8]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

print(df)

   calories  duration
0       420        50
1       380        40
2       390        45
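To see that a DataFrame really is a collection of Series, a single column can be pulled out and inspected. This sketch rebuilds the same small table so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Selecting one column with square brackets returns a Series
calories = df["calories"]

print(type(calories).__name__)  # Series
print(calories.tolist())        # [420, 380, 390]
print(df.shape)                 # (3, 2): 3 rows, 2 columns
```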


### Locating Rows (`loc`)
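To access a row, we use the `.loc[]` indexer, short for **location**. It looks rows up by their index **label**. With the default index the labels are 0, 1, 2, and so on, so they happen to match the row positions.

1.  **Single Row:** `df.loc[0]` returns the row labeled `0` (here, the first row) as a Series.
2.  **Multiple Rows:** `df.loc[[0, 1]]` takes a list of labels and returns a new DataFrame containing only those rows.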

In [9]:
# Return row 0
print("--- Row 0 ---")
print(df.loc[0])

# Return row 0 and 1 using a list of indexes
print("\n--- Rows 0 and 1 ---")
print(df.loc[[0, 1]])

--- Row 0 ---
calories    420
duration     50
Name: 0, dtype: int64

--- Rows 0 and 1 ---
   calories  duration
0       420        50
1       380        40
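`.loc[]` can also take a column label after the row label. The following is a minimal sketch on the same table; the variable names `cell` and `subset` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Row label first, column label second: a single cell
cell = df.loc[0, "calories"]

# A list of row labels plus a list of column labels: a smaller DataFrame
subset = df.loc[[0, 2], ["duration"]]

print(cell)  # 420
print(subset)
```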


### Named Indexes
Just like with Series, we can name the rows in our DataFrame so we don't have to use numbers like 0, 1, 2.

In [10]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Assigning custom labels for the index
df = pd.DataFrame(data, index = ["Monday", "Tuesday", "Wednesday"])

print(df)

# Now we can find data using the name
print("\nData for Monday:")
print(df.loc["Monday"])

           calories  duration
Monday          420        50
Tuesday         380        40
Wednesday       390        45

Data for Monday:
calories    420
duration     50
Name: Monday, dtype: int64
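Named indexes also work with slices. One detail worth knowing, shown in the sketch below, is that a label slice with `.loc` includes both endpoints, while `.iloc` keeps working by position regardless of the labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# Label slices with .loc include BOTH endpoints
first_two = df.loc["Monday":"Tuesday"]

# .iloc selects by position, so -1 is the last row
last_row = df.iloc[-1]

print(first_two)
print(last_row["calories"])  # 390
```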


--- 
### 📝 Exercise 2: DataFrames
**Task:**
1. Create a dictionary `student_data` with keys `"Name"` (values: "Alice", "Bob") and `"Age"` (values: 24, 27).
2. Convert it into a DataFrame.
3. Use `.loc[0]` to print the details of the first student.

In [17]:
# --- WRITE YOUR CODE BELOW THIS LINE ---
student_data = {
    "Name": ["Alice", "Bob"],
    "Age": [24, 27]
}
df = pd.DataFrame(student_data)
print(df.loc[0])

Name    Alice
Age        24
Name: 0, dtype: object


## 5. Working with CSV Files

In the real world, you will rarely type data in by hand. You will load it from files. One of the most common formats is **CSV** (Comma-Separated Values).

### 5.1 Creating a Practice File
First, run the code below. It will create a dummy file named `gym_data.csv` so we have something to practice with.

In [13]:
# This creates a file named 'gym_data.csv' in your current folder
csv_text = """
Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
45,109,175,282.4
45,117,148,406.0
60,102,127,300.0
"""

with open("gym_data.csv", "w") as f:
    f.write(csv_text.strip())

print("File 'gym_data.csv' created successfully.")

File 'gym_data.csv' created successfully.


### 5.2 Reading the CSV
Use `pd.read_csv()` to load the file into a DataFrame.

In [14]:
df = pd.read_csv('gym_data.csv')

print(df)

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.0
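`read_csv` accepts optional arguments that limit what gets loaded. The sketch below uses `usecols` and `nrows`; to stay self-contained it reads from an in-memory string (via `io.StringIO`) with the same layout as `gym_data.csv`, rather than from the file itself:

```python
import io
import pandas as pd

# Same layout as gym_data.csv, held in memory so the sketch runs on its own
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
"""

# usecols keeps only the named columns; nrows stops after that many data rows
small = pd.read_csv(io.StringIO(csv_text), usecols=["Duration", "Calories"], nrows=2)

print(small)
```

With a real file, the first argument would simply be the path, for example `'gym_data.csv'`.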


### 5.3 Displaying Large DataFrames
If a DataFrame is huge, Pandas will truncate the middle rows and only show the top and bottom. 

* **`to_string()`**: Returns the *entire* DataFrame as a string, so printing it shows every row (be careful if the file is massive!).
* **`pd.options.display.max_rows`**: This setting controls the threshold. If your DataFrame has more rows than this number, it gets truncated.

In [15]:
# Force print everything
print(df.to_string())

# Check the current display threshold
print("Max rows limit:", pd.options.display.max_rows)

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.0
Max rows limit: 60
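Our practice file has only 6 rows, so it never hits the limit. To see truncation in action, the threshold can be lowered temporarily. This is a sketch using `pd.option_context` on a made-up 10-row DataFrame (the `value` column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# to_string() always includes every row
full_text = df.to_string()

# Temporarily lower the threshold; the old setting is restored after the block
with pd.option_context("display.max_rows", 4):
    truncated_text = repr(df)

print(truncated_text)
print(len(full_text.splitlines()), "lines in full,",
      len(truncated_text.splitlines()), "lines when truncated")
```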


## 6. Analyzing Data

Once the data is loaded, we need to inspect it to understand what we are working with.

### The `head()` Method
This is one of the most frequently used methods in Pandas. It shows the **top** rows.
* `df.head()` -> Shows the first 5 rows (default).
* `df.head(10)` -> Shows the first 10 rows.

In [18]:
df = pd.read_csv('gym_data.csv')

print("--- Top 2 Rows ---")
print(df.head(2))

--- Top 2 Rows ---
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0


### The `tail()` Method
Just like `head()`, but it shows the **bottom** (last) rows. The default is also 5 rows.

In [19]:
print("--- Bottom 2 Rows ---")
print(df.tail(2))

--- Bottom 2 Rows ---
   Duration  Pulse  Maxpulse  Calories
4        45    117       148     406.0
5        60    102       127     300.0
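Both methods return a new, smaller DataFrame and keep the original row labels, which is why the output above is labeled 4 and 5. A short sketch, using only the `Duration` column from the practice data:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 60, 60, 45, 45, 60]})

top = df.head(2)
bottom = df.tail(2)

# The original row labels are preserved
print(list(top.index))     # [0, 1]
print(list(bottom.index))  # [4, 5]

# Asking for more rows than exist simply returns everything
print(len(df.head(100)))   # 6
```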


### The `info()` Method
`df.info()` is essentially a dashboard for your data. It tells you:
1.  **Entries**: How many rows?
2.  **Columns**: What are the column names?
3.  **Non-Null**: Are any values missing? (This is critical for data cleaning).
4.  **Dtype**: What is the data type? (`int64` for whole numbers, `float64` for decimals, `object` for text, etc.).

In [20]:
# info() prints its report directly and returns None, so no print() is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  6 non-null      int64  
 1   Pulse     6 non-null      int64  
 2   Maxpulse  6 non-null      int64  
 3   Calories  6 non-null      float64
dtypes: float64(1), int64(3)
memory usage: 324.0 bytes
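The practice file has no gaps, so every column reports 6 non-null values. For a direct count of missing values, `isna().sum()` is a common companion to `info()`. The sketch below uses hypothetical data with one missing `Calories` entry:

```python
import pandas as pd

# Hypothetical data: one Calories value is missing (None becomes NaN)
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Calories": [409.1, None, 300.0],
})

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = df.isna().sum()

print(missing_per_column)
print(df.dtypes)
```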


--- 
### 📝 Exercise 3: Analysis
**Task:**
1. Load `gym_data.csv` into a variable called `my_data`.
2. Print the first 3 rows using `head()`.
3. Run `info()` to see if there are any null values.

In [22]:
# --- WRITE YOUR CODE BELOW THIS LINE ---
my_data = pd.read_csv('gym_data.csv')
print(my_data.head(3))
# info() prints its report directly, so it is called without print()
my_data.info()

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  6 non-null      int64  
 1   Pulse     6 non-null      int64  
 2   Maxpulse  6 non-null      int64  
 3   Calories  6 non-null      float64
dtypes: float64(1), int64(3)
memory usage: 324.0 bytes


## 7. Course Summary

Great job! You now have a solid foundation in Pandas.

**Recap:**
1.  **Setup**: `import pandas as pd`.
2.  **Series**: 1D data (a column) with an index.
3.  **DataFrame**: 2D data (a table) made of Series.
4.  **CSV**: Use `read_csv` to get data in, and `to_string` to see it all.
5.  **Inspection**: Use `head()` and `tail()` for a quick look, and `info()` for a summary of rows, missing values, and data types.

**Next Steps:** Try creating your own CSV file in Excel, save it, and try to open it here using `read_csv`!