The Goal: Creating New Columns
In Pandas, we create a new column by assigning values to a column name that doesn't exist yet. (If the name already exists, the assignment overwrites that column.)


The Dynamic Duo: NumPy vs. Pandas
NumPy (Numerical Python): It is the "Math Engine." It's designed to handle large arrays of numbers very quickly, using compiled C code under the hood. However, it's a bit "raw": it doesn't have column names or row labels. It just sees a grid of numbers.

Pandas: It is built on top of NumPy. It takes those fast NumPy arrays and adds "metadata": column names (like rating or title) and row indices.

Analogy: NumPy is like the raw engine and wheels of a car. Pandas is the dashboard, the steering wheel, and the leather seats that make it easy for a human to drive.
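
To make the contrast concrete, here is a minimal sketch that holds the same numbers first as a bare NumPy array and then as a DataFrame. The variable names and the choice of game titles as row labels are assumptions made for illustration; the values are the first three rating and units_sold_m entries from the games table.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: a grid of numbers addressed only by position.
ratings_and_sales = np.array([[9.5, 15.5],
                              [8.0, 8.2],
                              [7.5, 5.4]])
first_rating = ratings_and_sales[0, 0]  # row 0, column 0

# The same numbers wrapped in a DataFrame with column names and row labels.
games = pd.DataFrame(ratings_and_sales,
                     columns=["rating", "units_sold_m"],
                     index=["The Legend of Data", "Python Quest", "SQL Arena"])
same_rating = games.loc["The Legend of Data", "rating"]  # look up by label

# The DataFrame can hand its values back as a NumPy array.
back_to_numpy = games.to_numpy()
```

With NumPy you have to remember that column 0 means rating; with Pandas you ask for "rating" by name.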

Why did I bring up NumPy for Day 3?
Because when you do "vectorized" operations in Pandas (like df['rating'] > 8), Pandas hands the whole column down to NumPy, which runs the comparison on every row at once in compiled code instead of in a slow Python loop.
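
As a rough sketch of what "vectorized" means, the snippet below compares the rating values from the games table against 8 in one expression, then does the same thing with an explicit loop. The Series is built by hand here (an assumption, so the example runs without games.csv), and the variable names are illustrative.

```python
import pandas as pd

# The six rating values from the games table, typed in by hand.
ratings = pd.Series([9.5, 8.0, 7.5, 9.0, 6.5, 8.5], name="rating")

# Vectorized: one expression compares every element at once.
high_vectorized = ratings > 8

# Equivalent explicit loop, shown only for comparison.
high_loop = pd.Series([value > 8 for value in ratings], name="rating")

# Both approaches give the same True/False answers.
same_answer = high_vectorized.equals(high_loop)

# The boolean result can be pulled out as a NumPy array.
mask_array = high_vectorized.to_numpy()
```

The result is a boolean Series with one True/False per row. On six rows the speed difference is invisible, but on millions of rows the vectorized form is typically much faster.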

In [1]:
import pandas as pd
df = pd.read_csv("games.csv")
df

Unnamed: 0,game_id,title,genre,release_year,units_sold_m,rating
0,1,The Legend of Data,Adventure,2022,15.5,9.5
1,2,Python Quest,RPG,2021,8.2,8.0
2,3,SQL Arena,Strategy,2023,5.4,7.5
3,4,Pandas Paradise,Simulation,2022,12.1,9.0
4,5,Matrix Reloaded,Action,2020,3.3,6.5
5,6,Loop Hero,Indie,2021,2.1,8.5


In [3]:
# Create a new column: each units_sold_m value multiplied by 40
df['revenue'] = df['units_sold_m'] * 40
df

Unnamed: 0,game_id,title,genre,release_year,units_sold_m,rating,revenue
0,1,The Legend of Data,Adventure,2022,15.5,9.5,620.0
1,2,Python Quest,RPG,2021,8.2,8.0,328.0
2,3,SQL Arena,Strategy,2023,5.4,7.5,216.0
3,4,Pandas Paradise,Simulation,2022,12.1,9.0,484.0
4,5,Matrix Reloaded,Action,2020,3.3,6.5,132.0
5,6,Loop Hero,Indie,2021,2.1,8.5,84.0


2. If-Else Logic (The "Right" Way)
What if you want to label games as "Bestseller" if they sold more than 10M units, and "Standard" if they didn't?

Instead of a loop, we use np.where. It works like this: np.where(condition, value_if_true, value_if_false)

In [None]:
import numpy as np
df['category'] = np.where(df['units_sold_m'] > 10, 'Bestseller', 'Standard')
df

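Note: writing pd.where(...) here instead of np.where(...) fails with "AttributeError: module 'pandas' has no attribute 'where'". The function lives in NumPy, so it has to be called as np.where after import numpy as np.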
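
The np.where pattern above can be checked on a small stand-alone frame. This is a minimal sketch: the DataFrame is typed in by hand from the title and units_sold_m values shown earlier (an assumption, so it runs without games.csv), and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for games.csv, using the title and units_sold_m values shown earlier.
games = pd.DataFrame({
    "title": ["The Legend of Data", "Python Quest", "SQL Arena",
              "Pandas Paradise", "Matrix Reloaded", "Loop Hero"],
    "units_sold_m": [15.5, 8.2, 5.4, 12.1, 3.3, 2.1],
})

# np.where(condition, value_if_true, value_if_false), applied to the whole column.
games["category"] = np.where(games["units_sold_m"] > 10, "Bestseller", "Standard")

# The pandas module itself has no top-level where function.
pandas_has_where = hasattr(pd, "where")
```

With these values, only The Legend of Data (15.5) and Pandas Paradise (12.1) come out as "Bestseller"; the other four games are labeled "Standard".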