First: some examples of both basic element-wise arithmetic and more advanced operations using NumPy:

In [8]:
import numpy as np

#arr = np.array(range(1,6,1))
#arr = np.array(range(1,6))
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

In [10]:
# 1. Basic Arithmetic Operations (element-wise): these are performed element by element.
# Add 10 to each element
print(arr + 10)        # [11 12 13 14 15]

# Subtract 2 from each element
print(arr - 2)         # [-1  0  1  2  3]

# Multiply each element by 3
print(arr * 3)         # [ 3  6  9 12 15]

# Divide each element by 2
print(arr / 2)         # [0.5 1.  1.5 2.  2.5]

# Exponentiation (square each element)
print(arr ** 2)        # [ 1  4  9 16 25]


[11 12 13 14 15]
[-1  0  1  2  3]
[ 3  6  9 12 15]
[0.5 1.  1.5 2.  2.5]
[ 1  4  9 16 25]


In [11]:
# 2. More Complicated Operations: these use NumPy universal functions (ufuncs).
# Trigonometric functions
print(np.sin(arr))     # [0.841 0.909 0.141 -0.757 -0.959]

# Exponential
print(np.exp(arr))     # [2.718 7.389 20.085 54.598 148.413]

# Natural logarithm
print(np.log(arr))     # [0.    0.693 1.098 1.386 1.609]

# Square root
print(np.sqrt(arr))    # [1.    1.414 1.732 2.    2.236]
# Note: all of these operations are vectorized: they are applied to every element at once, without writing an explicit loop.


[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
[  2.71828183   7.3890561   20.08553692  54.59815003 148.4131591 ]
[0.         0.69314718 1.09861229 1.38629436 1.60943791]
[1.         1.41421356 1.73205081 2.         2.23606798]
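To make "vectorized" concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent ufunc call. The names `looped` and `vectorized` are illustrative only and do not appear in the cells above.

```python
import math
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Explicit Python loop: one math.sqrt call per element
looped = np.array([math.sqrt(x) for x in arr])

# Vectorized ufunc: a single call handles the whole array
vectorized = np.sqrt(arr)

# Both approaches should agree up to floating-point tolerance
print(np.allclose(looped, vectorized))
```

The ufunc version is shorter and typically much faster on large arrays, because the loop runs in compiled code rather than in the Python interpreter.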


In [17]:
print(np.sin(np.array([0, 90])))  # np.sin expects radians, so 90 here means 90 rad, not 90 degrees

[0.         0.89399666]
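`np.sin` interprets its input as radians, which is why 90 does not give 1 in the output above. If the angles are in degrees, one option is to convert them first with `np.deg2rad`. The variable names below are illustrative.

```python
import numpy as np

angles_deg = np.array([0, 90])        # angles expressed in degrees
angles_rad = np.deg2rad(angles_deg)   # convert to radians before calling np.sin

sines = np.sin(angles_rad)
print(sines)
```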


In [22]:
'''
🟦 1. Unary Operations with Label Preservation
Definition: Unary operations are operations applied to one object at a time (e.g., negation -, np.sin, np.sqrt).

Pandas Twist: When you apply such operations to a Pandas DataFrame or Series, the labels (row and column names) are preserved in the result.
✅ Example:
'''
import pandas as pd

df = pd.DataFrame({
    'A': [0, np.pi/2, np.pi],
    'B': [1, 2, 3]
}, index=['x', 'y', 'z'])

print(df)
'''
          A  B
x  0.000000  1
y  1.570796  2
z  3.141593  3
'''


result = np.sin(df)
print(result)
'''
              A         B
x  0.000000e+00  0.841471
y  1.000000e+00  0.909297
z  1.224647e-16  0.141120

#✅ Notice: The original index (x, y, z) and column names (A, B) are preserved.
#The tiny value 1.224647e-16 is sin(pi), which is mathematically 0; the difference comes from floating-point rounding, so it is zero for practical purposes.
'''
rounded_result = np.round(result, decimals=6)
print(rounded_result)

          A  B
x  0.000000  1
y  1.570796  2
z  3.141593  3
              A         B
x  0.000000e+00  0.841471
y  1.000000e+00  0.909297
z  1.224647e-16  0.141120
     A         B
x  0.0  0.841471
y  1.0  0.909297
z  0.0  0.141120
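The same label preservation applies to a Series and to other unary operations. A small sketch, using a made-up Series `s` that is not part of the cells above:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])

roots = np.sqrt(s)   # NumPy ufunc applied to a Series
negated = -s         # unary minus

print(roots)
print(negated)

# The result is still a Series carrying the original index
print(type(roots).__name__, roots.index.equals(s.index))
```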


In [None]:
'''
🟦 2. Binary Operations with Index Alignment
Definition: Binary operations involve two operands (e.g., +, -, *, /).

Pandas Twist: When you perform a binary operation between two Pandas objects (Series or DataFrames), Pandas aligns the data by label (rows and columns). The result covers the union of the labels; any position whose label appears in only one operand is filled with NaN.

✅ Example (Series with different indices):
'''
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

result = s1 + s2
print(result)
#✅ Notice: Only matching indices (b, c) were added. Others got NaN.


#✅ Example (DataFrames with mismatched indices):
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['y', 'z'])

print(df1 + df2)
#✅ Index alignment makes combining data from different sources safer and more intuitive than raw NumPy arrays.

a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64
      A
x   NaN
y  12.0
z   NaN
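To see what the alignment step produces before the addition happens, `Series.align` can be called explicitly. This is only a sketch for inspection; the names `left` and `right` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# align() returns both Series reindexed to the union of their labels
left, right = s1.align(s2)
print(left)
print(right)

# Adding the aligned pieces gives the same result as s1 + s2
same = (left + right).equals(s1 + s2)
print(same)
```

Labels that exist on only one side show up as NaN in the other aligned Series, which is why the sum is NaN there.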


In [26]:
'''
🟦 3. Series vs. DataFrame Operations
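Definition: A Series is 1D; a DataFrame is 2D. When combining them with an arithmetic operator, Pandas by default matches the Series index against the DataFrame's column labels and applies the operation to every row. To match against the row index instead, use a method such as add() or subtract() with axis=0.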

✅ Example (Row-wise broadcast):
'''
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

s = pd.Series([10, 20], index=['A', 'B'])

print(df + s)
print(s)
print(df)


    A   B
0  11  24
1  12  25
2  13  26
A    10
B    20
dtype: int64
   A  B
0  1  4
1  2  5
2  3  6
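For contrast, here is a minimal sketch of both directions: the default, which matches the Series against the column labels, and `axis=0`, which matches it against the row labels. The second Series (100, 200, 300) is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default: the Series index (A, B) is matched against the column labels
by_columns = df + pd.Series([10, 20], index=['A', 'B'])

# axis=0: the Series index (0, 1, 2) is matched against the row labels
by_rows = df.add(pd.Series([100, 200, 300], index=[0, 1, 2]), axis=0)

print(by_columns)
print(by_rows)
```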


In [31]:
'''
Combining data can be done with NumPy arrays, but:
    - It requires manual alignment (you must make sure the shapes and data match up correctly).
    - There’s no concept of labeled indices or columns in NumPy arrays.
    - It’s error-prone when working with real-world messy data (e.g., from multiple CSVs or APIs).
'''
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Just adds elements by position
print(a + b)  # [5 7 9]
#But what if b had its elements in a different order? NumPy has no labels to check, so it simply matches elements by position.



#⚠️ Example: NumPy doesn't align by meaning, just by position
# Let's say we have some data for three people: Alice, Bob, and Charlie
# But these arrays are not in the same order!

# Array 1: Heights (cm)
heights = np.array([160, 170, 180])  # Alice, Bob, Charlie

# Array 2: Weights (kg), but in the order: Charlie, Alice, Bob
weights = np.array([70, 50, 60])     # Charlie, Alice, Bob

# Adding or doing any operation blindly:
bmi_like = weights / heights
print(bmi_like)
#Output (misaligned data): [0.4375     0.29411765 0.33333333]
'''
You get some result — but it’s meaningless because:
    -70 (Charlie’s weight) was divided by 160 (Alice’s height),
    -50 (Alice’s weight) was divided by 170 (Bob’s height),
    -60 (Bob’s weight) was divided by 180 (Charlie’s height).
So the math ran fine… but the logic is broken.
'''
#✅ The Fix with Pandas (alignment by name)
heights = pd.Series([160, 170, 180], index=['Alice', 'Bob', 'Charlie'])
weights = pd.Series([70, 50, 60], index=['Charlie', 'Alice', 'Bob'])

bmi_like = weights / heights
print(bmi_like)

[5 7 9]
[0.4375     0.29411765 0.33333333]
Alice      0.312500
Bob        0.352941
Charlie    0.388889
dtype: float64
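If plain NumPy arrays are needed afterwards, one possible approach is to reorder one Series to the other's labels with `reindex` and only then extract the values. The names `weights_aligned` and `ratio` are illustrative.

```python
import numpy as np
import pandas as pd

heights = pd.Series([160, 170, 180], index=['Alice', 'Bob', 'Charlie'])
weights = pd.Series([70, 50, 60], index=['Charlie', 'Alice', 'Bob'])

# Reorder weights so that it follows the label order used by heights
weights_aligned = weights.reindex(heights.index)

# Positions now refer to the same person, so positional arithmetic is meaningful
ratio = weights_aligned.to_numpy() / heights.to_numpy()
print(ratio)
```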


# <b>Chapter 15: Operating on Data in Pandas</b>

## <b>Ufuncs: Index Preservation and Alignment</b>

In [47]:
rng = np.random.default_rng(42)
ser = pd.Series(rng.integers(0, 10, 4))
ser

0    0
1    7
2    6
3    4
dtype: int64

In [127]:
df=pd.DataFrame(rng.integers(0,10,(3,4)), columns=['A','B','C','D'])
df

   A  B  C  D
0  6  4  7  6
1  4  6  9  4
2  1  4  8  2


In [128]:
np.exp(ser)

0       1.000000
1    1096.633158
2     403.428793
3      54.598150
dtype: float64

In [131]:
rounded_result = np.round(np.exp(ser), decimals=3)
rounded_result

0       1.000
1    1096.633
2     403.429
3      54.598
dtype: float64

In [132]:
type(rounded_result)

pandas.core.series.Series

In [133]:
np.pi / 4

0.7853981633974483

In [134]:
np.sin(df * np.pi / 4)

               A              B              C              D
0           -1.0   1.224647e-16     -0.7071068           -1.0
1   1.224647e-16           -1.0      0.7071068   1.224647e-16
2      0.7071068   1.224647e-16  -2.449294e-16            1.0


In [146]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505,
'Florida': 21538187}, name='population')
area
'''
Once a Series has a name, you can build a DataFrame without repeating the keys: pd.concat(..., axis=1) uses the name of each Series as the column name.
Both approaches are shown here for comparison:
'''
#✅ With keys (explicit):
df = pd.DataFrame({
    'area': area,
    'population': population
})
# You manually specify the keys 'area' and 'population', which act as column names.
# print(df)

#✅ Without keys (implicit, using name of Series):
df = pd.concat([area, population], axis=1)
# Since each Series has a name, Pandas uses those as column names:
print(df)
'''
🔁 axis=0 → Stack vertically (row-wise)
    -Adds more rows
    -Series/DataFrames are placed on top of each other

📚 axis=1 → Stack horizontally (column-wise)
    -Adds more columns
    -Series/DataFrames are placed side by side, aligning by index (rows)
'''

                 area  population
Alaska      1723337.0         NaN
Texas        695662.0  29145505.0
California   423967.0  39538223.0
Florida           NaN  21538187.0
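The difference between the two `axis` values is easiest to see from the shapes of the results. A short sketch reusing the same two Series (the names `stacked` and `side_by_side` are illustrative):

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187}, name='population')

# axis=0: one long Series, the second placed below the first (labels may repeat)
stacked = pd.concat([area, population], axis=0)

# axis=1: a DataFrame with one column per Series, rows aligned by label
side_by_side = pd.concat([area, population], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```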



In [147]:
population / area

Alaska              NaN
California    93.257784
Florida             NaN
Texas         41.896072
dtype: float64

In [148]:
area / population

Alaska             NaN
California    0.010723
Florida            NaN
Texas         0.023869
dtype: float64

In [151]:
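# area.index | population.index  # TypeError: unsupported operand type(s) for |: 'str' and 'str'. In recent pandas versions "|" is no longer a set union on Index objects; use area.index.union(population.index) instead.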
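A sketch of the method-based set operations on the two indices, which avoid the `|` operator altogether. The names `all_states` and `shared_states` are illustrative.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187}, name='population')

# Labels present in either Series (this is the index of population / area)
all_states = area.index.union(population.index)

# Labels present in both Series (the rows that are not NaN after division)
shared_states = area.index.intersection(population.index)

print(all_states)
print(shared_states)
```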

In [152]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [153]:
A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
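The other arithmetic operators have method counterparts that accept `fill_value` in the same way, for example `sub` for `-` and `mul` for `*`. A minimal sketch with the same two Series (the result names are illustrative):

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Like A - B, but a label missing on one side is treated as 0
diff = A.sub(B, fill_value=0)

# Like A * B, but a label missing on one side is treated as 1
prod = A.mul(B, fill_value=1)

print(diff)
print(prod)
```

Choosing the fill value to be the neutral element of the operation (0 for addition and subtraction, 1 for multiplication) keeps the unmatched entries unchanged in magnitude.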

In [154]:
d = list('abc')
d

['a', 'b', 'c']

In [156]:
A=pd.DataFrame(rng.integers(0,20,(2,2)), columns=list('AB'))
A

    A   B
0  18  11
1  12   6


In [169]:
s = A.values
print(s)
print(s.shape)
print(s.mean())
type(s) #numpy.ndarray

[[18 11]
 [12  6]]
(2, 2)
11.75


numpy.ndarray

In [158]:
B=pd.DataFrame(rng.integers(0,10,(3,3)), columns=list('BAC'))
B

   B  A  C
0  9  0  2
1  4  4  4
2  1  2  0


In [160]:
B + A

      A     B   C
0  18.0  20.0 NaN
1  16.0  10.0 NaN
2   NaN   NaN NaN


In [162]:
B.add(A, fill_value = 2)

      A     B    C
0  18.0  20.0  4.0
1  16.0  10.0  6.0
2   4.0   3.0  2.0
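The fill value does not have to be a constant such as 2. As a sketch, the mean of all values in `A` (11.75, as computed from `A.values` earlier) could be used instead. The DataFrames below are rebuilt from the printed values rather than from `rng`, so that the example is reproducible.

```python
import pandas as pd

# Same values as the A and B DataFrames printed above
A = pd.DataFrame({'A': [18, 12], 'B': [11, 6]})
B = pd.DataFrame({'B': [9, 4, 1], 'A': [0, 4, 2], 'C': [2, 4, 0]})

fill = A.values.mean()               # mean of every entry in A
result = A.add(B, fill_value=fill)   # entries missing from A are replaced by the mean

print(fill)
print(result)
```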


## <b>Ufuncs: Operations Between DataFrame and Series</b>

In [183]:
# Find the difference between a two-dimensional array and one of its rows:
#A = rng.integers(0, 10, (3, 4))
A = rng.integers(10, size=(3, 4))
# A.shape #(3, 4)
# A.size #12
# type(A) #numpy.ndarray
A

array([[7, 6, 0, 7],
       [3, 7, 5, 8],
       [3, 9, 8, 3]], dtype=int64)

In [184]:
A-A[0]

array([[ 0,  0,  0,  0],
       [-4,  1,  5,  1],
       [-4,  3,  8, -4]], dtype=int64)

In [196]:
df = pd.DataFrame(A, columns=list('QRST'))
df

   Q  R  S  T
0  7  6  0  7
1  3  7  5  8
2  3  9  8  3


In [199]:
df - df.iloc[0]

   Q  R  S  T
0  0  0  0  0
1 -4  1  5  1
2 -4  3  8 -4


In [202]:
df.subtract(df['R'], axis = 0)

   Q  R  S  T
0  1  0 -6  1
1 -4  0 -2  1
2 -6  0 -1 -6
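For comparison, the plain NumPy counterpart of subtracting a column needs an explicit extra axis so that broadcasting works. A sketch using the same values as the array `A` above (`col_diff` is an illustrative name):

```python
import numpy as np

A = np.array([[7, 6, 0, 7],
              [3, 7, 5, 8],
              [3, 9, 8, 3]])

# A[:, 1] has shape (3,), which does not broadcast against (3, 4);
# adding a trailing axis gives shape (3, 1), which does
col_diff = A - A[:, 1, np.newaxis]
print(col_diff)
```

With Pandas the `axis=0` keyword expresses the same intent by label, without reshaping.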


In [203]:
df2 = pd.DataFrame(rng.integers(10, size=(3,3)), columns=list('BNM'))
df2

   B  N  M
0  4  5  7
1  6  8  3
2  5  4  9


In [204]:
df2.subtract(df2['M'], axis = 0)

   B  N  M
0 -3 -2  0
1  3  5  0
2 -4 -5  0


In [217]:
# df2.iloc[1,1] = 'dd'
'''
Inserts a string 'dd' into a DataFrame that was originally all integers.
At this point, Pandas changes the entire column's dtype to object, because mixed types (int + str) cannot coexist in a numeric dtype.
'''
# df2

   B   N  M
0  4   5  7
1  6  dd  3
2  5   4  9
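Arithmetic on an object column that contains a string fails, so it can be useful to check the dtypes and, if the string is just bad data, coerce the column back to numbers. The sketch below assumes that turning the unparseable entry into NaN is acceptable; `pd.to_numeric(..., errors='coerce')` does that. The DataFrame is rebuilt from the printed values.

```python
import pandas as pd

df2 = pd.DataFrame({'B': [4, 6, 5], 'N': [5, 'dd', 4], 'M': [7, 3, 9]})
print(df2.dtypes)   # column N is object because it mixes int and str

# Entries that cannot be parsed as numbers become NaN
df2['N'] = pd.to_numeric(df2['N'], errors='coerce')

# Row-wise subtraction works again; the coerced entry stays NaN
diff = df2 - df2.iloc[0]
print(diff)
```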


In [221]:
# df2 - df2.iloc[0] #Which means: subtract the first row from every row — element-wise subtraction.
'''
❌ The Problem
When subtracting, Python ends up evaluating something like:
    'dd' - 5  # the string in column N minus the integer in the first row
This is not allowed: an integer cannot be subtracted from a string, so this raises:
TypeError: unsupported operand type(s) for -: 'str' and 'int'
'''

In [205]:
df2.subtract(df2['M'], axis = 1)
'''
df2['M'] is a Series of length 3 whose index holds the row labels 0, 1, 2. With axis=1, Pandas matches that index against the column names 'B', 'N', 'M' instead of the row labels.
None of the labels overlap, so the result has the union of both label sets as columns (0, 1, 2, B, M, N) and every entry is NaN.
'''

    0   1   2   B   M   N
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
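A rule of thumb that follows from this: the Series' own labels decide which `axis` makes sense. The sketch below puts the three cases side by side, with `df2` rebuilt from the printed values and illustrative result names.

```python
import pandas as pd

df2 = pd.DataFrame({'B': [4, 6, 5], 'N': [5, 8, 4], 'M': [7, 3, 9]})

# df2['M'] is labelled 0, 1, 2, which matches the row index: use axis=0
by_row_labels = df2.subtract(df2['M'], axis=0)

# df2.iloc[0] is labelled B, N, M, which matches the columns: axis=1 works
by_col_labels = df2.subtract(df2.iloc[0], axis=1)

# Labels that match nothing along the chosen axis give only NaN
mismatched = df2.subtract(df2['M'], axis=1)

print(by_row_labels)
print(by_col_labels)
print(mismatched.isna().all().all())
```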


In [206]:
df2 - df2.iloc[0]

   B  N  M
0  0  0  0
1  2  3 -4
2  1 -1  2


In [208]:
df

   Q  R  S  T
0  7  6  0  7
1  3  7  5  8
2  3  9  8  3


In [213]:
halfrow = df.iloc[0, ::2]
halfrow
# halfrow.shape #(2,)
# type(halfrow) #pandas.core.series.Series

Q    7
S    0
Name: 0, dtype: int64

In [214]:
df - halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -4.0 NaN  5.0 NaN
2 -4.0 NaN  8.0 NaN
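If the NaN columns are not wanted, one possible workaround is to extend `halfrow` to all columns first, treating the missing entries as 0, so that columns R and T are left unchanged. This is a sketch under that assumption; `full_row` and `result` are illustrative names, and `df` is rebuilt from the printed values.

```python
import pandas as pd

df = pd.DataFrame([[7, 6, 0, 7],
                   [3, 7, 5, 8],
                   [3, 9, 8, 3]], columns=list('QRST'))
halfrow = df.iloc[0, ::2]   # labels Q and S only

# Extend halfrow to every column label; labels it lacks get the value 0
full_row = halfrow.reindex(df.columns, fill_value=0)

result = df - full_row
print(result)
```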
