#  Data Science Learning Journey  
*Curiosity to Capability — One Notebook at a Time*

---
Compiled and authored by **Partho Sarothi Das**   
	Dhaka, Bangladesh  
	Bachelor's & Master's in Statistics  
	Investment Banking Professional → Aspiring Data Scientist 
    
---

In [2]:
import pandas as pd
import numpy as np

# Vectorized operations

### Definition:

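Instead of processing one element at a time in a Python loop, a vectorized operation applies a function to an entire array or Series in a single call. The element-wise work runs in optimized, compiled code (often C or Cython under the hood), so it is usually much faster than an explicit loop.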

### Example (Python list vs pandas Series):

In [6]:
# Vectorized way (pandas/NumPy):

s = pd.Series([1, 2, 3, 4, 5])
result = s + 10
print(result)

0    11
1    12
2    13
3    14
4    15
dtype: int64


In [7]:
# Non-vectorized way (using loop):

lst = [1, 2, 3, 4, 5]
result = []
for i in lst:
    result.append(i + 10)
print(result)

[11, 12, 13, 14, 15]
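The two approaches above can also be compared on a larger input. The following is a minimal sketch, not a rigorous benchmark: the helper names `add_ten_loop` and `add_ten_vectorized` and the input size are illustrative assumptions, and the measured times depend on the machine, so no timing output is shown.

```python
import timeit

import numpy as np

# Illustrative input: 100,000 integers as a list and as a NumPy array
values = list(range(100_000))
arr = np.array(values)


def add_ten_loop(items):
    """Non-vectorized: add 10 to each element in a Python-level loop."""
    result = []
    for item in items:
        result.append(item + 10)
    return result


def add_ten_vectorized(array):
    """Vectorized: add 10 to every element in one NumPy operation."""
    return array + 10


# Run each version 20 times and report the total elapsed time
loop_seconds = timeit.timeit(lambda: add_ten_loop(values), number=20)
vec_seconds = timeit.timeit(lambda: add_ten_vectorized(arr), number=20)

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
```

Both functions produce the same values; only the way the work is carried out differs.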


### Why Are Vectorized Operations Useful?

| Feature       | Benefit                                  |
| ------------- | ---------------------------------------- |
|  **Speed**       | Usually much faster on large data (element-wise work runs in compiled C code) |
|  **Simplicity**  | Cleaner and more readable code           |
|  **Performance** | Avoids the per-element overhead of Python-level loops |
|  **Reliability** | Missing values (`NaN`) propagate through arithmetic instead of raising errors |
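The point about missing values can be illustrated with a small sketch. The Series name `prices` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series with one missing value
prices = pd.Series([100.0, np.nan, 300.0])

# Arithmetic is applied element-wise; the missing entry stays NaN
increased = prices + 10
print(increased)

# Reductions such as sum() skip NaN by default (skipna=True)
total = prices.sum()
print(total)
```

No error is raised for the missing entry: it stays `NaN` after the addition, and `sum()` ignores it.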


### Examples: Add/Subtract/Multiply:

In [10]:
df = pd.DataFrame({
    'fruits': ['apple','banana','orange'],
    'price':[400,100,300]
})
df

   fruits  price
0   apple    400
1  banana    100
2  orange    300


In [11]:
# Add
df['price'] = df['price'] + 10
df

   fruits  price
0   apple    410
1  banana    110
2  orange    310


In [12]:
# Subtract
df['price'] = df['price'] - 10
df

   fruits  price
0   apple    400
1  banana    100
2  orange    300


In [13]:
# Multiply

df['price'] = df['price'] * 1.1 # Increase all prices by 10%
df

   fruits  price
0   apple  440.0
1  banana  110.0
2  orange  330.0


### Examples: String operations

In [15]:
df['fruits'] = df['fruits'].str.upper()
df

   fruits  price
0   APPLE  440.0
1  BANANA  110.0
2  ORANGE  330.0


### Condition checking:

In [17]:
s = pd.Series([5, 10, 15])
s > 7

0    False
1     True
2     True
dtype: bool
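A boolean result like the one above is typically used as a mask. The sketch below shows two common uses: filtering with the mask and labelling with `np.where`. The label strings `'high'` and `'low'` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 10, 15])

# Element-wise comparison gives a boolean Series
mask = s > 7

# Boolean indexing keeps only the rows where the mask is True
filtered = s[mask]
print(filtered)

# np.where picks one of two values per element, based on the mask
labels = np.where(mask, 'high', 'low')
print(labels)
```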

# String operations

### Accessing ---> .str <--- Methods

In pandas, string operations are performed through the `.str` accessor, which exposes vectorized string methods on a Series of string values. Missing values pass through these methods as missing instead of raising errors, which makes them convenient for text processing, data cleaning, and feature engineering.
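Because each `.str` method returns a new Series, calls can be chained. A minimal sketch of a cleaning step, using a made-up Series named `raw`:

```python
import pandas as pd

# Messy text values, including one missing entry
raw = pd.Series(['  Apple ', 'BANANA', ' cherry', None])

# Strip surrounding whitespace, then convert to lowercase
clean = raw.str.strip().str.lower()
print(clean)
```

The missing entry stays missing through both steps.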

### Convert to lowercase ---> str.lower()

In [21]:
s = pd.Series(['apple', 'banana', 'cherry', 'Banana', None])

s.str.lower()

0     apple
1    banana
2    cherry
3    banana
4      None
dtype: object

### Convert to uppercase ---> str.upper()

In [23]:
s.str.upper()

0     APPLE
1    BANANA
2    CHERRY
3    BANANA
4      None
dtype: object

### Capitalize first letter ---> str.capitalize()

In [25]:
s.str.capitalize()

0     Apple
1    Banana
2    Cherry
3    Banana
4      None
dtype: object

### Strip whitespace ---> str.strip()

In [27]:
s = pd.Series([' apple', 'banana  ', 'cherry', '  Banana', None])
s.str.strip()

0     apple
1    banana
2    cherry
3    Banana
4      None
dtype: object

### Length of strings ---> str.len()

In [29]:
s = pd.Series(['apple', 'banana', 'cherry', 'Banana', None])
s.str.len()

0    5.0
1    6.0
2    6.0
3    6.0
4    NaN
dtype: float64

### Replace substrings ---> str.replace(pat, repl)

In [31]:
s.str.replace('a', '@')

0     @pple
1    b@n@n@
2    cherry
3    B@n@n@
4      None
dtype: object
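`str.replace` can also treat the pattern as a regular expression when `regex=True` is passed. The sketch below masks every lowercase vowel; the replacement character `'*'` is an arbitrary choice.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'Banana', None])

# With regex=True the pattern is a regular expression:
# [aeiou] matches any single lowercase vowel
masked = s.str.replace(r'[aeiou]', '*', regex=True)
print(masked)
```

Passing `regex=True` or `regex=False` explicitly is a safe habit, since the default has changed between pandas versions.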

### Contains substring ---> str.contains(pat)

In [33]:
s.str.contains('a')

0     True
1     True
2    False
3     True
4     None
dtype: object
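The missing entry in the result above means it cannot be used directly as a filter. One option, sketched here, is the `na` parameter of `str.contains`, which sets the result for missing values.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'Banana', None])

# na=False treats missing values as "does not contain"
mask = s.str.contains('a', na=False)
print(mask)

# The mask can now be used for boolean indexing
with_a = s[mask]
print(with_a)
```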

### Starts with ---> str.startswith(pat)

In [35]:
df = pd.DataFrame({
    'fruits': ['apple','banana','orange'],
    'price':[400,100,300]
})

df['fruits'].str.startswith('b')

0    False
1     True
2    False
Name: fruits, dtype: bool

### Ends with ---> str.endswith(pat)

In [37]:
df['fruits'].str.endswith('e')

0     True
1    False
2     True
Name: fruits, dtype: bool

### Find substring index ---> str.find(sub)

Returns the lowest index at which the substring occurs, or -1 if it is not found.

In [39]:
df['fruits'].str.find('p')

0    1
1   -1
2   -1
Name: fruits, dtype: int64

### Get character at position ---> str.get(i)

In [41]:
df['fruits'].str.get(0)

0    a
1    b
2    o
Name: fruits, dtype: object

### Repeat strings ---> str.repeat(repeats)

In [43]:
s = pd.Series(['apple', 'banana', 'cherry', 'Banana', None])
s.str.repeat(2)

0      appleapple
1    bananabanana
2    cherrycherry
3    BananaBanana
4            None
dtype: object

### Pad strings ---> str.pad(width, side='left', fillchar=' ')

In [45]:
s.str.pad(10).to_list()

['     apple', '    banana', '    cherry', '    Banana', None]
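The `side` and `fillchar` parameters from the signature above control where the padding goes and which character is used. A small sketch with a made-up Series of codes:

```python
import pandas as pd

codes = pd.Series(['7', '42', '305'])

# Pad on the left with zeros up to a total width of 5
left = codes.str.pad(5, side='left', fillchar='0')
print(left.to_list())

# Pad on the right with dots up to a total width of 5
right = codes.str.pad(5, side='right', fillchar='.')
print(right.to_list())
```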

### Split strings ---> str.split(sep)

In [47]:
s.str.split(',')

0     [apple]
1    [banana]
2    [cherry]
3    [Banana]
4        None
dtype: object
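The strings above contain no commas, so each one comes back as a single-element list. The sketch below uses made-up comma-separated values to show an actual split, and `expand=True` to spread the pieces into columns. The column names `fruit` and `color` are illustrative.

```python
import pandas as pd

rows = pd.Series(['apple,red', 'banana,yellow', 'cherry,red'])

# Default: each element becomes a list of pieces
parts = rows.str.split(',')
print(parts)

# expand=True returns a DataFrame with one column per piece
table = rows.str.split(',', expand=True)
table.columns = ['fruit', 'color']
print(table)
```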

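### Join elements with a separator ---> str.join(sep)

When the elements are plain strings, as in the examples here, the separator is inserted between the characters of each string.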

In [49]:
s.str.join('-')

0      a-p-p-l-e
1    b-a-n-a-n-a
2    c-h-e-r-r-y
3    B-a-n-a-n-a
4           None
dtype: object

In [50]:
df['fruits'].str.join('-')

0      a-p-p-l-e
1    b-a-n-a-n-a
2    o-r-a-n-g-e
Name: fruits, dtype: object
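`str.join` is more commonly applied to a Series whose elements are lists of strings. A minimal sketch with a made-up Series named `tags`:

```python
import pandas as pd

# Each element is a list of strings
tags = pd.Series([['red', 'sweet'], ['yellow', 'soft'], ['orange']])

# Join the items of each list with ', ' as the separator
joined = tags.str.join(', ')
print(joined)
```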

### Extract with regex ---> str.extract(pattern)

In [52]:
emails = pd.Series(['abc@gmail.com', 'xyz@hotmail.com', 'test@yahoo.com'])

emails.str.extract(r'@(\w+)\.com')
# Output: DataFrame with email providers

         0
0    gmail
1  hotmail
2    yahoo
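With named groups, `str.extract` uses the group names as column labels instead of 0, 1, and so on. The group names `user` and `provider` below are chosen for illustration.

```python
import pandas as pd

emails = pd.Series(['abc@gmail.com', 'xyz@hotmail.com', 'test@yahoo.com'])

# (?P<name>...) defines a named capture group; each group becomes a column
parts = emails.str.extract(r'(?P<user>\w+)@(?P<provider>\w+)\.com')
print(parts)
```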


### Count substring occurrences ---> str.count(pat)

In [54]:
s.str.count('a')

0    1.0
1    3.0
2    0.0
3    3.0
4    NaN
dtype: float64

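### Match regex from the start of the string ---> str.match(pat)

`str.match` only anchors the pattern at the start of each string. The pattern below matches the whole string because it ends with `$`.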

In [56]:
s.str.match('^[a-z]+$')

0     True
1     True
2     True
3    False
4     None
dtype: object
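To require that the whole string matches without writing `^` and `$`, pandas (version 1.1 or later) also provides `str.fullmatch`. The sketch below contrasts the two on a made-up Series.

```python
import pandas as pd

s = pd.Series(['apple', 'apple pie', 'Banana'])

# str.match: the pattern must match at the start of the string
starts = s.str.match(r'[a-z]+')
print(starts)

# str.fullmatch: the pattern must match the entire string
whole = s.str.fullmatch(r'[a-z]+')
print(whole)
```

`'apple pie'` passes `str.match` because it begins with lowercase letters, but fails `str.fullmatch` because of the space.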