# Map, Reduce and Filter

**Map**

- applies a function to every element of an iterable and returns the results as a new iterable (a lazy `map` object in Python 3)

- map(function, iterable_object)

- Example:
  
    l = ['Apple', 'Banana']
  
    list(map(lambda x: x.upper(), l))  # ['APPLE', 'BANANA']

In [1]:
l1 = [1,2,3,4]

In [3]:
l2 = list(map(lambda x:x**2,l1))

In [4]:
l2

[1, 4, 9, 16]
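One detail worth seeing in code: in Python 3, `map()` returns a lazy iterator rather than a list, which is why the cell above wraps it in `list()`. A minimal sketch (the variable names are only illustrative):

```python
fruits = ['Apple', 'Banana']

# map() returns a lazy iterator; nothing is computed yet
upper_iter = map(lambda x: x.upper(), fruits)

# list() consumes the iterator and materializes the results
upper_list = list(upper_iter)
print(upper_list)  # ['APPLE', 'BANANA']

# the iterator is now exhausted, so a second pass yields nothing
second_pass = list(upper_iter)
print(second_pass)  # []

# map() also accepts several iterables and walks them in parallel
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```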

**Filter**

- checks a condition and returns only those elements of the collection that satisfy it.

- filter(func, data)

**Reduce**

- applies a two-argument function pair-wise: the result of each operation is combined with the next element, until a single value remains

- reduce(func,data)

In [5]:
l3 = list(map(lambda x: 3.14*x**2, [1,2,3,4,5]))

In [6]:
l3

[3.14, 12.56, 28.26, 50.24, 78.5]

In [4]:
list(filter(lambda x: x>10,[1,1,2,32,12,3]))

[32, 12]

In [6]:
# passing None as the function drops every falsy value (0, 0.0, '', None), not only nulls

list(filter(None, ['Deepika','Hello',0,0.0,'',1,2,3]))

['Deepika', 'Hello', 1, 2, 3]
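Because `filter(None, ...)` drops every falsy value, a legitimate `0` disappears too. If only `None` should be removed, an explicit condition is one option. A small sketch with made-up data:

```python
data = ['Deepika', 'Hello', 0, 0.0, '', None, 1, 2, 3]

# filter(None, ...) keeps only truthy items, so 0, 0.0, '' and None all go
truthy_only = list(filter(None, data))
print(truthy_only)  # ['Deepika', 'Hello', 1, 2, 3]

# an explicit test removes None but keeps the other falsy values
not_none = list(filter(lambda x: x is not None, data))
print(not_none)  # ['Deepika', 'Hello', 0, 0.0, '', 1, 2, 3]
```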

- reduce is no longer a built-in function in Python 3

- it has to be imported from the functools module

In [7]:
from functools import reduce

In [8]:
reduce(lambda x,y: x*y, [1,2,3,4,5])

120
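To make the pair-wise behaviour visible, the same product can be unrolled by hand. This sketch also passes the optional third argument of `functools.reduce` (an initial value), which matters when the input may be empty:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# reduce(f, [1, 2, 3, 4, 5]) evaluates f(f(f(f(1, 2), 3), 4), 5)
product = reduce(lambda x, y: x * y, numbers)

# the same computation written as an explicit loop
manual = numbers[0]
for n in numbers[1:]:
    manual = manual * n

print(product, manual)  # 120 120

# with an initial value, an empty list returns that value instead of raising TypeError
empty_product = reduce(lambda x, y: x * y, [], 1)
print(empty_product)  # 1
```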

### When to use what

**Map**

- to apply a function to every element of a list or any other iterable

- similar to a loop, but more compact

**Filter**

- to keep only the data we want

- a combination of a loop and a conditional statement

**Reduce**

- reduces a list to a single output value
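For comparison, the "loop" and "loop plus condition" that map and filter replace can also be written as comprehensions. A sketch with illustrative values:

```python
from functools import reduce

values = [1, 2, 3, 4]

# map vs. an equivalent list comprehension
squares_map = list(map(lambda x: x ** 2, values))
squares_comp = [x ** 2 for x in values]

# filter vs. a comprehension with a condition
evens_filter = list(filter(lambda x: x % 2 == 0, values))
evens_comp = [x for x in values if x % 2 == 0]

# reduce collapses the list to one value; for plain addition the built-in sum() does the same
total_reduce = reduce(lambda x, y: x + y, values)
total_sum = sum(values)

print(squares_map, evens_filter, total_reduce)  # [1, 4, 9, 16] [2, 4] 10
```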

**Something useful which I found from ChatGPT**

- Prompt: Okay, you are a data scientist. On a day-to-day basis, in which types of situations do you use these functions, separately and all at once?

- Output

🗺️ 1. map() → transforming or cleaning data

When I use it:

Whenever I need to apply the same operation to every item in a dataset column.

💼 Real example:

I have a dataset of customer names:
    
["alice smith", "bob JOHNSON", "carol King"]
    
I want all names to look consistent (proper case).
    
I use map() to apply a format_name() function to every name.

✅ Use case: Data cleaning, feature transformation, text formatting, unit conversions
    
🧠 Example in projects:

Converting all temperatures from Fahrenheit to Celsius

Lowercasing all product reviews before doing NLP

Applying log transformation to a numeric feature
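A possible version of the name-cleaning example. `format_name()` is not defined anywhere in these notes; the implementation here (built on `str.strip()` and `str.title()`) is only an assumption.

```python
def format_name(name):
    # hypothetical helper: trim whitespace and capitalize each word
    return name.strip().title()

names = ["alice smith", "bob JOHNSON", "carol King"]
clean_names = list(map(format_name, names))
print(clean_names)  # ['Alice Smith', 'Bob Johnson', 'Carol King']

# the same pattern covers unit conversions, e.g. Fahrenheit to Celsius
temps_f = [32, 212]
temps_c = list(map(lambda f: (f - 32) * 5 / 9, temps_f))
print(temps_c)  # [0.0, 100.0]
```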

🔍 2. filter() → selecting relevant data

When I use it:
    
Whenever I need to keep only certain rows based on a condition.

💼 Real example:

In an e-commerce dataset, I only want orders worth more than ₹1000.
    
I use filter() with a condition that checks if the order amount > 1000.

✅ Use case: Data filtering, removing noise, selecting high-quality samples
    
🧠 Example in projects:

Keeping customers who made at least 2 purchases

Filtering out missing or invalid entries

Selecting tweets that contain a certain keyword
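A rough sketch of the order-filtering idea. The record layout (dictionaries with `id` and `amount` keys) is made up for illustration; real data would likely look different.

```python
# hypothetical order records; amounts are in rupees
orders = [
    {"id": 1, "amount": 2500},
    {"id": 2, "amount": 400},
    {"id": 3, "amount": None},   # missing amount
    {"id": 4, "amount": 1800},
]

# first drop records with a missing amount, then keep orders above 1000
valid = filter(lambda o: o["amount"] is not None, orders)
high_value = list(filter(lambda o: o["amount"] > 1000, valid))

print([o["id"] for o in high_value])  # [1, 4]
```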

➕ 3. reduce() → aggregating or summarizing data

When I use it:
    
When I need to combine multiple values into one — like summing, averaging, or merging.

💼 Real example:

After cleaning data, I want the total revenue of all orders.
    
I use reduce() with a function that adds up all order amounts.

✅ Use case: Calculating totals, combining results, computing cumulative metrics
    
🧠 Example in projects:

Total revenue, total clicks, total time spent by users

Merging multiple dictionaries (e.g., user data from different sources)

Computing aggregate statistics when pandas isn’t available
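Two of these aggregation cases sketched with `functools.reduce`. The dictionaries and amounts are invented sample data, and "later source wins on duplicate keys" is just one possible merge rule.

```python
from functools import reduce

# hypothetical user data coming from three different sources
sources = [
    {"name": "Alice"},
    {"email": "alice@example.com"},
    {"name": "Alice S.", "city": "Pune"},
]

# merge left to right; on duplicate keys the later source wins
merged = reduce(lambda acc, d: {**acc, **d}, sources, {})
print(merged)
# {'name': 'Alice S.', 'email': 'alice@example.com', 'city': 'Pune'}

# total revenue as a running sum of order amounts
amounts = [2500, 400, 1800]
total_revenue = reduce(lambda x, y: x + y, amounts, 0)
print(total_revenue)  # 4700
```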

🔁 Using all three together — the real magic
    
⚙️ Scenario: Daily data processing pipeline

Imagine I’m analyzing daily sales data for an online store.

Here’s how I might use all three:

map() → clean & transform
    
Convert each order amount from dollars to rupees.

filter() → keep relevant data
    
Keep only orders above ₹1000 (ignore small ones).

reduce() → summarize
    
Add them all up to get the total high-value sales for the day.

So, the flow looks like this:
    
raw data → map (clean) → filter (select) → reduce (summarize)
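The flow above can be sketched in a few lines. The exchange rate and the order amounts are assumptions chosen only to keep the arithmetic easy to follow.

```python
from functools import reduce

USD_TO_INR = 80  # assumed exchange rate, for illustration only

orders_usd = [5, 20, 12.5, 100, 8]

# map: convert every order amount from dollars to rupees
orders_inr = map(lambda usd: usd * USD_TO_INR, orders_usd)

# filter: keep only orders above 1000 rupees
high_value = filter(lambda inr: inr > 1000, orders_inr)

# reduce: add the remaining amounts into one total
total_high_value = reduce(lambda x, y: x + y, high_value, 0)

print(total_high_value)  # 9600
```

Since `map()` and `filter()` are lazy, each order passes through all three steps one at a time and no intermediate list is built.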

## File Handling

- Basic functions:
    - open(), read(), close()

- when you open a file, a file object (an open handle to the file) is created, and it stays active until .close() is called
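The open/close lifecycle can be checked through the file object's `closed` attribute. This sketch writes to a temporary file (an assumption, so that it does not touch Demo_file_handling.txt) and also shows the `with` form, which closes the file automatically:

```python
import os
import tempfile

# create a throwaway file so the example does not depend on Demo_file_handling.txt
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("hello")
print(f.closed)  # False: the file is still open
f.close()
print(f.closed)  # True

# a with block calls close() automatically, even if an error occurs inside it
with open(path, "r") as f2:
    content = f2.read()

print(content, f2.closed)  # hello True
```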

**Reading a file**

In [17]:
file = open('Demo_file_handling.txt', 'r')
for line in file:
    print(line)
file.seek(0)
print(file.read())
# full signature of open(), for reference:
# open(
#     file,
#     mode='r',
#     buffering=-1,
#     encoding=None,
#     errors=None,
#     newline=None,
#     closefd=True,
#     opener=None,
# )

hi everyone,

This is Deepika

Welcome to Day 1

Hope you are doing well
hi everyone,
This is Deepika
Welcome to Day 1
Hope you are doing well


- without .seek(0), the code above prints the file contents only once

**WHY**

- Python keeps a file pointer (cursor) for every open file; in read mode it starts at the beginning of the file.

- when the for loop runs, the pointer advances through every line, and after the last iteration it sits at EOF

- this is why file.read() prints nothing (it returns an empty string): the pointer is already at EOF

- to avoid this I used .seek(0), which resets the file pointer to the start of the file

- in practice, use only one of the two methods:
    - either the for loop
    - or file.read()
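The pointer movement described above can be observed directly. A minimal sketch, using a temporary file with two sample lines instead of the real Demo_file_handling.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hi everyone,\nThis is Deepika\n")

with open(path, "r") as f:
    start = f.tell()               # 0: the pointer starts at the beginning
    lines = [line for line in f]   # iterating moves the pointer to EOF
    after_loop = f.read()          # '' because nothing is left to read
    f.seek(0)                      # move the pointer back to the start
    after_seek = f.read()          # the whole file again

print(start, len(lines), repr(after_loop))  # 0 2 ''
print(after_seek == "hi everyone,\nThis is Deepika\n")  # True
```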

In [23]:
file.seek(0)
print(file.read(5))  # reads and returns the next 5 characters from the current position
print(file.readline())  # reads the rest of the current line, including its trailing newline
print(file.readline(2))  # reads at most 2 characters of the next line, not of the entire file

hi ev
eryone,

Th


**Writing a file**

In [25]:
file = open('Demo_file_handling.txt','w')
file.write('Hey, this is write operation')
file.close()

In [28]:
file = open('Demo_file_handling.txt','r')
print(file.read())

Hey, this is write operation


- write mode ('w') overwrites the existing content

- always close the file after writing, so that buffered data is flushed to disk

- reopen it in read mode to read it back
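A short check of the overwrite behaviour, again on a temporary file rather than the real one:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("first version")

# opening in 'w' mode again truncates the file before writing
with open(path, "w") as f:
    f.write("second version")

# reopen in read mode to see what is left
with open(path, "r") as f:
    result = f.read()

print(result)  # second version
```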

**Append operation**

In [33]:
with open('Demo_file_handling.txt', 'a') as file:
    file.write('This is appended text')
    file.write('\n')
    file.write('Hi')

In [36]:
file = open('Demo_file_handling.txt','r')
for i in file:
    print(i)

Hey, this is write operationThis is appended textThis is appended text

Heloo everyoneThis is appended text

Hi
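The run-together lines in the output above come from `write()` not adding a newline on its own. A sketch on a temporary file, with the sample text shortened, showing where explicit `\n` characters make a difference:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("Hey, this is write operation")   # no trailing newline

# 'a' mode keeps the existing content and writes at the end
with open(path, "a") as f:
    f.write("This is appended text\n")         # glued to the previous text
    f.write("Hi\n")

with open(path, "r") as f:
    lines = f.read().splitlines()

print(lines)
# ['Hey, this is write operationThis is appended text', 'Hi']
```

Ending every written line with `\n` keeps later appends on their own lines.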
