<a href="https://colab.research.google.com/github/dramitprabhumd-cpu/Phase1_Foundation/blob/main/Phase2_Data_Handling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 60-Day Python Phoenix Sprint: Phase 2

**Author:** Dr. Amit Prabhu<br>
**Goal:** Data Handling and Efficiency. Making things *scalable*.


## Day 16: Duplicate Removal Techniques
**Objective:** Compare iterative logic against built-in Python data structures (sets).<br>
**Logic:**
1. **Iterative Method:** Uses a `for` loop and an `if ... not in` check to build a list of unique items in first-seen order.
2. **Set Method:** Relies on the fact that a `set()` stores each value only once; it is fast, but it does not preserve the original order.

In [None]:
def removeDuplicate_loop(inputList):
  '''Manually removes duplicates using a loop and temporary list'''
  cleanList = []
  for item in inputList:
    if item not in cleanList:
      cleanList.append(item)
  return cleanList

def removeDuplicate_set(inputList):
  '''Removes duplicates using the set() data structure (fast, but order is not preserved)'''
  return list(set(inputList))

# ---Professional Testing---
rawData = ['ID-101', 'ID-112', 'ID-111', 'ID-106', 'ID-102', 'ID-111', 'ID-104',
'ID-104', 'ID-101', 'ID-110', 'ID-121']

print(f'Cleaned List (Loop Method): {removeDuplicate_loop(rawData)}')
print(f'Cleaned List (Set Method): {removeDuplicate_set(rawData)}')

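# Portfolio Note:
# The Set Method is preferred for large datasets because it uses hashing:
# each membership check takes roughly constant time, whereas
# `item not in cleanList` scans the whole list every time.
# The trade-off is that a set does not keep the original order.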

Cleaned List (Loop Method): ['ID-101', 'ID-112', 'ID-111', 'ID-106', 'ID-102', 'ID-104', 'ID-110', 'ID-121']
Cleaned List (Set Method): ['ID-112', 'ID-111', 'ID-121', 'ID-104', 'ID-102', 'ID-101', 'ID-110', 'ID-106']
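
The set output above comes back in arbitrary order. When first-seen order matters but the speed of hashing is still wanted, one option is `dict.fromkeys()`, since dictionaries keep insertion order from Python 3.7 onward. The following is a minimal sketch; the function name `removeDuplicate_ordered` is a hypothetical addition, not part of the original notebook.

```python
def removeDuplicate_ordered(inputList):
  '''Removes duplicates while keeping first-seen order (hypothetical helper)'''
  # Dictionary keys are unique and keep insertion order (Python 3.7+)
  return list(dict.fromkeys(inputList))

rawData = ['ID-101', 'ID-112', 'ID-111', 'ID-106', 'ID-102', 'ID-111', 'ID-104',
           'ID-104', 'ID-101', 'ID-110', 'ID-121']

print(f'Cleaned List (Ordered Method): {removeDuplicate_ordered(rawData)}')
```

For this sample the result matches the loop method's order, while each duplicate check remains a hash lookup.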


## Day 17: Finding the Second Largest (Efficiency Focus)
**Objective:** Track multiple state variables in a single iteration.<br>
**Logic:** Instead of sorting the entire list (which costs more work than a single pass), we maintain two variables, `largest` and `second_largest`, and update them as we traverse the list exactly once.

In [None]:
import random
def findSecondLargest(inputList):
  '''Finds the second largest number in the list'''
  largest = second_largest = float('-inf')
  for num in inputList:
    if num > largest:
      second_largest = largest
      largest = num
    elif num > second_largest and num != largest:
      second_largest = num
  if second_largest == float('-inf'):
    return 'Error: No specific 2nd largest number found!'
  return second_largest

# Generating a list of 20 random numbers
myList=[]
for i in range(20):
  n = random.randint(1, 101)
  myList.append(n)

print(f'Original List of {len(myList)} elements: {myList}')
print(f'Second Largest Entry: {findSecondLargest(myList)}')
# set() drops repeats, so a duplicated maximum is not mistaken for the runner-up
print(f'Checking with Sorted: {sorted(set(myList))[-2]}')
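
As an independent cross-check of the single-pass logic, the standard library's `heapq.nlargest` can pick the top two distinct values without sorting everything. This is only a sketch: the helper name `secondLargest_heapq` and the choice to return `None` when there is no second value are assumptions, not the notebook's convention.

```python
import heapq

def secondLargest_heapq(inputList):
  '''Returns the second largest distinct value, or None if there is none'''
  # set() removes repeats so the maximum cannot also be the runner-up
  topTwo = heapq.nlargest(2, set(inputList))
  return topTwo[1] if len(topTwo) == 2 else None

print(secondLargest_heapq([7, 42, 42, 19]))   # 19
print(secondLargest_heapq([5, 5, 5]))         # None
```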

## Day 18: Student Grade Book (Dictionary Basics)
**Objective:** Master key-value pair storage and dictionary iteration.<br>
**Logic:** Use a dictionary to map unique strings (Names) to values (Grades). Implement a retrieval loop to calculate aggregate statistics (Average) and demonstrate the efficiency of key-based lookups.

In [None]:
def runGradeBook():

  gradeBook = {}
  print('___Professional Grading System___')

  # 1. Data Collection
  while True:
    name = input('Enter Student Name (or "q" to quit): ').strip().title()
    if name.lower() == 'q':
      break
    try:
      # f-string so that the student's name appears in the prompt
      score = float(input(f'Enter grade for {name}: '))
      # Updating grade book
      gradeBook[name] = score
    except ValueError:
      print('Invalid Input. Please enter a valid score!')

  # 2. Results & Analysis
  if not gradeBook:
    print('No data entered!!')
    return

  print('___Class Roster___')
  totalScore = 0
  for student, grade in gradeBook.items():
    print(f'Student: {student:12} | {grade}')
    totalScore += grade

  # 3. Final Calculation
  avgScore = totalScore / len(gradeBook)
  print('-'*30)
  print(f'Total Students: {len(gradeBook)}')
  print(f'Total Score: {totalScore}')
  print(f'Class Average: {avgScore:.2f}')
  print('-'*30)

# Execute Grade Book
runGradeBook()
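
Because `runGradeBook()` reads from `input()`, it is awkward to test automatically. One possible refactor, sketched below, moves the arithmetic into a pure function that accepts a ready-made dictionary. The function name `summariseGrades` and the sample names and scores are made up for illustration.

```python
def summariseGrades(gradeBook):
  '''Returns (count, total, average) for a name -> score dictionary'''
  if not gradeBook:
    # No students: the average is undefined, so report None
    return 0, 0.0, None
  total = sum(gradeBook.values())
  return len(gradeBook), total, total / len(gradeBook)

sampleBook = {'Asha': 80.0, 'Ravi': 90.0, 'Meera': 70.0}   # made-up data
count, total, average = summariseGrades(sampleBook)

print(f'Total Students: {count}')
print(f'Total Score: {total}')
print(f'Class Average: {average:.2f}')
```

Keeping input handling and calculation apart means the calculation can be checked with fixed data, without typing anything at a prompt.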

## Day 19: List Rotation (Sequence Manipulation)
**Objective:** Master the reordering of sequences using `pop()`/`insert()` and slicing techniques.<br>
**Logic:** Implement a "Right Rotation" by isolating the tail element and prepending it to the head. This demonstrates an understanding of *index-based manipulation* and the difference between *modifying data in-place* versus *creating new objects*.

In [None]:
def rotateR_inPlace(data):
  '''Modifies existing List directly'''
  if len(data) < 2:
    return data
  last = data.pop()
  data.insert(0, last)
  return data

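def rotateR_slicing(data):
  '''Returns a new rotated list; the original is left untouched'''
  if len(data) < 2:
    return data[:]   # copy, so the result is always a new object
  # data[-1:] creates a list with only last element
  # data[:-1] creates the list without the last element (exclusive)
  return data[-1:] + data[:-1]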

myList = ["Apple", "Banana", "Cherry", "Date", "Elderberry"]
print(f"Original List: {myList}")

print(f'Rotated List(Slicing): {rotateR_slicing(myList)}')
print(f'Rotated List(In-Place): {rotateR_inPlace(myList)}')

Original List: ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry']
Rotated List(Slicing): ['Elderberry', 'Apple', 'Banana', 'Cherry', 'Date']
Rotated List(In-Place): ['Elderberry', 'Apple', 'Banana', 'Cherry', 'Date']
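
For rotations of more than one position, the standard library offers `collections.deque`, whose `rotate()` method shifts elements right for positive arguments. The sketch below wraps it in a hypothetical helper, `rotateR_deque`, that returns a new list and leaves its input alone.

```python
from collections import deque

def rotateR_deque(data, steps=1):
  '''Returns a new list rotated right by `steps` positions'''
  d = deque(data)
  d.rotate(steps)   # positive values rotate to the right
  return list(d)

fruits = ["Apple", "Banana", "Cherry", "Date", "Elderberry"]
print(f'Rotated by 1: {rotateR_deque(fruits)}')
print(f'Rotated by 2: {rotateR_deque(fruits, 2)}')
print(f'Original:     {fruits}')
```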


## Day 20: Word Frequency Counter (Dictionary Mapping)
**Objective:** Apply dictionary logic to analyze and quantify textual data.<br>
**Logic:**
1. **Clean:** Normalize text to lowercase and remove basic punctuation.
2. **Split:** Tokenize the string into a list of words.
3. **Map:** Iterate through the list using the `.get()` method to efficiently increment word frequencies within a dictionary.

In [None]:
import string
def wordFrequency(text):
  # 1. Cleaning and Normalization
  # Remove Punctuations
  cleanText = text.translate(str.maketrans('', '', string.punctuation))
  words = cleanText.lower().split()

  # 2. Frequency Mapping
  frequencies = {}
  for word in words:
    frequencies[word] = frequencies.get(word, 0) + 1

  return frequencies

# Testing the Counter
sampleText = """
  Python is powerful and Python is fast. Learning Python
is the best decision for data automation and data analysis!
"""
report = wordFrequency(sampleText)
print('__Word Frequency Counter__')
for word, count in sorted(report.items(), key = lambda item: item[1], reverse = True):
  print(f'{word:12} : {count}')

__Word Frequency Counter__
python       : 3
is           : 3
and          : 2
data         : 2
powerful     : 1
fast         : 1
learning     : 1
the          : 1
best         : 1
decision     : 1
for          : 1
automation   : 1
analysis     : 1
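
The `.get()` pattern above is essentially what `collections.Counter` does internally. As a rough sketch, the same cleaning steps can feed a `Counter`, whose `most_common()` method replaces the manual `sorted(...)` call. The function name `wordFrequency_counter` and the shortened sample sentence are illustrative only.

```python
import string
from collections import Counter

def wordFrequency_counter(text):
  '''Counts words after stripping punctuation and lowercasing'''
  cleanText = text.translate(str.maketrans('', '', string.punctuation))
  return Counter(cleanText.lower().split())

sample = 'Python is powerful and Python is fast.'
report = wordFrequency_counter(sample)

# most_common(n) returns the n highest counts as (word, count) pairs
for word, count in report.most_common(2):
  print(f'{word:12} : {count}')
```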


## Day 21: List Comprehensions (Efficiency & Syntactic Sugar)
**Objective:** Replace multi-line loops with concise, readable one-liners.<br>
**Logic:** Use the `[output for item in list if condition]` syntax to filter and transform data in a single expression. This reduces "boilerplate" code and is usually a little faster than an equivalent `for` loop that calls `append()`.<br>
**Con of List Comprehension -** <br>
A *list comprehension* builds one list at a time, so producing two lists means traversing the input twice (for 1 million values, that is 2 million checks), whereas a single *for loop* with an *if-elif* statement fills both lists in one pass.<br>
**Bonus:** *Dictionary Comprehension* - Convert list of rates into labelled dictionary.
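
Whether the single-pass loop or the two comprehensions wins in practice depends on the data and the Python version, so it is worth measuring rather than assuming. The sketch below uses the standard `timeit` module; the sample size of 10,000 and the repeat count of 50 are arbitrary choices, and the function names `onePass` and `twoPass` are hypothetical.

```python
import random
import timeit

rates = [random.randint(25, 150) for _ in range(10_000)]   # made-up sample size

def onePass(values):
  '''Single traversal: fills both lists with a for loop'''
  high, low = [], []
  for rate in values:
    if rate > 100:
      high.append(rate)
    elif rate < 50:
      low.append(rate)
  return high, low

def twoPass(values):
  '''Two traversals: one list comprehension per output list'''
  high = [rate for rate in values if rate > 100]
  low = [rate for rate in values if rate < 50]
  return high, low

loopTime = timeit.timeit(lambda: onePass(rates), number=50)
compTime = timeit.timeit(lambda: twoPass(rates), number=50)
print(f'One pass (for loop):             {loopTime:.4f} s')
print(f'Two passes (list comprehension): {compTime:.4f} s')
```

The timings vary from machine to machine; the point is only that both functions return the same lists, so either can be chosen based on the measurement.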


In [None]:
# The Scenario: We have a list of patient heart rates.
# We want a new list that:
# 1. Filters out "normal" rates (50 to 100 BPM inclusive, matching the thresholds in the code)
# 2. Formats the "elevated" (tachycardia) & "depressed" (bradycardia) rates as a string with "BPM"

import random

# --- The Old Way (Phase 1 Logic) ---
# This version traverses the input only once while filling both lists.
def oldHRfilter(HeartRates):
  tachycardia = []
  bradycardia = []
  for rate in HeartRates:
    if rate > 100:
      tachycardia.append(f'{rate} BPM')
    elif rate < 50:
      bradycardia.append(f'{rate} BPM')
  return tachycardia, bradycardia

# --- The New Way (List Comprehension) ---
# [Expression | Loop | Condition]
def newHRfilter(HeartRates):
  tachycardia = [f'{rate} BPM' for rate in HeartRates if rate > 100]
  bradycardia = [f'{rate} BPM' for rate in HeartRates if rate < 50]
  return tachycardia, bradycardia

# ---Dictionary Comprehension---
# {Key : Value_Expression for item in list}
# Labeling HR - High/Normal/Low
# Note: the rate itself is the key, so repeated rates collapse into one entry
def AnalyseHR(HeartRates):
  HRanalysis = {rate: 'High' if rate > 100 else 'Low' if rate < 50 else 'Normal'
                for rate in HeartRates}
  return HRanalysis

# List Comprehension to generate list of 25 Random Heart Rates
RateList = [random.randint(25, 150) for i in range(25)]

# --- Data for Analysis ---
print(f'HRs using List Comprehension: {RateList}')
print()

# --- Data Analysis using Old Method ---
# Unpack both lists from a single call instead of running the filter twice
highHRold, lowHRold = oldHRfilter(RateList)
print(f'Standard Loop Result - High Rates: {highHRold}')
print(f'Standard Loop Result - Low Rates: {lowHRold}')

# --- Data Analysis using List Comprehension ---
# Unpack both lists from a single call instead of running the filter twice
highHRnew, lowHRnew = newHRfilter(RateList)
print(f'List Comprehension Result - High Rates: {highHRnew}')
print(f'List Comprehension Result - Low Rates: {lowHRnew}')

# --- Confirmation of Identical Results ---
print('Comparing two Results...')
if highHRold == highHRnew:
  print('High Rates Match')
if lowHRold == lowHRnew:
  print('Low Rates Match')

# ---Labeling the Heart Rates---
print(AnalyseHR(RateList))

HRs using List Comprehension: [118, 134, 31, 49, 106, 40, 72, 68, 39, 80, 90, 69, 125, 30, 136, 80, 112, 65, 140, 128, 146, 149, 30, 124, 80]

Standard Loop Result - High Rates: ['118 BPM', '134 BPM', '106 BPM', '125 BPM', '136 BPM', '112 BPM', '140 BPM', '128 BPM', '146 BPM', '149 BPM', '124 BPM']
Standard Loop Result - Low Rates: ['31 BPM', '49 BPM', '40 BPM', '39 BPM', '30 BPM', '30 BPM']
List Comprehension Result - High Rates: ['118 BPM', '134 BPM', '106 BPM', '125 BPM', '136 BPM', '112 BPM', '140 BPM', '128 BPM', '146 BPM', '149 BPM', '124 BPM']
List Comprehension Result - Low Rates: ['31 BPM', '49 BPM', '40 BPM', '39 BPM', '30 BPM', '30 BPM']
Comparing two Results...
High Rates Match
Low Rates Match
{118: 'High', 134: 'High', 31: 'Low', 49: 'Low', 106: 'High', 40: 'Low', 72: 'Normal', 68: 'Normal', 39: 'Low', 80: 'Normal', 90: 'Normal', 69: 'Normal', 125: 'High', 30: 'Low', 136: 'High', 112: 'High', 65: 'Normal', 140: 'High', 128: 'High', 146: 'High', 149: 'High', 124: 'High'}
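
The labelled dictionary above has fewer entries than the 25 readings, because repeated rates such as 30 and 80 share a single key. If every reading needs to survive, one option is to key the dictionary by position instead. The sketch below assumes the same thresholds; the names `labelHR` and `AnalyseHR_byIndex` and the sample readings are hypothetical.

```python
def labelHR(rate):
  '''Same thresholds as above: > 100 High, < 50 Low, otherwise Normal'''
  return 'High' if rate > 100 else 'Low' if rate < 50 else 'Normal'

def AnalyseHR_byIndex(HeartRates):
  '''Maps reading position -> (rate, label) so repeated rates are kept'''
  return {i: (rate, labelHR(rate)) for i, rate in enumerate(HeartRates)}

sampleRates = [118, 30, 80, 30, 80]   # made-up readings with repeats

byRate = {rate: labelHR(rate) for rate in sampleRates}
byIndex = AnalyseHR_byIndex(sampleRates)

print(f'Keyed by rate:  {len(byRate)} entries')
print(f'Keyed by index: {len(byIndex)} entries')
print(byIndex)
```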


## Day 22: Dictionary Comprehensions & Data Mapping
**Objective:** Efficiently create and transform key-value pairs in a single line.<br>
**Logic:** Utilize the `{k: v for k, v in dict.items() if condition}` syntax to filter datasets and perform bulk value transformations without manual loops.

In [None]:
# Scenario: A dictionary of inventory items and their current stock levels
inventory = {
    "Masks": 500,
    "Gloves": 1200,
    "Sanitizer": 45,
    "Thermometers": 12,
    "Gowns": 80,
    "Syringes 10mL": 55,
    "Syringes 5mL": 45,
    "Syringes 2mL": 4,
    "Spinal Needles 25G": 180,
    "Spinal Needles 23G": 2,
    "Spinal Needles 26G": 12,
    "Spinal Needles 27G": 220
}
# 1. Filtering: Create a dictionary of only "Low Stock" items (below 25 units)
LowStock = {item:count for item, count in inventory.items() if count < 25}

# 2. Transformation: Increase only the low-stock items by 10% (Restock simulation)
# int() truncates, so very small counts (e.g. 4 -> 4.4 -> 4) do not change
Restocked = {item: int(count*1.1) if item in LowStock else count
             for item, count in inventory.items()}

# 3. Inverting: Group item names by stock count (setdefault keeps items that share a count)
InvertedStock = {}
for item, count in inventory.items():
  InvertedStock.setdefault(count, []).append(item)

print("--- Data Transformation Report ---")
print(f"Original Inventory:                  {inventory}")
print(f"Low Stock Items:                     {LowStock}")
print(f"Restocked (+10% for low stock):      {Restocked}")
print(f"Inverted (Count:Name):               {InvertedStock}")
totalItems = sum(len(items) for items in InvertedStock.values())
if len(inventory) == totalItems:
  print(f'Stock Verified: {len(inventory)} items accounted for!')
else:
  print(f'Alert: {len(inventory) - totalItems} lost in translation! ')

--- Data Transformation Report ---
Original Inventory:                  {'Masks': 500, 'Gloves': 1200, 'Sanitizer': 45, 'Thermometers': 12, 'Gowns': 80, 'Syringes 10mL': 55, 'Syringes 5mL': 45, 'Syringes 2mL': 4, 'Spinal Needles 25G': 180, 'Spinal Needles 23G': 2, 'Spinal Needles 26G': 12, 'Spinal Needles 27G': 220}
Low Stock Items:                     {'Thermometers': 12, 'Syringes 2mL': 4, 'Spinal Needles 23G': 2, 'Spinal Needles 26G': 12}
Restocked (+10% for low stock):      {'Masks': 500, 'Gloves': 1200, 'Sanitizer': 45, 'Thermometers': 13, 'Gowns': 80, 'Syringes 10mL': 55, 'Syringes 5mL': 45, 'Syringes 2mL': 4, 'Spinal Needles 25G': 180, 'Spinal Needles 23G': 2, 'Spinal Needles 26G': 13, 'Spinal Needles 27G': 220}
Inverted (Count:Name):               {500: ['Masks'], 1200: ['Gloves'], 45: ['Sanitizer', 'Syringes 5mL'], 12: ['Thermometers', 'Spinal Needles 26G'], 80: ['Gowns'], 55: ['Syringes 10mL'], 4: ['Syringes 2mL'], 180: ['Spinal Needles 25G'], 2: ['Spinal Needles 23G'], 220: 
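
The inversion step uses a loop with `setdefault()` rather than a one-line comprehension for a reason: stock counts are not unique, and a plain `{count: item ...}` comprehension would silently overwrite items that share a count. The sketch below illustrates the difference on a three-item subset of the inventory.

```python
stock = {"Sanitizer": 45, "Syringes 5mL": 45, "Gowns": 80}   # subset of the inventory above

# Naive inversion: later items overwrite earlier ones that share a count
naive = {count: item for item, count in stock.items()}

# Grouped inversion: every item survives, listed under its count
grouped = {}
for item, count in stock.items():
  grouped.setdefault(count, []).append(item)

print(naive)     # {45: 'Syringes 5mL', 80: 'Gowns'}
print(grouped)   # {45: ['Sanitizer', 'Syringes 5mL'], 80: ['Gowns']}
```

This is the loss the "Stock Verified" check is designed to catch: the naive version holds two entries for three items.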

## Day 23: Immutable Data Structures (Tuples)
**Objective:** Protect data integrity using immutable sequences.<br>
**Logic:** Use tuples `()` for fixed datasets to prevent accidental modification. Practice "Tuple Unpacking" to efficiently distribute data from a single collection into multiple named variables.

In [1]:
# 1. Defining a Tuple (A fixed record of a patient)
# Format: (Name, Age, Blood Type, Is_Inpatient)
patientRecord = ('John', 43, 'B+', True)
print(f'Full Record: {patientRecord}')

# 2. Tuple Unpacking (The professional way to extract data)
name, age, bloodGroup, status = patientRecord

print(f'Extracted Name: {name}')
print(f'Extracted Age: {age} years')

# 3. Demonstrating Immutability: Trying to change age!
try:
  patientRecord[1] = 41   # raises TypeError, so nothing placed after this line in the try block would run
except TypeError as e:
  print(f'Error: {e}. Tuples cannot be modified')

# 4. Use Case: Returning multiple values from a function
def getMinMax(numbers):
  return min(numbers), max(numbers)

import random
Scores = []
for i in range(15):
  Scores.append(random.randint(25, 1001))

low, high = getMinMax(Scores)
print(getMinMax(Scores))
print(f'Stats: Lowest - {low} & Highest - {high}')

Full Record: ('John', 43, 'B+', True)
Extracted Name: John
Extracted Age: 43 years
Error: 'tuple' object does not support item assignment. Tuples cannot be modified
(128, 966)
Stats: Lowest - 128 & Highest - 966
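
Positional unpacking works, but it relies on remembering the field order. The standard library's `collections.namedtuple` keeps the immutability while allowing access by name. This is a sketch only: the type name `Patient` and its field names are assumptions chosen to mirror the record format above.

```python
from collections import namedtuple

# Field names mirror the (Name, Age, Blood Type, Is_Inpatient) format used above
Patient = namedtuple('Patient', ['name', 'age', 'bloodGroup', 'isInpatient'])

record = Patient('John', 43, 'B+', True)
print(f'Extracted Name: {record.name}')
print(f'Extracted Age: {record.age} years')

# _replace() returns a new tuple; the original record is untouched
updated = record._replace(age=44)
print(f'Original age: {record.age} | Updated copy: {updated.age}')
```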


## Day 24: Dictionary Merging & Value Summation
**🎯 Objective:** The goal is to merge two dictionaries (`dict_a` and `dict_b`).<br>
Unlike a standard update where overlapping keys are simply overwritten, this task requires summing the values for any keys that appear in both dictionaries.<br>
**🧠 Logic & Strategy:** To solve this efficiently, we need to ensure that:
1. All unique keys from both dictionaries are captured.
2. If a key exists in both, the output value is $Value_A + Value_B$.
3. The original dictionaries remain unchanged (the inputs are not mutated).<br><br>

🎯 Two ways to approach this task:
- **The Manual Approach**: Using a `for` loop and the `.get()` method to handle missing keys without crashing.

- **The Pythonic Approach**: Using `collections.Counter`, a standard-library class designed for tallies that supports this kind of arithmetic directly.


In [7]:

# 1. Using for loop
def mergeNcount_For(d1, d2):
  # To keep dictionary unaltered we use copy()
  result = d1.copy()
  for key, value in d2.items():
    result[key] = result.get(key,0) + value
  return result

# 2. Using Counter
# Note: Counter addition keeps only keys whose summed value is positive
from collections import Counter
def mergeNcount_Counter(d1, d2):
  d1C = Counter(d1)
  d2C = Counter(d2)
  return dict(d1C + d2C)

FruitBask1 = {'apples': 10, 'bananas': 5, 'orange': 8, 'papaya':4, 'pomagrenate':20}
FruitBask2 = {'apples': 4, 'bananas': 12, 'orange': 12, 'kiwi':14, 'strawberry':23}

print(f'Fruit Basket 01: {FruitBask1}')
print(f'Fruit Basket 02: {FruitBask2}')

print(f'Total Fruits (For Method): {mergeNcount_For(FruitBask1, FruitBask2)}')
print(f'Crosscheck (Counter) :     {mergeNcount_Counter(FruitBask1, FruitBask2)}')



Fruit Basket 01: {'apples': 10, 'bananas': 5, 'orange': 8, 'papaya': 4, 'pomagrenate': 20}
Fruit Basket 02: {'apples': 4, 'bananas': 12, 'orange': 12, 'kiwi': 14, 'strawberry': 23}
Total Fruits (For Method): {'apples': 14, 'bananas': 17, 'orange': 20, 'papaya': 4, 'pomagrenate': 20, 'kiwi': 14, 'strawberry': 23}
Crosscheck (Counter) :     {'apples': 14, 'bananas': 17, 'orange': 20, 'papaya': 4, 'pomagrenate': 20, 'kiwi': 14, 'strawberry': 23}
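
The two methods agree here because every total is positive. They can diverge when values are zero or negative, since `Counter` addition discards any key whose sum is not positive. The sketch below uses made-up stock adjustments to show the difference, along with a comprehension over the union of keys that keeps every key.

```python
from collections import Counter

stockA = {'apples': 10, 'kiwi': 3}
stockB = {'apples': -10, 'kiwi': 2}   # made-up negative adjustment

# Counter addition keeps only positive totals, so 'apples' (10 + -10 = 0) vanishes
viaCounter = dict(Counter(stockA) + Counter(stockB))
print(f'Counter merge:       {viaCounter}')

# A comprehension over the union of keys keeps every key, including zero totals
viaUnion = {key: stockA.get(key, 0) + stockB.get(key, 0)
            for key in stockA.keys() | stockB.keys()}
print(f'Union-of-keys merge: {viaUnion}')
```

For counts that can never drop to zero, such as the fruit baskets above, either approach is fine.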
