# Data Science Internship – February 2026
## Data Processing and Analysis Task – 3
### Submitted by: Rajeev Rathore
### Organization: Innomatics Research Labs

---

## Overview
This notebook contains solutions to Data Processing and Analysis Task – 3.
Each problem is solved with a short Python script, preceded by a summary of the concepts used and the approach taken.


# Problem 1: Employee Performance Bonus Eligibility

## Introduction
In this problem, we are given a dictionary containing employee names as keys and their performance scores as values. The objective is to identify the highest performance score and determine which employees are eligible for the top performance bonus. If multiple employees share the same highest score, all of them should be included.

## Concepts Used
- Python Dictionaries
- max() function
- List Comprehension
- Conditional Filtering

## Approach
1. Extract all performance scores from the dictionary.
2. Use the `max()` function to determine the highest score.
3. Iterate through the dictionary to find employees whose score matches the highest value.
4. Display the names of eligible employees along with the highest score.


In [59]:
employees = {"Ravi": 92, "Anita": 88, "Kiran": 92, "Suresh": 85}

# Highest score across all employees
highest_score = max(employees.values())

# Every employee whose score ties the highest one
top_performers = [name for name, score in employees.items() if score == highest_score]

print("Top Performers Eligible for Bonus:", ", ".join(top_performers), f"(Score: {highest_score})")


Top Performers Eligible for Bonus: Ravi, Kiran (Score: 92)
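
The approach above can also be wrapped in a reusable function. The sketch below is one possible packaging, not part of the original task: the name `find_top_performers` is hypothetical, and returning `(None, [])` for an empty dictionary is an assumption made here because `max()` raises `ValueError` on an empty sequence.

```python
def find_top_performers(scores):
    """Return (highest_score, names) for a name -> score mapping."""
    if not scores:
        # Assumption: an empty mapping yields no score and no names,
        # since max() would raise ValueError on an empty sequence.
        return None, []
    highest = max(scores.values())
    # Keep every name that ties the highest score, in insertion order.
    names = [name for name, score in scores.items() if score == highest]
    return highest, names


employees = {"Ravi": 92, "Anita": 88, "Kiran": 92, "Suresh": 85}
print(find_top_performers(employees))  # (92, ['Ravi', 'Kiran'])
print(find_top_performers({}))         # (None, [])
```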


# Problem 2: Search Query Keyword Analysis

## Introduction
This problem focuses on analyzing a search query entered by a user. The goal is to process the input text by converting it to lowercase, counting the frequency of each keyword, and displaying only those keywords that appear more than once.

## Concepts Used
- String manipulation
- Lowercase conversion
- Dictionary for frequency counting
- Conditional filtering

## Approach
1. Convert the input string to lowercase to ensure case-insensitive comparison.
2. Split the string into individual words.
3. Use a dictionary to count the frequency of each word.
4. Filter and display only those words with a frequency greater than one.


In [61]:
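query = "Buy mobile phone buy phone online"

# Normalize case so "Buy" and "buy" count as the same keyword
query = query.lower()
words = query.split()

frequency = {}

# Count how many times each word occurs
for word in words:
    frequency[word] = frequency.get(word, 0) + 1

# Keep only the keywords that appear more than once
result = {word: count for word, count in frequency.items() if count > 1}

print(result)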


{'buy': 2, 'phone': 2}
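
The same counting step can be done with `collections.Counter` from the standard library, which replaces the manual `get(word, 0) + 1` loop. This is an alternative sketch rather than the method used above; the function name `repeated_keywords` is hypothetical.

```python
from collections import Counter


def repeated_keywords(query):
    """Return the words that occur more than once, ignoring case."""
    # Counter tallies each lowercase word in a single pass.
    counts = Counter(query.lower().split())
    return {word: count for word, count in counts.items() if count > 1}


print(repeated_keywords("Buy mobile phone buy phone online"))
# {'buy': 2, 'phone': 2}
```

Like the original loop, this splits on whitespace only, so punctuation attached to a word (for example `phone,`) would be counted as a separate keyword.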


# Problem 3: Sensor Data Validation

## Introduction
In this scenario, a factory records hourly sensor readings in a list. The index represents the hour, and the value represents the sensor reading. Even readings are considered valid and odd readings invalid. The objective is to collect the valid readings together with the hour at which each was recorded.

## Concepts Used
- Lists
- enumerate() function
- Tuples
- Conditional statements

## Approach
1. Iterate through the list using `enumerate()` to access both index and value.
2. Check whether the reading is even using the modulus operator.
3. Store valid readings as (hour, value) pairs in a new list.
4. Display the list of valid readings.


In [63]:
sensor_readings = [3, 4, 7, 8, 10, 12, 5]

valid_readings = []

# enumerate() yields (hour, reading) pairs, with hours starting at 0
for index, value in enumerate(sensor_readings):
    # A reading is valid when it is even
    if value % 2 == 0:
        valid_readings.append((index, value))

print("Valid Sensor Readings (Hour, Value):")
print(valid_readings)


Valid Sensor Readings (Hour, Value):
[(1, 4), (3, 8), (4, 10), (5, 12)]
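
The loop above can be condensed into a single list comprehension. This is an equivalent sketch of the same filtering step; the function name `filter_valid_readings` is hypothetical.

```python
def filter_valid_readings(readings):
    """Return (hour, value) pairs for the even readings."""
    # enumerate() supplies the hour (list index); the condition keeps even values.
    return [(hour, value) for hour, value in enumerate(readings) if value % 2 == 0]


sensor_readings = [3, 4, 7, 8, 10, 12, 5]
print(filter_valid_readings(sensor_readings))
# [(1, 4), (3, 8), (4, 10), (5, 12)]
```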


# Problem 4: Email Domain Usage Analysis

## Introduction
This problem involves analyzing a list of email addresses to determine the usage distribution of different email domains. The objective is to calculate how many users belong to each domain and compute the percentage usage.

## Concepts Used
- String splitting
- Dictionary counting
- Percentage calculation

## Approach
1. Extract the domain name from each email address using the `split()` method.
2. Count the frequency of each domain using a dictionary.
3. Calculate the percentage usage based on the total number of emails.
4. Display the domain names along with their percentage usage.


In [65]:
emails = [
    "ravi@gmail.com",
    "anita@yahoo.com",
    "kiran@gmail.com",
    "suresh@gmail.com",
    "meena@yahoo.com"
]

domain_count = {}

# Count users per domain (the part after "@")
for email in emails:
    domain = email.split("@")[1]
    domain_count[domain] = domain_count.get(domain, 0) + 1

total_users = len(emails)

# Convert each count to a share of all users
for domain, count in domain_count.items():
    percentage = (count / total_users) * 100
    print(f"{domain}: {percentage:.0f}%")


gmail.com: 60%
yahoo.com: 40%
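
The script above assumes every entry is a well-formed address. A slightly more defensive variant might look like the sketch below. The function name `domain_usage` is hypothetical, and the choices to skip entries without an `@`, to lowercase domains, and to compute percentages over the valid entries only are assumptions, not requirements of the task.

```python
from collections import Counter


def domain_usage(emails):
    """Return {domain: percentage} computed over the well-formed addresses."""
    domains = []
    for email in emails:
        if "@" not in email:
            continue  # Assumption: malformed entries are skipped, not counted.
        # rsplit with maxsplit=1 takes everything after the last "@".
        domains.append(email.rsplit("@", 1)[1].strip().lower())
    if not domains:
        return {}
    total = len(domains)
    return {domain: count / total * 100 for domain, count in Counter(domains).items()}


emails = [
    "ravi@gmail.com",
    "anita@yahoo.com",
    "kiran@gmail.com",
    "suresh@gmail.com",
    "meena@yahoo.com",
]
for domain, percentage in domain_usage(emails).items():
    print(f"{domain}: {percentage:.0f}%")
# gmail.com: 60%
# yahoo.com: 40%
```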


# Problem 5: Sales Spike Detection

## Introduction
In this problem, daily sales data is provided in a list. The objective is to calculate the average daily sales and identify the days on which sales exceed the average by more than 30%, indicating a potential sales spike.

## Concepts Used
- List operations
- Average calculation
- Threshold comparison
- Conditional statements

## Approach
1. Calculate the average daily sales.
2. Set the threshold at 30% above the average (`average * 1.30`).
3. Iterate through the sales list and compare each value against the threshold.
4. Display the day number and sales value for detected spikes.


In [67]:
sales = [1200, 1500, 900, 2200, 1400, 3000]

average_sales = sum(sales) / len(sales)
threshold = average_sales * 1.30   # 30% above average

# Days are numbered from 1, so add 1 to the list index
for index, value in enumerate(sales):
    if value > threshold:
        print(f"Day {index + 1}: {value}")


Day 6: 3000
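
For this data the average is 10200 / 6 = 1700, so the threshold is 1700 × 1.30 = 2210. Day 4 (2200) falls just short of it, which is why only Day 6 is reported. The sketch below packages the same steps as a function; the name `find_spikes` and the `margin` parameter are hypothetical, and returning an empty list for empty input is an assumption.

```python
def find_spikes(sales, margin=0.30):
    """Return (day, value) pairs whose sales exceed the average by more than `margin`."""
    if not sales:
        return []  # Assumption: no data means no spikes (avoids dividing by zero).
    average = sum(sales) / len(sales)
    threshold = average * (1 + margin)
    # Days are numbered from 1, matching the output format above.
    return [(day, value) for day, value in enumerate(sales, start=1) if value > threshold]


sales = [1200, 1500, 900, 2200, 1400, 3000]
print(find_spikes(sales))  # [(6, 3000)]
```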


# Problem 6: Duplicate User ID Detection

## Introduction
Duplicate user IDs can create data integrity issues in a system. In this problem, we are given a list of user IDs, and the goal is to identify duplicate entries and count how many times each duplicate appears.

## Concepts Used
- Lists
- Dictionary for frequency counting
- Conditional filtering

## Approach
1. Traverse the list of user IDs.
2. Count occurrences using a dictionary.
3. Identify IDs with a count greater than one.
4. Display duplicate IDs along with their frequency.


In [69]:
user_ids = ["user1", "user2", "user1", "user3", "user1", "user3"]

id_count = {}

# Count how many times each user ID occurs
for user in user_ids:
    id_count[user] = id_count.get(user, 0) + 1

# Report only the IDs that occur more than once
for user, count in id_count.items():
    if count > 1:
        print(f"{user} → {count} times")


user1 → 3 times
user3 → 2 times
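
When cleaning the data, it can help to know where each duplicate occurs as well as how often. The sketch below extends the counting approach by recording list positions. This goes slightly beyond what the task asks, and the function name `duplicate_positions` is hypothetical.

```python
from collections import defaultdict


def duplicate_positions(user_ids):
    """Return {user_id: [indices]} for IDs that appear more than once."""
    positions = defaultdict(list)
    # Record every index at which each ID is seen.
    for index, user in enumerate(user_ids):
        positions[user].append(index)
    # An ID is a duplicate when it was seen at more than one index.
    return {user: found for user, found in positions.items() if len(found) > 1}


user_ids = ["user1", "user2", "user1", "user3", "user1", "user3"]
for user, found in duplicate_positions(user_ids).items():
    print(f"{user} → {len(found)} times at positions {found}")
# user1 → 3 times at positions [0, 2, 4]
# user3 → 2 times at positions [3, 5]
```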
