# Entry-Level Automation Engineer Live Coding Assessment

## Instructions:
- This test has **2 tasks**, each designed to be completed in **~30 minutes**.
- Use Python. You may import standard-library modules such as `json` and common third-party libraries such as `pandas`.
- You can test your solution in this notebook.
- Submit this notebook after completion.

---


### ✅ Task 1: Python Dictionary Filter

Write a Python function that filters a list of dictionaries and returns only those that have **all the required keys**.

#### Function Signature:
```python
def filter_dicts(data: list[dict], required_keys: list[str]) -> list[dict]:
    pass
```

#### Example Input:
```python
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob"},
    {"name": "Charlie", "age": 30, "city": "Mumbai"}
]

required_keys = ["name", "age"]
```

#### Expected Output:
```python
[
    {"name": "Alice", "age": 25},
    {"name": "Charlie", "age": 30, "city": "Mumbai"}
]
```

👉 Implement your code below:

In [21]:
## Task 1 solution
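def filter_dicts(data: list[dict], required_keys: list[str]) -> list[dict]:
    filtered = []  # named to avoid shadowing the built-in filter()
    for item in data:
        # keep the dict only if every required key is present
        if all(k in item for k in required_keys):
            filtered.append(item)
    return filtered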

required_keys = ["name", "age"]

data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob"},
    {"name": "Charlie", "age": 30, "city": "Mumbai"}
]

output = filter_dicts(data, required_keys)
print(output)

[{'name': 'Alice', 'age': 25}, {'name': 'Charlie', 'age': 30, 'city': 'Mumbai'}]
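The same check can also be expressed with sets: a dictionary has all the required keys exactly when the set of required keys is a subset of its keys. The sketch below is one possible variant, not a required answer, and `filter_dicts_with_sets` is a hypothetical name. It should behave like the loop above, including keeping every dictionary when `required_keys` is empty.

```python
def filter_dicts_with_sets(data: list[dict], required_keys: list[str]) -> list[dict]:
    # Build the set of required keys once instead of re-scanning the list per dict
    required = set(required_keys)
    # set.issubset accepts any iterable; iterating a dict yields its keys
    return [item for item in data if required.issubset(item)]
```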


### Write a function that keeps only the dictionaries with at least 50% as many keys as the dictionary with the most keys in the list.

In [1]:
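def filter_by_key_count(data: list[dict]) -> list[dict]:
    # default=0 avoids a ValueError from max() when data is empty
    max_k = max((len(d) for d in data), default=0)

    threshold = max_k / 2  # at least 50% of the largest key count

    filtered_d = [d for d in data if len(d) >= threshold]

    return filtered_d

data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob"},
    {"name": "Charlie", "age": 30, "city": "Mumbai"}
]
output = filter_by_key_count(data)
print(output)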

[{'name': 'Alice', 'age': 25}, {'name': 'Charlie', 'age': 30, 'city': 'Mumbai'}]
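The 50% comparison can also be done without floating-point division: `len(d) >= max_k / 2` holds exactly when `2 * len(d) >= max_k`. The sketch below shows that integer-only form with an explicit guard for an empty list; `keep_at_least_half_keys` is a hypothetical name.

```python
def keep_at_least_half_keys(data: list[dict]) -> list[dict]:
    """Keep dicts that have at least half as many keys as the largest dict."""
    if not data:
        return []  # max() would raise ValueError on an empty sequence
    max_keys = max(len(d) for d in data)
    # len(d) >= max_keys / 2 is equivalent to 2 * len(d) >= max_keys,
    # which stays in integer arithmetic
    return [d for d in data if 2 * len(d) >= max_keys]
```

The comparison is inclusive, so with a largest dict of 2 keys, a dict with exactly 1 key is kept.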


### Sort the given JSON data by employee count.

In [2]:
def sort_by_employee_count(data: dict) -> dict:
    # Sort (section, stats) pairs by employeeCount, highest first
    sorted_items = sorted(data.items(),
                          key=lambda x: x[1]['employeeCount'],
                          reverse=True)

    # dicts preserve insertion order (Python 3.7+), so the sorted order is kept
    new_data = dict(sorted_items)
    return new_data

section_data = {
    "EC": {
        "employeeCount": 0,
        "Full Day": 0,
        "Half Day": 0,
        "ELCount": 0
    },
    "Captable": {
        "employeeCount": 14,
        "Full Day": 8,
        "Half Day": 6,
        "ELCount": 11
    },
    "Financials": {
        "employeeCount": 0,
        "Full Day": 0,
        "Half Day": 0,
        "ELCount": 0
    },
    "Downloads": {
        "employeeCount": 0,
        "Full Day": 0,
        "Half Day": 0,
        "ELCount": 0
    }
}

result = sort_by_employee_count(section_data)
print(result)

{'Captable': {'employeeCount': 14, 'Full Day': 8, 'Half Day': 6, 'ELCount': 11}, 'EC': {'employeeCount': 0, 'Full Day': 0, 'Half Day': 0, 'ELCount': 0}, 'Financials': {'employeeCount': 0, 'Full Day': 0, 'Half Day': 0, 'ELCount': 0}, 'Downloads': {'employeeCount': 0, 'Full Day': 0, 'Half Day': 0, 'ELCount': 0}}
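Python's `sorted` is stable, and it stays stable with `reverse=True`, so entries with the same `employeeCount` keep their input order. That is why `EC`, `Financials`, and `Downloads` appear above in their original order. If a deterministic tie-break is wanted (an assumption, since the task does not ask for one), a tuple key can sort by count descending and then by section name ascending. `sort_by_employee_count_then_name` is a hypothetical name.

```python
def sort_by_employee_count_then_name(data: dict) -> dict:
    # Primary key: employeeCount, negated so larger counts come first.
    # Secondary key: the section name, ascending, to break ties.
    ordered = sorted(data.items(), key=lambda kv: (-kv[1]["employeeCount"], kv[0]))
    return dict(ordered)
```

With the sample input, this would return the sections in the order `Captable`, `Downloads`, `EC`, `Financials`.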


### ✅ Task 2: Pandas – Clean a CSV File

You are given a CSV file named `students.csv` (attached separately).

Your task is to write a script that:
1. Reads the CSV using Pandas  
2. Strips extra spaces in column names  
3. Removes duplicate rows  
4. Drops rows where any value is missing  
5. Prints the cleaned DataFrame and row counts before and after cleaning  
6. Saves the cleaned file as `clean_students.csv`
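The six steps above can be combined into a single function. This is a minimal sketch that assumes the file has a header row; `clean_students` and its `src`/`dst` parameters are hypothetical names, not part of the task. It removes duplicates before missing values, in the same order as the steps, and writes the file with `index=False` so the row index is not saved as an extra column.

```python
import pandas as pd


def clean_students(src: str = "students.csv", dst: str = "clean_students.csv") -> pd.DataFrame:
    """Hypothetical helper that runs the cleaning steps in order."""
    df = pd.read_csv(src)                    # 1. read the CSV
    df.columns = df.columns.str.strip()      # 2. strip spaces around column names
    rows_before = len(df)
    df = df.drop_duplicates()                # 3. remove exact duplicate rows
    df = df.dropna()                         # 4. drop rows with any missing value
    print(df)                                # 5. show the cleaned data...
    print(f"Rows before: {rows_before}, rows after: {len(df)}")  # ...and the counts
    df.to_csv(dst, index=False)              # 6. save without the index column
    return df
```

Calling `clean_students()` with no arguments would read `students.csv` from the working directory and write `clean_students.csv` alongside it.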

👉 Implement your code below:

In [22]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [23]:
# Task 2 solution
df = pd.read_csv('students.csv')      # read the CSV file into a DataFrame
df.columns = df.columns.str.strip()   # step 2: strip extra spaces around column names

In [24]:
df.head() #Top 5 rows of the csv file, to understand the data

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Julia | 21 | Hyderabad |
| 1 | Julia | 21 | Hyderabad |
| 2 | Grace | twenty | Kolkata |
| 3 | Charlie | 22 | Mumbai |
| 4 | Julia | 21 | Kolkata |


In [25]:
print("Rows before cleaning:", len(df))
print("Shape:", df.shape)

Rows before cleaning: 100
Shape: (100, 3)

In [26]:
# Check each column's dtype and non-null count
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0    Name   100 non-null    object
 1    Age    83 non-null     object
 2    City   100 non-null    object
dtypes: object(3)
memory usage: 2.5+ KB


In [27]:
print('Summary of the Dataset : ')
df.describe()

Summary of the Dataset : 


|        | Name | Age | City |
|--------|------|-----|------|
| count  | 100 | 83 | 100 |
| unique | 10 | 5 | 6 |
| top    | Grace | 20 | Hyderabad |
| freq   | 14 | 21 | 22 |


In [28]:
df.duplicated().sum()   # count fully duplicated rows

np.int64(14)

In [29]:
df.drop_duplicates(inplace=True)   # remove duplicate rows, keeping the first occurrence

In [30]:
df.duplicated().sum()

np.int64(0)
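The duplicate count of 14 equals the number of rows `drop_duplicates` removes because `duplicated()` uses `keep="first"` by default: the first copy of each repeated row is not flagged, only the later copies are. A small made-up frame, named `sample` so it does not overwrite `df`, illustrates this.

```python
import pandas as pd

# Made-up sample data: three identical rows and one distinct row
sample = pd.DataFrame({"Name": ["Julia", "Julia", "Julia", "Bob"],
                       "Age": [21, 21, 21, 22]})

# keep="first" (the default): the first copy is not flagged, later copies are
print(sample.duplicated().tolist())              # [False, True, True, False]
print(int(sample.duplicated().sum()))            # 2
print(len(sample.drop_duplicates()))             # 2 rows left: one Julia, one Bob

# keep=False flags every copy of a repeated row, including the first
print(int(sample.duplicated(keep=False).sum()))  # 3
```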

In [31]:
df.isnull().sum()   # to find the missing values

| Column | Missing values |
|--------|----------------|
| Name | 0 |
| Age | 15 |
| City | 0 |


In [32]:
df = df.dropna()   # drop every row that has at least one missing value

In [33]:
df.isnull().sum()   # confirm that no missing values remain

| Column | Missing values |
|--------|----------------|
| Name | 0 |
| Age | 0 |
| City | 0 |


In [34]:
print("Rows after cleaning:", len(df))   # only rows were dropped; all 3 columns remain
print("Shape:", df.shape)

Rows after cleaning: 71
Shape: (71, 3)

In [35]:
df.tail()

|    | Name | Age | City |
|----|------|-----|------|
| 90 | Bob | 21 | Bangalore |
| 91 | Frank | 22 | Bangalore |
| 92 | Grace | 22 | Chennai |
| 95 | Charlie | 20 | Chennai |
| 97 | David | 21 | Delhi |


In [36]:
df

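|     | Name | Age | City |
|-----|------|-----|------|
| 0 | Julia | 21 | Hyderabad |
| 2 | Grace | twenty | Kolkata |
| 3 | Charlie | 22 | Mumbai |
| 4 | Julia | 21 | Kolkata |
| 5 | Grace | 23 | Delhi |
| ... | ... | ... | ... |
| 90 | Bob | 21 | Bangalore |
| 91 | Frank | 22 | Bangalore |
| 92 | Grace | 22 | Chennai |
| 95 | Charlie | 20 | Chennai |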


In [37]:
df.to_csv("clean_students.csv", index=False)
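As an optional final check, the saved file can be read back to confirm that it has no missing values, no duplicate rows, and no stray `Unnamed: 0` column. `check_clean_csv` is a hypothetical helper for illustration. Re-reading may infer column dtypes differently from the in-memory DataFrame, and this check does not compare dtypes.

```python
import pandas as pd


def check_clean_csv(path: str = "clean_students.csv") -> bool:
    """Hypothetical sanity check: re-read the saved file and confirm it is clean."""
    saved = pd.read_csv(path)
    no_missing = not saved.isna().any().any()
    no_duplicates = not saved.duplicated().any()
    # A file written with index=True gains an "Unnamed: 0" column when read back
    no_index_column = not any(c.startswith("Unnamed") for c in saved.columns)
    return no_missing and no_duplicates and no_index_column
```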

### 📩 Submission
Please **rename this notebook to your full name**, e.g., `YourName_LiveCoding.ipynb`, and submit it **along with** the generated `clean_students.csv` file.