## Extra Exercises 1 - Solutions

In [1]:
import numpy as np
import pandas as pd

#### 1. Combine many series to form a DataFrame

In [2]:
series1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
series2 = pd.Series(np.arange(26))

**Solution**:

In [3]:
# Solution 1
df_concat = pd.concat([series1, series2], axis=1)

# Solution 2
df_dataframe = pd.DataFrame({'col1': series1, 'col2': series2})
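The two solutions differ in their column labels: `pd.concat` with `axis=1` takes labels from each series' `name` and falls back to integer positions for unnamed series, while the dictionary form uses the dictionary keys. A minimal sketch, assuming the labels `col1` and `col2` are wanted in both cases, passes `keys` to `pd.concat`:

```python
import numpy as np
import pandas as pd

series1 = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
series2 = pd.Series(np.arange(26))

# Without keys, unnamed series get integer column labels (0 and 1).
df_default = pd.concat([series1, series2], axis=1)

# Passing keys labels the columns explicitly, matching Solution 2.
df_named = pd.concat([series1, series2], axis=1, keys=['col1', 'col2'])

print(list(df_default.columns))
print(list(df_named.columns))
```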

#### 2. Keep only the top 2 most frequent values as they are and replace everything else with 'Other'

In [88]:
np.random.seed(42)
series = pd.Series(np.random.randint(1, 5, [12]))

**Solution**:

In [89]:
value_counts = series.value_counts()
top_two_values = value_counts.index[:2]
result = series.apply(lambda x: x if x in top_two_values else 'Other')
result

0         3
1     Other
2         1
3         3
4         3
5     Other
6         1
7         1
8         3
9     Other
10        3
11        3
dtype: object
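The same replacement can be written without a Python-level `apply`, using `isin` to build a boolean mask and `where` to keep only the masked values. This is a sketch rather than the notebook's own method; the literal values are an assumption, reconstructed from the outputs shown in this exercise so that the example does not depend on the random seed.

```python
import pandas as pd

# Assumption: the twelve values reconstructed from the outputs in this exercise.
series = pd.Series([3, 4, 1, 3, 3, 4, 1, 1, 3, 2, 3, 3])

# value_counts() sorts by count in descending order, so the first two
# index labels are the two most frequent values.
top_two_values = series.value_counts().index[:2]

# Cast to object first so integers and the string 'Other' can coexist,
# then keep values where the mask is True and replace the rest.
result = series.astype(object).where(series.isin(top_two_values), 'Other')
print(result.tolist())
```

If two values tie for second place, which of them lands in the top two depends on the order `value_counts()` returns, so the result may need an explicit tie-breaking rule.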

**Extra**: Do the same for the values with the highest and lowest counts, replacing everything else with 'other'.

In [91]:
value_counts = series.value_counts()

highest_count_value = value_counts.idxmax()
lowest_count_value = value_counts.idxmin()

result = series.apply(lambda x: x if x in [highest_count_value, lowest_count_value] else 'other')
result

0         3
1     other
2     other
3         3
4         3
5     other
6     other
7     other
8         3
9         2
10        3
11        3
dtype: object

#### 3. Extract items at given positions from a series

In [48]:
series = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
positions = [0, 4, 8, 14, 20]

**Solution**:

In [49]:
series.take(positions)

0     a
4     e
8     i
14    o
20    u
dtype: object
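`take` selects by integer position, not by index label, which only becomes visible when the index is not the default `0..n-1`. The sketch below uses a hypothetical series with custom labels to show this, and that `iloc` gives the same result:

```python
import pandas as pd

# Hypothetical series whose index labels are not 0..n-1.
series = pd.Series(list('abcde'), index=[10, 20, 30, 40, 50])
positions = [0, 2, 4]

taken = series.take(positions)     # selects by position
by_iloc = series.iloc[positions]   # also positional, same result

print(taken.tolist())
print(list(taken.index))
```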

#### 4. Compute differences between consecutive numbers of a series

**Desired Output**: [nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]

In [50]:
series = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

**Solution**:

In [52]:
diff_series = series.diff().tolist()
diff_series

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
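To see where the leading `nan` and the float values come from, `diff()` can be reproduced by subtracting a shifted copy of the series. A minimal sketch of that equivalence:

```python
import pandas as pd

series = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# shift(1) moves every value down one row, leaving NaN in the first row.
# NaN forces a float dtype, which is why the differences are floats.
manual_diff = series - series.shift(1)

print(manual_diff.tolist())
```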

**Extra**: Compute the cumulative sum of the series.

In [54]:
cumulative_series = series.cumsum().tolist()
cumulative_series

[1, 4, 10, 20, 35, 56, 83, 118]

#### 5. String Manipulation

Have the user enter a string and write a function that capitalizes the first letter of each word.

**Solution**:

In [92]:
def title_case(sentence):
    titlecased_sentence = sentence.title()  # Convert to title case
    return titlecased_sentence

input_sentence = input("Enter a sentence: ")

titlecased_result = title_case(input_sentence)
print("Titlecased result:", titlecased_result)

Enter a sentence:  i like pasta


Titlecased result: I Like Pasta
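One caveat with `str.title()`: it treats any non-letter as a word boundary, so apostrophes produce results like `Don'T`. A possible alternative, sketched here with a hypothetical helper `capitalize_words`, uppercases only the first character of each whitespace-separated word:

```python
def capitalize_words(sentence):
    """Uppercase the first character of each whitespace-separated word."""
    # split() with no arguments also collapses repeated whitespace.
    return " ".join(word[:1].upper() + word[1:] for word in sentence.split())

print("don't stop".title())
print(capitalize_words("don't stop"))
```

Unlike `title()`, this helper leaves the remaining characters of each word untouched, so an all-caps word such as `UK` is preserved.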


**Extra**: Count the number of words in the sentence.

In [93]:
word_count = len(input_sentence.split())
print("Number of words:", word_count)

Number of words: 3


#### 6. Extracting Data

In [63]:
data = [
    {
        "id": 1,
        "name": "John",
        "department": ["HR", "New York"],
        "skills": ["communication", "teamwork", "leadership"]
    },
    {
        "id": 2,
        "name": "Jane",
        "department": ["Finance", "San Francisco"],
        "skills": ["accounting", "analysis"]
    },
    {
        "id": 3,
        "name": "Michael",
        "department": ["Engineering", "Seattle"],
        "skills": ["programming", "problem-solving"]
    },
    {
        "id": 4,
        "name": "Sarah",
        "department": ["Marketing", "Los Angeles"],
        "skills": ["marketing", "creativity"]
    },
    {
        "id": 5,
        "name": "David",
        "department": ["Sales", "Chicago"],
        "skills": ["sales", "negotiation"]
    },
    {
        "id": 6,
        "name": "Emily",
        "department": ["Engineering", "Seattle"],
        "skills": ["coding", "troubleshooting"]
    }
]

##### A. Extract all employee names

In [64]:
employee_names = [employee["name"] for employee in data]
employee_names

['John', 'Jane', 'Michael', 'Sarah', 'David', 'Emily']

##### B. Find the department of employee with ID 4

In [65]:
employee_id = 4
department = next(employee["department"][0] for employee in data if employee["id"] == employee_id)
department

'Marketing'
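Called with a single argument, `next` raises `StopIteration` when no employee has the requested ID. A minimal sketch of a safer lookup, assuming a hypothetical helper `find_department` and an abbreviated subset of the records, passes a default value as the second argument:

```python
# Abbreviated subset of the exercise records, for illustration only.
employees = [
    {"id": 4, "name": "Sarah", "department": ["Marketing", "Los Angeles"]},
    {"id": 5, "name": "David", "department": ["Sales", "Chicago"]},
]

def find_department(records, employee_id):
    """Return the department name for employee_id, or None if absent."""
    return next(
        (r["department"][0] for r in records if r["id"] == employee_id),
        None,  # default returned instead of raising StopIteration
    )

print(find_department(employees, 4))
print(find_department(employees, 99))
```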

##### C. Count the number of employees in each department

In [66]:
department_counts = {}
for employee in data:
    department = employee["department"][0]
    department_counts[department] = department_counts.get(department, 0) + 1
department_counts

{'HR': 1, 'Finance': 1, 'Engineering': 2, 'Marketing': 1, 'Sales': 1}
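The counting loop can also be expressed with `collections.Counter` from the standard library. The sketch below uses abbreviated records that keep only the `department` field of the exercise data:

```python
from collections import Counter

# Abbreviated records: only the "department" field from the exercise data.
employees = [
    {"department": ["HR", "New York"]},
    {"department": ["Finance", "San Francisco"]},
    {"department": ["Engineering", "Seattle"]},
    {"department": ["Marketing", "Los Angeles"]},
    {"department": ["Sales", "Chicago"]},
    {"department": ["Engineering", "Seattle"]},
]

# Counter tallies each department name produced by the generator.
department_counts = Counter(e["department"][0] for e in employees)
print(dict(department_counts))
```

`Counter` also offers `most_common(n)`, which is convenient when only the largest departments are of interest.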

##### D. Find the names of employees in the "Engineering" department

In [67]:
engineering_employees = [employee["name"] for employee in data if employee["department"][0] == "Engineering"]
engineering_employees

['Michael', 'Emily']

##### E. Extract names of employees whose skills include "programming"

In [68]:
programming_experts = [employee["name"] for employee in data if "programming" in employee["skills"]]
programming_experts

['Michael']

##### F. Calculate the average number of skills per employee

In [69]:
total_skills = sum(len(employee["skills"]) for employee in data)
average_skills = total_skills / len(data)
average_skills

2.1666666666666665

##### G. Find the employee with the most skills

In [70]:
most_skills_employee = max(data, key=lambda x: len(x["skills"]))
most_skills_employee["name"]

'John'
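`max` returns only the first record with the largest key, so any ties are silently dropped. A sketch of one way to list every employee sharing the maximum, using hypothetical records (the employee `Alex` is invented for illustration) in which two employees tie:

```python
# Hypothetical records in which two employees tie for the most skills.
employees = [
    {"name": "John", "skills": ["communication", "teamwork", "leadership"]},
    {"name": "Jane", "skills": ["accounting", "analysis"]},
    {"name": "Alex", "skills": ["design", "research", "writing"]},
]

# First find the largest skill count, then collect everyone who matches it.
most = max(len(e["skills"]) for e in employees)
top_employees = [e["name"] for e in employees if len(e["skills"]) == most]
print(top_employees)
```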