<a href="https://colab.research.google.com/github/Dandyeriametor/Asssignment2-colab-git/blob/main/Assignment_4_Data_Processing_Python_Development_and_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Lesson 4 — Python & Pandas Fundamentals

**Author:** Dandy Eriametor

Business Intelligence Analyst

Student Number: DE111080

This notebook completes the tasks specified in the assignment:
- Python basics (tuples, sets, dicts, functions/lambdas, iterators/generators, map/reduce/filter, OOP)
- Pandas data wrangling (DataFrames, dropping columns, null handling, groupby/describe, concat/merge)
- A small HR dataset for salary analysis



In [2]:
# Import the required libraries

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
%matplotlib inline
from functools import reduce  # for reduce()


## 1) Create a DataFrame

In [3]:

data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age":  [25, 30, 35, 28],
    "City": ["Toronto", "Montreal", "Vancouver", "Calgary"]
}
df = pd.DataFrame(data)
df["Name"] = df["Name"].astype("string")
df["Age"]  = pd.to_numeric(df["Age"], errors="coerce").astype("Int64")
df["City"] = df["City"].astype("string")
print("Initial DataFrame (with dtypes):")
print(df)
print("\nDtypes:")
print(df.dtypes)


Initial DataFrame (with dtypes):
      Name  Age       City
0    Alice   25    Toronto
1      Bob   30   Montreal
2  Charlie   35  Vancouver
3    Diana   28    Calgary

Dtypes:
Name    string[python]
Age              Int64
City    string[python]
dtype: object
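
The `errors="coerce"` argument has no visible effect above because every age is already a valid integer. The following is a minimal sketch of what it does when an entry cannot be parsed. The `raw_ages` series is a hypothetical input, not part of the assignment data.

```python
import pandas as pd

# Hypothetical raw input: one age is not a valid number.
raw_ages = pd.Series(["25", "30", "unknown", "28"])

# errors="coerce" converts unparseable entries to NaN instead of raising.
# The nullable "Int64" dtype then stores that gap as <NA> while keeping
# the remaining values as integers rather than floats.
ages = pd.to_numeric(raw_ages, errors="coerce").astype("Int64")

print(ages)
print("Missing count:", int(ages.isna().sum()))
```

The plain `int64` dtype cannot hold missing values, which is one reason to prefer `Int64` for columns that may contain gaps.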


## 2) Row and Column Manipulation — Drop 'City'

In [4]:

df2 = df.copy()
if "City" in df2.columns:
    df2 = df2.drop(columns=["City"])
print("After dropping 'City':")
print(df2)
print("\nColumns now:", list(df2.columns))
assert list(df2.columns) == ["Name", "Age"], "Expected only 'Name' and 'Age' columns."
print(" Verified only 'Name' and 'Age' remain.")


After dropping 'City':
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    Diana   28

Columns now: ['Name', 'Age']
 Verified only 'Name' and 'Age' remain.
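
As an alternative to the explicit `if "City" in df2.columns` check, `DataFrame.drop` accepts `errors="ignore"`, which skips labels that are not present. A small sketch, using a hypothetical `people` frame that never had a `City` column:

```python
import pandas as pd

# Hypothetical frame without a 'City' column.
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# errors="ignore" suppresses the KeyError that drop() would otherwise
# raise for a missing label, so no membership check is needed.
trimmed = people.drop(columns=["City"], errors="ignore")

print(list(trimmed.columns))
```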


## 3) Handling Null Values

In [5]:

null_df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [None, 2.5, 3.5],
    "C": [None, None, "x"]
})
print("Original with nulls:")
print(null_df)

filled_df = null_df.copy()
for col in filled_df.columns:
    if pd.api.types.is_numeric_dtype(filled_df[col]):
        filled_df[col] = filled_df[col].fillna(0)
    else:
        filled_df[col] = filled_df[col].fillna("missing")
print("\nFilled by dtype:")
print(filled_df)

# ffill()/bfill() replace the deprecated fillna(method=...) form
ffill_bfill_df = null_df.ffill().bfill()
print("\nForward fill then backfill:")
print(ffill_bfill_df)

dropped_df = null_df.dropna()
print("\nRows with no nulls (dropna):")
print(dropped_df)


Original with nulls:
     A    B     C
0  1.0  NaN  None
1  NaN  2.5  None
2  3.0  3.5     x

Filled by dtype:
     A    B        C
0  1.0  0.0  missing
1  0.0  2.5  missing
2  3.0  3.5        x

Forward fill then backfill:
     A    B  C
0  1.0  2.5  x
1  1.0  2.5  x
2  3.0  3.5  x

Rows with no nulls (dropna):
     A    B  C
2  3.0  3.5  x
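
Filling numeric gaps with 0 can distort later statistics. One common alternative, sketched here on the numeric columns only, is to fill each column with its own mean. This is an illustration of the option, not a requirement of the assignment.

```python
import pandas as pd

null_df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [None, 2.5, 3.5],
})

# DataFrame.mean() skips NaN by default and returns one value per column;
# fillna() then matches those values to columns by label.
mean_filled = null_df.fillna(null_df.mean())

print(mean_filled)
```

Here column `A` has mean 2.0 and column `B` has mean 3.0, so those are the values placed in the gaps.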



## 4) GroupBy and Describe

In [7]:

cat_df = pd.DataFrame({
    "Category": ["A","A","B","B","B","C"],
    "Value":    [10, 12, 5, 7, 11, 9]
})

grouped = cat_df.groupby("Category")["Value"].describe()
print("GroupBy describe on 'Value' by 'Category':")
print(grouped)

print("\nInterpretation notes:")
for idx, row in grouped.iterrows():
    std_val = row['std']
    if pd.isna(std_val):
        std_str = "nan"
    else:
        std_str = f"{std_val:.2f}"

    print(f"- Category {idx}: count={row['count']}, "
          f"mean={row['mean']:.2f}, std={std_str}, "
          f"min={row['min']}, 25%={row['25%']}, "
          f"50%={row['50%']}, 75%={row['75%']}, max={row['max']}")

GroupBy describe on 'Value' by 'Category':
          count       mean       std   min   25%   50%   75%   max
Category                                                          
A           2.0  11.000000  1.414214  10.0  10.5  11.0  11.5  12.0
B           3.0   7.666667  3.055050   5.0   6.0   7.0   9.0  11.0
C           1.0   9.000000       NaN   9.0   9.0   9.0   9.0   9.0

Interpretation notes:
- Category A: count=2.0, mean=11.00, std=1.41, min=10.0, 25%=10.5, 50%=11.0, 75%=11.5, max=12.0
- Category B: count=3.0, mean=7.67, std=3.06, min=5.0, 25%=6.0, 50%=7.0, 75%=9.0, max=11.0
- Category C: count=1.0, mean=9.00, std=nan, min=9.0, 25%=9.0, 50%=9.0, 75%=9.0, max=9.0
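
When only a few statistics are needed, `describe()` can be replaced by named aggregation, which also lets you choose the output column names. A minimal sketch on the same data. The names `n`, `total`, and `mean` are illustrative choices.

```python
import pandas as pd

cat_df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B", "C"],
    "Value": [10, 12, 5, 7, 11, 9],
})

# Each keyword becomes an output column; the string names a built-in aggregation.
summary = cat_df.groupby("Category")["Value"].agg(
    n="count",
    total="sum",
    mean="mean",
)

print(summary)
```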


## 5) Concatenation and Merging

In [8]:

df1 = pd.DataFrame({"id":[1,2], "A":[10,20]})
df2 = pd.DataFrame({"id":[3,4], "A":[30,40]})
df3 = pd.DataFrame({"id":[1,2,3,4], "B":[100,200,300,400]})

vcat = pd.concat([df1, df2], axis=0, ignore_index=True)
print("Vertical concat of df1 & df2:")
print(vcat)
print("Shape:", vcat.shape)

merged = vcat.merge(df3, on="id", how="left")
print("\nMerged with df3 (by 'id'):")
print(merged)
print("Shape:", merged.shape)
assert merged.shape == (4, 3), "Expected 4 rows and 3 columns after merge."
print(" Merge shape correct.")


Vertical concat of df1 & df2:
   id   A
0   1  10
1   2  20
2   3  30
3   4  40
Shape: (4, 2)

Merged with df3 (by 'id'):
   id   A    B
0   1  10  100
1   2  20  200
2   3  30  300
3   4  40  400
Shape: (4, 3)
 Merge shape correct.
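
The shape assertion above passes because every `id` has a match. The sketch below shows one way to detect unmatched keys, using the `indicator` and `validate` parameters of `merge`. The `left` and `right` frames are hypothetical and deliberately contain a key with no partner.

```python
import pandas as pd

# Hypothetical frames: id 5 exists only on the left, id 3 only on the right.
left = pd.DataFrame({"id": [1, 2, 5], "A": [10, 20, 50]})
right = pd.DataFrame({"id": [1, 2, 3], "B": [100, 200, 300]})

# indicator=True adds a "_merge" column recording where each row came from.
# validate="one_to_one" raises MergeError if either side has duplicate keys.
checked = left.merge(
    right, on="id", how="left", indicator=True, validate="one_to_one"
)

unmatched = checked.loc[checked["_merge"] == "left_only", "id"].tolist()

print(checked)
print("Left ids without a match:", unmatched)
```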


## 6) Tuples and Sets

In [9]:

fruits = ("apple", "banana", "cherry")
numbers = {1, 2, 3, 4, 5}
print("Tuple:", fruits)
print("Set:", numbers)

print("\nAdding element to tuple via concatenation (creates NEW tuple):")
fruits = fruits + ("date",)
print("New tuple:", fruits)

print("\nAdding to set (mutable):")
numbers.add(6)
print("Updated set:", numbers)

print("\nDifference: Tuples are immutable (no in-place changes). Sets are mutable, unordered, and unique-valued.")


Tuple: ('apple', 'banana', 'cherry')
Set: {1, 2, 3, 4, 5}

Adding element to tuple via concatenation (creates NEW tuple):
New tuple: ('apple', 'banana', 'cherry', 'date')

Adding to set (mutable):
Updated set: {1, 2, 3, 4, 5, 6}

Difference: Tuples are immutable (no in-place changes). Sets are mutable, unordered, and unique-valued.
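
To make the difference concrete, the sketch below attempts an in-place change on a tuple, which raises `TypeError`, and then shows the basic set operations. The sets `a` and `b` are illustrative values.

```python
fruits = ("apple", "banana", "cherry")

# Item assignment on a tuple raises TypeError; the tuple is left unchanged.
try:
    fruits[0] = "avocado"
except TypeError as exc:
    print("Tuple assignment failed:", exc)

a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b

# Building a set from a list discards duplicates.
unique = set([1, 2, 2, 3, 3, 3])

print(union, intersection, difference, unique)
```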


## 7) Dictionaries

In [10]:

scores = {"Alice": 85, "Bob": 90, "Charlie": 78}
print("Original:", scores)
scores["Bob"] = 92
scores["Diana"] = 88
print("Updated:", scores)


Original: {'Alice': 85, 'Bob': 90, 'Charlie': 78}
Updated: {'Alice': 85, 'Bob': 92, 'Charlie': 78, 'Diana': 88}
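
A few further dictionary operations that often come up alongside updates are sketched below: a safe lookup with a default, finding the key with the highest value, and sorting by value. The key `"Evan"` is a hypothetical name that is not in the dictionary.

```python
scores = {"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 88}

# get() returns the default instead of raising KeyError for a missing key.
missing = scores.get("Evan", 0)

# max() iterates over the keys; key=scores.get compares them by their values.
top = max(scores, key=scores.get)

# items() yields (name, score) pairs, sorted here from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print("Missing lookup:", missing)
print("Top scorer:", top)
print("Ranked:", ranked)
```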


## 8) Functions and Lambda

In [11]:

def square(x):
    try:
        return x * x
    except TypeError:
        raise TypeError("square() expects a numeric input")

square_lambda = lambda x: x * x

for val in [3, 5]:
    print(f"square({val}) =", square(val), "| lambda:", square_lambda(val))


square(3) = 9 | lambda: 9
square(5) = 25 | lambda: 25
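
The `try`/`except` in `square()` relies on `x * x` failing for non-numeric input. An alternative sketch checks the type up front with `numbers.Number`. The function name `square_checked` is hypothetical, and rejecting `bool` is a design choice made here, not something the assignment specifies.

```python
from numbers import Number


def square_checked(x):
    """Return x squared, rejecting non-numeric input before multiplying."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(x, bool) or not isinstance(x, Number):
        raise TypeError(
            f"square_checked() expects a number, got {type(x).__name__}"
        )
    return x * x


print(square_checked(3), square_checked(2.5))

# A string is rejected with a descriptive message.
try:
    square_checked("3")
except TypeError as exc:
    error_message = str(exc)
    print("Rejected:", error_message)
```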


## 9) Iterators and Generators

In [12]:

class FirstNEvens:
    def __init__(self, n):
        self.n = n
        self.current = 0
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count >= self.n:
            raise StopIteration
        value = self.current
        self.current += 2
        self.count += 1
        return value

def even_generator(n):
    num = 0
    for _ in range(n):
        yield num
        num += 2

print("Iterator output:", list(FirstNEvens(5)))
print("Generator output:", list(even_generator(5)))


Iterator output: [0, 2, 4, 6, 8]
Generator output: [0, 2, 4, 6, 8]
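
Both versions above produce values lazily, one at a time. The sketch below illustrates two consequences: a generator can describe an unbounded sequence that is trimmed with `itertools.islice`, and a generator is exhausted after one pass. The name `all_evens` is hypothetical.

```python
from itertools import count, islice


def all_evens():
    """Yield 0, 2, 4, ... with no upper bound."""
    for n in count(start=0, step=2):
        yield n


# islice() takes only the first five values, so the infinite loop never runs away.
first_five = list(islice(all_evens(), 5))

# A generator can be consumed only once; a second pass yields nothing.
gen = (n for n in range(0, 10, 2))
first_pass = list(gen)
second_pass = list(gen)

print(first_five, first_pass, second_pass)
```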


## 10) Map, Reduce, and Filter

In [13]:

nums = [1, 2, 3, 4, 5]

# reduce() is already imported from functools in the first cell
squared = list(map(lambda x: x*x, nums))
product = reduce(lambda a, b: a*b, nums, 1)
evens = list(filter(lambda x: x % 2 == 0, nums))

print("Original:", nums)
print("Squared (map):", squared)
print("Product (reduce):", product)
print("Evens (filter):", evens)


Original: [1, 2, 3, 4, 5]
Squared (map): [1, 4, 9, 16, 25]
Product (reduce): 120
Evens (filter): [2, 4]
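
For comparison, the same three results can be written with list comprehensions and `math.prod` (available from Python 3.8). This is a sketch of the equivalent idiom, not a replacement for the map/reduce/filter task.

```python
import math

nums = [1, 2, 3, 4, 5]

# Comprehension equivalent of map(): transform every element.
squared = [x * x for x in nums]

# Comprehension equivalent of filter(): keep elements that pass the test.
evens = [x for x in nums if x % 2 == 0]

# math.prod() multiplies all elements, like reduce() with a multiply lambda.
product = math.prod(nums)

print(squared, evens, product)
```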


## 11) Object-Oriented Programming — Rectangle

In [14]:

class Rectangle:
    def __init__(self, length, width):
        if length <= 0 or width <= 0:
            raise ValueError("Length and width must be positive numbers.")
        self.length = float(length)
        self.width = float(width)
    def area(self):
        return self.length * self.width
    def perimeter(self):
        return 2 * (self.length + self.width)

rect1 = Rectangle(5, 3)
rect2 = Rectangle(10, 2.5)

print("Rectangle 1: area =", rect1.area(), ", perimeter =", rect1.perimeter())
print("Rectangle 2: area =", rect2.area(), ", perimeter =", rect2.perimeter())


Rectangle 1: area = 15.0 , perimeter = 16.0
Rectangle 2: area = 25.0 , perimeter = 25.0
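
The same class can also be sketched as a frozen dataclass, which generates `__init__`, `__repr__`, and `__eq__` automatically. The name `RectangleDC` and the choice to expose `area` and `perimeter` as properties are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectangleDC:
    length: float
    width: float

    def __post_init__(self):
        # Runs after the generated __init__; same validation as Rectangle.
        if self.length <= 0 or self.width <= 0:
            raise ValueError("Length and width must be positive numbers.")

    @property
    def area(self):
        return self.length * self.width

    @property
    def perimeter(self):
        return 2 * (self.length + self.width)


rect = RectangleDC(5, 3)
print(rect, "area =", rect.area, "perimeter =", rect.perimeter)

# Invalid dimensions are rejected at construction time.
try:
    RectangleDC(-1, 2)
except ValueError as exc:
    rejected = str(exc)
    print("Rejected:", rejected)
```

Because the dataclass is frozen, assigning to `rect.length` after construction raises an error, so a validated rectangle cannot later become invalid.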


## 12) Pandas Data Analysis — HR Example

In [15]:

employees = pd.DataFrame({
    "Name": ["Alice","Bob","Charlie","Diana","Evan","Fiona"],
    "Department": ["IT","HR","IT","Finance","Finance","HR"],
    "Salary": [70000, 58000, 65000, 72000, 54000, 61000]
})
print("Employees DataFrame:")
print(employees)

avg_salary = employees.groupby("Department")["Salary"].mean().round(2)
print("\nAverage salary by department:")
print(avg_salary)

high_earners = employees.loc[employees["Salary"] > 60000, "Name"]
print("\nEmployees with salary > 60000:")
print(list(high_earners))

employees["Bonus"] = (employees["Salary"] * 0.10).round(2)
print("\nWith Bonus column (10% of Salary):")
print(employees)


Employees DataFrame:
      Name Department  Salary
0    Alice         IT   70000
1      Bob         HR   58000
2  Charlie         IT   65000
3    Diana    Finance   72000
4     Evan    Finance   54000
5    Fiona         HR   61000

Average salary by department:
Department
Finance    63000.0
HR         59500.0
IT         67500.0
Name: Salary, dtype: float64

Employees with salary > 60000:
['Alice', 'Charlie', 'Diana', 'Fiona']

With Bonus column (10% of Salary):
      Name Department  Salary   Bonus
0    Alice         IT   70000  7000.0
1      Bob         HR   58000  5800.0
2  Charlie         IT   65000  6500.0
3    Diana    Finance   72000  7200.0
4     Evan    Finance   54000  5400.0
5    Fiona         HR   61000  6100.0
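
A natural follow-up to the department averages is to compare each employee with their own department's mean. The sketch below uses `groupby(...).transform("mean")`, which returns a value per row rather than per group. The column name `VsDeptMean` is a hypothetical choice.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Evan", "Fiona"],
    "Department": ["IT", "HR", "IT", "Finance", "Finance", "HR"],
    "Salary": [70000, 58000, 65000, 72000, 54000, 61000],
})

# transform() broadcasts each department's mean back onto its rows,
# so the result lines up with the original index.
dept_mean = employees.groupby("Department")["Salary"].transform("mean")
employees["VsDeptMean"] = employees["Salary"] - dept_mean

print(employees)
```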



## 13) Clone the Repository from GitHub

In [16]:
# Clone the repository into your Colab environment.
!git clone https://github.com/Dandyeriametor/Asssignment2-colab-git.git

Cloning into 'Asssignment2-colab-git'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
Receiving objects: 100% (6/6), done.
remote: Total 6 (delta 0), reused 3 (delta 0), pack-reused 0 (from 0)
