Name: Daniel Adejumo

Net ID: dxa239

In [2]:
import numpy as np
import pandas as pd

## HW Questions (Numpy)


### 1. Stack arrays vertically and horizontally
Define two custom numpy arrays, A and B. Generate two new arrays by stacking them.


In [3]:
# Define two custom arrays
A = np.arange(10).reshape(2, 5)
B = np.arange(10, 20).reshape(2, 5)

print("Array A:\n", A)
print("Array B:\n", B)

# Vertical Stack
v_stack = np.vstack([A, B])
print("\nVertical Stack:\n", v_stack)

# Horizontal Stack
h_stack = np.hstack([A, B])
print("\nHorizontal Stack:\n", h_stack)


Array A:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
Array B:
 [[10 11 12 13 14]
 [15 16 17 18 19]]

Vertical Stack:
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Horizontal Stack:
 [[ 0  1  2  3  4 10 11 12 13 14]
 [ 5  6  7  8  9 15 16 17 18 19]]
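
As a cross-check, for 2D arrays like these, `np.vstack` and `np.hstack` should give the same result as `np.concatenate` along axis 0 and axis 1 respectively. A minimal sketch, assuming the same A and B as above (the names `v_concat` and `h_concat` are only illustrative):

```python
import numpy as np

# Same arrays as in the cell above
A = np.arange(10).reshape(2, 5)
B = np.arange(10, 20).reshape(2, 5)

# axis=0 appends rows (vertical), axis=1 appends columns (horizontal)
v_concat = np.concatenate([A, B], axis=0)
h_concat = np.concatenate([A, B], axis=1)

print(v_concat.shape, h_concat.shape)
print(np.array_equal(v_concat, np.vstack([A, B])))
print(np.array_equal(h_concat, np.hstack([A, B])))
```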


### 2. Find common elements
Find common elements between A and B.


In [4]:
# Redefining A and B as 1D arrays for clearer intersection demonstration
A_1d = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
B_1d = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

# Using np.intersect1d
common_elements = np.intersect1d(A_1d, B_1d)
print(f"Common elements: {common_elements}")


Common elements: [2 4]
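
Note that `np.intersect1d` returns the sorted unique common values, so duplicates and positions are lost. If those matter, one possible approach (a sketch, not part of the original question) is `np.isin` for membership and an elementwise comparison for position-wise matches. The names `in_B` and `matching_positions` are illustrative:

```python
import numpy as np

A_1d = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
B_1d = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

# Boolean mask: which elements of A_1d also occur anywhere in B_1d
in_B = np.isin(A_1d, B_1d)
print(A_1d[in_B])  # keeps duplicates: [2 2 4 4]

# Indices where both arrays hold the same value at the same position
matching_positions = np.where(A_1d == B_1d)[0]
print(matching_positions)  # [1 3 5 7]
```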


### 3. Extract numbers within a specific range
Extract all numbers from A that are between 5 and 10 (inclusive).


In [5]:
A_range = np.array([2, 6, 1, 9, 10, 3, 27])

# Using boolean masking
# Condition: (A >= 5) AND (A <= 10)
mask = (A_range >= 5) & (A_range <= 10)
extracted_elements = A_range[mask]

# Alternative using np.where to get indices first
indices = np.where((A_range >= 5) & (A_range <= 10))
extracted_via_where = A_range[indices]

print(f"Original Array: {A_range}")
print(f"Elements between 5 and 10: {extracted_elements}")

Original Array: [ 2  6  1  9 10  3 27]
Elements between 5 and 10: [ 6  9 10]
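
"Between 5 and 10" is read here as inclusive of both endpoints. The sketch below, using the same sample array, shows how the result would change under an exclusive reading. The names `inclusive` and `exclusive` are illustrative:

```python
import numpy as np

A_range = np.array([2, 6, 1, 9, 10, 3, 27])

# Inclusive bounds: 5 <= x <= 10
inclusive = A_range[(A_range >= 5) & (A_range <= 10)]

# Exclusive bounds: 5 < x < 10 (drops the value 10)
exclusive = A_range[(A_range > 5) & (A_range < 10)]

print(inclusive)  # [ 6  9 10]
print(exclusive)  # [6 9]
```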


### 4. Filter Iris Dataset
Filter the rows of iris_2d that have petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.


In [6]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# Load the dataset
try:
    iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
    
    # 1st column is index 0 (sepallength), 3rd column is index 2 (petallength)
    # Condition: petallength > 1.5 AND sepallength < 5.0
    condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
    
    filtered_iris = iris_2d[condition]
    
    print(f"Original shape: {iris_2d.shape}")
    print(f"Filtered shape: {filtered_iris.shape}")
    print("\nFiltered data:\n", filtered_iris)
    
except Exception as e:
    print(f"Could not load data from URL directly due to: {e}")


Original shape: (150, 4)
Filtered shape: (6, 4)

Filtered data:
 [[4.8 3.4 1.6 0.2]
 [4.8 3.4 1.9 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [4.9 2.4 3.3 1. ]
 [4.9 2.5 4.5 1.7]]
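
Because the cell above depends on a network download, the same filter can be tried offline on a small array. The sketch below assumes a hand-picked four-row subset of the data (`iris_sample` is an illustrative name, not the full dataset):

```python
import numpy as np

# Hand-picked sample rows: sepallength, sepalwidth, petallength, petalwidth
iris_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],  # fails both conditions
    [4.9, 3.0, 1.4, 0.2],  # sepallength OK, petallength too small
    [4.8, 3.4, 1.6, 0.2],  # passes both
    [4.9, 2.5, 4.5, 1.7],  # passes both
])

# Same condition as above: petallength > 1.5 AND sepallength < 5.0
condition = (iris_sample[:, 2] > 1.5) & (iris_sample[:, 0] < 5.0)
filtered = iris_sample[condition]

print(filtered)
```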


## HW Questions (Pandas)

### 1. Filter specific columns for every 20th row
From df, select the 'Manufacturer', 'Model' and 'Type' columns for every 20th row, starting from the first row (row 0).

In [7]:
df_cars = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# iloc[start:stop:step]
# We want every 20th row (::20) and specific columns
result_q1 = df_cars.iloc[::20][['Manufacturer', 'Model', 'Type']]

print("Every 20th row (Manufacturer, Model, Type):")
print(result_q1)

Every 20th row (Manufacturer, Model, Type):
   Manufacturer    Model     Type
0         Acura  Integra    Small
20     Chrysler  LeBaron  Compact
40        Honda  Prelude   Sporty
60      Mercury   Cougar  Midsize
80       Subaru   Loyale    Small
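
The row step and the column selection can also be written as a single `.loc` call instead of chaining `.iloc` with a column list. A sketch on a synthetic stand-in frame (`df_demo` and its placeholder values are assumptions, used so the example runs without downloading the CSV):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with 93 rows and a default RangeIndex
n = 93
df_demo = pd.DataFrame({
    "Manufacturer": [f"maker_{i}" for i in range(n)],
    "Model": [f"model_{i}" for i in range(n)],
    "Type": ["Small"] * n,
    "Price": np.linspace(10.0, 40.0, n),
})

cols = ["Manufacturer", "Model", "Type"]

# Chained form used above, and the single-call equivalent
chained = df_demo.iloc[::20][cols]
single = df_demo.loc[::20, cols]

print(single.index.tolist())   # [0, 20, 40, 60, 80]
print(chained.equals(single))  # True
```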


### 2. Replace missing values with Mean
Replace missing values in Min.Price and Max.Price columns with their respective mean.

In [9]:
# Reload the CSV so we start from the original, unmodified data
df_cars = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

print("\nMissing values before imputation:")
print(df_cars[['Min.Price', 'Max.Price']].isnull().sum())

# Calculate means
min_price_mean = df_cars['Min.Price'].mean()
max_price_mean = df_cars['Max.Price'].mean()

# Fill NA
df_cars['Min.Price'] = df_cars['Min.Price'].fillna(min_price_mean)
df_cars['Max.Price'] = df_cars['Max.Price'].fillna(max_price_mean)

print("\nMissing values after imputation:")
print(df_cars[['Min.Price', 'Max.Price']].isnull().sum())


Missing values before imputation:
Min.Price    7
Max.Price    5
dtype: int64

Missing values after imputation:
Min.Price    0
Max.Price    0
dtype: int64
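
Both columns can also be filled in one step, since `DataFrame.fillna` accepts a Series of per-column fill values. A minimal sketch on a tiny made-up frame (`df_demo` and its values are assumptions, not rows from Cars93):

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with one missing value per column
df_demo = pd.DataFrame({
    "Min.Price": [10.0, np.nan, 20.0, 30.0],
    "Max.Price": [15.0, 25.0, np.nan, 35.0],
})

cols = ["Min.Price", "Max.Price"]

# .mean() skips NaN by default and returns one mean per column;
# fillna matches those means to the columns by label
df_demo[cols] = df_demo[cols].fillna(df_demo[cols].mean())

print(df_demo)
```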


### 3. Filter rows with Row Sum > 100
Get the rows of a dataframe with row sum > 100.

In [14]:
# Generate the random dataframe
df_rnd = pd.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4))

print("Original DataFrame (first 5 rows):\n", df_rnd.head())

# Sum across the columns (axis=1) to get one total per row
row_sums = df_rnd.sum(axis=1)

# Filter
rows_gt_100 = df_rnd[row_sums > 100]

print("\nRows with sum > 100:")
rows_gt_100

Original DataFrame (first 5 rows):
     0   1   2   3
0  28  30  26  15
1  30  14  11  17
2  24  27  16  35
3  13  23  38  25
4  13  31  26  14

Rows with sum > 100:


     0   1   2   3
2   24  27  16  35
6   24  37  30  20
9   18  38  11  39
11  24  38  28  19
13  39  21  24  18
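
Since the frame above is generated without a seed, the rows shown will differ on each run. One way to make the result repeatable is a seeded generator, sketched below with `np.random.default_rng`. The seed value 0 and the names `rng` and `df_demo` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed reproduces the same values
rng = np.random.default_rng(seed=0)

# 15 rows x 4 columns of integers in [10, 40), like the cell above
df_demo = pd.DataFrame(rng.integers(10, 40, size=(15, 4)))

# Keep only the rows whose total exceeds 100
rows_gt_100 = df_demo[df_demo.sum(axis=1) > 100]

print(rows_gt_100)
```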
