# Day 2 - AI ML Journey

## Machine Learning

### Introduction

Today’s task will demonstrate a **simple Linear Regression model** applied to the **California Housing Prices** dataset from `sklearn.datasets`.

We will perform the following operations:

1. **Import Libraries**  
   Load the necessary Python libraries for data handling, modeling, and evaluation.

2. **Load Dataset into DataFrame**  
   Fetch the dataset and store it in a pandas DataFrame for easy manipulation.

3. **Split Data into Train and Test Sets**  
   Divide the data into **training (70%)** and **testing (30%)** sets.

4. **Fit the Model and Make Predictions**  
   Train a Linear Regression model on the training data and make predictions on the test set.

5. **Evaluate the Model**  
   Compare the predictions with the true test-set targets using the **R² score** and **Mean Squared Error (MSE)** to check model performance.

Below, you will see a **detailed, step-by-step example** implementing all these operations.

### Importing Libraries

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```

### Loading Dataset

```python
from sklearn.datasets import fetch_california_housing

# as_frame=True returns the features and the target together in a pandas DataFrame
data = fetch_california_housing(as_frame=True)
df = data.frame
df.head()
```

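|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |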


### Train / Test Split

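```python
# Features: every column except the target
x = df.drop(columns='MedHouseVal')
# Target: median house value
y = df['MedHouseVal']

# Hold out 30% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=39)

print("Train size:", x_train.shape)
print("Test size:", x_test.shape)
```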

```text
Train size: (14448, 8)
Test size: (6192, 8)
```
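Conceptually, a shuffled 70/30 split just permutes the row indices and slices them. The sketch below uses a hypothetical `shuffle_split_indices` helper built on NumPy. It is a simplified illustration, not scikit-learn's actual implementation, so the exact rows it picks will differ from `train_test_split(..., random_state=39)`. The sizes still match the output above because 0.3 × 20,640 = 6,192 exactly.

```python
import math

import numpy as np


def shuffle_split_indices(n_samples, test_size=0.3, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled split."""
    rng = np.random.default_rng(seed)
    # Shuffle all row positions 0..n_samples-1
    permutation = rng.permutation(n_samples)
    # Round the test count up so a small dataset still gets at least one test row
    n_test = math.ceil(test_size * n_samples)
    test_idx = permutation[:n_test]
    train_idx = permutation[n_test:]
    return train_idx, test_idx


# 20,640 rows: the size of the California Housing data (14448 + 6192)
train_idx, test_idx = shuffle_split_indices(20640, test_size=0.3, seed=39)
print(len(train_idx), len(test_idx))  # 14448 6192
```

Because the two index arrays are disjoint slices of one permutation, every row lands in exactly one of the two sets.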


### Fit a Linear Regression Model

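```python
# Ordinary least-squares linear regression
linear_model = LinearRegression()

# Learn one coefficient per feature plus an intercept from the training data
linear_model.fit(x_train, y_train)

# Predict house values for the held-out test rows
y_pred = linear_model.predict(x_test)
```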
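`LinearRegression` fits ordinary least squares. It picks the intercept and coefficients that minimise the sum of squared errors on the training data. The sketch below reproduces that idea with NumPy's `np.linalg.lstsq` on small synthetic data, so the learned weights can be checked against known values. The data, the true coefficients, and the helper name `fit_ols` are all made up for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical helper: return (intercept, coefficients) minimising squared error."""
    # Prepend a column of ones so the first weight acts as the intercept
    X_design = np.column_stack([np.ones(len(X)), X])
    # Least-squares solution of X_design @ w ≈ y
    w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return w[0], w[1:]


# Synthetic data: y = 2 + 3*x1 - 0.5*x2 exactly (no noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 + X @ np.array([3.0, -0.5])

intercept, coef = fit_ols(X, y)
print(round(intercept, 6), np.round(coef, 6))
```

With scikit-learn installed, `LinearRegression().fit(X, y)` on the same arrays should give matching `intercept_` and `coef_` values, up to floating-point error.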

### Evaluation of the Model

```python
# Compare the predictions with the true test targets
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error : ", mse)
print("R2 Score : ", r2)

# One learned coefficient per feature
coef_df = pd.DataFrame({'Feature': x.columns, 'Coefficient': linear_model.coef_})
print(coef_df)
```

```text
Mean Squared Error :  0.5254752629956146
R2 Score :  0.6106160970598206
      Feature  Coefficient
0      MedInc     0.429184
1    HouseAge     0.009513
2    AveRooms    -0.099996
3   AveBedrms     0.609437
4  Population    -0.000005
5    AveOccup    -0.003720
6    Latitude    -0.423596
7   Longitude    -0.437837
```
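Both metrics are easy to compute by hand:

- MSE is the average squared difference between the true and predicted values.
- R² is 1 − SS_res / SS_tot, the fraction of the target's variance around its mean that the model explains.

The minimal NumPy sketch below should agree with `mean_squared_error` and `r2_score`. The helper names `mse_manual` and `r2_manual` and the toy arrays are illustrative.

```python
import numpy as np


def mse_manual(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def r2_manual(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
print(mse_manual(y_true, y_hat))  # 0.375
print(r2_manual(y_true, y_hat))
```

Applied to the results above, an MSE of about 0.525 corresponds to a root-mean-squared error of about 0.725. scikit-learn documents `MedHouseVal` in units of $100,000, so that is a typical prediction error of roughly $72,000. The R² of about 0.61 means the model explains roughly 61% of the variance in the test-set house values.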


## LeetCode

### ![image.png](attachment:image.png)

### Approach

1] The problem asks us to check whether the given array contains any duplicate values.  
   If any value appears at least twice, we return **True**; otherwise, we return **False**. <br />

2] My first approach used **two nested for loops**, comparing each number with every other number in the array (skipping itself).  
   This is the **brute-force** solution. <br />

3] Below, you can see the brute-force solution:

### ![image-2.png](attachment:image-2.png)
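The screenshot may not render outside the notebook, so here is a text sketch of the nested-loop idea described above. It is an illustrative reconstruction, not necessarily the exact code in the image, and the function name `contains_duplicate_brute_force` is hypothetical. Starting the inner loop at `i + 1` checks each pair once, which is equivalent to "every other number except itself" but halves the comparisons.

```python
from typing import List


def contains_duplicate_brute_force(nums: List[int]) -> bool:
    """Compare every pair of positions i < j; O(n^2) time, O(1) extra space."""
    n = len(nums)
    for i in range(n):
        # Start at i + 1 so each pair is checked once and no element is compared with itself
        for j in range(i + 1, n):
            if nums[i] == nums[j]:
                return True
    return False


print(contains_duplicate_brute_force([1, 2, 3, 1]))  # True
print(contains_duplicate_brute_force([1, 2, 3, 4]))  # False
```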

---

### Why the Above Approach Won’t Work

1] This solution gets a **Time Limit Exceeded** error because the two nested loops compare every pair of elements,  
   which makes the time complexity **O(n²)**.  
   This is too slow for large arrays.  
   The error is shown below:

### ![image-4.png](attachment:image-4.png)
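A quick count shows why. Even when each pair of positions is compared only once (the inner loop starting just after the outer index), the number of comparisons grows quadratically with *n*. For a hypothetical array of 10^5 elements, that is roughly five billion comparisons:

$$
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 10^5 \;\Rightarrow\; \frac{10^5\,(10^5 - 1)}{2} = 4{,}999{,}950{,}000 \approx 5 \times 10^9
$$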

2] To improve efficiency, we need to reduce the time complexity from **O(n²)** to **O(n)**,  
   where *n* is the length of the given array.  
   For this, we can use Python's built-in **set**, which keeps only unique elements. Because it is hash-based, building a set from the array takes O(n) time on average. <br />

3] The logic is simple:
   - If the **length of the set** equals the **length of the array**, every element is unique → return **False**.  
   - Otherwise, the set is shorter because duplicates were collapsed → return **True**.  
   Below you can see the optimized approach:

### ![image-3.png](attachment:image-3.png)
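Here is a text sketch of the set-based idea described above. Like the brute-force sketch, it is an illustrative reconstruction rather than the exact code in the screenshot, and both function names are hypothetical.

The second function is an alternative built on the same `set` idea. It records values as it goes and stops at the first repeat, so it can finish early when a duplicate appears near the start. It keeps the same O(n) average time and O(n) space.

```python
from typing import List


def contains_duplicate_set(nums: List[int]) -> bool:
    """Length check: a set keeps one copy of each value, so any duplicate makes it shorter."""
    return len(set(nums)) != len(nums)


def contains_duplicate_early_exit(nums: List[int]) -> bool:
    """Same set idea, but return as soon as a value is seen for the second time."""
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False


print(contains_duplicate_set([1, 2, 3, 1]))         # True
print(contains_duplicate_early_exit([1, 2, 3, 4]))  # False
```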

---

### Time & Space Complexity

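- **Time Complexity:** O(n) on average, since inserting into a hash-based set takes O(1) time on average  
- **Space Complexity:** O(n) for the set, which can hold up to *n* elements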

---

### Summary

| Approach | Description | Time Complexity | Result |
|-----------|--------------|-----------------|---------|
| Brute Force | Compare each element with others | O(n²) | ❌ Time Limit Exceeded |
| Using Set | Compare length of list vs set | O(n) | ✅ Efficient Solution |
