# Customer Segmentation Analysis

## Context:
A marketing team wants to analyze customer behavior using age and annual spending data.

## Input Data:
Two 1D NumPy arrays of equal length, for example:

```python
ages = np.array([25, 32, 37, 45, 29, 50, 41, 33, 27, 38])
spending = np.array([25000, 34000, 36000, 48000, 28000, 52000, 46000, 35000, 30000, 40000])
```

## Tasks:

### 1. Filtering by Age:
- Create a boolean mask that selects customers aged between 30 and 50 (inclusive).
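
As a minimal sketch of this step, using the example arrays from the Input Data section (the names `mask`, `filtered_ages`, and `filtered_spending` are illustrative, not prescribed by the task):

```python
import numpy as np

ages = np.array([25, 32, 37, 45, 29, 50, 41, 33, 27, 38])
spending = np.array([25000, 34000, 36000, 48000, 28000, 52000, 46000, 35000, 30000, 40000])

# Element-wise comparisons; each condition needs its own parentheses
# because & binds more tightly than >= and <=.
mask = (ages >= 30) & (ages <= 50)

# Indexing with the mask keeps only the positions where it is True.
filtered_ages = ages[mask]
filtered_spending = spending[mask]

print(mask)
print(filtered_ages)
print(filtered_spending)
```

Because both arrays have the same length and order, the same mask can index either one and the rows stay aligned.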

### 2. Average Spending Calculation:
- Compute the average annual spending of the filtered customers.
  
  $$ \text{average\_spending} = \frac{\sum_{i=1}^{n} \text{spending}_i}{n} $$
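
The formula can be checked directly against `np.mean`. The sketch below assumes the example arrays from the Input Data section; `n` and `average_spending` mirror the symbols in the formula.

```python
import numpy as np

ages = np.array([25, 32, 37, 45, 29, 50, 41, 33, 27, 38])
spending = np.array([25000, 34000, 36000, 48000, 28000, 52000, 46000, 35000, 30000, 40000])

# Spending of customers aged 30 to 50 (inclusive).
filtered_spending = spending[(ages >= 30) & (ages <= 50)]

# Sum divided by count, exactly as in the formula.
n = filtered_spending.size
average_spending = filtered_spending.sum() / n

# np.mean should agree with the manual computation.
assert np.isclose(average_spending, np.mean(filtered_spending))
print(n, average_spending)
```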

### 3. Correlation Analysis:
- Calculate the correlation between age and spending.
- You can compute covariance and standard deviations to derive the correlation coefficient, or use NumPy’s built-in functions.
  
  $$ \rho = \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y} $$
  
  where:
  - $$ \text{cov}(X, Y) $$ is the covariance of age and spending.
  - $$ \sigma_X $$ and $$ \sigma_Y $$ are the standard deviations of age and spending, respectively.
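
Both routes can be compared in a few lines. This sketch assumes the example arrays from the Input Data section and uses the sample convention (`ddof=1`) for covariance and standard deviation alike. Any consistent choice of `ddof` gives the same coefficient, since the normalization cancels in the ratio.

```python
import numpy as np

ages = np.array([25, 32, 37, 45, 29, 50, 41, 33, 27, 38])
spending = np.array([25000, 34000, 36000, 48000, 28000, 52000, 46000, 35000, 30000, 40000])

# Manual route: rho = cov(X, Y) / (sigma_X * sigma_Y).
cov_xy = np.cov(ages, spending, ddof=1)[0, 1]  # off-diagonal entry of the 2x2 covariance matrix
sigma_x = np.std(ages, ddof=1)
sigma_y = np.std(spending, ddof=1)
rho_manual = cov_xy / (sigma_x * sigma_y)

# Built-in route: off-diagonal entry of the 2x2 correlation matrix.
rho_builtin = np.corrcoef(ages, spending)[0, 1]

assert np.isclose(rho_manual, rho_builtin)
print(rho_manual)
```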

## Expected Output:
- The boolean mask and the filtered ages and spending arrays.
- A single numeric value representing the average spending for the targeted age group.
- A correlation coefficient (a float) indicating the relationship between age and spending.


In [90]:
import numpy as np

In [91]:
ages = np.array([25, 32, 37, 45, 29, 50, 41, 33, 27, 29, 25, 29, 45, 50, 50, 32, 37, 32, 38])
spending = np.array([25000, 34000, 12000, 36000, 48000, 28000, 52000, 46000, 64000, 35000,
                     30000, 40000, 42000, 84000, 28000, 32000, 44000, 28000, 31000])


## Concatenating the arrays for easier data handling

In [92]:
# Reshape each 1D array of shape (19,) into a column of shape (19, 1)
ages = np.expand_dims(ages, axis=1)
spending = np.expand_dims(spending, axis=1)
spending

array([[25000],
       [34000],
       [12000],
       [36000],
       [48000],
       [28000],
       [52000],
       [46000],
       [64000],
       [35000],
       [30000],
       [40000],
       [42000],
       [84000],
       [28000],
       [32000],
       [44000],
       [28000],
       [31000]])

In [93]:
# np.column_stack((ages, spending)) gives the same matrix, even from the original 1D arrays
ages_spending_matrix = np.concatenate((ages, spending), axis=1)
ages_spending_matrix

array([[   25, 25000],
       [   32, 34000],
       [   37, 12000],
       [   45, 36000],
       [   29, 48000],
       [   50, 28000],
       [   41, 52000],
       [   33, 46000],
       [   27, 64000],
       [   29, 35000],
       [   25, 30000],
       [   29, 40000],
       [   45, 42000],
       [   50, 84000],
       [   50, 28000],
       [   32, 32000],
       [   37, 44000],
       [   32, 28000],
       [   38, 31000]])
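
The `np.column_stack` shortcut can be verified on a small slice of the data. This sketch uses only the first three customers to keep the output short.

```python
import numpy as np

ages = np.array([25, 32, 37])
spending = np.array([25000, 34000, 12000])

# column_stack turns each 1D array into a column in a single call.
stacked = np.column_stack((ages, spending))

# Equivalent two-step route: add a trailing axis, then concatenate along it.
concatenated = np.concatenate(
    (np.expand_dims(ages, axis=1), np.expand_dims(spending, axis=1)), axis=1
)

assert np.array_equal(stacked, concatenated)
print(stacked.shape)
```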

## Filtering the data by age

In [94]:
# Boolean mask: True for customers aged 30 to 50 (inclusive)
age_mask = (ages_spending_matrix[..., 0] >= 30) & (ages_spending_matrix[..., 0] <= 50)
ages_spending_matrix = ages_spending_matrix[age_mask]
ages_spending_matrix

array([[   32, 34000],
       [   37, 12000],
       [   45, 36000],
       [   50, 28000],
       [   41, 52000],
       [   33, 46000],
       [   45, 42000],
       [   50, 84000],
       [   50, 28000],
       [   32, 32000],
       [   37, 44000],
       [   32, 28000],
       [   38, 31000]])

## Average Spending

In [95]:
avg_spending = np.mean(ages_spending_matrix[..., 1])
# or
# avg_spending = np.mean(ages_spending_matrix, axis=0, dtype=int)[-1]

avg_spending

38230.769230769234

## Correlation Analysis

In [96]:
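# Correlate the age column with the spending column.
# Indexing [0] and [1] would select the first two rows (two points each),
# which always yields a correlation of 1.
np.round(np.corrcoef(ages_spending_matrix[:, 0], ages_spending_matrix[:, 1]), 4)

array([[1.    , 0.3145],
       [0.3145, 1.    ]])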

## Spending Outlier Detection
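
The notebook does not say which outlier rule it intends. As one possible sketch, the snippet below applies the common 1.5 × IQR rule to the spending column of the filtered matrix. The rule and the names `lower`, `upper`, and `outliers` are assumptions made for illustration.

```python
import numpy as np

# Spending of the 13 customers aged 30 to 50, copied from the filtered matrix.
spending = np.array([34000, 12000, 36000, 28000, 52000, 46000, 42000,
                     84000, 28000, 32000, 44000, 28000, 31000])

# Quartiles and interquartile range.
q1, q3 = np.percentile(spending, [25, 75])
iqr = q3 - q1

# Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_mask = (spending < lower) | (spending > upper)
outliers = spending[outlier_mask]

print(lower, upper)
print(outliers)
```

Under this rule only the 84000 value falls outside the fences (4000 to 68000). A z-score threshold would be another reasonable choice and may flag different points.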



In [98]:
print(ages_spending_matrix.shape)
ages_spending_matrix

(13, 2)


array([[   32, 34000],
       [   37, 12000],
       [   45, 36000],
       [   50, 28000],
       [   41, 52000],
       [   33, 46000],
       [   45, 42000],
       [   50, 84000],
       [   50, 28000],
       [   32, 32000],
       [   37, 44000],
       [   32, 28000],
       [   38, 31000]])

In [113]:
import numpy as np
import pandas as pd

# Given array
data = np.array([
    [32, 34000],
    [37, 12000],
    [45, 36000],
    [50, 28000],
    [41, 52000],
    [33, 46000],
    [45, 42000],
    [50, 84000],
    [50, 28000],
    [32, 32000],
    [37, 44000],
    [32, 28000],
    [38, 31000]
])

# Convert to DataFrame
df = pd.DataFrame(data, columns=["Age", "Spending"])

# Compute mean spending per age
mean_spending = df.groupby("Age")["Spending"].mean()
# Convert back to a NumPy array (if needed); dtype='int' truncates fractional means
result = mean_spending.reset_index().to_numpy(dtype='int')

print(result)


[[   32 31333]
 [   33 46000]
 [   37 28000]
 [   38 31000]
 [   41 52000]
 [   45 39000]
 [   50 46666]]
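
The same per-age means can be computed without pandas. The following is a NumPy-only sketch built on `np.unique` and `np.bincount`. It is an alternative to the `groupby` cell above, with the same truncation to integers.

```python
import numpy as np

data = np.array([
    [32, 34000], [37, 12000], [45, 36000], [50, 28000], [41, 52000],
    [33, 46000], [45, 42000], [50, 84000], [50, 28000], [32, 32000],
    [37, 44000], [32, 28000], [38, 31000],
])
ages, spending = data[:, 0], data[:, 1]

# inverse[i] is the index of ages[i] within the sorted unique ages.
unique_ages, inverse = np.unique(ages, return_inverse=True)

# Sum and count spending per group, then divide to get the group means.
sums = np.bincount(inverse, weights=spending)
counts = np.bincount(inverse)
mean_spending = sums / counts

# astype(int) truncates fractional means, matching to_numpy(dtype='int').
result = np.column_stack((unique_ages, mean_spending.astype(int)))
print(result)
```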
