**Suppose you are a data analyst at a financial consultancy. You’ve collected data from 50 clients, including their income, expenditure, and professional experience. You are tasked with analyzing this data to help create financial strategies based on career stage and income behavior.**

1. Data Generation and Descriptive Statistics
Generate 50 random data points (one per client) for:
   1. Serial number (1–50)
   2. Income (25k–78k)
   3. Expenditure (≤ income, in 25k–78k)
   4. Experience (0–72 months)

•	Summarize the data using descriptive statistics (mean, median, std, etc.)

•	Interpretation: What do the summary statistics suggest about the general financial profile?


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Set a seed for reproducibility
np.random.seed(42)

# Number of clients
n_clients = 50

# Generate data
serial_number = np.arange(1, n_clients + 1)
income = np.random.randint(25000, 78001, n_clients)
expenditure = np.array([np.random.randint(25000, i + 1) for i in income])
experience_months = np.random.randint(0, 73, n_clients)
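
Before summarizing, it can help to confirm that the generated arrays respect the ranges in the task statement. The following is a minimal sketch that regenerates the data with the same seed and asserts each constraint; the `checks_passed` flag is a hypothetical name added for illustration and is not part of the original notebook.

```python
import numpy as np

# Regenerate the data exactly as in the cell above
np.random.seed(42)
n_clients = 50
income = np.random.randint(25000, 78001, n_clients)
expenditure = np.array([np.random.randint(25000, i + 1) for i in income])
experience_months = np.random.randint(0, 73, n_clients)

# Each check mirrors one line of the task specification
assert income.min() >= 25000 and income.max() <= 78000
assert (expenditure >= 25000).all() and (expenditure <= income).all()
assert experience_months.min() >= 0 and experience_months.max() <= 72

# Hypothetical flag, set only if every assertion above held
checks_passed = True
print("All generated values fall inside the specified ranges.")
```

Note that the upper bound of `np.random.randint` is exclusive, which is why the generation code passes `78001`, `i + 1`, and `73`.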


In [None]:
# Create a Pandas DataFrame
data = pd.DataFrame({
    'Serial Number': serial_number,
    'Income': income,
    'Expenditure': expenditure,
    'Experience (Months)': experience_months
})


In [9]:
data.shape

(50, 4)

In [5]:
# Summarize the data using descriptive statistics
descriptive_stats = data.describe()

print("Generated Data (First 5 rows):")
print(data.head())

Generated Data (First 5 rows):
   Serial Number  Income  Expenditure  Experience (Months)
0              1   40795        32513                   40
1              2   25860        25564                   27
2              3   63158        48483                    6
3              4   69732        42159                   72
4              5   36284        33226                   71


In [6]:
print("\nDescriptive Statistics:")
print(descriptive_stats)


Descriptive Statistics:
       Serial Number        Income   Expenditure  Experience (Months)
count       50.00000     50.000000     50.000000            50.000000
mean        25.50000  46440.260000  35846.560000            35.560000
std         14.57738  15814.193523  11724.007238            20.775928
min          1.00000  25189.000000  25023.000000             0.000000
25%         13.25000  31905.250000  26580.500000            22.000000
50%         25.50000  43686.500000  30677.500000            34.500000
75%         37.75000  62662.750000  42236.250000            50.750000
max         50.00000  75680.000000  65774.000000            72.000000


**Summary of Descriptive Statistics:**

- Income: Mean $46.4k with a wide spread (std $15.8k). The mean sits above the median ($43.7k), which points to a mild right skew from a few high earners.
- Expenditure: Mean $35.8k (std $11.7k). The median ($30.7k) is well below the mean, so most clients spend toward the lower end of the range while a few high spenders raise the average. By construction, no client spends more than they earn.
- Experience: Mean 35.6 months (about 3 years) with a moderate spread (std 20.8 months), covering the full 0–72 month range, from newcomers to 6-year professionals.

**General Financial Profile (Short Interpretation):**

The consultancy's clients show a moderate average income with substantial individual differences. They generally spend less than they earn, but spending habits also vary, with some higher spenders. The clients represent a range of career stages, suggesting diverse financial needs and opportunities for tailored strategies based on income behavior and experience.
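
The skew and spending claims above can be checked directly rather than read off the mean and median. The sketch below regenerates the income and expenditure columns with the same seed, computes sample skewness, and derives two helper columns, `Savings` and `Savings Rate`, which are assumptions introduced here for illustration and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Regenerate income and expenditure as in the notebook
np.random.seed(42)
n_clients = 50
income = np.random.randint(25000, 78001, n_clients)
expenditure = np.array([np.random.randint(25000, i + 1) for i in income])
data = pd.DataFrame({'Income': income, 'Expenditure': expenditure})

# Positive skewness means a longer right tail (mean above median)
skewness = data[['Income', 'Expenditure']].skew()

# Hypothetical helper columns, not part of the original analysis
data['Savings'] = data['Income'] - data['Expenditure']
data['Savings Rate'] = data['Savings'] / data['Income']

print(skewness)
print(data[['Savings', 'Savings Rate']].describe())
```

Because expenditure is drawn from a range capped at each client's income, `Savings` can never be negative in this dataset; real client data would not carry that guarantee.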

# Problem 2:
**Categorize Experience Levels**

•	Create a categorical variable:
- 0–2 years → Early Career
- 2–5 years → Mid-Level
- Over 5 years → Senior Level

    * Count clients in each category.

    * Interpretation: What is the distribution of clients across career stages?


In [10]:
# Define the experience level categories and their boundaries
def categorize_experience(months):
    years = months / 12  # Convert months to years
    if 0 <= years <= 2:
        return 'Early Career'
    elif 2 < years <= 5:
        return 'Mid-Level'
    else:  # Greater than 5 years
        return 'Senior Level'



In [11]:
# Apply the categorization function to the 'Experience (Months)' column
data['Experience Level'] = data['Experience (Months)'].apply(categorize_experience)
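
As an alternative to the row-wise `apply`, the same binning can be expressed with `pd.cut`. This is a sketch under the assumption that the boundaries are 24 and 60 months (2 and 5 years, inclusive on the upper edge), which matches the function above; the names `levels_cut`, `levels_apply`, and `matches` are hypothetical. It compares both approaches on every possible month value from 0 to 72.

```python
import pandas as pd

def categorize_experience(months):
    years = months / 12  # Convert months to years
    if 0 <= years <= 2:
        return 'Early Career'
    elif 2 < years <= 5:
        return 'Mid-Level'
    else:  # Greater than 5 years
        return 'Senior Level'

# Every month value the generator can produce
months = pd.Series(range(0, 73), name='Experience (Months)')

# Right-closed bins: [0, 24], (24, 60], (60, 72]
levels_cut = pd.cut(
    months,
    bins=[0, 24, 60, 72],
    labels=['Early Career', 'Mid-Level', 'Senior Level'],
    include_lowest=True,  # keep 0 months inside the first bin
)
levels_apply = months.apply(categorize_experience)

# Check that both approaches assign identical labels
matches = (levels_cut.astype(str) == levels_apply).all()
print(matches)
```

`pd.cut` returns an ordered categorical, so downstream grouping and sorting follow career-stage order rather than alphabetical order.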

In [12]:
# Count the number of clients in each category
experience_level_counts = data['Experience Level'].value_counts()

print("Experience Level Counts:")
print(experience_level_counts)

Experience Level Counts:
Experience Level
Mid-Level       26
Early Career    14
Senior Level    10
Name: count, dtype: int64


# Interpretation: 
**Distribution of Clients Across Career Stages**

The output above shows the distribution of the 50 clients across the three defined career stages:

* Mid-Level: The largest group, with 26 clients (52%). A slim majority of clients have more than 2 and up to 5 years of professional experience.
* Early Career: 14 clients (28%) have 0 to 2 years of experience, so a notable portion of the client base is relatively new to their careers.
* Senior Level: The smallest group, with 10 clients (20%) who have more than 5 years of professional experience.
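
Shares are often easier to compare than raw counts. As a small sketch, the counts reported in the output above are entered by hand into a Series (the variable names `counts` and `shares` are hypothetical) and converted to percentages.

```python
import pandas as pd

# Counts copied from the "Experience Level Counts" output above
counts = pd.Series(
    {'Mid-Level': 26, 'Early Career': 14, 'Senior Level': 10},
    name='count',
)

# Percentage of all clients in each career stage
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

On the notebook's own DataFrame, `data['Experience Level'].value_counts(normalize=True)` yields the same proportions as fractions of 1.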