Title: Introduction to Data Aggregation
<br>
Objective: Understand the basic concepts of data aggregation and practice simple aggregation methods.

Task 1: Calculating Sum
<br>
Task: Use a numerical dataset containing columns like 'sales', 'profit', and 'quantity'.
Calculate the total sales for the dataset.
<br>
Steps:<br>
1. Load the dataset using pandas.<br>
2. Apply the .sum() function on the 'sales' column.<br>
3. Spot-check the result by manually summing a small portion of the 'sales' values and confirming it matches what you expect.

In [3]:
import pandas as pd

# Step 1: Create sample dataset and save to CSV
data = {
    'sales': [200.5, 450.0, 300.0, 150.5, 500.0, 275.0],
    'profit': [35.0, 90.0, 60.0, 20.0, 110.0, 50.0],
    'quantity': [2, 5, 3, 1, 4, 2]
}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
csv_filename = 'sales_data.csv'
df.to_csv(csv_filename, index=False)
print(f"CSV file '{csv_filename}' created successfully.\n")

# Step 2: Load the dataset from the CSV file
df_loaded = pd.read_csv(csv_filename)
print("Loaded Data:")
print(df_loaded)

# Step 3: Calculate total sales using .sum()
total_sales = df_loaded['sales'].sum()
print(f"\nTotal Sales (using .sum()): {total_sales}")

# Step 4: Manual spot check using the first 3 sales values
# Python's built-in sum() over a plain list is used so the check does not rely on pandas' .sum()
sample_sales = df_loaded['sales'].head(3)
manual_sum = sum(sample_sales.tolist())
print(f"Manual Sum of First 3 'sales' Values: {manual_sum}")
print(f"Sales Values Used for Manual Sum: {sample_sales.values}")



CSV file 'sales_data.csv' created successfully.

Loaded Data:
   sales  profit  quantity
0  200.5    35.0         2
1  450.0    90.0         5
2  300.0    60.0         3
3  150.5    20.0         1
4  500.0   110.0         4
5  275.0    50.0         2

Total Sales (using .sum()): 1876.0
Manual Sum of First 3 'sales' Values: 950.5
Sales Values Used for Manual Sum: [200.5 450.  300. ]
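
The spot check above covers only the first three rows. As a minimal sketch of a fuller cross-check, the total can be recomputed with Python's built-in sum() over all rows and compared with the pandas result. The data is rebuilt inline here (an assumption, so the snippet runs without sales_data.csv), and the names df_check, pandas_total, python_total, and totals_match are illustrative only.

```python
import math
import pandas as pd

# Same 'sales' values as the sample dataset, rebuilt inline so this runs on its own
df_check = pd.DataFrame({
    'sales': [200.5, 450.0, 300.0, 150.5, 500.0, 275.0],
})

# Total as computed by pandas
pandas_total = df_check['sales'].sum()

# Independent total: built-in sum() over a plain Python list
python_total = sum(df_check['sales'].tolist())

# Compare with a tolerance, since floating-point sums may differ in the last digits
totals_match = math.isclose(pandas_total, python_total)
print(f"pandas: {pandas_total}, built-in sum: {python_total}, match: {totals_match}")
```

Comparing with math.isclose rather than == is a cautious choice: the two approaches may add the values in a different order, which can change the last few digits on larger or less tidy data.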


Task 2: Calculating Mean<br>

Task: Calculate the average quantity sold.<br>
Steps:<br>
1. Load the dataset.<br>
2. Use the .mean() function on the 'quantity' column to find the average.<br>
3. Double-check by calculating the mean manually on a small selection.

In [4]:
import pandas as pd

# Step 1: Load the dataset
df = pd.read_csv('sales_data.csv')
print("Dataset Loaded:")
print(df)

# Step 2: Use .mean() to calculate average quantity sold
avg_quantity = df['quantity'].mean()
print(f"\nAverage Quantity Sold (using .mean()): {avg_quantity:.2f}")

# Step 3: Manual calculation (sum divided by count) using the first 3 quantity values
sample_quantities = df['quantity'].head(3)
manual_avg = sample_quantities.sum() / len(sample_quantities)
print(f"Manual Mean of First 3 'quantity' Values: {manual_avg:.2f}")
print(f"Quantities Used: {sample_quantities.values}")






Dataset Loaded:
   sales  profit  quantity
0  200.5    35.0         2
1  450.0    90.0         5
2  300.0    60.0         3
3  150.5    20.0         1
4  500.0   110.0         4
5  275.0    50.0         2

Average Quantity Sold (using .mean()): 2.83
Manual Mean of First 3 'quantity' Values: 3.33
Quantities Used: [2 5 3]
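
Both tasks can also be expressed in a single call. The sketch below uses DataFrame.agg with a per-column mapping to compute the total of 'sales' and the mean of 'quantity' together. The data is rebuilt inline (an assumption, so the snippet runs without sales_data.csv), and df_demo and summary are illustrative names.

```python
import pandas as pd

# Same sample values as the dataset above, rebuilt inline so this runs on its own
df_demo = pd.DataFrame({
    'sales': [200.5, 450.0, 300.0, 150.5, 500.0, 275.0],
    'quantity': [2, 5, 3, 1, 4, 2],
})

# Map each column to the aggregation it should receive;
# the result is a Series indexed by column name
summary = df_demo.agg({'sales': 'sum', 'quantity': 'mean'})
print(summary)
```

The values in summary should agree with the separate .sum() and .mean() calls from Task 1 and Task 2, which makes this a convenient way to report several aggregates side by side.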
