**1. Research Report: Foundations of Data Science with NumPy and Pandas**


NumPy (Numerical Python)
NumPy is the fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object (the ndarray) and tools for working with these arrays.
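
As a rough illustration of the array object described above, the following minimal sketch creates a small two-dimensional array and applies element-wise arithmetic to it. The variable names and values are made up for this example.

```python
import numpy as np

# A 2 x 3 array of integers; the values are illustrative only.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3): two rows, three columns
print(matrix.ndim)   # 2
print(matrix.size)   # 6

# Arithmetic applies element-wise, with no explicit Python loop.
doubled = matrix * 2

# Reductions can run along one axis; axis=0 sums down each column.
col_sums = matrix.sum(axis=0)
print(doubled)
print(col_sums)
```

The same array is reused for every operation here, which is the typical pattern: create the data once, then express the computation on the whole array.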


Pandas (Panel Data)
Pandas is built on top of NumPy and provides high-level data structures such as the Series (one-dimensional) and the DataFrame (two-dimensional). It is designed for working with "relational" or "labeled" data.
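
The two structures can be sketched as follows. This is a minimal example with hypothetical labels and values, meant only to show how labels attach to the data and how a column maps back to a NumPy array.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
temps = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame is a table whose columns share a single index.
items = pd.DataFrame(
    {"item": ["pen", "book"], "price": [1.5, 12.0]},
    index=["a", "b"],
)

tue_temp = temps["Tue"]                 # label-based lookup on a Series
book_row = items.loc["b"]               # one row, returned as a Series
price_array = items["price"].to_numpy() # the column as a NumPy array

print(tue_temp)
print(book_row["price"])
print(type(price_array).__name__)
```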


**2. Basic Functions and Attributes of Each Library**


| Feature | NumPy (ndarray) | Pandas (DataFrame/Series) |
| --- | --- | --- |
| Creation | np.array(), np.zeros(), np.arange() | pd.DataFrame(), pd.read_csv() |
| Attributes | .shape, .dtype, .ndim, .size | .columns, .index, .dtypes, .values |
| Statistics & Inspection | np.mean(), np.sum(), np.std() | .describe(), .head(), .tail(), .info() |
| Indexing | Integer-based: arr[0, 1] | Label/Integer: .loc[], .iloc[] |
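
To make the indexing row of the table concrete, here is a small sketch that reads the same element three ways: by integer position in NumPy, by label with .loc[], and by position with .iloc[]. The row and column labels are hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
element = arr[0, 1]                # row 0, column 1

# Wrap the same values in a DataFrame with made-up labels.
frame = pd.DataFrame(arr, index=["r1", "r2"], columns=["a", "b", "c"])

by_label = frame.loc["r1", "b"]    # select by row and column label
by_position = frame.iloc[0, 1]     # select by integer position

column_names = list(frame.columns)
col_mean = frame["c"].mean()       # mean of column "c": (2 + 5) / 2

print(element, by_label, by_position)
print(column_names, frame.shape, col_mean)
```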


**3. Advanced Features (The "Bonus" Content)**


NumPy: Broadcasting & Universal Functions (ufuncs)
Broadcasting: NumPy can perform arithmetic on arrays of different but compatible shapes. For example, a scalar can be added to a 2D matrix without manually duplicating it into a matrix of the same shape.
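
A short sketch of broadcasting, using made-up values: a scalar, a one-dimensional row, and a single column are each combined with the same 2 x 3 array.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])                  # shape (2, 3)

# Scalar: conceptually stretched to every element.
plus_ten = grid + 10

# Shape (3,): matched against the last axis, so it is added to each row.
row_offsets = np.array([100, 200, 300])
per_column = grid + row_offsets

# Shape (2, 1): the length-1 axis is stretched across the three columns.
col_offsets = np.array([[1000], [2000]])
per_row = grid + col_offsets

print(plus_ten)
print(per_column)
print(per_row)
```

Shapes are compared from the trailing axis backwards; two axes are compatible when they are equal or when one of them is 1.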

Vectorization: Element-wise operations run in compiled C code (and, for linear algebra, in BLAS/LAPACK routines), avoiding the per-element overhead of the Python interpreter.
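
The contrast can be sketched by computing the same result twice, once with a Python-level loop and once as a single array expression. The formula is arbitrary and chosen only for illustration; no timings are claimed here.

```python
import numpy as np

values = np.arange(1, 6, dtype=float)   # [1., 2., 3., 4., 5.]

# Python-level loop: the interpreter handles one element at a time.
looped = [v * v + 1.0 for v in values]

# Vectorized: one expression applied to the whole array.
vectorized = values ** 2 + 1.0

# np.sqrt is a universal function (ufunc) and also works element-wise.
roots = np.sqrt(values)

same = np.allclose(looped, vectorized)
print(vectorized, same)
```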

Pandas: Categorical Data & Time Series
Categorical Data: Converting a string column with few distinct values to the "category" dtype can substantially reduce memory usage and speed up operations such as sorting and grouping.
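
One way to check the memory effect is to compare memory_usage(deep=True) before and after the conversion. The sketch below assumes a column with only three distinct labels repeated many times; the exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# 3000 strings, but only three distinct values (illustrative data).
labels = pd.Series(["Electronics", "Home", "Garden"] * 1000)
as_category = labels.astype("category")

string_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(string_bytes, category_bytes)

# Each distinct label is stored once; rows hold small integer codes.
categories = list(as_category.cat.categories)
first_codes = as_category.cat.codes.head(3).tolist()
print(categories, first_codes)
```

The saving shrinks as the number of distinct values approaches the number of rows, so the conversion is mainly worthwhile for low-cardinality columns.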

Time Series: Pandas has native support for date ranges, frequency conversion, and "rolling" windows (moving averages).
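
A minimal sketch of these three features, using six days of made-up sales figures: pd.date_range builds the index, rolling computes a three-day moving average, and resample converts the daily series to two-day totals.

```python
import pandas as pd

# Six consecutive days with illustrative values.
days = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=days, dtype=float)

# Moving average over three days; the first two entries are NaN
# because a full window is not yet available.
rolling_mean = sales.rolling(window=3).mean()

# Frequency conversion: sum the daily values into two-day bins.
two_day_totals = sales.resample("2D").sum()

print(rolling_mean)
print(two_day_totals)
```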


**4. Practical Section: Data Manipulation POC**


import numpy as np
import pandas as pd


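# Fix the seed so the random prices and quantities are reproducible
np.random.seed(42)
prices = np.random.uniform(10.5, 100.0, size=5)  # floats in [10.5, 100.0)
quantities = np.random.randint(1, 20, size=5)    # integers from 1 to 19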

data = {
    'Product_ID': [101, 102, 103, 104, 105],
    'Category': ['Electronics', 'Home', 'Electronics', 'Garden', 'Home'],
    'Price': prices,
    'Quantity': quantities,
    'Stock_Status': ['In Stock', 'In Stock', np.nan, 'Low', 'In Stock']  # np.nan marks a missing value
}

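df = pd.DataFrame(data)

# Replace the missing stock status with an explicit placeholder
df['Stock_Status'] = df['Stock_Status'].fillna('Unknown')

# Element-wise product of two columns gives the revenue per product
df['Total_Revenue'] = df['Price'] * df['Quantity']

# Total revenue and mean quantity per category
category_analysis = df.groupby('Category').agg({
    'Total_Revenue': 'sum',
    'Quantity': 'mean'
}).round(2)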

print("--- Cleaned DataFrame ---")
print(df)
print("\n--- Category Analysis (Grouped) ---")
print(category_analysis)

# Boolean masking on the underlying NumPy array: keep prices above the mean
price_values = df['Price'].to_numpy()
high_prices = price_values[price_values > price_values.mean()]
print(f"\nPrices above average (NumPy Masking): {high_prices}")

--- Cleaned DataFrame ---
   Product_ID     Category      Price  Quantity Stock_Status  Total_Revenue
0         101  Electronics  44.021341        19     In Stock     836.405472
1         102         Home  95.588930        11     In Stock    1051.478235
2         103  Electronics  76.013458        11      Unknown     836.148036
3         104       Garden  64.079934         4          Low     256.319737
4         105         Home  24.463668         8     In Stock     195.709347

--- Category Analysis (Grouped) ---
             Total_Revenue  Quantity
Category                            
Electronics        1672.55      15.0
Garden              256.32       4.0
Home               1247.19       9.5

Prices above average (NumPy Masking): [95.58893042 76.01345779 64.07993434]


**5. Summary & References**


NumPy is the engine (fast, numerical, mathematical).

Pandas is the vehicle (structured, labeled, feature-rich for analysis).

References:

1. Harris, C. R., et al. (2020). Array programming with NumPy. Nature.

2. McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference.

3. Pandas Documentation: https://pandas.pydata.org/docs/

4. NumPy Documentation: https://numpy.org/doc/
